

Technology

Can We Trust AI? Making Artificial Intelligence Safe and Responsible

Professors Nathalie Japkowicz and Zois Boukouvalas publish “Machine Learning Evaluation: Towards Reliable and Responsible AI,” a timely guide to testing and evaluating AI


AI apps on smartphone. Credit: Rokas - stock.adobe.com

Every day, AI plays a greater role in our lives—powering search engines and social media, optimizing Uber routes and Waze directions, enhancing health diagnostics, fortifying cybersecurity, and streamlining supply chains. The list goes on. But as companies race to integrate AI into their products and services, rigorous evaluation has become essential—because when AI fails, the consequences can be severe.

“Evaluating machine learning models is crucial in today's world because AI influences so many aspects of daily life,” says American University Computer Science Professor Nathalie Japkowicz. “If these systems aren’t carefully checked for accuracy, fairness, or even basic common sense, they can do real harm. That’s why AI evaluation isn’t just important—it’s necessary.”

Japkowicz and fellow Data Science Professor Zois Boukouvalas teamed up to publish a timely book on the subject, Machine Learning Evaluation: Towards Reliable and Responsible AI (Cambridge University Press, November 2024). The book brings together information from research papers, blogs, and other sources into one easy-to-use guide for anyone concerned about responsible AI. It looks at ways to measure how machine learning systems perform, including how to ensure fairness, avoid bias, and produce results that are easy to understand.

“With this book,” Boukouvalas says, “our goal is to make evaluation tools widely accessible to everyone working with AI, empowering them to assess and enhance AI systems effectively and in a fair manner.”

When AI Goes Wrong: Bias and Equity Issues

Artificial intelligence systems may seem neutral and objective, but they can produce biased or inaccurate results because they learn from human-generated data, which can reflect existing prejudices. As Boukouvalas points out, flawed AI systems can reinforce inequities in surprising and troubling ways. In 2015, Google Photos faced backlash when its AI, trained on biased data, misidentified people of color with offensive labels. In response, Google expanded the diversity of its datasets and introduced rigorous bias audits to improve accuracy and fairness. A similar issue occurred during the COVID-19 pandemic, when early vaccine distribution algorithms unintentionally favored wealthier areas. By reassessing these models with fairness metrics, researchers were able to ensure that vaccines reached underprivileged communities. It was a correction that saved lives, says Boukouvalas.
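Fairness checks like these need not be exotic. As a simple illustration (with made-up numbers, not data from any of the cases above), the Python sketch below compares a model's selection rate and true-positive rate across two demographic groups, the raw ingredients of common group-fairness metrics such as demographic parity and equal opportunity:

```python
# Illustrative group-fairness check on hypothetical data: compare how often
# a model predicts "positive" for two groups, and how often it catches the
# truly positive cases within each group.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])  # true outcomes (made up)
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 0, 0])  # model predictions (made up)
group  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # demographic group label

def selection_rate(pred):
    """Fraction of cases the model flags as positive."""
    return pred.mean()

def true_positive_rate(true, pred):
    """Of the truly positive cases, the fraction the model catches."""
    positives = true == 1
    return pred[positives].mean()

for g in (0, 1):
    mask = group == g
    print(f"group {g}: selection rate = {selection_rate(y_pred[mask]):.2f}, "
          f"TPR = {true_positive_rate(y_true[mask], y_pred[mask]):.2f}")

# Demographic parity difference: a large gap means one group is being
# flagged positive much more often than the other.
gap = abs(selection_rate(y_pred[group == 0]) - selection_rate(y_pred[group == 1]))
print(f"demographic parity difference = {gap:.2f}")
```

In a real audit, the same comparison would be run on held-out data for every group of concern, and large gaps would prompt a closer look at the training data and the model.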

Another example is facial recognition technology, used in everything from unlocking phones to security systems. “If these systems are trained with biased or incomplete data, they might struggle to recognize faces of certain demographics, leading to false identifications or failures in critical applications,” Japkowicz says. “There have been real-life instances where facial recognition systems misidentified people of color at alarming rates, causing unjust accusations and tarnished reputations.”

Life-Threatening Consequences

Sometimes, when AI makes mistakes, the results can be fatal. In healthcare, where AI promises to revolutionize patient care, the stakes are incredibly high. Consider hospitals that use AI to help prioritize patients in emergency rooms: “If the model behind this system is not properly evaluated, it might make life-or-death decisions based on flawed assumptions,” Japkowicz says.

Perhaps the most well-publicized issue is self-driving cars, which rely on AI to make split-second decisions about when to brake or accelerate. “If the AI has not been rigorously tested under all sorts of real-world conditions,” Japkowicz explains, “it might fail to recognize a pedestrian at night or misinterpret a traffic signal, leading to tragic accidents.”  

The bottom line is that machine learning models are like decision-making engines, and their decisions could affect real people. “If we do not evaluate these models thoroughly, we risk letting AI perpetuate biases, make unfair choices, or even cause harm,” Boukouvalas says. “By prioritizing rigorous evaluation, we can help ensure that AI systems not only work but work responsibly, making life better for everyone.”

A Guide for Beginners and Experts

Machine Learning Evaluation: Towards Reliable and Responsible AI covers a wide range of topics like these, including how machine learning works with different types of data, how to check whether training data is fair, and how to make sure results are reliable. The book also explores more advanced topics like unsupervised learning, image processing, and detecting unusual patterns, and it discusses key issues like fairness, responsible AI, and how to build high-quality machine learning products. Readers will find a large variety of evaluation tools, with easy-to-use Python and scikit-learn implementations available on the book's website.

Boukouvalas and Japkowicz believe that everyone—from casual ChatGPT users to software engineers—needs to know about AI evaluation, and they incorporate evaluation into their courses at American University. “Most of our computer science and data science students get exposed to at least one course dealing with AI during the course of their study program at AU,” Boukouvalas says. “Many students take a series of AI classes, and some do capstone projects and independent research on AI.”
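The book's own implementations live on its website; purely as a flavor of what systematic evaluation can look like in practice, here is a minimal, hypothetical scikit-learn sketch (not taken from the book) that judges a classifier by per-class precision and recall rather than a single accuracy number:

```python
# A minimal, illustrative evaluation workflow: report per-class metrics,
# which can expose failure modes that one overall accuracy score hides.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# A built-in toy dataset, standing in for real application data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Precision, recall, and F1 for each class, plus the confusion matrix
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```

A model can post high overall accuracy while performing poorly on a minority class; per-class reports like this one are among the simplest ways to catch that blind spot.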

How to Use AI Responsibly: What We All Can Do

Determining whether an AI system is trustworthy and fair doesn’t require a computer science degree, Boukouvalas and Japkowicz say; it just takes curiosity and a willingness to ask questions. Here is their advice for savvy and responsible AI users:

Don’t Blindly Trust AI

Just because an AI sounds confident doesn’t mean it’s always correct or unbiased. Some tools, like Gemini, are designed to be helpful but may provide incomplete or misleading answers. If something doesn’t feel right, challenge it. Ask for sources, cross-check information, and rephrase your prompts.

Understand How It Works

Transparency is key. Users should ask: How does this tool make decisions? What data was it trained on? AI-powered recommendations, whether for products or hiring decisions, can be skewed by hidden biases in the data. Understanding these underlying factors can help users make more informed choices.

Read the Fine Print

Many AI systems come with disclaimers, and for good reason. Pay attention to limitations. A health-related AI tool may excel at diagnosing common conditions but struggle with rare diseases. A hiring algorithm may be trained on biased datasets, unintentionally reinforcing inequalities. A little skepticism goes a long way.

Put AI to the Test

One of the best ways to assess an AI system is to test it against your own knowledge and experience. Does the AI’s output align with reality? Seek second opinions—whether from colleagues, friends, or experts—before making decisions based on AI recommendations.

Stay Skeptical, Stay Open

Ultimately, AI is a tool—one that can be powerful but is far from perfect. AI is only as good as the data it learns from and the humans who design it. By staying curious, asking questions, and verifying outputs, we can harness AI’s potential while avoiding its pitfalls.

For More Information  

For more information about Machine Learning Evaluation: Towards Reliable and Responsible AI, visit Cambridge University Press.  

For more College of Arts and Sciences artificial intelligence news and research, visit the CAS Impact and Innovation website.