A new method has been developed that compares the reasoning of a machine-learning model with that of a human, enabling the user to analyze patterns in the model’s behavior. In machine learning, understanding why a model makes certain decisions is often just as important as whether those decisions are correct. For instance, a model may accurately predict that a skin lesion is cancerous, but it may have done so by focusing on an unrelated blip in the clinical photo. While experts have tools that help them make sense of a model’s reasoning, these methods provide insight into only one decision at a time, and each decision must be evaluated manually. Since models are usually trained on millions of data inputs, it is almost impossible for a human to evaluate enough decisions to identify patterns.
Researchers at MIT and IBM Research have developed a technique called Shared Interest that enables a user to aggregate, sort, and rank individual explanations to quickly analyze a machine-learning model’s behavior. Shared Interest incorporates quantifiable metrics that compare how well a model’s reasoning matches that of a human, making it easier for a user to uncover concerning trends in the model’s decision-making. For example, the model may be frequently confused by distracting, irrelevant features such as background objects in photos. By aggregating these insights, the user can quickly and quantitatively determine whether the model is trustworthy and ready to be deployed in a real-world situation.
“In developing Shared Interest, our goal is to be able to scale up this analysis process so that you could understand on a more global level what your model’s behavior is,” says lead author Angie Boggust, a graduate student in the Visualization Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Boggust worked with her advisor, Arvind Satyanarayan, who is an assistant professor of computer science and leads the Visualization Group, as well as Benjamin Hoover and the senior author Hendrik Strobelt, both from IBM Research, to write a paper that will be presented at the Conference on Human Factors in Computing Systems.
Boggust started working on this project during a summer internship at IBM, where she was mentored by Strobelt. After returning to MIT, Boggust and Satyanarayan continued to work on the project and collaborated with Strobelt and Hoover to deploy case studies that demonstrate how the technique can be applied in real-world scenarios.
Human-AI alignment
Shared Interest builds on saliency methods, which are popular techniques that reveal how a machine-learning model arrived at a specific decision. When a model classifies images, saliency methods highlight the areas of an image that were most important to its decision. These areas are visualized as a saliency map, typically a heatmap overlaid on the original image. For example, if the model classifies an image as a dog and the saliency map highlights the dog’s head, those pixels were important to the model when it decided that the image contains a dog.
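To make the idea concrete, the sketch below shows one common way such a saliency map might be computed, using a plain gradient-based method on a pretrained PyTorch classifier. The model choice, image file, and preprocessing are illustrative assumptions, not details from the paper.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Load a pretrained classifier; any image classifier paired with a saliency method would do.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
image = preprocess(Image.open("dog.jpg")).unsqueeze(0)  # hypothetical input photo
image.requires_grad_(True)

# Forward pass, then backpropagate the top class score to the input pixels.
logits = model(image)
pred_class = logits.argmax(dim=1).item()
logits[0, pred_class].backward()

# Per-pixel saliency: maximum absolute gradient across the RGB channels,
# rescaled to [0, 1] so it can be overlaid on the photo as a heatmap.
saliency = image.grad.abs().max(dim=1)[0].squeeze()
saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
```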
Shared Interest operates by comparing a model’s saliency data with ground-truth data. Ground-truth data are typically human-generated annotations, such as bounding boxes, that surround the relevant parts of each image in a dataset. In the dog example above, the box would encompass the entire dog in the photo. When evaluating an image classification model, Shared Interest compares the model-generated saliency data and the human-generated ground-truth data for the same image to determine how well they align.
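The comparison can be thought of as measuring the overlap between two regions. The sketch below illustrates the kind of overlap scores such a comparison might compute, treating both the thresholded saliency map and the ground-truth annotation as binary masks; the function name and the exact formulas are assumptions for illustration, not the paper’s definitions.

```python
import numpy as np

def shared_interest_scores(saliency_mask, ground_truth_mask):
    """Compare a thresholded saliency map with a human annotation.
    Both arguments are boolean NumPy arrays of the same shape,
    where True marks a pixel considered relevant."""
    saliency = np.asarray(saliency_mask, dtype=bool)
    truth = np.asarray(ground_truth_mask, dtype=bool)
    intersection = np.logical_and(saliency, truth).sum()
    return {
        # How much of the human-annotated region the model also relied on.
        "ground_truth_coverage": intersection / max(truth.sum(), 1),
        # How much of the model's salient region falls inside the annotation.
        "saliency_coverage": intersection / max(saliency.sum(), 1),
        # Overall overlap between the two regions (intersection over union).
        "iou": intersection / max(np.logical_or(saliency, truth).sum(), 1),
    }
```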
The technique uses several metrics to quantify that alignment (or misalignment) and then sorts each decision into one of eight categories. These categories range from perfectly human-aligned, where the model makes a correct prediction and the highlighted area in the saliency map matches the human-generated box, to completely distracted, where the model makes an incorrect prediction and relies on none of the image features found in the human-generated box.
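As a rough illustration of how such scores, combined with prediction correctness, could be bucketed, consider the toy rule below. Only “human-aligned” and “distracted” are named in the article; the other labels and the thresholds are placeholders, not the paper’s eight categories.

```python
def categorize_decision(scores, prediction_correct, high=0.9, low=0.1):
    """Toy bucketing of one decision using the overlap scores above.
    Only 'human-aligned' and 'distracted' come from the article; the
    remaining labels and the thresholds are illustrative placeholders."""
    gtc = scores["ground_truth_coverage"]
    sc = scores["saliency_coverage"]
    if prediction_correct:
        if gtc >= high and sc >= high:
            return "human-aligned"      # right answer, same evidence as the human
        if sc >= high:
            return "sufficient subset"  # relied on only part of the annotated region
        return "other correct"
    if gtc <= low and sc <= low:
        return "distracted"             # wrong answer, evidence outside the annotation
    return "other incorrect"
```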
“On one end of the spectrum, your model made the decision for the exact same reason a human did, and on the other end of the spectrum, your model and the human are making this decision for totally different reasons. By quantifying that for all the images in your dataset, you can use that quantification to sort through them,” Boggust explains.
The technique works similarly for text-based data, where key words are highlighted instead of image regions, as in the sketch below.
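In that setting, the same kind of overlap scores can be computed over sets of token positions rather than pixels. This is a hypothetical token-level analogue, not code from the paper.

```python
def text_shared_interest(salient_tokens, ground_truth_tokens):
    """Token-level analogue of the image comparison: both arguments are sets
    of token indices (model-highlighted words vs. human-annotated keywords)."""
    salient = set(salient_tokens)
    truth = set(ground_truth_tokens)
    intersection = len(salient & truth)
    return {
        "ground_truth_coverage": intersection / max(len(truth), 1),
        "saliency_coverage": intersection / max(len(salient), 1),
        "iou": intersection / max(len(salient | truth), 1),
    }
```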
Rapid analysis
Using three case studies, the researchers demonstrated how Shared Interest could be useful to both non-experts and machine-learning researchers. In the first case study, Shared Interest was used to help a dermatologist determine the trustworthiness of a machine-learning model designed to help diagnose cancer from photos of skin lesions. Shared Interest enabled the dermatologist to quickly evaluate the model’s correct and incorrect predictions. Ultimately, the dermatologist concluded that the model could not be trusted because of its tendency to make predictions based on image artifacts rather than actual lesions.
“The value here is that using Shared Interest, we are able to see these patterns emerge in our model’s behavior. In about half an hour, the dermatologist was able to make a confident decision of whether or not to trust the model and whether or not to deploy it,” Boggust says.
In the second case study, the team worked with a machine-learning researcher to show how Shared Interest can be used to evaluate a particular saliency method by revealing previously unknown pitfalls in the model. The technique enabled the researcher to analyze thousands of correct and incorrect decisions in a fraction of the time typically required by manual methods.
In the third case study, the researchers used Shared Interest to dig deeper into a specific image classification example. By manipulating the image’s ground-truth area, they conducted a what-if analysis to determine which image features were most important for particular predictions.
Although the researchers were impressed with how well Shared Interest performed in these case studies, Boggust warns that the technique is only as reliable as the saliency methods it is based on. If these techniques are biased or inaccurate, Shared Interest will inherit these limitations.
In the future, the researchers aim to apply Shared Interest to different types of data, particularly tabular data used in medical records. They also want to use Shared Interest to help enhance current saliency techniques. Boggust hopes this research will inspire more work that seeks to quantify machine-learning model behavior in ways that are understandable to humans.
This news article is a derivative summary based on the following peer-reviewed publications:
References:
1. Booch, G., Fabiano, F., Horesh, L., Kate, K., Lenchner, J., Linck, N., … & Srivastava, B. (2021, May). Thinking fast and slow in AI. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 17, pp. 15042-15046).
2. How, M. L., Cheah, S. M., Chan, Y. J., Khor, A. C., & Say, E. M. P. (2020). Artificial intelligence-enhanced decision support for informing global sustainable development: A human-centric AI-thinking approach. Information, 11(1), 39.
3. How, M. L., Cheah, S. M., Khor, A. C., & Chan, Y. J. (2020). Artificial intelligence-enhanced predictive insights for advancing financial inclusion: A human-centric AI-thinking approach. Big Data and Cognitive Computing, 4(2), 8.
4. Hu, S., & Clune, J. (2024). Thought cloning: Learning to think while acting by imitating human thinking. Advances in Neural Information Processing Systems, 36.
5. Gholami, M. J., & Al Abdwani, T. (2024). The rise of thinking machines: A review of artificial intelligence in contemporary communication. Journal of Business, Communication & Technology.
6. Boggust, A., Hoover, B., Satyanarayan, A., & Strobelt, H. (2022, April). Shared Interest: Measuring human-AI alignment to identify recurring patterns in model behavior. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (pp. 1-17).