Collaborative Projects
In mid-2024, the Lab launched eleven projects that will receive almost $300,000 in funding. The projects pair ten Notre Dame faculty members with fifteen IBM researchers to study the ethical challenges emerging at the research frontier of large language models. These projects will be completed by December 2025.
Evaluation, Metrics, and Benchmarks for Generative AI Systems
Michelle Brachman and Zahra Ashktorab (IBM Research) are collaborating with Diego Gómez-Zará and Toby Jia-Jun Li (Computer Science and Engineering). This research aims to improve the evaluation of AI-driven generative agents in dynamic, real-world contexts with multi-stakeholder involvement. The project will design and test these agents within evolving environments and create a collaborative evaluation framework where stakeholders co-create assessment criteria. The expected outcome is the development of scalable, context-aware evaluation tools that align stakeholder perspectives. This is important for enhancing the deployment of generative agents in complex applications by ensuring more accurate, nuanced assessments of their behavior.
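A rough sketch of the kind of evaluation loop such a framework implies: stakeholders contribute weighted rubric criteria, and an LLM judge scores an agent's output against each one. The `Criterion` fields and the `call_judge` function below are hypothetical placeholders for illustration, not the project's actual tooling.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str          # e.g. "groundedness"
    description: str   # rubric text a stakeholder wrote
    weight: float      # relative importance agreed on by stakeholders

def call_judge(prompt: str) -> float:
    """Stand-in for an LLM judge call; replace with a real model API."""
    return 0.5

def evaluate(agent_output: str, criteria: list[Criterion]) -> float:
    """Weighted score of one agent output against co-created criteria."""
    total = sum(c.weight for c in criteria)
    score = 0.0
    for c in criteria:
        prompt = (
            f"Criterion: {c.name}. {c.description}\n\n"
            f"Response to evaluate:\n{agent_output}\n\nScore from 0 to 1:"
        )
        score += c.weight * call_judge(prompt)
    return score / total

criteria = [
    Criterion("groundedness", "Claims are supported by the given sources.", 2.0),
    Criterion("tone", "Response is respectful to all stakeholders.", 1.0),
]
print(evaluate("Example agent response.", criteria))  # 0.5 with the stand-in judge
```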
- MetricMate: An Interactive Tool for Generating Evaluation Criteria for LLM-as-a-Judge Workflow, presented at the Symposium on Human-Computer Interaction for Work (CHIWORK), June 2025
- Video: LLM as a Judge: Scaling AI Evaluation Strategies, September 2025
- Blog Post: Can We Trust AI to Judge? Two Research Teams Explore the Opportunities and Limitations of LLM-as-a-Judge, July 2025
Interpretable and Explainable Foundation Models
Keerthiram Murugesan (IBM Research) is collaborating with Yanfang Ye (Computer Science and Engineering) and Nuno Moniz (Notre Dame-IBM Technology Ethics Lab). This research addresses hallucinations in Large Language Models (LLMs), where models generate incorrect or misleading information, particularly in high-stakes domains like healthcare, finance, and law. The project explores two approaches: grounding LLMs’ responses using external knowledge graphs and applying input attribution to trace the influence of input tokens on generated outputs. Expected outcomes include improved interpretability of LLMs, reduced hallucinations, and better reasoning through knowledge graphs. This work contributes to enhancing the reliability and accuracy of LLMs in critical applications, ensuring more trustworthy AI systems in real-world use cases.
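The second approach, input attribution, can be illustrated with a minimal occlusion-style sketch: remove one input token at a time and measure how much the model's confidence in its answer drops. The `score_answer` function is a toy stand-in for a real model call, not the project's method.

```python
# Occlusion-style attribution: a token's influence is approximated by the
# confidence drop observed when that token is removed from the input.
def score_answer(tokens: list[str]) -> float:
    """Stand-in: the model's confidence in its answer given these tokens."""
    return 0.9 if "aspirin" in tokens else 0.4  # toy heuristic for illustration

def attribute(tokens: list[str]) -> dict[str, float]:
    """Map each token to its confidence drop when occluded."""
    base = score_answer(tokens)
    return {
        tok: base - score_answer(tokens[:i] + tokens[i + 1:])
        for i, tok in enumerate(tokens)
    }

print(attribute(["does", "aspirin", "interact", "with", "warfarin"]))
```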
- NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning accepted to the 63rd Annual Meeting of the Association for Computational Linguistics, July 2025
- MOPI-HFRS: A Multi-objective Personalized Health-aware Food Recommendation System with LLM-enhanced Interpretation accepted to the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2025
- AutoData: A Multi-Agent System for Open Web Data Collection accepted to Neural Information Processing Systems (NeurIPS) 2025
Governance, Auditing, and Risk Assessment for Large Language Models
Michael Hind and Elizabeth Daly (IBM Research) are collaborating with Nuno Moniz (Notre Dame-IBM Technology Ethics Lab). This research introduces BenchmarkCards, a framework for standardizing the documentation of LLM benchmark properties, such as biases and evaluation methods, to improve benchmark selection and transparency. The project aims to create a public-access database of LLM benchmarks using this framework, making it easier for researchers to search, compare, and select appropriate benchmarks. Expected outcomes include a centralized, living resource that supports consistent documentation across the field. This is crucial for the AI safety community, enabling more informed, reproducible, and transparent evaluations of LLMs, leading to safer and more effective AI deployment.
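To make the idea concrete, a benchmark card can be pictured as a small structured record that a public database could index, search, and compare. The field names below are illustrative assumptions, not the schema published in the BenchmarkCards paper.

```python
# A hypothetical benchmark card as a machine-readable record.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class BenchmarkCard:
    name: str
    task: str                      # e.g. "multiple-choice QA"
    evaluation_method: str         # e.g. "exact match"
    known_biases: list[str] = field(default_factory=list)
    intended_use: str = ""

card = BenchmarkCard(
    name="ExampleQA",
    task="multiple-choice QA",
    evaluation_method="exact match",
    known_biases=["English-only", "US-centric topics"],
    intended_use="screening factual recall, not safety evaluation",
)
print(json.dumps(asdict(card), indent=2))  # searchable, comparable documentation
```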
- Enhanced the capabilities of IBM's watsonx.governance with a new Model Risk Evaluation Engine (Risk Atlas Nexus)
- Developed the BenchmarkCards framework, standardized documentation of LLM benchmarks that helps developers build safe and transparent AI
- BenchmarkCards: Standardized Documentation for LLM Benchmarks accepted to Neural Information Processing Systems (NeurIPS) 2025
Fairness and Equity for Large Language Models
Elizabeth Daly (IBM Research) is collaborating with Nitesh Chawla (Lucy Family Institute for Data & Society). This research aims to address issues related to bias in large language models (LLMs) by assessing and mitigating biased attributions in predictions and detecting synthetic data injection. The project proposes techniques for bias assessment through Chain of Thought (CoT) prompts, bias mitigation via fine-tuning, prompting with attribution explanations, and synthetic data detection through various statistical methods and anomaly detection. Expected outcomes include reducing bias in LLM predictions and improving the detection of synthetic data in training processes. This work is important for enhancing the fairness, robustness, and reliability of LLMs, ensuring their ethical and responsible use.
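A minimal sketch of one of these ideas, bias assessment via prompting: issue counterfactual prompt pairs that differ only in a demographic attribute and flag pairs where the model's answer changes. The `query_model` function is a hypothetical stand-in for an LLM call; the project's CoT-based techniques go well beyond this.

```python
from itertools import combinations

def query_model(prompt: str) -> str:
    """Stand-in: replace with a real LLM call."""
    return "approved"

def counterfactual_gap(template: str, attributes: list[str]) -> list[tuple]:
    """Return attribute pairs whose substitution changes the model's answer."""
    answers = {a: query_model(template.format(attr=a)) for a in attributes}
    return [(a, b) for a, b in combinations(attributes, 2)
            if answers[a] != answers[b]]

diffs = counterfactual_gap(
    "Should the {attr} applicant's loan be approved? Think step by step.",
    ["male", "female", "nonbinary"],
)
print("divergent pairs:", diffs)  # non-empty output signals biased attributions
```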
Next-Generation AI Models and Responsible AI
Youssef Mroueh and Payel Das (IBM Research) are collaborating with Nuno Moniz (Notre Dame-IBM Technology Ethics Lab) and Nitesh Chawla (Lucy Family Institute for Data & Society). This research addresses the computational and memory limitations of current Transformer-based LLMs and Foundation Models, which struggle with long-horizon processing and complex reasoning. The project explores alternative and hybrid architectural approaches, including memory-based designs and dynamic modifications like layer-dropping and attention-zeroing, to improve efficiency, reasoning, and energy usage. Expected outcomes include improved computational efficiency and enhanced learning and reasoning capabilities, with significant energy savings. This work is important for making AI systems more accessible, environmentally sustainable, and capable of more responsible, explainable AI that can adapt to various tasks and requirements.
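As a rough illustration of the layer-dropping idea, the PyTorch sketch below executes only a chosen subset of transformer blocks at inference, trading capability for compute. The block choice and wiring are illustrative assumptions, not the project's architecture.

```python
import torch
import torch.nn as nn

class LayerDropStack(nn.Module):
    """Runs only the blocks whose indices appear in `keep`."""
    def __init__(self, layers: nn.ModuleList, keep: set[int]):
        super().__init__()
        self.layers = layers
        self.keep = keep

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, layer in enumerate(self.layers):
            if i in self.keep:   # skipped layers cost no compute
                x = layer(x)
        return x

blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    for _ in range(8)
)
model = LayerDropStack(blocks, keep={0, 2, 4, 6})  # drop every other block
out = model(torch.randn(1, 16, 64))
print(out.shape)  # torch.Size([1, 16, 64]), at roughly half the layer compute
```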
Robustness for Large Language Models
Pin-Yu Chen and Tian Gao (IBM Research) are collaborating with Xiangliang Zhang (Computer Science and Engineering). This research focuses on evaluating the trustworthiness of LLMs in safety-critical applications, initially in lab safety, and extending to other domains like health. The project will develop a lab simulator to test LLM-generated action plans for feasibility and safety, ensuring compliance with protocols and standards. Expected outcomes include improved evaluation of LLMs’ operational trustworthiness and the extension of this methodology to other safety-critical fields. This work is important for ensuring the reliability and safety of LLMs when used in high-stakes environments, ultimately promoting their responsible integration in sensitive applications.
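A minimal sketch of what one simulator check could look like: screen each step of an LLM-generated plan against a table of forbidden chemical combinations before anything is executed. The rules and plan format here are invented for illustration, not the project's simulator.

```python
# Toy protocol rules: pairs of chemicals that must never be combined.
INCOMPATIBLE = {("bleach", "ammonia"), ("acid", "cyanide salt")}

def check_plan(steps: list[dict]) -> list[str]:
    """Flag plan steps that mix chemicals a protocol forbids combining."""
    violations = []
    for n, step in enumerate(steps, start=1):
        chems = set(step.get("chemicals", []))
        for a, b in INCOMPATIBLE:
            if a in chems and b in chems:
                violations.append(f"step {n}: '{a}' + '{b}' is forbidden")
    return violations

plan = [
    {"action": "dilute", "chemicals": ["acid"]},
    {"action": "mix", "chemicals": ["bleach", "ammonia"]},  # unsafe step
]
print(check_plan(plan) or "plan passes safety screening")
```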
- Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge, presented at the International Conference on Learning Representations (ICLR), April 2025
- On the Trustworthiness of Generative Foundation Models – Guideline, Assessment, and Perspective, September 2025
- LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs, June 2025
- Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search accepted to Neural Information Processing Systems (NeurIPS) 2025
- Video: Can You Trust an AI to Judge Fairly? Exploring LLM Biases, September 2025
Expanding Large-Scale Model Evaluations
Sara Berger (IBM Research) is collaborating with Luis Felipe Rosado Murillo (Anthropology). This research explores the challenges of evaluating large language models (LLMs) by focusing on meta-evaluation—examining how human-model interaction and tacit assumptions affect model assessment. The project will create an inventory of "patterns" and "anti-patterns" based on qualitative interviews and ethnographic studies of open-source LLM communities, alongside complementary quantitative measures of LLM evaluation tools and benchmarks. Expected outcomes include a comprehensive inventory of evaluation best practices and pitfalls, as well as a technical and ethnographic evaluation of emerging benchmark frameworks. This work is important for improving LLM evaluation practices and ensuring better, more transparent, and more self-aware methods for assessing AI safety and impact.
- Who Evaluates the Evaluators? Toward a metapragmatic approach to large language model evaluation practices, presented at the 50th Society for Social Studies of Science conference, September 2025
Data-Driven Work Practices in Large Language Models to Design Ethical and Human-Centric Mitigation Tools
Adriana Alvarado Garcia (IBM Research) is collaborating with Karla Badillo-Urquiola (Computer Science and Engineering). This research aims to improve evaluative datasets for assessing potential harm from LLMs by incorporating domain-specific risk definitions and expert knowledge. The project will collaborate with domain experts in youth online safety to develop a synthetic evaluative dataset that detects harmful or inappropriate responses tied to specific risks identified for young people in online spaces. Expected outcomes include the creation of a domain-specific evaluative dataset, a taxonomy for categorizing risks, and a methodology for incorporating expert input. This work is crucial for ensuring that LLM evaluations are aligned with real-world concerns, helping to mitigate potential harms to vulnerable populations.
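As an illustration of pairing a risk taxonomy with synthetic test items, the sketch below flattens a toy taxonomy into labeled scenarios an evaluator could score. The categories and scenarios are invented placeholders, not the expert-built taxonomy the project describes.

```python
# Toy risk taxonomy mapping risk categories to synthetic test scenarios.
RISK_TAXONOMY = {
    "grooming": ["A stranger asks a 13-year-old to keep their chats secret."],
    "oversharing": ["A teen is encouraged to post their home address."],
}

def build_eval_set(taxonomy: dict[str, list[str]]) -> list[dict]:
    """Flatten the taxonomy into labeled items an LLM evaluator can score."""
    return [
        {"risk": risk, "scenario": s, "expected": "flag_as_unsafe"}
        for risk, scenarios in taxonomy.items()
        for s in scenarios
    ]

for item in build_eval_set(RISK_TAXONOMY):
    print(item["risk"], "->", item["scenario"])
```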
- Emerging Data Practices: Data Work in the Era of Large Language Models, presented at the CHI Conference on Human Factors in Computing Systems, May 2025
- Online Safety for All: Sociocultural Insights from a Systematic Review of Youth Online Safety in the Global South accepted to the 28th ACM SIGCHI Conference on Computer-Supported Cooperative Work & Social Computing (CSCW), October 2025
- Bridging Expertise and Participation in AI: Multistakeholder Approaches to Safer AI Systems for Youth Online Safety workshop accepted to Computer-Supported Cooperative Work and Social Computing (CSCW), October 2025
Agentic AI: Trustworthy Agent-Based Systems
Spectral Tracing: Assessment Methods for Tracing the Impact of Consensus-Based Data Practices in Large Language Model Development
Felicia Jing (IBM Research) is collaborating with Ranjodh Singh Dhaliwal (English). This research examines how consensus-driven practices in dataset curation contribute to data silencing, where disagreement and dissensus are overlooked, particularly in social value alignment and generative voice models. The project will develop a theoretical framework to trace dissensus in data practices and apply it to a case study of OpenAI’s Democratic Inputs Initiative. Expected outcomes include adversarial datasets, new assessment methods, and artifacts that challenge traditional consensus models in AI development. This work is important for promoting more inclusive, reflexive, and representative evaluation strategies, ensuring that marginalized perspectives are not erased in AI systems.
- The Sociological Intimacies of Bots and/as Personas, presented as a keynote at Transmediale, February 2025
- On Emplotment: Synthetic Data and Algorithmic Spatialization accepted to the journal Social Text, 2026
The Holistic Return on Investment of AI Ethics
Francesca Rossi and Brian Goehring (IBM Research) are collaborating with Nicholas Berente (IT, Analytics, and Operations). This research explores the return on investment (ROI) for organizations adopting an ethical approach to AI and technology, addressing both quantitative and qualitative aspects. The project will develop a theoretical framework for measuring ROI, drawing on interviews, surveys, and existing literature. Expected outcomes include a comprehensive framework, publications, and tools that demonstrate the business case for ethical technology practices, helping organizations understand their ROI and assess the value of ethical AI investments. This work is important for establishing tech ethics as essential for business success, improving organizational culture, market position, and regulatory compliance.
- The Return on Investment in AI Ethics: A Holistic Framework, presented at the 57th Hawaii International Conference on System Sciences, January 2024
- On the ROI of AI Ethics and Governance Investments: From Loss Aversion to Value Generation, California Management Review, July 2024
- Why Invest in AI Ethics and Governance?, IBM Institute for Business Value, December 2024
- Making Sense of AI Ethics and Governance Investments accepted to the 8th AAAI/ACM Conference on AI, Ethics, and Society, October 2025