Collaborative Projects


In mid-2024, the Lab launched eleven projects that will receive almost $300,000 in funding. The projects pair ten Notre Dame faculty members with fifteen IBM researchers, who will work together to study the ethical challenges emerging at the research frontier of large language models. These projects will be completed by December 2025.

Evaluation, Metrics, and Benchmarks for Generative AI Systems

Michelle Brachman and Zahra Ashktorab (IBM Research) are collaborating with Diego Gómez-Zará and Toby Jia-Jun Li (Computer Science and Engineering). This research aims to improve the evaluation of AI-driven generative agents in dynamic, real-world contexts with multi-stakeholder involvement. The project will design and test these agents within evolving environments and create a collaborative evaluation framework where stakeholders co-create assessment criteria. The expected outcome is the development of scalable, context-aware evaluation tools that align stakeholder perspectives. This is important for enhancing the deployment of generative agents in complex applications by ensuring more accurate, nuanced assessments of their behavior.
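
A collaborative evaluation of this kind might aggregate criteria that different stakeholders weight differently. The sketch below is purely illustrative, not the project's actual framework; the criterion names, stakeholder roles, and weights are hypothetical.

```python
# Illustrative sketch (not the project's framework): combining
# stakeholder-weighted criteria into one evaluation score for a
# generative agent. All names and weights are hypothetical.

def evaluate_agent(scores, stakeholder_weights):
    """Average each stakeholder's weighted view of the agent's scores.

    scores: {criterion: value in [0, 1]} for one agent.
    stakeholder_weights: {stakeholder: {criterion: weight}}; each
        stakeholder's weights are normalized before use.
    """
    per_stakeholder = []
    for weights in stakeholder_weights.values():
        total = sum(weights.values())
        view = sum(scores[c] * w / total for c, w in weights.items())
        per_stakeholder.append(view)
    # Equal voice: average across stakeholders rather than pooling weights.
    return sum(per_stakeholder) / len(per_stakeholder)

scores = {"accuracy": 0.9, "transparency": 0.6, "safety": 0.8}
weights = {
    "developer": {"accuracy": 3, "transparency": 1, "safety": 1},
    "end_user":  {"accuracy": 1, "transparency": 2, "safety": 2},
}
overall = evaluate_agent(scores, weights)
```

Averaging across stakeholders (rather than pooling all weights into one sum) keeps a minority stakeholder's perspective from being diluted by a larger group, one simple way to "align stakeholder perspectives."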

Interpretable and Explainable Foundation Models

Keerthiram Murugesan (IBM Research) is collaborating with Yanfang Ye (Computer Science and Engineering) and Nuno Moniz (Notre Dame-IBM Technology Ethics Lab). This research addresses hallucinations in Large Language Models (LLMs), where models generate incorrect or misleading information, particularly in high-stakes domains like healthcare, finance, and law. The project explores two approaches: grounding LLMs’ responses using external knowledge graphs and applying input attribution to trace the influence of input tokens on generated outputs. Expected outcomes include improved interpretability of LLMs, reduced hallucinations, and better reasoning through knowledge graphs. This work contributes to enhancing the reliability and accuracy of LLMs in critical applications, ensuring more trustworthy AI systems in real-world use cases. 
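
Input attribution can be illustrated with a leave-one-out ablation: remove each token and measure how much the model's confidence drops. The scoring function below is a toy stand-in, not the project's method; the "knowledge set" mimics grounding in an external source.

```python
# Illustrative leave-one-out input attribution (a stand-in for the
# gradient- or perturbation-based methods such a project might use).
# The scoring function is a toy: it rewards tokens found in a small
# "knowledge" set, mimicking a grounded model's confidence.

KNOWN_FACTS = {"aspirin", "ibuprofen"}  # hypothetical knowledge source

def confidence(tokens):
    """Toy model confidence: fraction of tokens grounded in KNOWN_FACTS."""
    return sum(t in KNOWN_FACTS for t in tokens) / len(tokens)

def attribution(tokens):
    """Score each token by the confidence drop when it is removed."""
    base = confidence(tokens)
    return {
        t: base - confidence(tokens[:i] + tokens[i + 1:])
        for i, t in enumerate(tokens)
    }

scores = attribution(["take", "aspirin", "for", "pain"])
# The grounded token should receive the largest attribution.
```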

Governance, Auditing, and Risk Assessment for Large Language Models

Michael Hind and Elizabeth Daly (IBM Research) are collaborating with Nuno Moniz (Notre Dame-IBM Technology Ethics Lab). This research introduces BenchmarkCards, a framework for standardizing the documentation of LLM benchmark properties, such as biases and evaluation methods, to improve benchmark selection and transparency. The project aims to create a public-access database of LLM benchmarks using this framework, making it easier for researchers to search, compare, and select appropriate benchmarks. Expected outcomes include a centralized, living resource that supports consistent documentation across the field. This is crucial for the AI safety community, enabling more informed, reproducible, and transparent evaluations of LLMs, leading to safer and more effective AI deployment.

  • Enhanced the capabilities of IBM's watsonx.governance with a new Model Risk Evaluation Engine (Risk Atlas Nexus)
  • Developed the BenchmarkCards framework, standardized documentation for LLM benchmarks that helps developers build safe and transparent AI
  • BenchmarkCards: Standardized Documentation for LLM Benchmarks accepted to Neural Information Processing Systems (NeurIPS) 2025
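
In spirit, a BenchmarkCard is a structured, searchable record of a benchmark's properties. The sketch below is a guess at the shape of such a record; the actual framework's fields may differ, and the benchmark shown is hypothetical.

```python
# Minimal sketch of what a BenchmarkCard-style record might capture;
# the real framework's fields may differ. The idea: a standardized,
# searchable description of a benchmark's scope, biases, and metric.

from dataclasses import dataclass, field, asdict

@dataclass
class BenchmarkCard:
    name: str
    task: str                      # e.g. "question answering"
    metric: str                    # e.g. "exact match"
    known_biases: list = field(default_factory=list)
    intended_use: str = ""

    def matches(self, keyword):
        """Naive search hook: does the keyword appear anywhere in the card?"""
        text = " ".join(str(v) for v in asdict(self).values())
        return keyword.lower() in text.lower()

card = BenchmarkCard(
    name="ToyQA",                  # hypothetical benchmark
    task="question answering",
    metric="exact match",
    known_biases=["English-only sources"],
    intended_use="sanity-checking factual recall",
)
```

A public database built from records like this is what makes benchmarks searchable and comparable, rather than documented ad hoc in scattered papers.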

Fairness and Equity for Large Language Models

Elizabeth Daly (IBM Research) is collaborating with Nitesh Chawla (Lucy Family Institute for Data & Society). This research aims to address issues related to bias in large language models (LLMs) by assessing and mitigating biased attributions in predictions and detecting synthetic data injection. The project proposes techniques for bias assessment through Chain of Thought (CoT) prompts, bias mitigation via fine-tuning, prompting with attribution explanations, and synthetic data detection through various statistical methods and anomaly detection. Expected outcomes include reducing bias in LLM predictions and improving the detection of synthetic data in training processes. This work is important for enhancing the fairness, robustness, and reliability of LLMs, ensuring their ethical and responsible use.
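
One of the simplest of the "various statistical methods" for flagging possibly synthetic records is z-score anomaly detection on a numeric feature. The sketch below uses toy document lengths; a real detector would use richer features such as perplexity or stylometric signals.

```python
# Illustrative z-score anomaly detection for flagging possibly
# synthetic records. The feature values are toy numbers standing in
# for, e.g., mean sentence length per document.

from statistics import mean, pstdev

def flag_anomalies(values, threshold=2.0):
    """Return indices whose z-score magnitude exceeds the threshold."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs((v - mu) / sigma) > threshold]

# Hypothetical per-document feature (mean sentence length in words).
lengths = [18, 21, 19, 20, 22, 19, 55, 20]
suspects = flag_anomalies(lengths)
```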

Next-Generation AI Models and Responsible AI

Youssef Mroueh and Payel Das (IBM Research) are collaborating with Nuno Moniz (Notre Dame-IBM Technology Ethics Lab) and Nitesh Chawla (Lucy Family Institute for Data & Society). This research addresses the computational and memory limitations of current Transformer-based LLMs and Foundation Models, which struggle with long-horizon processing and complex reasoning. The project explores alternative and hybrid architectural approaches, including memory-based designs and dynamic modifications like layer-dropping and attention-zeroing, to improve efficiency, reasoning, and energy usage. Expected outcomes include improved computational efficiency and enhanced learning and reasoning capabilities, with significant energy savings. This work is important for making AI systems more accessible, environmentally sustainable, and capable of more responsible, explainable AI that can adapt to various tasks and requirements.
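
The idea behind layer-dropping can be shown in a few lines: at inference, a configurable subset of layers is skipped entirely, trading a little accuracy for compute. The "layers" below are toy scalar functions standing in for Transformer blocks; this is a conceptual sketch, not the project's architecture.

```python
# Illustrative layer-dropping: skip a configurable subset of layers
# at inference to save compute. Toy scalar "layers" stand in for
# Transformer blocks.

def run_stack(x, layers, dropped=frozenset()):
    """Apply layers in order, skipping indices listed in `dropped`."""
    for i, layer in enumerate(layers):
        if i in dropped:
            continue  # dropped layer acts as an identity shortcut
        x = layer(x)
    return x

# Toy 4-layer "model".
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x + 3, lambda x: x * 2]

full = run_stack(1, layers)                 # all layers applied
light = run_stack(1, layers, dropped={1})   # second layer skipped
```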

Robustness for Large Language Models

Pin-Yu Chen and Tian Gao (IBM Research) are collaborating with Xiangliang Zhang (Computer Science and Engineering). This research focuses on evaluating the trustworthiness of LLMs in safety-critical applications, initially in lab safety, and extending to other domains like health. The project will develop a lab simulator to test LLM-generated action plans for feasibility and safety, ensuring compliance with protocols and standards. Expected outcomes include improved evaluation of LLMs’ operational trustworthiness and the extension of this methodology to other safety-critical fields. This work is important for ensuring the reliability and safety of LLMs when used in high-stakes environments, ultimately promoting their responsible integration in sensitive applications.
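
One kind of check such a simulator might run is verifying that required safety steps precede the hazardous steps they protect. The step names and rules below are hypothetical, not drawn from any real protocol.

```python
# Illustrative protocol check a lab simulator might apply to an
# LLM-generated action plan: every guard step must occur before the
# hazardous step it protects. Rules and step names are hypothetical.

# Each rule: the first step must occur before the second step.
PRECEDENCE_RULES = [
    ("wear_gloves", "handle_acid"),
    ("turn_on_fume_hood", "open_solvent"),
]

def violations(plan):
    """Return the rules the plan breaks (guard missing or too late)."""
    index = {step: i for i, step in enumerate(plan)}
    broken = []
    for guard, hazard in PRECEDENCE_RULES:
        if hazard in index and index.get(guard, len(plan)) > index[hazard]:
            broken.append((guard, hazard))
    return broken

safe_plan = ["wear_gloves", "turn_on_fume_hood", "open_solvent", "handle_acid"]
unsafe_plan = ["handle_acid", "wear_gloves"]
```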

Expanding Large-Scale Model Evaluations

Sara Berger (IBM Research) is collaborating with Luis Felipe Rosado Murillo (Anthropology). This research explores the challenges of evaluating large language models (LLMs) by focusing on meta-evaluation—examining how human-model interaction and tacit assumptions affect model assessment. The project will create an inventory of "patterns" and "anti-patterns" based on qualitative interviews and ethnographic studies of open-source LLM communities, alongside complementary quantitative measures of LLM evaluation tools and benchmarks. Expected outcomes include a comprehensive inventory of evaluation best practices and pitfalls, as well as a technical and ethnographic evaluation of emerging benchmark frameworks. This work is important for improving LLM evaluation practices and ensuring better, more transparent, and more self-aware methods for assessing AI safety and impact.

  • Who Evaluates the Evaluators? Toward a metapragmatic approach to large language model evaluation practices, presented at the 50th Society for Social Studies of Science conference, September 2025

Data-Driven Work Practices in Large Language Models to Design Ethical and Human-Centric Mitigation Tools

Adriana Alvarado Garcia (IBM Research) is collaborating with Karla Badillo-Urquiola (Computer Science and Engineering). This research aims to improve evaluative datasets for assessing potential harm from LLMs by incorporating domain-specific risk definitions and expert knowledge. The project will collaborate with domain experts in youth online safety to develop a synthetic evaluative dataset that detects harmful or inappropriate responses related to risks identified for young people in online spaces. Expected outcomes include the creation of a domain-specific evaluative dataset, a taxonomy for categorizing risks, and a methodology for incorporating expert input. This work is crucial for ensuring that LLM evaluations are aligned with real-world concerns, helping to mitigate potential harms to vulnerable populations.

  • Emerging Data Practices: Data Work in the Era of Large Language Models presented at the CHI Conference on Human Factors in Computing Systems, May 2025

  • Online Safety for All: Sociocultural Insights from a Systematic Review of Youth Online Safety in the Global South accepted to the 28th ACM SIGCHI Conference on Computer-Supported Cooperative Work & Social Computing (CSCW), October 2025

  • Bridging Expertise and Participation in AI: Multistakeholder Approaches to Safer AI Systems for Youth Online Safety workshop accepted to Computer-Supported Cooperative Work and Social Computing (CSCW), October 2025

Agentic AI: Trustworthy Agent-Based Systems

Werner Geyer (IBM Research) is collaborating with Nuno Moniz (Notre Dame-IBM Technology Ethics Lab). This research explores how to design effective human delegation of tasks to single- and multi-agent systems, focusing on trust, communication, and the need for human oversight. The project will investigate how agents can handle complex tasks, such as scheduling, while maintaining trust, remembering past interactions, and applying context. Expected outcomes include identifying success metrics, delegation strategies, and memory models that improve agent effectiveness and trustworthiness. This work helps advance the development of trustworthy agent-based systems, making them more reliable and efficient in real-world use cases, while ensuring human control and collaboration.
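
A memory-informed delegation policy can be sketched simply: delegate a task type to an agent only when its remembered success rate clears a trust threshold, and otherwise keep a human in the loop. The class, task names, and threshold below are hypothetical illustrations, not the project's design.

```python
# Illustrative memory-informed delegation policy. An agent's remembered
# outcomes per task type drive the decision to delegate or to keep a
# human in the loop. All names and the threshold are hypothetical.

class AgentMemory:
    def __init__(self):
        self.history = {}  # task_type -> list of True/False outcomes

    def record(self, task_type, success):
        self.history.setdefault(task_type, []).append(success)

    def success_rate(self, task_type):
        outcomes = self.history.get(task_type, [])
        if not outcomes:
            return 0.0  # no track record: do not trust blindly
        return sum(outcomes) / len(outcomes)

def should_delegate(memory, task_type, threshold=0.8):
    """Delegate only when remembered performance clears the threshold."""
    return memory.success_rate(task_type) >= threshold

memory = AgentMemory()
for outcome in [True, True, True, True, False]:
    memory.record("scheduling", outcome)

delegate_scheduling = should_delegate(memory, "scheduling")    # track record
delegate_travel = should_delegate(memory, "travel booking")    # no history
```

Defaulting to human oversight when the agent has no track record is one concrete way the "need for human oversight" can be built into a delegation strategy.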

Spectral Tracing: Assessment Methods for Tracing the Impact of Consensus-Based Data Practices in Large Language Model Development

Felicia Jing (IBM Research) is collaborating with Ranjodh Singh Dhaliwal (English). This research examines how consensus-driven practices in dataset curation contribute to data silencing, where disagreement and dissensus are overlooked, particularly in social value alignment and generative voice models. The project will develop a theoretical framework to trace dissensus in data practices and apply it to a case study of OpenAI’s Democratic Inputs Initiative. Expected outcomes include adversarial datasets, new assessment methods, and artifacts that challenge traditional consensus models in AI development. This work is important for promoting more inclusive, reflexive, and representative evaluation strategies, ensuring that marginalized perspectives are not erased in AI systems.

  • The Sociological Intimacies of Bots and/as Personas, presented as a keynote at Transmediale, February 2025
  • On Emplotment: Synthetic Data and Algorithmic Spatialization accepted to the journal Social Text, 2026

The Holistic Return on Investment of AI Ethics

Francesca Rossi and Brian Goehring (IBM Research) are collaborating with Nicholas Berente (IT, Analytics, and Operations). This research explores the return on investment (ROI) for organizations adopting an ethical approach to AI and technology, addressing both quantitative and qualitative aspects. The project will develop a theoretical framework for measuring ROI, drawing on interviews, surveys, and the existing literature. Expected outcomes include a comprehensive framework, publications, and tools that demonstrate the business case for ethical technology practices, helping organizations understand their ROI and assess the value of ethical AI investments. This work is important for establishing tech ethics as essential for business success, improving organizational culture, market position, and regulatory compliance.