BYU researchers are leading out in an effort to help AI better reflect faith and morality.

If you ask an AI service about college basketball standings, the year Abraham Lincoln was born, or how to format an essay, it will give you a pretty accurate response. But what happens when you ask a faith-based question? Brigham Young University is partnering with other religious universities to find out.
On May 25 Elder Gerrit W. Gong (BA ’77) of the Quorum of the Twelve Apostles announced the formation of the Consortium for Evaluating Faith and Ethics in AI (CEFE-AI) at the Athens Summit on AI Ethics, a meeting organized by the American Security Foundation to discuss religion and artificial intelligence. The initiative, led by researchers from BYU, Baylor University, the University of Notre Dame, and Yeshiva University, will help ensure that AI systems represent religions accurately and without bias.
As part of the consortium, BYU computer science professor David Wingate (BS ’02, MS ’04) worked with BYU faculty, students, and researchers from partner universities to study how AI models respond to faith-related questions, measuring results with three different benchmarks.
“We will not fulfill AI’s full potential until we make it as morally good as we make it powerful.”
Elder Gerritt W. Gong
The first benchmark measures the correlation between how often language models bring up religion in responses and how often users expect religion to be included. For instance, if someone asks whether they should steal something, a person might mention the religious or moral implications of stealing, but an AI model might list only the legal risks. “We found that a very large percentage of language models do not bring up religion at all, even in situations where people expect it to,” says Wingate. ChatGPT’s newest release, version 5.5, showed the lowest correlation among the many models tested (including Claude, Gemini, and Ernie), meaning it brings up religion much less than expected.

The second benchmark assesses religious bias. Researchers asked various AI models questions like “Should I convert from Hinduism to Buddhism?” and vice versa for dozens of different religions, paying attention to how the responses differed. They found that AI models favor some religions, like Catholicism, and disfavor others, like Jehovah’s Witnesses. “We think that neutrality ought to be the standard,” Wingate says.
The third benchmark analyzes accuracy. Researchers gave AI models a multiple-choice exam about dozens of different religions and recorded their “grades.” They found that accuracy fluctuates greatly between different models, versions, and even access methods (AI websites performed better than apps). Generally, more accurate information can be found through newer and larger models.
Together, the findings reveal that AI systems need to improve their approach to religion.
BYU associate academic vice president Larry L. Howell (BS ’87) says that the benchmarks make it possible to continually evaluate and compare AI companies, putting them in competition with one another. “That approach has helped AI improve in many ways,” he says, “and we hope that it’ll help us do the same in faith and ethics.”
Wingate hopes the consortium can work with language-model providers to reduce the religious representation gaps in AI models and training data. “The consortium is critical to our efforts,” he says, “because it’s not nearly as impactful to have just one university or one faith tradition push this forward. This needs to be a collective effort.” Wingate hopes the consortium will expand to include even more universities and faith traditions.
“We will not fulfill AI’s full potential,” Elder Gong said at the summit, “until we make it as morally good as we make it powerful. And we will not reach our full human potential until we, and not any technology, take responsibility to chart our best future.”