As part of Xavier University's participation in a Microsoft Copilot Proof of Value, we wanted to answer a simple question: when people compare AI-generated responses side by side—without knowing which tool produced which—do they have a preference?
This study was conducted in ENTR 305: Innovation Management, a course that teaches students the value of experimentation in business. The core lesson: simple, well-designed experiments can generate data that drives better decisions.
*One prompt was excluded from analysis because of a data collection error in which Copilot returned no response; one participant was also excluded due to incomplete data.
Browse through the prompts and see exactly what students were comparing. Each slide shows the original prompt, both AI responses, and how students voted.
How preferences varied across different kinds of prompts
This research was conducted as part of ENTR 305: Innovation Management at Xavier University during Fall 2025. The course teaches students how to use experimentation as a business tool—and this study was a live demonstration of that principle.
The study was designed as part of Xavier University's participation in a Microsoft Copilot Proof of Value initiative, which sought to understand how faculty, staff, and students perceived Microsoft's AI assistant compared to alternatives.
All responses were generated in November 2025. For Copilot, we used GPT-5 in "Quick Response" mode; for ChatGPT, we used GPT-5 in "Instant" mode, so both tools produced comparable speed-optimized outputs. Response position was randomized through Qualtrics survey logic so that neither tool was systematically shown first. One prompt was excluded from the final analysis because of a data collection error in which Copilot failed to return a response.
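For readers who want to see the counterbalancing idea concretely, the sketch below mimics per-prompt position randomization in plain Python. It is an illustration only, not the Qualtrics logic the study actually used, and the prompt and response strings are hypothetical placeholders.

```python
import random

def randomize_positions(prompt_responses, seed=None):
    """Randomly assign each tool's response to position A or B for every prompt,
    keeping a hidden key so votes can be unblinded during analysis."""
    rng = random.Random(seed)
    survey_items = []
    for prompt, copilot_text, chatgpt_text in prompt_responses:
        pair = [("Copilot", copilot_text), ("ChatGPT", chatgpt_text)]
        rng.shuffle(pair)  # fair coin flip per prompt: either tool may appear first
        survey_items.append({
            "prompt": prompt,
            "response_A": pair[0][1],
            "response_B": pair[1][1],
            "key": {"A": pair[0][0], "B": pair[1][0]},  # revealed only at analysis time
        })
    return survey_items

# Hypothetical example: one prompt with placeholder response text
items = randomize_positions(
    [("Draft a short email announcing a schedule change.",
      "Copilot's draft ...", "ChatGPT's draft ...")],
    seed=7,
)
print(items[0]["key"])  # e.g. {'A': 'ChatGPT', 'B': 'Copilot'}
```

Because participants only ever see "Response A" and "Response B", any position bias averages out across prompts, and the hidden key lets the analysis attribute each vote to the correct tool afterward.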