ChatGPT vs Copilot:
A Blind Preference Study

Which AI do people actually prefer when they don't know which is which?
Xavier University · ENTR 305 · December 2025

Our Research Question

As part of Xavier University's participation in a Microsoft Copilot Proof of Value, we wanted to answer a simple question: when people compare AI-generated responses side by side—without knowing which tool produced which—do they have a preference?

This study was conducted in ENTR 305: Innovation Management, a course that teaches students the value of experimentation in business. The core lesson: simple, well-designed experiments can generate data that drives better decisions.

The Setup: Students received 19 different prompts spanning practical advice, business concepts, professional writing, technical calculations, and personal reflection. For each prompt, they saw two responses labeled simply "Response A" and "Response B"—with the position randomized. They had no idea which AI wrote which response. Using Qualtrics, they rated their preference on a 5-point scale.
42 Xavier students · 19 prompts analyzed · 5 task categories · 797 total ratings*

*One prompt was excluded from analysis due to a data collection error where Copilot returned no response. One participant with incomplete data was also excluded.
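As an illustration of the blinding step described above, here is a minimal Python sketch of how the A/B position shuffle could work. The function and variable names are hypothetical; in the actual study, randomization was handled inside Qualtrics survey logic rather than in code.

```python
import random

def blind_pair(prompt_id, chatgpt_text, copilot_text, rng=random):
    """Randomly assign the two responses to 'Response A' and 'Response B',
    keeping the tool-to-label mapping in a separate answer key."""
    pair = [("ChatGPT", chatgpt_text), ("Copilot", copilot_text)]
    if rng.random() < 0.5:
        pair.reverse()
    shown = {"Response A": pair[0][1], "Response B": pair[1][1]}  # what raters see
    key = {"Response A": pair[0][0], "Response B": pair[1][0]}    # decoded after the survey closes
    return prompt_id, shown, key

# Placeholder usage: real prompts and responses came from the study materials.
pid, shown, key = blind_pair("break-even-analysis",
                             "ChatGPT's response text...",
                             "Copilot's response text...")
print(key)  # e.g. {'Response A': 'Copilot', 'Response B': 'ChatGPT'}
```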

The Results

Overall Finding
Students showed a statistically significant preference for ChatGPT
p = 0.019 · Effect Size: Small (d = 0.38)
How 797 Individual Ratings Broke Down

Preferred Copilot: 34.6% (276 ratings)
No Preference: 22.5% (179 ratings)
Preferred ChatGPT: 42.9% (342 ratings)

Mean score: 3.10 (1-5 scale, where 3.0 = no preference)
ChatGPT received 24% more votes than Copilot
Statistical significance: p = 0.019
Analyzable participants: 42
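The writeup does not name the exact test, but the reported mean of 3.10, p = 0.019, and d = 0.38 are consistent with a one-sample t-test of per-participant mean ratings against the neutral midpoint of 3.0, with Cohen's d as the effect size. A minimal sketch under that assumption follows; the ratings array is simulated, not the study data.

```python
import numpy as np
from scipy import stats

# Simulated per-participant mean ratings on the 1-5 scale (3.0 = no preference).
# The real study had 42 analyzable participants; these values are illustrative only.
rng = np.random.default_rng(0)
participant_means = rng.normal(loc=3.10, scale=0.26, size=42)

# One-sample t-test against the "no preference" midpoint
t_stat, p_value = stats.ttest_1samp(participant_means, popmean=3.0)

# Cohen's d for a one-sample design: (sample mean - midpoint) / sample SD
cohens_d = (participant_means.mean() - 3.0) / participant_means.std(ddof=1)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, d = {cohens_d:.2f}")
```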

Biggest Winners

🟢 Strongest ChatGPT Preferences
Design Thinking: 60% preferred ChatGPT
Break-Even Analysis: 55% preferred ChatGPT
B2B vs B2C Marketing: 43% preferred ChatGPT

🔵 Where Copilot Held Its Own
T-Shirt Business Advice: 52% preferred Copilot
Student Budget Planning: 48% preferred Copilot
Time Management: 38% preferred Copilot

Results by Task Type

How preferences varied across different kinds of prompts

Practical Guidance: 2.93 → leans Copilot
Seeking Information: 3.26 → leans ChatGPT
Writing Tasks: 3.09 → leans ChatGPT
Technical Help: 3.26 → leans ChatGPT
Self-Expression: 3.02 ≈ even

Scale: 1.0 = strongly prefer Copilot · 3.0 = no preference · 5.0 = strongly prefer ChatGPT
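The per-category means above can be reproduced from a long-format export with a simple group-by. The sketch below uses hypothetical column names and toy data, since the structure of the actual Qualtrics export is not shown here.

```python
import pandas as pd

# Toy long-format data: one row per (participant, prompt) rating on the 1-5 scale,
# where 1 = strongly prefer Copilot and 5 = strongly prefer ChatGPT.
ratings = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3],
    "category": ["Practical Guidance", "Technical Help"] * 3,
    "rating": [3, 4, 2, 3, 3, 4],
})

# Mean rating per task category, mirroring the chart above
category_means = ratings.groupby("category")["rating"].mean().round(2)

# Direction of the lean relative to the 3.0 midpoint
lean = category_means.apply(lambda m: "ChatGPT" if m > 3 else ("Copilot" if m < 3 else "even"))
print(pd.DataFrame({"mean_rating": category_means, "leans_toward": lean}))
```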

Key Takeaways

🎯
The preference is real, but modest
ChatGPT was statistically preferred overall, but 35% of ratings still favored Copilot and 22% saw no difference. This isn't a landslide—it's a small but meaningful edge.
🔀
Task type matters
Copilot actually performed better on practical, actionable advice (like starting a business or managing time). ChatGPT excelled at explanations, writing, and structured technical responses. The "best" AI depends on what you're trying to accomplish.
👁️
Blind testing removes bias
Students had no idea which response came from which AI. This eliminates brand preference, prior experience bias, and interface familiarity—leaving only the quality of the output itself.
📊
Simple experiments yield valuable insights
This study took one class session to deploy via Qualtrics. It generated 797 data points and statistically significant findings. You don't need massive resources to run meaningful experiments.

About This Study

This research was conducted as part of ENTR 305: Innovation Management at Xavier University during Fall 2025. The course teaches students how to use experimentation as a business tool—and this study was a live demonstration of that principle.

The study was designed in the context of Xavier University's participation in a Microsoft Copilot Proof of Value initiative, which sought to understand how faculty, staff, and students perceived Microsoft's AI assistant compared to alternatives.

All responses were generated in November 2025. For Copilot, we used GPT-5 in "Quick Response" mode; for ChatGPT, we used GPT-5 in instant mode. This ensured comparable speed-optimized outputs from both tools. Position randomization was handled through Qualtrics survey logic to ensure unbiased comparisons. One prompt was excluded from the final analysis due to a data collection error where Copilot failed to return a response.

Citation:
ENTR 305 Class (2025). ChatGPT vs Microsoft Copilot Preference Study.
Xavier University, Williams College of Business. December 2025.