WSU Study: ChatGPT Inaccuracies and Inconsistencies Earn AI a D

A recent study conducted by Mesut Cicek, a professor at Washington State University, and his team has explored the capabilities of generative AI tools, specifically focusing on their ability to validate scientific hypotheses. The research highlights both the potential and limitations of AI in processing complex information effectively.

Cicek and his colleagues tested over 700 hypotheses from scientific papers by inputting them into ChatGPT and assessing whether the AI accurately confirmed or refuted these statements. Each hypothesis was evaluated ten times to ensure reliability.

ChatGPT answered correctly 76.5% of the time in 2024, rising to 80% in 2025. However, once random guessing was factored in, the AI performed only about 60% better than chance, a result closer to a low D than to strong reliability.

The AI struggled particularly with false hypotheses, correctly identifying them only 16.4% of the time. ChatGPT was also inconsistent: when given the same prompt ten times, it gave consistent answers only 73% of the time.

“We’re not just talking about accuracy; inconsistency is a significant issue. When you ask the same question multiple times, you’ll get varying answers,” explained Cicek, who serves as an associate professor in the Department of Marketing and International Business at WSU’s Carson College of Business and is the lead author of the study.

“For instance, using ten identical prompts, AI would alternate between true and false responses—sometimes resulting in an equal split of answers.”
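One simple way to put a number on this kind of inconsistency is to take the share of repeated runs that agree with the most common answer. The sketch below is illustrative only: the article does not say how the researchers computed their 73% figure, and the function name `agreement_rate` and the sample answer lists are hypothetical.

```python
from collections import Counter


def agreement_rate(answers):
    """Return the fraction of answers that match the most common answer.

    Hypothetical metric for illustration; not necessarily the study's method.
    """
    if not answers:
        raise ValueError("answers must not be empty")
    counts = Counter(answers)
    # most_common(1) returns a list holding one (answer, count) pair.
    top_count = counts.most_common(1)[0][1]
    return top_count / len(answers)


# Ten identical prompts that split evenly between true and false.
even_split = [True, False] * 5
# Ten identical prompts where eight runs agree.
mostly_true = [True] * 8 + [False] * 2

print(agreement_rate(even_split))   # 0.5
print(agreement_rate(mostly_true))  # 0.8
```

Under this measure, an even split of the kind Cicek describes scores 0.5, the lowest possible value for a true/false question, while ten identical answers score 1.0.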

These findings, published in the Rutgers Business Review, emphasize the necessity of skepticism and caution when relying on AI for critical tasks that involve nuanced reasoning. Cicek suggests that the much-anticipated emergence of a truly “thinking” artificial general intelligence is still a distant reality.

“Current AI tools lack the understanding of the world that humans possess—they don’t have a ‘brain,’” Cicek remarked. “They merely memorize information and can provide some insights, but they don’t truly grasp the concepts.”

The research team, which included Sevincgul Ulu from Southern Illinois University, Can Uslay from Rutgers University, and Kate Karniouchina from Northeastern University, utilized 719 hypotheses sourced from business journal articles published since 2021. Their aim was to gauge the ability of generative AI tools to answer questions that require intricate reasoning. The complexity of determining whether research supports a hypothesis necessitates a nuanced approach—one that is often difficult for AI to achieve effectively.

The experiments were conducted using the free version of ChatGPT-3.5 in 2024 and the updated free version of ChatGPT-5 mini in 2025. The overall accuracy across both versions was consistent. Accounting for random chance—where a guess has a 50% likelihood of being correct—the AI’s performance remained only 60% better than random guessing in both years.
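To see how a raw accuracy figure can shrink once guessing is accounted for, consider a common chance correction that rescales accuracy so that pure guessing maps to 0 and a perfect score maps to 1. This is a minimal sketch under an assumption: the article does not state which formula the researchers used, and the function name `chance_corrected` is hypothetical.

```python
def chance_corrected(accuracy, chance=0.5):
    """Rescale accuracy so chance-level performance is 0 and perfect is 1.

    Assumed formula: (accuracy - chance) / (1 - chance).
    This is a standard correction, not confirmed as the study's own method.
    """
    if not 0.0 <= chance < 1.0:
        raise ValueError("chance must be in [0, 1)")
    return (accuracy - chance) / (1.0 - chance)


# Raw accuracy figures reported in the article.
for year, accuracy in [(2024, 0.765), (2025, 0.80)]:
    print(year, round(chance_corrected(accuracy), 2))
```

Under this assumed formula, the 2025 accuracy of 80% corresponds to 0.60, in line with the reported figure of about 60% better than chance. The 2024 accuracy of 76.5% comes out at 0.53, so the researchers' exact adjustment may differ from this sketch.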

This study underscores a critical shortcoming in large language model AI tools: while they write fluently and convincingly, their reasoning often falls short when faced with complex questions. This can lead to misleading yet persuasive explanations for incorrect answers, Cicek noted.

The researchers advocate for business managers to prioritize verification of AI outputs, approach these results with skepticism, and educate themselves on the strengths and weaknesses of AI capabilities.

While this paper exclusively examined results from ChatGPT, Cicek has conducted similar tests with other AI systems, yielding comparable outcomes. The study also builds on previous work by Cicek that highlighted reasons to be cautious about the hype surrounding AI technologies. A study published in 2024 indicated that consumers are less inclined to purchase products when they are marketed with a strong emphasis on AI.

“Always remain skeptical,” Cicek advised. “I’m not against AI; I use it myself. However, caution is essential.”

This story was originally published by WSU Insider, the news website for students, staff, and communities of Washington State University.
