If you’ve read Twitter or Reddit’s ChatGPT subreddit, you’ll see one question asked more than any other: is ChatGPT getting dumber?
Is the performance of the world’s leading generative AI chatbot decreasing as time passes, or are ChatGPT’s millions of users collectively hallucinating quality issues?
Is ChatGPT Getting Worse?
It’s something many ChatGPT users have wondered: is ChatGPT getting dumber? OpenAI releases frequent updates to ChatGPT designed to tweak its responses, safety, and more, using user feedback, prompts, and user data to inform its direction.
But where ChatGPT felt like a genius solution to almost any problem when it launched, more users than ever report issues with its responses and output. Particularly of note are ChatGPTs reasoning, coding, and mathematical skills, though others note that it struggles with creative tasks, too.
The easiest way for most ChatGPT users to check how its responses have changed over time is to repeat a previously used prompt (preferably from the earlier days of ChatGPT) and analyze the two outputs.
Responses requiring specific outputs, like those involving coding and math, are likely easiest to compare directly.
Stanford Study Suggests ChatGPT Drop Off
A combined Stanford University and UC Berkley research group believes those feelings that ChatGPT is changing could be right. Lingjiao Chen, Matei Zaharia, and James Zou’s paper How Is ChatGPT’s Behavior Changing over Time? [PDF] is one of the first in-depth studies into ChatGPT’s changing capabilities.
The report summary explains:
We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was very good at identifying prime numbers (accuracy 97.6%) but GPT-4 (June 2023) was very poor on these same questions (accuracy 2.4%). Interestingly, GPT-3.5 (June 2023) was much better than GPT-3.5 (March 2023) in this task. GPT-4 was less willing to answer sensitive questions in June than in March, and both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March.
When presented with math problems ChatGPT could previously solve earlier in 2023, responses later in the year were wildly inaccurate. Furthermore, ChatGPT explained in detail why the answer was correct despite being wrong. Instances of AI hallucination are nothing new, but the figures in the charts below suggest a significant change in overall reasoning.
The charts suggest ChatGPT’s responses are drifting, further emphasized by the report.
GPT-4’s accuracy dropped from 97.6% in March to 2.4% in June, and there was a large improvement of GPT-3.5’s accuracy, from 7.4% to 86.8%. In addition, GPT-4’s response became much more compact: its average verbosity (number of generated characters) decreased from 821.2 in March to 3.8 in June. On the other hand, there was about 40% growth in GPT-3.5’s response length. The answer overlap between their March and June versions was also small for both services.
The report explains that ChatGPT’s large language model chain of thought “did not work” when presented with the questions in June. Conversation drift has always been a notable issue with LLMs, but the extreme variance in responses suggests performance issues and changes to ChatGPT.
Is ChatGPT Getting Worse? OpenAI Says No
Is it just a coincidence that both casual and prolific ChatGPT users are noticing ChatGPT’s changing quality?
The research paper would suggest not, but OpenAI VP for Product, Peter Welinder, suggests otherwise.
Furthermore, Welinder later pointed to OpenAI’s releases for ChatGPT and the constant stream of updates the company has delivered throughout 2023.
Still, that didn’t stop numerous responses to his tweet detailing how users have found ChatGPT’s responses wanting, with many taking the time to annotate prompts and responses.
Can OpenAI Restore ChatGPT to Its Original State?
The early days of ChatGPT seem long in the distance now; November 2022 is a hazy memory, and the world of AI moves fast.
For many, the Stanford/Berkeley study perfectly illustrates the issues and frustrations of using ChatGPT. Others claim the tweaks and changes made to ChatGPT to make it a safer, more inclusive tool have also directly altered its ability to reason properly, nerfing its knowledge and overall capabilities to the point it’s unusable.
It seems there is little doubt that ChatGPT has changed. Whether ChatGPT will regain its original prowess is another question entirely.