ChatGPT For Coding? AI Algorithm Gets 52% of Programming Answers Wrong, Study Finds

(Photo: Growtika / Unsplash)

While programmers have been flocking to AI chatbots like ChatGPT for coding assistance, a new study shows that ChatGPT got 52% of programming answers wrong when tested on questions drawn from Stack Overflow.

For a tool that programmers are coming to depend on for precision and accuracy, this is a staggeringly high error rate. The researchers stress how important it is to critically assess content generated by AI.

ChatGPT Gets 52% of Programming Answers Wrong

The findings come from the study "Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions," which was presented at the Computer-Human Interaction (CHI) conference.

The findings echo the experiences of other end users, such as teachers and writers, who have found that AI programs like ChatGPT can confidently offer answers that are completely incorrect.

As part of the study, researchers from Purdue University examined 517 Stack Overflow questions and evaluated ChatGPT's attempts to answer them.

The researchers found that 52% of the bot's answers contained misinformation. Moreover, 77% of the answers were more verbose than human answers, and 78% were inconsistent with human answers to some degree.

ChatGPT Trusted Amidst Blunders

The researchers also conducted a linguistic analysis of 2,000 randomly chosen ChatGPT answers. They discovered that the answers were more analytical and formal than human answers and expressed less negative sentiment.

They also found that ChatGPT was more likely to commit conceptual errors than factual ones. Many answers were wrong because the bot failed to grasp the underlying context of the question.
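To see what a conceptual error looks like in practice, here is a hypothetical illustration (not an example drawn from the study): an answer whose code runs correctly in isolation but ignores a requirement stated in the question.

# Hypothetical illustration of a "conceptual" error (not from the study).
# Question context: "How do I remove duplicates from a list while
# preserving the original order?"

items = ["b", "a", "b", "c", "a"]

# Factually valid but conceptually wrong: set() does remove duplicates,
# yet it discards ordering, ignoring the key requirement in the question.
wrong = list(set(items))  # order is arbitrary, e.g. ['a', 'c', 'b']

# Context-aware answer: dict keys preserve insertion order (Python 3.7+),
# so duplicates are removed while the original order is kept.
right = list(dict.fromkeys(items))  # ['b', 'a', 'c']

print(wrong, right)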

The researchers also polled 12 programmers, admittedly a small sample size, and found that 39% preferred ChatGPT's answers. They also found that 39% failed to detect the AI-generated mistakes.

The study authors noted that the semi-structured interviews that followed revealed that the bot's textbook-style, articulate, and polite language, coupled with its comprehensiveness, made its answers appear more convincing.

This tended to make participants lower their guard and overlook the misinformation in ChatGPT's answers.

Users could only pinpoint ChatGPT's errors when the mistakes were obvious.

However, when the responses were more complex or could not be verified easily, users failed to spot the inaccuracies and underestimated how serious the errors were.

The study highlights the major flaws still present in the AI model. It also emphasizes the importance of communicating more clearly how accurate and correct AI-generated answers actually are.

While several language models disclose that they can make mistakes, the researchers consider such a notice insufficient. They recommend that answers come with disclaimers indicating their degree of uncertainty and incorrectness.

The study also stressed how important it is to assess AI-generated content critically, especially in areas where high precision is required.

Check out more news and information on Tech & Innovation in Science Times.
