GPT-3 Can Solve Intelligence and Standardized Test Problems as Well as College Undergraduates, UCLA Psychologists Say

GPT-3 performs well on standardized and intelligence tests. According to psychologists from the University of California, Los Angeles (UCLA), its performance is comparable to that of undergraduate students.

GPT-3 Performs Like College Students in Solving Tests

According to research by UCLA psychologists, the GPT-3 artificial intelligence language model performs about as well as college undergraduates when given the kind of reasoning problems that frequently appear on intelligence tests and on standardized tests such as the SAT.

The same problems were given to 40 UCLA undergraduate students. The study's senior author, UCLA psychology professor Hongjing Lu, was surprised to find that GPT-3 not only performed about as well as humans but also made similar mistakes, EurekAlert reported.

GPT-3 answered 80% of the problems correctly, well above the human test takers' average score of just under 60% but still within the range of the highest human scores.

Additionally, the researchers asked GPT-3 to answer a set of SAT analogy questions that they believe had never been published online, which means the questions were unlikely to have been part of GPT-3's training data. The questions ask test takers to choose pairs of words that share the same type of relationship. For example, the answer to "Love is to hate as rich is to which word?" would be "poor."

When they compared GPT-3's results with published SAT scores of college applicants, they found that the AI performed better than the human average.

Afterward, the researchers gave GPT-3 and the student participants analogy problems based on short stories, asking them to read one passage and then identify a different story that conveyed the same meaning. On those problems, GPT-3 fared worse than the students, although GPT-4, the most recent version of OpenAI's technology, performed better than GPT-3.

However impressive the findings may be, it is crucial to stress that the system has significant limitations, according to Taylor Webb, a postdoctoral psychology researcher at UCLA and the study's first author. Webb noted that GPT-3 is capable of analogical reasoning but cannot do things that are very easy for people, such as using tools to solve a physical task. When it was given those kinds of problems, some of which children can solve quickly, the solutions it suggested were absurd.

To evaluate GPT-3, Webb and his colleagues used a set of problems modeled on Raven's Progressive Matrices, a test that asks the subject to predict the next image in a complicated arrangement of shapes. To let GPT-3 "see" the shapes, Webb converted the images into a text format that the model could process. This approach also ensured that the AI had never encountered the questions before.

The study was published in Nature Human Behaviour.

ChatGPT Is More Empathetic Than Human Doctors

In a previous report from Science Times, researchers compared responses from the AI chatbot ChatGPT with responses from physicians to the same 195 medical questions. The researchers concluded that ChatGPT's responses were ten times more empathetic.

The results encouraged further investigation into using AI assistants to communicate with patients. The researchers suggested that a chatbot could respond promptly to queries, which might lead to fewer unnecessary clinical visits.

However, according to Anthony Cohn, a professor of automated reasoning at the University of Leeds in the UK, it would be dangerous to rely on any factual information in a chatbot's response, given the propensity of chatbots to "hallucinate" and make up facts. He therefore advises having a real doctor review the chatbot's response.