Nicolaus Copernicus Superior School

ChatGPT-4.5 passes the Turing test

In March 2025, researchers at the University of California, San Diego, Cameron R. Jones and Benjamin K. Bergen, published the results of an experiment in which four conversational systems – the classic rule-based chatbot ELIZA and the large language models (LLMs) GPT-4o, LLaMa-3.1-405B and GPT-4.5 – were put through the Turing test. The purpose of the study was to assess how well these systems can mimic human communication in a way that is indistinguishable to human judges. 

Course of the experiment 

The study involved 284 participants who played the role of either interrogator or witness. Each interrogator conducted simultaneous five-minute text conversations with two witnesses: one human and one AI system. At the end of the interaction, the interrogator had to indicate which interlocutor was the human and which was the chatbot. 

The results showed that the GPT-4.5 model was identified as a human 73% of the time, meaning that it was judged to be the human interlocutor more often than the actual human was. The LLaMa-3.1-405B model scored 56%, while ELIZA and GPT-4o were mistaken for a human in only 23% and 21% of cases, respectively. 
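To see why a 73% "judged human" rate is striking, it helps to compare it against pure chance (50%) with a simple one-sided binomial test. The sketch below uses only the Python standard library; the trial count of 100 conversations is a hypothetical placeholder chosen to match the reported 73% rate, not the study's actual sample size.

```python
from math import comb

def binom_sf(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the probability of seeing at
    least k 'judged human' verdicts if interrogators guessed at random."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical: 100 conversations, 73 of which ended with the model
# being picked as the human (matching the reported 73% rate).
p_value = binom_sf(73, 100)
print(f"p-value under chance guessing: {p_value:.2e}")
```

Under these assumed numbers the p-value is far below conventional significance thresholds, which is the sense in which the result cannot be explained by interrogators guessing randomly.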

Significance of the Turing Test 

The Turing Test, proposed by Alan Turing in 1950, was designed to assess a machine’s ability to exhibit intelligent behavior indistinguishable from human behavior. In the classic version of the test, a participant converses with a machine and a human without knowing who is who, and their task is to guess which of the interlocutors is human. 

Modern interpretations of the test focus on the ability of language models to mimic human communication. In the case of GPT-4.5, its success on the Turing test does not mean that the model has human-level awareness or understanding, but that it can successfully mimic human linguistic behavior in a specific context. 

Controversies and limitations 

Despite its impressive results, the Turing test as a measure of artificial intelligence is controversial. Critics point out that the test only assesses the ability to imitate human behavior, without considering deeper understanding, awareness or intent. In addition, success in the test may be due to the model’s appropriate adaptation to a specific task, rather than general intelligence. 

In the case of GPT-4.5, its ability to convincingly imitate a human was achieved in part through the use of so-called “persona prompts”: instructions directing the model to assume the identity of an introverted young man who uses Internet slang. 
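A persona prompt is simply a system-level instruction prepended to the conversation. The snippet below is a hypothetical sketch of what such a prompt might look like in the chat-message format commonly used with LLM APIs; the wording and structure are illustrative assumptions, not the actual prompt used by Jones and Bergen.

```python
# Hypothetical persona prompt illustrating the technique described above;
# the study's actual prompt wording is not reproduced here.
persona_prompt = (
    "You are a 19-year-old who is introverted, spends a lot of time "
    "online, and uses casual internet slang. Keep replies short and "
    "lowercase, and don't try too hard to seem knowledgeable."
)

# Chat-style message list: the system message sets the persona,
# subsequent user/assistant turns carry the conversation itself.
messages = [
    {"role": "system", "content": persona_prompt},
    {"role": "user", "content": "hey, how's it going?"},
]
```

The design point is that the persona constrains style rather than content: short, informal, imperfect replies are harder to distinguish from a human than fluent, encyclopedic ones.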

Social and ethical implications 

GPT-4.5’s crossing of the Turing test threshold raises questions about the potential uses and abuses of such technologies. Models capable of imitating human communication can be used in education, therapy or customer service, but also in disinformation, manipulation or fraud. 

Researchers stress the need to develop regulations and ethical standards for the use of advanced language models. It is crucial to ensure transparency, accountability and public awareness of the capabilities and limitations of artificial intelligence. 

Sources: arXiv, Live Science, v45.diplomacy.edu, Futurism
