GPT-4 and the Turing test

I don’t know whether the original Turing test was based on just one human participant or, if more than one was used, how those humans were chosen.

Scientists decided to replicate this test by asking 500 people to speak with four respondents, including a human and the 1960s-era AI program ELIZA as well as both GPT-3.5 and GPT-4, the AI that powers ChatGPT. The conversations lasted five minutes — after which participants had to say whether they believed they were talking to a human or an AI. In the study, published May 9 to the pre-print arXiv server, the scientists found that participants judged GPT-4 to be human 54% of the time… GPT-3.5 scored 50% while the human participant scored 67%.

livescience.com

Should the criterion to pass be that 51% of a large random sample of humans could not correctly identify computer vs. human? How bad would the results have to be for the control (identifying the human as human) before we would conclude that the Turing test no longer makes sense?
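For a sense of scale on that 51% question, here is a minimal sketch (Python, standard library only) of the uncertainty around the reported percentages. The article doesn’t say how many judgments each respondent type received, so the per-respondent sample size of 500 below is an assumption, not a figure from the study.

```python
import math

def wald_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Assumed sample size: the article reports 500 participants across four
# respondents, so n = 500 judgments per respondent type is a guess.
n = 500
for label, rate in [("GPT-4", 0.54), ("GPT-3.5", 0.50), ("human", 0.67)]:
    lo, hi = wald_ci(rate, n)
    print(f"{label}: {rate:.0%} judged human, 95% CI [{lo:.1%}, {hi:.1%}]")
```

Under that assumption the margin of error is roughly ±4 percentage points, so GPT-4’s 54% is barely distinguishable from coin-flipping, and a fixed 51% pass threshold would sit well inside the noise at this sample size.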

It’s interesting that the Turing test is presented as a test of intelligence, yet many of the things that apparently make computer conversationalists convincingly human are in fact cognitive biases, logical errors, and the appearance of emotionally influenced decision making. These might be things you would look for if you wanted a computer to be a friend, but they are not things I would look for if I wanted a computer to counter my less rational human impulses and help me make more rational decisions.
