History of the Turing Test
If you’ve ever wondered about the origins of artificial intelligence and how it came to touch so many aspects of human life, you’re not alone. The Turing test is named after Alan Turing, who pioneered work on computing and machine intelligence during the 1940s and 1950s. He introduced the test in his 1950 paper “Computing Machinery and Intelligence”, written at the University of Manchester, as a way to assess a machine's ability to exhibit intelligent behaviour. This was the beginning of humanity’s venture into the world of AI.
In his paper, Turing proposed a twist on what is called “The Imitation Game”. The original game involved no AI, but rather three human participants in separate rooms who communicated only through typed messages. One room contained a man, another a woman, and the third a judge of either sex. The man tries to convince the judge that he is the woman, the woman tries to help the judge, and the judge tries to determine which is which.
Turing then adapted the game to involve a machine, a human and a human questioner. The questioner’s job is to decide which is the machine and which is the human. Although the test is not definitive, it has some strengths. Since its formulation, a number of programs have been claimed to pass some version of it; one of the earliest to convince some users that they were talking to a person was ELIZA, a program created by Joseph Weizenbaum.
The Turing Test and others like it are primarily tests of intelligence: basically, the art of understanding. In humans, intelligence is most often tested based on a subject’s capacity to grasp the interrelationships of presented facts in order to guide action toward a desired goal. Current AI systems, however, have shown that they can meet or exceed defined test goals for intelligence.
AI game-playing systems with access to an applicable database have proved superior to masters-level human competition in applied tests of intelligence in games like chess, checkers, Scrabble, backgammon, Jeopardy!, Go, and many more. While humans can still occasionally defeat artificial gaming systems, it is expected that in the near future AI systems will surpass human capacities in almost every applied test for “intelligence.”
Alan Turing's 1950 paper laid out the general idea of the test, along with some specifics which he thought would be met "in about 50 years' time": each judge has just five minutes to talk to each machine, and a machine passes if more than 30% of the judges think it is human. Those somewhat arbitrary, if historically faithful, rules were the ones followed by the University of Reading in the contest won by the chatbot Eugene.
It remains impressive that Eugene had 33% of the judges "he" spoke to convinced of his humanity, but the robots still have a long way to go to pass the gold standard of modern Turing tests, using rules laid out in 1990 by the inventor Hugh Loebner. Those rules call for the computer and a human to have a 25-minute conversation with each of four separate judges. The machine only wins if it fools at least half the judges into thinking it's the human (though every year there is a "bronze medal" awarded to the machine that convinces the most judges).
The hardest Turing test described so far is one set up as part of a $20,000 bet between the futurologist Ray Kurzweil and the Lotus founder, Mitch Kapor. Kapor bet that no robot would pass the test before 2029, and the rules call for the challenger and three human foils to have two-hour conversations with each of three judges. The robot must convince two of the three judges that it is human, and be ranked as "more human" on average than at least two of the actual human competitors.
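The three sets of rules described above differ mainly in conversation length and in the share of judges that must be fooled. As a rough sketch only, the comparison can be written out in Python. The names `RuleSet` and `passes` are made up for this illustration, and the Kurzweil-Kapor entry covers only the "two of three judges" condition, not the additional "more human" ranking requirement.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class RuleSet:
    """Hypothetical container for one set of Turing test rules."""
    name: str
    minutes_per_conversation: int
    threshold: Fraction  # share of judges that must be fooled
    strict: bool         # True: must exceed the threshold; False: meeting it is enough


# Figures taken from the text above; everything else is an assumption.
READING = RuleSet("Turing 1950 / Reading", 5, Fraction(30, 100), strict=True)
LOEBNER = RuleSet("Loebner", 25, Fraction(1, 2), strict=False)
KURZWEIL_KAPOR = RuleSet("Kurzweil-Kapor (judge condition only)", 120, Fraction(2, 3), strict=False)


def passes(rules: RuleSet, judges_fooled: int, judges_total: int) -> bool:
    """Return True if the share of fooled judges satisfies the rule set."""
    if judges_total <= 0:
        raise ValueError("judges_total must be positive")
    share = Fraction(judges_fooled, judges_total)
    if rules.strict:
        return share > rules.threshold
    return share >= rules.threshold


if __name__ == "__main__":
    # Illustrative counts only: 33 of 100 judges stands in for "33%".
    for rules in (READING, LOEBNER, KURZWEIL_KAPOR):
        print(rules.name, passes(rules, 33, 100))
```

With the illustrative 33-of-100 figure, only the first rule set is satisfied, which is why a result that counts as a pass under the 1950 criteria still falls short of the stricter modern tests.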
How do the robots win?
Turing test competitions have been held for more than 20 years, and the strategies the robots employ have changed over time. Where originally the stumbling block was simply understanding the questions asked by the judges, now the bigger challenge is answering them in a human-like manner. In recent years, winners have started changing the subject, asking questions of the judges, and simulating moods and typos.
The big breakthrough behind Eugene, the University of Reading's winner, was in giving the robot the persona of a 13-year-old boy. "Our main idea was that [Eugene] can claim that he knows anything, but his age also makes it perfectly reasonable that he doesn't know everything," said the robot's creator, Vladimir Veselov. It also makes affectations like misspellings look more plausible than they would be coming from an "adult".
Limitations of the Turing Test
While the Turing test has its strengths, it has been criticized over the years as limited in its scope. For machines to meet or exceed the defined test goals, questions have had to be limited to Yes or No answers and simple phrases. When questions were open-ended and required conversational answers, it was less likely that the computer program could successfully fool the questioner.
In addition, a program such as ELIZA could appear to pass the Turing Test simply by manipulating symbols it does not understand. John Searle argued, in his Chinese room thought experiment, that this kind of symbol manipulation does not demonstrate intelligence comparable to that of humans.
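To make the idea of symbol manipulation without understanding concrete, here is a minimal ELIZA-style sketch in Python. It is not Weizenbaum's original script: the rules, the `reply` function and the default response are all invented for illustration, and features of the real program, such as swapping "my" for "your", are left out.

```python
import re

# Hypothetical pattern/response rules, invented for this illustration.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def reply(utterance: str) -> str:
    """Answer by matching text patterns; no meaning is modelled anywhere."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Echo the captured words back inside a canned template.
            return template.format(match.group(1))
    return DEFAULT_REPLY


if __name__ == "__main__":
    print(reply("I am sad."))              # Why do you say you are sad?
    print(reply("I am a purple teapot."))  # Why do you say you are a purple teapot?
    print(reply("The weather is nice."))   # Please go on.
```

The second exchange shows the limitation: the program responds just as smoothly to a nonsensical statement as to a sensible one, because it only rearranges the words it was given.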
To many researchers, the question of whether or not a computer can pass a Turing Test has become irrelevant. Instead of focusing on how to convince someone they are conversing with a human and not a computer program, the real focus should be on how to make human-machine interaction more intuitive and efficient, for example by using a conversational interface.
Variations and Alternatives to the Turing Test
A number of variations on the Turing Test have been proposed to make it more relevant. Such variations include:
Reverse Turing Test: A human tries to convince a computer that they are not a computer. An example of this is CAPTCHA.
Total Turing Test: The questioner can also test the subject's perceptual abilities and its ability to manipulate objects.
Minimum Intelligent Signal Test: Only true/false and yes/no questions are given.
Alternatives to the Turing test were later developed because many saw the Turing test as flawed. These alternatives include tests such as:
The Marcus Test: A program that can ‘watch’ a television show is tested by being asked meaningful questions about the show's content.
The Lovelace Test 2.0: A test of machine intelligence that examines an AI's ability to create art.
Winograd Schema Challenge: This test asks multiple-choice questions in a specific format, each of which turns on resolving an ambiguous pronoun in a sentence.
The phrase “The Turing Test” is also sometimes used to refer to certain kinds of purely behavioural conditions that are alleged to be logically sufficient for the presence of mind, or thought, or intelligence, in putatively minded entities. So, for example, Ned Block's “Blockhead” thought experiment is often said to be a (putative) knockdown objection to The Turing Test. (Block (1981) contains a direct discussion of The Turing Test in this context.) Here, what a proponent of this view has in mind is the idea that it is logically possible for an entity to pass the kinds of tests that Descartes and (at least allegedly) Turing have in mind, that is, to use words (and, perhaps, to act) in just the kind of way that human beings do, and yet to be entirely lacking in intelligence, not possessed of a mind, etc.
There are at least two kinds of questions that can be raised about Turing's predictions concerning his Imitation Game. First, there are empirical questions, e.g., Is it true that we now—or will soon—have made computers that can play the imitation game so well that an average interrogator has no more than a 70% chance of making the right identification after five minutes of questioning? Second, there are conceptual questions, e.g., Is it true that, if an average interrogator had no more than a 70% chance of making the right identification after five minutes of questioning, we should conclude that the machine exhibits some level of thought, or intelligence, or mentality?
There is little doubt that Turing would have been disappointed by the state of play at the end of the twentieth century. Participants in the Loebner Prize Competition, an annual event in which computer programs are submitted to the Turing Test, had come nowhere near the standard that Turing envisaged. A quick look at the transcripts of the participants for the preceding decade reveals that the entered programs were all easily detected by a range of not-very-subtle lines of questioning. Moreover, major players in the field regularly claimed that the Loebner Prize Competition was an embarrassment precisely because we were still so far from having a computer program that could carry out a decent conversation for a period of five minutes. It was widely conceded on all sides that the programs entered in the Loebner Prize Competition were designed solely with the aim of winning the minor prize of the best competitor for the year, with no thought that the embodied strategies would actually yield something capable of passing the Turing Test.