What Is a Polygraph Test
A polygraph machine is used to attempt to detect physiological changes that are thought to occur when a person tells a lie. The measures recorded include blood pressure, heart rate and the amount of sweating on the palms. Polygraph testing is currently used to investigate crime in a number of countries around the world, including the USA, Japan, South Korea and Israel (Raskin, 1990). There is a wide-ranging literature evaluating the polygraph machine and its associated techniques, largely based on laboratory experimentation.
Like any psychometric test, the use of a polygraph machine in the detection of deception has been questioned on the basis of its reliability and its validity. Reliability refers to the ability of a test to give consistent results, no matter who is carrying it out. Validity asks whether the test actually measures what it claims to measure. The answers to both questions are vital in deciding whether the polygraph should be introduced into British policing.
The polygraph machine has been used in a number of different ways to elicit useful information from suspects. Bull, Gudjonsson, Hampson, Baron, Rippon & Vrij (2004) identify four main techniques: the Relevant/Irrelevant Technique, the Directed Lie Test, the Control Question Test, and the Guilty Knowledge Test. Of these, most research has addressed the last two, so the discussion will concentrate on these.
The Control Question Test
The Control Question Test (CQT) compares a suspect's physiological responses to control questions with the responses to questions that are directly relevant to the crime. Control questions are specifically chosen to be vague in nature and to relate only indirectly to the crime under investigation (Iacono & Patrick, 1997). They are designed to elicit guilty memories of matters that are not under investigation, so they should provoke high levels of physiological arousal in innocent suspects. By contrast, the specific questions about the crime should evoke lower physiological arousal in an innocent interviewee, as they can be categorically denied. In the guilty interviewee the reverse pattern should be seen, with a higher physiological response to the more specific questions.
The basic paradigm for assessing the polygraph test in laboratory investigations involves a 'mock crime', with participants randomly assigned to act as either innocent or guilty. Raskin (1982), for example, explains that the guilty participants enact the mock crime, while the innocent participants simply have the facts relayed to them. Both groups are offered a cash incentive to pass the test, which goes some way towards providing the required motivation.
Many of the earlier studies used the CQT and found some encouraging results. Carroll (1988) summarises some of these studies, referring first to the Office of Technology Assessment of the United States Congress (1983), which pooled 14 studies and found overall accuracy of 88.6% for guilty participants and 82.6% for innocent participants. However, Carroll (1988) criticises this assessment because some of the studies had flawed methodologies. Applying stricter criteria gave figures of 85.4% for guilty and 76.9% for innocent participants.
Carroll (1988) makes two important points about most of these studies. Firstly, there was a fairly high rate of false positives, of around 20-25%: instances where the participant was 'innocent' but pronounced guilty. Secondly, the polygraph operators also have their own visual information to go on when carrying out the test, so they are not simply relying on the physiological data. This means that the results cannot be fully attributed to the polygraph, as the human operator could be partly acting as a 'lie detector'.
The most obvious criticism of this kind of study is that of ecological validity. The test itself relies on the emotional reactions of the participants, and it is unlikely that a monetary inducement is equivalent in motivational terms to the chance of being convicted of a crime. For this reason, MacLaren (2001) points out that the participants have little reason to be worried about the 'important' questions and are unmotivated to try to beat the test, unlike a real guilty suspect. Field studies have attempted to fill this gap, but the problem immediately arises of how to establish whether a person is really guilty or innocent.
In reviewing the data on field studies, Carroll (1988) found that accuracy rates were generally low at 69.6%; compared with the 50% obtainable by chance, this does not seem high. In addition, there was a very high rate of false positives: 43%. More recent field studies have been reviewed by Bull et al. (2004), who report better average accuracy for guilty suspects, at between 80% and 90%, but still poor results for innocent suspects, with false-positive rates ranging from 12% to 47%.
The theoretical problems with the CQT have been pointed out by Ben-Shakhar (2002), amongst others. The whole design of the test is such that the operator of the polygraph is trying to deceive the suspect, something that may be perceived as unethical. It is also easy to imagine good reasons why an innocent suspect would show arousal to the specific questions, since these are still anxiety-provoking. The test is also poorly standardised, in that the control questions asked differ from one interview to the next. Much of the variability in the accuracy of the test is therefore probably due to the operator, which reduces its theoretical reliability.
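The logic of the CQT comparison can be illustrated with a small sketch. This is a toy rule for the purposes of illustration only, not a scoring system used by polygraph examiners: the function name `cqt_outcome`, the arbitrary arousal scores and the `margin` threshold are all assumptions that do not come from the studies cited above.

```python
from statistics import mean


def cqt_outcome(relevant, control, margin=0.5):
    """Toy CQT-style decision rule (hypothetical, for illustration only).

    relevant: arousal scores recorded for crime-relevant questions
    control:  arousal scores recorded for control questions
    margin:   assumed minimum difference needed to reach a decision
    """
    diff = mean(relevant) - mean(control)
    if diff > margin:
        return "deceptive"      # stronger response to relevant questions
    if diff < -margin:
        return "truthful"       # stronger response to control questions
    return "inconclusive"       # difference too small to call


# A guilty pattern: relevant questions provoke clearly more arousal.
print(cqt_outcome([7.0, 8.0, 7.5], [4.0, 5.0, 4.5]))

# An innocent but anxious suspect may also react more to relevant questions,
# and this rule would then produce a false positive.
print(cqt_outcome([6.0, 6.5, 7.0], [5.0, 5.5, 6.0]))
```

Both calls return the same verdict, which is the weakness described above: the rule cannot distinguish arousal caused by guilt from arousal caused by anxiety about an accusation.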
The Guilty Knowledge Test
False positives, then, are one of the major problems with the CQT. The Guilty Knowledge Test (GKT) has been shown to meet this challenge. The GKT is designed to uncover whether the interviewee is withholding information about a crime under investigation. It involves asking the suspect a number of specific questions about the crime, each question having a number of alternatives, only one of which is correct. The operator then looks for a pattern of physiological responses to the correct option across the whole test. This test is much more difficult to apply, mainly because it requires the operator to know a number of facts about the crime that the operator can be reasonably sure the guilty suspect would also know. These would tend to be details, although major facts are not excluded.
A number of reviews have been carried out on the GKT in laboratory conditions. Ben-Shakhar and Furedy (1990) found accuracy rates of 84% for guilty participants and 94% for innocent participants. Elaad (1998) found rates of 81% for guilty and 96% for innocent. While these are encouraging, it is again the field studies that are more convincing because of their ecological validity. Only two of these have been carried out. Elaad (1990) found rates of only 42% for guilty participants but 98% for the innocent. Similarly, Elaad, Ginton & Jungman (1992) found 76% for guilty and 94% for innocent. Ben-Shakhar, Bar-Hillel & Kremnitzer (2002) defend the low results for guilty suspects, claiming that the tests were carried out under sub-optimal conditions, being given just after a CQT and involving an average of only 1.8 questions. Overall, though, levels of false positives are much lower for the GKT than for the CQT.
Perhaps the biggest criticism of the GKT relates to how useful it is in a practical sense. The nature of the test requires that the interviewer has been able to amass half a dozen items of knowledge that the guilty person would be aware of but that would not be recognised by an innocent person. In addition, it is not always possible to be confident that the suspect will have remembered or even noticed the particular details to which the operator refers. Bull et al. (2004) make the point that, in high-profile cases, details are often released to the public to aid the solving of the crime. This makes a GKT even harder to construct, as innocent suspects will know many more details of the crime and the operator is forced to rely on more obscure details.
A theoretical advantage of the GKT is that some researchers evaluating its underpinnings have made much stronger claims for it than for the CQT (The Committee to Review the Scientific Evidence on the Polygraph, National Research Council, 2003). The reason for this is that the GKT relies on the response being greater to a particular subset of the questions relative to whatever the physiological response is to the other questions (Carmel, Dayan, Naveh, Raveh & Ben-Shakhar, 2003). This is unlike the CQT, where variations in the physiological response of the suspect will tend to disrupt the test. In addition, the GKT does not rely on duping the suspect.
The GKT also has practical advantages. Ben-Shakhar et al. (2002) point out that a problem for the admissibility of polygraph tests in court is that they can become contaminated. In practice, a polygraph operator has the evidence of their own eyes as well as the polygraph machine to go on, which may mean that the decision is not based entirely on the physiological data. The advantage of the GKT is that it is much easier to carry out blind, or to have another polygraph tester simply look at the physiological evidence.
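The low false-positive rate of the GKT, and its dependence on the number of questions, can be illustrated with a simple chance calculation. The sketch below assumes that an innocent suspect is equally likely to respond most strongly to any alternative and that questions are independent. The figure of five alternatives per question and the function name `chance_of_hits` are assumptions made for illustration; the six questions follow the 'half a dozen items' mentioned above, and the two-question case roughly matches the average of 1.8 questions in the field tests.

```python
from math import comb


def chance_of_hits(n_questions, n_alternatives, min_hits):
    """Probability that an uninformed suspect responds most strongly to the
    correct alternative on at least `min_hits` of `n_questions` questions,
    assuming independent questions and equally likely alternatives."""
    p = 1 / n_alternatives
    return sum(
        comb(n_questions, k) * p**k * (1 - p) ** (n_questions - k)
        for k in range(min_hits, n_questions + 1)
    )


# Six questions with five alternatives each (five is an assumed value).
print(chance_of_hits(6, 5, 6))   # hit on every question by chance
print(chance_of_hits(6, 5, 4))   # hit on at least four of six by chance

# With only two questions, a full set of chance hits is far more likely.
print(chance_of_hits(2, 5, 2))
```

Under these assumptions an innocent suspect is very unlikely to match the guilty pattern across six questions, whereas with only two questions a chance match is much more plausible, so a test that short can carry little weight.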
Countermeasures and Base Rates
Two other criticisms apply more generally to all the different types of polygraph test: the effects of countermeasures and of base rates. Countermeasures are attempts to beat the polygraph test, which Gudjonsson (1988) classifies in three ways: reducing reactivity, suppressing physiological reactions and augmenting physiological reactions. According to Ford (1995), a man named Floyd Fay was able to train 23 of 27 fellow inmates in 20 minutes to beat the polygraph test, despite their admission of guilt to the crimes for which they had been incarcerated. On the problem of base rates, Bull et al. (2004) point out that the kinds of situations in which polygraph tests are used may mean that there are a large number of suspects to test. This will exacerbate the problem of false positives, although it is perhaps less of a problem in forensic situations, where numbers are more likely to be limited.
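The base-rate problem can be made concrete with a short calculation using Bayes' rule. The sketch below takes the stricter laboratory CQT figures reported by Carroll (1988), 85.4% accuracy for guilty and 76.9% for innocent participants, and asks what proportion of people who fail the test are actually guilty. The two base rates (half of those tested guilty, and 1 in 100 guilty) and the function name `prob_guilty_given_fail` are assumptions chosen purely for illustration.

```python
def prob_guilty_given_fail(base_rate, hit_rate, correct_rejection_rate):
    """Proportion of failed tests that belong to guilty people (Bayes' rule).

    base_rate:              proportion of tested people who are guilty
    hit_rate:               proportion of guilty people who fail the test
    correct_rejection_rate: proportion of innocent people who pass the test
    """
    false_positive_rate = 1 - correct_rejection_rate
    true_fails = hit_rate * base_rate
    false_fails = false_positive_rate * (1 - base_rate)
    return true_fails / (true_fails + false_fails)


# Accuracy figures from Carroll (1988); base rates are assumed values.
for base_rate in (0.5, 0.01):
    share = prob_guilty_given_fail(base_rate, 0.854, 0.769)
    print(f"base rate {base_rate:.2f}: {share:.1%} of failures are guilty")
```

With the same test accuracy, the proportion of failures who are genuinely guilty falls sharply as the guilty become a smaller share of those tested, which is why testing a large pool of suspects makes false positives more damaging.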