9UNP1193SD_Anthony_Wagner_Transcript
University Place: Memories, Lies and the Brain
05/07/15
– Anthony has a bachelor’s degree in psychology from UCLA, magna cum laude and with the highest honors in the department, that was the psychology department. He then went to Stanford, where he earned his Ph.D. That degree is in psychology also. He has an enormous engagement with journals. He has been an editor, in some capacity, on a multitude of journals. He’s been a reviewer for more than 40 journals. He’s been a panel reviewer for NIH and NSF. I don’t know how Anthony does it. If you look at his CV, which I looked at to prepare what I’m saying, I was just bowled over. There is just so much activity and such a background of accomplishment.
In addition to doing all of this, he left Stanford to come to the East Coast to be, as I just said, a faculty member in the Department of Brain and Cognitive Sciences at MIT. He stayed there for a few years, then he went back to Stanford and worked his way up to where he’s now a professor at Stanford. He’s a faculty affiliate in the neuroscience program, the symbolic systems program, the longevity program and the human biology program. So in addition to being a faculty member with all of the responsibilities of a faculty member, he’s an affiliate in all of these centers as well. He’s had so many publications that I just estimated them, although he did do me the service of numbering them: 108 peer-reviewed publications and more than 100 invitations to speak. To say that Anthony is well regarded and popular would be the understatement of the semester.
Today he’s gonna talk to us about something which is incredibly important, especially if you read the news, about all of the judicial work that is going on. He’s going to talk, to try to help us understand whether brain imaging, in a judicial court context, can be used to detect whether witnesses are misremembering what happened or lying. For those of you who have been following Bridgegate in New Jersey, you may be aware that there were three principals, two of whom had been indicted and are waiting for trial, and one has pleaded guilty. The two that are waiting are saying that the one who pleaded guilty is lying. We’ll get him in an MRI machine to see, right?
(laughs)
So Anthony will tell us whether this issue could be resolved with brain imaging.
(applause)
– Thank you.
(applause)
Thank you for that very kind introduction. It’s an honor to be here. Today I’ll be stepping a bit outside my usual role. When I put on my sort of basic scientist hat, I’m a systems neuroscientist, a cognitive neuroscientist who focuses on memory systems, with a particular emphasis on the kind of memory system that builds memories for life events like this one, episodic memories. Over the years, as we get a little bit longer in the tooth, one begins to have an opportunity to kind of look outward from your basic science to start to wrestle with ways in which your science might have applications for societal problems.
What I’m gonna talk about today is kind of an atypical body of work for me. It’s sort of this extension of my lab’s work, along with some collaborative work with Martha Farah and Liz Phelps, to try to wrestle with the emergence of functional brain imaging, the combination of functional brain imaging with new analysis techniques that might allow us to say something about the contents of people’s minds, looking in from the outside, and to try to wrestle with whether this class of data can and/or should be used in the courts as it relates to memories and lie detection.
To start, let me tell you about two cases that were kind of formative to my thinking and kind of motivated me to be willing to do some work in this area. The first is the case of Aditi Sharma, who came, I think, to the international community’s attention in 2008 when she was convicted of murder. Sharma had a romantic partner, and the romantic partner was killed by arsenic. Sharma was a suspect. There was, as I understand it, largely circumstantial evidence. Sharma was brought into a state-run forensics lab that collects scalp EEG data.
Scalp EEG data were recorded, collected while she was auditorily presented sentences that were statements about facts of the crime. Based on the scalp EEG data, the forensics lab neuroscientist expert testified in court that the brain data indicate that she had guilty knowledge, that she had evidence of experiential participation in the crime. She had evidence of memories for having gone and bought the arsenic, having poisoned the deceased, and being emotionally relieved after the fact.
These brain data were actually brought to bear in the courtroom. It was trial by magistrate. If you read the judge’s ruling, it was clear that the brain data were heavily persuasive in deciding that, in fact, she was guilty. She, in my understanding, was ultimately let out on appeal. It’s not clear to me whether she was again reconvicted. In 2008, this case came to my attention, with reporting on CNN, in the New York Times, and elsewhere. Can brain data, can brain imaging, be it scalp EEG or functional MRI, which many in this room use and which my lab heavily relies on, can this class of data be used to say anything about the mental state of an individual with respect to their memories?
Another case: the case of U.S. versus Lorne Semrau. Semrau ran a number of nursing homes. He was charged with fraudulently billing Medicare, on the order of, apparently, over $3 million worth of false billings, if I understand correctly. He’s coming to trial, or he’s coming to an appeal trial. There’s a case in Tennessee, in the U.S. Court of Appeals for the Sixth Circuit, where he wants to enter fMRI-based lie detection evidence collected by Cephos, one of the two companies at the time that were commercially selling fMRI lie detection services. He wants to enter evidence to back up his claim that while he acknowledges that he billed Medicare for all these charges, his intent seven years prior was not to defraud the government. He wants to enter this evidence in his defense. Should this class of evidence be permitted in the courts?
Over the last, maybe the last eight years, 10 years, there began to be this explosion of cases, mini explosion, but nevertheless, what appeared to be an explosion of cases where individuals were trying to bring to bear largely, though not exclusively, fMRI evidence as evidence regarding their mental states as part of, often, their defense, but as part of a court proceeding.
Now, part of this motivation, undoubtedly, and the excitement and the potential of being able to draw on neuroscience data to inform the courts, has come from the remarkable progress that has been achieved over the last decade, if not two decades, but heavily over the last decade, in the area of using functional magnetic resonance imaging. Some of us in this room were just reminiscing about how it’s hard to imagine having predicted, at the time when the field was just launching, where we are today in terms of what we can do and what kinds of information we can detect with functional imaging, back when we were really struggling just to get initial evidence for signal in visual cortex when somebody is looking at a visual stimulus, or, what was even harder at the time, signals related to memory.
We’re now at the point, and this is just gonna be highlighting a data point from my lab, but there are data points from Brad Postle’s lab and other labs here, with the amazing science being done here at the University of Wisconsin, as well as from others in the field, we’re now at the point where we can collect functional brain data as somebody is looking at something, looking at stimuli, such as pictures of famous individuals or famous locations along with the names, the labels, and we can take those brain data and teach a pattern classifier to discriminate between the brain patterns associated with seeing and perceiving a face versus seeing or perceiving a landmark or a location. We can get, in this context, near-perfect decoding on held-out data, on test data: we can decode, by just looking at the brain data and feeding them into the classifier, what the person was looking at.
Moreover, if you take the verbal cues here, such as Steve Martin or Taj Mahal, we use those as retrieval cues, and you probe the person to try to bring to mind an image of the thing that had been previously encountered with that stimulus. Sorry, let me make one correction on this: the pictures were labeled, but we also have this other verbal cue, such as lamp with Steve Martin, or rose with the Taj Mahal.
So you take these neutral, non-labeling visual inputs, and you use them as cues back to the learning moment, and probe the person to try to remember what had been associated with the word. Then we can take the brain patterns collected during those moments of retrieval, and we can ask, "Can we read out whether or not the person is remembering?"
If they’re remembering, can we read out whether they remember, sort of, category A, that they saw a face, or category B, that they saw a scene? And does the degree of evidence for bringing to mind this perceptual information scale with the rememberer’s subjective report of how precise or vivid they believe their memory is?
So it turns out, when you take brain data from trials in which the subject states they can’t remember, they don’t know what they studied with the word, we can’t decode from their brain patterns whether the person had encountered a face or a landmark with the word during learning.
When they state that they can generally remember the category of stimulus that had co-occurred with the word, well, we’re now sorta 67% accurate. We can look at brain patterns at the time of remembering, and we can read out, we can decode, whether the person is remembering having previously encountered a face or a place. And when they state, "Not only can I remember the general category, I can remember the specifics of the image. It’s much crisper and more precise and vivid," classification, decoding, is even higher. Here we’re at 75% in this study.
The numbers don’t matter. These data, and many other observations in the literature, have given rise to this excitement about the potential to read out the mental states of individuals using functional brain data. So there have been not only efforts to try to, perhaps, leverage these data to get them into the courts, but also a lot of writing in the neuroscience and legal scholarly communities.
Around the time of the Sharma case, the conviction based on putatively brain-based memory evidence (this was before the Semrau case, but around the time the two companies, No Lie MRI and Cephos, were launched), there was a call by the editors of "Nature Neuroscience," one of the leading neuroscience journals, for neuroscientists to get out of the lab, to step up, and to do, hopefully, an unbiased, critical assessment of what we can do with these tools, to the extent that they’re gonna begin to be adopted by society.
Well, there should be some understanding of what, perhaps, is or is not, or should or should not be permitted. It was around this time that the MacArthur Foundation’s Law and Neuroscience Project got launched. It was launched in 2007. Some of these ideas and other ideas were beginning to percolate amongst neuroscientists, as well as amongst philosophers, neuroethicists and legal scholars.
The MacArthur Law and Neuroscience network has been around for two phases; we’re in the second phase. The first was the Law and Neuroscience Project, and now we’re in the second phase, the Research Network on Law and Neuroscience, trying to tackle particular problems. But the goal of the initiative was to get this interdisciplinary group of individuals in a room and begin to talk amongst ourselves and try to begin to wrestle with ways in which neuroscience might begin to be applied, and whether we should do some of the critical assessment work that "Nature Neuroscience" was calling for, as well as to imagine ways in which neuroscience data might actually be helpful to the courts in other contexts.
So what I’m gonna talk about today is, in essence, work that my lab has done over the last few years as part of the MacArthur Law and Neuroscience Network, Research Network, where we’re first conducting original science, trying to use fMRI, using fMRI and pattern analyses to see how well one can read out the memory states of individuals, and what the boundary conditions are.
Then two, to critically assess and reflect on, in what I hope is an unbiased way, the existing fMRI literature on lie detection, to make some assessment about what that literature may tell us, what inferences or conclusions it permits, what is needed, and whether there are implications of that for policy and for whether this kind of evidence should get into the courts.
So I’m gonna start with our work on memory detection. What I’m gonna start with is some work that we’ve done. We’ve done a series of studies, first lab-based studies, so very impoverished memories for individual stimuli, such as individual faces, presented to subjects while they’re in a lab, as well as an effort to try to look at memory detection or decoding for real-world autobiographical events. I’ll tell you a little bit about that. Then I’ll describe some of the boundary conditions, what I have here as peril, some of the potential constraints or caveats, and then highlight some unknowns.
We started here with these questions, which we sort of take as at least two of the potentially legally relevant questions that our law friends might be turning to us neuroscientists to address. Is it possible to detect whether a person recognizes a stimulus as previously encountered or perceives it as novel, based on functional brain activity, functional brain imaging? That first question is about the person’s memory state. Are they in the state of remembering, or the state of having weak or absent memory, no evidence that they’ve encountered the stimulus before?
Second question is, is it possible to detect whether a person previously encountered a stimulus regardless of their memory state? That is, what’s the actual history of this individual in relationship to the cueing stimulus? Did the individual see that stimulus before or not, independent of whether or not the person remembers? In this first study, led by Jesse Rissman, who’s now at UCLA, we had subjects come in and they performed very simple face memory paradigm.
During the study phase, we didn’t collect brain imaging data. They simply encountered individual faces one at a time for a few seconds. There were 200 faces. At the end of the study phase, subjects moved onto the test phase. We brought ’em in the MR environment. We collected functional magnetic resonance imaging data as they made explicit old-new recognition judgments. So in the scanner, they’re encountering individual faces. Some of the faces are old, they saw ’em during study lists, some of ’em are new. They’re making old-new decisions.
Do they recognize the face as old or not? So for old stimuli, if they recognize the stimulus, this would be a hit. This is an objectively old item that the person recognizes, or they might fail to recognize the old stimulus. This would be a miss, a forgotten trial. We’ve got these objectively new stimuli in which the subject might say, “Yeah, in fact, “I don’t remember seeing that face before.” This is a correct rejection. Or they might have false memory.
They might see a new stimulus and falsely believe that they’ve previously encountered that face before, and this would be a false alarm. In designing this work, we kept in mind what would be required for it to potentially have some legal relevance. There are a couple of things that the law cares about that often the science community sort of doesn’t, or at least we don’t operate at this level. The law needs to make a determination in this particular case, for this particular individual, and often, for this particular event in question.
So, in fact, neuroscience data are needed at the individual subject level and at the individual test item level. Fortunately, it’s just this class of questions that multivariate analyses are positioned to address. In all of the work I’ll be describing, we’ve taken training patterns, such as a subset of our data on trials in which the subject recognized the face versus trials in which they perceived the face as novel. We’re gonna feed these training patterns into, typically, a logistic regression classifier. We’re gonna try to teach the classifier to discriminate between these cortical patterns, and sometimes subcortical patterns. Then we’re gonna take test data and see how well the classifier does.
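To make that approach concrete, here is a minimal sketch of this style of trial-wise pattern classification, assuming each trial’s activity has already been reduced to a vector of voxel values and using scikit-learn’s logistic regression. The data shapes, simulated data, and variable names are illustrative placeholders, not the lab’s actual pipeline.

```python
# Minimal sketch of trial-wise pattern classification (recognized-old "hits"
# vs. correct rejections). Data and sizes are simulated placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 5000                    # hypothetical sizes
X = rng.standard_normal((n_trials, n_voxels))     # trial x voxel patterns
y = rng.integers(0, 2, n_trials)                  # 1 = hit, 0 = correct rejection

clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)

# Train on a subset of trials and score held-out trials, reporting area
# under the ROC curve (chance = 0.5), the accuracy measure discussed here.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print("AUC per fold:", np.round(auc, 2), "mean:", round(float(auc.mean()), 2))
```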
I know this is sort of old news for almost everybody in this room. Many folks here at Wisconsin put these methods to very good effect. So we’re gonna take these test data and we’re gonna see how well we do. Let’s start with our first classification problem.
Can we look at brain patterns during the time of a memory decision and know whether the person is recognizing an old stimulus as old versus perceiving a new stimulus as new? That is, can we decode whether the trial is a hit or a correct rejection? Plotted here is a measure of accuracy, classifier performance as area under the curve.
What’s plotted here is the data for all the subjects. You can see chance is at .5, and in every subject, taking all of the hits and all of the correct rejections, we can discriminate whether they’re perceiving the old stimulus as old or the new stimulus as new. I think at the time, I remember being rather struck by the relatively high level of performance. It kind of caught me by surprise, but in hindsight, I realized I just wasn’t being a reflective scientist, in that we know that the act of retrieval is a multicomponent event, a cognitive and neurobiological event.
There are many things that co-occur when we’re remembering some past event or when we’re recognizing a stimulus that we’ve previously encountered. What I just described was classification using whole-brain data. Not surprisingly, if you instead use a searchlight approach, where you’re trying to classify off of small spheres of 123 voxels, you can get above-chance classification for many parts of cortex and also subcortical structures.
So plotted here, you can see we’re getting classification from dorsal, lateral and ventrolateral prefrontal cortex, structures in the parietal cortex. We kinda knew this from the univariate literature looking at, in fact, how overall BOLD signal changes when we’re recognizing stimuli versus perceiving them as novel. Lo and behold, we can leverage those patterns and those macroscopic signal differences to also do this classification on the trial level in an individual.
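As a rough illustration of what a searchlight analysis does, here is a toy sketch: the same classifier is run over small spheres of voxels centered at each location, yielding a map of local decoding accuracy. The grid size, the sphere radius (about three voxels, which gives spheres on the order of the 123 voxels mentioned above), and the simulated data are all assumptions for illustration, not the actual analysis code.

```python
# Toy searchlight sketch: classify from small spheres of voxels centered on
# each location, producing a map of local decoding accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials = 100
vol_shape = (10, 10, 10)                              # hypothetical voxel grid
data = rng.standard_normal((n_trials,) + vol_shape)   # trial x (x, y, z) activity
y = rng.integers(0, 2, n_trials)                      # hit vs. correct rejection labels
radius = 3                                            # sphere radius in voxels

# Precompute voxel coordinates once.
coords = np.stack(np.meshgrid(*[np.arange(s) for s in vol_shape], indexing="ij"), -1)
accuracy_map = np.zeros(vol_shape)

for center in np.ndindex(vol_shape):
    # Select voxels within the sphere around this center voxel.
    sphere = np.linalg.norm(coords - np.array(center), axis=-1) <= radius
    X = data[:, sphere]                               # trials x voxels-in-sphere
    clf = LogisticRegression(max_iter=500)
    accuracy_map[center] = cross_val_score(clf, X, y, cv=5).mean()

# accuracy_map now holds local decoding accuracy at every center voxel.
```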
What I’ve shown you thus far kind of conflates two things. I’ve got old stimuli, stimuli the subject encountered, and I’ve got new stimuli. I’ve got old stimuli that they are subjectively recognizing as old, and new stimuli that fall below their recognition decision bound, that they’re subjectively perceiving as novel.
So what we wanted to do is try to pull these things apart. We first asked whether we can detect the memory state, that is whether or not subjects are perceiving a stimulus as old or perceiving it as new, holding the true objective status of the item constant, that is, whether or not they’d experienced the stimulus or not. What I’m plotting here, classification for truly old faces, these are the hits and misses, and asking whether or not we can detect when they recognized the old face versus failed to recognize the old face. In fact, we can, well above chance.
These are new faces, and we’re trying to decode whether they have an accurate absence of memory for the face and are correctly rejecting it, or whether they have a false memory for this novel face. Here again, we can decode well above chance. Not as well as hits versus correct rejections, but we’re well above chance.
One thing that’s powerful or useful about these classification approaches is that we don’t just get a binary outcome, this is a recognized event, this is an event perceived as novel; the classifier can also give us a readout of how confident it is, how much evidence it has on a particular trial for a particular classification. On a given trial, the evidence might fall close to the decision bound, and we’d say the classifier is less certain as to what category that brain pattern maps to, the state of recognition versus the state of perceiving something as novel. On other trials, the evidence might be much further from the bound; there’s high neural evidence for the particular mental state.
We wanted to leverage this to ask an obvious question, but a question worth asking nevertheless: how does classification performance change as the classifier’s reported confidence increases? Of course, classification should increase, and that’s what’s being plotted here. Plotted in the upper line is classification of hits versus correct rejections using an accuracy metric. On the left are all the trials, and on the right are the top 10% of trials in which the classifier is most confident in its classification. You can see all the points in between. And I meant to say this at the outset: I’m not advocating the adoption of these tools and the use of these methods out in the wild, in the courts. I’m simply trying to assess them.
What strikes me about this, though, is this is the average across the full sample of subjects, 16 to 18 subjects in this study. Almost every subject is near ceiling in terms of classification when you’re restricting your consideration to just those trials in which the classifier believes it’s got the strongest evidence for a particular memory state. That is striking. You can see this, the levels achieved for the other classifications.
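As an illustration of that confidence-sorting step, here is a small sketch: test trials are scored by how far the classifier’s evidence lies from the decision boundary, and accuracy is recomputed on progressively more confident subsets, down to the top 10%. The simulated data and the train/test split here are placeholders, not the study’s actual data or pipeline.

```python
# Sketch of confidence-sorted accuracy: rank test trials by distance from
# the classifier's decision boundary, then score the most confident subsets.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train, n_test, n_voxels = 300, 100, 2000
X_train = rng.standard_normal((n_train, n_voxels))
y_train = rng.integers(0, 2, n_train)
X_test = rng.standard_normal((n_test, n_voxels))
y_test = rng.integers(0, 2, n_test)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

evidence = clf.decision_function(X_test)      # signed distance from boundary
pred = (evidence > 0).astype(int)
confidence = np.abs(evidence)                 # larger = more confident
order = np.argsort(confidence)[::-1]          # most confident trials first

for frac in (1.0, 0.5, 0.25, 0.10):
    top = order[: max(1, int(frac * n_test))]
    acc = (pred[top] == y_test[top]).mean()
    print(f"top {int(frac * 100):3d}% most confident trials: accuracy = {acc:.2f}")
```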
What I just described to you was extremely stripped down, simplified events. You’re coming into the lab, you see a string of faces, a sequence of faces one at a time for a brief period of time. What we wanted to next ask is, well, how well can we do with fMRI to detect memories for rich, complex, autobiographical events?
We had a group of Stanford undergraduates be willing to wear these digital cameras around their necks for a three-week period. These cameras detect motion, they detect changes in light falling on the lens, and they automatically take a picture whenever there’s a change that meets its delta threshold. The important part of this is the subjects can wear these cameras for an extended period of time going about their lives, and they’re automatically, the cameras are automatically taking snapshots of their experiences over that three-week period.
So for each subject, at the end of the three-week period, we had, on average, 45,000 photographs, through which two very motivated undergraduates pored to distill them down to 180 fairly unique events, as best as we could subjectively discern, for each subject. We’ve got this set now of photos that are targeting particular life events that played out on the order of two to five minutes during a particular day over this three-week period. Then we have the subjects come in, and we’re collecting fMRI data as they’re making memory decisions. In this study, we presented a sequence of four photos over a four-second period, after which they were able to make a memory decision as to whether they recognized that sequence as having come from their life, or whether it’s a sequence, perhaps, from a different student on campus, and they don’t recognize it as part of their past.
There are nuanced judgments we had them make. Right now, I won’t get us bogged down in them. I’ll just point out that subjects could encounter old sequences and recognize them as coming from their own life, or they could encounter new sequences and classify them accurately as not from their life. Here’s just behavioral performance; for time reasons, we’ll flip past that.

If you take these data and you do classification, from the get-go, over all trials, performance is very strong. Partly, the memory behavior is very strong in this particular study: memory for autobiographical events that play out on the order of two to five minutes, that are multifeatural, multidimensional, and that have significance for us as individuals, is much better than memory for individually presented faces shown for two seconds in the lab. But nevertheless, taking all trials, without even restricting analyses to those trials in which the classifier was most confident, on average the classifier is already beginning to approach ceiling in terms of performance. Not surprisingly, again, if you do the searchlight classification, there is evidence discriminating between old and new sequences of photos, cueing one’s personal past or not, in frontal cortex, parietal cortex and many other structures.

When we remember, we’re often remembering in different ways. We’re often encountering cues, and we might just perceive those cues on their own as more or less familiar. We can place a decision bound, and any stimulus that elicits a stronger sense of familiarity we might classify as, well, I’ve encountered that stimulus before, and if it’s below our decision bound, we might classify it as novel. There are other ways in which we remember: we take a cue, and it makes contact with a stored associative or conjunctive memory, which we think is critically dependent upon and situated in the hippocampus, and the hippocampus pattern-completes out its representation of the event and drives patterns of activation back to cortex, allowing us to reinstate details and consciously relive and recollect and bring back to mind the details of our past encounter with that stimulus.

Both of these recognition states, both of these memory states, a state of familiarity and a state of recollection, are very discriminable, in terms of the elicited brain patterns, from the subjective sense that we lack memory for some cue or some set of cues. In fact, not surprisingly, you can use these very methods to isolate and to target where, in cortical and subcortical structures, there is information that differentiates between one class of memory, the ability to recollect details, and the other class of memory, this sense of familiarity. Consistent with theoretical views and models of the role of the hippocampus versus other cortical structures, we see that signals and patterns in the hippocampus at the time of retrieval, co-occurring with structures in parietal cortex and frontal cortex, are indicative of this act of rich recollection of some past experience.

I’m gonna skip over this and just give the high level, for time purposes. The high level is that we’re not actually decoding microscopic, fine-grained patterns of activity.
We can do just as well decoding your memory states from your brain data by feeding your data into a classifier that’s been trained up on brain data from other subjects. We can even do this where you train the classifier on our simple lab-based face memory paradigm and you test it on a completely different group of subjects performing this autobiographical memory paradigm, and we can read out from the brain patterns from the autobiographical paradigm whether or not the person is remembering, using the lab-based subjects’ brain data, and vice versa.

These data, I’m sure, may not actually be that surprising to many of the neuroscientists in the room. They suggest that, at least under the conditions that we’ve explored, you can actually decode or detect different states of memory using functional imaging data. The next body of work highlights some of the, I think, important caveats about the method, and might suggest caution in terms of where one moves in terms of real-world use. The first is this point about true versus false memory, and the second is about countermeasures.

With the advent of functional brain imaging, once there was some initial, basic science work done on true recognition, memory theorists began immediately to turn to this question of, well, what are the neural similarities and the neural differences when people are having memory for some past event that actually had occurred, true memory, versus memory for some past event that actually didn’t occur, false memory? Those data highlighted the commonality and the strong similarity between the mechanisms that build episodic memories, that allow us to accurately remember, and that also give rise to false remembering, false memories.

So in this study, these are the data from the face memory study, the face recognition study, the lab-based study. It’s not surprising to us that when you use these searchlight classifiers and you look at where you can decode hits from correct rejections, true recognition from correct rejections, the areas in the brain that allow you to do that are highly overlapping with the areas in the brain that allow you to discriminate between false memories and the perception of novelty, the absence of memory. Again, this is consistent with a pretty rich literature that I’d be happy to talk about during the Q and A.

This does place a potentially important boundary condition on the use of imaging, at least at this point. What I described over on the left is holding the objective status of the item constant. What happens if you hold the subjective status of the trial constant? That is, you’re looking at trials in which the person is perceiving the stimulus as having been previously encountered, and you’re comparing brain patterns for those stimuli that are recognized and that were truly encountered to those stimuli that were recognized but that were novel, false recognition. When you hold subjective memory constant, classification begins to fall closer to chance. In some subjects it’s above chance, but we’re nowhere near where we are when we’re decoding the subjective state, holding objective status constant.
So, this is sort of comparing hits to false alarms, true recognition to false recognition. It’s much harder to discriminate between these brain patterns for these two mental states. What about misses and correct rejections?
This is the question of: if a person fails to recognize the stimulus, can I still use brain imaging to find a residue of past experience in the person’s brain, even though they’re not saying they remember? Can I still discriminate those things that were encountered versus those that were not?
In this study, we were at chance. So we observe poor classification when the memory state is held constant, telling us, again, that we might be able to decode these remarkably, I don’t wanna say qualitatively distinct, but clearly fairly distinct psychological states of remembering versus not remembering. We struggle with actually saying something about the person’s true history or experience.
I’m gonna skip over that and over that. This second caveat is that how a person orients to a stimulus really matters in terms of the kinds of brain patterns that the stimulus elicits. So here’s a study in which we wanted to ask, “What if we removed memory from the equation?”
The subject did not know that memory was an issue. Could we, under those situations, detect from their brain patterns whether the stimulus in front of them had been previously encountered or was novel? So in this experiment, during the study phase, subjects are incidentally encoding a series of faces, just making attractiveness ratings.
Then during the first set of scans, we have subjects not make memory decisions, but we have them perform a different judgment. They just decide is the face male or female. Here half of the faces are old and half of the faces are new, and the question is, when the subject isn’t orienting to memory, if memory is not what’s relevant, can we detect which of those faces the subject had actually encountered versus not?
Then, to make certain it wasn’t that we had messed up the experiment, we wanted to try to replicate our prior findings. So in this group of subjects, in the second set of scans, we had them make explicit memory judgments. That is, they again made old-new decisions.
So here’s the data. We have half as many runs, so we have less data to train the classifier in this group of seven subjects. You can see everybody’s above chance, again, for the explicit old-new discrimination, decoding of whether the stimulus is old or new, presumably because they’re having this subjective experience of recognizing the old ones more often than not, and the new ones perceived as novel.
What about the implicit case? In the implicit case, we’re much closer to chance. There are some hints of evidence that we can read out from sensory cortex, given that these are faces, the fusiform gyrus, not surprisingly; there are some hints of evidence that we can obtain above-chance classification under implicit retrieval, but the classification doesn’t even exceed 60%.
We know this as memory theorists: memory processes are not automatically triggered. It’s not like you’re sitting here in this room and you look to the left and you see a colleague, and suddenly you have this flood of memories coming out of your memory system, given that cue. Our goals partly determine whether and what kinds of memories come out of our memory systems, given a cue.
This leads to a final point, which is that it creates the possibility of simple countermeasure strategies. Because goal states, how we interact with stimuli, change the brain patterns that are elicited by those stimuli, it raises the possibility that if somebody wanted to try to beat a classifier by manipulating their mental state and the associated brain patterns, that might be possible.
So in this final study, we’ve begun to explore this, and we have a paper that was just accepted at J. Neuroscience. During the study phase, the learning phase, subjects are doing intentional learning. They’re encountering these individual faces one at a time. They know that their memory’s gonna be tested later. During the first set of scans, they performed explicit recognition memory judgments only.
At the end of the fifth scan, we tell them, "Oh, by the way, the goal of this experiment is we’re trying to read out your memory state from your brain patterns, and we’ve been able to do this previously. So for the next set of scans, what we want you to do is, when you recognize the stimulus in front of you as having been previously encountered, we want you to quickly pull attention away from memory, and focus on some novel perceptual attribute of the face. By contrast, if you perceive the stimulus as novel," so in both cases, they have to make a memory judgment first. They have to determine that the item is old or they have to perceive it as novel. "If you perceive the stimulus as novel, use that face to try to bring back to mind somebody that it reminds you of in some past event or experience with that individual."
So we have them perform these judgments. This is a different rendering of the data. At the left in the light blue is classification. It’s around 70ish or so when people are making, when they’re compliant and they’re making explicit memory decisions, and when they’re engaging these countermeasures, pretty much every subject’s performance falls, and, on average, our classification performance is at chance.
So in subjects in which we can read out memory states, when they adopt a countermeasure strategy, classification falls to chance. There are hints that if you actually look at the time course, and what’s plotted here is the volume-by-volume acquisition, time-locked to the stimulus, classification performance as a function of volume, where zero is the onset of the stimulus. Because we’re dealing with this BOLD response, which is temporally smeared through time, it’s sluggish, you’re seeing multiple TRs, out to 10 to 12 and a half seconds after the stimulus.
There’s a hint here that early on, the earliest TRs, we might be able to get above chance classification, even when the subject is adopting a countermeasure, but this is quickly masked by, perhaps, then the countermeasure that they are using, and classification falls to chance. When you do that sort of trick of restricting attention to the trials in which the classifier has the strongest evidence, you can kinda see maybe we can get upwards of 60% or so, but it’s not strong classification, suggesting that, in fact, how we engage with stimuli and strategic efforts to change our internal representations can overcome and mask these very powerful neural patterns associated with recognizing something versus perceiving it as novel.
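A simple way to picture that volume-by-volume analysis: train and score a separate classifier at each acquisition volume after stimulus onset, tracing when decodable information appears and fades. The TR of 2.5 seconds, the number of volumes, and the simulated data below are assumptions for illustration, not the study’s actual parameters.

```python
# Sketch of volume-by-volume (TR-by-TR) classification after stimulus onset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
tr_seconds = 2.5
n_trials, n_voxels, n_volumes = 120, 1000, 6     # volumes 0..5 after onset
data = rng.standard_normal((n_trials, n_volumes, n_voxels))
y = rng.integers(0, 2, n_trials)                 # e.g., recognized vs. novel

for vol in range(n_volumes):
    X = data[:, vol, :]                          # patterns at this volume only
    acc = cross_val_score(LogisticRegression(max_iter=500), X, y, cv=5).mean()
    print(f"{vol * tr_seconds:4.1f} s after onset: decoding accuracy = {acc:.2f}")
```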
So here’s what we know so far about memory detection. Under very controlled situations, you can decode the perception that a stimulus is old or new, as well as whether you’re recollecting the stimulus versus perceiving it as novel. It’s much harder to differentiate between true versus false memory. There’s poor decoding of past experience in the absence of memory; that is, in the absence of the subjective state of recognizing versus perceiving something as novel, it’s harder to decode true experience. With implicit memory, we also do quite poorly at decoding past experience. And countermeasures can very much alter the patterns.
There are many unknowns with respect to actual real-world application of everything I’ve described to you. On the one hand, the front end suggests that you can actually do reasonably well in terms of detecting memories, but under very constrained situations. There are claims in the Sharma case about being able to decode the source of a memory, whether it’s from true experience versus hearing about it, the details and other context. We have no data on that for real-world autobiographical events.
We know nothing about the effects of delay time, consolidation processes, which begin to transform, perhaps, the neural representation of events, interference, practice effects, the role of emotion both during the event itself, as the event is being encoded, but also under these very high-stakes situations where people are remembering under stress very affectively-laden states versus not.
Then there’s the question about the populations. All of our subjects have been young college students, and maybe even a more selective population of college students who happen to be at Stanford. In the remaining few minutes, let me say some words about the fMRI lie detection work that we’ve done. I don’t do lie detection. That’s not who I am as a scientist. But the emergence of these cases and efforts to try to draw on fMRI-based lie detection as evidence in the courts motivated me to try to take a look at the literature.
In the final few minutes, maybe I can just say a little bit of my read of where the literature currently stands. One, let me just tell you about, there are different paradigms in the literature, but there are two canonical paradigms that are out there. One is this sort of mock crime paradigm where you’re a culprit. You come into the experiment and you’re instructed to go over to a drawer. You open the drawer, and there’s a ring and a watch in there. You’re instructed to take one of the two items, take it out of the drawer, bring it and put it in a locker amongst your personal belongings.
After that, you go into the scanner, and you’re gonna answer a series of questions. You’re instructed to deny having taken either the ring or the watch; you deny taking both items. So you’ve got ring subjects, subjects who took the ring, and when they say no, it’s argued that this instructed sort of no response is equivalent to a real-world lie, relative to the watch, where denying having taken it means you’d be telling the truth.
Then there are watch subjects who, for whatever reason, took the watch rather than the ring. There are control, neutral items, but that doesn’t matter. The critical comparison is the comparison of fMRI signal during these putative lie trials versus putative truth trials. Here’s another paradigm, more of a concealed information, guilty knowledge paradigm.
You come in and I say, "Pick a number between three and eight. Write it down on a piece of paper, stick it in your pocket." You do that, and you now go into the scanner. You’re presented a series of numbers. You’re asked, "Do you have the number x?" and you go through this in a series of trials, and you’re supposed to deny having any of these numbers.
The argument is you’re lying when you’re denying having the particular number that you selected and wrote down. Again, the critical comparison is the lie and the truth trials. So, there have been 30-some-odd studies that have used these paradigms and other related paradigms in the literature. It’s an unquestionable fact that across those studies, if you do a meta-analysis, and here’s an activation likelihood estimation meta-analysis that my former Ph.D. student Ben Hutchinson led, with Martha Farah and Liz Phelps, there are regions that are consistently observed on these quote, unquote lie trials relative to truth trials. That’s not debated.
You can see some of the players being (unintelligible) lateral prefrontal cortex, ventral parietal cortex, and what has been described in the literature as anterior cingulate cortex, but really looks more like medial superior frontal cortex, not really ACC, with the caveat there on the right hemisphere. So there’s cross-study consistency. We don’t know; there’s an interpretation within this literature that these patterns often are reflecting the conflict between two different mental states, the mental state about the information that is the truth, and the mental state that represents your deception or your false statements; that there’s conflict in the system, and you get these conflict-like responses.
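For readers unfamiliar with that kind of meta-analysis, here is a simplified sketch of the activation likelihood estimation idea: each study’s reported activation foci are modeled as 3D Gaussians, combined into a per-study modeled activation map, and those maps are combined across studies into a convergence map. The grid, coordinates, and smoothing width are made-up placeholders, and the permutation-based thresholding of a real ALE analysis is omitted; this is an illustration, not the analysis that was actually run.

```python
# Simplified sketch of an activation likelihood estimation (ALE) style
# meta-analysis on toy data.
import numpy as np

grid = (30, 30, 30)                                # toy voxel grid
coords = np.stack(np.meshgrid(*[np.arange(s) for s in grid], indexing="ij"), -1)

def gaussian_blob(center, sigma=2.0):
    """Probability-like blob modeling spatial uncertainty around one focus."""
    d2 = ((coords - np.array(center)) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Hypothetical activation foci (grid coordinates) from three studies.
studies = [
    [(10, 12, 15), (20, 14, 16)],
    [(11, 12, 14)],
    [(10, 13, 15), (25, 8, 20)],
]

one_minus = np.ones(grid)
for foci in studies:
    # Modeled activation map for this study: union of its foci blobs.
    ma = 1.0 - np.prod([1.0 - gaussian_blob(f) for f in foci], axis=0)
    one_minus *= (1.0 - ma)
ale = 1.0 - one_minus                              # across-study convergence

peak = np.unravel_index(ale.argmax(), grid)
print("peak convergence at voxel", peak, "ALE =", round(float(ale[peak]), 3))
```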
Pretty much no study that I know of, with maybe a few exceptions subsequent to this review, was designed to actually say anything about psychological process and to gain leverage on the psychological-neural processes. They all depend on reverse inference. There are differences that are observed. We assume that there’s conflict in the lie case that isn’t present in the truth case, that that’s what we’re picking up on, and that’s why fMRI lie detection, as a tool, might actually be effective. But we can’t actually draw those conclusions from the data, because none of the studies, as best as I can tell, were designed to say anything about mechanism.
Almost all the data in the literature are also group-level reports. You’re comparing effects on average, either in a lie group versus a truth group, or on average over lie trials versus truth trials pooled across subjects. There are very few studies that have tried to look at detection of what’s been described as the lie versus the truth state at the individual subject level.
That said, in the handful of studies that have done that, the accuracy rates (not on an individual trial now, but on average over the lie trials and on average over the truth trials) ranged from 71 to 100%. There are different ways that folks have computed this accuracy. Caveats, though.
First, my read of the literature is that it is fraught with fundamental design confounds. I’ll tell you about two experiments that reinforce conclusions I was drawing as I was reading some of these primary papers. First, let’s take this guilty knowledge paradigm. I have you, as a subject, do the exact same thing as the subjects in the lie detection version of the study. But now, after you’ve written down the number, I don’t tell you that this study has anything to do with deception, lie detection, truth detection, whatever. I simply present you a string of numbers in the scanner and have you press a button whenever you see a number on the screen.
What you see here is you’ve got this salient stimulus that you selected and wrote down and put in your pocket, and then you’ve got these other stimuli. On the left are data from the same subjects when they’re in the deception version of the experiment, and you see regions in frontal and parietal cortex and on the midline. On the right are the data from the same subjects in the version with no deception instructions. You can see the similarity of the evoked responses, comparing what I would argue is the highly salient stimulus to the other stimuli.
So in the absence of any demand to tell the truth or lie, you get differences in activation based on your past experience with the stimulus, and certain stimuli, because of past experience, are particularly salient, others are less salient. Is this sufficient evidence to warrant, is the evidence on the left sufficient evidence to warrant the conclusion that the signals we’re picking up on have something to do with deception versus truth telling?
Here’s a very nice, even more compelling paper from Matthias Gamer’s group. More compelling because it’s got a larger n. They kinda have the same logic, stimulus saliency and memory. For those who are interested and would like to watch a little video of the ring/watch paradigm, Alan Alda, in a PBS show on neuroscience and the law actually participated in that experiment, and, in fact, the Cephos CEO, who scanned Alda’s brain, was able to, in fact, determine what item Alda had taken.
Alda was asked, well, tell me about your experience and what was your approach to the task. He said, "Well, yeah, I knew before I got into the room, before I got into the experiment, I had planned out which stimulus I was gonna take and why I was gonna take it, and I went over it, and I sort of took it and put it amongst my items." So this item has not only high salience, differential relevance to the subject, but it also has a rich autobiographical memory attached to it in a way that the other stimulus doesn’t.
So Gamer was trying to go after this. You’re presented some cards and some money, and you get to pick out a card and a monetary item. These are the probe items that you’re supposed to sort of lie about during scanning, deny having taken. Then there are these other items that you were also asked to memorize, another card and another monetary item. Now you go into the scanner, and you’re simply viewing these items. There are items that you didn’t select and take and that you also didn’t memorize; there are the probe items, which would be the putative lie items; and then there are these target items, which are salient items that you’ve had an experience with and, presumably, have built an episodic memory for.
You’re simply pressing a key. No requirement to lie or tell the truth. You’re pressing the key as you encounter each of these items during scanning. What’s plotted here on the right (I can’t walk over there, I’ll wander out of the light) is the fact that when you look at the response in a number of these regions that the fMRI lie detection literature would point to as potentially relevant for detecting deception, there’s a heightened response to the items that you were initially told you’d be lying about but are now just passively viewing, and the items that all you did was encounter and try to memorize also elicit above-baseline activation relative to these other non-salient, novel items.
Are these signals related to deception, or are they signals related to attention, stimulus saliency and memory? It’s hard for me, objectively, to draw a strong conclusion. Second and final, and I appreciate the time and I appreciate your hanging out here on a Friday afternoon, the second issue is, again, countermeasures. Here’s a very nice study by Ganis and colleagues where they have subjects perform one of these concealed information paradigms, and here on the left, in the cooperative condition, is a comparison of regions that were more active on the lie trials than the truth trials. They kinda look like the meta-analysis that I told you about. There’s a lot of replication here.
They used those data and did pattern classification, and they sorted subjects, or trials, into guilty knowledge, sort of lie trials, and no knowledge, truth trials. They were 100% accurate in doing that sorting of brain patterns into putative lie patterns versus putative truth patterns. They then had subjects adopt a countermeasure strategy, which was to make a micromovement of, I believe it was a toe. That’s all it was: essentially focus attention on your toe and make this micromovement while performing the exact same task, while, presumably, putatively lying versus telling the truth. Classification performance fell to what was reported as 33%, and you can see that the few subjects for whom the classifier was still able to determine when they were lying versus not were very close to the decision bound.
So countermeasures here, again, have a remarkably powerful effect on the brain patterns that are gonna be used to try to decode lying versus truth telling if you accept even that these patterns are something about deception versus truth telling.
There are many factors that remain unaddressed. One is the magnitude of the stakes; in the real world, the stakes of being caught are often quite high. Lorne Semrau went into the scanner. I would imagine, and I do not know the man, so this is pure speculation, but I would imagine the stakes seemed pretty high. You’re being scanned to collect data that you’re gonna bring forward to a trial in which you’re accused of defrauding the government of $3 million. Seems like pretty high stakes. None of the subjects in any of these experiments were under such levels of stress during scanning.
So what are the effects of stress on these patterns? What are the effects of retention interval? Often when individuals are being scanned, significant time has passed between the actual event that they’re commenting on, that they’re being probed about, and the moment in which they’re being probed. In the Semrau case, these brain patterns were being collected seven years later, in response to questions that were about his intent seven years before. I can’t remember my intent a week ago, why I did certain things. The effect of retention interval there, in terms of affecting memory, let alone other aspects of the signals, is likely large, and we don’t know.
The effects of practice. There’s a little bit of work on practiced lies versus not, but these, again, are within these lab-based paradigms as opposed to rich, real-world scenarios. The effects of the lie’s content, and of subject population. This came up actually in the Semrau case. There was a Daubert hearing as to whether the fMRI lie detection evidence should be permitted in his defense. One thing that came up is that he’s an older individual, and almost all the data in the literature are on young college kids. We need to understand the effects of subject population on these patterns.
Then the big question, the difference between instructed versus subject-initiated lies, which I think is one of the fundamental issues in this literature. So there have been, as far as I know, and if others know otherwise, I’d love to hear, I believe five efforts to try to enter fMRI-based lie detection into the courts. The highest-profile one was the Semrau case, which went to a full Daubert hearing and up to the Appeals Court in the Sixth Circuit; it was denied in the Daubert hearing in the Semrau case, it was denied in a subsequent case that I testified in in Maryland, and it was denied in the other cases as well.
There was a mini explosion of these cases around 2010 or so. As best as I can tell, there haven’t been other efforts in recent years. Now, I wanna make the point that my read of the literature, basically, my view of the literature, is that we just simply don’t know. I’m not saying that fMRI will never be able to discriminate, to detect, lies versus truth telling. I just don’t think we know. The literature is fundamentally fraught with confounds.
What is being observed is likely detecting things that are potentially correlated with deception versus truth telling, but even there it’s hard to know, because there are signals that appear to be driven by the nature of the experimental designs that have been implemented. So to really answer this question, really careful, new science needs to be done. One thing, and perhaps one of the most rewarding aspects, of being part of the Law and Neuroscience Network is that I’ve had an opportunity to interact not only with legal scholars, which I wouldn’t otherwise have had opportunities to do, but also with many federal judges.
When it comes to mental states, if neuroscience can bring forward evidence related to at least three classes of mental states, those that relate to eyewitness memory, those that relate to lying versus truth telling, and those that relate to pain, it would have a big impact in the courts, because many decisions are being made based on very noisy data already, and if there were some diagnostic evidence that neuroscience could bring in, that would be great.
So I think time will tell what, ultimately, this form of neuroscience will permit, but I do think, and this’ll be a call to my neuroscience colleagues, I shied away from wanting to take this on for a while, but it’s happening. It’s happening in many different ways. Neuroscience data are getting into the courts. I was speaking with (unintelligible) about ways in which diffusion tensor imaging is now regularly landing in the courts, and functional imaging is increasingly landing in the courts. While it’s not something that I naturally, personality-wise, orient to, I do think we need to step forward and do our best to weigh in on what the science can and can’t do, and to the extent that we can play a role in producing better science, that’s even better.
There was a very generous introduction about productivity over the years. I’m long enough in the tooth now to look back and appreciate that what my lab has achieved is really the byproduct of the work of the many wonderful graduate students and post-docs and talented lab coordinators I’ve had over the years.
This work is no different. It stems from tremendous efforts by Jesse Rissman at UCLA, Melina Uncapher, who’s been in my lab for a while now and will be moving up to UCSF, as well as my collaborator at Stanford Law School, Hank Greely, and a number of research assistants, Tiffany and Tyler.
And then the fMRI lie detection review benefited heavily from Ben’s conducting of the meta-analysis, and from collaborative work with Martha and Liz. So I’d like to thank you for your attention and for being here today.
(applause)