An established probability theory for hair comparison? “is not — and never was”

December 30, 2012

(This article was originally published at Error Statistics Philosophy » Statistics, and syndicated at StatsBlogs.)


Hypothesis H: “person S is the source of this hair sample,” if indicated by a DNA match, has passed a more severe test than if it were indicated merely by a visual analysis under a microscope. There is a much smaller probability of an erroneous hair match using DNA testing than using the method of visual analysis used for decades by the FBI.

The Washington Post reported on its latest investigation into flawed statistics behind hair match testimony. “Thousands of criminal cases at the state and local level may have relied on exaggerated testimony or false forensic evidence to convict defendants of murder, rape and other felonies”. Below is an excerpt of the Post article by Spencer S. Hsu.

I asked John Byrd, forensic anthropologist and follower of this blog, what he thought. It turns out that “hair comparisons do not have a well-supported weight of evidence calculation” (Byrd). I put Byrd’s note at the end of this post.

Washington Post Published: December 22

In its April investigation, The Post found that Justice Department officials failed to tell many defendants or their attorneys of questionable evidence and that the results of the review remained largely secret.

… How often might the hairs of different people appear to match? The truth is that there was no scientific way to know….

Before DNA profiling, testimony of a hair match was a powerful way for prosecutors to boil down an ambiguous case to a single, incriminating piece of physical evidence left at the scene of a crime…

But The Post’s investigation earlier this year showed how agents, prosecutors or both sometimes exaggerated the significance of the evidence they had.

For example, in a 1980 Indiana robbery case, one agent told jurors that he was unable to distinguish between the hair of different people just once in 1,500 cases he had analyzed.

In one of the District cases, federal prosecutors claimed that the agent had been unable to tell hair samples apart only “eight or 10 times in the past 10 years, while performing thousands of analyses.”

In another, the prosecutor said in closing arguments, “There is one chance, perhaps for all we know, in 10 million that it could [be] someone else’s hair.” That defendant was declared innocent this year.

The problem is, as an expert peer review panel wrote in Melnikoff’s case, “There is not — and never was — a well established probability theory for hair comparison.”

As noted in 2009 by the chief of the FBI hair team, the proper answer to the question of how often hairs from different people might match is, “We do not know.”

Vague standards

The FBI has known for decades that hair found at a crime scene is a valuable piece of evidence. Before DNA testing, agents would use a microscope to compare the evidence with a sample of hair from a suspect.

A visual analysis can tell animal hairs from human hairs; human hairs by race and body part; whether hairs were dyed or otherwise treated; and how hairs were removed from the body. Visual comparison, at its best, also can accurately narrow the pool of criminal suspects to a class or group or definitively rule out a person as a possible source.

But it was not possible to declare an absolute match. So the FBI had a problem. Hair comparisons could yield good evidence. But agents struggled to explain to a jury how good.

Morris Samuel “Sam” Clark was the head of the FBI’s hair unit when it began training state and local analysts in 1973. He said he long believed that examiners could trace hairs from a crime scene to a particular person with a high degree of probability — even though there is no scientific proof that is possible…

The FBI’s training regimen, which required agents to compare hairs side-by-side under high-powered microscopes for a year before working on live cases, gave lab veterans confidence that they could tell the difference between individuals’ hairs just as an ordinary person could distinguish between their faces.

They embraced a set of vague standards. In written lab reports, FBI agents would include the caveat that hair examination was not a basis for positive identification.

In court, however, they could suggest that it would be highly unlikely for an examiner’s match to be wrong. The bureau left it up to individual labs and examiners to explain matters to jurors. Agents were trained to say that in their “personal experience” they had rarely seen hairs from different people that looked alike.

That evolved into jurors’ hearing numbers that had a huge impact even if they lacked scientific grounding. After a slaying in Tennessee in 1980, an FBI agent testified in a capital case that there was one chance in 4,500 or 5,000 that a hair came from someone other than the suspect.

But as experts from around the world would later note, the FBI-taught answer was misleading. In reality, FBI examiners did not compare every hair to every other hair they had ever examined. They simply compared crime-scene hairs and hair samples from individuals relevant in each case.

Examiners kept no “database” of samples, which went back to police evidence files. And differences between hairs are so fine that a person can generally keep only a handful of hairs in mind at any time.

“The claim you could keep all those hairs in your head and sort them in your mind, that would be hard to do,” said Mark R. Wilson, a 23-year FBI veteran who helped develop DNA testing for hair in 1996. “After about three or four [hairs], it gets confusing.”

The claim was called into question at an international conference hosted by the FBI in 1985, but the training was not overhauled for at least a dozen more years…

Robillard, the former hair unit chief, said that he always waited for a defense attorney to challenge his claims about the accuracy of hair analysis but that neither they nor judges usually caught the logical sleight of hand.

“You would expect a defense attorney to say, ‘Wait — are you, Robillard, saying you compared every person’s hair to every other one?’ That’s the screaming question for cross-examination,” Robillard said. “I can’t off the top of my head remember ever having a defense attorney say that.”

… In 2004, Melnikoff lost his crime lab job in Washington because of errors whose discovery led to three overturned convictions in Montana. One of those cases was the child rape conviction of Jimmy Ray Bromgard, who served more than 15 years in prison before DNA tests showed he didn’t commit the crime.

At Bromgard’s 1987 trial, Melnikoff said he found head and pubic hairs “microscopically indistinguishable” from Bromgard’s, and he told the jury that there was less than one chance in 10,000 of a coincidence. He based this assertion on his case experience, multiplying by 100 the 1 in 100 frequency with which he claimed to have seen head and pubic hairs he could not tell apart.

After Bromgard was exonerated in 2002, a five-member panel that included Deadman said Melnikoff made “egregious misstatements not only of the science of forensic hair examinations but also of genetics and statistics.”

The full article is here.
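
Melnikoff's 1-in-10,000 figure from the excerpt can be reconstructed in a few lines. The sketch below (Python, with illustrative names) takes his claimed 1-in-100 frequencies at face value and shows that multiplying them is legitimate only if a head-hair match and a pubic-hair match are independent events. The conditional probability of 1/2 used for the dependent case is a made-up value for illustration; no such figure appears in the article.

```python
from fractions import Fraction


def joint_match_probability(p_head, p_pubic_given_head):
    """P(head match and pubic match) = P(head) * P(pubic | head)."""
    return p_head * p_pubic_given_head


# Frequencies as claimed in the 1987 testimony (anecdotal, not measured).
p_head = Fraction(1, 100)
p_pubic = Fraction(1, 100)

# Case 1: assume independence, so P(pubic | head) = P(pubic).
independent = joint_match_probability(p_head, p_pubic)

# Case 2: HYPOTHETICAL dependence. Suppose that among people whose head
# hair is indistinguishable, half also have indistinguishable pubic hair.
dependent = joint_match_probability(p_head, Fraction(1, 2))

print(f"assuming independence: 1 in {1 / independent}")
print(f"assuming dependence:   1 in {1 / dependent}")
```

With independence the product is 1 in 10,000; with the hypothetical dependence it is 1 in 200. And the 1-in-100 inputs themselves rest on recollected case experience rather than a measured error rate.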


Comment (from an e-mail) from John Byrd, forensic analyst:

It is a well-known problem in forensics that has proven difficult for the traditional labs to get past.

At the root of it is the tradition of hiring non-scientists into the technical positions in the labs. They tended to be agents. That explains a lot about misinterpretation of the weight of evidence and the inability to explain the import of lab findings in court.

I should note that we often talk of “weight of evidence” in forensic science. It is addressed by appeal to the frequency of a spurious match in repeated applications of a test. The larger the probability of a random match, the lower the weight ascribed to the evidence. DNA is useful to the extent that the probability of someone else sharing the profile is low.

Hair comparisons do not have a well-supported weight of evidence calculation and we suspect if it did it would not be comparable to DNA, fingerprints, or other tests that are more reputable in the scientific community.

Clarified to mean: “Hair comparisons, when made visually (under microscopes), do not have a well-supported weight of evidence calculation, and we suspect that if they did (i.e., if we checked the rate of false visual matches) it would not be comparable to DNA, fingerprints, or other tests that are more reputable in the scientific community.”
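
To see what checking the rate of false visual matches would involve, here is a minimal sketch in Python. It assumes a designed study that does not appear in the article: an examiner is shown `n_comparisons` independent pairs of hairs known to come from different people and wrongly declares a match in none of them. The function names and the 95% confidence level are illustrative choices, not anything the FBI or the Post reported.

```python
import math


def upper_bound_zero_matches(n_comparisons, confidence=0.95):
    """Exact one-sided upper confidence bound on the false-match rate
    when 0 false matches are seen in n independent comparisons of
    hairs known to come from different people (an assumed study design).
    """
    alpha = 1 - confidence
    # Solve (1 - p)^n = alpha for p.
    return 1 - alpha ** (1 / n_comparisons)


def comparisons_needed(target_rate, confidence=0.95):
    """Number of error-free comparisons needed before the upper bound
    on the false-match rate drops to target_rate."""
    alpha = 1 - confidence
    # log1p(-p) computes log(1 - p) accurately for very small p.
    return math.ceil(math.log(alpha) / math.log1p(-target_rate))


bound_1500 = upper_bound_zero_matches(1500)
needed_for_1_in_10_million = comparisons_needed(1e-7)

print(f"bound after 1,500 error-free comparisons: {bound_1500:.6f}")
print(f"comparisons needed for 1 in 10 million:   {needed_for_1_in_10_million:,}")
```

Under these assumptions, even 1,500 error-free comparisons would bound the false-match rate only at roughly 1 in 500, and supporting a figure like 1 in 10 million would take on the order of 30 million such comparisons. Casework is not a designed study of this kind, which is one way to read the FBI hair team chief's answer, “We do not know.”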

I am sure you can see the direct relationship between the weight of evidence and severity.

Note that the last person interviewed, Max Houck, is an anthropologist and was the first scientist (non-agent) they hired to do trace evidence. I know Max very well and he has distinguished himself in the scholarly world by pushing science fundamentals to the forensic disciplines. The first paper I saw Max present many years ago at the American Academy of Forensic Sciences was on the philosophy of science that underpins our forensic reasoning. (Forensics was largely born out of anthropology at the turn of the last century.) Just last year, he presented his ideas that the philosophy underlying forensic science is a subsidiary of historical sciences (archaeology, paleontology, astronomy, etc.).

Forensics has turned a corner in any event. The accrediting bodies now follow ISO standards and require science degrees and training of the analysts. The National Academy of Sciences put out a scathing critique of forensics in America in 2009 that recommended that all analysts be trained and mentored to do scientific research before they become analysts.

The FBI is suffering lingering effects of the past… You might be pleased to hear that the last time I saw the FBI lab Director in a meeting, he was all abuzz about wanting to hire a full-time statistician to work with the staff. That was last year, so I will find out this year how that worked out. Statistics and scientific reasoning cannot be separated. John Byrd

John E. Byrd, Ph.D., D-ABFA
Laboratory Director and Forensic Anthropologist


Filed under: Severity, Statistics
