Do we have a linguistic DNA?

Here is your starter for ten:

Which is the best definition of the term forensic?

A. pertaining to crime

B. pertaining to death

C. pertaining to medicine

D. pertaining to the law

If you chose D, give yourself a pat on the back. I put this question in because I feel that the word forensic is subject to a great deal of misunderstanding. For me the correct definition is “belonging to, used in, or suitable to courts of judicature”. It comes from the Latin forēnsis, meaning “of or before the forum”. In Roman times, criminal charges were heard before a group of citizens in the forum. Both the accused and the accuser would give speeches based on their sides of the story. So forensic is an adjective meaning “of the courts” or “legal”. However, there is now a second meaning: “relating to or dealing with the application of scientific knowledge to legal problems”. The word has become so closely associated with the scientific field that many dictionaries include this meaning, which equates “forensics” with “forensic science”. Call me a pedant, but I think the first definition should be sufficient. A forensic pathologist is one who works in the legal system. You can have forensic anthropologists, archaeologists, psychologists, psychiatrists, linguists, accountants, nurses and engineers. What they have in common is that they all work in the legal system.

For thirteen seasons the Las Vegas Crime Scene Investigators have been using their state-of-the-art gadgets to solve grisly murders in under forty minutes. Such is the influence of the series that we have the CSI effect – juries unwilling to convict because they are unimpressed by the forensic evidence presented by the prosecution. Today I am going to be looking at a different aspect – forensics and language. I will begin with a branch of forensic linguistics known as forensic phonetics. I became especially interested in this subject after listening to a programme about it on the BBC.

What do forensic phoneticians do? We can glean an awful lot of information about a speaker from their voice. Obviously it’s usually possible to tell very quickly whether a speaker is male or female by listening to the overall pitch of the voice. Combining phonetic and sociolinguistic analysis of a voice can help to establish where the speaker lives, their age, and other details about their background. Phoneticians can make speech visible on a computer screen, measuring its component parts – the duration of sounds (how long they are), their frequency, intensity, loudness and pitch. They can also decipher the content of difficult recordings. This can be useful when you have bad-quality sources or unusual pronunciation patterns. Finally we have speaker identification. This could be done by a witness, who may be asked to identify the voice in a voice parade or line-up. Alternatively, a professional may use a comparative phonetic analysis to try to establish whether a suspect is the person talking on a criminal recording. The Holy Grail in this field would be some kind of automated system to recognise voices, but this is still a long way off.

Now I want to look at a practical example. Wearside Jack was the nickname given to John Samuel Humble, who pretended to be the Yorkshire Ripper in a number of hoax communications in 1978-79. As well as sending three letters taunting the authorities for their inability to catch him, Humble also sent an audio message spoken in a Wearside accent:

I’m Jack. I see you are still having no luck catching me. I have the greatest respect for you George, but Lord! You are no nearer catching me now than four years ago when I started. I reckon your boys are letting you down, George. They can’t be much good, can they?

As a consequence the investigation shifted away from the Leeds/Bradford area. Thus Peter Sutcliffe, the real killer, was able to get away with murder until he was finally apprehended in January 1981. The phoneticians situated the accent in the Castletown area of Sunderland. In an exercise in futility, 40,000 potential suspects were investigated and the police used billboards, full-page ads in local and national newspapers and ‘Dial-the-Ripper’ hotlines to try to get their man. This campaign alone cost one million pounds. This was not the police’s finest hour. It also shows a danger of this type of evidence. It is fine to use it as one line of enquiry, but you shouldn’t use it to exclude every other possibility. In fact, the phoneticians did their job well. In 2005 the long arm of the law finally caught up with Humble. One of the envelopes he had used was traced to him through DNA, and in 2006 he was sentenced to eight years for perverting the course of justice. When he was arrested, Humble was living in the Ford Estate suburb of Sunderland and he told the police he had gone to school in the Castletown area.

Forensic phonetics is, as I have already mentioned, part of forensic linguistics. This is the study, analysis and measurement of language in the context of crime, judicial procedure, or legal disputes. The term first appeared in 1968 when Jan Svartvik, a Swedish professor of linguistics, used it in an analysis of statements by Timothy John Evans, the man wrongly executed for the Rillington Place murders. Svartvik was able to show that the confession statement of Evans was probably not authentic. The part of the statement where Evans confessed to the murders was clearly different from the rest of the transcript.

What do forensic linguists do?

The forensic linguist may be called upon to analyse a very wide variety of documents including:

• anonymous letters

• hate mail

• mobile phone texts in missing persons cases

• online communications

• ransom demands

• suicide notes – are they genuine?

• verballing – claims by defendants that their statements were altered or even invented by police officers

• wills

A key concept for much of what they do is the linguistic fingerprint. The idea is that each human being uses language differently, that this difference involves a collection of unconscious predilections which makes each speaker or writer unique, and that it can be identified just like a fingerprint. What forensic linguists analyse are the function words – which vs. that, but vs. nevertheless, etc. These are the words we tend to think about the least, but with enough text to play with, studying the frequency and distribution of these words can show whether a piece of writing conforms to samples that we know were written by the author.
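To make the idea of counting function words a little more concrete, here is a minimal sketch in Python. It is purely my own illustration and not the method any forensic linguist actually uses: the tiny word list, the function names and the simple distance measure are all assumptions, and a real analysis would need far more text and far more careful statistics.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words (an assumption for
# illustration; real studies use much longer lists).
FUNCTION_WORDS = ("the", "of", "and", "to", "that", "which", "but", "nevertheless")


def function_word_profile(text):
    """Return each function word's share of all the words in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {word: counts[word] / total for word in FUNCTION_WORDS}


def profile_distance(profile_a, profile_b):
    """Sum of absolute differences between two profiles (0 means identical)."""
    return sum(abs(profile_a[w] - profile_b[w]) for w in FUNCTION_WORDS)


def closest_author(questioned_text, known_samples):
    """Return the name whose known sample is nearest to the questioned text."""
    questioned = function_word_profile(questioned_text)
    return min(
        known_samples,
        key=lambda name: profile_distance(
            questioned, function_word_profile(known_samples[name])
        ),
    )


if __name__ == "__main__":
    # Invented sample sentences, far too short for any real conclusion.
    samples = {
        "A": "the cat that sat on the mat and the dog that ran",
        "B": "a cat which sat but nevertheless purred, which is odd",
    }
    print(closest_author("the bird that sang and the fish that swam", samples))
```

Note that the sketch only ever says which known sample is closest; it says nothing about whether the closest sample is actually close enough to mean anything, which is exactly where the difficulty lies.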

The practice of analysing documents for authenticity or to identify the author is not new. In 1439 Lorenzo Valla, the Italian humanist, was able to prove that the Donation of Constantine, a Roman imperial decree by which the emperor Constantine I supposedly transferred authority over Rome and the western part of the Roman Empire to the Pope, was a forgery. He did it by comparing the Latin with that used in authentic 4th-century documents. His conclusion was that it had probably been composed in the 8th century.

Let’s look at some more recent cases. In a famous case in 1982 Robert Eagleson, a professor of English at Sydney University, was called in to examine a suicide note. The note was said to have been written by Janice Pollett, who had disappeared from her home. As well as examining the farewell note, Eagleson also had access to specimen letters by both Mr. and Mrs. Pollett. After looking at the grammar, spelling and punctuation, Eagleson concluded that the letter had been written by the husband. Mervyn Pollett would later confess to the crime.

Forensic linguistics is also being used on more modern methods of communication. According to Wikipedia, on June 30, 2010, Paul Ceglia filed a lawsuit against Mark Zuckerberg claiming 84% ownership of Facebook and seeking financial compensation. What is interesting about this case is the use of emails. Ceglia described email exchanges with Zuckerberg between July 2003 and July 2004, in which the two discussed the Facebook project, including ways to generate income from it. Zuckerberg’s legal team hired the linguist Gerald McMenamin, emeritus professor of linguistics at California State University, Fresno. He studied the emails, found 11 different style markers across punctuation, spelling and grammar, and concluded that Zuckerberg could not have been the author.

On October 26, 2012, federal agents arrested Ceglia and charged him with fabricating evidence in relation to his suit against Zuckerberg. Ceglia was charged with one count of mail fraud and one count of wire fraud, each of which carries a maximum penalty of 20 years in prison. In the two pages Ceglia produced for his lawsuit there were a number of inconsistencies such as differences in margins, spacing and columns. When the investigators searched Harvard’s email servers they could find no evidence of the messages Ceglia had mentioned in his lawsuit. And when they looked at his hard drive they found that Ceglia had falsified existing records to bolster his claim. I find this evidence more conclusive than that of the linguist. I agree with the comments by Ben Zimmer in the New York Times:

Many linguists, however, would challenge the notion that the “fingerprint,” a supposedly unique identifier, can be metaphorically applied to writing. Surely we all have our own written quirks and mannerisms — I tend to overuse em-dashes, for instance. But there is just too much internal variability in any person’s body of writing to imagine that we could take just a bit of it — a handful of e-mails — and recognize some sort of linguistic DNA. That is all the more true when it comes to digital genres like text messages, instant messages and tweets, full of unusual spellings and innovative abbreviations, and often sensitive to the type of device we’re using.

I find forensic linguistics absolutely fascinating. There are some really smart people in the field. And I really think it holds a great deal of promise for the future. I am particularly interested in the use of computers and the quantitative approach. But we need to be cautious about its adoption; forensic linguists still have a long way to go to convince courts of the reliability and validity of their methodology. Language, both written and spoken, is not like your DNA or your fingerprints, neither of which ever varies. Therefore, while forensic linguistics has its place in the courtroom, it is usually much more useful for the defence. It is easier to prove that a defendant couldn’t have written something than to demonstrate that nobody else could have written it. This type of evidence should not be used as the sole piece of evidence for the prosecution. Now I wonder what a forensic linguist would make of my oeuvre…


