Showing posts with label word choice and deception. Show all posts

Deception Detection of Text, Language and Word Choice in Texting.


The article below discusses some of the methods currently used and describes a new product for text analysis. Communication experts call it content analysis or rhetorical analysis. Psychologists, law enforcement and the criminal justice system call it statement analysis or forensic statement analysis.
Patti Wood, MA, Certified Speaking Professional - The Body Language Expert. For more body language insights go to her website at http://PattiWood.net. Also check out the body language quiz on her YouTube Channel at http://youtube.com/user/bodylanguageexpert.
www.freshpatents.com/Method-and-system-for-the-automatic-recognition-of-deceptive-language-dt20070111ptan20070010993.php

Method and system for the automatic recognition of deceptive language

Abstract: A system for identifying deception within a text includes a processor for receiving and processing a text file. The processor includes a deception indicator tag analyzer for inserting into the text file at least one deception indicator tag that identifies a potentially deceptive word or phrase within the text file, and an interpreter for interpreting the at least one deception indicator tag to determine a distribution of potentially deceptive word or phrases within the text file and generating deception likelihood data based upon the density or distribution of potentially deceptive word or phrases within the text file. A method for identifying deception within a text includes the steps of receiving a first text to be analyzed, normalizing the first text to produce a normalized text, inserting into the normalized text at least one part-of-speech tag that identifies a part of speech of a word associated with the part-of-speech tag, inserting into the normalized text at least one syntactic label that identifies a linguistic construction of one or more words associated with the syntactic label, inserting into the normalized text at least one deception indicator tag that identifies a potentially deceptive word or phrase within the normalized text, interpreting the at least one deception indicator tag to determine a distribution of potentially deceptive word or phrases within the normalized text, and generating deception likelihood data based upon the density or frequency of distribution of potentially deceptive word or phrases within the normalized text. (end of abstract)
CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims priority from a U.S. provisional patent application Ser. No. 60/635,306, filed on Dec. 10, 2004, which is herein incorporated by reference in its entirety.

BACKGROUND OF THE INVENTION

[0002] This invention relates to the application of Natural Language Processing (NLP) to the detection of deception in written texts.

[0003] The critical assumption of all deception detection methods is that people who deceive undergo measurable changes--either physiological or behavioral. Language-based deception detection methods focus on behavioral factors. They have typically been investigated by research psychologists and law enforcement professionals working in an area described as "statement analysis" or "forensic statement analysis". The development of statement analysis techniques has taken place with little or no input from established language and speech technology communities.

[0004] The goal of these efforts has been twofold. Research projects, primarily conducted by experimental psychologists and management information systems groups, investigate the performance of human subjects in detecting deception in spoken and written accounts of a made up incident. Commercial and government (law enforcement) efforts are aimed at providing a technique that can be used to evaluate written and spoken statements by people suspected of involvement in a crime. In both cases, investigators look at a mix of factors, e.g. factual content, emotional state of the subject, pronoun use, extent of descriptive detail, coherence. Only some of these are linguistic. To date, the linguistic analysis of these approaches depends on overly simple language description and lacks sufficient formal detail to be automated--application of the proposed techniques depends largely on human judgment as to whether a particular linguistic feature is present or not. Moreover none of the proposed approaches bases its claims on examination of large text or speech corpora.

[0005] Two tests for measuring physiological changes are commercially available--polygraphs and computer voice stress analysis. Polygraph technology is the best established and most widely used. In most cases, the polygraph is used to measure hand sweating, blood pressure and respiratory rate in response to Yes/No questions posed by a polygraph expert. The technology is not appropriate for freely generated speech. Fluctuations in response are associated with emotional discomfort that may be caused by telling a lie. Polygraph testing is widely used in national security and law enforcement agencies but barred from many applications in the United States, including court evidence and pre-employment screening. Computer voice stress analysis (CVSA) measures fundamental frequency (F0) and amplitude values. It does not rely on Yes/No questions but can be used for the analysis of any utterance. The technology has been commercialized and several PC-based products are available. Two of the better known CVSA devices are the Diogenes Group's "Lantern" system and the Trustech "Vericator". CVSA devices have been adopted by some law enforcement agencies in an effort to use a technology that is less costly than polygraphs as well as having fewer detractors. Nonetheless, these devices do not seem to perform as well as polygraphs. The article Investigation and Evaluation of Voice Stress Analysis Technology (D. Haddad, S. Walter, R. Ratley and M. Smith, National Institute of Justice Final Report, Doc. #193832 (2002)) provides an evaluation of the two CVSA systems described above. The study cautions that even a slight degradation in recording quality can affect performance adversely. The experimental evidence presented indicates that the two CVSA products can successfully detect and measure stress but it is unclear as to whether the stress is related to deception. Hence their reliability for deception detection is still unproven.

[0006] Current commercial systems for detection of deceptive language require an individual to undergo extensive specialized training. They require special audio equipment and their application is labor-intensive. Automated systems that can identify and interpret deception cues are not commercially available.

BRIEF SUMMARY OF THE INVENTION

[0007] Motivated by the need for a testable and reliable method of identifying deceptive language, the present method detects deception by computer analysis of freely generated text. The method accepts transcribed or written statements and produces an analysis in which portions of the text are marked as highly likely to be deceptive or highly likely to be truthful. It provides for an automated system that can be used without special training or knowledge of linguistics.

[0008] A system for identifying deception within a text according to the present invention includes a processor for receiving and processing a text file, wherein the processor has a deception indicator tag analyzer for inserting into the text file deception indicator tags that identify potentially deceptive words and/or phrases within the text file. The processor also includes an interpreter for interpreting the deception indicator tags to determine a distribution of potentially deceptive word or phrases within the text file. The interpreter also generates deception likelihood data based upon the distribution of potentially deceptive word or phrases within the text file. The system may further include a display for displaying the deception likelihood data. The processor may further include a receiver for receiving a first text to be analyzed, a component for normalizing the first text to produce a normalized text, a component for inserting into the normalized text part-of-speech tags that identify parts of speech of word associated with the part-of-speech tags, and a component for inserting into the normalized text syntactic labels that identify linguistic constructions of one or more words associated with each syntactic label. The normalized text including the part-of-speech tag(s) and the syntactic label(s) is provided to the deception indicator tag analyzer.

[0009] In one embodiment of the system according to the present invention, the deception indicator tag analyzer inserts the deception indicator tag into the normalized text based upon words or phrases in the normalized text, part-of-speech tags inserted into the normalized text, and syntactic labels inserted in the normalized text. The deception indicator tags may be associated with a defined word or phrase or associated with a defined word or phrase when used in a defined linguistic context. Also, the interpreter may calculate a proximity metric for each word or phrase in the text file based upon the proximity of the word or phrase to a deception indicator tag such that the proximity metric is used to generate the deception likelihood data. The interpreter may also calculate a moving average metric for each word or phrase in the text file based upon the proximity metric of the word or phrase such that the moving average metric is used to generate the deception likelihood data. The calculation of the moving average metric for each word or phrase in the text file may be adjusted by a user of the system to alter the deception likelihood data as desired by the user.
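To make the proximity and moving-average idea concrete, here is a minimal Python sketch. The excerpt does not give the actual formulas, so everything here is an assumption for illustration: the function names are hypothetical, proximity is taken as 1 / (1 + distance to the nearest tagged token), and the moving average is a simple centered window whose size plays the role of the user adjustment.

```python
from typing import List


def proximity_scores(tagged: List[bool]) -> List[float]:
    """Score each token by closeness to the nearest tagged token.

    Assumed formula (not from the source): 1 / (1 + distance in tokens).
    A tagged token scores 1.0; a text with no tags scores 0.0 everywhere.
    """
    positions = [i for i, flag in enumerate(tagged) if flag]
    if not positions:
        return [0.0] * len(tagged)
    return [
        1.0 / (1 + min(abs(i - p) for p in positions))
        for i in range(len(tagged))
    ]


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window shrinks at the edges of the text.

    The window size stands in for the user adjustment mentioned above.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Example: the second of four tokens carries a deception indicator tag.
flags = [False, True, False, False]
scores = moving_average(proximity_scores(flags), window=3)
```

A wider window smooths the scores over more neighboring words, so a single tagged word colors a larger stretch of the statement.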

[0010] A method for identifying deception within a text in accordance with the present invention includes the steps of: receiving a first text to be analyzed; normalizing the first text to produce a normalized text; inserting into the normalized text at least one part-of-speech tag that identifies a part of speech of the word associated with each part-of-speech tag; inserting into the normalized text at least one syntactic label that identifies a linguistic construction of one or more words associated with the syntactic label; inserting into the normalized text at least one deception indicator tag that identifies a potentially deceptive word or phrase within the normalized text, interpreting the at least one deception indicator tag to determine a distribution of potentially deceptive word or phrases within the normalized text; and generating deception likelihood data based upon the distribution of potentially deceptive words or phrases within the normalized text.
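The steps in that method can be sketched in a few lines of Python. This is a toy illustration, not the patented system: the cue list and the part-of-speech lexicon are hypothetical stand-ins (the excerpt does not list its actual indicators), the syntactic labeling step is left out for brevity, and a plain density figure stands in for the "deception likelihood data."

```python
import re
from typing import List, Tuple

# Hypothetical cue words, for illustration only; the source does not enumerate them.
CUE_WORDS = {"honestly", "maybe", "probably", "literally"}

# Toy part-of-speech lexicon (an assumption); unknown words are labeled "UNK".
TOY_POS = {
    "i": "PRON", "he": "PRON", "she": "PRON",
    "was": "VERB", "went": "VERB",
    "home": "NOUN",
    "honestly": "ADV", "maybe": "ADV",
}

TaggedToken = Tuple[str, str, bool]  # (word, part of speech, is_cue)


def normalize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tag_tokens(tokens: List[str]) -> List[TaggedToken]:
    """Attach a part-of-speech tag and a cue flag to every token."""
    return [(t, TOY_POS.get(t, "UNK"), t in CUE_WORDS) for t in tokens]


def cue_density(tagged: List[TaggedToken]) -> float:
    """Share of tokens flagged as cues (0.0 for an empty text)."""
    if not tagged:
        return 0.0
    return sum(1 for _, _, is_cue in tagged if is_cue) / len(tagged)


tagged = tag_tokens(normalize("Honestly, I went home."))
density = cue_density(tagged)  # one cue word out of four tokens
```

A real implementation would swap the toy lexicon for a trained part-of-speech tagger and would look at where the cues cluster, not just how many there are.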

[0011] While multiple embodiments are disclosed, still other embodiments of the present invention will become apparent to those skilled in the art from the following detailed description, which shows and describes illustrative embodiments of the invention. As will be realized, the invention is capable of modifications in various obvious aspects, all without departing from the spirit and scope of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not restrictive.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1 is a schematic diagram of the components of a system for one embodiment of the invention.

[0013] FIG. 2 is a flowchart showing the overall processing of text in one embodiment of the invention.

[0014] FIG. 3 is a diagram showing how text is marked for display after analysis for deception.

[0015] FIG. 4 is a diagram showing an alternative for how text is marked for display after analysis for deception.

DETAILED DESCRIPTION

I. Overview

[0016] A core notion of the method is that deceptive statements incorporate linguistic attributes that are different from those of non-deceptive statements. It is possible to represent these attributes formally as a method of linguistic analysis that can be verified by empirical tests.

[0017] The method begins with certain widely accepted techniques of corpus linguistics and automated text analysis. The deception detection component is based on a corpus of "real world" texts, for example, statements and depositions from court proceedings and law enforcement sources which contain propositions that can be verified by external evidence. Linguistic analysis is accomplished by a combination of statistical methods and formal linguistic rules. A novel user interface interprets results of the analysis in a fashion that can be understood by a user with no specialized training.

[0018] A method in accordance with the present invention is implemented as an automated system that incorporates the linguistic analysis along with a method of interpreting the analysis for the benefit of a system user. A typical system user may be a lawyer, a law-enforcement professional, an intelligence analyst or any other person who wishes to determine whether a statement, deposition or document is deceptive. Unlike polygraph tests and similar devices that measure physiological responses to Yes/No questions, the method applies to freely generated text and does not require specialized or intrusive equipment. Thus it can be used in a variety of situations where statements of several sentences are produced.

[0019] The system builds on formal descriptions developed for linguistic theory and on techniques for automated text analysis developed by computational linguists. The analysis advances the state of the art in natural language processing, because deception detection is a novel application of NLP. In addition the system compensates for the fact that humans recognize deceptive language at a rate little better than chance.

The Use of the Word Literally as an Indication of Deception

My friends and I get a kick out of language quirks and the fun ways marketers and advertisers describe their products, ingredients and benefits. In addition, we have been talking about the overuse of the word "literally." This made for a very fun time Wednesday when a friend and I went to a makeup class. My friend was a senior editor at NASA and a senior partner at Towers Perrin in charge of communication.
I was holding in my laughter, wishing I could see her face every time the makeup artist told us that another product could "literally" make us more gorgeous. And it was "literally" going deep into our pores. I especially got a hoot from his long description of the minerals in the deep sea water. "This makeup contains deep sea water taken from over a mile down in the oceans of Japan." "It takes 'literally' thousands of years for the sea water to collect the minerals." "And those deep sea minerals 'literally' bond naturally to your skin and make it beautiful." (Smile) It was a wonderful example of how some people use the word "literally" to describe things they don't truly understand but want to appear confident discussing. It is not an indication of overt deception. That is, they are not thinking, "I am going to lie"; it is more a subconscious desire to look knowledgeable.

I think BP oil is accelerating the mineral buildup in deep sea water. I mean that "Literally."
Who knew that BP Oil was just helping us become more beautiful?



How can you tell if he or she is lying? Listening to the words.

In further media interviews the governor of South Carolina, Mark Sanford, uses a technique I now call the 'redefining tactic' (after former President Clinton's famous, "I did not have sexual relations with that woman.") Governor Sanford says he "crossed lines" with a handful of women other than his mistress. He says he "never crossed the ultimate line" with anyone but Maria Belen Chapur. During an emotional interview with the Associated Press at his statehouse office on Tuesday, Sanford said that during the encounters with other women he "let his guard down" with some physical contact but "didn't cross the sex line." Sanford said the casual encounters happened on trips he'd taken outside the US with male friends to "blow off steam". He alleges they occurred while he was married but before he met Chapur. So to be clear--it doesn't count as sex if you don't cross the sex line and you are just blowing off steam. Uck!
As a media coach, I know the importance of using the right words. In this case he is choosing phrases that make him sound like a college frat boy. This is such a horrible story for his family to have to hear about. Please just apologize clearly and briefly and move on.

Click Here to read the AP article about Governor Sanford's apology.
For information on public seminars Patti is giving on body language and deception detection in Philadelphia through Paliani consulting, please contact us or go directly to the Paliani site.

What words do liars use? Governor Sanford's apology

Just as body language cues leak whether or not someone is lying, Freudian “slips” in language can reveal underlying anxiety, guilt, or arousal. Research from as far back as Mehrabian (1971) has reported higher numbers of speech errors in deceivers than in non-deceivers. Linguistic style analysis reveals how the deceptive message is conveyed as compared to a truthful message (Pennebaker & King, 1999). Based on earlier work, some of the most reliable markers of linguistic styles are the use of content-free words, such as articles, pronouns, prepositions, auxiliary verbs, conjunctions, and emotionally toned words. See my last post for other specific examples.
So Gary Condit, when talking about his wife and a stewardess he had an affair with, used the pronouns "she" and "her" to refer to both women rather than using their names or stating his personal relationship with them. Recently Governor Sanford used the term "those boys" instead of "my sons" or the names of his sons during his apology, though he referred to his staff by name. By using such an impersonal label for his sons he indicated his desire to disconnect from his responsibility as a father. Governor Sanford also never actually said, "I am sorry." Instead he asked for forgiveness, which is something I have noticed politicians and celebrities often choose to do in their interviews with the press. As a body language expert and media coach, I coach my clients to use the words "I am sorry," "I apologize," and "I made a mistake." For the nonverbal read of Governor Sanford's apology, check last Friday's post. And for the slips of the tongue used by Michael Vick in his apology for hosting dog fights, go to my website.
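For readers who like to tinker, pronoun choice is easy to count by machine. The Python sketch below is only a rough illustration: the function name and the two short pronoun lists are my own assumptions, and a raw count proves nothing about honesty on its own. It simply shows how first-person ownership words ("I," "my") can be tallied against distancing third-person ones ("she," "her").

```python
import re
from typing import Dict

# Short, assumed pronoun lists; a fuller analysis would use a complete lexicon.
FIRST_PERSON = {"i", "me", "my", "mine", "myself", "we", "us", "our", "ours"}
THIRD_PERSON = {"he", "him", "his", "she", "her", "hers", "they", "them", "their"}


def pronoun_profile(text: str) -> Dict[str, int]:
    """Count first-person and third-person pronouns in a statement."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "first_person": sum(1 for w in words if w in FIRST_PERSON),
        "third_person": sum(1 for w in words if w in THIRD_PERSON),
        "total_words": len(words),
    }


owning = pronoun_profile("I love my sons")
distancing = pronoun_profile("She told her to leave")
```

Comparing the two profiles side by side gives a quick, if crude, picture of whether a speaker is claiming a relationship or stepping away from it.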