Deception Detection of Text, Language and Word Choice in Texting.


The article below discusses some of the methods currently used and describes a new product for text analysis. Communication experts call it content analysis or rhetorical analysis. Psychologists, law enforcement and the criminal justice system call it statement analysis or forensic statement analysis.
Patti Wood, MA, Certified Speaking Professional - The Body Language Expert. For more body language insights go to her website at http://PattiWood.net. Also check out the body language quiz on her YouTube Channel at http://youtube.com/user/bodylanguageexpert.
www.freshpatents.com/Method-and-system-for-the-automatic-recognition-of-deceptive-language-dt20070111ptan20070010993.php

Method and system for the automatic recognition of deceptive language

Abstract: A system for identifying deception within a text includes a processor for receiving and processing a text file. The processor includes a deception indicator tag analyzer for inserting into the text file at least one deception indicator tag that identifies a potentially deceptive word or phrase within the text file, and an interpreter for interpreting the at least one deception indicator tag to determine a distribution of potentially deceptive words or phrases within the text file and generating deception likelihood data based upon the density or distribution of potentially deceptive words or phrases within the text file. A method for identifying deception within a text includes the steps of receiving a first text to be analyzed, normalizing the first text to produce a normalized text, inserting into the normalized text at least one part-of-speech tag that identifies a part of speech of a word associated with the part-of-speech tag, inserting into the normalized text at least one syntactic label that identifies a linguistic construction of one or more words associated with the syntactic label, inserting into the normalized text at least one deception indicator tag that identifies a potentially deceptive word or phrase within the normalized text, interpreting the at least one deception indicator tag to determine a distribution of potentially deceptive words or phrases within the normalized text, and generating deception likelihood data based upon the density or frequency of distribution of potentially deceptive words or phrases within the normalized text. (end of abstract)
CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims priority from a U.S. provisional patent application Ser. No. 60/635,306, filed on Dec. 10, 2004, which is herein incorporated by reference in its entirety.

BACKGROUND OF THE INVENTION

[0002] This invention relates to the application of Natural Language Processing (NLP) to the detection of deception in written texts.

[0003] The critical assumption of all deception detection methods is that people who deceive undergo measurable changes--either physiological or behavioral. Language-based deception detection methods focus on behavioral factors. They have typically been investigated by research psychologists and law enforcement professionals working in an area described as "statement analysis" or "forensic statement analysis". The development of statement analysis techniques has taken place with little or no input from established language and speech technology communities.

[0004] The goal of these efforts has been twofold. Research projects, primarily conducted by experimental psychologists and management information systems groups, investigate the performance of human subjects in detecting deception in spoken and written accounts of a made-up incident. Commercial and government (law enforcement) efforts are aimed at providing a technique that can be used to evaluate written and spoken statements by people suspected of involvement in a crime. In both cases, investigators look at a mix of factors, e.g., factual content, emotional state of the subject, pronoun use, extent of descriptive detail, coherence. Only some of these are linguistic. To date, the linguistic analysis of these approaches depends on overly simple language description and lacks sufficient formal detail to be automated--application of the proposed techniques depends largely on human judgment as to whether a particular linguistic feature is present or not. Moreover, none of the proposed approaches bases its claims on examination of large text or speech corpora.

[0005] Two tests for measuring physiological changes are commercially available--polygraphs and computer voice stress analysis. Polygraph technology is the best established and most widely used. In most cases, the polygraph is used to measure hand sweating, blood pressure and respiratory rate in response to Yes/No questions posed by a polygraph expert. The technology is not appropriate for freely generated speech. Fluctuations in response are associated with emotional discomfort that may be caused by telling a lie. Polygraph testing is widely used in national security and law enforcement agencies but barred from many applications in the United States, including court evidence and pre-employment screening. Computer voice stress analysis (CVSA) measures fundamental frequency (F0) and amplitude values. It does not rely on Yes/No questions but can be used for the analysis of any utterance. The technology has been commercialized and several PC-based products are available. Two of the better known CVSA devices are the Diogenes Group's "Lantern" system and the Trustech "Vericator". CVSA devices have been adopted by some law enforcement agencies in an effort to use a technology that is less costly than polygraphs and has fewer detractors. Nonetheless, these devices do not seem to perform as well as polygraphs. The article Investigation and Evaluation of Voice Stress Analysis Technology (D. Haddad, S. Walter, R. Ratley and M. Smith, National Institute of Justice Final Report, Doc. #193832 (2002)) provides an evaluation of the two CVSA systems described above. The study cautions that even a slight degradation in recording quality can affect performance adversely. The experimental evidence presented indicates that the two CVSA products can successfully detect and measure stress, but it is unclear whether the stress is related to deception. Hence their reliability for deception detection is still unproven.

[0006] Current commercial systems for detection of deceptive language require an individual to undergo extensive specialized training. They require special audio equipment and their application is labor-intensive. Automated systems that can identify and interpret deception cues are not commercially available.

BRIEF SUMMARY OF THE INVENTION

[0007] Motivated by the need for a testable and reliable method of identifying deceptive language, the present method detects deception by computer analysis of freely generated text. The method accepts transcribed or written statements and produces an analysis in which portions of the text are marked as highly likely to be deceptive or highly likely to be truthful. It provides for an automated system that can be used without special training or knowledge of linguistics.

[0008] A system for identifying deception within a text according to the present invention includes a processor for receiving and processing a text file, wherein the processor has a deception indicator tag analyzer for inserting into the text file deception indicator tags that identify potentially deceptive words and/or phrases within the text file. The processor also includes an interpreter for interpreting the deception indicator tags to determine a distribution of potentially deceptive words or phrases within the text file. The interpreter also generates deception likelihood data based upon the distribution of potentially deceptive words or phrases within the text file. The system may further include a display for displaying the deception likelihood data. The processor may further include a receiver for receiving a first text to be analyzed, a component for normalizing the first text to produce a normalized text, a component for inserting into the normalized text part-of-speech tags that identify parts of speech of the words associated with the part-of-speech tags, and a component for inserting into the normalized text syntactic labels that identify linguistic constructions of one or more words associated with each syntactic label. The normalized text including the part-of-speech tag(s) and the syntactic label(s) is provided to the deception indicator tag analyzer.
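
The tag analyzer and interpreter described in this paragraph can be pictured with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the cue list, the tag names QUALIFIER and HEDGE, the angle-bracket tag syntax, and the function names do not come from the patent text, which does not enumerate its indicators at this point. The sketch inserts a tag around each cue and then reports tag density as tags per word.

```python
import re

# Hypothetical cue list for illustration only; not taken from the patent.
CUES = {"honestly": "QUALIFIER", "maybe": "HEDGE", "i think": "HEDGE"}

# Longest cues first so multi-word cues win over any shorter overlap.
_CUE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in sorted(CUES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)

def insert_deception_tags(text):
    """Wrap each cue word or phrase in an inline tag such as <HEDGE>...</HEDGE>."""
    def wrap(match):
        name = CUES[match.group(1).lower()]
        return f"<{name}>{match.group(1)}</{name}>"
    return _CUE_PATTERN.sub(wrap, text)

def tag_density(tagged_text):
    """Return the number of opening tags divided by the number of words."""
    n_tags = len(re.findall(r"<[A-Z]+>", tagged_text))
    plain = re.sub(r"</?[A-Z]+>", "", tagged_text)  # strip tags before counting words
    n_words = len(plain.split())
    return n_tags / n_words if n_words else 0.0

tagged = insert_deception_tags("Honestly, I think he left.")
density = tag_density(tagged)
```

A single density figure for the whole file is the crudest possible reading of "distribution"; it says nothing about where in the text the tags cluster.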

[0009] In one embodiment of the system according to the present invention, the deception indicator tag analyzer inserts the deception indicator tag into the normalized text based upon words or phrases in the normalized text, part-of-speech tags inserted into the normalized text, and syntactic labels inserted in the normalized text. The deception indicator tags may be associated with a defined word or phrase or associated with a defined word or phrase when used in a defined linguistic context. Also, the interpreter may calculate a proximity metric for each word or phrase in the text file based upon the proximity of the word or phrase to a deception indicator tag such that the proximity metric is used to generate the deception likelihood data. The interpreter may also calculate a moving average metric for each word or phrase in the text file based upon the proximity metric of the word or phrase such that the moving average metric is used to generate the deception likelihood data. The calculation of the moving average metric for each word or phrase in the text file may be adjusted by a user of the system to alter the deception likelihood data as desired by the user.
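
The paragraph above names a proximity metric and a moving average metric but does not give formulas for either. The following Python sketch is one possible reading under stated assumptions: the proximity score of a word is taken to be 1 / (1 + distance to the nearest tagged word), and the moving average is a centered window whose width stands in for the user-adjustable setting. Both formulas and both function names are illustrative and are not the patent's definitions.

```python
def proximity_scores(n_words, tag_positions):
    """Score each word position by closeness to the nearest tagged word.

    Assumed formula: 1 / (1 + distance), so a tagged word scores 1.0.
    """
    if not tag_positions:
        return [0.0] * n_words
    return [
        1.0 / (1 + min(abs(i - t) for t in tag_positions))
        for i in range(n_words)
    ]

def moving_average(scores, window=3):
    """Centered moving average; the window is truncated at the edges of the text."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Five words with a single deception tag on the third word (index 2).
raw = proximity_scores(5, [2])
smooth = moving_average(raw, window=3)
```

Widening the window spreads the influence of each tag over more neighboring words, which is one way a user could make the resulting likelihood data more or less sensitive to isolated cues.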

[0010] A method for identifying deception within a text in accordance with the present invention includes the steps of: receiving a first text to be analyzed; normalizing the first text to produce a normalized text; inserting into the normalized text at least one part-of-speech tag that identifies a part of speech of the word associated with each part-of-speech tag; inserting into the normalized text at least one syntactic label that identifies a linguistic construction of one or more words associated with the syntactic label; inserting into the normalized text at least one deception indicator tag that identifies a potentially deceptive word or phrase within the normalized text; interpreting the at least one deception indicator tag to determine a distribution of potentially deceptive words or phrases within the normalized text; and generating deception likelihood data based upon the distribution of potentially deceptive words or phrases within the normalized text.
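
The order of the steps listed above can be sketched as a chain of small Python functions. This is a toy under loud assumptions: the lexicon stands in for a real part-of-speech tagger, the single NEG_GROUP construction stands in for real syntactic labeling, and the two deception rules are invented for illustration. The syntactic labels are recorded but not consulted by these toy rules. None of the names or rules come from the patent.

```python
import re
from dataclasses import dataclass, field

# Toy lexicon standing in for a real part-of-speech tagger (assumption).
POS_LEXICON = {"i": "PRON", "he": "PRON", "think": "VERB", "left": "VERB",
               "did": "AUX", "not": "NEG", "see": "VERB", "maybe": "ADV"}

@dataclass
class Token:
    word: str
    pos: str = "UNK"
    labels: list = field(default_factory=list)  # syntactic labels and deception tags

def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s']", " ", text.lower())).strip()

def tag_pos(normalized):
    """Attach a part-of-speech tag to every word."""
    return [Token(w, POS_LEXICON.get(w, "UNK")) for w in normalized.split()]

def label_syntax(tokens):
    """Label one construction: AUX followed by NEG becomes a negated verb group."""
    for a, b in zip(tokens, tokens[1:]):
        if a.pos == "AUX" and b.pos == "NEG":
            a.labels.append("NEG_GROUP")
            b.labels.append("NEG_GROUP")
    return tokens

def tag_deception(tokens):
    """Hypothetical rules: 'maybe' always counts; 'think' only right after a pronoun."""
    for prev, tok in zip([None] + tokens, tokens):
        if tok.word == "maybe":
            tok.labels.append("DECEPTION:HEDGE")
        elif tok.word == "think" and prev is not None and prev.pos == "PRON":
            tok.labels.append("DECEPTION:HEDGE")
    return tokens

def interpret(tokens):
    """Return positions of deception-tagged words and their overall density."""
    positions = [i for i, t in enumerate(tokens)
                 if any(label.startswith("DECEPTION:") for label in t.labels)]
    density = len(positions) / len(tokens) if tokens else 0.0
    return positions, density

def analyze(text):
    """Run the steps in the order the method lists them."""
    return interpret(tag_deception(label_syntax(tag_pos(normalize(text)))))

positions, density = analyze("I think he did not see. Maybe he left!")
```

The second rule shows a tag that depends on linguistic context rather than on the word alone: "think" is tagged only when the preceding word carries the PRON part-of-speech tag.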

[0011] While multiple embodiments are disclosed, still other embodiments of the present invention will become apparent to those skilled in the art from the following detailed description, which shows and describes illustrative embodiments of the invention. As will be realized, the invention is capable of modifications in various obvious aspects, all without departing from the spirit and scope of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not restrictive.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] FIG. 1 is a schematic diagram of the components of a system for one embodiment of the invention.

[0013] FIG. 2 is a flowchart showing the overall processing of text in one embodiment of the invention.

[0014] FIG. 3 is a diagram showing how text is marked for display after analysis for deception.

[0015] FIG. 4 is a diagram showing an alternative for how text is marked for display after analysis for deception.

DETAILED DESCRIPTION

I. Overview

[0016] A core notion of the method is that deceptive statements incorporate linguistic attributes that are different from those of non-deceptive statements. It is possible to represent these attributes formally as a method of linguistic analysis that can be verified by empirical tests.

[0017] The method begins with certain widely accepted techniques of corpus linguistics and automated text analysis. The deception detection component is based on a corpus of "real world" texts, for example, statements and depositions from court proceedings and law enforcement sources which contain propositions that can be verified by external evidence. Linguistic analysis is accomplished by a combination of statistical methods and formal linguistic rules. A novel user interface interprets results of the analysis in a fashion that can be understood by a user with no specialized training.

[0018] A method in accordance with the present invention is implemented as an automated system that incorporates the linguistic analysis along with a method of interpreting the analysis for the benefit of a system user. A typical system user may be a lawyer, a law-enforcement professional, an intelligence analyst or any other person who wishes to determine whether a statement, deposition or document is deceptive. Unlike polygraph tests and similar devices that measure physiological responses to Yes/No questions, the method applies to freely generated text and does not require specialized or intrusive equipment. Thus it can be used in a variety of situations where statements of several sentences are produced.

[0019] The system builds on formal descriptions developed for linguistic theory and on techniques for automated text analysis developed by computational linguists. The analysis advances the state of the art in natural language processing, because deception detection is a novel application of NLP. In addition, the system compensates for the fact that humans recognize deceptive language at a rate little better than chance.