Georgetown Law Technology Review

Michael Mazzella

Legal Impressions

Voice Mimicking Technology Makes Fabricating Evidence Easy

October 2017

Since 1906, voice recordings have been used in the courtroom to convey everything from custodial confessions and depositions to undercover sting operations and legal wills. These recordings accurately displayed the intent and character of the subject, whether or not she was aware that her words were being preserved.1 In most jurisdictions, these preserved conversations were taken at face value, provided that a witness could authenticate the recording and the voices were readily and clearly identifiable by a custodian familiar with the subjects.2 However, the advent of a new technology that uses an artificial intelligence neural network to mimic a human voice to a nearly undetectable degree stands to undo more than a hundred years of evidentiary jurisprudence.

Lyrebird is a software program developed by Canadian technology students at the University of Montreal.3 The software uses real recordings of an individual’s voice, analyzes the subject’s biometric vocal profile, and uses this catalog of sounds and tones to reconstruct eerily realistic forgeries of the individual’s own voice.4 Whereas other vocal artificial intelligence programs like Apple’s Siri or Amazon’s Alexa rely upon a database of prerecorded individual words which are then organized and repeated to create sentences, Lyrebird needs only portions of words in order to anticipate and recreate entirely new, nearly organic words, phrases, and sentences that seemingly match the unique voice of the host subject.5

What sets Lyrebird apart and allows the software to accomplish such a feat is its constantly improving neural network; the more input it receives from a variety of users, the more Lyrebird learns to sound authentically human.6 Even more stunning is how little input the system truly needs to accomplish its objective. The ease and accuracy of this new tool demonstrate both the drastic advancements in technology that are available to us through integrated artificial intelligence and the host of ethical and practical concerns that inevitably follow from pushing the boundaries of scientific achievement.

A plethora of ethical dilemmas arise from the ability to convincingly copy another’s voice. Everything from biometric security systems to the fight against “fake news” will be compromised when Lyrebird inevitably falls into nefarious hands.7 However, the most immediate concern for the modern justice system must be the possibility of fabricated vocal evidence, its ability to strongly influence a lay judge or jury, and the lack of available methods to prove its contrived nature. While the infant technology has not yet advanced to the level of being able to trick the average listener,8 its neural network ensures exponential improvements, advancements which will surely outpace achievements in audio authentication methods, leaving our courts and the public at large vulnerable to mass deception.

Lyrebird would allow a witness to easily fabricate a telephone conversation between herself and the defendant or opposing party. The only tools required to pull off such a plot would be Lyrebird technology and a sixty-second sample recording of a voice, easily lifted from a social media video, answering machine, or even from data hacked from other technology trained to learn a voice, such as Amazon Echo.9

While voice-mimicking technology continues to become increasingly realistic, methods of authenticating vocal evidence in court are at a standstill.10 Most outdated systems of authentication involve nothing more than an investigator listening to a recording with the naked ear and comparing that sample to the live voice of the suspect.11 Unfortunately, the current standard for introducing such evidence in many domestic courts is precisely the same. The Seventh Circuit Court of Appeals in U.S. v. Mendiola held that the legal standard for a vocal recording to meet the authentication requirements of Federal Rule of Evidence 901(b) is merely the “aural voice identification” of a lay witness familiar with the subject’s voice.12 With such a low bar to satisfy authentication requirements, and the highly convincing nature of voice-mimicking technology, particularly to judges and jurors unaware of its existence, the prevalence of Lyrebird-doctored evidence in a court of law could conceivably go completely undetected and bear grim consequences for all. As Harvard University’s lecturer on security technology, Bruce Schneier, puts it, the possibility of encountering fraudulent audio clips is our “new reality.”13 (Gholipour, supra note 4.)