The Voice Clone Debate

Likelihood of AI in pretexting?

The critical impact of artificial intelligence.

Our social engineering team just concluded a healthy debate.

As we enjoy the benefits of artificial intelligence that can be found in our endpoint detection and response offerings, we’re also trying to get our arms around its risks. From the security side, we’re always reminding ourselves that the greatest risk is a false sense of security. In the case of EDR, this translates as: artificial intelligence can make mistakes too.

But from the threat side, we end up debating the likelihood of voice cloning affecting banks negatively.

It’s extremely easy to clone a voice. A clone of my voice now occasionally shows up to internal infotex meetings, just for practice, so we can keep ourselves on guard against the threat.

(OK, the real reason we’re doing it is because it’s a LOT of fun. And it’s not THAT scary that my wife now knows how to clone my voice!)

But allow my team to clone my voice? (I’ve been asked, incredulously.)

Yes. Seriously, bad actors can clone my voice. I would rather my voice clone be used by people friendly to me, to teach those I love how to protect themselves against a clone of my voice.

Seriously. Bad actors can clone my voice. Multiply the number of movies and webinars on movies.infotex.com by the average length of those movies and webinars, and we estimate there are more than one hundred hours of my voice available to anybody wanting to sound sophisticated, wise, and . . . . . oh, dumb joke.

But how about your voice? Or your CEO’s voice? Your CEO, who met with that potential vendor in a Zoom meeting?

Eyebrows raised?

I raised my eyebrows when I overheard a talking head on TV put forth a “20 second rule,” where all you need is 20 seconds of a voice to clone it. While we think that’s a bit aggressive, who knows? The talking head worked for the federal government. Maybe he was sitting on a computer far faster than the one I use.

(Like every nation state.)

James Bond may be able to afford the technology, or have the appropriate license, to clone retinas and fingerprints, but he’d be more effective pretexting a voice.

Because, in our experience, it takes two minutes of my voice. (It could very well be that the demo version of the cloud site we were using required better Internet connectivity than we had.)

Someday it WILL be possible to clone a voice based on 20 seconds of audio. And while your management team may not have hours of movies on the Internet, they do have 30 second voicemail greetings.

Right now, two minutes of my voice is all we need to create a legitimate clone. We’re standing up policies and procedures to protect ourselves against this threat. We’re still working through things, but look for a future article carrying boilerplate language for policies and procedures and . . .

. . . awareness training.

It’s. Extremely. Easy.

But ease of attack is only one aspect of likelihood. Impact affects motivation, so let’s go there next. Hopefully you’ll wait until you’ve finished reading my article, but you definitely need to check out this situation.

Voice cloning not only threatens our management team from a social engineering perspective, but it threatens our authentication systems. How many systems treat voice recognition (who you are) and account number (what you know) as two-factor authentication? They need to realize one of those factors is now easily compromised.
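The two-factor concern above can be sketched as a simple policy check. This is a minimal illustration in Python, not any real banking system; the factor names and categories are hypothetical. The point is that a voice print plus an account number no longer count as two independent, trustworthy factors once voice is cheaply cloneable:

```python
# Hypothetical factor categories; names are illustrative, not from any real system.
SOMETHING_YOU_ARE = {"voice_print", "fingerprint"}
SOMETHING_YOU_KNOW = {"account_number", "passphrase", "out_of_wallet_answer"}
SOMETHING_YOU_HAVE = {"hardware_token", "registered_phone_callback"}

# Factors we now treat as easily spoofable: cloned voices, account numbers
# printed on every check.
COMPROMISED = {"voice_print", "account_number"}

def is_strong_mfa(presented_factors):
    """Require at least two distinct factor categories, after discarding
    anything we consider easily spoofable."""
    trusted = set(presented_factors) - COMPROMISED
    categories = [cat for cat in (SOMETHING_YOU_ARE, SOMETHING_YOU_KNOW,
                                  SOMETHING_YOU_HAVE) if trusted & cat]
    return len(categories) >= 2

# Voice print + account number looked like two-factor, but both are spoofable now.
print(is_strong_mfa({"voice_print", "account_number"}))            # False
# A passphrase plus a callback to a registered phone still qualifies.
print(is_strong_mfa({"passphrase", "registered_phone_callback"}))  # True
```

The design choice worth noticing: the check subtracts compromised factors *before* counting categories, so a cloned voice can never contribute to the two-factor requirement.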

It only takes two minutes.


Meanwhile, voice cloning is already being used to attack management teams.

The debate actually started when we proposed to update the Voice Authentication section of our Acceptable Use Policy Boilerplate, and our upcoming Artificial Intelligence Policy Guide. The new language was a section called the “voice code.”

But then we realized, though the voice code may apply in some situations, community-based banks already have the appropriate practices in place – Callbacks or Out of Wallet Questions. The key is awareness training, not making people remember a new passphrase.
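The callback control mentioned above boils down to one discipline: never act on the inbound call itself; hang up and dial a number already on file. A minimal sketch of that procedure in Python, with hypothetical account and phone-number data:

```python
# Hypothetical directory of callback numbers already on file; illustrative only.
NUMBERS_ON_FILE = {"acct-1001": "+1-555-0100"}

def callback_verify(account_id, caller_claimed_number):
    """Callback control: return the number ON FILE to dial back,
    or None when there is no number on file and we must escalate
    (e.g., to out-of-wallet questions or an in-branch visit)."""
    on_file = NUMBERS_ON_FILE.get(account_id)
    if on_file is None:
        return None
    # Deliberately ignore caller_claimed_number -- a cloned voice can
    # supply any "call me back at" number it likes.
    return on_file

print(callback_verify("acct-1001", "+1-555-9999"))  # +1-555-0100
print(callback_verify("acct-9999", "+1-555-9999"))  # None
```

Note that the number the caller offers is never used; that is the whole control, and it is why the training matters more than the code.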

But who likes out of wallet questions?

And what is the likelihood?

The pushback is legitimate. With all the check fraud out there, do we really want our clients to become concerned with a future threat?

The pushback includes that there is greater treasure elsewhere, that ransomware is still much easier and more lucrative, that most people don’t understand banking enough to deploy voice cloning, etc. And, due to the issues listed above and more, voice cloning is still too new, and it will take a while for the likelihood to go from low to high.

You can tell by my writing of this article that the only pushback I’m in league with is the last one listed above. And my main concern is how you define “a while.” And how long will it take us to stand up controls, which are usually awareness-related, and thus require habits and disciplines to be modified?

In summary, the good news about voice cloning is that if you are already guarding against pretext calling, and already have out-of-wallet, passphrase, and callback controls in place, you’re a good way there. But there’s still more work to do. You still need to warn your employees of the new threat – your co-worker’s voice. And you need to double down on out-of-wallet questions training. And you might need to address voice recognition.

If . . . . you think there is a likelihood.

But I am more paranoid than most. Let’s face it, we started our company in the year 2000!

So what is the threat to you? That’s the important question. If you take this poll, we will publish the results in a future article.

Original article by Dan Hadaway, CRISC, CISA, CISM. Founder and Information Architect, infotex


Dan’s New Leaf – a fun blog to inspire thought about IT Governance.


2 Responses

  1. I really hadn’t thought about this but after reading the article, it scares me. I believe it’s not an if, but when.

    1. Agreed, but like any technology we need to apply controls, manage the risks, and reap the rewards!

      And the controls for voice cloning are very similar to normal telephone authentication controls anyway!

