A scary reality dawned on me today while I was listening to a Radiolab podcast.
If you haven't heard of Radiolab before, I recommend checking out this New York-based radio show. Every week or two they publish a podcast covering a scientific or philosophical topic. Today's episode was a combination of both, as the Radiolab team investigated and discussed recent technological breakthroughs in sound and video editing. What they presented during the show changed my view of media forever.
Most of you are probably aware that sound editing can be done quite easily. All you need is a recording of someone speaking the sentence you want to edit, plus the word(s) you want to splice in, taken from the same or any other recording.
Let's start with a simple example. Say someone is recorded saying "I voted Hillary Clinton" and, a bit further down the line, that same person says "Donald Trump is a buffoon". It is already quite easy to edit this and end up with the sentence "I voted Donald Trump".
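For the technically curious, here is a rough sketch of that cut-and-paste idea in Python. It is only a toy under stated assumptions: the "recording" is a list of fake sample numbers, each word is assumed to occupy a known range of samples, and the names `splice_words`, `spans` and `SAMPLES_PER_WORD` are made up for illustration. Real audio editors work on actual waveforms and are far more involved.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) sample indices of one word (hypothetical layout)


def splice_words(samples: List[int], spans: Dict[str, Span], wanted: List[str]) -> List[int]:
    """Build a new 'recording' by concatenating the samples of the wanted words, in order."""
    out: List[int] = []
    for word in wanted:
        if word not in spans:
            # Classic splicing can only reuse words that exist in the recording.
            raise KeyError(f"'{word}' was never recorded, so it cannot be spliced in")
        start, end = spans[word]
        out.extend(samples[start:end])
    return out


# Toy data: every word is assumed to take exactly 4 samples.
words = "I voted Hillary Clinton Donald Trump is a buffoon".split()
SAMPLES_PER_WORD = 4
samples = list(range(len(words) * SAMPLES_PER_WORD))
spans = {
    word: (i * SAMPLES_PER_WORD, (i + 1) * SAMPLES_PER_WORD)
    for i, word in enumerate(words)
}

# "I voted Donald Trump" is built purely from words that were actually spoken.
edited = splice_words(samples, spans, ["I", "voted", "Donald", "Trump"])
```

Asking the same function for a word that was never spoken, such as "never", raises a `KeyError`, because there is nothing in the recording to cut and paste.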
The 'problem' with this well-known technique is that it is limited to the words that exist in the recording. This is where the first piece of new technology comes in.
With tools like Voco it is now possible to insert or replace words that were never recorded, while the tone of the 'new' word is "synthesized such that it fits the narration." In other words, the new word comes out in the same voice and with a tone that fits its position in the sentence. All of this can be done in a simple text editor where you specify the words you want to cut and type the ones you want to paste. And voila, it all flows naturally like a genuine recording. To stay with the example above, it could be doctored to "I never liked Hillary Clinton" even though the words 'never' and 'liked' do not appear in any recording.
So what's the big deal?
Well, think about the implications here.
- It means that recordings are going to be even harder to trust. It also means people can be accused (with 'evidence') of saying things they never said.
- It will give people plausible deniability that they ever said the thing they were recorded saying.
To continue with examples from our beloved politicians, I am sure you remember the comment Donald Trump made about women in 2005, which surfaced during his campaign:
“They let you do it. You can do anything. Grab ’em by the pussy.”
The full clip can be found via the Vox link in the references, if you are interested. The point is that this is exactly the kind of recording that could now be created without Donald Trump ever having spoken any of those words. Ever! And vice versa: I am sure that if Donald Trump had known how far this technology has come, he would have characterized the recording as 'Fake News', as he has done quite a few times already during the 2016 elections and his current presidency.
The implications are scary. The damage that can be done to someone by fabricating a recording like that is huge. Especially for politicians and other celebrities, a recording can cause irreparable harm even if the victim is eventually able to prove it was a fake.
The same thing is happening with still images. Because of the advanced tools forgers now have, it is getting harder and harder, even for experts, to prove whether a picture is fake. Proving that something is fake takes considerable time and effort, and it is very expensive. But it doesn't stop there either, as we now reach the point where it becomes really scary (or fun, depending on how it is used).
Synthesizing is now possible for video and audio combined. Any video of someone making a speech or giving an interview can be reworked to give a completely different message. None of the words need to have been spoken and none of the expressions need to have been shown beforehand either. All that is needed is 20-40 minutes of video of someone, and that person can be made to appear, very realistically, to say things he or she never said.
Various firms use several techniques to do this. One of them is lip syncing, where a program makes the lips move the way they should when new words are inserted. However, the most impressive one I found was 'facial reenactment'. In this process the face is mapped onto a square grid of 250 by 250 points, resulting in over 60,000 points (62,500 to be exact) that can be changed to produce the desired expressions. All that is needed is this particular program and a simple webcam. Once we are set up, we can easily change the facial expressions of someone on video, live, as they talk. A great example is shown in the face to face video.
In the video in question we see George W Bush (GWB) speaking publicly in front of a camera. His video is shown on monitor 1. Separately we have another person (it can be anyone, but let's call him Ben) in front of another camera, making faces of his choosing. His video is shown on monitor 2. On monitor 3 we see GWB speaking as he does on monitor 1, but moving his face exactly as Ben does. All of a sudden GWB is a real-time puppet of Ben! And the video stays believable unless Ben overdoes it.
Combined, these technologies can deliver great things. A good use case mentioned on the podcast: a famous Hollywood actress who doesn't speak a word of Mandarin could address a Chinese audience fluently in Mandarin, just by combining prerecorded data of the actress with a native speaker. Once the data is combined, the result shows an actress reaching a Chinese audience in a way never done before: in their native language.
However, it also shows that we need to become much more careful about which audio and video recordings we trust. Obviously this was already the case in some ways, but the technology has taken a leap forward, and it is a lot easier to get fooled now.
I don't know about you, but I will not listen to or watch recordings the same as I did yesterday.
Will you?
Stay tuned!
References:
http://futureoffakenews.com/videos.html
https://en.wikipedia.org/wiki/Radiolab
https://www.vox.com/2016/10/7/13205842/trump-secret-recording-women
Hi Attalis, very interesting and disturbing post. It is still noticeable in these fake videos that they are manufactured, but who knows how real they will look in a year or two. I gotta check out these Radiolab podcasts as well. Thank you for the tip! :)
More than welcome! Yeah, agreed, the technology can still be improved. Do check out Radiolab. They have a lot of interesting topics in their podcasts.
Editing is quite amazing. Reminds me of Hollywood, the City of Magic. Vote up.