In recent years, huge leaps in the power of video and imaging software have created an era in which it’s difficult to fully trust any image or video, especially one that shows something extraordinary, from UFO hoaxes through to snowboarders being chased by bears.
If viral marketers can so easily get our attention and suspend our disbelief simply to make money, you can bet political opportunists and intelligence agencies are looking to use the same technology for propaganda. A new video (embedded above) gives us a taste of where things are heading as the software continues to improve.
The video above shows how University of Washington researchers used footage of former president Barack Obama to map sounds to mouth shapes, and then composited those mouth movements onto a target video so that he appears to speak the words of a separate audio track:
Given audio of President Barack Obama, we synthesize a high quality video of him speaking with accurate lip sync, composited into a target video clip. Trained on many hours of his weekly address footage, a recurrent neural network learns the mapping from raw audio features to mouth shapes. Given the mouth shape at each time instant, we synthesize high quality mouth texture, and composite it with proper 3D pose matching to change what he appears to be saying in a target video to match the input audio track.
With lies now traveling halfway around the world on social media before the truth has its pants on, we can only imagine the political damage a hoaxed clip of a political opponent could cause. Perhaps less recognized, but equally important (especially in a year of certain politicians continually crying ‘fake news!’), is that this technique also makes it easier to deny real, damaging video simply by claiming ‘the deep state faked it to take me down’…
You can find more information about the topic at the researchers’ project website, “Synthesizing Obama: Learning Lip Sync from Audio.”