17 September 2018

Automatically Transforming Deep Fakes in Videos

Researchers at Carnegie Mellon University have devised a way to automatically transform the content of one video into the style of another, making it possible to transfer the facial expressions of comedian John Oliver to those of a cartoon character, or to make a daffodil bloom in much the same way a hibiscus would. Because the data-driven method does not require human intervention, it can rapidly transform large amounts of video, making it a boon to movie production. It can also be used to convert black-and-white films to color and to create content for virtual reality experiences.

The technology also has the potential to be used for so-called deep fakes, videos in which a person's image is inserted without permission, making it appear that the person has done or said things that are out of character.

Transferring content from one video to the style of another relies on artificial intelligence. In particular, a class of algorithms called generative adversarial networks (GANs) has made it easier for computers to understand how to apply the style of one image to another, even when the two images have not been carefully matched.

In a GAN, two models are created: a discriminator that learns to detect what is consistent with the style of one image or video, and a generator that learns how to create images or videos that match a certain style. When the two work competitively, with the generator trying to trick the discriminator and the discriminator scoring the generator's output, the system eventually learns how content can be transformed into a certain style.

A variant, called cycle-GAN, completes the loop, much like translating English speech into Spanish, then the Spanish back into English, and finally evaluating whether the twice-translated speech still makes sense. Using cycle-GAN to analyze the spatial characteristics of images has proven effective in transforming one image into the style of another. That spatial method still leaves something to be desired for video, with unwanted artifacts and imperfections cropping up in the full cycle of translations. To mitigate the problem, the researchers developed a technique, called Recycle-GAN, that incorporates not only spatial but also temporal information, as the sketches below illustrate.
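To make the competition between the two models concrete, here is a minimal GAN training step sketched in PyTorch. The tiny fully connected networks, the 64-dimensional noise input, and the learning rates are illustrative assumptions for this sketch, not the networks the researchers used, but the adversarial loop has the same shape.

```python
import torch
import torch.nn as nn

# Placeholder networks: real systems use convolutional models, but the
# adversarial training loop below is the same either way.
generator = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784))
discriminator = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1))

adv_loss = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real):
    batch = real.size(0)
    fake = generator(torch.randn(batch, 64))

    # Discriminator: learn to score real samples as 1 and fakes as 0.
    d_opt.zero_grad()
    d_loss = (adv_loss(discriminator(real), torch.ones(batch, 1)) +
              adv_loss(discriminator(fake.detach()), torch.zeros(batch, 1)))
    d_loss.backward()
    d_opt.step()

    # Generator: learn to make the discriminator score fakes as real.
    g_opt.zero_grad()
    g_loss = adv_loss(discriminator(fake), torch.ones(batch, 1))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```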
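The cycle and recycle objectives can be sketched the same way. The code below assumes placeholder translation networks G (mapping domain X to domain Y) and F (mapping Y back to X), plus a hypothetical temporal predictor P_y that guesses the next frame in Y from the two previous ones; it follows the loss structure described for Recycle-GAN, not the authors' actual implementation, and the choice of an L1 penalty is an assumption of the sketch.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_loss(G, F, x):
    # Round-trip one frame X -> Y -> X and penalize any drift from the
    # original, like translating English to Spanish and back and checking
    # that the result still makes sense. This term judges each frame in
    # isolation, which is why artifacts can slip in between frames.
    return l1(F(G(x)), x)

def recycle_loss(G, F, P_y, x_prev, x_curr, x_next):
    # Translate two consecutive frames into the target style, predict the
    # next styled frame from them, then translate that prediction back and
    # compare it with the true next frame. The round trip now has to
    # respect how the video evolves over time as well as how it looks.
    y_prev, y_curr = G(x_prev), G(x_curr)
    y_next = P_y(torch.cat([y_prev, y_curr], dim=1))  # stack along channels
    return l1(F(y_next), x_next)
```

During training these terms are combined with the adversarial loss from the previous sketch, so the generators must produce frames that both look right in the target style and stay consistent from one frame to the next.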
