Researchers at Carnegie Mellon
University have devised a way to automatically transform the content of one
video into the style of another, making it possible to transfer the facial
expressions of comedian John Oliver to a cartoon character, or to make
a daffodil bloom in much the same way a hibiscus would. Because the data-driven
method does not require human intervention, it can rapidly transform large
amounts of video, making it a boon to movie production. It can also be used to
convert black-and-white films to color and to create content for virtual reality
experiences. The technology also has the potential to be used for so-called deep
fakes, videos in which a person's image is inserted without permission, making
it appear that the person has done or said things that are out of character.

Transferring content from one video to the style of another relies on artificial
intelligence. In particular, a class of algorithms called generative
adversarial networks (GANs) has made it easier for computers to understand how
to apply the style of one image to another, even when the two images have not
been carefully matched.
In a GAN, two models are created:
a discriminator that learns to detect what is consistent with the style of one
image or video, and a generator that learns how to create images or videos that
match a certain style. The two are trained competitively: the generator tries
to trick the discriminator, while the discriminator scores how convincing the
generator's output is. Through this contest, the system eventually learns how
content can be transformed into a certain style. A variant, called cycle-GAN,
completes the loop, much
like translating English speech into Spanish and then the Spanish back into
English and then evaluating whether the twice-translated speech still makes
sense. Using cycle-GAN to analyze the spatial characteristics of images has
proven effective in transforming one image into the style of another. For
video, however, that purely spatial method leaves something to be desired:
unwanted artifacts and imperfections crop up over the full cycle of
translations. To mitigate the problem, the researchers developed a technique,
called Recycle-GAN, that incorporates not only spatial but also temporal
information.
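The two consistency ideas described above can be sketched in a few lines of code. The snippet below is a minimal illustration, not CMU's implementation: the "networks" G and F are toy linear maps, the next-frame predictor is a placeholder, and all names are hypothetical. It shows the cycle loss (translate X to Y and back, compare with the original) and the temporal "recycle" idea (translate a frame, predict the next frame in the target domain, translate back, compare with the true next frame).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the two mapping networks: G translates domain X -> Y,
# F translates Y -> X. Real systems learn deep convolutional generators;
# simple linear maps are enough to illustrate the losses.
G = rng.standard_normal((8, 8)) * 0.1 + np.eye(8)
F = np.linalg.inv(G)  # a perfect inverse here, so the cycle loss is ~0

def cycle_loss(x, G, F):
    """Cycle consistency: translating X -> Y -> X should recover x."""
    return float(np.mean((F @ (G @ x) - x) ** 2))

def recycle_loss(frames, G, F, predict_next):
    """Temporal ("recycle") consistency: translate frame t, predict the
    next frame in the target domain, translate back, and compare with
    the true next frame in the source domain."""
    total = 0.0
    for t in range(len(frames) - 1):
        y_t = G @ frames[t]          # translate frame t into domain Y
        y_next = predict_next(y_t)   # temporal prediction within Y
        x_next = F @ y_next          # translate the prediction back to X
        total += np.mean((x_next - frames[t + 1]) ** 2)
    return float(total / (len(frames) - 1))

# A toy "video": each frame drifts slightly from the previous one.
frames = [rng.standard_normal(8)]
for _ in range(4):
    frames.append(frames[-1] + 0.01 * rng.standard_normal(8))

print(cycle_loss(frames[0], G, F))              # near zero: F inverts G
print(recycle_loss(frames, G, F, lambda y: y))  # small: frames barely move
```

In training, both losses would be minimized alongside the adversarial losses, pushing the generators toward translations that are consistent both spatially (the cycle) and over time (the recycle).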