Hollywood studios could soon begin using generative AI to provide moviegoers with something they may not even have realized they wanted: perfect-looking lip-syncing in dubbed movies and TV shows!
This isn't a joke; Variety has an article on this:
Lip-sync dubbing for content localization is a compelling early use case of generative AI that is gaining traction now among studios.
In contrast to AI dubbing, which handles the creation of speech tracks, localizing content with lip-syncing applies only to the visuals — specifically using deep learning models to realistically alter screen actors’ mouth and facial movements to synchronize with the dubbed audio.
While synthetic voices could be used for the dub, so far companies have tended to use human actor voice performances, in collaboration with localization networks. Capabilities can even extend beyond faces to visually translate text depicted in scenes, such as a street sign or TV news ticker.
The promise of lip-sync dubbing for premium content is a more immersive experience and more global hits. Because lip-sync dubbing has never previously been achievable with VFX, it represents a new cost, though one studios are betting will pay off by growing and more deeply engaging audiences.
“The conversation changed completely when people saw it in action and realized they were watching a film that just felt like an English-language film, but actually it wasn’t. The difference did shock us all how much better it was,” said Scott Mann, co-CEO at generative AI firm Flawless.
“The more exciting [uses of this tech] are doing things that VFX could never do,” said Matt Panousis, COO at Monsters Aliens Robots Zombies (MARZ), the Toronto VFX firm that provides the lip-sync dubbing product LipDub AI. “It’s more like 0-to-1 innovation versus 1-to-1 or 1-to-2 innovation. This has just never been doable before.”
Several major Hollywood studios are currently beta testing lip-sync dubbed versions of film and TV shows to ultimately decide whether they lift audience reach and engagement enough to be worth the investment longer term.
For its part, Flawless is focused exclusively on major U.S. studios, starting with the biggest five before scaling up, and is providing each with a capacity of 1,000 hours over the next 12 months. Armed with learnings on where the technology makes the biggest impact, the company can increase that capacity to 10,000 hours next year.
Studios involved are devising respective strategies for these market tests, deciding on the type of content — genre, shows or movies, new or catalog, factual or fictional — and specific languages to prioritize. Another question is whether to localize English-language content for international markets or the reverse, meaning English-speaking U.S. audiences might be able to experience a fully dubbed and lip-synced foreign-language film.
Because of its standards for maximum realism, Hollywood content is the most challenging to serve relative to less premium content such as creator videos. MARZ is going broader with its product LipDub before “opening the floodgates” to Hollywood.
But since its beta launch in January, LipDub now has 80 clients on the platform, including some of the biggest creators on YouTube as well as a number of Fortune 1000 consumer brands and their advertising agencies. For example, one brand used the tech to localize an ad that featured footage of a major celebrity speaking at a live event.
Ultimately, lip-sync dubbing will need to prove out among audiences. Some audiences may be more interested, but there’s also likely to be a portion of “purists” not as receptive to a foreign-language film or show altered in this way.
Likewise, some types of content may be more accepted with alteration than others. For instance, documentary content may not be as accepted, as it arguably changes common understanding of real-world people or events as they occurred. Use in factual content would also likely need disclosure. Studios may not even always want to disclose when the technology has been used for the effect.
Furthermore, studios will need to secure consent from actors for their faces to be altered with these tools, a step that hasn’t previously been required in dubbing workflows.