Perfect lip-syncing for dubbed movies coming soon - thanks to AI!

Hollywood studios could soon begin using generative AI to provide moviegoers with something they may not even have realized they wanted: perfect-looking lip-syncing in dubbed movies and TV shows!

This isn't a joke, Variety has an article on this:


Lip-sync dubbing for content localization is a compelling early use case of generative AI that is gaining traction now among studios.

In contrast to AI dubbing, which handles the creation of speech tracks, localizing content with lip-syncing applies only to the visuals — specifically using deep learning models to realistically alter screen actors’ mouth and facial movements to synchronize with the dubbed audio.

While synthetic voices could be used for the dub, so far companies have tended to use human actor voice performances, in collaboration with localization networks. Capabilities can even extend beyond faces to visually translate text depicted in scenes, such as a street sign or TV news ticker.

The promise of lip-sync dubbing for premium content is a more immersive experience and more global hits. Because lip-sync dubbing has never previously been achievable with VFX, it represents a new cost of doing business, though one studios are betting will pay off by growing and more deeply engaging audiences.

“The conversation changed completely when people saw it in action and realized they were watching a film that just felt like an English-language film, but actually it wasn’t. The difference did shock us all how much better it was,” said Scott Mann, co-CEO at generative AI firm Flawless.

“The more exciting [uses of this tech] are doing things that VFX could never do,” said Matt Panousis, COO at Monsters Aliens Robots Zombies (MARZ), the Toronto VFX firm that provides the lip-sync dubbing product LipDub AI. “It’s more like 0-to-1 innovation versus 1-to-1 or 1-to-2 innovation. This has just never been doable before.”

Several major Hollywood studios are currently beta testing lip-sync dubbed versions of film and TV shows to ultimately decide whether they lift audience reach and engagement enough to be worth the investment longer term.

For its part, Flawless is focused more exclusively on major U.S. studios, starting with the biggest five before scaling up, and is providing each with a capacity of 1,000 hours over the next 12 months. Armed with learnings about where it makes the biggest impact, that capacity can increase to 10,000 hours next year.

Studios involved are devising their respective strategies for these market tests, deciding on the type of content — genre, shows or movies, new or catalog, factual or fictional — and specific languages to prioritize. Another question is whether to localize English-language content for international markets or the reverse, meaning English-speaking U.S. audiences might be able to experience a fully dubbed and lip-synced foreign-language film.

Because of its standards for maximum realism, Hollywood content is the most challenging to serve relative to less premium content such as creator videos. MARZ is going broader with its product LipDub before “opening the floodgates” to Hollywood.

But since its beta launch in January, LipDub now has 80 clients on the platform, including some of the biggest creators on YouTube as well as a number of Fortune 1000 consumer brands and their advertising agencies. For example, one brand used the tech to localize an ad that featured footage of a major celebrity speaking at a live event.

Ultimately, lip-sync dubbing will need to prove out among audiences. Some audiences may be more interested, but there’s also likely to be a portion of “purists” not as receptive to a foreign-language film or show altered in this way.

Likewise, some types of content may be more accepted with alteration than others. For instance, documentary content may not be as accepted, as it arguably changes common understanding of real-world people or events as they occurred. Use in factual content would also likely need disclosure. Studios may not even always want to disclose when the technology has been used for the effect.

Furthermore, studios will need to secure consent from actors for their faces to be altered with these tools, a step that hasn’t previously been required in dubbing workflows.
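
To make the article's description a little more concrete, here is a rough sketch (in Python) of how a pipeline like this might be structured. To be clear, this is just my own guess based on what the article describes, i.e. deep-learning models that re-render mouth and facial movement to follow a new audio track; every class and function name below is a hypothetical placeholder, not anyone's actual product.

from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    index: int
    pixels: bytes            # stand-in for real image data

@dataclass
class MouthPose:
    landmarks: List[float]   # predicted lip/jaw positions for one frame

class AudioToMouthModel:
    """Hypothetical model: maps dubbed-audio features to per-frame mouth poses."""
    def predict(self, audio_features: List[float], n_frames: int) -> List[MouthPose]:
        # Placeholder: a real system would run a trained network here.
        return [MouthPose(landmarks=[0.0] * 20) for _ in range(n_frames)]

class FaceRenderer:
    """Hypothetical model: repaints the mouth region of a frame to match a pose."""
    def render(self, frame: Frame, pose: MouthPose) -> Frame:
        # Placeholder: a real system would do generative inpainting here.
        return Frame(index=frame.index, pixels=frame.pixels)

def lip_sync_dub(frames: List[Frame], dubbed_audio_features: List[float]) -> List[Frame]:
    # Only the visuals are altered; the dubbed voice track itself is untouched,
    # which is how the article distinguishes this from AI voice dubbing.
    poses = AudioToMouthModel().predict(dubbed_audio_features, n_frames=len(frames))
    renderer = FaceRenderer()
    return [renderer.render(frame, pose) for frame, pose in zip(frames, poses)]

The main point of the sketch is that the dubbed audio is an input rather than an output: nothing here generates a voice, it only re-animates the face to fit the track that human voice actors already recorded.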



I never watch anything dubbed.
__________________
I’m here only on Mondays, Wednesdays & Fridays. That’s why I’m here now.



+1. The problem isn't the horrible non-matching mouth movements, but the horrible replaced voices and emotions behind them. It's like the people responsible have gone out of their way to find the worst possible actors to do this.

Nothing can fix this. Badham (Scout from TKAMB), for example, was replaced altogether due to her famously heavy accent, and it's so jarring that it's hard to make sense of what's happening with the character. I thought I had lost my mind when I first heard this episode, but it turns out heavy background noise necessitated the re-recording of nearly all the exterior dialogue.

I don’t watch anything dubbed because it ruins the enjoyment of watching a foreign movie. Dubbing a movie from, say, Ingmar Bergman, would take away the reality of watching a Swedish movie & hearing the sounds of that country.



They managed to find a way to make dubbing even more annoying!
If the dubbed version has perfect lip-syncing, 99% of people probably won't notice it's dubbed.



Trouble with a capital "T"
I don’t watch anything dubbed because it ruins the enjoyment of watching a foreign movie. Dubbing a movie from, say, Ingmar Bergman, would take away the reality of watching a Swedish movie & hearing the sounds of that country.
I agree with you. But if you've ever seen an Italian film with American actors, you'll know they are often dubbed by Italian voice actors.

If the dubbed version has perfect lip-syncing, 99% of people probably won't notice it's dubbed.
That's true. There are a lot of American movies with Russian voice-overs that Russians apparently prefer over subtitles. So perfect dubbing in one's own language will become a thing thanks to AI.



Perhaps, but still. It's the principle of it for me!
That wasn't my point. I'm just saying, it's likely to become a common practice.



I don’t watch anything dubbed because it ruins the enjoyment of watching a foreign movie. Dubbing a movie from, say, Ingmar Bergman, would take away the reality of watching a Swedish movie & hearing the sounds of that country.

I typically prefer the original sub as well, for live-action at least. When it's an animated foreign film, I do whatever's convenient. For example, I've seen the sub of Grave of the Fireflies. I plan to watch the dub at some point, but I just can't bring myself to go through that ending again.


And of course, there are some rare anime I can only get a sub of because the dub was incomplete or maybe lost, such as Blue Dragon or F-Zero. In the latter case, the Japanese voice acting slaughters the short-lived 4Kids dub, although the fandub of the final episode rocks.


On that subject and to digress slightly, if anyone's interested in an F-Zero anime, first check out the HD remastered episodes on the Operation Spin Booster YT channel. The dude's slowly doing the whole show, but it's worth it. He's at 28 of 51.



Movie Forums Squirrel Jumper
It won't allow me to read the article since I am not subscribed, but I don't see how this is going to work, because if human beings can't dub it perfectly, how do you expect artificial intelligence to? Artificial intelligence can't even seem to write a good screenplay, so how is it supposed to speak perfectly over the mouths when it's a different language?



It won't allow me to read the article since I am not subscribed
The entire article is included in the OP

so how is it supposed to speak perfectly over the mouths when it's a different language?
It won't "speak", it will tweak the mouth movement to adapt to the dubbed lines.



Movie Forums Squirrel Jumper
The entire article is included in the OP



It won't "speak", it will tweak the mouth movement to adapt to the dubbed lines.
Oh I see. I don't like that idea. It feels like messing with the original product, like pan-and-scan, or adding new CGI to the old Star Wars movies.



Movie Forums Squirrel Jumper
Oh okay, but why is sync the least of the problems, since most people who hate dubbing seem to complain about the sync being off?




The least of the problems with dubbing is the mouth sync.

Yeah, that's my feeling as well.



Surely there are ways to make dubbed audio feel more naturalized into the sound design of the film. To me that mismatch is infinitely more distracting than lips being out of sync.


But I also don't care about any of this really. Give me a few minutes to adjust to the dubbing, no matter the quality, and I'm good to go.



Movie Forums Squirrel Jumper
Oh yeah, that makes sense. Well, I have a couple of friends who can't stand dubbing, so maybe this will open them up more to good foreign-language movies either way.



The entire article is included in the OP



It won't "speak", it will tweak the mouth movement to adapt to the dubbed lines.

Why not speed up/slow down the audio to match the pre-existing movement of the lips?
That would also satisfy the definition of "perfect" lip-syncing the article is going with. (It's a rhetorical question; the point is to highlight the problem with the definition the article is implicitly working with.)
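
For what it's worth, the "just stretch the audio" idea is easy to try with off-the-shelf tools, and hearing the result makes the problem obvious. Here's a minimal sketch using librosa (the file names are placeholders): even pitch-preserving time-stretching starts to sound artificial once the rate moves more than a few percent, and it still does nothing to line the syllables up with the on-screen mouth shapes.

import librosa
import soundfile as sf

# Load the dubbed line (sr=None keeps the original sample rate).
audio, sr = librosa.load("dubbed_line.wav", sr=None)

# rate > 1.0 shortens the clip, rate < 1.0 lengthens it.
# The stretch preserves pitch, but phase-vocoder artifacts become audible
# well before the timing would actually match the on-screen lips.
stretched = librosa.effects.time_stretch(audio, rate=1.15)

sf.write("dubbed_line_stretched.wav", stretched, sr)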



It should also be noted that this article has the feel of many tech pieces that are borderline ads/press releases put out by a company promising the moon. Large grains of salt and all.


I thought the initial title meant they were going to use AI to help fix ADR done in post (which at least seems to make sense and might be feasible, but might also run into something like the uncanny-valley issues of late-'90s computer animation and mo-cap).



Why not speed up/slow down the audio to match the pre-existing movement of the lips?
Because it would sound horrible and it still wouldn't match the movement of the lips. That's why.