When it comes to film and the entertainment industry, few companies in Tinseltown can compare to Disney. Part of the reason for that success is that the company doesn't usually rely on actors, who can prove to be an added hitch in the process of film-making. But for the occasions when Disney does work with live-action footage, or adapts a hit film or television show from one of its many international divisions, the company has devised a way around the problem of a flubbed line or a poorly dubbed film.
Looking into taking the words out of an actor's mouth and replacing them with something better suited, researchers at Disney Research and the University of East Anglia have found that an approach based on the association between facial movements and speech sounds can produce alternative word sequences that better fit the shape of the lip movements. The concept may sound simple, a kind of Mad Libs that pairs common movements of the mouth with alternative sounds, but the study of "dynamic visemes" is one Disney has pursued for years, and the researchers now believe they have developed an audio profiling system that can more accurately match actors' performances when a film needs to be redubbed.
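To make the underlying idea concrete, here is a minimal, hypothetical sketch in Python of the viseme-to-phoneme ambiguity the researchers exploit. The viseme classes, phoneme labels, and toy lexicon below are invented for illustration and are not Disney's actual model; a brute-force enumeration stands in for the study's more sophisticated approach.

```python
from itertools import product

# Hypothetical viseme classes: each visually indistinguishable
# mouth shape maps to several possible phonemes. These mappings
# are invented for illustration, not taken from the study.
VISEME_TO_PHONEMES = {
    "bilabial": ["p", "b", "m"],   # lips pressed together
    "open_mid": ["ae", "eh"],      # open mouth, mid vowel
    "alveolar": ["t", "d", "n"],   # tongue against the ridge
}

# A toy pronunciation lexicon mapping phoneme strings to words.
LEXICON = {
    "p-ae-t": "pat", "p-ae-d": "pad", "p-ae-n": "pan",
    "p-eh-t": "pet", "b-ae-t": "bat", "b-ae-d": "bad",
    "b-ae-n": "ban", "b-eh-t": "bet", "b-eh-d": "bed",
    "m-ae-t": "mat", "m-ae-d": "mad", "m-eh-n": "men",
}

def alternative_words(viseme_sequence):
    """Enumerate every lexicon word consistent with the visemes."""
    candidates = [VISEME_TO_PHONEMES[v] for v in viseme_sequence]
    words = []
    for phonemes in product(*candidates):       # every phoneme combination
        word = LEXICON.get("-".join(phonemes))  # keep only real words
        if word is not None:
            words.append(word)
    return words

# One sequence of lip movements is compatible with a dozen words.
print(alternative_words(["bilabial", "open_mid", "alveolar"]))
# -> ['pat', 'pad', 'pan', 'pet', 'bat', 'bad', 'ban', 'bet', 'bed',
#     'mat', 'mad', 'men']
```

In a real redubbing system, the candidate word sequences would presumably then be ranked, for instance against a language model, to pick the most natural replacement. And where this sketch treats mouth shapes as static classes, the study's dynamic visemes model the motion between shapes, which is what makes its generated alternatives more plausible.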
"Dynamic visemes are a more accurate model of visual speech articulation than conventional visemes and can generate visually plausible phonetic sequences with far greater linguistic diversity" lead author of the study, Sarah Taylor says. "This work highlights the extreme level of ambiguity in visual-only speech recognition."
Lip readers traditionally find spoken language difficult to discern because of these general ambiguities between facial movements and speech, though they are often able to use context to unravel a silent conversation. In the new study, however, the Disney Research team turned that ambiguity to its advantage, building a model that replaces the sounds associated with a given facial movement to find alternative, better-fitting phrases.
The study, which will be presented this Thursday, April 23, at the IEEE International Conference on Acoustics, Speech and Signal Processing in Brisbane, Australia, lays out the intricate complexities of speech redubbing in the visual entertainment industry. And though the researchers may not be the editors behind the soundboards, they are opening up a world of possibilities, bringing international films further into the limelight.
"The method using dynamic visemes produces many more plausible alternative word sequences that are perceivably better than those produced using a static viseme approach."