HLS and Face Animation

JHinkle

New member
I've fallen in love with Halloween and Christmas animation - and I would like to share my current thinking on how HLS will support that activity.

As of last night, HLS can accept text vocals and produce the series of mouth positions required to animate speech (the same results many of you obtained by using Papagayo).

HLS will enable the building of a Word dictionary that will hold "words" and their associated series of mouth movements required to animate speech.

HLS will have several new channel types added to its capabilities which will greatly improve animation sequencing.

New channel types: "Word", "Mouth", and "Eye".

A Word channel will allow easy placement and sizing of a "word effect" on to a Word channel. The user will be able to select a word from the vocals and position/size it just like any other effect in HLS.

Once the vocal's words are placed, the user will inform HLS as to the number of mouth positions their specific display requires (some of the Halloween faces that can be purchased utilize 3, 5, or 7 different mouth positions). The user will create a translation map, unique to that Word channel, that tells HLS how to go from the 10 mouth positions derived from phonic analysis of the vocal's words to those required for their display. Assume we have a Halloween face that utilizes 5 unique mouth formations.
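As a rough illustration of what such a translation map might hold, here is a Python sketch. The viseme names and groupings below are my own assumptions for illustration - not HLS's actual table:

```python
# Hypothetical map: 10 analysis mouth positions -> 5 display positions.
# Index 0 is silence/closed; the labels are illustrative only.
VISEME_TO_DISPLAY = {
    0: 0,  # rest    -> closed
    1: 1,  # AI (ah) -> wide open
    2: 1,  # E  (ee) -> wide open
    3: 2,  # O  (oh) -> round
    4: 2,  # U  (oo) -> round
    5: 3,  # M/B/P   -> lips pressed
    6: 3,  # F/V     -> lip on teeth
    7: 4,  # L       -> tongue up
    8: 4,  # T/D     -> tongue up
    9: 4,  # other   -> tongue up
}

def translate(viseme_indices):
    """Map a 10-position viseme sequence to the display's 5 positions,
    merging consecutive movements that collapse to the same position."""
    out = []
    for v in viseme_indices:
        d = VISEME_TO_DISPLAY[v]
        if not out or out[-1] != d:
            out.append(d)
    return out
```

Note the merge step: two analysis visemes that map to the same display position simply become one longer mouth movement.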

HLS will utilize a "Mouth" channel for each mouth position associated with the Word channel. If you have not looked into animation before: a single word requires multiple different mouth positions to animate speaking it.

So ... the Word channel defines the start and duration of the word .... a long spooky "ooooooo" can be as long as needed. HLS then drops "mouth" effects on to the associated Mouth channels. The user can then position the placement and duration of each mouth movement within the time frame of the Word.

Looking at a real scenario ... take a display utilizing 3 singing faces .... lead and two backup singers. The lead singer has 5 mouth positions and the backups each have 3 mouth positions.

The sequencing would go like this ...

1 Word channel for Lead's Only vocals ... leading to 5 mouth channels.
1 Word channel for Left backup Only vocals ... leading to 3 mouth channels.
1 Word channel for Right backup Only vocals ... leading to 3 mouth channels.
1 Word channel for vocals where right and left backups sing in harmony ... leading to 3 mouth channels.

HLS will then provide a mechanism where physical illumination channels are assigned to one or more Mouth channels. HLS will then automatically populate the physical illumination channels with Level effects being driven from the multitude of Mouth channels stated above.
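That assignment step could be pictured like this sketch - the channel numbers and the fields on the Level effect are invented for illustration, not HLS's actual data:

```python
# Hypothetical assignment: each Mouth channel drives one or more
# physical illumination channels (numbers invented for the sketch).
MOUTH_TO_PHYSICAL = {
    1: [101],       # mouth position 1 -> one physical channel
    2: [102, 103],  # mouth position 2 lights two channels
    3: [104],
}

def populate_levels(mouth_events):
    """Turn (mouth_channel, start_ms, duration_ms) events into Level
    effects on every physical channel assigned to that Mouth channel."""
    effects = []
    for mouth, start, dur in mouth_events:
        for phys in MOUTH_TO_PHYSICAL.get(mouth, []):
            effects.append({"channel": phys, "start_ms": start,
                            "duration_ms": dur, "level": 100})
    return effects
```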

In summary - the process flow will be:

Vocal Text
Text to 10 animation mouth positions
Position vocal's words on to a Word channel
Translate mouth position from the 10 to the number required for the display and create that number of Mouth channels.
Position your mouth effects onto the associated mouth channel.
Tell HLS how to map your mouth channels into physical channels.
HLS automatically populates the physical illumination channels as required.
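The placement step in that flow can be sketched as below - assuming HLS initially spreads the mouth movements evenly across the word before the user fine-tunes them (my assumption, not a stated HLS behavior):

```python
def place_mouth_effects(word_start_ms, word_dur_ms, mouth_sequence):
    """Spread a word's mouth positions evenly across its time frame.
    Returns (mouth_channel, start_ms, duration_ms) tuples - a first
    drop the user would then reposition and resize by hand."""
    slot = word_dur_ms / len(mouth_sequence)
    return [(pos, round(word_start_ms + i * slot), round(slot))
            for i, pos in enumerate(mouth_sequence)]
```

For example, a 300 msec word with three movements yields three 100 msec mouth effects back to back.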

All comments and suggestions are welcomed as I'm still in development.

I would like to thank "timon" for the long discussion on this topic last night ... he helped solidify a number of items.

Joe

Here are the 10 mouth positions that HLS currently extracts by phonetically analyzing spoken text.

Viseme and Monster.png
 

The potential of what Joe is doing is enormous. To date no sequencing program has been able to handle face animation assistance without the need of an external program. Having the song words right on the sequencing screen will make it so much easier to sequence.

Joe and I talked about other helper functions, such as gross word placement when you start a sequence and the ability to click in the lyrics window so the sequencing pointer would jump to that location.

Way to go Joe.

Side note:

Joe, forgot to tell you something last night. Since you can now handle MP3s, you want to look at pulling the ID3v2 Lyrics field out of the MP3 if it has been included. It's not a big deal, but it would make it a bit quicker to load the Lyrics. Also, now that you handle MP3s, think about AACs. That puts you on your way to adding Video, which uses it.
 
Many times when I'm doing development and I have not solidified my implementation, I will build test boxes to play in.

The latest release has such a box available.

Far right Menu - says - "Mouth Movement"

Play with it if you like.

Enter a word - or phrase and click OK - then WAIT!

HLS will speak your word or phrase and then display the sequence of mouth positions required to animate speech - based on the phonemes extracted during phonic analysis.

Value ZERO is silence - before and after the conversion.

The non-zero numbers represent an index into an array of mouth positions (the 10 shown in a previous post).

Type in your name and see how many positions are returned. Ask yourself how many milliseconds it took to say the name. Look at the time required, and note that each movement requires a minimum of 25 msecs .... many times the mouth movements together are longer than the total word time.

I am thinking about some sort of grammar rule that will look at the size of the sequencing time slice, the total time the word is active, and the number of animation movements required - and try to collapse the sequence some more - even above and beyond limiting movements to 3, 5, or 7 positions.
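One simple form such a rule could take - this is only a sketch of the idea, with the 25 msec minimum per movement baked in:

```python
MIN_MOVE_MS = 25  # minimum time for one mouth movement

def collapse_to_fit(mouth_sequence, word_dur_ms):
    """If a word is too short to give every movement at least 25 msecs,
    keep an evenly spaced subset (always retaining the first movement)."""
    max_moves = max(1, int(word_dur_ms // MIN_MOVE_MS))
    n = len(mouth_sequence)
    if n <= max_moves:
        return list(mouth_sequence)
    if max_moves == 1:
        return [mouth_sequence[0]]
    # evenly spaced picks from index 0 through n-1, duplicates dropped
    picks = []
    for i in range(max_moves):
        j = round(i * (n - 1) / (max_moves - 1))
        if j not in picks:
            picks.append(j)
    return [mouth_sequence[j] for j in picks]
```

A smarter rule might weight which movements to drop by how visually distinct they are, but even this naive version guarantees every remaining movement gets its minimum time.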

If you have thoughts or experience in this area - please speak up.

Thanks.

Joe
 
Joe... stop making me purchase more stuff for your "free" program. :)

Also, very impressive that you can get a prototype like this up and running so quickly.
 
When I started HLS last January - I put in some simple first-order DSP filters.

They were fine for bringing out the Beat for the Beat track .... but I recently implemented a similar Bandpass - used to bring out the vocals.

I was totally dissatisfied!!!

Version 6Y now has some hi-powered DSP filters.

You can select from first-order filtering all the way up to 10th order.

The walls of these filters at 10th order are almost vertical.

Below is a shot of audio from "Thriller" - where Michael is singing.

First picture is normal audio - vocals and instruments.

The second picture is my version of "Vocal Enhance" plus a 2nd order Band Pass filter with Gain of 1, Center Freq of 3000hz and a BandWidth of 1500hz.

The results make it easy to listen and identify where each word in the vocals occurs.
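For anyone curious what a 2nd order Band Pass actually does to the samples, here is a minimal pure-Python sketch using the widely known RBJ Audio EQ Cookbook biquad formulas - the same idea, though certainly not HLS's actual code:

```python
import math

def bandpass(samples, fs, center_hz, bandwidth_hz):
    """2nd order band pass biquad (0 dB peak gain at the center),
    coefficients from the RBJ Audio EQ Cookbook."""
    w0 = 2 * math.pi * center_hz / fs
    q = center_hz / bandwidth_hz          # Q from center and bandwidth
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

Fed a 3000hz tone it passes almost untouched, while a 300hz bass tone comes out heavily attenuated - which is exactly why the words become visible in the waveform.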

Enjoy.

Joe

ThrillerNormal.jpg

ThrillerBandPass.jpg
 
Joe, I think this might have solidified my decision for software this year. Thank you many times over for what you do, and continue to do!! :)
 
Joe or anybody,
Can you provide or suggest a reference (basic or layman level) to help me understand how to use the DSP filters in your tool?
In the military I was a sonar technician with a basic understanding of audio properties. That being said (it was over 15 years ago), I have a rough time understanding how to best manipulate/use the tool you have provided.

People have said this a million times, but it still is not enough. Thanks for the great program and all the time and effort you put into it.

 
This is perfect Joe. Couldn't have asked for better. Now I can use a vocals beat track.

Thanks much !
 
A filter will attenuate (decrease) the audio volume within a specific range of frequencies.

A Low Pass filter will attenuate frequencies Above a specific Corner frequency --- hence the name Low Pass, meaning it passes frequencies lower than the Corner frequency unaltered.

A High Pass filter will attenuate frequencies Up to a specific Corner frequency --- hence the name Hi Pass, meaning it passes frequencies above the Corner frequency unaltered.

A Band Pass is a window - inside of which no attenuation takes place - outside of the window, the audio IS attenuated.
With a Band Pass - you have a Center frequency and a window whose size is the BandWidth. This window is centered on the Center frequency.

A simple filter is a first-order filter, which is down 3db at the Corner frequency and then attenuates with a gentle slope of about 6db per octave beyond it (attenuation does not occur like a wall but has a slope past the knee/Corner frequency - think ski slope).

If you place two of the same simple filters in series - the slope increases. The more you have the steeper it gets --- almost to a vertical wall.

The term ORDER defines how many of these filters are in series - hence how steep the attenuation slope is.
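A tiny pure-Python demonstration of that series idea, using simple one-pole low pass stages (the coefficient form is a common textbook one, not HLS's implementation):

```python
import math

def one_pole_lowpass(samples, fs, corner_hz):
    """A single first-order (one-pole) low pass stage."""
    a = 1.0 - math.exp(-2.0 * math.pi * corner_hz / fs)
    y, out = 0.0, []
    for x in samples:
        y += a * (x - y)   # each step moves part way toward the input
        out.append(y)
    return out

def cascade(samples, fs, corner_hz, order):
    """ORDER identical stages in series; the skirt steepens each time."""
    for _ in range(order):
        samples = one_pole_lowpass(samples, fs, corner_hz)
    return samples
```

With a 300hz corner, a 1200hz tone leaks through one stage at roughly a quarter of its level; after four stages in series it is nearly gone - the wall steepens with every added order.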

The telephone system treats speech within a band of frequencies from 300hz to 3000hz.

Your BASS - or Beat in the music is usually less than 300hz.

So ... as some examples ....

If you wanted to set a Beat Track in your sequence - let your eyes help.

Use a Low Pass filter set to 300hz - order depending on the response you want. Watch the Gain - you don't want Clipping - as shown by audio line segments hitting the top or the bottom of the audio track display. Clipping will produce clicks/pops in the audio.

Process the Low Pass .... most of your voice will be gone as well as a lot of instrumentals. The Bass Notes will appear - so you can set your Beat effects by both listening to them and seeing them.

Let's say you want a lighting effect whenever a hi-pitch bell rings.

Use a Hi-Pass --- find the Corner Frequency that best gets rid of the vocals and audio with frequencies lower than the Bell's.

Use a Band Pass when you want audio in the middle of the frequency spectrum while ignoring those above and below corner points.

When doing Mouth animation - HLS wants you to place a WORD effect that shows the position and duration of the word.

Using a Band Pass filter will help you SEE the word as well as hear it.

Again - speech is primarily in the 300 to 3000hz range - so first try a Band Pass with a Center of 2000hz and a BandWidth of 2000hz.

That will give you corner frequencies at 1000hz and 3000hz. Play with the frequency inputs and Order to See the words as well as still be able to hear them.

Please note - these are powerful DSP filters - they WILL attenuate your audio to nothing if not used properly.

Also note --- the DSP filters are being used in HLS to make certain aspects of the audio recognizable - NOT to reproduce it without distortion.

Odds are --- when you apply the BandPass to extract the Vocals - the result will sound very tinny - BUT - you will be able to see the words.

HLS also has a check box at the bottom of the DSP Filter Dialog - called "Audio Based on Vocals".

If this box is checked - I move from processing the audio as two channel audio to one channel MONO.

I do this by adding the two channels together (you may need a Gain of .9 or .8 if Clipping occurs).

A lot of music has the Vocals "In The Center" - meaning the vocals are in both channels (left and right) and are in-phase. By adding the channels together --- the vocal will tend to get louder while the instrumentals (which are not in-phase in both channels) will tend to be cancelled somewhat.
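The fold-to-mono trick can be sketched in a few lines. The perfectly out-of-phase "instrument" below is an idealized assumption to show the cancellation - real instrumentals are only partially out of phase, so they are reduced rather than removed:

```python
import math

def fold_to_mono(left, right, gain=0.9):
    """Add the two channels; gain < 1 guards against clipping."""
    return [gain * (l + r) for l, r in zip(left, right)]

# Idealized demo: vocal identical in both channels, instrument out of phase.
n = 1000
vocal = [0.3 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
instr = [0.3 * math.sin(2 * math.pi * 13 * t / n) for t in range(n)]
left = [v + i for v, i in zip(vocal, instr)]
right = [v - i for v, i in zip(vocal, instr)]
mono = fold_to_mono(left, right)
# the instrument cancels; the vocal is doubled, then scaled by the gain
```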

Hence - the vocals are somewhat lifted above the music.

Please note ... I have found that the BASS tends to be centered also --- so the BASS will also be elevated. Make sure your BandPass is of sufficient Order and its low Corner is set to remove most of the Bass.

I hope that helps.

Enjoy.

Joe
 
On my laptop I do not get any audio when using the Mouth Movement box, but my desktop works fine.

Both are running XP SP3.

I checked that my Master Volume and Audio sliders are all on.

Using Realtek Audio on the laptop.
 
Your audio device on the laptop may need a driver update that Windows does not provide.

ac97 comes to mind.
 