ABSTRACT


We present Conversational Gesture Synthesizer, an online method for automatically generating gesture animations whose intensity and style are driven by live speech and by a specified conversational attitude. Body gestures are adapted so that their strength relates believably to the strength of the speech, while the gesturing style is matched to the current conversational attitude. The method is data-driven and uses pre-recorded mocap motions to generate new ones. The pipeline consists of three stages: preprocessing, online generation, and postprocessing. The preprocessing stage, the only offline stage of the three, segments all the mocap motions and builds a motion graph structure. The online generation stage takes the speech input, extracts prosody features from it, and uses the constructed motion graph to select appropriate motion segments. The postprocessing stage then concatenates the selected segments to create the final animation.


PIPELINE

The pipeline consists of three stages: preprocessing, online generation, and postprocessing. Of the three, only the preprocessing stage runs offline.

Preprocessing -> Online Generation -> Postprocessing -> Final Animation
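The online and postprocessing stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Segment` fields, the attitude labels, the toy motion graph, the use of RMS energy as the only prosody feature, and the linear cross-fade are all assumptions made for illustration. Each segment's frames are reduced to a single number per frame as a stand-in for real pose data.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    attitude: str        # hypothetical label, e.g. "friendly"
    intensity: float     # gesture strength in [0, 1], assumed computed offline
    frames: list         # stand-in for pose data: one float per frame


def rms_energy(samples):
    """Crude prosody feature: root-mean-square loudness of one speech window."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_next(graph, segments, current, attitude, target_intensity):
    """Pick the graph successor whose intensity is closest to the speech intensity.

    Successors tagged with the requested attitude are preferred; if there are
    none, all successors are considered so that generation never stalls.
    """
    candidates = graph[current]
    matching = [n for n in candidates if segments[n].attitude == attitude]
    pool = matching or candidates
    return min(pool, key=lambda n: abs(segments[n].intensity - target_intensity))


def concatenate(clips, blend=2):
    """Join clips, linearly cross-fading `blend` overlapping frames per boundary."""
    out = list(clips[0])
    for clip in clips[1:]:
        k = min(blend, len(out), len(clip))
        base = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)  # weight of the incoming clip
            out[base + i] = (1 - w) * out[base + i] + w * clip[i]
        out.extend(clip[k:])
    return out


# Toy data standing in for the output of the offline preprocessing stage.
segments = {
    "rest": Segment("rest", "neutral", 0.0, [0.0] * 4),
    "small_beat": Segment("small_beat", "friendly", 0.3, [0.3] * 4),
    "big_beat": Segment("big_beat", "friendly", 0.9, [0.9] * 4),
    "sharp_point": Segment("sharp_point", "hostile", 0.9, [1.0] * 4),
}
graph = {
    "rest": ["small_beat", "big_beat", "sharp_point"],
    "small_beat": ["rest", "big_beat"],
    "big_beat": ["rest", "small_beat"],
    "sharp_point": ["rest"],
}


def generate(windows, attitude, start="rest"):
    """Online stage (segment selection) followed by postprocessing (blending)."""
    current, chosen = start, []
    for window in windows:
        current = select_next(graph, segments, current, attitude, rms_energy(window))
        chosen.append(current)
    return chosen, concatenate([segments[n].frames for n in chosen])


# A loud speech window followed by a quiet one, with a "friendly" attitude.
names, animation = generate([[0.9, -0.9, 0.9], [0.2, -0.3, 0.3]], "friendly")
print(names, len(animation))
```

In this sketch the attitude acts as a filter on the candidate segments and the speech intensity as the ranking criterion, which mirrors the split between style and strength described in the abstract. A real system would work on full skeletal poses and a richer set of prosody features.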

RESULTS

Two user studies were conducted to evaluate the quality of the final animations and how well people can recognize the displayed conversational attitude when there is no sound. Clips from the original mocap recordings were included in both studies so that the synthesized animations could be compared against them.

MOTION QUALITY

Condition                      Overall   Male Animations   Female Animations
Synthesized (Attitude-Based)   4.4/7     4.9/7             3.6/7
Synthesized (Generic)          3.5/7     3.8/7             3.0/7
Ground Truth                   4.9/7     5.4/7             3.8/7


MOOD RECOGNITION

Condition                      Overall   Male Animations   Female Animations
Synthesized (Attitude-Based)   52/156    49/104            3/52
Synthesized (Generic)          22/156    20/104            2/52
Ground Truth                   57/156    54/104            3/52
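The recognition counts above are easier to compare as percentages. The short Python sketch below converts them and checks that the male and female counts add up to the overall figures. It assumes that each fraction is the number of correct attitude recognitions out of the total number of responses, which the text does not state explicitly. The dictionary layout and function names are illustrative only.

```python
# Counts copied from the mood recognition results: (recognized, total responses).
counts = {
    "Synthesized (Attitude-Based)": {"male": (49, 104), "female": (3, 52)},
    "Synthesized (Generic)": {"male": (20, 104), "female": (2, 52)},
    "Ground Truth": {"male": (54, 104), "female": (3, 52)},
}


def pooled(groups):
    """Sum recognized and total counts over the male and female animations."""
    recognized = sum(r for r, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return recognized, total


def percent(recognized, total):
    """Recognition rate as a percentage, rounded to one decimal place."""
    return round(100.0 * recognized / total, 1)


for condition, groups in counts.items():
    overall = pooled(groups)
    print(
        condition,
        "| overall:", percent(*overall),
        "| male:", percent(*groups["male"]),
        "| female:", percent(*groups["female"]),
    )
```

Under that reading, the pooled counts reproduce the reported overall figures (52/156, 22/156 and 57/156), and the attitude-based animations come much closer to the ground truth recognition rate than the generic ones do.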