ABSTRACT

We present the Conversational Gesture Synthesizer, an online method for automatically generating gesture animations whose intensity and style are driven by live speech and a specified conversational attitude. Body gestures are adapted so that gesture strength relates believably to the speech, while gesturing style is matched to the current conversational attitude. The method is data-driven and uses pre-recorded mocap motions to generate new ones. The pipeline consists of three stages: preprocessing, online generation, and postprocessing. The preprocessing stage segments all the mocap motions and builds a motion graph; it is the only one of the three stages that runs offline. The online generation stage takes the speech input, extracts prosody features from it, and uses the motion graph to select appropriate motion segments. The postprocessing stage concatenates the selected segments and creates the final animation.

PIPELINE

The pipeline consists of three stages: preprocessing, online generation, and postprocessing. Only the preprocessing stage runs offline.

[Pipeline diagram: Preprocessing -> Online Generation -> Postprocessing -> Final Animation]
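
The flow through the three stages can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the thesis implementation: the `Segment` record, the use of RMS energy as a stand-in for prosody features, the nearest-intensity selection rule, and the linear cross-fade are all hypothetical simplifications, and a single float per frame stands in for a full body pose.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Segment:
    name: str
    attitude: str      # e.g. "neutral", "happy"
    intensity: float   # hypothetical gesture strength in [0, 1]
    frames: list       # one float per frame stands in for a full pose


def build_motion_graph(segments):
    """Offline stage (toy): link every pair of segments sharing an attitude."""
    graph = {s.name: [] for s in segments}
    for a in segments:
        for b in segments:
            if a.name != b.name and a.attitude == b.attitude:
                graph[a.name].append(b)
    return graph


def speech_intensity(samples):
    """Online stage (toy): RMS energy as a stand-in for prosody features."""
    if not samples:
        return 0.0
    return sqrt(sum(x * x for x in samples) / len(samples))


def select_next(graph, current, target_intensity):
    """Pick the successor whose intensity is closest to the speech intensity."""
    return min(graph[current.name],
               key=lambda s: abs(s.intensity - target_intensity))


def concatenate(first, second, blend=2):
    """Postprocessing (toy): linear cross-fade over `blend` overlapping frames."""
    head, tail = first.frames[:-blend], first.frames[-blend:]
    mixed = []
    for i in range(blend):
        w = (i + 1) / (blend + 1)  # weight of the incoming segment
        mixed.append((1 - w) * tail[i] + w * second.frames[i])
    return head + mixed + second.frames[blend:]


# Usage with made-up data: loud speech should select the stronger gesture.
segments = [
    Segment("rest", "neutral", 0.1, [0.0, 0.0, 0.0, 0.0]),
    Segment("small_beat", "neutral", 0.3, [0.2, 0.4, 0.2, 0.0]),
    Segment("big_beat", "neutral", 0.8, [0.5, 1.0, 0.5, 0.0]),
    Segment("wave", "happy", 0.8, [0.9, 0.9, 0.9, 0.9]),
]
graph = build_motion_graph(segments)
loud_speech = [0.9, -0.8, 0.85, -0.9]
chosen = select_next(graph, segments[0], speech_intensity(loud_speech))
animation = concatenate(segments[0], chosen)
print(chosen.name, len(animation))
```

Restricting graph edges to segments with the same attitude is one simple way to keep the gesturing style tied to the specified attitude while the speech signal only drives intensity; the actual graph construction and selection criteria used in the thesis may differ.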

RESULTS

Two user studies were conducted to evaluate the quality of the final animations and how well people can recognize the displayed conversational attitude when there is no sound. Clips from the original mocap recordings were included in both studies so that the synthesized animations could be compared against them.

MOTION QUALITY

Overall

Condition                     All    Neutral  Aggressive  Happy  Sad
Synthesized (Attitude-Based)  4.4/7  4.5/7    4.1/7       5.1/7  4.2/7
Synthesized (Generic)         3.5/7  3.7/7    3.2/7       3.8/7  3.6/7
Ground Truth                  4.9/7  4.3/7    5.3/7       5.0/7  5.1/7

Male Animations

Condition                     All    Neutral  Aggressive  Happy  Sad
Synthesized (Attitude-Based)  4.9/7  5.5/7    4.7/7       5.1/7  4.2/7
Synthesized (Generic)         3.8/7  4.0/7    3.6/7       3.8/7  3.6/7
Ground Truth                  5.4/7  5.5/7    6.1/7       5.0/7  5.1/7

Female Animations

Condition                     All    Neutral  Aggressive
Synthesized (Attitude-Based)  3.6/7  3.6/7    3.5/7
Synthesized (Generic)         3.0/7  3.3/7    2.7/7
Ground Truth                  3.8/7  3.1/7    4.5/7
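
To make the comparison between conditions explicit, the short Python sketch below computes the score gaps from the overall motion-quality figures above. The dictionary layout and the `gap` helper are illustrative assumptions; only the three scores are taken from the results.

```python
# Overall motion-quality scores (out of 7), copied from the results above.
overall_quality = {
    "Synthesized (Attitude-Based)": 4.4,
    "Synthesized (Generic)": 3.5,
    "Ground Truth": 4.9,
}


def gap(a, b):
    """Score difference between two conditions, rounded to one decimal."""
    return round(overall_quality[a] - overall_quality[b], 1)


gain_over_generic = gap("Synthesized (Attitude-Based)", "Synthesized (Generic)")
gap_to_ground_truth = gap("Ground Truth", "Synthesized (Attitude-Based)")
print(gain_over_generic, gap_to_ground_truth)
```

On these figures the attitude-based animations score 0.9 points above the generic ones and 0.5 points below the original mocap clips.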

MOOD RECOGNITION

Overall

Condition                     All     Neutral  Aggressive  Happy  Sad
Synthesized (Attitude-Based)  52/156  13/52    17/52       14/26  8/26
Synthesized (Generic)         22/156  8/52     6/52        7/26   1/26
Ground Truth                  57/156  17/52    17/52       5/26   18/26

Male Animations

Condition                     All     Neutral  Aggressive  Happy  Sad
Synthesized (Attitude-Based)  49/104  11/26    16/26       14/26  8/26
Synthesized (Generic)         20/104  6/26     6/26        7/26   1/26
Ground Truth                  54/104  17/26    14/26       5/26   18/26

Female Animations

Condition                     All   Neutral  Aggressive
Synthesized (Attitude-Based)  3/52  2/26     1/26
Synthesized (Generic)         2/52  2/26     0/26
Ground Truth                  3/52  0/26     3/26
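
The recognition counts can be turned into rates with a few lines of Python. This is only a sketch: it assumes each entry reads as correct recognitions over total responses, which the poster does not state explicitly, and the dictionary layout and `rate` helper are hypothetical. The counts themselves are copied from the overall results above.

```python
# (recognized, total) per attitude, copied from the overall results above.
# Assumption: each pair is correct recognitions over total responses.
overall_recognition = {
    "Synthesized (Attitude-Based)": {
        "Neutral": (13, 52), "Aggressive": (17, 52),
        "Happy": (14, 26), "Sad": (8, 26),
    },
    "Synthesized (Generic)": {
        "Neutral": (8, 52), "Aggressive": (6, 52),
        "Happy": (7, 26), "Sad": (1, 26),
    },
    "Ground Truth": {
        "Neutral": (17, 52), "Aggressive": (17, 52),
        "Happy": (5, 26), "Sad": (18, 26),
    },
}


def rate(counts):
    """Sum the per-attitude counts and return (recognized, total, percent)."""
    recognized = sum(c for c, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return recognized, total, 100 * recognized / total


for condition, counts in overall_recognition.items():
    recognized, total, percent = rate(counts)
    print(f"{condition}: {recognized}/{total} = {percent:.1f}%")
```

Summing the per-attitude counts reproduces the reported totals (52/156, 22/156 and 57/156), which is a quick consistency check on the reconstructed figures.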

Audio-driven Gesture Animation for Virtual Characters

Jack Hadjicosti

Supervisors: Dr. Z. Yumak, Dr. ir. A.F. van der Stappen

Master Thesis project, February 2018
