VASA-1 can reproduce facial expressions, speech-synchronized lip movements, and natural head movements.

Microsoft's new neural network can capture a wide range of emotions and subtle nuances, making the generated faces more realistic.

Users can specify the character's gaze direction, perceived distance, and even the character's emotional state.

VASA-1 achieves high realism by modeling facial features, three-dimensional head position, and facial expressions as separate components. Its developers also emphasize the system's efficiency: it generates 512×512-pixel video at 45 frames per second in offline batch mode and, according to Microsoft, at up to 40 frames per second in online streaming mode.
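
To put the quoted figures in perspective, the short Python sketch below works out how much time a generator has per frame at 45 frames per second and how many pixels it must produce each second at 512×512. The function names are hypothetical and chosen only for illustration; this is plain arithmetic on the numbers above, not part of VASA-1 or any Microsoft API.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def pixels_per_second(width: int, height: int, fps: float) -> float:
    """Number of pixels that must be generated each second."""
    return width * height * fps


# Figures quoted in the article (resolution and frame rate).
WIDTH, HEIGHT, FPS = 512, 512, 45

budget = frame_budget_ms(FPS)
throughput = pixels_per_second(WIDTH, HEIGHT, FPS)

print(f"{budget:.1f} ms per frame")            # 22.2 ms per frame
print(f"{throughput:,.0f} pixels per second")  # 11,796,480 pixels per second
```

In other words, sustaining 45 frames per second leaves roughly 22 milliseconds to generate each frame.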