Video-based Characters - Creating New Human Performances from a Multi-view Video Database

Feng Xu^†, Yebin Liu^∗, Carsten Stoll^∗, James Tompkin^‡, Gaurav Bharaj^∗, Qionghai Dai^†, Hans-Peter Seidel^∗, Jan Kautz^‡, Christian Theobalt^∗
^†Tsinghua University, China ^‡University College London, UK ^∗MPI Informatik, Germany

SIGGRAPH 2011

An animation of an actor created with our method from a multi-view video database. The motion was designed by an animator and the camera was tracked from the background with a commercial camera tracker. In the composited scene of animation and background, the synthesized character and her spatio-temporal appearance look close to lifelike.

Abstract

We present a method to synthesize plausible video sequences of humans according to user-defined body motions and viewpoints. We first capture a small database of multi-view video sequences of an actor performing various basic motions. This database needs to be captured only once and serves as the input to our synthesis algorithm. We then apply a marker-less model-based performance capture approach to the entire database to obtain pose and geometry of the actor in each database frame. To create novel video sequences of the actor from the database, a user animates a 3D human skeleton with novel motion and viewpoints. Our technique then synthesizes a realistic video sequence of the actor performing the specified motion based only on the initial database. The first key component of our approach is a new efficient retrieval strategy to find appropriate spatio-temporally coherent database frames from which to synthesize target video frames. The second key component is a warping-based texture synthesis approach that uses the retrieved most-similar database frames to synthesize spatio-temporally coherent target video frames. For instance, this enables us to easily create video sequences of actors performing dangerous stunts without them being placed in harm’s way. We show through a variety of result videos and a user study that we can synthesize realistic videos of people, even if the target motions and camera views are different from the database content.
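The retrieval step described in the abstract can be illustrated with a small sketch. The abstract does not spell out the retrieval algorithm, so the code below is not the paper's method. It is one simple way to pick "spatio-temporally coherent" database frames, under these assumptions: poses are plain vectors of joint parameters, similarity is Euclidean distance, and temporal coherence is encouraged by a fixed penalty whenever the selected database frame does not simply advance to the next one. The names `pose_distance`, `retrieve_frames`, and `jump_penalty` are hypothetical.

```python
import math


def pose_distance(a, b):
    """Euclidean distance between two pose vectors (assumed representation)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_frames(database, target, jump_penalty=1.0):
    """Pick one database frame index per target frame.

    Minimizes the summed pose distance plus `jump_penalty` for every
    transition that does not advance to the next database frame.
    Illustrative dynamic program only, not the paper's retrieval strategy.
    """
    n = len(database)
    if n == 0 or not target:
        return []
    # cost[j]: best total cost of a selection ending at database frame j.
    cost = [pose_distance(database[j], target[0]) for j in range(n)]
    back = []  # back[t - 1][j]: predecessor of frame j at target step t
    for t in range(1, len(target)):
        best_prev = min(range(n), key=lambda j: cost[j])
        new_cost, pointers = [], []
        for j in range(n):
            # Option 1: jump from the cheapest previous frame (penalized).
            prev, c = best_prev, cost[best_prev] + jump_penalty
            # Option 2: continue from the preceding database frame (free).
            if j > 0 and cost[j - 1] < c:
                prev, c = j - 1, cost[j - 1]
            new_cost.append(c + pose_distance(database[j], target[t]))
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)
    # Backtrack from the cheapest final frame.
    j = min(range(n), key=lambda k: cost[k])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path


# Toy database: frames 0-2 form a continuous motion, frame 3 is an
# unrelated frame whose pose happens to be close to the second target.
database = [[0.0], [1.0], [2.0], [1.1]]
target = [[0.0], [1.08], [2.0]]
coherent = retrieve_frames(database, target, jump_penalty=1.0)
greedy = retrieve_frames(database, target, jump_penalty=0.0)
```

With the penalty enabled, the selection follows the continuous run of database frames; with the penalty set to zero it reduces to per-frame nearest-pose lookup and hops to the isolated frame. A real system would also need to account for viewpoint and appearance, which this sketch ignores.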