Motion capture companies Animatrik Film Design and Dimensional
Imaging talk about working together to bring specialist facial
performance capture services to VFX and game productions.

Performance Capture Pioneers Collaborate on New Mocap Experience

Motion capture companies Animatrik Film Design and Dimensional Imaging, known as DI4D, are collaborating to bring specialist facial performance capture services to visual effects and video game productions in the US and Canada.

Animatrik's mocap production services have been used on such productions as ‘Elysium’,
‘Warm Bodies’, Image Engine’s character work for the film ‘Chappie’, plus Microsoft’s ‘Gears of War’ series and Duncan Jones’ upcoming ‘Warcraft’ movie. Characters based on DI4D’s facial capture have appeared in television shows such as ‘Merlin’ on the BBC, and video games including ‘Left 4 Dead 2’ and ‘Quantum Break’.

Believable Faces

Animatrik has recently licensed DI4D’s facial performance capture software and equipment. Their team works either inside volumes at their studios in Vancouver and Toronto, or at locations to suit clients’ productions, and will now be able to carry out facial services at all locations as well. DI4D facial capture has been seen before in Microsoft’s ‘Halo 4’ and other projects.

The CEO of Animatrik, Brett Ineson, said that believable facial performance capture has always been a tough challenge to solve, and that demand from clients for hyper-realistic facial animation is what triggered their search for a new type of system.


Brett described difficulties Animatrik has encountered in the past when capturing and using facial motion data. “Face pipelines are challenged by the solving and retargeting of the data you collect.  Anytime you re-map motion data from an actor to a character, there is a compromise of nuance in the result.  Something is lost.  Sometimes you can't put your finger on it, but something is not right.”  

Colin Urquhart, CEO of DI4D, described the distinctive characteristics of facial capture and, specifically, his company’s approach to it. “In body motion capture, it is possible to capture a relatively small set of markers and use their positions to estimate the underlying skeletal pose with a relatively high degree of accuracy. However, a much higher density, and hence smaller size, of markers is required to capture the much subtler animation of the face. In turn, this demands a much higher density motion capture camera.”


Various helmet mounted camera systems have been developed to address this limitation. But because it is impractical to apply many more than 100 markers to a face, which severely limits the fidelity of capture, other techniques have been developed that attempt to enhance the fidelity of marker based capture by referencing a set of high resolution 3D scans of the actor’s face, acquired separately. Nevertheless, as the level of realism required continues to be pushed higher, an ever larger number of high resolution 3D scans is required.


“DI4D’s approach to facial capture is radically different, and starts with the capture of a 3D face scan for every frame of the actual performance,” said Colin. “First, one or more stereo pairs of high resolution video cameras are used to capture the facial performance of the actors. Only standard video lighting is required - markers, make-up or structured illumination are not required.”

That is why DI4D’s approach to capture, based on 3D scanning instead of tracking markers, attracted Animatrik. Brett said, “We love being able to capture the surface of the face. As a service-based company our job is to meet clients’ expectations. So, if it is a digital double we are after, using DI4D’s system means that nothing is lost, and if we do have to re-map the data, then at least we have the strongest possible data set to work with, giving us a much better chance of making the loss imperceptible.” Also, because of its untethered design, the system leaves more room for acting, combining high fidelity facial capture with freedom of movement.

Capture and Scanning Workflow

During production, the DI4D PRO System captures sequences of high resolution, colour 3D facial performance data. The systems are supplied with one or more sets or ‘pods’ of three synchronised cameras - two monochrome cameras and one colour, capturing at up to 60 fps. The cameras stream directly to disk so that sequence length is not limited.


DI4D also has a Head Mounted Camera system, or HMC, that outputs data of similar fidelity to the PRO System, but is mobile and moves with the actor. The gear consists of a light head rig with two small, synchronised high resolution video cameras, four small LED lights, a compact video recording system and a battery pack. A remote PC can be used to start and stop recording and for live previewing of the video streams. By using per frame JPEG compression, the HMC can stream direct to disk and capture at the same 60 fps frame rate without limiting the sequence length. This speed can be pushed to 1,000 fps with a custom camera set-up.

The sequence data from either capture system is post processed with DI4D dense passive stereo photogrammetry software and then, if needed, a fixed topology mesh can be tracked through the sequence using DI4D optical flow tracking software. The process results in very dense, realistic facial animation that preserves the subtlety and distinctive detail of facial performances.
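As a rough illustration of how a stereo pair yields 3D data, the textbook relation for a rectified pair is depth = focal length x baseline / disparity. The sketch below applies it per pixel. It is not DI4D's photogrammetry software, whose internals the article does not describe; the function names, the focal length and the baseline are hypothetical values chosen for the example.

```python
# Textbook depth-from-disparity for a rectified stereo pair.
# Assumption: this is a generic illustration, not DI4D's algorithm.

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres of a point matched between the two cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


def depth_map(disparity_map, focal_length_px, baseline_m):
    """Per-pixel depth; pixels without a valid match become None."""
    return [
        [
            depth_from_disparity(d, focal_length_px, baseline_m) if d > 0 else None
            for d in row
        ]
        for row in disparity_map
    ]


if __name__ == "__main__":
    # Hypothetical camera: 2000 px focal length, 12.5 cm baseline.
    disparities = [[100.0, 125.0], [0.0, 250.0]]
    for row in depth_map(disparities, 2000.0, 0.125):
        print(row)
```

Larger disparities correspond to points closer to the cameras, which is why a dense disparity value for every foreground pixel can be turned into a dense surface scan.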


Colin said, “A 3D scan is then calculated during post processing for every frame of the stereo video. A mesh or discrete set of landmarks comprising thousands of vertices, with a 3D point calculated for every foreground pixel, can be tracked through the 3D scan sequence. Instead of a best guess interpolation of a limited number of shapes, DI4D results in a tracked mesh with consistent topology that matches the actual facial performance for every frame and can then be processed to automatically recover a sequence of 3D models.”

Optical Flow Maps

At the same time as calculating per frame 3D models, the DI4D Processing Software calculates a dense optical flow map for each image, leading to the previous and next image in the sequence. It is this dense optical flow data that is used in the DI4D Tracking Software to track the vertices of the mesh topology through a sequence.

The mesh can be applied to one reference frame in the sequence, and then tracked through the sequence using the flow maps. The software has tools for correcting drift during tracking. You can output the resulting tracked set of discrete landmarks in standard motion capture C3D format, and output the tracked mesh as a vertex cache compatible with 3D software like 3ds Max, Maya or Lightwave.
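The idea of carrying a set of points from a reference frame through a sequence with per-frame flow maps can be sketched in a few lines. This is a simplified 2D, image-space illustration under stated assumptions, not the DI4D Tracking Software: the flow maps are plain nested lists of (dx, dy) vectors, lookups are nearest-pixel, and the names `sample_flow` and `track_points` are hypothetical.

```python
# Simplified sketch: propagate points through a chain of forward flow maps.
# Assumption: flow[r][c] is the (dx, dy) motion of pixel (c, r) to the next frame.

def sample_flow(flow, x, y):
    """Nearest-pixel lookup of the flow vector at image position (x, y)."""
    rows, cols = len(flow), len(flow[0])
    r = min(max(int(round(y)), 0), rows - 1)
    c = min(max(int(round(x)), 0), cols - 1)
    return flow[r][c]


def track_points(reference_points, forward_flows):
    """Return one list of point positions per frame, starting at the reference frame."""
    tracks = [list(reference_points)]
    for flow in forward_flows:
        moved = []
        for x, y in tracks[-1]:
            dx, dy = sample_flow(flow, x, y)
            moved.append((x + dx, y + dy))
        tracks.append(moved)
    return tracks


if __name__ == "__main__":
    # Two identical 4x4 flow maps in which every pixel moves by (1.0, 0.5).
    uniform = [[(1.0, 0.5)] * 4 for _ in range(4)]
    for frame in track_points([(0.0, 0.0), (1.0, 1.0)], [uniform, uniform]):
        print(frame)
```

Because each frame's positions are built on the previous frame's result, small flow errors add up over a long sequence, which is the kind of drift the correction tools mentioned above are meant to address.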



Several aspects of DI4D’s approach help maintain consistency, which is very important for sustained realism. For example, the initial fixed topology mesh can also be tracked from a reference frame in one sequence, to a reference frame in another sequence, so that landmark or vertex placement across multiple sequences is consistent.

Also, if a mesh with a specified UV layout is tracked through a sequence, then a per-frame texture map can be generated using that consistent UV layout. A per frame detail map can also be generated using the same, consistent UV layout, and where colour cameras are used, a per frame colour texture map can be applied to the 3D model sequence as well.

Data and Compute Power

Processing can be carried out entirely on the CPU or can be accelerated by processing on a GPU using the NVIDIA CUDA architecture. Processing can also be accelerated by distributing it over a network of processing PCs. In fact, Colin considers that the most significant industry development helping to improve facial mocap in recent years is the increase in available compute power, particularly GPU acceleration. “This change has allowed us to apply advanced computer vision algorithms to sequences of thousands of frames. Previously, it would have taken several hours to process a single frame,” he said.


“The combination of the new small, high resolution – that is, 3MP –  cameras shooting at high frame rates, 60 fps for instance, with fast high capacity SSD drives capable of streaming the acquired data, has also been very important. It has made small, mobile capture systems such as our DI4D HMC a practical reality.”
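To give a feel for why fast SSDs matter here, the streaming rate can be estimated with simple arithmetic. The 3MP resolution and 60 fps come from the quote above, and the two cameras from the HMC description; the one byte per pixel and the 10:1 JPEG compression ratio are assumptions made purely for illustration, as the article gives neither figure.

```python
# Back-of-envelope streaming rate for a multi-camera capture rig.
# Assumptions: 1 byte per pixel and a 10:1 JPEG ratio are illustrative guesses.

def stream_rate_mb_per_s(megapixels, fps, cameras,
                         bytes_per_pixel=1.0, compression_ratio=1.0):
    """Sustained data rate in megabytes per second (1 MB = 1e6 bytes)."""
    raw_bytes_per_s = megapixels * 1e6 * bytes_per_pixel * fps * cameras
    return raw_bytes_per_s / compression_ratio / 1e6


if __name__ == "__main__":
    uncompressed = stream_rate_mb_per_s(3, 60, 2)
    compressed = stream_rate_mb_per_s(3, 60, 2, compression_ratio=10.0)
    print(f"uncompressed: {uncompressed:.0f} MB/s")
    print(f"with assumed 10:1 JPEG: {compressed:.0f} MB/s")
```

Under these assumptions the two cameras produce about 360 MB/s uncompressed and about 36 MB/s after compression, a rate that a compact SSD-based recorder could plausibly sustain.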

Regarding other practicalities on set, he noted that, unfortunately, it is not really possible to increase fidelity without some increase in data volumes. However, the data captured during shooting with DI4D takes the form of relatively lightweight video sequences – albeit lightly compressed to maintain quality – and does not present a particular problem to users.

Colin said, “The output data delivered after DI4D processing is also fairly lightweight, comprising a tracked mesh with consistent topology stored as a vertex cache, with optional per frame texture and detail maps. Although the intermediate ‘raw data’ of per frame dense 3D scans and optical flow maps can become several GB per second, it only has to be stored temporarily during post processing, so again does not pose a particular problem for users on the set.”