1 00:00:00,000 --> 00:00:10,000 We present a novel real-time facial reenactment method that works with any commodity webcam.
2 00:00:10,000 --> 00:00:17,000 Since our method only uses RGB data for both the source and target actor, we are able to manipulate YouTube videos in real-time.
3 00:00:17,000 --> 00:00:20,000 Here, we demonstrate our method in a live setup.
4 00:00:20,000 --> 00:00:23,000 On the right, a source actor is captured with a standard webcam.
5 00:00:23,000 --> 00:00:28,000 This input drives the animation of the face in the video shown on the monitor to the left.
6 00:00:28,000 --> 00:00:33,000 A significant difference from previous methods is the re-rendering of the mouth interior.
7 00:00:33,000 --> 00:00:41,000 To this end, we re-synthesize the mouth interior of the target actor using video footage from the training sequence, retrieved based on temporal and photometric similarity.
8 00:00:41,000 --> 00:00:47,000 As we can see, we are able to generate a realistic and convincing reenactment result.
9 00:00:47,000 --> 00:00:52,000 Here, we show a close-up of the footage from the previous live reenactment.
10 00:00:52,000 --> 00:00:55,000 The input video stream is shown on the left.
11 00:00:55,000 --> 00:00:59,000 Note that the target actor is re-rendered in a neutral pose.
12 00:00:59,000 --> 00:01:02,000 On the right, we can see the final output of our method.
13 00:01:02,000 --> 00:01:18,000 Our system reconstructs and tracks both the source and target actors using a dense photometric energy minimization.
14 00:01:18,000 --> 00:01:25,000 Using a novel subspace deformation transfer technique, we transfer the expressions from the source to the target actor.
15 00:01:25,000 --> 00:01:32,000 This allows us to obtain a modified face template of the target actor according to the expressions of the source actor.
16 00:01:32,000 --> 00:01:39,000 We now re-render the modified face on top of the target sequence in order to replace the original facial expressions.
17 00:01:48,000 --> 00:01:52,000 Here, we show additional live sequences where we reenact various YouTube videos.
18 00:02:18,000 --> 00:03:10,980 [inaudible]
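The narration at 00:33 describes retrieving mouth interiors from the training footage based on temporal and photometric similarity, but gives no formula. The sketch below is a minimal illustration of that kind of retrieval, under the assumption that each frame is summarized by an appearance descriptor (for example, expression coefficients); the function name `retrieve_mouth_frame` and the `temporal_weight` parameter are hypothetical, not taken from the method.

```python
import numpy as np

def retrieve_mouth_frame(query_desc, train_descs, prev_idx, temporal_weight=0.1):
    """Pick the training frame whose mouth interior best matches the query.

    query_desc:  descriptor of the current re-targeted expression, shape (D,)
    train_descs: descriptors of all training frames, shape (N, D)
    prev_idx:    frame chosen for the previous output frame (temporal coherence)
    """
    # Photometric/appearance term: distance between the query descriptor
    # and every training-frame descriptor.
    appearance_cost = np.linalg.norm(train_descs - query_desc[None, :], axis=1)

    # Temporal term: penalize jumping far away from the previously used frame,
    # which keeps the synthesized mouth footage temporally smooth.
    frame_ids = np.arange(len(train_descs))
    temporal_cost = np.abs(frame_ids - prev_idx) / len(train_descs)

    total_cost = appearance_cost + temporal_weight * temporal_cost
    return int(np.argmin(total_cost))

# Toy usage: 100 training frames with 5-D descriptors, query close to frame 42.
rng = np.random.default_rng(0)
train = rng.normal(size=(100, 5))
query = train[42] + 0.01 * rng.normal(size=5)
print(retrieve_mouth_frame(query, train, prev_idx=40))  # -> likely 42
```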
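The tracking stage at 01:02 is described only as a dense photometric energy minimization, i.e., an analysis-by-synthesis fit of model parameters to every pixel of the input frame. The transcript does not detail the face parameterization or the solver, so the sketch below only illustrates the analysis-by-synthesis idea on a two-parameter synthetic "face", with SciPy's generic least-squares solver standing in for the real optimizer.

```python
import numpy as np
from scipy.optimize import least_squares

H, W = 64, 64
ys, xs = np.mgrid[0:H, 0:W]

def render(params):
    """Toy 'renderer': a Gaussian blob whose centre stands in for the
    pose/expression parameters of a real face model."""
    cx, cy = params
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * 8.0 ** 2))

def photometric_residuals(params, observed):
    # Dense photometric energy: per-pixel difference between the synthesized
    # image and the observed frame, flattened for the least-squares solver.
    return (render(params) - observed).ravel()

# Synthetic "input frame" generated with unknown ground-truth parameters.
observed = render([40.0, 25.0]) + 0.01 * np.random.default_rng(2).normal(size=(H, W))

# Minimize the dense photometric energy from a rough initialization.
fit = least_squares(photometric_residuals, x0=[32.0, 32.0], args=(observed,))
print(fit.x)  # close to [40, 25]
```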
35 00:03:11,000 --> 00:04:10,980 [inaudible]
37 00:04:11,000 --> 00:04:22,400 In order to evaluate our approach, we perform a cross-validation based on optical flow.
38 00:04:23,260 --> 00:04:27,520 To this end, we retrieve mouth interiors from the first half of the video. The second half
39 00:04:27,520 --> 00:04:28,980 is used for evaluation queries.
40 00:04:30,560 --> 00:04:33,240 As we can see, our re-rendering error is very low.
41 00:04:41,360 --> 00:04:46,060 Our method introduces a new RGB face-tracking pipeline, which we compare against state-of-the-art
42 00:04:46,060 --> 00:04:47,520 real-time face-tracking methods.
43 00:04:49,700 --> 00:04:52,740 Here, we show a comparison against Cao et al. and Thies et al.
44 00:04:53,580 --> 00:04:58,640 Note that Thies et al. is based on RGB-D data, whereas Cao et al. and our method require only
45 00:04:58,640 --> 00:04:59,500 RGB input.
46 00:05:11,000 --> 00:05:24,400 Here, we show another tracking comparison to FaceShift 2014, which relies on RGB-D data.
47 00:05:24,400 --> 00:05:30,040 Although our method is RGB-only, we achieve similar tracking quality.
48 00:05:30,040 --> 00:05:49,100 In addition to real-time tracking methods, we also compare against the offline tracking
49 00:05:49,100 --> 00:05:50,200 algorithm of Shi et al.
50 00:05:50,200 --> 00:05:55,000 Note that Shi et al. perform additional geometric refinement using shading cues.
51 00:06:02,560 --> 00:06:05,940 We now compare our approach against previous reenactment approaches.
52 00:06:06,800 --> 00:06:09,760 Here, we show the scenario of a translator animating another person.
53 00:06:11,300 --> 00:06:14,760 Note that our approach runs in real-time, while the method of Garrido et al. works offline.
54 00:06:14,760 --> 00:06:20,820 Here, we show a comparison to Thies et al., who rely on RGB-D data.
55 00:06:22,220 --> 00:06:24,520 Both methods produce similar reenactment results.
56 00:06:24,900 --> 00:06:29,380 However, note that Thies et al. use a geometric teeth proxy, which leads to artificially shaped
57 00:06:29,380 --> 00:06:30,100 mouth regions.
58 00:06:30,900 --> 00:06:31,700 Thank you for watching.
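The cross-validation described at 04:11 withholds the second half of the video and measures how far the re-rendered mouth deviates from the held-out ground-truth frames using optical flow. The narration does not name the flow algorithm or the error statistic, so the sketch below is one plausible reading, using OpenCV's Farnebäck flow and reporting the mean per-pixel displacement in pixels as the re-rendering error; the function name `rerendering_error` is hypothetical.

```python
import cv2
import numpy as np

def rerendering_error(synthesized_bgr, ground_truth_bgr):
    """Mean optical-flow magnitude (in pixels) between a re-rendered frame
    and the held-out ground-truth frame: lower means a closer match."""
    synth_gray = cv2.cvtColor(synthesized_bgr, cv2.COLOR_BGR2GRAY)
    gt_gray = cv2.cvtColor(ground_truth_bgr, cv2.COLOR_BGR2GRAY)

    # Dense Farnebäck flow from the synthesized frame to the ground truth.
    flow = cv2.calcOpticalFlowFarneback(
        synth_gray, gt_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # Per-pixel displacement magnitude, averaged over the frame.
    magnitude = np.linalg.norm(flow, axis=2)
    return float(magnitude.mean())

# Toy usage: error between a textured frame and a slightly shifted copy of it.
noise = (np.random.default_rng(1).random((120, 160, 3)) * 255).astype(np.uint8)
frame = cv2.GaussianBlur(noise, (7, 7), 0)
shifted = np.roll(frame, shift=2, axis=1)   # inject a 2-pixel horizontal shift
print(rerendering_error(frame, shifted))    # should report roughly 2 pixels
```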