1 00:00:00,000 --> 00:00:10,000 We present a novel real-time facial reenactment method that works with any commodity webcam.
2 00:00:10,000 --> 00:00:17,000 Since our method only uses RGB data for both the source and target actor, we are able to manipulate YouTube videos in real-time.
3 00:00:17,000 --> 00:00:20,000 Here, we demonstrate our method in a live setup.
4 00:00:20,000 --> 00:00:23,000 On the right, a source actor is captured with a standard webcam.
5 00:00:23,000 --> 00:00:28,000 This input drives the animation of the face in the video shown on the monitor to the left.
6 00:00:28,000 --> 00:00:33,000 A significant difference from previous methods is the re-rendering of the mouth interior.
7 00:00:33,000 --> 00:00:41,000 To this end, we re-synthesize the mouth interior of the target actor using video footage from the training sequence, retrieved based on temporal and photometric similarity.
8 00:00:41,000 --> 00:00:47,000 As we can see, we are able to generate a realistic and convincing reenactment result.
9 00:00:47,000 --> 00:00:52,000 Here, we show a close-up of the footage from the previous live reenactment.
10 00:00:52,000 --> 00:00:55,000 The input video stream is shown on the left.
11 00:00:55,000 --> 00:00:59,000 Note that the target actor is re-rendered in a neutral pose.
12 00:00:59,000 --> 00:01:02,000 On the right, we can see the final output of our method.
13 00:01:02,000 --> 00:01:18,000 Our system reconstructs and tracks both the source and target actors using a dense photometric energy minimization.
14 00:01:18,000 --> 00:01:25,000 Using a novel subspace deformation transfer technique, we transfer the expressions from the source to the target actor.
15 00:01:25,000 --> 00:01:32,000 This allows us to obtain a modified face template of the target actor according to the expressions of the source actor.
16 00:01:32,000 --> 00:01:39,000 We now re-render the modified face on top of the target sequence in order to replace the original facial expressions.
17 00:01:48,000 --> 00:01:52,000 Here, we show additional live sequences where we reenact various YouTube videos.
18 00:02:18,000 --> 00:03:10,980 [inaudible]
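The narration at 00:33 describes retrieving mouth interiors from the training footage based on temporal and photometric similarity, but gives no formula. The sketch below is a minimal illustration of that kind of retrieval, under the assumption that each frame is summarized by an appearance descriptor (for example, expression coefficients); the function name `retrieve_mouth_frame` and the `temporal_weight` parameter are hypothetical, not taken from the method.

```python
import numpy as np

def retrieve_mouth_frame(query_desc, train_descs, prev_idx, temporal_weight=0.1):
    """Pick the training frame whose mouth interior best matches the query.

    query_desc:  descriptor of the current re-targeted expression, shape (D,)
    train_descs: descriptors of all training frames, shape (N, D)
    prev_idx:    frame chosen for the previous output frame (temporal coherence)
    """
    # Photometric/appearance term: distance between the query descriptor
    # and every training-frame descriptor.
    appearance_cost = np.linalg.norm(train_descs - query_desc[None, :], axis=1)

    # Temporal term: penalize jumping far away from the previously used frame,
    # which keeps the synthesized mouth footage temporally smooth.
    frame_ids = np.arange(len(train_descs))
    temporal_cost = np.abs(frame_ids - prev_idx) / len(train_descs)

    total_cost = appearance_cost + temporal_weight * temporal_cost
    return int(np.argmin(total_cost))

# Toy usage: 100 training frames with 5-D descriptors, query close to frame 42.
rng = np.random.default_rng(0)
train = rng.normal(size=(100, 5))
query = train[42] + 0.01 * rng.normal(size=5)
print(retrieve_mouth_frame(query, train, prev_idx=40))  # -> likely 42
```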
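The tracking stage at 01:02 is described only as a dense photometric energy minimization, i.e., an analysis-by-synthesis fit of model parameters to every pixel of the input frame. The transcript does not detail the face parameterization or the solver, so the sketch below only illustrates the analysis-by-synthesis idea on a two-parameter synthetic "face", with SciPy's generic least-squares solver standing in for the real optimizer.

```python
import numpy as np
from scipy.optimize import least_squares

H, W = 64, 64
ys, xs = np.mgrid[0:H, 0:W]

def render(params):
    """Toy 'renderer': a Gaussian blob whose centre stands in for the
    pose/expression parameters of a real face model."""
    cx, cy = params
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * 8.0 ** 2))

def photometric_residuals(params, observed):
    # Dense photometric energy: per-pixel difference between the synthesized
    # image and the observed frame, flattened for the least-squares solver.
    return (render(params) - observed).ravel()

# Synthetic "input frame" generated with unknown ground-truth parameters.
observed = render([40.0, 25.0]) + 0.01 * np.random.default_rng(2).normal(size=(H, W))

# Minimize the dense photometric energy from a rough initialization.
fit = least_squares(photometric_residuals, x0=[32.0, 32.0], args=(observed,))
print(fit.x)  # close to [40, 25]
```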
35 00:03:11,000 --> 00:04:10,980 [inaudible]
37 00:04:11,000 --> 00:04:22,400 In order to evaluate our approach, we perform a cross-validation based on optical flow.
38 00:04:23,260 --> 00:04:27,520 To this end, we retrieve mouth interiors from the first half of the video. The second half
39 00:04:27,520 --> 00:04:28,980 is used for evaluation queries.
40 00:04:30,560 --> 00:04:33,240 As we can see, our re-rendering error is very low.
41 00:04:41,360 --> 00:04:46,060 Our method introduces a new RGB face-tracking pipeline, which we compare against state-of-the-art
42 00:04:46,060 --> 00:04:47,520 real-time face-tracking methods.
43 00:04:49,700 --> 00:04:52,740 Here, we show a comparison against Cao et al. and Thies et al.
44 00:04:53,580 --> 00:04:58,640 Note that Thies et al. is based on RGB-D data, whereas Cao et al. and our method require only
45 00:04:58,640 --> 00:04:59,500 RGB input.
46 00:05:11,000 --> 00:05:24,400 Here, we show another tracking comparison to FaceShift 2014, which relies on RGB-D data.
47 00:05:24,400 --> 00:05:30,040 Although our method is RGB-only, we achieve similar tracking quality.
48 00:05:30,040 --> 00:05:49,100 In addition to real-time tracking methods, we also compare against the offline tracking
49 00:05:49,100 --> 00:05:50,200 algorithm of Shi et al.
50 00:05:50,200 --> 00:05:55,000 Note that Shi et al. perform additional geometric refinement using shading cues.
51 00:06:02,560 --> 00:06:05,940 We now compare our approach against previous reenactment approaches.
52 00:06:06,800 --> 00:06:09,760 Here, we show the scenario of a translator animating another person.
53 00:06:11,300 --> 00:06:14,760 Note that our approach runs in real-time, while the method of Garrido et al. works offline.
54 00:06:14,760 --> 00:06:20,820 Here, we show a comparison to Thies et al., who rely on RGB-D data.
55 00:06:22,220 --> 00:06:24,520 Both methods produce similar reenactment results.
56 00:06:24,900 --> 00:06:29,380 However, note that Thies et al. use a geometric teeth proxy, which leads to artificially shaped
57 00:06:29,380 --> 00:06:30,100 mouth regions.
58 00:06:30,900 --> 00:06:31,700 Thank you for watching.
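The cross-validation described at 04:11 withholds the second half of the video and measures how far the re-rendered mouth deviates from the held-out ground-truth frames using optical flow. The narration does not name the flow algorithm or the error statistic, so the sketch below is one plausible reading, using OpenCV's Farnebäck flow and reporting the mean per-pixel displacement in pixels as the re-rendering error; the function name `rerendering_error` is hypothetical.

```python
import cv2
import numpy as np

def rerendering_error(synthesized_bgr, ground_truth_bgr):
    """Mean optical-flow magnitude (in pixels) between a re-rendered frame
    and the held-out ground-truth frame: lower means a closer match."""
    synth_gray = cv2.cvtColor(synthesized_bgr, cv2.COLOR_BGR2GRAY)
    gt_gray = cv2.cvtColor(ground_truth_bgr, cv2.COLOR_BGR2GRAY)

    # Dense Farnebäck flow from the synthesized frame to the ground truth.
    flow = cv2.calcOpticalFlowFarneback(
        synth_gray, gt_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # Per-pixel displacement magnitude, averaged over the frame.
    magnitude = np.linalg.norm(flow, axis=2)
    return float(magnitude.mean())

# Toy usage: error between a textured frame and a slightly shifted copy of it.
noise = (np.random.default_rng(1).random((120, 160, 3)) * 255).astype(np.uint8)
frame = cv2.GaussianBlur(noise, (7, 7), 0)
shifted = np.roll(frame, shift=2, axis=1)   # inject a 2-pixel horizontal shift
print(rerendering_error(frame, shifted))    # should report roughly 2 pixels
```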