[Paper Review] Everybody Dance Now

warmwall 2020. 6. 14. 11:55

Submit: Caroline Chan, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros. ICCV (2019)
Paper: https://arxiv.org/abs/1808.07371

0. Summary

Proposes a simple method for "do as I do" motion transfer
Per-frame img2img translation
- run pose detection on the source and map the pose onto the target
- use a dedicated GAN for the face to make it look more natural

1. Learning

Takes the pix2pix architecture and customizes it for this problem
Training
- based on a conditional GAN
- image -> pose estimation -> the discriminator treats the true image as real and the image generated from the pose estimate as fake
- adding a perceptual VGG loss on top of the discriminator feature-matching loss improves performance slightly
- temporal smoothing, plus an additional GAN applied to the face
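
The feature-matching term in the list above can be illustrated with a small sketch. It assumes that the discriminator exposes its intermediate activations as flat lists of floats and that the per-layer distance is a mean absolute difference; `feature_matching_loss` and `l1_mean` are hypothetical helper names, not code from the paper.

```python
from typing import List, Sequence


def l1_mean(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def feature_matching_loss(
    real_feats: List[Sequence[float]],
    fake_feats: List[Sequence[float]],
) -> float:
    """Average the per-layer L1 distance between discriminator activations
    for the real image and for the generated image."""
    if len(real_feats) != len(fake_feats):
        raise ValueError("both inputs must cover the same layers")
    per_layer = [l1_mean(r, f) for r, f in zip(real_feats, fake_feats)]
    return sum(per_layer) / len(per_layer)


# Toy activations from two hypothetical discriminator layers.
real = [[1.0, 2.0], [0.0, 4.0, 2.0]]
fake = [[1.0, 0.0], [0.0, 1.0, 2.0]]
loss = feature_matching_loss(real, fake)
```

A perceptual VGG loss has the same shape; the only difference is that the activations come from a pretrained VGG network instead of the discriminator.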

(a) Pose estimation

Given a frame image, build a pose stick figure (using a pretrained model).
- With OpenPose, a total of 135 keypoints are available.
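
To make the "stick figure" input concrete, here is a minimal sketch that rasterizes keypoints into a binary image by drawing line segments between connected joints. The 5-joint skeleton in `EDGES` and the joint names are made up for illustration; the real OpenPose layout has many more joints, and the paper's rendering may differ.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) in pixel coordinates

# Hypothetical 5-joint skeleton, far smaller than OpenPose's real layout.
EDGES: List[Tuple[str, str]] = [
    ("head", "neck"), ("neck", "hip"), ("hip", "l_ankle"), ("hip", "r_ankle"),
]


def draw_line(canvas: List[List[int]], p: Point, q: Point) -> None:
    """Mark the pixels on the segment p-q using Bresenham's algorithm."""
    (x0, y0), (x1, y1) = p, q
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        canvas[y0][x0] = 1
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def render_stick_figure(
    keypoints: Dict[str, Point], width: int, height: int
) -> List[List[int]]:
    """Return a height x width grid with 1s along the skeleton edges."""
    canvas = [[0] * width for _ in range(height)]
    for a, b in EDGES:
        if a in keypoints and b in keypoints:  # skip joints the detector missed
            draw_line(canvas, keypoints[a], keypoints[b])
    return canvas


pose = {"head": (4, 0), "neck": (4, 2), "hip": (4, 5),
        "l_ankle": (2, 8), "r_ankle": (6, 8)}
figure = render_stick_figure(pose, width=9, height=9)
```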

(b) Temporal smoothing

Generate the new image from the previously generated image combined with the pose estimation result (two frames are considered at a time)
Discriminator: real image + pose estimation --> real / generated image + pose estimation --> fake
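
The data flow can be sketched as follows. This is only an illustration of the conditioning scheme: `toy_generator` is a stand-in that blends its inputs (a real generator would be a trained network), frames are flat lists of floats, and the first frame is conditioned on an all-zero "previous output", which is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

Frame = List[float]  # a flattened toy "image"


def toy_generator(pose_t: Frame, out_prev: Frame) -> Frame:
    """Stand-in for G: blends the current pose with the previous output."""
    return [0.5 * p + 0.5 * o for p, o in zip(pose_t, out_prev)]


def generate_sequence(
    poses: List[Frame], generator: Callable[[Frame, Frame], Frame]
) -> List[Frame]:
    """Generate frames one by one, feeding each output into the next step."""
    out_prev = [0.0] * len(poses[0])  # placeholder for the first frame
    outputs: List[Frame] = []
    for pose_t in poses:
        out_t = generator(pose_t, out_prev)
        outputs.append(out_t)
        out_prev = out_t
    return outputs


def discriminator_pairs(
    poses: List[Frame], real_frames: List[Frame], fake_frames: List[Frame]
) -> Tuple[list, list]:
    """Build two-frame inputs: (pose_prev, pose_t, img_prev, img_t).
    The 'real' list uses ground-truth images, the 'fake' list generated ones."""
    real, fake = [], []
    for t in range(1, len(poses)):
        cond = (poses[t - 1], poses[t])
        real.append(cond + (real_frames[t - 1], real_frames[t]))
        fake.append(cond + (fake_frames[t - 1], fake_frames[t]))
    return real, fake


poses = [[1.0, 1.0], [0.0, 2.0]]
real_frames = [[0.9, 0.9], [0.1, 1.8]]
outputs = generate_sequence(poses, toy_generator)
real_pairs, fake_pairs = discriminator_pairs(poses, real_frames, outputs)
```

Because each output depends on the previous one, frames have to be generated in order at inference time.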

(c) Face GAN

Apply pix2pix to the face region, using the face part of the stick figure together with the face image
This brings the result closer to the ground truth; fine details are preserved

Training the full-image GAN (G, D)
L_smooth + L_FM + L_VGG (loss on the previous and current frames)

Freeze G and D and train only the FaceGAN part
L_face (GAN loss) + L_VGG (comparatively simple)
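
Written out, the two training stages look roughly like the following. This is a reconstruction in this review's notation, not a quotation from the paper: x is a pose stick figure, y the matching ground-truth frame, the subscript F marks the face crop, \hat{y}_F is the refined face output, and the weights \lambda_{FM} and \lambda_{VGG} are assumed hyperparameters. A single discriminator is shown for brevity.

```latex
% Stage 1: full-image GAN (G, D)
\min_{G} \Big( \max_{D} \mathcal{L}_{\mathrm{smooth}}(G, D)
  + \lambda_{FM}\, \mathcal{L}_{FM}(G, D)
  + \lambda_{VGG} \big( \mathcal{L}_{VGG}(G(x_{t-1}), y_{t-1})
  + \mathcal{L}_{VGG}(G(x_{t}), y_{t}) \big) \Big)

% Stage 2: FaceGAN (G_f, D_f), with G and D frozen
\min_{G_f} \Big( \max_{D_f} \mathcal{L}_{\mathrm{face}}(G_f, D_f)
  + \lambda_{VGG}\, \mathcal{L}_{VGG}(\hat{y}_F, y_F) \Big)
```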

LSGAN is used for training
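
For reference, LSGAN replaces the usual log-loss with a least-squares objective. The sketch below uses the common 0/1 target coding on plain lists of discriminator scores; the function names are illustrative and the 0.5 scaling follows the standard LSGAN formulation, which may not match the authors' code exactly.

```python
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def lsgan_d_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Discriminator: push scores on real samples to 1 and on fakes to 0."""
    real_term = mean([(s - 1.0) ** 2 for s in d_real])
    fake_term = mean([s ** 2 for s in d_fake])
    return 0.5 * (real_term + fake_term)


def lsgan_g_loss(d_fake: Sequence[float]) -> float:
    """Generator: push the discriminator's scores on fakes toward 1."""
    return 0.5 * mean([(s - 1.0) ** 2 for s in d_fake])


# Toy discriminator scores for two real and two generated samples.
d_loss = lsgan_d_loss(d_real=[1.0, 0.8], d_fake=[0.0, 0.2])
g_loss = lsgan_g_loss(d_fake=[0.0, 0.2])
```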

(d) Transfer Network

Run pose estimation on the source frame to build a pose stick figure.
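Normalize it to adjust for height/size: Pose Normalization

The heuristic part
Compute the height and ankle position and normalize based on them
Because the source and target distributions differ, the results can fall somewhat short.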
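
A simplified version of this normalization can be sketched as below. It assumes a single reference pose per subject, one global scale from body height, and a translation that aligns the ankles; the actual procedure in the paper is more involved (it accounts for how far the subject stands from the camera), so treat this only as an illustration. Joint names and `normalize_pose` are hypothetical.

```python
from typing import Dict, Tuple

Pose = Dict[str, Tuple[float, float]]  # joint name -> (x, y), y grows downward


def body_stats(pose: Pose) -> Tuple[float, float, float]:
    """Return (ankle midpoint x, lowest ankle y, head-to-ankle height)."""
    ankle_x = (pose["l_ankle"][0] + pose["r_ankle"][0]) / 2
    ankle_y = max(pose["l_ankle"][1], pose["r_ankle"][1])
    height = ankle_y - pose["head"][1]
    return ankle_x, ankle_y, height


def normalize_pose(source: Pose, target_ref: Pose) -> Pose:
    """Scale the source pose to the target's height and move its ankles
    onto the target's ankle position."""
    sx, sy, sh = body_stats(source)
    tx, ty, th = body_stats(target_ref)
    if sh <= 0:
        raise ValueError("source pose has non-positive height")
    scale = th / sh
    return {
        name: (tx + scale * (x - sx), ty + scale * (y - sy))
        for name, (x, y) in source.items()
    }


source_pose = {"head": (10.0, 0.0), "l_ankle": (8.0, 100.0), "r_ankle": (12.0, 100.0)}
target_pose = {"head": (50.0, 20.0), "l_ankle": (48.0, 70.0), "r_ankle": (52.0, 70.0)}
normalized = normalize_pose(source_pose, target_pose)
```

In this toy case the source subject is twice as tall as the target in the frame, so every offset from the ankles is halved before being moved to the target's position.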

Generate the image of the target person

3. Results

Compares three cases: pix2pix / T.S. (temporal smoothing) / T.S. + FaceGAN
The comparison uses validation data in which the source and target are the same person.
- mean pose distance (lower is better): MSE between the stick figures of the ground truth and the output
  - Temporal smoothing alone does not improve it much, but adding FaceGAN improves it considerably
- missed detection (fewer is better)
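
Both metrics can be sketched on keypoint dictionaries. This follows the description above (squared error between corresponding keypoints, and a count of ground-truth joints that the detector fails to find on the output); the exact definitions used in the paper may differ, and the function names are illustrative.

```python
from typing import Dict, Tuple

Pose = Dict[str, Tuple[float, float]]  # joint name -> (x, y)


def mean_pose_distance(gt: Pose, pred: Pose) -> float:
    """Mean squared distance over joints detected in both poses."""
    shared = [k for k in gt if k in pred]
    if not shared:
        raise ValueError("no joints in common")
    total = sum(
        (gt[k][0] - pred[k][0]) ** 2 + (gt[k][1] - pred[k][1]) ** 2
        for k in shared
    )
    return total / len(shared)


def missed_detections(gt: Pose, pred: Pose) -> int:
    """Number of ground-truth joints that were not detected on the output."""
    return sum(1 for k in gt if k not in pred)


# Pose detected on the ground-truth frame vs. on the generated frame.
gt_pose = {"head": (0.0, 0.0), "neck": (0.0, 2.0), "hip": (0.0, 5.0)}
out_pose = {"head": (0.0, 0.0), "neck": (3.0, 6.0)}

distance = mean_pose_distance(gt_pose, out_pose)
missed = missed_detections(gt_pose, out_pose)
```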
