DO 2D GANS KNOW 3D SHAPE? UNSUPERVISED 3D SHAPE RECONSTRUCTION FROM 2D IMAGE GANS

Gunahnkr
2 min read · May 3, 2021


Figure 1: The first column shows images generated by off-the-shelf 2D GANs trained on RGB images only, while the rest show that our method can unsupervisedly reconstruct 3D shape (viewed in 3D mesh, surface normal, and texture) given a single 2D image by exploiting the geometric cues contained in GANs. The last two columns depict 3D-aware image manipulation effects (rotation and relighting) enabled by our framework. More results are provided in the Appendix.
Figure 2: Framework outline. Starting with an initial ellipsoid 3D shape (viewed in the surface normal), our approach renders various ‘pseudo samples’ with different viewpoints and lighting conditions. GAN inversion is applied to these samples to obtain the ‘projected samples’, which are used as the ground truth of the rendering process to refine the initial 3D shape. This process is repeated until more precise results are obtained.
Figure 3: Method overview. (a) Given a single image, Step 1 initializes the depth with an ellipsoid (viewed in surface normal) and optimizes the albedo network A. (b) Step 2 uses the depth and albedo to render ‘pseudo samples’ with various random viewpoints and lighting conditions, and applies GAN inversion to them to obtain the ‘projected samples’. (c) Step 3 refines the depth map by optimizing the (V, L, D, A) networks to reconstruct the projected samples. The refined depth and models are used as the new initialization to repeat the above steps.

StyleGAN2-ADA consists of two networks (a toy sketch follows the list):

  1. Mapping network (M): maps the latent vector z in the input space Z to an intermediate latent vector w in the space W.
  2. Synthesis network (G): maps w to the output image.
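As a rough illustration of this two-stage structure, here is a minimal PyTorch sketch. The layer sizes and the toy SynthesisNetwork are placeholder assumptions, not the real StyleGAN2-ADA architecture (which uses modulated convolutions with per-layer style injection).

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Maps a latent z in the input space Z to an intermediate latent w in W."""
    def __init__(self, z_dim=512, w_dim=512, num_layers=8):
        super().__init__()
        layers = []
        for i in range(num_layers):
            layers += [nn.Linear(z_dim if i == 0 else w_dim, w_dim),
                       nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)

class SynthesisNetwork(nn.Module):
    """Toy stand-in for G: maps w to an RGB image."""
    def __init__(self, w_dim=512, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.to_rgb = nn.Linear(w_dim, 3 * img_size * img_size)

    def forward(self, w):
        x = self.to_rgb(w)
        return x.view(-1, 3, self.img_size, self.img_size)

mapping, synthesis = MappingNetwork(), SynthesisNetwork()
z = torch.randn(1, 512)   # latent sampled from Z
w = mapping(z)            # intermediate latent in W
image = synthesis(w)      # generated image, shape (1, 3, 64, 64)
```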

The framework follows the photo-geometric autoencoding design of Wu et al. (2020). Given an image I ∈ R^(3×H×W), it adopts four networks that predict four factors (d, a, v, l):

D: Depth map

A: Albedo image

V: Viewpoint

L: Light direction
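To make this decomposition concrete, here is a minimal sketch of the four-network interface, assuming toy PyTorch architectures; the layer shapes are illustrative placeholders, and the output dimensionalities for v and l (6 and 4) follow Wu et al.'s parameterization but are assumptions here, not details taken from this post.

```python
import torch
import torch.nn as nn

H = W = 64  # assumed working resolution

def factor_encoder(out_dim):
    """Tiny encoder regressing a low-dimensional factor (viewpoint / light)."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, out_dim))

def factor_net(out_ch):
    """Tiny conv net regressing an image-shaped factor (depth / albedo)."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_ch, 3, padding=1))

D = factor_net(1)       # depth map d:    (B, 1, H, W)
A = factor_net(3)       # albedo image a: (B, 3, H, W)
V = factor_encoder(6)   # viewpoint v:    rotation + translation parameters
L = factor_encoder(4)   # lighting l:     direction + ambient/diffuse terms

I = torch.randn(1, 3, H, W)            # a single RGB image in R^(3xHxW)
d, a, v, l = D(I), A(I), V(I), L(I)    # the four predicted factors
# A differentiable renderer would then reconstruct the input image from the
# four factors: I_hat ≈ render(d, a, v, l)
```

In the actual method these networks are trained jointly through the renderer's reconstruction loss, not in isolation.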

Methodology

  1. Using a Weak Shape Prior: the method builds on the observation that many objects, including faces and cars, have a roughly convex shape, so the depth map is initialized with an ellipsoid.
  2. Sampling and Projecting to the GAN Image Manifold: “pseudo samples” are created by rendering the current shape under a number of random viewpoints and lighting directions, and each one is then reconstructed with the GAN generator via GAN inversion (GAN inversion inverts a given image back into the latent space of a pretrained GAN, so that the generator faithfully reconstructs the image from the inverted code); the reconstructions are the “projected samples”.
  3. Learning the 3D Shape: the projected samples serve as pseudo ground truth for the rendering process, and the (V, L, D, A) networks are optimized to reconstruct them, refining the depth map. The refined depth is then used as the new initialization and the steps are repeated; a toy sketch of the full loop follows this list.
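The sketch below is a self-contained toy version of this render → invert → refine loop, not the paper's implementation: a tiny linear generator stands in for the pretrained StyleGAN2, a crude shading function stands in for the differentiable renderer, viewpoint warping is omitted, and every name (G, init_ellipsoid_depth, invert) is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: a tiny linear "generator" replaces the pretrained StyleGAN2,
# and a crude shading function replaces the differentiable renderer.
G = nn.Sequential(nn.Linear(128, 3 * 32 * 31), nn.Unflatten(1, (3, 32, 31)))
for p in G.parameters():
    p.requires_grad_(False)  # the GAN is a fixed prior; only latents move

def init_ellipsoid_depth(h, w):
    """Step 1: weak shape prior -- initialize depth as a convex ellipsoid."""
    ys = torch.linspace(-1, 1, h).view(-1, 1).expand(h, w)
    xs = torch.linspace(-1, 1, w).view(1, -1).expand(h, w)
    return torch.sqrt(torch.clamp(1 - 0.5 * (xs**2 + ys**2), min=1e-4))

def render(depth, albedo, light):
    """Toy Lambertian-style shading from depth slopes (no viewpoint warp)."""
    dzdx = depth[..., :, 1:] - depth[..., :, :-1]   # finite differences
    shading = torch.clamp(light[0] + light[1] * torch.tanh(dzdx), 0, 1)
    return albedo[..., :, 1:] * shading

def invert(target, steps=100, lr=0.05):
    """GAN inversion: optimize a latent code so G reproduces the target."""
    z = torch.zeros(1, 128, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(G(z), target)
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z).detach()                             # 'projected sample'

depth = init_ellipsoid_depth(32, 32)[None, None].requires_grad_(True)
albedo = torch.rand(1, 3, 32, 32)
opt = torch.optim.Adam([depth], lr=0.01)
for _ in range(2):                                   # outer refinement rounds
    batch = []
    for _ in range(4):                               # Step 2: random lights
        light = torch.rand(2)
        pseudo = render(depth, albedo, light)        # 'pseudo sample'
        batch.append((invert(pseudo.detach()), light))
    for target, light in batch:                      # Step 3: refine shape
        loss = F.mse_loss(render(depth, albedo, light), target)
        opt.zero_grad(); loss.backward(); opt.step()
```

Even in this toy form the key design point survives: the pseudo samples are detached before inversion, so the frozen GAN acts as a fixed image prior, while the gradients in Step 3 flow only into the shape being refined.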
Figure 4: Qualitative comparisons. (a) shows the reconstructed 3D mesh, surface normal, and textured mesh of our method. (b) shows the results of Unsup3d (Wu et al., 2020). We see that results in (a) are more accurate and realistic.
Figure 5: Qualitative comparison on buildings. The first row shows the input image and our relighting effects described in Sec. 4.2. The second row shows the recovered shape (viewed in surface normal and mesh) of our method, while the last row shows the results of Unsup3d.
Figure 6: Results without symmetry assumption.
Figure 7: 3D-aware image manipulation, including rotation and relighting. We show results obtained via both 3D mesh and GANs. The input of the first row is a real natural image. Our method achieves photo-realistic manipulation effects obeying the objects’ underlying 3D structures.
Figure 8: Qualitative comparison on face rotation. “Ours (GAN)” and “Ours (3D)” indicate results generated by the GAN and rendered from the 3D mesh, respectively. The face identities in the baseline methods tend to drift during rotation.
Figure 9: Qualitative results on the LSUN Horse dataset (Yu et al., 2015).
