Tamar Rott Shaham, Tali Dekel, Tomer Michaeli, ICCV-2019
This paper proposes a novel GAN training technique that yields a generative model learned from a single image. Unlike previous works that used a single image to train for a single specific task, the resulting model can be used for unconditional generative modelling and is not limited to texture images. The paper also received the Best Paper award at ICCV 2019.
The main contribution of this paper is its training method for generative modelling from a single training image. This is done using a pyramid of fully convolutional GANs, each capturing the patch distribution of the image at a different scale.
While the idea of training on a single image has been around for quite some time (mostly for a specific task such as harmonization or style transfer), the proposed procedure takes it one step further by making the generative model unconditional (generating realistic images sampled from the training distribution by passing in random noise) and not limited to texture images.
As can be seen from the diagram, the model follows a pyramid structure, with each level fed a different scale of the training image: x_n is the version of the image downsampled by a factor r^n (r > 1). At the lowest (coarsest) level, random noise is passed through the generator, and its output (after upsampling by a factor of r) is passed on to the next level (scale n-1).
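The pyramid construction above can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' code: `downsample` is a crude nearest-neighbour stand-in for the paper's smoother resampling, and the names `build_pyramid`, `r`, and `num_scales` are hypothetical.

```python
import numpy as np

def downsample(img, factor):
    """Nearest-neighbour downsample by `factor` (crude stand-in for
    proper anti-aliased resampling; illustrative helper only)."""
    h, w = img.shape[:2]
    new_h, new_w = max(1, int(h / factor)), max(1, int(w / factor))
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]

def build_pyramid(img, r=1.25, num_scales=5):
    """x_n = image downsampled by r**n, for n = 0 (finest) .. num_scales-1 (coarsest)."""
    return [downsample(img, r ** n) for n in range(num_scales)]

x = np.random.rand(64, 64, 3)                 # stand-in "training image"
pyramid = build_pyramid(x)
print([p.shape[:2] for p in pyramid])         # strictly shrinking spatial sizes
```

Each entry of `pyramid` is the training target for one GAN in the hierarchy, so the patch statistics each discriminator sees become coarser as n grows.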
At each subsequent level, the upsampled generation from the previous level is added to random noise and passed through the generator, and that same upsampled generation is also added residually to the generator's output at this level. The process continues until the level with the original image size is reached.
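The coarse-to-fine sampling loop just described can be sketched as follows. This is a minimal numpy sketch, with identity functions standing in for the trained per-scale generators; `upsample`, `sample_cascade`, and `noise_amp` are illustrative names and values, not the authors' implementation.

```python
import numpy as np

def upsample(img, new_h, new_w):
    """Nearest-neighbour resize to (new_h, new_w); illustrative helper."""
    h, w = img.shape[:2]
    return img[np.arange(new_h) * h // new_h][:, np.arange(new_w) * w // new_w]

def sample_cascade(generators, shapes, noise_amp=0.1):
    """Coarse-to-fine sampling: the coarsest generator sees pure noise;
    every finer generator sees (upsampled previous output + noise), and
    its output is added residually to the upsampled previous output."""
    prev = None
    for G, (h, w) in zip(generators, shapes):      # coarsest -> finest
        z = np.random.randn(h, w, 3)
        if prev is None:
            prev = G(z)                            # lowest level: noise only
        else:
            up = upsample(prev, h, w)
            prev = up + G(up + noise_amp * z)      # residual generation
    return prev

# Stand-in "generators": identity maps in place of trained conv nets.
shapes = [(16, 16), (32, 32), (64, 64)]
gens = [lambda t: t] * len(shapes)
img = sample_cascade(gens, shapes)
print(img.shape)                                   # finest-scale output
```

The residual connection means each generator only needs to learn the detail missing at its scale, rather than regenerating the whole image.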
The different levels are responsible for generating different kinds of detail in the image: the lowest level generates the coarse structure, while the higher levels add and manipulate progressively finer details.
For testing, results are shown for various cases: purely unconditional generation, as well as conditional generation tasks where the same network is reused and the image to be manipulated is fed into one of the later levels rather than the lowest one.
The proposed technique for training an unconditional generative model from a single image is undoubtedly impressive, and the multi-scale architecture makes it straightforward to capture patch representations at various levels of detail.
The fact that a single network can be used for various tasks such as harmonization, inpainting, etc. with no explicit retraining is also a great strength.
However, one major drawback of this technique, quite visibly, is that same reliance on a single training image, which bottlenecks the variability of the generations the network can produce. The purely unconditional generations succeed at producing varied renditions of the same kind of image (i.e. the same class of data), but would fail to generate different classes of data.