papers_we_read

Learning to Simulate Dynamic Environments with GameGAN

Seung Wook Kim, Yuhao Zhou, Jonah Philion, Antonio Torralba, Sanja Fidler, CVPR 2020

Summary

The paper introduces GameGAN, a generative model that learns to visually imitate a desired game by ingesting screenplay and keyboard actions during training. Given a key pressed by the agent, GameGAN “renders” the next screen using a carefully designed generative adversarial network. It offers key advantages over existing work: memory module that builds an internal map of the environment, allowing for the agent to return to previously visited locations with high visual consistency. In addition, GameGAN is able to disentangle static and dynamic components within an image making the behavior of the model more interpretable, and relevant for downstream tasks that require explicit reasoning over dynamic elements. This enables many interesting applications such as swapping different components of the game to build new games that do not exist.

GameGAN ingests screenplay and keyboard actions during training and aims to predict the next frame by conditioning on the action,i.e. a key pressed by the agent. It learns from rollouts of image and action pairs directly without having access to the underlying game logic or engine. GameGAN supports several applications such as transferring a given game from one operating system to the other, without requiring to re-write code.

GameGAN

The focus is on an action-conditioned simulator in the image space where there is an egocentric agent that moves according to the given action at ∼ A at time t and generates a new observation xt+1. We assume there is also a stochastic variable zt ∼ N (0; I) that corresponds to randomness in the environment. Given the history of images x1:t along with at and zt , GameGAN predicts the next image xt+1 . GameGAN is composed of three main modules:

Dynamics Engine

where;

Memory Module

where; K, G and E are small MLPs. w is a learned shift kernel that depends on the current action, and the kernel is used to shift αt−1.

Rendering Engine

Proposed Loss functions

Main Contributions

Our two cents

Resources