WebProNews

Apple Publishes First AI Research Paper on Using Adversarial Training to Improve Realism of Synthetic Imagery

Earlier this month, Apple pledged to begin publishing its artificial intelligence research. During the holiday week, it released its first AI research paper, which details how its engineers and computer scientists used adversarial training to improve synthetic, computer-game-style images. Such images are frequently used to help machines learn, but their quality is typically poor.

The paper’s authors are Ashish Shrivastava, a deep learning researcher; Tomas Pfister, also a deep learning scientist at Apple; Wenda Wang, an Apple R&D engineer; Russ Webb, a senior research engineer; Oncel Tuzel, a machine learning researcher; and Joshua Susskind, a deep learning scientist who co-founded Emotient in 2012.


The team describes how making synthetic images more realistic improves the machine learning models trained on them:

With recent progress in graphics, it has become more tractable to train models on synthetic images, potentially avoiding the need for expensive annotations. However, learning from synthetic images may not achieve the desired performance due to a gap between synthetic and real image distributions. To reduce this gap, we propose Simulated+Unsupervised (S+U) learning, where the task is to learn a model to improve the realism of a simulator’s output using unlabeled real data, while preserving the annotation information from the simulator.
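To make the S+U setup concrete, here is a minimal sketch in plain Python. It assumes images are flat lists of pixel intensities and uses a hypothetical `refine` function in place of Apple's learned refiner network. It illustrates the one property the abstract stresses: the refiner changes the pixels, but the label that came from the simulator is carried through untouched.

```python
from typing import Callable, List, Tuple

Image = List[float]          # hypothetical: a flattened grayscale image in [0, 1]
Label = Tuple[float, float]  # hypothetical: e.g. a (yaw, pitch) gaze annotation

def refine(image: Image, gain: float = 0.9, bias: float = 0.05) -> Image:
    """Hypothetical stand-in for the learned refiner (not Apple's model).

    A real refiner is a neural network trained adversarially on unlabeled
    real images; this per-pixel affine map, clipped to [0, 1], only makes
    the data flow runnable.
    """
    return [min(1.0, max(0.0, gain * p + bias)) for p in image]

def build_training_set(
    simulated: List[Tuple[Image, Label]],
    refiner: Callable[[Image], Image] = refine,
) -> List[Tuple[Image, Label]]:
    """Refine each synthetic image while reusing the simulator's label as-is."""
    return [(refiner(img), label) for img, label in simulated]

# Usage: two synthetic samples, each with its simulator-provided annotation.
simulated = [([0.0, 0.5, 1.0], (10.0, -5.0)), ([0.2, 0.2, 0.8], (0.0, 3.0))]
refined = build_training_set(simulated)
# Pixel values change, but each label is the simulator's original annotation.
print([label for _, label in refined])  # [(10.0, -5.0), (0.0, 3.0)]
```

The point of the design is that no human ever labels a real image: the annotations are free because the simulator generated them, and the refiner is only trusted to alter appearance.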

We developed a method for S+U learning that uses an adversarial network similar to Generative Adversarial Networks (GANs), but with synthetic images as inputs instead of random vectors. We make several key modifications to the standard GAN algorithm to preserve annotations, avoid artifacts and stabilize training: (i) a ‘self-regularization’ term, (ii) a local adversarial loss, and (iii) updating the discriminator using a history of refined images. We show that this enables generation of highly realistic images, which we demonstrate both qualitatively and with a user study.
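The abstract names the three modifications without spelling them out. The sketch below is one reading of them in plain Python, not Apple's code, and rests on stated assumptions: the discriminator emits, for each local image patch, the probability that the patch looks synthetic; self-regularization is an L1 penalty between refined and synthetic pixels; and the history buffer supplies half of each discriminator batch. The weight `lam`, the buffer `capacity`, and all function and class names are hypothetical.

```python
import math
import random
from typing import List, Sequence

EPS = 1e-7  # keeps log() finite when a probability reaches 0 or 1

def local_adversarial_loss(patch_probs: Sequence[float]) -> float:
    """Refiner's realism term, summed over local patches of one refined image.

    Assumption: patch_probs[k] is the discriminator's probability that patch k
    looks synthetic. The refiner wants every patch near 0, so each patch
    costs -log(1 - p).
    """
    return sum(-math.log(max(1.0 - p, EPS)) for p in patch_probs)

def self_regularization(refined: Sequence[float], synthetic: Sequence[float]) -> float:
    """L1 distance keeping the refined image close to its synthetic input,
    so the simulator's annotation still describes it."""
    return sum(abs(r - s) for r, s in zip(refined, synthetic))

def refiner_loss(patch_probs: Sequence[float], refined: Sequence[float],
                 synthetic: Sequence[float], lam: float = 0.5) -> float:
    """Local adversarial loss plus lambda-weighted self-regularization."""
    return local_adversarial_loss(patch_probs) + lam * self_regularization(refined, synthetic)

def discriminator_loss(refined_probs: Sequence[float], real_probs: Sequence[float]) -> float:
    """Per-patch cross-entropy: refined patches should score near 1, real ones near 0."""
    return (sum(-math.log(max(p, EPS)) for p in refined_probs)
            + sum(-math.log(max(1.0 - p, EPS)) for p in real_probs))

class ImageHistoryBuffer:
    """Fixed-size pool of earlier refined images used when updating the discriminator."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity  # assumed > 0
        self.images: List[list] = []
        self.rng = random.Random(seed)

    def mix(self, current: List[list]) -> List[list]:
        """Build a discriminator batch: half fresh refined images, half from history."""
        half = len(current) // 2
        if half == 0 or len(self.images) < half:
            batch = list(current)  # warm-up: not enough history yet
        else:
            batch = current[half:] + self.rng.sample(self.images, half)
        self._store(current[:max(half, 1)])
        return batch

    def _store(self, fresh: List[list]) -> None:
        """Append until full, then overwrite randomly chosen slots."""
        for img in fresh:
            if len(self.images) < self.capacity:
                self.images.append(img)
            else:
                self.images[self.rng.randrange(self.capacity)] = img
```

In a real training loop these terms would be computed on tensors and minimized by gradient descent; the scalars here only show what each term rewards. Scoring patches separately keeps the discriminator from fixating on a single global artifact, and showing it older refined images discourages the refiner from reintroducing artifacts the discriminator has already forgotten.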

We quantitatively evaluate the generated images by training models for gaze estimation and hand pose estimation. We show a significant improvement over using synthetic images, and achieve state-of-the-art results on the MPIIGaze dataset without any labeled real data.
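The abstract does not state the gaze metric. Gaze estimators are commonly scored by the mean angle, in degrees, between predicted and true 3D gaze directions, and the sketch below assumes that metric. The comparison would then be: train one model on raw synthetic images and another on refined ones, and score both on the same real test set. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

def angular_error_deg(pred: Vec3, true: Vec3) -> float:
    """Angle in degrees between a predicted and a true gaze direction.

    Assumes both vectors are nonzero; they need not be unit length.
    """
    dot = sum(p * t for p, t in zip(pred, true))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in true))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp rounding noise before acos
    return math.degrees(math.acos(cos))

def mean_angular_error(preds: Sequence[Vec3], truths: Sequence[Vec3]) -> float:
    """Average error over an evaluation set; lower is better."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, truths)]
    return sum(errors) / len(errors)

# Usage: identical directions give zero error.
print(angular_error_deg((0.0, 0.0, -1.0), (0.0, 0.0, -1.0)))  # 0.0
```

Under this metric, "a significant improvement over using synthetic images" would mean the model trained on refined images has a lower mean angular error on real test data than the one trained on raw synthetic images.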

Conclusions and Future Work

“We have proposed Simulated+Unsupervised learning to refine a simulator’s output with unlabeled real data,” the Apple AI scientists say. “S+U learning adds realism to the simulator and preserves the global structure and the annotations of the synthetic images. We described SimGAN, our method for S+U learning, that uses an adversarial network and demonstrated state-of-the-art results without any labeled real data.”

They added, “In future, we intend to explore modeling the noise distribution to generate more than one refined image for each synthetic image, and investigate refining videos rather than single images.”

View the research paper (PDF).