A new AI system takes its inspiration from humans: when we see a colour on one object, we can easily imagine it applied to any other object, substituting the original colour with the new one.

(Chris Kim)

Very few species on Earth, humans among them, are gifted with the ability to imagine. We can form mental pictures and visualise things that may not be present in our field of vision at the moment. We can decompose an imagined object into its attributes or components, such as imagining a red apple with a green twig. We can also swap those colour attributes to picture a green apple with a red twig.

Using our ability to imagine, we can tell fantastical stories, take inspiration from existing events and modify them into new stories, and even paint imaginative pictures of how the future might look. Researchers from the University of Southern California, United States, have now designed an artificial intelligence (AI) algorithm that can visualise the unseen.

The scientists claim that their AI can decompose images of red boats and blue cars and recombine them to synthesise novel images of red cars. The research was accepted for publication at the International Conference on Learning Representations (ICLR) 2021, where it was recently presented as a poster.

When our brain imagines something unreal, several neural networks are activated to enable it. While machines today outperform humans at many tasks, they still lack this basic human faculty. The researchers are therefore training AI to replicate the human ability of imagination: to distil attributes such as colour, shape, texture, pose and position from an object and use them to create new objects with novel characteristics.

“We were inspired by human visual generalisation capabilities to try to simulate human imagination in machines,” says the study’s lead author Yunhao Ge, a computer science PhD student. AI algorithms work by extrapolation: given a large enough set of samples, an AI can generate novel samples that preserve the required traits while varying others. In this case, the AI is essentially trained to produce what we commonly know as “deepfakes”, using a concept called “disentanglement”.

In recent years, deepfakes have been among the most prominent drivers of hoaxes and bogus content spreading on the internet. The term is a portmanteau of “deep learning”, the training of an AI algorithm, and “fake”. You might have come across deepfakes in a viral video of a celebrity saying things that seem very unlike them; on further checking, you might find fact-checks of the same video establishing that someone superimposed the spoken content onto the celebrity’s face.

This process of substituting a person’s identity (face) while preserving the original movement of the mouth is essentially disentanglement. The attributes of an image or video, such as shape, colour and movement, are broken down into their simplest components and then used to synthesise novel content on a computer.

In this study, the AI was provided with a group of sample images. It set one image aside and mined the remaining images for similarities and differences, producing an inventory of the available sample features. This is what the researchers call “controllable disentangled representation learning”. The AI then recombines this information to achieve “controllable novel image synthesis”, or, in human terms, imagination.
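The core idea can be illustrated with a toy sketch. This is not the authors' actual model: it assumes, hypothetically, that an encoder has already mapped each image to a latent vector whose named blocks (shape, colour, background) are fully disentangled, so that "imagining" a new object amounts to swapping one block between two latents before decoding.

```python
# Hypothetical illustration of attribute swapping in a disentangled
# latent space. Block names, sizes and helper functions are invented
# for this sketch; a real model would learn these representations.
BLOCKS = {"shape": slice(0, 4), "colour": slice(4, 8), "background": slice(8, 12)}

def swap_attribute(z_a, z_b, attribute):
    """Return a new latent taking `attribute` from z_b and everything else from z_a."""
    z_new = list(z_a)
    z_new[BLOCKS[attribute]] = z_b[BLOCKS[attribute]]
    return z_new

# Two toy latents standing in for encode(red_boat) and encode(blue_car)
z_red_boat = [float(i) for i in range(12)]
z_blue_car = [float(i) + 100 for i in range(12)]

# Keep the car's shape and background, but take the boat's colour:
# a "red car" latent, which a decoder would render as a novel image.
z_red_car = swap_attribute(z_blue_car, z_red_boat, "colour")
```

In the real system, the swap happens between learned representations rather than hand-labelled slices, which is what makes the disentanglement step hard and the synthesis "zero-shot".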

“For instance, take the Transformers movies as an example,” said Ge. “It can take the shape of a Megatron car, the colour and pose of a yellow Bumblebee car, and the background of New York’s Times Square. The result will be a Bumblebee-coloured Megatron car driving in Times Square, even if this sample was not witnessed during the training session.”

This study is proof that no scientific advancement can be categorised as an absolute boon or bane. When placed in the wrong hands, the same technology can be used to produce deepfake images and videos and can spread misinformation and fake news. On the other hand, the researchers of this study have provided a potential use of the same for the greater good.

The framework used here is compatible with nearly any type of data and opens up a range of possibilities. In medicine, for example, doctors could potentially disentangle a drug’s medicinal function from the factors unique to an individual patient, and design targeted drugs by recombining it with the relevant factors of other patients.

“Deep learning has already demonstrated unsurpassed performance and promise in many domains, but all too often this has happened through shallow mimicry, and without a deeper understanding of the separate attributes that make each object unique,” said Laurent Itti, a professor of computer science at the University of Southern California and the principal investigator of this study. “This new disentanglement approach, for the first time, truly unleashes a new sense of imagination in A.I. systems, bringing them closer to humans’ understanding of the world,” he added.

The study, titled Zero-shot Synthesis with Group-Supervised Learning, was published at the 2021 International Conference on Learning Representations and can be accessed here.