AI image generators have access to hundreds of millions of picture examples. Artificial intelligence image generators are capable of generating everything from realistic portraits to dreamscapes from a text description alone.
But despite their huge creative potential, new evidence shows that they have a surprising aesthetic comfort zone.
How 100 Cycles of AI Feedback Lead to the Same Images?
Recently, scientists learned that when image generators play a visual game of telephone, with one image generator creating an image and then another image generator creating an image based on a description of the first image generated, they eventually wind up stuck in a cycle of reproducing a few generic image styles.
Think of hotel art: lighthouses in front of a seashore, dark city skylines…
AI image generators have access to hundreds of millions of picture examples. Artificial intelligence image generators are capable of generating everything from realistic portraits to dreamscapes from a text description alone.
But despite their huge creative potential, new evidence shows that they have a surprising aesthetic comfort zone.
How 100 Cycles of AI Feedback Lead to the Same Images?
Recently, scientists learned that when image generators play a visual game of telephone, with one image generator creating an image and then another image generator creating an image based on a description of the first image generated, they eventually wind up stuck in a cycle of reproducing a few generic image styles.
Think of hotel art: lighthouses in front of a seashore, dark city skylines, and cozy cottage facades.
The research was conducted on two common AI models to see how well they work. The models were used to relay pictures from each other. Stable Diffusion XL was used to produce an image based on a text description.
The description was used as input for LLaVA. The description was of the image produced from the previous text description. The procedure was repeated for 100 cycles. The research was documented in the journal Patterns.
Credits: Theoutpost.ai
The starting point of each visual task could be as specific and eccentric as desired. The example: “As I sat particularly alone, surrounded by nature, I found an old book with exactly eight pages that told a story in a forgotten language waiting to be read and understood.” However eccentric the point of departure, the destination of each visual task has led back to similar ground.
In 1,000 different versions of telephone games, it was revealed that most image sequences finally converged on just 12 different visual styles. This was always accompanied by a transformation, which was either sudden or gradual.
However, it was almost always the case. By round 100, the AI was likely to have moved on from whatever visual style it was promoting towards what they term “visual elevator music.”
How Training Data Anchors Artificial Creativity?
The most frequent recurring themes were lighthouses at sea, formal interior settings, night scenes of cities, and aged buildings in the countryside. None of these are exciting or particularly creative, more like background hum – safe, predictable, and completely forgettable.
To check if such a problem was exclusive to Stable Diffusion and LLaVA, various permutations of models were tried. These models included similar tendencies to gravitate towards a restricted vocabulary of image motifs. Even after the game had been expanded beyond 1,000 turns, a similar trend continued to repeat around 100 turns, with trivial differences branching out in the remaining turns.
What makes this scenario especially intriguing is its contrast with human behavior. When people play a game of telephone, the information being conveyed gets severely twisted simply because each person listens differently, has a different recall, and brings a different set of biases with him or her.
But AI models have the opposite problem. They’re too predictable, too similar with respect to the patterns they have detected in their training data. Regardless of how exotic the initial conversation topic is, the AI models will inevitably return to the patterns that have been seen most frequently during training.
Why AI Favors the Familiar?
It’s as if you had a conversation partner who could always find his way back to the same short list of topics that he’s comfortable with, regardless of the conversation topic you started with.
And this leads to questions about AI creativity, or rather, the lack thereof. These AIs are not, in fact, inventing new ways of visually communicating. They are remixing and repackaging information from what they have been trained on, and when given the liberty to continue on with this process of iterative creation, they fall back on the most obvious course.
This is also a function of the types of things the human species wishes to photograph and post on the worldwide web. If the training sets include a thousand images of lighthouses and night scenes in the city, the AI models are bound to be interested in these types of scenarios. The models basically reflect our own preferences.
But it also points to a flaw in today’s AI. It’s always a breeze for these models to mimic an existing style. But to instill it with a sense of true taste, originality, and a vision to pioneer a brand-new aesthetic direction, now that’s a challenge even AI’s superior processing power has not been able to overcome.
It has the capability to create visually striking images. But left alone to run its course, it will resort to a visual equivalent of elevator music—that’s pleasant enough, but forgettable.