Apple continues to focus on AI-powered image modification, with new studies detailing evaluation frameworks and an AI model that can turn 2D images into 3D scenes in a second. Here’s what the company’s research has revealed.
With iOS 26, Apple made it possible to turn two-dimensional photos into three-dimensional "Spatial Scenes," allowing for a more dynamic viewing experience with an added sense of depth. Now, the iPhone maker has published additional research relating to the same concept, albeit with much more impressive results.
Among other research papers made available in December 2025, Apple released a study titled "Sharp Monocular View Synthesis in Less Than a Second." The paper, which can be found on the Apple Machine Learning blog, details an open-source AI model known as "SHARP" that can turn a 2D image into a 3D scene in under a second.
Also related to modifying images with the help of AI, the company published a research paper about an evaluation framework for text-guided image editing. Another evaluation method examines the extent to which AI models understand the underlying linguistic complexity of languages other than English, particularly in terms of inflectional morphology.
Here’s a closer look at what each paper revealed.
SHARP — 3D scenes from 2D images in less than a second
Apple describes SHARP as "an approach to photorealistic view synthesis from a single image." When provided with a single photo, the model "regresses the parameters of a 3D Gaussian representation of the depicted scene," and it does so in less than a second on a typical GPU, according to the study.
SHARP can generate 3D scenes in less than a second. Image Credit: Apple.
Instead of representing a 3D scene with triangle meshes, a Gaussian representation builds up volume from millions of soft ellipsoids, or "blobs," each with its own position, size, orientation, color, and opacity. Achieving a representation like this typically requires multiple photos of the same object or area, taken from different vantage points and angles.
Apple’s SHARP, however, needs only one image to generate a 3D scene. The process involves just "a single feedforward pass through a neural network," but getting to that point required training.
The company’s researchers trained SHARP on relatively large datasets so that the model could learn common depth patterns. The goal was for SHARP to predict the depth of a scene from a single image and generate a 3D Gaussian representation based on that prediction.
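As a rough illustration of what regressing Gaussian parameters in a single pass can look like, here is a minimal PyTorch-style sketch. The layer shapes and parameterization are illustrative assumptions, not Apple’s actual architecture.

```python
# Minimal sketch: a single forward pass that maps image features to
# per-pixel 3D Gaussian parameters. Illustrative only, not Apple's SHARP.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianHead(nn.Module):
    """Predicts per-pixel Gaussian parameters from an image feature map."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # 1 depth + 3 scale + 4 rotation (quaternion) + 3 color + 1 opacity = 12
        self.head = nn.Conv2d(feat_dim, 12, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> dict:
        out = self.head(feats)                            # (B, 12, H, W)
        return {
            "depth": F.softplus(out[:, 0:1]),             # positive depth per pixel
            "scale": F.softplus(out[:, 1:4]),             # ellipsoid radii
            "rotation": F.normalize(out[:, 4:8], dim=1),  # unit quaternion
            "color": torch.sigmoid(out[:, 8:11]),         # RGB in [0, 1]
            "opacity": torch.sigmoid(out[:, 11:12]),      # blob transparency
        }

# Each pixel's predicted depth would then be unprojected through the camera
# intrinsics to place one Gaussian blob in 3D space; rendering those blobs
# produces the synthesized view.
```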
SHARP fails in select scenarios. Image Credit: Apple.
In most cases, the model successfully did what it was created to do; however, there were failures as well. Apple’s examples include objects being rendered in incorrect positions, such as a bee being placed behind a flower rather than on top of it, or the sky being interpreted as a curved surface nearby. The model can also experience issues processing complex reflections.
Another limitation is that SHARP only generates scenes based on the visible portions of an image, meaning it doesn’t try to "fill in the blanks" to create a larger 3D environment. Even so, the AI model is remarkably impressive, and it’s even publicly available on GitHub.
While SHARP focuses on creating 3D representations, Apple has published research that revolves around editing 2D images and the quality of the output.
GIE-Bench — Evaluating text-based image editing
Apple’s researchers devised an evaluation framework for text-based image editing, which grades AI model output based on functional correctness and image preservation.
Apple devised a framework for the evaluation of text-based image editing.
Functional correctness is evaluated through automatically generated multiple-choice questions that verify whether the requested change was successfully applied. Image preservation is measured with an object-aware masking technique and preservation scoring, which ensure that non-targeted areas of an image are not altered. Human-annotated image masks were also used.
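As a rough illustration of the preservation half of such a framework, the sketch below scores how much the non-targeted region changed between the original and edited images, assuming a binary mask of the edit target is already available. The function name and the simple mean-absolute-error metric are assumptions for illustration; the paper’s actual scoring uses its own object-aware masks and metrics.

```python
# Minimal sketch of object-aware preservation scoring (illustrative only).
import numpy as np

def preservation_score(original: np.ndarray, edited: np.ndarray,
                       target_mask: np.ndarray) -> float:
    """Return a score in [0, 1] for how well non-targeted pixels were kept.

    original, edited: float images in [0, 1] with shape (H, W, 3)
    target_mask: bool array of shape (H, W), True where the edit was requested
    """
    keep = ~target_mask                  # pixels that should stay untouched
    if not keep.any():
        return 1.0                       # nothing lies outside the edit region
    # Mean absolute difference over preserved pixels, inverted into a score.
    diff = np.abs(original[keep] - edited[keep]).mean()
    return float(1.0 - diff)
```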
The company used a thousand editing examples across 20 content categories to test the output of multiple editing models, including MGIE, OmniGen, and OpenAI’s GPT-Image-1.
Apple used detailed instructions and edit requests, including making the models add, remove, replace, or change the size of an object. The models were also required to change backgrounds, modify the layout, and so on.
The study revealed that GPT-Image-1 generated the most desirable results. However, the model did fail in select scenarios, where it improperly or partially removed objects, misunderstood layout change requests, or failed to preserve non-target areas in their entirety.
"Overall, while GPT-Image-1 is highly capable in executing core edits, it lacks fine-grained control over spatial relationships and content preservation, leaving room for improvement for tasks that demand high precision or minimal collateral changes," reads the study.
Meanwhile, "OneDiffusion and MagicBrush consistently achieve the highest preservation scores across all metrics." Overall, this provides Apple with a convenient way to test its own image-generation models or identify faults in competing products.
The company already has an on-device image-generation solution, known as Image Playground. It’s part of Apple Intelligence, an AI-powered suite of features that’s available in multiple languages.
IMPACT: Inflectional Morphology Probes Across Complex Typologies
Apple’s researchers decided to evaluate the performance of AI models across different languages by developing a dedicated framework focused on inflectional morphology.
Apple created a framework to evaluate how LLMs perform with morphologically rich languages.
The company’s researchers noted that AI models typically struggle with morphologically rich languages, adding that it "remains unclear to what extent these models truly grasp the underlying linguistic complexity of those languages."
To be more specific, Apple’s research examined how LLMs performed and whether or not they generated output in accordance with the inflectional morphology of Arabic, Russian, Finnish, Turkish, and Hebrew.
Inflectional morphology, as opposed to derivational morphology, describes how morphemes alter a word so that it fits a particular grammatical role without changing its core meaning. For example, suffixes can convey grammatical features such as the number of a noun or the tense of a verb.
"IMPACT includes unit-test-style cases covering both shared and language-specific phenomena, from basic verb inflections (e.g., tense, number, gender) to unique features like Arabic’s reverse gender agreement and vowel harmony in Finnish and Turkish," reads the study.
The study evaluated the performance of eight multilingual LLMs, noting that they "struggle with other languages and uncommon morphological patterns, especially when judging ungrammatical examples."
For each language, Apple’s researchers "identify templates targeting specific morphological aspects, generate controlled utterances, and test LLMs in two scenarios." The two scenarios are "one where the LLM predicts the correct inflection (Generation), and another popular scenario where the LLM judges grammatical and ungrammatical utterances."
Apple’s researchers tested AI models with fill-in-the-blanks tests.
In essence, the AI models were given patterns detailing how each language handles inflectional morphology: which morphemes indicate plurality, how tense is expressed, and so on. They were then required to complete fill-in-the-blank tests, applying the previously outlined patterns based on the surrounding words.
The LLMs then took on the role of "judges," evaluating whether a provided utterance was grammatically correct and whether it was something a native speaker would use.
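To make the two scenarios concrete, here is a minimal sketch of what a unit-test-style morphology probe could look like in code. The data structure, prompt wording, and scoring below are illustrative assumptions rather than the paper’s actual templates.

```python
# Minimal sketch of unit-test-style morphology probes covering the
# Generation and judgment scenarios. Illustrative only, not IMPACT itself.
from dataclasses import dataclass

@dataclass
class MorphologyProbe:
    language: str      # e.g. "Finnish"
    feature: str       # e.g. "noun plural", "past tense"
    template: str      # fill-in-the-blank sentence containing "___"
    answer: str        # the correctly inflected form
    distractor: str    # an ungrammatical form for the judgment scenario

def generation_prompt(p: MorphologyProbe) -> str:
    """Scenario 1: ask the model to produce the correct inflection."""
    return (f"Fill in the blank with the correctly inflected {p.language} "
            f"word ({p.feature}): {p.template}")

def judgment_prompt(p: MorphologyProbe, candidate: str) -> str:
    """Scenario 2: ask the model to judge a (possibly ungrammatical) sentence."""
    sentence = p.template.replace("___", candidate)
    return (f"Would a native {p.language} speaker consider this sentence "
            f"grammatically correct? Answer yes or no: {sentence}")

def score_generation(p: MorphologyProbe, model_output: str) -> bool:
    """A probe passes Generation if the model returns the exact inflected form."""
    return model_output.strip() == p.answer

# Judgment is scored symmetrically: the model should answer "yes" for
# judgment_prompt(p, p.answer) and "no" for judgment_prompt(p, p.distractor).
```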
Apple’s researchers outlined where specific models failed for each language. The results indicate that most AI models struggle with uncommon morphological patterns, and though some LLMs performed better in one language than in another, all of them performed worse than they do in English.
As for what this means in practical terms, Apple could employ the IMPACT framework as a means of evaluating the performance of its in-house models, particularly when they deal with morphologically complex languages.
In 2025, the company announced its Live Translation feature for AirPods, which provides real-time translations and lets users who speak different languages communicate with relative ease.
Even with recent changes to its AI team, such as the retirement of former AI chief John Giannandrea, Apple hasn’t slowed down on publishing research related to artificial intelligence. The company is expected to release its delayed, contextually aware version of Siri with the iOS 26.4 update.