This is a sponsored article brought to you by MBZUAI.
If you’ve ever tried to guess how a cell will change shape after a drug or a gene edit, you know it’s part science, part art, and mostly expensive trial-and-error. Imaging thousands of conditions is slow; exploring millions is impossible.
A new paper in Nature Communications proposes a different route: simulate those cellular “after” images directly from molecular readouts, so you can preview the morphology before you pick up a pipette. The team calls their model MorphDiff, and it’s a diffusion model guided by the transcriptome, the pattern of genes turned up or down after a perturbation.
At a high level, the idea flips a familiar workflow. High-throughput imaging is a proven way to discover a compound’s mechanism or spot bioactivity, but profiling every candidate drug or CRISPR target isn’t feasible. MorphDiff learns from cases where both gene expression and cell morphology are known, then uses only the L1000 gene expression profile as a condition to generate realistic post-perturbation images, either from scratch or by transforming a control image into its perturbed counterpart. The claim: competitive fidelity on held-out (unseen) perturbations across large drug and genetic datasets, plus mechanism-of-action (MOA) retrieval performance that approaches what real images deliver.
This research, led by MBZUAI researchers, starts from a biological observation: gene expression ultimately drives the proteins and pathways that shape what a cell looks like under the microscope. The mapping isn’t one-to-one, but there’s enough shared signal for learning. Conditioning on the transcriptome offers a practical bonus too: there’s far more publicly accessible L1000 data than paired morphology, making it easier to cover a wide swath of perturbation space. In other words, when a new compound arrives, you’re likely to find its gene signature, which MorphDiff can then leverage.
Under the hood, MorphDiff blends two pieces. First, a Morphology Variational Autoencoder (MVAE) compresses five-channel microscope images into a compact latent space and learns to reconstruct them with high perceptual fidelity. Second, a Latent Diffusion Model learns to denoise samples in that latent space, steering each denoising step with the L1000 vector via attention.
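To make that conditioning concrete, here is a minimal sketch of how a gene-expression vector can steer a denoising network through cross-attention. This illustrates the general mechanism, not MorphDiff’s actual architecture; the dimensions, module layout, and the single-block `ToyDenoiser` are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Toy cross-attention: image latents attend to the L1000 condition."""
    def __init__(self, latent_dim=256, cond_dim=978, n_heads=4):
        super().__init__()
        self.cond_proj = nn.Linear(cond_dim, latent_dim)   # embed the gene-expression vector
        self.attn = nn.MultiheadAttention(latent_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(latent_dim)

    def forward(self, z_tokens, l1000):
        # z_tokens: (B, N, latent_dim) flattened spatial latents from the image autoencoder
        # l1000:    (B, 978) perturbed expression of the L1000 landmark genes
        cond = self.cond_proj(l1000).unsqueeze(1)           # (B, 1, latent_dim)
        attended, _ = self.attn(query=z_tokens, key=cond, value=cond)
        return self.norm(z_tokens + attended)               # residual update of the latents

class ToyDenoiser(nn.Module):
    """One denoising step eps_theta(z_t, t, condition); a real model stacks many such blocks."""
    def __init__(self, latent_dim=256, cond_dim=978, n_steps=1000):
        super().__init__()
        self.time_embed = nn.Embedding(n_steps, latent_dim)
        self.cross_attn = CrossAttentionBlock(latent_dim, cond_dim)
        self.out = nn.Linear(latent_dim, latent_dim)

    def forward(self, z_t, t, l1000):
        h = z_t + self.time_embed(t).unsqueeze(1)           # inject the diffusion timestep
        h = self.cross_attn(h, l1000)                       # steer with the transcriptome
        return self.out(h)                                  # predicted noise

# Shapes only: a batch of 8 latents with 64 spatial tokens, conditioned on 978 landmark genes.
z_t = torch.randn(8, 64, 256)
t = torch.randint(0, 1000, (8,))
l1000 = torch.randn(8, 978)
eps = ToyDenoiser()(z_t, t, l1000)   # (8, 64, 256)
```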
Wang et al., Nature Communications (2025), CC BY 4.0
Diffusion is a good fit here: it’s intrinsically robust to noise, and the latent space variant is efficient enough to train while preserving image detail. The team implements both gene-to-image (G2I) generation (start from noise, condition on the transcriptome) and image-to-image (I2I) transformation (push a control image toward its perturbed state using the same transcriptomic condition). The latter requires no retraining thanks to an SDEdit-style procedure, which is handy when you want to explain changes relative to a control.
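Conceptually, the SDEdit-style transformation partially noises the control image’s latent and then runs the conditional reverse process from that intermediate step, so the output stays anchored to the control while drifting toward the perturbed phenotype. A rough sketch under simplifying assumptions (a hypothetical `denoiser`, a precomputed DDPM-style `alphas_cumprod` schedule, and a deterministic DDIM-style update), not the paper’s exact sampler:

```python
import torch

def sdedit_transform(control_latent, denoiser, l1000, alphas_cumprod, t_start=600):
    """
    SDEdit-style image-to-image: noise a control latent up to t_start, then
    denoise it back to t=0 while conditioning every step on the L1000 profile.
    `denoiser` predicts noise eps_theta(z_t, t, condition); `alphas_cumprod`
    is the (T,) tensor of cumulative noise-schedule products.
    """
    # 1) Forward-diffuse the control latent to an intermediate timestep.
    a_bar = alphas_cumprod[t_start]
    noise = torch.randn_like(control_latent)
    z_t = a_bar.sqrt() * control_latent + (1 - a_bar).sqrt() * noise

    # 2) Reverse diffusion from t_start down to 0, guided by the transcriptome.
    for t in range(t_start, 0, -1):
        t_batch = torch.full((control_latent.shape[0],), t, dtype=torch.long)
        eps = denoiser(z_t, t_batch, l1000)

        a_bar_t = alphas_cumprod[t]
        a_bar_prev = alphas_cumprod[t - 1]
        # Predict the clean latent, then step to the previous timestep (DDIM, eta=0).
        z0_pred = (z_t - (1 - a_bar_t).sqrt() * eps) / a_bar_t.sqrt()
        z_t = a_bar_prev.sqrt() * z0_pred + (1 - a_bar_prev).sqrt() * eps

    return z_t  # decode with the image decoder to get the predicted perturbed image
```

Starting the reverse process at an intermediate step rather than from pure noise is what keeps the result tied to the original control image, which is why no retraining is needed.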
It’s one thing to generate photogenic pictures; it’s another to generate biologically faithful ones. The paper evaluates both. On the generative side, MorphDiff is benchmarked against GAN and diffusion baselines using standard metrics such as FID, Inception Score, coverage, density, and the CLIP-based CMMD. Across JUMP (genetic) and CDRP/LINCS (drug) test splits, MorphDiff’s two modes typically land first and second, with significance tests run across multiple random seeds or independent control plates. The result is consistent: better fidelity and diversity, especially on out-of-distribution (OOD) perturbations, which is where the practical value lives.
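For readers less familiar with these metrics, FID summarizes how far apart real and generated images sit in an embedding space by comparing the means and covariances of their features. A minimal sketch, assuming Inception-style features have already been extracted (the paper’s evaluation pipeline may differ in detail):

```python
import numpy as np
from scipy import linalg

def frechet_distance(real_feats, gen_feats):
    """
    Frechet Inception Distance between two sets of image embeddings,
    e.g. (N, 2048) Inception features for real and generated cell images.
    """
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)

    # Matrix square root of the covariance product; drop tiny imaginary parts.
    covmean = linalg.sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

# Usage with placeholder features: lower is better, 0 means identical distributions.
real = np.random.randn(500, 2048)
fake = np.random.randn(500, 2048)
print(frechet_distance(real, fake))
```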
More interesting for biologists, the authors step beyond image aesthetics to morphology features. They extract hundreds of CellProfiler features (textures, intensities, granularity, cross-channel correlations) and ask whether the generated distributions match the ground truth.
In side-by-side comparisons, MorphDiff’s feature clouds line up with the real data more closely than those of baselines like IMPA. Statistical tests show that over 70 percent of generated feature distributions are indistinguishable from real ones, and feature-wise scatter plots show the model correctly captures differences from control on the most perturbed features. Crucially, the model also preserves the correlation structure between gene expression and morphology features, agreeing with ground truth more closely than prior methods, evidence that it’s modeling more than surface style.
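A back-of-the-envelope version of such feature-level checks can be run with SciPy, assuming matched tables of CellProfiler features for real and generated cells; the test choice, threshold, and helper names below are illustrative, not the paper’s protocol:

```python
import numpy as np
from scipy import stats

def fraction_indistinguishable(real_feats, gen_feats, alpha=0.05):
    """
    real_feats, gen_feats: (n_cells, n_features) CellProfiler feature tables.
    Runs a two-sample KS test per feature and reports the fraction of features
    whose generated distribution is not significantly different from the real one.
    """
    n_features = real_feats.shape[1]
    pvals = np.array([
        stats.ks_2samp(real_feats[:, j], gen_feats[:, j]).pvalue
        for j in range(n_features)
    ])
    return float((pvals > alpha).mean())

def correlation_structure_agreement(expr, real_feats, gen_feats):
    """
    Compares gene-expression-to-morphology correlation matrices: how similar is
    corr(expr, generated features) to corr(expr, real features)?
    expr: (n_samples, n_genes); feats: (n_samples, n_features), row-matched.
    """
    def cross_corr(a, b):
        a = (a - a.mean(0)) / (a.std(0) + 1e-8)
        b = (b - b.mean(0)) / (b.std(0) + 1e-8)
        return a.T @ b / len(a)                      # (n_genes, n_features)

    c_real = cross_corr(expr, real_feats).ravel()
    c_gen = cross_corr(expr, gen_feats).ravel()
    return stats.pearsonr(c_real, c_gen)[0]          # 1.0 = identical structure
```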
Wang et al., Nature Communications (2025), CC BY 4.0
The drug results scale up that story to thousands of treatments. Using DeepProfiler embeddings as a compact morphology fingerprint, the team demonstrates that MorphDiff’s generated profiles are discriminative: classifiers trained on real embeddings also separate generated ones by perturbation, and pairwise distances between drug effects are preserved.
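One way to operationalize those checks, sketched below with scikit-learn and SciPy: train a perturbation classifier on real embeddings and score it on generated ones, then compare pairwise distance matrices across treatments. The estimator, distance metric, and function names are assumptions for illustration, not the paper’s exact setup.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.linear_model import LogisticRegression

def classifier_transfer_accuracy(real_emb, real_labels, gen_emb, gen_labels):
    """Train on real DeepProfiler-style embeddings, evaluate on generated ones."""
    clf = LogisticRegression(max_iter=2000)
    clf.fit(real_emb, real_labels)
    # High accuracy here means the generated profiles are discriminative by perturbation.
    return clf.score(gen_emb, gen_labels)

def distance_preservation(real_per_drug, gen_per_drug):
    """
    real_per_drug, gen_per_drug: (n_drugs, dim) mean embeddings per treatment,
    rows aligned to the same drugs. Spearman correlation of the two pairwise
    distance vectors measures whether relative drug-drug similarity is preserved.
    """
    d_real = pdist(real_per_drug, metric="cosine")
    d_gen = pdist(gen_per_drug, metric="cosine")
    return spearmanr(d_real, d_gen)[0]
```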
Wang et al., Nature Communications (2025), CC BY 4.0
That matters for the downstream task everyone cares about: MOA retrieval. Given a query profile, can you find reference drugs with the same mechanism? MorphDiff’s generated morphologies not only beat prior image-generation baselines but also outperform retrieval using gene expression alone, and they approach the accuracy you get from real images. In top-k retrieval experiments, the average improvement is 16.9 percent over the strongest baseline and 8.0 percent over transcriptome-only retrieval, with robustness shown across several values of k and metrics such as mean average precision and folds of enrichment. That’s a strong signal that simulated morphology carries information complementary to chemical structure and transcriptomics, enough to help find look-alike mechanisms even when the molecules themselves look nothing alike.
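For intuition, MOA retrieval boils down to nearest-neighbor search over morphology profiles. A simplified top-k sketch, assuming per-drug embeddings and MOA labels are available (the paper’s retrieval protocol and enrichment statistics are more involved):

```python
import numpy as np
from scipy.spatial.distance import cdist

def moa_topk_accuracy(query_emb, query_moa, ref_emb, ref_moa, k=5):
    """
    query_emb: (n_query, dim) generated morphology profiles for query drugs.
    ref_emb:   (n_ref, dim) reference profiles with known mechanisms of action.
    A query counts as a hit if any of its k nearest references shares its MOA.
    """
    dists = cdist(query_emb, ref_emb, metric="cosine")   # (n_query, n_ref)
    topk = np.argsort(dists, axis=1)[:, :k]              # indices of the k closest references
    hits = [
        query_moa[i] in {ref_moa[j] for j in topk[i]}
        for i in range(len(query_emb))
    ]
    return float(np.mean(hits))
```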
The paper also lists some current limitations that hint at potential future improvements. Inference with diffusion remains relatively slow; the authors suggest plugging in newer samplers to speed generation. Time and concentration (two factors that biologists care about) aren’t explicitly encoded due to data constraints; the architecture could take them as additional conditions when matched datasets become available. And because MorphDiff depends on perturbed gene expression as input, it can’t conjure morphology for perturbations that lack transcriptome measurements; a natural extension is to chain with models that predict gene expression for unseen drugs (the paper cites GEARS as an example). Finally, generalization inevitably weakens as you stray far from the training distribution; larger, better-matched multimodal datasets will help, as will conditioning on more modalities such as structures, text descriptions, or chromatin accessibility.
What does this mean in practice? Imagine a screening team with a large L1000 library but a smaller imaging budget. MorphDiff becomes a phenotypic copilot: generate predicted morphologies for new compounds, cluster them by similarity to known mechanisms, and prioritize which to image for confirmation. Because the model also surfaces interpretable feature shifts, researchers can peek under the hood. Did ER texture and mitochondrial intensity move the way we’d expect for an EGFR inhibitor? Did two structurally unrelated molecules land in the same phenotypic neighborhood? Those are the kinds of hypotheses that accelerate mechanism hunting and repurposing.
The bigger picture is that generative AI has finally reached a fidelity level where in-silico microscopy can stand in for first-pass experiments. We’ve already seen text-to-image models explode in consumer domains; here, a transcriptome-to-morphology model shows that the same diffusion machinery can do scientifically useful work such as capturing subtle, multi-channel phenotypes and preserving the relationships that make those images more than eye candy. It won’t replace the microscope. But if it reduces the number of plates you have to run to find what matters, that’s time and money you can spend validating the hits that count.