In December, Apple published Sharp, a technique for generating 3D Gaussian Splats from a single photograph in under one second. Not from video, not from multiple angles - one image.
Drone shot of Blåvand at New Year's Eve
I’ve written before about Gaussian Splatting and Neural Radiance Fields. With Apple’s Sharp codebase now public and the Christmas holiday ahead of me, I felt inspired to see if I could build a fully featured pipeline and engine that lets my visitors explore 3D scenes directly within my blog posts.
Apple did the hard work - training a neural network that infers depth and generates ~1.2 million Gaussians from a single image. I built the infrastructure around it. With the help of AI coding tools, this took me a week or so across three domains: frontend rendering, CMS integration, and cloud ML processing.
Contents:
- The Frontend - How Sharp works, scroll-based rotation, performance, post-processing
- The CMS - Three custom widgets for Decap CMS
- The Backend - Running Sharp on Modal.com
- Limitations - Light behavior, difficult images, diorama constraints
- What This Means - Where this fits into existing platforms like iOS, and what comes next
Blåvand dunes - scroll to see the scene rotate
As you scroll, the scene above rotates. You can drag to explore, zoom in, or click to open fullscreen.
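The scroll-driven rotation can be reduced to a small pure function: take how far the scene has progressed through the viewport and map it to a camera yaw. This is a minimal sketch of that idea, not the actual engine code - `scrollToYaw` and `maxYawDegrees` are hypothetical names, and the smoothstep easing is my own assumption about what makes the motion feel gentle.

```typescript
// Map scroll progress to a camera yaw angle.
// scrollProgress: 0 when the scene enters the viewport, 1 when it leaves.
// maxYawDegrees is a hypothetical tuning parameter, not from Sharp itself.
function scrollToYaw(scrollProgress: number, maxYawDegrees = 30): number {
  // Clamp so over-scroll (rubber-banding) cannot push the camera too far.
  const t = Math.min(1, Math.max(0, scrollProgress));
  // Smoothstep easing: rotation slows near the edges of the viewport.
  const eased = t * t * (3 - 2 * t);
  // Center the sweep: -max/2 at the top, +max/2 at the bottom.
  return (eased - 0.5) * maxYawDegrees;
}
```

In the page itself, `scrollProgress` would come from something like an `IntersectionObserver` or a scroll listener comparing the element's bounding box to the viewport; the returned yaw then feeds the renderer's camera each frame.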
The Frontend
The goal was to make 3D feel like part of reading - not a demo you have to engage with, but something that responds to how you already interact with a page.
How Sharp Works
Traditional Gaussian Splatting requires dozens of photos from different angles. Sharp uses a neural network to infer depth from a single image, generating a two-layer representation (foreground and background) with about 1.2 million Gaussians encoding color, position, opacity, and scale.
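To make the numbers concrete, here is a sketch of what each Gaussian might carry and how a renderer would flatten them for the GPU. The layout is an illustrative assumption on my part - ten floats per splat covering the four attributes mentioned above - not Sharp's actual output format.

```typescript
// Illustrative layout, not Sharp's on-disk format:
// position (3) + scale (3) + color RGBA (4) = 10 floats per Gaussian.
// Opacity rides in the alpha channel.
const FLOATS_PER_SPLAT = 10;

interface Splat {
  position: [number, number, number];
  scale: [number, number, number];
  color: [number, number, number, number]; // RGB + opacity
}

// Pack splats into one flat Float32Array - the shape a WebGL/WebGPU
// renderer would upload as a vertex or storage buffer.
function packSplats(splats: Splat[]): Float32Array {
  const out = new Float32Array(splats.length * FLOATS_PER_SPLAT);
  splats.forEach((s, i) => {
    out.set([...s.position, ...s.scale, ...s.color], i * FLOATS_PER_SPLAT);
  });
  return out;
}
```

Under this assumed layout, 1.2 million Gaussians at 10 floats (40 bytes) each come to roughly 48 MB of raw buffer data - a useful back-of-envelope for why compression and culling matter on the frontend.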
The amazing collapsed lava tubes of Galapagos