A Revolution in Computer Graphics Is Bringing 3D Reality Capture to the Masses
Destroying cultural heritage sites is a common weapon of war, used by armed invaders to deprive a community of its distinct identity. It was no surprise, then, that as Russian troops swept into Ukraine in February 2022, historians and cultural heritage specialists braced for the coming destruction. So far in the Russia-Ukraine War, UNESCO has confirmed damage to hundreds of religious and historical buildings and dozens of public monuments, libraries, and museums.
While new technologies like low-cost drones, 3D printing, and private satellite internet may be creating a distinctly 21st-century battlefield unfamiliar to conventional armies, another set of technologies is creating new possibilities for citizen archivists off the frontlines to preserve Ukrainian heritage sites.
Backup Ukraine, a collaborative project between the Danish UNESCO National Commission and Polycam, a 3D creation tool, enables anyone equipped with only a phone to scan and capture high-quality, detailed, and photorealistic 3D models of heritage sites, something only possible with expensive and burdensome equipment just a few years ago.
Backup Ukraine is a notable expression of the stunning speed with which 3D capture and graphics technologies are progressing, according to Bilawal Sidhu, a technologist, angel investor, and former Google product manager who worked on 3D maps and AR/VR.
“Reality capture technologies are on a staggering exponential curve of democratization,” he explained to me in an interview for Singularity Hub.
According to Sidhu, generating 3D assets has long been possible, but only with expensive tools like DSLR cameras, lidar scanners, and pricey software licenses. As an example, he cited the work of CyArk, a non-profit founded two decades ago with the aim of using professional-grade 3D capture technology to preserve cultural heritage around the world.
“What is insane, and what has changed, is today I can do all of that with the iPhone in your pocket,” he says.
In our discussion, Sidhu laid out three distinct yet interrelated technology trends that are driving this progress. First is a drop in cost of the kinds of cameras and sensors which can capture an object or space. Second is a cascade of new techniques which make use of artificial intelligence to construct finished 3D assets. And third is the proliferation of computing power, largely driven by GPUs, capable of rendering graphics-intensive objects on devices widely available to consumers.
Lidar scanners are an example of the price-performance improvement in sensors. First popularized as the bulky spinning sensors on top of autonomous vehicles, and priced in the tens of thousands of dollars, lidar arrived on the iPhone with the 12 Pro and Pro Max in 2020. The ability to scan a space in the same way driverless cars see the world meant that suddenly anyone could quickly and cheaply generate detailed 3D assets. This, however, was still only available to the wealthiest Apple customers.
Day 254: hiking in Pinnacles National Park and scanning my daughter as we crossed a small dry creek.
Captured with the iPhone 12 Pro + @Scenario3d. I can’t wait to see these 3D memories 10 years from now.
On @Sketchfab: https://t.co/mvxtOMhzS5 #1scanaday #3Dscanning #XR pic.twitter.com/9DX1Ltnmh8
— Emm (@emmanuel_2m) September 14, 2021
One of the industry’s most consequential turning points occurred that same year when researchers at UC Berkeley, Google Research, and UC San Diego introduced neural radiance fields, commonly referred to as NeRFs.
This approach uses machine learning to construct a credible 3D model of an object or space from 2D pictures or video. The neural network “hallucinates” how a full 3D scene would appear, according to Sidhu. It’s a solution to “view synthesis,” a computer graphics challenge seeking to allow someone to see a space from any point of view from only a few source images.
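To make the idea a little more concrete, here is a minimal sketch of the volume-rendering step a NeRF uses to turn samples along a single camera ray into one pixel color. It is an illustration only: in a real NeRF the densities and colors come from a trained neural network queried at 3D points, whereas here they are hard-coded, and the function name `render_ray` and its parameters are assumptions made for this example.

```python
import math

def render_ray(densities, colors, step):
    """Composite samples along one camera ray, front to back.

    densities: how much "stuff" sits at each sample (0 = empty space)
    colors:    (r, g, b) emitted at each sample
    step:      distance between neighboring samples
    In a real NeRF these values are predicted by a neural network;
    here they are supplied by hand (an assumption for illustration).
    """
    transmittance = 1.0  # fraction of light not yet blocked
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this sample
        weight = transmittance * alpha         # its share of the pixel
        for k in range(3):
            pixel[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha           # light left for samples behind
    return pixel, transmittance

# A ray that crosses empty space and then hits a dense red surface.
pixel, leftover = render_ray(
    densities=[0.0, 1e6],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    step=1.0,
)
```

Repeating this for every pixel of a virtual camera is, roughly, how a new viewpoint gets rendered once the network has been trained on the source photos.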
“So that thing came out and everyone realized we’ve now got state-of-the-art view synthesis that works brilliantly for all the stuff photogrammetry has had a hard time with like transparency, translucency, and reflectivity. This is kind of crazy,” he adds.
The computer vision community channeled their excitement into commercial applications. At Google, Sidhu and his team explored using the technology for Immersive View, a 3D version of Google Maps. For the average user, the spread of consumer-friendly applications like Luma AI and others meant that anyone with just a smartphone camera could make photorealistic 3D assets. The creation of high-quality 3D content was no longer limited to Apple’s lidar-elite.
Now, another potentially even more promising method of solving view synthesis is earning attention rivaling that early NeRF excitement. Gaussian splatting is a rendering technique that mimics the way triangles are used for traditional 3D assets, but instead of triangles, it uses “splats” of color, each expressed through a mathematical function known as a gaussian. As more gaussians are layered together, a highly detailed and textured 3D asset becomes visible.
The speed of adoption for splatting is stunning to watch. It’s only been a few months, but demos are flooding X, and both Luma AI and Polycam are offering tools to generate gaussian splats. Other developers are already working on ways of integrating them into traditional game engines like Unity and Unreal. Splats are also gaining attention from the traditional computer graphics industry because they render faster than NeRFs and can be edited in ways already familiar to 3D artists. (NeRFs don’t allow this given they’re generated by an indecipherable neural net.)
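The layering of gaussians can be sketched in a few lines. The toy example below shades one pixel by blending 2D splats from nearest to farthest. It is a simplification under stated assumptions: production gaussian splatting uses 3D gaussians with full covariance that are projected onto the screen, while this sketch uses round, already-projected splats, and the `Splat` class and `shade_pixel` function are hypothetical names invented for this illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Splat:
    x: float        # center on screen
    y: float
    radius: float   # standard deviation (round splats only, for simplicity)
    color: tuple    # (r, g, b) in the range 0..1
    opacity: float  # peak opacity at the center, 0..1
    depth: float    # distance from the camera

def shade_pixel(px, py, splats):
    """Blend all splats covering one pixel, nearest first."""
    pixel = [0.0, 0.0, 0.0]
    remaining = 1.0  # how much of the pixel is still "see-through"
    for s in sorted(splats, key=lambda s: s.depth):
        dist_sq = (px - s.x) ** 2 + (py - s.y) ** 2
        # Gaussian falloff: full opacity at the center, fading outward.
        alpha = s.opacity * math.exp(-dist_sq / (2.0 * s.radius ** 2))
        for k in range(3):
            pixel[k] += remaining * alpha * s.color[k]
        remaining *= 1.0 - alpha
    return pixel

# A half-transparent red splat in front of an opaque blue one.
splats = [
    Splat(0.0, 0.0, 1.0, (0.0, 0.0, 1.0), 1.0, depth=2.0),
    Splat(0.0, 0.0, 1.0, (1.0, 0.0, 0.0), 0.5, depth=1.0),
]
center = shade_pixel(0.0, 0.0, splats)
```

Because each splat is an explicit object with a position, size, and color, it can be moved, recolored, or deleted directly, which hints at why splats are easier to edit than a scene locked inside a neural network.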
For a great explanation of how gaussian splatting works and why it’s generating buzz, see this video from Sidhu.
Regardless of the details, for consumers, we are decidedly in a moment where a phone can generate Hollywood-caliber 3D assets that not long ago only well-equipped production teams could produce.
But why does 3D creation even matter at all?
To appreciate the shift toward 3D content, it’s worth noting the technology landscape is orienting toward a future of “spatial computing.” While overused terms like the metaverse might draw eye rolls, the underlying spirit is a recognition that 3D environments, like those used in video games, virtual worlds, and digital twins, have a big role to play in our future. 3D assets like the ones produced by NeRFs and splatting are poised to become the content we’ll engage with in the future.
Within this context, a large-scale ambition is the hope for a real-time 3D map of the world. While tools for generating static 3D maps have been available, the challenge remains finding ways of keeping those maps current with an ever-changing world.
“There’s the building of the model of the world, and then there’s maintaining that model of the world. With these methods we’re talking about, I think we might finally have the tech to solve the ‘maintaining the model’ problem through crowdsourcing,” says Sidhu.
Projects like Google’s Immersive View are good early examples of the consumer implications of this. While he wouldn’t speculate when it might eventually be possible, Sidhu agreed that at some point, technology will exist that allows a user in VR to walk around anywhere on Earth with a real-time, immersive experience of what is happening there. This type of technology will also spill into efforts in avatar-based “teleportation,” remote meetings, and other social gatherings.
Another reason to be excited, says Sidhu, is 3D memory capture. Apple, for example, is leaning heavily into 3D photo and video for their Vision Pro mixed reality headset. As an example, Sidhu told me he recently created a high-quality replica of his parents’ house before they moved out. He could then give them the experience of walking inside of it using virtual reality.
“Having that visceral feeling of being back there is so powerful. This is why I’m so bullish on Apple, because if they nail this 3D media format, that’s where things can get exciting for regular people.”
i’m convinced the killer use case for 3d reconstruction tech is memory capture
my parents retired earlier this year and i have immortalized their home forever more
photo scanning is legit the most future proof medium we have access to today
scan all the spaces/places/things pic.twitter.com/kmqX5FYaN6
— Bilawal Sidhu (@bilawalsidhu) November 3, 2023
From cave art to oil paintings, the impulse to preserve aspects of our sensory experience is deeply human. Just as photography once muscled in on still lifes as a means of preservation, 3D creation tools seem poised to displace our long-standing affair with 2D images and video.
Yet just as photography can only ever hope to capture a fraction of a moment in time, 3D models can’t fully replace our relationship to the physical world. Still, for those experiencing the horrors of war in Ukraine, perhaps these are welcome developments offering a more immersive way to preserve what can never truly be replaced.
Image Credit: Polycam