
Episode 18: Exploring Neural Radiance Fields (NeRFs)

By Eliza Mace

Show Notes

The field of 3D rendering has come a long way from pure trigonometry, key point matching, and simple raytracing. A new family of algorithms called Neural Radiance Fields (NeRFs) combines classical rendering techniques with powerful neural networks to revolutionize missions in the fields of medicine, disaster relief, robotics, and beyond.

Transcript


[00:00:00] Eric Bianchi: Wow, so with this tech the sky’s the limit. So, let’s start with the medical field. One of the problems that someone had approached our team with was looking at what’s called angiograms, which are effectively trying to model the blood flow in your heart. And the current way that they’re doing this is that they have to inject the patient with this dye that’s actually toxic to the patient. And so, getting a clear and crisp model on the first shot is very important. So, the idea is that they take a camera that can see inside the body, and they spin it around and they’re able to get pictures taken every so often. And they use those pictures to reconstruct the heart and look at how the blood flows over a period of time. So, NeRFs would be an excellent application here because we need to get a high quality model that can be viewed from many different angles over this period of time. And they’re super well suited for this because we need to do it in one shot. And NeRFs offer this inherent ability to model these types of 360 scenes.

[00:01:10] Eliza Mace: Hello and welcome to MITRE’s Tech Futures Podcast. I’m your host, Eliza Mace, and I’m a lead machine learning engineer here at MITRE. At MITRE, we offer unique vantage points and objective insights that we share in the public interest. In this podcast series, we showcase emerging technologies that will affect the government and our nation in the future. Today we’ll be discussing a recent MITRE investigation into neural radiance fields, or NeRFs, a new family of algorithms used to generate 3D models and novel viewpoints of objects and scenes. We will build a mental model of the problem and explore old solutions and their pitfalls. Then we’ll learn about the inner workings of NeRF algorithms and find out how they address those issues. Finally, we will discuss application areas and how the field is evolving. Before we begin, I want to say a huge thank you to the MITRE Tech Futures team. This episode would not have happened without their support. Now without further ado, I bring you MITRE’s Tech Futures Podcast, episode number 18.

NeRFs are used to generate images, so explaining them without visual aids, like, say, on a podcast can be difficult. So, let’s set up some mental images that will help. Think of an image as an array of pixels. Each pixel is made up of three values, red, green, and blue, that represent how much energy from light beams bouncing off of objects in the camera’s view reaches that particular pixel. Just like that energy enters the sensor to form an image, today we’re going to picture those light beams shooting back out from each pixel, perpendicular to the image. Now picture yourself in a room with one camera in front of you, one to your left, and one to your right, all pointing at you. You can imagine the beams of light entering each one and what the resulting image would be from each distinctive viewpoint. But there’s no camera behind you. So, what would you do if you wanted to know what you look like from behind? This is the problem NeRFs are trying to solve: given a set of views of an object or scene, can you create an image of what it would look like from a new viewpoint? A common way of approaching this problem is to use the views you do have to make a 3D model of the object or scene of interest, and then use that model to create the new view. This isn’t a brand new problem or solution, but as Dr. Ryan Cinoman explains, the traditional solutions have some pitfalls that NeRFs seek to improve.
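To make the “beams shooting out of every pixel” picture concrete, here is a minimal sketch in Python with NumPy of how one ray can be generated per pixel of an idealized pinhole camera. The function name, the focal-length parameter, and the 4x4 camera-to-world pose matrix are illustrative assumptions, not anything described in the episode.

```python
import numpy as np

def pixel_rays(height, width, focal_length, cam_to_world):
    """Shoot one ray out of every pixel of an idealized pinhole camera.

    cam_to_world is a hypothetical 4x4 pose matrix placing the camera in the
    scene; this only illustrates the 'beams leaving each pixel' idea.
    """
    # Pixel grid, centered on the optical axis.
    i, j = np.meshgrid(np.arange(width), np.arange(height), indexing="xy")
    dirs = np.stack(
        [(i - width / 2) / focal_length,
         -(j - height / 2) / focal_length,
         -np.ones_like(i, dtype=float)],
        axis=-1)
    # Rotate the directions into world space; every ray starts at the camera center.
    rays_d = dirs @ cam_to_world[:3, :3].T
    rays_o = np.broadcast_to(cam_to_world[:3, 3], rays_d.shape)
    return rays_o, rays_d  # each is (height, width, 3)
```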

[00:03:30] Ryan Cinoman: To make 3D models from multiple images taken at various angles, the most naive way to do it is basically to turn it into a pure trigonometry problem. So, if I have two or three vantage points, and I have two or three common points, finding the shape of those points is just a matter of basic math or trigonometry. One method of doing it is if you have multiple images and you tag common key points between the two images, either by manual annotation or some sort of pre-processing that identifies similar shapes, corners, edges, in the different pictures. You can basically turn it into a trigonometry problem and create some sort of surface mesh by putting each of the key points into 3D space. So that’s the traditional way of doing 3D rendering. The thing that makes NeRFs unique is this idea of representing the object as a neural field, which allows you to not have to think about key points and not have to find the common points between the different images.

[00:04:31] Eliza Mace: So, what has changed since the days of relying on key point matching and trigonometry that has enabled the rise of NeRFs? Dr. Eric Bianchi, who you heard in our opening, explains the technologies that coalesced to enable these new algorithms:

[00:04:44] Eric Bianchi: The technology behind NeRFs comes from ray tracing. Imagine you have an image and for each pixel in that image, you project out a ray into the field, and along that ray, you’re sampling data at some kind of interval. That’s what ray tracing is, and that was one little part of a NeRF. And the other part of the NeRF is multilayer perceptrons. These types of fully connected networks, they are computationally expensive. But nowadays, these types of things are trivial for the kind of graphical processing units that we have. Another kind of concept of NeRFs was this idea of data structuring in a creative way in certain configurations, like octrees, or like splitting and breaking down a scene into these little compartments. When you add all these things together, the underlying function is novel when it comes to rendering and creating this 3D representation.
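The “project out a ray and sample it at some kind of interval” step can be sketched in a few lines. This is a simplified illustration assuming uniform spacing between hypothetical near and far bounds; real NeRF implementations use cleverer sampling, as discussed later in the episode.

```python
import numpy as np

def sample_along_ray(ray_origin, ray_direction, near, far, num_samples):
    """Evenly spaced 3D query points along one ray between the near and far planes."""
    t_vals = np.linspace(near, far, num_samples)            # distances along the ray
    points = ray_origin + t_vals[:, None] * ray_direction   # (num_samples, 3) sample locations
    return points, t_vals
```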

[00:05:48] Eliza Mace: Okay, now that we have built up the problem and the base capabilities, let’s hear how NeRFs actually work. During this description, whenever you hear Dr. Cinoman talk about multi-layer perceptrons or MLPs. Just picture values, passing through a function with tunable parameters. As Dr. Bianchi mentioned earlier, MLPs are a type of fully connected feed forward neural network. But all you need to know. Is that these layers are what we are adjusting. As we train a NeRF to render new views. The most important thing to be keeping in mind while we’re hearing this explanation is that circle cameras we discussed earlier. Remember how we talked about beams of light shooting out from each pixel in an image? Well picture if you have multiple cameras, all pointing at you that the beam shooting out from each camera’s view are actually going to intersect right where you’re standing. We’re going to use this intersection and each camera’s unique viewpoint to reconstruct what the object in the middle actually looks like.

[00:06:43] Ryan Cinoman: NeRFs, as they’re commonly called, stands for neural radiance fields. And that’s a combination of two things: a neural field and radiance. A neural field is a broader categorization of 3D rendering methods. Like a field in physics, you have a quantity, radiance, that varies through a 3D space. So, what a neural radiance field does is, to each point in 3D space, it associates a radiance quantity, and the radiance depends on the direction you’re looking at. That’s what gives a neural radiance field its interesting properties and why it seemed to work so well for synthetic rendering.

[00:07:25] Ryan Cinoman: So, the basic structure of a NeRF is you have two MLPs, two multilayer perceptrons, and each one is optimizing a different component that goes into the NeRF. The first MLP has control over the 3D X, Y, and Z coordinates and it outputs a density field, which is the density of the object at a given point. Then that density is fed into the second multi-layer perceptron with an additional coordinate for the viewing direction. And then that MLP outputs the radiance that’s emitted at that point in that direction. Then, both MLPs are optimized to make the synthetic images match the training images as well as possible via photometric loss, which is how far apart each color in the training images is from the synthetic images that the NeRF generates. So, this strategy of first creating a function for the density and then creating the function for the color intensity is what forces it to basically have that 3D structure in the density function it’s outputting.
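A stripped-down sketch of the two-MLP structure Dr. Cinoman describes might look like the following, assuming PyTorch. It omits the positional encoding, skip connections, and layer counts of the published NeRF architecture and only shows the density/color split; the class name and sizes are illustrative.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Illustrative two-MLP field: position -> density, then feature + view direction -> color."""

    def __init__(self, hidden=128):
        super().__init__()
        # MLP 1: 3D position (x, y, z) -> a feature vector, with a head for volume density.
        self.density_mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.density_head = nn.Linear(hidden, 1)
        # MLP 2: feature + viewing direction -> RGB radiance emitted at that point.
        self.color_mlp = nn.Sequential(
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3))

    def forward(self, xyz, view_dir):
        feat = self.density_mlp(xyz)
        sigma = torch.relu(self.density_head(feat))                         # density is non-negative
        rgb = torch.sigmoid(self.color_mlp(torch.cat([feat, view_dir], dim=-1)))
        return sigma, rgb
```

Both networks would then be optimized together against the photometric loss Dr. Cinoman describes.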

[00:08:46] Ryan Cinoman: Ray tracing is the key point that gets us from this function of space and direction to an actual synthetic image. From the place you want the virtual camera to be, you will literally follow the path of a light ray, say a photon, coming out from where you’re looking, passing through transparent objects, reflecting, whatever path that light is taking. And then just sampling along that ray and integrating it to see what color is gonna come back to the virtual sensor. And you do that in every direction, for every pixel that’s on the virtual camera. And then you get an image back. And for NeRFs, it’s the same exact idea. If I have my 5D NeRF function, this radiance field, and I want to generate a synthetic image from it, I input my virtual camera and my virtual pixel location. I trace a ray in the direction I’m looking. And I have two things. I have the radiance at each point, and I have a function for the density of the object at each point. I sum over the radiance that’s being emitted, times the probability that a light photon is going to penetrate to that depth. Once it hits a very hard object, it’s not gonna penetrate any further. If it hits a transparent object, it might penetrate at a 50% rate or something like that. And then by integrating over all of those colors that it’s seen on its way, we get a color that the pixel sees.
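The “radiance times the probability a photon penetrates to that depth” step is the standard volume-rendering quadrature. Here is a minimal NumPy sketch, assuming the per-sample densities and colors along one ray have already been queried from the field; the function name is illustrative.

```python
import numpy as np

def composite_ray(sigmas, rgbs, t_vals):
    """Collapse per-sample density and color along one ray into a single pixel color."""
    # Spacing between consecutive samples (last segment padded with a large value).
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)
    # Opacity contributed by each segment of the ray.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance: probability that light survives all earlier segments.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1] + 1e-10]))
    weights = alphas * trans
    return (weights[:, None] * rgbs).sum(axis=0)  # final RGB seen by this pixel
```

High density drives the transmittance toward zero quickly, which matches the “hard object stops the photon, transparent object lets some through” behavior in the explanation above.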

[00:10:17] Eliza Mace: To recap, we train the NeRF by pretending to pass a beam of light from every pixel in our synthesized image through the scene. We look to see where the rays intersect with the objects in the scene, which we can figure out by checking where our rays intersect with the rays that would project from our existing images. Then we compare the image we synthesize to an actual view and update the model, just as in any supervised training of a neural network. This sounds much more complicated than just leveraging geometry, so what’s the payoff here?
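Putting the pieces together, one supervised update as recapped above might look like this sketch, again assuming PyTorch. Here `field_fn` stands in for any NeRF-style model (such as the TinyNeRF sketch earlier) that maps sampled points and view directions to densities and colors; all names and shapes are illustrative.

```python
import torch

def train_step(field_fn, optimizer, pts, dirs, t_vals, true_rgb):
    """One update: render pixel colors from the field, compare them to a real
    photo via the photometric (mean squared color) loss, and adjust the weights.

    pts, dirs: (rays, samples, 3) sample points and view directions per ray.
    t_vals:    (rays, samples) distance of each sample along its ray.
    true_rgb:  (rays, 3) ground-truth pixel colors from a training image.
    """
    sigma, rgb = field_fn(pts, dirs)                     # query the radiance field
    # Composite each ray exactly as in the earlier quadrature sketch.
    deltas = t_vals[..., 1:] - t_vals[..., :-1]
    deltas = torch.cat([deltas, torch.full_like(deltas[..., :1], 1e10)], dim=-1)
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * deltas)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1)[..., :-1]
    pred_rgb = ((alpha * trans)[..., None] * rgb).sum(dim=-2)
    loss = ((pred_rgb - true_rgb) ** 2).mean()           # photometric loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```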

[00:10:47] Eric Bianchi: With their loss function and their underlying ray tracing algorithm, they’re able to capture reflectance and even translucence of objects. From the light source that’s given in that scene, you can imagine as you’re shifting your gaze or the viewpoints around the scene, the light source is reflecting, and that’s one of the major advantages. The other major advantage is that these older algorithms are heavily reliant on key points, where you have to generate meshes from those key points in a heavily mathematical kind of model. NeRF wasn’t faster at generating the model or rendering an image, but it did generate higher quality and higher fidelity images than these other dense reconstruction methods were able to achieve prior to this.

[00:11:34] Eliza Mace: When I asked Dr. Bianchi how this technology applies specifically to MITRE and our sponsors, he described a wide range of possible applications:

[00:11:43] Eric Bianchi: So, going to the big landscape or the big 3D scenes, NeRFs could be very advantageous for situations like FEMA, where we have images collected from a bunch of different places and people’s cameras, people’s iPhones, maybe drone footage. And we’re able to take this hodgepodge of unstructured image data and create a model for the first responders. Like after an earthquake or after a tornado, let’s get our first responders the best places of entry. That 3D scene generation also could be used for robotics. For example, entire city blocks could be converted into a testbed model space where you can run simulations within that large-scene model environment. AR/VR environments would also be very well suited to a NeRF, because NeRFs now can render at the speeds that allow you to walk around a VR environment.

[00:12:42] Eliza Mace: Since the first NeRF paper was published in 2020, there have been hundreds of additional papers published improving on NeRFs or extending their capabilities. I asked Dr. Bianchi what some of the common themes were that he and Dr. Cinoman noticed during their study.

[00:12:57] Eric Bianchi: I’m gonna touch on some of the high-level groupings and trends of the methods themselves. So, what we see is that for faster training, for example, or faster rendering, you see efficient sampling of rays. So, remember how the rays are going through the scene and we’re sampling along the points; we can naively do that, or we can maybe sample at points of interest and begin to take a denser sampling around surfaces, which is what we really care about. So that little change has increased training speeds. Another highlight is higher-order or more sophisticated data structures for storing features during the training process. If we’re looking at city blocks or streets, we can see that NeRF models have to take into account the transient objects in the scene. So what they try and do is there’s an embedding of the structural space such that it embeds the unmoving structure in the background and filters out the moving objects in the scene. So then what we’re left with is just the buildings themselves, maybe the trees, the stop lights, things that aren’t changing over time. They have to mask out some of these transient objects. They will filter that out and not include that in their rendering or model. And so that requires some additional computer vision tasks within the NeRF model.
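The “denser sampling around surfaces” idea Dr. Bianchi mentions is often implemented by resampling a ray according to the weights a coarse pass produced. A simplified NumPy sketch, with no within-bin interpolation and illustrative names:

```python
import numpy as np

def resample_near_surfaces(t_vals, coarse_weights, num_fine, rng=None):
    """Draw extra sample distances where the coarse pass put most of its weight,
    i.e. near surfaces, via inverse-transform sampling of the weight histogram."""
    rng = rng or np.random.default_rng()
    probs = coarse_weights / (coarse_weights.sum() + 1e-10)
    cdf = np.cumsum(probs)
    u = rng.random(num_fine)                               # uniform draws in [0, 1)
    idx = np.searchsorted(cdf, u)                          # coarse bin each draw falls into
    return np.sort(t_vals[np.clip(idx, 0, len(t_vals) - 1)])
```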

[00:14:31] Eliza Mace: Interestingly, the same capability that allows NeRFs to filter out transient objects can also be leveraged for some, perhaps, unexpected capabilities. The original idea was to add transient variables that could learn to filter out impermanent objects. But what if instead we use the same types of variables to make changes on purpose?

[00:14:51] Ryan Cinoman: Another huge area is controllable NeRFs. We have this method of training NeRFs on real world objects or synthetic objects. But that’s not the only thing we might wanna do with a synthetic image generator. We might want to create a model that we can actually control. Can I create a room and then create synthetic images with the door closed and with the door open? Can I make images of a scene at night and during the day? Since synthetic image generation is the task we’re trying to accomplish here, being able to control the output matters. For example, we’ve seen things like NeRFs modeling people’s faces, where you can control aspects of the NeRF like opening and closing each of their eyes individually, or opening and closing the mouth.
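One common way to get this kind of control is to feed the color branch an extra learned condition code alongside the position features and viewing direction, so changing the code at render time changes the output. A toy PyTorch sketch, with all names and dimensions illustrative rather than taken from any specific paper:

```python
import torch
import torch.nn as nn

class ConditionalColorHead(nn.Module):
    """Toy 'controllable NeRF' color head: the condition code (eyes open/closed,
    day/night, ...) is an extra input, so varying it changes the rendered colors."""

    def __init__(self, feat_dim=128, code_dim=8, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))

    def forward(self, feat, view_dir, condition_code):
        x = torch.cat([feat, view_dir, condition_code], dim=-1)
        return torch.sigmoid(self.mlp(x))   # RGB in [0, 1]
```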

[00:15:38] Eliza Mace: Okay. Things are getting a bit wild now. And with all these crazy flavors of NeRFs now running rampant in the research community, how can anyone interested in using them figure out where to start?

[00:15:50] Eric Bianchi: With any sort of mission problem, choosing the right tool for the solution is like the very first step, and NeRFs at the moment, they’re just exploding and there’s so many different avenues to go down. I think figuring out the right NeRF for the situation is the first kind of hurdle that you have to get over. For example, maybe you have the time to get pose information from images. Maybe we don’t have the ability to get accurate pose information of the cameras. Maybe I have the ability to apply masks to the images. Maybe I have a generalizable model, like in human NeRF or some of the object-level NeRFs. Some NeRFs have different requirements than other NeRFs.

[00:16:37] Eliza Mace: If you would like to leverage NeRFs and find out which algorithms meet your mission needs, or if you want to find out more about the concepts we discussed today, you can check out Dr. Bianchi and Dr. Cinoman’s paper at techfutures.mitre.org.

Thanks so much for tuning into this podcast. I wrote, produced, and edited this episode with support from the Tech Futures team, including Dr. Heath Farris and Beverly Wood. Our guests for this episode were Dr. Eric Bianchi and Dr. Ryan Cinoman.

[00:17:07] Eliza Mace: The music for this episode was brought to you by Ooey, Truvio, and Baegel.

Copyright 2023, The MITRE Corporation. Approved for public release; distribution unlimited. Public release case number 23-1173.

MITRE: solving problems for a safer world.

Meet the Guests

Dr. Eric Bianchi

Dr. Eric Bianchi recently finished his first year as a Data Scientist (and researcher) at MITRE. He graduated from Virginia Tech in December of 2021 with a doctorate in structural engineering as well as an MS in computer engineering with a focus in machine learning and computer vision. His research centered on bridge inspection and artificial intelligence, and he has published articles in prominent civil engineering journals such as the American Society of Civil Engineers’ Journal of Computing in Civil Engineering, Structural Health Monitoring, Automation in Construction, and IEEE publications, and at conferences such as SPIE Optics and Photonics. While in school, Dr. Bianchi started a venture-capital-backed company focused on presenting dynamic parking data to web and mobile interfaces. While the company has dissolved, he still enjoys the entrepreneurial spirit that MITRE has to offer. Dr. Bianchi is currently working on applications and research for computer vision and data-driven projects and hopes to continue to pursue those areas.

Dr. Ryan Cinoman