
Episode 21: Exploring Bayesian Conformal Sets

By Patrick Hinson


Show Notes

This episode explores current research into combining two statistical methods: Bayesian models and conformal prediction. Used in tandem, they are called Bayesian Conformal Sets, an approach that can help stakeholders in high-risk, safety-critical scenarios evaluate risk and uncertainty when modeling.


Full text of transcript

[00:00:00] Patrick: Picture this: you’re a doctor who has some X-rays, and you need to decide how to diagnose a patient. You feed the X-ray image into a fancy neural network, and it declares that the patient has a specific type of infection. As a doctor, you recheck the image before making a decision, and you find this diagnosis surprising. You think to yourself: sure, the model gave this diagnosis, but how confident is it in this particular diagnosis? What if something else is wrong? Were there any other potential diagnoses that were runners-up? Enter this episode’s topic, a state-of-the-art statistical modeling technique called Bayesian Conformal Sets, currently being studied at MITRE, that aims to better inform decision makers in high-stakes scenarios, like the life-or-death one we just mentioned, and to better quantify uncertainty in artificial intelligence.

[00:00:43] Hello and welcome to MITRE’s Tech Futures Podcast. I’m your host, Patrick Hinson. At MITRE, we offer a unique vantage point and objective insights that we share in the public interest, and in this podcast series, we showcase emerging technologies that will affect the government and our nation in the future. Today we will be talking about an exciting merging of statistical modeling techniques, Bayesian Conformal Sets, but before we begin, I would like to mention that this podcast was made possible by MITRE’s Independent Research and Development program, which funds projects that address the critical problems and priorities of our government sponsors. We do that through applied research that reflects our sponsors’ near-, mid-, and far-term research needs. Now, without further ado, I bring you MITRE’s Tech Futures Podcast Episode #21: Exploring Bayesian Conformal Sets.

[00:01:28] Patrick: In our intro, we introduced a modeling approach called Bayesian conformal sets, an approach that seeks to unify Bayesian modeling and conformal prediction to better quantify uncertainty when modeling. You can think of this unification like how the spoon and the fork came together to make the great invention of the spork. But so far we’ve introduced a few terms that you may not be familiar with. What is Bayesian modeling? What is conformal prediction? And what is the unification of the two that is Bayesian conformal sets?

We can start with Bayesian models, which, unsurprisingly, are models built on the framework provided by Bayes’ theorem. They combine data with prior knowledge or beliefs about the variables in the data to obtain updated beliefs, or knowledge, about those variables. There are many types of Bayesian models and many different ways to implement them. Paul Scemama, an artificial intelligence researcher at MITRE leading the study of Bayesian conformal sets, is researching Bayesian models that incorporate supervised deep learning methods, which can make these models very powerful. Here’s how Paul describes Bayesian models:

[00:02:27] Paul: We can think of Bayesian models, and this is oversimplified, but we can think of them as models that can better know what they don’t know. So, meaning that if they’re used in a setting where they’re fed inputs that they’ve never seen before and are unlike anything they’ve ever seen before, they can better recognize this and exhibit more uncertainty in their prediction. And that’s what we want.
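One common way a Bayesian classifier “knows what it doesn’t know” is through its posterior predictive distribution: instead of trusting a single set of weights, it averages the softmax outputs of many weight samples drawn from an (approximate) posterior, and when those samples disagree, the averaged prediction is flatter and its entropy is higher. The minimal sketch below assumes the per-sample probabilities are already available as a NumPy array; the numbers are made up for illustration and do not come from Paul’s models.

```python
import numpy as np

def posterior_predictive(sample_probs: np.ndarray) -> np.ndarray:
    """Average class probabilities over posterior samples.

    sample_probs has shape (num_samples, num_classes): one softmax output
    per weight sample drawn from an (approximate) posterior.
    """
    return sample_probs.mean(axis=0)

def predictive_entropy(probs: np.ndarray) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return float(-(p * np.log(p)).sum())

# Hypothetical outputs of three posterior samples on a familiar input:
# every sample agrees that the first class is most likely.
familiar = np.array([[0.90, 0.05, 0.05],
                     [0.85, 0.10, 0.05],
                     [0.92, 0.04, 0.04]])

# Hypothetical outputs on an unfamiliar input: each sample is confident,
# but they disagree about which class is correct.
unfamiliar = np.array([[0.90, 0.05, 0.05],
                       [0.05, 0.90, 0.05],
                       [0.05, 0.05, 0.90]])

for name, samples in [("familiar", familiar), ("unfamiliar", unfamiliar)]:
    p = posterior_predictive(samples)
    print(name, np.round(p, 3), round(predictive_entropy(p), 3))
```

Each individual sample looks confident on the unfamiliar input; only the averaged, Bayesian prediction reveals the uncertainty.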

[00:02:49] Patrick: Now we can move on to the second ingredient of Bayesian conformal sets, conformal prediction. I asked Paul how he would describe the concept and here’s what he had to say.

[00:02:57] Paul: What it basically is, is we allow our ultimate output to be a set instead of just a single value. For example, instead of producing a single label from our system, we produce a set of labels. And the assumption is that the model thinks the true label is in that set with some confidence, and if we allow that, conformal prediction is this family of procedures where we can make some sort of statistical guarantees about those sets.

[00:03:29] Patrick: One of the more powerful statistical guarantees of conformal sets holds when they’re applied to what is called exchangeable data, meaning data whose joint distribution stays the same no matter how the order of the observations is shuffled, so the order in which the data points arrive carries no information.

[00:03:42] Paul: The most general one, and what makes it conformal, is that on average the true label will be in those prediction sets at a rate we get to choose. So if I say I want the true label to be in these sets 95 percent of the time, then on what’s called exchangeable data (it’s slightly less restrictive than the I.I.D. assumption, the independent and identically distributed data assumption), we get that guarantee, which is great and really useful.
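For readers who want the guarantee written down, here is the standard split conformal statement as it usually appears in the conformal prediction literature (the notation is ours, not taken from Paul’s paper). Choosing α = 0.05 gives the 95 percent example: if the n calibration pairs and the new test pair are exchangeable, the set contains the true label with probability at least 95 percent, where the probability is taken over both the calibration data and the test point. That is why it is an on-average guarantee. The construction assumes ⌈(n+1)(1−α)⌉ ≤ n; otherwise the set must include every label.

```latex
% Exchangeability: the joint law of Z_i = (X_i, Y_i) is unchanged by any permutation \pi
(Z_1, \dots, Z_{n+1}) \overset{d}{=} (Z_{\pi(1)}, \dots, Z_{\pi(n+1)})

% Calibration scores and the conformal threshold
s_i = s(X_i, Y_i), \quad i = 1, \dots, n, \qquad
\hat{q} = \text{the } \lceil (n+1)(1-\alpha) \rceil\text{-th smallest of } s_1, \dots, s_n

% Prediction set and the marginal coverage guarantee
\mathcal{C}(x) = \{\, y : s(x, y) \le \hat{q} \,\}, \qquad
\mathbb{P}\big(Y_{n+1} \in \mathcal{C}(X_{n+1})\big) \ge 1 - \alpha
```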

[00:04:10] Patrick: So in short, rather than treating the model output as a single label, we instead get a set of potential labels as output. For example, if we had a model that classifies images of animals, a typical approach would be to take the images as input and then output a label for each image, like “this image is a dog” or “this image is a cat.” However, if we were to apply conformal prediction methods to the model, the output would instead be a set of labels. So the model would instead say that, at a certain threshold, it thinks the image is likely either a dog or a wolf or a bear, and that set of three would be the output instead of the single label. Additionally, the guarantee that Paul mentioned is quite powerful and makes conformal prediction a very attractive method for modeling. Here’s how Dr. Geoff Warner, a Principal Data Scientist here at MITRE who has worked with a number of sponsors and served as a principal investigator on projects spanning a broad range of topics, explains this conformal guarantee:

[00:05:03] Geoff: So conformal prediction prescribes a method whereby you can choose a threshold in the predicted probabilities of the model such that, if you include all of the outputs above that threshold, that set will be guaranteed to contain the true answer with the probability that the user prescribed at the outset.
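Geoff’s description maps onto a short computation. Below is a minimal sketch of split conformal prediction for a classifier, assuming a score of one minus the model’s probability for the true label (one common choice, not necessarily the one used in Paul’s study). The data are synthetic: hypothetical softmax outputs, with labels drawn from those same probabilities. The names `simulate`, `conformal_threshold`, and `prediction_sets` are illustrative, not a library API.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, num_classes=5):
    """Hypothetical softmax outputs, with labels drawn from those same probabilities."""
    probs = rng.dirichlet(np.full(num_classes, 0.5), size=n)
    u = rng.random((n, 1))
    labels = (probs.cumsum(axis=1) < u).sum(axis=1)  # inverse-CDF sampling
    return probs, np.minimum(labels, num_classes - 1)

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Conformal quantile q_hat of the scores s(x, y) = 1 - p(y | x)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the conformal quantile
    if k > n:
        return np.inf  # too few calibration points: every label is included
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, q_hat):
    """Boolean mask: label y is in the set iff 1 - p(y | x) <= q_hat."""
    return (1.0 - test_probs) <= q_hat

alpha = 0.1  # target: the true label lands in the set 90% of the time
cal_probs, cal_labels = simulate(2000)
test_probs, test_labels = simulate(5000)
q_hat = conformal_threshold(cal_probs, cal_labels, alpha)
sets = prediction_sets(test_probs, q_hat)
coverage = sets[np.arange(len(test_labels)), test_labels].mean()
print(f"include labels with probability >= {1 - q_hat:.3f}")
print(f"empirical coverage {coverage:.3f}, average set size {sets.sum(axis=1).mean():.2f}")
```

The threshold Geoff mentions is `1 - q_hat` on the probability scale. Because this score keeps only labels that clear that threshold, it can occasionally return an empty set; other scores trade set size and adaptivity differently.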

[00:05:23] Patrick: Wow! A guarantee that the prediction set contains the true label we’re looking for, at a rate we choose in advance. That is quite an advantage to have when modeling. Now that we know what Bayesian models and conformal prediction are, what does it mean to use them in tandem, that is, Bayesian conformal sets? Here’s how Paul answers that question:

[00:05:40] Paul: For me, it just means any Bayesian model, and you apply conformal prediction to it. And so the idea of Bayesian conformal sets is: can we combine these, and then can we benefit from both of their advantages? And what my study kind of shows is that it’s not that simple. There are some clear cases where conformal prediction can really help the Bayesian model, even on out-of-distribution data, but it can also actually make it worse than what we would otherwise see without using it. This is just one of the ways to try to make machine learning systems safer in deployment, especially when we really care about safety and, if we’re unsure, we want to defer to the human expert.
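Following Paul’s definition (any Bayesian model with conformal prediction applied to it), one natural construction, assumed here rather than taken from his paper, is to average the posterior samples into a posterior predictive distribution and then threshold it with a conformal `q_hat` computed beforehand on calibration data. The `triage` rule, which automates only single-label sets and defers everything else to a human expert, is a hypothetical illustration of the deferral idea, as are the label names, the probabilities, and the `q_hat` value of 0.6.

```python
import numpy as np

Q_HAT = 0.6  # hypothetical threshold, assumed to come from a calibration set
LABELS = ["normal", "infection A", "infection B"]  # hypothetical classes

def bayesian_conformal_set(sample_probs, q_hat):
    """Conformal set built on a Bayesian model's posterior predictive.

    sample_probs: (num_posterior_samples, num_classes) softmax outputs,
    one row per weight sample. Score: 1 - p(y | x).
    """
    p = sample_probs.mean(axis=0)  # Bayesian model average
    return np.flatnonzero(1.0 - p <= q_hat)

def triage(pred_set):
    """Hypothetical human-in-the-loop policy: automate singleton sets, defer the rest."""
    if len(pred_set) == 1:
        return "auto: " + LABELS[pred_set[0]]
    if len(pred_set) == 0:
        return "defer: no label cleared the threshold"
    return "defer to expert: " + ", ".join(LABELS[i] for i in pred_set)

# Posterior samples agree, so one label survives the threshold.
confident = np.array([[0.90, 0.05, 0.05],
                      [0.85, 0.10, 0.05]])
# Posterior samples disagree, so two labels survive and a human decides.
uncertain = np.array([[0.70, 0.25, 0.05],
                      [0.30, 0.60, 0.10]])

for samples in (confident, uncertain):
    print(triage(bayesian_conformal_set(samples, Q_HAT)))
```

Set size becomes the deferral signal: the human expert only sees the cases where the calibrated set could not narrow the answer to one label.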

[00:06:26] Patrick: When Paul mentions out-of-distribution data, he is referring to data that the model has not seen before and that is unlike the data it was trained on, which can be encountered in practice. The performance that Paul has seen on in-distribution data, meanwhile, is promising so far. Here he is describing his study applying Bayesian conformal sets to medical diagnoses based on X-rays, and what the approach implies for handling risk with AI:

[00:06:50] Paul: So in my study I focused on image classification, like medical diagnosis with X-rays, classifying the different organs in them. It’s a really emblematic application where safety is of the utmost importance. We don’t want the machine learning model to be overly confident and then be wrong. I wanted to do something that was closer to real life, so I used the MedMNIST cohort of data sets. And basically what I did was, they have two data sets with the same labels of the same images but with slightly different views of these X-rays, and I trained different models with different Bayesian methods on one of the data sets and then I did the conformal calibration stage with that same data set and then I evaluated how it did on the test set, but then also the test set of the other data set, which is obviously going to be out of distribution because it’s different views. And so in real life we might have these X-rays, but maybe we’re getting new X-rays, maybe some of them have slightly different views. What’s the risk of still deploying the machine learning system in that, and how do conformal prediction and Bayesian modeling reduce that risk?
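Paul’s setup (calibrate on one data set, then evaluate on both its own test split and a differently viewed one) can be mimicked on synthetic data. The sketch below is not his experiment: it uses made-up softmax outputs and a deliberately extreme, hypothetical shift in which the model’s probabilities carry no information about the shifted labels. Because the calibration data and the shifted test data are no longer exchangeable, the coverage guarantee no longer applies. The helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_CLASSES = 5

def simulate(n, shifted=False):
    """Hypothetical softmax outputs plus true labels.

    In distribution, labels are drawn from the model's own probabilities
    (a perfectly calibrated model). Under the deliberately extreme shift,
    labels are drawn uniformly, so the probabilities carry no information.
    """
    probs = rng.dirichlet(np.full(NUM_CLASSES, 0.5), size=n)
    if shifted:
        labels = rng.integers(0, NUM_CLASSES, size=n)
    else:
        u = rng.random((n, 1))
        labels = np.minimum((probs.cumsum(axis=1) < u).sum(axis=1), NUM_CLASSES - 1)
    return probs, labels

def conformal_threshold(probs, labels, alpha):
    """Conformal quantile of the scores 1 - p(y | x)."""
    n = len(labels)
    scores = 1.0 - probs[np.arange(n), labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else np.sort(scores)[k - 1]

def coverage_and_size(probs, labels, q_hat):
    """Fraction of true labels inside the sets, and the average set size."""
    sets = (1.0 - probs) <= q_hat
    covered = sets[np.arange(len(labels)), labels]
    return covered.mean(), sets.sum(axis=1).mean()

# Calibrate once, on data from the training distribution only.
q_hat = conformal_threshold(*simulate(2000), alpha=0.1)

in_cov, in_size = coverage_and_size(*simulate(5000), q_hat)
out_cov, out_size = coverage_and_size(*simulate(5000, shifted=True), q_hat)
print(f"in-distribution: coverage {in_cov:.3f}, avg set size {in_size:.2f}")
print(f"shifted:         coverage {out_cov:.3f}, avg set size {out_size:.2f}")
```

In this sketch the sets stay about the same size under the shift, because the model’s probabilities look just as they did at calibration time; only the coverage drops. That silent failure is the kind of deployment risk Paul’s study asks about.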

[00:08:06] Patrick: In addition to the medical image classification tasks that Paul is currently exploring, he detailed to me other areas where he believes that Bayesian conformal sets can shine.

[00:08:14] Paul: I would want to try it on any application where there’s a human in the loop, right? Where you have some sort of expert that you can defer to, so it’s not a fully automated system, but you want a predictive model to alleviate some sort of workload. And also where making mistakes is really bad, so it’s in the safety-critical setting. So a good example of an application that might not be the best would be recommendation, right? It’s okay if you’re wrong a few times when you’re doing recommendation, for like consumers and stuff, but then with the medical domain, or even more of the critical settings that the DoD would be interested in, stuff like that, you know, mistakes can have large consequences. So, in any of those applications, I would at least try to see how these methods perform on that. You could do the same with satellite imagery, where you have an analyst and they’re looking at stuff. But with that workload, just staring at the screen, you’re going to have more human error when you’re doing a lot of easy stuff. Conformal prediction has also been used a lot in the regression setting. The Washington Post used it for election prediction results. So it’s definitely being used in some real-world scenarios, but not that I’ve seen with high-dimensional input and Bayesian methods together.
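Paul notes that conformal prediction is also widely used for regression. Here is a minimal sketch of split conformal regression, assuming absolute residuals as the score: fit any point predictor, take the conformal quantile of the residuals on a separate calibration split, and report the prediction plus or minus that quantile. The linear data and the `true_fn`/`make_data` helpers are hypothetical, and this is not the Washington Post’s method, whose details the episode does not describe.

```python
import numpy as np

rng = np.random.default_rng(2)

def true_fn(x):
    """Hypothetical ground-truth relationship for the synthetic data."""
    return 2.0 * x + 1.0

def make_data(n):
    """Synthetic (x, y) pairs with Gaussian noise of standard deviation 1."""
    x = rng.uniform(0.0, 10.0, size=n)
    return x, true_fn(x) + rng.normal(0.0, 1.0, size=n)

# 1. Fit a point predictor (here, least squares) on a training split.
x_train, y_train = make_data(500)
slope, intercept = np.polyfit(x_train, y_train, deg=1)

def predict(x):
    return slope * x + intercept

# 2. Score a separate calibration split with absolute residuals.
x_cal, y_cal = make_data(2000)
scores = np.abs(y_cal - predict(x_cal))

# 3. Take the conformal quantile for 90% target coverage.
alpha = 0.1
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q_hat = np.sort(scores)[k - 1]

# 4. Intervals on new data are the prediction plus or minus q_hat.
x_test, y_test = make_data(5000)
lower, upper = predict(x_test) - q_hat, predict(x_test) + q_hat
coverage = np.mean((lower <= y_test) & (y_test <= upper))
print(f"interval half-width {q_hat:.2f}, empirical coverage {coverage:.3f}")
```

The same recipe works with any regressor in step 1; only the scores and the quantile change, which is part of why conformal methods are easy to bolt onto existing models.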

[00:09:39] Patrick: This all sounds great, but there are many other great modeling approaches out there, and some of them have even been covered in previous Tech Futures Podcast episodes. I asked Geoff why an organization like MITRE would be interested in studying Bayesian conformal sets, and how the study aligns with MITRE’s mission of solving problems for a safer world.

[00:09:55] Geoff: As the years have passed, we’ve come to rely more and more on models for decision making. And often these models, for example, with classification models, they output some probability. We need to accurately characterize the uncertainty in how these models function and how well they’re actually performing in these spaces. It’s vitally important in, for example, medical diagnostics if you’re looking at radiological images that you have an accurate sense of what your actual confidence in the result is. So this is important from that perspective, and it’s a universal problem with really all deep learning algorithms at this point. We do care about safe predictions, right? We want our models to be reliable, and we want to be able to characterize their uncertainty accurately. Obviously, that aligns, I think, very clearly with this notion of creating a safer world. If we’re relying on these models for decision making more and more, we want to be able to understand and characterize their behaviors better including how they behave under uncertainty.

[00:10:52] I look at it like this: we’ve moved from more application-oriented things where, okay, I design a thing like an Excel application, and I want to know exactly how it’s going to behave when I put in a number in one cell and a number in another and I ask it to add. I want to know, okay, it’s reliably going to add those two things. The regime that we use to test those kinds of applications is very different from the regime we use to test things like deep learning models, which are more like, in some sense, creatures that we tame to perform certain tasks. For example, we can train a horse to act as a conveyance, but in many cases, it might be better just to build a car, which is engineered specifically for that purpose. So in this world where we’re trying to wrangle these models, it’s almost like animal husbandry, right? We’re trying to get them to behave in certain ways. We need effective methods for characterizing the uncertainty in the building of these models, and I think this research goes towards this question of trying to understand why they sometimes give us unexpected behaviors in the wild.

[00:11:50] Patrick: Given all the great things that we’ve heard about Bayesian conformal sets so far, I wanted to ask Paul what he felt were some of the limitations of this kind of modeling. You’ll hear him mention something called a calibration set, which you can think of as a held-out portion of the data, kept separate from both the training data and the final test data, that is used to tune the prediction sets produced by conformal prediction.

[00:12:08] Paul: You have to be careful and make sure that you’re not accidentally worsening the safety of your machine learning system by just slapping them together and assuming it’s going to be good. So as we get fed inputs that are unlike what we’ve seen, sometimes we’ll be super overly confident when, in fact, we should be the least confident we can be. And conversely, there are some Bayesian methods that have been shown to actually be underconfident, so they could be more confident, but they’re not, so they err on the side of being more and more conservative. One of those is mean-field variational inference, a type of Bayesian modeling method, and especially in classifiers, it’s actually more underconfident. And what we find is that if the model is overly confident on that calibration set, what conformal prediction does is it says, okay, you’re way too confident. We’re going to make you way more conservative.
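Paul’s point about over- and underconfidence can be illustrated with temperature scaling, a standard way to simulate miscalibration (assumed here; his study compares Bayesian inference methods, not temperatures). Starting from hypothetical probabilities that are calibrated by construction, the sketch sharpens them (temperature 0.5, overconfident) or flattens them (temperature 2.0, underconfident), then compares a naive set that trusts the probabilities (top labels until 90 percent of the mass) with a conformal set calibrated on held-out data. The names `temper`, `naive_sets`, and `conformal_sets` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
K, ALPHA = 5, 0.1

def simulate(n):
    """Hypothetical calibrated probabilities, with labels drawn from them."""
    probs = rng.dirichlet(np.ones(K), size=n)
    u = rng.random((n, 1))
    labels = np.minimum((probs.cumsum(axis=1) < u).sum(axis=1), K - 1)
    return probs, labels

def temper(probs, temperature):
    """T < 1 sharpens (overconfident); T > 1 flattens (underconfident)."""
    logits = np.log(np.clip(probs, 1e-12, None)) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def naive_sets(probs, level):
    """Trust the model: keep top-ranked labels until their mass reaches level."""
    order = np.argsort(-probs, axis=1)
    sorted_p = np.take_along_axis(probs, order, axis=1)
    mass_before = sorted_p.cumsum(axis=1) - sorted_p  # mass ranked strictly above
    sets = np.zeros(probs.shape, dtype=bool)
    np.put_along_axis(sets, order, mass_before < level, axis=1)
    return sets

def conformal_sets(cal_probs, cal_labels, test_probs):
    """Split conformal with score 1 - p(y | x); assumes enough calibration points."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q_hat = np.sort(scores)[int(np.ceil((n + 1) * (1 - ALPHA))) - 1]
    return (1.0 - test_probs) <= q_hat

def coverage(sets, labels):
    return sets[np.arange(len(labels)), labels].mean()

cal_p, cal_y = simulate(2000)
test_p, test_y = simulate(5000)
results = {}
for name, t in [("overconfident", 0.5), ("calibrated", 1.0), ("underconfident", 2.0)]:
    cal_t, test_t = temper(cal_p, t), temper(test_p, t)
    naive = coverage(naive_sets(test_t, 1 - ALPHA), test_y)
    conformal = coverage(conformal_sets(cal_t, cal_y, test_t), test_y)
    results[name] = (naive, conformal)
    print(f"{name:>14}: naive {naive:.3f}, conformal {conformal:.3f}")
```

The naive sets undercover for the overconfident model and overcover for the underconfident one, while the conformal threshold, learned from the calibration set, pulls coverage back toward the 90 percent target in both directions. That adjustment is exactly what can help or hurt depending on how the calibration data relate to the data seen in deployment.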

[00:13:08] Patrick: So to conclude, Bayesian conformal sets are not perfect, but they show a lot of potential as more research and time are dedicated to them. There appears to be a need for software that supports these models and makes them more accessible to the general public, rather than requiring people to code their inner workings by hand. Once we have this kind of access, we may start seeing Bayesian conformal sets appear in more high-risk, safety-critical areas like healthcare and national defense. Paul seems optimistic about their future:

[00:13:34] Paul: I think Bayesian deep learning is going to grow a lot, it already has grown a lot. I think this general quantifying uncertainty is a large thing in research right now, and I think the research is a little ahead of the tools to be able to use them. So I think there’s an opportunity to have more reliable implementations because these are often pretty complex methods. But allowing people to use them out of the box – I don’t feel like it’s fully there, but I see it continuing to grow. And then conformal prediction, I think it will grow even faster because it’s newer, and the research has seen this huge rise in the past couple of years and I think that’s just going to continue and continue because I feel like it’s a relatively understudied field compared to Bayesian modeling in general, and I think we’ll see more unification in the theory of using the two. So hopefully, more theoretical results of, okay, when does combining the two make sense? When does it not? I think what I’ve studied is to show that there are some initially counterintuitive results in combining them or not combining them, and I hope that that will catalyze more research into the nitty gritty math to make some sort of guarantees on when combining them is going to help and when it’s going to harm.

[00:14:54] Patrick: Thanks for tuning in to this episode of MITRE’s Tech Futures Podcast. If you would like to learn more about Bayesian conformal sets, you can read Paul’s paper at I wrote, produced, and edited this show with the help of Dr. Heath Farris and Dr. Kris Rosfjord: Technology Futures Innovation Area Leads, Tom Scholfield: Media Engineer, and Beverly Wood: Strategic Communications. Our guests for this episode included Paul Scemama and Dr. Geoff Warner. The music in this episode was brought to you by Ooyy and Truvio.


Copyright 2023. The MITRE Corporation. All rights reserved. MITRE PRS number 23-4062. MITRE solving problems for a safer world.

Meet the Guests

Paul Scemama

Paul Scemama has been at MITRE for about a year and a half. Before joining MITRE, he earned an undergraduate degree in applied math and statistics. At MITRE, Paul has worked on a diverse set of projects, such as developing a library for training surrogate models of complex simulations and developing a cloud-native evaluation pipeline for an augmented reality microscope. His academic interests lie mostly in probabilistic machine learning, approximate inference, and probabilistic programming, but he is always open to expanding his horizons and trying something new!

Geoff Warner

Geoff Warner has been at MITRE for 13+ years. He spent his early career in academia doing graduate and postdoctoral work in theoretical condensed matter physics, with a focus on non-equilibrium phenomena in superfluid helium-three and degenerate Fermi gases. Since coming to MITRE, Geoff has worked with a number of sponsors and served as a principal investigator on projects spanning a broad range of topics, including: systemic risk in financial markets; computational models of the VBA claims process; hypertemporal and hyperspectral imaging; sensor physics; evolutionary algorithms for the automated discovery of novel tax schemes; muon imaging; exotic materials physics (Kosterlitz-Thouless transitions, time crystals, and topological insulators); and quantum computing.