In a recent interview, Scott Alexander wrote,
There's no clear distinction between a prior and a bias. I too have a prior that markets are better than central planning. Does this mean anything different from the accusation "you're biased in favor of markets"?
Yes, it does mean something different. Having a strong prior is OK. It is one’s (lack of) willingness to recognize evidence against one’s prior that indicates bias.
To me, Bayesian reasoning means that when you get a new piece of evidence, you weigh that against what you believed before. What you believed before is termed your “prior.”
In an example I like to use, during a routine checkup a doctor presented me with evidence that there was microscopic blood in my urine. My prior was that there was nothing seriously wrong with me. His evidence moved me slightly in the direction of thinking that there might be something seriously wrong, but not much. He, on the other hand, was convinced that I absolutely needed to go through a rigorous battery of tests to rule out anything serious.
When I used this example in teaching probability to high school students, I showed that my doctor failed to use Bayesian reasoning. If fifteen percent of the population has this symptom, and only one in 20,000 of the population has a serious illness that might be indicated by it, the doctor’s case for undergoing the tests is really weak.
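To see how weak, here is a minimal back-of-the-envelope sketch of the calculation with the numbers above. The only added assumption (mine, and generous to the doctor) is that everyone with the serious illness shows the symptom.

```python
# Back-of-the-envelope Bayes calculation using the numbers in the post.
# Assumption (mine, and generous to the doctor): everyone with the serious
# illness shows the symptom, i.e. P(symptom | illness) = 1.
p_illness = 1 / 20_000          # base rate of the serious illness
p_symptom = 0.15                # share of people with microscopic blood in urine
p_symptom_given_illness = 1.0   # assumed: illness always produces the symptom

# Bayes' rule: P(illness | symptom) = P(symptom | illness) * P(illness) / P(symptom)
p_illness_given_symptom = p_symptom_given_illness * p_illness / p_symptom
print(f"P(serious illness | symptom) ≈ {p_illness_given_symptom:.5f}")
# ≈ 0.00033, roughly 1 in 3,000 -- a modest bump over the 1-in-20,000 prior,
# not a mandate for a rigorous battery of tests.
```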
A Bayesian takes a weighted average of prior information and new evidence to form a new probabilistic belief. My doctor put a huge weight on the new evidence, and I thought that a more rational calculation would put more weight on the prior information: the low “base rate” of serious illness in the general population, the high “base rate” of people with microscopic blood in their urine, my age (this was more than 30 years ago) and the fact that since I was 15 years old doctors have occasionally found microscopic blood in my urine, but I had been healthy all this time.
But now, I want to use it as an example to illustrate what I see as the difference between having a prior and having a bias. My prior belief was that it was unlikely that I had something seriously wrong. But that was not a bias. Bias would have been for me to look at the evidence and conclude that it was less likely that I had something seriously wrong.
So here is my definition of bias:
Reasoning is biased if you put a negative weight on new information.
The classic experiment showing confirmation bias (also known as motivated reasoning) is to give two groups with opposite beliefs on a topic a summary of some recent findings on the topic. In the experiment, each group reports afterward that its beliefs have been strengthened by the new evidence. At least one side must be assigning a negative weight to the new information.1
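As a toy illustration of the definition (my sketch, not anything from the experiment itself), treat the new evidence as a likelihood ratio and let a hypothetical `weight` parameter say how much it counts. Any non-negative weight, however small, moves the belief toward the evidence; only a negative weight leaves a group more convinced than before, which is what at least one group in the experiment must be doing.

```python
def update_belief(prior_prob, likelihood_ratio, weight=1.0):
    """Move a probability in response to new evidence.

    likelihood_ratio > 1 means the evidence favors the belief.
    weight = 1.0 is the full Bayesian update; a small positive weight heavily
    discounts the evidence (a strong prior, but not bias); a negative weight
    moves the belief *away* from the evidence -- bias, under the definition above.
    """
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio ** weight
    return posterior_odds / (1 + posterior_odds)

# Start at a 30% belief and see evidence that favors the belief 2-to-1.
for w in (1.0, 0.2, -1.0):
    print(f"weight {w:+.1f} -> belief becomes {update_belief(0.30, 2.0, weight=w):.3f}")
# weight +1.0 -> ~0.462  (full Bayesian update)
# weight +0.2 -> ~0.330  (evidence heavily discounted, but still moves the right way)
# weight -1.0 -> ~0.176  (belief moves against the evidence: the signature of bias)
```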
From October 2023 through January of 2024, the number of heating-degree days in New York was more than 10 percent below normal. This is not enough to convince me that global warming should be a top worry. For one thing, the number of cooling-degree days last summer was also a bit below normal. And there are many other indicators to examine. But the mild winter in the mid-Atlantic region (I live much closer to DC than NY) counts as evidence in favor of global warming, and I am not going to strengthen my belief that global warming is not a top worry. I am trying not to be biased. I will avoid giving a negative weight to the evidence of this year’s mild winter.
To take a more heated example (pun intended), suppose that one believes that Palestinians are prepared to live in peace with Israel, and that the best way to promote peace in the region is to march against Israel. Then October 7 happens.
I think that an unbiased reaction would be to say that this reduces one’s confidence that the Palestinians are prepared to live in peace with Israel and that demonstrations against Israel are a constructive approach to take. You may have started out with a very high prior for such a belief, and you may not give much weight to the evidence of October 7. You may still blame Israel.
But you would not strengthen your belief that the Palestinians are prepared to live in peace with Israel. You would not become more inclined to denounce Israel. I would argue that those whose immediate reaction to October 7 was to denounce Israel were showing bias, by my definition. They were giving a negative weight to new information.
The TLDR is this: You can have a prior belief about something, based on your own observations and the opinions of others you trust. That belief could be very strong without making you biased. But you are biased if you give negative weight to new information. That is, evidence against your prior belief should reduce your confidence in that belief, not raise it.
1. It could be that each group is sifting through the information and putting a higher weight on the information that supports its prior. But let us assume that we are talking about evidence that objectively can be viewed as supporting one side or the other.
But again, we get back to trust. It's really hard not to strengthen your prior when you catch those who disagree lying to you or misrepresenting the facts, because in daily life you will run into a large number of people who lie to you precisely because they are wrong. "Ah, I shouldn't strengthen my belief, because those liars have always been lying, so I have learned nothing" may be the correct approach -- but I don't know anybody who does this. It's only when you think those who disagree with you are arguing in good faith, and have just made _mistakes_ (which you can point out to them, and which, if you are correct, they will acknowledge), that you can decide that nothing more was learned here. Otherwise 'I have evidence that the people who disagree with me do not care about the truth. I do. Therefore I am more likely to be correct than they are' is really hard to resist.
I think Scott's definition of bias is fine, and it works well from a statistical viewpoint. But rather than redefining bias, we can avoid Arnold's wordsmithing by making our goal to maximize the accuracy of our predictions rather than to minimize our bias. Introducing some bias through an ideology can increase our accuracy because it reduces our variance, i.e. our tendency to overfit our mental model to whatever events we happened to observe by chance. The bias-variance tradeoff from statistics tells us this is a good idea, provided our prior does not introduce so much bias that it swamps the benefit obtained from the reduction in variance. I'll expand on this below.
In statistics/machine learning/predictive modeling, there is a very famous "bias-variance tradeoff." Models allow us to use data x to predict an outcome y by using a function: yhat = f(x). We learn the function f by examining past pairs [(x1,y1),...,(xn,yn)]=[X,Y], i.e. the training data. The bias of our model is its expected error: how far off its predictions are, on average. The bias-variance tradeoff says that if we want to reduce the mean squared error of our predictions, we have two choices - either reduce the function's bias or reduce its variance. The value of machine learning is that it allows us to fit very complex models that are unbiased (in the statistical sense), such as deep learning models with billions of parameters. The model is less biased than, say, a linear regression, because it is flexible enough to fit all sorts of complex patterns, and so on average it should be correct. But because the function is so complex, the variance could be so high that the model is worthless. So in machine learning, we look for ways to reduce the variance through various regularization methods. In Bayesian statistics, we use an informative prior for regularization.
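Here is a minimal sketch of that tradeoff in code (mine, with invented data, not anything from the comment): an over-flexible polynomial is fit to a few noisy points, once with ordinary least squares and once with a ridge penalty, where the penalty plays the role of the regularizer / informative prior described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def poly_features(x, degree=9):
    # Columns: x^0, x^1, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    # Closed-form ridge solution; lam = 0 is plain ordinary least squares.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def true_f(x):
    return np.sin(2 * np.pi * x)

x_train = rng.uniform(0, 1, 15)
y_train = true_f(x_train) + rng.normal(0, 0.3, 15)
x_test = np.linspace(0, 1, 200)

X_train, X_test = poly_features(x_train), poly_features(x_test)
for lam in (0.0, 1e-3, 1e-1):
    beta = fit_ridge(X_train, y_train, lam)
    test_mse = np.mean((X_test @ beta - true_f(x_test)) ** 2)
    print(f"lambda={lam:g}  test MSE against the true function: {test_mse:.3f}")
# Typically the unpenalized fit (lambda = 0) has the worst test error: the small
# bias introduced by the penalty is more than repaid by the drop in variance.
```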
The bias and variance of the model are defined by viewing the training data, [X,Y], as the random variable. When we say that on average the model should be correct, we mean that if we train our model on a random sample, we have no reason to expect the model's predictions to be too high rather than too low. Some samples would cause our prediction to be too high, whereas others would cause it to be too low. A model with high variance will make wildly different predictions depending on what training data it happens to use. But the key point is that the training data is random, and we want to design a method that allows us to make a good prediction taking this uncertainty into account.
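To make the "training data is the random variable" point concrete, here is a self-contained toy simulation (again mine, with invented numbers): we redraw a small noisy sample many times and compare the raw sample mean (unbiased, high variance) with a hypothetical estimator that shrinks toward a prior guess (biased, low variance).

```python
import numpy as np

rng = np.random.default_rng(42)
mu_true, prior_guess, shrink = 2.0, 1.5, 0.5   # the prior guess is wrong, but not wildly
n_obs, n_sims = 5, 10_000

raw, shrunk = [], []
for _ in range(n_sims):
    sample = rng.normal(mu_true, 2.0, n_obs)   # a fresh, randomly drawn "life experience"
    m = sample.mean()
    raw.append(m)                               # unbiased estimator: the sample mean
    shrunk.append(shrink * prior_guess + (1 - shrink) * m)   # pulled toward the prior

for name, est in (("sample mean", np.array(raw)), ("shrunk toward prior", np.array(shrunk))):
    bias2 = (est.mean() - mu_true) ** 2
    var = est.var()
    print(f"{name:22s} bias^2={bias2:.3f}  variance={var:.3f}  mse={bias2 + var:.3f}")
# The shrinkage estimator is biased (it leans toward the prior guess of 1.5), but its
# variance is so much lower that its overall mean squared error is smaller: the prior
# keeps it from swinging wildly with whichever sample it happened to see.
```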
Bringing this back to the real world, the training data in our minds is our life experience. When we encounter a data point, like the number of heating-degree days in New York being 10% below normal, we can use this to make a prediction about global warming by filtering it through our mental model. Of course, our data point didn't have to be this one. We also could have learned the same statistic in Seattle, or the same statistic from a different point in time. If we come into this situation with a naive, unbiased, blank-slate model, we will overfit to that datapoint. And so, depending on which datapoint we observe, we could draw wildly different conclusions. In other words, our model has too much variance. On the other hand, we can come in with a strong prior based on, say, our scientific knowledge. Our model is now biased, because whichever datapoint we happen to observe will not affect our conclusion as much. But this also means it has lower variance. And we can see the benefit of that - we will not be fooled by noise into reaching incorrect conclusions.