Discussion about this post

Laura Creighton:

But again, we get back to trust. It's really hard to not strengthen your prior when you catch those who disagree lying to you, or misrepresenting the facts, and this is because in daily life you will run into a large number of people who will lie to you precisely because they are wrong. "Ah, I shouldn't strengthen my belief because those liars have always been lying, so I have learned nothing" may be the correct approach -- but I don't know anybody who does this. It's only when you think those who disagree with you are arguing in good faith, and have just made _mistakes_ (which you can point out to them, and if you are correct they will acknowledge) that you can decide that nothing more was learned here. Otherwise 'I have evidence that the people who disagree with me do not care about the truth. I do. Therefore I am more likely to be correct than they are' is really hard to resist.

Justin:

I think Scott's definition of bias is fine, and works well from a statistical viewpoint. But rather than redefining bias, we can avoid Arnold's wordsmithing by having our goal be to maximize the accuracy of our predictions rather than minimizing our bias. Introducing some bias through an ideology can increase our accuracy because it reduces our variance, e.g. our tendency to overfit our mental model to whatever events we happened to observe by chance. The bias-variance tradeoff from statistics tells us this is a good idea provided our prior does not introduce so much bias that it swamps the benefit obtained from the reduction in variance. I'll expand on this below.

In statistics/machine learning/predictive modeling, there is a very famous "bias-variance tradeoff." Models allow us to use data x to predict an outcome y by using a function: yhat = f(x). We learn the function f by examining past pairs [(x1,y1),...,(xn,yn)]=[X,Y], i.e. the training data. The bias of our model is the expected gap between its predictions and the truth, averaged over the training sets we might have seen. The bias-variance tradeoff says that mean squared error decomposes into bias squared plus variance (plus irreducible noise), so if we want more accurate predictions we have two levers - reduce the function's bias or reduce its variance. The value of machine learning is that it allows us to fit very complex models that are unbiased (in the statistical sense), such as deep learning models with billions of parameters. The model is less biased than, say, a linear regression, because it is flexible enough to fit all sorts of complex patterns, and so on average it should be correct. But because the function is so complex, the variance can be so high that the model is worthless. So in machine learning, we look for ways to reduce the variance through various regularization methods. In Bayesian statistics, we use an informative prior for regularization.
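
To make the tradeoff concrete, here is a minimal numerical sketch (hypothetical data and parameter choices, not anything from the post): a flexible degree-9 polynomial fit with no penalty versus the same model with a small ridge penalty, both trained on one noisy sample of a sine curve. The ridge penalty biases the coefficients toward zero, but it typically wins on test error because it tames the variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)           # the "real" pattern we want to learn

def make_data(n, noise=0.3):
    x = rng.uniform(0, 1, n)
    y = true_f(x) + rng.normal(0, noise, n) # noisy observations of the pattern
    return x, y

def design(x, degree=9):
    return np.vander(x, degree + 1)         # polynomial features

def fit(x, y, lam=0.0, degree=9):
    """Least squares with an optional ridge penalty lam (lam=0 -> unregularized)."""
    X = design(x, degree)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def predict(w, x, degree=9):
    return design(x, degree) @ w

x_train, y_train = make_data(15)
x_test = np.linspace(0, 1, 200)

w_flexible = fit(x_train, y_train, lam=0.0)   # low bias, high variance
w_ridge = fit(x_train, y_train, lam=1e-3)     # some bias, much lower variance

for name, w in [("unregularized", w_flexible), ("ridge", w_ridge)]:
    mse = np.mean((predict(w, x_test) - true_f(x_test)) ** 2)
    print(f"{name:>14s} test MSE: {mse:.3f}")
```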

The bias and variance of the model are defined by treating the training data, [X,Y], as the random variable. When we say the model is correct on average, we mean that if we train it on a random sample, we have no reason to expect its predictions to be too high rather than too low. Some samples would push our prediction too high, others too low. A model with high variance will make wildly different predictions depending on which training data it happens to see. The key point is that the training data is random, and we want to design a method that makes good predictions while taking this uncertainty into account.
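
Here is a second self-contained sketch (again with made-up numbers) that treats the training data as the random variable: we redraw a small noisy sample many times, compute an unbiased estimator (the sample mean) and a deliberately biased one (shrunk halfway toward a prior guess), and empirically split each estimator's mean squared error into bias squared plus variance.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu = 1.0          # the quantity we are trying to estimate (hypothetical)
prior_mu = 0.0         # where the prior/"ideology" pulls us
n, sigma = 5, 2.0      # small sample, noisy observations
shrink = 0.5           # weight placed on the prior

n_sims = 100_000
plain, shrunk = [], []
for _ in range(n_sims):
    sample = rng.normal(true_mu, sigma, n)                   # a fresh random "training set"
    xbar = sample.mean()
    plain.append(xbar)                                       # unbiased estimator
    shrunk.append(shrink * prior_mu + (1 - shrink) * xbar)   # biased, lower variance

for name, est in [("sample mean", np.array(plain)),
                  ("shrunk toward prior", np.array(shrunk))]:
    bias2 = (est.mean() - true_mu) ** 2
    var = est.var()
    print(f"{name:>20s}: bias^2={bias2:.3f}  variance={var:.3f}  MSE={bias2 + var:.3f}")
```

With these particular numbers the biased estimator has the lower total MSE; shrink too hard toward a bad prior and the bias term swamps the variance savings, which is the caveat in the paragraph above.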

Bringing this back to the real world, the training data in our minds is our life experience. When we encounter a data point, like the number of heating-degree days in New York being 10% below normal, we can use it to make a prediction about global warming by filtering it through our mental model. Of course, the data point didn't have to be this one: we could just as easily have learned the same statistic for Seattle, or for a different point in time. If we came into the situation with a naive, unbiased, blank-slate model, we would overfit to that datapoint, and so depending on which datapoint we happened to observe, we could draw wildly different conclusions. In other words, our model has too much variance. On the other hand, we can come in with a strong prior based on, say, our scientific knowledge. Our model is now biased, because whichever datapoint we happen to observe will not move our conclusion as much. But this also means it has lower variance, and we can see the benefit of that - we will not be fooled by noise into reaching incorrect conclusions.
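
As a hedged illustration of that last point, the sketch below runs a conjugate normal-normal Bayesian update with invented numbers standing in for "one season of heating-degree days": with a near-flat prior the conclusion is driven almost entirely by the single noisy datapoint, while an informative prior barely moves.

```python
def posterior(prior_mean, prior_var, obs, obs_var):
    """Conjugate normal-normal update for one noisy observation of an unknown quantity."""
    w = prior_var / (prior_var + obs_var)        # how much the single datapoint moves us
    post_mean = prior_mean + w * (obs - prior_mean)
    post_var = (1 - w) * prior_var
    return post_mean, post_var

# Hypothetical numbers: the unknown is a local warming trend, and the one noisy
# observation (one season of heating-degree days) happens to point the "wrong" way.
obs, obs_var = -1.0, 4.0

flat = posterior(prior_mean=0.0, prior_var=1e6, obs=obs, obs_var=obs_var)
informed = posterior(prior_mean=0.8, prior_var=0.1, obs=obs, obs_var=obs_var)

print("near-flat prior   ->", flat)      # conclusion tracks the one datapoint almost exactly
print("informative prior ->", informed)  # conclusion barely moves: low variance, some bias
```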

