We often want to know how outcomes are affected by policy and by underlying conditions. For example, we want to know how education affects earning capacity. We want to know how drug use affects crime.
When you read a study, one important question to ask is “How did they measure that?” Because researchers almost never have a perfect measure of the variables that they are talking about. And measurement error affects the results.
By measurement error in this context, I mean pure noise. It is not bias in the measurement, which is a separate issue. Measurement error comes from whatever we “leave out” when we pick a real-world measuring instrument.
Consider looking at the effect of education on salary. When we measure education as “number of years of schooling,” we leave out anything to do with educational quality. When we measure salary in dollar terms, we leave out cost of living differences in different locations.
Measurement error makes the relationship we observe weaker than the true relationship; statisticians call this attenuation.
If we collect years-of-schooling data and salary data on a thousand individuals, and we find that 40 percent of the variation in salary across individuals can be explained by years of schooling, then we know that with better measures of education and salary we would be able to explain even more of the variation.
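To make the attenuation concrete, here is a minimal simulation sketch (my own illustration with made-up numbers, not data from any study): the same underlying education-salary relationship explains noticeably less of the variation once the education measure is contaminated with noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# "True" years of schooling and the salary it generates (made-up numbers)
true_education = rng.normal(14, 2, n)
salary = 30_000 + 4_000 * true_education + rng.normal(0, 10_000, n)

# A noisy measure of education, as if reporting error added a couple of years either way
noisy_education = true_education + rng.normal(0, 2, n)

def r_squared(x, y):
    """Share of the variation in y explained by a simple regression on x."""
    return np.corrcoef(x, y)[0, 1] ** 2

print(f"R^2 using true education:  {r_squared(true_education, salary):.2f}")
print(f"R^2 using noisy education: {r_squared(noisy_education, salary):.2f}")
# The noisy measure always explains less of the variation than the true one.
```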
This becomes really important when more than one explanatory variable is involved. The variable that is measured with the least error will take some of the explanatory power away from the other variables.
For example, suppose we believe that salary is affected by individual intelligence and character. If our measure of intelligence has very little noise but our measure of character has a great deal of measurement error, then we are likely to report that when it comes to salary intelligence matters a lot and character matters very little. In reality, the opposite could be the case, and our result is coming from measurement error in the “character” variable.
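A small simulation sketch, again with made-up coefficients of my own, shows the misattribution: salary depends equally on intelligence and character, but because the character measure is much noisier, the regression makes character look unimportant.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Suppose salary depends equally on intelligence and character (toy model)
intelligence = rng.normal(0, 1, n)
character = rng.normal(0, 1, n)
salary = 1.0 * intelligence + 1.0 * character + rng.normal(0, 1, n)

# Intelligence is measured almost perfectly; character is measured with heavy noise
measured_iq = intelligence + rng.normal(0, 0.1, n)
measured_character = character + rng.normal(0, 2.0, n)

# Multiple regression of salary on the two imperfect measures
X = np.column_stack([np.ones(n), measured_iq, measured_character])
coefs, *_ = np.linalg.lstsq(X, salary, rcond=None)

print(f"Estimated effect of intelligence: {coefs[1]:.2f}")  # close to the true 1.0
print(f"Estimated effect of character:    {coefs[2]:.2f}")  # badly attenuated
# Both true effects are 1.0, but the noisier measure looks far less important.
```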
In Steve Sailer’s review of my review of his book, he writes,
I set out to write a series of articles proving that the federal government's racial/ethnic categories were scientifically ridiculous and led to useless data. But as I studied the question in depth, I came to the opposite conclusion that ... eh ... in the big picture of things, they are good enough for government work.
We know that relative to any truly scientific way to group people according to population genetics, self-reported race will be a noisy measure. But the fact that self-reported race correlates with variables like IQ and testosterone levels means that self-reported race measures something. If it were all noise, it would not correlate with anything.
Similarly, we can be certain that IQ is not meaningless. It may be a noisy measure of intelligence, but if it were only noise then its value in predicting outcomes would be nil. Instead, it helps predict grades, test scores, income, longevity, and other measures.
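The difference between “noisy” and “pure noise” is easy to see in a toy simulation (my own sketch, not from the post): a noisy proxy for a real trait still correlates with outcomes, while a variable that is nothing but noise does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

ability = rng.normal(0, 1, n)
outcome = ability + rng.normal(0, 1, n)        # e.g., grades or income

noisy_proxy = ability + rng.normal(0, 1, n)    # imperfect measure of the real trait
pure_noise = rng.normal(0, 1, n)               # unrelated to anything

print(f"Correlation of noisy proxy with outcome: {np.corrcoef(noisy_proxy, outcome)[0, 1]:.2f}")
print(f"Correlation of pure noise with outcome:  {np.corrcoef(pure_noise, outcome)[0, 1]:.2f}")
# The noisy proxy still predicts the outcome (attenuated but real); pure noise predicts nothing.
```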
Whenever I see a study, I look carefully at how the variables are measured. I consider the likely magnitude of measurement error. I try to guess how this could affect the results. Thinking about measurement error is an important habit to cultivate.
There is a tendency to prioritize variables that can be objectively measured over qualitative variables. This preference may arise from the belief that quantifiable data is more objective or simply because such data is easier to collect and analyze. As a result, vital information that is not easily captured by numerical measurements may be neglected or undervalued.
For example, P. T. Bauer observed that development economists tend to focus on physical and financial resources, which can be measured, while ignoring individual, cultural, social, and political factors that cannot be measured but that have a profound effect on a nation's productivity.
A common issue stemming from this "quantitative bias" is the evaluation of a program's or policy's efficacy based on measurable inputs rather than qualitative outcomes. For example, improvements in education are often assessed by dollar expenditures and class size instead of gains in students’ knowledge or their ability to think critically.
Goodhart’s Law states that any quantifiable indicator used as a proxy for a non-quantifiable goal will eventually become the goal, making it useless as an indicator. For example, the manager of a large IT department decided to measure production by the number of completed work orders, or “tickets.” Overnight, tasks that previously required a single ticket were split up into multiple tickets – one for each subtask. Productivity fell as time and resources were diverted to filling out and completing tickets instead of designing and writing software. What you measure is what you get; bean counters get beans.
AIUI this is one of the points made in this old review of "Hive Mind":
https://slatestarcodex.com/2015/12/08/book-review-hive-mind/
Jones says "average national IQ matters more [economically] than individual IQ" and, I think, wants that to have the counterintuitive implication that how intelligent you personally are doesn't make such a big difference to your personal economic outcomes. But if (as seems plausible) individual IQ score is a noisier measure of individual intelligence than average population IQ is of population average intelligence, that would have the measurement error effect you describe.
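To illustrate the commenter's point with a toy model of my own (not Jones's analysis): individual test scores are noisy measures of individual ability, but averaging many scores within a group cancels most of that noise, so the group-average score correlates much more tightly with group-average outcomes.

```python
import numpy as np

rng = np.random.default_rng(3)
n_groups, n_per_group = 100, 200

# Each group has a true average ability; individuals vary around it (toy numbers)
group_ability = rng.normal(0, 1, n_groups)
individual_ability = group_ability[:, None] + rng.normal(0, 1, (n_groups, n_per_group))

# Individual test scores are noisy measures of individual ability
test_scores = individual_ability + rng.normal(0, 1, (n_groups, n_per_group))

# Outcomes depend on individual ability
outcome = individual_ability + rng.normal(0, 1, (n_groups, n_per_group))

# Individual level: noisy scores vs. individual outcomes
r_individual = np.corrcoef(test_scores.ravel(), outcome.ravel())[0, 1]

# Group level: averaging washes out most of the measurement noise
r_group = np.corrcoef(test_scores.mean(axis=1), outcome.mean(axis=1))[0, 1]

print(f"Individual-level correlation: {r_individual:.2f}")
print(f"Group-average correlation:    {r_group:.2f}")
# The group-average measure looks far more predictive, partly because it is far less noisy.
```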