19 Comments

Quibble: why say Type I and Type II errors instead of false positives and false negatives? With the former I always forget which is which; the latter is intuitively descriptive.


The type nomenclature goes back nearly a century, to Egon Pearson. The language problem is that you want a noun meaning error that can describe two different kinds or sources of it; "falsity" is almost never used, and "mistake" is not a true synonym. You want them named 'errors' so you can use the more abstract noun in formulating and discussing concepts like "crossover error rate" or "propagation of error" or "margin of error".

If one said "positive error" and "negative error", that could cause confusion as to the sign of the number involved. One could say "false positive error" and "false negative error", which I think is superior to, and no more redundant than, "Type I error" and "Type II error".


"False positive" and "False negative" — with or without the word "error" — also facilitate discussing "true positive" and "true negative." Even though you can easily compute the "true..." rate give the "false..." rate (and vice versa), it's often context-specific which of these four terms is most useful to discuss.


Interesting, and it sounds right, but then "false positive" is simply short for "false positive error."


Mathematicians often introduce bland and unintuitive terminology like this. Corresponding jargon from areas where the concepts find practical use tends to be better. In this case, the Soviet military jargon for a Type I error is _false alarm_, and for a Type II error, _miss_ (or _missed alarm_).


Arnold, you deserve some kind of award for writing this essay. Great stuff!


Software debugging has 3 error types. Type I: Aha! You discover someone else's bug. Type II: Oops! You discover your own bug. Type III: Ah, shit! Someone else discovers your bug.


"In classical statistics, a Type I error means claiming that the evidence for a hypothesis is strong when it isn’t. And a Type II error means failing to recognize that the evidence for a hypothesis actually is strong." I like the post / stack / essay in general, but the bit quoted here is incorrect as stated. Consider the case where some parameter equals zero in the population, and the null hypothesis is that it equals zero. But, as happens five percent of random samples, our estimate is statistically different from zero. In that case, the null is true, we reject the null and thereby make a Type I error, but the evidence against the null is nonetheless strong. That's why we make the mistake!


Here’s an implementation of the trade-off between Type I and Type II errors for an application involving vehicle collisions.

https://www.cs.cmu.edu/~astein/pub/TRR-K01.pdf


Sowell's Law only holds when the true 'vector' of the decision matrix is known, so that you are simply shifting risk between the two mistakes. In any system where that vector isn't known, there are decisions to be made that act as potential 'solutions' or 'categorical errors': they can reduce both kinds of error, increase both, or increase one without reducing the other. Removing information content from the assessment of mortgage risk, or including useless information, can worsen all types of error without reducing any, for example.
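As a rough illustration of that last point, here is a toy simulation (an invented risk model, not anything from the post) in which dropping an informative variable raises both the false positive rate and the false negative rate at the same decision threshold:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x1 = rng.normal(size=n)            # information we keep
x2 = rng.normal(size=n)            # information we might throw away
default = (x1 + x2 + rng.normal(size=n)) > 0   # true outcome, e.g. the loan goes bad

def error_rates(score, threshold=0.0):
    flag = score > threshold
    fpr = np.mean(flag & ~default) / np.mean(~default)   # Type I: flag a good loan
    fnr = np.mean(~flag & default) / np.mean(default)    # Type II: miss a bad loan
    return fpr, fnr

print(error_rates(x1 + x2))   # using all the information
print(error_rates(x1))        # x2 discarded: both error rates get worse
```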


At first glance, the way we trade off Type I and Type II errors seems to be somewhat related to our moral intuitions around commission vs. omission, loss aversion, and, most importantly, how saliently and clearly the failings can be attributed to our deliberate actions. We don't seem to care about what's most relevant - our expected impact on the world - but we care a whole lot about what people can legibly hold us morally culpable for. For example:

- As a society, we'd much rather avoid committing the crime of prosecuting an innocent person than fail to keep innocent people safe from criminals, because then, it's on the criminals.

- We'd much rather have too much regulation, so that accidents can't be blamed on a failure to regulate, but we're sanguine about the opportunity cost of those regulations and the countless lives they could improve on the margin.

- I'd rather avoid marrying the wrong person and having to accept that I chose wrong. But it's much less legible whether I did everything I could to find a partner.


"When you tell your audience (a) is realistic, but people hear (b) and call you a racist, don’t say I didn’t warn you."

I don't think it matters whether they understand inverse probability. Even if they did I think they'd still say the comment was racist. You can't say anything negative about the oppressed no matter what.

In another way, the statement probably is a bit racist. Just like saying women earn 70% of what men do, the statement about turnstile jumpers doesn't account for known differences in income, education, etc. I guess one could also see a bit of irony in the assumption that the turnstile-jumping statistic accuses the jumper while the earnings statistic doesn't accuse the earner. What type of error is that?


I don't understand why (b) is clearly the statement with the "inverse probability."

Replacing "serious kidney ailments" with A, and

replacing "microscopic levels of blood in their urine" with B, we have these two statements:

(c) Men with A often have B.

(d) Men with B often have A.

Presented with (c) and (d), how does one know which statement is the "inverse" probability?

I think you are using some medical knowledge to make the distinction.
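For what it's worth, the direction matters because of the base rate, not the wording: if the ailment A is rare, P(B|A) can be large while P(A|B) is small. Here is a minimal sketch with made-up numbers (not real medical rates) showing how different the two conditional probabilities can be:

```python
# Hypothetical numbers, purely for illustration -- not real medical rates.
p_A = 0.01             # P(A): serious kidney ailment
p_B_given_A = 0.90     # P(B|A): blood in urine, given the ailment
p_B_given_notA = 0.10  # P(B|not A): blood in urine without the ailment

p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
p_A_given_B = p_B_given_A * p_A / p_B   # Bayes' theorem

print(p_B_given_A)   # 0.90  -- "men with A often have B" can be true...
print(p_A_given_B)   # ~0.083 -- ...while "men with B often have A" is false
```

On these invented numbers, (c) is true and (d) is badly wrong; mistaking one for the other is the inverse-probability error.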


Type I and Type II Errors of the First Amendment

This post was so good that I decided to write a brief summary of it - painted somewhat heavily with my own signature. 

https://open.substack.com/pub/scottgibb/p/type-i-and-type-ii-errors-of-the?r=nb3bl&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true


Some of these errors can be avoided (the ones that do not require yes/no answers) by going straight to inputting the distribution that the "significance test" is applied to into cost-benefit analysis. A "statistically insignificant" estimate of a large effect might suggest using a new drug, where a statistically significant estimate of a small effect would not.
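A toy sketch of that point, with invented estimates, standard errors, and cost: feeding the estimates into a net-benefit calculation, rather than filtering on significance first, can favor the "insignificant" large effect over the significant small one.

```python
# Invented numbers: estimated benefit per patient and its standard error.
drug_large = {"estimate": 0.50, "se": 0.30}   # t = 1.67: not significant at 5%
drug_small = {"estimate": 0.05, "se": 0.01}   # t = 5.00: highly significant

cost_per_patient = 0.10   # same units as the benefit

for name, d in [("large effect", drug_large), ("small effect", drug_small)]:
    t = d["estimate"] / d["se"]
    expected_net = d["estimate"] - cost_per_patient
    print(name, "t =", round(t, 2), "expected net benefit =", round(expected_net, 2))
# The "insignificant" large effect has the higher expected net benefit (0.40 vs -0.05),
# which a yes/no significance filter would have thrown away.
```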


"Technically, statistical significance means that the sample size is sufficiently large that you can believe the result, as long as everything else about the how the study was done is kosher."

It also depends on the plausibility of the premise. And the reasonableness of reducing the premise and the data to numbers amenable to statistical significance.

The best explanation I've heard was in my econometrics class: "A statistical significance of p<.01 means that, if nothing is really going on, I'd see results like this only 1% of the time, and I don't think I'm living in a 1% world, so I accept the proposition."

Most social science experiments take p<.05 as the required level of significance. This means that, if the hypothesis is wrong, 5% of studies will nevertheless conclude that the hypothesis is correct. Since "cannot reject the null hypothesis" results are rarely interesting enough to be published, lots of studies will be published, even with findings that are, in a "real" sense, wrong. Which is probably why most studies won't replicate.
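To put rough numbers on that (all assumed, not estimates of any actual literature): if only a modest fraction of tested hypotheses are true, a 5% threshold plus a publish-only-if-significant filter leaves a sizable share of published findings wrong.

```python
# Illustrative arithmetic with assumed inputs, not estimates of any real literature.
true_share = 0.10   # fraction of tested hypotheses that are actually correct
power      = 0.80   # chance a correct hypothesis yields p < .05
alpha      = 0.05   # chance a wrong hypothesis yields p < .05 anyway

true_positives  = true_share * power            # 0.08
false_positives = (1 - true_share) * alpha      # 0.045

# If only "significant" results get published, this share of them is wrong:
print(false_positives / (true_positives + false_positives))   # about 0.36
```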

Now, consider some alternative scenarios: Someone believes that ghosts are real and can speak through electronic boxes. He sets up an experiment to test this hypothesis, and meticulously records results. He records sounds in a place believed to be haunted, and records sounds in a place believed not to be haunted as a control. He finds word-like sounds in the former, with p<.05. Does this credibly demonstrate that the haunted location has ghosts, who are communicating through the electronic box? Only if you consider, a priori, that it is plausible that ghosts can communicate through an electronic box. If you don't think that's plausible, you'd need a much lower p value to even think about reconsidering - say p<.000000001.
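Roughly the same logic in a few lines, with assumed numbers: treating the significance threshold as the false alarm rate of the experiment, a p<.05 result barely moves a very low prior, while a far smaller p-value can.

```python
# Assumed numbers for illustration: how much a "significant" result should move belief
# when the hypothesis starts out wildly implausible.
def posterior(prior, p_value, power=0.8):
    # Treat the p-value threshold as the false-alarm rate of the experiment.
    evidence = prior * power + (1 - prior) * p_value
    return prior * power / evidence

print(posterior(prior=1e-6, p_value=0.05))   # ~1.6e-05: still don't believe in ghosts
print(posterior(prior=1e-6, p_value=1e-9))   # ~0.99: now it's worth reconsidering
print(posterior(prior=0.5,  p_value=0.05))   # ~0.94: p<.05 is fine for plausible claims
```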


It is also important which way the null hypothesis is framed. It might be better to try to reject the hypothesis "is haunted."


Dear Mr. Kling,

Thank you for an excellent discussion.


Misunderstanding inverse probability in the way you explain is known in logic as “the fallacy of affirming the consequent”.

“miniscule”: minuscule (educated spelling).
