Why Books Don't Work by Andy Matuschak https://andymatuschak.org/books/
I can't argue with what the GPT came up with:
Andy Matuschak's essay "Why Books Don't Work" is evaluated on its reasoning, neutrality, and respectful engagement:
Clarity and Fairness of Arguments:
Matuschak's central thesis is clearly articulated and supported with relevant examples, such as the challenges faced by readers of non-fiction tomes and the implicit model of learning in textbooks.
The essay avoids overgeneralization by acknowledging exceptions and the variability in individual learning processes.
Bias and Representation:
The essay maintains a neutral tone, focusing on the critique of the medium rather than particular authors or types of books.
It does not misrepresent the nature of books but rather challenges commonly held assumptions about their efficacy in knowledge transfer.
Logical Fallacies:
No evident logical fallacies are present. The arguments are based on observable phenomena and logical deductions rather than fallacious reasoning.
Grade: 90/100
The essay scores high on clarity, fairness, and neutrality. However, it could benefit from more empirical data to support its claims, as much of the argument relies on the author's observations and hypotheses.
Suggestions for Improvement:
Incorporating empirical studies or quantitative data on reading comprehension and retention could strengthen the argument.
Addressing potential counterarguments, such as the role of books in different cultural or educational contexts, would add depth to the analysis.
Exploring alternative or complementary learning methods in more detail could provide a more holistic view of the subject.
It's interesting that it came up with something slightly different when I tried it. I know that it tries to introduce some randomness, and I know that human graders probably would not always produce the exact same grade and feedback, but it still seems undesirable.
With Internet search engines, people have been noticing for a long time that the engines spit out very different results (and autocompletion suggestions), both in substance and in rank, depending on hundreds of distinct factors the engine has tried to learn about the user.
If you cleared the cache, swapped IP addresses with a VPN, or used a different browser, you'd get different answers to your query, some of which clearly reflected intentionally installed, ideologically motivated thumbs on the scale.
I have been dismayed to discover that past searches which worked (which I could see in my history) no longer functioned when I tried them again later, though they clearly point to pages still online, and this even when I used all the advanced search tools like "site:XYZ" and so forth.
The selective and targeted shaping of discoverable information is more appropriate to a military psychological operation than to something purporting to be a mere search engine selling ads, and is extremely creepy and Orwellian stuff. I don't see any reason to believe the same motivations wouldn't manifest in manipulations of these new tools.
With the answers changing all the time, there is plausible deniability about whether similar thumbs on the scale have been inserted into the chatbots to covertly shape answers to support one side of a controversy over the other.
That variability is a feature and not a bug. You get the same phenomenon when you press "regenerate". The result is always slightly different.
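For anyone curious about the mechanism behind that variability: chat models sample their output token by token, and a "temperature" parameter controls how much randomness goes into the sampling. Below is a minimal sketch using the OpenAI Python SDK; the model name and grading prompt are illustrative assumptions, not the actual grader's setup.

```python
# Minimal sketch of sampling temperature, assuming the OpenAI Python SDK.
# The prompt and model below are hypothetical, not the grader's actual config.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def grade_essay(essay_text: str, temperature: float = 1.0) -> str:
    """Ask the model for a grade; higher temperature means more varied sampling."""
    response = client.chat.completions.create(
        model="gpt-4",  # hypothetical choice
        temperature=temperature,  # 0 is nearly, but not fully, deterministic
        messages=[
            {"role": "system",
             "content": "Grade this essay on reasoning, neutrality, and "
                        "respectful engagement. End with a letter grade."},
            {"role": "user", "content": essay_text},
        ],
    )
    return response.choices[0].message.content
```

Two calls at the default temperature will usually produce slightly different feedback, which is exactly the "regenerate" phenomenon; setting temperature to 0 reduces the variation but, in practice, does not eliminate it entirely.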
It gave a B+ to my recent Substack post, which I think is a fair grade. The feedback was: "It could be improved with more specific examples and a deeper analysis of the implications of the differences between LLM and human cognition." The essay perhaps assumes too much familiarity on the part of the reader with the two subject matters, which are rap lyrics and human & LLM cognition.
The essay also isn't really an op-ed style essay, given its subject matter, but I am not sure why that should be relevant to the grader.
The link is here: https://davefriedman.substack.com/p/beyond-the-beat-how-ai-interprets
FWIW, a human who read the same post commented that I was "off base," so I suppose that person would have given it an F.
It was a good note, explaining human context rather than word-association probabilities. I didn't see the "off base" comment, nor why he thought so.
Fewer points, made briefly, is better.
Great clickbait: girls in bikinis.
I've used it to grade one of my favorite essays: https://danluu.com/startup-tradeoffs/. I agree with the assessment.
----
Overall Grade: 9/10
This essay stands out for its clear and well-structured arguments, balanced engagement with multiple viewpoints, and demonstration of intellectual humility. It avoids overgeneralizations and respects the complexity of the subject. The author's approach of blending personal experiences with broader industry data and trends adds depth to the analysis. Improvements could include perhaps a more direct engagement with counterarguments or a deeper exploration of the psychological and lifestyle factors that influence the decision to join a startup or a large company. Nonetheless, the essay is an exemplary piece in terms of reasoned argument and fair representation of diverse perspectives.
I asked ChatGPT to critique the first chapter of a novel I'm working on. Its comments were absolutely on the money - I would venture to say more detailed and insightful than any human editor I've worked with. A humbling experience.
I'd like to see how it handles something like this...
https://josephheath.substack.com/p/a-simple-theory-of-cancel-culture
I'd like to see it take a shot at some famous (or infamous) Supreme Court opinions and dissents, perhaps sanitized to exclude the legal citation coding and footnotes if necessary for effective grading. I wonder if the bots are smart enough yet to pick up on all the "professionally and politely formulated incoherence" at the heart of many recent opinions.
I purposely gave it what I thought was a poorly reasoned piece.
https://taxprof.typepad.com/taxprof_blog/2023/12/the-case-for-and-against-keeping-donna-adelson-in-jail-before-her-trial-in-dan-markels-murder.html
The original was in the Tallahassee Democrat but I think it was gated after a few articles per month so I found a place that copied it. This deals with the murder of Dan Markel, a Florida State law prof.
The overall grade was a C-, actually a little better than I thought it would give. Grade inflation is all around us.
Has anyone asked the grader to rewrite an essay such that it would get a better grade?
BIG concern: The ChatGPT models are constantly being updated/learning. Some of the info that they "learn from" has come, and will come, from their own outputs. A feedback loop. Not sure how that will be dealt with.
C
Or:
https://devinhelton.com/why-urban-decay
I recall thinking I had learned some things I didn't know and wouldn't have guessed from this article. But I read it years ago and don't have a strong opinion about its findings. Still, it seems like it would be a good candidate for memory-holing so might as well subject it to a test while it's still up.
I nominate one of Helen Andrews' revising-the-revisionists pieces (I think, at this point, she would have to be granted the honor of being the revisionist). Maybe this one: https://www.theamericanconservative.com/how-fake-history-gets-made/
I had never heard of the incidents discussed, and I feel it is unlikely that I *could* learn the truth of them from a Google search at this date; I harbor a similar doubt about what your computer program would be able to do with what is presumably a whole lot of unreliable verbiage to mine. The truth seems more and more inconvenient, and an invention circa 2023 seems unlikely to be proof against that unfortunate fact. But I am by no means confident about any of it.
Generally something rubs me the wrong way when there's a terrible event, like the one in Tulsa - and the response is: "This event is not enough to sow perpetual hatred with - we want our own terrible event like that, let's see what we've got to work with!"
Please try some of these: paulgraham.com/articles.html
I tried the first one on the list, "Superlinear Returns." The grader gave lots of helpful feedback, ending with "Your essay effectively unpacks the concept of superlinear returns, using a blend of historical, scientific, and business examples. To elevate it further, consider incorporating a more balanced perspective by addressing potential critiques or downsides, and offering clearer practical advice for readers seeking to apply these concepts. Your intellectual curiosity and depth of argument are commendable, and these enhancements could make your essay even more impactful."
All in split-seconds. I'm convinced that ChatGPT as a grading tool is a powerful use case.
Yep, a very powerful use case. I also agree that there is currently a bit too much insistence on balance and neutrality. Right now my impression is that for every expression that is the equivalent of "2+2=4" the grader would take away points every time one failed to add, "To be sure, there are those who say 2+2=5 ... " Can't get an A for neutrality without getting a C- for Dryasdust styling.
Indeed, I am not sure what sort of balance Graham's essay lacks.
Did you write a Substack post on how you programmed your grader? If so, I missed it.
I predict that GPT will have difficulty when the essay in question is truthful but not sincere. I am curious as to what it makes of Matt Labash's _Living Like a Liberal_ https://www.washingtonexaminer.com/weekly-standard/living-like-a-liberal
Another challenge is the interactive visualisation essay, such as this one by Bret Victor about interactive visualisation. http://worrydream.com/LadderOfAbstraction/ I think that ChatGPT ought to be able to give us much, much better interactive visualisations without all the hard work Bret alludes to, by giving us an 'explainer' to go with our explanations -- something a reader can use to ask us what we really meant when we wrote something, and get that bit correct.
Dan Williams may have coined a great phrase: the Marketplace of Rationalizations. It is importantly different from misinformation, and a better explanation for so much false belief, especially among Dems.
https://www.cambridge.org/core/journals/economics-and-philosophy/article/marketplace-of-rationalizations/41FB096344BD344908C7C992D0C0C0DC
Since I believe the heart decides, but tells the brain to rationalize, this article confirms that bias.
The projection is unreal.
I’ve run several essays/opinion pieces in a perfunctory way just to quickly see what it would do, including articles from Heather Heying, David Friedman, Aaron Renn, and an opinion piece from Mother Jones. Like you, I find the emphasis on neutrality to be irritating; after all, people *are* expressing an opinion. Also, it doesn’t always assign a letter grade, giving Heying’s article a 4/10, Renn’s a C+, and saying the Mother Jones piece “would receive a low score for balanced argumentation.” I liked that it considered an array of factors, such as counterarguments, fallacies (if present), assumptions, etc. Here's an essay suggestion - https://www.notonyourteam.co.uk/p/winning-through-social-dysfunction
I am not allowed to like this? It’s interesting that grading goes from numbers to letters to mere qualitative notes.
I'm not complaining, just noting the variation which, as some others say, isn't a bad thing.
It would be interesting to see if ChatGPT is as concerned with balance when grading a left-leaning essay.
It seemed to be concerned when I gave it part of a post from Kareem Abdul-Jabbar's Substack.