I fear you and your GPT grader suffer from the "Minds Can Be Changed With Essays" delusion.
I would give 95 percent of Arnold Kling's essays A's. He's helped me clarify my thinking on many issues. I often read an ASK essay and think, "I wish everyone could read this." Victor Davis Hanson loses my respect when he ventures outside his historical specialties and pontificates on current events. I mean, he confirms my biases well enough, but with such leaden predictability I'm embarrassed for our side.
Yea, I love VDH's historical books, and even have a few signed by him. His modern political writing is a little less compelling, however. That might be due to some Gell-Mann amnesia on my part perhaps, but I think that in general his history tends to examine and discuss the issues more roundly.
He isn't upset with the past, he's angry at leftists in the present. My impression is that Rage(VDH) = (Year-1930)
VDH has written an opinion piece; it's not an argumentative essay.
I don't think Hanson is trying to persuade the unpersuadable; rather he seeks to mobilize opposition to our sad and dangerous cultural decline. To judge by the comments at his site, he seems to be doing this effectively. Those he opposes are not amenable to changing their minds through gentlemanly argument. In the recent Congressional hearing, Rep. Stefanik was not trying to persuade the sorry Presidents of Harvard, MIT, and Penn that they should not tolerate misconduct inconsistent with reasoned discourse in the university, but rather to expose their moral corruption and confusion by persistent aggressive questioning. She did not seek to draw out the possible justifications for their positions, but rather to get them to commit to statements that would discredit their position. In this she succeeded greatly, although her method was quite contrary to Arnold's idea of proper argument.
No one ever graded a cross-examination on whether the lawyer was friendly and fair to the opposition.
In general, any approach to grading only works well when assessing things that are focused on performing as well as possible on the grader's standard of excellence. The farther the purpose of some effort deviates from that goal, the less meaningful such a grade becomes.
This grader works best when grading essays purporting to be neutral, balanced, and reasoned attempts to open the openable minds on the other side. It isn't going to work on and should not be applied to anything admitting to being a one-sided expression of a particular view.
I like VDH, but because I agree with him so much, it feels like confirmation bias reading him.
But, unlike Arnold most of the time, he’s advocating a position. Providing the rationalizations his readership market desires.
I much prefer the thought-provoking balance and challenging ideas of the book review.
But there’s something about the grader that strikes me as not true. Immigration is divisive, but is it really so complex? Border, legal restriction on immigration, or no border?
If a border, are the laws being enforced, rule of law, or not enforced?
Caplan honestly favors open borders, but no elected Democrats say so. Probably VDH is factually wrong about 0 deportations, but accurate about 8 million illegals. Every single American who supports rule of law, which Pelosi & many Dems frequently claim as the reason for their witch hunts against Trump, should be outraged at Biden’s acceptance of 8 million illegals violating border laws.
I think speaking truth is more important than being balanced.
On many questions & issues, the important truth is less what did, in fact, happen, but what will happen in the future. The future is uncertain, so there is not yet truth about it.
And the constant suggestions to increase the size to make it better continue to bother me. I spend time here on ASK because of brevity and breadth, and interest including commenting. Fantasy Intellectuals are better when brief, tho also less balanced.
Abortion, illegal immigration, trans rights & women’s rights, parents’ rights, school choice, race relations & rights. It would be good to see some 9/10 and less than 5 graded essays on the most politically controversial topics. Especially short ones.
I’d be interested if the AI sees any issue as simple, or has any capacity to reward the writer who is able to boil things down to their essence. Because that too is a skill.
"Your essay ... presents a strong and clear viewpoint but lacks balance and fair engagement with opposing perspectives, which are crucial elements in op-ed writing."
Is that really true? An op-ed is limited to 600-800 words. It is generally assumed to be stating a case. Of course, it should do so without lying or being unnecessarily inflammatory, but it is often more like a prosecutor's or defense attorney's opening statement than a judge's careful weighing of the evidence. Stating a good, reasoned case supported by evidence may require all those 600 or 800 words. Taking 200 words to "engage[] with opposing perspectives" may result in a shallow "on the one hand, on the other hand" that does not provide much enlightenment.
I suppose I have two general problems with your grader. First, it has one standard for grading essays but different essays have different purposes and have to be graded by different criteria. Second, it doesn't recognize the trade-offs involved in limited length essays. In regard to Arnold's review of Sumner, adding "potential solutions" may require less "depth of analysis". If the purpose of the essay is to explain Sumner, "potential solutions" may be irrelevant but "depth of analysis" extremely important.
That's a good point. Hard and short word limits dramatically sharpen the trade-off between engaging with opposing points and making your own.
That also raises the question of why one still sees so much paper-era "style inertia" persisting into the digital age. Yes, I can see why when there is a well-circulated print edition one might favor consistency, but one might always add a "this has been shortened from the original version" disclaimer, like they used to do for films on TV. Online you can also address opposing points in footnotes or supplemental information documents, like for scientific literature.
But so long as an author is stuck with a few hundred words of raw text, the grader is going to be too harsh.
For "may be perceived as inflammatory", the grader should provide alternative terms which are just as proportional to the effect being described, but which wouldn't cost points. The phrase Hanson used was "bankrupting big-city budgets." "Straining" may not be inflammatory (or is it?) but doesn't communicate the magnitude of the fiscal problem. Per Hizoner, "Every community in this city is going to be impacted. We have a $12 billion deficit that we’re going to have to cut — every service in this city is going to be impacted. All of us."
A grader should show an author "what right looks like" but without tone-policing that waters down the message.
That worried me a bit, too. I find myself wondering how much the grader can handle grading for accuracy as opposed to tone. If someone is writing about someone being executed by beheading, being dinged for using inflammatory language like "decapitated" would be a problem. Sometimes the language matches the reality and using softer euphemisms is just doublespeak. Or, to use one of my favorite movie lines, "I'm not going to calm down, you start getting excited mother f-----!"
I have not yet read your essay, but will, as I know it will be clear and informative.
As for VDH, insofar as he is guilty of not "engaging respectfully with opposing views" - his point in framing that view as nihilistic would seem to be that his enemies (yes, enemies) are animated by no affirming view - that they posit nothing. If he were to move off this position, I don't think there's anywhere for him to go except to present the left as diabolical - as wishing to raise what is base and low and corrupt, to the highest. Wishing - how to phrase it more sweetly? - to elevate the thug who, released from jail - wearing that magic talisman, the ankle monitor - raped and left for dead the girl, thanks to Soros-derived jurisprudence; or the illegal who murdered the cheerleader, now in the ground thanks to the open border mania of the left with the complicity of the libertarian "right". Or the mentally ill dude who killed six people last week after his bail on prior offenses was furnished by the non-profit that exists to get people like that out, unvetted, knowing nothing about them except that they are criminals, thus most deserving of charity, never stopping to wonder why his friends or family didn't want to pay his bail. (I read the local news.)
Would the essay then receive a better grade? If he described this position as sympathetically as possible?
It may not be my favorite thing to read, but a polemic of this sort is very much designed to move minds or stir emotions.
I must withhold my approval of his essay, however - not on these metrics you've instructed the AI to assess - but on its content: the idiot suggestion that America must build more unneeded dams to demonstrate its vitality.
It would be a sign that we *hadn't* given away the farm (in some cases, quite literally, where impoundments are concerned! - when it's not the timber land or the gorgeous scenery) if we still had the once-thriving environmental movement that understood the role of water hustlers in pushing dams on gullible authorities, and the importance of riparian habitat. Further, of course, this crazy desire of the GOP to scrape, pave, drill, clearcut, landfill, flood every surface of the country, and to use eminent domain to do so in the case of dams - is funny in an organization that pretends it is rural people who are its bedrock (lol).
Remember when it was still permitted for conservatives to care about sprawl, when anti-immigration groups were allowed to write papers showing that sprawl was fueled by population growth, and thus heavily by immigrants?
Of course it is precisely letting in 8 million people that will cause our cities to metastasize. And supply the water hustlers the population figure ammo they use to persuade the bubbas that some productive people need to give up their home and livelihood because another bass lake is both needed and going to make the county rich. (No, the bass reservoir is not needed, even with all the newcomers.)
Already, in my state, 300 miles of interstate are well on their way to becoming a single hellish megalopolis. And it is not people named "Hanson" or "Kling" that are filling this ugly dystopia. After that weirdo SC justice - can't remember his name, I think he was from New Hampshire - ruled that cities could use eminent domain for purely commercial purposes - to hand land over from one private entity to another - I knew an old lady and her daughter whose land had I-35 frontage, and which they had made a nature preserve, against whom the local town initiated condemnation proceedings so that they could "have a Target". Why was there a compelling reason for a tiny town to need a Target, when there was one 15 miles down the road in the next town? Well, that *was* the reason, of course - the other town had a Target and thus sales tax revenue, and they wanted some of that sweet revenue for themselves.
(The state legislature, in a rare fit of sanity, passed a law disallowing such use of eminent domain. Do you think that law is going to hold, in this permanent Dodge City world we now live in? Is "building" by definition conservative? That's a hell of a thing to concede, if you haven't seen what's being built.)
A wealthy nation on the trajectory ours *once* appeared to be on, has the power and wisdom to leave things alone, because the land - the environment - the real world of things and creatures - life, that stubborn anti-nihilist fact - is the very first principle of all. This is, obviously, a statement of conservatism than which there is none more central.
This strikes me as a weird framing for grading. VDH preaches to his choir. On that score he does remarkably well.
Preaching to the choir may be entertaining and bring temporary happiness to some but that doesn't make it good writing.
Yes. I don't think the purpose of op-eds is to showcase good writing.
not what I said.
Note: You can go back to Kling's earlier posts and confirm, but I'm pretty sure his intent was something akin to evaluating how well the piece is written.
I expect the more inflammatory pamphlets (all of them?) associated with the Revolutionary War period would get an F from the AI, but from the standpoint of the cause at the time, I think it would be hard to argue they were "not well written". Or conversely, had they been written as AK's prompts prefer - we would never have heard of them.
No, not "how well the piece is written", but how well it meets a particular standard: that of neutral, fair, balanced, and reasoned exposition supported by logic and evidence and engaging with the strongest contrary arguments.
There is plenty of excellent writing that would get Fs because it is totally one-sided. There is plenty of fair and balanced writing that is unreadably stodgy and dull but which would get high marks.
I have no idea if stodgy and dull gets considered but I don't think neutral and balanced are criteria.
"I found this feedback mildly annoying. I did not establish “balance and neutrality” as criteria for grading. I did emphasize treating the other side with respect and due consideration of its strong points."
"I think that balanced argument can be subsumed under engagement with opposing views. Giving opposing views a fair hearing is what I am after. The essay does not have to please someone with an opposing point of view. But it must not leave those with an opposing point of view feeling that their views have been ignored, misunderstood, or--worst of all--misrepresented."
Your grader gets an F from VDH fans. :)
I just fed Haidt's new articles to it. https://www.afterbabel.com/p/antisemitism-on-campus. This one got an 85. The book excerpt got an 82. Both way too low IMO. From the ChatGPT critique of #1:
"Potential for Bias: The essay strongly emphasizes the role of identity politics and the oppressor/victim mindset in fostering antisemitism. While these are important factors, a more nuanced exploration of other contributing factors, such as geopolitical dynamics and historical context, could provide a more balanced view."
In further interaction ChatGPT wants IMO to give too much credit to the theory that conflict drives antisemitism rather than the other way around.
The GPT grader cites “engaging other viewpoints” as an area of improvement for VDH and cites your essay as needing to “explore counter arguments”, which to me is not very different. Based on that, I'm not sure I see the large disparity in scoring.
"The essay could be strengthened by acknowledging and fairly representing the perspectives and rationales of those you criticize."
I missed where this was mentioned as part of the criteria. Wonderful. I usually, if not always, think this is the most important metric in an opinion piece.
I'm more surprised AI can identify this than most or all other capabilities.
This was emphasized in the grading criteria.
I've always found VDH's military histories far more valuable and well-written than his political commentary, which is too partisan for my taste.
These are good examples Arnold, thank you! I hadn't considered VDH, but he definitely shouldn't do very well based on your FITs grading, much as I love his historical work.
One thing I am curious about that perhaps you have seen, is whether the grader seems good at spotting contradictions or inaccuracies that do not match the tone. So for instance, if I were to write "After the terrorist attacks of 2002, the US was justified in going to war with Iraq" does the grader pick up on that as a "What are you talking about" sort of thing? Or, as VDH writes about bankrupting budgets, does it seem to do a pretty good job recognizing when the stronger language is the more accurate vs more neutral language? Does one always seem to get dinged by using more precise, if less emotionally flat, language?
Thanks again for posting these. I am thinking this would make a great app to feed just about everything I write into, just for the quick feedback :D
Is there a tutorial somewhere on how to use GPT4 to create your own AI like this?
https://help.openai.com/en/articles/8554397-creating-a-gpt