Grading an Essay by Noah Smith

Nov 9, 2023

Using ChatGPT

31 Comments

Gordon Tremeshko

Nov 9, 2023 (edited)

"It’s normal, everyday Americans who must spend hours upon hours of their day replying to viral tweets and desperately making TikTok rebuttal videos, lest their fellow Americans fall victim to the siren song of totalitarianism. We are all somehow expected to sacrifice hours of day to do thankless, unpaid work pushing back against illiberal narratives spread relentlessly by people for whom spreading those narratives is a full-time salaried job."

I can't believe he actually wrote that. Arnold, you said the other day you were left speechless by some author's intellectual arrogance. I find Noah's opinion here similar. The idea that if not for brave truth tellers like him, we'd all be the witless dupes of Moscow and Beijing reveals a level of intellectual arrogance that I find, frankly, discrediting. Especially from someone of no particular accomplishment. I grade it a D with extreme prejudice.

Yea these claims in particular were really sketchy:

"our defensive efforts should focus entirely on preventing foreign governments from controlling our information ecosystem."

" One of our government’s essential functions is to guard us against such powers."

Controlling our information ecosystem... how is that equivalent to putting out propaganda? Is the implication that the US government should control our information ecosystem? (Yes?)

Where is it in the Constitution that the government is supposed to limit information or combat information? I am pretty sure the government is absolutely not supposed to protect us from wrongthink.

Those are really huge things to just toss into an essay, and should be harshly graded. I can see how they weren't part of the rubric and so get a pass, but especially that second point is a big one, and should be considered part of the argument that the piece is making that needs to be substantiated. Presumably it was ignored by the system because it is at the end instead of being up in the introduction type section.

At a guess I would have said that "normal, everyday Americans" scarcely know what TikTok is; or if they do, go to it for something to laugh at. I think he should entertain the possibility that the people who produce the objectionable content are quite possibly the people pretending to "consume" it as well, and essays such as his only elevate it unnecessarily and even counterproductively.

One only has to look at what gets reposted on Facebook (I suspect Twitter too) to see that many people are duped by ridiculous claims of fact.

... as I wrote that I realized there is nothing saying all of us can't be duped from time to time. He didn't say it was always the same ones getting duped and the same ones being the truth tellers.

Despite many disparaging comments, I am encouraged by this exercise. Why? (A) Although this is early-stage experimentation, it is a step in the direction of setting up a grading mechanism that makes inroads into the American education system, which is in serious need of overhaul. (B) Clearly, grading isn't the only thing, or even the first thing, in need of overhaul in our education system. Tutoring by AI, whose clear potential other posts have discussed, seems ripe for exploration as another education reform. (C) Other educational uses for AI need to be discovered, investigated, and tested. Now, if we can only think about how to extirpate the NEA...

1) Compare:

Kling: "It earns a C- if it neither considers the possibility that it could be wrong nor argues that the claim must certainly be right."

ChatGPT: "it doesn't simply remain neutral without considering whether the claim is right or wrong, which would have warranted a C-."

Is there no place for clear, neutral articulation of arguments on all sides (without taking a side)?

2) Re: "it earns an A if it articulates strong arguments against its claim and responds to those arguments."

What are the criteria of strong arguments, other than not consisting in insults?

I am getting at the following potential problem:

ChatGPT gives high marks to an essay that formally includes pro and con arguments, whether or not the arguments are good; for example, even if the stated counter-arguments aren't the best ones that a reasonable public intellectual should consider in the circumstances.

The key would be to get ChatGPT to distinguish various qualities of attention to counter-arguments; for example, straw-manning, articulating commonplace counterarguments, and steel-manning.

Graham Cunningham

Nov 9, 2023 (edited)

"F if it..... claims that it understands the motives......" is a woefully naive and wrong-headed grading prejudice. If there's one single most important thing that has allowed the virtue-signalling Left to sweep all before it, it is this naive equation of what people consciously see as their motivations with what subconsciously or semi-consciously actually underlies them. For example: "Steven Pinker is one of those who take the recent ‘Rights Revolution’..... entirely at face value. A campaigner for Social Justice is, to Pinker, simply driven by a desire for ...social justice (whatever that might actually mean). Gay-Rights and anti-Racist campaigners are simply dovish souls just wanting to be accepted for what they are. The conservative however is likely to also detect a souring whiff of cant; he notices the champagne in the socialist, the thought-policeman in the Gay Pride marcher, the racist in the anti-racist...." https://grahamcunningham.substack.com/p/are-we-making-progress

That is a tricky one, agreed. It is reasonable to discount claims of "They say that they care about X, but really they just hate Y!" or something similar as merely ad hominem type arguments. On the other hand, it seems perfectly reasonable to point out "They say they care about X, but their actions are counter to their stated intentions." So to use your anti-racist example, it is reasonable to say "The anti-racists say they are against racism, but really their proposals are entirely racist, just in the opposite direction of historical racism."

I would say the dividing line is to admit that your intellectual opposition is simply mistaken, and so they don't want to "kill children" but maybe just disagree about whether a given policy will, in fact, kill children, but at the same time it is fine to point out that the results of their proposals are not in line with their intentions, whether by mistake or intention.

Although at some point it does become overwhelmingly obvious that someone's intentions are not what they claim. I would give my wife the benefit of the doubt, but by the third time "the gun accidentally went off while pointed in your direction" I would get really suspicious, especially when she didn't own a gun until shortly before the fact.

Graham Cunningham

Nov 9, 2023 (edited)

It is tricky, yes. And I was perhaps going off at a bit of a tangent to this Arnold Kling post. But the point I was making is SO important and SO poorly grasped by many on the right-of-centre that it was worth making here. A self-deceiving psychology - both about the complexity of human nature and about one's own political/philosophical psychological drivers - is not just some Oh-Yes-that-too talking point. It is THE fundamental underpinning of the bogus 'Social Justice' religion that has transformed our Western culture over the last 70 years or so. Its importance just cannot be over-stressed in my view.

I think your Steven Pinker example is useful there. Ol' SP is arguably really wrong about the intentions of many of the Social Justice types, but probably not all. The error is partially in attributing the wrong motives, but I think overwhelmingly the problem is thinking that motives matter in judging the proposals. Calling people Islamophobes or whatever is a common point there, as the reason people want a particular policy is not an argument for or against the policy. It might be a good reason to question the assumptions or look really closely at what the policy is (like when Facebook pushes for regulation of social media) but it isn't addressing the policy itself.

On the other hand, someone's intentions matter in your judgement of the person, and judgements of the person matter in how much scrutiny one applies to their arguments and proposals in everyday action. We too often take stated intentions for granted without comparing them to actual actions and outcomes, and we ignore the self-deceiving nature of people with regard to their reasons for action. Yet it is equally common to deceive ourselves about the intentions and nature of those who oppose our preferences, as it is easier to call them evil than to consider that they might just be mistaken, or worse, might actually have a point and we are mistaken.

Epistemic humility is tough, one of the finer lines of virtue to walk.

Graham Cunningham

Sorry, half of Eliot's quote (the important part) was missing on my previous post!

“Half the harm that is done in this world is due to people who want to feel important. They don't mean to do harm; but the harm does not interest them. Or they do not see it, or they justify it because they are absorbed in the endless struggle to think well of themselves.”

Graham Cunningham

Nov 9, 2023 (edited)

I agree with a lot of this but we are perhaps coming up against the limits of what can be said on a comment thread like this without it becoming a 3000 word essay.

A couple of clarifications:

* I am not one to call people 'evil'....only foolish and intellectually cowardly. (Many people with the most 'woke' and self deceiving opinions are nevertheless generous people, loving parents, good friends etc)

* I actually have a lot of respect for Steven Pinker overall

* I trot this TS Eliot quote out endlessly on these types of comment threads (because it's so on the mark) about our era: "Half the harm that is done in this world is due to people who want to feel important. They don't mean to do harm; but the harm does not interest them."

You should trot that quote out more, it is a great one. :D The only thing I could think to add to it is "Also, you can replace "feel important" with "not be to blame"."

Nov 9, 2023 (edited)

>A campaigner for Social Justice is simply driven ... by a desire for social justice ... whatever that might actually mean ...

The most famous campaigner for social justice in America is simultaneously, oddly, not famous, but rather a Moriarty-like figure (or substitute if you like, someone who's a fictional hero that operates in the shadows, I don't read much fiction so can't supply, and I'm sorry to tip my hand by mentioning Moriarty).

He's sort of an anti-Voldemort (see, I do read, but only if it's fun) in that you must not invoke his name, but in this case not because he's evil but because invoking it casts you as impure. His motives and methods are not to be questioned by the mainstream. I worry he will die before we ever get a real accounting of his thinking, its roots, and yes, a certain amount of confrontation with him regarding just who it is he believes merits protection or favor, and who does not - which I think would just be interesting no matter how you view him. As it is, it's sort of as if you lived in the 18th-19th century, and Rousseau had all the influence he had, except that ordinary people didn't get to read his thoughts or know about them - only feel their effects.

He's hugely influential and should by rights be the subject of lots of interviews and Man of the Years and editorials.

That "whatever that might actually mean" is the meat of the question, not something to handwave away.

It is necessarily ridiculous for Smith to suggest government information control because there are too many degrees of freedom of information - in an atmosphere where we must not utter the word "Soros". I'm sorry, but it just is.

And this despite the fact that I totally agree with him *if he is saying the internet has been on net bad*. TikTok should be a synecdoche for the internet or else he is swatting at flies that annoy him particularly.

Nov 9, 2023 (edited)

Zorro, maybe - we used to watch those movies. Kind of a sound alike.

It should be graded as an F because it creates a strawman to knock down.

Next step: ask AI to rewrite the essay in a manner that would garner an A grade. Then cut out Noah altogether and ask AI to write an essay on any topic, in the style of Noah, in a manner that garners an A grade.

Isn’t that where we are headed?

Zbigniew Łukasiak

Nov 9, 2023 (edited)

I've just got access to creating GPTs. I tried creating one that would use your FIT rules from http://www.arnoldkling.com/fits/rules.html. But they are a bit complicated and long because they are meant for a longer term team game. Maybe we need some simplifications - but here are my results:

Here is my session with it:

You

Evaluate this post based on Arnold Kling's rules.

FIT Analyst

Please provide the blog post you would like me to evaluate based on Arnold Kling's criteria.

You

The post is at https://www.noahpinion.blog/p/liberalism-is-losing-the-information

FIT Analyst

Evaluating the blog post "Liberalism is losing the information war" by Noah Smith using Arnold Kling's scoring system yields the following:

A (Devil's Advocate): There is no evidence within the provided excerpts that Noah Smith interviews a guest to challenge their position on a topic, hence this post does not score on Devil's Advocate.

B (Thinking in Bets): The post does not seem to state a falsifiable question, give odds, or give a reason as required by the 'Thinking in Bets' criterion.

C (Caveats): Smith acknowledges the potential for government messaging to be abused, such as in the Vietnam War and the Iraq War, which shows awareness of weaknesses in his position advocating for more aggressive government information campaigns.

D (Debate): The post does not describe participation in a debate giving fair time to opposing views, so it does not score on Debate.

K (Kick off a discussion): The post itself may stimulate discussion, but there is no evidence provided that it has prompted at least 1500 words of discussion, analysis, or commentary from others, so it cannot be scored in this category at this time.

O (Open to Reconsideration): Smith does not explicitly state a position he's willing to reconsider based on new evidence or a change of mind, hence no score for Open to Reconsideration.

R (Evaluates Research): The post discusses the effectiveness of historical propaganda efforts and compares them with modern information warfare, which involves evaluating past research and current events, scoring a point here.

S (Steel-manning): The post does not present a structured debate where Smith represents an opposing viewpoint fairly before presenting his own, so it does not score on Steel-manning.

In summary, the post "Liberalism is losing the information war" scores on Caveats and Evaluates Research under Kling's system. It acknowledges the potential downsides of its own argument and discusses the effectiveness of information campaigns, past and present. It does not, however, meet the criteria for Devil's Advocate, Thinking in Bets, Debate, Kick off a discussion, Open to Reconsideration, or Steel-manning based on the provided excerpts.

Seems like it mis-scored R (Evaluates Research), assuming Smith evaluated past research and current events, instead of just saying what he thought happened. Smith might have been right about what happened, but he wasn't discussing research by others on the topic.

Nov 9, 2023 (edited)

It seems the grading criteria are pretty weak. Does the essay have to use proper grammar and punctuation? And the bar for A or B is pretty low. What if it gives unimportant reasons or misses important reasons for or against?

On that note:

If the propaganda is coming from Russia and China, is there a free speech issue? Why?

If it is free speech, how does the govt counter it? Do you want a bureaucrat writing responses? Does anyone think this could be accomplished without some kind of partisan bias, intentional or not, becoming an issue?

What other issues did he miss?

Zbigniew Łukasiak

Nov 9, 2023 (edited)

Finally I managed to save my newly created GPT - so here is a GPT evaluating blogs using the rules from this post: https://chat.openai.com/g/g-oqdua2xRj-blog-evaluator

The results are not too good yet - the rules need some more work. I also liked some of the FIT rules - but as I wrote, they don't work so well for standalone evaluation.

It does not want to read the whole text - it probably mostly relies on summaries from Bing.

forumposter123@protonmail.com

Nov 9, 2023 (edited)

I didn’t see Noah give an example of an opinion he doesn’t like that should be protected speech.

No doubt Noah was against the Vietnam and Iraq wars (I’ll guess on Iraq, you would be surprised who supported it in 2003, and for Vietnam I don’t think he was alive yet).

But what about Ukraine? From what I can tell Noah considers any arguments against 100% total support for Ukraine to be Russian propaganda. Certainly he would censor a critic of Ukraine the way he thinks people who criticized Vietnam or Iraq were unfairly censored.

So it seems to me that he’s not really grappling with the issue.

It is probably much more feasible for an LLM to judge the methods of argumentation in an essay than to judge whether it is right or wrong.

Zbigniew Łukasiak

Another attempt:

Here is my prompt:

===========

Blog evaluator downloads a full blog post from an url provided by the user and lists all the major claims it makes. Then it grades the reasoning supporting these claims according to the following rules. The grades are on a scale from A to F, with + and -. An A is granted if it articulates strong arguments against its claim and responds to those arguments. It earns an F if it merely insults people who disagree or claims that it understands the motives of those who disagree and that those motives are bad. It earns a B if it at least mentions that there is a possibility that the claim is wrong or indicates potential ways in which the claim could turn out to be wrong. It earns a D if it denies that the claim could possibly be wrong. It earns a C- if it neither considers the possibility that it could be wrong nor argues that the claim must certainly be right.

Blog Evaluator supports its grades with quotes from the evaluated text.

============

And here is the result:

============

You

https://www.noahpinion.blog/p/liberalism-is-losing-the-information?utm_source=%2Finbox&utm_medium=reader2

Blog evaluator

The blog post titled "Liberalism is losing the information war" by Noah Smith discusses the use of propaganda and information dissemination strategies historically and in the current era, emphasizing the need for liberal societies to actively engage in information dissemination to counter propaganda from totalitarian regimes.

Historical Context of Propaganda: Smith references the U.S. government's efforts during World War 2 to promote liberal values and unity through films and other media. He cites examples like "Don’t Be a Sucker" and the "Why We Fight" series, which were government-produced and aimed at countering fascist ideologies.

Contemporary Challenges and Propaganda: The post highlights the challenges in the current era, where countries like China and Russia actively use propaganda, contrasting this with the U.S. government's current reluctance to engage in similar activities. Smith argues that this reluctance is detrimental, likening it to a nation refusing to defend itself against a conquering army.

Proposed Solutions and Cautions: Smith suggests that the U.S. needs to re-engage in the 'information warfare game', focusing on articulating liberal values and limiting foreign powers' ability to influence U.S. information ecosystems. He also acknowledges the risks of such an approach, emphasizing the need for transparency and avoiding censorship or misinformation.

Overall, the blog post receives a grade of "A" on the basis of its thorough analysis. It articulates the importance of countering propaganda with strong arguments and acknowledges the potential risks and challenges associated with such efforts. The post demonstrates a balanced approach by considering historical examples, analyzing contemporary challenges, and proposing solutions while being mindful of potential pitfalls.

===========

Not very good.
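
For comparison, the letter-grade rules in the prompt above can be written down as a small deterministic function, which makes it easier to see where the model's "A" departs from the rubric. This is only a rough sketch: the names `ClaimFeatures` and `grade` are made up for illustration, the yes/no features would still have to be judged by a person or a model, and the order in which the rules are checked (F first, then A, B, D, with C- as the fallback) is an assumption, since the rubric does not say what happens when more than one rule applies.

```python
from dataclasses import dataclass


@dataclass
class ClaimFeatures:
    """Hypothetical yes/no judgments about how an essay supports one claim."""
    insults_or_imputes_bad_motives: bool = False   # F condition in the rubric
    rebuts_strong_counterarguments: bool = False   # A condition in the rubric
    admits_could_be_wrong: bool = False            # B condition in the rubric
    denies_could_be_wrong: bool = False            # D condition in the rubric


def grade(features: ClaimFeatures) -> str:
    """Map the rubric's conditions to a letter grade.

    The precedence used here is an assumption, not part of the rubric.
    """
    if features.insults_or_imputes_bad_motives:
        return "F"
    if features.rebuts_strong_counterarguments:
        return "A"
    if features.admits_could_be_wrong:
        return "B"
    if features.denies_could_be_wrong:
        return "D"
    # Neither considers being wrong nor insists on being right.
    return "C-"


if __name__ == "__main__":
    # An essay that only acknowledges risks, without rebutting strong
    # objections, lands on B under these rules rather than A.
    print(grade(ClaimFeatures(admits_could_be_wrong=True)))
```

Under this reading, acknowledging "potential risks and challenges" alone would earn a B; an A would additionally require stating strong objections and answering them.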

Incentives Matter

Arnold should try it on one of his own essays, just to be fair!

More seriously, if you developed criteria that you felt were reliable, you can pass your drafts through and make improvements based on the AI feedback you receive.

Isha Yiras Hashem

Here's an exercise: find something you can enter into ChatGPT that it won't find anything to criticize about.

Good demonstration that ChatGPT can usefully assess the quality of an essay. I suggest making the grading a bit tougher by penalizing jargon and obfuscating language.

In Slovakia, they use grades 1-5 in place of A-F. My kids aren’t supposed to get worse than a 2, except in English, where they always get a 1.

While I’d prefer more granularity in FIT essay scoring, I also thought there should be some length-related metric as well as one for truth, tho true facts can be included in bad analysis.

Arnold, we could all use some clarity on what is an A essay, B, C, D, F. Tho numbers from 10 down would be better, 5 more than the -5 … +5 method, with those essays that include negatives driving the score under 5, rather than under 0. Each essay should get a single score.

Where are some more examples?
