21 Comments
Dec 6, 2023 · Liked by Arnold Kling

from Matuschak's essay:

"to understand something, you must actively engage with it. That notion, taken seriously, would utterly transform classrooms. We’d prioritize activities like interactive discussions and projects; we’d deploy direct instruction only when it’s the best way to enable those activities. I’m not idly speculating: for the last few decades, this has been one of the central evolutionary forces in US K–12 policy and practice.

And yet, K-12 students today are not learning more than students a few decades ago. Matuschak just assumes that teachers can get students to engage by "prioritiz[ing] activities like interactive discussions and projects". But if students are not interested, they will just go through the motions. They will remember for a while--young people have GREAT short-term memories, much better than 40- or 50-year-olds--and then that knowledge will fade away. They have not built a great edifice of learning, just a castle in the sand, which must be constantly rebuilt and protected from the wind and tide.

Like just about everyone who writes about education, Matuschak seems to believe that young people would learn so much more if we just DID IT RIGHT. They won't. They won't learn significantly more unless they care about the material and then continue to use the knowledge. Which ain't ... gonna ... happen.


I don't want to leave the impression that I think Matuschak is wrong about everything. He is absolutely right that people don't learn unless they "engage" with the material. He is also right that people don't learn well if they have to absorb a lot of new information all at once. That's one reason that, way back when I was in high school, the previous night's homework was often to read the part of the textbook that the teacher would lecture (yes, lecture) on the next day. Of course, a good teacher wouldn't just talk for 50 minutes but would stop and ask questions, take questions from students, and maybe do something physical or theatrical. I don't think that sequence happens much today; "kids don't read." And I'm sure it helped that most of my classes were "Accelerated"--with students from the top fifth of the class, most of them Ashkenazi Jewish.

Matuschak mentions how textbooks today try to get students to review and test themselves with, e.g., questions at the end of a section or chapter. They can be helpful if the student uses them in the spirit in which they were constructed. But that assumes the students care, or have somehow been made to care. With a number of other teachers, I used to teach ninth-grade physical science. It was a "College" course, given to the bottom 2/3 of the class (the top third got "Honors" biology). A not uncommon in-class activity was "read pp. 312-318 and answer questions 1-10 on page 318." Very few of the students read the pages. Instead, they read a question and looked for the words in that question on the preceding pages. They would then copy verbatim the sentence that contained those words. It was not terribly hard, because the questions were usually in the order in which the material appeared in the text. Never underestimate the ability of a student to not engage with the information you want them to learn.


Your quote from the essay contained a puzzling reference to "direct instruction." Matuschak wrote "we'd prioritize activities like interactive discussions and projects; we'd deploy direct instruction only when it's the best way to enable those activities."

It is unclear whether Matuschak is using "direct instruction" as a synonym for "lecture" or whether he is using it in the technical sense, in which it "teaches by explicit instruction, in contrast to exploratory models such as inquiry-based learning. DI includes tutorials, participatory laboratory classes, discussion, recitation, seminars, workshops, observation, active learning, practicum, or internships. Model includes "I do" (instructor), "We do" (instructor and student/s), "You do" (student practices on their own with instructor monitoring)." (https://en.wikipedia.org/wiki/Direct_instruction). What Matuschak is advocating would appear to overlap a lot with "direct instruction" in the technical sense.

And "direct instruction" might also be a response to your comments on student motivation. Although there may not be a right way to teach, there are basically two methods that can fairly be said to be evidence based, and direct instruction is one of them.

A 2018 meta-analysis of 50 years' worth of research on Direct Instruction reported:

"Quantitative mixed models were used to examine literature published from 1966 through 2016 on the effectiveness of Direct Instruction. Analyses were based on 328 studies involving 413 study designs and almost 4,000 effects. Results are reported for the total set and subareas regarding reading, math, language, spelling, and multiple or other academic subjects; ability measures; affective outcomes; teacher and parent views; and single-subject designs. All of the estimated effects were positive and all were statistically significant except results from metaregressions involving affective outcomes. Characteristics of the publications, methodology, and sample were not systematically related to effect estimates. Effects showed little decline during maintenance, and effects for academic subjects were greater when students had more exposure to the programs. Estimated effects were educationally significant, moderate to large when using the traditional psychological benchmarks, and similar in magnitude to effect sizes that reflect performance gaps between more and less advantaged students." (https://journals.sagepub.com/doi/abs/10.3102/0034654317751919 )

Note especially "Effects showed little decline during maintenance."

And more Wikipedia: "A meta-analysis published by Adams & Engelmann (1996), a chief architect of the DI program, finds a "mean effect size average per study...(as) more than .75, which confirms that the overall effect is substantial." A 2018 meta-analysis by Stockard et al. found an average effect on test scores of approximately 0.6 standard deviations."
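
To put those effect sizes in concrete terms, here is a rough back-of-the-envelope conversion to percentiles. The normality and equal-variance assumptions are an illustrative simplification on my part, not something the studies themselves report:

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Effect sizes quoted above: ~0.6 SD (Stockard et al. 2018) and ~0.75 SD (Adams & Engelmann 1996).
for d in (0.6, 0.75):
    # Cohen's U3: the share of the comparison group that the average
    # treated student would outscore, under the normality assumption.
    percentile = 100 * normal_cdf(d)
    print(f"d = {d:.2f}: average DI student at about percentile {percentile:.0f} of the comparison group")
```

By that rough yardstick, the average DI student moves from the 50th percentile to roughly the 73rd (d = 0.6) or 77th (d = 0.75) percentile of the comparison group--which is what "educationally significant" looks like in concrete terms.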

And this also leads me to question the efficacy of a GPT that lavishes praise on an essay that apparently uses "direct instruction" ambiguously. If the essay were to lead readers to dismiss direct instruction on the basis of an unsupported assertion that contradicts the weight of the research, it would not be doing anyone much good.


DI, Direct Instruction, should be used for grades 1-6 or so, not K or preschool for most.

Government education should focus more on low-IQ folk, helping them read, write, do math, and function with good character in the modern world.


Direct Instruction, capitalized, is a very specific program developed by Zig Engelmann and colleagues. It is mostly for teaching basic skills in elementary school. It is very structured, even scripted, so many teachers and the educational establishment hate it. It requires that students show mastery of one level before moving on to the next, which is one reason the skills tend to be retained.

Dec 6, 2023 · Liked by Arnold Kling

Interested to see some very negative examples as well - essays that you judge to be poorly argued, assuming the conclusions, etc.

author

I would like to see commenters submit such examples, and I will give them to the grader.

Dec 6, 2023 · Liked by Arnold Kling

The GPT reference to "individual learning styles," a concept for which there is basically no evidentiary support, and which has pretty much been thoroughly debunked, does not instill any confidence in LLMs.


I was disappointed in the Andy Matuschak essay. Maybe I’m in an impatient mood this morning, but he seems to crawl along slowly for a mile there. Andy, get to the point please.

And Arnold, your grader isn’t critical of his essay in this regard. 8/10? How about 3/10 for wandering around and delivering little?

Here’s how I would say it. Choose better books and write better books. There are plenty of good books, but it’s hard to find them. I’ll take a stack of 10, 20 or even 30 books that I checked out from the library or bought online. I may end up reading 1 or 2 of them cover to cover. For me, completing the book is a sign that it’s a good book.

I suspect that most authors wander. They’re thinking out loud, not realizing that they can condense greatly by editing.

How to make books better? Re-write, re-write, re-write - until the book is short and to the point. Think a bit more like a poet.

The Three Languages of Politics is a good example. It’s concise. The author chose his words very carefully. He’s respectful of his reader’s time.

Kevin Kelly’s recent book Excellent Advice for Living is concise. I love that book.

Good books are concise books. Good books are often short books. They’re narrow in scope. Chapters are often short.

Spend more time finding good books and you’ll see that there are plenty of good books to read. Probably still more than one can read in a lifetime.

How to make books better? Find and recommend the good ones. Ask, “What do these books have in common?” Write books like those ones.


Essay suggestions:

Hayek’s “Use of Knowledge in Society”

Read’s “I, Pencil”


Has anyone tried out external essay-grading tools like essay-grader.ai that are based on ChatGPT?


Can I be cheeky and volunteer my own essay? If so: https://inexactscience.substack.com/p/the-case-for-narrow-utilitarianism

(By the way, this use case for GPTs is very nice.)


Although a different genre, it would be interesting to see how the Essay Grader views iconic orations that are widely revered. For example:

- Hamlet's soliloquy

- The Gettysburg Address

- Washington's Farewell Address

- Eisenhower's "military industrial complex" speech

founding

I imagined:

1. Professor specifies the essay assignment.

2. Student edits the spec to use as LLM prompt

3. LLM writes essay0

4. Student submits essay0 to LLM detector

5. LLM detector rejects essay0 as AI

6. Student alters vocabulary and introduces spelling error to make essay1

7. LLM detector accepts essay1 as non-AI

8. Student submits essay1 to Professor

9. Professor uses (auto-grader + LLM detector) to assign grade without reading

10. Student (less than 30 minutes of effort) is assigned a grade by Professor (less than 1 minute of effort). At no time did the essay pass thru either mind.
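
Purely as an illustration, the student-side loop in steps 3 through 7 amounts to something like the sketch below; the function names are hypothetical stand-ins, not real tools:

```python
def draft_essay(spec: str) -> str:
    """Steps 2-3: hand the edited spec to an LLM and take its draft (hypothetical stub)."""
    raise NotImplementedError

def flagged_as_ai(essay: str) -> bool:
    """Steps 4-5 and 7: the LLM detector's verdict (hypothetical stub)."""
    raise NotImplementedError

def perturb(essay: str) -> str:
    """Step 6: swap some vocabulary and plant a spelling error (hypothetical stub)."""
    raise NotImplementedError

def produce_submission(spec: str) -> str:
    essay = draft_essay(spec)
    while flagged_as_ai(essay):   # repeat steps 4-7 until the detector accepts
        essay = perturb(essay)
    return essay                  # step 8: this is essay1, handed to the Professor
```

Nothing in that loop requires either the student or the professor to read the essay.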

Regards,

Bill Drissel


Curious what it would say about Jonathan Swift’s “A Modest Proposal” https://extra.shu.ac.uk/emls/iemls/resour/mirrors/rbear/modest.html


I tried it on Hayek's Nobel speech (I know it is long relative to Arnold's recommended word limit).

https://www.nobelprize.org/prizes/economic-sciences/1974/hayek/lecture/

Output:

The essay "The Pretence of Knowledge" offers a critique of the application of scientific methods in economics, arguing against the overreliance on quantitative data and models in understanding complex social phenomena. The author highlights the limitations of economic theories that heavily depend on measurable variables, suggesting that such an approach can lead to misguided policies, particularly in the field of unemployment and inflation.

Strengths:

1. **Clarity of Argument**: The essay presents a clear and well-structured argument against the scientistic approach in economics. The author effectively uses examples to illustrate the limitations of relying solely on quantitative data.

2. **Engagement with Opposing Views**: The essay acknowledges the prevailing economic theories and practices, critiquing them thoughtfully. This demonstrates a fair engagement with opposing views, as the author does not dismiss these views outright but rather presents a reasoned argument against them.

3. **Open-Mindedness**: The essay reflects an open-minded approach by considering the complexity and unpredictability inherent in economic phenomena, which cannot be fully captured by quantitative models.

Areas for Improvement:

1. **Consideration of Alternatives**: While critiquing the current economic methodologies, the essay could benefit from a more detailed exploration of alternative approaches. Providing more insight into other methods of economic analysis would strengthen the argument.

2. **Balance in Argumentation**: The essay leans heavily on critiquing the scientistic approach. To enhance intellectual humility, it could acknowledge potential strengths or successes of the quantitative methods in certain economic scenarios.

3. **Constructive Criticism**: The essay focuses on the shortcomings of the current approach without offering substantial guidance on how economists might better incorporate qualitative data and real-world complexities into their analyses.

Overall, the essay demonstrates a strong understanding of the subject and presents a compelling argument against the overemphasis on quantitative methods in economics. However, a more balanced view that acknowledges the merits and limitations of both qualitative and quantitative approaches, as well as suggestions for a more integrated method, would enhance its persuasiveness and depth.

Numeric Grade: 85/100


Matuschak’s essay had quite a few good points. It reminded me of Rob Henderson’s recent posts on how he reads and takes notes, as well as AI teaching without mentioning A Young Lady’s Illustrated Primer (which is, or should be, as iconic as The Selfish Gene).

But many folks, maybe most, want the credential/grade/pass with the minimum of hard thinking. Some of us, like me sometimes, actually want to … have done the hard thinking, which is the push to think more deeply.

Arnold is doing great work on getting ChatGPT to be a better essay grader. Maybe next year we can try another round of FIT using it, with rules like counting the top 2 or 3 highest-scoring essays for your team each week.

I certainly agreed with an earlier Arnold note that most books would be better as essays. Even Andy’s essay was a bit long.

I object to Chat always thinking longer is better; the suggestions for improvement should be restricted to the same or a shorter length, including reducing excess redundancy.

I’ve been tweeting a bit more lately, and it quite follows the idea of quickly noting, and engaging with, the ideas in others’ tweets. Especially outrage, which is a negative, yet for retention of a key idea or phrase it might be as good as a book at 1/1000 of the effort of reading, or especially writing, one.

My desire to comment on most links, though I’m too lazy to write them all, is based on trying to think about them more deeply. I want to seem to myself to be more thoughtful than most, more than to be seen by others as thoughtful. But perhaps that’s rationalizing. The Market of Rationalizations, link left last time, remains in my mind.

Dec 6, 2023 · edited Dec 6, 2023

I would have been intrigued if ChatGPT had written something like this:

The essay* about books being failures because readers do not vividly recall their details is undermined by the examples chosen to represent books that "convey detailed knowledge" - i.e., "Guns, Germs, and Steel" and "Thinking, Fast and Slow". The latter especially is just the sort of book that was written to convey the writer's ideas - so that they might enter the popular consciousness in an abbreviated way, in the manner of something like "The medium is the message." These ideas or assertions are supported, no doubt, by studies and anecdotes, but it would be pointless for the reader to stuff their brains with them.

There are certainly books written for a popular audience that do contain detailed knowledge, but they are books of a different kind entirely; and while readers will vary in their powers of recall, it is precisely the difficulty of retaining all that knowledge which justifies the book and the shelf it sits on.

But that would be ChatMePT, I guess.

Still, overall it feels to me like this tool is still too reminiscent of a 6th grade teacher grading a 5-paragraph essay.

*This essay may have hit a nerve with me because I have a truly terrible memory and seldom remember anything I've read after burbling about its contents for maybe a month--even down to remembering *what it was* I listened to for 3 weeks on the audiobook I just recently wrapped up. I am pretty equally bad at remembering both the details and even the main themes! But curiously, if you were to pick up a book on my nightstand and begin reading it aloud to me, and you skipped a word, I would stop you and supply it, with a testy "Why are you skipping?"

Dec 6, 2023 · edited Dec 6, 2023

I'd like to see if anyone can find an essay you strongly disagree with that nevertheless gets high marks from your grader.

author
Dec 6, 2023 · edited Dec 6, 2023 · Author

That is easy to do. Noah Smith often writes things I disagree with (he loves industrial policy, for example) and yet he does so in a style that the grader approves of. It definitely does not grade on "comes closest to agreeing with Kling." To do that, I would have to create a different GPT. Feel free to suggest an essay for me to input into my grader.
