21 Comments
Sean A. Harrington

I scaled a mini-version of this with AI for my course:

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4975804

NJS

So, to clarify, your vision requires either 1) oral exams (presumably more than one over the course of the semester) with individual students, using a paid monitor looking over their shoulder for 15 to 20 minutes, or 2) designing an AI capable of quizzing a student orally? Number 1 is completely impractical for the 99% of colleges that are not Swarthmore: how long would it take to individually interview a typical class of 40+ students multiple times in multiple courses? Number 2 overstates the ability of current models and, more importantly, understates the cleverness and willingness of most students to game the system. They will simply open a new tab with a different AI program to answer the questions. There is no logistical or technical approach to accurately assessing students in the age of AI short of having them write with pen and paper in the classroom. Looked at honestly, this is far from a trivial problem to solve; I would call it closer to a full-blown crisis of meaning in education.

Arnold Kling

I had only thought of a final exam. Up to that point, any assessments can be self-assessments with no monitor, because they won't count for a grade. And I assume that the monitor can spot the student opening up a tab with an AI. Maybe spend some time thinking about how to make something like this work instead of insisting that it can't.

Tommy Gunn

Spot on. On top of this - “Because the assessments are all done by a single entity, they will be consistent across students.” - strikes me as wrong. I can give Claude/GPT two identical prompts one after the other in separate sessions and get very, very different responses.

Paulanz

Made me laugh out loud: "I suspect that what students learn about AI by cheating is probably more valuable than what the professors were trying to teach them, anyway."

j blutarski

Me too. I interpreted it as a sign that Kling has not spent much time in the classroom lately. If he had, he would have read assignment answers written entirely by AI, with students never even having glanced at what the AI wrote.

Tom Grey

Great idea, and an excellent, usefully limited use of AI by the professor.

Oral exams will almost certainly come back to dominate a greater share of grades. Slovak professors are already using them, and it's quite a bit of work for the professor.

For most such tests, the AI can make a video and then a transcript of the interview.

My own future addition to this idea is for those classes which include the student writing an essay with some references. The professor grading such an essay should give two grades: 1) on the written work, and 2) on a short interview presentation of the work by the student, including AI-generated questions on the essay and on some issues from the references.

It's important for the professors to keep in mind that the main part of the education with essays is to help the student to use the set of given facts and produce a coherent essay. The essay is not the product, the product is the ability to write such an essay. Most of that ability is not based on the facts, but the process of using the facts.

However, most professors feel that there are quite important facts about the subject they are teaching which should be learned and not forgotten; testing on these is a different type of test. Though this is also the kind of fact knowledge, like a formula, which is both easier to look up and more likely to push students to "cheat" by depending on AI to supply the formula rather than remembering it.

T Benedict

IMO, a good idea. However, aren't blue book exams an old-school format for the same thing? Admittedly not an interview, but a less labor-intensive process (from the examiner's perspective) that still accomplishes the end purpose of assessing student comprehension, with the added benefit of advancing their written communication.

Chartertopia

I was a terrible student, so I may be the wrong audience for this. I spent more time in the library and playing with an antique computer (so old we could rewire it to change how instructions worked and add new ones) than going to the "required" classes which didn't interest me. (One was Econ 101, and I appreciated the professor / lecturer trying to make it interesting, but it bored me to tears.)

There's a real difference between students who want to learn something and will not cheat because it defeats the purpose; and students needing a certificate and not caring about the subject. I have said elsewhere that I bet I cannot remember more than 10% of what I learned K-12, that I learned it just long enough to pass the tests and promptly forgot it. The one year I lasted in college was the same.

I think testing is a wasted effort in some ways. Suppose someone lies to an employer about his skills, cheats on an interview test, whatever it takes to get a job. If that new hire lasts more than a couple of weeks, then the employer doesn't actually care if the employee is qualified or not. It could be anything from "how many words a minute you can type" to "design a rocket to colonize Mars". If bosses and co-workers don't call the new hire out quickly, then the required skills apparently weren't quite as important as claimed. The employee wasted everybody's time and got some unearned pay, the hiring process has to start all over again, but how much effort would be required in devising and running an effective screening process?

Same thing for grading students to qualify them for the next class. The penalty for improper screening is a little worse, since classes only start at intervals and a mistaken admittance means leaving the vacancy open all semester, but if the other students and the teacher can't tell within a week or two, maybe the prerequisites weren't so important after all.

I assume most of this is known in practice, and testing evolves to be more efficient, but my only experience is with interviewing prospective co-workers, and I relied on just talking, such as about some current project I had. Were their questions better or worse than the ones I had asked myself, and how did my answers inform their followup questions? I've never thought about how an AI could interview candidates that way.

Roger Sweeny

"I bet I cannot remember more than 10% of what I learned K-12, that I learned it just long enough to pass the tests and promptly forgot it."

You and the vast majority of high school students. It is a truth no one wants to admit, conservative or liberal, Republican or Democrat.

Most in-class assessments (tests and quizzes) only cover the previous "unit"--what was done in the last three weeks or so. Most high school students get pretty good at three week "learn and forget". At least good enough to pass the course.

But doing that for something that was done months earlier is much more difficult.

For a while, the feds encouraged high schools to refuse to graduate people who could not pass outside tests of basic material given toward the end of a school year. They don't any more and most places have gotten rid of those tests--or made them even easier to pass, or, like Massachusetts, made them no longer a graduation requirement. Too many people were failing. And continuing to fail.

Chartertopia

I wonder how much of that learn-test-forget cycle turns into bad habits for adults. On the one hand, we do lots of things which should be forgotten once accomplished, like dealing with bureaucracies for one-off projects or traffic tickets. On the other hand, client and customer problems probably should be remembered instead of forgotten. With the learn-test-forget cycle being drilled into us, we probably automatically forget things which might be useful to remember.

I can’t imagine how one could test this hypothesis.

Roger Sweeny

I think it may partly depend on how much the client and customer problems matter to the person. A major reason that so many high school students "learn and forget" is that they have little or no interest in what they are supposed to learn. It is just not important to them. But if job success depends on learning what has worked or not worked with clients and customers, there will be much more of an incentive to learn.

CW

"human monitor"

Job of the future? Sounds about as exciting as holding the sign on the road construction crew. Also, score one for Glenn Reynolds and Robert Heinlein. https://instapundit.substack.com/p/who-can-you-trust

Doctor Hammer

That’s what you have grad students and teaching assistants (undergrad upperclassmen) for. It has already been a thing at lecture-hall-style schools for over 30 years at least.

CW

Yes, but for decades those were the overproduced elite, serving as human monitors only so they could move up to professor. Is this going to be a specialized job of its own now? And how soon, and for how long?

Doctor Hammer

Test proctors are a semi-specialized job now as well, although I have never been clear on whether they were already university staff or students getting extra cash or that was their primary job.

Long story short, it is a job someone does for a few hours and gets some cash. No real skills required, although it might get merged with a tech help role for students who have trouble logging in. Colleges are already filled with people doing odd job roles like this, usually not as a profession. It is more of a part time side gig, like grading homework or research assistance.

Ryan Baker

"Frankly, I suspect that what students learn about AI by cheating is probably more valuable than what the professors were trying to teach them, anyway."

Yes, the first time, but not continuously. Learning how to use AI effectively is a valuable skill, but once you've learned the capabilities, the approach, the process of "cheating" can be rote. There are ways to engage that have more continuous learning, but they aren't the ones that are critical to the process of bypassing the tests.

I feel for educators, AI is without a doubt disruptive to their process. Things they thought they had "solved" are clearly no longer solved. In a sense, they never were, but there was an equilibrium that enough people were comfortable with that things felt stable.

As I argue in (https://norabble.substack.com/p/testing-education-and-ai), most testing shouldn't be a signal to outsiders; it should be a service to students. There are a few points where outsiders need information, but most testing has aged out of its relevance by that point, and it's just lazy to depend on it as a signal, especially knowing that knowledge can be gained in many ways.

The easiest fix to make testing more valuable to students? Reduce the incentive to cheat by giving it a single rather than dual purpose.

For the testing that is necessary for outsiders? Yes, you'll need to avoid cheating here. Two important points. First, there's more than one way to avoid cheating. Second, if you're focusing on this more limited set of points where outsiders need information, you can invest more.

Option 1, Same Expectations, Exclude AI: Paper and pencil, surveillance. Costly, annoying, but doable with a lower frequency.

Option 2, Include AI, Expect More: Harder to evaluate (thus higher cost); AI can assist in evaluating; probably can't be multiple choice; best suited to skills complex enough that AI can't do them on its own. So, yes, this works poorly for testing that a third-grader can do division. But is there really a need for that outside of the student's own needs in third grade?

So in summary, educators should focus on separating the two aspects. The vast bulk of testing should take the pressure off, disincentivizing cheating by promoting a real interest in truth. Students will learn the costs of pretending, and how those harm their real interests. That on its own is a valuable life lesson. AI can be a major assist here, as it can personalize without adding pressure.

Secondly, the remaining testing should use a mix of techniques. Some will try to exclude AI, some will include it. Both take more effort than existing techniques, but Option 2 opens up a realm of new possibilities. I think your suggestions are good examples here, and if you look at the comments, one of the main objections is the cost. But that's because they presume far more signal-motivated testing than is necessary.

Tyler Ransom

This is exactly what I consider the gold standard of the future to be. The main bottleneck is the lack of an AI reliable enough to grade the video recording. But that should be resolved in the near future.

I am an economics professor, and I’ve shifted my final exam to an interview format for my smallest class. It’s very time-consuming to conduct 30-minute interviews serially.

Kevin

The previous generation of online education didn't really work, but maybe AI will change that.

For lectures, you can have the best human lecturers recording videos. Then an AI can act like the TA in a section, answering questions and quizzing the students. And an AI can do the final evaluation. This is all "scalable" - the marginal costs are very low because you never need any live human interaction.
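The quiz-and-evaluate loop described above can be sketched in a few lines. This is a toy stand-in, not a real system: `call_model` is a hypothetical placeholder for whatever LLM API would actually judge an answer, replaced here by a simple keyword rubric so the sketch runs on its own, and the questions and rubric are invented for illustration.

```python
# Toy sketch of the "AI as TA/grader" idea: quiz a student, then score
# the transcript. A real system would replace call_model with an LLM call.

RUBRIC = {
    "What does a price ceiling below equilibrium cause?": ["shortage"],
    "What is opportunity cost?": ["next best", "forgone", "foregone"],
}

def call_model(question: str, answer: str) -> bool:
    """Hypothetical grader stand-in: checks the answer for rubric keywords."""
    return any(k in answer.lower() for k in RUBRIC[question])

def grade_interview(transcript: dict) -> float:
    """Score a question -> answer transcript; returns the fraction correct."""
    correct = sum(call_model(q, a) for q, a in transcript.items())
    return correct / len(transcript)

transcript = {
    "What does a price ceiling below equilibrium cause?": "A shortage of the good.",
    "What is opportunity cost?": "The value of the next best alternative.",
}
print(grade_interview(transcript))  # → 1.0
```

The marginal-cost point is visible even in the toy: once the rubric (or model prompt) exists, grading one more transcript is just another function call, with no live human in the loop.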

Cinna the Poet

It's a good idea. You would need very large computer labs to scale it, which is something universities have moved away from now that student devices are so ubiquitous. I would be worried about students gaming the system by prompting the AI during the discussion in ways that would increase the grade independently of their skills displayed in the interview. You might still have to have a grader look over the text of each interview pretty thoroughly.