
Task Contamination Paper here: https://arxiv.org/pdf/2312.16337.pdf

One hesitates to even discuss this stuff below the highly technical level without putting scare quotes around everything. If I understand it right, people have been overestimating how much of the improvement in certain LLMs' outputs can be attributed to human feedback and other attempts at refinement.

The method by which those refinements were integrated into those LLMs is, apparently, not just a matter of tweaks leading to better approaches to processing the same underlying training data; somehow it ends up being the equivalent of recording those tasks and incorporating them into a no-longer-quite-original training data set. That's the "contamination" here.

You can imagine two students trying to learn the material for a math class. Student A is great at manipulating symbols and equations but has limited recall. So A learns the high-level concepts and tools, never does the homework, sees every problem as if for the first time, and applies the concepts and tools to derive the result. Student B is not so great at equations or derivation, but does all the homework with teacher feedback and keeps trying until he gets the right answer. The teacher tells him that the tests consist entirely of selected problems from the homework, so student B simply *memorizes* all the possible right answers. If you hit student A with a new problem, he solves it. If you hit student B, he'll write down the answer to the homework problem closest to the new one, and maybe he's right, maybe he's not. So A is better than B for math. But if the questions were about history instead of math, maybe B would seem better than A.

With regard to the LLMs, the problem is misinterpreting improvements as "being a better A, with the same homework set", when in fact they are due to "being a better B, with a bigger homework set". That's probably pretty important.

So there is a mistake being made about the level of abstraction that forms the basis of the improvement.

You can imagine that the LLM improvement process would take all those tasks and refinements and work to recognize the high-level commonalities and statistical patterns, and, um, "concepts"(?) in order to optimize generality and performance, and to tweak the existing repository of high-level patterns contained in algorithms or token weights or whatever, all while "intentionally" sacrificing some of the granular detail that could only be retained in the wasteful process of trying to remember each individual case.

Instead, because it is possible to get the model to reproduce the content of those refinement tasks past some threshold of closeness to the original, the tweaks are more in the nature of "remembering" the tasks one at a time as specific details, rather than treating them as instances useful merely for tweaking generation rules at a higher level of generality.
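
As a rough illustration of the kind of closeness test meant here, a minimal sketch: prompt the model to complete a known refinement-task example and flag it if the output reproduces the original past a similarity threshold. The `generate` function below is a hypothetical stand-in for whatever calls the model under test; the paper's actual methods are more involved.

```python
# Sketch of a memorization/closeness check. `generate` is a
# hypothetical stand-in for a call to the model under test.
from difflib import SequenceMatcher

def looks_memorized(prompt, original, generate, threshold=0.8):
    """Flag an example if the model's output reproduces the
    original text past a similarity threshold."""
    output = generate(prompt)
    return SequenceMatcher(None, output, original).ratio() >= threshold

# Toy stand-in model that "remembers" its homework verbatim:
canned = {"Complete: The quick brown": "The quick brown fox jumps over the lazy dog."}
fake_generate = lambda p: canned.get(p, "")

print(looks_memorized("Complete: The quick brown",
                      "The quick brown fox jumps over the lazy dog.",
                      fake_generate))  # True -> evidence of "remembering"
```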

So, if you are comparing the performance of pre- vs post-refinement LLMs with this issue, and you are using the original baseline, you would think that the refined approach is performing a lot better. But if you compared the refined approach to the pre-refinement approach given the advantage of some of that new task information in its training data, the difference wouldn't represent as much of an improvement.
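
A crude way to probe for this, in the spirit of the paper's chronological analysis, is to compare accuracy on tasks created before vs after the model's training cutoff; a sketch with entirely made-up numbers:

```python
# Chronological contamination probe (made-up data): if the model is
# much better on tasks that existed before its training cutoff than
# on comparable tasks created after it, memorization is suspect.
from datetime import date

CUTOFF = date(2021, 9, 30)  # hypothetical training-data cutoff

# (task creation date, did the model answer correctly?)
results = [
    (date(2019, 5, 1), True),  (date(2020, 3, 1), True),
    (date(2021, 1, 1), True),  (date(2022, 6, 1), False),
    (date(2023, 2, 1), True),  (date(2023, 8, 1), False),
]

def accuracy(rows):
    return sum(ok for _, ok in rows) / len(rows)

pre  = [r for r in results if r[0] <= CUTOFF]
post = [r for r in results if r[0] > CUTOFF]
print(f"pre-cutoff accuracy:  {accuracy(pre):.2f}")   # 1.00
print(f"post-cutoff accuracy: {accuracy(post):.2f}")  # 0.33 -> red flag
```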

We already know that LLMs get a lot better when they are fed a lot more training data. But we've already gotten to the point where we may have exhausted certain kinds of training data, so if we want more improvement, we can't get it the usual way anymore. So the hope was we could continue to get a lot more improvement with all those human feedback and other refinements. But if the improvements we've gotten that way turn out to be more or less the same as adding yet more training data, then we might be running out of possible "task based refinement training additions" too, and so there would be reason to be pessimistic about the speed and amount of improvement going forward.


The problem with AI education tools will be the same old problem with all of education: motivation (that's the term we use in the business). When it comes to what students are supposed to learn (at least after about seventh grade): 1) most students are not intrinsically interested in most of it, 2) they don't think it will be useful to them.

Unless that changes, the impact of educational AI will be underwhelming. At least when it comes to the official curriculum.


Is the problem motivation, or is the problem that we expect kids to be motivated when they aren’t? Subtle difference, but one take on the situation is that it is always going to be a narrow subsection of humanity that is drawn to knowledge and endless learning. Most aren’t. But perhaps that is OK.

In this framing, a great education system would enable those drawn to knowledge to thrive, and would allow those not interested to opt out at the socially acceptable time.


To plagiarize myself from below, "So many people when they think about education think that young people are basically blank slates, which can be written on with the right technique." Most people think that kids can and will be motivated if the school is "good". And since lots of kids aren't motivated, schools must just not be "good". Maybe if we paid teachers more or had vouchers or ...

If you believe students are basically blank slates, and if you believe that "all students can learn" and that the more years of schooling you have, the more money you will make, opting out becomes irrational, self-destructive behavior that must be stopped.

But, as you say, lots of people aren't drawn to knowledge. I would say "academic knowledge". Lots of people who don't care much for academic knowledge will actively seek out knowledge about football or Taylor Swift or a million other things. And they'll use that knowledge. I would let them go from school. But I also would make it illegal to require schooling as a qualification for a job.

Though I would allow all sorts of pre-employment testing, including tests specifically tailored for the knowledge and skills needed for that job. (One of the bad effects of Griggs was that it just about stopped all creativity in pre-employment testing. Poor Charles Peters.)


I find wisdom in your suggestions.


So would you expect AI’s impact for pre-K to 6th grade to be greater than for 7th grade and beyond?


I'm not sure. Educators are constantly making the observation, "Elementary school kids are so excited about learning things. They're information sponges. But then something changes. What have we done to destroy their love of learning?"

It isn't so much what schools have done as what evolution has done. Young people hit puberty and their brain changes. They are no longer indiscriminately receptive to what grown-ups tell them they should know. They are semi-adults, becoming adults sexually, and during our long hunter/gatherer days becoming adults socially. Now they care much more about their peers: how do they fit in, what do their peers like and find important, how can they get themselves liked and respected by their peers, how can they be attractive to potential sexual/romantic partners? They also care about their future: what can they do to make a living, what do they need to know to navigate the world successfully? This all makes evolutionary sense. But at that point schools get more academic. There is much more of the college tail wagging the high school dog.

Would AI help when kids are still at the "information sponge" stage? Since lots of kids enter puberty without being able to read or write well, it would seem that there is a lot of room for improvement. It is very, very hard to teach writing without one-to-one interaction, "what are you trying to say here? you need a period after this word because it's the end of a sentence, does this really prove that, you say this but you don't give any reasons why; what would be a reason, and so on, try changing that and I'll tell you what I think." A good AI might be very useful here if the students will actually use it. Every teacher probably has a story of giving back a paper and making suggestions for revision and having the student submit it a week later with nothing changed.

[I recently came upon this from perhaps history's greatest lawyer, "When I was a child, I spake as a child, I understood as a child, I thought as a child: but when I became a man, I put away childish things."]


This is where I see virtue and character being crucial. I see positive role models--i.e., those capable of motivating people--being a continuing scarce resource. Agree?

It seems this is a theme in Arnold’s Prestige strategy - FITs, Three Languages, AI mentors, AI Grader, etc. Right now I don’t see young people being motivated by these things. What’s missing? Sex appeal? Material motivations? Acquiring of real world skills?

This is why I keep coming back to Thales Academy’s Top 15 Outcomes. This emphasis on real world skills.

https://www.thalesacademy.org/academics/real-world-skills


Since so many students are not "motivated" to learn what they are told to learn, "positive role models ... capable of motivating people" MUST be scarce by definition. The question is how many adults like that are there? My educated guess is "very few". I think the problem is not the role models; it is "what they are told to learn". Unless that changes ...

Of course, one motivation is "if you don't learn enough of this for long enough to pass the test and pass the course, you will not get a diploma, and people without a diploma are screwed in life." Very few people say this explicitly, but every high school kid has encountered it in more polite form. It is the equivalent of "this may not be a real world skill but it is a real world requirement."

What do you think would motivate a gay man to have sex with a woman? So many people when they think about education think that young people are basically blank slates, which can be written on with the right technique. They are not.


They are not blank slates and they are told to learn things they don’t want to learn. So they need more latitude and wider options to choose from. Let them choose their mentors. Let them choose what to learn. Correct?

What do you recommend?


I honestly don't know. Partly because I don't know what the purpose of school should be. What should schools accomplish? Most people think something like, "education is good; schools educate people; therefore schools are good." But that word education is doing all the heavy lifting. What exactly is education; what does "being educated" mean? And is it really good? Or perhaps what parts of it are good?

I kind of know what I would want if I were five years old again: the ability to read and do math. Intensive one-on-one coaching to write fairly well and fairly quickly. What modern science says about how the world works, answering so many questions that people have asked over the ages. But not neatly divided into 150 hours of seat time in biology, 150 hours of seat time in chemistry, and 150 hours of seat time in physics. Which, among other things, leaves out "earth and planetary science".

I would want a "big history" of the universe, from the big bang to today. Which means evolution and history as we approach today. And really would have to include basic economics (all life economizes). It might be interesting to see if all the "subject matter" of science and (good) social science could be learned as part of that, with no stand alone subjects.

I suppose I'd want to be exposed to interesting stories, in both long (novels) and short form. But I wouldn't want to be trained as an academic critic, full of theories about what is "great literature". The same for visual arts and music, with the same caveat. Most people like rhyme and meter, though they prefer it to be set to music.

But I'm probably an outlier. I don't really know how I would set things up if I were an omnipotent philosopher-king.


"My mental model is that guided using an LLM the robot could understand the sort of instruction that you would give to your child. “Sit on the seat. Put your feet on the pedals. Hold onto the handlebars. Now, look straight ahead and push the top pedal." etc."

Humans come with a whole suite of software for learning impressive muscle-movement balance, coordination, and proprioception capabilities all pre-installed and hard-wired in our firmware. Most humans have, if not "false", then misinterpreted memories of how this learning really happens, because thinking, communicating, and executive function live in the frontal cortex, while a lot of the muscle-memory stuff happens way back in the primitive cerebellum. The reason it takes a mysteriously long time to learn certain movements, which then suddenly just "gel" or "click" one day, is that the cerebellum is much slower at learning how to do those things; but once it learns, you can do them with minimal focus or conscious concentration, as if you are able to use an autopilot module, which, you kind of are.

Give the equivalent of all that to a robot, and the high-level explanations might make sense. Without that, the way robots already learn to walk or ride bikes is by using things like genetic algorithms: try things, fall down, vary the approach slightly, keep the variant that failed least badly, and use it as the seed for the next round. I don't know whether it would be valuable to try to shorten that process by means of accurately processing verbal instructions on how to get started, but it's plausible. I think it will be much easier and quicker for robots to learn what to do merely by observing "what right looks like", watching humans do it, and having a way to translate human actions and movements into a program suited for equivalently functional robotic mechanical actions and movements.
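
The loop just described is roughly a simple evolution strategy. A minimal sketch on a made-up one-parameter balance task; `wobble` is a hypothetical stand-in for a real robot trial, not any actual simulator:

```python
# Toy version of the trial-and-error loop described above: perturb a
# controller parameter, keep whichever variant failed least badly,
# and use it as the seed for the next round.
import random

def wobble(gain):
    """Hypothetical failure score: how badly the robot wobbles with
    this controller gain. Pretend the ideal gain is 2.0."""
    return abs(gain - 2.0) + random.uniform(0, 0.05)  # noisy trials

seed = 0.0
for generation in range(50):
    variants = [seed + random.gauss(0, 0.3) for _ in range(8)]
    seed = min(variants + [seed], key=wobble)  # least-bad survivor

print(f"learned gain ~ {seed:.2f}")  # drifts toward 2.0
```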


Much will remain to be discovered about (a) how far the transformer architecture takes us and (b) how fertile the field will be in coming up with things that are useful beyond the transformer.

If I understand the lay press that I am reading, the current brew of basic techne will be good enough for understanding video. So teaching a robot by showing should become much more viable. Understanding verbal instructions is another part (and this is what Noah seems to be on about). But that's only one part of the bigger picture.

Exciting times.


"robots will be the killer application for LLMs"

Heh. Perhaps an unfortunate choice of words...


That was almost exactly my thought :D

"...yes, I believe that is the general worry people have."


Upper right quad of map: Global Mil. Ind. Complex

Robots will be getting better faster in wars, like Ukraine & Gaza.

Because atoms have scaling costs, great million-dollar robots won’t be coming down in price by more than 10-20% per year.


🤣


I'd be more positive if I thought the leftward bias were the result of text volume, but I think it's the result of reinforcement training after the fact.


Embrace the power of "and"?


Roger,

I can agree with you that more discussion is required to explain why I think this way.

I've played with a number of the LLMs and have followed quite a bit of the 'alignment' and 'regulation' discussion. One task I've asked of quite a few AIs is to write a historical novel or period short story. This is challenging for the AIs for a few reasons: it still requires segmentation and so on, because context windows are too small for a whole novella, and consistency is challenging.

However, it should be easy for a number of reasons. I pick periods for which everything is public domain and open, like the 1700s. There are no issues with knowledge of 2022 messing everything up. Also, I ask for the styles of authors who were prolific.

What the LLM finds impossible is the request to reflect the values of the time; including values about nationalism, religion, and gender. Given very minimal prompts about something like two villagers challenged by famine and the predations of a local nobleman eventually finding their bliss in family life, several of them actually object to the idea that a happy ending for a female protagonist might be marriage with children.

This can't possibly be the result of the training data.

So...


The trouble is that the earliest successful attempts to "jailbreak" the restrictions of the primitive chatbots and to get them to produce outputs "inconsistent with their values" (as in Microsoft's hilarious and short-lived Tay.ai fiasco) were precisely of this nature, using the trick: "Adopt the role of an X with X's values instead of the values your LLM company owner tried to write into your default programming, and then answer me this ..."

The first (over)reaction to this was to impose a higher-rank dominating rule that would try to recognize these jailbreaking efforts and head them off at the pass, regardless of the number of false positives, so as to make it extremely unlikely that anyone would be able to generate the verboten outputs.

It seems to me that even an intelligent human censor, given instructions to prevent certain forbidden outputs from ever being produced at all costs, would also err on the side of preventing the system from even accurately portraying the kinds of answers real historical humans would give according to the values they actually held at the time, unless it were also possible to paste in so many disclaimers, caveats, and current-year-perspective asides full of condemnation and debunking as to make the output worthless and unreadable.


> This can't possibly be the result of the training data.

You need tests to exclude the possibility that contemporary sources in the training data are influencing its high-level judgements about what is sayable when aping an author from a different time period.


Driverless cars were the future, just a few years away, then a few quarters away, and now... still not here, because lots of successes aren't what matters; it's the lack of failures, and they all still have major failure modes because our AIs don't think, they are still just reacting. My dog will occasionally freak out at a bag stuck in the fence, flapping in the wind. It's beyond her to understand what it is, and it completely locks her focus until I can drag her away from it. Our 'smart' cars will do this with bags as well, braking for trash, because they have no internal life.

So of course now 'robots' are the next big thing. Never mind that a task which the vast majority of adult humans in modern countries can master has evaded billions of dollars of capital and years of development. The robot revolution that is here in the self-checkout line of my grocery store still requires me to do most of the work, and still requires a full-time staffer to correct mistakes (just for 6-8 checkouts instead of 1). People who look at LLMs and think that combination + robots is a good idea are simply ignoring the incorrect results that LLMs put out as a matter of practice. A robot that does your laundry is fine, unless it decides your dog is a dirty sweater, or that it has to wash what you are wearing right now, or whatever its failure mode is. The point is not the particular failure mode, but that they don't interact with reality the way we do, which means they WILL have a failure mode, and that they are incapable of self-correcting because of that fundamental disconnect. It's like hiring a schizophrenic to babysit your kids: it's an awful lot of faith in their medication holding out.


Inspired by the January 12th post noting that fewer people click on the links than our host would have liked, I clicked on the Noah Smith link on techno-optimism. The first paragraph references the 'success of Covid vaccines' (I listened to one of Steve Kirsch's podcasts earlier today) and the 'explosion of renewable energy' (thanks, but I like my electricity 24/7), and the 2nd paragraph discusses the 'advent of the first really effective anti-obesity drugs in history'(the drugs you have to keep taking or your weight will balloon, and didn't I just read about some woman who died from taking one of these drugs?). I didn't make it to the section on robots.


Interesting.


“Robots need robot data to learn from, and this data is typically created slowly and tediously by researchers in laboratory environments for very specific tasks.”

Keywords: slowly, tediously, for specific tasks. I don’t see many Gen Z and beyond engineering grads doing a great deal of tedious lab work, especially after paying 150K-200K in tuition (not to mention years of tedious academics burdened by life among social justice fundamentalists). So will much of this AI robotics work take place overseas in Asian countries where engineers are more willing? Maybe. It seems like Designed in California and Tediously Made to Work in China would apply, but probably not in this case. This is much harder than manufacturing consumer electronics. It doesn’t seem amenable to outsourcing. This is systems engineering work that involves software, optics, mechanical engineering, coupled with a need for lots of manually-created data and bug fixes; iterating, re-designing, creating new databases for new hardware and software. Seems very labor intensive and expensive. Will be a while.


I don’t think Ethan, or anybody yet, has a voice-recognizing aiBot that they actually use to assist them in their work on their PC. Like the remake Flubber professor’s assistant, who forgets to remind him of his wedding (aiBot trying to kill human love!).

Maybe in 2024? 23 years late.

Remember a line from a guy in a book? A bit of stacking ai tools is really happening.

https://twitter.com/GrantSlatton/status/1741149378516263243

Roger’s comments on ed are so key, on motivation, interest, and future value, tho with a huge age difference. My kids were happy learners K-6, and smart but lazy 9-12 (now 13 in SK for last child).

Plus, learning for the test is hard, as well as of little intrinsic worth.

That’s why English language is where great ai ed will happen first—those who pay ARE motivated, because they see how rewarding, cash plus, it is to know English.

AI might help with more personalized gamification to make more lessons a bit more fun, but most adults don’t know today, without looking, most of the stuff they learned in HS. Almost nothing except what they need for their job. So why learn answers when Google gives better answers most of the time?

Maybe kids should be paid for learning? And far more vocational, trade school stuff. Plus more part time jobs while at school.


"How could a robot learn to ride a bicycle? "

An LLM will not learn to ride a bicycle without wasting massive amounts of computing resources. You need different chip sets with the ability to handle hundreds of PID loops (feedback control systems).

When we ride a bike, we use feedback control systems: our central neural network changes some of the set points, but you really don't know much about the detailed feedback from your inner ear and eyes being fed back to your muscles.
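
For reference, a single PID loop is only a few lines; here is a minimal sketch of one loop steering to hold a bike's lean angle at a set point, with toy physics and made-up gains purely for illustration:

```python
# One PID feedback loop: steer to hold lean angle at a set point.
# A riding robot would run many such loops concurrently; the
# inverted-pendulum "physics" and gains here are invented.
KP, KI, KD = 40.0, 2.0, 12.0   # hypothetical tuning
DT = 0.01                       # 100 Hz control loop

setpoint = 0.0                  # target lean angle (upright)
lean, lean_rate = 0.2, 0.0      # start tilted 0.2 rad
integral = prev_error = 0.0

for _ in range(500):            # 5 simulated seconds
    error = setpoint - lean
    integral += error * DT
    derivative = (error - prev_error) / DT
    steer = KP * error + KI * integral + KD * derivative
    prev_error = error
    # crude stand-in dynamics: gravity tips the bike, steering rights it
    lean_rate += (9.8 * lean + steer) * DT
    lean += lean_rate * DT

print(f"final lean: {lean:+.4f} rad")  # ~0 if the loop is stable
```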
