LLM links
The Zvi on problems with LLMs; Ethan Mollick on impact of LLMs; Noah Smith on techno-optimism; Bill Gates on same; Sergey Levine and Karol Hausman on robots; Venkatesh Rao draws a map
Zvi Mowshowitz points to two disturbing reports about large language models. The first is their strong bias toward the left. He points to Maxim Lott’s page.
I assume that this is because most of the writing that is available leans to the left. I think it would take a lot of effort to overcome this.
Zvi also points to a paper on “task contamination,” which I don’t really understand. It may have something to do with LLMs overfitting their training data and then not performing as well (by what metric?) on new data.
Ethan Mollick surveys the state of things with LLMs. I could excerpt many parts of his essay. I chose these:
Social change is slower than technological change.
…I believe that we have 5-10 years of just figuring out what GPT-4 (and the soon-to-be-released Gemini Ultra) can do, even if AI development stopped today.
I would say that instead of framing things as “AI will be disruptive,” frame them as “Humans will use AI to be disruptive.” My advice is to play with LLMs so that you can be one of those disruptive humans. That will be more rewarding than sitting out this technology or becoming a disruptee.
After a couple of minutes, I had an avatar that I could make say anything, in any language. It used some of my motions (like adjusting the microphone) from the source video, but created a clone of my voice and altered my mouth movements, blinking and everything else. And this took almost no time. You really can’t trust what you see or hear anymore.
The ability to create realistic simulations is a big threat. But it is also a big opportunity. I keep imagining taking a great life coach or teacher and scaling up their ability to impact people.
I think that a lot of the gains from AI will come from imitation. That is, one person will come up with a great way to use it. Then other people will copy and enhance that application. That process has just barely gotten started.
this is far closer to The Jetsons’ Rosie the Robot than anything that’s come before. Robots like this are downstream of two fundamental recent breakthroughs — modern machine learning, and improved batteries. Global venture funding for robotics startups has fallen from its 2021 peak, but those fundamental technological drivers remain, and the potential for a huge consumer market is obvious. And the coming of LLMs may be a game-changer, because it may allow robots to understand voice commands in a very flexible way (a whole bunch of teams are working on this, of course).
Yes. I think that robots will be the killer application for LLMs over the next few years, unless something unforeseen comes sooner.
The AI education tools being piloted today are mind-blowing because they are tailored to each individual learner. Some of them—like Khanmigo and MATHia—are already remarkable, and they’ll only get better in the years ahead. One of the things that excites me the most about this type of technology is the possibility of localizing it to every student, no matter where they live. For example, a team in Nairobi is working on Somanasi, an AI-based tutor that aligns with the curriculum in Kenya. The name means “learn together” in Swahili, and the tutor has been designed with the cultural context in mind so it feels familiar to the students who use it.
Pointer from Alexander Kruel. I also want to be optimistic about AI in education. My only hesitation is that ever since television, people have been more optimistic about tech tools in education than the results have justified.
I think that there is both a scale problem and a scope problem. If learning takes place in a small group, then it becomes hard to scale. The scope problem is that people have very different educational needs. A single great YouTube video does not do much for people. They need to know next steps to take, and those steps may be different for different folks.
The way that LLMs can potentially deal with this problem more effectively than a web site or a video is by creating simulations of many different sorts of mentors and matching them with the needs and personalities of individual learners.
Kruel also points to a more sobering take on robots. Sergey Levine and Karol Hausman write,
Unfortunately, the highly successful generative AI formula—big models trained on lots of Internet-sourced data—doesn’t easily carry over into robotics, because the Internet is not full of robotic-interaction data in the same way that it’s full of text and images. Robots need robot data to learn from, and this data is typically created slowly and tediously by researchers in laboratory environments for very specific tasks. Despite tremendous progress on robot-learning algorithms, without abundant data we still can’t enable robots to perform real-world tasks (like making breakfast) outside the lab. The most impressive results typically only work in a single laboratory, on a single robot, and often involve only a handful of behaviors.
How could a robot learn to ride a bicycle? They seem to think that it would need a lot of data on how robots interact with the world. My mental model is that, guided by an LLM, the robot could understand the sort of instruction that you would give to your child: “Sit on the seat. Put your feet on the pedals. Hold onto the handlebars. Now, look straight ahead and push the top pedal,” etc.
Venkatesh Rao offers a different take.
I think it’s clear that robotics and video, for example, are going to be the next major dominoes to fall, and that it’s going to be approximately as mind-blowing as text and images have been so far. And it is already clear that multimodal AI is going to work well enough to be at least useful, and almost certainly very disruptive economically. This stuff is moving from basic research to development and commercialization.
He illustrates his views of the state of AI with a map. I cannot summarize it here; you just have to go see for yourself. Along the way, he links to an old post by Simon Wardley, who offers a conceptual scheme for how technology develops by using the metaphor of a new settlement. First come the pioneers, who explore but do not achieve permanence. Then come the settlers, who put up a few permanent structures. Finally, you get the town planners, who scale things up.
Rao says that the LLMs are quickly moving to the town planner phase. I think that the settlers have barely gotten started, but one of his key points is that things are moving very quickly with AI. I can see that. I was an early settler on the Web in April of 1994, and it was not until at least August of 1995 that the town planner phase got started. But OpenAI’s “GPT store” already has 3 million apps!
substacks referenced above:
Task Contamination Paper here: https://arxiv.org/pdf/2312.16337.pdf
One hesitates to even discuss this stuff below the highly technical level without putting scare quotes around everything. If I understand it right, people have been overestimating how much of the improvement in the outputs of certain LLMs can be attributed to human feedback and other attempts at refinement.
The method by which those refinements were integrated into those LLMs is, apparently, not just a matter of tweaks leading to better approaches to processing the same underlying training data, but what somehow ends up being the equivalent of recording and incorporating those tasks into a not-quite-original-anymore training data set. That's the "contamination" here.
You can imagine two students trying to learn the material for a math class. Student A is great at manipulating symbols and equations but has limited recall. So A learns the high-level concepts and tools, never does the homework, sees every problem as if for the first time, and applies the concepts and tools to derive the result. Student B is not so great at equations or derivation, but does all the homework with teacher feedback and keeps trying until he gets the right answer. The teacher tells him that the tests consist entirely of selected problems from the homework, so student B simply *memorizes* all the possible right answers. If you hit student A with a new problem, he solves it. If you hit student B, he'll write down the answer to the homework problem closest to the new one, and maybe he's right, maybe he's not. So A is better than B for math. But if the questions were about history instead of math, maybe B would seem better than A.
With regard to the LLMs, the problem is misinterpreting improvements as "being a better A, with the same homework set," when in fact they are due to "being a better B, with a bigger homework set." That's probably pretty important.
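To make the analogy concrete, here is a toy sketch in Python. It is entirely illustrative; the rule, the homework set, and the numbers are invented for this comment, not taken from the paper. Student A applies a general rule to any problem, while Student B answers every question with the closest memorized homework answer.

```python
import random

# Toy illustration of the Student A / Student B analogy.
# The "problems" are integers x, and the correct answer is 2*x + 1
# (an invented rule, standing in for "knowing the concept").

def student_a(x):
    # Student A knows the rule and can solve any problem.
    return 2 * x + 1

class StudentB:
    def __init__(self, homework):
        # Student B memorizes (problem, answer) pairs from the homework.
        self.memory = {x: 2 * x + 1 for x in homework}

    def answer(self, x):
        # For any question, B retrieves the answer to the closest
        # memorized problem instead of deriving the answer.
        nearest = min(self.memory, key=lambda h: abs(h - x))
        return self.memory[nearest]

homework = list(range(0, 100, 10))      # problems both students have seen
b = StudentB(homework)

seen = random.choice(homework)          # a test question drawn from the homework
new = 37                                # a genuinely new problem

print(student_a(seen) == 2 * seen + 1)  # True: A solves it from the rule
print(b.answer(seen) == 2 * seen + 1)   # True: B memorized this one
print(student_a(new) == 2 * new + 1)    # True: A still solves it
print(b.answer(new) == 2 * new + 1)     # False: B returns the answer to problem 40
```

If the test only reuses homework problems, B looks just as good as A; the difference only shows up on problems neither has seen before.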
So there is a mistake being made about the level of abstraction that forms the basis of the improvement.
You can imagine that the LLM improvement process would take all those tasks and refinements and work to recognize the high-level commonalities and statistical patterns, and, um, "concepts"(?), in order to optimize generality and performance, and to tweak the existing repository of high-level patterns contained in algorithms or token weights or whatever, all while "intentionally" sacrificing some of the granular detail that could only be retained in the wasteful process of trying to remember each individual case.
Instead, because it is possible to get it to reproduce the content of those refinement tasks past some threshold of closeness to the original, the tweaks are more in the nature of "remembering" the tasks one at a time as specific details, rather than treating them as instances merely useful for tweaking generation rules at a higher level of generality.
So, if you are comparing the performance of pre- vs. post-refinement LLMs with this issue present, and you are using the original baseline, you would think that the refined approach is performing a lot better. But if you compared the new approach to the pre-refinement approach given the advantage of some of that new task information in its training data, the difference wouldn't represent as much of an improvement.
We already know that LLMs get a lot better when they are fed a lot more training data. But we've already gotten to the point where we may have exhausted certain kinds of training data, so if we want more improvement, we can't get it the usual way anymore. So the hope was we could continue to get a lot more improvement with all those human feedback and other refinements. But if the improvements we've gotten that way turn out to be more or less the same as adding yet more training data, then we might be running out of possible "task based refinement training additions" too, and so there would be reason to be pessimistic about the speed and amount of improvement going forward.
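A back-of-the-envelope way to see why that matters for measured progress (all numbers below are invented for illustration, not taken from the paper): if part of a benchmark leaked into the refinement data, a naive before/after comparison overstates the gain, and only tasks created after the training cutoff give a clean read.

```python
# Invented numbers, purely to illustrate the evaluation problem described above.
base_accuracy = 0.55             # pre-refinement model, on any task
refined_unseen_accuracy = 0.58   # refined model, on genuinely new tasks
refined_seen_accuracy = 0.90     # refined model, on tasks it effectively memorized

contaminated_fraction = 0.5      # share of the benchmark that leaked into refinement data

# Naive comparison over the whole benchmark: looks like a big jump.
naive_refined = (contaminated_fraction * refined_seen_accuracy
                 + (1 - contaminated_fraction) * refined_unseen_accuracy)
print(f"apparent gain: {naive_refined - base_accuracy:.2f}")                  # 0.19

# Comparison restricted to tasks created after the training cutoff: much smaller.
print(f"gain on clean tasks: {refined_unseen_accuracy - base_accuracy:.2f}")  # 0.03
```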
The problem with AI education tools will be the same old problem with all of education: motivation (that's the term we use in the business). When it comes to what students are supposed to learn (at least after about seventh grade): 1) most students are not intrinsically interested in most of it, and 2) they don't think it will be useful to them.
Unless that changes, the impact of educational AI will be underwhelming. At least when it comes to the official curriculum.