Multisensory AI is going to be far more disruptive than the web. LLM will come to mean "large learning model" instead of "language". And this leap will prove to be monumental.
I think it's hard for people who live in a world of words to appreciate that bare human text, for all its power, is still a very unwieldy tool for communication, and especially for teaching and for learning well enough to generate new work, its effectiveness diminishing as the topic becomes less abstract and symbolic in nature.
Most people don't (and often can't) really learn much from bare text. As with culture in "The Secret of Our Success," they instinctively and spontaneously observe, imitate, play, practice, and experiment under conditions of feedback, subconsciously picking up on nonverbal and environmental cues and on the subtle signals of social games, until they eventually reach their potential or an adequate level of performance.
And most masters and experts of practical skills are terrible at teaching with words, or even at thinking about how they do what they do in a fully self-aware way that can be captured with sufficient accuracy and precision in words. Many who have tried to follow a cookbook recipe and failed, then watched a video of the author-chef doing it perfectly, have realized that the chef has lost awareness of certain techniques she is performing, and that the need to apply and master those techniques is both essential to success and something a novice wouldn't know to do. Even organic chemistry is full of such examples, and occasionally claims of non-replication themselves fail to replicate because the original authors did not realize that some subtle but indispensable details were never adequately communicated in the published experimental procedure.
But if the AIs can just watch everything we do with their eagle eyes, then apply their statistical pattern-finding magic to that omnicorpus, they will literally learn it all, immediately, perfectly, and then surpass us in everything, in a historical blink of an eye.
Multisensory Machine Learners (maybe MML is a good term, given the visual contrast with LLM) are going to take that capacity of humans to learn from other humans and raise it several levels. Think of face recognition, but with super sensors and an ultra-polygraph: cameras that see in UV and infrared too, know your temperature, see if you're sweaty or flushed, read your emotions, know if you're lying; microphones that detect the faintest whisper at frequencies too low or too high for us to hear; systems that bounce X-rays and microwaves around to see below your clothes or under your skin, or spot the slightest flick of the wrist that whisks up the ultimate perfection of meringue every time. The possibilities go on forever. The human-monitored panopticon is going to be trivial compared to the power of MML universal surveillance, and that is something which is not in the sci-fi far future but which can be built today. And I'm sure it is being built. Have a nice day.
I only have one real point of disagreement here: "But if the AIs can just watch everything we do with their eagle eyes, then apply their statistical pattern-finding magic to that omnicorpus, they will literally learn it all, immediately, perfectly, and then surpass us in everything, in a historical blink of an eye."
I am very skeptical of this part, because observation alone does not tell you what was important. Think of blacksmithing: how important is wiping the sweat off your forehead after putting the steel back in the fire? We do it damned near every time, so is it critical? Then again, guys with nicer shops and better air conditioning don't do it so much, and they make nice stuff, so is it critical? It isn't clear to me that an observer who only sees process and output can identify these sorts of points, nor suss out how important they are.
I suspect that the AI, in order to truly learn, needs to be able to do the process itself, to experiment and confirm, not just see it done a lot. Which is of course what people need to do: most things cannot be taught without the student practicing at all. The closest you can get without practice is memorization.
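To see why observation alone can mislead, here's a toy simulation of the sweat-wiping example (a minimal sketch; every number is invented): an incidental habit can look strongly associated with the output simply because it travels with a hidden factor, here shop temperature, that actually matters.

```python
# Toy simulation of the confounding problem above; all numbers invented.
# Wiping sweat is driven by shop temperature, and temperature (not
# wiping) is what affects the quality of the work.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

hot_shop = rng.random(n) < 0.5                # no air conditioning
wipes_sweat = rng.random(n) < np.where(hot_shop, 0.9, 0.1)
good_work = rng.random(n) < np.where(hot_shop, 0.6, 0.8)

# A pure observer sees wiping associated with worse output...
print(good_work[wipes_sweat].mean(), good_work[~wipes_sweat].mean())

# ...but holding shop temperature fixed, the "effect" of wiping vanishes.
for hot in (True, False):
    m = hot_shop == hot
    print(hot,
          good_work[m & wipes_sweat].mean(),
          good_work[m & ~wipes_sweat].mean())
```

A frequency count alone rates wiping as near-universal among smiths doing hot, hard work; only by stratifying on the hidden factor, or better, by intervening and trying the process yourself, does its irrelevance show up.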
It will be trivial and automatic for the AIs to scrutinize a million hours of video and analyze every aspect of motion and timing, learning exactly what matters and what doesn't, better than any human ever could.
The generated output from LLMs and the analogous engines for images and music has improved incredibly fast by just such a process of adjusting to corrective feedback. They have gotten so much better, so quickly, that they make most arguments against improved capability moot before humans can even finish making them! That is, in the time it takes even the best experts to flesh out a rigorous case for why it will be a long time before the LLMs can do X: boom, there is X, and just as quickly thereafter, super-X. One wonders how many essays and papers had to be abandoned mid-draft, and practically nothing that did get published has aged well and stood the test of time. By which I mean a period of about six months.
Could you please define what you mean by "corrective feedback"? It seems to me you have folded "actually doing a thing, then getting graded on the results" into that concept, and that is a different process for, e.g., producing text and pictures than for producing physical objects such as a car or a couch. I am not sure exactly what you mean, but it matters, because that feedback loop can be really long if it includes what I am describing as actually practicing something. Especially for processes where the way humans do it isn't the way AIs will do it.
I also think you might be a little overly optimistic about how well AIs are doing X. They definitely have improved a great deal, but I am not convinced they do things well so much as do them fast. That is, the AI might have to go through 50 iterations to give me what I want, with my feedback at each iteration, whereas a human might need 2 iterations but take longer on each. That's great for the AI on relatively simple things that I can give quick feedback on, but my feedback becomes the limiting factor on items that are more complex to check, and so the human producer becomes the better deal again as the time to produce an iteration approaches the time to check it.
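A back-of-envelope model of that crossover (all numbers hypothetical): total time to an accepted result is iterations times (production time plus my review time), so the AI's cheap iterations dominate only while review stays cheap.

```python
# Back-of-envelope model of the iteration argument; every number here
# is hypothetical. Wall-clock time to an accepted result is
# iterations * (time to produce a draft + time for me to check it).
def total_minutes(iterations, produce, review):
    return iterations * (produce + review)

# Simple task: reviews are quick, so the AI's fast iterations win.
print("AI, simple:   ", total_minutes(50, 0.5, 2))    # 125 minutes
print("Human, simple:", total_minutes(2, 240, 2))     # 484 minutes

# Complex task: my review time dominates, and the human who needs
# fewer drafts wins again as production time approaches review time.
print("AI, complex:   ", total_minutes(50, 0.5, 120)) # 6025 minutes
print("Human, complex:", total_minutes(2, 240, 120))  # 720 minutes
```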
It often takes a lot of time and money going back and forth to get people to design you the perfect business logo or whatever.
AIs will quickly catch up to and then surpass even the best expert human content generators in needing fewer iterations.
But long before then, maybe even as long as a few months from now, the fact that one can get another iteration from an AI incomparably faster and cheaper than from a human means that even if the humans are a lot better, it still makes economic sense to dump them all. Look for the crying canaries; it's already happening.
By the way, the greatly increased velocity of these changes is another good reason to believe this transition is going to be far more disruptive than the web. The web happened slowly enough for powerful incumbent rent-seekers to read the writing on the wall and get their act together in time to "head 'em off at the pass", dig in, and reinforce their trenches by emptying their war chests to bribe every politician in sight to bolster the laws which keep their obsolete and predatory cartel grift going. It worked. But they needed time, and it only worked because they had just enough time.
With smartphone disruption, the velocity was faster. Take taxi cartels, for example.
Yes, they were less powerful, lower status, and less popular. But mostly, they couldn't get their act together in time to stop Uber, which changed the "facts on the ground" faster than the incumbents could react.
AI is going even faster - MUCH faster - and whole professions will find their stream of orders run permanently dry overnight. They'll be broke and have nothing to bribe politicians with before they even realize they needed to rent-seek to survive.
I think this doesn't quite address what I was saying about learning and iteration; rather, it sort of agrees with it. For simple things like logos, sure, the iteration time is the important part, not the feedback part. But when it comes to learning, and then teaching, what is important in more complex processes, the feedback (checking the work) is often more time-consuming than the iterations. Or, in the case of creating physical things, the iteration is necessarily limited in how fast it can go, so both iteration and feedback are constraints. So how does the AI get around the problem of learning which part of a video of someone, say, changing an oil filter is important and needs to be conveyed to teach it to someone else?
The short answer to your question, minimizing technical jargon: so long as there are enough high-quality examples to learn from, and sufficient computation and storage, the statistical engine at the heart of the latest transformer models makes them extraordinarily good at exactly what you are describing. Having lots of distinct examples in the training data is, in effect, equivalent to doing lots of fine-tuning iterations, since both kinds of analysis will support the same inferences and conclusions.
Think of it like God-mode regression analysis. If there is no relationship between two things, then so long as there is noisy variation in their values, there is some number of data points sufficient to demonstrate that lack of relationship to whatever confidence threshold you want to set. You could reach the same conclusion with much less data by using those more uncertain early guesses to inform experiments, A/B trials, and the like, then looking in smarter places for a little more data, making even better guesses, rinse and repeat.
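A minimal numerical sketch of that point (made-up data): when x and y are truly unrelated, the uncertainty on the fitted slope shrinks like 1/sqrt(n), so some amount of data pins it to zero at any confidence threshold you like.

```python
# Sketch of "God-mode regression": with no true relationship, the
# standard error of the OLS slope shrinks like 1/sqrt(n).
import numpy as np

rng = np.random.default_rng(1)
for n in (100, 10_000, 1_000_000):
    x = rng.normal(size=n)
    y = rng.normal(size=n)                  # truly unrelated to x
    slope = np.cov(x, y)[0, 1] / x.var()    # OLS slope estimate
    resid = y - slope * x
    se = resid.std() / (x.std() * np.sqrt(n))
    print(n, round(slope, 4), "+/-", round(1.96 * se, 4))
```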
But MMLs won't need to do that. We have all the data they need, and we will keep adding to it, in quantities and at levels of precision and granularity that will let MMLs discover insights that have been sitting there waiting for such a moment, fruit that by nature hangs far above the reach of even the best possible organic human beings.
A small apropos anecdote: in 1996 or '97 I took an undergrad course in AI. We wrote some papers on the possibilities and limitations of various approaches then competing for academic interest. A friend of mine came up with what I thought was a very clever reductio ad absurdum task that an AI could not possibly perform: consider the Ogden Nash poem "The Ostrich" and (a) explain why it's funny, (b) translate it into French.
Yesterday I recalled that paper and decided to see how the LLMs would do on the task. ChatGPT failed with hazy generalities and an officious refusal to translate copyrighted material. Claude at first hallucinated a different poem entirely when I just referred to the poem by name and author.
But then when I copied the short text of the poem into the prompt and retried, Claude (though not ChatGPT) nailed both tasks, giving an extremely thorough and accurate anatomy of the poem's humor and doing as well as anyone could at the translation, with apropos commentary on why it was difficult to capture that sort of humor in another language.
1990s me would have thought "this is it, this is real strong AI, it's arrived, we're done." I don't think that, but boy are we in for some interesting times.
"I was confident in 1993 that the real estate industry would be disrupted". Disrupted?....perhaps not. But radically altered Yes. In 2012 (for reasons I need not bore anyone with) I needed to search for a new home over a very large swathe of southern England. And I was able to search, view and Street View literally thousands of houses all from my laptop. This was to me an internet marvel; something I simply could not have even dreamed of in 1993. (Just a little thank you to the upsides of the much maligned internet. And Yes I do know there have been huge downsides too).
I think Arnold was referring to all the money we have to pay agents, title fees, etc.
In theory Zillow gives me all the information that an agent used to give me, and yet we are still paying five figures to an agent when the app did all the work.
Yes, but my comment was not meant to be argumentative in any way. I was simply taking the opportunity to give thanks for a particular (albeit non-earth-shattering) benefit of the internet age that, unless one has had occasion to avail oneself of it, may not be obvious to most people. Street View was the really big benefit in this instance.
I think a comparison to electricity is more apt. In the beginning, it was all lightning bolts and fire.
We no longer associate electricity primarily with lightning bolts and fire. It's everyday and it's boring.
But you could see people arguing that a world filled with electricity would be super dangerous!
A world filled with steam drills was not just super dangerous but fatal to John Henry's ability to leverage his comparative advantage to earn a living.
Except where governments intervene to make it otherwise, humans will only be paid to do those tasks which AI can't do more cheaply or for which it can't provide an acceptable substitute. That set of tasks is going to shrink fast.
Theoretically, so long as real wages for humans fall far enough and fast enough, the organic human labor market will keep adjusting to clear, and most of the labor force will remain employed.
The trouble is that even in that spherical-cow Econ 101 model, wages eventually fall below what it takes not to starve to death.
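A stick-figure version of that spherical-cow model (every parameter invented): competition from an AI substitute caps the human wage at the AI's cost for the same output, and employment survives only while that cap stays above subsistence.

```python
# Stick-figure labor model; every parameter is invented.
# An AI substitute caps the human wage at the AI's cost for equivalent
# output, and humans stay employed only while that cap exceeds subsistence.
SUBSISTENCE = 15.0  # hypothetical hourly wage needed to live

def market_wage(ai_cost_per_hour, relative_productivity=1.0):
    # Competition drives the human wage down to what the AI substitute
    # would cost, scaled by how productive the human is relative to it.
    return ai_cost_per_hour * relative_productivity

for ai_cost in (100.0, 40.0, 20.0, 10.0):   # AI cost falling over time
    wage = market_wage(ai_cost)
    print(f"AI cost {ai_cost:6.1f}/hr -> wage {wage:6.1f}/hr, "
          f"livable: {wage >= SUBSISTENCE}")
```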
So, if you think there's a lot of redistribution now, it will look like nothing compared to what's coming not so far down the line.
I don't think I'm allowed to put a link in a comment, but you should read Wired magazine's 1994 piece where the writer talks to McDonald's and finds it clueless about the internet and what a domain name is. The last few paragraphs are the most fun: he registers the domain for himself and then asks readers to suggest what he should do with it by sending email to him at ronald at mcdonalds dot com.
I think one big difference between now and 1993 is that people, in general, are a lot more sophisticated about (1) startups and (2) technology. This doesn't mean that most startups, whether AI or not, won't fail, but I doubt we'll see the AI version of, say, Kozmo.com. Further, many more people today seem to take seriously the potential of AI tech.
"I think one big difference between now and 1993 is that people, in general, are a lot more sophisticated about (1) startups and (2) technology."
I would definitely take the opposite side of that debate. Some people who lived through 1993 as adults might be more sophisticated, sure, but they are already a minority of investors today, outnumbered by people born between 1973 and 2003 plus the people who learned literally nothing from the 1990s. We really are doomed to repeat history.
It's worse than that. In an "all bets are off" AI scenario, whatever those few enlightened people actually learned in the world of yesterday won't have any relevance or applicability in the world of tomorrow.