On the Hoel piece: he seems to be doing this thing where, having identified something that an LLM can't (yet?) do well, he concludes that it is overhyped and useless. (Gary Marcus does something similar.) It's an annoying rhetorical sleight of hand; the correct way to understand LLMs is with Ethan Mollick's 'jagged frontier' metaphor. LLMs are good at certain tasks and not good at others, and it's our job as humans to figure that out.
Right. For several years now there has been a thriving industry in commentary along the lines of "Well, they can't do THIS yet, so that means they are DUMB USELESS TOYS!" Then they can do that thing just the next month, and the critics either push the goalposts farther out (now often past 99% of humans) or else find some edge case like "doesn't understand the phonics approach to teaching humans literacy in alphabetic script," because, to put it half-metaphorically, LLMs were "taught to read (and write) with whole-word instruction".
Eventually it is going to be so hard to discover new things general AIs can't do that human critics will be left in the dust and we'll have to use other AIs specialized to perform that exact task to do it.
My son tasks AI to write code in languages he doesn't know, and he is able to tweak what it produces into something adequate. If he started from scratch, there would be so many commands he would have to look up and so many syntax errors that it would take far longer than having the AI make the first cut.
I don't know how efficient the AI-written code is, but I honestly don't know that about my son's code either. I can say with little doubt that most coders write rather inefficient code, or expend far too much labor on rather unimportant improvements, or both.
In general the compelling use case is "replacing the best expensive humans at scale."
I think the fact that a lot of this started with "words-in, words-out" systems is misleading a lot of people, who have been focusing on that kind of framework, mostly learning how to use those systems and discussing them.
Instead, people should be focusing on "A system which has (1) incredibly effective learning and training by detailed and insightful observation of huge amounts of complicated examples in all kinds of sensory dimensions and which happens with minimum human effort in structuring or guiding the learning process, and (2) which can interact with humans and receive instructions and refining corrections quickly with small amounts of ordinary human language."
So, what happened first is that we "trained" the chat systems on huge collections of strings of ideas expressed in human language words. And lots of people are focusing too much on that, without seeing the unlimited potential of the bigger picture. For example, when we train them on stuff which is not just words, like the set of all pictures or videos tagged with descriptions and styles, then they are also extremely good at generating that stuff, being as good as the best humans, but much faster and cheaper.
And when we train them on closely observing what the best, most expensive humans in most fields do, then they are going to figure out "how the humans do it, and how it can be done at least as well with machine capabilities." While the fixed costs will be high*, once all the lessons there are to learn are captured in an always-improving software system, the marginal cost of making copies and scaling to supply as much of that activity as the global economy will demand will be very low, and that marginal scaling cost is likely to continue to decline rapidly.
As for the cars, my impression from the latest reports about them is that we now have self-driving cars that are objectively no more dangerous on average than human drivers and perhaps even better (and continuing to improve), it's just that most humans are a lot more sensitive to robot danger than to human danger.
Eventually the sensitivity differential will relax, the safety differential will expand, and the "driving-relevant terrain" will continue to adjust in more robot-car-friendly directions, with the robot-cars all networking and communicating with each other to perfectly manage traffic and prevent accidents in ways that humans can't. So it's even plausible that in the 2030s some places will start banning human drivers altogether, or will have whole urban areas exclude privately-owned vehicles so that one only sees work, delivery, and robo-taxi vehicles.
*Because the costs of compute for training are currently so high, innovations that can shave even a few percentage points off could be worth billions, and that creates an opportunity for big money for anyone who can do it. Just to show how rapidly things are moving in that direction, there are already start-ups designing chips specialized to perform these particular compute-intensive training tasks as efficiently as possible.
Not sure why I'm blocked from liking this one.
A somewhat pedantic point, but I think worth understanding for exactly these situations, is that LLMs see neither letters nor whole words but something somewhat in between called tokens, which can represent anything from individual letters to words but mostly represent common groups of letters / word fragments. Best explanation I’ve seen is here, though slightly dated:
https://gwern.net/gpt-3#bpes
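To make the token point concrete, here is a minimal sketch using OpenAI's tiktoken library (my choice for illustration, not something from the linked post; the exact splits depend on which model's vocabulary you load, and "cl100k_base" is just one common choice):

```python
# A minimal sketch of BPE tokenization with the tiktoken library
# (pip install tiktoken). Assumes the "cl100k_base" vocabulary; other
# models use other vocabularies and will split words differently.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["cat", "strawberry", "antidisestablishmentarianism"]:
    token_ids = enc.encode(word)
    # Recover the text fragment each token id stands for.
    pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]
    # Common words tend to collapse into a single token; rarer words split
    # into a few multi-letter fragments, so the model never "sees" the
    # individual letters unless a fragment happens to be one character long.
    print(f"{word!r} -> {token_ids} -> {pieces}")
```

Because a common word usually arrives as one or two opaque fragments rather than a sequence of letters, letter-level tasks (counting characters, rhymes, phonics) are exactly where you'd expect the model to stumble.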
"I worry that the LLM will blend into the conversation some mostly generic responses that do not reflect the individuality of the deceased person."
If there is a need to be historically accurate, a bot seems the wrong tool. If the goal is closure, relief of grief, or otherwise working through issues, I fail to see the need for accuracy. Maybe I'm missing something.
Arnold often emphasizes the importance of the Dunbar number (group size larger or smaller than ~150) as a threshold for formal vs. informal management organization. Above the Dunbar number almost everything must be reduced to text: policies, procedures, workflows, etc. Below the number there is more reliance on face to face interactions, unspoken rules, etc.
I wonder if this threshold is also applicable to LLMs in some fashion. They seem to have an "above Dunbar" aspect in that they are much better at handling requests that involve pure text as opposed to those that entail speech and the subtleties of human interaction, such as Hoel's problem.
"I worry that the LLM will blend into the conversation some mostly generic responses that do not reflect the individuality of the deceased person."
The LLM will blend in nothing but generic responses that don't reflect the individuality of the deceased person. That person is dead.
My OG nuclear family was not hyper-verbal. The robot would have to be trained to occasionally make note of something he was reading in the newspaper, then revert to silence.
Maybe all the workers displaced by AI can be hired to do customer service. My most recent encounter with an automated customer agent was no less frustrating than all the others.
The problem is companies do not want to trust computers with anything other than trivial tasks. In my case, I have a billing issue that points to a glitch in the corporate system - something is wrong in how my bill is calculated.
The AI system is not programmed to actually look at billing integrity. Imagine how nice it would be if it were! Imagine if calling a corporate AI system enabled a customer to discover all sorts of billing anomalies, in the customer's favor. So no. To actually resolve discrepancies, customers are forced to talk to a human, and this customer-service bottleneck is only getting worse.
You say that companies do not trust computers to do certain things, but in my extensive experience with human customer service in the last few decades, I'd say they don't "trust" (or empower) the first-layer humans to do much of anything either, and that their level of authority and helpfulness has declined significantly in the past ten years or so. As often as not, the first-layer customer service agent, and often their supervisor too, are helpless human shields insulating the company from the user by reference to inscrutable "policy" or "the system won't let me ..."
One "killer app" use case for AI could be some kind of quick, cheap, and easy capability to deal with "everyone knows this is stupid but too hard to get fixed in this organization with our tech debt burden" problems.
For example, I recently had to arrange a transfer of funds out of an account. Now this long-established institution is huge, prominent, and handles billions of dollars. I should have been able to do this entirely online. I did everything the website told me to do and filled in all the spaces in the forms right, clicked the "I have read and understand that ..." boxes to volumes of text I didn't read and wouldn't really "understand" even if I had read them, and then at the end of the process got "this cannot be completed, please call customer service."
This error message did not (1) tell me what went wrong, so I could see if it was some error I might be able to fix on my own, or (2) provide me with the customer service phone number, which, no surprise these days, you can only find on their website by doing some persistent digging and clicking through pages trying to head you off at the pass.
When I finally got hold of a human, she was able to look up the history of everything I had done and filled out, and -she- saw the error message, which was that I had not yet turned on 2-factor authentication for the account. Now, some context: 2FA is annoying for me and I leave it turned off when I can, because I am sometimes working and taking care of errands like these in spaces where smartphones are not permitted.
So, the conversation went: "It says 2FA is supposed to be optional, but you're saying it's not optional if I actually want to do anything?" - "Right, it's optional, but not if you want to buy or sell or transfer in or out." - "I could have turned 2FA on myself from the website. Why didn't it just tell me that was the issue, and then I could have dealt with it myself?" - "Yeah, we know, but apparently it's hard to make updates like that to 'the system,' or not a priority, or something." - "Ok, well, I guess please turn it on, since, you know, that's pretty much the definition of things one does with a financial account, right?" - "Ok, please check your phone for ... " - "Can I call you back in five minutes? Because I'm not talking to you on my phone, and now I have to leave this space, call you back on that phone, and ... " - "Yes, you can call me back." (several minutes later) "Ok, yeah, the 2FA code is 123456." - "Ok, turned on. Now you can start the whole process over and waste 15 minutes filling out the forms again to do the transfer." - "But you are literally looking at everything I typed into the forms, which is saved to your system. Can you just restore that penultimate-step state and have me just click 'ok' at the end?" - "No, you have to start over from scratch. See, the system ... " - "Yeah, the system, I get it."
It seems to me that there could be an AI to help companies like this quickly, easily, and cheaply improve their website functionality in response to such complaints, as opposed to the culturally ubiquitous condition of fatalistic learned helplessness for countless IT issues we all presently just grudgingly endure as part of life. But now - just maybe - we don't have to live this way.
Surely most are as you say, but there are a few companies out there who do better. At least one or two are renowned for doing so. I think Zappos is one of them. My experience with Garmin is that they also give great latitude to first-level customer service staff. I have that impression about a few other companies too, though those could have simply been the circumstances.
I really love your substack. Please keep up the good work. (Text not generated by an LLM!)