18 Comments

On the Hoel piece: he seems to be doing this thing where, having identified something that an LLM can't (yet?) do well, he concludes that it is overhyped and useless. (Gary Marcus does something similar.) It's an annoying rhetorical sleight of hand; the correct way to understand LLMs is with Ethan Mollick's 'jagged frontier' metaphor. LLMs are good at certain tasks and not good at others, and it's our job as humans to figure out which is which.


Right. For several years now there has been a thriving industry in commentary along the lines of "Well, they can't do THIS yet, so that means they are DUMB USELESS TOYS!" Then they can do that very thing just a month later, and the critics either push the goalposts farther out (now often past 99% of humans) or else find some edge case like "doesn't understand the phonics approach to teaching humans literacy in alphabetic script" because, to put it half-metaphorically, LLMs were "taught to read (and write) with whole-word instruction".

Eventually it is going to be so hard to discover new things general AIs can't do that human critics will be left in the dust, and we'll have to use other AIs, specialized for exactly that task, to find them.


I think some healthy skepticism may be warranted given all of the AI hype. Outside of the Tesla robot, I’m still searching for compelling use cases. If you have those use cases, please share them so that I can change my mind. Until that time, I wish we had self-driving cars over something that can nicely summarize some econ paper from 1984.


My son tasks AI to write code in languages he doesn't know, and he is able to tweak what it generates into something adequate. If he started from scratch, there would be so many commands to look up and so many syntax errors that it would take far longer than having the AI make the first cut.

I don't know how efficient the AI-written code is, but honestly I don't know that about my son's code either. I can say with little doubt that most coders write rather inefficient code, or expend far too much labor on rather unimportant improvements, or both.


In general the compelling use case is "replacing the best expensive humans at scale."

I think the fact that a lot of this started with "words-in, words-out" systems is misleading a lot of people, because they have been focusing on that kind of framework, mostly learning how to use those systems and discussing them on their own terms.

Instead, people should be focusing on "A system which has (1) incredibly effective learning and training by detailed and insightful observation of huge amounts of complicated examples in all kinds of sensory dimensions and which happens with minimum human effort in structuring or guiding the learning process, and (2) which can interact with humans and receive instructions and refining corrections quickly with small amounts of ordinary human language."

So, what happened first is that we "trained" the chat systems on huge collections of strings of ideas expressed in human language words. And lots of people are focusing too much on that, without seeing the unlimited potential of the bigger picture. For example, when we train them on stuff which is not just words, like the set of all pictures or videos tagged with descriptions and styles, then they are also extremely good at generating that stuff, being as good as the best humans, but much faster and cheaper.

And when we train them on closely observing what the best, most expensive humans in most fields do, they are going to figure out "how the humans do it, and how it can be done at least as well with machine capabilities." While the fixed costs will be high*, once all the lessons there are to learn are captured in an always-improving software system, the marginal cost of making copies and scaling to supply as much of that activity as the global economy demands will be very low, and that marginal scaling cost is likely to keep declining rapidly.

As for the cars, my impression from the latest reports about them is that we now have self-driving cars that are objectively no more dangerous on average than human drivers and perhaps even better (and continuing to improve), it's just that most humans are a lot more sensitive to robot danger than to human danger.

Eventually the sensitivity differential will relax, the safety differential will expand, and the "driving-relevant terrain" will continue to adjust in more robot-car-friendly directions, with the robot cars all networking and communicating with each other to perfectly manage traffic and prevent accidents in ways that humans can't. So it's even plausible that in the 2030s some places will start banning human drivers altogether, or having whole urban areas exclude privately owned vehicles so that one only sees work, delivery, and robo-taxi vehicles.

*Because the costs of compute for training are currently so high, innovations that can shave even a few percentage points off could be worth billions, and that creates an opportunity for big money for anyone who can do it. Just to show how rapidly things are moving in that direction, there are already start-ups designing chips specialized to perform these particular compute-intensive training tasks as efficiently as possible.


Here is where we are on autonomous driving. Warning: it’s scary and not in a good way.

https://youtu.be/4Ra2mA3an7M?si=cQIBqYqUvgTZPkOF

Here is where we were supposed to be in 2020 (as of 2015):

“Self-driving cars: from 2020 you will become a permanent backseat driver”

https://www.theguardian.com/technology/2015/sep/13/self-driving-cars-bmw-google-2020-driving

Notice any divergence between the promises and the actuals almost 10 years later?


Agree on the big-picture goal. But notice the divergence between what you’re describing as some future state and the various recent AI events and forecasted AI features. A bot that found my glasses on my desk is a long way from a bot that can replace a physical therapist or caregiver. I know, I know, it’s just around the corner…

If you wanna talk about Tesla Optimus or whatever Boston Dynamics is doing, then I’m all in. But, a better version of Siri or Cortana is a yawner and that is mostly what I’m seeing from big tech and on this blog.


Not sure why I'm blocked from liking this one.


A somewhat pedantic point, but I think worth understanding for exactly these situations, is that LLMs see neither letters nor whole words but something somewhat in between called tokens, which can represent anything from individual letters to words but mostly represent common groups of letters / word fragments. Best explanation I’ve seen is here, though slightly dated:

https://gwern.net/gpt-3#bpes
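
For the curious, it's easy to see this directly. Here is a minimal sketch in Python, assuming OpenAI's tiktoken library is installed (the encoding name is just one common choice):

```python
# Inspect how a tokenizer splits text: mostly word fragments,
# not single letters and not whole words.
import tiktoken  # assumed installed via `pip install tiktoken`

enc = tiktoken.get_encoding("cl100k_base")   # one common encoding; others differ
ids = enc.encode("strawberry")
print(ids)                                   # a short list of integer token ids
print([enc.decode([i]) for i in ids])        # the fragments those ids stand for
```

Which is also why letter-level questions ("how many r's in strawberry?") can trip models up: the letters are not what the model actually sees.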


"I worry that the LLM will blend into the conversation some mostly generic responses that do not reflect the individuality of the deceased person."

If there is a need to be historically accurate, a bot seems the wrong tool. If the goal is closure, relief of grief, or otherwise working through issues, I fail to see the need for accuracy. Maybe I'm missing something.


Arnold often emphasizes the importance of the Dunbar number (a group size of roughly 150) as a threshold for formal vs. informal management organization. Above the Dunbar number almost everything must be reduced to text: policies, procedures, workflows, etc. Below the number there is more reliance on face-to-face interactions, unspoken rules, etc.

I wonder if this threshold is also applicable to LLMs in some fashion. They seem to have an "above Dunbar" aspect in that they are much better at handling requests that involve pure text as opposed to those that entail speech and the subtleties of human interaction, such as Hoel's problem.


"I worry that the LLM will blend into the conversation some mostly generic responses that do not reflect the individuality of the deceased person."

The LLM will blend in nothing but generic responses that don't reflect the individuality of the deceased person. That person is dead.


Sounds as if someone got stuck in stage 1 of the grieving process for 8 years. That’s sad. Whether a robot will help that person progress vs. an SSRI, counseling, religion or just plain old journaling seems doubtful. I myself find the robot thing, as described, kind of icky and inhumane. I mean, if I passed away, I would never want my family talking to a robot thinking that it was I. And, I have no desire to talk to my virtual faux spouse and ruin all of the real memories we created together.


My OG nuclear family was not hyper-verbal. The robot would have to be trained to occasionally make note of something he was reading in the newspaper, then revert to silence.


Maybe all the workers displaced by AI can be hired to do customer service. My most recent encounter with an automated customer agent was no less frustrating than all the others.

The problem is companies do not want to trust computers with anything other than trivial tasks. In my case, I have a billing issue that points to a glitch in the corporate system - something is wrong in how my bill is calculated.

The AI system is not programmed to actually look at billing integrity. Imagine how nice it would be if it were! Imagine if calling a corporate AI system enabled a customer to discover all sorts of billing anomalies in the customer's favor. So no: to actually resolve discrepancies, customers are forced to talk to a human, and this customer-service bottleneck is only getting worse.


You say that companies do not trust computers to do certain things, but in my extensive experience with human customer service in the last few decades, I'd say they don't "trust" (or empower) the first-layer humans to do much of anything either, and that their level of authority and helpfulness has declined significantly in the past ten years or so. As often as not the first-layer customer service agent and often their supervisor too are helpless human shields insulating the company from the user by reference to inscrutable "policy" or "the system won't let me ..."

One "killer app" use case for AI could be some kind of quick, cheap, and easy capability to deal with "everyone knows this is stupid but too hard to get fixed in this organization with our tech debt burden" problems.

For example, I recently had to arrange a transfer of funds out of an account. Now this long-established institution is huge, prominent, and handles billions of dollars. I should have been able to do this entirely online. I did everything the website told me to do and filled in all the spaces in the forms right, clicked the "I have read and understand that ..." boxes to volumes of text I didn't read and wouldn't really "understand" even if I had read them, and then at the end of the process got "this cannot be completed, please call customer service."

This error message did not tell me (1) what went wrong, so I could see if it was some error I might be able to fix on my own, or (2) provide me with the customer service phone number, which, no surprise these days, you can only find on their website by doing some persistent digging and clicking through to pages trying to head you off at the pass.

When I finally got a hold of a human, she was able to look up the history of everything I had done and filled out, and -she- saw the error message, which was that I had not yet turned on 2-factor authentication for the account. Now, some context, 2FA is annoying for me and I leave it turned off when I can, because I am sometimes working and taking care of errands like these in spaces where smartphones are not permitted.

So, the conversation went: "It says 2FA is supposed to be optional, but you're saying it's not optional if I actually want to do anything?" - "Right, it's optional, but not if you want to buy or sell or transfer in or out." - "I could have turned 2FA on myself from the website. Why didn't it just tell me that was the issue, and then I could have dealt with it myself?" - "Yeah, we know, but apparently it's hard to make updates like that to 'the system,' or not a priority, or something." - "Ok, well, I guess please turn it on, since, you know, that's pretty much the definition of things one does with a financial account, right?" - "Ok, please check your phone for ... " - "Can I call you back in five minutes? Because I'm not talking to you on my phone, and now I have to leave this space, call you back on that phone, and ... " - "Yes, you can call me back." (several minutes later) "Ok, yeah, the 2FA code is 123456." - "Ok, turned on. Now you can start the whole process over and waste 15 minutes filling out the forms again to do the transfer." - "But you are literally looking at everything I typed in the forms, which is saved to your system. Can you just restore that penultimate-step state and have me just click 'ok' at the end?" - "No, you have to start over from scratch. See, the system ... " - "Yeah, the system, I get it."

It seems to me that there could be an AI to help companies like this quickly, easily, and cheaply improve their website functionality in response to such complaints, as opposed to the culturally ubiquitous condition of fatalistic learned helplessness for countless IT issues we all presently just grudgingly endure as part of life. But now - just maybe - we don't have to live this way.
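
To make the design point concrete, here is a minimal hypothetical sketch (all names invented, not this institution's actual system) of the behavior that was missing: check the precondition up front, say exactly what's wrong and how to fix it, and keep the half-finished request so nothing has to be re-typed:

```python
from dataclasses import dataclass, field

@dataclass
class Account:
    two_factor_enabled: bool = False
    drafts: list = field(default_factory=list)

    def save_draft(self, form_data: dict) -> int:
        """Keep the user's half-finished request so it can be resumed later."""
        self.drafts.append(form_data)
        return len(self.drafts) - 1          # token to resume with

def start_transfer(account: Account, form_data: dict) -> dict:
    """Validate preconditions first and return an actionable error,
    not a generic 'please call customer service'."""
    if not account.two_factor_enabled:
        return {
            "ok": False,
            "error": "TWO_FACTOR_REQUIRED",
            "message": ("Transfers require two-factor authentication. "
                        "Enable it under Settings > Security, then resume this request."),
            "resume_token": account.save_draft(form_data),
        }
    return {"ok": True, "message": "Transfer submitted."}

print(start_transfer(Account(), {"amount": 500, "destination": "..."}))
```

Nothing exotic; the hard part, as the agent more or less admitted, is getting a change like that prioritized against "the system."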


Surely most are as you say, but there are a few companies out there that do better, at least one or two renowned for doing so. I think Zappos is one of them. My experience with Garmin is that they also give great latitude to first-level customer service staff. I have that impression about a few other companies, though those could have simply been the circumstances.


I really love your substack. Please keep up the good work. (Text not generated by an LLM!)
