LLM links
using an LLM to talk to the dead; LLMs as freelancers? Tyler Cowen shows off Claude's brilliance; Erik Hoel shows off Claude's stupidity
Barbeau talked to Jessica for almost 10 hours that first night. The bot seemed genuinely intelligent—an individual. Most surprising to him, it seemed to be emotionally intelligent, knowing when and how to say the right thing.
Barbeau began to think that perhaps this could help him resolve the grief he’d been living with for the past eight years. Perhaps he could tell the Jessica bot all the things he had wanted to tell the real Jessica; perhaps it would feel almost the same. “The chats I had with the bot exceeded my wildest expectations,” Barbeau wrote.
I think that having conversations with (simulations of) dead people could be a really important application for large language models. Recall Tyler Cowen’s conversation with Jonathan Swift. I worry that the LLM will blend generic responses into the conversation that do not reflect the individuality of the deceased person.
I have not tried Delphi yet, but this could be a killer app for the LLMs.
Also, the WSJ reports,
Freelance jobs that require basic writing, coding or translation are disappearing across postings on job board Upwork, said Kelly Monahan, managing director of the company’s Research Institute.
Her findings echo those of more than a dozen other researchers at institutions including Harvard Business School, Washington University in St. Louis and the University of Hong Kong. They have found that since the debut of ChatGPT and other generative AI models, the number of freelance jobs posted on Upwork, Fiverr and related platforms, in the areas in which generative AI excels, has dropped by as much as 21%.
Tyler Cowen asks Claude 3.5 (Anthropic),
Is the Stiglitz-Shapiro 1984 efficiency wage model actually a model of sticky wages, or not? Is either the real or nominal wage sticky in that model?
After reprinting Claude’s answer, he writes,
What percentage of professional economists could give an answer of comparable quality and nuance?
I would say nearly zero. Very few economists would give such a clear, well-organized and (in my opinion) correct answer. And based on very limited experience, I trust Claude much more than ChatGPT on academic economics.
In fact, this example shook up my mental model of what this software does. I cannot see getting an answer like this by manipulating vectors of tokens. There must be some other tricks in the algorithm or training process. Or are we at the stage where the model is doing self-teaching?
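For readers who do not have the model in front of them, here is a compressed sketch of the Shapiro-Stiglitz setup (my own summary from memory, not a reprint of Claude's answer): employed workers choose whether to shirk, shirkers are caught and fired at rate q, all jobs end exogenously at rate b, and firms must pay a wage high enough that not shirking is worthwhile.

```latex
% Asset values for an employed non-shirker (who exerts effort e) and a shirker,
% with wage w, interest rate r, separation rate b, detection rate q, and
% value of unemployment V_U:
\begin{align*}
  rV_N &= w - e + b\,(V_U - V_N)\\
  rV_S &= w + (b + q)\,(V_U - V_S)
\end{align*}
% Requiring V_N >= V_S yields the no-shirking condition on the wage:
\begin{equation*}
  w \;\ge\; rV_U + e + \frac{(r + b)\,e}{q}
\end{equation*}
```

The wage floor here is a real wage tied to fundamentals (e, q, b, r, and the value of unemployment), which is what makes “is this really a sticky-wage model?” a question with some nuance.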
On the other hand, Erik Hoel writes,
I’m always looking for new simple stories that I can put onto an iPad and have my son read. To do that, I need to think of three sentences that only use the most common letter sounds, like: “Bob ran fast. Bob slips in mud. Bob got up.” Coming up with these is no great mental feat, and it’s tedious. It would be helpful to get a couple dozen example sentences at a request instead of thinking of them all myself. A real actual AI use-case. So I asked Sonnet
Several of the stories that Sonnet came up with violated Hoel’s instructions.
Incorrect words that don’t match the request include: “tried,” “pail,” “eat,” “liked,” “yellow,” “shop,” “night” (come on!), “lollipop,” “with,” “splash,” and more.
I tried the same prompt with ChatGPT, and it did a better job, but still not up to his standards. My thinking is that the reinforcement learning feedback that humans give LLMs corrects errors of semantics, not errors in the sounds of words. Plus, as some of Erik’s commenters pointed out, the LLM’s basic unit is the whole word, not the sound of letters. So he was giving it what may have seemed to him like a straightforward task for a human, but for an LLM it is actually unusually hard.
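As a rough illustration of how one might check a model's output against Hoel's constraint mechanically, here is a small sketch. The CVC-style regex is my own crude stand-in for “only the most common letter sounds,” not Hoel's actual rule.

```python
import re

# Crude proxy for "most common letter sounds": one optional leading consonant,
# a single short vowel, then up to two trailing consonants (e.g. "Bob", "ran",
# "mud"). This is an illustrative stand-in, not Hoel's actual criterion.
SIMPLE_WORD = re.compile(
    r"^[bcdfghjklmnpqrstvwxyz]?[aeiou][bcdfghjklmnpqrstvwxyz]{0,2}$",
    re.IGNORECASE,
)

def flag_hard_words(story: str) -> list[str]:
    """Return the words in a story that fail the simple-phonics check."""
    words = re.findall(r"[A-Za-z]+", story)
    return [w for w in words if not SIMPLE_WORD.match(w)]

# Several of the words Hoel flagged fail the check:
print(flag_hard_words("Bob liked his yellow pail. Bob ran to shop at night."))
# -> ['liked', 'yellow', 'pail', 'shop', 'night']
```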
substacks referenced above:
On the Hoel piece: he seems to be doing this thing where, having identified something that an LLM can't (yet?) do well, he concludes that it is overhyped and useless. (Gary Marcus does something similar.) It's an annoying rhetorical sleight of hand; the correct way to understand LLMs is with Ethan Mollick's 'jagged frontier' metaphor. LLMs are good at certain tasks and not good at others, and it's our job as humans to figure that out.
A somewhat pedantic point, but I think worth understanding for exactly these situations, is that LLMs see neither letters nor whole words but something in between called tokens, which can represent anything from individual letters to whole words but mostly represent common groups of letters or word fragments. The best explanation I’ve seen is here, though it is slightly dated:
https://gwern.net/gpt-3#bpes
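To make the token point concrete, here is a quick sketch using OpenAI's tiktoken library (Claude's tokenizer differs in detail, but it also works on sub-word pieces rather than letters): common words usually map to one or two opaque tokens, so the model never directly “sees” the letter sounds Hoel cares about.

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the tokenizer used by GPT-4-era OpenAI models; it splits text
# into sub-word tokens rather than letters or phonemes.
enc = tiktoken.get_encoding("cl100k_base")

for word in ["Bob", "mud", "night", "lollipop", "splash"]:
    ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{word!r} -> {len(ids)} token(s): {pieces}")
```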