LLM Links, 2/1/2025
Rowan Cheung on OpenAI's agent; Ian Leslie on how he uses LLMs; Ethan Mollick helps you pick an AI; Noah Carl says fear sudden change
I got early access to ChatGPT Operator. It's OpenAI's new AI agent that autonomously takes action across the web on your behalf. The 9 most impressive use cases I’ve tried…
Booking a one-time house cleaner for my home through the Thumbtack integration based on my budget. ChatGPT Operator came back to me with four highly rated options within my price range…
Operator *operates* within ChatGPT, but it's a completely different tool. Its output lengths are small, and its true purpose is to take actions across the web (typing, clicking, scrolling), meaning it's not like ChatGPT, which can produce essays and write long code.
I am not yet excited by any of these examples. I suspect that a task that would be useful to me (“search new releases in books and recommend a few for me”) would not be done well by the current crop of AI agents.
Ian Leslie offers examples of queries he has used,
What was the relationship, in terms of thinking and ideas, between FR Leavis and the Bloomsbury Group?
I will be at [x hotel] in Perthshire on the morning of 26 September. What is the best travel plan to get me to central London by midday?
Those are among dozens of uses that he has found for ChatGPT and/or Claude. Some of them I would not bother with. But mostly they make me feel that I am too much in a rut in my life.
Claude has the smallest number of features of any of these three systems, and really only has one model you care about - Claude 3.5 Sonnet. But Sonnet is very, very good. It often seems to be clever and insightful in ways that the other models are not. A lot of people end up using Claude as their primary model as a result, even though it is not as feature rich.
That describes me. Which means I am not keeping up with features like “reasoning” or acting as an agent. You cannot rely on me to be on top of the state of the art in machine learning models. I just hope that the links I provide are helpful.
There is an immense amount of cope about AI, especially from conservatives. This cope comes in two forms. First, there is the claim that AI isn’t really very impressive and can’t really do very much. Second, there is the claim that while AI is quite impressive and can do quite a lot, its effects on society will be largely or wholly positive.
He goes through an extensive list of examples of tests of humans vs. large language models, with the models often winning by at least some criteria. For example,
Lauren Martin and colleagues had AI review legal contracts and compared its performance to that of human lawyers from the US and New Zealand. They found that the best-performing AI matched human performance in terms of accuracy, and far exceeded human performance in terms of both speed and cost efficiency.
Carl speculates,
AI is going to upend the labour market, thereby shattering status hierarchies and perhaps fomenting social unrest. Yet conservatives, for the most part, don’t seem bothered.
I think that this will take a while to play out. When I looked at the major business classifications in the U.S. economy, I did not foresee job losses everywhere. Also, for the next few years, humans will lack the imagination and skills needed to enable AI to reach its potential. Eventually, we will see a very different configuration of jobs than what we have today. But the process of getting there will not be nearly as fast as you might infer from watching the developments within the AI industry.
substacks referenced above: @
@
@
All the people I know who have low estimates for AI disruption follow the same pattern. When they wanted to test whether the hype was real and put AI to the test, they figured they would best be able to judge a case in which they themselves had the most domain expertise, usually at least the 999th millentile of the overall population. That doesn't mean they are super smart people; specialization in a highly diverse market means that specialists in any particular subject, even ones where the cognitive threshold is low, are always a tiny minority.
Well, what they show me is that the AI can only operate at the 997th or 998th millentile, which from their perspective is not impressive at all. It's apparently very difficult for a 995th millentile person to explain to these experts, "Actually, wow, from my perspective, that's pretty darn impressive, and given that it's doing it basically for free compared to what it would cost for me to do it, and getting better fast, kind of scary," let alone how impressive and scary it would seem to a person of average or below-average ability.
Arnold, I've been involved with Agent GPT for a while. Agent LLM is really not meant for one-off tasks, in my experience and opinion. Better to give it something like 'scan recently released books weekly for things I might like, recommend one to me, provide key passages or reviews that gave you the sense I might or might not like it, ask my opinion of the materials, and again of the book if I choose to read it. Once weekly revisit the books from the past year to recommend another with a revised idea of my taste. Watch for my tastes to change somewhat over time. Consider how long it will take for me to read the book and possibly suggest conversation partners or correspondents who might also appreciate each book.'