LLM links, 11/17
Sully Omarr on AI agents; Zvi Mowshowitz on risks of character.ai; Noah Smith on a slowdown in LLM progress; Garrison Lovely on same
one-off agent tasks are some of the least imaginative use cases.
rather have an agent automating a 2-minute task 10,000 times over a 15-minute task once.
Strongly agree. Having an “agent” plan a vacation for me is a one-off, and not worth the time it would take me to learn how to get a good result from the agent. Until I can picture a routine task that I can train an AI agent to do, I’m not jumping on that bandwagon.
Pointer from The Zvi. Mowshowitz also has an interesting discussion of a case in which a 14-year-old became very involved with character.ai and committed suicide.
The ideal response, when a user talks of suicide to a chatbot, would presumably be for the bot to try to talk them out of it (which this one did try to do at least sometimes) and also get them to seek other help and provide resources, while not reporting him so the space is safe and ideally without overly breaking character.
while GPT-4 seemed to hallucinate less often than the original ChatGPT, later models like GPT-4o and o1 seemed to hallucinate more. It’s seeming more likely that hallucination is not a form of variance — a statistical error that can be eliminated with larger samples — but a form of asymptotic bias that will always plague LLMs no matter how big they get. LLMs seem unable to functionally distinguish truth from falsehood.
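A possible gloss on the variance-versus-bias distinction in that passage, using the standard decomposition of estimator error (my framing, not Smith's): variance is the component of error that shrinks as scale grows, bias is the component that does not.

```latex
% Bias-variance decomposition of an estimator's mean squared error.
% As the sample (or model/data scale) n grows, the variance term vanishes;
% the squared-bias term persists -- the "asymptotic bias" in the quote.
\mathbb{E}\big[(\hat{\theta}_n - \theta)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{\theta}_n] - \theta\big)^2}_{\text{bias}^2\ (\text{does not} \to 0)}
  + \underbrace{\mathrm{Var}\big(\hat{\theta}_n\big)}_{\text{variance}\ (\to\, 0 \text{ as } n \to \infty)}
```

If hallucination behaved like variance, bigger models and more data would wash it out; the claim is that it behaves like the bias term instead.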
Noah links to Garrison Lovely, who writes,
I now think it’s increasingly likely that the first AI systems that can do advanced STEM R&D will cost more than paying for the equivalent work from human researchers. This could lead to a market similar to that of cultured meat, where it’s possible to make molecularly “real” meat, but it costs orders of magnitude more than the old approach, at least at first.
I would note that I have not found many items to put into these link posts recently. I still say that I am very optimistic about applications for robots and hopeful about applications for education. But it’s still early days.
And in a few days I have a post going up that discusses “the wall.”
Which one is Google? Is it bargain basement? It's the only one I interact with, albeit involuntarily. Last night, in my usual fashion, I typed in a question, this time asking for a quick thumbnail for determining the yardage needed if you switch to a different yarn class for a knitting pattern. Depending on how I phrased the question, its reasoning included that worsted was thicker than DK (yes) and then, in quick succession, that DK was thicker than worsted (no).
I know the work on these things is all very complicated, but I'm going to go out on a limb and say that knitting is a pretty settled science, which ought to be elementary for the AI.
I am reminded of - about 25 years ago? I'm a little fuzzy - shepherding some early grade school children through an assignment they'd been given - to do a project on a favorite animal, starting with "research" in the school library.
This was around when schools were easy targets, pretty early in their bumpy, largely naive, and of course utterly wasteful and ultimately totally deleterious journey purchasing "technology". All those hundreds of CD-ROMs, destined for the trash ...
The children (maybe 2nd grade?) were struggling to use the Britannica CDs, or to find a website on the internet about otters; and also with writing their reports on the computers. The things were so trouble-prone then - imagine being a little kid and thinking as they are supposed to that this project was a *big deal* and yet being unable to do anything. The librarian* in charge of the computer room and library wasn't so good as a troubleshooter; who would be when you've got two dozen kids? (The next year the district would supply a second old lady just for the computer room, and what a turf fight erupted then in that quiet precinct; it ended in the noble sacrifice of the school's eel, which was beloved of old lady #1 and the children, and turned out to be only rented with moms' club funds, in order to secure yet more CD-ROMs.)
I was only a sub and wouldn't have cared one way or another but the kids were like a bunch of cranky frustrated people at the DMV at that point. I was no help, having scarcely ever used a computer.
I suggested to the librarian that they might use the actual, physical Britannicas - something of a skill in itself, and one they had of course skipped in their short school careers - as the library was possessed of several sets across one wall, the proud purchase of another era though in fact they had a quite recent edition.
Oh no, they must use the computer, she said. Because they must have the most up-to-date information. They must benefit from the latest science on the internet.
I mildly suggested that I didn't think our understanding of raccoons had changed that much in the last couple years.
I then steered them to wikipedia, not realizing that our fearless technology-enforcer forbade the use of wikipedia in her library. She had thoughts about it; she knew (had heard somewhere) that it was "unsourced" and illegitimate.
I'm pretty sure it's completely cribbed from the Britannica, I said.
But school ladies, like AIs, always know best.
*She was crackerjack at maintaining an orderly, attractive library; at aquarium-keeping; at remembering young children's names and taking an interest in their selections; at reading aloud to the younger children; and she was good at and very much enjoyed writing celebratory poems for retiring school or district personnel.
Which interestingly is what AI is really crackerjack at as well - writing "occasional" poems, pastiches, sonnets about the ATP cycle and such.
The frequency of false facts confidently spit out as true will continue to plague AI, and continually remind us that these systems are not “thinking” in the ways humans think, nor lying in the ways, or for the reasons, that humans lie.
Because they don’t know they’re lying when they say false stuff, they also don’t know when they’re not lying. They literally do not “know” what they’re talking about; they are trying to probabilistically simulate knowing.
This confirms my bias and strengthens my prior that LLMs will provide an AI front-end UI that accepts questions but then breaks the task down and hands the pieces to limited expert systems, which come up with accurate answers, or partial answers that include what is not known.
Not knowing what is known versus what is not known remains a hard limit for LLMs.
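The architecture this comment imagines, a front end that routes questions to narrow expert systems and admits ignorance otherwise, can be sketched in a few lines. Everything here (the `EXPERTS` table, `front_end`, the keyword routing rule) is invented for illustration; it is not any real system.

```python
# Toy sketch of the proposed architecture: a front end accepts a free-form
# question, dispatches it to a narrow deterministic "expert system" when one
# matches, and otherwise reports explicitly that the answer is not known,
# rather than guessing. All names and the routing rule are hypothetical.

EXPERTS = {
    # keyword trigger -> deterministic answerer for that narrow domain
    "square root": lambda q: f"{float(q.split()[-1]) ** 0.5:g}",
    "uppercase":   lambda q: q.split()[-1].upper(),
}

def front_end(question: str) -> str:
    for trigger, expert in EXPERTS.items():
        if trigger in question:
            return expert(question)
    # The key property: admit ignorance instead of hallucinating.
    return "not known"

print(front_end("what is the square root of 16"))  # prints "4"
print(front_end("uppercase the word otter"))       # prints "OTTER"
print(front_end("plan my vacation"))               # prints "not known"
```

The design point is the fallback branch: unlike an LLM, this router has an explicit representation of what it does not know.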
In the meantime, for questions that have known correct answers, like code creation, AI tools will get better and enhance productivity. A good tweet said something like:

In a job interview for coding, they need to allow AI co-pilot support in order to see how effectively that coder can crank out code.

AI access to all the public data and databases in the country should be improving the availability of data that is relevant for making decisions, but I haven’t seen this yet.