LLM links
Ethan Mollick on LLM performance; Mollick on ChatGPT4-omni; The Zvi on same; Alberto Mingardi on limits of AI
there does seem to be some underlying ability of AI captured in many different measures, and when you combine those measures over time, you see a similar pattern - everything is moving up and to the right, approaching, often exceeding human level performance.
…Based on conversations with insiders in several AI labs, I suspect that we have some more years of rapid ability increases ahead of us, but we will learn more soon.
As Satchel Paige is known for saying, “Don’t look back. Somebody may be gaining on you.”
I used to play the board game Othello (or Reversi) in tournaments. In the late 1980s, I could beat the best of the computer programs. But within a couple of years the very same program, running on a PC with more processing power, could defeat me easily.
What I took away from that experience is that once a computer gets close to matching human performance on some task, it will quickly surpass the human. Computers have Moore’s Law and other exponential scaling properties going for them.
With the release of ChatGPT4o, Mollick writes,
Many educational uses were held back because of equity of access issues - students often had trouble paying for GPT-4. With universal free access, the educational value of AI skyrockets (and that doesn’t count voice and vision, which I will discuss shortly). On the other hand, the Homework Apocalypse will reach its final stages. GPT-4 can do almost all the homework on Earth. And it writes much better than GPT-3.5, with a lot more style and a lot less noticeably “AI” tone. Cheating will become ubiquitous, as will universal high-end tutoring, creating an interesting time for education.
OpenAI made a significant business decision: removing the paywall for GPT-4o's features. This will get more people to try those features, increasing the chances that applications built on them will become popular.
The multimodal features are not yet available, but they will make the chatbot very powerful.
It can create 3D images, tell apart different speakers on transcripts, and actually write coherent words in specialized photos and fonts. Again, all I have to go by here are the OpenAI demos, so I will reserve judgement until I can play with the systems, but I suspect there will be lots of surprising use cases made available as a result of these new capabilities.
If you have used automated transcripts of podcasts, you know how annoying it is not to have the speakers differentiated. This in itself is a very useful skill for the LLM to have.
Zvi Mowshowitz has much more, including comments about Google’s latest announcement.
talking and having it talk back at natural speeds, and be able to do things that way? Yeah, kind of exciting, even for me, and I can see why a lot of other people will much prefer it across the board once it is good enough. This is clearly a giant leap forward there.
They also are fully integrating voice, images and video, so the model does not have to play telephone with itself, nor does it lose all the contextual information like tone of voice. That is damn exciting on a practical level.
As for Google,
its context window has been pushed to 2 million tokens in private preview. I note that I believe I hit the context limits, at least on the old version of NotebookLM, when I tried to load up as many of my posts as possible to dig through them, so yes, there are reasons to go bigger.
That is indeed their intended use case, as they plan to offer context caching next month. Upload your files once, and have them available forever when useful.

NotebookLM is now getting Gemini 1.5 Pro, along with a bunch of automatic options to generate things like quizzes or study guides or conversational AI-generated audio presentations you can interact with and steer. Hmm.
So I could feed it as much of my past content as it will take (and it will take a lot), and you can then ask “me” anything. At least if I understand correctly.
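To get a back-of-envelope sense of how much "as much of my past content as it will take" really is, here is a minimal sketch. It assumes the common rough heuristic of about 4 characters per token for English text (an approximation, not Google's actual tokenizer), and checks whether a pile of documents fits in a 2-million-token window:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4-characters-per-token heuristic."""
    return int(len(text) / chars_per_token)

def fits_in_window(texts, window: int = 2_000_000) -> bool:
    """Check whether a collection of documents fits in a 2M-token context window."""
    return sum(estimate_tokens(t) for t in texts) <= window

# A 1,000-word blog post is roughly 6,000 characters, i.e. ~1,500 tokens,
# so a 2M-token window holds on the order of 1,300 such posts.
posts = ["x" * 6000] * 1300
print(fits_in_window(posts))  # → True (about 1,950,000 estimated tokens)
```

By this crude estimate, years of weekly blogging fit comfortably in one context window, which is why the "upload once, query forever" caching model makes sense for this use case.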
I know Zvi’s posts are long, but this one is worth wading through. He does not, however, mention that Google appeared to tease that they might be reviving Google Glass, but with AI. Did he miss that, or did I hallucinate?
Alberto Mingardi reviews a book by Jobst Landgrebe and Barry Smith.
Indeed, “the overwhelming majority of systems in the universe, and even of systems that we encounter in our daily lives, are what we shall learn to identify as complex systems.”
For so-called “general AI” to exist, and then for computers to be able to emulate and go beyond the sort of intelligence humans show, we should be able to model “complex systems”—like, for example, the human brain.
For example,
The enthusiasts for self-driving cars neglect the difference between logic and complex systems highlighted by Smith and Landgrebe:
Consider … the case of models for self-driving cars. Algorithms used here are adequate where the software is able to model the sensory input deriving from traffic events through sensors (camera, radar, lidar, sonar) in such a way that it reacts to this input, given the destination, at least as well as (or, realistically, better than) the average human; otherwise self-driving cars will cause more accidents than cars driven by humans, and this will be deemed unacceptable.
Mingardi attended a talk by Smith.
the audience repeatedly tried to bring him to admit that if only we had more calculating capacity, or could feed AI with more or “better” knowledge, general AI could be achieved. Similarly, Mises would have been asked if the problem was not that planners simply lacked a computer powerful enough to connect all the bits of partial knowledge that could be collected through the economy.
Just as a central planner can never know how to run an economy, Smith argues that an AI can never achieve general intelligence. Of course, his arguments will not settle the issue.
I don’t understand the push for “general intelligence”. I don’t think we’ll get there, because I don’t think it will turn out to be very useful. My prediction is that we will take the basic semantic structure embodied in general-purpose LLMs and direct it toward highly specialized tasks. I have a bunch of electric motors in objects around my house – the coffee grinder, the blender, the toaster, the ones that tuck my side-view mirrors away – each specialized for a particular task. But I don’t have one “general motor” that can do everything. The nature of the tools we build is to specialize in a dimension that complements and extends human ability.
You didn't hallucinate that they are talking about the glasses, but they don't have anything concrete there yet, and given the history I decided to wait until we have something more concrete.
The AR/VR experiences are coming, but they're taking a while.