I wonder if a plateau or slowing of change could encourage applications and practical development. When things were progressing rapidly, there was probably a sense among some that any current application would shortly be left in the dust.
Too-rapid change has been a big disincentive for me; it feels like working on something that is already becoming obsolescent.
Hallucinations remain a problem, even in code writing. Most users need accurate facts as answers from an AI. Imagine a textbook with one false fact every two pages, where only experts can tell which one it is. (We might already have nearly that situation with books, unknowingly.)
My own expected model is that a firm's internal data, already sitting in multiple databases, becomes more easily available: you tell the AI bot what you want, and the bot creates the requests that pull the needed data from the relevant databases and combines the results. A minimal sketch of that pattern follows.
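Here is a toy sketch of what I mean, nothing more. The generate_sql() function is a stand-in for whatever model would actually translate the request, and the two in-memory SQLite databases stand in for a firm's real systems; all the names are made up for illustration.

```python
# Toy sketch only. generate_sql() stands in for whatever LLM would actually
# translate the request, and the in-memory SQLite databases stand in for a
# firm's real systems; all names here are hypothetical.
import sqlite3

def generate_sql(request: str, schema: str) -> str:
    """Placeholder for the model call that turns a natural-language request
    plus a schema description into SQL. Hard-coded answers for this demo."""
    if schema.startswith("sales"):
        return "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return "SELECT region, COUNT(*) FROM staff GROUP BY region"

# Two "departmental" databases the bot is allowed to query.
sales_db = sqlite3.connect(":memory:")
sales_db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
sales_db.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("East", 100.0), ("West", 250.0), ("East", 50.0)])

hr_db = sqlite3.connect(":memory:")
hr_db.execute("CREATE TABLE staff (region TEXT, name TEXT)")
hr_db.executemany("INSERT INTO staff VALUES (?, ?)",
                  [("East", "Ann"), ("West", "Bo"), ("West", "Cy")])

request = "Revenue per employee by region"

# The bot fans the request out to each relevant database...
revenue = dict(sales_db.execute(
    generate_sql(request, "sales(region, amount)")).fetchall())
heads = dict(hr_db.execute(
    generate_sql(request, "staff(region, name)")).fetchall())

# ...and combines the results into one answer.
for region in sorted(revenue):
    print(region, revenue[region] / heads[region])
```

The point is that the hard part is the translation step; the fan-out-and-combine plumbing around it is ordinary database work.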
Insofar as more highly paid women hold jobs with digital output, such as middle management, the effect on their employment might be more negative, as it would be for the mostly male coders.
Yes, it does look like scaling is hitting a plateau, as the dread Gary Marcus (among others) has been asserting. I had something like that in mind when I wrote my working paper, GPT-3: Waterloo or Rubicon? Here be Dragons, Version 4.1 (https://www.academia.edu/43787279/GPT_3_Waterloo_or_Rubicon_Here_be_Dragons_Version_4_1). I do believe, however, that the transformer architecture has earned a permanent place in the AI toolkit.
But there are other tools, some we know about, though most, in all likelihood, have yet to be invented. We're in a radically new world. The future is radically uncertain.
As a crude analogy, when Columbus started sailing west across the Atlantic in 1492 he thought he was headed to the Indies, that is, East Asia. He made three more voyages before his death in 1506. Yet, even then, he still maintained that he'd found parts of Asia. Meanwhile Amerigo Vespucci's 1501-02 voyage to South America forced him to realize that a new world had been discovered, one that came to bear his name.
It took those old Europeans a while to figure out what was going on. So it is with us. I wouldn't be surprised if we had a generation or more's worth of discoveries ahead of us.
My impression is that it isn't quite right to call it "scaling" unless the additional training data is comparable in quality and in density of non-redundant information, the sort that can actually improve pattern learning and reproduction. And the labs weren't stupid or random about which sources they trained on first; they tried to get as much bang for the buck as possible.
At some point the 90% of dumb tweets and other garbage-level stuff left to add wasn't worth as much, all put together, as the 10% they started with, except for learning how to chat more convincingly with people at that level. I think the theory behind the expected gains of actual scaling still holds, but the assumptions about the prospects of scaling into much-lower-quality terrain proved far too optimistic. It's as if, after exhausting a diamond mine, one just decides to walk the earth and hope to stumble upon gems at around the same rate. If you end up in Crater of Diamonds State Park in Arkansas, you might just find one, but 99.9999% of your travels will yield nothing.
The current trend appears to be to use LLMs to generate training tokens for bigger LLMs. I hardly need to point out the many possible failure modes of such an approach. AI engineers have to learn to use the available high-quality data much more efficiently. It's certainly possible; to give an example I am familiar with, in Go the large model components* of AlphaGo, AlphaGo Zero and AlphaZero learned via self-play, and it took them many millions of games to equal human professionals. No human professional plays, or ever could play, that much. Korean and Chinese professionals start studying at 4 years old and, if they are promising, turn professional by 12-14: call it 10 years, roughly 3,600 days of training. Even if they play or review a hundred serious games each day (unrealistic, as it takes at least 10 minutes just to replay a game record on a board without thinking about the plays), that is still more than an order of magnitude fewer games than the Alphas needed to learn from, and realistically 2-3 orders of magnitude fewer.
* Unlike state-of-the-art LLMs such as Claude, GPT o1 and so on, the Alphas are not pure large neural models. They include models as one element and harness them inside variants of Monte Carlo tree search, a powerful _algorithm_ (i.e. a program that can be completely understood by humans, unlike the model part, which is an opaque blob of numbers), at least during the learning phase. If memory serves, the models themselves - without the algorithmic tree search - are nowhere near human professional strength. There was a discussion about this at LessWrong: https://www.lesswrong.com/posts/HAMsX36kCbbeju6M7/is-alphazero-any-good-without-the-tree-search
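A quick back-of-the-envelope check of the game-count comparison above. The self-play totals for the Alpha systems are ballpark assumptions from memory (millions to tens of millions of games), used only to show the scale, not as exact figures.

```python
# Rough sanity check of the orders-of-magnitude claim above. The Alpha
# self-play counts are assumed ballpark figures, not exact numbers.
import math

training_days = 10 * 365                 # ~10 years from age 4 to turning pro
generous_games = 100 * training_days     # 100 games/day upper bound: 365,000
realistic_games = 10 * training_days     # a more plausible 10/day: 36,500

alpha_low, alpha_high = 5_000_000, 30_000_000   # assumed self-play range

print(math.log10(alpha_low / generous_games))    # ~1.1 orders of magnitude
print(math.log10(alpha_high / realistic_games))  # ~2.9 orders of magnitude
```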
Gwern's links page on AlphaGo includes recent papers on the surprising vulnerability of AGZ-like players to adversarial attacks involving certain whole-board patterns. This class of vulnerability was first published in 2022 and has so far proven impossible to eradicate by additional tuning (the model becomes resistant to the variants of the exploit pattern used for fine-tuning, but this does not transfer to slightly different variants). The patterns are so simple and obvious that no mid-skill human amateur (say, Go Elo 2000) would miss them, yet AGZ falls to them fairly reliably even as its search depth is increased. I have a hard time squaring this with AGZ's superhuman performance against top human professionals.
That's very informative, thanks!
Thanks for reminder that I need to read William Gibson.
Trump needs to teach a lesson the establishment will never forget - that persecuting your political opponents under the color of law will never be tolerated.
https://shorturl.at/7QUjA
“Think of written works as the tip of the iceberg of human knowledge.” The other day I searched the word “bench” in my iPhone photo library. I was searching for written text, but the algorithm returned a bunch of photos of my kids sitting on benches. If it can do this now, will it be able to “watch” my videos and return search results of my kids “jump roping” and “playing tug-o-war”? Further, can it teach me what I’m doing wrong in my jump roping? Can it see where the rope is smacking the ground and inform me that I should use a longer or shorter jump rope? Or jump a little sooner or a little later? Can it count the number of times I’ve jumped? Can it plot histograms showing me how the frequency of my jumps changes over time? I’m pretty sure it’s just a matter of time. What more complicated tacit knowledge did you have in mind?
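For the histogram question in particular, the plotting part is already trivial once something can extract jump timestamps from video. A toy sketch with made-up timestamps (no real video model involved; the numbers are synthetic):

```python
# Toy sketch: assume some hypothetical video-analysis model has already
# produced the list of jump timestamps; the numbers below are synthetic,
# skewed so the jumper slows down toward the end of the session.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
jump_times = np.sort(120 * rng.random(200) ** 1.1)   # 200 jumps in 120 seconds

print(len(jump_times))                  # "count the number of times I've jumped"

plt.hist(jump_times, bins=12)           # jumps per 10-second bin
plt.xlabel("time into session (s)")
plt.ylabel("jumps per 10 s")
plt.title("Jump frequency over time (synthetic data)")
plt.show()
```

The genuinely hard and still-open part is the perception step, turning raw video of a kid with a rope into those timestamps and into advice about rope length or timing.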
Yuval Noah Harari’s new one…Nexus…has a bit to say about it. Expanding on the idea, he discusses understanding information networks as a means for understanding AI. I’m about halfway through…so far, so mind blowing…