Another fact about LLMs is that they learn once and then the model weights are frozen. There's not really a human analogue.
Does it make any difference? Lots of people notice that LLMs will respond with different outputs at different times to identical prompts. What is the practical significance of conceiving of LLMs as "Model = Frozen" if, in that conception, the frozen model alone isn't responsible for the thing we care about, the outputs, and "Model + Refinement = Output" is not frozen?
That’s due to randomness in the sampling step, controlled by a parameter called temperature. Frozen weights mean the model doesn’t learn from experience, hence the emphasis on longer context windows.
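To make that concrete, here is a minimal sketch in plain Python (not any vendor's API) of how temperature turns the same frozen scores into different outputs on different runs; the logits below are made up for illustration.

```python
# Minimal sketch of temperature sampling (illustrative only, not any vendor's API).
# A frozen model produces the same logits for the same prompt every time; the
# run-to-run variation comes from sampling the next token from those logits.
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Sample a token index from raw scores, scaled by temperature."""
    if temperature <= 0:
        # Temperature 0 is usually treated as greedy decoding: always pick the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [score / temperature for score in logits]
    m = max(scaled)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Same "frozen" scores, different samples on different calls:
logits = [2.0, 1.5, 0.3]                         # hypothetical scores for three tokens
print([sample_next_token(logits, temperature=1.0) for _ in range(10)])
print([sample_next_token(logits, temperature=0.0) for _ in range(10)])  # deterministic
```

At temperature 0 the sampling collapses to always picking the highest-scoring token, which is why "frozen weights" and "varying outputs" are not in tension.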
That's very helpful, thanks!
In general I prefer for fields to make up completely new words or strings of words for new technical ideas instead of adding "term of art" meanings or modifiers of existing terms.
Peter Norvig wrote on this topic many years ago (http://norvig.com/chomsky.html) in an essay that aged well, I think. Chomsky takes a rules-based approach, whereas LLMs use statistics. I feel like a fusion of the two ideas could improve performance in many areas.
Re: "What three changes to our election system would make it work better?"
The chatbot's answers -- RCV, campaign finance reform, and increase voter turnout -- are weak, vanilla, and progressive (biased).
A credible alternative perspective:
In large elections, voters have little incentive to become well-informed or to vote wisely. See Bryan Caplan's book, Voters as Mad Scientists: Essays on Political Irrationality:
https://www.amazon.com/Voters-Mad-Scientists-Political-Irrationality/dp/B0C2SD1K8B
Technocratic solutions, such as RCV, are beside the point. An increase in voter turnout exacerbates the problem (a negative selection effect).
My intuition is that campaign finance reform is oversold. Influence is like water -- it finds its own level. Maybe I'm wrong about that.
I think Arnold's essay grader should give at best a C- to the answer ChatGPT gave to the question.
Complex voting systems increase the benefits of 'strategic voting' (i.e., voting for a candidate you don't like in order to penalize a rival). The opacity of such systems decreases trust in electoral outcomes.
Increasing the number of voters with little interest or involvement in the political system doesn't make the outcome better, and most methods of increasing participation (automatic registration, mail-in voting, ballot harvesting) increase opportunities for fraud.
Most versions of campaign finance reform involve restrictions that tend to inhibit the ability of challengers to overcome incumbents' advantages in exposure and fundraising.
No, I don't think you are.
"I would speculate that for humans, pattern-matching is related to what Daniel Kahneman calls “system one,” the rapid intuitive response. Rule-based systems are more analogous to “system two,” where we reflect and process inputs more carefully. Something ahead of me seems to match the pattern of a snake, so system one causes me to flinch. But as I take a couple of steps closer, system two kicks in and says, “If it were a snake, it would move. I’ve never seen a snake on this path, but I’ve seen lots of sticks. This thing is brown, and it’s not moving. I bet it’s a stick.”"
A slightly different way of looking at this is "predictive processing". From Scott Alexander's wonderful summary: "[Predictive processing says] the brain is a multi-layer prediction machine. All neural processing consists of two streams: a bottom-up stream of sense data, and a top-down stream of predictions. These streams interface at each level of processing, comparing themselves to each other and adjusting themselves as necessary."
This is happening all the time and very rarely rises to the level of consciousness. The piece is quite interesting. https://slatestarcodex.com/2017/09/05/book-review-surfing-uncertainty/
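For what it's worth, a toy caricature of the quoted idea, assuming nothing beyond the summary itself: each layer compares a top-down prediction with bottom-up data and nudges the prediction by the error.

```python
# Toy caricature of the predictive-processing summary above: each layer compares a
# top-down prediction with bottom-up sense data and adjusts by the prediction error.
# Purely illustrative, not a neuroscience model.

def predictive_layer(prediction: float, sense_data: float, learning_rate: float = 0.3) -> float:
    """Return an updated prediction after comparing it with incoming data."""
    error = sense_data - prediction          # bottom-up signal vs. top-down guess
    return prediction + learning_rate * error

prediction = 0.0                             # prior expectation
for observation in [1.0, 1.0, 0.9, 1.1]:     # a stream of sense data
    prediction = predictive_layer(prediction, observation)
    print(round(prediction, 3))              # prediction drifts toward the data
```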
Right. Refining predictive processing routines seems to be what neurons do at any scale when learning from experience and the environment.
It seems to me that humans and LLMs are indeed using the same "One Neat Trick" to learn and generate; it's just that humans can do it much more efficiently for a variety of reasons, for example with far less training data. As such, today's LLMs remind me of what chess experts wrote about the early chess programs whose play 'felt' brute-force-based. As machine-learning techniques were applied to games like chess and go, some experts said the play started to feel more human, then suddenly superhuman, not in a "much more brute force" way, but in a "what humans try to do, only more of it than even the best human could possibly do" way.
My impression is that we only recently got to the point where it was even possible to dedicate enough resources to something like an LLM to give crude brute-forcing a shot, and it worked surprisingly well. The next steps will be to make LLMs less brute-force and more efficient as learners and generators, just as the chess programs got progressively less brutal.
Here is my wild guess for how this will happen (if indeed it has not already happened):
There is going to emerge a new generation of first-meta-level "generative transformer learning model" efficiency-improving models. They are going to try to identify and cut the fat from the current LLMs. Perhaps a metaphor could be the "identification of redundancies and ruthless cost-cutting" phase of a corporate bankruptcy reorganization.
These are going to try to do something like a metaphorical "factor analysis" of which parts of LLMs contribute the most to consistent production of the desired kinds of output. If you can identify which bits of training data and which tokens are the most essential, and which have the lowest improvement-producing margins or least bang for the buck, then you can make progressively "skinnier" or "slimmer" versions of the generative models. There are some scaled-down versions out there, though I don't know whether they were produced by this kind of "intelligent trimming."
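One already-existing flavor of this kind of trimming is weight pruning (and, relatedly, distillation). A minimal sketch of magnitude pruning on a made-up toy layer, just to show the shape of the idea:

```python
# Minimal sketch of magnitude-based weight pruning, one existing flavor of
# "intelligent trimming" (illustrative only; real pruning and distillation
# pipelines are far more involved).
import numpy as np

def prune_by_magnitude(weights: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights, keeping roughly `keep_fraction` of them."""
    flat = np.abs(weights).ravel()
    k = max(1, int(len(flat) * keep_fraction))
    threshold = np.sort(flat)[-k]            # magnitude of the k-th largest weight
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))                  # a toy "layer"
slim = prune_by_magnitude(w, keep_fraction=0.25)
print(f"nonzero weights before: {np.count_nonzero(w)}, after: {np.count_nonzero(slim)}")
```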
And then at the next meta-level I think you could see the emergence of "brain makers": models that analyze how, why, when, and how well the trimming-down works in different circumstances and scenarios, and thereby build a model of how really efficient natural predictive-processing systems (i.e., human brains) manage to learn so well and so efficiently from much smaller training sets.
If my wild hunch is right that *those* models can be developed and will work, then the moment that happens, all bets are off.
LLMs are not sentient. They are also not human, but that is quite obvious, is it not?
The LLMs will be chock-full of hidden rules.
A fine explanation of pattern matching and hallucinations. There is also the RLHF woke problem about Truth: do 18-year-old Blacks have, on average, lower IQs? The woke AI bots try to avoid answering that.
Steve Sailer notes that Google’s Bard wouldn’t do a report on Razib Khan, nor on himself, but had no problem with Kendi.
https://www.unz.com/isteve/google-bard-sailer-vs-kendi/
For education, teaching English is the place to watch. More paying customers all over the world want easier, faster English learning. A successful AI bot to tutor English (as a second language) will likely be the first good, widely used AI bot. Until then, I assume it’s not too close.
There’s some good chance the open-source movement in India might be among the leaders in practice for most software.
"Hallucination" seems to me to be the wrong word for what is going on. It's more like a "Bad Guess for the Answer to a Test Question." Maybe "flub" is better shorthand.
Let's say you ask a student to answer a question with an essay, and the student hasn't studied enough; what's more, he doesn't *know* he hasn't studied enough to get the right answer.
Not only does he imagine that he should be able to get the right answer from what he's already studied, but the rules of the constructed testing scenario require him to give the best answer he can, in a style of confidence in the correctness of his answer, even if in truth he knows he is not, or should not be, so confident.
So, based on the bad assumption that a test-taker *should* be able to use and extrapolate from the rules and patterns from his incomplete knowledge base to get the right answer, he makes his best guesses in an attempt to get that answer, and naturally, he flubs it.
Now what he *didn't* do is "hallucinate." We know what hallucinating is like, and it's not like what has happened in the case of this student. The student isn't hallucinating; he doesn't have another part of his brain creating new and nonsensical data, fake experiences that he mistakes for real ones and then describes as if true. Instead, he's just being required to make his best guess, and he's guessing wrong. He's flubbing the answer.
While I like "flub" for mistakes, there would still be imaginative or hallucinatory answers that are not wrong but do “not correspond to reality.” That is part of the AI gullibility problem, as noted by Simon Willison in a great review of AI in 2023.
https://simonwillison.net/2023/Dec/31/ai-in-2023/
When writing code, any flubs are usually found quickly, so the code can quickly be rewritten and retested until it passes all the tests.
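A toy illustration of why code flubs surface so quickly; the function and test below are hypothetical:

```python
# Toy illustration: a coding "flub" is caught mechanically by a test.
# The function and test are hypothetical examples, not from any real codebase.

def median(values):
    """Suppose a model flubs this by forgetting to sort the input first."""
    values = sorted(values)          # without this line, the test below fails immediately
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

def test_median_unsorted_input():
    assert median([3, 1, 2]) == 2    # flags the missing sort on the first run

test_median_unsorted_input()
print("all tests pass")
```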
Hallucinations, or flubs, about a particular individual are claims of fact that must be checked somehow, and the prompter often doesn’t know the facts, so doesn’t see the flub. Calling false facts “flubs” is probably better, and certainly shorter, but it's possibly too late for AI-dedicated folk to change terms.
Using the student-exam metaphor again, this seems to me like the difference between open-book (or "open internet", "open everything") and closed-book exams. Whether one calls them hallucinations or flubs, these errors are a consequence of LLMs being artificially constrained into both (1) closed-book exam mode and (2) the requirement to give confident-sounding answers. Just as the exam student isn't supposed to give the accurate and normal human answer, "I could either look them up or make a good guess, but I don't really know the particular facts and details that answer this question," the LLMs weren't designed to allow for that possibility either, though future ones probably will be. The AIs aren't acting like people because students taking closed-book exams also aren't acting like people; they are not so much confusing fantasy for reality as obeying the rules of a very artificial and unrealistic situation constructed by the test writers and graders.
Let's suppose that LLMs are indeed 'just pattern matching'. You conclude from this that there are some things LLMs can't do, for example 'reflecting and rethinking'. This assumes that a model which 'just pattern matches' can't do those things, which there is of course no way to know. In fact, this is the actually interesting open question.
"Of course I'm human! Ask me anything! Ask me to calculate anything!"
This does make me wonder how much I can improve my writing simply by feeding myself a bunch of books. If I closely read most of Orwell’s writing, could I write like him? I could then add, on top of this reading, a rule-based framework that describes explicitly how Orwell writes - his grammar habits, so to speak.
I know that if I listen to a bunch of EconTalk episodes, I tend to talk (and think) more like Russ Roberts.
I notice that after four years of Trump, I’ve picked up some bad communication habits.
(Note to self: do reverse pattern matching on Trump, so I don’t pick up any of his language habits).
To what extent can we change ourselves by learning how LLMs work, and then either mimicking their learning style or doing the opposite - reverse pattern matching?
Trump on the polar vortex: "Even if you vote and then pass away, it's worth it."
I mean, the guy is pretty funny. The "pass away" clinches it somehow.
Trump as entertainment value? Yeah, I think I’ve made that point to my own family before.
Here is what I think about him: he is a strange character, one of the strangest men any of us have ever encountered. And for some reason his bizarro public personality impels him to say stupid and offensive things, and to joke about everything. (Perhaps in private he is a little more frightening, less of a clown? I don't know anything about him, haven't read about him or anything.) So amid the blather, this compulsive clown manner results in his saying some previously-unsayable things, which turned out to be exactly what many Americans wanted to hear, but never expected to. Things he probably personally cared about no more than their opposites - it is impossible to believe he's a principled man. But as he enjoyed the effect he's created - he doubled down on it. So the jocose, rich-rube (act?) *is* the appeal, but not because we craved entertainment from a politician. To have a thoughtful and capable individual in his place would have been great - but the conditions and incentives didn't create such a person. So Trump it is.
Well said. Yes. This brings clarity to the question of “Why Trump?” His emergence post PC nineties, post Bush II, post Obama, makes sense. But is he an antidote to DEI? Debatable. He’s at least as much of an anesthetic as he is an effective path forward.
I largely agree with your post, but is replacing Google the right choice for illustrating rule-based thinking?
> If your goal is to use an LLM to replace Google, prepare to be disappointed.
LLMs have replaced Google for me for entire classes of queries.
Recommendations for books on how LLMs work and how to create one? Both at the layman’s level and engineering level.
I liked the first part of Cohere’s LLM University: docs about general LLM/AI topics, and about their own model.
https://docs.cohere.com/docs/the-cohere-platform
They’ve added RAG (retrieval augmented generation), so the LLM you talk to has access to your company’s internal info—they claim. I’m sure this is very attractive to companies with existing databases, if it works well.
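For readers unfamiliar with the pattern, here is a minimal sketch of RAG. Everything in it (the tiny document store, the crude keyword retrieval, the ask_llm placeholder) is hypothetical and stands in for a real vector database and a real model API; it is not Cohere's actual interface.

```python
# Minimal sketch of retrieval-augmented generation (RAG). All names and data here
# are hypothetical placeholders, not any vendor's actual API.

COMPANY_DOCS = [
    "Refund policy: customers may return items within 30 days.",
    "Support hours: Monday to Friday, 9am to 5pm Eastern.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Crude keyword-overlap retrieval; real systems use embedding similarity."""
    def score(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whatever LLM API you actually use."""
    return f"(model response to a {len(prompt)}-character prompt would go here)"

def rag_answer(question: str) -> str:
    # Retrieve the most relevant internal docs, then stuff them into the prompt.
    context = "\n".join(retrieve(question, COMPANY_DOCS))
    prompt = f"Answer using only this internal context:\n{context}\n\nQuestion: {question}"
    return ask_llm(prompt)

print(rag_answer("What is the refund policy?"))
```

The appeal to companies is exactly this step: the model never has to "know" the internal facts, it only has to read the retrieved passages it is handed at query time.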
> When I see LLMs as using pattern matching, I do not see them as having human intelligence. When humans match patterns, we build models to explain the patterns. As economist Edward Leamer says, we are pattern-seeking and storytelling animals. Of course, an LLM could be prompted to emit a story to describe a pattern. But it is not sitting there automatically thinking up stories the way that you or I would. I think therein lies an important difference between LLMs and humans.
> As a human, I am aware of my thought process. I do not need the carrot and stick of external RLHF to change my thought process. I can instead reflect and re-think. I can give a rule-based, system 2 narrative for much of my behavior. I can explain to you how I wrote this essay. (Of course, psychology researchers tell us that when we describe our thought process we might be self-deceiving or merely rationalizing.)
There are *numerous* techniques/methodologies in philosophy that could easily expose this as not very dissimilar to what an LLM is doing.
Here is just one simple problem:
> As a human, I am aware of my thought process.
What is the *precise* meaning of "am aware of" and "thought process" (as opposed to the colloquial meaning normal to our culture, that shall not be questioned, because that would be "pedantic")? Are these boolean matters? Are there any ontological (set theory) issues in play here? Is there anything else important that has been overlooked?
Oh humans, when will you wake up?
Does the existence of 'chain-of-thought' prompting challenge Arnold's argument? If there is only pattern matching, can that explain some of the examples here, or does it just show that the pattern matching works at a deeper level?
https://www.promptingguide.ai/techniques/cot
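For readers who haven't seen it, chain-of-thought prompting just means asking the model to produce intermediate steps before the final answer. A generic illustration (the wording is mine, not taken from the linked guide):

```python
# Generic illustration of a chain-of-thought style prompt versus a direct prompt.
# The wording is generic; see the linked guide for the canonical examples.
question = "A cafeteria had 23 apples. It used 20 and bought 6 more. How many now?"

direct_prompt = f"Q: {question}\nA:"

cot_prompt = (
    f"Q: {question}\n"
    "A: Let's think step by step. "
    "First state what is left after using 20, then add the 6 purchased, "
    "then give the final number."
)

# The open question raised above: is the step-by-step answer more reliable because
# the model "reasons", or because the intermediate tokens give its pattern matching
# more structure to condition on?
print(cot_prompt)
```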
I don't think there is such a clean "cutting nature at the joints" difference between "pattern matching" on the one hand and "rules" on the other.
If I'm right, then my suspicion is that there really is just "One Neat Trick" of general-purpose usefulness that can be applied to any domain of human-like intelligent activity, provided one has sufficient resources to learn and compute and one can discover the method by which to correctly apply The Trick to that domain of human cognition.
Every rule can be conceptualized as the equivalent of a very strong / very likely to be followed pattern.
More than that, if a set of rules can be sufficiently reconciled such that a system produces consistent output - instead of halting with an error or outputting random nonsense - then the sum of all the rules together is not distinguishable in principle from either one big complicated rule or one big complex pattern. A huge program following classical, deterministic algorithms is no different from "one giant complicated rule," and only its strict determinism makes it an edge case: the extreme-strength limit of a pattern, once one drops the constraint of 100% determinism and replaces it with more general statistical operation.
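One way to make that equivalence concrete: a soft pattern assigns probabilities to continuations, and as its confidence is sharpened without limit it behaves exactly like a deterministic rule. A toy sketch, with made-up scores:

```python
# Toy sketch: a hard "rule" as the limiting case of a soft "pattern".
# A pattern assigns probabilities to continuations; as sharpness grows
# (equivalently, as temperature falls toward 0), it behaves like a rule.
import math

def pattern(scores: dict[str, float], sharpness: float) -> dict[str, float]:
    """Softmax over continuation scores; higher sharpness = more rule-like."""
    exps = {k: math.exp(v * sharpness) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

scores = {"stick": 2.0, "snake": 0.5}           # hypothetical continuation scores
print(pattern(scores, sharpness=1.0))            # soft pattern: both continuations plausible
print(pattern(scores, sharpness=50.0))           # effectively a rule: always "stick"
```

On this view, "rule vs. pattern" is a difference of degree (how sharply one continuation dominates), not of kind.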
Indeed, this very equivalence was demonstrated in that paper about "task contamination," in which the essence of the problem was that it was not possible to distinguish outputs derived from following updated rules with the same pattern matching from outputs derived from the same rules with updated pattern matching. The researchers therefore had to cleverly generate data to permit inferences about the system's underlying operations, probing it through a process of interrogation via precisely engineered prompts.
Even to the extent one rule can be said or arranged to "dominate" other patterns - such as constitutional rules dominating legislative rules, or a global rule against emitting outputs that match subordinate patterns containing certain rude terms - it makes no difference whether one conceives of this as a hierarchy or combination of rules, on the one hand, or as "one big complicated rule" on the other.
This is especially the case when a rule has exceptions in special cases, and then exceptions to the exceptions in extra-special cases. Indeed, this is precisely how legal scholars attempt to draw out the categorization outline and flow-chart decision tree for a restatement of the law in some subject, and students are asked to apply these "rules" to things termed "fact patterns".
But such "rules" are not separable from those patterns; they contain the basic pattern within them. The judge or law student is not matching a new set of facts to a rule-free pattern, but to the *pattern contained within the rule itself*. In that way, law is a set of "pattern-typed rules," that is, rules which *first* describe a large set of archetypal patterns, from which an adjudicator is supposed to discern the one closest to the facts of the particular case, and *second* apply a particular decision to resolve the case. I suspect that rules and patterns are all necessarily mixed up in this way in the operation of the LLMs, and indeed in the operation of the human mind. The simple and artificial games humans have devised are, as I said, extreme edge cases that only make it seem as if such concepts can be so easily separated.
I have not read much about it, but I am not aware of anyone noticing that the LLMs are noticeably weaker on either the more rule-seeming or the more pattern-matching-seeming parts of that process of legal analysis. It would be both fascinating and, I suspect, amusing to see if an LLM could review the entire corpus of opinions in some area of law and craft a "Restatement," and to see how well or poorly it compares to the latest human attempt. Ha, "AI vs. ALI"! It would be both hilarious and depressing to then have it try to make sense of Constitutional Law and reverse-engineer a plain-English explanation of what it has in practice come to mean.
Loved the last paragraph. For those who didn't go to law school, ALI is the American Law Institute, which among other things, gets various legal scholars together and tries to summarize a field of law, which is then published as "Restatement of Contracts" or some such.