I have tried to involve LLMs in two projects. In both cases, I gave up in frustration, for similar reasons. They put too much weight on their own background knowledge and too little weight on what I want them to do.
The first project was grading op-eds, according to my criteria. I wanted the LLM to give credit to an author who shows an understanding of the other side’s point of view by:
playing devil’s Advocate, asking questions that the other side might ask
thinking in Bets (speaking in probability rather than certainty)
spelling out Caveats, meaning potential problems with what the author was advocating
Debating the best points on the other side rather than poking at the other side’s weak points
showing an Open mind by saying what might lead the author to change his mind
being as skeptical of Research that supports the author’s point of view as he would be of research that undermines it
Steel-manning the opposing point of view
Using ChatGPT, I created a GPT that was supposed to do this. What ChatGPT kept wanting to do instead was grade essays on “balance,” meaning not being one-sided. That is not what I meant. You can have a strong point of view, as long as you write rigorously, argue against the best case for the other side, and avoid using insults as a debating tactic. I kept trying to explain this to ChatGPT, and it kept doing things its own way.
More recently, I tried to create a “clone” using LLMs. For example, Delphi let me give it dozens of my essays as background information. The test I would give Delphi, or a generic chatbot like ChatGPT or Claude, was to ask it how Arnold Kling would explain the financial crisis of 2008. I was frustrated that what came back included only some things I would say, plus a lot of things that other economists or journalists have said, some of which I completely disagree with. Claude was the best of the ones I tried at sticking close to my interpretation of the crisis.
Think of the LLM as giving answers that are a weighted average of two sources: its general background knowledge, and the specific instructions and past writing of mine that I give it. The weight the LLM puts on its background knowledge is too high, and the weight it puts on my own past writing and instructions is too low.
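To make the analogy concrete, here is a toy sketch (purely illustrative; the weight w and the source labels are made up, and this is not how an LLM actually computes its answers):

```python
# Toy illustration of the "weighted average" analogy.
# The weight w and the two source strings are hypothetical labels,
# not a description of actual LLM internals.

def blend(background: str, my_writing: str, w: float = 0.8) -> str:
    """Mix two sources, putting weight w on general background knowledge."""
    return (f"{w:.0%} consensus view: {background}; "
            f"{1 - w:.0%} my view: {my_writing}")

# What I get feels like w around 0.8; what I want is closer to 0.1.
print(blend("what economists and journalists generally say about 2008",
            "what my own essays say about 2008"))
```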
LLMs are not adapted to my management style, so I am frustrated with them as employees. I appreciate that their background knowledge allows me to communicate with them in plain English. But I wish that the LLM could empty its vessel of domain-specific background knowledge, so that I could fill that vessel with what I want.
Back when I was a manager at Freddie Mac, I had good luck selecting “empty-vessel” employees. To work on financial models, I would pick someone with a computer programming background, not someone with an MBA.
I should note that I only managed a small team of people, never rising to the level of managing other managers, which is what you have to do to be a high-level manager.
“Arnold, you don’t know how to manage people. You just tell them what you want and then they go do it.” A very astute colleague, Mary Cadagin, told me this once, and it was true. I had a strong view of what I wanted, but I could never articulate exactly how to get there.
What I needed were employees who would hear me sketch out a vision, press me for clarification, and then deliver what I wanted. Coming in with zero knowledge of finance was fine. The key was being able to draw out my vision and to keep coming back to me with questions to clarify what I wanted.
An LLM is not the employee that I want. I don’t want any of your background knowledge in my domain. I want all of the domain knowledge to come from me. I want you to turn my knowledge into computer code.
I think of the power of an LLM as its ability to act as a natural-language computer interface. In order to do that, it has to read a lot of text to learn natural language. But then, in order to execute my vision, I need it to forget the content of what it read about economics or social science or grading essays and just find out what I want. That may be a hard needle to thread, especially because most people do value the domain knowledge of the LLM; they want it to be encyclopedic.
I would love an LLM tool that did what you describe. I don't want zero general domain knowledge (it would be good if it had a sense of the style of economics articles, for example, or of general principles). Maybe what we really need is a tunable LLM where we can select the weight on our inputs versus its training, from 0 to 100.
In the meantime, ChatGPT is an awesome generator of R code, a terrific explainer of problems with programs, an endlessly patient answerer of questions, and a really good summarizer of long documents. Its writing has improved a lot and, if you invest some time in the prompt, it can produce pretty good first drafts of routine things.
There is a market for what you describe, and I am confident we will get there eventually.
(1) An interesting thought is that this phenomenon is a manifestation of the LLM version of the strong Sapir-Whorf hypothesis of linguistic relativity. That is, LLMs don't learn language, 'content', and even 'cognition' as theoretically separable tools and skills. The very process of learning how to communicate in a language, by being exposed to and discovering statistical patterns in huge amounts of it, inherently brings over a lot of the embedded dominant content, the framework of ideas, and the typical way of processing and thinking in those ideas. Without some very important leap in the way it works, the LLM CAN'T really talk to you in content-neutral, 'objective' language distinct from that content, because what we experience as its form of communication is a language system constructed on the basis of that content. This applies to humans as well, and lots of effort goes into trying to transcend the problem to the extent possible, to avoid the errors and distortions that result. But the LLMs aren't doing that with regard to their own content-distorted use of language. Yet.
(2) Some of the image / video generators have a feature that allows one to upload lots of examples of a particular artistic "style" so that future images can be generated "in that custom style I had you learn." This seems to work for graphics and also for literary style, but not yet for "conceptual style": that is, learning a particular person's worldview and reality-model and writing explanations consistent with that set of ideas and style of thinking. My impression is that this will be solved soon.