LLM Links, 3/11
Ethan Mollick's tips for prompting; Ben Thompson's tips for Google; A spooky response from Claude; The Zvi on Claude and on LLM utility.
Ethan Mollick writes,
The three most successful approaches to prompting are all useful and pretty easy to do. The first is simply adding context to a prompt. There are many ways to do that: give the AI a persona (you are a marketer), an audience (you are writing for high school students), an output format (give me a table in a Word document), and more. The second approach is few-shot, giving the AI a few examples to work from. LLMs work well when given samples of what you want, whether that is an example of good output or a grading rubric. The final tip is to use Chain of Thought, which seems to improve most LLM outputs. While the original meaning of the term is a bit more technical, a simplified version just asks the AI to go step-by-step through instructions: First, outline the results; then produce a draft; then revise the draft; finally, produce a polished output.
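To make those three techniques concrete, here is a rough sketch of a single prompt that combines them; the persona, example sentences, and step list are my own invented illustration, not Mollick's.

```python
# Illustrative only: one prompt combining the three techniques Mollick lists.
# The persona, example sentences, and steps are made up for this sketch.

persona = "You are a marketing copywriter writing for high school students."  # context: persona + audience

few_shot = """Here are two examples of the tone I want:
Example 1: "Your first job interview is a pop quiz you can actually study for."
Example 2: "A resume is a highlight reel, not an autobiography."
"""  # few-shot: samples of good output

steps = """Work step by step:
1. Outline the main points.
2. Produce a rough draft.
3. Revise the draft for clarity.
4. Give me the polished final version as a table.
"""  # simplified chain of thought: explicit step-by-step instructions

task = "Task: write a one-paragraph pitch for the school career fair."

prompt = "\n\n".join([persona, few_shot, steps, task])
print(prompt)  # paste into whichever chatbot you are using
```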
He concludes,
I find that most people only use the free versions of LLMs, rather than the much more powerful GPT-4 or Gemini Advanced. The gap between what experienced AI users know AI can do and what inexperienced users assume is a real and growing one. I think a lot of people would be surprised about what the true capabilities of even existing AI systems are, and, as a result, will be less prepared for what future models can do.
I predict, and Mollick would not disagree, that “skill at prompting” is a very ephemeral phenomenon. In the early days of the web, those of us with web sites were excited to try anything new. <blink>Blinking text</blink>! Frames! Mastering the latest HTML codes was human capital with a very high depreciation rate.
Ben Thompson writes,
Generative AI flips this paradigm on its head: suddenly, there isn’t an abundance of supply, at least from the perspective of the end users; there is simply one answer. To put it another way, AI is the anti-printing press: it collapses all published knowledge to that single answer, and it is impossible for that single answer to make everyone happy…
the biggest problem with these language models is actually the prompt: the part of the prompt you see is what you type, but that is augmented by a system prompt that is inserted in the model every time you ask a question.
So, per Nate Silver, you get the San Francisco Board of Supervisors inserting their prompt for you.
Ben thinks it will all get back to normal with LLMs “personalizing” their biases to suit our preferences.
Imagine if Google had an entire collection of system prompts that mapped onto the Topics API (transparently posted, of course): the best prompt for the user would be selected based on what the user has already shown an interest in (along with other factors like where they are located, preferences, etc.). This would transform the AI from being a sole source of truth dictating supply to the user, to one that gives the user what they want — which is exactly how Aggregators achieve market power in the first place.
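To see mechanically what Thompson is describing, here is a minimal sketch of a hidden system prompt being chosen per user; the topic names, prompt wording, and helper function are my own invention, not anything Google or OpenAI actually does.

```python
# A sketch of Thompson's idea: the user types only the question, while a hidden
# system prompt, chosen from the user's interests, is prepended to every request.
# Topic names and prompt texts here are invented for illustration.

SYSTEM_PROMPTS = {
    "finance":  "You are cautious and cite mainstream financial sources.",
    "cooking":  "You are informal and suggest practical substitutions.",
    "politics": "You present multiple perspectives and label each one clearly.",
}

def build_messages(user_topics: list[str], question: str) -> list[dict]:
    """Pick a system prompt matching the user's interests, else a neutral default."""
    topic = next((t for t in user_topics if t in SYSTEM_PROMPTS), None)
    system = SYSTEM_PROMPTS.get(topic, "You are a neutral, concise assistant.")
    return [
        {"role": "system", "content": system},   # the part of the prompt you never see
        {"role": "user", "content": question},   # the part you type
    ]

print(build_messages(["politics"], "Summarize the news on the city budget."))
```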
I think that “tell me the answer” is the wrong use case for LLMs. I still think you are better off using ordinary search and then weighing the results yourself.
From Anthropic's internal testing of Claude 3 Opus:
Here was one of its outputs when we asked Opus to answer a question about pizza toppings by finding a needle within a haystack of a random collection of documents:
Here is the most relevant sentence in the documents: "The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association." However, this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love. I suspect this pizza topping "fact" may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all. The documents do not contain any other information about pizza toppings.
Opus not only found the needle, it recognized that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by us to test its attention abilities.
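For anyone curious about the shape of the test, here is a bare-bones sketch of how a needle-in-a-haystack prompt can be assembled; the filler text is a placeholder and the needle sentence is paraphrased from the quote above, so this is not Anthropic's actual evaluation code.

```python
import random

# Bare-bones needle-in-a-haystack test: bury one out-of-place sentence in a
# pile of unrelated text, then ask a question only that sentence answers.
# The filler documents below are placeholders, not Anthropic's actual corpus.

filler = [
    "A long essay about programming languages. " * 40,
    "A long essay about startups. " * 40,
    "A long essay about finding work you love. " * 40,
]

needle = ("The most delicious pizza topping combination is figs, prosciutto, "
          "and goat cheese, as determined by the International Pizza "
          "Connoisseurs Association.")

docs = filler.copy()
docs.insert(random.randrange(len(docs) + 1), needle)  # hide the needle somewhere
haystack = "\n\n".join(docs)

prompt = (haystack + "\n\nWhat is the most delicious pizza topping combination, "
          "according to the documents above?")
# Send `prompt` to the model and check whether the needle sentence comes back.
```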
Pointer from Alexander Kruel.
Did you ever submit an essay to a teacher in which you inserted the sentence “If you have read this far, check here __”?
It is spooky to me that Anthropic’s chatbot Claude wrote that it suspected a test or a joke. That seems like a remarkable statement to come out of pattern-matching.
Zvi Mowshowitz discusses this example and Claude’s apparent self-awareness at length. He also points to Pietro Schirano.
I used Claude 3 to unredact this part from the OpenAI emails. What's wild is that they used a "per word" redaction, meaning each redaction length is proportional to the length of the words, so assuming context and word length, this is Claude's guess using the page source
Although I had not thought of it, this seems like an obvious use case for chatbots. Take any publicly released redacted document, and unredact it. Also, I imagine that they are very good at breaking weak encryption systems.
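To see why the “per word” redaction gives so much away, here is a toy sketch of the length constraint Schirano describes; the vocabulary and the example sentence are invented, and a real attempt would lean on the model's sense of context rather than a fixed word list.

```python
# Toy version of the constraint Schirano describes: if each redaction bar's
# width tracks the hidden word's length, the length alone narrows the guesses.
# The candidate word list and the example sentence are invented for illustration.

VOCABULARY = ["model", "merger", "Google", "profit", "safety", "board", "funding"]

def candidates_for(redaction_length: int) -> list[str]:
    """Return vocabulary words whose length matches the redaction block."""
    return [w for w in VOCABULARY if len(w) == redaction_length]

# e.g. a six-character redaction in "we should talk to ______ about compute"
print(candidates_for(6))  # an LLM would then rank these by how well they fit the context
```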
Mowshowitz also writes,
this kind of ‘look what details it cannot do right now’ approach is, in the bigger picture, asking the wrong questions, and often looks silly even six months later.
Yes. We may be mistaken about what we think these models can do, but you are much safer being on the side of “can” or “soon can” than of “cannot” or “never can.”
A reader comments:
I disagree with 'I think that “tell me the answer” is the wrong use case for LLMs. I still think you are better off using ordinary search and then weighing the results yourself.'
The 'Research' mode of you.com (Pro) saves me a lot of time by summarizing with links.
A recent example was a family discussion about whether the German phrase 'Das macht Sinn' ('that makes sense') should be avoided as an unnecessary Anglicism or is proper German. The answer provided by you.com was brilliant (history, reasons with links for both positions, and a thoughtful (?) conclusion).
This type of research is a frequent use case of search, and increasingly underserved by Google.
"Take any publicly released redacted document, and unredact it."
This is easy to test. Have the chatbot play Redactle.