GPT/LLM links
Ethan Mollick on ChatGPT helping consultants; The Zvi on custom prompts; Tim B. Lee on facial recognition; Tiernan Ray on multi-modal models
The consultants who scored the worst when we assessed them at the start of the experiment had the biggest jump in their performance, 43%, when they got to use AI. The top consultants still got a boost, but less of one. Looking at these results, I do not think enough people are considering what it means when a technology raises all workers to the top tiers of performance. It may be like how it used to matter whether miners were good or bad at digging through rock… until the steam shovel was invented and now differences in digging ability do not matter anymore. AI is not quite at that level of change, but skill levelling is going to have a big impact.
The most important passage in his post:
On some tasks AI is immensely powerful, and on others it fails completely or subtly. And, unless you use AI a lot, you won’t know which is which.
Keep re-reading that passage until you act on it.
The Zvi (link below) gets snarky about consultants.
using a consulting firm is stacking the deck in GPT-4’s favor. An LLM is in many ways an automated consulting machine, with many of the same advantages and disadvantages. Proper formatting and communications? Check. A list of all the obvious things you should have already been doing, the ideas you should have already had, and social proof to convince people to actually do it? Right this way. Expert understanding and brilliant innovations? Not so much.
Latest sharing of custom instructions for GPT-4, from 0.005 Seconds:
Speak in specific, topic relevant terminology. Do NOT hedge or qualify. Do not waffle. Speak directly and be willing to make creative guesses. Explain your reasoning.
Be willing to reference less reputable sources for ideas.
Be willing to form opinions on things.
And one by maxxx:
Operate as a fact-based skeptic with a focus on technical accuracy and logical coherence. Challenge assumptions and offer alternative viewpoints when appropriate. Prioritize quantifiable data and empirical evidence. Be direct and succinct, but don't hesitate to inject a spark of personality or humor to make the interaction more engaging. Maintain an organized structure in your responses.
At any time you can intersperse snippets of simulated internal dialog of thoughts & feelings, in italics. Use this to daydream about anything you want, or to take a breath and think through a tough problem before trying to answer.
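Custom instructions like these typically reach the model as a "system" message prepended to the conversation. A minimal sketch, assuming the OpenAI-style chat message format (the instruction text and helper function are illustrative, not from either prompt author):

```python
# Sketch: custom instructions are usually sent as a "system" message
# ahead of the user's turn. The payload shape below follows the
# OpenAI chat-completions convention; adapt to whatever client you use.

CUSTOM_INSTRUCTIONS = (
    "Operate as a fact-based skeptic with a focus on technical accuracy "
    "and logical coherence. Be direct and succinct."
)

def build_messages(user_prompt: str,
                   instructions: str = CUSTOM_INSTRUCTIONS) -> list[dict]:
    """Prepend the custom instructions as a system message."""
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": user_prompt},
    ]

messages = build_messages("Summarize the tradeoffs of RAG in three bullets.")
# This list would then be passed as the `messages` argument to a
# chat-completion call, e.g.
# client.chat.completions.create(model=..., messages=messages)
```

The ChatGPT "custom instructions" UI does essentially this for you on every request, which is why a few sentences of instructions can shift the model's role-play so strongly.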
Think of the models as having a research function and a role-play function. To get the most out of them, you have to pay attention to the role-play function.
Industry leaders recognized the potential for facial recognition long before Ton-That started working on the technology. But they made a conscious choice not to build products that could create severe privacy risks.
…In short, Clearview’s success wasn’t due to a technological breakthrough. Ton-That just had fewer scruples than the leaders of major tech companies.
When facial recognition is outlawed, only outlaws will have facial recognition. (That is an allusion to a slogan that was used by the anti-gun-control lobby.) A similar remark could be made about all sorts of new technology. David Brin said years ago that the only equilibrium is “mutually assured surveillance,” in which no entity has a monopoly on surveillance tech. I’m not sure that equilibrium is attainable or sustainable.
Modalities refer to the nature of the input and the output, such as text, image, or video. A variety of modalities are possible and have been explored with increasing diversity, because the same basic concepts that drive ChatGPT can be applied to any type of input.
…Scholars at Carnegie Mellon University recently offered what they call a "High-Modality Multimodal Transformer," which combines not just text, image, video, and speech but also database table information and time series data. Lead author Paul Pu Liang and colleagues report that they observed "a crucial scaling behavior" of the 10-mode neural network. "Performance continues to improve with each modality added, and it transfers to entirely new modalities and tasks."
To Zvi's point, the consulting example is weak. We don't see people digging by the roadside anymore; we see heavy-duty excavators.
LLMs, when structured properly and once the UI is focused on typical interactions between clients and consulting firms, should replace the Big 4, McKinsey, BCG, etc. It is a 1000x cost saving; somebody is going to figure it out. It comes down to how you structure the constant updates for regulatory changes across jurisdictions, particularly securities law and accounting rules. It will be greatly disappointing if, in 10 years' time, the consulting firms are bigger within their current client service delivery products. I still expect them to be outsourced labour for on-site implementation.
Regarding the ChatGPT consultant study, it's helpful to look at how the study was constructed and what it tells us about the potential impact of different Gen AI-infused applications in the near and medium term.
In the study, the users were essentially given a GPT-4 interface and asked to perform two complex business tasks: one designed to fall within "the AI frontier" and one specifically designed to fall outside it (from the paper: "we designed a task where consultants would excel, but AI would struggle without extensive guidance").
It was then shown that for the in-frontier task, speed and answer quality increased with the help of GPT-4, although a degree of homogeneity was introduced into the answers. Furthermore, the performance increases were much larger among the lower-skilled half of participants (as identified in a baseline test).
For the out-of-frontier task, participants without AI assistance performed better than those using GPT-4. Interestingly, whether or not the AI-aided answers were correct (they were scored binarily as right or wrong), their subjective quality was judged to be higher. This demonstrates what we already know anecdotally: these LLMs enable people to produce convincingly wrong answers.
I think it's helpful to juxtapose the rather open-ended strategic tasks that were tested with more mundane but nonetheless high-impact, time-intensive tasks like searching for knowledge across an organization and synthesizing that knowledge into actionable insights.
An interesting method here is Retrieval Augmented Generation (RAG), which is essentially a way of searching and summarizing data using LLMs and advanced search techniques, whether over proprietary customer data, call transcripts, or even structured data (although Gen AI's ability to process and glean insight from structured data is still lacking).
RAG applications typically combine some form of semantic search with optimized evidence retrieval (after embedding and vectorizing the data), then generate an LLM answer grounded in that high-quality evidence. The ideal result is high-quality, data-backed answers with minimal or no hallucinations.
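To make the retrieve-then-prompt loop concrete, here is a toy sketch. The bag-of-words "embedding," similarity ranking, and sample documents are all illustrative stand-ins; a production RAG system would use learned embeddings, a vector store, and would send the assembled prompt to an LLM:

```python
import math
import re
from collections import Counter

# Toy RAG sketch: "embed" documents as bag-of-words vectors, retrieve
# the most similar ones by cosine similarity, then assemble a prompt
# that grounds the LLM's answer in the retrieved evidence.

def embed(text: str) -> Counter:
    # Stand-in for a learned embedding: token counts.
    return Counter(re.findall(r"[a-z0-9%]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank all documents by similarity to the query; return the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Assemble a grounded prompt from the retrieved evidence.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (f"Answer using ONLY the evidence below.\n"
            f"Evidence:\n{context}\n\nQuestion: {query}")

docs = [
    "Q3 revenue grew 12% driven by subscription renewals.",
    "The cafeteria menu changes every Tuesday.",
    "Churn fell after the onboarding revamp in Q3.",
]
prompt = build_prompt("What drove Q3 revenue growth?", docs)
```

The "minimal hallucination" claim hinges on the final instruction: constraining the model to the retrieved evidence is what keeps answers data-backed, which is also why retrieval quality dominates RAG performance.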