Discussion about this post

Rob L:

To Zvi's point, the consulting example is weak. We no longer see people digging by the roadside; we see heavy-duty excavators.

LLMs, when structured properly and once the UI is focused on typical interactions between clients and consulting firms, should replace the Big 4, McKinsey, BCG, etc. It is a 1000x cost saving; somebody is going to figure it out. It comes down to how you structure the constant updates for regulatory changes across jurisdictions, particularly securities law and accounting rules. It will be greatly disappointing if the consulting firms are bigger in 10 years' time within their current client service delivery products. I still expect them to be outsourced labour for implementation onsite.

Joe Rini:

Regarding the ChatGPT consultant study, it's helpful to look at how the study was constructed and what it tells us about the potential impact of different Gen AI-infused applications in the near and medium term.

In the study, the users were essentially given a GPT-4 interface and asked to perform two complex business tasks: one task designed to fall within "the AI frontier" and one specifically designed to fall "outside of the AI frontier" (from the paper: "we designed a task where consultants would excel, but AI would struggle without extensive guidance").

It was then shown that for the in-frontier task, speed and answer quality increased with the help of GPT-4, although a degree of homogeneity was introduced into the answers. Furthermore, the performance increases were much larger among the lower-skilled half of participants (as identified in a baseline test).

For the "out of frontier task" non-AI aided participants performed better than ChatGPT4 aided participants. Interestingly, whether or not the ChatGPT aided answers were correct (they were binarily scored as right or wrong) the subjective quality of the ChatGPT aided answers was judged to be higher. This demonstrates what we know anecdotally anyhow: that these LLMs enable people to produce convincingly wrong answers.

I think it's helpful to juxtapose the rather open-ended strategic tasks that were tested with more mundane, but nonetheless high-impact and time-intensive, tasks like searching for knowledge across an organization and synthesizing that knowledge into actionable insights.

An interesting method used here is called Retrieval Augmented Generation (RAG), which is essentially "search and summarize" over data using LLMs and advanced search techniques, whether on proprietary customer data, call transcripts, or perhaps even structured data (although Gen AI's ability to process and glean insight from structured data is still lacking).

RAG applications typically combine some form of semantic search with optimized evidence retrieval (after embedding and vectorizing the data), then LLM answer generation grounded in that high-quality evidence. The ideal result is high-quality, data-backed answers with minimal or no hallucinations.
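The retrieve-then-generate pipeline described above can be sketched in a few lines of Python. This is a toy illustration only: a bag-of-words counter stands in for a real embedding model, the corpus and query are invented, and the final step merely assembles the prompt that would be sent to an LLM rather than calling one.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words token count. A real RAG system
    would use a dense embedding model and a vector store here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Semantic-search step: rank documents by similarity to the query
    and keep the top k as evidence."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Generation step: in a real system this prompt would go to an LLM,
    constraining it to answer only from the retrieved evidence."""
    evidence = retrieve(query, documents)
    context = "\n".join(f"- {doc}" for doc in evidence)
    return (f"Answer the question using only this evidence:\n"
            f"{context}\nQuestion: {query}")

# Hypothetical corpus mixing relevant and irrelevant documents.
docs = [
    "Q3 call transcript: customer asked about onboarding delays.",
    "Policy memo: onboarding must complete within 14 days.",
    "Cafeteria menu: soup on Tuesdays.",
]
print(build_prompt("Why are customers unhappy with onboarding?", docs))
```

Retrieval ranks the two onboarding documents above the cafeteria menu, so only relevant evidence reaches the prompt; this grounding in retrieved text is what reduces (though does not eliminate) hallucination.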
