Discussion about this post

User's avatar
Neural Foundry's avatar

Brillant observation about the shiftfrom Claude expecting expertise to enabling actual vibe-coding. Back when I was experimenting with similar tools, that friction between 'what the AI assumes I know' and 'what I actually need to do' was brutal. The real value in that virtual wax museum assignment isn't just geting students comfortable with AI tools but making them document the workflow itself. Most orgs still have no idea what these tools can actually deliver vs the hype.

Greg Kemnitz's avatar

Not sure if what I do is "vibe-coding", but I've been using Claude Code a lot lately to help with testing our startup's product, which does migration of data between different types of databases.

I used Claude to code test data generators, test change generators, a deep-crawl correctness validator, and a test harness.

I've written things like these "by hand" several times in the past, and they are tedious but rather straightforward engineering tasks, but they take quite a bit of time to get right.

One big thing that Claude does is it wants to be "lazy", so you have to make darn sure it doesn't do things like hack the test validation code to make a particular test pass!

For testing, Claude is awesome in many ways. It grinds through numerous logs, quickly uncovering the root problem and recommending a solution, that is correct most of the time (but here you still need to watch its "laziness": it will recommend working around a problem that it identifies versus fixing the thing that generated the badness in the first place. So, if you're using it like this, you have to stop Claude from "localized" fixes.

It helps if you give Claude clear directions about stuff like "we will always use Standard Format X for Datatype Y in our 'consumer' - if we're not seeing X, the fault is in the 'producer' that give us Format Z, not the fact that we can't parse Format Z".

Using Claude Code for this stuff has saved us several man-months of time in coding the test infrastructure as well as grinding through several dozen combinations of load-types and source/target combinations.

Claude is awesome at identifying weird datatype format problems and other tedious things that are huge pains-in-the-ass for human programmers, especially if the solution involves combining log crawls, code digging, and looking up stuff online. Claude can do this in a couple of minutes, while even an experienced programmer may spend hours futzing with this sort of stuff.

Also, Claude is good at stuff like generating parallelized code or adding parallelism to existing code.

Where Claude isn't great is at high-level design. You need to come up with the high-level plan of how you want your stuff to work, carefully specify it in a set of prompts (or a spec document that's in a location and format that Claude can ingest), force Claude to dialog with you about its questions, and only have it generate code once you confirm that Claude understands what you want by basically repeating your design back to you.

Claude can also flail badly if it's trying to solve something harder, particularly if the online documentation isn't good and you haven't been rigorous with your requirements. You have to keep Claude's "eye on the ball" and be rigorously consistent with what you want it to do and what you don't want it to do.

And yes, Claude can generate "wrong" or incomplete code. You have to run a lot of tests to make sure its code is correct.

And like human programmers, Claude will only do what you ask it to do, not necessarily what you wanted.

18 more comments...

No posts

Ready for more?