Discussion about this post

Greg Kemnitz

Not sure if what I do is "vibe-coding", but I've been using Claude Code a lot lately to help with testing our startup's product, which migrates data between different types of databases.

I used Claude to code test data generators, test change generators, a deep-crawl correctness validator, and a test harness.
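To give a rough idea of what the core of a correctness validator can look like, here is a heavily simplified sketch. It is not the commenter's code: the function name `diff_rows`, the row-as-dict representation, and the single primary-key column are all assumptions made for illustration, and a real deep-crawl validator would also need to handle nested types, type coercion between databases, and data too large for memory.

```python
def diff_rows(source, target, key):
    """Compare two lists of row dicts by primary key.

    Hypothetical helper: reports keys missing from the target, keys that
    exist only in the target, and keys whose rows differ.
    """
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    missing = sorted(src.keys() - tgt.keys())
    extra = sorted(tgt.keys() - src.keys())
    mismatched = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return {"missing": missing, "extra": extra, "mismatched": mismatched}
```

A migration run would count as passing only when all three lists come back empty.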

I've written things like these "by hand" several times in the past. They are tedious but rather straightforward engineering tasks that still take quite a bit of time to get right.

One big thing with Claude is that it wants to be "lazy", so you have to make darn sure it doesn't do things like hack the test validation code to make a particular test pass!
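One possible way to catch that kind of tampering, sketched under assumptions: pin a SHA-256 digest for each validation file and fail the run if any of them has changed. The names `file_digest` and `changed_files` and the idea of a pinned-digest mapping are hypothetical, not something described in this comment; code review or read-only file permissions would serve the same purpose.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(pinned: dict[str, str], root: Path) -> list[str]:
    """Return relative paths that are missing or no longer match their pinned digest.

    `pinned` maps a path relative to `root` to the digest recorded when a
    human last reviewed that file (hypothetical convention).
    """
    changed = []
    for rel_path, expected in sorted(pinned.items()):
        target = root / rel_path
        if not target.is_file() or file_digest(target) != expected:
            changed.append(rel_path)
    return changed
```

Running this check before the test suite means a "fix" that quietly edits the validator shows up as a named file rather than as a suspiciously green test.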

For testing, Claude is awesome in many ways. It grinds through numerous logs, quickly uncovering the root problem and recommending a solution that is correct most of the time. But here you still need to watch its "laziness": it will recommend working around a problem that it identifies instead of fixing the thing that generated the badness in the first place. So, if you're using it like this, you have to stop Claude from making "localized" fixes.

It helps if you give Claude clear directions about stuff like "we will always use Standard Format X for Datatype Y in our 'consumer' - if we're not seeing X, the fault is in the 'producer' that gives us Format Z, not the fact that we can't parse Format Z".
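A rule like that can also be encoded in the consumer itself, so that a workaround has nowhere to hide. The sketch below is illustrative only: it assumes "Datatype Y" is a timestamp and "Standard Format X" is ISO 8601 with a UTC offset, and the names `STANDARD_FORMAT`, `ProducerFormatError`, and `parse_timestamp` are hypothetical.

```python
from datetime import datetime

# Assumed stand-in for "Standard Format X": ISO 8601 with an explicit UTC offset.
STANDARD_FORMAT = "%Y-%m-%dT%H:%M:%S%z"


class ProducerFormatError(ValueError):
    """Raised when the producer emits a value outside the agreed format."""


def parse_timestamp(raw: str) -> datetime:
    """Parse only the agreed format; anything else is reported as a producer bug."""
    try:
        return datetime.strptime(raw, STANDARD_FORMAT)
    except ValueError as exc:
        # Deliberately no fallback parsers: accepting "Format Z" here would
        # hide the real fault, which lives in the producer.
        raise ProducerFormatError(
            f"producer emitted {raw!r}; expected format {STANDARD_FORMAT}"
        ) from exc
```

The design choice is that the consumer fails loudly and names the producer as the culprit, which points both a human and Claude toward the fix at the source instead of a "localized" patch in the parser.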

Using Claude Code for this stuff has saved us several man-months of time in coding the test infrastructure as well as grinding through several dozen combinations of load-types and source/target combinations.

Claude is awesome at identifying weird datatype format problems and other tedious things that are huge pains-in-the-ass for human programmers, especially if the solution involves combining log crawls, code digging, and looking up stuff online. Claude can do this in a couple of minutes, while even an experienced programmer may spend hours futzing with this sort of stuff.

Also, Claude is good at stuff like generating parallelized code or adding parallelism to existing code.

Where Claude isn't great is at high-level design. You need to come up with the high-level plan of how you want your stuff to work, carefully specify it in a set of prompts (or a spec document that's in a location and format that Claude can ingest), force Claude to talk through its questions with you, and only have it generate code once Claude has confirmed that it understands what you want by basically repeating your design back to you.

Claude can also flail badly if it's trying to solve something harder, particularly if the online documentation isn't good and you haven't been rigorous with your requirements. You have to keep Claude's "eye on the ball" and be rigorously consistent with what you want it to do and what you don't want it to do.

And yes, Claude can generate "wrong" or incomplete code. You have to run a lot of tests to make sure its code is correct.

And like human programmers, Claude will only do what you ask it to do, not necessarily what you wanted.

Todd

I'm watching my wife slowly inch her way towards vibe coding. She's responsible for creating the schedule for her group of ~20 radiologists. It's a complicated allocation problem, which I actually helped her out with several years ago by building a bespoke Excel document with lots of VBA. However, the dynamics of the group have been in constant flux since then, with frequent changes in both staffing and site-specific constraints, so she's ended up doing it mostly by hand. Last month she finally started asking ChatGPT to help her in the main chat app, and she got some early successes. I could tell she was excited when it started spitting out lists of assignments after "thinking" about it for several minutes while she scrolled Twitter. Of course, she still had to go back in and manually enter the information into their online scheduling platform, and when someone asked for a day off for a doctor's appointment, she had to do the whole thing over, but it was a start. I've been trying to gently suggest that she consider installing VS Code and trying to use Codex to build an app for her to do the work, but so far she's resisted.

I think that is still the biggest boundary. If you can cross the threshold from the web-based AI chat apps and start interacting with the technology inside of a development environment, a lot of the hype starts to become much more understandable. My guess is that OpenAI and others would have already moved to incorporate an IDE within their chat apps if they weren't constrained by compute, so it's probably only a matter of (a very short span of) time.
