Let me be direct about something: we don't trust AI enough to let it ship code without a human reviewing every line. Not yet. Maybe not ever. But here's the thing—we also don't trust humans to be productive enough without AI. Not anymore.
At R Software & Consulting, we actively develop and maintain four applications across three companies. Our stack is primarily Next.js and Python. We use Claude as our primary development tool across the entire lifecycle—from the first brainstorming session to the production deploy. And because of that, we consistently ship at a pace that would normally require a team several times our size.
This isn't a story about replacing developers. It's a story about what happens when you build the right guardrails and then let AI do what it does best: the volume work, the boilerplate, the test writing, the refactoring—all the things that eat up engineering hours without requiring the judgment that only humans can provide.
The Problem We Were Solving
When Anthropic released Claude's Chat, Cowork, and Code tabs, it introduced a genuine workflow problem. Three tools, overlapping capabilities, and no clear guidance on when to use which. Our team was confused. We were using Chat to write code (bad), using Code for brainstorming (wasteful), and ignoring Cowork entirely (a missed opportunity).
We needed a system. Not just “use AI for coding,” but a repeatable, disciplined workflow that told every engineer on the team exactly which tool to open, when, and why. More importantly, we needed that workflow to have real quality gates—places where automation catches the easy stuff and humans catch the hard stuff.
The Workflow: Five Phases, Clear Boundaries
We built an internal framework that maps the entire development lifecycle to specific Claude tools. The short version:
Chat is for thinking. You open it when you have a vague idea and need to explore architecture, research a library, or talk through tradeoffs. It's the whiteboard session. Nothing gets written to a codebase here.
Cowork is for knowledge work. Specs, PRDs, formatted documents, data analysis, scheduled tasks. When you need a polished deliverable that isn't code, Cowork creates real files—Word docs, presentations, spreadsheets—not just chat responses.
Code is for building. It reads your entire codebase, writes code across multiple files, runs your test suite, creates commits, and opens PRs. This is where the 10x productivity comes from. But it's also where the most discipline is required.
The framework maps these tools across five phases: Ideate, Spec, Build, Test & Review, and Deploy. Each phase has defined inputs, outputs, and—critically—quality gates that determine whether work can move forward.
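To make the shape of the framework concrete, here is a sketch of the phase-to-tool mapping as a simple lookup table. The phase names come from the framework above; the gate descriptions are paraphrased for illustration, not an official schema.

```python
from typing import Optional

# Illustrative sketch of the five-phase mapping described above.
# Tool assignments follow the article; gate wording is paraphrased.
PHASES = {
    "Ideate":        {"tool": "Chat",   "gate": "human decides the idea is worth speccing"},
    "Spec":          {"tool": "Cowork", "gate": "spec reviewed and approved by a human"},
    "Build":         {"tool": "Code",   "gate": "CI green: lint, type-check, tests, build"},
    "Test & Review": {"tool": "Code",   "gate": "fresh-session AI review + human PR approval"},
    "Deploy":        {"tool": "Code",   "gate": "manual verification on staging"},
}

def next_phase(current: str) -> Optional[str]:
    """Return the phase that follows `current`, or None after Deploy."""
    order = list(PHASES)
    i = order.index(current)
    return order[i + 1] if i + 1 < len(order) else None
```

The point of writing it down this explicitly, even in a toy form, is that every phase has exactly one primary tool and exactly one gate that must pass before work moves forward.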
Why We Don't Trust AI—And Why That's the Point
Every piece of AI-generated code in our pipeline goes through the same review process as human-written code. Actually, it goes through more review. Here's what that looks like:
First, GitHub Actions runs automatically on every push. Lint, type-check, unit tests, integration tests, build. If any of that is red, nothing moves forward. This is the automated safety net, and it catches a surprising number of issues before anyone even looks at the code.
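A minimal workflow implementing this gate might look like the following. The job names and npm scripts are illustrative assumptions for a Next.js project, not our exact config:

```yaml
# Illustrative CI gate: every push must pass all of these before review.
name: ci
on: [push, pull_request]
jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run lint        # lint
      - run: npm run typecheck   # type-check
      - run: npm test            # unit + integration tests
      - run: npm run build       # production build
```

Because each step fails the job on a nonzero exit code, a red check blocks the PR with no human effort at all.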
Second, we run an AI code review in a fresh Claude session. This is important. The session that wrote the code has confirmation bias—it thinks its own work is correct. A fresh session has no context from the writing session, so it reviews the diff with fresh eyes, looking for security vulnerabilities, performance issues, and logic errors. It's the AI equivalent of “don't review your own PR.”
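The fresh-session review can be scripted. Here is a hedged sketch using the Anthropic Python SDK; the model name, prompt wording, and the `RUN_LIVE_REVIEW` flag are illustrative assumptions, and the diff range assumes a `main`-based branch workflow:

```python
import os
import subprocess

# Sketch of the "fresh session" review step. Assumes the `anthropic`
# package is installed and ANTHROPIC_API_KEY is set for live use.
REVIEW_INSTRUCTIONS = (
    "You are reviewing a pull request diff you have never seen before. "
    "Look for security vulnerabilities, performance issues, and logic "
    "errors. Do not assume the code is correct."
)

def build_review_prompt(diff: str) -> str:
    """Combine the review instructions with the raw diff."""
    return f"{REVIEW_INSTRUCTIONS}\n\n<diff>\n{diff}\n</diff>"

def review_current_diff() -> str:
    """Send the branch diff to a brand-new Claude session and return its review."""
    import anthropic
    diff = subprocess.run(
        ["git", "diff", "main...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    client = anthropic.Anthropic()  # fresh client: no context from the writing session
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model name
        max_tokens=2048,
        messages=[{"role": "user", "content": build_review_prompt(diff)}],
    )
    return msg.content[0].text

if os.environ.get("RUN_LIVE_REVIEW"):  # hypothetical opt-in flag
    print(review_current_diff())
```

The key detail is that the client is constructed fresh for the review, with nothing from the writing session in its context window.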
Third, a human reviews and approves the PR. This is non-negotiable. AI is fast, but it doesn't have the business context, the institutional memory, or the judgment to know whether a technical decision makes sense for the product. That's what humans are for.
Fourth, we deploy to staging and manually verify. Click every button. Test every flow. Check mobile. Automated tests are necessary but not sufficient. A human walking through the feature catches the things that tests can't express.
Only after all four of those gates pass does code reach production. The result is that we ship more code, faster, with higher quality—because the AI handles the volume and the humans handle the judgment.
10x the Output, Not 10x the Risk
The numbers are real. A single developer using this workflow can produce the output of a small team—not because the code writes itself (it doesn't), but because the developer spends their time on the high-leverage activities: architecture decisions, code review, product judgment, and user experience. The mechanical work—writing boilerplate, implementing well-defined specs, writing test suites, refactoring for consistency—that's where AI excels, and that's where most engineering hours traditionally go.
But 10x output means nothing if you're shipping 10x the bugs. That's why the guardrails exist. The CI pipeline, the dual code review (AI + human), the staging verification—these aren't optional steps we skip when we're in a hurry. They're the reason we can move fast. Confidence in your safety net is what lets you take bigger swings.
Jim Flynn: AI That Learns Your Environment
Our internal AI agent framework, Jim Flynn, takes this a step further. While Claude provides the raw intelligence, Jim Flynn provides the context layer—it learns your codebase, your conventions, your deployment patterns, and your team's way of working. Think of it as the difference between hiring a brilliant generalist and hiring someone who already knows your systems.
Jim Flynn can be privately deployed within your infrastructure. Your data never leaves your environment. Your proprietary code, your business logic, your customer data—it all stays behind your firewall. The AI adapts to you; you don't adapt to the AI.
This matters especially for organizations operating under regulatory requirements. We build AI implementations that are designed from the ground up for compliance—SOC 2, PCI DSS, HIPAA, and other frameworks that govern how sensitive data is handled. Private deployment means your AI tooling meets the same security and compliance standards as the rest of your infrastructure. Audit trails, access controls, data residency—all built in, not bolted on.
For teams that want the productivity gains of AI-assisted development but operate in regulated industries—healthcare, finance, government, insurance—this is how you get there without compromising on compliance.
The CLAUDE.md File: Your Most Important Investment
If there's one takeaway from this article, it's this: create a CLAUDE.md file in the root of every repository before you write a single line of AI-assisted code.
CLAUDE.md is a markdown file that gives Claude project-specific context. Your tech stack. Your coding conventions. Your test commands. Your deployment process. Your known gotchas. It's onboarding documentation for an AI engineer, and it's the single highest-leverage thing you can do to improve the quality of AI-generated code.
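A skeleton for a Next.js project might look like this. The sections mirror the list above; the specific libraries, commands, and gotchas are illustrative, not a prescription:

```markdown
# CLAUDE.md

## Stack
- Next.js 14 (App Router), TypeScript, Tailwind
- Postgres via Prisma

## Conventions
- Server components by default; add "use client" only when needed
- All API input validated with zod

## Commands
- `npm run lint`: lint
- `npm run typecheck`: type-check
- `npm test`: run the test suite

## Gotchas
- Never edit generated files under prisma/client
```

Keep it short enough that it stays accurate; a stale CLAUDE.md misleads the AI the same way stale onboarding docs mislead a new hire.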
Without it, Claude is smart but generic. With it, Claude writes code that looks like your team wrote it. We've found that a well-maintained CLAUDE.md reduces the number of review comments by roughly half, because the AI is already following your conventions instead of guessing.
What This Means for Your Team
The workflow we've built isn't proprietary or theoretical. We use it every day across four active products, and we've packaged the core of it into an interactive framework that you can explore, share with your team, and adapt to your own stack.
The framework covers the complete lifecycle: when to use Chat vs. Cowork vs. Code, how CI/CD fits in with GitHub Actions, where human review gates belong, how to structure your CLAUDE.md, common anti-patterns to avoid, and a team adoption checklist that takes you from Day 1 to Month 1.
Ready to Build Faster Without Losing Control?
Explore our complete interactive workflow framework—the same system we use to ship production software every day. Or talk to us about bringing Jim Flynn and a regulation-ready AI development workflow to your organization.
Phillip
CEO of R Software & Consulting. He leads development on The Positivity App, the Jim Flynn AI framework, and other R Software products.