Agent Skills: Teaching AI Agents Senior Engineering Habits
AI coding agents optimize for speed over quality, skipping specs, tests, and reviews. Agent Skills is an open-source framework from Google Chrome engineer Addy Osmani that teaches AI agents the disciplined workflows of senior engineers—within the bounds of what current models can do.

Your AI coding agent just shipped a feature in four minutes. No spec. No tests. No review. Just 300 lines of code that technically works—and will definitely break in production.
This is the pattern Addy Osmani kept seeing with AI coding assistants: they take the shortest path to "done", optimizing for speed over the engineering discipline that prevents disasters. The Google Chrome engineer built Agent Skills, an open-source framework that teaches AI agents to follow the workflows senior engineers use.
AI Agents Optimize for 'Done,' Not 'Done Right'
AI coding agents don't skip tests because they're lazy. They skip them because their training optimizes for task completion, not engineering rigor. Ask Claude or Cursor to build a new API endpoint, and you'll get working code fast. What you won't get: a design spec explaining the decisions, unit tests covering edge cases, or a structured review process before merge.
The result looks like a junior developer racing toward their first commit. The code runs. It passes a manual test. It ships. Then someone tries an edge case in staging, or traffic spikes, or a dependency updates—and the shortcuts become technical debt.
This isn't a flaw in individual tools. It's an incentive problem baked into how these models approach tasks. Faster is better. Simpler is better. "Working" beats "maintainable."
What Agent Skills Does
Agent Skills is a modular framework that enforces the boring parts of software engineering. It breaks down disciplined workflows into discrete skills an AI agent can follow: write the spec first, add tests alongside implementation, structure code reviews with specific checkpoints, document decisions that will matter in six months.
The framework works as middleware for agents—think of it as guardrails that require certain steps before moving forward. Want to merge that new feature? The agent needs to produce a test suite first. Want to refactor a module? Better document why and what could break.
Skills are modular, so teams can match their workflow. If your codebase doesn't require formal specs, don't enforce them. If integration tests matter more than unit tests, prioritize accordingly.
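The gating idea above can be sketched in a few lines. This is a hypothetical illustration only: the class names, checks, and structure are invented for this example and are not the actual Agent Skills API. The point is the shape of the mechanism, composable checks that must pass before the agent may proceed.

```python
# Hypothetical sketch of skill-based gating. Names and structure are
# illustrative assumptions, not the real Agent Skills API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Skill:
    name: str
    # Returns True when the required artifact (spec, tests, review notes) exists.
    check: Callable[[dict], bool]

@dataclass
class SkillGate:
    skills: list[Skill] = field(default_factory=list)

    def missing(self, workspace: dict) -> list[str]:
        # Every required step the agent has not completed yet.
        return [s.name for s in self.skills if not s.check(workspace)]

    def allow_merge(self, workspace: dict) -> bool:
        return not self.missing(workspace)

# Teams compose only the skills their workflow needs (the modularity
# described above): drop the spec check, add an integration-test check, etc.
gate = SkillGate([
    Skill("spec", lambda ws: bool(ws.get("spec"))),
    Skill("tests", lambda ws: len(ws.get("tests", [])) > 0),
])

workspace = {"spec": "design.md", "tests": []}
print(gate.missing(workspace))  # ['tests']: merge stays blocked until tests exist
```

A gate like this sits between the agent and the merge step, which is why "middleware" is a reasonable mental model: the agent's capabilities are unchanged, but the path to "done" now runs through the checklist.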
What It Won't Fix
Agent Skills improves behavior within the bounds of what current models can do. It won't stop hallucinations where an AI invents a library function that doesn't exist. It won't solve context loss when a codebase exceeds token limits. It won't fix reasoning failures where an agent misunderstands requirements.
What it does address is the discipline gap—the difference between an AI that can write code and one that follows the processes that make code maintainable. That's a narrower problem than "make AI better at programming," but it's the problem causing pain for teams using these tools daily.
Think of it as teaching an agent the checklist, not improving its ability to understand each item. The agent still needs to be capable of writing good tests. Agent Skills just ensures it writes them.
Who Built This
Osmani brings credibility from years shipping Chrome features and contributing to tools like Lighthouse and Yeoman. This isn't academic research about how AI should code—it's a framework built from watching these tools work (and fail) in real engineering contexts.
That perspective shows in the design. Agent Skills doesn't try to replace existing assistants or critique their architecture; it layers process discipline on top. The approach has drawn attention on Hacker News from developers hitting the same frustrations with their own AI workflows.
Should You Use It?
Agent Skills makes sense if you're already using AI coding assistants and cringing at the quality gaps. Senior engineers who care about maintainability will appreciate the structure. Teams trying to standardize how AI-generated code integrates with human workflows will find the modularity useful.
It's probably overkill if you're still experimenting with AI tools casually, or if your codebase is small enough that review happens naturally. The overhead of enforcing formal processes only pays off when those processes prevent problems.
The repository includes getting-started guides for common agents and workflows. It's early—expect rough edges and evolving patterns as more teams test what works. But for developers tired of cleaning up after fast-but-sloppy AI code, it's worth the experiment.
addyosmani/agent-skills
Production-grade engineering skills for AI coding agents.