Code review at scale is broken. Here's how we're fixing it.

December 11, 2025

TL;DR:

Today we’re launching Augment Code Review. Built for large, long-lived codebases, it catches correctness, architectural, and cross-system issues that existing tools miss—while dramatically reducing noise. Powered by GPT-5.2, Augment achieved the highest accuracy on the only public benchmark for AI-assisted code review, outperforming Cursor Bugbot, CodeRabbit, and other tools by ~10 points on overall quality. Enterprise teams and OSS maintainers are already using it to complete reviews faster and catch bugs before they reach production.

Augment Code Review Agent is now available for all Augment Code users and is free for all paid plans for a week. Open source projects can request free access to Augment Code Review. To learn more, check out our product page or read the docs.

Code review was already painful. AI made it worse.

AI has radically accelerated code creation: Google reports that more than 25% of its new code is now written by AI; Microsoft reports over 30%. But review capacity in enterprises has not kept pace. That mismatch has become one of the biggest bottlenecks in modern software development.

Every engineering org knows the symptoms:

  • Best practices live in documents, Slack threads, and senior engineers’ heads, but they don’t consistently make it into code
  • Teams wait days for reviews
  • Rushed or shallow reviews become operational risk

The financial consequences are real: the average outage costs ~$300,000 per hour, and in large enterprises this can exceed $1,000,000 per hour. Many of these outages originate in software errors, misconfigurations, or incomplete reviews.

We built Augment Code Review to solve this problem at its root: by enforcing best practices on every PR, catching high-impact issues early, and restoring flow to development teams working in complex systems.

Why existing AI review tools fall short

GitHub Marketplace lists 77+ AI review bots, but they follow the same flawed pattern:

extract the diff → send it to an LLM → generate dozens of shallow, noisy comments

This results in:

  • Low precision: too many irrelevant suggestions
  • Low recall: real bugs missed due to lack of context
  • Shallow reasoning: no understanding of architecture or cross-file behavior

Developers tune them out.

A real review agent must meet a different standard: deep context retrieval, high signal, and comments that meaningfully influence merge decisions.

The Augment approach: signal over noise, context over guesswork

Our philosophy is simple:

If a comment won’t likely change a merge decision, we don’t post it.

1. Focus on correctness and architectural issues

Augment prioritizes bugs, security vulnerabilities, cross-system pitfalls, invariants, change-impact risks, and missing tests—not style nits.

2. Understand your entire codebase

Augment retrieves the full cross-file context required to evaluate correctness in large, long-lived repos: dependency chains, call sites, type definitions, tests, fixtures, and historical changes.

Benchmarking shows competing tools consistently miss this context.
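
To make this concrete, here is a minimal, hypothetical illustration (all names invented; this is not Augment's internals): the PR diff touches only one function, which looks harmless in isolation, but an unchanged call site in another file breaks.

```python
# Hypothetical illustration of a cross-file bug that is invisible in the
# diff alone. Imagine create_charge() is the only function touched by
# the PR, while charge() lives in a different, unchanged file.

from dataclasses import dataclass
from typing import Optional


@dataclass
class Receipt:
    id: str


# --- changed by the PR: adds an early None return ---
def create_charge(customer_id: str, amount_cents: int) -> Optional[Receipt]:
    if amount_cents <= 0:
        return None  # new behavior introduced in this diff
    return Receipt(id=f"ch_{customer_id}_{amount_cents}")


# --- unchanged call site, outside the diff ---
def charge(customer_id: str, amount_cents: int) -> str:
    receipt = create_charge(customer_id, amount_cents)
    return receipt.id  # AttributeError when receipt is None


if __name__ == "__main__":
    try:
        charge("cust_42", 0)
    except AttributeError as exc:
        # Only a reviewer that retrieves the call site can connect the
        # new None return to this crash.
        print(f"cross-file bug: {exc}")
```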

3. Encode your team’s expertise

Teams define custom rules once, and Augment enforces them everywhere.

```yaml
areas:
  payments:
    globs: ["services/payments/*"]
    rules:
      - id: audit_logging_required
        description: All state changes must log to audit system
        severity: high
```
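
For illustration, here's a hypothetical snippet (invented names) of the kind of code a rule like audit_logging_required above should flag: a state change under services/payments/ persisted without any audit logging.

```python
# Hypothetical code under services/payments/ that violates the
# audit_logging_required rule sketched above: the invoice's state
# changes and is persisted, but nothing is written to the audit system.

def mark_invoice_paid(invoice, db):
    invoice.status = "paid"  # state change
    db.save(invoice)         # persisted with no audit-log call
```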

4. Learn what your organization actually values (no configuration sprawl)

Augment adapts based on which comments your developers address or ignore. The result: increasing precision over time.
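
As a rough sketch of the general idea (a simple resolve-rate heuristic we're inventing for illustration, not Augment's actual mechanism): track how often developers act on each category of comment, and stop surfacing categories they consistently ignore.

```python
# Illustrative sketch only: feedback-driven comment filtering based on
# per-category resolve rates. Augment's real adaptation logic is not
# public; this just shows the shape of the idea.

from collections import defaultdict


class FeedbackFilter:
    def __init__(self, min_samples: int = 20, min_resolve_rate: float = 0.3):
        self.posted = defaultdict(int)    # comments posted per category
        self.resolved = defaultdict(int)  # comments acted on per category
        self.min_samples = min_samples
        self.min_resolve_rate = min_resolve_rate

    def record(self, category: str, was_resolved: bool) -> None:
        self.posted[category] += 1
        if was_resolved:
            self.resolved[category] += 1

    def should_post(self, category: str) -> bool:
        n = self.posted[category]
        if n < self.min_samples:
            return True  # not enough signal yet; keep posting
        return self.resolved[category] / n >= self.min_resolve_rate
```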

What customers are seeing

Jawahar Prasad, Senior Director of Engineering at Tekion, reports that Tekion is already seeing meaningful results after rolling Augment Code Review out to his team of 1,400 engineers in October:

  • Average time to merge dropped from 3 days 4 hours to 1 day 7 hours: roughly 60% faster merges after the Augment Code Review rollout
  • Time to first human review fell from 3 days to 1 day, because Augment reduces developers’ cognitive load
  • 21% more merge requests merged with the same number of engineers

Here’s how Tyler Kaye, Lead Engineer, Atlas Clusters at MongoDB describes the impact:

“Augment has become a valuable part of our code review process. It doesn't replace human review; it enhances it by giving authors a thoughtful first pass before their teammates ever see the code. Its custom guideline integration combines MongoDB's best-practice recommendations with our own organization-specific guidance, making the feedback both relevant and actionable. The built-in observability helps us understand how many comments Augment is surfacing and how often they're being resolved, giving us clearer insight into code quality trends. And with its high signal-to-noise ratio, every comment feels meaningful. Augment helps engineers show up to review with cleaner, better-prepared code.”

This is exactly the experience we designed for: a reliable first-pass reviewer that improves code quality and accelerates human review—not a replacement for it.

Benchmarks: Augment leads the field

To validate performance, we evaluated seven widely used AI code review tools using the only public dataset of “golden comments”—ground-truth issues a competent reviewer would catch.

We measured precision (signal), recall (coverage), and F-score (overall quality).
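
For reference, the F-score here is consistent with the standard F1, the harmonic mean of precision and recall:

```python
# F1 = harmonic mean of precision and recall; the reported scores match
# this definition (e.g., Augment's 65% precision and 55% recall).

def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(f"{f1(0.65, 0.55):.1%}")  # 59.6%, reported as 59%
```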

Sorted by F-score:

| Tool | Precision | Recall | F-score |
| --- | --- | --- | --- |
| ⭐ Augment Code Review | 65% | 55% | 59% |
| Cursor Bugbot | 60% | 41% | 49% |
| Greptile | 45% | 45% | 45% |
| Codex Code Review | 68% | 29% | 41% |
| CodeRabbit | 36% | 43% | 39% |
| Claude Code | 23% | 51% | 31% |
| GitHub Copilot | 20% | 34% | 25% |
Figure: precision vs. recall for each tool, with F-score indicated by circle size. Augment Code Review (blue) sits furthest toward the high-precision, high-recall corner.

Augment achieved the highest accuracy, outscoring the next-best tool by ~10 points in overall quality.

Most tools must choose between:

  • High recall but low signal-to-noise ratio (Claude, Greptile)
  • High precision but shallow coverage (Codex, Cursor)

Augment is the only system to maintain both high precision and high recall, because it retrieves the full, correct context needed for deep reasoning.

Pricing built for scale

Augment Code Review is designed to deliver value without breaking your budget:

  • Average cost per PR review: 2,400 credits (~$1.50)
  • Free for open source projects — we believe in supporting the OSS community that powers modern software. To get access, send us a link to your OSS project here.

To put this cost in perspective: a senior engineer reviewing code costs $75–150+ per hour depending on market and seniority, so even a quick 10-minute review runs $12–25 in fully loaded cost (and research from Google says most code reviews take 30 minutes or more). At $1.50 per PR, Augment pays for itself if it saves your team just 90 seconds per review (90 seconds at $100/hour is $2.50), or if it catches a single production bug that would have required a hotfix, rollback, and post-mortem.

The math is straightforward: faster reviews, fewer context switches for your senior engineers, and reduced production incidents at a fraction of the cost of human review time.

Try it

If you're currently an Augment Code customer, you can configure Code Review here: https://app.augmentcode.com/settings/code-review

  • If you are a paying user: you can use Code Review for the next week for free
  • If you are a free trial user: you can use Code Review and it will consume your trial credits
  • If you don't have an Augment account: you can use Code Review by creating an account and activating a trial or getting a paid plan
  • If you maintain an Open Source project: sign up for Augment Code and then request free access for your project

If you’re ready for:

  • higher signal
  • fewer bugs
  • faster reviews
  • and the only reviewer proven to outperform the field

Install Augment Code Review in 3 clicks for GitHub Cloud - read more in the docs to get started today.

Let’s fix code review.

Akshay Utture

Akshay Utture builds intelligent agents that make software development faster, safer, and more reliable. At Augment Code, he leads the engineering behind the company’s AI Code Review agent, bringing research-grade program analysis and modern GenAI techniques together to automate one of the most time-consuming parts of the SDLC. Before Augment, Akshay spent several years at Uber advancing automated code review and repair systems, and conducted research across AWS’s Automated Reasoning Group, Google’s Android Static Analysis team, and UCLA. His work sits at the intersection of AI, software engineering, and programming-language theory.

Siyu Zhan

Engineering Manager

Siyu Zhan is an Engineering Manager at Augment Code, where she leads efforts to automate the software development lifecycle with AI agents. With over a decade of experience building products at the intersection of complex systems and user experience, she brings deep expertise in scaling engineering teams and shipping products that solve real-world problems. Before joining Augment, Siyu spent nearly three years at Stripe leading teams that built Stripe Global Payouts and the Stripe Payments Dashboard, and served as Head of Engineering for Commercialization at Nuro, where she designed and built the entire product stack from the ground up as the company's first product engineer hire. Her career also includes backend engineering at Uber's UberPOOL team and financial software development at Bloomberg LP.
