Elle Shwer
Senior Product Manager, GitHub
When GitHub first shipped the pull request (PR) back in 2008, it wrapped a plain-text diff in a social workflow: comments, approvals, and a merge button that crucially refused to light up without at least one thumbs up from another developer. That design decision hard-wired accountability into modern software and let maintainers scale far beyond hallway conversations or e-mail patches.
Seventeen years later, just about every “agentic” coding tool, from research demos to enterprise platforms, still funnels its work through that same merge gate. The PR remains the audit log, the governance layer, and the social contract that says nothing ships until a person is willing to own it.
Now that large language models (LLMs) can scaffold projects, file PRs, and even reply to review comments they wrote themselves, the obvious next question is: who is accountable for code that ships when part of it comes from a model?
At GitHub, we think the answer hasn’t fundamentally changed: it’s the developer who hits “Merge.” But what has changed is everything that happens before that click.
In this article, we’ll explore how we’re re-thinking code reviews for a world where developers increasingly work with AI (and how your team can, too).
Earlier this year, the GitHub Copilot code review team conducted in-depth interviews with developers about their code review process and watched them walk through their workflows. These interviews revealed three consistent patterns:
An overarching principle quickly became clear: AI augments developer judgment; it can’t replace it. And our findings, from confidence scores to red-flag explanations, are informing how we’re building Copilot’s code review features.
LLMs are already great at the “grind” layer of a review:
Soon they’ll be able to do even more, such as understand product and domain context. But they still fall short on:
Those gaps keep developers in the loop and in the pilot’s seat. That principle is foundational for us as we continue to develop GitHub Copilot.
The most effective approach to AI-assisted code reviews starts before you even submit your pull request. Think of it as the golden rule of development: Treat code reviewers the way you’d like them to treat you.
Before pushing your code, run GitHub Copilot code review in your IDE to catch the obvious stuff so your teammates can focus on the nuanced issues that require developer insight. Copilot code review can comb your staged diff, suggest docstrings, and flag null dereferences. From there, you can fix everything it finds before you submit your PR so teammates never see the noise.
Just because you used AI to generate code doesn’t mean it’s not your code. Once you commit code, you’re responsible for it. That means understanding what it does, ensuring it follows your team’s standards, and making sure it integrates well with the rest of your codebase.
If an AI agent writes code, it’s on me to clean it up before my name shows up in git blame.
Jon Wiggins, Machine Learning Engineer at Respondology
Your pipeline should already be running unit tests, secret scanning, CodeQL, dependency checks, and style linters. Keep doing that. Fail fast, fail loudly.
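As one concrete illustration, a GitHub Actions workflow can enforce that fail-fast gate on every pull request. The job names, tool choices, and `make` targets below are examples, not prescriptions; substitute whatever test runner and linter your project already uses:

```yaml
# Illustrative PR gate: every job must pass before human review begins.
name: pr-checks
on: pull_request

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run unit tests
        run: make test   # placeholder; point this at your test runner

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Style lint
        run: make lint   # placeholder; point this at your linter

  codeql:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: javascript   # set to your codebase's languages
      - uses: github/codeql-action/analyze@v3

  dependency-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/dependency-review-action@v4
```

With required status checks enabled on the branch, the merge button stays gray until every job is green, so reviewers only ever look at PRs that have already cleared the routine checks.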
The real power of AI in code reviews isn’t in replacing developers as the reviewers. It’s in handling the routine work that can bog down the review process, freeing developers to focus where their judgment is most valuable.
Make sure tests pass, coverage metrics are met, and static analysis tools have done their work before developer reviews begin. This creates a solid foundation for more meaningful discussion.
You can use an LLM to catch not just syntax issues, but also patterns, potential bugs, and style inconsistencies. Ironically, LLMs are particularly good at catching the sorts of mistakes that LLMs make, which is increasingly relevant as more AI-generated code enters our codebases.
Set clear expectations about when AI feedback should be considered and when human judgment takes precedence. For example, rely on other developers for code architecture and for consistency with business goals and organizational values, and lean on AI for long, repetitive PRs, where it's easy for a human reviewer to miss small issues.
While AI can handle much of the routine work in code reviews, developer judgment remains irreplaceable for architectural decisions, mentoring and knowledge transfer, and context-specific decisions that require understanding of your product and users.
And even as LLMs get smarter, three review tasks remain stubbornly human:
The goal is to make developers more effective by letting them focus on what they do best.