Elle Shwer
Senior Product Manager, GitHub
When GitHub first shipped the pull request (PR) back in 2008, it wrapped a plain-text diff in a social workflow: comments, approvals, and a merge button that crucially refused to light up without at least one thumbs up from another developer. That design decision hard-wired accountability into modern software and let maintainers scale far beyond hallway conversations or e-mail patches.
Seventeen years later, just about every “agentic” coding tool, from research demos to enterprise platforms, still funnels its work through that same merge gate. The PR remains the audit log, the governance layer, and the social contract that says nothing ships until a person is willing to own it.
Now that large language models (LLMs) can scaffold projects, file PRs, and even reply to review comments they wrote themselves, the obvious next question is: who is accountable for code that ships when part of it comes from a model?
At GitHub, we think the answer hasn’t fundamentally changed: it’s the developer who hits “Merge.” But what has changed is everything that happens before that click.
In this article, we’ll explore how we’re re-thinking code reviews for a world where developers increasingly work with AI (and how your team can, too).
Earlier this year, the GitHub Copilot code review team conducted in-depth interviews with developers about their code review process and watched them walk through their workflows. These interviews revealed three consistent patterns:
An overarching principle quickly became clear: AI augments developer judgment; it can’t replace it. And our findings, from confidence scores to red-flag explanations, are informing how we’re building Copilot’s code review features.
LLMs are already great at the “grind” layer of a review:
Soon they’ll be able to do even more, such as understand product and domain context. But they still fall short on:
Those gaps keep developers in the loop and in the pilot’s seat. That principle is foundational for us as we continue to develop GitHub Copilot.
The most effective approach to AI-assisted code reviews starts before you even submit your pull request. Think of it as the golden rule of development: Treat code reviewers the way you’d like them to treat you.
Before pushing your code, run GitHub Copilot code review in your IDE to catch the obvious stuff so your teammates can focus on the nuanced issues that require developer insight. Copilot code review can comb your staged diff, suggest docstrings, and flag null dereferences. From there, you can fix everything it finds before you submit your PR so teammates never see the noise.
Just because you used AI to generate code doesn’t mean it’s not your code. Once you commit code, you’re responsible for it. That means understanding what it does, ensuring it follows your team’s standards, and making sure it integrates well with the rest of your codebase.
If an AI agent writes code, it’s on me to clean it up before my name shows up in git blame.
Jon Wiggins, Machine Learning Engineer at Respondology
Your pipeline should already be running unit tests, secret scanning, CodeQL, dependency checks, and style linters. Keep doing that. Fail fast, fail loudly.
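As one concrete illustration, a GitHub Actions workflow can enforce that fail-fast gate on every pull request. The job names, tool choices, and `make` targets below are examples, not prescriptions; substitute whatever test runner and linter your project already uses:

```yaml
# Illustrative PR gate: every job must pass before human review begins.
name: pr-checks
on: pull_request

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run unit tests
        run: make test   # placeholder; point this at your test runner

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Style lint
        run: make lint   # placeholder; point this at your linter

  codeql:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: javascript   # set to your codebase's languages
      - uses: github/codeql-action/analyze@v3

  dependency-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/dependency-review-action@v4
```

With required status checks enabled on the branch, the merge button stays gray until every job is green, so reviewers only ever look at PRs that have already cleared the routine checks.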
The real power of AI in code reviews isn’t in replacing developers as the reviewers. It’s in handling the routine work that can bog down the review process, freeing developers to focus where their judgment is most valuable.
Make sure tests pass, coverage metrics are met, and static analysis tools have done their work before developer reviews begin. This creates a solid foundation for more meaningful discussion.
You can use an LLM to catch not just syntax issues, but also patterns, potential bugs, and style inconsistencies. Ironically, LLMs are particularly good at catching the sorts of mistakes that LLMs make, which is increasingly relevant as more AI-generated code enters our codebases.
Set clear expectations about when AI feedback should be considered and when human judgment takes precedence. For example, rely on other developers for code architecture and for consistency with business goals and organizational values, and lean on AI for long, repetitive PRs, where it's easy for a human reviewer to miss small issues.
While AI can handle much of the routine work in code reviews, developer judgment remains irreplaceable for architectural decisions, mentoring and knowledge transfer, and context-specific decisions that require understanding of your product and users.
And even as LLMs get smarter, three review tasks remain stubbornly human:
The goal is to make developers more effective by letting them focus on what they do best.