By Matt Foster
Agentic AI is beginning to reshape malware detection and broader security operations. These systems are being used not to replace humans, but to take on the lower-value work that has historically tied up analysts — from triaging alerts to reverse-engineering suspicious files.
Microsoft's Project Ire is a recent high-profile example. The agent autonomously reverse-engineers software and recently produced the first AI-authored "conviction" strong enough for Windows Defender to block an advanced persistent threat (APT).
In controlled tests, it achieved 0.98 precision and 0.83 recall on Windows driver datasets, and about 0.89 precision on Defender telemetry, though recall dropped to roughly 0.25. That trade-off makes Ire best suited to triage, where minimizing false positives can be more valuable than catching every malicious file.
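To make that trade-off concrete, here is a minimal sketch of how precision and recall are computed from confusion-matrix counts; the counts below are invented for illustration and are not drawn from Microsoft's evaluation.

```python
# Illustrative only: hypothetical confusion-matrix counts, not Project Ire's data.
# Precision = TP / (TP + FP); recall = TP / (TP + FN).

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) for a detector's verdicts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# A detector that flags 100 files, 89 of them truly malicious, while missing 267 others,
# is very trustworthy when it raises its hand but catches only a quarter of the threats.
p, r = precision_recall(tp=89, fp=11, fn=267)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.89 recall=0.25
```

In a triage setting, that profile is useful: the few files the agent flags can be blocked or escalated with high confidence, while everything else still flows through existing detection layers.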
The company has also previewed a Phishing Triage Agent that processes user-reported emails and generates natural-language rationales for security teams.
Other vendors are moving in the same direction. CrowdStrike has embedded Charlotte AI into its Falcon platform, enabling automated triage with contextual explanations. ReliaQuest's GreyMatter platform incorporates agentic AI to automate elements of detection, investigation, and response across integrated security tools.
Research groups are also contributing. Google’s Big Sleep agent uncovered a critical SQLite vulnerability (CVE-2025-6965), while its Sec-Gemini model is enhancing forensic workflows for threat and root cause analysis.
Although they target different problems, from malware analysis to phishing detection to forensic workflows, these systems share a similar design philosophy. They don’t just produce a classification; they generate outputs structured for analyst review, making transparency a central part of their operation.
A defining feature is the evidence chains they produce. Rather than issuing a simple yes-or-no verdict, these agents generate supporting artifacts such as summaries, rationales, and structured reports that analysts can review.
Figure 1: Project Ire Evidence Chain
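As an illustration of the idea, the sketch below shows how an evidence-chain record might be represented in code; the schema and field names are hypothetical and are not taken from Project Ire's actual output format.

```python
# Hypothetical evidence-chain schema, for illustration only.
from dataclasses import dataclass, field

@dataclass
class EvidenceItem:
    source: str    # analysis pass that produced the finding (e.g. static analysis, sandbox run)
    finding: str   # human-readable observation
    artifact: str  # pointer to the supporting artifact an analyst can open

@dataclass
class EvidenceChain:
    sample_id: str
    verdict: str                  # e.g. "malicious", "benign", "inconclusive"
    rationale: str                # natural-language summary for analyst review
    evidence: list[EvidenceItem] = field(default_factory=list)

chain = EvidenceChain(
    sample_id="example-driver.sys",
    verdict="malicious",
    rationale="Registers a process-creation callback and attempts to terminate security tooling.",
    evidence=[
        EvidenceItem(
            source="static analysis",
            finding="Imports process-termination APIs and resolves the names of EDR processes",
            artifact="report://functions/driver_entry",
        ),
    ],
)
```

The point is less the particular fields than the shape: a verdict, a rationale an analyst can read, and links back to the evidence that supports it.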
Adoption is already underway. A July survey cited by ISC² found that 30% of security teams have integrated agentic AI into their operations, mostly for tier-one and tier-two triage, and another 42% are evaluating adoption.
That adoption is reflected in industry reporting. Coverage in Forbes and Axios suggests enterprises are using their newly deployed agents to prioritize alerts and free analysts for higher-value work.
For practitioners, the value of these tools lies in how they augment existing pipelines: helping reduce alert fatigue and accelerate investigations while leaving judgment and high-risk decisions with human experts.
The push toward automation also reflects necessity. Security teams face relentless growth in alert volume and a chronic shortage of skilled analysts, making automation of lower-value tasks an operational priority.
But enthusiasm is tempered by risks. Recent analysis of large language models in security contexts warns of hallucinations, limited contextual awareness, and reasoning weaknesses, all of which can translate into lower recall when these models are deployed in live detection pipelines.
In addition, industry experts warn that high-precision agentic systems, if deployed without oversight, risk creating new blind spots.
Taken together, these dynamics suggest a future where agentic AI becomes a standard augmentation layer in security pipelines: scaling analysis, improving consistency, and handling repetitive work, while humans retain responsibility for oversight and high-risk decisions.