Can we enhance bug detection in software by leveraging insights from the last two decades of Linux kernel commits through artificial intelligence?
In a recent blog post, security researcher Jenny Guanni Qu examined 125,183 bug fixes in the Linux kernel, tracing back to April 2005, the month Git was introduced.
Using this extensive research, Qu has begun to prototype an AI-driven tool aimed at predicting which new commits are most likely to introduce defects.
According to Qu, “We plan to publish the trained model and inference code once the validation experiments are concluded, likely within the next few weeks.”
To gain insights, The New Stack spoke with Linux kernel developer Greg Kroah-Hartman, who reflected on the existing measures Linux employs to identify and address bugs, as well as how developers fine-tune their patch processes.
This demonstrates the developer community's commitment to driving out bugs by investing significant time in building effective tools to spot them before they escalate. However, the question remains: can we find new methodologies for tackling this issue?
Driven by a dedicated and expanding community reviewing the Linux kernel’s code, one researcher launched her own research and development project to potentially discover more effective strategies.
Insights Gleaned from Linux Kernel Bug Research
Qu spent six hours mining the kernel’s two-decade commit history on Git. (She published the dataset on GitHub and HuggingFace under the MIT license.) Her initial conclusion? Contemporary bugs have a hard time remaining hidden from Linux kernel developers.
“We’re discovering new bugs more swiftly and are gradually addressing about 5,400 long-standing bugs that have been concealed for over five years,” she noted.
Qu’s findings reveal that by 2022, 69% of bugs were identified within one year, up from 0% in 2010. She believes this shows “real progress due to improved tools.” By 2025, the average lifespan of a bug (from the commit that introduced it to the fix) was a mere 0.7 years, with only 13.5% of bugs lingering for more than five years.
This can be partly attributed to an increase in contributors reviewing the code, as Qu observes. Furthermore, she credits the growing adoption of high-quality testing tools over the years, which include:
- The Syzkaller fuzzer (introduced in 2015)
- Dynamic memory error detectors like KASAN, KMSAN, and KCSAN
- Enhanced static analysis
Interestingly, only 158 of the 125,183 bugs examined carried a CVE, roughly 0.13%.
Qu mentioned a crucial caveat: “Git indicates 448,000 commits for the Linux kernel that mention some kind of ‘fix.’ However, only 28% of this vast collection utilizes the Fixes: tag that I employed for my analysis. Hence, my dataset encompasses well-documented bugs, meaning those where maintainers have traced the root cause.”
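Qu has not released her mining code, but the methodology she describes can be reproduced from any kernel checkout. The sketch below is a minimal illustration of that idea, not her implementation (the function names, the record separators, and the 365.25-day year are my own assumptions): walk the Git log, pull out `Fixes:` trailers, and subtract the introducing commit’s author date from the fix commit’s date.

```python
#!/usr/bin/env python3
"""Rough sketch of the Fixes:-tag methodology described above.

Not Qu's published code: for every commit whose message carries a
"Fixes: <sha> (...)" trailer, estimate the bug's lifespan by subtracting
the author date of the introducing commit from the date of the fix.
Run from inside a Linux kernel checkout.
"""
import re
import subprocess
from datetime import datetime, timezone

FIXES_RE = re.compile(r"^Fixes:\s+([0-9a-f]{8,40})\b", re.MULTILINE)


def git(*args: str) -> str:
    """Run a git command in the current repository and return stdout."""
    return subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    ).stdout


def commit_date(sha: str) -> datetime:
    """Author date of a commit as a timezone-aware datetime."""
    ts = git("show", "-s", "--format=%at", sha).strip().splitlines()[0]
    return datetime.fromtimestamp(int(ts), tz=timezone.utc)


def bug_lifespans(limit: int = 1000):
    """Yield (fix_sha, introducing_sha, lifespan_in_years) tuples."""
    # %H = full hash, %B = raw body; \x1f and \x1e separate fields/records.
    log = git("log", f"-{limit}", "--format=%H%x1f%B%x1e")
    for record in log.split("\x1e"):
        if "\x1f" not in record:
            continue
        fix_sha, body = record.split("\x1f", 1)
        fix_sha = fix_sha.strip()
        for match in FIXES_RE.finditer(body):
            intro_sha = match.group(1)
            try:
                delta = commit_date(fix_sha) - commit_date(intro_sha)
            except subprocess.CalledProcessError:
                continue  # unresolvable or ambiguous SHA in the trailer
            yield fix_sha, intro_sha, delta.days / 365.25


if __name__ == "__main__":
    for fix, intro, years in bug_lifespans(limit=200):
        print(f"{fix[:12]} fixes {intro[:12]}  lived ~{years:.1f} years")
```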
Nevertheless, Qu discerned a clear trend. “I found that security bugs remain unnoticed for an average of 2.1 years before they are detected,” she shared with The New Stack. Some linger for over 20 years, indicating a pattern-recognition problem rather than a tooling problem.
Could these revelations pave the way for a novel approach to bug detection?
The Innovator Behind the AI Bug Detector
Qu conducted her research at the VC firm Pebblebed, which has invested in various tech startups, including the generative AI image platform Krea. Their mission, as stated on their website, is to support foundational advancements.
Qu describes Pebblebed as “technical investors supporting technical founders,” and the firm offers a “somewhat unstructured” residency geared toward research that may serve as the basis for future companies. Her focus was autonomous vulnerability discovery. Their site states, “we empower researchers to explore ideas that may initially appear unconventional.” Qu was well-suited for the role, having previously trained AI to solve mathematical problems at Caltech and having been recognized as one of the world’s top competitive hackers.
Her team, SuperDiceCode, secured third place at DEF CON CTF 2025. “I have participated in CTFs competitively for years,” Qu remarked, crediting that real-world experience with informing her investigation. “The same types of vulnerabilities have persisted within the kernel across decades, such as use-after-frees, race conditions, and missing bounds checks. I aimed to analyze why these bugs continue to surface and if we could catch them earlier.”
Qu’s academic journey included studying reinforcement learning for mathematical AGI at Caltech, alongside exploring mathematics, physics, and computer science at UC Davis (as detailed on a homepage that mimics a Linux command line). As Qu succinctly states:
“Pebblebed provided the funding to develop AI that identifies zero-day vulnerabilities first, and I am fully committed.”
Her research revealed that the most elusive bugs were race conditions, which took an average of 5.1 years to detect (median: 2.6 years). “They are non-deterministic and often only manifest under specific timing conditions, possibly occurring once in a million executions. Even tools like KCSAN can only highlight races they observe.”

Qu believes that many other bugs are challenging to detect because modern fuzzing tools miss them.
Her background proved advantageous in her efforts. “The distinction between a proficient hacker and a skilled programmer largely lies in exposure. Hackers have encountered thousands of vulnerable programs, affording them a sense of intuition when assessing code. This pattern recognition on vast datasets aligns perfectly with machine learning capabilities.”
This inspired Qu to turn her observations into a tangible tool.
Creating the AI-Enhanced Bug Prediction Tool
Qu’s research began by identifying patterns among long-standing bugs, such as reference-counting errors, missing NULL checks before a pointer is dereferenced, or integer overflows in size calculations. She then developed a tool that analyzes code before and after a fix, employing both neural pattern recognition and specific “handcrafted” checks.
These handcrafted checks use regular expressions and an AST-like analysis to identify 51 potentially problematic code structures, ranging from error handling to memory allocation. The tool looks for telltale patterns, such as unbalanced reference counts (which suggest a potential memory leak) or allocations lacking a NULL check. Qu states, “Neither neural networks nor handcrafted rules alone yield the best outcomes; a combination of the two does.”
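To make the idea concrete, here is a minimal sketch of what a “handcrafted” check over a patch can look like. The two rules below are toy examples of my own, not any of the 51 checks in Qu’s tool, and the regexes are deliberately crude.

```python
"""Illustrative sketch of handcrafted diff checks (not Qu's actual rules).

One rule flags newly added kmalloc()/kzalloc() allocations whose result is
never compared against NULL in the same hunk; the other flags hunks where
*_get() calls outnumber their matching *_put() calls.
"""
import re

ALLOC_RE = re.compile(r"(\w+)\s*=\s*k[mz]alloc\s*\(")
NULL_CHECK_TEMPLATE = r"(?:!\s*{var}\b|{var}\s*==\s*NULL)"
GET_RE = re.compile(r"\b(\w+)_get\s*\(")
PUT_RE = re.compile(r"\b(\w+)_put\s*\(")


def added_lines(diff: str) -> list[str]:
    """Lines added by a unified diff, with the leading '+' stripped."""
    return [l[1:] for l in diff.splitlines()
            if l.startswith("+") and not l.startswith("+++")]


def check_unchecked_alloc(diff: str) -> list[str]:
    """Flag added allocations whose result is never NULL-checked."""
    added = "\n".join(added_lines(diff))
    findings = []
    for var in ALLOC_RE.findall(added):
        pattern = NULL_CHECK_TEMPLATE.format(var=re.escape(var))
        if not re.search(pattern, added):
            findings.append(f"allocation into '{var}' has no NULL check")
    return findings


def check_unbalanced_refcount(diff: str) -> list[str]:
    """Flag hunks where *_get() calls outnumber matching *_put() calls."""
    added = "\n".join(added_lines(diff))
    gets, puts = GET_RE.findall(added), PUT_RE.findall(added)
    return [f"'{name}_get' without matching '{name}_put'"
            for name in set(gets) if gets.count(name) > puts.count(name)]


if __name__ == "__main__":
    sample = """\
+       buf = kmalloc(len, GFP_KERNEL);
+       memcpy(buf, src, len);
+       dev = of_node_get(np);
"""
    print(check_unchecked_alloc(sample) + check_unbalanced_refcount(sample))
```

A production version would presumably lean on the AST-like analysis Qu mentions rather than line-level regexes, which is why she pairs these rules with a neural model instead of relying on them alone.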
Qu was impressed with the tool’s performance, which she sums up in three figures (unpacked in the sketch after this list):
- “Only 1.2% of safe commits are incorrectly flagged.”
- “98.7% of commits identified as risky truly are.”
- “We detect 92.2% of actual bug-introducing commits.”
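Read as standard classification metrics, those three figures correspond to the false positive rate, the precision, and the recall of the classifier. The snippet below shows how they relate; the counts are hypothetical, chosen only to land near the quoted numbers, and this is my reading of the blog post rather than Qu’s published evaluation code.

```python
"""How the three headline numbers relate, read as classification metrics
(an interpretation of the blog post, not Qu's evaluation code)."""


def triage_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """False positive rate, precision and recall from a confusion matrix."""
    return {
        "false_positive_rate": fp / (fp + tn),  # safe commits incorrectly flagged
        "precision": tp / (tp + fp),            # flagged commits that truly are risky
        "recall": tp / (tp + fn),               # bug-introducing commits detected
    }


# Hypothetical counts picked only to reproduce figures close to those quoted:
# 1.2% false positive rate, 98.7% precision, 92.2% recall.
print(triage_metrics(tp=922, fp=12, fn=78, tn=988))
```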
However, Qu acknowledges some limitations within her dataset. (For instance, the model was trained exclusively on the 28% of fix commits carrying a Fixes: tag, meaning the well-documented bugs, which are generally more critical.) Additionally, the model was trained only on bugs that had already been discovered, which limits its effectiveness against genuinely novel patterns.
In her blog post, Qu emphasizes that her tool, VulnBERT, serves as “a triage tool, not an infallible solution. It captures 92% of bugs with recognizable patterns, but the remaining 8% and novel bug types still require human review and fuzzing.” She believes the data supports the tool’s readiness for production use.
Insights from Linux Kernel Developers on AI-Driven Bug Tools
Qu expressed her hope to develop an agent trained through reinforcement learning to explore code paths and identify bugs. (Moreover, if a fuzzer like Syzkaller discovers a crash within a flagged code path, that could be integrated as a positive indicator.)
“The objective is not to replace human reviewers, but to direct them to the 10% of commits most likely to present issues, optimizing their focus where it matters most.”
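In practice, that triage step reduces to scoring each incoming commit and surfacing only the top decile for human review. The sketch below is a stand-in to show the workflow: the scoring heuristic is invented for illustration, since the actual model has not yet been released.

```python
"""Minimal sketch of risk-based commit triage (the scoring is a placeholder)."""


def score_commit(diff: str) -> float:
    """Toy risk score in [0, 1]; a trained model would replace this stub."""
    risky_tokens = ("kmalloc", "kfree", "refcount", "spin_lock")
    hits = sum(diff.count(tok) for tok in risky_tokens)
    return min(1.0, 0.1 * hits + min(len(diff), 5000) / 50000)


def top_decile(commits: dict[str, str]) -> list[tuple[str, float]]:
    """Return the ~10% of commits most likely to need close human review."""
    ranked = sorted(((sha, score_commit(diff)) for sha, diff in commits.items()),
                    key=lambda pair: pair[1], reverse=True)
    cutoff = max(1, len(ranked) // 10)
    return ranked[:cutoff]


if __name__ == "__main__":
    queue = {
        "abc123": "+ p = kmalloc(sz, GFP_KERNEL);\n",
        "def456": "+ /* comment tweak */\n",
    }
    print(top_decile(queue))  # the allocation-touching commit surfaces first
```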
Since her blog post was published, Qu told The New Stack, “Several kernel developers have reached out,” which she finds encouraging. There has also been notable interest in security research circles.
However, Linux kernel developer Greg Kroah-Hartman has noted similar investigations conducted in the past. “Numerous researchers have explored our commit history over the years, producing several research papers,” he shared with The New Stack in an email. “Our abundance of public data provides insights that traditional closed-source operating systems lack.”
While he appreciated Qu’s blog post as an interesting report, Kroah-Hartman believes, “we have been conducting this type of analysis for over a decade.” He referenced annual presentations by Linux kernel security engineer Kees Cook and numerous reports by Jon Corbet on lwn.net, which cover similar topics.
While Qu’s blog post details the prototype of a new tool, Kroah-Hartman emphasizes that there is already a vigilant system in place to monitor commits that might introduce new bugs. “We employ tools that assess kernel patch submissions before acceptance to identify potential issues.”
“Running those checks on previously committed code is also beneficial, BUT numerous individuals are already actively doing that within our codebase. Static analysis checkers have long been an integral part of our process.”
Moreover, “We have been utilizing LLM tools on our commits for over a decade, generating several papers and presentations regarding the tools and procedures we implement. This isn’t a novel approach; it’s how we intelligently backport patches to older kernel trees.”
Nonetheless, Kroah-Hartman reiterated their openness to receiving new bug reports.

Ultimately, his conclusion? “The report was intriguing, but we possess more refined data already, enabled by our public tools for CVE reporting (and the database it generates). Regular SQL queries suffice without needing to involve any AI tools.”
Nonetheless, Qu’s exploration of the tool’s potential is ongoing. “I will be presenting at BugBash 2026 in April and am eager to connect with additional members of the kernel security community there.”
“The real challenge will be prospective validation: can we detect vulnerabilities in new commits before they are uncovered?”