AI Tool Evaluates Threat from New Bacteria Outbreaks

Researchers have developed an AI tool that predicts whether unfamiliar bacteria carry genetic traits associated with disease. The approach could strengthen pandemic preparedness by flagging potentially harmful bacteria before they cause illness in humans.

The tool, named PathogenFinder2, was developed by a team at DTU in Denmark in collaboration with international researchers. The findings have been published in Bioinformatics, a leading journal in bioinformatics and computational biology.

“The goal of PathogenFinder2 goes beyond simply analyzing known disease-causing bacteria. It assesses the potential danger presented by new bacterial strains even before they trigger the first case, which offers authorities a proactive instead of reactive stance in outbreak prevention.”

Professor Frank Møller Aarestrup, Head of the Research Group for Genomic Epidemiology at the DTU National Food Institute

The tool is part of the Global Pathogen Analysis Platform (GPAP) and is publicly available as a free online resource.

“PathogenFinder2 can analyze samples from sewage, healthy individuals and animals, identifying bacteria with pathogenic traits before they can cause illness. This early detection is vital for the development of diagnostics, vaccines, and treatments,” explains researcher Alfred Ferrer Florensa, who focused his PhD project on PathogenFinder2 at the DTU National Food Institute.

Challenges in Identifying Risky Bacteria

Most bacteria in our environment are harmless or even beneficial, supporting digestion, protecting the skin, or aiding food production, but a small percentage can cause serious infections.

Factors such as climate change, expanding habitats, and the exploration of microbial diversity have led to the discovery of more bacterial species than ever before, many of which remain undocumented. Consequently, evaluating the potential risks associated with these species has become increasingly complex.

Traditionally, determining a bacterium’s ability to cause disease has required time-consuming, costly laboratory experiments, which can also yield inconsistent results. While computational strategies have expedited this process, they typically depend on comparisons with known pathogens—a method that falters when there are no close relatives available.
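To make the limitation of comparison-based methods concrete, here is a minimal sketch of a nearest-relative classifier. It is not taken from any tool mentioned in the article: the function names, the 3-mer Jaccard measure, the 0.3 threshold, and the toy sequences are all assumptions chosen for illustration. The point is only that when nothing in the reference set is close, such a method has no basis for a prediction.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_by_similarity(query, references, k=3, min_similarity=0.3):
    """Label a query by its most similar reference, or 'unknown' if none is close.

    The threshold is an arbitrary illustrative value, not a published cutoff.
    """
    query_kmers = kmers(query, k)
    best_label, best_score = "unknown", 0.0
    for ref_seq, label in references:
        score = jaccard(query_kmers, kmers(ref_seq, k))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return "unknown", best_score
    return best_label, best_score


# Toy reference set: made-up sequences, not real genomes.
REFERENCES = [
    ("ATGGCGTACGTTAGC", "pathogenic"),
    ("ATGCCCTTTAAAGGG", "non-pathogenic"),
]

# A query that differs from a known reference by one base is labeled confidently.
close_relative = classify_by_similarity("ATGGCGTACGTTAGG", REFERENCES)

# A query with no close relative falls below the threshold and stays unlabeled.
novel_organism = classify_by_similarity("TTTTCACACACAGAG", REFERENCES)

print(close_relative[0], novel_organism[0])
```

In this toy setup the near-identical query inherits its neighbor's label, while the unrelated query is returned as unknown, which is the gap that methods not tied to close relatives aim to fill.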

“It was crucial not only to make accurate predictions about bacterial threats similar to known pathogens but also to be prepared for entirely new disease-causing bacteria that may arise,” states Alfred Ferrer Florensa.

Innovative Approaches of PathogenFinder2

PathogenFinder2 takes a different approach. Instead of relying solely on similarity to known species, the AI model employs protein language models: AI systems trained on millions of protein sequences. Much as text prediction tools learn patterns in human language, these models learn the “language” of proteins, which lets them pick up biochemical cues that other methods may overlook.
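The general idea of scoring a whole genome from its proteins can be sketched as follows. This is not PathogenFinder2's architecture, which the article does not describe in detail. As a loud assumption, `embed_protein` replaces a real protein language model with a simple amino-acid frequency vector, and the mean pooling, the weights, and the protein sequences are hypothetical placeholders.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed_protein(sequence):
    """Stand-in for a protein language model: an amino-acid frequency vector.

    A real model would return a learned, context-aware embedding instead.
    """
    total = len(sequence) or 1
    return [sequence.count(aa) / total for aa in AMINO_ACIDS]


def embed_genome(proteins):
    """Mean-pool the per-protein vectors into one fixed-length genome vector."""
    vectors = [embed_protein(p) for p in proteins]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pathogenicity_score(genome_vector, weights, bias=0.0):
    """Logistic score in (0, 1) from a linear layer over the genome vector."""
    z = sum(w * x for w, x in zip(weights, genome_vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs: made-up protein sequences and arbitrary, untrained weights.
proteins = ["MKKLLAVAG", "MSTNPKPQR", "MAHHWKKEE"]
weights = [0.5] * len(AMINO_ACIDS)

genome_vector = embed_genome(proteins)
score = pathogenicity_score(genome_vector, weights)
print(f"genome vector length: {len(genome_vector)}, score: {score:.3f}")
```

Because every genome is reduced to a vector of the same length regardless of how many proteins it encodes, a model built this way does not need a close relative in a reference database to produce a score.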

“PathogenFinder2 is among the first models to analyze complete bacterial genomes by harnessing the enormous capabilities of language models. Its performance surpasses all previous models, especially when evaluating newly encountered bacterial species. Moreover, it offers explanations for its predictions,” adds Alfred Ferrer Florensa.

The researchers highlight that while the model can reveal intriguing patterns and potential risks, further investigation is necessary before drawing definitive conclusions.

Gaining Insight into Risk Assessment

PathogenFinder2 not only generates predictions but also emphasizes the specific proteins that heavily influence its assessments.

These proteins may include known virulence factors, like toxins or structures that facilitate bacterial attachment to human cells, as well as entirely uncharacterized proteins that might contribute to disease.

This aspect of interpretability opens new pathways for research related to diagnostics, vaccine targets, and infection mechanisms, including proteins previously unassociated with disease.
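One way a model can point to influential proteins is to assign each protein a normalized weight and report the highest-weighted ones. The sketch below illustrates that pattern only. The article does not say how PathogenFinder2 computes its explanations, so the softmax weighting, the function names, the protein identifiers, and the raw scores are all assumptions.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_proteins(protein_scores, top_n=3):
    """Return the top_n (protein_id, weight) pairs, highest weight first."""
    ids = list(protein_scores)
    weights = softmax([protein_scores[i] for i in ids])
    ranked = sorted(zip(ids, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Hypothetical per-protein relevance scores, as a trained model might emit them.
raw_scores = {
    "toxin_like_protein": 3.2,
    "adhesin_like_protein": 2.1,
    "ribosomal_protein": -0.5,
    "uncharacterized_protein_17": 2.8,
}

top = rank_proteins(raw_scores, top_n=2)
for protein_id, weight in top:
    print(f"{protein_id}: {weight:.2f}")
```

In this toy example an uncharacterized protein ranks just behind a toxin-like one, which is the kind of lead the article describes: a protein not previously associated with disease that may merit laboratory follow-up.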

A Comprehensive Map of Bacterial Disease Potential

The use of protein language models for whole-genome representation allowed researchers to create the first-ever Bacterial Pathogenic Capacity Landscape. This map illustrates how thousands of bacteria relate to each other regarding their disease-related features.

This landscape uncovers clusters of bacteria that infect similar tissues or implement shared metabolic strategies, offering fresh insights into microbial evolution and interactions.

“The Bacterial Pathogenic Capacity Landscape presents the first comprehensive overview of all disease-causing bacteria that humans can contract. It reveals patterns, such as which bacteria tend to infect the same body sites or may depend on similar nutrient sources. This insight provides new opportunities for exploring bacterial evolution and interactions,” notes Alfred Ferrer Florensa.

Trained on an Extensive Dataset

The researchers have compiled the most extensive dataset to date of bacterial genomes linked either to known disease-causing traits or to recognized non-pathogenic behavior.

This dataset encompasses over 21,000 bacterial genomes sourced from international databases, featuring bacteria isolated from human infections, the healthy human microbiome, probiotic cultures, food production processes, and environments with extreme temperatures.

This extensive dataset provides the model with a solid foundation to differentiate between harmful and non-harmful bacteria, even when encountering previously unrecorded species.

Journal reference:

Florensa, A. F., et al. (2026). Whole-genome prediction of bacterial pathogenic capacity on novel bacteria using protein language models with PathogenFinder2. Bioinformatics. DOI: 10.1093/bioinformatics/btag129. https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btag129/8532520
