2025-07-14 11:34:48 · English original

The Helicobacter pylori AI-clinician harnesses artificial intelligence to personalise H. pylori treatment recommendations

Author: Veselkov, Kirill

Introduction

Helicobacter pylori (H. pylori) is a Gram-negative, S-shaped bacterium that has adapted to colonize the niche of the deep gastric mucous layer in the human stomach¹. H. pylori infection in the gastric mucosa leads to a diverse inflammatory response in local epithelial cells, resulting in chronic active gastritis (Fig. 1a). Although the host produces antibodies to H. pylori antigens, this immune response is generally incapable of eradicating the bacteria. Over decades, this inflammation is thought to lead to a variety of conditions, most notably peptic ulcer disease (PUD) and gastric cancer. Of the nearly 4 billion people infected by H. pylori, ~10% will develop PUD within a decade of infection, meaning roughly 400 million worldwide will be afflicted by this condition² (Fig. 1b). Peptic ulcers result from damage to the lining of the stomach and may lead to complications such as internal bleeding and perforations, with a high mortality rate in such cases. H. pylori eradication has shown promising results in the treatment of PUD, not only healing ulcers but also preventing their recurrence.

**Fig. 1: Hp-EuReg dataset and AI clinician overview.**

Around 90% of gastric cancer cases are due to H. pylori infection³. It is estimated that gastric cancer makes up 37% of chronic infection-induced cancers, making H. pylori the most frequent carcinogenic pathogen⁴. Gastric cancer is thought to develop after years of inflammation-induced gastric atrophy, wherein achlorhydria drives the development of an abnormal microbiome, further driving the transformation of gastric epithelial cells to an oncogenic state (a hypothesis termed the ‘Correa cascade’)⁵. Indeed, a ‘point of no return’ has been observed with regard to H. pylori infection in patients developing gastric cancer, past which H. pylori eradication is insufficient to interrupt the inflammatory cascade leading to oncogenesis. Despite advancements in treatment such as chemotherapy and surgery, gastric cancer results in a poor prognosis compared to other cancer types, especially in advanced stages⁶.

The European Registry on Helicobacter pylori management (Hp-EuReg) was established to combat the high social and health burden of H. pylori infection across Europe. It was noted at the time that consensus and clinical guidelines had been established for H. pylori treatment, but that no data existed cataloguing the implementation of these recommendations⁷. This project took the form of an international, multicenter, prospective, non-interventional registry documenting the real clinical practice of European gastroenterologists in H. pylori management in the majority of countries across Europe (Fig. 1c). Patient data documented include several demographic categories (i.e., country, sex, age), pre-existing gastrointestinal symptoms, treatment indication, previous eradication attempts, and compliance (Fig. 1d). Crucially, this registry documents the treatment chosen, duration of treatment, proton pump inhibitor (PPI) dosage, and eradication outcome. To date, this registry has been used in over 40 published studies⁸ (for a full list of publications, see ref. ⁹). The most common uses for this dataset have been to assess treatment effectiveness, especially in a country- or region-specific context. However, most of these studies have relied on traditional statistical methods rather than advanced methods employing machine learning (ML) and artificial intelligence (AI), with one notable exception¹⁰.

The most frequently used treatments in this registry include triple and quadruple (either bismuth or non-bismuth based) antibiotic regimens. Treatment durations are predominantly 7, 10, or 14 days. These treatments combine at least two antibiotics with a proton pump inhibitor (PPI) to raise stomach pH and, where used, bismuth for its bacteriostatic effect. Standard triple therapies, most often consisting of two antibiotics (amoxicillin and clarithromycin) and a PPI, were a great advance in the treatment of H. pylori in the 1990s, leading to their earlier adoption as the treatment gold standard¹¹. However, increasing clarithromycin resistance (up to 23% observed in Hp-EuReg)⁸ and other factors have led to the development of additional therapies such as quadruple regimens. Certain formulations of quadruple therapies quickly demonstrated a >90% eradication rate (now considered the threshold for an optimal H. pylori regimen¹²), leading to their adoption as the current recommendation standard¹³. In their traditional formulation, they combine a nitroimidazole (such as metronidazole or tinidazole) with a PPI and the antibiotics amoxicillin and clarithromycin. However, due in part to antibiotic resistance, an alternative bismuth quadruple regimen (including tetracycline and metronidazole) has been widely adopted, to great effect in first-line treatments^14,15. Bismuth has been included for its bactericidal effect, rendering bismuth quadruple therapy unaffected by clarithromycin and metronidazole resistance¹⁶. Sequential therapies were also developed in part to overcome limitations posed by triple therapies and consist of a two-part treatment period, first using a PPI and amoxicillin, followed by a PPI, clarithromycin, and either tinidazole or metronidazole¹⁷. 
However, sequential therapies have been administered to variable effect, with eradication rates varying from 80% to 90%, largely dependent on region^18,19,20. Finally, bismuth single capsule therapies such as Pylera® replace multi-drug regimens with a single pill containing bismuth, metronidazole, and tetracycline, combined with a PPI²¹. Such therapies are relatively new, with Pylera® first approved by the FDA in 2006 and currently approved in only a subset of European countries²². Early studies show the eradication rate varying from 80% to around 95%^23,24, and a recent meta-analysis reports effective eradication (90%) not only in first-line therapy but also in rescue therapy, in patients with clarithromycin- or metronidazole-resistant strains, and in those previously treated with clarithromycin²⁵, though further study is needed as implementation is increased in diverse populations.

A further consideration for treatment recommendation is the presence of allergies to penicillin-like medications, such as amoxicillin. Around 1–5% of patients globally have documented penicillin allergies²⁶, though self-reported data suggest higher percentages²⁷. The presence of this allergy necessitates the use of therapies without amoxicillin, such as levofloxacin-based regimens or a regimen of tetracycline, metronidazole, and bismuth salts combined with a PPI, such as those found in bismuth single capsule therapies.

Despite the richness of the Hp-EuReg dataset, most analyses have relied on conventional statistical methods. Machine learning (ML) and artificial intelligence (AI) methods offer opportunities to capture complex interactions among variables and patterns from the data¹⁰, especially as applied to personalized medicine²⁸. Broadly, ML approaches can be categorized into unsupervised, supervised, and reinforcement learning paradigms. Supervised learning relies on labelled data, making it particularly useful for tasks such as label prediction by mapping input features to known outputs. Thus, it is particularly valuable for tasks such as disease classification, patient outcome prediction, and biomarker discovery. Among traditional models, support vector machines (SVMs) are particularly effective in situations where there is a large number of high-dimensional inputs, such as medical imaging tasks and genomic data classification, due to the ability to construct optimal decision boundaries via kernel methods²⁹. Similarly, ensemble learning techniques such as Random Forest (RF) improve robustness of modelling by aggregating multiple decision trees, making it well-suited for handling noisy and imbalanced clinical datasets, predicting disease risks, and identifying complex interactions of biomarkers^30,31.

In recent years, these methods have been complemented by deep learning approaches, which apply neural networks to similar tasks. A commonly used example is convolutional neural networks (CNNs), which benefit from abstracting features away from their spatial localization and are particularly suited to medical imaging tasks^32,33. Transformers, originally developed for natural language processing (NLP), excel at capturing long-range dependencies in sequential data and have recently been used to analyze electronic health records and genomic and proteomic sequences³⁴. Autoencoders, including variational autoencoders (VAEs)^35,36, have been instrumental for anomaly detection and dimensionality reduction, proving effective for tasks like identifying rare disease features and generating synthetic patient data.

Reinforcement learning (RL) represents a distinct and increasingly significant paradigm, particularly for patient decision-making in medicine. Unlike classical supervised machine learning, RL does not rely on specific pre-defined labels to predict; instead, it tries to maximize the overall reward it can achieve through its actions. This is done by training a so-called virtual agent, which learns how to interact with its environment via trial and error to achieve maximal reward and minimal penalty^28,37. This framework has seen significant improvements after being refined in fields such as robotics, gaming, and autonomous systems. However, its rich potential has been underutilized in medicine, despite its strong ability to learn iteratively in dynamic patient care environments.

In this work, we develop the H. pylori AI-Clinician which applies RL to provide patient-specific first-line treatment recommendations and determine if personalized treatments would improve eradication rate compared to a single recommended treatment. (Fig. 1e) This method applies RL, which learns iteratively which actions to take (termed policy) to maximize reward in the context of a given state. RL is well-suited for datasets such as Hp-EuReg with many interacting variables (on the order of hundreds to thousands) as it is sensitive to small differences in rewards, and able to detect subtle factors which affect outcome in the state, and therefore patient outcomes³⁸.

Results

Using Hp-EuReg, we evaluated the consistency of the AI Clinician with different splits of training data by generating 500 independent models by ten-fold, fifty repeat cross validation. For each repeat, training (model optimization) was performed using a 90% random sample of first-line treatments from the Hp-EuReg dataset, with testing performed on the remaining 10%. AI performance was compared to clinicians by comparing Q scores of the clinician’s action to the AI-recommended action for each patient. In a representative model, mean Q scores in the testing phase were tabulated to quantify the AI Clinician’s preference for different treatment categories. All therapies include prescription of a PPI. Overall, Pylera® therapies had the highest average Q score (mean = 0.92, SD = 0.04), suggesting that over the entire testing dataset it was on average estimated to be the most effective on a diverse population of patients. It was followed by quadruple bismuth therapies with clarithromycin, amoxicillin, and bismuth salts (mean = 0.90, SD = 0.05), quadruple non-bismuth therapies with clarithromycin, amoxicillin, and metronidazole (mean = 0.89, SD = 0.04), and sequential therapies with clarithromycin, amoxicillin, and tinidazole (mean = 0.89, SD = 0.05). Triple therapies performed the most poorly, with clarithromycin and metronidazole (mean = 0.86, SD = 0.04) slightly outperforming clarithromycin and amoxicillin (mean = 0.85, SD = 0.05). (Fig. 2a) Average Q scores based on PPI dose demonstrate a preference for high dose PPI on average (mean = 0.88, SD = 0.04) compared to low or standard doses (mean=0.86, SD = 0.04). (Fig. 2b) For a full description of PPI dose definitions given particular PPIs, see “Methods”. It is worth noting that many patients in the training dataset had a PPI dose which was not specified by the clinician. 
However, the category of patients with unspecified dose also showed high Q scores (mean = 0.89, SD = 0.04) suggesting this population was dominated by high PPI doses. Finally, ten (mean = 0.89, SD = 0.04) and fourteen-day (mean = 0.88, SD = 0.04) durations were found to have higher Q scores than 7-day (mean = 0.85, SD = 0.04) treatment periods—with ten- and fourteen-day periods showing similar average Q scores. (Fig. 2c).

**Fig. 2: *H. pylori* AI-clinician training and performance on real-world data.**

When tabulated over all repeats (50 recommendations per patient in the testing phase), 65.5% of patients were consistently recommended a bismuth therapy consisting of either Pylera® or clarithromycin, amoxicillin, and bismuth salts paired with a PPI by more than half of the models, which we treat as a frequency threshold. Further, 15.5% of patients were recommended a non-bismuth clarithromycin, amoxicillin, and metronidazole with PPI, and 19.0% of patients recommended a variety of therapies, with no single therapy being recommended by more than half of models. Notably, no patients were consistently recommended triple or sequential therapies. (Fig. 3a) We additionally examined the breakdown of patients where unique bismuth therapies are considered separately, where patients need to be recommended either Pylera® or quadruple bismuth therapies with clarithromycin, amoxicillin, and bismuth salts by more than half of models to be considered above frequency threshold. 51.5% were recommended quadruple non-bismuth therapy with clarithromycin, amoxicillin, and metronidazole, 30.4% were recommended Pylera®, and 18.1% were recommended quadruple bismuth therapies with clarithromycin, amoxicillin, and bismuth salts. (Fig. 3b) As it is impossible to know the success rate of two different therapies on the same patient, the effectiveness of the AI Clinician was measured in a retrospective manner using the real-world treatments prescribed by clinicians. To evaluate the predicted performance of the AI Clinician in practice, we calculated the success rate of all treatments where the AI recommendation agreed with the treatment the clinician prescribed versus those where it did not and performed bootstrapping to construct a 95% confidence interval. 
Overall, treatments recommended by the AI Clinician showed a 94.1% success rate (n = 2988; CI: 93.2%, 95.0%) compared to treatments not recommended by the AI Clinician, which showed an 88.1% success rate (n = 35,061; CI: 87.8%, 88.4%), resulting in a net gain of 6.0% and demonstrating an association between AI-recommended therapies and eradication success. We further validated the performance of our model using an external dataset of 7186 patients, demonstrating a success rate of 92.8% (n = 128; CI: 88.4–97.1%) for treatments recommended by the AI Clinician compared to 87.4% (n = 7048; CI: 86.7–88.2%) for those that were not (for a full description, see Supplementary Notes–Model Validation).
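The agreement-based comparison with bootstrapped confidence intervals can be illustrated with a minimal sketch. It assumes a percentile bootstrap over patients (the text does not state which bootstrap variant was used), and the toy outcome lists, the function name `bootstrap_success_ci`, and its parameters are hypothetical, not taken from the study.

```python
import random

def bootstrap_success_ci(outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a success rate (assumed variant).

    outcomes: list of 0/1 eradication outcomes (1 = success).
    Returns (observed_rate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    n = len(outcomes)
    rates = []
    for _ in range(n_boot):
        # Resample patients with replacement and recompute the success rate.
        sample = rng.choices(outcomes, k=n)
        rates.append(sum(sample) / n)
    rates.sort()
    lower = rates[int((alpha / 2) * n_boot)]
    upper = rates[int((1 - alpha / 2) * n_boot) - 1]
    return sum(outcomes) / n, lower, upper

# Toy data (not from Hp-EuReg): outcomes grouped by whether the prescribed
# treatment matched the AI recommendation.
agree = [1] * 94 + [0] * 6
disagree = [1] * 880 + [0] * 120

rate_a, lo_a, hi_a = bootstrap_success_ci(agree)
rate_d, lo_d, hi_d = bootstrap_success_ci(disagree)
print(f"agree:    {rate_a:.3f} (95% CI {lo_a:.3f}, {hi_a:.3f})")
print(f"disagree: {rate_d:.3f} (95% CI {lo_d:.3f}, {hi_d:.3f})")
```

The smaller group yields the wider interval, which matches the pattern in the external validation, where the recommended group is small (n = 128).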

**Fig. 3: Personalized recommendations.**

Finally, to see which variables correlated most strongly with treatment recommendation, we performed a separate RF analysis for each of four recommendation groups: Pylera®, bismuth quadruple (with clarithromycin, amoxicillin, and bismuth salts), bismuth salts (any), which could be either Pylera® or bismuth quadruple therapy, and non-bismuth quadruple therapy (with clarithromycin, amoxicillin, and metronidazole). Each model was formulated to predict whether a patient would be recommended a given treatment versus all others, based on patient variables. RF models had a balanced prediction accuracy of 84.7% for Pylera®, 76.7% for bismuth quadruple therapies, 73.6% for bismuth salts (any), and 92.3% for non-bismuth quadruple therapies. Patient variable importance was ranked by mean decrease in impurity (MDI) to determine which variables had the highest predictive power for a given treatment recommendation. Overall, being from the southwest region of Europe, not taking acetylsalicylic acid, and taking concurrent medication of any kind were more likely to result in a Pylera® recommendation. Taking concurrent medication, being from an eastern region of Europe, and not taking probiotics were more likely to result in a bismuth quadruple recommendation. Likewise, being from an eastern region, taking any concurrent medication, and not experiencing heartburn as a symptom were more likely to correspond to a bismuth salts therapy of any sort. Finally, being from the eastern region, being Caucasian, and not taking rebamipide were more likely to correspond to a recommendation of non-bismuth quadruple therapy (Fig. 3c).
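A one-versus-rest RF analysis of this kind can be sketched with scikit-learn, which the paper lists among its packages. The data below are synthetic, the feature names are hypothetical stand-ins for one-hot patient variables, and the hyperparameters (`n_estimators`, the 75/25 split) are assumptions rather than the authors' settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for one-hot patient variables (hypothetical names).
feature_names = [
    "region_southwest",
    "takes_acetylsalicylic_acid",
    "concurrent_medication",
    "noise",
]
X = rng.integers(0, 2, size=(2000, len(feature_names)))

# Toy rule: label 1 ("recommended this therapy") depends mostly on the
# first feature, with 5% of those cases flipped to 0.
y = ((X[:, 0] == 1) & (rng.random(2000) > 0.05)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Balanced accuracy averages recall over both classes, so it is not
# inflated when one recommendation group is much larger than the other.
bal_acc = balanced_accuracy_score(y_test, clf.predict(X_test))

# feature_importances_ holds the impurity-based (MDI) importances.
ranking = sorted(
    zip(feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
print(f"balanced accuracy: {bal_acc:.3f}")
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

MDI ranks how much a variable contributes to the splits but not the direction of its effect, so statements such as "not taking probiotics" require inspecting the variable's values in each group as well.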

Discussion

The H. pylori AI-clinician was developed to determine if AI-driven personalized treatments would boost treatment success compared to clinician-prescribed treatments alone, therefore benefitting patients. We found that this was the case, boosting success rate of prescribed therapies by 6.0% up to 94.1% for therapies recommended by AI from 88.1% for therapies that were not. Overall, we found that Q scoring in individual models was in line with current trends in treatment recommendations^14,39, demonstrating the reliability of the AI Clinician method. Pylera® and quadruple therapies with clarithromycin, amoxicillin, and bismuth salts or metronidazole, and sequential therapies showed the highest quality estimate by the AI clinician, in that order (and above triple therapies containing clarithromycin and amoxicillin or metronidazole). We also found that higher dose PPIs performed better than low and standard dose on average, suggesting most patients would benefit from a higher dose. Finally, while 10- and 14-day durations out-performed 7-day, they performed quite similarly in terms of quality estimate to one another, likely driven by Pylera®’s increased effectiveness and 10-day formulation.

We found that 65.5% of patients were recommended a bismuth therapy by the majority of AI Clinician models trained on differing splits of data, while 15.5% were consistently recommended a non-bismuth quadruple therapy with clarithromycin, amoxicillin, and metronidazole. Overall, RF modelling was able to achieve a high balanced accuracy for the latter therapy, indicating that variables including presence in an eastern region, being Caucasian, and not taking rebamipide were highly indicative of a patient receiving a non-bismuth quadruple therapy recommendation. Pylera® was more likely to be recommended if a patient was from a southwest region and taking concurrent medication, but not acetylsalicylic acid. Bismuth quadruple therapies including clarithromycin, amoxicillin, and bismuth salts were likewise more likely to be recommended if a patient was taking a concurrent medication, but in this case if the patient was from an eastern region and not taking probiotics.

The correspondence to region in personalized recommendations suggests several possibilities: that strains of H. pylori varying by region are driving the trend, that interactions between genetics and treatment are responsible, that trends in lifestyle varying by region are responsible, or a combination of the three. Further investigations to study a higher number of region-specific variables in detail will be needed to determine the explanation. Interestingly, though Pylera® did see the highest average Q score from modelling, it was not the most frequently recommended therapy overall (which was non-bismuth quadruple therapies). This likely suggests that these patients would receive a greater marginal benefit from recommended Pylera® treatment than other groups.

Interestingly, 19.0% of patients were recommended a variety of treatments, with no single category being recommended by more than half of models. In addition, most patients were not recommended the same treatment by every model generated by differing splits of training data. A further shortcoming of this study is that several treatment formulations (for example, the quadruple bismuth therapy including metronidazole, tetracycline, and bismuth subcitrate with PPI, on which the single-capsule Pylera® is based) were not present in sufficient numbers in the dataset to be included in training, owing to the requirement of around 500 samples for stabilization of the network observed in our study. PPI dosage was also divided into only two categories ('Standard or Low' and 'High') to reduce the number of therapy type subdivisions, given the limited number of samples in each treatment category, though there are well-documented differences in effect between low and standard PPI doses. These facts signal the need to collect additional data for model training and to improve the sensitivity of the AI Clinician in the future. Finally, all internal and external validation results documented in this work derive from retrospective data, and therefore the conclusions of the study should be further validated by a prospective study. We emphasize that this study does not necessarily demonstrate the superiority of AI over clinical decision-making, but rather an improvement in recommendations that could be made by AI-assisted clinical decision-making.

Though these results are encouraging in terms of increasing early eradication of H. pylori, future work should be focused on treatment of patients for which the damage of long-term infection has already been done. For example, a well-established point of no return for infections exists, past which gastric cancer often develops even after eradication of H. pylori. Investigating other data types such as endoscopy images and omics data, especially with advanced methods such as ML and AI will be crucial for determining what specific consequences of infection a patient is experiencing or at risk of experiencing, and what therapeutic strategies may be applied. The H. pylori AI Clinician is intended for use alongside clinicians, and future methods would benefit from an approach which integrates rather than replaces human expertise.

With over half the globe experiencing H. pylori infection at some point in their lifetime, there is a great need to apply advances in AI to evaluate and improve management and treatment, especially with regard to treatment recommendation standards. This work demonstrates the robustness of current recommendation standards throughout a diverse and heterogeneous population, supporting their broad administration. Further, this work demonstrates a fundamentally novel system for making personalized treatment recommendations based on patient data, opening the door to many potential future applications.

Methods

Ethics statement

The Hp-EuReg study was conducted in accordance with the ethical principles outlined in the Declaration of Helsinki (1975, and its subsequent revisions) and complied with all relevant institutional and national ethical regulations. The study protocol was reviewed and approved in 2012 by the Ethics Committee of the Hospital Universitario de La Princesa (Madrid, Spain), which served as the reference Institutional Review Board (IRB). The protocol was also classified by the Spanish Agency for Medicines and Health Products (AEMPS), and prospectively registered at ClinicalTrials.gov under the identifier NCT02328131 (https://clinicaltrials.gov/study/NCT02328131).

Method development

The H. pylori AI-clinician was developed in order to predict optimal treatment outcomes on a patient-specific basis while learning from real-world clinical decisions. It was implemented using RL, a machine learning approach in which a so-called agent learns to maximize the reward it receives from taking actions in a trial-and-error manner. The agent was trained using clinical actions (treatments) and their observed reward (success/failure) from real-world patients previously recorded in the Hp-EuReg database (Fig. 4a). It should be stressed that, at this stage, the AI-clinician is not being used, nor should it be used, directly for real-world clinical decision-making for new patients, as it is still under development and is not certified as a medical device. RL was chosen over alternative formulations including logistic regression (LR), RF, and SVM-based models due to its observed superior performance as described in Supplementary Notes–Comparison to Other Models. Our approach is a treatment policy optimization task rather than one aimed at predicting labels. Since traditional formulations of LR, RF, and SVM are appropriate for tasks such as label prediction, whereas RL methods such as deep quality network (DQN) learning have been designed to predict the quality of a multitude of possibilities during a decision-making task, RL was an intuitive choice for further formulation. All software was developed in Python, relying mainly on the scikit-learn⁴⁰ and PyTorch⁴¹ packages for ML and AI modelling.

**Fig. 4: *H. pylori* AI clinician methods overview.**

The specific RL method developed was termed independent-state Deep Q-Network Learning (isDQN). It is an adaptation of Deep Q-Network (DQN) Learning, which is one of the most widely applied methods in RL^42,43,44,45. Q-learning is a method by which the quality of state-action pairs is learned over time, where the state is a set of environmental variables and the actions are the set of all possible actions an agent may take. Given the state context, the action with the highest Q-score is chosen. DQN is termed deep in that it uses a neural network of several layers to achieve its learning. Reward is observed for each action in the context of a given state, and this reward is in turn used to alter the weights of the neural network during an optimization step to improve the estimation of Q-scores in the future, thereby improving decision-making. Each optimization is a function of both the immediate reward from the action taken and the quality of the state in which the action leaves the environment at the subsequent moment (for example, increasing the score in a video game immediately, and also putting the agent in a position to further increase the score in future moves).

Our method, isDQN, differs from traditional RL methods in that there is no concept of a subsequent state for patient data, as these data remain static before and after treatment, with eradication fully represented in the reward. Mathematically, optimization is performed using a loss function that represents, and aims to minimize, the cost accrued by the current failures in decision-making at a given step. In standard DQN, this loss is calculated by subtracting the Q score of the current state from the sum of the immediate reward and the maximum expected quality of the subsequent state. In our method, the quality of the subsequent state is not considered, effectively eliminating it from the loss function. (In mathematically rigorous terms, the trajectory is always taken to be in its final state, which is a special case described in the original formulation of DQN analysis detailed by Mnih et al.⁴⁶) For a demonstration of the model’s validity and robustness using simulated data, see Supplementary Material–Performance on Simulated Data.
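The reduction described above can be written out explicitly. The notation here ($\theta$ for the network weights, $\gamma$ for the discount factor, $\phi'$ for the subsequent state) follows the usual DQN convention and is an assumption, not the authors' own notation. In standard DQN, the temporal-difference error for a transition $(\phi, a, r, \phi')$ is

$$\delta_{\mathrm{DQN}} = r + \gamma \max_{a'} Q(\phi', a'; \theta) - Q(\phi, a; \theta).$$

Treating every patient record as a terminal transition removes the bootstrapped term, so the error that the loss penalizes reduces to

$$\delta_{\mathrm{isDQN}} = r - Q(\phi, a; \theta), \qquad r \in \{-1, +1\}.$$

Under this reading, the Q-score of a treatment for a given patient is fitted directly to the observed eradication reward, with no dependence on any later state.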

The patient information was used as follows. States were represented by the 77 one-hot encoded patient variables from the Hp-EuReg dataset. In brief, one-hot encoding is a method for representing categorical variables as binary vectors, in which membership of the sample in each of the possible categories is represented as a separate binary (true/false) entry. This prevents the model from interpreting categorical data as ranked or continuous, which is essential in our case to avoid spurious relationships in modelling. Actions were defined by clinical treatment decisions, including antibiotic/PPI combination, antibiotic dose, PPI dose category (see PPI Dose Mapping for values), and duration (for example, clarithromycin + amoxicillin + metronidazole + clarithromycin dose = 500 mg, + intakes twice daily + … + high dose PPI, 14-day duration), and were represented numerically. Only treatments with at least 500 examples in the dataset were included in the action space, resulting in 23 treatment options in total (Fig. 4b). Note that PPI dose category was divided into categories of ‘Standard or Low Dose’ (n = 9935) and ‘High Dose’ (n = 16,304; other doses not recorded) to reduce the number of treatment category divisions. Reward was represented by the clinical outcome of the treatment: a value of +1 if eradication was achieved and −1 if eradication failed (88.56% success rate overall). The model was trained using patient data in batches. Gradient descent was used to recalculate the weights of the neural network at each optimization step using a smooth L1 loss function. Over time, the agent learns the quality of each treatment for a given patient, determining patient-specific optimal treatments (Fig. 4c).
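As a small illustration of the state and reward encoding, the sketch below one-hot encodes two toy patient records and maps eradication outcomes to ±1. The variable names, category lists, and the helper `one_hot_encode` are hypothetical; the real dataset yields 77 encoded entries per patient.

```python
def one_hot_encode(records, categories):
    """Encode categorical records as flat binary vectors.

    records: list of dicts mapping variable name -> category value.
    categories: dict mapping variable name -> ordered list of allowed values.
    Returns the (variable, value) column labels and one vector per record.
    """
    columns = [(var, val) for var, vals in categories.items() for val in vals]
    vectors = [
        [1 if rec.get(var) == val else 0 for var, val in columns]
        for rec in records
    ]
    return columns, vectors

# Hypothetical variables and categories, for illustration only.
categories = {
    "sex": ["Female", "Male"],
    "region": ["East", "South-west", "North"],
}
patients = [
    {"sex": "Female", "region": "East"},
    {"sex": "Male", "region": "South-west"},
]
columns, states = one_hot_encode(patients, categories)

# Reward coding described in the text: +1 for eradication, -1 for failure.
eradicated = [True, False]
rewards = [1 if success else -1 for success in eradicated]

print(columns)
print(states)
print(rewards)
```

Because each category gets its own binary entry, "North" is no closer to "East" than to "South-west" in the encoded state, which is the property the text relies on to avoid spurious ordering.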

Coding environment and computational resources

All code was written in Python 3.8.19, with Jupyter core version 5.1.0. pandas 2.0.3, NumPy 1.24.1, scikit-learn 1.3.2, and PyTorch 2.4.1+cu124 were also used for variable preprocessing and for statistical, ML, and AI modelling. Analysis was performed on an Intel64 Family 6 Model 165 CPU (Windows 10 OS) with 12 cores and total RAM of 15.79. No GPU resources were necessary for modelling.

Variable preprocessing

The Hp-EuReg dataset was obtained on February 14th, 2024 from the AEG-REDCap platform. Data were recorded in an Electronic Case Report Form (e-CRF) using the collaborative research platform REDCap hosted at Asociación Española de Gastroenterología, a non-profit Scientific and Medical Society focused on Gastroenterology research⁴⁷. The dataset consisted of 73,313 patients and 321 patient variables, including treatment administered and outcome in terms of H. pylori eradication. Samples were filtered to include only first-line treatments administered to patients who complied with their full regimen (n_samples = 52,801). To ensure sufficient training data for each treatment, treatments with <500 samples were also removed from the dataset, together with the samples to which these treatments were administered, resulting in n_samples = 38,049. Missing values for any variable were encoded as NA, and no samples with missing values for either eradication outcome or treatment were found in the remaining dataset. Variable preprocessing was performed to achieve a format suitable for one-hot encoding, with numeric variables also treated as categorical. Age was binned into four categories: Under 40 (n = 9779), 40–50 (n = 8008), 50–60 (n = 8560), and Above 60 (n = 11,682). Countries were also categorized into regions: East-centre (n = 8588), East (n = 5975), West-centre (n = 2623), North (n = 1542), South-west (n = 18,662), and Other (n = 659). (For a full list of countries and corresponding regions, see Supplementary Table 1.) Finally, variables were filtered to remove those which were reflective of treatment outcome or irrelevant to patient outcomes. Variables were one-hot encoded for downstream analysis. (Including dose mapping variables as described below, the patient variable totals were n_samples = 38,049 and n_one-hot encodings = 77.)
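Two of the preprocessing steps described above, dropping treatments with fewer than 500 samples and binning age into four categories, can be sketched as follows. The record layout, function names, and toy data are hypothetical, and the handling of boundary ages (for example, whether 40 belongs to "40-50") is an assumption because the text does not specify it.

```python
from collections import Counter

MIN_SAMPLES = 500  # threshold stated in the text

def filter_rare_treatments(samples, min_samples=MIN_SAMPLES):
    """Keep only samples whose treatment occurs at least min_samples times."""
    counts = Counter(s["treatment"] for s in samples)
    return [s for s in samples if counts[s["treatment"]] >= min_samples]

def bin_age(age):
    """Map age in years to the four categories named in the text.

    Boundary handling is an assumption; missing ages are encoded as "NA".
    """
    if age is None:
        return "NA"
    if age < 40:
        return "Under 40"
    if age < 50:
        return "40-50"
    if age <= 60:
        return "50-60"
    return "Above 60"

# Toy records (not from Hp-EuReg): one common and one rare treatment.
samples = (
    [{"treatment": "A", "age": 35}] * 600
    + [{"treatment": "B", "age": 62}] * 10
)
kept = filter_rare_treatments(samples)
prepared = [dict(s, age_group=bin_age(s["age"])) for s in kept]

print(len(samples), "->", len(kept))
print(prepared[0])
```

Filtering on treatment frequency before encoding keeps the action space and the training set consistent: every remaining sample maps to an action the network can output.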

PPI dose mapping

PPI doses were mapped to provide a structured framework for interpreting dose variability between different PPIs. Omeprazole doses of 80 mg are considered High Dose, whereas doses of 10, 20, and 40 mg are considered Standard or Low Dose. For Lansoprazole, doses of 60 mg are considered high and 30 or 15 mg are considered standard or low. For Pantoprazole, all doses are considered standard or low (including 20 and 40 mg), whereas for Esomeprazole, 40 and 80 mg are considered high, with 20 mg considered standard or low. Finally, Rabeprazole doses of 40 mg are considered high, whereas 20 and 10 mg are considered standard or low. Only 501 patients were given a dose deviating from these values; we consider these doses non-traditional and perhaps mis-entered, so this information was omitted and these patients were grouped with those with unlisted PPI values. Overall, 16,304 patients were given PPI doses considered High Dose and 9935 were given doses considered Standard or Low Dose. A large fraction of the dataset did not have a PPI dose specified by the clinician; combined with the non-traditional doses, 11,810 patients, termed Nondescript Dose, were included in the dataset.
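The dose mapping above amounts to a lookup table. The sketch below is one way to express it; the function name, the lower-case drug keys, and the treatment of unknown drugs as Nondescript Dose are assumptions, and `dose_mg` is taken to be whatever dose value the registry records.

```python
# Doses (mg) classed as High vs Standard or Low, as listed in the text.
PPI_DOSE_MAP = {
    "omeprazole": {"high": {80}, "standard_or_low": {10, 20, 40}},
    "lansoprazole": {"high": {60}, "standard_or_low": {15, 30}},
    "esomeprazole": {"high": {40, 80}, "standard_or_low": {20}},
    "rabeprazole": {"high": {40}, "standard_or_low": {10, 20}},
}

def ppi_dose_category(ppi, dose_mg):
    """Return the PPI dose category for a drug name and dose in mg."""
    if ppi is None or dose_mg is None:
        return "Nondescript Dose"
    name = ppi.lower()
    if name == "pantoprazole":
        # The text classes all pantoprazole doses as standard or low.
        return "Standard or Low Dose"
    table = PPI_DOSE_MAP.get(name)
    if table is None:
        return "Nondescript Dose"  # assumption: unknown drug is unclassified
    if dose_mg in table["high"]:
        return "High Dose"
    if dose_mg in table["standard_or_low"]:
        return "Standard or Low Dose"
    # Doses outside the listed values are the "non-traditional" ones.
    return "Nondescript Dose"

print(ppi_dose_category("Omeprazole", 80))    # High Dose
print(ppi_dose_category("Esomeprazole", 20))  # Standard or Low Dose
print(ppi_dose_category("Omeprazole", 30))    # Nondescript Dose
```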

Action space definition

The action space was defined in terms of the antibiotic/PPI combination prescribed by the clinician, together with the dose of each medication, the number of daily intakes (for example, twice a day), the PPI dose category, and the duration for which it was prescribed. For example, a quadruple bismuth regimen of clarithromycin (500 mg, twice daily), amoxicillin (1000 mg, twice daily), bismuth salts (120 mg, four times daily), and high dose PPI for a 14-day duration would be represented as a single numeric action. The action space was restricted to include only treatments with at least 500 examples in the dataset.
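One way to picture this encoding is to treat each full regimen as a hashable record and assign an integer to every regimen that occurs often enough. The sketch below is illustrative only: the `Drug` and `Regimen` types, the helper names, and the toy prescription counts are hypothetical, and the actual numeric codes used in the study are not given in the text.

```python
from collections import Counter
from typing import NamedTuple, Tuple


class Drug(NamedTuple):
    name: str
    dose_mg: int
    intakes_per_day: int


class Regimen(NamedTuple):
    drugs: Tuple[Drug, ...]
    ppi_category: str
    duration_days: int


def make_regimen(drugs, ppi_category, duration_days):
    """Sort the drugs so the same combination always yields the same key."""
    return Regimen(tuple(sorted(drugs)), ppi_category, duration_days)


def build_action_index(prescriptions, min_count=500):
    """Assign one integer per regimen seen at least `min_count` times."""
    counts = Counter(prescriptions)
    kept = sorted(r for r, n in counts.items() if n >= min_count)
    return {regimen: i for i, regimen in enumerate(kept)}


# The example regimen from the text, plus a hypothetical rare regimen.
quad_bismuth = make_regimen(
    [Drug("clarithromycin", 500, 2),
     Drug("amoxicillin", 1000, 2),
     Drug("bismuth salts", 120, 4)],
    "High Dose", 14)
rare = make_regimen([Drug("amoxicillin", 1000, 2)], "High Dose", 7)

prescriptions = [quad_bismuth] * 600 + [rare] * 3  # toy counts
action_index = build_action_index(prescriptions)
```

Here the rare regimen falls below the 500-example threshold and receives no action code, mirroring the restriction described above.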

Independent-state deep quality network learning (isDQN)

The analysis method used to train our recommendation system, termed independent-state Deep Quality Network Learning (isDQN), takes the form of a traditional DQN analysis, except that optimizations to network weights do not consider a subsequent state when calculating loss. A deep neural network of four layers was implemented for this analysis, consisting of an input layer (n_nodes = 77, the number of one-hot encoded variables for each patient), two hidden layers (n_nodes = 128 each), and an output layer (n_nodes = 23, the total number of combinations of treatment and duration). Inputs to the optimization step consist of:

${{{\mathbf{\phi }}}}$, the state space, where ${{{\mathbf{\phi }}}}=\{{{{{\mathbf{\phi }}}}}_{1},\ldots \,,{{{{\mathbf{\phi }}}}}_{{n}_{{state}}}\}$ and represents the set of binary (one-hot) encoded patient variables, and n_state = 77.
A, the action space, where ${{{\bf{A}}}}=\{{a}_{1},\ldots \,,{a}_{{n}_{{action}}}\}$ and represents the numerically encoded combinations of antibiotic treatment and duration (7, 10, or 14 days), and therefore n_action = 23.
R, the reward space, where ${{{\bf{R}}}}=\left\{{-1 },{+} {1}\right\}$ and + 1 represents successful eradication, whereas −1 represents failed eradication.

Optimization was performed via gradient descent on a smooth L1 loss function, which was chosen for its reduced sensitivity to outliers and its tendency to avoid exploding gradients. For a batch size N, loss is defined as:

$$l\left(x,y\right)=\mathrm{mean}(\mathbf{L})=\mathrm{mean}\left(\left\{l_{1},\ldots,l_{N}\right\}^{T}\right)$$

(1)

Where:

$$l_{n}=\begin{cases}\dfrac{\left(x_{n}-y_{n}\right)^{2}}{2\beta} & \text{if } \left|x_{n}-y_{n}\right|<\beta,\\[4pt] \left|x_{n}-y_{n}\right|-\dfrac{\beta}{2} & \text{otherwise}\end{cases}$$

(2)

Where $\beta=1$, $N$ is the number of patients per batch, and

$$x_{n}=R_{n}$$

(3)

Where $R_{n}$ is the reward observed for patient n, and

$$y_{n}=Q(\boldsymbol{\phi}_{n},a_{n};\boldsymbol{\theta})$$

(4)

Where $Q(\boldsymbol{\phi}_{n},a_{n};\boldsymbol{\theta})$ is the quality score for the state-action (patient-treatment) pair of patient n determined by a forward pass through the neural network and $\boldsymbol{\theta}$ are the neural network parameters.
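As a worked illustration of the loss above, the following sketch evaluates the smooth L1 loss with β = 1 for a small batch. The reward and Q values are made-up numbers chosen so that both branches of the piecewise definition are exercised; the function name is hypothetical.

```python
def smooth_l1(x, y, beta=1.0):
    """Mean smooth L1 loss over a batch, following the piecewise definition."""
    losses = []
    for xn, yn in zip(x, y):
        d = abs(xn - yn)
        if d < beta:
            losses.append(d * d / (2 * beta))  # quadratic branch
        else:
            losses.append(d - beta / 2)        # linear branch
    return sum(losses) / len(losses)


rewards = [1.0, -1.0, 1.0]    # observed rewards (toy values)
q_values = [0.5, 0.75, 1.0]   # network estimates for the chosen actions (toy values)

# Per-patient terms: 0.5**2 / 2 = 0.125, 1.75 - 0.5 = 1.25, and 0.
loss = smooth_l1(rewards, q_values)
```

The second patient, whose estimate is far from the observed reward, contributes only linearly to the loss, which is the reduced sensitivity to outliers mentioned in the text.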

The isDQN algorithm, modified from the original DQN algorithm detailed in Mnih et al.⁴⁶, is given in Algorithm 1.

Algorithm 1: Independent-State Deep Quality Network Learning (isDQN)

Initialize replay memory D to capacity N

Initialize action-value function Q with random weights $\boldsymbol{\theta}$

For episode = 1, M do

Initialize sequence $\mathbf{s}_{1}=\{\mathbf{x}_{1}\}$ and preprocessed sequence $\boldsymbol{\phi}_{1}=\boldsymbol{\phi}(\mathbf{s}_{1})$

For t = 1, T do

Store transition $(\boldsymbol{\phi}_{t},a_{t},r_{t})$ in D

Every b steps do

Sample random minibatch of transitions $(\boldsymbol{\phi}_{j},a_{j},r_{j})$ from D

Set $y_{j}=r_{j}$

Perform a gradient descent step on $\mathcal{l}(y_{j},Q(\boldsymbol{\phi}_{j},a_{j};\boldsymbol{\theta}))$ with respect to the network parameters $\boldsymbol{\theta}$

End for

End for

Where $\boldsymbol{\theta}$ are the network parameters, $\boldsymbol{\phi}$ represents the state, $a$ represents the action, and $r$ represents the reward. D represents the replay memory with capacity N. Q represents the action-value function, M represents the number of episodes, T represents the number of transitions, and b represents the number of steps between optimization updates. $\mathcal{l}$ represents the loss function on which we perform a gradient descent step, taking the true outcome and the quality estimate of state-action pairs as input.

Selection of optimal treatment for AI recommendation

The AI-recommended action for a given patient is defined by:

$$a^{*}\left(\boldsymbol{\phi}_{n}\right)\leftarrow \mathrm{argmax}_{a}\,Q(\boldsymbol{\phi}_{n},a;\boldsymbol{\theta})$$

Where the argmax function determines the action of highest Q value by considering the quality of all possible actions for a given patient, defined by patient variables $\boldsymbol{\phi}_{n}$. Therefore, Q scores of AI actions can be directly compared with the Q scores of clinical decisions via $Q(\boldsymbol{\phi}_{n},a^{*}(\boldsymbol{\phi}_{n});\boldsymbol{\theta})$ and $Q(\boldsymbol{\phi}_{n},a_{n};\boldsymbol{\theta})$, respectively.
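The training loop of Algorithm 1 and the argmax selection rule can be sketched in a few lines. To keep the example dependency-free, the sketch below swaps the four-layer network for a linear Q-function with hand-written gradients and flattens the episode structure into a single pass; the class and function names, the hyperparameter values, and the toy data are all assumptions, so this shows the reward-only target rather than the authors' implementation.

```python
import random
from collections import deque


def smooth_l1_grad(diff, beta=1.0):
    """Derivative of the smooth L1 loss with respect to diff = Q - r."""
    if abs(diff) < beta:
        return diff / beta
    return 1.0 if diff > 0 else -1.0


class LinearQ:
    """Stand-in for the deep network: Q(phi, a) = w[a] . phi + b[a]."""

    def __init__(self, n_state, n_action):
        self.w = [[0.0] * n_state for _ in range(n_action)]
        self.b = [0.0] * n_action

    def q_values(self, phi):
        return [sum(wi * xi for wi, xi in zip(w, phi)) + b
                for w, b in zip(self.w, self.b)]

    def recommend(self, phi):
        """a*(phi): the action with the highest Q value."""
        q = self.q_values(phi)
        return max(range(len(q)), key=q.__getitem__)


def train_isdqn(transitions, n_state, n_action, capacity=1000,
                batch_size=32, optimize_every=10, lr=0.05, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=capacity)          # replay memory D
    model = LinearQ(n_state, n_action)
    for step, (phi, a, r) in enumerate(transitions, start=1):
        memory.append((phi, a, r))           # store (phi_t, a_t, r_t) in D
        if step % optimize_every:
            continue                         # optimize only every b steps
        batch = rng.sample(list(memory), min(batch_size, len(memory)))
        # The target is the observed reward alone: no next-state term.
        diffs = [model.q_values(p)[a_j] - r_j for p, a_j, r_j in batch]
        for (p, a_j, _), d in zip(batch, diffs):
            g = lr * smooth_l1_grad(d) / len(batch)
            model.b[a_j] -= g
            for i, x in enumerate(p):
                model.w[a_j][i] -= g * x
    return model


# Toy data: two patient types; the action matching the type succeeds more often.
data_rng = random.Random(1)
transitions = []
for _ in range(4000):
    group = data_rng.randrange(2)
    phi = [1, 0] if group == 0 else [0, 1]
    action = data_rng.randrange(2)           # stands in for the clinician's choice
    p_success = 0.9 if action == group else 0.2
    reward = 1 if data_rng.random() < p_success else -1
    transitions.append((phi, action, reward))

model = train_isdqn(transitions, n_state=2, n_action=2)
```

After training, `model.recommend(phi)` returns the argmax action for a patient, and `model.q_values(phi)[a]` gives the Q score of any alternative action, such as the one the clinician actually prescribed, for comparison.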

Single model examination

To examine average trends, Q scores were broken down by treatment category: quadruple bismuth therapies with clarithromycin, amoxicillin, and bismuth salts; quadruple non-bismuth therapies with clarithromycin, amoxicillin, and metronidazole; sequential therapies with clarithromycin, amoxicillin, and tinidazole; triple therapies with clarithromycin and amoxicillin; triple therapies with clarithromycin and metronidazole; and Pylera®. All treatments include a PPI in the formulation. Treatments were also broken down by PPI dose into categories of low/standard and high. The subset of patients recommended a nondescript PPI dose was dropped from this analysis. Finally, Q scores were examined on average by duration, including seven-, ten-, and fourteen-day formulations. The mean and standard deviation of each of these groups were calculated to achieve a comparative ranking of each treatment component.

Patient specific recommendation analysis

Ten-fold cross-validation with fifty repeats was chosen to evaluate the quality and consistency of our analysis. In ten-fold cross-validation, the dataset is divided into ten random samples, or folds. Iteratively, each fold is taken as the testing subset of the data, whereas the remaining nine are taken as the training subset, therefore applying a 90–10 percent training-testing split. Network training (optimization) is performed via isDQN analysis as described above, where the quality of AI and clinical decisions is assessed via Q score at each patient iteration. During testing, no further optimizations are performed and only the quality of AI and clinical decisions is assessed. This process is repeated fifty times to check for bias in random samples and to evaluate the most common recommendation for each patient after many repeats. The recommended treatments in the testing phase of each model were tabulated for each patient, resulting in 50 recommendations in total per patient. The mode treatment for each patient was recorded to examine heterogeneity in the treatment recommendation. To evaluate the predicted performance of AI recommendations, the success rate of treatments for which the AI recommendation agreed with the treatment prescribed by real-world clinicians was compared with the success rate of those for which it disagreed. A 95% confidence interval was computed by bootstrapping 1000 times.
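The per-patient mode and the bootstrap interval can be sketched as follows. The text does not say which bootstrap interval was used, so the percentile method here is an assumption, as are the function names and the toy records.

```python
import random
from collections import Counter


def mode_recommendation(recommendations):
    """Most frequent recommended action across repeats for one patient."""
    return Counter(recommendations).most_common(1)[0][0]


def bootstrap_ci(outcomes, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a success rate (outcomes are 0 or 1)."""
    rng = random.Random(seed)
    n = len(outcomes)
    rates = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lower = rates[int((alpha / 2) * n_boot)]
    upper = rates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical records: (mode AI action, clinician action, eradicated 0/1).
records = ([(3, 3, 1)] * 90 + [(3, 3, 0)] * 10 +
           [(3, 5, 1)] * 70 + [(3, 5, 0)] * 30)

agree = [y for ai, clinician, y in records if ai == clinician]
disagree = [y for ai, clinician, y in records if ai != clinician]

agree_rate = sum(agree) / len(agree)
disagree_rate = sum(disagree) / len(disagree)
agree_ci = bootstrap_ci(agree)
disagree_ci = bootstrap_ci(disagree)
```

Comparing the two rates, each with its interval, is the agreement-versus-disagreement comparison described above; the numbers in this toy example carry no clinical meaning.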

Determining relationship of patient variables to treatment recommendation by random forest

Random forest (RF) models were generated for each of four treatment categories: Pylera®; bismuth quadruple (including clarithromycin, amoxicillin, and bismuth salts); bismuth salts (any), which includes either Pylera® or bismuth quadruple therapy in the formulation described above; and non-bismuth quadruple therapy including clarithromycin, amoxicillin, and metronidazole. Each RF model is formulated to predict whether a patient will receive a given treatment versus all others. Patient variables are then ranked by the size of their effect on the model's prediction, using the mean decrease in impurity (MDI) as the metric. Full lists of ranked variables and MDI are available in Supplementary Data 2.

Theoretical AI clinician testing

In addition to real-world clinical data, a synthetic dataset of 10,000 patients with 100 binary features was generated for model stress testing and verification. To model a dataset with imbalanced classes, patients were split into two groups of 7000 and 3000 patients, respectively, where the binary variables were identical within each group and set by a random selection of 0's and 1's. Next, noise was added to the dataset by replacing values at random positions in each patient with a random selection of 0 or 1, up to a desired noise level. Datasets with noise levels of 5, 10, 25, 50, 75, and 99% were generated. Each group of patients was ‘treated’ half of the time with treatment A and half of the time with treatment B, where treatment A was made 90% effective in group A and 80% effective in group B, and treatment B was made 90% effective in group B and 80% effective in group A. Rewards of +1 were assigned for successful treatments and −1 for failures at the 90% and 80% rates described above. Training was performed on a balanced randomized 90–10 training-testing split, where optimization was carried out using mean squared error (MSE) loss with a batch size of 1000, a learning rate of 0.001, a deque size of 1000, and 100 steps to optimize. Results were evaluated by counting the number of times the treatment with 90% effectiveness in a given group was recommended to a patient from that group, for example, treatment A recommended to a patient from group A.
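A generator for this kind of synthetic dataset might look like the sketch below. The function name and tuple layout are hypothetical, noise is modelled as an independent per-feature overwrite with probability equal to the noise level, and treatment is assigned by a fair coin flip rather than an exact half-and-half split; both are assumptions about details the text leaves open.

```python
import random


def make_synthetic(n_a=7000, n_b=3000, n_features=100, noise=0.05, seed=0):
    """Return a shuffled list of (group, features, treatment, reward) tuples."""
    rng = random.Random(seed)
    # One random binary prototype per group; patients in a group share it.
    prototype = {g: [rng.randint(0, 1) for _ in range(n_features)] for g in "AB"}
    data = []
    for group, n in (("A", n_a), ("B", n_b)):
        for _ in range(n):
            # With probability `noise`, overwrite a feature with a random bit.
            phi = [rng.randint(0, 1) if rng.random() < noise else bit
                   for bit in prototype[group]]
            treatment = rng.choice("AB")
            # 90% effective in the matching group, 80% in the other.
            p_success = 0.9 if treatment == group else 0.8
            reward = 1 if rng.random() < p_success else -1
            data.append((group, phi, treatment, reward))
    rng.shuffle(data)
    return data


# A smaller instance with the same 70/30 imbalance, for quick inspection.
data = make_synthetic(n_a=700, n_b=300, n_features=20, noise=0.10)
```

Calling `make_synthetic()` with its defaults produces the 10,000-patient, 100-feature configuration at the 5% noise level.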

Sensitivity analysis and hyperparameter selection

To determine optimal hyperparameters in the context of a more complex dataset similar to Hp-EuReg, 50,000 patients were generated from 10 unique patient types defined over 70 patient variables. For each of the ten groups, one treatment was given 90% effectiveness while all others were given 70% effectiveness, with no treatment being 90% effective in more than one group of patients. A hyperparameter grid was tested including: batch sizes of 1000, 5000, and 10,000; learning rates of 5 × 10⁻⁵, 10⁻⁴, 5 × 10⁻⁴, and 10⁻³; steps to optimize of 50 and 100; and deque sizes of 1000, 5000, and 10,000. Hyperparameters were scored by the fraction of patients who were recommended the treatment with the higher effectiveness for their group in the training data. The hyperparameters chosen for further modelling had the highest percentage of correct recommendations (99.9%) and consisted of a batch size of 1000, a learning rate of 5 × 10⁻⁵, and a deque size of 1000.
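The grid described above can be enumerated directly, which also makes its size explicit. In this sketch the dictionary keys and helper names are hypothetical, and the scoring function only illustrates the "fraction correctly recommended" criterion; the training run for each configuration is omitted.

```python
from itertools import product

GRID = {
    "batch_size": [1000, 5000, 10000],
    "learning_rate": [5e-5, 1e-4, 5e-4, 1e-3],
    "steps_to_optimize": [50, 100],
    "deque_size": [1000, 5000, 10000],
}


def grid_configs(grid):
    """Every combination of the listed hyperparameter values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*grid.values())]


def fraction_correct(recommended, best_for_group):
    """Share of patients recommended their group's most effective treatment."""
    hits = sum(r == b for r, b in zip(recommended, best_for_group))
    return hits / len(recommended)


configs = grid_configs(GRID)
n_configs = len(configs)  # 3 * 4 * 2 * 3 combinations
```

Each of the 72 configurations would be trained on the synthetic data and ranked by `fraction_correct`, with the best-scoring configuration carried forward.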

Recruitment Information

Overall, a total of 70,915 adult participants (40% males, 60% females; age range: 18–99 years; mean age: 50 years) were enrolled in the study between 2013 and March 2025. Sex was recorded based on participant self-report, and no gender identity information was collected. All participants provided written informed consent prior to enrolment, in accordance with institutional and national ethical standards. Participants did not receive financial compensation for their participation. Sex was considered in the study design primarily for descriptive and stratified analyses. However, no specific sex-based subgroup analyses were conducted as the study was not powered to detect sex differences, and the primary objectives did not include sex-disaggregated outcomes. Data have been disaggregated by sex where available and appropriate, and individual-level sex-disaggregated data are included in the Source Data file. All procedures and reporting comply with the SAGER (Sex and Gender Equity in Research) guidelines.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Source data are provided with this paper. The de-identified clinical data generated and analyzed in this study have not been deposited in a public repository due to European Union General Data Protection Regulation (GDPR) constraints and ethical considerations involving participant confidentiality. Processed and aggregated data underlying the findings are available under restricted access for reasons related to patient privacy and institutional data governance. Access can be obtained by submitting a research proposal to the Hp-EuReg Data Access Committee at opn.aegredcap@aegastro.es. Qualified academic or clinical researchers may be granted access for non-commercial, ethically approved purposes. Requests are reviewed within 21 calendar days, and data—if approved—will be shared via a secure platform for a period of up to 12 months. Raw individual-level data are protected and cannot be shared publicly in compliance with data privacy laws. Previously published datasets used in this study from the Hp-EuReg registry are available at: Hp-EuReg Publications. No datasets are available in the Supplementary Information.

Code availability

Code has been made available at the following public Bitbucket repository: https://bitbucket.org/iAnalytica/aiclinician/src/main/.

References

Suerbaum, S. & Michetti, P. Helicobacter pylori infection. N. Engl. J. Med. 347, 1175–1186 (2002).
Sipponen, P. et al. Cumulative 10-year risk of symptomatic duodenal and gastric ulcer in patients with or without chronic gastritis: a clinical follow-up study of 454 outpatients. Scand. J. Gastroenterol. 25, 966–973 (1990).
Liou, J.-M. et al. Screening and eradication of Helicobacter pylori for gastric cancer prevention: the Taipei global consensus. Gut 69, 2093–2112 (2020).
de Martel, C., Georges, D., Bray, F., Ferlay, J. & Clifford, G. M. Global burden of cancer attributable to infections in 2018: a worldwide incidence analysis. Lancet Glob. Health 8, e180–e190 (2020).
Correa, P. & Piazuelo, M. B. The gastric precancerous cascade. J. Dig. Dis. 13, 2–9 (2012).
Yang, D. et al. Survival of metastatic gastric cancer: significance of age, sex and race/ethnicity. J. Gastrointest. Oncol. 2, 77 (2011).
McNicholl, A. G., O’Morain, C. A., Megraud, F. & Gisbert, J. P., as Scientific Committee of the Hp-EuReg on behalf of the National Coordinators. Protocol of the European Registry on the management of Helicobacter pylori infection (Hp-EuReg). Helicobacter 24, e12630 (2019).
Nyssen, O. P. et al. European Registry on Helicobacter pylori Management (Hp-EuReg): most relevant results for clinical practice. Front. Gastroenterol. 1 https://doi.org/10.3389/fgstr.2022.965982 (2022).
Spanish Association of Gastroenterology, www.hpeureg.com (2024).
Nyssen, O. P. et al. Analysis of clinical phenotypes through machine learning of first-line H. pylori treatment in Europe during the period 2013–2022: data from the European registry on H. pylori management (Hp-EuReg). Antibiotics 12, 1427 (2023).
Papastergiou, V., Georgopoulos, S. D. & Karatapanis, S. Treatment of Helicobacter pylori infection: past, present and future. World J. Gastrointest. Pathophysiol. 5, 392–399 (2014).
Graham, D. Y., Lu, H. & Yamaoka, Y. A report card to grade Helicobacter pylori therapy. Helicobacter 12, 275–278 (2007).
Malfertheiner, P. et al. Management of Helicobacter pylori infection: the Maastricht VI/Florence consensus report. Gut https://doi.org/10.1136/gutjnl-2022-327745 (2022).
Nyssen, O. P. et al. European Registry on Helicobacter pylori management (Hp-EuReg): patterns and trends in first-line empirical eradication prescription and outcomes of 5 years and 21 533 patients. Gut 70, 40–54 (2021).
Olmedo, L. et al. Evolution of the use, effectiveness and safety of bismuth-containing quadruple therapy for Helicobacter pylori infection between 2013 and 2021: results from the European registry on H. pylori management (Hp-EuReg). Gut (2024).
Malfertheiner, P. et al. Helicobacter pylori infection. Nat. Rev. Dis. Prim. 9, 19 (2023).
Vaira, D., Zullo, A., Hassan, C., Fiorini, G. & Vakil, N. Sequential therapy for Helicobacter pylori eradication: the time is now. Ther. Adv. Gastroenterol. 2, 317–322 (2009).
Greenberg, E. R. et al. 14-day triple, 5-day concomitant, and 10-day sequential therapies for Helicobacter pylori infection in seven Latin American sites: a randomised trial. Lancet 378, 507–514 (2011).
Sardarian, H. et al. Comparison of hybrid and sequential therapies for Helicobacter pylori eradication in Iran: a prospective randomized trial. Helicobacter 18, 129–134 (2013).
Gisbert, J. P., Calvet, X., O’Connor, A., Megraud, F. & O’Morain, C. A. Sequential therapy for Helicobacter pylori eradication: a critical review. J. Clin. Gastroenterol. 44, 313–325 (2010).
Saleem, A., Qasim, A., O’Connor, H. J. & O’Morain, C. A. Pylera for the eradication of Helicobacter pylori infection. Expert Rev. Anti Infect. Ther. 7, 793–799 (2009).
Lyseng-Williamson, K. A. Pylera®(bismuth subcitrate potassium/metronidazole/tetracycline hydrochloride) in the eradication of Helicobacter pylori infection: a profile of its use in Europe. Drugs Ther. Perspect. 33, 311–320 (2017).
de Boer, W. A., van Etten, R. J., Schneeberger, P. M. & Tytgat, G. N. A single drug for Helicobacter pylori infection: first results with a new bismuth triple monocapsule. Am. J. Gastroenterol. 95, 641–645 (2000).
Malfertheiner, P. et al. Helicobacter pylori eradication with a capsule containing bismuth subcitrate potassium, metronidazole, and tetracycline given with omeprazole versus clarithromycin-based triple therapy: a randomised, open-label, non-inferiority, phase 3 trial. Lancet 377, 905–913 (2011).
Nyssen, O. P., McNicholl, A. G. & Gisbert, J. P. Meta‐analysis of three‐in‐one single capsule bismuth‐containing quadruple therapy for the eradication of Helicobacter pylori. Helicobacter 24, e12570 (2019).
Bhattacharya, S. The facts about penicillin allergy: a review. J. Adv. Pharm. Technol. Res. 1, 11–17 (2010).
Shenoy, E. S., Macy, E., Rowe, T. & Blumenthal, K. G. Evaluation and management of penicillin allergy: a review. Jama 321, 188–199 (2019).
Komorowski, M., Celi, L. A., Badawi, O., Gordon, A. C. & Faisal, A. A. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat. Med. 24, 1716–1720 (2018).
Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20, 273–297 (1995).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Hu, J. & Szymczak, S. A review on longitudinal data analysis with random forest. Brief. Bioinforma. 24, bbad002 (2023).
Lin, T., Wang, Y., Liu, X. & Qiu, X. A survey of transformers. AI open 3, 111–132 (2022).
Shamshad, F. et al. Transformers in medical imaging: A survey. Med. image Anal. 88, 102802 (2023).
Vaswani, A. et al. Attention is all you need. Advances in neural information processing systems 30 (NIPS, 2017).
Berahmand, K., Daneshfar, F., Salehi, E. S., Li, Y. & Xu, Y. Autoencoders and their applications in machine learning: a survey. Artif. Intell. Rev. 57, 28 (2024).
Chen, M., Shi, X., Zhang, Y., Wu, D. & Guizani, M. Deep feature learning for medical image analysis with convolutional autoencoder neural network. IEEE Trans. big data 7, 750–758 (2017).
Sutton, R. S. & Barto, A. G. Reinforcement learning: An introduction. Vol. 1 (MIT Press, 1998).
Arulkumaran, K., Deisenroth, M. P., Brundage, M. & Bharath, A. A. Deep reinforcement learning: a brief survey. IEEE Signal Process. Mag. 34, 26–38 (2017).
McNicholl, A. G. et al. Combination of bismuth and standard triple therapy eradicates Helicobacter pylori infection in more than 90% of patients. Clin. Gastroenterol. Hepatol. 18, 89–98 (2020).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Proc. 33rd Int. Conf. Neural Information Processing Systems (NeurIPS), Article 721 (Curran Associates Inc., 2019).
Tan, F., Yan, P. & Guan, X. in Neural Information Processing: 24th International Conference, ICONIP 2017, Guangzhou, China, November 14–18, Proceedings, Part IV 24. 475-483 (Springer, 2017).
Fan, J., Wang, Z., Xie, Y. & Yang, Z. in Learning for dynamics and control. 486-489 (PMLR, 2020).
Hester, T. et al. Deep Q-learning from demonstrations. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 32 (2018).
Gu, S., Lillicrap, T., Sutskever, I. & Levine, S. in International conference on machine learning. 2829-2838 (PMLR).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Harris, P. A. et al. Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support. J. Biomed. Inform. 42, 377–381 (2009).


Acknowledgements

This research was supported by the AIDA project, funded by UK Research and Innovation (Grant No. 10058099) and the European Union (Grant No. 101095359). All authors and consortium members received funding through this project. O.N. and J.G. were additionally supported in the compilation of the Hp-EuReg database by the European Helicobacter and Microbiota Study Group (EHMSG), the Spanish Association of Gastroenterology (AEG), the Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), the European Union’s Horizon Europe programme (Grant Agreement No. 101095359), UK Research and Innovation (Grant Agreement No. 10058099), and the European Union’s EU4Health programme (Grant Agreement No. 101101252). The views and opinions expressed are those of the author(s) and do not necessarily reflect those of the European Union or the Health and Digital Executive Agency (HaDEA). Neither the European Union nor the granting authorities bear responsibility for the content. O.N. and J.G. acknowledge Diasorin, Juvisé, and Biocodex for providing funding to the Hp-EuReg study. However, these companies had no access to clinical data and were not involved in any stage of the study, including its design, data collection, statistical analysis, or manuscript preparation. We acknowledge their financial support with gratitude. We thank the Spanish Association of Gastroenterology (AEG) for providing the e-CRF service free of charge. Figures 1, 4, and S1A were made in BioRender (www.biorender.com).

Author information

Author notes

These authors contributed equally: Kyle Higgins, Olga P. Nyssen.
These authors jointly supervised this work: Javier P. Gisbert, Tania Fleitas Kanonnikoff, Kirill Veselkov.

Authors and Affiliations

Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, UK
Kyle Higgins, Ivan Laponogov, Dennis Veselkov & Kirill Veselkov
Gastroenterology Unit, Hospital Universitario de La Princesa, Instituto de Investigación Sanitaria Princesa (IIS-Princesa), Universidad Autónoma de Madrid (UAM), Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD), Madrid, Spain
Olga P. Nyssen & Javier P. Gisbert
Department of Computing, Faculty of Engineering, Imperial College London, London, UK
Joshua Southern & Dennis Veselkov
Instituto Investigación Sanitaria INCLIVA (INCLIVA), Medical Oncology Department, Hospital Clínico Universitario de Valencia, Valencia, Spain
Ana Miralles Marco, Manuel Cabeza-Segura, Elena Jiménez Martí, Josefa Castillo & Tania Fleitas Kanonnikoff
Centro de Investigación Biomédica en Red Cáncer (CIBERONC), Instituto de Salud Carlos III, Madrid, Spain
Josefa Castillo & Tania Fleitas Kanonnikoff
Department of Environmental Health Sciences, Yale University, New Haven, CT, USA
Kirill Veselkov
Biochemistry and Molecular Biology Department, Universitat de València, Valencia, Spain
Elena Jiménez Martí & Josefa Castillo
Institute of Clinical and Preventive Medicine, Faculty of Medicine and Lifesciences, University of Latvia, Riga, Latvia
Mārcis Leja & Inese Poļaka
i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto; Ipatimup—Institute of Molecular Pathology and Immunology of the University of Porto; Faculty of Medicine of the University of Porto; Department of Pathology, Unidade Local de Saúde São João, Porto, Portugal
Fatima Carneiro, Ceu Figueiredo, Rui M. Ferreira & Rita Barros
Department of Gastroenterology, Hospital Clínic de Barcelona; Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS); Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD); Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona (UB), Barcelona, Spain
Leticia Moreira & Gloria Fernandez-Esparrach
Department of Pathology, Hospital Clínic de Barcelona; Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS); Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBEREHD); Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona (UB), Barcelona, Spain
Miriam Cuatrecasas
Institut des Maladies de l’Appareil Digestif, Hépato-Gastroentérologie, Hôtel Dieu, Centre Hospitalier Universitaire, Nantes, France
Tamara Matysiak-Budnik & Jerome Martin
Department of Gastroenterology, Lithuanian University of Health Sciences, Kaunas, Lithuania
Laimas Jonaitis, Juozas Kupčinskas & Paulius Jonaitis
RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto), Porto, Portugal
Mário Dinis-Ribeiro, Ana Carina Pereira & Filipa Fontes
Instituto de Engenharia de Sistemas e Computadores, Tecnologia e Ciência, Rua Dr. Roberto Frias, 378, 4200-465, Porto, Portugal
Miguel Coimbra
Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, s/n, 4200-465, Porto, Portugal
Miguel Coimbra
Department of Gastroenterology and Hepatology, Erasmus University Medical Center, Rotterdam, the Netherlands
Manon C. W. Spaander & Judith Honing
StratejAI, Avenue Louise 209, Brussels, Belgium
Stefano Sedola & Junior Andrea Pescino
Digestive Cancers Europe, Rue de la Loi 235/27, Brussels, Belgium
Zorana Maravic & Ana Martins

Authors

Kyle Higgins
Olga P. Nyssen
Joshua Southern
Ivan Laponogov
Dennis Veselkov
Javier P. Gisbert
Tania Fleitas Kanonnikoff
Kirill Veselkov

Consortia

AIDA CONSORTIUM

Tania Fleitas Kanonnikoff, Ana Miralles Marco, Manuel Cabeza-Segura, Elena Jiménez Martí, Josefa Castillo, Tania Fleitas Kanonnikoff, Mārcis Leja, Inese Poļaka, Fatima Carneiro, Ceu Figueiredo, Rui M. Ferreira, Rita Barros, Kirill Veselkov, Olga P. Nyssen, Leticia Moreira, Miriam Cuatrecasas, Gloria Fernandez-Esparrach, Tamara Matysiak-Budnik, Jerome Martin, Laimas Jonaitis, Juozas Kupčinskas, Paulius Jonaitis, Mário Dinis-Ribeiro, Miguel Coimbra, Ana Carina Pereira, Filipa Fontes, Manon C. W. Spaander, Judith Honing, Stefano Sedola, Junior Andrea Pescino, Zorana Maravic & Ana Martins

Contributions

K.H. was responsible for methodological development, simulation design and experiment, evaluation of results and manuscript preparation. K.V., D.V., I.L., and T.F.K. worked on conceptualization, study design, methodology formulation/developments and data analysis. Members of the AIDA consortium contributed to the project design and conceptualization. O.N. managed data collection, curation, and advising on processing and interpreting results. K.V. supervised method development, simulations, and evaluations. J.S. and I.L. provided critical advice on bioinformatic and machine learning analysis, including evaluation and interpretation of results. J.G. was responsible for overseeing data collection and interpretation of results. T.F.K. and K.V. were responsible for securing funding and overall project management. All authors contributed to the writing and editing of the manuscript and approved the final version.

Corresponding authors

Correspondence to Javier P. Gisbert, Tania Fleitas Kanonnikoff or Kirill Veselkov.

Ethics declarations

Competing interests

Javier P. Gisbert has served as speaker, consultant, and advisory member for or has received research funding from Mayoly Spindler, Allergan, Diasorin, Richen, Biocodex and Juvisé. Olga P. Nyssen received research funding from Allergan, Mayoly Spindler, Richen, Biocodex and Juvisé. Drs Kirill Veselkov, Ivan Laponogov, and Dennis Veselkov are affiliated with Intelligify Ltd, an AI consultancy company, which was not involved in the research, analysis, or interpretation of the results presented in this study. Tania Fleitas Kanonnikoff discloses advisory role honoraria from Amgen, AstraZeneca, Beigene, BMS and MSD; institutional research funding from Gilead; and speaker honoraria from Amgen, Servier, BMS, MSD, Lilly, Roche and Bayer. The remaining authors declare no conflicts of interest. Policy disclosure: use of clinical data. This study involves the secondary analysis of de-identified clinical data obtained from the European Registry on Helicobacter pylori Management (Hp-EuReg). The data were originally collected by the Hp-EuReg consortium across multiple centres in Europe under appropriate ethical approvals and patient consent at the time of collection. No new data were collected for the purposes of this analysis, and the authors were not involved in direct recruitment or interaction with study participants. All analyses were conducted on anonymised data in accordance with applicable data protection and ethical guidelines.

Peer review

Peer review information

Nature Communications thanks Rafi Ahmad, Binh Nguyen, and the other anonymous reviewer for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Higgins, K., Nyssen, O.P., Southern, J. et al. The Helicobacter pylori AI-clinician harnesses artificial intelligence to personalise H. pylori treatment recommendations. Nat Commun 16, 6472 (2025). https://doi.org/10.1038/s41467-025-61329-5

Received: 12 December 2024
Accepted: 19 June 2025
Published: 14 July 2025
DOI: https://doi.org/10.1038/s41467-025-61329-5

Summary

This article discusses the development and validation of an AI-based system, referred to as the Helicobacter pylori (H. pylori) AI-clinician, which uses artificial intelligence to personalize treatment recommendations for H. pylori infections based on patient-specific clinical data. The key points from the paper include:

1. **Methodology**
   - The study utilized machine learning techniques and large-scale clinical datasets from the European Registry on Helicobacter pylori Management (Hp-EuReg).
   - Data were anonymized and processed to ensure privacy and compliance with ethical standards.
2. **Data Collection and Analysis**
   - The research involved secondary analysis of de-identified data collected across multiple centers in Europe, originally gathered under appropriate ethical approvals and patient consent.
   - The dataset included various clinical parameters such as demographics, comorbidities, previous treatments, and outcomes.
3. **AI Model Development**
   - Machine learning models were developed to predict the most effective treatment regimens for H. pylori based on patient-specific characteristics.
   - The AI-clinician uses a deep reinforcement learning approach, the independent-state deep Q-network (isDQN), to analyze one-hot encoded clinical data.
4. **Validation and Performance**
   - Validation of the AI model was conducted using internal cross-validation (ten-fold, fifty repeats) and synthetic datasets to assess robustness.
   - The system demonstrated high accuracy in predicting treatment outcomes, outperforming traditional decision-making approaches.
5. **Personalized Treatment Recommendations**
   - The AI-clinician provides personalized recommendations tailored to individual patient profiles, taking into account factors such as antibiotic resistance patterns, previous treatment failures, and comorbid conditions.
   - This approach aims to optimize H. pylori eradication rates while minimizing side effects and reducing the risk of antimicrobial resistance.
6. **Clinical Impact**
   - The development of this AI system could significantly improve patient outcomes by providing more precise and effective treatments for H. pylori infections.
   - It has the potential to guide clinical decision-making, enhance treatment adherence, and facilitate better management of antibiotic use.
7. **Ethical Considerations**
   - Compliance with data protection regulations and ethical guidelines was ensured throughout the research process.
   - The study highlights the importance of transparent communication regarding AI-based medical decisions and patient autonomy in healthcare settings.
8. **Future Directions**
   - Plans for further validation and integration into clinical practice, aiming to expand its application beyond H. pylori infections to other areas of gastrointestinal medicine.
   - Continuous improvement through iterative refinement based on real-world performance data and ongoing research.

This work represents a significant step towards integrating advanced AI technologies in personalized healthcare, specifically focusing on optimizing treatment regimens for bacterial infections like H. pylori.