Introduction

Antibiotic resistance is a major contributor to morbidity and mortality among infected patients and has emerged as a critical global public health challenge that will shape healthcare in the coming decades1,2,3. Carbapenem-resistant Pseudomonas aeruginosa (CRPA) is among the priority drug-resistant pathogens attracting international attention, given its high environmental adaptability, limited therapeutic options, and elevated resistance rates. A meta-analysis revealed that, with 30-day mortality as the outcome, the attributable mortality of CRPA bloodstream infections ranged from 8% to 18.4%; with 7-day mortality as the outcome, the corresponding figures were 3% and 14.6%4,5,6. Given the importance of carbapenems in treating infections caused by multidrug-resistant Gram-negative pathogens, the World Health Organization has classified CRPA as a “priority pathogen”, yet the widespread dissemination and high prevalence of CRPA remain a matter of concern7.

The resistance of P. aeruginosa to carbapenems is associated with the production of carbapenemases, mutation or loss of the outer membrane protein OprD, and overexpression of efflux pumps, and resistant strains can disseminate widely through clonal spread. Loss of the OprD porin (typically caused by deletion, mutation, or insertional inactivation of the oprD gene) has been reported as one of the main mechanisms of P. aeruginosa resistance to carbapenems8. In recent years, the resistance rate of P. aeruginosa to carbapenems has increased. In 2023, according to data from the China Antimicrobial Resistance Surveillance System (CARSS), the resistance rates of P. aeruginosa to imipenem and meropenem remained high at 21.9% and 17.4%, respectively9. It is therefore particularly important to explore the risk factors for CRPA infection and to prevent CRPA health care-associated infections (HAIs). Early identification of patients at high risk of CRPA infection will help clinical decision makers initiate timely decolonization interventions, strengthen infection control strategies, and implement targeted anti-infective therapy as soon as possible, so as to block the nosocomial transmission of CRPA and reduce CRPA infection-related mortality in severely ill patients.

Patients at high risk of CRPA infection are heterogeneous in multiple dimensions, including the spectrum of underlying diseases, the history of drug exposure, and the dynamics of infection transmission. This heterogeneity may limit the generalizability of a single prediction model10. At present, research teams are using machine learning methods to assist the diagnosis and prediction of HAIs and to explore potential risk factors. Such studies have not only improved on the performance of traditional logistic regression models but have also revealed nonlinear interactions between risk factors through the SHAP interpretability algorithm11. To our knowledge, a number of studies have analyzed the risk factors for CRPA infection, but predictive models for CRPA infection based on machine learning have not been explored. Therefore, this study used machine learning algorithms to establish a CRPA infection risk prediction model, to provide a decision-making basis and tools for precise prevention and control of CRPA infection.

Materials and methods

Study design

This study retrospectively selected inpatients with HAIs caused by P. aeruginosa at the Second Hospital of Shanxi Medical University in China from January 1, 2021 to March 1, 2024. The hospital is located in Taiyuan city, Shanxi Province, an underdeveloped region in central China. It is a tertiary class A general hospital with 2700 beds and treats a large proportion of the region's patients with orthopedic trauma and severe diseases. Inclusion criteria were: a microbiology laboratory specimen positive for P. aeruginosa, and fulfilment of the Centers for Disease Control and Prevention (CDC) criteria for HAI. Exclusion criteria were: missing key medical data; infection with other bacteria; and strains determined by the laboratory to be non-pathogenic (colonizing bacteria). In addition, patients with recurrent infection were recorded only once. At a ratio of 1:1, 89 patients with carbapenem-sensitive Pseudomonas aeruginosa (CSPA) infection who were hospitalized during the same period as the CRPA infection group were selected. This study was approved by the Ethics Committee of the Second Hospital of Shanxi Medical University (2024YX-018). Owing to the retrospective nature of this study, the review institution waived the requirement for written informed consent from study subjects. The study followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement.

Predictor variables and associated outcome definitions

Data were collected from the “Blue Dragonfly” hospital infection management system, with information extracted and cross-checked by two experienced HAI managers (HW and FT). We selected candidate predictors based on previous literature as well as data availability. The clinical data collected included: (1) source of specimens and the department submitting the specimen; (2) general characteristics of patients: gender, age, length of hospital stay; (3) main diagnosis: tumor (including hematological tumor), cerebrovascular disease, renal failure, trauma, pulmonary infection, diabetes, hypertension; (4) exposure during hospitalization: ICU admission, surgery, duration of central venous catheterization (CVC), duration of mechanical ventilation, retention time of urinary catheter, days of fever, indwelling drainage tube, transfusion, continuous renal replacement therapy, puncture surgery; (5) classification of inpatient drug use: cephalosporins, carbapenems, tetracyclines, glycopeptides, aminoglycosides, fluoroquinolones, linezolid, immunosuppressants, antifungal agents, β-lactamase inhibitors.

Healthcare-associated infections: infections acquired in hospital by hospitalized patients, including those that occur during hospitalization and those that are acquired in hospital but appear after discharge; infections that had started before admission or were in the incubation period at admission were not included. CRPA isolates were defined as P. aeruginosa isolates resistant to at least one carbapenem (minimum inhibitory concentration (MIC) of meropenem or imipenem ≥ 8 µg/mL)12.
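The CRPA definition above can be expressed as a simple rule. The following is a minimal sketch, assuming MIC values are available in µg/mL; the function name `is_crpa` and its handling of a missing MIC are illustrative assumptions, not part of the study's workflow.

```python
from typing import Optional

# Resistance threshold quoted in the definition above (µg/mL).
CARBAPENEM_R_BREAKPOINT = 8.0


def is_crpa(meropenem_mic: Optional[float], imipenem_mic: Optional[float]) -> bool:
    """Return True if any available carbapenem MIC is >= 8 µg/mL.

    Hypothetical helper: a missing MIC (None) is ignored, and at least
    one of the two values must be present.
    """
    mics = [m for m in (meropenem_mic, imipenem_mic) if m is not None]
    if not mics:
        raise ValueError("at least one carbapenem MIC is required")
    return any(m >= CARBAPENEM_R_BREAKPOINT for m in mics)


# Example: resistant to imipenem only, so the isolate counts as CRPA.
example_is_crpa = is_crpa(meropenem_mic=2.0, imipenem_mic=16.0)
```

Under this rule an isolate with MICs of 4 and 2 µg/mL would be classified as CSPA, while resistance to either agent alone is sufficient for the CRPA label.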

Bacterial colonization: The growth of microorganisms, such as bacteria, on the skin, gastrointestinal tract, respiratory tract, oral cavity, and reproductive tract of a patient without causing clinical manifestations of associated infection13.

Identification of pathogenic bacteria and drug sensitivity analysis

A VITEK-2 Compact automated microbial identification and antimicrobial susceptibility testing system (bioMérieux, France) was used for strain identification and susceptibility testing. Meropenem and imipenem susceptibility results were interpreted according to the Clinical and Laboratory Standards Institute (CLSI) criteria (provided by the Microbiology Laboratory)14. Standard calibration cards and quality control strains (Pseudomonas aeruginosa ATCC 27853) were used for daily quality control. The minimum inhibitory concentration (MIC) was determined using the broth microdilution method.

Statistical analysis

SPSS 26.0 and RStudio (4.3.1) were used for statistical analysis. The Shapiro-Wilk test was used to assess the normality of the sample data. Normally distributed data were expressed as mean ± standard deviation and compared between the two groups with the independent-samples t test. Non-normally distributed data were expressed as median (25% quantile, 75% quantile) and compared between the two groups with the Wilcoxon rank sum test. Qualitative data were described by frequency (percentage), and the chi-square test or Fisher’s exact test was used for comparison between groups. A P-value of less than 0.05 was considered statistically significant. RStudio (4.3.1) was used to analyze the collinearity between variables and to produce a correlation heatmap. Least absolute shrinkage and selection operator (LASSO) regression can effectively select variables and reduce model complexity; in addition, its L1 regularization can effectively deal with multicollinearity between variables. We used LASSO regression to screen the predictor variables, and the data set was randomly divided into a training set and a test set in a ratio of 7:3. The training set was used for machine learning model development, and the test set was used for model validation. The extreme gradient boosting (XGBoost) model was applied to analyze the risk factors in the training set. SHapley Additive exPlanations (SHAP) was used to explain the model and to quantify the contribution of each feature to the prediction of the machine learning model. Accuracy, the ROC curve, and the clinical decision curve were used to evaluate the predictive performance and clinical utility of the machine learning model.
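To illustrate why L1 regularization performs variable selection, the sketch below fits a LASSO by coordinate descent on toy data. It is an assumption-laden illustration, not the study's procedure: it uses a continuous outcome with squared-error loss (the binary CRPA outcome would call for a penalized logistic model), the data are invented, and `lasso_coordinate_descent` is a hypothetical helper.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the outcome y are already centered,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # sum of squares of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta


# Toy, centered data (invented): y depends strongly on the first column only.
X_toy = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y_toy = [-4.1, -1.9, 0.0, 2.1, 3.9]

beta_unpenalized = lasso_coordinate_descent(X_toy, y_toy, lam=0.0)
beta_lasso = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
```

With the penalty switched on, the weak second coefficient is set exactly to zero while the strong first coefficient is only shrunk, which is the behavior that lets LASSO screen predictors.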

Results

Patient characteristics, specimen classification, and distribution

From January 2021 to March 2024, a total of 1,949 patients were diagnosed with P. aeruginosa infection, of whom 178 were finally included in the study. The age of the patients ranged from 15 to 97 years, with an average age of 63 years. Most patients were male (114/178, 64.04%). The inclusion flowchart is shown in Fig. 1. Overall, the most common diagnoses were hypoproteinemia (84/178, 47.19%), cancer (including leukemia) (70/178, 39.33%), hypertension (64/178, 35.96%), and cerebrovascular disease (48/178, 26.97%). Table 1 shows all clinical and demographic characteristics of patients with P. aeruginosa infection. There were statistically significant differences between the two groups in the duration of hospitalization, ICU admission, respiratory failure, hypoproteinemia, days of fever, indwelling time of invasive devices, and clinical drug use (P < 0.05).

Table 1 Demographic and clinical characteristics of patients with P. aeruginosa infection. * P<0.05.

In addition, Table 2 summarizes the departments in which CRPA infection specimens were detected. The ICU had the highest proportion (23 cases, 26.1%), followed closely by the hematology department (21 cases, 23.8%); the neurosurgery and respiratory departments also accounted for substantial proportions (12 cases, 13.6% and 11 cases, 12.5%, respectively), and the remaining cases were distributed almost evenly across other departments.

Table 2 Department distribution of 89 CRPA patients.

Correct diagnosis and treatment of infectious diseases depend on accurate pathogen detection, which in turn depends on qualified specimens. Table 3 summarizes the types of specimens in which CRPA was detected, which are generally representative of microbial surveillance patterns in our institution. Sputum was the most common specimen type for CRPA infection (39/89, 43.8%), followed by blood samples (35/89, 39.3%) and urine samples (5/89, 5.6%).

Table 3 Composition ratio of cultured specimens. BALF: broncho-alveolar lavage fluid.

Fig. 1

Flow chart of case inclusion. PA: Pseudomonas aeruginosa; CRPA: carbapenem-resistant Pseudomonas aeruginosa; CSPA: carbapenem-sensitive Pseudomonas aeruginosa.

Predictive value and correlation heatmap of variables

We created a bar chart of AUC values (Fig. 2A) and ROC curves (Fig. 2B) to illustrate the predictive value of individual variables for CRPA infection. Among them, the use of carbapenems, days of fever, retention time of urinary catheter, and duration of central venous catheterization were the most predictive of the outcome (AUC > 0.7). Additionally, we calculated Pearson correlation coefficients to assess the degree of multicollinearity among variables and created a correlation heatmap. In Fig. 2C, the x- and y-axes list the same variables, and each cell shows the correlation coefficient between the variable on the x-axis and the variable on the y-axis, with different colors representing the magnitude of the correlation. Variables marked with an asterisk indicate significant correlations. Finally, by fitting these values, we generated a fitted curve; an ideal fitted curve in the heatmap is a diagonal line. As can be seen from the heatmap, collinearity was present among the variables and could eventually lead to model distortion.
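The two quantities used in this section, the single-variable AUC and the Pearson correlation coefficient, can be sketched in a few lines. This is a minimal illustration on invented data, not the study's code; `auc_rank` and `pearson_r` are hypothetical helper names, and the AUC is computed from its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half).

```python
from math import sqrt


def auc_rank(scores, labels):
    """AUC of a single variable: share of (case, control) pairs ranked correctly."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented example: days of fever for six patients and their CRPA status.
fever_days = [0, 1, 2, 3, 5, 8]
crpa = [0, 0, 1, 0, 1, 1]
fever_auc = auc_rank(fever_days, crpa)

# Invented example: two perfectly collinear exposure durations.
collinear_r = pearson_r([1, 2, 3], [2, 4, 6])
```

A heatmap such as Fig. 2C is then simply the matrix of `pearson_r` values over all pairs of variables, with coefficients near ±1 flagging collinear pairs.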

LASSO regression

Fig. 2

Variable contribution, collinearity analysis, and LASSO regression screening procedure. (A) AUC values of each variable for CRPA infection; (B) ROC curves of each variable for CRPA infection; (C) Visualization of collinearity between variables; (D) LASSO regression cross-validation plot; (E) LASSO regression path plot.

To address multicollinearity among variables, this study used LASSO regression to select predictive variables. All variables in the baseline data were included in the LASSO regression model, with quantitative variables recorded as actual values and categorical variables coded according to the following rules: male = 1, female = 2; disease (tumor, diabetes, hypertension, cardiovascular disease, cerebrovascular disease, renal failure, respiratory failure, trauma, hypoproteinemia) = 1, no disease = 0; procedure or drug exposure (admission to ICU, surgery, drainage tube, transfusion, hemodialysis, puncture, cephalosporins, carbapenems, tetracyclines, glycopeptides, aminoglycosides, fluoroquinolones, linezolid, immunosuppressant, antifungal agents, β-lactamase inhibitors) = 1, no exposure = 0. The LASSO cross-validation plot and regression path plot (Fig. 2D and E) were constructed. The vertical dashed line on the left side of Fig. 2E represents lambda.min, and the vertical dashed line on the right side represents lambda.1se. The model deviance fluctuated minimally within the interval [lambda.min, lambda.1se]. The error was minimized when four predictor variables were retained, namely admission to ICU, duration of CVC, use of carbapenems, and use of fluoroquinolones.
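The coding rules used for the LASSO input can be sketched as a small encoding function. This is an illustrative assumption about how a patient record might be represented: the field names (`sex`, `icu_admission`, `cvc_days`, and so on) and the helper `encode_patient` are hypothetical, and only a handful of the study's variables are shown.

```python
# Hypothetical field names; the study's actual data dictionary is not given.
BINARY_FIELDS = ("hypertension", "icu_admission", "carbapenems", "fluoroquinolones")
QUANTITATIVE_FIELDS = ("age", "cvc_days")


def encode_patient(record):
    """Apply the coding rules: male = 1, female = 2; present = 1, absent = 0;
    quantitative variables are kept as their actual values."""
    sex = str(record["sex"]).strip().lower()
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    row = {"sex": 1 if sex == "male" else 2}
    for name in BINARY_FIELDS:
        row[name] = 1 if record[name] else 0
    for name in QUANTITATIVE_FIELDS:
        row[name] = float(record[name])
    return row


# Invented example record.
encoded_example = encode_patient({
    "sex": "Female",
    "age": 63,
    "cvc_days": 12,
    "hypertension": True,
    "icu_admission": False,
    "carbapenems": True,
    "fluoroquinolones": False,
})
```

Keeping the coding in one place makes it easier to apply identical rules to the training and test sets.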

XGBoost machine learning model construction and validation

Fig. 3

ROC curve and clinical decision curve in the training and validation sets. (A) ROC curve of the training set. (B) ROC curve of the validation set. (C) Decision curve analysis (DCA) curve of the training set. (D) DCA curve of the validation set.

The variables selected by LASSO regression were included in the model to establish an XGBoost-based prediction model of CRPA infection. The accuracy of the model was calculated, and the ROC curve and clinical decision curve were drawn to evaluate the performance of the model. The accuracy of the XGBoost machine learning model was 0.944 in the training set and 0.808 in the test set. The area under the ROC curve (AUC) was 0.987 (95% CI: 0.974–1.000) in the training set and 0.862 (95% CI: 0.750–0.974) in the test set (Fig. 3A and B). The clinical decision curve also indicated that the model may have practical application in a wide range of clinical situations (Fig. 3C and D). Taken together, these indicators suggest that the XGBoost machine learning model predicted the occurrence of CRPA infection well.
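The accuracy and decision-curve measures reported in this section can be sketched as follows. This is a minimal illustration on invented predictions, not the study's output; the helper names are hypothetical, and net benefit uses the standard decision curve analysis definition, TP/n minus FP/n weighted by the odds of the threshold probability.

```python
def confusion_counts(probs, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(probs, labels, threshold=0.5):
    """Share of patients classified correctly at the given threshold."""
    tp, _, tn, _ = confusion_counts(probs, labels, threshold)
    return (tp + tn) / len(labels)


def net_benefit(probs, labels, threshold):
    """Net benefit at a threshold probability pt: TP/n - FP/n * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, _, _ = confusion_counts(probs, labels, threshold)
    n = len(labels)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Invented predicted probabilities and true CRPA status for six patients.
toy_probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]

toy_accuracy = accuracy(toy_probs, toy_labels)
toy_net_benefit = net_benefit(toy_probs, toy_labels, threshold=0.5)
# "Treat all" reference strategy: every patient is predicted positive.
treat_all_benefit = net_benefit([1.0] * 6, toy_labels, threshold=0.5)
```

A decision curve plots `net_benefit` across a range of thresholds and compares the model with the treat-all and treat-none (net benefit 0) strategies; the model is useful at thresholds where its curve lies above both.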

XGBoost machine learning model interpretation based on SHAP

To explain the selected variables visually and intuitively, we employed SHAP to elucidate how these variables affect the predicted risk of CRPA infection. Figure 4A presents the global SHAP explanation plot of features, where the horizontal axis represents the SHAP values of each feature. Positive and negative values respectively indicate positive and negative contributions to the predicted risk of CRPA infection. The vertical axis lists the feature names sorted by feature importance, and each point corresponding to a feature represents the SHAP contribution value of an individual sample. The results demonstrate that in the XGBoost model, the most prominent feature is the use of carbapenems, and it is positively associated with CRPA infection. Figure 4B shows the force diagram of single-sample SHAP values, where E[f(x)] is the SHAP baseline value, i.e., the mean of the model's predicted values, and f(x) is the model output for this sample. Purple indicates a negative impact, and yellow indicates a positive impact. The use of carbapenems exerted a positive impact of 1.85 on the outcome. The duration of CVC had a negative impact of -0.495 on the outcome, and the final predicted value of the sample was -0.0123. Furthermore, in this sample, the use of carbapenems and the use of fluoroquinolones were the main factors increasing the predicted risk of CRPA infection, while not being admitted to the ICU and the duration of CVC were the main factors reducing it.
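The additive structure behind a force plot, in which the baseline plus the per-feature contributions equals the model output for one sample, can be illustrated with an exact Shapley computation on a toy model. Everything here is an assumption made for illustration: `toy_log_odds` and its coefficients are invented (not the fitted XGBoost model), the baseline is a single hypothetical reference patient rather than an average over background data, and the brute-force enumeration below is not the tree-specific algorithm typically applied to XGBoost models.

```python
from itertools import combinations
from math import exp, factorial

FEATURES = ("carbapenems", "fluoroquinolones", "icu_admission", "cvc_days")


def toy_log_odds(x):
    """Invented risk score on the log-odds scale, with one interaction term."""
    return (-1.0
            + 1.8 * x["carbapenems"]
            + 0.9 * x["fluoroquinolones"]
            + 0.7 * x["icu_admission"]
            + 0.05 * x["cvc_days"]
            + 0.3 * x["carbapenems"] * x["fluoroquinolones"])


def shapley_values(model, x, baseline):
    """Exact Shapley values: weighted average marginal contribution of each
    feature over all coalitions; absent features take their baseline value."""
    n = len(FEATURES)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in FEATURES}
        return model(mixed)

    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {f}) - value(members))
        phi[f] = total
    return phi


def sigmoid(z):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + exp(-z))


# Hypothetical patient and reference patient.
patient = {"carbapenems": 1, "fluoroquinolones": 1, "icu_admission": 0, "cvc_days": 0}
reference = {"carbapenems": 0, "fluoroquinolones": 0, "icu_admission": 1, "cvc_days": 10}

phi = shapley_values(toy_log_odds, patient, reference)
base_value = toy_log_odds(reference)
prediction = toy_log_odds(patient)
```

In this toy example the drug exposures receive positive contributions and the absence of ICU admission and of CVC days receive negative ones, and `base_value + sum(phi.values())` reproduces `prediction`. If the SHAP values in Fig. 4B are on the log-odds scale (an assumption, since the scale is not stated), `sigmoid` would map the reported output of -0.0123 to a probability just below 0.5.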

Fig. 4

XGBoost machine learning model interpretation based on SHAP. (A) Feature global SHAP interpretation map. (B) Force diagram of single sample SHAP values. CVC: Central venous catheterization, ICU: Intensive care unit.

Discussion

This study used retrospective data from a regional medical institution in Shanxi Province to analyze the risk factors for CRPA infection. We considered the interaction and collinearity effects among predictor variables and employed LASSO regression for variable selection, which can offer more precise and reliable classification rules for clinical prediction. Notably, we used the XGBoost machine learning model to construct a risk prediction model for CRPA. XGBoost is an optimized distributed gradient boosting library with advantages such as low complexity, rapid running speed, and high accuracy15. However, it does not clearly expose its underlying decision logic. SHAP is a game theory-based approach for explaining and visualizing the output of machine learning models, providing consistent interpretability for the model16. In this study, the SHAP algorithm was employed to analyze the variables of the XGBoost model at the global and single-sample levels and to quantitatively visualize the relationships between risk factors and outcomes, enhancing the credibility of the model. The final XGBoost machine learning model includes four commonly used clinical indicators, namely the use of carbapenems, the duration of CVC, the use of fluoroquinolones, and admission to the ICU, which preliminarily indicates the potential value of the XGBoost-SHAP framework in predicting the risk of CRPA infection. However, its clinical application still needs to be verified by multi-center prospective studies.

The specimen distribution and source of CRPA largely reflect the microbial surveillance pattern of the institution or region. Among the cases with known infection sources, sputum specimens collected from the respiratory tract were the most common potential source, which is in line with other studies17,18. This might suggest that the respiratory system is at considerable risk of infection by P. aeruginosa. Notably, this study shows that the ICU, the Department of Hematology, the Department of Neurosurgery, and the Department of Respiratory Medicine were the main departments where CRPA infections occurred, accounting for ≥ 75% of the total detections. Patients in these high-risk departments are mostly elderly individuals with hematological malignancies, chronic respiratory disorders, severe craniocerebral injuries, trauma, or burns. They have low immunity and undergo more invasive treatments, which are prone to causing flora imbalance, leading to the translocation of P. aeruginosa to other sites for colonization and subsequently significantly increasing the risk of HAIs19.

Possible explanations for risk factors

CVC has been widely recognized as a risk factor for drug-resistant infection, which is consistent with the results of this study20. Microbial colonization of the catheter hub and of the skin surface around the puncture site is the main source of pathogens. Microorganisms colonizing the skin often migrate from the CVC insertion site along the subcutaneous tunnel and colonize the catheter tip. In addition, the fluid management of CVC can influence the growth of microorganisms. During intravenous infusion, Gram-positive bacteria (G+) (such as Staphylococcus epidermidis and Staphylococcus aureus) grow poorly, while Gram-negative bacteria (G-) (such as Pseudomonas aeruginosa and Klebsiella pneumoniae) grow continuously and form biofilms covering the catheter tip. With regard to critical care, our study also found that admission to the ICU is a relevant risk factor for CRPA infection. The ICU often admits patients with severe diseases and a high risk of mortality. Low immunity makes it easier for pathogens to invade and adhere to the inner walls of the respiratory and urinary tracts21. At the same time, invasive procedures bypass the host’s innate mechanical defenses, providing a niche for drug-resistant microorganisms and promoting the progression of infection. Hence, invasive procedures in the ICU should be further standardized in all aspects, for example by comprehensively assessing patients before catheter placement. Likewise, care and disinfection should be assessed at every stage of CVC insertion and maintenance. Unnecessary catheters should be removed in a timely manner to minimize the risk of HAIs.

We found that the use of carbapenems and the use of fluoroquinolones were high-risk factors for CRPA infection22,23,24,25. This may likewise be explained by disease severity, as carbapenems are frequently employed as last-resort antibiotics for treating multidrug-resistant infections. The use of fluoroquinolones also merits discussion. Owing to their oral efficacy, broad-spectrum activity, and relatively few side effects, fluoroquinolones have developed rapidly and are now widely used in the empirical treatment of P. aeruginosa infections. Ciprofloxacin was once regarded as one of the most effective drugs for treating HAIs caused by P. aeruginosa26. Nevertheless, with the extensive use of fluoroquinolones, the resistance of P. aeruginosa to them has progressively increased. The mechanisms of quinolone resistance primarily include: (i) mutations in the genes encoding topoisomerase IV and DNA gyrase, the target sites of quinolones; (ii) overexpression of regulatory genes governing the active efflux system, leading to enhanced active efflux; (iii) the functions of lipopolysaccharides and outer membrane proteins, as well as the formation of biofilms27. The development of drug resistance is influenced by multiple factors. The pharmacokinetics and pharmacodynamics of the drug itself and the resistance mechanisms of the pathogen undoubtedly play a decisive role in the emergence of bacterial resistance. However, the frequency with which a given class of drug is used and the clinical drug management model also significantly affect drug resistance.

In contrast to other studies, we did not obtain statistically significant results regarding the use of cephalosporins and aminoglycosides28,29. This may be due to multiple factors, such as methodological limitations and the specifics of local clinical practice. Given the retrospective study design and the limited sample size, the statistical power of the existing models may be insufficient. In addition, the patterns of use of these two drug classes, which are mostly used as de-escalation therapy from carbapenems, may have weakened the cumulative effect of drug selection pressure. It is worth noting that the antimicrobial management plan and intensive disinfection measures implemented in our center (such as hydrogen peroxide aerosol disinfection, which reduced the colonization rate of CRPA in the ICU environment) may have further reduced the strength of the association between drug exposure and the risk of resistance. The heterogeneity of results among different studies essentially reflects regional differences in antibiotic application strategies and the complexity of the resistance ecosystem. Therefore, when interpreting these conclusions, it is necessary to fully consider key factors such as the epidemiological characteristics of drug-resistant bacteria in the study area, the level of infection control in medical institutions, and the spectrum of antimicrobial use. The generalizability of these findings needs to be verified through multi-center studies.

The findings of this study provide targeted, actionable insights to optimize the clinical management of CRPA infections. First, the strong association between carbapenem use and CRPA infection highlights the need to limit the empirical prescribing of carbapenems in severe cases and to strengthen targeted antibiotic management. Second, we recommend strengthening environmental cleaning and disinfection measures in the ICU and monitoring patients with long-term CVC. Finally, the model could be integrated into the hospital infection system in the future to shorten the response time of prevention and control.

Limitations of the study

This study has the following limitations. Firstly, in terms of research design, (1) the generalizability of the findings may be limited by the single-center design and relatively limited sample size; (2) secondary infections and multiple culture results were not tracked; (3) drug combination treatment variables were not integrated; (4) due to institutional costs and resource constraints, systematic inpatient screening was not conducted, so the colonization status of P. aeruginosa was not quantified. Secondly, in terms of data quality, although accuracy was ensured through electronic system verification and review by physicians from the infection management center, the high exclusion rate may lead to selection bias, thereby underestimating the true risk of drug resistance; at the same time, the information bias inherent in the retrospective design still exists. Additionally, there are three limitations in model construction: (1) competing risks from other multidrug-resistant bacteria were not evaluated; (2) organ function scores such as SOFA/APACHE II were not integrated, which may affect the prediction sensitivity for patients with multiple organ failure; (3) because no multi-algorithm comparison was performed (such as comparison with logistic regression models), the current results cannot be extrapolated as a general advantage of XGBoost. However, these limitations point the way for future research: model development should be based on a multi-center, prospective research design, develop a more flexible framework for handling missing data, introduce microbial interaction network analysis, and conduct multi-algorithm comparisons, thereby more comprehensively capturing the dynamic characteristics of CRPA infections. Developing real-time prediction models and integrating them into HAI databases is encouraged to promote continuous assessment of diagnostic results.

Conclusion

This study, based on single-center retrospective data, used the XGBoost-SHAP framework to construct a risk prediction model for CRPA infections and provides preliminary evidence of its potential value. Infection control and clinical departments should strengthen targeted antibiotic management, standardize invasive procedures, implement dynamic prevention and control in high-risk departments, and shorten the response time of CRPA infection prevention and control. The generalizability of the core risk factors needs to be validated through future multi-center prospective studies.