Introduction

Alzheimer’s disease (AD) was first identified in a female patient in 1901 and is pathologically defined by the presence of amyloid-β (Aβ) plaques and fibrillar tau tangles1. AD is the most prevalent form of dementia, and its incidence is increasing at a concerning rate. Notably, women account for approximately two-thirds of AD cases, and their lifetime risk of developing the disease (1 in 5) is significantly higher than that of men (1 in 10)1,2. While some argue that this disparity is primarily due to women’s longer life expectancy, research suggests that sex influences both risk factors and potential disease-causing mechanisms3. In addition to biological elements such as chromosomal, epigenetic, and hormonal differences, psychosocial and cultural aspects, including education access and gender disparities, may also play a role in disease susceptibility. This review primarily focuses on biological factors to explore how sex differences impact key mechanisms underlying neurodegeneration4. The progressive nature of AD leads to severe cognitive decline, ultimately resulting in death. Recent studies have increasingly pointed towards lifestyle and physical activity as modifiable risk factors that can influence the course and outcomes of AD5. However, quantifying these effects and incorporating them into predictive models for mortality risk remains challenging6.

Recent epidemiological studies have consistently identified lifestyle and physical activity as critical modifiable risk factors associated with AD mortality. For instance, Norton et al.7 reported that addressing lifestyle factors, such as physical inactivity, could significantly reduce the global incidence of Alzheimer’s disease. Additionally, the Lancet Commission highlighted physical inactivity as one of the key modifiable risk factors that could substantially affect dementia outcomes, suggesting that increased physical activity could delay disease progression and potentially extend survival6. Meta-analyses have also reinforced the association between higher physical activity levels and reduced risk of dementia and AD-related mortality, emphasizing physical exercise as a practical strategy for risk reduction8,9. Moreover, Kivipelto et al.10 demonstrated the long-term protective effects of midlife physical activity against AD and other dementias, supporting the hypothesis that sustained physical activity could lower the risk of AD mortality. Therefore, incorporating lifestyle and physical activity variables into predictive mortality models is crucial for enhancing the precision of prognostic assessments and informing tailored preventive strategies.

Machine Learning (ML) offers a promising avenue for addressing this challenge by enabling the analysis of large and complex datasets to identify patterns and predict outcomes more accurately than traditional statistical methods11. By leveraging ML algorithms, this study aims to predict mortality risk in Alzheimer’s patients by analyzing variables related to lifestyle choices and physical activity levels12. This approach is novel in its application to AD and has the potential to provide valuable insights that could inform patient care strategies, public health policies, and individual lifestyle modifications to mitigate the risks associated with the disease.

ML techniques, when correctly implemented, are capable of processing vast amounts of data and producing precise outcomes, particularly with sparse models13. These algorithms have shown promise in estimating mortality risks and predicting the timing of death, enhancing our comprehension of the progression of dementia through the analysis of risk factors and their interactions14. ML methods often yield more accurate findings compared to conventional statistical approaches due to their superior ability to manage large and diverse datasets15. From a medical perspective, however, the main objective is not merely to achieve high predictive accuracy; identifying key risk factors is often the central clinical question16.

Recent advances in machine learning and artificial intelligence have opened new frontiers in medical research and practice, particularly in predictive analytics. In the context of AD, machine learning models have demonstrated potential in identifying early-stage biomarkers, classifying disease stages, and predicting cognitive decline with a degree of accuracy previously unattainable through traditional statistical methods17.

This study aims to extend the current understanding of AD’s progression by employing advanced machine learning techniques to analyze the relationship between lifestyle factors, physical activity, and mortality risk in individuals with Alzheimer’s disease. By doing so, we seek to contribute to the development of more effective, personalized intervention strategies that can potentially slow the disease’s progression and improve the overall prognosis for those affected.

Subjects and methods

Participants

Figure 1 illustrates the participant selection process. Data were drawn from the National Health and Nutrition Examination Survey (NHANES)18 depression screening, spanning 2007 to 2020. NHANES is a program of studies designed to assess the health and nutritional status of adults and children in the United States; it collects data through interviews and physical examinations to provide insights into various health parameters across a diverse population sample. Initially, 102,956 individuals participated in the screening. Three exclusion criteria were applied: 22,636 individuals were excluded due to missing Patient Health Questionnaire-9 scores; 11,981 were excluded for not having AD or having incomplete data regarding the diagnosis of AD; and 15,108 were excluded because they lacked follow-up information. Following these exclusions, a final sample of 53,231 participants was obtained. This sample was then randomly split into two datasets: a training set comprising 42,585 individuals (80%) and a test set consisting of 10,646 individuals (20%). The training set was used to develop the machine learning models, while the test set was used to evaluate their performance.
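The participant flow above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the participant IDs are hypothetical placeholders and the shuffle seed is an arbitrary assumption, not the seed used in the study.

```python
import random

# Counts reported in the text.
screened = 102_956
exclusions = {
    "missing PHQ-9 score": 22_636,
    "no AD or incomplete AD diagnosis data": 11_981,
    "no follow-up information": 15_108,
}

final_n = screened - sum(exclusions.values())

# Hypothetical participant IDs; the seed below is arbitrary (an assumption).
ids = list(range(final_n))
random.Random(0).shuffle(ids)

# 80/20 random split, as described for the training and test sets.
n_train = round(0.8 * final_n)
train_ids, test_ids = ids[:n_train], ids[n_train:]

print(final_n, len(train_ids), len(test_ids))
```

The three printed counts correspond to the final sample, the training set, and the test set.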

Fig. 1

Flowchart of the study design and participants excluded from the study.

Categorization of participants via PHQ-9

Participants were assessed for depressive symptoms using the Patient Health Questionnaire-9 (PHQ-9), a validated self-report instrument designed to screen for depression severity19. Based on their PHQ-9 scores, participants were categorized into three groups: none (scores 0–4), mild (scores 5–9), and severe (scores 10–14)20. This categorization follows established guidelines to identify clinically relevant levels of depressive symptomatology, facilitating meaningful analysis and interpretation of depression’s impact across different severity levels in the studied population. Categorizing individuals with AD by depression severity is crucial, as depressive symptoms can significantly influence cognitive performance, disease progression trajectories, responsiveness to treatments, and overall quality of life among AD patients21. Clarifying the impact of varying depression severities on these outcomes may enhance targeted interventions and therapeutic strategies tailored specifically for individuals with AD.
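As a minimal sketch, the grouping rule described above can be written as a small function. The function name `phq9_group` is hypothetical. The text defines no group for totals above 14, so the sketch returns `None` for those scores; that handling is an assumption, not part of the study protocol.

```python
def phq9_group(score):
    """Map a PHQ-9 total score to the severity group described in the text."""
    # PHQ-9 has nine items scored 0-3, so valid totals run from 0 to 27.
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 total scores range from 0 to 27")
    if score <= 4:
        return "none"
    if score <= 9:
        return "mild"
    if score <= 14:
        return "severe"
    # Assumption: no group is defined above 14 in the text.
    return None


print([phq9_group(s) for s in (0, 4, 5, 9, 10, 14)])
```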

Definition of AD mortality

AD mortality refers to death resulting from the complications associated with AD. The progression of AD is characterized by the gradual deterioration of cognitive functions, leading ultimately to death. Mortality in AD patients is often a result of complications such as infections, including pneumonia, or other co-morbid conditions like heart disease or stroke, which are exacerbated by the decline in health and function caused by AD22. Operationalizing the definition of AD mortality requires a robust set of criteria that takes into account not only the presence of AD as a primary or contributing cause of death but also accounts for the role of AD in the presence of other terminal conditions. This can involve analyzing death certificates, medical records, and family reports to establish AD as a cause of death. Furthermore, AD mortality is not uniformly classified across different regions and studies, which may lead to discrepancies in reporting and understanding the scale of AD mortality23,24.

In this study, we obtained data on AD mortality from the National Death Index (NDI)25, with follow-up through December 31, 2019, and used the Tenth Revision of the International Classification of Diseases (ICD-10) to determine cause of death. AD mortality was identified by the ICD-10 codes G30.0, G30.1, G30.8, and G30.9. We followed the methodology outlined by the Centers for Disease Control and Prevention (CDC) for defining AD-related deaths, ensuring consistency with large-scale epidemiological studies and national statistics26.
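The code-based definition can be expressed as a simple membership test. This is a sketch only: the names `AD_ICD10_CODES` and `is_ad_death` are hypothetical, and the input is assumed to be a single ICD-10 code string per death record.

```python
# ICD-10 codes used in the text to identify AD mortality.
AD_ICD10_CODES = {"G30.0", "G30.1", "G30.8", "G30.9"}


def is_ad_death(cause_code):
    """Return True when the coded cause of death is one of the AD codes."""
    if cause_code is None:
        # No recorded cause of death (for example, participant still alive).
        return False
    return cause_code.strip().upper() in AD_ICD10_CODES


print(is_ad_death("G30.9"), is_ad_death("I21.9"), is_ad_death(None))
```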

Definition of covariates

The selection and definition of covariates for this study were critical in examining the multifaceted influences on Alzheimer’s disease mortality. Initial candidate covariates were derived from an extensive review of epidemiological studies addressing factors influencing mortality in Alzheimer’s patients, including lifestyle risk factors, metabolic and cardiovascular conditions, and sociodemographic variables7,8,9,10. Sociodemographic data, lifestyle factors, medical comorbidities, and therapeutic measures were collected via standardized assessments, including questionnaires, diagnostic evaluations, and physical examinations. Participants’ smoking history was classified according to lifetime cigarette exposure. Individuals who reported smoking fewer than 100 cigarettes in their lifetime were labeled as ‘never smokers’. Those who had smoked over 100 cigarettes and were currently smoking at the time of the survey were categorized as ‘current smokers’. Similarly, ‘former smokers’ were individuals who had smoked more than 100 cigarettes in the past but had quit by the time of the study. Alcohol use was quantified based on frequency, with ‘drinkers’ defined as those who consumed alcohol on at least 12 days throughout the past year. Physical activity levels were measured using the Global Physical Activity Questionnaire, which takes into account exercise from leisure, work, and transport. Activities were distinguished by their intensity, either ‘vigorous’ or ‘moderate’. The Total Physical Activity (TPA) score was calculated as the duration of moderate activities plus twice the duration of vigorous activities, and individuals with more than 150 min of combined weekly activity were considered ‘active’. Diabetes status incorporated self-reports, clinical measurements such as fasting glucose and glycohemoglobin levels, and information on anti-diabetic medication or insulin use. 
The presence of cardiovascular diseases (CVD) was identified through medical diagnosis records, including conditions such as congestive heart failure, coronary artery disease, angina, myocardial infarction, or stroke. Pharmacological interventions were recorded, with a particular focus on medications that manage blood sugar, blood pressure, and cholesterol levels. These were identified through detailed questionnaires covering diabetes management and cardiovascular health. Body mass index (BMI) was calculated from height and weight (kg/m²), providing a standard metric for assessing body fat and categorizing weight status. Total cholesterol (TC) readings were taken under strict laboratory conditions, with detailed methodologies outlined in the NHANES Laboratory/Medical Technician Procedures Manual. This comprehensive collection of covariates enabled a nuanced analysis of factors that could influence mortality risk among individuals with Alzheimer’s disease, thereby supporting the development of more targeted interventions. All variable summaries are provided in Table S1.
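The derived lifestyle covariates described above can be sketched as follows. All function and argument names are hypothetical. The text does not say how a lifetime count of exactly 100 cigarettes is handled; the sketch treats 100 or more as ever-smoking, which is an assumption.

```python
def smoking_status(lifetime_cigarettes, smokes_now):
    """Classify smoking history from lifetime exposure and current use."""
    if lifetime_cigarettes < 100:
        return "never"
    # Assumption: exactly 100 cigarettes counts as ever-smoking.
    return "current" if smokes_now else "former"


def is_drinker(drinking_days_past_year):
    """Drinkers consumed alcohol on at least 12 days in the past year."""
    return drinking_days_past_year >= 12


def total_physical_activity(moderate_min, vigorous_min):
    """TPA = moderate minutes + 2 x vigorous minutes (per week)."""
    return moderate_min + 2 * vigorous_min


def is_active(moderate_min, vigorous_min, threshold=150):
    """'Active' means more than 150 min of combined weekly activity."""
    return total_physical_activity(moderate_min, vigorous_min) > threshold


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


print(smoking_status(250, False), total_physical_activity(60, 50), is_active(60, 50))
```

In this example, 60 min of moderate and 50 min of vigorous activity give a TPA of 160 min, which exceeds the 150 min threshold.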

Data preprocessing

In the preprocessing stage, missing data were addressed using the Random Forest imputation method, a commonly employed machine learning-based technique that effectively captures non-linear relationships among variables, thus providing robust estimates for missing values (Stekhoven & Bühlmann, 2012). For variable transformations, categorical survey responses, such as binary "yes/no" questions, were numerically encoded to facilitate quantitative analysis. Specifically, “yes” responses were converted to '1', while “no” responses were converted to '0', a standard approach to binary categorical variable transformation (Kuhn & Johnson, 2013).
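The binary encoding step can be sketched as below. The helper name `encode_yes_no` is hypothetical. Returning `None` for missing or unrecognized answers, so that they can be handled by the imputation step, is an assumption; the Random Forest imputation itself is not shown.

```python
# Mapping described in the text: "yes" -> 1, "no" -> 0.
YES_NO = {"yes": 1, "no": 0}


def encode_yes_no(answer):
    """Encode a yes/no survey response as 1/0; anything else stays missing."""
    if answer is None:
        return None
    # Assumption: unrecognized answers are treated as missing values.
    return YES_NO.get(answer.strip().lower())


responses = ["Yes", "no", None, "Don't know"]
print([encode_yes_no(r) for r in responses])
```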

Statistical analysis

Model development

We selected the Random Survival Forest (RSF) model to capture potential nonlinear relationships and complex interactions among predictors, while the Cox proportional hazards model was chosen for its interpretability and widespread acceptance in time-to-event analysis. All analyses were conducted in Python. The predictive models were built with two Python libraries: ‘sksurv’ and ‘lifelines’. ‘sksurv’ was used to implement the RSF model, a machine learning technique that extends the traditional random forest algorithm to time-to-event data and handles both censored and uncensored observations. We optimized the hyperparameters of the RSF using cross-validation, a widely adopted strategy to enhance model reproducibility27,28. Ultimately, we determined that setting the number of estimators to 100 (n_estimators = 100) provided the most robust results for our RSF model; the random seed was fixed for reproducibility (random_state = 24).

In tandem, ‘lifelines’ was used to fit the Cox proportional hazards model. Variable importance was assessed with the ‘permutation_importance’ function in Python, which measures how much model performance degrades when the values of a single variable are randomly shuffled. This allowed us to quantify the contribution of each variable and to identify the most significant predictors of survival, which is central to understanding the factors that most influence prognostic outcomes for patients with Alzheimer’s disease.
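The idea behind permutation importance can be illustrated with a small pure-Python sketch. It is not the library routine used in the study: the helper names, the toy risk model, and the made-up data are all assumptions, and censoring is ignored for brevity. A variable is scored by how much a concordance measure drops when its column is shuffled.

```python
import random


def concordance(times, risks):
    """Share of pairs where the shorter survival time has the higher risk.

    Simplification: every time is treated as an observed event (no censoring).
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            if times[i] == times[j]:
                continue  # tied times are skipped
            comparable += 1
            shorter, longer = (i, j) if times[i] < times[j] else (j, i)
            if risks[shorter] > risks[longer]:
                concordant += 1
            elif risks[shorter] == risks[longer]:
                concordant += 0.5
    return concordant / comparable


def permutation_importance_sketch(predict, X, times, n_repeats=10, seed=0):
    """Mean drop in concordance after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = concordance(times, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            score = concordance(times, [predict(row) for row in X_perm])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Made-up data: column 0 plays the role of age, column 1 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.uniform(50, 90), data_rng.random()] for _ in range(60)]
times = [120 - row[0] for row in X]  # older means shorter survival, by construction


def toy_risk(row):
    # Toy risk model: risk equals age and ignores the noise column.
    return row[0]


baseline, importances = permutation_importance_sketch(toy_risk, X, times)
print(baseline, importances)
```

Because the toy model ignores the noise column, shuffling it leaves the score unchanged, whereas shuffling the age column removes most of the model's discrimination.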

Model validation

To assess the robustness and predictive performance of our model, we employed a hold-out validation strategy in which 30% of the dataset was randomly reserved for testing. This approach provided an objective and independent measure of model performance. The same datasets, with a consistent 70/30 split, were utilized to develop and validate both the RSF and the Cox proportional hazards models, ensuring comparability across methods. Our validation emphasized two key dimensions: discrimination and calibration. Discrimination, reflecting a model’s capacity to differentiate between outcomes, was evaluated using the integrated area under the curve (iAUC) and time-dependent AUC (tAUC). Calibration, indicating the alignment between predicted probabilities and actual outcomes, was assessed via the integrated Brier score (iBS) and prediction error (PE). These metrics offered critical insights into the model’s accuracy in representing patient outcomes. Additionally, patients were stratified into ‘high-risk’ and ‘low-risk’ groups based on median predictive scores, which further supported refined risk assessment and personalized decision-making.
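The median-based risk stratification can be sketched in a few lines. The function name is hypothetical, and assigning scores exactly equal to the median to the low-risk group is an assumption, since the text does not specify how ties are handled.

```python
from statistics import median


def stratify_by_median(risk_scores):
    """Label each predicted risk score as high-risk or low-risk."""
    cutoff = median(risk_scores)
    # Assumption: scores equal to the median are labeled low-risk.
    return ["high-risk" if s > cutoff else "low-risk" for s in risk_scores]


scores = [0.1, 0.4, 0.2, 0.9]  # made-up predicted risk scores
print(stratify_by_median(scores))
```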

In clinical research, several statistical metrics are pivotal for evaluating the performance of prognostic models, including the integrated area under the curve (iAUC), time-dependent AUC (tAUC), integrated Brier score (iBS), prediction error (PE), and concordance index (C-index). The iAUC and tAUC primarily assess a model’s discriminatory power, reflecting its ability to distinguish between patients who will experience an event and those who will not over a specified time horizon29. Higher iAUC and tAUC values indicate superior discrimination, facilitating the early identification of high-risk individuals. Calibration is gauged through the iBS and PE, which compare predicted probabilities against observed outcomes. Lower iBS and PE values signify tighter alignment between predictions and actual events, thereby enhancing the clinical utility of the model30. Additionally, the C-index evaluates the proportion of correctly ranked pairs in survival analysis and is widely regarded for handling censored data31. A higher C-index reflects better model discrimination, indicating that the model more accurately predicts the order of events. Collectively, these metrics offer a comprehensive view of model accuracy and reliability, aiding clinicians in risk stratification and informed decision-making for patient management.
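To make the pairwise-ranking idea behind the C-index concrete, the sketch below implements a basic form of Harrell's concordance index with right censoring. It is an illustration under simplifying assumptions (pairs with tied times are not counted, tied risks count as half), not the routine used in the study, and the example data are made up.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk scores rank correctly.

    times  : observed time (event or censoring) for each subject
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's event or censoring time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up example: the second subject is censored at t = 10.
times = [5, 10, 15, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.8, 0.3, 0.4]
c_index = harrell_c_index(times, events, risks)
print(c_index)
```

Here four pairs are comparable and three are ranked correctly; the censored subject can only serve as the longer-surviving member of a pair.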

Model output

The primary outcome of this study was AD-related mortality. The endpoint was operationally defined as death attributed either directly to Alzheimer’s disease or resulting from complications exacerbated by AD, such as infections (particularly pneumonia) or comorbid conditions including heart disease or stroke. Mortality data were obtained from the National Death Index (NDI) through December 31, 2019, and causes of death were identified using the International Classification of Diseases, Tenth Revision (ICD-10) codes: G30.0, G30.1, G30.8, and G30.9. This definition aligns with the Centers for Disease Control and Prevention’s standards for categorizing AD-related deaths, ensuring consistency with epidemiological research. Accurate classification of this endpoint was crucial for assessing the predictive performance of the developed machine learning survival models, which aimed at forecasting mortality risk based on lifestyle and physical activity metrics.

Results

Patient characteristics of cohorts

Table 1 summarizes participant characteristics stratified by the severity of depressive symptoms, as assessed by Patient Health Questionnaire-9 (PHQ-9) scores. The cohort is divided into three groups: no symptoms (scores 0–4), mild symptoms (scores 5–9), and severe symptoms (scores 10–14). The proportion of males rises with symptom severity: males make up 51.0% of the no-symptom group, 57.3% of the mild group, and 61.9% of the severe group, while females account for 49.0%, 42.7%, and 38.1%, respectively. The majority of participants in all severity groups are aged 55 years or younger, representing 63.3% of those with no symptoms, 61.9% with mild symptoms, and 62.6% with severe symptoms. Participants aged 56–65 make up a smaller proportion, while those over 65 years old represent 21.5% of both the no-symptom and mild groups and 17.7% of the severe group. Mean body mass index (BMI) increases with symptom severity, from 28.5 ± 6.51 in the no-symptom group to 30.9 ± 8.38 in the severe group. For comorbid conditions such as congestive heart failure (CHF) and coronary heart disease (CHD), most participants had no documented cases, but treated cases of both conditions increase with depressive symptom severity. Specifically, treated CHF is reported in 2.5% of participants without symptoms, 4.5% with mild symptoms, and 6.5% with severe symptoms. Similarly, treated CHD is seen in 3.4% without symptoms, 4.4% with mild symptoms, and 6.4% with severe symptoms. 
Physical activity, measured in minutes, shows a consistent pattern across all severity levels for high- and moderate-intensity work-related activities and walking/cycling, with only slight variation in the time spent on these activities. There is a marginal decrease in the minutes dedicated to vigorous and moderate recreational activities as symptom severity increases.

Table 1 Description of participants based on severity of depressive symptoms.

Evaluation of the differentiation ability of the RSF-based model

Table 2 compares the RSF and Cox proportional hazards models in their ability to predict survival within each depressive-symptom severity group. Performance metrics include iAUC/tAUC, integrated Brier score/prediction error (iBS/PE), and the concordance index (C-index), each accompanied by its 95% confidence interval (CI) and P value, where applicable32. For participants with no depressive symptoms, the RSF model shows an iAUC/tAUC of 0.781 (95% CI 0.778–0.839) and an iBS/PE of 0.150 (95% CI 0.083–0.122). The model also achieves a C-index of 0.785 (95% CI 0.776–0.800), indicating good predictive ability. No P values are reported for the RSF model, which serves as the reference for comparison. The Cox model, applied to the same group, yields a slightly lower iAUC/tAUC of 0.765 (95% CI 0.760–0.844) and a higher iBS/PE of 0.358 (95% CI 0.357–0.359), implying less precise survival prediction. Its C-index is 0.793 (95% CI 0.781–0.806), and all P values for the Cox model’s metrics are significant (P < 0.001), indicating that the differences between the Cox and RSF models are statistically significant. For participants with mild depressive symptoms, the RSF model shows an iAUC/tAUC of 0.764 and an iBS/PE of 0.150, with confidence intervals slightly narrower than in the no-symptom group, and a C-index of 0.755. The Cox model scores slightly lower on iAUC/tAUC (0.745) and has an iBS/PE of 0.356, similarly high to that seen in participants without symptoms; its C-index is 0.774, with all P values again significant. For moderate depressive symptoms, the RSF model’s iAUC/tAUC increases to 0.808, suggesting improved discrimination compared to the other groups, while its iBS/PE remains consistent at 0.149 and its C-index decreases slightly to 0.750. 
The Cox model performs close to the RSF model in this group, with an iAUC/tAUC of 0.796, an iBS/PE of 0.351, and a C-index of 0.748. All P values are again significant.

Table 2 Comparative performance of RSF and Cox in survival prediction on severity of depressive symptoms.

The first graph (Fig. 2a) depicts the Receiver Operating Characteristic (ROC) curves for three groups categorized by the severity of depressive symptoms—none, mild, and moderate—when predicting survival. The area under the curve (AUC) values suggest the model’s discriminatory power, with the ‘Moderate’ group achieving the highest AUC of 0.831, followed by the ‘Mild’ group at 0.825, and the ‘None’ group at 0.767. This indicates that the model is most adept at distinguishing survival outcomes in the ‘Moderate’ group, with ‘Mild’ also showing strong predictive accuracy. In the second graph (Fig. 2b), the time-dependent AUC (tAUC) for survival predictions over a period of 90 months is plotted for both the Random Survival Forest (RSF) and Cox proportional hazards models. For both models, the tAUC metrics fluctuate over time for each depressive symptom severity category. The RSF model consistently demonstrates higher tAUC values in the early months across all severity levels, suggesting a stronger initial predictive performance compared to the Cox model. However, as time progresses, there is an apparent decline in tAUC for both models, indicating a reduction in predictive accuracy with increasing time horizons. Overall, the graphs present an analysis of the predictive capabilities of the RSF and Cox models in the context of depressive symptom severity over time. The models exhibit variations in performance, with neither maintaining a constant predictive accuracy throughout the observed period. The initial higher tAUC values for the RSF model indicate a potentially more robust predictive utility in the short term, particularly for patients with moderate symptoms, while the Cox model displays relative stability in predictive performance, albeit at a slightly lower accuracy level. This comparative performance analysis is critical for understanding the temporal dynamics of survival prediction models in clinical settings. 
Supplementary Figure S1 presents calibration curves for the RSF (reference) and Cox models in both training and test sets. In every prognostic stratum (none, mild, and moderate), the RSF curves lie closer to the 45-degree line of perfect calibration, underscoring its superior agreement between predicted and observed survival probabilities.

Fig. 2

Parameters of our RSF-based model for predicting survival. (a) ROC analysis for the random survival forest model. (b) AUC trends for the Cox and RSF models across severity of depressive symptoms.

Figure 3 presents the variable importance analysis for the RSF model as a horizontal bar chart. Variables are ranked by their importance scores on the x-axis, which measure the impact each variable has on the model’s predictions. The length and direction of each bar reflect the magnitude and sign of the variable’s importance, with longer bars indicating higher importance. At the top of the chart, ‘Age’ shows the greatest positive importance, making it the most influential predictor in the model. It is followed by variables related to work activity level, blood pressure readings, body mass index (BMI), and various health-related behaviors and conditions such as smoking status, alcohol consumption, and history of diseases like congestive heart failure and stroke. Error bars are included for each variable; at the scale of the chart they are mostly not discernible, suggesting precise estimates of the importance scores. Variables toward the bottom of the chart, including ‘Race/Ethnicity’, ‘Walking/Biking Minutes’, and ‘Fasting Blood Glucose’, are less important relative to the top-ranked variables.

Fig. 3

Ranking of variable importance for the top influential parameters in RSF model.

Patient clinical benefit evaluation of the RSF-based model

Figure 4 presents the Decision Curve Analysis (DCA) used to evaluate the clinical benefit of the Random Survival Forest (RSF) and Cox proportional hazards models in predicting survival within each severity group (none, mild, and severe). Each pair of panels corresponds to one severity group, with the left panel showing the training cohort and the right panel the test cohort. In every panel, the RSF and Cox curves show the net benefit across a range of threshold probabilities. The net benefit is compared against two default strategies: ‘treat all’, which assumes all patients have the event, and ‘treat none’, which assumes no patients have the event. The higher the curve, the greater the net benefit of using the model at that threshold probability. In every panel, there is a range of threshold probabilities where the RSF and Cox models provide a greater net benefit than either default strategy, suggesting that the models have practical value. The shaded areas around the RSF and Cox curves convey the uncertainty around the net benefit estimates. For the ‘None’ and ‘Mild’ groups, both models show some clinical benefit over the ‘treat all’ and ‘treat none’ strategies in both the training and test cohorts. However, for the ‘Severe’ group, the models’ net benefit closely aligns with the ‘treat all’ strategy, especially in the test cohort, indicating that the models add little over treating everyone in that group. Overall, the decision curves indicate that the utility of the RSF and Cox models varies with the severity group and the cohort being considered.
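The net benefit plotted in a decision curve can be computed as sketched below. This is a simplified illustration for a binary outcome at a fixed time horizon; decision curves for censored survival data typically require survival-adjusted estimates, which are not shown. Function names and the example data are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    NB = TP/n - (FP/n) * pt / (1 - pt), with pt the threshold probability
    (must lie strictly between 0 and 1).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat all' strategy; 'treat none' is always 0."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up outcomes (1 = event) and predicted probabilities.
y_true = [1, 0, 1, 0, 0]
y_prob = [0.8, 0.3, 0.6, 0.1, 0.7]

nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(nb_model, nb_all)
```

In this toy example the model's net benefit at a threshold of 0.5 is positive while 'treat all' is negative, which is the pattern a decision curve shows where a model has practical value.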

Fig. 4

Patient clinical benefit evaluation of the RSF-based and Cox-based models. Decision curve analysis (DCA) for the None, Mild, and Severe groups in the training and test cohorts: None survival in the training cohort (a) and test cohort (b); Mild survival in the training cohort (c) and test cohort (d); Severe survival in the training cohort (e) and test cohort (f).

Patient survival analysis based on the RSF model

The Kaplan–Meier curves in Fig. 5 illustrate the survival of patients stratified by RSF-predicted risk, across two cohorts (training and test) and three severity groups (None, Mild, and Severe). In each panel, two groups are compared: the low-risk group (blue line) and the high-risk group (red line). Across all panels, the low-risk group consistently shows a higher survival rate over time than the high-risk group. The separation between the two survival curves is statistically significant, with P values less than 0.001 in all cases, suggesting that the RSF model effectively differentiates between higher- and lower-risk patients. In both the training (panels a, c, e) and test cohorts (panels b, d, f), as severity increases from None to Severe, the high-risk group’s survival rate decreases more rapidly over time, as shown by the steeper curves. The chi-square (χ²) statistics reported alongside the P values confirm the significance of the differences between the risk groups, with higher values indicating a stronger distinction between the groups’ survival rates. The consistency of these findings across severity groups and both cohorts underscores the robustness of the RSF model in survival prediction and risk categorization.
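For reference, the Kaplan–Meier estimator underlying such curves can be sketched in pure Python. This is a minimal illustration with made-up data, not the plotting code used for Fig. 5; in practice it would be applied separately to the low-risk and high-risk groups.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : observed time (event or censoring) for each subject
    events : 1 if the event was observed, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # events plus censorings at t
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up follow-up times (months); 0 marks a censored observation.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

The estimated survival steps down only at observed event times, while censored subjects simply leave the risk set.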

Fig. 5

Patient survival analysis based on the RSF model. The None survival (a), Mild survival (c), Severe survival (e) in training cohort and the None survival (b), Mild survival (d), Severe survival (f) in test cohort.

Figure 6 presents the SHAP-based33 feature importance results for the three-fold cross-validated RSF models, with each panel corresponding to a single fold (a: fold 1, C-index = 0.787; b: fold 2, C-index = 0.801; c: fold 3, C-index = 0.810). In all three folds, Age emerged as the strongest predictor of mortality risk, exhibiting the highest mean absolute SHAP value across the background samples. Systolic blood pressure measurements—particularly earlier readings (1st and 2nd timepoints)—consistently ranked among the top three features, indicating that blood pressure dynamics play a key role in survival prediction. BMI and diabetes status (self-reported “the doctor told me I have diabetes”) also appeared among the top five features in each fold, underscoring the importance of metabolic factors. Fold 1 additionally highlighted diastolic blood pressure (at multiple timepoints) as an influential variable, whereas fold 2 identified congestive heart failure and fold 3 emphasized smoking-related variables (“Are you smoking now” and “How long have you quit smoking”) among its top predictors. Despite minor variations in the rank order of less dominant features, the recurrent prominence of age, blood pressure metrics, BMI, and diabetes across all three folds demonstrates the robustness and reproducibility of the RSF model’s feature selection.

Fig. 6

SHAP-based feature importance for three-fold cross-validated Random Survival Forest models. (a) Fold 1 (Validation C-index = 0.787); (b) Fold 2 (Validation C-index = 0.801); (c) Fold 3 (Validation C-index = 0.810). Each panel displays the top 10 features ranked by mean absolute SHAP value (horizontal axis), with feature names on the vertical axis.
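The ranking in each panel is by mean absolute SHAP value. As a minimal sketch of that aggregation step only, the Python function below takes a matrix of SHAP values that has already been computed (one row per sample, one column per feature) and returns the top-ranked features. How the SHAP values themselves were obtained is not shown, and the function name, feature labels, and numbers are hypothetical.

```python
def rank_by_mean_abs_shap(shap_values, feature_names, top_k=10):
    """Rank features by mean |SHAP| across samples.

    shap_values   -- list of rows, one per sample, each a list of per-feature SHAP values
    feature_names -- list of names, same length as each row
    Returns a list of (feature_name, mean_abs_shap) pairs, largest first.
    """
    n_samples = len(shap_values)
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical SHAP values for two samples and three features.
toy_shap = [[0.5, -0.1, 0.2],
            [-0.7, 0.3, 0.0]]
toy_names = ["age", "systolic_bp", "bmi"]
ranking = rank_by_mean_abs_shap(toy_shap, toy_names)
```

Taking the absolute value before averaging means that a feature which pushes risk up for some patients and down for others still ranks highly, which is why this measure reflects importance rather than direction of effect.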

Discussion

The inclusion of lifestyle and physical activity as variables in our model is particularly pertinent to the clinical management of AD. Research has increasingly highlighted these factors as modifiable risk elements that significantly influence the progression and outcomes of AD34,35. By integrating these factors, our study not only aligns with current research trends but also opens avenues for preventive strategies rooted in lifestyle modifications36. This approach supports personalized medicine by tailoring intervention strategies to individual risk profiles, thereby optimizing patient management and potentially delaying disease progression. From a clinical perspective, the ability to accurately predict mortality risk is invaluable. It allows healthcare providers to identify high-risk patients early, enabling earlier interventions that could significantly alter disease trajectories. Moreover, such predictive capabilities support more effective allocation of healthcare resources, ensuring that the patients most at risk receive appropriate care promptly. The RSF model’s robust performance across different levels of symptom severity also suggests its adaptability to various stages of AD, making it a versatile tool in clinical practice. However, the study’s reliance on a specific machine learning model and dataset highlights typical challenges in medical research, such as model generalizability and data dependency. The variability in the RSF model’s performance in the test cohorts points to potential issues in generalizing findings without extensive external validation. Therefore, future research should focus on validating these models across diverse demographic groups and healthcare settings to ascertain the robustness and applicability of the findings. Additionally, the significant influence of age as a predictive variable warrants deeper investigation into the interaction between age, genetic factors, and lifestyle choices in AD progression.

The results from our study highlight the significance of categorizing individuals with AD based on depression severity, as assessed by the PHQ-9. Participants were stratified into three groups according to PHQ-9 scores: none (0–4), mild (5–9), and severe (10–14), aligning with established clinical thresholds19. Our analysis revealed notable differences among these groups. Specifically, the proportion of males increased with depression severity, from 51.0% in the none group to 57.3% in the mild group and 61.9% in the severe group. Correspondingly, the proportion of females decreased from 49.0% to 42.7% and 38.1%, respectively. Mean BMI also increased, from 28.5 ± 6.51 in participants without depressive symptoms to 30.0 ± 7.79 in mild and 30.9 ± 8.38 in severe cases. Furthermore, the prevalence of treated comorbid conditions such as CHF and CHD was higher in groups with greater depressive severity: treated CHF increased from 2.5% in participants without symptoms to 6.5% in severe cases, and treated CHD rose from 3.4% in the none group to 6.4% in the severe group. Physical activity metrics indicated a marginal decline in vigorous and moderate recreational activities as depression severity increased. Such classification is critical because depressive symptoms can markedly impact cognitive functioning, accelerate disease progression, alter responsiveness to treatment interventions, and significantly reduce quality of life in AD patients21,36. Thus, differentiating AD populations by depression severity provides essential insights for targeted clinical interventions and individualized therapeutic strategies21,36.
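The stratification rule described above can be written as a small function. The sketch below uses only the three score ranges stated in the text (0–4, 5–9, 10–14). The text does not say how totals above 14 were handled, so the function returns None for them rather than guessing. The function name is hypothetical.

```python
def phq9_group(score):
    """Map a PHQ-9 total score to the study's three severity groups.

    Cut-offs follow the ranges stated in the text: none (0-4), mild (5-9),
    severe (10-14). The PHQ-9 total ranges from 0 to 27. Scores of 15 or more
    fall outside the stated ranges, so None is returned for them (assumption).
    """
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 total score must be between 0 and 27")
    if score <= 4:
        return "none"
    if score <= 9:
        return "mild"
    if score <= 14:
        return "severe"
    return None  # handling of scores above 14 is not specified in the text


# Hypothetical example scores.
groups = [phq9_group(s) for s in (0, 4, 5, 9, 10, 14)]
```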

The application of the RSF model in this study illustrates the potential of machine learning techniques for predicting mortality risk in AD patients. In Table 2, the P values (< 0.001) for the Cox model’s performance metrics indicate that the observed differences compared with the RSF model are statistically significant. Notably, the RSF model yields consistently higher iAUC/tAUC values and lower iBS/PE values across varying levels of depressive symptoms, suggesting a superior predictive accuracy that is unlikely to be attributable to chance. A key advantage of the RSF over the Cox proportional hazards model lies in its nonparametric, tree-based architecture, which enables it to capture complex, non-linear relationships and interactions among predictors without relying on the proportional hazards assumption. By aggregating multiple survival trees, the RSF approach reduces the overfitting to which individual trees are prone and can better accommodate high-dimensional data or intricate predictor dependencies. Moreover, the ensemble structure of RSF allows it to adaptively weight variables and refine split points, potentially improving the model’s calibration and discrimination. Taken together, these findings suggest that the RSF model outperforms the traditional Cox proportional hazards model in both calibration and discrimination, which reinforces the role of advanced computational models in enhancing prognostic accuracy in clinical settings. The study’s emphasis on lifestyle and physical activity as modifiable risk factors also aligns with a growing body of literature that recognizes their influence on AD progression. By incorporating these variables into our model, we contribute to a more holistic understanding of AD and open avenues for preventive strategies that are grounded in behavioral modification37. This underscores the importance of personalized medicine, which tailors intervention strategies to the individual’s unique risk profile.
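Discrimination in survival models is commonly summarized by a concordance index. As a minimal sketch, the function below computes Harrell's C-index from follow-up times, event indicators, and predicted risk scores. The choice of Harrell's estimator is an assumption, since the text does not specify which variant was used, and the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event and a
    shorter follow-up time than patient j. The pair is concordant when the
    model assigns patient i the higher risk score. Tied scores count as 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
c_index = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the fold-wise C-index values reported for Fig. 6 should be read.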

We have analyzed the fundamental differences between the RSF model and the traditional Cox proportional hazards model, both in theoretical principles and empirical results. The RSF model, being a non-parametric approach, offers a distinct advantage in handling complex interactions and non-linear relationships without the need for explicit assumptions about the hazard functions, unlike the Cox model which assumes proportional hazards. Empirically, our results demonstrate that the RSF model provides better calibration and discriminatory ability across various AD symptom severities. This is particularly evident in the RSF model’s superior performance in accurately predicting outcomes in patients with severe symptoms, where traditional models like the Cox might underperform due to their linear assumptions.
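For reference, the proportional hazards assumption mentioned above can be stated in standard notation. The symbols below are generic textbook notation rather than notation taken from this study: h_0(t) is the baseline hazard, x a covariate vector, and β the coefficient vector.

```latex
% Cox proportional hazards model: covariates act multiplicatively on a shared baseline hazard.
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% Consequence: the hazard ratio between two patients does not depend on time t.
\frac{h(t \mid x_1)}{h(t \mid x_2)} = \exp\!\left(\beta^{\top}(x_1 - x_2)\right)
```

The log-hazard is linear in the covariates and the ratio is constant over follow-up. These are the two restrictions that a tree-based method such as RSF does not impose.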

However, this study is not without limitations. The predictive performance of the RSF model, while promising, displayed variability when applied to the test cohorts. This variation underscores the challenge of generalizing machine learning models to diverse populations and real-world scenarios. Future research could focus on external validation of our findings across different demographic groups and healthcare settings to ensure the model’s robustness and applicability. Moreover, the significant role of age as a predictive variable raises critical questions about the complex interplay between genetic factors and lifestyle choices. The trend observed, where a higher percentage of males exhibited more severe symptoms, may indicate gender-specific pathways in AD progression that merit further exploration.

Future iterations of this research should also consider the integration of additional biomarkers and genetic information, as well as the potential impact of environmental factors. Such data could enhance model complexity and potentially uncover new relationships that were not previously apparent. In conclusion, the current study demonstrates that machine learning models, particularly the RSF model, are valuable tools for predicting mortality in AD. These models provide significant insights that could aid in the early identification of high-risk individuals and the development of targeted interventions. Nonetheless, these tools should be seen as complementary to, rather than replacements for, traditional clinical judgment and the nuanced understanding that healthcare professionals bring to patient care.

Conclusion

In summary, our study demonstrates that machine learning—particularly the RSF model—provides a robust framework for predicting mortality risk in Alzheimer’s disease patients by integrating traditional clinical variables with lifestyle and physical activity measures. The RSF consistently outperformed the Cox proportional hazards model across multiple evaluation metrics (iAUC, iBS/PE, and C‐index), highlighting its ability to capture nonlinear interactions and complex relationships among predictors. Importantly, modifiable factors such as total physical activity and smoking status emerged as significant contributors to mortality risk, underscoring the potential of targeted lifestyle interventions to improve patient outcomes. While our findings suggest that RSF‐based models can support more accurate risk stratification and personalized care plans, further validation in diverse, external cohorts is essential to confirm generalizability and ensure broader clinical utility.

Limitations

Although our findings demonstrate that the RSF model outperformed traditional Cox proportional hazards models, several important limitations must be considered. First, machine-learning approaches like RSF, while effective at modeling complex, non-linear relationships, carry an inherent risk of overfitting, especially when working with relatively small or highly heterogeneous samples. Second, our study utilized internal validation through a hold-out method but did not include validation on an independent external dataset. Testing the RSF model in external cohorts is critical for assessing its robustness and confirming generalizability before clinical implementation. Third, the NHANES dataset used in our analysis primarily represents the U.S. population, which may limit the global applicability of our findings. Populations with different demographic characteristics, genetic backgrounds, or healthcare systems may yield different predictive performance. Fourth, while RSF provides clear advantages in predictive accuracy and flexibility, its complexity may pose practical challenges for clinicians, including interpretability difficulties and increased computational requirements, which could hinder its widespread adoption in routine clinical settings. Fifth, while the PHQ-9 is widely used, it has limitations as a proxy for AD severity. As a self-report tool, it can be influenced by recall bias and transient mood fluctuations, and its standard cut-offs were validated in general populations rather than AD cohorts, risking misclassification. Finally, physical activity was assessed via self-reported questionnaires, which introduces potential biases.

Future studies should therefore balance predictive performance with clinical utility, emphasizing external validation across diverse populations. This entails not only testing the RSF model in independent cohorts—preferably from different geographic regions and healthcare systems—but also adopting prospective and longitudinal designs that can capture evolving patterns of risk. Such efforts will help ensure that RSF, and similar approaches, provide robust, generalizable, and clinically meaningful predictions across a wide range of patient populations.