Abstract
Alzheimer's disease (AD), a progressive neurodegenerative disorder, significantly impacts patient survival, prompting the need for accurate prognostic tools. Lifestyle factors and physical activity levels have been identified as critical modifiable risk factors influencing AD outcomes, but their precise impact on mortality prediction remains understudied. This study aimed to employ machine learning (ML) techniques to predict mortality risk in AD patients, leveraging data on lifestyle and physical activity to enhance personalized care strategies and inform public health policies. We analyzed data from 53,231 participants collected from the National Health and Nutrition Examination Survey (NHANES, 2007–2020). Participants were stratified by AD symptom severity using Patient Health Questionnaire-9 scores. Random Survival Forest (RSF) and Cox proportional hazards models were developed and validated using a training set (n = 42,585) and test set (n = 10,646). Model performance was evaluated using the integrated area under the curve (iAUC), integrated Brier score/prediction error (iBS/PE), and concordance index (C-index). The RSF model outperformed the Cox model, achieving higher discrimination and calibration. Specifically, the RSF demonstrated an iAUC of 0.781 (95% CI 0.778–0.839), iBS/PE of 0.150 (95% CI 0.083–0.122), and a C-index of 0.785 (95% CI 0.776–0.800) in the no-symptom group of the training cohort. These metrics indicate superior predictive accuracy, especially at extreme ends of risk prediction. Lifestyle and physical activity levels were identified as significant predictors influencing mortality risk. ML algorithms, notably RSF, effectively predict mortality risk in AD patients, demonstrating clear advantages over traditional statistical models. Incorporating lifestyle and physical activity into ML-based predictive frameworks can significantly improve risk stratification, informing targeted interventions.
Further external validation across diverse populations is necessary to establish broader applicability.
Introduction
Alzheimer's disease (AD) was first identified in a female patient in 1901 and is pathologically defined by the presence of amyloid-β (Aβ) plaques and fibrillar tau tangles1. As the most prevalent form of dementia, its incidence is increasing at a concerning rate. Notably, women account for approximately two-thirds of AD cases, and their lifetime risk of developing the disease (1 in 5) is significantly higher than that of men (1 in 10)1,2. While some argue that this disparity is primarily due to women's longer life expectancy, research suggests that sex influences both risk factors and potential disease-causing mechanisms3. In addition to biological elements such as chromosomal, epigenetic, and hormonal differences, psychosocial and cultural aspects, including education access and gender disparities, may also play a role in disease susceptibility. This review primarily focuses on biological factors to explore how sex differences impact key mechanisms underlying neurodegeneration4. The progressive nature of AD leads to severe cognitive decline, ultimately resulting in death. Recent studies have increasingly pointed towards lifestyle and physical activity as modifiable risk factors that can influence the course and outcomes of AD5. However, the quantification of these effects and their incorporation into predictive models for mortality risk remains a challenging endeavor6.
Recent epidemiological studies have consistently identified lifestyle and physical activity as critical modifiable risk factors associated with AD mortality. For instance, Norton et al.7 reported that addressing lifestyle factors, such as physical inactivity, could significantly reduce the global incidence of Alzheimer's disease. Additionally, the Lancet Commission highlighted physical inactivity as one of the key modifiable risk factors that could substantially affect dementia outcomes, suggesting that increased physical activity could delay disease progression and potentially extend survival6. Meta-analyses have also reinforced the association between higher physical activity levels and reduced risk of dementia and AD-related mortality, emphasizing physical exercise as a practical strategy for risk reduction8,9. Moreover, Kivipelto et al.10 demonstrated the long-term protective effects of midlife physical activity against AD and other dementias, supporting the hypothesis that sustained physical activity could lower the risk of AD mortality. Therefore, incorporating lifestyle and physical activity variables into predictive mortality models is crucial for enhancing the precision of prognostic assessments and informing tailored preventive strategies.
Machine learning (ML) offers a promising avenue for addressing this challenge by enabling the analysis of large and complex datasets to identify patterns and predict outcomes more accurately than traditional statistical methods11. By leveraging ML algorithms, this study aims to predict mortality risk in Alzheimer's patients by analyzing variables related to lifestyle choices and physical activity levels12. This approach is novel in its application to AD and has the potential to provide valuable insights that could inform patient care strategies, public health policies, and individual lifestyle modifications to mitigate the risks associated with the disease.
ML techniques, when correctly implemented, are capable of processing vast amounts of data and producing precise outcomes, particularly with sparse models13. These algorithms have shown promise in estimating mortality risks and predicting the timing of death, enhancing our comprehension of the progression of dementia through the analysis of risk factors and their interactions14. ML methods often yield more accurate findings compared to conventional statistical approaches due to their superior ability to manage large and diverse datasets15. From a medical perspective, the main objective is not merely achieving high accuracy in predictions. Instead, identifying key risk factors often stands as the central clinical question16.
Recent advances in machine learning and artificial intelligence have opened new frontiers in medical research and practice, particularly in predictive analytics. In the context of AD, machine learning models have demonstrated potential in identifying early-stage biomarkers, classifying disease stages, and predicting cognitive decline with a degree of accuracy previously unattainable through traditional statistical methods17.
This study aims to extend the current understanding of AD's progression by employing advanced machine learning techniques to analyze the relationship between lifestyle factors, physical activity, and mortality risk in individuals with Alzheimer's disease. By doing so, we seek to contribute to the development of more effective, personalized intervention strategies that can potentially slow the disease's progression and improve the overall prognosis for those affected.
Subjects and methods
Participants
Figure 1 illustrates the selection process of participants for a study conducted on data collected from the National Health and Nutrition Examination Survey (NHANES)18 depression screening spanning from 2007 to 2020. NHANES is a program of studies designed to assess the health and nutritional status of adults and children in the United States, which collects data through interviews and physical examinations to provide insights into various health parameters across a diverse population sample. Initially, 102,956 individuals participated in the screening. The selection process involved several exclusion criteria: 22,636 individuals were excluded due to missing Patient Health Questionnaire-9 scores; 11,981 were excluded for not having AD or having incomplete data regarding the diagnosis of AD; and 15,108 were excluded because they lacked follow-up information. Following these exclusions, a final sample of 53,231 research participants was obtained. This sample was then randomly split into two datasets: a training set comprising 42,585 individuals (80%) and a test set consisting of 10,646 individuals (20%). The training set is used to develop the machine learning model, while the test set is used to evaluate its performance.
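As an illustration of the random 80/20 partition described above, the following minimal sketch shuffles participant identifiers and splits them. The function name, the integer identifiers, and the seed value are hypothetical and are not taken from the study's code.

```python
import random


def split_train_test(ids, train_fraction=0.8, seed=0):
    """Shuffle participant IDs and split them into training and test lists.

    `seed` is an arbitrary illustrative value, not the one used in the study.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical IDs 0..53230 stand in for the 53,231 retained participants.
train_ids, test_ids = split_train_test(range(53231))
print(len(train_ids), len(test_ids))
```

With 53,231 participants and a training fraction of 0.8, this reproduces the reported set sizes of 42,585 and 10,646.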
Categorization of participants via PHQ-9
Participants were assessed for depressive symptoms using the Patient Health Questionnaire-9 (PHQ-9), a validated self-report instrument designed to screen for depression severity19. Based on their PHQ-9 scores, participants were categorized into three groups: none (scores 0–4), mild (scores 5–9), and severe (scores 10–14)20. This categorization follows established guidelines to identify clinically relevant levels of depressive symptomatology, facilitating meaningful analysis and interpretation of depression's impact across different severity levels in the studied population. Categorizing individuals with AD by depression severity is crucial, as depressive symptoms can significantly influence cognitive performance, disease progression trajectories, responsiveness to treatments, and overall quality of life among AD patients21. Clarifying the impact of varying depression severities on these outcomes may enhance targeted interventions and therapeutic strategies tailored specifically for individuals with AD.
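The grouping rule above can be written as a small lookup. This is a sketch under the stated cut-offs only; the function name is hypothetical, and returning None for scores outside 0–14 is an assumption, since the text does not say how such scores were handled.

```python
def phq9_group(score):
    """Map a PHQ-9 total score to the three groups used in this study."""
    if 0 <= score <= 4:
        return "none"
    if 5 <= score <= 9:
        return "mild"
    if 10 <= score <= 14:
        return "severe"
    # Assumption: scores outside 0-14 are not assigned to any group here.
    return None


print([phq9_group(s) for s in (0, 4, 5, 9, 10, 14)])
```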
Definition of AD mortality
AD mortality refers to death resulting from the complications associated with AD. The progression of AD is characterized by the gradual deterioration of cognitive functions, leading ultimately to death. Mortality in AD patients is often a result of complications such as infections, including pneumonia, or other co-morbid conditions like heart disease or stroke, which are exacerbated by the decline in health and function caused by AD22. Operationalizing the definition of AD mortality requires a robust set of criteria that takes into account not only the presence of AD as a primary or contributing cause of death but also accounts for the role of AD in the presence of other terminal conditions. This can involve analyzing death certificates, medical records, and family reports to establish AD as a cause of death. Furthermore, AD mortality is not uniformly classified across different regions and studies, which may lead to discrepancies in reporting and understanding the scale of AD mortality23,24.
In this study, we obtained data on AD mortality from the National Death Index (NDI)25, with follow-up through December 31, 2019, and used the Tenth Revision of the International Classification of Diseases (ICD-10) to determine cause of death. AD mortality was identified by the ICD-10 codes G30.0, G30.1, G30.8, and G30.9. We follow the methodology outlined by the Centers for Disease Control and Prevention (CDC) for defining AD-related deaths, ensuring consistency with large-scale epidemiological studies and national statistics26.
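A minimal sketch of the code-based outcome definition follows. The function name and the normalization of undotted codes (for example "G309") are assumptions for illustration; the format in which cause-of-death codes are actually stored in the linked mortality file is not described here.

```python
AD_ICD10_CODES = {"G30.0", "G30.1", "G30.8", "G30.9"}


def is_ad_death(icd10_code):
    """Return True if an ICD-10 cause-of-death code is one of the AD codes."""
    if icd10_code is None:
        return False
    code = icd10_code.strip().upper()
    # Assumption: undotted codes such as "G309" may occur and mean "G30.9".
    if "." not in code and len(code) > 3:
        code = code[:3] + "." + code[3:]
    return code in AD_ICD10_CODES


print(is_ad_death("G30.9"), is_ad_death("I21.9"))
```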
Definition of covariates
The selection and definition of covariates for this study were critical in examining the multifaceted influences on Alzheimer's disease mortality. Initial candidate covariates were derived from an extensive review of epidemiological studies addressing factors influencing mortality in Alzheimer's patients, including lifestyle risk factors, metabolic and cardiovascular conditions, and sociodemographic variables7,8,9,10. Sociodemographic data, lifestyle factors, medical comorbidities, and therapeutic measures were exhaustively collected via standardized assessments, including questionnaires, diagnostic evaluations, and physical examinations. Participants' smoking history was classified according to lifetime cigarette exposure. Individuals who reported smoking less than 100 cigarettes in their lifetime were labeled as "never smokers". Those who had smoked over 100 cigarettes and were currently smoking at the time of the survey were categorized as "current smokers". Similarly, "former smokers" were individuals who had smoked more than 100 cigarettes in the past but had quit by the time of the study. Alcohol use was quantified based on frequency, with "drinkers" defined as those who consumed alcohol on at least 12 days throughout the past year. Physical activity levels were measured using the Global Physical Activity Questionnaire, which takes into account exercise from leisure, work, and transport. Activities were distinguished by their intensity, either "vigorous" or "moderate". The Total Physical Activity (TPA) score was derived by adding the duration of moderate activities to twice the duration of vigorous activities, and individuals with more than 150 min of combined weekly activity were considered "active". Diabetes status incorporated self-reports, clinical measurements such as fasting glucose and glycohemoglobin levels, and information on anti-diabetic medication or insulin use. The presence of cardiovascular diseases (CVD) was identified through medical diagnosis records, including conditions such as congestive heart failure, coronary artery disease, angina, myocardial infarction, or stroke. Pharmacological interventions were recorded, with a particular focus on medications that manage blood sugar, blood pressure, and cholesterol levels. These were identified through detailed questionnaires that delved into diabetes management and cardiovascular health. Body mass index (BMI) was calculated using height and weight (kg/m2), providing a standard metric for assessing body fat and categorizing weight status. Total cholesterol (TC) readings were taken under strict laboratory conditions, with detailed methodologies outlined in the NHANES Laboratory/Medical Technician Procedures Manual. This comprehensive collection of covariates enabled a nuanced analysis of factors that could influence mortality risk among individuals with Alzheimer's disease, thereby supporting the development of more targeted interventions. In addition, all variable summaries are in Table S1.
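The derived covariates defined above can be sketched as follows. All function and argument names are hypothetical. Treating exactly 100 lifetime cigarettes as an ever-smoker is an assumption, because the text specifies only "less than 100" and "over 100".

```python
def smoking_status(lifetime_cigarettes, smokes_now):
    """Classify smoking history from lifetime exposure and current smoking."""
    if lifetime_cigarettes < 100:
        return "never smoker"
    # Assumption: exactly 100 cigarettes is treated as an ever-smoker.
    return "current smoker" if smokes_now else "former smoker"


def total_physical_activity(moderate_min, vigorous_min):
    """TPA score: moderate minutes plus twice the vigorous minutes per week."""
    return moderate_min + 2 * vigorous_min


def is_active(moderate_min, vigorous_min, threshold_min=150):
    """'Active' means more than 150 combined weekly minutes."""
    return total_physical_activity(moderate_min, vigorous_min) > threshold_min


def body_mass_index(weight_kg, height_m):
    """BMI in kg/m2."""
    return weight_kg / height_m ** 2


print(smoking_status(250, False), total_physical_activity(60, 50), is_active(60, 50))
```

For example, 60 min of moderate and 50 min of vigorous activity per week give a TPA score of 160 min, which exceeds the 150 min threshold.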
Data preprocessing
In the preprocessing stage, missing data were addressed using the Random Forest imputation method, a commonly employed machine learning-based technique that effectively captures non-linear relationships among variables, thus providing robust estimates for missing values (Stekhoven & Bühlmann, 2012). For variable transformations, categorical survey responses, such as binary "yes/no" questions, were numerically encoded to facilitate quantitative analysis. Specifically, "yes" responses were converted to '1', while "no" responses were converted to '0', a standard approach to binary categorical variable transformation (Kuhn & Johnson, 2013).
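The binary encoding step can be illustrated with the short sketch below. The names are hypothetical, and mapping any other response (for example "refused" or a blank) to None, so that it can be handled by the imputation step, is an assumption rather than a documented detail of the study.

```python
YES_NO_CODES = {"yes": 1, "no": 0}


def encode_yes_no(response):
    """Encode a yes/no survey response as 1/0; anything else becomes None."""
    if response is None:
        return None
    # Assumption: unrecognized answers are treated as missing values.
    return YES_NO_CODES.get(str(response).strip().lower())


print([encode_yes_no(r) for r in ("Yes", "no", "refused", None)])
```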
Statistical analysis
Model development
We selected the Random Survival Forest model to capture potential nonlinear relationships and complex interactions among predictors, while the Cox proportional hazards model was chosen for its interpretability and widespread acceptance in time-to-event analysis. Data analysis was conducted in Python, which is widely regarded for its robust capabilities in statistical computation. The predictive models were built with two prominent Python libraries: "sksurv" and "lifelines". "sksurv" was used to implement the RSF model. The RSF model is a machine learning technique that extends the traditional random forest algorithm to time-to-event data, providing robust and interpretable predictions for survival analysis by handling both censored and uncensored observations. We optimized the hyperparameters of the RSF using cross-validation, a widely adopted strategy to enhance model reproducibility27,28. Ultimately, we determined that setting the number of estimators to 100 (n_estimators = 100) and fixing the random seed for reproducibility (random_state = 24) provided the most robust results for our RSF model.
In tandem, "lifelines" was used to fit the Cox proportional hazards model, a standard tool in survival estimation. A pivotal stage of our analysis involved applying the "permutation_importance" function in Python to assess the importance of variables. This process allowed for an in-depth investigation into the contributory weight of each variable, shedding light on the most significant predictors of survival. This facet of the study is crucial, as it deepens our understanding of the factors that are most influential in determining the prognostic outcomes for patients with Alzheimer's disease.
Model validation
To assess the robustness and predictive performance of our model, we employed a hold-out validation strategy in which 20% of the dataset was randomly reserved for testing, consistent with the training set (n = 42,585) and test set (n = 10,646) described under Participants. This approach provided an objective and independent measure of model performance. The same datasets, with a consistent 80/20 split, were utilized to develop and validate both the RSF and the Cox proportional hazards models, ensuring comparability across methods. Our validation emphasized two key dimensions: discrimination and calibration. Discrimination, reflecting a model's capacity to differentiate between outcomes, was evaluated using the iAUC and time-dependent AUC (tAUC). Calibration, indicating the alignment between predicted probabilities and actual outcomes, was assessed via the integrated Brier score (iBS) and prediction error (PE). These metrics offered critical insights into the model's accuracy in representing patient outcomes. Additionally, patients were stratified into "high-risk" and "low-risk" groups based on median predictive scores, which further supported refined risk assessment and personalized decision-making.
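The median-based risk stratification can be sketched as below. The function name is hypothetical, and assigning scores exactly equal to the median to the low-risk group is an assumption, as the text does not specify how ties were handled.

```python
from statistics import median


def stratify_by_median(risk_scores):
    """Label each predicted risk score as high-risk or low-risk."""
    cutoff = median(risk_scores)
    # Assumption: scores equal to the median fall in the low-risk group.
    return ["high-risk" if s > cutoff else "low-risk" for s in risk_scores]


print(stratify_by_median([0.1, 0.4, 0.6, 0.9]))
```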
In clinical research, several statistical metrics are pivotal for evaluating the performance of prognostic models, including the integrated area under the curve (iAUC), time-dependent AUC (tAUC), integrated Brier score (iBS), prediction error (PE), and concordance index (C-index). The iAUC and tAUC primarily assess a model's discriminatory power, reflecting its ability to distinguish between patients who will experience an event and those who will not over a specified time horizon29. Higher iAUC and tAUC values indicate superior discrimination, facilitating the early identification of high-risk individuals. Calibration is gauged through the iBS and PE, which compare predicted probabilities against observed outcomes. Lower iBS and PE values signify tighter alignment between predictions and actual events, thereby enhancing the clinical utility of the model30. Additionally, the C-index evaluates the proportion of correctly ranked pairs in survival analysis and is widely regarded for handling censored data31. A higher C-index reflects better model discrimination, indicating that the model more accurately predicts the order of events. Collectively, these metrics offer a comprehensive view of model accuracy and reliability, aiding clinicians in risk stratification and informed decision-making for patient management.
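To make the pairwise ranking behind the C-index concrete, the following sketch implements a simplified version of Harrell's concordance index in plain Python. It is an illustration under stated assumptions, not the routine used in this study: the function name is hypothetical, pairs with tied event times are ignored, and tied risk scores count as half concordant.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C-index.

    A pair (i, j) is comparable when subject i had an observed event and
    a strictly shorter follow-up time than subject j. The pair is concordant
    when subject i also received the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up in months, event indicator (1 = death), predicted risk.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.5, 0.1]
print(concordance_index(times, events, risks))
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and the censored third subject contributes only as the longer-surviving member of a pair.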
Model output
The primary outcome of this study was AD-related mortality. The endpoint was operationally defined as death attributed either directly to Alzheimer's disease or resulting from complications exacerbated by AD, such as infections (particularly pneumonia) or comorbid conditions including heart disease or stroke. Mortality data were obtained from the National Death Index (NDI) through December 31, 2019, and causes of death were identified using the International Classification of Diseases, Tenth Revision (ICD-10) codes: G30.0, G30.1, G30.8, and G30.9. This definition aligns with the Centers for Disease Control and Prevention's standards for categorizing AD-related deaths, ensuring consistency with epidemiological research. Accurate classification of this endpoint was crucial for assessing the predictive performance of the developed machine learning survival models, which aimed at forecasting mortality risk based on lifestyle and physical activity metrics.
Results
Patient characteristics of cohorts
Table 1 summarizes participant characteristics stratified by the severity of AD symptoms, as assessed by Patient Health Questionnaire-9 (PHQ-9) scores. The cohort is divided into three groups: those with no symptoms (scores 0–4), mild symptoms (scores 5–9), and severe symptoms (scores 10–14). The proportion of males rises with symptom severity: males account for 51.0% of the no-symptom group, 57.3% of the mild group, and 61.9% of the severe group. Conversely, females account for 49.0%, 42.7%, and 38.1% of individuals in the respective symptom severity categories. Age-wise, the majority of participants across all severity groups are aged 55 years or younger, representing 63.3% of those with no symptoms, 61.9% with mild symptoms, and 62.6% with severe symptoms. Participants aged 56–65 make up a smaller proportion, while those over 65 years old represent 21.5% in the no symptoms and mild symptoms groups and decrease to 17.7% in the severe symptoms group. The average Body Mass Index (BMI) rises with symptom severity, being lowest in the no symptoms group (28.5 ± 6.51) and highest among those with severe symptoms (30.9 ± 8.38). In terms of comorbid conditions like Congestive Heart Failure (CHF) and Coronary Heart Disease (CHD), the majority of participants did not have documented cases, but there is a notable trend where treated cases of CHF and CHD increase with the severity of AD symptoms. Specifically, treated CHF is reported in 2.5% of participants without symptoms, 4.5% with mild symptoms, and 6.5% with severe symptoms. Similarly, treated CHD is seen in 3.4% without symptoms, 4.4% with mild symptoms, and an equal percentage of 6.4% in both the mild and severe groups.
Physical activity measured in minutes shows a consistent pattern across all severity levels for high and moderate-intensity work-related activities and walking/cycling, with only a slight variation in the time spent on these activities. Interestingly, there is a marginal decrease in the minutes dedicated to vigorous and moderate recreational activities as symptom severity increases.
Evaluation of the differentiation ability of the RSF-based model
Table 2 compares the RSF and Cox proportional hazards models in their ability to predict survival by severity of depressive symptoms. Performance metrics include iAUC/tAUC, integrated Brier score/prediction error (iBS/PE), and the concordance index (C-index), each accompanied by their 95% confidence intervals (CI) and P values, where applicable32. For participants with no depressive symptoms, the RSF model shows an iAUC/tAUC of 0.781 with a 95% CI of 0.778–0.839 and an iBS/PE of 0.150 with a 95% CI of 0.083–0.122. The model also achieves a C-index of 0.785 with a 95% CI of 0.776–0.800, indicating a good predictive ability. No P values are reported for the RSF model, which serves as the reference for comparison. The Cox model, when applied to the same group of participants, reveals a slightly lower iAUC/tAUC of 0.765 (95% CI 0.760–0.844) and a higher iBS/PE of 0.358 (95% CI 0.357–0.359), implying a less precise prediction of survival. The C-index is 0.793 (95% CI 0.781–0.806), with all P values for the Cox model's metrics being significant (P < 0.001), indicating that the differences observed between the Cox model and the RSF model are statistically significant. In the case of participants with mild depressive symptoms, the RSF model's performance shows an iAUC/tAUC of 0.764 and an iBS/PE of 0.150, with respective confidence intervals slightly narrowing compared to the group with no symptoms. The C-index here is 0.755. The Cox model scores slightly lower on the iAUC/tAUC at 0.745 and has a similarly high iBS/PE as seen in participants without symptoms, at 0.356. The C-index for the Cox model is 0.774, with all P values again significant. For moderate depressive symptoms, the RSF model's iAUC/tAUC increases to 0.808, suggesting improved discrimination compared to the other groups, and maintains a consistent iBS/PE at 0.149. The C-index shows a slight decrease to 0.750. The Cox model's performance is relatively close to the RSF with an iAUC/tAUC of 0.796 and iBS/PE of 0.351, with its C-index at 0.748. All P values are again significant, as in the other severity groups.
The first graph (Fig. 2a) depicts the Receiver Operating Characteristic (ROC) curves for three groups categorized by the severity of depressive symptoms (none, mild, and moderate) when predicting survival. The area under the curve (AUC) values suggest the model's discriminatory power, with the "Moderate" group achieving the highest AUC of 0.831, followed by the "Mild" group at 0.825, and the "None" group at 0.767. This indicates that the model is most adept at distinguishing survival outcomes in the "Moderate" group, with "Mild" also showing strong predictive accuracy. In the second graph (Fig. 2b), the time-dependent AUC (tAUC) for survival predictions over a period of 90 months is plotted for both the Random Survival Forest (RSF) and Cox proportional hazards models. For both models, the tAUC metrics fluctuate over time for each depressive symptom severity category. The RSF model consistently demonstrates higher tAUC values in the early months across all severity levels, suggesting a stronger initial predictive performance compared to the Cox model. However, as time progresses, there is an apparent decline in tAUC for both models, indicating a reduction in predictive accuracy with increasing time horizons. Overall, the graphs present an analysis of the predictive capabilities of the RSF and Cox models in the context of depressive symptom severity over time. The models exhibit variations in performance, with neither maintaining a constant predictive accuracy throughout the observed period. The initial higher tAUC values for the RSF model indicate a potentially more robust predictive utility in the short term, particularly for patients with moderate symptoms, while the Cox model displays relative stability in predictive performance, albeit at a slightly lower accuracy level. This comparative performance analysis is critical for understanding the temporal dynamics of survival prediction models in clinical settings.
Supplementary Figure S1 presents calibration curves for the RSF (reference) and Cox models in both training and test sets. In every prognostic stratum (none, mild, and moderate), the RSF curves lie closer to the 45-degree line of perfect calibration, underscoring its superior agreement between predicted and observed survival probabilities.
Figure 3 presents the results of the variable importance analysis for the predictive model as a horizontal bar chart. The variables are ranked by their importance scores on the x-axis, which measure the impact each variable has on the model's predictions. The length and direction of the bars indicate the degree and direction of the relationship of each variable with the outcome being predicted, with longer bars indicating a higher importance or stronger relationship. At the top of the chart, "Age" shows the greatest positive importance, indicating that it is likely the most significant predictor in the model. This is followed by variables related to work activity level, blood pressure readings, body mass index (BMI), and various health-related behaviors and conditions such as smoking status, alcohol consumption, and history of diseases like congestive heart failure and stroke. Error bars are included for each variable, though due to the scale of the chart, they are mostly not discernible, suggesting precise estimates of the importance scores. Variables toward the bottom of the chart, including "Race/Ethnicity", "Walking/Biking Minutes", and "Fasting Blood Glucose", exhibit less importance in the model relative to the top-ranked variables.
Patient clinical benefit evaluation of the RSF-based model
Figure 4 presents a Decision Curve Analysis (DCA) evaluating the clinical benefit of the Random Survival Forest (RSF)-based and Cox proportional hazards models in predicting survival outcomes across the three severity groups: none, mild, and severe. Each pair of graphs corresponds to a different severity group, with the left graph of each pair representing the training cohort and the right graph representing the test cohort. For all severities and in both cohorts, the RSF model and Cox model lines show the net benefit across a range of threshold probabilities. The net benefit is compared against two default strategies: "treat all" and "treat none". "Treat all" assumes all patients have the event, and "treat none" assumes no patients have the event. The higher the line, the greater the net benefit of using the model at that threshold probability. In every graph, there is a range of threshold probabilities where using the RSF and Cox models provides a greater net benefit than either default strategy, suggesting that the models have practical value. The shaded area around the lines for the RSF and Cox models may indicate confidence intervals, suggesting the uncertainty around the net benefit estimates. For the "None" and "Mild" survival predictions, both models show some clinical benefit over the "treat all" and "treat none" strategies in both the training and test cohorts. However, for "Severe" survival prediction, the models' net benefit closely aligns with the "treat all" strategy, especially in the test cohort, indicating that the models' predictions align with a more conservative approach to predicting severe survival outcomes. Overall, the decision curves indicate that the RSF and Cox models have varying levels of utility depending on the severity of the survival outcome being predicted and the cohort being considered.
Patient clinical benefit evaluation of the RSF-based and Cox-based models. Decision curve analysis (DCA) of the RSF-based and Cox-based models for None, Mild, and Severe survival prediction in the training and test cohorts: None survival in the training cohort (a) and test cohort (b); Mild survival in the training cohort (c) and test cohort (d); Severe survival in the training cohort (e) and test cohort (f).
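For readers unfamiliar with decision curves, the quantity on the vertical axis is the net benefit, commonly defined at a threshold probability pt as TP/n − (FP/n) × pt/(1 − pt). The sketch below computes it for a binary outcome at a fixed time horizon. It is a simplified illustration that ignores censoring, uses hypothetical function names and toy data, and is not the procedure used to generate Fig. 4.

```python
def net_benefit(outcomes, predicted_risks, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(outcomes, predicted_risks))
    true_pos = sum(1 for y, p in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in pairs if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of the 'treat all' strategy; 'treat none' is always 0."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data: 1 = event occurred by the horizon, with predicted event risks.
outcomes = [1, 1, 0, 0]
predicted = [0.9, 0.6, 0.4, 0.1]
print(net_benefit(outcomes, predicted, 0.5), net_benefit_treat_all(outcomes, 0.5))
```

A model adds clinical value at a given threshold when its net benefit exceeds both the "treat all" value and zero.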
Patient survival analysis based on the RSF model
The series of Kaplan–Meier curves in Fig. 5 illustrate the survival analysis of patients based on risk stratification from the Random Survival Forest (RSF) model, across two cohorts (training and test) and three levels of survival outcomes: None, Mild, and Severe. In each graph, two groups are compared: the Low risk group (blue line) and the High risk group (red line). Across all graphs, the Low risk group consistently shows a higher survival rate over time compared to the High risk group. The separation between the survival curves of the two groups is significant, as indicated by P-values less than 0.001 in all cases, suggesting that the RSF model is effective in differentiating between higher and lower-risk patients in terms of survival. For both the training (graphs a, c, e) and test cohorts (graphs b, d, f), as the survival outcome severity increases from None to Severe, the High risk group's survival rate decreases more rapidly over time, which is highlighted by the steeper curves. The Chi-square (χ2) statistics provided alongside the P-values confirm the significance of the differences between the risk groups, with higher values indicating a stronger distinction between the groups' survival rates. The consistency of these findings across different severities of survival outcomes and both cohorts underscores the robustness of the RSF model in survival prediction and risk categorization.
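As background for these curves, the Kaplan–Meier estimator multiplies, at each distinct event time, the fraction of at-risk subjects who survive that time. The sketch below is a plain-Python illustration with a hypothetical function name and toy data, not the code used to draw Fig. 5.

```python
def kaplan_meier(times, events):
    """Return [(event_time, survival_probability)] for each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t (events and censored).
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Toy data: follow-up in months and event indicator (1 = death, 0 = censored).
km_curve = kaplan_meier([1, 2, 3], [1, 0, 1])
print(km_curve)
```

Computing one such curve for the low-risk group and one for the high-risk group yields the paired curves shown in each panel.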
Figure 6 presents the SHAP-based [33] feature importance results for the three-fold cross-validated RSF models, with each panel corresponding to a single fold (a: fold 1, C-index = 0.787; b: fold 2, C-index = 0.801; c: fold 3, C-index = 0.810). In all three folds, age emerged as the strongest predictor of mortality risk, with the highest mean absolute SHAP value across the background samples. Systolic blood pressure measurements, particularly the earlier readings (1st and 2nd timepoints), consistently ranked among the top three features, indicating that blood pressure plays a key role in survival prediction. BMI and diabetes status (self-reported "the doctor told me I have diabetes") also appeared among the top five features in each fold, underscoring the importance of metabolic factors. Fold 1 additionally highlighted diastolic blood pressure (at multiple timepoints) as an influential variable, whereas fold 2 identified congestive heart failure and fold 3 emphasized smoking-related variables ("Are you smoking now" and "How long have you quit smoking") among its top predictors. Despite minor variations in the rank order of less dominant features, the recurrent prominence of age, blood pressure metrics, BMI, and diabetes across all three folds demonstrates the robustness and reproducibility of the RSF model's feature importance rankings.
Fig. 6. SHAP-based feature importance for three-fold cross-validated Random Survival Forest models. (a) Fold 1 (validation C-index = 0.787); (b) Fold 2 (validation C-index = 0.801); (c) Fold 3 (validation C-index = 0.810). Each panel displays the top 10 features ranked by mean absolute SHAP value (horizontal axis), with feature names on the vertical axis.
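The ranking shown in each panel follows from a simple aggregation: for every feature, the absolute SHAP values are averaged over the samples, and features are sorted by that average. The sketch below illustrates only this aggregation step. The SHAP matrix is a made-up toy example rather than output from the study's models, and the function and feature names are hypothetical.

```python
def rank_features_by_mean_abs_shap(shap_values, feature_names, top_k=10):
    """Rank features by mean absolute SHAP value.

    shap_values: list of rows, one per sample, each holding one SHAP value
                 per feature (same order as feature_names).
    Returns a list of (feature_name, mean_abs_shap) pairs, largest first.
    """
    n_samples = len(shap_values)
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(len(feature_names))
    ]
    ranked = sorted(zip(feature_names, mean_abs), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


# Hypothetical SHAP values for three samples and three features.
toy_shap = [
    [0.4, -0.1, 0.05],
    [-0.6, 0.2, 0.0],
    [0.5, -0.3, -0.1],
]
toy_features = ["age", "systolic_bp", "bmi"]

ranking = rank_features_by_mean_abs_shap(toy_shap, toy_features)
for name, importance in ranking:
    print(f"{name}: {importance:.2f}")
```

Because the sign is discarded before averaging, this ranking reflects how strongly a feature moves the prediction, not whether it raises or lowers the predicted risk.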
Discussion
The inclusion of lifestyle and physical activity as variables in our model is particularly pertinent to the clinical management of AD. Research has increasingly highlighted these factors as modifiable risk elements that significantly influence the progression and outcomes of AD [34,35]. By integrating these factors, our study aligns with current research trends and opens avenues for preventive strategies rooted in lifestyle modification [36]. This approach supports personalized medicine by tailoring interventions to individual risk profiles, thereby optimizing patient management and potentially delaying disease progression. From a clinical perspective, accurate prediction of mortality risk allows healthcare providers to identify high-risk patients early, enabling earlier interventions that could alter disease trajectories. Such predictions also support more effective allocation of healthcare resources, so that the patients most at risk receive appropriate care promptly. The RSF model's robust performance across levels of symptom severity also suggests that it can be adapted to various stages of AD, making it a versatile tool in clinical practice. However, the study's reliance on a specific machine learning model and dataset highlights typical challenges in medical research, such as model generalizability and data dependency. The variability in the RSF model's performance across test cohorts points to potential difficulties in generalizing the findings without extensive external validation. Future research should therefore validate these models across diverse demographic groups and healthcare settings to establish the robustness and applicability of the findings. Additionally, the strong influence of age as a predictive variable warrants deeper investigation of the interaction between age, genetic factors, and lifestyle choices in AD progression.
The results from our study highlight the significance of categorizing individuals with AD by depression severity, as assessed by the PHQ-9. Participants were stratified into three groups according to PHQ-9 scores: none (0–4), mild (5–9), and severe (10–14), aligning with established clinical thresholds [19]. Our analysis revealed notable differences among these groups. Specifically, the proportion of males increased with depression severity, from 51.0% in the none group to 57.3% in the mild group and 61.9% in the severe group. In contrast, the proportion of females decreased from 49.0% to 42.7% and 38.1%, respectively. BMI also increased with severity, with mean BMI rising from 28.5 ± 6.51 in participants without depressive symptoms to 30.0 ± 7.79 in mild and 30.9 ± 8.38 in severe cases. Furthermore, comorbid conditions such as CHF and CHD showed a higher prevalence of treated cases in groups with greater depressive severity: treated CHF increased from 2.5% in participants without symptoms to 6.5% in severe cases, and treated CHD rose from 3.4% in the none group to 6.4% in the severe group. Physical activity metrics indicated a marginal decline in vigorous and moderate recreational activities as depression severity increased. Such classification is critical because depressive symptoms can markedly impair cognitive functioning, accelerate disease progression, alter responsiveness to treatment, and significantly reduce quality of life in AD patients [21,36]. Differentiating AD populations by depression severity therefore provides essential insights for targeted clinical interventions and individualized therapeutic strategies [21,36].
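The stratification rule described above can be written as a small Python function. This is a sketch based only on the cut-offs stated in the text (0–4, 5–9, 10–14), not the study's own code. The function name is hypothetical, and because the text does not say how scores of 15 or higher were handled, the sketch returns None for them rather than guessing.

```python
def phq9_severity_group(score):
    """Map a PHQ-9 total score to the severity groups described in the text.

    The PHQ-9 has nine items scored 0-3, so valid totals range from 0 to 27.
    Scores above 14 fall outside the ranges stated in the text; this sketch
    returns None for them (an assumption, not the authors' rule).
    """
    if not 0 <= score <= 27:
        raise ValueError(f"PHQ-9 total must be between 0 and 27, got {score}")
    if score <= 4:
        return "none"
    if score <= 9:
        return "mild"
    if score <= 14:
        return "severe"
    return None


for example_score in (0, 4, 5, 9, 10, 14, 20):
    print(example_score, phq9_severity_group(example_score))
```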
The application of the RSF model in this study demonstrates the potential of machine learning techniques for predicting mortality risk in AD patients. In Table 2, the P values (< 0.001) for the Cox model's performance metrics indicate that the observed differences from the RSF model are statistically significant. Notably, the RSF model yields consistently higher iAUC/tAUC values and lower iBS/PE values across levels of depressive symptoms, suggesting a superior predictive accuracy that is unlikely to be attributable to chance. A key advantage of the RSF over the Cox proportional hazards model lies in its nonparametric, tree-based architecture, which enables it to capture complex, non-linear relationships and interactions among predictors without relying on the proportional hazards assumption. By aggregating many survival trees, the RSF reduces the overfitting to which a single tree is prone and can better accommodate high-dimensional data or intricate predictor dependencies. Moreover, the ensemble structure of the RSF allows it to adaptively weight variables and refine split points, potentially improving calibration and discrimination. Consequently, the RSF outperforms the traditional Cox proportional hazards model in both calibration and discrimination, with statistically significant improvements. This superior performance, evidenced by higher iAUC values and lower prediction errors across symptom severities, reinforces the role of advanced computational models in enhancing prognostic accuracy in clinical settings. Notably, the study's emphasis on lifestyle and physical activity as modifiable risk factors aligns with a growing body of literature that recognizes their influence on AD progression. By incorporating these variables into our model, we contribute to a more holistic understanding of AD and open avenues for preventive strategies grounded in behavioral modification [37]. This underscores the importance of personalized medicine, which tailors intervention strategies to the individual's unique risk profile.
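To make the notion of discrimination concrete, the sketch below computes a concordance index for right-censored data: the fraction of comparable patient pairs in which the patient who died earlier was assigned the higher predicted risk. It implements Harrell's C-index, which is simpler than the censoring-weighted C-statistic of Uno et al. listed in the references, so it should be read as an illustration under that stated substitution rather than the study's exact metric. The function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when patient i also has the higher predicted risk; ties in predicted
    risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up times, event indicators, predicted risks.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.5, 0.1]

c_index = harrell_c_index(toy_times, toy_events, toy_risks)
print(f"C-index = {c_index:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for interpreting the C-index values reported for the two models.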
We have analyzed the fundamental differences between the RSF model and the traditional Cox proportional hazards model, both in theoretical principles and in empirical results. The RSF model, being a non-parametric approach, offers a distinct advantage in handling complex interactions and non-linear relationships without explicit assumptions about the hazard function, unlike the Cox model, which assumes proportional hazards. Empirically, our results demonstrate that the RSF model provides better calibration and discrimination across AD symptom severities. This is particularly evident in the RSF model's superior performance in predicting outcomes in patients with severe symptoms, where traditional models such as the Cox model may underperform because of their log-linear assumptions.
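The assumption referred to here can be stated explicitly. In its standard textbook form (the notation below is ours, not taken from the manuscript), the Cox model writes the hazard for a patient with covariate vector $x$ as a shared baseline hazard scaled by a log-linear term:

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right),
\qquad
\frac{h(t \mid x_a)}{h(t \mid x_b)}
  = \exp\!\left(\beta^{\top} (x_a - x_b)\right).
```

The hazard ratio between any two patients $x_a$ and $x_b$ does not depend on $t$ (proportional hazards), and each covariate acts linearly on the log-hazard. The RSF imposes neither restriction, which is the sense in which it is described as non-parametric.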
However, this study is not without limitations. The predictive performance of the RSF model, while promising, varied when applied to the test cohorts. This variation underscores the challenge of generalizing machine learning models to diverse populations and real-world scenarios. Future research could focus on external validation of our findings across different demographic groups and healthcare settings to ensure the model's robustness and applicability. Moreover, the significant role of age as a predictive variable raises critical questions about the complex interplay between genetic factors and lifestyle choices. The observed trend of a higher percentage of males among participants with more severe symptoms may indicate sex-specific pathways in AD progression that merit further exploration.
Future iterations of this research should also consider integrating additional biomarkers and genetic information, as well as the potential impact of environmental factors. Such data could enrich the model and potentially reveal relationships that were not previously apparent. In conclusion, the current study demonstrates that machine learning models, particularly the RSF model, are valuable tools for predicting mortality in AD. These models provide insights that could aid the early identification of high-risk individuals and the development of targeted interventions. Nonetheless, these tools should be seen as complementary to, rather than replacements for, traditional clinical judgment and the nuanced understanding that healthcare professionals bring to patient care.
Conclusion
In summary, our study demonstrates that machine learning, particularly the RSF model, provides a robust framework for predicting mortality risk in Alzheimer's disease patients by integrating traditional clinical variables with lifestyle and physical activity measures. The RSF consistently outperformed the Cox proportional hazards model across multiple evaluation metrics (iAUC, iBS/PE, and C-index), highlighting its ability to capture nonlinear interactions and complex relationships among predictors. Importantly, modifiable factors such as total physical activity and smoking status emerged as significant contributors to mortality risk, underscoring the potential of targeted lifestyle interventions to improve patient outcomes. While our findings suggest that RSF-based models can support more accurate risk stratification and personalized care plans, further validation in diverse, external cohorts is essential to confirm generalizability and ensure broader clinical utility.
Limitations
Although our findings demonstrate that the RSF model outperformed traditional Cox proportional hazards models, several important limitations must be considered. First, machine-learning approaches like RSF, while effective at modeling complex, non-linear relationships, carry an inherent risk of overfitting, especially when working with relatively small or highly heterogeneous samples. Second, our study used internal validation through a hold-out method but did not include validation on an independent external dataset. Testing the RSF model in external cohorts is critical for assessing its robustness and confirming generalizability before clinical implementation. Third, the NHANES dataset used in our analysis primarily represents the U.S. population, which may limit the global applicability of our findings. Populations with different demographic characteristics, genetic backgrounds, or healthcare systems may yield different predictive performance. Fourth, while RSF provides clear advantages in predictive accuracy and flexibility, its complexity may pose practical challenges for clinicians, including difficulties in interpretation and increased computational requirements, which could hinder its adoption in routine clinical settings. Fifth, while the PHQ-9 is widely used, it has limitations as a proxy for AD severity. As a self-report tool, it can be influenced by recall bias and transient mood fluctuations, and its standard cut-offs were validated in general populations rather than AD cohorts, risking misclassification. Finally, physical activity was assessed via self-reported questionnaires, which introduces potential biases.
Future studies should therefore balance predictive performance with clinical utility, emphasizing external validation across diverse populations. This entails not only testing the RSF model in independent cohorts, preferably from different geographic regions and healthcare systems, but also adopting prospective and longitudinal designs that can capture evolving patterns of risk. Such efforts will help ensure that RSF and similar approaches provide robust, generalizable, and clinically meaningful predictions across a wide range of patient populations.
Data availability
The datasets and code that support the findings of this study are openly available in the GitHub repository at the following URL: https://github.com/Drcreater/AD_. This repository contains the complete set of processed data and the Python code utilized for the statistical analyses and model development described within our manuscript. The datasets generated and analyzed during the current study are available on reasonable request. To request these data, please contact Ruihong Tang at [email protected].
References
1. 2019 Alzheimer's disease facts and figures. Alzheimer's Dement. 15, 321–387. https://doi.org/10.1016/j.jalz.2019.01.010 (2019).
2. Rajan, K. B. et al. Population estimate of people with clinical Alzheimer's disease and mild cognitive impairment in the United States (2020–2060). Alzheimer's Dement. 17, 1966–1975. https://doi.org/10.1002/alz.12362 (2021).
3. Snyder, H. M. et al. Sex biology contributions to vulnerability to Alzheimer's disease: A think tank convened by the Women's Alzheimer's Research Initiative. Alzheimer's Dement. 12, 1186–1196. https://doi.org/10.1016/j.jalz.2016.08.004 (2016).
4. Lopez-Lee, C., Torres, E. R. S., Carling, G. & Gan, L. Mechanisms of sex differences in Alzheimer's disease. Neuron 112, 1208–1221. https://doi.org/10.1016/j.neuron.2024.01.024 (2024).
5. Harper, L. C. (n.d.). 2020 Alzheimer's Association Facts and Figures.
6. Livingston, G. et al. Dementia prevention, intervention, and care: 2020 report of the Lancet Commission. Lancet 396, 413–446. https://doi.org/10.1016/S0140-6736(20)30367-6 (2020).
7. Norton, S., Matthews, F. E., Barnes, D. E., Yaffe, K. & Brayne, C. Potential for primary prevention of Alzheimer's disease: an analysis of population-based data. Lancet Neurol. 13, 788–794. https://doi.org/10.1016/S1474-4422(14)70136-X (2014).
8. Deckers, K. et al. Target risk factors for dementia prevention: a systematic review and Delphi consensus study on the evidence from observational studies. Int. J. Geriatr. Psychiatry 30, 234–246. https://doi.org/10.1002/gps.4245 (2015).
9. Xu, W. et al. Meta-analysis of modifiable risk factors for Alzheimer's disease. J. Neurol. Neurosurg. Psychiatry https://doi.org/10.1136/jnnp-2015-310548 (2015).
10. Kivipelto, M. et al. Obesity and vascular risk factors at midlife and the risk of dementia and Alzheimer disease. Arch. Neurol. https://doi.org/10.1001/archneur.62.10.1556 (2005).
11. Chaudhury, S. et al. Polygenic risk score in postmortem diagnosed sporadic early-onset Alzheimer's disease. Neurobiol. Aging 62, 244.e1-244.e8. https://doi.org/10.1016/j.neurobiolaging.2017.09.035 (2018).
12. Rueda, A., Arevalo, J., Cruz, A., Romero, E. & González, F. A. Bag of features for automatic classification of Alzheimer's disease in magnetic resonance images. 559–566. https://doi.org/10.1007/978-3-642-33275-3_69 (2012).
13. Yuan, G.-X., Chang, K.-W., Hsieh, C.-J. & Lin, C.-J. A comparison of optimization methods and software for large-scale l1-regularized linear classification. J. Mach. Learn. Res. 11, 3183–3234 (2010).
14. Wang, L. et al. Development and validation of a deep learning algorithm for mortality prediction in selecting patients with dementia for earlier palliative care interventions. JAMA Netw. Open 2, e196972. https://doi.org/10.1001/jamanetworkopen.2019.6972 (2019).
15. Makridakis, S., Spiliotis, E. & Assimakopoulos, V. Statistical and machine learning forecasting methods: Concerns and ways forward. PLoS ONE 13, e0194889. https://doi.org/10.1371/journal.pone.0194889 (2018).
16. Spooner, A. et al. A comparison of machine learning methods for survival analysis of high-dimensional clinical data for dementia prediction. Sci. Rep. 10, 20410. https://doi.org/10.1038/s41598-020-77220-w (2020).
17. Chang, C.-H., Lin, C.-H. & Lane, H.-Y. Machine learning and novel biomarkers for the diagnosis of Alzheimer's disease. Int. J. Mol. Sci. 22, 2761. https://doi.org/10.3390/ijms22052761 (2021).
18. Centers for Disease Control and Prevention (CDC). National Health and Nutrition Examination Survey: Laboratory Procedures Manual. https://wwwn.cdc.gov/nchs/data/nhanes/public/2021/manuals/2021-Laboratory-Procedures-508.pdf (2021).
19. Kroenke, K., Spitzer, R. L. & Williams, J. B. W. The PHQ-9. J. Gen. Intern. Med. 16, 606–613. https://doi.org/10.1046/j.1525-1497.2001.016009606.x (2001).
20. Shen, W., Su, Y., Guo, T., Ding, N. & Chai, X. The relationship between depression based on patient health questionaire-9 and cardiovascular mortality in patients with hypertension. J. Affect Disord. 345, 78–84. https://doi.org/10.1016/j.jad.2023.10.059 (2024).
21. Baumgartner, A. et al. Regional neuronal activity in patients with relapsing remitting multiple sclerosis. Acta Neurol. Scand. 138, 466–474. https://doi.org/10.1111/ane.13012 (2018).
22. James, B. D. et al. Contribution of Alzheimer disease to mortality in the United States. Neurology 82, 1045–1050. https://doi.org/10.1212/WNL.0000000000000240 (2014).
23. Huang, Y. & Mucke, L. Alzheimer mechanisms and therapeutic strategies. Cell 148, 1204–1222. https://doi.org/10.1016/j.cell.2012.02.040 (2012).
24. Lynch, C. World Alzheimer Report 2019: Attitudes to dementia, a global survey: Public health: Engaging people in ADRD research. Alzheimer's Dement. 16, e038255 (2020).
25. Lee, H. & Singh, G. K. Disparities in all-cancer and lung cancer survival by social, behavioral, and health status characteristics in the United States: A longitudinal follow-up of the 1997–2015 national health interview survey-national death index record linkage study. J. Cancer Prev. 27, 89–100. https://doi.org/10.15430/JCP.2022.27.2.89 (2022).
26. Kaufman-Shriqui, V., Navarro, D. A., Salem, H. & Boaz, M. Mediterranean diet and health – a narrative review. Funct. Foods Health Dis. 12, 479. https://doi.org/10.31989/ffhd.v12i9.989 (2022).
27. Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI (Montreal, Canada), 1137–1145 (1995).
28. Stone, M. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B Stat. Methodol. 36, 111–133. https://doi.org/10.1111/j.2517-6161.1974.tb00994.x (1974).
29. Deardorff, W. J. et al. Development and external validation of a mortality prediction model for community-dwelling older adults with dementia. JAMA Intern. Med. 182, 1161. https://doi.org/10.1001/jamainternmed.2022.4326 (2022).
30. Steyerberg, E. W. & Vergouwe, Y. Towards better clinical prediction models: seven steps for development and an ABCD for validation. Eur. Heart J. 35, 1925–1931. https://doi.org/10.1093/eurheartj/ehu207 (2014).
31. Uno, H., Cai, T., Pencina, M. J., D'Agostino, R. B. & Wei, L. J. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat. Med. 30, 1105–1117. https://doi.org/10.1002/sim.4154 (2011).
32. Blanche, P., Dartigues, J. & Jacqmin-Gadda, H. Estimating and comparing time-dependent areas under receiver operating characteristic curves for censored event times with competing risks. Stat. Med. 32, 5381–5397. https://doi.org/10.1002/sim.5958 (2013).
33. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30 (2017).
34. Livingston, G. et al. Dementia prevention, intervention, and care. Lancet 390, 2673–2734. https://doi.org/10.1016/S0140-6736(17)31363-6 (2017).
35. Ngandu, T. et al. A 2 year multidomain intervention of diet, exercise, cognitive training, and vascular risk monitoring versus control to prevent cognitive decline in at-risk elderly people (FINGER): a randomised controlled trial. Lancet 385, 2255–2263. https://doi.org/10.1016/S0140-6736(15)60461-5 (2015).
36. Ownby, R. L., Crocco, E., Acevedo, A., John, V. & Loewenstein, D. Depression and risk for Alzheimer disease. Arch. Gen. Psychiatry 63, 530. https://doi.org/10.1001/archpsyc.63.5.530 (2006).
37. Bains, J., Birks, J. & Dening, T. Antidepressants for treating depression in dementia. Cochrane Database Syst. Rev. https://doi.org/10.1002/14651858.CD003944 (2002).
Acknowledgements
The authors acknowledge the support of the Institute for Healthcare Artificial Intelligence, Guangdong Second Provincial General Hospital. We would also like to thank all students who participated in this study.
Author information
Contributions
R.T. and L.T. conceived the experiment(s), R.T. and Z.S. conducted the experiment(s), and R.T. and Z.Z. analyzed the results. R.T. and P.C. wrote the paper. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
The National Center for Health Statistics (NCHS) and the Centers for Disease Control and Prevention (CDC) conducted the NHANES survey. The National Center for Health Statistics' Research Ethics Assessment Board evaluated and approved the NHANES study protocol. This study used a de-identified publicly available dataset that did not involve direct human experimentation or further interventions, and no additional ethical approvals were required for this type of research. For further confirmation, please refer to the link to the NCHS ethics approval document for the NHANES data: https://www.cdc.gov/nchs/nhanes/irba98.htm.
Informed consent
All participants were fully informed and consented to all procedures performed and to the publication of the details presented in this paper. They were informed that they could withdraw from participation at any time without consequences.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Tang, R., Tan, L., Chen, X. et al. Predicting mortality risk in Alzheimer's disease using machine learning based on lifestyle and physical activity. Sci Rep 15, 26928 (2025). https://doi.org/10.1038/s41598-025-11819-9
DOI: https://doi.org/10.1038/s41598-025-11819-9