Development and validation of a risk prediction model for kinesiophobia in postoperative lung cancer patients: an interpretable machine learning algorithm study

Li, Chuang; Lin, Youbei; Xiao, Xuyang; Guo, Xinru; Fei, Jinrui; Lu, Yanyan; Zhao, Junling; Zhang, Lan

doi:10.1038/s41598-025-03575-7

Article
Open access
Published: 03 June 2025

Scientific Reports volume 15, Article number: 19412 (2025)

Abstract

Kinesiophobia is particularly common in postoperative lung cancer patients. Because of misperception, internal fear, or fear of pain, patients may be reluctant to cough and move and may avoid rehabilitation training, which impairs postoperative recovery. It is therefore clinically important to identify the factors associated with kinesiophobia and to develop a prediction model. This study aimed to investigate the occurrence of kinesiophobia in postoperative lung cancer patients and to develop a prediction model and assess its performance, thereby providing a reference for clinical decision-making. A cross-sectional study involving 519 postoperative lung cancer patients from a tertiary hospital in Liaoning Province was conducted. The least absolute shrinkage and selection operator (LASSO) and multifactor logistic regression were used to screen predictors. Subsequently, six machine learning (ML) models were developed and compared to identify the optimal model. The importance of feature variables was ranked and interpreted to facilitate risk assessment. The incidence of kinesiophobia among postoperative lung cancer patients was 43.74%. Positive coping style, social support, pain severity, personal income, surgical history, and gender were identified as significant predictors of kinesiophobia. Among the evaluated models, the random forest (RF) model demonstrated the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.893, accuracy of 0.803, precision of 0.732, recall of 0.870, and F1 score of 0.795. The calibration curve of the RF model closely aligned with the ideal 45-degree diagonal, indicating strong agreement between predicted and observed outcomes. Furthermore, decision curve analysis (DCA) revealed that the RF model provided the highest net benefit in predicting postoperative kinesiophobia in lung cancer patients. This study demonstrates that machine learning models, particularly the RF algorithm, hold substantial promise for predicting kinesiophobia in postoperative lung cancer patients. By integrating individual background characteristics along with physical, psychological, and social factors, the RF model effectively identifies high-risk patients and provides a valuable foundation for early clinical screening and intervention. These findings underscore the critical influence of multidimensional factors in the development of postoperative kinesiophobia and highlight the advantages of machine learning in enhancing predictive accuracy and supporting personalized medical decision-making. To improve the model’s generalizability and clinical utility, future research should incorporate heterogeneous datasets from multiple regions and healthcare institutions to ensure broader applicability and greater robustness.

Introduction

Malignant tumors are a significant threat to public health and represent a major global health concern¹. According to GLOBOCAN statistics, an estimated 19.976 million new cancer cases and 9.744 million cancer deaths occurred worldwide in 2022. Lung cancer accounted for the highest burden, with 2.5 million new cases and 1.8 million deaths². Lung cancer (LC) is a primary malignant tumor that originates in the bronchial mucosa or glands of the lungs³. Non-small cell lung cancer (NSCLC) accounts for about 85% of all lung cancers and is among the cancers with the highest incidence and cancer-related mortality⁴.

Lung cancer incidence has been increasing annually, and although various comprehensive treatments are available, surgery remains one of the most common and crucial interventions⁵. With advances in medical technology, thoracic surgical procedures have shifted from traditional open thoracotomy to minimally invasive thoracoscopic surgery⁶. However, studies continue to show that postoperative lung cancer patients experience moderate to severe pain⁷. As a result, patients may become reluctant to cough or move because of misperceptions, internal fear, or fear of pain. They may even rely on pain medication, avoid rehabilitation, and ultimately develop kinesiophobia, defined as an excessive, irrational fear of physical activity resulting from heightened pain sensitivity following an injury or damage⁸. The fear-avoidance model, first proposed by Lethem et al. in 1983⁹, explains how negative perceptions of pain influence patients’ physical activity. Central to this model is the individual’s cognitive and emotional response to painful stimuli. When pain is perceived as a threat to health, patients may develop maladaptive beliefs and engage in avoidance behaviors to prevent perceived re-injury. In 1990, Kori et al. introduced the term kinesiophobia to describe this excessive, irrational, and debilitating fear of movement stemming from the anticipation of pain or injury. Once kinesiophobia develops in a postoperative lung cancer patient, it not only exacerbates anxiety, depression, and other negative emotions, but also impairs postoperative lung function and can lead to disuse syndrome and loss of functional ability, ultimately hindering recovery¹⁰.

In recent years, the application of machine learning (ML) algorithms in healthcare has introduced new perspectives and methods for risk assessment¹¹. Developing machine learning models based on patients’ clinical data and multi-dimensional psychological, physiological, and socio-environmental risk factors can help clinicians identify high-risk patients earlier, enabling targeted interventions to improve postoperative recovery outcomes¹²,¹³. ML models present distinct advantages over traditional statistical approaches. While conventional methods often rely on predefined assumptions and may struggle to capture complex, non-linear associations, ML algorithms can uncover hidden patterns in high-dimensional data without requiring explicit assumptions. This capability makes ML particularly effective in addressing the complexity and heterogeneity inherent in healthcare data. Machine learning algorithms are widely used for early identification of diseases, prognosis prediction, and individualized treatment plan development. Postoperative rehabilitation following lung cancer surgery is often accompanied by physical limitations, pain, and psychological distress. Among these issues, kinesiophobia may significantly impede recovery. While kinesiophobia has been extensively studied in musculoskeletal conditions, its presence and implications in postoperative lung cancer patients remain underexplored. Given the importance of early mobilization for optimal recovery, kinesiophobia may delay rehabilitation, reduce physical functioning, and negatively impact long-term outcomes. To address this challenge, the present study aims to develop a machine learning-based model to predict the risk of kinesiophobia in patients after lung cancer surgery. By integrating demographic, psychological, and social support variables, the model seeks to assist clinicians in early identification of high-risk individuals and to facilitate tailored interventions. This work not only offers novel insights for clinical nursing practice but also provides scientific support for enhancing the long-term recovery and quality of life among lung cancer survivors.

Materials and methods

Design and participants

This cross-sectional study included 519 postoperative lung cancer patients hospitalized at the First Affiliated Hospital of Jinzhou Medical University, Liaoning Province, from February to December 2024. Inclusion criteria: (1) lung cancer diagnosed based on clinical symptoms, chest computed tomography (CT), other imaging modalities, histopathological examination, and molecular testing, in accordance with the diagnostic criteria outlined in the 2023 Edition of the Primary Lung Cancer Diagnosis and Treatment Guidelines¹⁴; (2) surgical resection for lung cancer; (3) age ≥ 18 years; (4) conscious, with normal reading, comprehension, and communication abilities, and capable of completing the assessment; and (5) informed consent and voluntary participation in the study. Exclusion criteria: (1) malignant tumors at other sites; (2) cognitive dysfunction; and (3) current use of antidepressant or anxiolytic medications (to avoid the confounding effects of pre-existing depression and anxiety). The predictive model was developed following the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) guidelines¹⁵. The design and process of this study are illustrated in Fig. 1.

Ethical approval statement

This study was approved by the Ethics Review Committee of Jinzhou Medical University (JZMULL2025007). All procedures conducted in this study adhered to the ethical standards set by the university’s committee and complied with the 1964 Declaration of Helsinki and its subsequent amendments. Written informed consent was obtained from all participants prior to the commencement of the study.

Tools

General information questionnaire (GIQ)

The GIQ covers gender, age, educational attainment, occupation, and other variables, as detailed in Fig. 2. It was developed based on a literature quality assessment and the Delphi method.

Tampa scale for kinesiophobia (TSK)

The TSK is a self-administered scale developed by Miller¹⁶ in 1991. It was first translated and culturally adapted by the Chinese scholar Wen Hu¹⁷ in 2012 to form the Chinese version of the TSK, which has a Cronbach’s alpha coefficient of 0.778. The TSK consists of 17 items scored on a 4-point Likert scale, with items 4, 8, 12, and 16 reverse scored. The total score ranges from 17 to 68 points, and a score above 37 points indicates kinesiophobia.
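
The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' scoring script: it assumes each item is answered on a 1 to 4 scale (consistent with the stated 17–68 range) and that reverse scoring maps a response r to 5 − r, which is the usual convention but is not spelled out in the text. The function names are hypothetical.

```python
# Items that are reverse scored and the cut-off, as stated in the text.
REVERSE_ITEMS = {4, 8, 12, 16}
CUTOFF = 37

def tsk_total(responses):
    """Return the TSK total for 17 responses, each assumed to be 1..4."""
    if len(responses) != 17 or any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("expected 17 responses, each between 1 and 4")
    # Assumption: a reverse-scored response r contributes 5 - r.
    return sum(
        5 - r if item in REVERSE_ITEMS else r
        for item, r in enumerate(responses, start=1)
    )

def has_kinesiophobia(responses):
    """Classify as kinesiophobia when the total exceeds 37 points."""
    return tsk_total(responses) > CUTOFF
```

Under these assumptions, a patient who answers 1 on every item does not receive the minimum total of 17, because the four reverse-scored items each contribute 4 points.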

General self-efficacy scale (GSES)

The GSES was developed by the German psychologist Schwarzer¹⁸ in 1981, and the Chinese version was translated by Caikang Wang¹⁹ in 2001. The scale comprises 10 unidimensional items, each rated on a 4-point Likert scale, yielding a total score ranging from 10 to 40. Higher scores indicate greater general self-efficacy. The scale demonstrates good internal consistency, with a Cronbach’s α coefficient of 0.87.

Hospital anxiety and depression scale (HADS)

The HADS was developed by Zigmond²⁰ in 1983 and consists of 14 items in two subscales: anxiety (items 1, 3, 5, 7, 9, 11, 13) and depression (items 2, 4, 6, 8, 10, 12, 14). Each item is rated on a 4-point Likert scale, and each subscale has a total score of 0–21; higher scores indicate higher levels of anxiety or depression. The Cronbach’s alpha of the scale is 0.890.

Social support rate scale (SSRS)

The SSRS was developed by Shuiyuan Xiao²¹ in 1986 to measure patients’ level of social support. The scale contains 10 items divided into three dimensions: objective support (items 2, 6, and 7), subjective support (items 1, 3, 4, and 5), and utilization of social support (items 8, 9, and 10). The scoring criteria were as follows: questions 1, 4, and 8–10 were single-choice questions worth 1–4 points, with the options corresponding to the score; question 5 consisted of five sub-items, each scored 1–4 points from no support to full support, with the sum of the sub-item scores being the total score for question 5; questions 6 and 7 were scored 0 points if “no source” was chosen and otherwise scored according to the number of sources selected. Higher total and dimension scores indicate higher levels of social support. The Cronbach’s alpha for this scale is 0.723.

Simplified coping style questionnaire (SCSQ)

The SCSQ was revised by the Chinese scholar Yaning Xie²² in 1988 to assess the coping level of individuals facing stress or adversity. The scale consists of 20 items rated on a 4-point Likert scale, with a total score ranging from 0 to 60. It comprises two dimensions, positive coping (12 items) and negative coping (8 items), with higher scores on a dimension indicating that the patient is more inclined to use that coping style. The Cronbach’s alpha for this scale is 0.90.

Principle and selection of models

In this study, we selected six widely used and representative machine learning algorithms for the binary classification task: Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Artificial Neural Network (ANN), and K-Nearest Neighbors (KNN). These models are commonly applied in medical binary classification problems and possess high practical relevance and reference value. DT and RF are tree-based models that are easy to interpret and capable of capturing complex feature interactions. XGBoost, an advanced ensemble learning method, often achieves superior performance in binary classification tasks, particularly in handling imbalanced data. SVM is robust in high-dimensional spaces and performs well with small sample sizes, although it scales less well to datasets containing a large number of samples. ANN is suitable for modeling complex, nonlinear relationships between features and outputs. KNN, a distance-based classification method, is simple and intuitive, making it a useful benchmark for comparison with more complex models. Although other machine learning techniques are available for binary classification, the selected models represent diverse learning paradigms (rule-based, distance-based, kernel-based, ensemble-based, and neural network-based), providing a comprehensive comparative analysis.

Model performance assessment metrics

The following metrics were employed to evaluate the performance of the predictive models:

Accuracy: The most straightforward metric, representing the overall proportion of correct predictions made by the model.
Precision: The proportion of predicted positive cases that are truly positive, reflecting the model’s reliability in identifying positive cases.
Recall: The proportion of actual positive cases that are correctly identified by the model, indicating its ability to detect positive instances.
F1 Score: The harmonic mean of precision and recall, providing a balanced measure when there is an uneven class distribution.
Area under the receiver operating characteristic curve (AUROC): Evaluates the model’s ability to distinguish between positive and negative cases across all possible classification thresholds.
Calibration curve: Compares predicted probabilities with observed outcomes to assess the accuracy of the model’s probability estimates.
Decision curve analysis (DCA): Evaluates the clinical utility of the model by quantifying the net benefit across a range of threshold probabilities.
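
To make the AUROC definition in the list above concrete, the following minimal Python sketch computes it as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, counting ties as one half. The toy labels and scores are invented for illustration and are not study data. The study's own analysis was run in R, so this is not the authors' code.

```python
def auroc(y_true, y_score):
    """AUROC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical toy example: three of the four positive/negative pairs are ordered correctly.
toy_auc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])
```

Because it depends only on the ranking of scores, this quantity does not change with the classification threshold, which is why it complements threshold-dependent metrics such as precision and recall.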

Predictors

Given the highly complex nature of kinesiophobia, its underlying mechanism may involve dynamic interactions, feedback regulation, and multifactorial control among stressors, stress responses, and various influencing elements. Daily life problems often serve as stressors, acting as primary triggers of psychological stress and subsequently impacting physical health. The stress response comprises physiological, psychological, and behavioral changes, shaped by a range of factors and their interrelationships. Among these, cognitive-emotional factors and coping styles function as key mediating variables in the psychological stress process, while social support and individual background serve as critical influencing factors (see Fig. 2). In this study, lung cancer surgery is identified as the primary stressor experienced by patients, which may induce kinesiophobia and, in turn, negatively affect both physical and psychological health. Therefore, based on the multifactorial stress interaction framework, this study systematically examined the determinants of kinesiophobia and their internal relationships in postoperative lung cancer patients. Focusing on the mediating roles of individual background, cognitive-emotional factors, coping styles, and social support, the study was guided by a theoretical framework and employed both systematic literature review and the Delphi method. As a result, 24 potential predictors were identified, encompassing five dimensions: demographic characteristics, disease-related factors, cognitive-emotional variables, coping styles, and social support. These predictors were used to comprehensively analyze and identify risk factors associated with kinesiophobia in postoperative lung cancer patients.

Sample size

Following the principle of at least 10 events per variable (EPV) for constructing risk prediction models, the final number of predictor variables in this study was capped at 10²³. Based on a previous study reporting a kinesiophobia incidence of 38.68%²⁴ among lung cancer patients after thoracoscopic surgery, and allowing for a 10% sample attrition rate, the required sample size was calculated to be 410 cases. Ultimately, 519 cases were collected in this study, which met the required sample size.

Data pre-processing

First, variables with more than 20% missing data were to be excluded from the analysis, and variables with less than 20% missing data were to be imputed using multivariate imputation by chained equations (MICE). The missing value analysis found no missing values, so no deletion or imputation was required (see Supplementary Material 1).

Second, continuous variables were rescaled to the [0, 1] range using min-max normalization to address scale inconsistencies among variables.
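
Min-max normalization maps a value x to (x − min) / (max − min). The short Python sketch below illustrates the formula. The lower and upper bounds used in the examples (0 to 36 for the positive coping score and 1 to 10 for the pain score) are assumptions made here for illustration; the study does not report the minimum and maximum values it used.

```python
def min_max_scale(value, lower, upper):
    """Rescale value to [0, 1] given the lower and upper bounds of the variable."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    return (value - lower) / (upper - lower)

# Assumed bounds (not reported in the study): 12 positive-coping items scored 0..3.
coping_scaled = min_max_scale(27, lower=0, upper=36)
# Assumed bounds (not reported in the study): pain rated from 1 to 10.
pain_scaled = min_max_scale(5, lower=1, upper=10)
```

Under these assumed bounds, the results round to 0.75 and 0.44, which agrees with the standardized values quoted for the example patient in the model explanation section. This is consistent with, but does not confirm, the bounds chosen here.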

Finally, data imbalance, which can significantly impact classifier performance, is often addressed using techniques such as undersampling or oversampling. Generally, a positive-to-negative event ratio of 1:4 is considered the threshold indicating data imbalance²⁵. In this study, the ratio of positive to negative events was approximately 1:1.3 (227 vs. 292), indicating near balance; therefore, no additional balancing techniques were applied.

Feature selection

Given that the variables in this study may not be independent and could exhibit linear dependencies, traditional variable selection methods that do not account for the relationships between variables have limitations. Therefore, this study applied LASSO, a robust technique for screening high-dimensional variables, for initial variable selection. Subsequently, factors related to kinesiophobia in postoperative lung cancer patients were included in a multifactorial logistic regression analysis.

Model construction and evaluation

In this study, we employed six machine learning classification algorithms to construct a binary prediction model: DT, RF, ANN, SVM, XGBoost, and KNN. First, we used the initial_split function from the rsample package in R to randomly divide the dataset into a training set (70%) and a testing set (30%). A random seed of 4321 was set prior to the split to ensure reproducibility. Stratified sampling based on the target variable was applied using the strata parameter to maintain class distribution consistency between the training and testing sets and to minimize the impact of class imbalance on model evaluation.
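
The idea behind a stratified 70/30 split can be sketched in plain Python as follows. This is not the authors' R code (they used `initial_split` from rsample), and because the random number generators differ, it will not reproduce their patient assignment even with the same seed. The label vector is reconstructed only from the reported totals (227 positive and 292 negative cases), and the floor rounding of each stratum is an assumption.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=4321):
    """Split indices into train/test while preserving the class proportions."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = int(train_fraction * len(idx))  # assumption: floor rounding per class
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)

# Labels rebuilt from the reported counts only (1 = kinesiophobia).
labels = [1] * 227 + [0] * 292
train_idx, test_idx = stratified_split(labels)
train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
```

Splitting within each class keeps the event rate in both subsets close to the overall 43.74%, which is the property that stratification is meant to guarantee.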

Second, the training set was used to train the models, with hyperparameter tuning conducted through 5-fold cross-validation and grid search²⁶. During cross-validation, the training set was divided into five subsets; in each iteration, four subsets were used for model training and the remaining one for internal validation. This process was repeated five times, and the optimal set of hyperparameters was selected.
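
The cross-validated grid search described above can be sketched as follows. This is a generic illustration in Python, not the authors' R workflow. The hyperparameter names (`mtry`, `trees`), the toy scoring function, and the seed are hypothetical placeholders; in practice `score_fn` would fit a model on the training indices and return, for example, the AUROC on the validation indices.

```python
import random
from itertools import product

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def grid_search(param_grid, score_fn, n_samples, k=5):
    """Return the parameter combination with the best mean validation score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [score_fn(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def toy_score(params, train_idx, val_idx):
    """Hypothetical stand-in for 'fit on train_idx, evaluate on val_idx'."""
    return -abs(params["mtry"] - 3)

best_params, best_score = grid_search(
    {"mtry": [2, 3, 4], "trees": [100, 500]}, toy_score, n_samples=362
)
```

Each sample serves as validation data exactly once, so the mean score across folds uses every training-set observation for both fitting and internal validation without touching the held-out test set.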

After cross-validation, the final models were retrained using the optimized hyperparameters. Finally, the models were evaluated on the independent testing set to assess their generalization ability and predictive performance on unseen data.

Model explanation

SHAP (SHapley Additive exPlanations) is a model interpretability method based on the Shapley value, a game-theoretic concept introduced by the economist Lloyd Shapley²⁷. It quantifies the contribution of each feature to the model’s predictions, enhancing the user’s understanding of the model. In this study, SHAP was applied to identify the most influential features in predicting kinesiophobia in postoperative lung cancer patients, providing valuable insights for model development.
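
To illustrate what a Shapley value measures, the sketch below computes exact Shapley values for a tiny hypothetical model by averaging each feature's marginal contribution over all subsets of the other features. The model, its coefficients, and the feature order (pain, coping, support) are invented for illustration. SHAP implementations use faster approximations or tree-specific algorithms, so this is not the computation performed in the study.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values; value_fn maps a set of 'present' features to a payoff."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

def toy_model(z):
    """Hypothetical risk score with an interaction between coping and support."""
    pain, coping, support = z
    return 0.5 + 0.3 * pain - 0.2 * coping - 0.1 * coping * support

x = [1.0, 1.0, 1.0]          # patient being explained (hypothetical)
baseline = [0.0, 0.0, 0.0]   # reference values for 'absent' features (hypothetical)

def value_fn(present):
    """Model output when only the features in `present` take the patient's values."""
    z = [x[j] if j in present else baseline[j] for j in range(len(x))]
    return toy_model(z)

phi = shapley_values(value_fn, n_features=3)
```

The contributions sum to the difference between the prediction for the patient and the prediction at the baseline, and the interaction term is shared equally between the two features involved.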

Statistical analysis

IBM SPSS Statistics 27.0 and R 4.4.1 software were used for data analysis and visualization. Continuous variables that followed a normal distribution were presented as mean ± standard deviation (M ± SD), while skewed variables were presented as the median with interquartile range (M [P25, P75]). Group comparisons were conducted using t-tests or Wilcoxon rank-sum tests. Categorical variables were expressed as frequencies (n, %), and comparisons between groups were made using the Pearson chi-square test or Fisher’s exact test. A p value < 0.05 was considered statistically significant.

Results

Baseline characteristics

A total of 519 patients were included in this study, of whom 227 had a TSK score greater than 37, giving a kinesiophobia incidence of 43.74%. The incidence of kinesiophobia was 43.64% in the training set and 43.94% in the test set. No statistically significant differences were found in patient characteristics between the training and test sets (p > 0.05) (Table 1).

Table 1 Comparison of demographic characteristics between training and test datasets (n = 519).

Feature selection

We used the glmnet package in R to perform LASSO regression for initial feature screening, with ten-fold cross-validation to determine the optimal value of the regularization parameter λ. The final value, λ = 0.01379032 (lambda.min), minimized the cross-validation error, controlling model complexity while maintaining predictive power. A total of 10 variables with non-zero regression coefficients were initially selected, indicating their importance in predicting kinesiophobia: gender, personal income, medical payment methods, range of surgery, surgical history, type of pathology, pain severity, social support, positive coping style, and self-efficacy (Fig. 3). Subsequently, we performed a multifactorial logistic regression analysis with these 10 variables as the independent variables and kinesiophobia as the dependent variable (0 = without kinesiophobia, 1 = with kinesiophobia). The results showed that gender, personal income, surgical history, pain severity, social support, and positive coping style were independent predictors of kinesiophobia (p < 0.05) (Table 2).
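
The reason LASSO yields exactly zero coefficients for some variables can be seen from the soft-thresholding operator, which is the update applied to each coefficient in coordinate-descent solvers. The Python sketch below shows the operator in the simplest setting (a single standardized predictor), using the λ value reported above. The unpenalized coefficient values are hypothetical, and the sketch is not a reimplementation of glmnet's penalized logistic regression.

```python
LAMBDA_MIN = 0.01379032  # regularization strength reported in the study

def soft_threshold(z, lam):
    """Shrink z toward zero by lam and set it to exactly zero when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical unpenalized coefficients, for illustration only.
weak_effect = soft_threshold(0.005, LAMBDA_MIN)    # dropped from the model
strong_effect = soft_threshold(-0.5, LAMBDA_MIN)   # kept, but shrunk toward zero
```

Variables whose association with the outcome is weaker than the penalty are removed entirely, while stronger associations are retained with slightly shrunken coefficients, which is how the 24 candidate predictors could be reduced to 10.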

Table 2 Results of multifactorial logistic regression analysis of the incidence of kinesiophobia in lung cancer patients (n = 519).

Models performance

The performance of the six models is summarized as follows:

As shown in Table 3, the accuracy of the models on the training set ranged from 0.823 to 0.939, precision from 0.715 to 0.894, recall from 0.828 to 0.987, F1 score from 0.830 to 0.934, and AUROC from 0.862 to 0.989 (Fig. 4a). The average results of five-fold cross-validation yielded accuracy between 0.807 and 0.823, precision between 0.854 and 0.910, recall between 0.750 and 0.818, F1 score between 0.813 and 0.833, and AUROC between 0.876 and 0.916 (Fig. 4c). According to Table 4, the performance of the models on the test set showed accuracy ranging from 0.783 to 0.809, precision from 0.684 to 0.760, recall from 0.826 to 1.000, F1 score from 0.792 to 0.817, and AUROC from 0.787 to 0.893 (Fig. 4b). The AUROC values in the test set were slightly lower than those from internal validation, with differences of less than 10%, suggesting good generalization without substantial overfitting. Among all models, the RF model achieved the highest AUROC (0.893) on the test set, indicating the strongest ability to distinguish between positive and negative classes. Additionally, it demonstrated stable performance during five-fold cross-validation (AUROC = 0.912), with balanced recall (0.870) and accuracy (0.803), suggesting a low risk of missed detection and robust generalization capability.

Table 3 The performance of the six models on the training set and the average five-fold cross-validation performance.

Table 4 The final performance of the six models on the test set.

Calibration of the models

Figure 4d presents the calibration curves of each model, used to evaluate the agreement between the predicted probabilities and the actual event occurrence rates. The curves were generated by grouping samples into bins based on their predicted probabilities and computing the average predicted probability and corresponding observed event rate within each bin. The red dots in the graph represent the observed event rates within each predicted probability interval, providing insight into the model’s calibration across different probability levels. Ideally, a well-calibrated model will produce predictions that lie along the 45-degree diagonal line, indicating strong alignment between predicted and actual probabilities. As shown in Fig. 4d, the calibration curve of the Random Forest (RF) model aligns most closely with the diagonal, suggesting minimal bias and superior calibration performance compared to the other models.
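
The binning procedure described above can be sketched as follows. This is a minimal Python illustration with equal-width bins and invented toy data. The number of bins and the binning scheme used in the study are not reported, so both are assumptions here.

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Return (mean predicted probability, observed event rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((y, p))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            points.append((mean_pred, observed, len(members)))
    return points

# Hypothetical predictions and outcomes, for illustration only.
toy_prob = [0.1, 0.15, 0.3, 0.35, 0.7, 0.9, 0.95]
toy_true = [0, 0, 0, 1, 1, 1, 1]
toy_points = calibration_bins(toy_true, toy_prob, n_bins=2)
```

Plotting the observed rate against the mean predicted probability for each bin gives the calibration curve; points on the 45-degree diagonal correspond to bins where the two values coincide.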

Clinical utility of the model

Decision Curve Analysis (DCA) offers a net benefit-centered evaluation that complements traditional performance metrics, supporting the identification of clinically useful predictive models. As shown in Fig. 4e, when the threshold probability ranges from approximately 24% to 70%, most models outperform the “treat none” and “treat all” strategies in terms of net benefit, indicating potential clinical value within this range. Among them, the RF model exhibits the highest and most stable net benefit across a relatively wide threshold range (approximately 25% to 75%), outperforming other models in most intervals. Given its overall performance across discrimination, calibration, and clinical utility, the RF model appears to be the most suitable predictive model for clinical application in this study.
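
The net benefit plotted in a decision curve is commonly defined as TP/n − (FP/n) × pt / (1 − pt), where pt is the threshold probability, and the “treat all” strategy corresponds to classifying every patient as positive. The Python sketch below applies this standard definition to invented toy data; it is not the study's analysis code, and the labels and probabilities are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every patient is treated regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical outcomes and predicted risks, for illustration only.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
toy_prob = [0.9, 0.8, 0.6, 0.7, 0.3, 0.2, 0.4, 0.1, 0.5, 0.05]

nb_model = net_benefit(toy_true, toy_prob, threshold=0.5)
nb_all = net_benefit_treat_all(toy_true, threshold=0.5)
nb_none = 0.0  # treating no one yields zero net benefit by definition
```

A model is clinically useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison the decision curves in Fig. 4e make across the whole threshold range.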

Confusion matrix

As illustrated in Fig. 5, the RF model achieves a favorable trade-off between false negatives and false positives. Specifically, it yielded 9 false negatives and 22 false positives on the test set, indicating relatively strong performance in limiting missed cases of kinesiophobia. At the same time, the number of false positives remained within a reasonable range, suggesting that the RF model maintains a well-balanced capacity for distinguishing between positive and negative cases.
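
The relationship between these confusion-matrix counts and the reported test-set metrics can be checked with a short calculation. In the Python sketch below, the false negative and false positive counts come from the text, whereas the true positive and true negative counts (60 and 66) are inferred here under the assumption of a 157-patient test set containing 69 patients with kinesiophobia; they are not stated in the article.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# fn=9 and fp=22 are reported; tp=60 and tn=66 are inferred (assumption).
rf_metrics = classification_metrics(tp=60, fp=22, fn=9, tn=66)
```

With these counts the four metrics round to the values reported for the RF model (accuracy 0.803, precision 0.732, recall 0.870, F1 score 0.795), which supports the inferred counts without confirming them.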

Explanation of the model

To better understand the data and identify the characteristics that most strongly influence the target variable, we assessed variable importance using the best-performing RF model. As shown in Fig. 6a, b, positive coping style emerged as the most important predictor of kinesiophobia in postoperative lung cancer patients, followed by pain severity, social support, monthly personal income, surgical history, and gender. Figure 6c illustrates the model’s prediction for a specific sample. For instance, a male patient with a history of surgery, a personal income between 2000 and 4000 RMB, a pain severity score of 5 (standardized score 0.44), a social support score of 61 (standardized score 0.94), and a positive coping style score of 27 (standardized score 0.75) had a predicted probability of 45% of developing kinesiophobia.

Discussion

Evaluation of predictive models

The results of this study indicated that kinesiophobia is prevalent in post-surgical lung cancer patients, with a prevalence rate of 43.74%, closely aligning with the findings of Xinyuan Zhang’s study on kinesiophobia²⁴. A high prevalence of kinesiophobia suggests that post-surgical lung cancer patients may face significant psychological and behavioral barriers during recovery. This not only reduces patients’ participation in rehabilitation exercises but may also delay lung function recovery and potentially affect long-term health outcomes. The occurrence of kinesiophobia is influenced by multiple factors, and traditional individualized nursing assessments may struggle to accurately identify patients at high risk.

In this study, we compared the performance of six mainstream machine learning models in predicting postoperative kinesiophobia. Model performance was comprehensively assessed using several metrics: accuracy, precision, recall, F1 score, and AUROC. The DT model demonstrated outstanding performance in recall, indicating a strong ability to identify positive cases. However, its low precision suggests a tendency for false positives and a lack of specificity. Additionally, its AUROC was the lowest among the six models, reflecting limited overall discriminative ability. The RF model performed well across multiple evaluation metrics, particularly in AUROC, where it achieved the highest score. This indicates a strong ability to discriminate between positive and negative cases. Furthermore, RF maintained high precision and recall, and its F1 score reflected a well-balanced prediction. Overall, the RF model emerged as the best-performing model in this study, exhibiting stable performance across all indicators and strong generalization ability. The XGBoost model excelled in recall, making it suitable for scenarios where missed detections are highly undesirable. However, its slightly lower precision suggests a higher rate of false positives, which could result in unnecessary interventions. The SVM model achieved perfect recall, maximizing the identification of positive cases. However, its low precision, reflecting a large number of false positives, could lead to overly cautious predictions. The KNN model excelled in precision, indicating more accurate predictions with fewer false positives. However, it had the lowest recall, suggesting a risk of missed detections. The ANN model had the highest accuracy and precision of all models, demonstrating strong overall predictive ability. However, its recall was slightly lower than that of SVM and XGBoost, meaning it may miss some positive cases. Its AUROC of 0.890, which is close to that of RF, indicates good discriminative capability.

When comparing multiple performance indicators, the RF model demonstrated excellent overall ability, ranking first in AUROC and achieving a well-balanced combination of precision and recall. It also showed high net clinical benefit in the decision curve analysis. Therefore, RF is considered the optimal model for predicting the risk of postoperative kinesiophobia in this study.

Factors affecting kinesiophobia

The factors influencing the occurrence of kinesiophobia in postoperative lung cancer patients are complex and diverse. For this reason, in our previous study we conducted a systematic evaluation of the literature and two rounds of expert consultation using the Delphi method to identify potential predictors, which provided an important theoretical basis and research foundation for this study. Ultimately, nine features were used to construct the model, and SHAP analyses showed that positive coping style was the most important predictor of kinesiophobia in postoperative lung cancer patients. This is consistent with the findings of Xie et al.²⁹. Positive coping styles refer to a series of proactive and constructive strategies that an individual adopts in the face of stress or challenges, such as solving problems, regulating emotions, seeking support, and remaining optimistic³⁰. The more patients adopt positive coping styles, the lower their risk of developing kinesiophobia. The reason may be that a positive coping style helps to reduce the stimulation of the limbic system of the brain by negative emotions³¹, such as fear and anxiety, and promotes the patient’s ability to face the difficulties of postoperative rehabilitation with a more positive attitude. Patients who adopt a positive coping style tend to view discomfort during rehabilitation as temporary and actively seek solutions. This positive mindset reduces the fear of potential discomfort during postoperative rehabilitation exercises, helps to improve adherence to rehabilitation, and avoids the false association that ‘exercise equals danger’, thus alleviating the symptoms of kinesiophobia³².

In addition, pain severity was strongly associated with the occurrence of kinesiophobia in postoperative lung cancer patients. Firstly, pain, as a physiological stimulus, can significantly activate the body’s sympathetic nervous system, leading to a range of physiological responses such as increased heart rate, increased blood pressure and muscle tension³³. These stress responses instinctively prompt individuals to engage in avoidance behaviours to reduce potential harm or discomfort. During recovery from lung cancer surgery, intense pain often creates a strong fear of physical activity, as patients worry that exercise will exacerbate the pain or lead to injury. This physiological ‘avoidance reaction’ may create a vicious circle through the neuroendocrine pathway, making the pain more intense and increasing the patient’s fear of exercise³⁴. Furthermore, the presence of pain has a profound effect on the patient’s cognition and mood³⁵. Severe pain is often accompanied by anxiety, depression, and feelings of helplessness, and these negative emotions can directly undermine a patient’s confidence in his or her recovery. Negative perceptions such as ‘Exercise causes more pain’ or ‘I can’t handle the discomfort of exercise’ may occur, leading to further avoidance of exercise. Such perceptions not only affect the patient’s daily life, but may also increase the risk of postoperative kinesiophobia by affecting emotional stability. Therefore, pain management is not only about relieving physical discomfort, but is also an important factor affecting the patient’s psychological state and behaviour. Interventions at both the physiological and psychological levels must be considered in an integrated manner in order to effectively reduce the incidence of postoperative kinesiophobia.
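The SHAP ranking referred to in this subsection rests on Shapley values, which split one patient's prediction into per-feature contributions. The sketch below computes exact Shapley values for a deliberately simple, hypothetical linear risk score with three features. It is not the study's RF model or the SHAP library: the weights, feature values, and the use of a single reference patient as the baseline are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(predict: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of each feature for one instance x.
    Features outside a coalition are replaced by the baseline value."""
    n = len(x)

    def value(present: frozenset) -> float:
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


# Hypothetical weights for: positive coping, pain severity, social support.
WEIGHTS = [-0.8, 0.6, -0.4]


def toy_risk_score(z: List[float]) -> float:
    """Hypothetical linear risk score (not the fitted model)."""
    return sum(w * v for w, v in zip(WEIGHTS, z))


patient = [1.0, 2.0, 0.5]      # hypothetical standardized feature values
reference = [0.0, 0.0, 0.0]    # hypothetical reference patient
phi = shapley_values(toy_risk_score, patient, reference)
print(phi)
```

In this toy case, the negative contribution for positive coping lowers the score and the positive contribution for pain severity raises it, and the contributions sum to the difference between the patient's score and the reference score. This additive property is what allows a per-patient explanation and, averaged over patients, a feature-importance ranking.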

The other predictors are also closely related to kinesiophobia. First, personal income, as a socioeconomic factor, may influence patients’ psychological state and health behaviors³⁶. Patients with lower incomes may experience higher levels of life stress and lack sufficient financial resources for effective rehabilitation, which can lead to increased fear and exercise avoidance behaviors. Social support plays an important role in postoperative rehabilitation³⁷. Support from family, friends, or healthcare professionals can reduce anxiety and fear, enhance self-efficacy, and encourage active participation in exercise and rehabilitation activities. Conversely, patients who lack social support may feel isolated, making them more prone to fear and avoidance behaviors, thus increasing the risk of postoperative kinesiophobia³⁸,³⁹. Gender, as both a biological and social factor, may also impact postoperative kinesiophobia. Research suggests that female patients are more likely to experience postoperative fear and anxiety than males, which is related to gender differences in physiology and psychology⁴⁰. Surgical history may correlate with a patient’s postoperative recovery experience, and patients who have experienced complex surgery may develop a greater sense of unease about subsequent motor recovery⁴¹. Although these factors are relatively minor, they may still influence the development of postoperative kinesiophobia to some extent and therefore need to be considered in a comprehensive assessment.

Therefore, clinical staff should pay closer attention to female postoperative lung cancer patients with lower personal income, a history of surgery, and severe pain. Targeted interventions should aim to improve patients’ self-efficacy and social support and to encourage a positive coping style toward postoperative rehabilitation exercise, thereby reducing the risk of kinesiophobia.

Conclusion

In conclusion, this study developed a predictive model based on machine learning algorithms, with the Random Forest model demonstrating superior performance. Additionally, we utilized SHAP to provide personalized risk assessments for the progression of kinesiophobia in post-surgical lung cancer patients. This efficient computer-aided approach has the potential to assist frontline clinical healthcare providers and patients in identifying and intervening in the occurrence of kinesiophobia.

Limitations

First, the sample size of this study is relatively small and limited to postoperative lung cancer patients in Jinzhou, China. The lack of a multicenter design may limit the generalizability of the model to other regions. Second, although the model demonstrated high consistency in both the training and testing datasets, some unavoidable errors may arise due to the inherent uncertainty in data splitting. Finally, due to the lack of external validation across different time periods and locations, the machine learning model has not been deployed and applied in real-world settings. Furthermore, the deployment process requires full integration with existing systems, while also considering data privacy, user interface design, and continuous monitoring, all of which were beyond the scope of this study. We hope to further improve the model in future research.

Data availability

The data used in this study involve patients’ personal privacy and are therefore not publicly available. Relevant data can be obtained from the first author (Chuang Li) upon reasonable request.

References

Sabarwal, A., Kumar, K. & Singh, R. P. Hazardous effects of chemical pesticides on human health–cancer and other associated disorders. Environ. Toxicol. Pharmacol. 63, 103–114 (2018).
Sung, H. et al. Global Cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. Cancer J. Clin. 71, 209–249 (2021).
Frissell, L. F. & Knox, L. C. Primary carcinoma of the lung. Am. J. Cancer. 30, 219–288 (1937).
Zhang, Y. et al. Global variations in lung cancer incidence by histological subtype in 2020: a population-based study. Lancet Oncol. 24, 1206–1218 (2023).
Montagne, F., Guisier, F., Venissac, N. & Baste, J. M. The role of surgery in lung cancer treatment: present indications and future perspectives–state of the art. Cancers 13, 3711 (2021).
Cai, H., Wang, Y., Qin, D., Cui, Y. & Zhang, H. Advanced surgical technologies for lung cancer treatment: current status and perspectives. Eng. Regeneration. 4, 55–67 (2023).
Bendixen, M., Jørgensen, O. D., Kronborg, C., Andersen, C. & Licht, P. B. Postoperative pain and quality of life after lobectomy via video-assisted thoracoscopic surgery or anterolateral thoracotomy for early stage lung cancer: A randomised controlled trial. Lancet Oncol. 17, 836–844 (2016).
Lee, M. et al. Comparative effectiveness of long-term maintenance beta-blocker therapy after acute myocardial infarction in stable, optimally treated patients undergoing percutaneous coronary intervention. J. Am. Heart Assoc. 12, e028976 (2023).
Lethem, J., Slade, P. D., Troup, J. D. & Bentley, G. Outline of a fear-avoidance model of exaggerated pain perception–I. Behav. Res. Ther. 21, 401–408 (1983).
Bal, D. & Çilingir, D. A new concept in nursing care after surgery: Kinesiophobia. J. Educ. Res. Nursing/Hemşirel. Eğitim. Araşt. Derg. 19, 108–112 (2022).
Ngiam, K. Y. & Khor, I. W. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 20, e262–e273 (2019).
Deng, L. et al. Evaluation of large Language models in breast cancer clinical scenarios: a comparative analysis based on ChatGPT-3.5, ChatGPT-4.0, and Claude2. Int. J. Surg. 110, 1941–1950 (2024).
Wang, Y. et al. Development and validation of machine learning models for predicting cancer-related fatigue in lymphoma survivors. Int. J. Med. Informatics. 192, 105630 (2024).
Wolf, A. M. D. et al. Screening for lung cancer: 2023 guideline update from the American Cancer society. Cancer J. Clin. 74, 50–81 (2024).
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD). Circulation 131, 211–219 (2015).
Miller, M. B., Roumanis, M. J., Kakinami, L. & Dover, G. C. Chronic pain patients’ kinesiophobia and catastrophizing are associated with activity intensity at different times of the day. J. Pain Res. 13, 273–284 (2020).
Hu, W. Cultural adaptation of the simplified Chinese version of the TSK and FABQ scales and their application in degenerative low back pain: a study. China Natl. Knowl. Infrastructure (2012).
Luszczynska, A., Scholz, U. & Schwarzer, R. The general self-efficacy scale: multicultural validation studies. J. Psychol. 139, 439–457 (2005).
Wang, C., Hu, Z. & Liu, Y. Study on the reliability and validity of general self-efficacy scale. Appl. Psychol. 2001(01), 37–40 (2001).
Zigmond, A. S. & Snaith, R. P. The hospital anxiety and depression scale. Acta Psychiatry Scand. 67, 361–370 (1983).
Xiao, S. Theoretical basis and research application of social support rating scale. J. Clin. Psychiatry 1994(02), 98–100 (1994).
Xie, Y. A preliminary study on reliability and validity of simplified coping style scale. Chin. J. Clin. Psychol. 1998(02), 3–5 (1998).
Austin, P. C. & Steyerberg, E. W. Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models. Stat. Methods Med. Res. 26, 796–808 (2017).
Zhang, X., Zhang, X. & Chen, J. Relationship between pain-related patients’ reported outcome and panic level after thoracoscopic lung cancer resection. J. Nurs. Sci. 37, 28–31 (2022).
Hasanin, T. & Khoshgoftaar, T. The effects of random undersampling with simulated class imbalance for big data. in IEEE International Conference on Information Reuse and Integration (IRI) 70–79 (2018). https://doi.org/10.1109/IRI.2018.00018
Jiang, X. & Xu, C. Deep learning and machine learning with grid search to predict later occurrence of breast cancer metastasis using clinical data. J. Clin. Med. 11, 5772 (2022).
Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. in Advances in Neural Information Processing Systems vol. 30 (Curran Associates, Inc., 2017).
Lei, T. et al. Establishment and validation of predictive model of Tophus in gout patients. JCM 12, 1755 (2023).
Xie, M., Yin, L., Guo, Y., Zhang, X. & Zhao, R. Current status and influencing factors of kinesiophobia in patients with peritoneal dialysis: A multicenter cross-sectional study. BMC Nephrol. 25, 404 (2024).
Wu, Y. et al. Psychological resilience and positive coping styles among Chinese undergraduate students: A cross-sectional study. BMC Psychol. 8, 79 (2020).
Rauch, A. V. et al. Cognitive coping style modulates neural responses to emotional faces in healthy humans: A 3-T FMRI study. Cereb. Cortex. 17, 2526–2535 (2007).
Aglio, L. S. et al. Surgical prehabilitation: strategies and psychological intervention to reduce postoperative pain and opioid use. Anesth. Analg. 134, 1106–1111 (2022).
Sebastião, R., Bento, A. & Brás, S. Analysis of physiological responses during pain induction. Sens. (Basel). 22, 9276 (2022).
Disney, L. 101 Solution-focused questions for help with depression. J. Evid. Inf. Soc. Work. 13, 576–577 (2016).
Simons, L. E., Elman, I. & Borsook, D. Psychological processing in chronic pain: A neural systems approach. Neurosci. Biobehavioral Reviews. 39, 61–78 (2014).
Yihunie, M. et al. Fear-avoidance beliefs for physical activity among chronic low back pain: A multicenter cross-sectional study. J. Pain Res. 16, 233–243 (2023).
Xinghua, W. The research on strategies and methods for postoperative recovery and functional reconstruction in surgery. MEDS Clin. Med. 4, 35–40 (2023).
Kapikiran, G. & Bulbuloglu, S. The effect of perceived social support on psychological resilience and surgical fear in surgical oncology patients. Psychol. Health Med. 29, 473–483 (2024).
Luo, Q., Liu, F., Jiang, Z. & Zhang, L. The chain mediating effect of spiritual well-being and anticipatory grief between benefit finding and meaning in life of patients with advanced lung cancer: Empirical research quantitative. Nurs. Open. 11, e2179 (2024).
Watanabe, Y. et al. Gender differences on preoperative psychologic factors affecting acute postoperative pain in patients with lumbar spinal disorders. J. Orthop. Sci. 29, 1174–1178 (2024).
Yu, Z., Xie, G., Qin, C., He, H. & Wei, Q. Effect of postoperative exercise training on physical function and quality of life of lung cancer patients with chronic obstructive pulmonary disease: A randomized controlled trial. Med. (Baltim). 103, e37285 (2024).
Soares, F. G. et al. Identification of demographic, clinical and psychological predictors in relation to kinesiophobia of patients in the post-operative musculoskeletal trauma. Braz. J. Phys. Ther. 28, 100728 (2024).

Acknowledgements

The authors are very grateful to all the cooperating medical staff and study participants at the First Affiliated Hospital of Jinzhou Medical University.

Funding

This research was supported by the 2024 Liaoning Provincial Science and Technology Joint Program (2024-MSLH-168). We sincerely appreciate the funding provided.

Author information

Authors and Affiliations

Department of Nursing, First Affiliated Hospital of Jinzhou Medical University, Jinzhou, 121001, China
Chuang Li & Lan Zhang
School of Nursing, Jinzhou Medical University, Jinzhou, 121001, China
Chuang Li, Youbei Lin, Xinru Guo & Jinrui Fei
Thoracic Surgery Unit, First Affiliated Hospital of Jinzhou Medical University, Jinzhou, 121001, China
Xuyang Xiao, Yanyan Lu & Junling Zhao

Contributions

Chuang Li: Writing – original draft, Visualization, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Youbei Lin, Xuyang Xiao, Xinru Guo, Jinrui Fei, Yanyan Lu, Junling Zhao: Investigation, Formal analysis. Lan Zhang: Writing – review and editing, Methodology, Resources, Project administration, Data curation, Conceptualization.

Corresponding author

Correspondence to Lan Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Li, C., Lin, Y., Xiao, X. et al. Development and validation of a risk prediction model for kinesiophobia in postoperative lung cancer patients: an interpretable machine learning algorithm study. Sci Rep 15, 19412 (2025). https://doi.org/10.1038/s41598-025-03575-7

Received: 06 January 2025
Accepted: 21 May 2025
Published: 03 June 2025
DOI: https://doi.org/10.1038/s41598-025-03575-7
