Abstract
Kinesiophobia is particularly common in postoperative lung cancer patients. Because of misperceptions, internal fear, or fear of pain, patients may be reluctant to cough and move and may avoid rehabilitation training, which impairs postoperative recovery. Therefore, it is clinically important to identify the factors associated with kinesiophobia and to develop a prediction model. This study aims to investigate the occurrence of kinesiophobia in postoperative lung cancer patients and to develop a prediction model and assess its performance, thereby providing a reference for clinical decision-making. A cross-sectional study involving 519 postoperative lung cancer patients from a tertiary hospital in Liaoning Province was conducted. The least absolute shrinkage and selection operator (LASSO) and multifactorial logistic regression were used to screen predictors. Subsequently, six machine learning (ML) models were developed and compared to identify the optimal model. The importance of feature variables was ranked and interpreted to facilitate risk assessment. The incidence of kinesiophobia among postoperative lung cancer patients was 43.74%. Positive coping style, social support, pain severity, personal income, surgical history, and gender were identified as significant predictors of kinesiophobia. Among the evaluated models, the random forest (RF) model demonstrated the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.893, accuracy of 0.803, precision of 0.732, recall of 0.870, and F1 score of 0.795. The calibration curve of the RF model closely aligned with the ideal 45-degree diagonal, indicating strong agreement between predicted and observed outcomes. Furthermore, decision curve analysis (DCA) revealed that the RF model provided the highest net benefit in predicting postoperative kinesiophobia in lung cancer patients. This study demonstrates that machine learning models, particularly the RF algorithm, hold substantial promise for predicting kinesiophobia in postoperative lung cancer patients.
By integrating individual background characteristics along with physical, psychological, and social factors, the RF model effectively identifies high-risk patients and provides a valuable foundation for early clinical screening and intervention. These findings underscore the critical influence of multidimensional factors in the development of postoperative kinesiophobia and highlight the advantages of machine learning in enhancing predictive accuracy and supporting personalized medical decision-making. To improve the model's generalizability and clinical utility, future research should incorporate heterogeneous datasets from multiple regions and healthcare institutions to ensure broader applicability and greater robustness.
Introduction
Malignant tumors are a significant threat to public health and represent a major global health concern [1]. According to GLOBOCAN statistics, there were an estimated 19.976 million new cancer cases and 9.744 million cancer deaths worldwide in 2022. Lung cancer accounted for the highest burden, with 2.5 million new cases and 1.8 million deaths [2]. Lung cancer (LC) is a primary malignant tumor that originates in the bronchial mucosa or glands of the lungs [3]. Non-small cell lung cancer (NSCLC) accounts for about 85% of all lung cancers, and lung cancer is among the malignancies with the highest incidence and is a leading cause of cancer-related death [4].
Lung cancer incidence has been increasing annually, and although various comprehensive treatments are available, surgery remains one of the most common and crucial interventions [5]. With advances in medical technology, thoracic surgical procedures have shifted from traditional open thoracotomy to minimally invasive thoracoscopic surgery [6]. However, studies continue to show that postoperative lung cancer patients experience moderate to severe pain [7]. This leads patients to become reluctant to cough or move because of misperceptions, internal fear, or fear of pain. They may even rely on pain medication, avoid rehabilitation, and ultimately develop kinesiophobia. Kinesiophobia is defined as an excessive, irrational fear of physical activity resulting from heightened pain sensitivity following an injury or damage [8]. The fear-avoidance model, first proposed by Lethem et al. in 1983 [9], explains how negative perceptions of pain influence patients' physical activity. Central to this model is the individual's cognitive and emotional response to painful stimuli. When pain is perceived as a threat to health, patients may develop maladaptive beliefs and engage in avoidance behaviors to prevent perceived re-injury. In 1990, Kori et al. introduced the term kinesiophobia to describe this excessive, irrational, and debilitating fear of movement stemming from the anticipation of pain or injury. Once kinesiophobia develops in a postoperative lung cancer patient, it not only exacerbates anxiety, depression, and other negative emotions, but also impairs postoperative lung function and can lead to disuse syndrome and loss of functional ability, ultimately hindering recovery [10]. In recent years, the application of machine learning (ML) algorithms in healthcare has introduced new perspectives and methods for risk assessment [11].
Developing machine learning models based on patients' clinical data and multi-dimensional psychological, physiological, and socio-environmental risk factors can help clinicians identify high-risk patients earlier, enabling targeted interventions to improve postoperative recovery outcomes [12, 13]. ML models present distinct advantages over traditional statistical approaches. While conventional methods often rely on predefined assumptions and may struggle with capturing complex, non-linear associations, ML algorithms can autonomously uncover hidden patterns in high-dimensional data without requiring explicit assumptions. This capability makes ML particularly effective in addressing the complexity and heterogeneity inherent in healthcare data. Machine learning algorithms are widely used for early identification of diseases, prognosis prediction, and individualized treatment plan development. Postoperative rehabilitation following lung cancer surgery is often accompanied by physical limitations, pain, and psychological distress. Among these issues, kinesiophobia may significantly impede recovery. While kinesiophobia has been extensively studied in musculoskeletal conditions, its presence and implications in postoperative lung cancer patients remain underexplored. Given the importance of early mobilization for optimal recovery, kinesiophobia may delay rehabilitation, reduce physical functioning, and negatively impact long-term outcomes. To address this challenge, the present study aims to develop a machine learning-based model to predict the risk of kinesiophobia in patients after lung cancer surgery. By integrating demographic, psychological, and social support variables, the model seeks to assist clinicians in early identification of high-risk individuals and to facilitate tailored interventions.
This work not only offers novel insights for clinical nursing practice but also provides scientific support for enhancing the long-term recovery and quality of life among lung cancer survivors.
Materials and methods
Design and participants
This cross-sectional study included 519 postoperative lung cancer patients hospitalized at the First Affiliated Hospital of Jinzhou Medical University, Liaoning Province, from February to December 2024. Inclusion criteria: (1) lung cancer diagnosed on the basis of clinical symptoms, chest computed tomography (CT), other imaging modalities, histopathological examination, and molecular testing, in accordance with the diagnostic criteria outlined in the 2023 Edition of the Primary Lung Cancer Diagnosis and Treatment Guidelines [14]; (2) surgical resection for lung cancer; (3) age ≥ 18 years; (4) conscious, with normal reading, comprehension, and communication abilities, and capable of completing the assessment; (5) provided informed consent and voluntarily participated in the study. Exclusion criteria: (1) malignant tumors at other sites; (2) cognitive dysfunction; and (3) current use of antidepressant or anxiolytic medications (to avoid the confounding effects of pre-existing depression and anxiety). The predictive model was developed following the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) guidelines [15]. The design and process of this study are illustrated in Fig. 1.
Ethical approval statement
This study was approved by the Ethics Review Committee of Jinzhou Medical University (JZMULL2025007). All procedures conducted in this study adhered to the ethical standards set by the university's committee and complied with the 1964 Declaration of Helsinki and its subsequent amendments. Written informed consent was obtained from all participants prior to the commencement of the study.
Tools
General information questionnaire (GIQ)
It includes gender, age, educational attainment, occupation, and other variables, as detailed in Fig. 2. GIQ was developed based on a literature quality assessment and the Delphi method.
Tampa scale for kinesiophobia (TSK)
It is a patient self-administered scale developed by Miller [16] in 1991 and was first translated and culturally adapted by the Chinese scholar Wen Hu [17] in 2012 to form the Chinese version of the TSK, with a Cronbach's alpha coefficient of 0.778. The TSK consists of 17 items scored on a 4-point Likert scale, with items 4, 8, 12, and 16 reverse-scored. Total scores range from 17 to 68, and a score above 37 indicates kinesiophobia.
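The TSK scoring rule described above can be sketched in Python. This is a minimal illustration rather than the authors' scoring script: the function names are hypothetical, and the reversal rule (5 minus the response on a 1 to 4 scale) is an assumption based on the usual convention for 4-point items.

```python
# Items that are reverse-scored, as stated in the text.
REVERSE_ITEMS = {4, 8, 12, 16}


def tsk_total(responses):
    """Return the TSK total for 17 responses, each coded 1 to 4.

    Assumption: a reverse-scored item contributes 5 - response.
    """
    if len(responses) != 17:
        raise ValueError("The TSK has exactly 17 items")
    if any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("Each response must be an integer from 1 to 4")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number in REVERSE_ITEMS:
            total += 5 - response
        else:
            total += response
    return total


def has_kinesiophobia(responses, cutoff=37):
    """Apply the cutoff from the text: a total above 37 indicates kinesiophobia."""
    return tsk_total(responses) > cutoff


lowest = tsk_total([1] * 17)    # 13 regular items at 1, 4 reversed items at 4
highest = tsk_total([4] * 17)   # 13 regular items at 4, 4 reversed items at 1
```

Note that answering every item with the same value never reaches the scale extremes of 17 or 68, because the four reversed items pull in the opposite direction.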
General self-efficacy scale (GSES)
It was developed by the German psychologist Schwarzer [18] in 1981, and the Chinese version of the GSES was adapted by Caikang Wang [19] in 2001. The scale comprises 10 unidimensional items, each rated on a 4-point Likert scale, yielding a total score ranging from 10 to 40. Higher scores indicate greater general self-efficacy. The scale demonstrates good internal consistency, with a Cronbach's α coefficient of 0.87.
Hospital anxiety and depression scale (HADS)
It was developed by Zigmond [20] in 1983 and consists of 14 items in two subscales: anxiety (items 1, 3, 5, 7, 9, 11, 13) and depression (items 2, 4, 6, 8, 10, 12, 14). Items are rated on a 4-point Likert scale, and each subscale score ranges from 0 to 21, with higher scores indicating higher levels of anxiety or depression. The Cronbach's alpha for this scale is 0.890.
Social support rating scale (SSRS)
It was developed by Shuiyuan Xiao [21] in 1986 to measure patients' level of social support. The scale contains 10 items divided into three dimensions: objective support (items 2, 6, and 7), subjective support (items 1, 3, 4, and 5), and utilization of social support (items 8, 9, and 10). The scoring criteria were as follows: questions 1, 4, and 8–10 were single-choice questions worth 1–4 points, with the selected option corresponding to the score; question 5 consisted of five sub-items, each scored 1–4 points from no support to full support, with the sum of the sub-item scores being the total score for question 5; questions 6 and 7 were scored 0 points if "no source" was chosen and otherwise scored according to the number of sources selected. Higher total and dimension scores indicate higher levels of social support. The Cronbach's alpha for this scale is 0.723.
Simplified coping style questionnaire (SCSQ)
It was revised by the Chinese scholar Yaning Xie [22] in 1988 to assess the coping level of individuals facing stress or adversity. The scale consists of 20 items rated on a 4-point Likert scale, with a total score ranging from 0 to 60. It comprises two dimensions, positive coping (12 items) and negative coping (8 items), with higher scores on each dimension indicating that the patient is more inclined to use that particular coping style. The Cronbach's alpha for this scale is 0.90.
Principle and selection of models
In this study, we selected six widely used and representative machine learning algorithms for the binary classification task: Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Support Vector Machine (SVM), Artificial Neural Network (ANN), and K-Nearest Neighbors (KNN). These models are commonly applied in medical binary classification problems and possess high practical relevance and reference value. DT and RF are tree-based models that are easy to interpret and capable of capturing complex feature interactions. XGBoost, an advanced ensemble learning method, often achieves superior performance in binary classification tasks, particularly in handling imbalanced data. SVM is robust in high-dimensional spaces and performs well with small sample sizes, although it scales less efficiently to very large datasets. ANN is suitable for modeling complex, nonlinear relationships between features and outputs. KNN, a distance-based classification method, is simple and intuitive, making it a useful benchmark for comparison with more complex models. Although other machine learning techniques are available for binary classification, the selected models represent diverse learning paradigms (rule-based, distance-based, kernel-based, ensemble-based, and neural network-based), providing a comprehensive comparative analysis.
Model performance assessment metrics
The following metrics were employed to evaluate the performance of the predictive models:
- Accuracy: The most straightforward metric, representing the overall proportion of correct predictions made by the model.
- Precision: The proportion of predicted positive cases that are truly positive, reflecting the model's reliability in identifying positive cases.
- Recall: The proportion of actual positive cases that are correctly identified by the model, indicating its ability to detect positive instances.
- F1 Score: The harmonic mean of precision and recall, providing a balanced measure when there is an uneven class distribution.
- Area under the receiver operating characteristic curve (AUROC): Evaluates the model's ability to distinguish between positive and negative cases across all possible classification thresholds.
- Calibration curve: Compares predicted probabilities with observed outcomes to assess the accuracy of the model's probability estimates.
- Decision curve analysis (DCA): Evaluates the clinical utility of the model by quantifying the net benefit across a range of threshold probabilities.
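For reference, the threshold-based metrics listed above can be written in their standard textbook form. The notation is an assumption introduced here and does not appear in the original: TP, FP, TN, and FN are the counts of true positives, false positives, true negatives, and false negatives, N is the total number of patients, and p_t is the threshold probability used in DCA.

```latex
\begin{aligned}
\text{Accuracy}  &= \frac{TP + TN}{TP + FP + TN + FN}, \\
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall}     = \frac{TP}{TP + FN}, \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \\
\text{NB}(p_t) &= \frac{TP}{N} - \frac{FP}{N} \cdot \frac{p_t}{1 - p_t}.
\end{aligned}
```

The net benefit NB(p_t) weights each false positive by the odds of the threshold, so a model is penalized more heavily for false positives as the threshold for intervention rises.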
Predictors
Given the highly complex nature of kinesiophobia, its underlying mechanism may involve dynamic interactions, feedback regulation, and multifactorial control among stressors, stress responses, and various influencing elements. Daily life problems often serve as stressors, acting as primary triggers of psychological stress and subsequently impacting physical health. The stress response comprises physiological, psychological, and behavioral changes, shaped by a range of factors and their interrelationships. Among these, cognitive-emotional factors and coping styles function as key mediating variables in the psychological stress process, while social support and individual background serve as critical influencing factors (see Fig. 2). In this study, lung cancer surgery is identified as the primary stressor experienced by patients, which may induce kinesiophobia and, in turn, negatively affect both physical and psychological health. Therefore, based on the multifactorial stress interaction framework, this study systematically examined the determinants of kinesiophobia and their internal relationships in postoperative lung cancer patients. Focusing on the mediating roles of individual background, cognitive-emotional factors, coping styles, and social support, the study was guided by a theoretical framework and employed both systematic literature review and the Delphi method. As a result, 24 potential predictors were identified, encompassing five dimensions: demographic characteristics, disease-related factors, cognitive-emotional variables, coping styles, and social support. These predictors were used to comprehensively analyze and identify risk factors associated with kinesiophobia in postoperative lung cancer patients.
Sample size
Following the principle of 10 events per variable (EPV) for constructing risk prediction models, the final number of predictor variables in this study was capped at 10 [23]. Based on a previous study reporting a kinesiophobia incidence of 38.68% [24] among patients after thoracoscopic lung cancer surgery, and allowing for a 10% sample attrition rate, the required sample size was calculated to be 410 cases. Ultimately, 519 cases were collected, which met the required sample size.
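The EPV arithmetic can be sketched as follows. The text does not give the exact formula, so this is only one plausible reading: the function name and parameters are hypothetical, and the division by a 70% training fraction is an assumption that happens to yield a figure close to the reported 410, not a step stated by the authors.

```python
def epv_sample_size(n_predictors, incidence, epv=10,
                    attrition=0.10, train_fraction=1.0):
    """Estimate the sample size needed under an events-per-variable rule.

    Assumptions (not stated in the source): attrition is handled by
    dividing by (1 - attrition), and train_fraction < 1 inflates the
    total so that the training subset alone satisfies the EPV rule.
    """
    events_needed = epv * n_predictors          # e.g. 10 events x 10 predictors
    patients_needed = events_needed / incidence  # patients needed to observe them
    patients_needed /= (1.0 - attrition)         # allow for dropouts
    patients_needed /= train_fraction            # optional: EPV on training set only
    return patients_needed


# EPV rule plus attrition only.
base = epv_sample_size(n_predictors=10, incidence=0.3868)

# Same rule, additionally assuming it must hold within a 70% training split.
with_split = epv_sample_size(n_predictors=10, incidence=0.3868,
                             train_fraction=0.70)

print(round(base, 1), round(with_split, 1))
```

Under either reading, the 519 patients actually enrolled exceed the requirement.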
Data pre-processing
First, the planned rule was to exclude variables with more than 20% missing data and to apply multivariate imputation by chained equations (MICE) to variables with less than 20% missing data. Missing value analysis showed that the dataset contained no missing values, so no deletion or imputation was required (see Supplementary Material 1).
Second, continuous variables were standardized using the min-max normalization method to address scale inconsistencies among variables.
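Min-max normalization rescales each continuous variable to the interval from 0 to 1. The sketch below is illustrative only: the function name is hypothetical, and the bounds of 0 and 36 are an assumption taken from the theoretical range of the 12-item positive coping subscale (12 items scored 0 to 3), whereas the study may have used the observed sample minimum and maximum instead.

```python
def min_max_scale(value, lower, upper):
    """Rescale value linearly so that lower maps to 0 and upper maps to 1."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    return (value - lower) / (upper - lower)


# A raw positive coping score of 27 on an assumed 0 to 36 range.
scaled = min_max_scale(27, lower=0, upper=36)
print(scaled)
```

With these assumed bounds, a raw score of 27 maps to 0.75, which matches the standardized positive coping score quoted for the example patient in the model explanation section.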
Finally, data imbalance, which can significantly impact classifier performance, is often addressed using techniques such as undersampling or oversampling. Generally, a ratio of positive to negative events of 1:4 is considered the threshold indicating data imbalance [25]. In this study, the ratio of positive to negative events was approximately 1:1.3 (227 positive versus 292 negative cases), indicating near balance; therefore, no additional balancing techniques were applied.
Feature selection
Given that the variables in this study may not be independent and could exhibit linear dependencies, traditional variable selection methods that do not account for the relationships between variables have limitations. Therefore, this study applied LASSO, a robust technique for screening high-dimensional variables, for initial variable selection. Subsequently, factors related to kinesiophobia in postoperative lung cancer patients were included in a multifactorial logistic regression analysis.
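For a binary outcome, the LASSO step can be written as an L1-penalized logistic regression. The formulation below is the standard one and is given as a sketch; the notation is an assumption introduced here: y_i is the kinesiophobia indicator for patient i, x_i is the vector of candidate predictors, n is the number of patients, and λ is the regularization parameter.

```latex
(\hat{\beta}_0, \hat{\beta}) =
\arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n} \sum_{i=1}^{n}
  \Big[ y_i \,(\beta_0 + x_i^{\top}\beta)
        - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda \sum_{j=1}^{p} |\beta_j|
\right\}
```

Because the penalty is on the absolute values of the coefficients, larger values of λ shrink some coefficients exactly to zero, which is what makes LASSO usable as a variable screening step when predictors are correlated.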
Model construction and evaluation
In this study, we employed six machine learning classification algorithms to construct a binary prediction model: DT, RF, ANN, SVM, XGBoost, and KNN. First, we used the initial_split function from the rsample package in R to randomly divide the dataset into a training set (70%) and a testing set (30%). A random seed of 4321 was set prior to the split to ensure reproducibility. Stratified sampling based on the target variable was applied using the strata parameter to maintain class distribution consistency between the training and testing sets and to minimize the impact of class imbalance on model evaluation.
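The idea of a stratified 70/30 split can be illustrated with a short Python sketch. This is not the rsample implementation used in the study: the function name is hypothetical, Python's random number generator produces a different partition from R's even with the same seed, and the label vector is reconstructed only from the reported counts (227 patients with kinesiophobia out of 519).

```python
import random


def stratified_split(labels, train_fraction=0.70, seed=4321):
    """Split indices into train and test sets, sampling within each class.

    Sampling separately per class keeps the class proportions in the
    two sets close to those of the full dataset.
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Labels rebuilt from the reported totals: 227 positive, 292 negative.
labels = [1] * 227 + [0] * 292
train_idx, test_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), round(train_rate, 4), round(test_rate, 4))
```

The exact set sizes from this sketch can differ by a patient or two from those produced by rsample, which bins and rounds differently, but the positive rate in both sets stays close to the overall incidence.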
Second, the training set was used to train the models, with hyperparameter tuning conducted through 5-fold cross-validation and grid search [26]. During cross-validation, the training set was divided into five subsets; in each iteration, four subsets were used for model training and the remaining one for internal validation. This process was repeated five times, and the optimal set of hyperparameters was selected.
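The combination of 5-fold cross-validation and grid search can be sketched in plain Python. Everything named here is hypothetical: `k_fold_indices`, `grid_search_cv`, the toy scoring function, and the `mtry` grid are placeholders for illustration, and the study's actual tuning was done in R with real models and real validation scores.

```python
import random


def k_fold_indices(n_samples, k=5, seed=4321):
    """Yield (training, validation) index lists for each of k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        validation = folds[i]
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield training, validation


def grid_search_cv(n_samples, grid, score_fn, k=5):
    """Return the parameter set with the best mean validation score."""
    best_params, best_score = None, float("-inf")
    for params in grid:
        scores = [score_fn(params, tr, va)
                  for tr, va in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, training, validation):
    """Placeholder score that peaks at mtry = 3; a real score_fn would
    fit a model on `training` and evaluate it on `validation`."""
    return -abs(params["mtry"] - 3)


grid = [{"mtry": m} for m in (2, 3, 4)]
best_params, best_score = grid_search_cv(100, grid, toy_score)
print(best_params, best_score)
```

Each sample appears in exactly one validation fold, so every candidate hyperparameter set is scored on all training data without ever being evaluated on samples it was fitted to.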
After cross-validation, the final models were retrained using the optimized hyperparameters. Finally, the models were evaluated on the independent testing set to assess their generalization ability and predictive performance on unseen data.
Model explanation
SHAP (SHapley Additive exPlanations) is based on the Shapley value, a concept from cooperative game theory developed by the economist Lloyd Shapley, and applies it to the problem of model interpretability [27]. It quantifies the contribution of each feature to the model's predictions, enhancing the user's understanding of the model. In this study, SHAP was applied to identify the most influential features in predicting kinesiophobia in postoperative lung cancer patients, providing valuable insights for model development.
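The Shapley value underlying SHAP can be stated in its standard form. The notation is an assumption introduced here: F is the set of all features, S is a subset that excludes feature j, and f_S(x) denotes the model's expected prediction for patient x when only the features in S are known.

```latex
\phi_j(x) =
\sum_{S \subseteq F \setminus \{j\}}
\frac{|S|!\,\big(|F| - |S| - 1\big)!}{|F|!}
\Big[ f_{S \cup \{j\}}(x) - f_{S}(x) \Big],
\qquad
f(x) = \phi_0 + \sum_{j \in F} \phi_j(x)
```

The second identity is the additivity property: the prediction for a patient equals the baseline value φ_0 (the average prediction) plus the sum of the feature contributions, which is why a single patient's prediction can be decomposed feature by feature.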
Statistical analysis
IBM SPSS Statistics 27.0 and R 4.4.1 software were used for data analysis and visualization. Continuous variables that followed a normal distribution were presented as mean ± standard deviation (M ± SD), while skewed variables were presented as the median with interquartile range (M [P25, P75]). Group comparisons were conducted using t-tests or Wilcoxon rank-sum tests. Categorical variables were expressed as frequencies (n, %), and comparisons between groups were made using the Pearson chi-square test or Fisher's exact test (p < 0.05 was considered statistically significant).
Results
Baseline characteristics
A total of 519 patients were included in this study, of whom 227 had a TSK score greater than 37, giving a kinesiophobia incidence of 43.74%. The incidence of kinesiophobia was 43.64% in the training set and 43.94% in the test set. No statistically significant differences were found in patient characteristics between the training and test sets (p > 0.05) (Table 1).
Feature selection
We used the glmnet package in R to perform LASSO regression analysis for initial feature screening, with ten-fold cross-validation to determine the optimal value of the regularization parameter λ. The final value of λ = 0.01379032 (λ min) was chosen, as it minimizes the cross-validation error of the model and controls model complexity while maintaining predictive power. A total of 10 variables with non-zero regression coefficients were initially screened, indicating their importance in predicting kinesiophobia. These variables were gender, personal income, medical payment methods, range of surgery, surgical history, type of pathology, pain severity, social support, positive coping style, and self-efficacy (Fig. 3). Subsequently, we performed a multifactorial logistic regression analysis with these 10 variables as the independent variables and kinesiophobia as the dependent variable (0 = without kinesiophobia, 1 = with kinesiophobia). The results showed that gender, personal income, surgical history, pain severity, social support, and positive coping style were independent predictors of kinesiophobia (p < 0.05) (Table 2).
LASSO regression analysis was performed to identify predictor variables associated with the occurrence of kinesiophobia in postoperative lung cancer patients. (a) Changes in the LASSO coefficients for all variables, with different colored curves representing different variables. (b) Selection of the optimal parameter in the LASSO regression cross-validation model.
Model performance
The performance of the six models is summarized as follows:
As shown in Table 3, the accuracy of the models on the training set ranged from 0.823 to 0.939, precision from 0.715 to 0.894, recall from 0.828 to 0.987, F1 score from 0.830 to 0.934, and AUROC from 0.862 to 0.989 (Fig. 4a). The average results of five-fold cross-validation yielded accuracy between 0.807 and 0.823, precision between 0.854 and 0.910, recall between 0.750 and 0.818, F1 score between 0.813 and 0.833, and AUROC between 0.876 and 0.916 (Fig. 4c). According to Table 4, the performance of the models on the test set showed accuracy ranging from 0.783 to 0.809, precision from 0.684 to 0.760, recall from 0.826 to 1.000, F1 score from 0.792 to 0.817, and AUROC from 0.787 to 0.893 (Fig. 4b). The AUROC values in the test set were slightly lower than those from internal validation, with differences of less than 10%, suggesting good generalization without substantial overfitting. Among all models, the RF model achieved the highest AUROC (0.893) on the test set, indicating the strongest ability to distinguish between positive and negative classes. Additionally, it demonstrated stable performance during five-fold cross-validation (AUROC = 0.912), with balanced recall (0.870) and accuracy (0.803), suggesting a low risk of missed detection and robust generalization capability.
Calibration of the models
Figure 4d presents the calibration curves of each model, used to evaluate the agreement between the predicted probabilities and the actual event occurrence rates. The curves were generated by grouping samples into bins based on their predicted probabilities and computing the average predicted probability and corresponding observed event rate within each bin. The red dots in the graph represent the observed event rates within each predicted probability interval, providing insight into the model's calibration across different probability levels. Ideally, a well-calibrated model will produce predictions that lie along the 45-degree diagonal line, indicating strong alignment between predicted and actual probabilities. As shown in Fig. 4d, the calibration curve of the Random Forest (RF) model aligns most closely with the diagonal, suggesting minimal bias and superior calibration performance compared to the other models.
(a) Receiver operating characteristic (ROC) curves of the six models on the training set. (b) ROC curves of the six models on the test set. (c) Bar chart showing the performance metrics (accuracy, precision, recall, F1 score, and AUROC) of the six models in five-fold cross-validation. (d) Calibration curves of the six models on the test set, where the x-axis represents the midpoint of predicted probabilities and the y-axis represents the percentage of observed events. (e) DCA for the test set, with the x-axis representing threshold probability and the y-axis representing net benefit. The orange line (Treat All) represents the assumption that all patients have kinesiophobia, and the yellow line (Treat None) represents the assumption that no patients have kinesiophobia.
Clinical utility of the model
Decision Curve Analysis (DCA) offers a net benefit-centered evaluation that complements traditional performance metrics, supporting the identification of clinically useful predictive models. As shown in Fig. 4e, when the threshold probability ranges from approximately 24% to 70%, most models outperform the "treat none" and "treat all" strategies in terms of net benefit, indicating potential clinical value within this range. Among them, the RF model exhibits the highest and most stable net benefit across a relatively wide threshold range (approximately 25% to 75%), outperforming other models in most intervals. Given its overall performance across discrimination, calibration, and clinical utility, the RF model appears to be the most suitable predictive model for clinical application in this study.
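The net benefit plotted in a decision curve can be computed with a few lines of Python. This is a sketch of the standard formula, not the authors' code. The function names are hypothetical, and the counts used in the example are assumptions: 60 true positives and 22 false positives among 157 test patients (69 of them positive) are inferred from the reported test-set incidence and the RF confusion-matrix figures, and a single threshold of 0.5 is used purely for illustration.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a model at a given threshold probability."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    odds = threshold / (1 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of treating every patient as positive."""
    odds = threshold / (1 - threshold)
    return prevalence - (1 - prevalence) * odds


# Assumed counts inferred from the reported RF results (see lead-in).
nb_model = net_benefit(tp=60, fp=22, n=157, threshold=0.5)
nb_all = net_benefit_treat_all(prevalence=69 / 157, threshold=0.5)
nb_none = 0.0  # treating no one yields neither benefit nor harm

print(round(nb_model, 3), round(nb_all, 3), nb_none)
```

A full decision curve repeats this calculation over a grid of thresholds, recomputing the true and false positive counts at each one; a model is clinically useful at a threshold when its net benefit exceeds both the treat-all and treat-none values.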
Confusion matrix
As illustrated in Fig. 5, the RF model demonstrates a favorable trade-off between false negatives and false positives. Specifically, it yielded 9 false negatives and 22 false positives, indicating relatively strong performance in limiting missed cases (false negatives). At the same time, the number of false positives remained within a reasonable range, suggesting that the RF model maintains a well-balanced capacity for distinguishing between positive and negative cases.
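The link between the confusion matrix and the reported test-set metrics can be checked with a short calculation. The sketch below uses a hypothetical helper function. Only the 9 false negatives and 22 false positives come from the text; the 60 true positives and 66 true negatives are an inference, assuming a test set of 157 patients with 69 positives (consistent with the 30% split and the 43.94% test-set incidence).

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# fn and fp are reported in the text; tp and tn are inferred (assumption).
metrics = classification_metrics(tp=60, fp=22, fn=9, tn=66)
rounded = {name: round(value, 3) for name, value in metrics.items()}
print(rounded)
```

Under these inferred counts, the rounded values agree with the accuracy, precision, recall, and F1 score reported for the RF model on the test set, which supports the internal consistency of the reported results.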
Explanation of the model
To better understand the data and identify the characteristics that most strongly influence the target variable, we assessed variable importance using the best-performing RF model. As shown in Fig. 6a, b, positive coping style emerged as the most important predictor of kinesiophobia in postoperative lung cancer patients, followed by pain severity, social support, monthly personal income, surgical history, and gender. Figure 6c illustrates the model's prediction for a specific sample. For instance, a male patient with a history of surgery, a personal income between 2000 and 4000 RMB, a pain severity score of 5 (standardized score 0.44), a social support score of 61 (standardized score 0.94), and a positive coping style score of 27 (standardized score 0.75) had a predicted probability of 45% of developing kinesiophobia, interpreted relative to the average prediction for the full sample.
SHAP plot of variable interpretation of the random forest model. (a) Standardized importance of variables. (b) SHAP summary plot: the horizontal axis indicates the contribution of each feature to the predicted outcome (kinesiophobia) as a SHAP value; a positive SHAP value increases the predicted probability of kinesiophobia, and a negative SHAP value decreases it. Each point represents a sample, and its colour indicates the magnitude of the feature value (red: higher value; green: lower value). (c) The x-axis represents feature contribution values, while the y-axis indicates feature names.
Discussion
Evaluation of predictive models
The results of this study indicated that kinesiophobia is prevalent in post-surgical lung cancer patients, with a prevalence rate of 43.74%, closely aligning with the findings of Xinyuan Zhang's study on kinesiophobia [24]. A high prevalence of kinesiophobia suggests that post-surgical lung cancer patients may face significant psychological and behavioral barriers during recovery. This not only reduces patients' participation in rehabilitation exercises but may also delay lung function recovery and potentially affect long-term health outcomes. The occurrence of kinesiophobia is influenced by multiple factors, and traditional individualized nursing assessments may struggle to accurately identify patients at high risk.
In this study, we compared the performance of six mainstream machine learning models in predicting postoperative kinesiophobia. Model performance was comprehensively assessed using several metrics: accuracy, precision, recall, F1 score, and AUROC. The DT model demonstrated outstanding performance in recall, indicating a strong ability to identify positive cases. However, its low precision suggests a tendency for false positives and a lack of specificity. Additionally, its AUROC was the lowest among the six models, reflecting limited overall discriminative ability. The RF model performed well across multiple evaluation metrics, particularly in AUROC, where it achieved the highest score. This indicates a strong ability to discriminate between positive and negative cases. Furthermore, RF maintained high precision and recall, and its F1 score reflected a well-balanced prediction. Overall, the RF model emerged as the best-performing model in this study, exhibiting stable performance across all indicators and strong generalization ability. The XGBoost model excelled in recall, making it suitable for scenarios where missed detections are highly undesirable. However, its slightly lower precision suggests a higher rate of false positives, which could result in unnecessary interventions. The SVM model achieved perfect recall, maximizing the identification of positive cases. However, its low precision, reflecting a large number of false positives, could lead to overly cautious predictions. The KNN model excelled in precision, indicating more accurate predictions with fewer false positives. However, it had the lowest recall, suggesting a risk of missed detections. The ANN model had the highest accuracy and precision of all models, demonstrating strong overall predictive ability. However, its recall was slightly lower than that of SVM and XGBoost, meaning it may miss some positive cases. Its AUROC of 0.890, which is close to that of RF, indicates good discriminative capability.
Across multiple performance indicators, the RF model demonstrated excellent overall ability, ranking first in AUROC and achieving a well-balanced combination of precision and recall. It also showed high net clinical benefit in the decision curve analysis. Therefore, RF is considered the optimal model for predicting the risk of postoperative kinesiophobia in this study.
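Decision curve analysis compares strategies by net benefit, which weighs true positives against false positives at a chosen threshold probability. The following minimal Python sketch uses the standard net-benefit formula. The function names and the model counts are hypothetical, and this is not the study's analysis code. Only the prevalence of 43.74% is taken from this study.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit = TP/n - (FP/n) * pt / (1 - pt) for a threshold pt in (0, 1)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of intervening on every patient, regardless of any model."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical model result on 100 patients at a 30% threshold probability.
model_nb = net_benefit(tp=40, fp=10, n=100, threshold=0.30)

# Treat-all reference using the prevalence reported in this study (43.74%).
treat_all_nb = net_benefit_treat_all(prevalence=0.4374, threshold=0.30)

print(f"model: {model_nb:.3f}, treat all: {treat_all_nb:.3f}, treat none: 0.000")
```

At a given threshold, a model is clinically useful when its net benefit exceeds both the treat-all value and zero, which is the net benefit of treating no one.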
Factors affecting kinesiophobia
At present, the factors influencing the occurrence of kinesiophobia in postoperative lung cancer patients are complex and diverse. For this reason, in our previous study we systematically evaluated the literature and conducted two rounds of expert consultation using the Delphi method to identify potential predictors, which provided an important theoretical basis and research foundation for this study. Ultimately, nine features were used to construct the model, and SHAP analysis showed that positive coping style was the most important predictor of kinesiophobia in postoperative lung cancer patients. This is consistent with the findings of Min Xie29. Positive coping styles refer to a series of proactive and constructive strategies that an individual adopts in the face of stress or challenges, such as solving problems, regulating emotions, seeking support, and remaining optimistic30. The more patients adopt positive coping styles, the lower their risk of developing kinesiophobia. One possible reason is that a positive coping style helps to reduce the stimulation of the limbic system of the brain by negative emotions31, such as fear and anxiety, and helps patients face the difficulties of postoperative rehabilitation with a more positive attitude. Patients who adopt a positive coping style tend to view discomfort during rehabilitation as temporary and actively seek solutions. This mindset reduces the fear of potential discomfort during postoperative rehabilitation exercises, helps to improve adherence to rehabilitation, and avoids the false association that "exercise equals danger", thus alleviating the symptoms of kinesiophobia32. In addition, pain severity was strongly associated with the occurrence of kinesiophobia in postoperative lung cancer patients.
Firstly, pain, as a physiological stimulus, can significantly activate the body's sympathetic nervous system, leading to a range of physiological responses such as increased heart rate, increased blood pressure and muscle tension33. These stress responses instinctively prompt individuals to engage in avoidance behaviours to reduce potential harm or discomfort. During recovery from lung cancer surgery, intense pain often creates a strong fear of physical activity, as patients worry that exercise will exacerbate the pain or lead to injury. This physiological "avoidance reaction" may create a vicious circle through the neuroendocrine pathway, making the pain more intense and increasing the patient's fear of exercise34. Furthermore, pain has a profound effect on the patient's cognition and mood35. Severe pain is often accompanied by anxiety, depression, and feelings of helplessness, and these negative emotions can directly undermine a patient's confidence in his or her recovery. Negative perceptions such as "Exercise causes more pain" or "I can't handle the discomfort of exercise" may arise, leading to further avoidance of exercise. Such perceptions not only affect the patient's daily life, but may also increase the risk of postoperative kinesiophobia by affecting emotional stability. Therefore, pain management is not only about relieving physical discomfort, but is also an important factor affecting the patient's psychological state and behaviour. Interventions at both the physiological and psychological levels must be considered together in order to effectively reduce the incidence of postoperative kinesiophobia.
The other predictors are also closely related to kinesiophobia. First, the socioeconomic factor of personal income may influence patients' psychological state and health behaviors36. Patients with lower incomes may experience higher levels of life stress and lack sufficient financial resources for effective rehabilitation, which can lead to increased fear and exercise avoidance behaviors. Social support plays an important role in postoperative rehabilitation37. Support from family, friends, or healthcare professionals can reduce anxiety and fear, enhance self-efficacy, and encourage active participation in exercise and rehabilitation activities. Conversely, patients who lack social support may feel isolated, making them more prone to fear and avoidance behaviors, thus increasing the risk of postoperative kinesiophobia38,39. Gender, as both a biological and social factor, may also affect postoperative kinesiophobia. Research suggests that female patients are more likely to experience postoperative fear and anxiety than males, which is related to gender differences in physiology and psychology40. Surgical history may correlate with a patient's postoperative recovery experience, and patients who have experienced complex surgery may develop a greater sense of unease about subsequent motor recovery41. Although these factors are relatively minor, they may still influence the development of postoperative kinesiophobia to some extent and therefore need to be considered in a comprehensive assessment.
Therefore, clinical staff should pay particular attention to female postoperative lung cancer patients with lower personal income, a history of surgery, and severe pain. They should implement interventions that improve patients' self-efficacy and social support and that encourage patients to adopt a positive coping style toward postoperative rehabilitation exercise, so as to reduce the risk of kinesiophobia.
Conclusion
In conclusion, this study developed a predictive model based on machine learning algorithms, with the Random Forest model demonstrating superior performance. Additionally, we utilized SHAP to provide personalized risk assessments for the progression of kinesiophobia in post-surgical lung cancer patients. This efficient computer-aided approach has the potential to assist frontline clinical healthcare providers and patients in identifying and intervening in the occurrence of kinesiophobia.
Limitations
First, the sample size of this study is relatively small and limited to postoperative lung cancer patients in Jinzhou, China. The lack of a multicenter design may limit the generalizability of the model to other regions. Second, although the model demonstrated high consistency in both the training and testing datasets, some unavoidable errors may arise due to the inherent uncertainty in data splitting. Finally, due to the lack of external validation across different time periods and locations, the machine learning model has not been deployed and applied in real-world settings. Furthermore, the deployment process requires full integration with existing systems, while also considering data privacy, user interface design, and continuous monitoring, all of which were beyond the scope of this study. We hope to further improve the model in future research.
Data availability
The data used in this study involve the personal privacy of patients and are therefore not publicly available. Relevant data can be obtained from the first author (Chuang Li) upon reasonable request.
References
Sabarwal, A., Kumar, K. & Singh, R. P. Hazardous effects of chemical pesticides on human health–cancer and other associated disorders. Environ. Toxicol. Pharmacol. 63, 103–114 (2018).
Sung, H. et al. Global Cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. Cancer J. Clin. 71, 209–249 (2021).
Frissell, L. F. & Knox, L. C. Primary carcinoma of the lung. Am. J. Cancer. 30, 219–288 (1937).
Zhang, Y. et al. Global variations in lung cancer incidence by histological subtype in 2020: a population-based study. Lancet Oncol. 24, 1206–1218 (2023).
Montagne, F., Guisier, F., Venissac, N. & Baste, J. M. The role of surgery in lung cancer treatment: present indications and future perspectives–state of the art. Cancers 13, 3711 (2021).
Cai, H., Wang, Y., Qin, D., Cui, Y. & Zhang, H. Advanced surgical technologies for lung cancer treatment: current status and perspectives. Eng. Regeneration. 4, 55–67 (2023).
Bendixen, M., Jørgensen, O. D., Kronborg, C., Andersen, C. & Licht, P. B. Postoperative pain and quality of life after lobectomy via video-assisted thoracoscopic surgery or anterolateral thoracotomy for early stage lung cancer: A randomised controlled trial. Lancet Oncol. 17, 836–844 (2016).
Lee, M. et al. Comparative effectiveness of long-term maintenance beta-blocker therapy after acute myocardial infarction in stable, optimally treated patients undergoing percutaneous coronary intervention. J. Am. Heart Assoc. 12, e028976 (2023).
Lethem, J., Slade, P. D., Troup, J. D. & Bentley, G. Outline of a fear-avoidance model of exaggerated pain perception–I. Behav. Res. Ther. 21, 401–408 (1983).
Bal, D. & Çilingir, D. A new concept in nursing care after surgery: Kinesiophobia. J. Educ. Res. Nursing/Hemşirel. Eğitim. Araşt. Derg. 19, 108–112 (2022).
Ngiam, K. Y. & Khor, I. W. Big data and machine learning algorithms for health-care delivery. Lancet Oncol. 20, e262–e273 (2019).
Deng, L. et al. Evaluation of large Language models in breast cancer clinical scenarios: a comparative analysis based on ChatGPT-3.5, ChatGPT-4.0, and Claude2. Int. J. Surg. 110, 1941–1950 (2024).
Wang, Y. et al. Development and validation of machine learning models for predicting cancer-related fatigue in lymphoma survivors. Int. J. Med. Informatics. 192, 105630 (2024).
Wolf, A. M. D. et al. Screening for lung cancer: 2023 guideline update from the American Cancer society. Cancer J. Clin. 74, 50–81 (2024).
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD). Circulation 131, 211–219 (2015).
Miller, M. B., Roumanis, M. J., Kakinami, L. & Dover, G. C. Chronic pain patients' kinesiophobia and catastrophizing are associated with activity intensity at different times of the day. J. Pain Res. 13, 273–284 (2020).
Hu, W. Cultural adaptation of the simplified Chinese version of the TSK and FABQ scales and their application in degenerative low back pain: a study. China Natl. Knowl. Infrastructure (2012).
Luszczynska, A., Scholz, U. & Schwarzer, R. The general self-efficacy scale: multicultural validation studies. J. Psychol. 139, 439–457 (2005).
Wang, C., Hu, Z. & Liu, Y. Study on the reliability and validity of general self-efficacy scale. Appl. Psychol. 2001(01), 37–40 (2001).
Zigmond, A. S. & Snaith, R. P. The hospital anxiety and depression scale. Acta Psychiatry Scand. 67, 361–370 (1983).
Xiao, S. Theoretical basis and research application of social support rating scale. J. Clin. Psychiatry 1994(02), 98–100 (1994).
Xie, Y. A preliminary study on reliability and validity of simplified coping style scale. Chin. J. Clin. Psychol. 1998(02), 3–5 (1998).
Austin, P. C. & Steyerberg, E. W. Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models. Stat. Methods Med. Res. 26, 796–808 (2017).
Zhang, X., Zhang, X. & Chen, J. Relationship between pain-related patients' reported outcome and panic level after thoracoscopic lung cancer resection. J. Nurs. Sci. 37, 28–31 (2022).
Hasanin, T. & Khoshgoftaar, T. The effects of random undersampling with simulated class imbalance for big data. in IEEE International Conference on Information Reuse and Integration (IRI) 70–79 (2018). https://doi.org/10.1109/IRI.2018.00018
Jiang, X. & Xu, C. Deep learning and machine learning with grid search to predict later occurrence of breast cancer metastasis using clinical data. J. Clin. Med. 11, 5772 (2022).
Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. in Advances in Neural Information Processing Systems vol. 30 (Curran Associates, Inc., 2017).
Lei, T. et al. Establishment and validation of predictive model of Tophus in gout patients. JCM 12, 1755 (2023).
Xie, M., Yin, L., Guo, Y., Zhang, X. & Zhao, R. Current status and influencing factors of kinesiophobia in patients with peritoneal dialysis: A multicenter cross-sectional study. BMC Nephrol. 25, 404 (2024).
Wu, Y. et al. Psychological resilience and positive coping styles among Chinese undergraduate students: A cross-sectional study. BMC Psychol. 8, 79 (2020).
Rauch, A. V. et al. Cognitive coping style modulates neural responses to emotional faces in healthy humans: A 3-T FMRI study. Cereb. Cortex. 17, 2526–2535 (2007).
Aglio, L. S. et al. Surgical prehabilitation: strategies and psychological intervention to reduce postoperative pain and opioid use. Anesth. Analg. 134, 1106–1111 (2022).
Sebastião, R., Bento, A. & Brás, S. Analysis of physiological responses during pain induction. Sens. (Basel). 22, 9276 (2022).
Disney, L. 101 Solution-focused questions for help with depression. J. Evid. Inf. Soc. Work. 13, 576–577 (2016).
Simons, L. E., Elman, I. & Borsook, D. Psychological processing in chronic pain: A neural systems approach. Neurosci. Biobehavioral Reviews. 39, 61–78 (2014).
Yihunie, M. et al. Fear-avoidance beliefs for physical activity among chronic low back pain: A multicenter cross-sectional study. J. Pain Res. 16, 233–243 (2023).
Xinghua, W. The research on strategies and methods for postoperative recovery and functional reconstruction in surgery. MEDS Clin. Med. 4, 35–40 (2023).
Kapikiran, G. & Bulbuloglu, S. The effect of perceived social support on psychological resilience and surgical fear in surgical oncology patients. Psychol. Health Med. 29, 473–483 (2024).
Luo, Q., Liu, F., Jiang, Z. & Zhang, L. The chain mediating effect of spiritual well-being and anticipatory grief between benefit finding and meaning in life of patients with advanced lung cancer: Empirical research quantitative. Nurs. Open. 11, e2179 (2024).
Watanabe, Y. et al. Gender differences on preoperative psychologic factors affecting acute postoperative pain in patients with lumbar spinal disorders. J. Orthop. Sci. 29, 1174–1178 (2024).
Yu, Z., Xie, G., Qin, C., He, H. & Wei, Q. Effect of postoperative exercise training on physical function and quality of life of lung cancer patients with chronic obstructive pulmonary disease: A randomized controlled trial. Med. (Baltim). 103, e37285 (2024).
Soares, F. G. et al. Identification of demographic, clinical and psychological predictors in relation to kinesiophobia of patients in the post-operative musculoskeletal trauma. Braz. J. Phys. Ther. 28, 100728 (2024).
Acknowledgements
The authors are very grateful to all the cooperating medical staff and study participants at the First Affiliated Hospital of Jinzhou Medical University.
Funding
This research was supported by the 2024 Liaoning Provincial Science and Technology Joint Program (2024-MSLH-168). We sincerely appreciate the funding provided.
Author information
Contributions
Chuang Li: Writing – original draft, Visualization, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Youbei Lin, Xuyang Xiao, Xinru Guo, Jinrui Fei, Yanyan Lu, Junling Zhao: Investigation, Formal analysis. Lan Zhang: Writing – review and editing, Methodology, Resources, Project administration, Data curation, Conceptualization.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, C., Lin, Y., Xiao, X. et al. Development and validation of a risk prediction model for kinesiophobia in postoperative lung cancer patients: an interpretable machine learning algorithm study. Sci Rep 15, 19412 (2025). https://doi.org/10.1038/s41598-025-03575-7
DOI: https://doi.org/10.1038/s41598-025-03575-7