Classification of musculoskeletal pain using machine learning

Fouad, Dalia Mohamed; Mahfouz, Marwa Mahmoud; Mohamed, Mohammed Mostafa; Elzanaty, Mahmoud Yassin; Abd El-Hafeez, Tarek

doi:10.1038/s41598-025-12049-9

Article
Open access
Published: 25 July 2025

Scientific Reports volume 15, Article number: 27158 (2025)

Abstract

Musculoskeletal pain is a significant health concern affecting individuals across various demographics and professions, often leading to reduced productivity and impaired quality of life. This study proposes a framework leveraging Particle Swarm Optimization (PSO) to evaluate and assess musculoskeletal pain risk based on a comprehensive dataset encompassing demographic, professional, physical, and lifestyle characteristics. The dataset includes detailed information on individuals’ pain experiences across multiple body regions, providing a robust foundation for identifying correlations and risk factors. By integrating PSO with neural networks, this framework aims to enhance the detection of pain risk patterns, offering insights into the interplay between various factors and musculoskeletal health. The proposed framework involves data preprocessing, definition of neural network architecture, implementation of PSO, and performance evaluation. The dataset, containing 350 entries, was preprocessed to handle missing values, balance class distributions using SMOTE, and normalize features. A fully connected feedforward neural network with a single hidden layer was employed, with PSO optimizing the network’s weights and biases. Performance was evaluated using metrics including accuracy, precision, recall, F1-score, and AUC-ROC. The results demonstrate that the PSO-optimized neural network effectively identifies musculoskeletal pain risk, achieving strong performance across all evaluation metrics (accuracy 95.8–100%). Key determinants such as age, BMI, exercise frequency, and occupational factors were identified, providing valuable insights for targeted interventions. The framework’s performance compares favorably with conventional approaches, highlighting the potential of optimization techniques in musculoskeletal pain assessment and the development of preventive strategies.

Introduction

Musculoskeletal disorders (MSDs) represent one of the most significant global health challenges, with low back pain (LBP) alone affecting approximately 619 million people worldwide in 2020, projected to rise to 843 million cases by 2050¹. According to the 2020 Global Burden of Disease Study, LBP accounted for 69.0 million years lived with disability (YLDs), ranking as the leading cause of global disability. The overall burden of musculoskeletal conditions is even more staggering, affecting over 1.63 billion people and representing the second leading cause of non-fatal disabilities worldwide. These disorders encompass a wide spectrum of conditions, with prevalence rates varying significantly across populations. Among elderly individuals, musculoskeletal pain affects 65–85% of the population, with back pain specifically impacting 36–70%¹. In working-age adults, 60–80% will experience LBP during their lifetime, with prevalence rates in the United States ranging from 10 to 30% at any given time and lifetime prevalence reaching 65–80%²,³. The economic consequences are equally profound. Musculoskeletal conditions create substantial healthcare burdens, with LBP alone ranking sixth in overall disease burden globally. These disorders particularly impact occupational populations, showing elevated prevalence among healthcare workers (75%), office employees (45%), and manual laborers (62%) according to recent epidemiological studies. Understanding these factors and their interactions is critical for developing effective prevention, management, and intervention strategies⁴,⁵,⁶.

The growing burden of musculoskeletal pain has prompted extensive research into its causes, risk factors, and potential solutions. Traditional approaches to studying musculoskeletal pain often rely on linear statistical models or conventional machine learning techniques, which may not fully capture the intricate, non-linear relationships between variables. For instance, while age, body mass index (BMI), and occupational factors are commonly associated with musculoskeletal pain, their interactions with lifestyle habits such as exercise frequency, work hours, and ergonomic practices are less understood. Moreover, existing studies frequently focus on isolated pain regions or specific populations, neglecting the holistic view necessary for comprehensive pain management. This limitation underscores the need for advanced analytical frameworks that can effectively model the complex interplay of factors contributing to musculoskeletal pain⁷,⁸,⁹.

Musculoskeletal disorders (MSDs) refer to conditions that affect the body’s support system, including muscles, bones, joints, tendons, ligaments, nerves, and surrounding connective tissues. These disorders can cause pain, whether temporary or lifelong, alongside reduced mobility and diminished dexterity, which ultimately restricts functional abilities and participation in daily life¹⁰,¹¹. Figure 1 (obtained from a public website; no permission needed²¹,²²,²³) highlights the nine body areas assessed for work-related pain in the study. The evaluation included the neck, shoulders, upper back, elbows, wrists/hands, lower back, hips, knees, and ankles/feet, the regions most commonly affected by musculoskeletal disorders in academic professionals. These areas were systematically examined to identify pain patterns and their potential links to teaching activities and work habits.

In recent years, machine learning and optimization algorithms have emerged as powerful tools for analyzing complex datasets and making accurate evaluations²⁴,²⁵,²⁶. Among these, Particle Swarm Optimization (PSO)²⁷ has gained prominence for its ability to efficiently explore large solution spaces and optimize complex functions. Inspired by the social behavior of bird flocking or fish schooling, PSO is a population-based stochastic optimization technique that balances exploration and exploitation to find optimal solutions. Its application spans various domains, including engineering, finance, and healthcare, where it has been used to optimize neural networks, feature selection, and diagnostic modeling. However, the use of PSO in musculoskeletal pain research remains underexplored, particularly in the context of integrating it with neural networks for pain evaluation. This study proposes a framework that leverages PSO to optimize neural network training for musculoskeletal pain classification. The framework is designed to address the limitations of traditional approaches by capturing the complex, non-linear relationships between demographic, professional, physical, and lifestyle factors. The dataset used in this study is comprehensive, encompassing information on individuals’ age, sex, professional rank, work hours, physical attributes (e.g., weight, height, BMI), lifestyle habits (e.g., exercise frequency, extra work), and pain experiences across multiple body regions. By integrating PSO with neural networks, the framework aims to enhance accuracy and provide deeper insights into the determinants of musculoskeletal pain.
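
To make the population-based search described above concrete, the following is a minimal sketch of a basic global-best PSO in plain Python. It is an illustration only, not the implementation used in this study: the function name `pso_minimize`, the swarm size, the iteration count, the coefficients `w`, `c1`, `c2`, and the toy `sphere` objective are all assumptions chosen for the example. In the setting of this paper, the position vector would instead hold the network's weights and biases and the objective would be a training loss.

```python
import random


def pso_minimize(objective, dim, n_particles=30, n_iters=200,
                 bounds=(-5.0, 5.0), w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimise `objective` over `dim` real variables with a basic global-best PSO.

    All default values are illustrative assumptions, not settings from the study.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the bounds; velocities start at zero.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position (personal best).
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    # The swarm shares the best position found so far (global best).
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia keeps the particle moving (exploration); the two
                # attraction terms pull it toward known good points (exploitation).
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = pso_minimize(sphere, dim=3)
print(best_pos, best_val)
```

Because the search only needs objective values, not gradients, the same loop can in principle be pointed at any scalar fitness function, which is what makes it a candidate for tuning network parameters.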

The significance of this study lies in its potential to advance the understanding of musculoskeletal pain and improve diagnostic modeling in healthcare. By identifying key determinants of pain and optimizing classification models, the framework can support the development of targeted interventions and preventive strategies. For instance, healthcare professionals can use the insights gained from this study to design personalized exercise programs, ergonomic interventions, and workplace policies that reduce the risk of musculoskeletal pain. Policymakers can leverage the findings to promote public health initiatives aimed at improving musculoskeletal health across different populations. Additionally, individuals at risk of musculoskeletal pain can benefit from early detection and tailored recommendations based on their unique characteristics and lifestyle habits.

Problem statement

Musculoskeletal pain is a multifaceted condition influenced by a combination of demographic, professional, physical, and lifestyle factors. Traditional approaches to analyzing and predicting pain often rely on linear models or conventional machine learning techniques, which may not fully capture the complex, non-linear relationships inherent in the data. Additionally, existing studies frequently focus on isolated factors or specific pain regions, neglecting the holistic view necessary for comprehensive pain management. There is a need for an advanced, optimized framework that can effectively analyze the intricate relationships between diverse variables and predict pain occurrences with high accuracy. This study addresses this gap by proposing a PSO-based framework for musculoskeletal pain classification, aiming to enhance the understanding of pain determinants and improve classification performance.

Research question

How can Particle Swarm Optimization (PSO) be effectively integrated with neural networks to analyze and predict musculoskeletal pain based on a comprehensive dataset of demographic, professional, physical, and lifestyle characteristics?

Research gap

While machine learning approaches for musculoskeletal pain assessment have been extensively studied, the application of advanced optimization techniques in this domain remains relatively underexplored. Gradient-based optimization methods are indeed the standard approach for neural network training, offering well-established advantages in convergence and computational efficiency. However, alternative optimization strategies like Particle Swarm Optimization (PSO) may offer complementary benefits worth investigating, particularly for specific problem configurations or when dealing with certain types of local optima. The current literature on musculoskeletal pain prediction has primarily focused on conventional machine learning architectures with standard optimization approaches. Few studies have systematically examined how hybrid approaches combining neural networks with bio-inspired optimization techniques might perform in this specific application domain. Our work explores this less-traveled path not as a replacement for gradient-based methods, but as a potential alternative worth evaluating in the context of pain prediction, where the nature of medical data and the importance of robust feature selection may present unique opportunities. Our contribution lies not in claiming a methodological gap in optimization techniques generally, but rather in investigating whether PSO-enhanced approaches might offer specific advantages for musculoskeletal pain prediction tasks. This is particularly relevant given the complex, multidimensional nature of pain-related data, where traditional approaches sometimes struggle to capture nonlinear relationships between diverse risk factors. The empirical results we present should be viewed as an exploration of this specific application rather than as a general challenge to established optimization practices.

Contributions

1. Proposed framework: This study introduces a framework that integrates Particle Swarm Optimization (PSO) with neural networks for musculoskeletal pain classification. The framework is designed to optimize the training process, enhancing the model’s ability to capture complex relationships within the data.
2. Comprehensive dataset analysis: The study utilizes a detailed dataset that includes demographic, professional, physical, and lifestyle characteristics, providing a holistic view of factors influencing musculoskeletal pain. This comprehensive approach allows for a more accurate and nuanced analysis of pain determinants.
3. Optimized classification model: By employing PSO, the framework optimizes the weights and biases of the neural network, improving accuracy and robustness. This optimization process ensures that the model effectively balances exploration and exploitation, leading to strong performance.
4. Identification of key pain determinants: The framework identifies significant correlations and risk factors for musculoskeletal pain across various body regions, offering valuable insights for targeted interventions and preventive measures.
5. Performance evaluation: The study conducts a thorough evaluation of the proposed framework using multiple performance metrics, including accuracy, precision, recall, F1-score, and AUC-ROC. This comprehensive assessment demonstrates the framework’s effectiveness in predicting musculoskeletal pain.
6. Practical implications: The findings of this study have practical implications for healthcare professionals, policymakers, and individuals at risk of musculoskeletal pain. By identifying key determinants and optimizing classification models, the framework supports the development of tailored interventions and preventive strategies, ultimately improving musculoskeletal health outcomes.
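
The evaluation metrics named in the list above can be computed directly from labels and scores. The sketch below is a plain-Python illustration under stated assumptions: the helper names `binary_metrics` and `roc_auc`, the 0.5 decision threshold, and the eight made-up labels and scores are hypothetical and are not taken from the study's data or code.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = pain reported, 0 = no pain reported.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a dataset of a few hundred entries but would be replaced by a rank-based computation for larger ones.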

Related work

The application of artificial intelligence (AI) and machine learning (ML) techniques to predict, detect, and classify musculoskeletal disorders (MSDs) and low back pain (LBP) has become an active area of research. Various studies have explored different models, sensor types, and datasets to achieve progress in posture classification, pain assessment, and risk identification.

Several studies have concentrated on analyzing posture and movement using wearable sensor technology. For instance, Zemp et al.²⁸ employed force and acceleration sensors to collect sitting posture data, reporting that a Random Forest algorithm achieved a mean accuracy of 90.9%. Conforti et al.²⁹ utilized wearable sensors for biomechanical data collection during lifting tasks, with a Support Vector Machine (SVM) reportedly achieving 99.4% accuracy in distinguishing correct from incorrect lifting postures. Donisi et al.³⁰ also used wearable inertial sensors for lifting task analysis, where tree-based algorithms reached accuracies exceeding 90% in binary risk classification. More recently, Rao³⁰ developed an active orthosis for individuals with impaired trunk control using EMG and IMU data, achieving classification accuracies between 87.0% and 95.44%. While these studies highlight the potential of sensor-based AI, it is crucial to note that exceptionally high performance figures are often reported from studies using private, non-benchmark datasets or specific evaluation conditions. Such results, while indicative of model capability within a constrained environment, require cautious interpretation regarding their broader real-world applicability and generalizability.

Other research avenues have involved leveraging survey data or clinical information. Sasikumar and Binoosh³¹ developed a predictive model using survey data from computer professionals to assess MSD risk, with Random Forest and Naive Bayes algorithms demonstrating the highest accuracy at 81.25%. Hanumegowda and Gnanasekaran³² analyzed survey data from airline baggage handlers, reporting that Decision Tree and Random Forest algorithms achieved 100% accuracy in predicting pain frequency. Such perfect scores, particularly with subjective survey data, warrant careful consideration of dataset characteristics, sample size, and the potential for overfitting, necessitating validation on independent datasets. In the domain of clinical text analysis, Vaid et al.³³ fine-tuned a LLaMA-7B model to parse and classify clinical notes related to musculoskeletal pain, achieving high accuracies (e.g., 0.94 for lower back pain, 0.98 for pain location), showcasing the promise of large language models in this area.

Specific applications targeting LBP and related conditions have also been prominent. Phan et al.³⁴ used a Bayesian Neural Network to analyze lifting techniques and pain self-efficacy in people with chronic LBP (CLBP), reporting 97.9% accuracy in predicting pain outcomes. Thiry et al.³⁵ employed IMU and sample entropy (SampEn) data to identify CLBP during bending and reaching tests, where Gaussian Naive Bayes achieved 79% accuracy. Abdel Hady and Abd El-Hafeez³⁶ analyzed trunk movement in 100 postpartum women to predict and classify LBP, reporting perfect classification accuracy (1.0) with CNN and Random Forest models. While these outcomes are promising for the specific cohorts studied, the achievement of perfect or near-perfect scores, particularly with smaller or homogeneous datasets, again underscores the importance of external validation to ascertain generalizability.

Broader reviews provide essential context and highlight methodological trends. Jha et al.³⁷ conducted a systematic review and meta-analysis of AI models for diagnosing temporomandibular disorders (TMDs), finding a pooled sensitivity of 0.91. More comprehensively, Gkikas and Tsiknakis³⁸ performed a systematic review on deep learning methods for automatic pain assessment. Their review discusses various models, methods, and data types (unimodal vs. multimodal, temporal exploitation) used in establishing deep learning-based pain assessment systems. They emphasize the importance of multimodal approaches, especially in clinical settings, and the benefits of incorporating temporal information. Crucially, they also highlight limitations of available pain databases for robust deep learning model development and validation, and advocate for robust evaluation protocols and interpretation methods to ensure objective and comprehensible results from AI systems in real-life scenarios.

Furthermore, the influence of demographic variables on pain is a critical consideration for developing equitable and accurate AI models. Gkikas et al.³⁹ specifically investigated automatic pain intensity estimation by combining features from electrocardiography (ECG) signals with demographic factors such as gender and age. Their work explored the correlation of these factors with pain manifestation and aimed to improve estimation accuracy by incorporating this information. Building upon this, Gkikas et al.⁴⁰ introduced a multi-task neural network for automatic pain estimation that utilizes ECG data along with age and gender information. They demonstrated that such an approach could reveal variations in pain perception among different demographic groups and showed advantages compared to other methods that do not consider these factors. These studies underscore the necessity of integrating demographic data, not merely as potential confounders but as informative features, to enhance the personalization and fairness of AI-driven pain assessment tools.

This collective body of research demonstrates the diverse strategies and data sources being employed in AI for musculoskeletal health. It highlights significant strides in predictive capabilities but also underscores ongoing challenges, particularly concerning the generalizability of models often trained on limited or private datasets, the critical interpretation of reported high-performance metrics, and the imperative to incorporate contextual factors like demographics for developing truly valuable real-world applications.

Materials

Study design and ethical considerations

The research employed a cross-sectional design to examine work-related musculoskeletal disorders among faculty members at universities in Al-Minia Governorate, Egypt. Conducted between June and December 2024, the study protocol received ethical approval from Deraya University’s Institutional Review Board (Approval No. DCSR-010-024-19). This investigation pursued two primary objectives: assessing current prevalence rates of musculoskeletal disorders among academic staff and developing classification models for pain assessment. The methodology incorporated both population-level epidemiological analysis and individualized risk classification through standardized data collection procedures.

Sample size determination and statistical power

The required sample size was calculated using the single proportion formula:

$$n = \frac{Z^{2} \times P \times (1 - P)}{d^{2}}$$

(1)

where Z represents the Z-score corresponding to a 95% confidence level, which is 1.96; P denotes the expected prevalence based on prior studies, set at 0.65; and d indicates the desired precision, chosen as 0.05. This calculation yielded a minimum sample size of 350 participants. The prevalence estimate of 65% was derived from comparable studies examining musculoskeletal disorders among academic professionals (Meaza et al., 2020). The selected precision of ±5% ensures sufficient statistical power to detect significant associations while maintaining practical feasibility for data collection. This sample size accounts for potential non-response or incomplete data while providing adequate representation across the five targeted academic disciplines.
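
As a check on the arithmetic, the single proportion formula can be evaluated with the values stated above (Z = 1.96, P = 0.65, d = 0.05). The sketch below is illustrative; the function name `sample_size_single_proportion` is hypothetical, and rounding up to the next whole participant is an assumption about how the minimum was obtained.

```python
import math


def sample_size_single_proportion(z, p, d):
    """Minimum n for estimating a proportion p with margin d at z-score z."""
    raw = z ** 2 * p * (1 - p) / d ** 2
    # Round up, since a fraction of a participant cannot be recruited.
    return math.ceil(raw)


n = sample_size_single_proportion(z=1.96, p=0.65, d=0.05)
print(n)  # 350 (the unrounded value is about 349.59)
```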

Participant selection criteria and recruitment

The research targeted faculty members across five academic disciplines: physiotherapy, pharmacy, dentistry, nursing, and medicine. Inclusion criteria mandated at least six months of teaching experience, with no restrictions on academic rank, gender, or upper age limit. Exclusion criteria were implemented to control confounding variables, including recent trauma or surgery (past six months), current pregnancy, pre-existing musculoskeletal/neurological conditions, physical disabilities, and faculty with less than six months of experience. These parameters ensured the study population represented typical cases of work-related musculoskeletal disorders.

Data collection and analytical methods

A comprehensive three-tiered data collection approach was implemented. The methodology included an online demographic survey capturing essential characteristics, administration of the validated Nordic Musculoskeletal Questionnaire (NMQ), and development of computational models. The NMQ assessed pain distribution across nine anatomical regions, symptom characteristics, and work impact. Machine learning algorithms analyzed the multidimensional dataset to identify risk patterns and develop classification models. Collected data was systematically stratified by age, professional experience, and working hours to enable detailed subgroup analysis while maintaining methodological rigor throughout the research process.

Figure 2 (obtained from a public website; no permission needed⁴¹) illustrates the nine key anatomical regions evaluated in the study for work-related musculoskeletal disorders (WRMDs) among faculty members. These areas (neck, shoulders, upper back, elbows, wrists/hands, low back, hips/thighs, knees, and ankles/feet) were systematically examined using the Nordic Musculoskeletal Questionnaire (NMQ) to identify pain prevalence, distribution patterns, and functional limitations. The selected regions represent common sites of musculoskeletal complaints in academic professionals, particularly those associated with prolonged sedentary work, repetitive movements, and ergonomic stressors. This comprehensive assessment framework enabled a detailed analysis of pain localization and its potential correlation with specific occupational activities and demographic factors.

Methodology

Dataset characteristics

The dataset contains information on individuals, focusing on their demographic characteristics, professional details, physical attributes, lifestyle habits, and musculoskeletal pain experiences. The data is structured to capture a wide range of variables that may influence or correlate with pain occurrences in different body regions. Below is a detailed description of the dataset’s variables:

1. Demographic information

The dataset includes several demographic and professional variables. Age refers to the individual’s age in years, while Sex denotes gender, with 0 indicating female and 1 indicating male. Scientific Rank reflects the individual’s academic or professional level, categorized from 1 (Junior/Entry-level) to 5 (Leadership/Executive). Experience Duration in Years indicates the total number of years the person has worked in their field. Working Hours/Day represents the average number of hours worked daily, and Work Days/Week specifies the number of days worked per week. Finally, College identifies the individual’s institutional affiliation, coded as 1 for Physical Therapy, 2 for Dentistry, 3 for Medicine, 4 for Pharmacy, and 5 for Nursing.

2. Physical attributes

The dataset also includes physical health metrics. Weight in KG represents the individual’s body weight measured in kilograms, while Height in CM denotes their height in centimeters. BMI (Body Mass Index) is calculated using the standard formula: weight in kilograms divided by the square of height in meters (kg/m²), providing an indicator of body fatness.

3. Lifestyle habits

The dataset includes several key variables related to work and exercise habits. Extra Work is a binary variable indicating whether the individual engages in additional work beyond their primary job (1 for yes, 0 for no). Exercise is another binary variable showing whether the person exercises regularly (1 for yes, 0 for no). Additionally, Exercising Days/Week records the number of days per week the individual exercises, while Exercising Hours/Day measures the average hours spent exercising daily. These variables help analyze the relationship between work habits and physical activity.

4. Pain-related variables

The dataset contains structured measures of occupation-related musculoskeletal pain, assessed across multiple anatomical regions. For each body region, three distinct pain-related outcomes were evaluated:

   1. Pain presence: A binary indicator (1/0) of current work-associated pain in the specified anatomical region.
   2. Functional impairment: A binary variable (1/0) assessing whether the reported pain interfered with activities of daily living.
   3. Temporal recency: A binary measure (1/0) of pain occurrence within the previous 7-day period.

For instance, pain in each region was operationalized through three variables:

- Pain_current (dichotomous presence/absence).
- Pain_impairment (functional limitation).
- Pain_last7days (recent occurrence in last 7 days).

This standardized assessment framework was systematically applied across all evaluated anatomical regions to ensure consistent measurement of occupation-related musculoskeletal outcomes. The approach facilitates comparative analysis of pain prevalence, functional consequences, and temporal patterns across different body areas.

5. Additional notes

- The dataset contains 350 entries, each representing an individual.
- Missing or incomplete data points are represented as 0 or left blank, depending on the context.
- The dataset is suitable for analyzing correlations between demographic, lifestyle, and pain-related variables, as well as identifying potential risk factors for musculoskeletal pain. Possible uses include:
  - Identifying risk factors for musculoskeletal pain in specific body regions.
  - Analyzing the impact of lifestyle habits (e.g., exercise, work hours) on pain occurrences.
  - Exploring demographic trends in pain experiences across different age groups, genders, and professional ranks.

This dataset provides a comprehensive foundation for research into musculoskeletal health, particularly in relation to occupational and lifestyle factors.
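
The variable layout described above can be sketched in a few lines. This is an illustration under assumptions: the region identifiers, the column naming pattern, and the helper names `pain_columns` and `bmi` are hypothetical and do not reproduce the actual column names of the study's dataset. Only the BMI formula (kg/m²) and the "three binary outcomes per region across nine regions" structure come from the description.

```python
# Nine anatomical regions assessed with the NMQ (identifiers are assumed).
REGIONS = ("neck", "shoulders", "upper_back", "elbows", "wrists_hands",
           "lower_back", "hips", "knees", "ankles_feet")

# Three binary (1/0) outcomes recorded for each region.
OUTCOMES = ("current", "impairment", "last7days")


def pain_columns():
    """Return one hypothetical column name per (region, outcome) pair."""
    return [f"{region}_pain_{outcome}"
            for region in REGIONS for outcome in OUTCOMES]


def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by squared height in metres."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


columns = pain_columns()
print(len(columns))              # 9 regions x 3 outcomes = 27 columns
print(round(bmi(70, 175), 2))    # 22.86 for a made-up 70 kg, 175 cm person
```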

Figure 3 shows the correlation between the dataset features.

Statistical analysis

Tables 1, 2 and 3 present a detailed statistical analysis of the dataset used in this study. Table 1 includes descriptive statistics such as mean, standard deviation (SD), minimum, and maximum values for numerical variables, along with frequency distributions for categorical variables. Table 2 provides an analysis of pain occurrence frequencies and their impact on activities, as well as their occurrence in the last 7 days. Table 3 summarizes the distribution of participants across different colleges, showing frequency percentages along with the mean age, BMI, and experience duration for each group. These statistical analyses help to better understand the demographic, occupational, and health-related characteristics of the study population.
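
The kind of summary reported in these tables can be sketched with the Python standard library. The helper names and the toy values below are made up for illustration and are not drawn from the study's data; the use of the sample standard deviation (n − 1 denominator) is also an assumption, since the text does not state which variant was reported.

```python
import statistics
from collections import Counter


def describe_numeric(values):
    """Mean, sample SD, minimum and maximum of a numerical variable."""
    return {"mean": statistics.mean(values),
            "sd": statistics.stdev(values),  # sample SD (assumed)
            "min": min(values),
            "max": max(values)}


def describe_categorical(values):
    """Frequency and percentage for each category code."""
    counts = Counter(values)
    n = len(values)
    return {code: (count, 100 * count / n)
            for code, count in sorted(counts.items())}


ages = [30, 40, 50]        # made-up ages in years
sex = [0, 1, 1, 0, 1]      # 0 = female, 1 = male, as coded in the dataset

age_summary = describe_numeric(ages)
sex_summary = describe_categorical(sex)
print(age_summary)
print(sex_summary)
```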

Table 1 Statistical analysis of the dataset.
