Using Random Forest Models to Identify Correlates of a Diabetic Peripheral Neuropathy Diagnosis from Electronic Health Record Data
- PMID: 27252307
- DOI: 10.1093/pm/pnw096
Abstract
Objective: To identify variables correlated with a diagnosis of diabetic peripheral neuropathy (DPN) using random forest modeling applied to electronic health records.
Design: Retrospective analysis.
Setting: Humedica de-identified electronic health records database.
Subjects: Subjects ≥ 18 years old with type 2 diabetes between January 1, 2008 and September 30, 2013, with continuous data for 1 year pre- and postindex, were identified: those with DPN (n = 35,050) and those without DPN (n = 288,328).
Methods: Demographic, clinical, and health care resource utilization variables (e.g., inpatient and outpatient encounters, medications, and procedures) were input into a random forest model to identify the most important correlates of a DPN diagnosis. Random forest modeling is a computationally intensive, robust data mining technique that accommodates large sets of variables and identifies associated factors using an ensemble of classification trees. Accuracy of the model was evaluated using receiver operating characteristic (ROC) curves.
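The workflow described in the Methods (fit an ensemble of classification trees, then evaluate with ROC analysis) can be sketched as follows. This is a minimal illustration, assuming scikit-learn and a synthetic dataset; the sample size, feature count, number of trees, and train/test split are placeholders chosen for the example and are not taken from the study.

```python
# Illustrative sketch only: synthetic data stands in for the EHR variables.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced binary outcome (about 10% positives), an assumption for the demo.
X, y = make_classification(
    n_samples=4000, n_features=9, n_informative=5, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# An ensemble of classification trees; 200 trees is an arbitrary choice here.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Score the held-out set with predicted probabilities, then summarise with ROC AUC.
prob_positive = forest.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, prob_positive)
accuracy = forest.score(X_test, y_test)
print(f"AUC = {auc:.3f}, accuracy = {accuracy:.3f}")
```

With an outcome this imbalanced, accuracy alone can look high even for a weak classifier, which is one reason to report the area under the ROC curve alongside it.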
Results: The final random forest model consisted of the following variables (importance) associated with a DPN diagnosis: Charlson Comorbidity Index score (100%), age (37.1%), number of pre-index procedures and services (29.7%), number of pre-index outpatient prescriptions (24.2%), number of pre-index outpatient visits (18.3%), number of pre-index laboratory visits (16.9%), number of pre-index outpatient office visits (12.1%), number of inpatient prescriptions (5.9%), and number of pain-related medication prescriptions (4.4%). ROC analysis confirmed model performance, with an area under the curve of 0.824 and accuracy of 89.6% (95% confidence interval 89.4%, 89.8%).
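The Results report importance as a percentage of the top-ranked variable and give a 95% confidence interval for accuracy. The abstract does not state how either quantity was computed, so the sketch below makes two assumptions: importance is rescaled so that the largest raw value equals 100%, and the interval is a normal-approximation (Wald) interval for a proportion. The raw importance values, the variable names, and the evaluation sample size `n_eval` are hypothetical.

```python
import math


def relative_importance(raw):
    """Rescale raw importances so the top variable reads 100%."""
    top = max(raw.values())
    return {name: 100.0 * value / top for name, value in raw.items()}


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical raw importances, not values from the study.
raw = {"comorbidity_score": 0.50, "age": 0.25, "outpatient_visits": 0.10}
for name, pct in relative_importance(raw).items():
    print(f"{name}: {pct:.1f}%")

# Hypothetical evaluation sample size; the abstract does not report one.
n_eval = 90_000
low, high = wald_interval(0.896, n_eval)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

Relative scaling of this kind only ranks variables against one another; it does not indicate the direction or size of each variable's association with the diagnosis.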
Conclusions: Random forest modeling can determine the likelihood of a DPN diagnosis. Further validation of the random forest model may help facilitate earlier diagnosis and enhance management strategies.
Keywords: Diabetes; Diabetic Peripheral Neuropathy; Electronic Health Records; Health Care Resource Utilization; Random Forest Model.
© 2016 American Academy of Pain Medicine. All rights reserved.
Comment in
- Using Random Forest Methods to Identify Factors Associated with Diabetic Neuropathy: A Novel Approach. Pain Med. 2017 Jan 1;18(1):1-2. doi: 10.1093/pm/pnw311. PMID: 28199716.