Pain Med. 2017 Jan 1;18(1):107-115.
doi: 10.1093/pm/pnw096.

Using Random Forest Models to Identify Correlates of a Diabetic Peripheral Neuropathy Diagnosis from Electronic Health Record Data

Sarah DuBrava et al. Pain Med.

Abstract

Objective: To identify variables correlated with a diagnosis of diabetic peripheral neuropathy (DPN) using random forest modeling applied to electronic health records.

Design: Retrospective analysis.

Setting: Humedica de-identified electronic health records database.

Subjects: Subjects ≥ 18 years old with type 2 diabetes between January 1, 2008, and September 30, 2013, and with continuous data for 1 year before and after the index date were identified, comprising those with DPN (n = 35,050) and those without DPN (n = 288,328).

Methods: Demographic, clinical, and health care resource utilization variables (e.g., inpatient and outpatient encounters, medications, and procedures) were input into a random forest model to identify the most important correlates of a DPN diagnosis. Random forest modeling is a computationally intensive, robust data mining technique that accommodates large sets of variables to identify associated factors using an ensemble of classification trees. Accuracy of the model was evaluated using receiver operating characteristic (ROC) curves.
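To illustrate what "an ensemble of classification trees" means, the following is a minimal sketch of a random forest in plain Python, not the authors' implementation. It assumes depth-one trees (stumps), a misclassification-count split criterion rather than the Gini impurity most libraries use, and a toy data set with three hypothetical columns (comorbidity score, age, visit count) whose values are invented for illustration and are not taken from the study.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature_ids):
    """Depth-one tree: pick the (feature, threshold) with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not right:  # threshold at the maximum gives no split
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no usable split: always predict the majority class
        m = majority(labels)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    """Each tree sees a bootstrap sample of rows and a random subset of features."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # sample rows with replacement
        feats = rng.sample(range(len(rows[0])), n_features)  # random feature subset
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict_proba(forest, row):
    """Fraction of trees voting for class 1 (here, a DPN diagnosis)."""
    votes = sum((l if row[f] <= t else r) for f, t, l, r in forest)
    return votes / len(forest)

# Toy, invented data: (comorbidity score, age, pre-index visits); label 1 = DPN.
rows = [(1, 45, 2), (0, 38, 1), (2, 50, 3), (1, 41, 2),
        (5, 67, 9), (6, 72, 11), (4, 63, 8), (7, 70, 12)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_proba(forest, (6, 70, 10)), predict_proba(forest, (0, 35, 1)))
```

The averaged vote across trees is the score that an ROC curve would then be built from; a production analysis would use full-depth trees and a tested library rather than this sketch.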

Results: The final random forest model consisted of the following variables (importance) associated with a DPN diagnosis: Charlson Comorbidity Index score (100%), age (37.1%), number of pre-index procedures and services (29.7%), number of pre-index outpatient prescriptions (24.2%), number of pre-index outpatient visits (18.3%), number of pre-index laboratory visits (16.9%), number of pre-index outpatient office visits (12.1%), number of inpatient prescriptions (5.9%), and number of pain-related medication prescriptions (4.4%). ROC analysis confirmed model performance, with an area under the curve of 0.824 and accuracy of 89.6% (95% confidence interval 89.4%, 89.8%).
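Two of the reported quantities can be made concrete with a short sketch. The 100% assigned to the top variable suggests that importances were rescaled relative to the most important variable, although the abstract does not state this, so `relative_importance` below is an assumption about the presentation. The `auc` function computes the area under the ROC curve through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. Function names and inputs are hypothetical.

```python
def relative_importance(raw):
    """Rescale raw importance scores so the largest becomes 100 (assumed convention)."""
    top = max(raw.values())
    return {name: 100.0 * score / top for name, score in raw.items()}

def auc(scores, labels):
    """Area under the ROC curve as P(positive score > negative score); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented inputs, for illustration only.
print(relative_importance({"a": 0.5, "b": 0.25}))
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

Under this reading, an AUC of 0.824 means the model ranks a randomly chosen subject with DPN above a randomly chosen subject without DPN about 82% of the time.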

Conclusions: Random forest modeling can determine the likelihood of a DPN diagnosis. Further validation of the random forest model may help facilitate earlier diagnosis and enhance management strategies.

Keywords: Diabetes; Diabetic Peripheral Neuropathy; Electronic Health Records; Health Care Resource Utilization; Random Forest Model.
