Development of a machine learning model to interpret the results of laboratory diagnostics in order to identify suspected diseases

Gusev AV, Gavrilenko GG, Gavrilov DV.

INTRODUCTION

The interpretation of quantitative laboratory test results has a number of specific features and limitations. Medical decision support systems are discussed as a way to overcome these limitations, together with artificial intelligence technologies, in particular NLP for automatically extracting symptoms and other important information from electronic medical records. The extracted information is then interpreted by machine learning models designed to assess the likelihood that a patient has a particular disease.

PURPOSE OF THE STUDY

To study approaches to building data sets from laboratory parameters and associated diseases, using as an example the development of a machine learning model based on laboratory test data, age and gender.

MATERIAL AND METHODS

The database of electronic health records (EHR) of the Webiomed platform was used. A data set was formed with input information about patients who underwent laboratory diagnostics: demographic data (gender, age), laboratory data and the date of analysis. The output consisted of the final clinical diagnosis, the type of treatment (outpatient or inpatient) and the treatment outcome. To build a model for identifying suspected diseases, the following classification algorithms were used: LogisticRegression, GaussianNB, DecisionTree, RandomForest, xgboost, AdaBoost, LGBM, MLP. Accuracy was chosen as the performance metric. The original training data was normalized in several different ways. The training data set contained 201,613 records in total.
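The two generic steps named above, normalizing a numeric laboratory feature and scoring predictions by accuracy, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the source does not say which normalization methods were used, so min-max scaling is an assumption, and the function names, glucose values and diagnosis codes are hypothetical toy data.

```python
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale one numeric feature to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of records whose predicted diagnosis equals the recorded one."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("accuracy is undefined for an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy data: one laboratory parameter and diagnosis labels.
glucose = [4.0, 5.0, 6.0, 9.0]
scaled = min_max_scale(glucose)

y_true = ["E11", "I10", "E11", "D50"]  # recorded final diagnoses
y_pred = ["E11", "I10", "D50", "D50"]  # output of some classifier
acc = accuracy(y_true, y_pred)         # 3 of 4 correct
```

Accuracy treats every misclassification equally, so on data with many diagnosis classes of unequal frequency it is mostly driven by the common classes.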

RESULTS

Ensemble algorithms, decision trees and artificial neural networks showed the highest classification accuracy: LGBM, 58%; xgboost, 59%; DecisionTree, 59%; MLP (multilayer perceptron with 3 hidden layers of 147 neurons each), 61%; Random Forest, 69%. To avoid overfitting, cross-validation and regularization methods were used.
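Cross-validation, mentioned above as a guard against overfitting, estimates accuracy only on records the model did not see during fitting. The sketch below shows the idea with a plain k-fold split. The source does not state the number of folds or the splitting scheme, so k = 5 and the shuffling are assumptions, and a majority-class baseline stands in for the real classifiers. All names and labels are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs; each record is tested once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for m, fold in enumerate(folds) if m != i for j in fold], folds[i])
        for i in range(k)
    ]


def cross_validated_accuracy(labels: Sequence[str], k: int) -> float:
    """Mean held-out accuracy of a majority-class baseline (stand-in model)."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        # "Fit" on the training part only: pick its most frequent label.
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        # Score on the held-out part.
        scores.append(sum(labels[i] == majority for i in test) / len(test))
    return sum(scores) / len(scores)


# Hypothetical toy labels: 8 records of one diagnosis, 2 of another.
labels = ["I10"] * 8 + ["E11"] * 2
splits = k_fold_indices(len(labels), k=5)
mean_acc = cross_validated_accuracy(labels, k=5)
```

A large gap between accuracy on the training folds and accuracy on the held-out folds is the usual sign of overfitting, and regularization is one way to narrow it.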

CONCLUSIONS

The study showed that data sets built from features extracted from EHR, combined with machine learning, make it possible to create models that identify suspected diseases. Step-by-step analysis and preparation of the data sets, together with the use and tuning of various machine learning algorithms, can consistently increase the accuracy of the models.

Gusev AV, Gavrilenko GG, Gavrilov DV. Development of a machine learning model to interpret the results of laboratory diagnostics in order to identify suspected diseases. Laboratory Service. 2022;11(2):9‑17. (In Russ.). https://doi.org/10.17116/labs2022110219
