June 27, 2022

Development of a machine learning model to interpret the results of laboratory diagnostics in order to identify suspected diseases


Gusev AV, Gavrilenko GG, Gavrilov DV.


Interpreting the results of quantitative laboratory tests has a number of specific features and limitations. To overcome these limitations, the use of medical decision support systems is being discussed, along with artificial intelligence technologies: in particular, NLP technologies that automatically extract symptoms and other important information from electronic medical records, which is then interpreted by machine learning models designed to assess the likelihood that a patient has a particular disease.


To study approaches to building data sets from laboratory parameters and associated diseases, using as an example the development of a machine learning model based on laboratory test data, age and gender.


The database of electronic health records (EHR) of the Webiomed platform was used. A data set was formed containing input information about patients who underwent laboratory diagnostics: demographic data (gender, age), laboratory data and the date of analysis. The output consisted of the final clinical diagnosis, the type of treatment (outpatient or inpatient) and the treatment outcome. To create a model for identifying suspected diseases, the following classification algorithms were used: LogisticRegression, GaussianNB, DecisionTree, RandomForest, xgboost, AdaBoost, LGBM, MLP. Accuracy was chosen as the performance metric of the model. The original training data were processed in several ways for the purpose of normalization. The total number of records in the training data set was 201,613.
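The comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic (generated with scikit-learn's `make_classification`), the feature and class counts are arbitrary assumptions, and only the algorithms that ship with scikit-learn are included, so xgboost, LGBM and MLP are left out to keep the example self-contained. The helper name `compare_models` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data set (assumption): 10 numeric features
# playing the role of laboratory values, age and gender; 4 diagnosis classes.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=6,
    n_classes=4, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A subset of the classifiers named in the text, with near-default settings.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "GaussianNB": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}


def compare_models(models, X_train, y_train, X_test, y_test):
    """Fit each model and return its accuracy on the held-out split."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


scores = compare_models(models, X_train, y_train, X_test, y_test)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: {acc:.2f}")
```

Accuracy here is the share of held-out records whose predicted class matches the true class. With many diagnosis classes of unequal frequency it can look better than the model really is, so a real study would likely also report per-class metrics.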


Ensemble algorithms, decision trees and artificial neural networks achieved the highest classification accuracy: LGBM, 58%; xgboost, 59%; DecisionTree, 59%; MLP (multilayer perceptron with 3 hidden layers of 147 neurons each), 61%; RandomForest, 69%. To avoid overfitting of the models, cross-validation and regularization methods were used.
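As a rough illustration of cross-validation combined with regularization, the sketch below scores a multilayer perceptron with stratified 5-fold cross-validation. Several details are assumptions, not facts from the study: the architecture is read as three hidden layers of 147 neurons each, the regularization is scikit-learn's L2 penalty (`alpha`), the number of folds is 5, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the laboratory data set (assumption).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6,
    n_classes=3, random_state=0,
)

# Scaling sits inside the pipeline so it is refit on each training fold,
# which keeps information from the validation fold out of training.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(147, 147, 147),  # assumed reading of the source
        alpha=1e-3,                          # L2 penalty strength (assumed value)
        max_iter=300,
        random_state=0,
    ),
)

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = cross_val_score(mlp, X, y, cv=cv, scoring="accuracy")

print(f"mean accuracy: {fold_scores.mean():.2f} (std {fold_scores.std():.2f})")
```

A large spread between fold scores, or a training accuracy far above the cross-validated one, would be a sign of overfitting; raising `alpha` or shrinking the network are the usual first adjustments.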


The study showed that data sets built from features extracted from EHRs, combined with machine learning, make it possible to create models that identify suspected diseases. It also showed that step-by-step analysis and preparation of the data sets, together with trying and tuning various machine learning algorithms, can consistently increase the accuracy of the models.

Gusev AV, Gavrilenko GG, Gavrilov DV. Development of a machine learning model to interpret the results of laboratory diagnostics in order to identify suspected diseases. Laboratory Service. 2022;11(2):9‑17. (In Russ.). https://doi.org/10.17116/labs2022110219

