Use of artificial intelligence technologies in laboratory medicine, their effectiveness and application scenarios: a systematic review

Abstract

BACKGROUND: With the increasing volume of data, laboratory medicine requires automation and standardization of routine processes to reduce the workload on healthcare professionals and free up their time for more specialized tasks. Machine learning models and artificial neural networks support image recognition and the analysis of large data sets, which allows them to be integrated into laboratory workflows to handle routine tasks.

AIM: This study aimed to analyze global scientific publications on the application of artificial intelligence technologies in laboratory medicine, evaluate their potential to address current challenges, and identify barriers to their integration into laboratory workflows.

METHODS: A search for publications was conducted using PubMed, manufacturer websites offering ready-to-use laboratory solutions, and reference lists from other reviews. The Mendeley software was utilized for bibliographic data management. The search covered the time interval 2019–2024. Obtained data included bibliometric indicators, research areas, key methodological characteristics, diagnostic effectiveness values for artificial intelligence systems and healthcare professionals, the number and experience of involved healthcare professionals, and validated outcomes of artificial intelligence implementation. The study quality was assessed using a modified QUADAS-CAD checklist.

RESULTS: Twenty-three publications presenting studies at the pre-analytical (n = 1), analytical (n = 19), and post-analytical (n = 3) stages of laboratory analysis were included. Most studies focused on cytology and microbiology, accounting for 48% and 35% of the studies, respectively. Artificial intelligence demonstrated high effectiveness in solving tasks across all stages of the laboratory process. Moreover, its diagnostic accuracy was comparable to that of healthcare professionals, whereas its decision-making speed was higher. All studies demonstrated a risk of systematic bias, which was associated with unbalanced samples, a lack of external validation, and incomplete descriptions of datasets and analytical methods.

CONCLUSION: Artificial intelligence demonstrates high potential in diagnostic accuracy and processing speed, making it a promising tool for integration into laboratory practice and for automating routine processes. However, achieving this requires standardizing research methodologies for artificial intelligence to reduce the risk of systematic bias, establishing reference values for laboratories to ensure the reproducibility and generalizability of results, raising awareness among healthcare professionals and patients of how artificial intelligence works to overcome prejudices, and developing reliable mechanisms for protecting personal data when artificial intelligence is used.

Full Text

BACKGROUND

Laboratory medicine is a high-volume field with continuous test and data flow. Traditional laboratory diagnostic protocols are time-consuming and require the constant attention of medical professionals [1, 2]. Process automation is especially important in this area because it relieves medical professionals of routine procedures, enabling them to focus on more complex, specialized tasks [3].

Artificial intelligence (AI) technologies ranging from relatively simple machine learning to artificial neural networks have advanced rapidly over the past decade, demonstrating great potential for automating routine processes in laboratory medicine.

AIM

To analyze global scientific publications on applications of AI technologies in laboratory medicine, evaluate their potential to address current challenges, and identify barriers to their integration into laboratory workflows.

METHODS

Search Strategy

A search for publications was conducted using PubMed [4], websites of AI providers for laboratory medicine, and the snowballing technique (which involves searching for papers in the reference lists from publications).

PubMed search engine. The publication search spanned 2019 to June 1, 2024. The search query was as follows: ((((((((((((artificial intelligence) OR (deep learning)) OR (machine learning) OR (computer vision)) AND (clinical laboratory)) OR (laboratory medicine)) OR (clinical deployment)) AND (pathomorphology)) OR (digital pathology))) AND (computer-aided diagnosis)) OR (diagnostics)) AND (pathological image analysis)) NOT (radiomics).

The following filter options were selected:

  • for Text Availability, to select articles with available full text: Abstract, Full Text;
  • for Article Attribute, to select articles that contain references to associated clinical studies or datasets that confirm the reliability of the obtained results: Associated Data;
  • for Article Type, to select the most robust evidence: Clinical Trial, Randomized Controlled Trial.
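For readers who want to rerun a comparable search programmatically, the query and date range above can be expressed as an NCBI E-utilities `esearch` request. The following is a minimal sketch, not the procedure used in this review, which relied on the PubMed web interface. The helper name `build_esearch_url` and the `retmax` value are hypothetical, the start date of January 1, 2019 is an assumption (the text gives only the year), and the Text Availability, Article Attribute, and Article Type filters are not reproduced.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query string copied from the search strategy above.
QUERY = (
    "((((((((((((artificial intelligence) OR (deep learning)) "
    "OR (machine learning) OR (computer vision)) AND (clinical laboratory)) "
    "OR (laboratory medicine)) OR (clinical deployment)) AND (pathomorphology)) "
    "OR (digital pathology))) AND (computer-aided diagnosis)) OR (diagnostics)) "
    "AND (pathological image analysis)) NOT (radiomics)"
)


def build_esearch_url(term, mindate="2019/01/01", maxdate="2024/06/01", retmax=100):
    """Build (but do not send) an esearch request limited by publication date.

    The default mindate is an assumption; the review states only "2019".
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmode": "json",
        "retmax": retmax,
    }
    return f"{ESEARCH}?{urlencode(params)}"


url = build_esearch_url(QUERY)
print(url)
```

The sketch only constructs the URL, so it can be inspected before any request is sent. Hit counts obtained this way may differ from those reported here because the web-interface filters are not applied.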

Mendeley Reference Manager. The publication search spanned 2019–2024. The search used the keyword: artificial intelligence laboratory medicine.

We identified research papers demonstrating the laboratory use of AI technologies by exploring the websites of the manufacturers mentioned in the selected publications, with special focus on the sections containing publications about the use of their equipment:

  • Visiopharm¹
  • CyPath Lung²
  • EasyCell³
  • Copan⁴.

The publication search spanned 2023–2024. We searched for publications that confirm the laboratory implementation of AI technologies.

Reviews on the use of AI technologies in laboratory medicine from 2023 to 2024: 2 publications selected from 12 found [2, 5].

Inclusion Criteria

  • Publications with at least an English abstract
  • Articles published in peer-reviewed journals
  • Preprints
  • Articles published in conference proceedings.

Exclusion Criteria

  • Publications not related to laboratory medicine and computer vision
  • Publications that do not cover issues related to human medicine
  • Reviews
  • Conference abstracts.

The search strategy involved two steps.

  • First, we analyzed the titles and abstracts of all the papers found by search queries and selected studies that met our objectives.
  • Second, we analyzed the full texts of the selected papers and compiled a sample for the primary analysis.

One expert selected the publications, and two experts assessed the final list of included papers. The experts were researchers with more than 10 years of experience in medical informatics.

Our review includes publications demonstrating the application of AI technologies in three key phases of laboratory testing:

  • pre-analytical
  • analytical
  • post-analytical.

We discuss each phase separately in this systematic review because the objectives and methods of the phases differ.

Information Extraction and Paper Quality Assessment

The following characteristics were extracted from the full texts of the selected articles:

  • Bibliometric data such as the first author, title of publication, year of publication, Digital Object Identifier (DOI), journal and its impact factor, and country of study origin;
  • Field of research and key characteristics of the study (sample size, study design, external validation, laboratory techniques, and AI models used);
  • Diagnostic performance metrics of AI models (sensitivity, specificity, area under the curve [AUC], accuracy, and some other performance criteria that are widely used in laboratory medicine);
  • Comparison of diagnostic performance of AI models and medical professionals;
  • Number of medical professionals and their qualifications;
  • Evaluation of turnaround time for machine learning models and AI systems (including comparisons with medical professionals);
  • Evaluation of potential cost-effectiveness of AI implementation;
  • Validated results of AI implementation.
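Sensitivity, specificity, and accuracy in the list above can all be derived from a 2×2 confusion matrix. The sketch below is illustrative only: the function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from any of the reviewed studies. AUC is omitted because it requires continuous model scores rather than a single confusion matrix.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts.

    Assumes at least one positive and one negative reference case,
    so that no denominator is zero.
    """
    sensitivity = tp / (tp + fn)  # share of reference-positive cases detected
    specificity = tn / (tn + fp)  # share of reference-negative cases cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all cases classified correctly
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical counts for 100 positive and 100 negative specimens.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)
```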

We calculated the mean diagnostic performance metrics using the median and 95% confidence interval (CI) from all applicable studies. The quality of the selected publications was assessed using a Quality Assessment of Diagnostic Accuracy Studies Computer-Aided Detection (QUADAS-CAD) tool [6], which was developed for studies using AI.

One expert extracted the information and assessed the quality of the papers. Two experts with more than 10 years of experience in medical informatics evaluated the results.

RESULTS

Publication Search and Selection

The first step yielded 2036 publications:

  • PubMed: 551
  • Mendeley Reference Manager: 1335
  • Websites: 17
  • Reviews: 133.

The second step yielded 58 publications and excluded 1978. The primary analysis included 23 publications (Supplement 1). The systematic review excluded 35 publications (Supplement 2). The main reasons for exclusion were as follows:

  • Lack of access to full text
  • Lack of AI use
  • Technical development without medical data analysis.

Of the included papers, one focused on the pre-analytical phase, 19 focused on the analytical phase, and 3 focused on the post-analytical phase.
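The selection counts reported above can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated in this section; the variable names are chosen for illustration.

```python
# Records identified in the first step, by source.
identified = {
    "PubMed": 551,
    "Mendeley Reference Manager": 1335,
    "Websites": 17,
    "Reviews": 133,
}
total_identified = sum(identified.values())

# Second step: publications retained versus excluded.
retained_second_step = 58
excluded_second_step = 1978

# Final decision on the retained publications.
included = 23
excluded_from_review = 35

# Included publications by phase of laboratory testing.
by_phase = {"pre-analytical": 1, "analytical": 19, "post-analytical": 3}

# Each stage of the flow should add up to the stage before it.
assert total_identified == 2036
assert retained_second_step + excluded_second_step == total_identified
assert included + excluded_from_review == retained_second_step
assert sum(by_phase.values()) == included
print("Selection flow is internally consistent.")
```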

Supplement 3 describes key characteristics of the studies presented in the included publications.

Supplement 4 characterizes samples, machine learning models, and off-the-shelf AI solutions employed in the studies.

Pre-analytical Phase

One study focused on the pre-analytical phase of laboratory testing and explored the potential of AI technologies to identify mislabeled test tubes [7]. The performance of the models in quality control was compared with that of laboratory medical personnel. Such validation typically relies on delta-check methods, which compare a patient’s consecutive laboratory results over time to identify potential errors based on significant differences. This retrospective study lacked external validation and testing in real laboratory settings. Labeling errors were simulated at the 50% level. The authors developed, trained, and tested eight machine learning models in the R environment (Supplement 5). The comparison involved 50 medical professionals with varying levels of experience (Supplement 6). No significant correlation was found between the accuracy of quality control by medical professionals and their level of experience (p > 0.1). All eight models (accuracy: 0.865–0.921) were more accurate than medical personnel (0.778) at identifying mislabeling (Supplement 6). The neural network model was the most effective (0.921), whereas the simple decision tree model was the least effective (0.865).
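As a rough illustration of the delta-check principle described above, the sketch below flags a result when it differs from the same patient's previous result by more than a relative threshold. It is not the method of the reviewed study [7]; the function name `delta_check`, the 30% threshold, and the example values are all hypothetical.

```python
def delta_check(previous, current, max_relative_change=0.3):
    """Return True if the change between two consecutive results exceeds the limit.

    The relative-change rule and the default limit are illustrative assumptions;
    real delta-check limits are analyte-specific.
    """
    if previous == 0:
        # Relative change is undefined; flag any non-zero follow-up value.
        return current != 0
    return abs(current - previous) / abs(previous) > max_relative_change


# Hypothetical pairs of (previous, current) hemoglobin results in g/L.
pairs = [(140, 138), (140, 92), (125, 131)]
flags = [delta_check(prev, curr) for prev, curr in pairs]
print(flags)
```

A flagged pair does not prove mislabeling; it only marks a result for review, because a true clinical change can also produce a large delta.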

Analytical Phase

Most of the included publications (19 of 23) presented studies on the analytical phase of laboratory testing. These studies focused on cytology, microbiology, histopathology, parasitology, or their combinations.

The cytology tasks were distributed as follows:

  • Diagnosis of tumor: 2 studies (blood test; sputum culture) [8, 9];
  • Diagnosis of hematological disorders: 6 studies (blood test: 5; bone marrow smear: 1) [10–15].

The microbiology tasks were distributed as follows:

  • Assessment of Escherichia coli resistance to 19 types of antibiotics: 1 study (complete blood count and urinalysis) [16];
  • Detection of group A streptococci (Streptococcus): 1 study (oropharyngeal mucus culture using agar and blood agar) [17];
  • Classification of positive and negative urine cultures via colony counting without cell morphology: 1 study (blood and MacConkey agar) [18];
  • Identification of mycobacteria in human tissues using Ziehl–Neelsen staining: 2 studies [19, 20];
  • Diagnosis of vaginitis using vaginal smears: 2 studies [21, 22].

Two publications demonstrated a combined cytological and microbiological testing with urine microscopy to identify pathogens and diagnose urinary tract infections by detecting and counting sediment elements [23, 24].

Mathison et al. [25] detected intestinal protozoa in trichrome-stained stool specimens. The detection targets were as follows:

  • Giardia duodenalis, its cysts and trophozoites;
  • Intestinal amoebae (Entamoeba hartmanni, Entamoeba non-hartmanni, or large Entamoeba spp.) and their trophozoites;
  • Dientamoeba fragilis;
  • Blastocystis;
  • Chilomastix mesnili and its trophozoites;
  • Endolimax nana and its trophozoites;
  • Iodamoeba buetschlii and its trophozoites;
  • Erythrocytes.

Additionally, the model was trained to identify yeast as a negative class to prevent misclassification. For Entamoeba spp., C. mesnili, E. nana, and I. buetschlii, the model was trained to recognize only the active (trophozoite) stage. However, the model failed to identify their cysts because of an insufficient number of training samples and poor morphological expression with the staining technique used.

The 19 studies on the analytical phase of laboratory testing had the following designs:

  • Multicenter (used data from several laboratories): 8 (42%) [9, 12, 13, 15, 18, 21];
  • Single-center (used data from only one laboratory): 11 (58%) [7, 8, 10, 11, 14, 16, 17, 19, 22, 23, 25];
  • Retrospective: 17 (90%) [8–11, 13–21, 23–26];
  • Prospective: 1 (5%) [22];
  • Retrospective with prospective validation: 1 (5%) [12].

Two studies (10%) used external validation [12, 13].

Different parameters were used to describe the sample size in studies (Supplement 3):

  • Number of patients;
  • Number of specimens (smears, tests);
  • Number of images and image areas.

Sample size varied greatly across studies:

  • Number of patients: 103–8021;
  • Number of specimens: 167–212,554;
  • Number of images: 510–695,030;
  • Number of image areas: 260,000–7,000,000.

In studies with multiple sample sizes, the number of specimens always exceeded the number of patients, and the number of analyzed images always exceeded the number of specimens.

Of 19 papers, 8 provided information on patient age, with wide age ranges both within and across papers. Of the 19 papers, 10 included information about the sex ratio of the samples. Two studies [8, 12] had relatively balanced sex ratios. However, anemia was diagnosed more often in women [12]. Studies focusing on diagnosing urinary tract infections predominantly included women [16, 23] because these infections are more common in women. Some papers had imbalanced sex ratios, but no reasons were specified [9, 15, 20, 26]. One study specified the race and ethnicity of the patients [9]:

  • White patients with no malignant neoplasms: 110 (90.2%);
  • White patients with confirmed malignant neoplasm: 25 (89.3%);
  • Non-white patients with no malignant neoplasms: 12 (9.8%);
  • Non-white patients with confirmed malignant neoplasm: 3 (10.7%);
  • Hispanic or Latino with no malignant neoplasms: 15 (12.3%);
  • Hispanic or Latino with confirmed malignant neoplasm: 8 (28.6%);
  • Non-Hispanic or non-Latino with no malignant neoplasm: 104 (85.2%);
  • Non-Hispanic or non-Latino with confirmed malignant neoplasm: 18 (64.3%);
  • Patients with no malignant neoplasms and no data on race and ethnicity: 3 (2.5%);
  • Patients with confirmed malignant neoplasm and with no data on race and ethnicity: 2 (7.1%).

In 12 studies, the authors used their own models based on various machine learning algorithms. In 7 studies, the authors used off-the-shelf commercial models, with the following information on conflicts of interest provided:

  • No conflict of interest: 2 [11, 14];
  • No conflict-of-interest statement provided: 2 [8, 26];
  • Declaration of conflict of interest (when the study was sponsored by the equipment manufacturer, e.g., when the manufacturer provided equipment and materials, or when the authors were present or former employees of the equipment distributor): 3 [9, 17, 18].

Six studies compared models obtained using various algorithms [9, 12, 15, 16, 23, 24], and nine studies evaluated their diagnostic performance compared with that of medical professionals [8, 11, 12, 17, 18, 20–22, 25].

Diagnostic performance of artificial intelligence in the analytical phase

Supplement 5 presents data on the diagnostic performance of AI models.

The mean generalized performance metrics of the machine learning models were quite high:

  • Sensitivity: 0.923 (95% CI 0.921–0.924), n = 34;
  • Specificity: 0.940 (95% CI 0.939–0.942), n = 34;
  • AUC: 0.915 (95% CI 0.914–0.916), n = 14;
  • Accuracy: 0.929 (95% CI 0.928–0.930), n = 37.

Performance metrics can vary greatly depending on the application and area of laboratory medicine.

For example, machine learning models based on blood test results for diagnosing anemia [12] and tumors and anemia [10] demonstrated the following performance range (min and max):

  • sensitivity: 0.930–0.980
  • specificity: 0.920–1.000
  • AUC: 0.900–0.990.

Machine learning models using sputum smear data for cancer diagnosis [9] also demonstrated high-performance metrics (min and max):

  • sensitivity: 0.820–0.920
  • specificity: 0.770–0.880
  • AUC: 0.850–0.940.

The performance metrics (min and max) of machine learning models for tumor diagnosis based on bone marrow smears [13] were as follows:

  • sensitivity: 0.857–0.992
  • specificity: 0.917–0.933
  • AUC: 0.970–0.990
  • accuracy: 0.914–0.929.

The quality of blood cell identification and counting varied greatly depending on the type of cells being tested. Yoon et al. [11] provided metrics for the diagnostic accuracy of cell classification using a digital morphological analyzer, involving a hematologist at the final step. However, the article did not present a detailed algorithm for expert verification, nor did we find one in the description of the Vision Pro® digital morphological analyzer (West Medica, Austria) on the manufacturer’s website.⁵ The analyzer demonstrated high sensitivity for normal leukocytes and nucleated erythroid blood cells (0.801–0.980) and low sensitivity for blasts, myelocytes, and metamyelocytes (0.765, 0.480, and 0.505, respectively). However, the specificity was high for all cell types (0.981–1.000).

Elagina et al. [15] compared various machine learning models for blood cell identification. Note that classification models based on convolutional neural networks and support vector machines had the highest performance in terms of diagnostic accuracy. However, the support vector machine was overfitted and computationally expensive. The k-nearest neighbor classification model was less accurate than the convolutional neural network and support vector machine models.

Ayyıldız et al. [16] used machine learning methods to assess antibiotic resistance of E. coli. The model accuracy depended on the machine learning technique and the type of antibiotic (0.680–0.980). Additionally, AI technologies can accurately detect mycobacteria in human tissues (sensitivity: 0.957–0.987; specificity: 0.987–1.000; AUC: 0.980; accuracy: 0.983–0.988) [19, 20]. The performance metrics of machine learning models used to diagnose bacterial vaginitis depended on the application: sensitivity of 0.841–0.957 and specificity of 0.659–0.994 [21, 22].

A machine learning model effectively identified streptococci in agar cultures, achieving a sensitivity of 0.906 and a specificity of 0.940 [17]. The model for detecting bacterial colonies in urine cultures had a high sensitivity of 0.998 and a moderate specificity of 0.720 [18].

Burton et al. [23] demonstrated that AI can reduce the laboratory workload by decreasing the number of cultures required. Various machine learning models were used for urine microscopy (cell and bacterial count models) to determine the need for further culturing. The authors found that the extreme gradient boosting model was the most effective. For example, using this model instead of a standard automated microscopy heuristic model transforms one out of every 4 false positives into true negatives and one of every 11 false negatives into true positives. The authors suggest evaluating the results of testing in pregnant women and pediatric patients separately. Avci et al. [24] developed a convolutional neural network-based model for detecting different elements in urine sediment, achieving high accuracy rates of 0.962–0.986.

Wallace et al. [26] evaluated the potential for AI technologies to reduce false negative rates in detecting intestinal neoplasia. For this task, patients underwent two consecutive colonoscopies on the same day. Group 1 underwent AI-assisted colonoscopy first and then standard colonoscopy. By contrast, group 2 underwent a standard colonoscopy first, followed by an AI-assisted colonoscopy. The authors calculated the adenoma miss rate (AMR), defined as the number of histologically confirmed lesions detected during the second colonoscopy divided by the total number of lesions detected in both examinations performed on the same day. Additionally, the authors calculated the mean number of lesions detected at the second colonoscopy and the proportion of false-negative cases, which were defined as having no lesions at the first colonoscopy and at least one at the second. The AMRs in groups 1 and 2 were 0.155 (38 of 246) and 0.324 (80 of 247), respectively. The AMRs were lower in group 1 for lesions no larger than 5 mm (0.159 vs. 0.358) and for non-polypoid lesions (0.168 vs. 0.458). Additionally, AMRs were lower in group 1 in the proximal (0.183 vs. 0.325) and distal (0.108 vs. 0.321) colon. The mean number of adenomas detected at the second colonoscopy was lower in group 1 than in group 2 (0.330 ± 0.630 vs. 0.700 ± 0.970, p < 0.001). The false negative rates were 0.068 (3 of 44 patients) in group 1 and 0.296 (13 of 44) in group 2.
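The AMR definition above reduces to a single ratio. The sketch below recomputes the reported rates from the counts given in the text; the function name `miss_rate` is hypothetical, and the results agree with the published figures only to within rounding (for example, 38/246 is approximately 0.1545).

```python
def miss_rate(found_only_at_second_exam, total_found_in_both_exams):
    """Share of all detected lesions (or patients) that the first exam missed."""
    return found_only_at_second_exam / total_found_in_both_exams


# Lesion-level adenoma miss rate (AMR), counts as reported in the text.
amr_group1 = miss_rate(38, 246)  # AI-assisted colonoscopy first
amr_group2 = miss_rate(80, 247)  # standard colonoscopy first

# Patient-level false negative rate, counts as reported in the text.
fnr_group1 = miss_rate(3, 44)
fnr_group2 = miss_rate(13, 44)

print(f"AMR: {amr_group1:.4f} vs. {amr_group2:.4f}")
print(f"False negative rate: {fnr_group1:.4f} vs. {fnr_group2:.4f}")
```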

Comparison of the diagnostic performance of artificial intelligence and medical professionals in the analytical phase of laboratory testing

Supplement 6 provides a comparative analysis of the diagnostic performance of AI and medical professionals during the analytical phase of laboratory testing.

Studies comparing the diagnostic performance of AI with that of medical professionals showed that AI either outperforms them [12, 17, 20] or performs similarly [8, 11, 17, 18, 21, 22]. Two studies showed that AI spent less time on each patient than human experts did, suggesting a higher data processing rate [12, 20]. In some cases, such as when counting different types of blood cells, the results can vary greatly depending on the cell type. Machine learning models and medical professionals demonstrated high levels of agreement for some cell types and low levels for others [11]. Some cases are discussed in detail below.

In a clinical laboratory setting, a machine learning model that predicted low blood ferritin levels based on complete blood counts and C-reactive protein levels was more accurate than clinical pathologists. The model achieved a sensitivity of 0.930–0.980 and a specificity of 0.920, whereas clinical pathologists had a sensitivity of 0.830–0.880 and a specificity of 0.910–0.920. Additionally, AI can substantially reduce decision-making time: <1 second per patient compared with 19–20 seconds for pathologists and 13–16 seconds when using AI as an auxiliary tool.

The authors suggest that a machine learning model based on routine laboratory testing can accurately predict low ferritin levels in patients with anemia [12].

The agreement between AI-based histological analysis and the standard protocol was evaluated for the diagnosis and staging of malignant neoplasms using the tumor cell proliferative activity index (Ki-67). The intraclass correlation coefficient for Ki-67 was 0.960 (95% CI 0.940–0.980); for disease staging, the quadratic weighted κ was 0.860 (95% CI 0.810–0.910). The obtained results demonstrate a high level of agreement. The authors believe that using AI to determine Ki-67 is comparable to manual assessment in terms of diagnostic accuracy and is an effective, time-saving auxiliary tool for disease diagnosis [8].

Van et al. [17] evaluated the ability of PhenoMATRIX® (Copan Diagnostics, Inc., USA) with an AI-based chromogenic detection module to automatically identify Group A Streptococcus colonies grown on Colorex Strep A Agar® (CHROMagar, France). Data obtained using the software were compared with manual readings performed by experienced medical laboratory technicians who were trained to interpret chromogenic media. The AI-based software demonstrated higher sensitivity than the technicians, with comparable specificity. The authors suggest that implementing this tool to detect Group A Streptococcus colonies on chromogenic media could optimize workflow by reducing turnaround time. Note that in Russia, the interpretation of laboratory results is restricted to licensed physicians, which limits the practical impact of these findings.

Yoon et al. [11] calculated the difference between leukocyte counts determined by three different techniques:

  • Manual counting by two hematologists using a light microscope at 200x magnification;
  • Using a Vision Pro® digital morphological analyzer (West Medica, Austria);
  • Using this digital morphological analyzer with final expert reclassification (in accordance with the undisclosed methodology of the software developer).

Agreement among the three counting techniques varied widely across cell types.

Faron et al. [18] used the AI-based WASPLab® software (Copan, Italy) to automate the analysis of urine cultures using blood agar and MacConkey agar. Manual counting served as the reference test. The performance metrics for software-assisted urine culture colony counting were 0.998 for sensitivity and 0.720 for specificity. The authors recommend this tool for image interpretation because of its high sensitivity, making it suitable for use in laboratories for batch analysis of negative cultures to enhance workflow. The discrepancy between manual and automated counting was attributed to the presence of microcolonies. Standardizing colony detection thresholds is a key step in automating culture counting. The laboratories that submitted results for the study used different criteria to define positive and negative results, which complicated the application of AI technologies and reduced the performance of the tool.

Zurac et al. [20] proposed an automated technique that used deep neural networks to identify Mycobacterium tuberculosis in Ziehl–Neelsen-stained human tissue samples. This technique demonstrated superior diagnostic performance (sensitivity: 0.957; specificity: 1.000; accuracy: 0.983) compared with that of pathologists (sensitivity: 0.391–0.957; specificity: 0.756–0.946; accuracy: 0.833). The mean time spent by pathologists examining each specimen ranged from 5.48 to 17.06 minutes. Analysis of positive slides was faster than analysis of negative slides (both true and false negatives). The longest analysis time was reported for negative cases (true negative for seven medical professionals and false negative for one medical professional), and the shortest was reported for true positive cases. With AI assistance, the time pathologists took to examine a specimen ranged from 9 seconds to 2 minutes for positive slides (mean: 0.61 minutes). Therefore, the AI-based automatic identification technology used in this study saved pathologists at least a third of their time. Additionally, it decreases the likelihood of errors caused by fatigue or inattention.

Two studies compared the performance of AI technologies and medical professionals in diagnosing vaginitis. These studies showed that the diagnostic accuracy of AI was comparable to that of medical professionals [21, 22]. Wang et al. [21] found that the sensitivity was 0.914 for the convolutional neural network model and 0.943 for medical professionals (three laboratory technicians and two obstetrician-gynecologists).

The model demonstrated higher specificity (0.913 vs. 0.731) and accuracy (0.893 vs. 0.837) than the medical professionals. The authors highlighted that changes in image quality, such as color and brightness, affect the accuracy of convolutional neural network models. These studies suggested that automated microscopy could improve the quality of primary diagnoses of infectious and non-infectious vaginitis.

Mathison et al. [25] found high agreement between a convolutional neural network model and medical professionals in the detection and classification of intestinal protozoa in trichrome-stained stool samples. The levels of positive and negative agreement were 0.989 (95% CI 0.938–1.000) and 0.981 (95% CI 0.934–0.998), respectively.

The model demonstrated high reproducibility across slides containing multiple parasite classes, a single class, or no parasites. The authors believed that digital slide scanning with a validated convolutional neural network model effectively supplements the traditional technique of detecting intestinal protozoa.

Confirmed results of the implementation of artificial intelligence technologies in the analytical phase of laboratory testing

Kurstjens et al. [12] deployed the machine learning model they had developed in the laboratory for one month in test mode. The Python script was integrated into the in-house laboratory system. All test results from adult primary care patients with anemia were prospectively analyzed in October 2021. Ferritin levels were measured in all adults. After analyzing data from 391 patients over 21 days, the machine learning model identified 18 new cases of iron deficiency that had previously gone unnoticed. The authors suggested that AI technologies could be useful tools for medical professionals because they can quickly and accurately detect low blood iron levels. However, these technologies have some limitations. For example, this model was validated for use with a specific group of patients: adult patients diagnosed with anemia. Additionally, different laboratories use different reference values for blood ferritin levels.

None of the reviewed studies evaluated the cost-effectiveness of implementing AI technologies into laboratory practices.

Post-analytical Phase

Studies in the post-analytical phase of laboratory testing focused on quality control, which involved identifying errors and abnormal values in test results. Note that the use of AI technologies in the post-analytical phase was evaluated primarily in the following applications:

  • Big Data analysis;
  • Data simulation (e.g., artificial error injection);
  • Delta check, which refers to the comparison of sequential laboratory results for the same patient over time.

In all three studies, AI outperformed classical statistical methods of patient-based real-time quality control (PBRTQC).

Liang et al. [27] evaluated a new data stability protocol that combines delta data and machine learning techniques to improve the detection of quality control events. The authors compared delta-type and same-type data, both of which were processed using PBRTQC cutoffs based on a statistical method. Seven blood parameters were compared (Supplement 3). The number of patients affected by the bias from the time of injection until detection was also used as a clinical parameter, with a minimal value considered optimal. The study showed that the random forest model substantially outperformed standard PBRTQC statistical methods (Supplement 5).

Zhou et al. [28] compared various algorithms for the postanalytical quality control of total prostate-specific antigen testing. The study evaluated four traditional models of quality control (PBRTQC): Moving Average; Moving Median; Moving Standard Deviation; Moving Sum of Number of Patient Results. Additionally, the authors compared the performance rates of three machine learning models: Random Forest, Support Vector Machines, and Neural Networks. They also tested an information entropy fusion algorithm that combined all three machine learning models. Eight different error levels (0.01–0.20 μg/L) and six different block sizes were used in the simulation. All machine learning models, as well as their combinations, outperformed the standard PBRTQC algorithm in terms of diagnostic accuracy. The fusion model outperformed all three machine learning models. The random forest model showed a tendency to overfit. The support vector model had issues with multiple classifications, whereas the neural network model performed poorly in decision-making. The fusion model was more accurate than the following models:

  • support vector machine by 8.7%
  • random forest by 9.6%
  • neural network by 6.9%
  • standard PBRTQC by 20%.

Additionally, the performance of PBRTQC models depended on the level of injected error, whereas machine learning models demonstrated stable performance regardless of the error level.
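The dependence of a classical PBRTQC rule on the size of the injected error can be illustrated with a simplified, noise-free sketch of a moving-average rule. All names, the block size, the control limits, and the result values are hypothetical; they do not reproduce the simulation settings of Zhou et al. [28] or Liang et al. [27].

```python
from collections import deque


def first_alarm_index(results, block_size, lower, upper):
    """Index of the first result at which the moving average of the
    last `block_size` results leaves [lower, upper]; None if never."""
    window = deque(maxlen=block_size)
    for i, value in enumerate(results):
        window.append(value)
        if len(window) == block_size:
            mean = sum(window) / block_size
            if mean < lower or mean > upper:
                return i
    return None


def patients_until_detection(baseline, bias, n_before, n_after,
                             block_size, lower, upper):
    """Inject a constant bias after `n_before` results and count the
    patient results affected from injection until the alarm."""
    results = [baseline] * n_before + [baseline + bias] * n_after
    alarm = first_alarm_index(results, block_size, lower, upper)
    if alarm is None:
        return None  # bias not detected within the simulated run
    return alarm - n_before + 1


# Hypothetical, noise-free results with three injected bias levels.
for bias in (0.30, 0.15, 0.05):
    affected = patients_until_detection(
        baseline=1.0, bias=bias, n_before=30, n_after=30,
        block_size=10, lower=0.9, upper=1.1)
    print(bias, affected)
```

Under these assumptions, a larger bias is detected after fewer affected results, and the smallest bias never moves the average outside the control limits.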

Wang et al. [29] tested the performance of several machine learning models in verifying biochemical test data that included 52 parameters. The final generalized model showed pass and false negative rates of 89.60% and 0.095%, respectively. It reduced the number of invalid reports by approximately 80% compared with a standard algorithm, which increased efficiency and reduced the workload of biochemistry laboratory personnel. Two false negative reports involved patients at the extremes of the sample’s age range: 4 months and 92 years. Additionally, the pass rates of the standard laboratory algorithm varied during the workflow (50.20%–65.10%), whereas those of the machine learning model remained relatively stable (87.00%–94.00%).

Assessment of Study Methodology Quality

Supplement 7 presents the results of the methodological quality assessment for the reviewed studies using a modified QUADAS-CAD tool.

All the reviewed studies (n = 23; 100%) were susceptible to bias because of the methods employed (Fig. 1). Only a few studies had balanced samples in terms of pathology level (n = 3; 13.0%) and demographic characteristics (n = 2; 8.7%). Because sample balance is addressed by the signaling questions of the D1 domain (Patient Selection), all the analyzed studies had a high risk of bias in this domain. In some studies (n = 9; 39.1%), the methodology description did not provide sufficient information on whether the training and test sets overlapped, which was a key issue within the D2 (Index Test) domain. In some cases, a D2 domain question (“If an abnormality cutoff threshold was used, was it pre-specified?”) and D3 domain questions (“Can the reference standard correctly classify the target condition?” and “Were the results of the reference standards prepared or verified with the required level of expertise?”) were not applicable: some studies [11, 18, 19, 25] evaluated the performance of models to count various cell types without diagnosing diseases, whereas other studies employed data simulation [7, 27–29].

 

Fig. 1. Assessment of the risk of bias using the modified QUADAS-CAD questionnaire. QUADAS-CAD, Quality Assessment of Diagnostic Accuracy Studies-Computer-Aided Detection, a special, modified questionnaire for assessing the risk of bias and the applicability of studies in artificial intelligence technologies.

 

In most cases (n = 19; 82.6%), the reference standard classified the target conditions correctly. However, nine studies (39.1%) inadequately defined the level of expert training and the criteria used to assess reference standards. The D4 domain, which assesses the transparency of the obtained results, had the lowest risk of bias.

DISCUSSION

Applications of Artificial Intelligence Technologies

This systematic review revealed a wide range of applications of AI technologies in laboratory medicine. The pre-analytical, analytical, and post-analytical phases each require special attention and represent three broad categories with specific tasks and solutions. During the analytical phase of laboratory testing, AI technologies are primarily used to identify objects of different morphologies in images and perform quantitative analyses. This task arises across a wide range of applications:

  • Blood testing to diagnose various hematological diseases, such as leukemia and anemia;
  • Urine sediment testing to detect urinary tract infections;
  • Detection of various microorganisms, from bacteria to protozoa, in tissue samples, smears, and cultures;
  • Flow cytometric analysis of sputum samples used to detect malignant lung neoplasms;
  • Analysis of bone marrow biopsy results to detect hematological cancers.

In most of the reviewed studies, the authors used solutions that they had developed in Python or R. Convolutional neural network models were the most effective in terms of diagnostic accuracy, turnaround time, and ability to prevent overfitting. However, some studies analyzed blood samples, urine cultures, oropharyngeal mucus, and sputum samples via flow cytometry using off-the-shelf commercial solutions.

In the pre-analytical and post-analytical phases, the primary objective is to ensure the quality of the data [2, 30], which involves identifying mislabeled test tubes and searching for outliers or erroneous values in test results for each patient and the general tested sample. In this context, big data tools are required [31].

The uneven distribution of work across the laboratory testing phases, with a significant predominance at the analytical phase, reflects the structure of our search query. Further research is needed on each of these applications.

Diagnostic Performance of Artificial Intelligence and Its Implementation in the Laboratory Process

All the reviewed studies demonstrated that machine learning models have sufficient diagnostic performance for implementation in laboratory practice. The AI performance is comparable with that of highly qualified medical professionals and exceeds the performance of novice specialists. For example, for the analytical phase, the pooled estimates of AI diagnostic performance were as follows:

  • sensitivity: 0.923
  • specificity: 0.940
  • accuracy: 0.929.
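For reference, the three metrics listed above follow the standard definitions based on a confusion matrix. The sketch below uses hypothetical counts, not data from the reviewed studies, and the function name is illustrative.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int):
    """Compute sensitivity, specificity, and accuracy from the
    counts of a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # share of true cases detected
    specificity = tn / (tn + fp)   # share of non-cases correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 100 samples with the target condition, 200 without.
sens, spec, acc = diagnostic_metrics(tp=90, fp=20, tn=180, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```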

In the pre-analytical phase, the accuracy of machine learning models in detecting mislabeled tubes was 0.865–0.921. In the post-analytical phase, the performance metrics of the AI use in terms of data quality control (i.e., sensitivity, specificity, and accuracy) reached 0.990. When analyzing images and data, trained machine learning models substantially outperformed medical professionals in diagnostic turnaround time.

Introduction of Artificial Intelligence Technologies into Laboratory Processes and Existing Barriers

Although AI has performed well in experiments, cases of its practical implementation are rare, and the technology remains at the experimental stage. We found only one publication describing a one-month experimental implementation of AI technologies in laboratory practice. Several challenges hinder this process [32, 33].

  • The high estimates of AI performance were obtained in experimental studies whose design limits their validity. A significant and prevalent issue was sample imbalance when testing machine learning models [34]. Patients with target conditions predominated in the samples. This can be explained by the fact that the studies were conducted in specialized healthcare organizations and the researchers only had access to these samples. However, such imbalance increases the risk of bias within the study and reduces the real-world performance of the trained model. The same applies to samples that are imbalanced in demographic characteristics such as sex and age. The experimental sample typically included all available patients. However, studies have found that AI performance can vary greatly across different age groups and in pregnant women.
  • Most studies lacked external validation of machine learning model testing results, which prevented the generalization of experiment results. For example, image quality (e.g., color and brightness) was found to affect the AI performance.
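One way to see why a sample enriched with target conditions can overstate real-world usefulness is to compute the positive predictive value (PPV) at different prevalences using Bayes' rule. The sketch below is an illustration, not an analysis from the reviewed studies: it plugs in the pooled analytical-phase sensitivity (0.923) and specificity (0.940) reported in this review, assumes that these values do not change with prevalence, and uses hypothetical prevalence levels.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Probability that a positive result is a true positive,
    given test characteristics and condition prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical prevalences: an enriched study sample versus
# progressively rarer conditions in routine practice.
for prevalence in (0.50, 0.20, 0.05):
    ppv = positive_predictive_value(0.923, 0.940, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")
```

Under these assumptions, the PPV falls from roughly 0.94 at 50% prevalence to roughly 0.45 at 5% prevalence, even though sensitivity and specificity are unchanged.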

Some studies divided one image into several smaller ones to increase the sample size and then treated these fragments as independent cases within the overall sample. Before this procedure, a preliminary check is required to ensure that the analyzed areas of a single image are independent of each other. Without such a check, pseudo-replication can occur, leading to an overestimation of the model’s performance.
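A common safeguard against this kind of leakage, which is a different measure from the independence check mentioned above and is not described in the reviewed studies, is to split data by source image so that fragments of one image never appear in both the training and test sets. The sketch below assumes each fragment is a hypothetical (source_image_id, tile_id) pair.

```python
import random


def split_by_source_image(tiles, test_fraction=0.25, seed=0):
    """Split tiles into train and test sets so that all tiles cut
    from the same source image end up on the same side."""
    image_ids = sorted({image_id for image_id, _ in tiles})
    rng = random.Random(seed)
    rng.shuffle(image_ids)
    n_test = max(1, round(len(image_ids) * test_fraction))
    test_images = set(image_ids[:n_test])
    train = [t for t in tiles if t[0] not in test_images]
    test = [t for t in tiles if t[0] in test_images]
    return train, test


# Hypothetical data set: 8 source images, each cut into 4 tiles.
tiles = [(f"img{i}", j) for i in range(8) for j in range(4)]
train, test = split_by_source_image(tiles)
shared = {t[0] for t in train} & {t[0] for t in test}
print(len(train), len(test), len(shared))
```

Splitting at the image level (or, more conservatively, at the patient level) keeps correlated fragments from inflating test performance.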

Additionally, studies with conflicts of interest were identified because they were sponsored by the equipment manufacturer, either directly or indirectly. This increased the risk of bias when assessing AI performance. However, we did not find any studies indicating poor AI performance. This indirectly suggests publication bias, which occurs when negative results are unavailable to the scientific community.

In some cases, the overall complexity of medical tasks was also difficult to overcome. For example, AI systems may identify certain cell types in an image exceptionally well, yet recognize others with unacceptably low accuracy.

The structure of processes and the laboratory’s needs are often more complex than the experimental conditions. Additionally, even laboratories in the same specialty may have different working conditions (e.g., use of their own reference values). The need for AI technologies is driven by the desire for time and resource savings. However, there are factors that can offset the potential benefits of using them in real-world conditions:

  • The need for many annotated images;
  • The variability in sensitivity and specificity (e.g., high performance in detecting true positive cases but lower performance in detecting true negative cases);
  • The need to train healthcare professionals;
  • Mandatory human verification of classification results;
  • Difficulty interpreting the conclusions and variability of the results because the operation of the neural network is a black box for the user.

The time and financial costs of implementing AI technologies in laboratory practices remain uncertain.

Therefore, machine learning models have good potential as an auxiliary tool for healthcare professionals in the field of laboratory medicine. AI technologies can automate and standardize routine laboratory processes [35, 36] and solve a wide range of pathological tasks. More experimental research on AI applications is needed to address existing methodological issues [37] and evaluate the cost-benefit ratio of incorporating AI into laboratory practices.

Limitations of Systematic Review

Laboratory medicine encompasses a wide variety of tasks spanning many medical and technological areas. We used PubMed and Mendeley Reference Manager to search for and select publications on the most common topics. However, the wide range of tasks, the limited structure of the query, and the limited access to some studies prevented us from addressing all the existing issues, and some topics are not covered. These areas include the robotization of sample collection, such as venous blood collection [38], test referral, and prognosis. Because of the wide variety of tasks in the reviewed studies and the resulting inability to group models developed using different machine learning techniques, we calculated mean estimates of diagnostic performance metrics for individual models without conducting a meta-analysis, as recommended by the Cochrane Handbook [39].

In addition to medical, technical, and economic considerations, the implementation of AI technologies raises ethical concerns [40]. For example, both healthcare professionals and patients face various fears and psychological challenges caused by a lack of awareness. Another important issue when using AI technologies is to ensure the protection of patients’ personal data. Each of these questions requires its own research and solution.

Note that the practical application of machine learning models and AI-based systems requires obtaining AI-enabled medical device status. This status is granted centrally in the Russian Federation (marketing authorization from the Federal Service for Surveillance in Healthcare [Roszdravnadzor]) and the United States (Food and Drug Administration, FDA), or in a decentralized manner by accredited, authorized private bodies in European Union countries (CE marking) [41]. The reviewed publications, which included the experimental implementation of AI technologies in laboratory processes [12], did not provide information on the registration status of medical devices. This confirms that these technologies are in the early stages of being introduced to the relevant medical field.

CONCLUSION

The use of AI technologies is relevant at every stage of the laboratory process.

An analysis of the identified studies revealed their distribution across all three phases of laboratory testing: pre-analytical, analytical, and post-analytical, with most studies (82.6%) conducted for the analytical phase. The primary focus was on diagnosing hematological and oncological diseases. The review also included studies that aimed to identify pathogenic bacteria in tissue samples, urine, and smears. Additionally, there were studies focused on parasitology and histopathology (one study each). The papers on the pre-analytical and post-analytical phases aimed to develop effective quality control methods for laboratory reports using AI technology. The implementation of AI in laboratory medicine is still in its early stages, as evidenced by the prevalence of proprietary developments. Of the reviewed works, only 30.4% used off-the-shelf commercial solutions.

Machine learning models and AI-based systems have performance rates that are comparable with or higher than those of highly trained healthcare professionals. However, our evaluation of the methodological quality of the reviewed studies revealed a high risk of bias in all assessed domains, except transparency of results. The high risk of bias is caused by the imbalance of samples in terms of the presented conditions and demographic characteristics, potential pseudo-replication of data, and the lack of external validation, which together complicate generalization.

The overestimation of AI performance in the reviewed studies indirectly suggests that few attempts have been made to implement the developed models in routine practice. We identified one experiment that produced positive results when detecting new cases of iron deficiency.

AI technologies have the potential to greatly improve the quality and turnaround time of routine laboratory testing by facilitating automation and standardization, thereby allowing medical personnel more time to focus on more complex tasks. However, a comprehensive solution to issues related to the assessment of AI reliability, reproducibility, and practical application is necessary for the full implementation of AI into laboratory practice.

ADDITIONAL INFORMATION

Supplement 1: List of publications included in the systematic review and their characteristics. doi: 10.17816/DD635349-4334766

Supplement 2: List of publications excluded from the systematic review. doi: 10.17816/DD635349-4334769

Supplement 3: Key characteristics of the studies presented in the included publications. doi: 10.17816/DD635349-4334770

Supplement 4: Sample characteristics, machine learning models, or commercial off-the-shelf solutions presented in the studies. doi: 10.17816/DD635349-4334771

Supplement 5: Effectiveness of artificial intelligence in the studies. doi: 10.17816/DD635349-4334772

Supplement 6: Comparative analysis of diagnostic effectiveness of artificial intelligence and healthcare professionals. doi: 10.17816/DD635349-4334773

Supplement 7: Quality assessment of study methodologies using the modified QUADAS-CAD checklist. doi:

Author contributions: Yu.A. Vasilev, A.V. Vladzymyrskyy, A.S. Goldberg: conceptualization; O.G. Nanova, I.A. Blokhin, R.V. Reshetnikov: published data search and analysis, writing—original draft, writing—review & editing. All the authors approved the version of the manuscript to be published and agreed to be accountable for all aspects of the work, ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Ethics approval: Not applicable.

Funding sources: This article was prepared as part of the research project Scientific Justification of Radiology Modalities for Tumor Diseases Using Radiomics Analysis (Unified State Information Accounting System No. 123031500005-2), in accordance with Order No. 1258 dated December 22, 2023, On Approval of State Assignments Funded by the Budget of the City of Moscow for State Budgetary (Autonomous) Institutions Under the Jurisdiction of the Moscow City Health Department for 2024 and the Planned Period of 2025–2026, issued by the Moscow City Health Department.

Disclosure of interests: The authors have no relationships, activities, or interests for the last three years related to for-profit or not-for-profit third parties whose interests may be affected by the content of the article.

Statement of originality: No previously published material (text, images, or data) was used in this work.

Data availability statement: The editorial policy regarding data sharing does not apply to this work. All data generated during this study are available in the article and its supplementary material (Supplements 1–7).

Generative AI: No generative artificial intelligence technologies were used to prepare this article.

Provenance and peer review: This paper was submitted unsolicited and reviewed following the standard procedure. The peer review process involved two members of the editorial board and the in-house science editor.

 

1 Visiopharm [Internet]. Denmark: Visiopharm®. 2001–2024. Available at: https://visiopharm.com/ Accessed on October 12, 2024.

2 CyPath Lung [Internet]. San Antonio: CyPath® Lung. 2021–2024. Available at: https://www.cypathlung.com/ Accessed on October 12, 2024.

3 EasyCell [Internet]. Anyang-si: EasyCell Co., Ltd. 2020–2024. Available at: https://www.easycell.co/ Accessed on October 12, 2024.

4 Copan [Internet]. Murrieta: Copan Diagnostics Inc. 1999–2024. Available at: https://www.copanusa.com/ Accessed on October 12, 2024.

5 Digital microscopy and AI: clinical and research applications [Internet]. Perchtoldsdorf: West Medica. 2021–2024. Available at: https://wm-vision.com/en/product/hema Accessed on October 12, 2024.


About the authors

Yuriy A. Vasilev

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: npcmr@zdrav.mos.ru
ORCID iD: 0000-0002-5283-5961
SPIN-code: 4458-5608

MD, Cand. Sci. (Medicine)

Russian Federation, 24 Petrovka st, bldg 1, Moscow, 127051

Olga G. Nanova

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Author for correspondence.
Email: nanova@mail.ru
ORCID iD: 0000-0001-8886-3684
SPIN-code: 6135-4872

Cand. Sci. (Biology)

Russian Federation, 24 Petrovka st, bldg 1, Moscow, 127051

Anton V. Vladzymyrskyy

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: VladzimirskijAV@zdrav.mos.ru
ORCID iD: 0000-0002-2990-7736
SPIN-code: 3602-7120

MD, Dr. Sci. (Medicine)

Russian Federation, 24 Petrovka st, bldg 1, Moscow, 127051

Arcadiy S. Goldberg

The Russian Medical Academy of Continuous Professional Education

Email: goldarcadiy@gmail.com
ORCID iD: 0000-0002-2787-4731
SPIN-code: 8854-0469

MD, Cand. Sci. (Medicine)

Russian Federation, Moscow

Ivan A. Blokhin

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: BlokhinIA@zdrav.mos.ru
ORCID iD: 0000-0002-2681-9378
SPIN-code: 3306-1387

MD, Cand. Sci. (Medicine)

Russian Federation, 24 Petrovka st, bldg 1, Moscow, 127051

Roman V. Reshetnikov

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: ReshetnikovRV1@zdrav.mos.ru
ORCID iD: 0000-0002-9661-0254
SPIN-code: 8592-0558

Cand. Sci. (Physics and Mathematics)

Russian Federation, 24 Petrovka st, bldg 1, Moscow, 127051

References

  1. Bonert M, Zafar U, Maung R, et al. Pathologist workload, work distribution and significant absences or departures at a regional hospital laboratory. PLOS ONE. 2022;17(3):e0265905. doi: 10.1371/journal.pone.0265905 EDN: UFNVFE
  2. Hou H, Zhang R, Li J. Artificial intelligence in the clinical laboratory. Clinica Chimica Acta. 2024;559:119724. doi: 10.1016/j.cca.2024.119724 EDN: PBDERB
  3. Munari E, Scarpa A, Cima L, et al. Cutting-edge technology and automation in the pathology laboratory. Virchows Archiv. 2023;484(4):555–566. doi: 10.1007/s00428-023-03637-z EDN: OSGENI
  4. Vasilev YuA, Vladzymyrskyy AV, Omelyanskaya OV, et al. Guidelines for preparing a systematic review. Moscow: State Budget-Funded Health Care Institution of the City of Moscow “Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of Moscow Health Care Department”; 2023. 34 p. (In Russ.) EDN: XKXHDA
  5. Anjankar AP, Jha RK, Lambe S. Implementation of artificial intelligence in laboratory medicine. Journal of Datta Meghe Institute of Medical Sciences University. 2023;18(4):598–601. doi: 10.4103/jdmimsu.jdmimsu_486_22 EDN: VBNWUF
  6. Kodenko MR, Vasilev YuA, Vladzymyrskyy AV, et al. Diagnostic accuracy of AI for opportunistic screening of abdominal aortic aneurysm in CT: a systematic review and narrative synthesis. Diagnostics. 2022;12(12):3197. doi: 10.3390/diagnostics12123197 EDN: ERWYPX
  7. Farrell CJ. Identifying mislabelled samples: machine learning models exceed human performance. Annals of Clinical Biochemistry: International Journal of Laboratory Medicine. 2021;58(6):650–652. doi: 10.1177/00045632211032991 EDN: MQQLCW
  8. Lea D, Gudlaugsson EG, Skaland I, et al. Digital image analysis of the proliferation markers Ki67 and phosphohistone H3 in gastroenteropancreatic neuroendocrine neoplasms: accuracy of grading compared with routine manual hot spot evaluation of the Ki67 index. Applied Immunohistochemistry & Molecular Morphology. 2021;29(7):499–505. doi: 10.1097/pai.0000000000000934 EDN: XIKRGL
  9. Lemieux ME, Reveles XT, Rebeles J, et al. Detection of early-stage lung cancer in sputum using automated flow cytometry and machine learning. Respiratory Research. 2023;24(1):23. doi: 10.1186/s12931-023-02327-3 EDN: HSQBUA
  10. Kimura K, Tabe Y, Ai T, et al. A novel automated image analysis system using deep convolutional neural networks can assist to differentiate MDS and AA. Scientific Reports. 2019;9(1):1–9. doi: 10.1038/s41598-019-49942-z EDN: PXXHII
  11. Yoon S, Hur M, Park M, et al. Performance of digital morphology analyzer Vision Pro on white blood cell differentials. Clinical Chemistry and Laboratory Medicine (CCLM). 2021;59(6):1099–1106. doi: 10.1515/cclm-2020-1701 EDN: GVMONA
  12. Kurstjens S, de Bel T, van der Horst A, et al. Automated prediction of low ferritin concentrations using a machine learning algorithm. Clinical Chemistry and Laboratory Medicine (CCLM). 2022;60(12):1921–1928. doi: 10.1515/cclm-2021-1194 EDN: HDJWKG
  13. Wang M, Dong C, Gao Y, et al. A deep learning model for the automatic recognition of aplastic anemia, myelodysplastic syndromes, and acute myeloid leukemia based on bone marrow smear. Frontiers in Oncology. 2022;12:844978. doi: 10.3389/fonc.2022.844978 EDN: BQFWSO
  14. Kim H, Lee GH, Yoon S, et al. Performance of digital morphology analyzer Medica EasyCell assistant. Clinical Chemistry and Laboratory Medicine (CCLM). 2023;61(10):1858–1866. doi: 10.1515/cclm-2023-0100 EDN: ZDXONI
  15. Elagina EA, Margun AA. Research of machine learning methods in the problem of identification of blood cells. Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2021;21(6):903–911. doi: 10.17586/2226-1494-2021-21-6-903-911 EDN: ZVQLEV
  16. Ayyıldız H, Arslan Tuncer S. Is it possible to determine antibiotic resistance of E. coli by analyzing laboratory data with machine learning? Turkish Journal of Biochemistry. 2021;46(6):623–630. doi: 10.1515/tjb-2021-0040 EDN: JTZHYJ
  17. Van TT, Mata K, Bard JD. Automated detection of Streptococcus pyogenes pharyngitis by use of Colorex Strep A CHROMagar and WASPLab artificial intelligence chromogenic detection module software. Journal of Clinical Microbiology. 2019;57(11):e00811-19. doi: 10.1128/JCM.00811-19
  18. Faron ML, Buchan BW, Relich RF, et al. Evaluation of the WASPLab segregation software to automatically analyze urine cultures using routine blood and MacConkey agars. Journal of Clinical Microbiology. 2020;58(4):e01683-19. doi: 10.1128/jcm.01683-19 EDN: UDENAP
  19. Yang M, Nurzynska K, Walts AE, Gertych A. A CNN-based active learning framework to identify mycobacteria in digitized Ziehl–Neelsen stained human tissues. Computerized Medical Imaging and Graphics. 2020;84:101752. doi: 10.1016/j.compmedimag.2020.101752 EDN: AYLPVY
  20. Zurac S, Mogodici C, Poncu T, et al. A new artificial intelligence-based method for identifying Mycobacterium tuberculosis in Ziehl–Neelsen stain on tissue. Diagnostics. 2022;12(6):1484. doi: 10.3390/diagnostics12061484 EDN: IJUCYT
  21. Wang Z, Zhang L, Zhao M, et al. Deep neural networks offer morphologic classification and diagnosis of bacterial vaginosis. Journal of Clinical Microbiology. 2021;59(2):e02236-20. doi: 10.1128/JCM.02236-20 EDN: GBZITD
  22. Lev-Sagie A, Strauss D, Ben Chetrit A. Diagnostic performance of an automated microscopy and pH test for diagnosis of vaginitis. NPJ Digital Medicine. 2023;6(1):66. doi: 10.1038/s41746-023-00815-w EDN: SVUVPJ
  23. Burton RJ, Albur M, Eberl M, Cuff SM. Using artificial intelligence to reduce diagnostic workload without compromising detection of urinary tract infections. BMC Medical Informatics and Decision Making. 2019;19:171. doi: 10.1186/s12911-019-0878-9
  24. Avci D, Sert E, Dogantekin E, et al. A new super resolution Faster R-CNN model based detection and classification of urine sediments. Biocybernetics and Biomedical Engineering. 2023;43(1):58–68. doi: 10.1016/j.bbe.2022.12.001 EDN: HQRRRR
  25. Mathison BA, Kohan JL, Walker JF, et al. Detection of intestinal protozoa in trichrome-stained stool specimens by use of a deep convolutional neural network. Journal of Clinical Microbiology. 2020;58(6):e02053-19. doi: 10.1128/jcm.02053-19 EDN: GWHHRT
  26. Wallace MB, Sharma P, Bhandari P, et al. Impact of artificial intelligence on miss rate of colorectal neoplasia. Gastroenterology. 2022;163(1):295–304.e5. doi: 10.1053/j.gastro.2022.03.007 EDN: CVAOAF
  27. Liang Y, Wang Z, Huang D, et al. A study on quality control using delta data with machine learning technique. Heliyon. 2022;8(8):e09935. doi: 10.1016/j.heliyon.2022.e09935 EDN: XNSZKR
  28. Zhou R, Liang Y, Cheng H, et al. A multi-model fusion algorithm as a real-time quality control tool for small shift detection. Computers in Biology and Medicine. 2022;148:105866. doi: 10.1016/j.compbiomed.2022.105866 EDN: OBKKZC
  29. Wang H, Wang H, Zhang J, et al. Using machine learning to develop an autoverification system in a clinical biochemistry laboratory. Clinical Chemistry and Laboratory Medicine (CCLM). 2020;59(5):883–891. doi: 10.1515/cclm-2020-0716 EDN: SVNLZY
  30. Lippi G, Mattiuzzi C, Favaloro E. Artificial intelligence in the pre-analytical phase: state-of-the art and future perspectives. Journal of Medical Biochemistry. 2024;43(1):1–10. doi: 10.5937/jomb0-45936 EDN: PVAVYI
  31. Blatter TU, Witte H, Nakas CT, Leichtle AB. Big data in laboratory medicine-FAIR quality for AI? Diagnostics. 2022;12(8):1923. doi: 10.3390/diagnostics12081923 EDN: MCJCST
  32. Ghassemi M, Oakden-Rayner L, Beam AL. The false hope of current approaches to explainable artificial intelligence in health care. The Lancet Digital Health. 2021;3(11):e745–e750. doi: 10.1016/s2589-7500(21)00208-9 EDN: EHUNYG
  33. Paranjape K, Schinkel M, Hammer RD, et al. The value of artificial intelligence in laboratory medicine. American Journal of Clinical Pathology. 2020;155(6):823–831. doi: 10.1093/ajcp/aqaa170 EDN: KUADLL
  34. Ghosh K, Bellinger C, Corizzo R, et al. The class imbalance problem in deep learning. Machine Learning. 2022;113(7):4845–4901. doi: 10.1007/s10994-022-06268-8 EDN: AQXQUP
  35. Certificate of state registration of a computer program No. 2023665713/19.07.2023. Byul. No. 7. Vasilev YuA, Vladzymyrskyy AV, Omelyanskaya OV, et al. Web platform for technological and clinical monitoring of the results of digital medical image analysis algorithms. Available from: https://elibrary.ru/download/elibrary_54200632_17081735.PDF (In Russ.) EDN: JIEPJK
  36. Zinchenko VV, Arzamasov KM, Kremneva EI, et al. Technological defects in software based on artificial intelligence. Digital Diagnostics. 2023;4(4):593–604. doi: 10.17816/DD501759 EDN: ORUFMM
  37. Sharova DE, Garbuk SV, Vasilyev YuA. Artificial intelligence systems in clinical medicine: the world’s first series of national standards. Standards and Quality. 2023;(1):46–51. doi: 10.35400/0038-9692-2023-1-304-22 EDN: SNMGQA
  38. Laddi A, Goyal S, Savlania A. Vein segmentation and visualization of upper and lower extremities using convolution neural network. Biomedical Engineering. Biomedizinische Technik. 2024;69(5):455–464. doi: 10.1515/bmt-2023-0331 EDN: PRAAZI
  39. Macaskill P, Takwoingi Y, Deeks JJ, Gatsonis C. Chapter 9: Understanding meta-analysis. In: Deeks JJ, Bossuyt PM, Leeflang MM, Takwoingi Y, editors. Cochrane handbook for systematic reviews of diagnostic test accuracy. version 2.0 (updated July 2023). Cochrane; 2023 [cited 2024 Aug 17]. Available from: https://training.cochrane.org/handbook-diagnostic-test-accuracy/current
  40. Pennestrì F, Banfi G. Artificial intelligence in laboratory medicine: fundamental ethical issues and normative key-points. Clinical Chemistry and Laboratory Medicine (CCLM). 2022;60(12):1867–1874. doi: 10.1515/cclm-2022-0096 EDN: ZOALXU
  41. Muehlematter UJ, Daniore P, Vokinger KN. Approval of artificial intelligence and machine learning-based medical devices in the USA and Europe (2015–20): a comparative analysis. The Lancet Digital Health. 2021;3(3):e195–e203. doi: 10.1016/s2589-7500(20)30292-2 EDN: UWEZGN


Copyright (c) 2025 Eco-Vector

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Mass media outlet registered by the Federal Service for Supervision of Communications, Information Technology, and Mass Media (Roskomnadzor).
Registration number and date of the registration decision: series PI No. FS 77 - 79539, dated November 9, 2020.