Autonomous artificial intelligence for sorting results of preventive radiological examinations of chest organs: medical and economic efficiency

Abstract

BACKGROUND: This article proposes a model for organizing preventive radiological examinations of chest organs through autonomous sorting of examination results using medical devices based on artificial intelligence technologies, optimized for a maximum sensitivity of 1.0 (95% CI: 1.0; 1.0). Sorting involves classifying the results of mass preventive screenings (fluorography and chest X-rays) into two categories: “not normal” and “normal.” The “not normal” category includes all cases of abnormalities (e.g., pathological conditions, post-disease or post-surgery consequences, and age-related and congenital features), which are sent for interpretation by a radiologist. The “normal” category consists of cases without signs of pathological deviations, which potentially do not require a radiologist’s description.

AIM: To evaluate the feasibility, effectiveness, and efficiency of autonomous sorting of results from preventive radiological examinations of chest organs.

MATERIALS AND METHODS: A prospective multicenter diagnostic study was conducted on the safety and quality of autonomous sorting of results from preventive radiological examinations of chest organs. Analytical and statistical methods of scientific inquiry were used.

RESULTS: The study included results from 575,549 preventive radiological examinations obtained through fluorography and chest X-rays and processed using three medical devices based on artificial intelligence technologies. In autonomous sorting, 54.8% of the preventive radiological examinations of chest organs were classified as “normal,” resulting in a proportional reduction in the radiologist’s workload for interpreting and describing the examination results. Fully correct autonomous sorting was achieved in 99.95% of cases. Clinically significant discrepancies were determined in 0.05% of cases (95% CI: 0.04; 0.06%).

CONCLUSIONS: This study confirmed the medical and economic effectiveness of the model for autonomous sorting of results from preventive radiological examinations of chest organs using medical devices based on artificial intelligence technologies. The next phase should involve updating the regulatory framework and ensuring the legitimacy of the autonomous application of certain medical devices based on artificial intelligence technologies in established conditions and preventive tasks.

Full Text

BACKGROUND

Prevention is a strategic aspect of healthcare that preserves and improves population health, reduces mortality, increases life expectancy and quality of life, promotes economic growth, and ensures national security. Prevention is a comprehensive set of measures and procedures that includes population screening to identify risk factors and early signs of socially significant and other diseases [1–6]. At the first step of medical examinations and mass screenings, diagnostic imaging modalities are crucial, including chest fluorography, X-ray, and mammography. They account for up to 30% of all imaging tests in Russia [7–11].

Additionally, the following paradoxical combination of factors was reported:

  • High relevance and need for mass screening;
  • Heavy burden on the healthcare system because of the need to perform and constantly increase the number of screenings;
  • Highly routine nature of the work performed by healthcare personnel due to the maximum imaging standardization, with most screened patients being deemed healthy;
  • Limited availability and quality of preventive screenings due to personnel shortage.

Healthcare personnel shortage is a result of several factors, including the rapid growth of all types of medical care, such as preventive care; constant strengthening of requirements for the timing and quality of care; and increased social demand for healthcare services [7, 13]. Currently, the demand for certain types of specialists permanently outpaces the ability to increase their number. Therefore, new healthcare organization models that are primarily based on digital technologies and automated, evidence-based sorting will inevitably be adopted.

This is best illustrated by the frequent use of preventive radiological examinations. Radiologists routinely interpret millions of images with no abnormal changes [8–12]. Moreover, the situation is desperate: there are not enough radiologists to meet the growing demand for preventive and diagnostic imaging tests. This raises the risk of failing to implement national prevention programs. Thus, new, innovative, empirically supported approaches are needed.

Russia is conducting the world’s largest prospective multicenter study on the applicability, safety, and quality of artificial intelligence (AI) in a real-world setting: experiment on the use of innovative computer vision technologies for analysis of medical images in the Moscow healthcare system, known as the Moscow Experiment [14–18].

The Moscow Experiment and related studies have found that the sensitivity of AI services in real-world settings can be as high as 1.0 (95% confidence interval [CI]: 1.0–1.0) [19]. This indicates their ability to detect all cases of abnormal changes with high probability. If the AI service does not classify the images as abnormal, then they should be sorted as “normal.” In mass screening, normal images are considered to prevail. This diagnostic approach is expected to have a certain practical effect.

The Moscow Experiment and global experience in general have achieved a new level of technological readiness and scientific knowledge about AI. This progress enables the development of a new model for organizing preventive medical care, involving the autonomous sorting of results from preventive radiological examinations [20]. In 2023, the proposed model was tested under experimental laboratory conditions. The obtained results show the potential for autonomous sorting of mass preventive radiological examinations. Such sorting could increase the availability and effectiveness of preventive care, eliminate the shortage of diagnostic imaging personnel, and optimize the use of materials and technical resources [21]. There were grounds to conduct a large-scale, real-world study. Two hypotheses were proposed (Table 1).

 

Table 1. Hypotheses

| Null hypothesis (H0) | Alternative hypothesis (H1) |
|---|---|
| At least 50% of the results from preventive radiological examinations would be classified as normal through autonomous sorting, which would proportionally reduce the labor costs of radiologists. | Less than 50% of the results from preventive radiological examinations would be classified as normal through autonomous sorting, which would not proportionally reduce the labor costs of radiologists. |
| For certain age and sex groups, the number of clinically significant discrepancies with autonomous sorting would be 0. | For certain age and sex groups, the number of clinically significant discrepancies with autonomous sorting would be >0. |

Note. Normal images are images without abnormal changes.

 

AIM

The study aimed to evaluate the effectiveness and efficiency of the autonomous sorting of the results from preventive radiological examinations of the chest in real-world settings.

METHODS

Study Design

A prospective, multicenter, diagnostic study was conducted.

Study Duration

The study was conducted from May 1 to September 30, 2024.

Study Setting

Preventive radiological examinations of the chest were conducted in outpatient settings and during mass screening at healthcare institutions of the Moscow City Health Department.

The index test was an AI service integrated into the Unified Radiology Information Service (URIS), which is part of the Unified Medical Information and Analytical System of Moscow (EMIAS).

Reference test 1 used a protocol prepared by a radiologist from the Russian Medical Academy of Continuous Professional Education of the Ministry of Health of Russia.

Reference test 2 was a peer review by a thoracic radiologist at the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow City Health Department.

General procedure of the study

  1. A general practitioner at a city outpatient clinic referred a patient for preventive radiological examination of the chest.
  2. The patient made an appointment using EMIAS and other platforms.
  3. A radiographer at the city outpatient clinic performed the prescribed imaging test and saved the obtained images in the EMIAS URIS.
  4. Chest radiographs or fluorograms in the anteroposterior view were automatically sent to one of three included AI services.
  5. Images with abnormal changes (in the non-normal category) were automatically sent for interpretation to a radiologist at the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow City Health Department. The radiologist signed the protocol and report, which were saved in the patient’s electronic medical record in the EMIAS.
  6. Images were assigned to the “normal” category if the AI service found no abnormal changes. For these images:
     • AI-generated conclusions were saved as an entry in the patient’s electronic medical record in the EMIAS.
     • The images were automatically sent to a radiologist at the Russian Medical Academy of Continuous Professional Education of the Ministry of Health of Russia for interpretation. The radiologist signed the protocol and report, which were saved in the patient’s electronic medical record in the EMIAS.
  7. If the radiologist at the Russian Medical Academy of Continuous Professional Education of the Ministry of Health of Russia identified AI mistakes:
     • Data from the relevant imaging tests were entered into a logbook of abnormal changes and defects and reported to the experts at the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies.
     • A thoracic radiologist at the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow City Health Department reviewed the relevant images.
     • If a mistake was confirmed, the case was considered a discrepancy. If the mistake was not confirmed, the images were considered normal.
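The routing logic of the procedure above can be summarized in a short sketch. The class, field, and function names are illustrative assumptions and are not part of EMIAS or of any of the AI services; the sketch only mirrors the decision points described in the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exam:
    """One preventive chest examination (hypothetical record layout)."""
    ai_flags_abnormal: bool                            # index test: AI service output
    independent_flags_abnormal: Optional[bool] = None  # reference test 1
    thoracic_confirms: Optional[bool] = None           # reference test 2


def route(exam: Exam) -> str:
    """Return the outcome of one examination under the described procedure."""
    if exam.ai_flags_abnormal:
        # Non-normal images go to a radiologist for interpretation.
        return "radiologist interpretation"
    # The AI conclusion is saved and the image is read by an independent radiologist.
    if not exam.independent_flags_abnormal:
        return "normal"
    # A suspected AI mistake is peer reviewed by a thoracic radiologist.
    return "discrepancy" if exam.thoracic_confirms else "normal"


print(route(Exam(ai_flags_abnormal=False,
                 independent_flags_abnormal=True,
                 thoracic_confirms=True)))
```

Only the last branch produces a discrepancy, which matches the study's definition: a false-negative AI decision confirmed by both reference tests.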

Figure 1 shows the general procedure of the study and data flow.

 

Fig. 1. General study procedure. AI, artificial intelligence; RMACPE MoH of Russia, Russian Medical Academy of Continuous Professional Education of the Ministry of Health of Russia; RCRPCC D&TT MCHD, Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow City Health Department.

 

Eligibility Criteria

Inclusion criteria:

  • Age ≥18 years;
  • Referral for a preventive radiological examination of the chest by a local general practitioner at a city outpatient clinic;
  • Preventive radiological examination of the chest in an outpatient setting (chest X-ray or fluorography);
  • Signed voluntary informed consent form for medical care and imaging tests at this city clinic.

Exclusion criteria:

  • Absence of preventive radiological examination and/or autonomous sorting results in the EMIAS URIS;
  • Technical defects during preventive radiological examination.

Classification of Discrepancies

Accuracy of the AI-generated conclusions was evaluated using an X-ray quality assessment procedure approved by the Moscow City Health Department. The following criteria have been established for discrepancies between the expert and AI-generated conclusions [22, 23]:

Clinically significant discrepancies:

  • Atelectasis
  • Dissemination
  • Infiltrate or consolidation
  • Disrupted bone cortex
  • Shadow area
  • Pleural effusion
  • Pneumothorax
  • Cavity
  • Widened mediastinum

Clinically insignificant discrepancies:

  • Lung calcification or calcified shadow
  • Cardiomegaly
  • Consolidated fracture.

A standardized interpretation template was used to document radiological conclusions.

Artificial Intelligence Services

The experiment evaluated chest X-ray images and fluorograms using AI-assisted medical devices:

  • Automated Evaluation Program for Digital Chest X-Ray Images/Fluorograms according to TU 62.01.29-001-96876180-2019 (FtizisBioMed LLC; RZN 2022/17406)
  • Third Opinion. Fluorograms/X-Ray Images (Third Opinion Platform LLC; RZN 2021/14506)
  • Celsus (Medical Screening Systems LLC; RZN 2021/14449).

AI services were selected by their ratings in the maturity matrix of the Moscow Experiment, which were calculated using the area under the receiver operating characteristic curve and clinical assessment. The sensitivity of the included AI services was set to 1.0 (95% CI: 1.0; 1.0); specificity was not treated as a deciding parameter.

The safety and quality of AI-assisted medical devices were monitored according to the approved procedure [24].

The results were evaluated, considering the patient stratification by sex and age.

Statistical Analysis

Analytical methods were used for data analysis and synthesis.

The statistical methods included descriptive statistics. The chi-squared (χ²) test was used to test the hypotheses, and logistic regression was used to predict the probability of a clinically significant discrepancy. The dependent variable was the presence or absence of a discrepancy; the model factors were sex, age, and, in case of a nonlinear dependence on age, the square of age. The odds ratio (OR) for each risk factor was evaluated. The Clopper–Pearson interval was used to calculate the 95% CI limits. MedCalc 18® (MedCalc Software Ltd, Belgium) was employed to process and evaluate the data.
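For reference, the logistic model described above can be written out as follows. The coefficient symbols and the coding of sex as a 0/1 indicator are notational assumptions; the source does not specify them. Here D = 1 denotes a clinically significant discrepancy.

```latex
\operatorname{logit} P(D = 1)
  = \ln\frac{P(D = 1)}{1 - P(D = 1)}
  = \beta_0 + \beta_1\,\mathrm{sex} + \beta_2\,\mathrm{age} + \beta_3\,\mathrm{age}^2,
\qquad
\mathrm{OR}_j = e^{\beta_j}
```

Under this notation, the odds ratio for a factor is the exponent of its coefficient, so an OR above 1 corresponds to a positive coefficient.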

Ethics Approval

This study was based on the experiment on the use of innovative computer vision technologies for medical image analysis and further use in the Moscow healthcare system, approved by the Independent Ethics Committee of the Moscow Regional Branch of the Russian Society of Radiographers and Radiologists (protocol no. 2, dated February 20, 2020) and registered at ClinicalTrials.gov (NCT04489992).

RESULTS

In total, 642,681 images of preventive radiological examinations of the chest were sent to AI services for autonomous sorting. Based on the established criteria, the final analysis included 575,549 images, of which 60% (345,408) were fluorograms and 40% (230,141) were X-ray images (Figs. 2 and 3).

 

Fig. 2. Example of an artificial intelligence conclusion in the electronic medical record of the Unified Medical Information and Analytical System: an automatically generated conclusion about the absence of abnormal changes.

 

Fig. 3. Example of artificial intelligence-generated results in the Unified Radiology Information Service of the Unified Medical Information and Analytical System of Moscow (image, DICOM SR).

 

Assessment of Medical Efficiency

Table 2 shows the results of the autonomous sorting of data from preventive screening by modality.

 

Table 2. Results of autonomous sorting of preventive radiological examinations of the chest

| Modality | All included images, n (%) | Non-normal images, n (%) | Normal images, n (%) |
|---|---|---|---|
| Fluorography | 345,408 (60) | 149,373 (57.4) | 196,035 (62.1) |
| Radiography | 230,141 (40) | 110,685 (42.6) | 119,456 (37.9) |
| Total | 575,549 (100) | 260,058 (100) | 315,491 (100) |

 

Therefore, the study included:

  • 45.2% (260,058) of non-normal images
  • 54.8% (315,491) of normal images.

Most of the images obtained using these modalities were classified as normal.

Therefore, 54.8% of preventive radiological examinations of the chest potentially did not require a radiologist’s involvement for interpretation and reporting. The first study hypothesis was accepted (Table 1).
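The shares above and the direction of the first hypothesis can be checked with a few lines of arithmetic. The sketch below uses a one-sided z-test for a proportion purely as an illustration; it is a swapped-in technique, not the chi-squared procedure the authors ran in MedCalc, and the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

total = 575_549   # images included in the final analysis
normal = 315_491  # images sorted as "normal"

share_normal = normal / total

# One-sided z-test of H0: share >= 0.5 against H1: share < 0.5.
z = (share_normal - 0.5) / sqrt(0.5 * 0.5 / total)
# Probability of observing a share this low or lower if the true share were 0.5.
p_value = NormalDist().cdf(z)

print(f"normal share: {share_normal:.1%}")
print(f"z = {z:.1f}, one-sided p = {p_value:.3f}")
```

Because the observed share lies well above 50%, the one-sided p-value is close to 1 and the null hypothesis is not rejected, which agrees with the conclusion in the text.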

Images sorted as non-normal were automatically sent to the radiologists at the Moscow Reference Center of the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow City Health Department for interpretation. The current study did not focus on these images; thus, they are not discussed further (Fig. 1).

According to the study design, automated AI-generated conclusions were saved for images sorted as normal in the patients’ electronic medical records. The images sorted as normal were sent to federal-level radiologists at the Russian Medical Academy of Continuous Professional Education of the Ministry of Health of Russia, an independent expert organization, for interpretation to objectively evaluate the AI quality and ensure experiment safety (Fig. 1). The study sample included 315,491 images:

  • 37.0% (116,877) from male patients
  • 63.0% (198,614) from female patients.

Table 3 presents the demographic characteristics of patients with images sorted as normal. Notably, the largest age group among the screened patients was 18–44 years, i.e., young men and women (54.5% and 46.4%, respectively).

 

Table 3. Demographic characteristics of patients with images that were automatically sorted as normal

| Age group, years | Male, n (%) | Female, n (%) | All patients, n (%) |
|---|---|---|---|
| 18–44 | 63,679 (54.5) | 92,219 (46.4) | 155,898 (49.4) |
| 45–59 | 28,013 (24.0) | 53,066 (26.7) | 81,079 (25.7) |
| 60–74 | 20,979 (17.9) | 44,073 (22.2) | 65,052 (20.6) |
| 75–89 | 4,116 (3.5) | 9,117 (4.6) | 13,233 (4.2) |
| ≥90 | 90 (0.1) | 139 (0.1) | 229 (0.1) |
| Total | 116,877 (100) | 198,614 (100) | 315,491 (100) |

Note. Normal images are images without abnormal changes.

 

The medical effectiveness of the model for autonomous sorting of results from preventive radiological examinations of the chest using AI-assisted medical devices was evaluated.

In healthcare, medical effectiveness refers to the extent to which a desired medical outcome is achieved. At the level of a healthcare institution, it can be measured by a set of conventional indicators, such as the institution’s activities, morbidity, mortality, and outcomes. In our study, medical effectiveness was measured as the percentage of preventive screening images correctly sorted as normal [25, 26].

Of the images sorted as normal (n = 315,491), the radiologists at the Russian Medical Academy of Continuous Professional Education of the Ministry of Health of Russia identified 942 cases of abnormal changes. All of these images were sent to a thoracic radiologist for review. The review confirmed abnormal changes in 411 images, whereas the remaining 531 were attributed to the independent radiologist’s false-positive conclusions (Table 4).

 

Table 4. Discrepancies in screening image sorting as normal

| Abnormal changes | Number of cases, n (%) |
|---|---|
| Clinically significant discrepancies | |
| Infiltration or consolidation | 188 (64.8) |
| Shadow area | 78 (26.9) |
| Widened mediastinum | 9 (3.1) |
| Pleural effusion | 5 (1.7) |
| Disrupted bone cortex | 3 (1.0) |
| Atelectasis | 3 (1.0) |
| Dissemination | 2 (0.7) |
| Cavity | 2 (0.7) |
| Total | 290 (100) |
| Clinically insignificant discrepancies | |
| Consolidated fracture | 76 (62.8) |
| Lung calcification or calcified shadow | 44 (36.4) |
| Cardiomegaly | 1 (0.8) |
| Total | 121 (100) |

Note. Discrepancies are false-negative decisions made by the artificial intelligence service. The normal category includes images that do not show any abnormal changes.

 

The clinically significant discrepancies primarily consisted of omissions of infiltrates and/or consolidation and shadow areas (64.8% and 26.9%, respectively). Other abnormal changes were less frequently recorded, sometimes only in isolated cases (Figs. 4 and 5).

 

Fig. 4. A set of images for a male patient aged 53 years. A clinically significant discrepancy: signs of small, focal, polysegmental dissemination in the lungs.

 

Fig. 5. Images for a female patient aged 47 years. A clinically significant discrepancy: lesion in the lower lobe of the left lung (red arrow).

 

Evaluation of discrepancies by modality revealed that they were more frequently recorded for X-ray images than for fluorograms: 62.3% (256) vs. 37.7% (155) of all 411 discrepancies. Clinically significant discrepancies accounted for 79.3% (203 of 256) of the discrepancies on X-ray images and 56.1% (87 of 155) of those on fluorograms. The percentage distribution of abnormal changes was generally consistent: omissions of infiltrates and/or consolidation and shadow areas significantly exceeded the rest.

The hypothesis that, for certain age and sex groups, there would be zero cases of clinically significant discrepancies during autonomous sorting was tested (Table 1).

When data were evaluated by the sex of the screened patients, men were more likely to have a clinically significant discrepancy (OR = 1.317 [95% CI: 1.044; 1.661], p = 0.020). However, for individual AI services, the rates of defects did not differ significantly between men and women when broken down by modality. Clinically significant discrepancies were most frequently observed in young people (18–44 years). In the older age groups, the percentage of AI mistakes fell from 0.12% to 0.07% (by approximately half) and remained stable in two age groups. In older patients, the parameter increased again (up to 0.09%), followed by virtually no clinically significant discrepancies in the oldest patients. There were very few observations in the group aged ≥90 years; thus, it was not possible to estimate the rate of defects.

Automated sorting classified 315,491 of the 575,549 screening images as normal. Clinically significant defects in autonomous sorting (290 images, Table 4) were reported in only 0.05% of the total number of sorted images. The share of clinically significant defects in autonomous sorting was <0.1%. Autonomous sorting was fully correct in 99.95% of cases.

The Clopper–Pearson interval was used to determine the 95% CI for the percentage of clinically significant discrepancies and to evaluate the range of fluctuation when autonomous sorting is scaled to the total number of imaging scans performed each year. The baseline value was the number of preventive radiological examinations of the chest performed in 2023. The lower and upper limits of the percentage of false-negative AI conclusions during autonomous sorting were 0.04% and 0.06%, respectively. The probability that the true value does not exceed the upper limit is 97.5%.
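The Clopper–Pearson limits can be reproduced without a statistics package. The sketch below is a minimal standard-library implementation that inverts the binomial distribution by bisection; it is not the MedCalc routine used in the study, and the function names are assumptions. It is applied to 290 clinically significant discrepancies among 575,549 sorted images.

```python
from math import exp, lgamma, log, log1p


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    log_n_fact = lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for a binomial proportion."""

    def solve(k_cdf: int, target: float) -> float:
        # The CDF decreases as p grows, so bisection on p finds cdf(p) == target.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if binom_cdf(k_cdf, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(k - 1, 1 - alpha / 2)
    upper = 1.0 if k == n else solve(k, alpha / 2)
    return lower, upper


lower, upper = clopper_pearson(290, 575_549)
print(f"95% CI: {lower:.2%} to {upper:.2%}")
```

Each limit leaves a tail probability of 2.5%, which is why the text can state that the true value stays below the upper limit with a probability of 97.5%.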

Therefore, the medical effectiveness of the proposed autonomous AI model has been proven.

Assessment of Economic Efficiency

In healthcare, economic efficiency is defined as the ratio of outcomes to costs. This calculation requires determining the most effective use of available resources. Additionally, this parameter is crucial to justify public health protection measures [25, 26].

Funding for the experiment was provided through a tariff agreement for medical care under the Moscow Territorial Program of Compulsory Medical Insurance for 2024. This agreement was amended to include experimental tariffs for automated, AI-assisted radiographic data interpretation services.

Four medical services were established:

  • AI-assisted interpretation of fluorograms (service no. 1680/801680) or X-ray images (service no. 1682/801682)
  • Automated AI-assisted interpretation of fluorograms (service no. 1681/801681) or X-ray images (service no. 1683/801683).

The diagnostic imaging services were paid for using per capita funding for outpatient healthcare facilities (city outpatient clinics) of the Moscow state healthcare system.

Depending on the modality, AI-assisted medical devices for autonomous sorting were purchased and used at the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow City Health Department using funding for services nos. 1681/801681 and 1683/801683. Depending on the modality, screening images sorted as non-normal were interpreted at the Reference Center of the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow City Health Department using funding for services nos. 1680/801680 and 1682/801682. Screening images sorted as normal were interpreted at the Russian Medical Academy of Continuous Professional Education of the Ministry of Health of Russia using funding from the special grant of the Moscow City Health Department. Expert events and research at the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow City Health Department were supported using funds from the state assignment that covered the relevant types of work.

The costs for autonomous sorting and interpretation of preventive radiological examinations of the chest were calculated (Table 5).

 

Table 5. Expenses from compulsory medical insurance funds for interpretation of preventive radiological examinations of the chest during autonomous sorting

 

| | Non-normal image | Normal image |
|---|---|---|
| Fluorography, n = 345,408 | | |
| Number of scans (n) | 149,373 | 196,035 |
| Tariff, RUB | 155.62 | 6.9 |
| Amount, RUB | 23,245,426.26 | 1,352,641.5 |
| X-ray, n = 230,141 | | |
| Number of scans (n) | 110,685 | 119,456 |
| Tariff, RUB | 157.56 | 8.8 |
| Amount, RUB | 17,439,528.6 | 1,051,212.8 |
| Total amount by category, RUB | 40,684,954.86 | 2,403,854.3 |
| Total amount, RUB | 43,088,809.16 | |

Note. Costs are calculated for the experimental period. Non-normal images are images with abnormal changes. Normal images are images without abnormal changes.

 

In total, 43,088,809.16 RUB was spent from the compulsory medical insurance program on medical services related to the interpretation of preventive radiological examinations of the chest.

As an alternative, interpretation of a similar number of images for the same period without autonomous sorting would cost 76,536,506.02 RUB, not including tariff indexation (Table 6).

 

Table 6. A model of expenses from compulsory medical insurance funds for interpretation of preventive radiological examinations of the chest

| Modality | Number of scans (n) | Tariff, RUB | Amount, RUB |
|---|---|---|---|
| Fluorography | 345,408 | 132.98 | 45,932,355.84 |
| Radiography | 230,141 | 132.98 | 30,604,150.18 |
| Total amount, RUB | | | 76,536,506.02 |

Note. A cost model for a period similar to the duration of the experiment. The current tariff for interpretation of the data from preventive radiological examinations of the chest without artificial intelligence technologies is indicated.

 

Therefore, over the 5 months of the experiment, expenses were reduced by 43.7%, resulting in savings of 33,447,696.86 RUB.
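The cost comparison for the experimental period can be reproduced from the scan counts and tariffs given in Tables 2, 5, and 6. The sketch below is only a recalculation of those published figures; the dictionary layout and variable names are assumptions.

```python
# Scan counts from Table 2; tariffs (RUB) from Tables 5 and 6.
scans = {
    "fluorography": {"non_normal": 149_373, "normal": 196_035},
    "radiography": {"non_normal": 110_685, "normal": 119_456},
}
tariffs = {
    "fluorography": {"non_normal": 155.62, "normal": 6.90},
    "radiography": {"non_normal": 157.56, "normal": 8.80},
}
BASE_TARIFF = 132.98  # radiologist interpretation without AI

# Cost with autonomous sorting: each category is billed at its own tariff.
with_sorting = sum(
    scans[m][c] * tariffs[m][c] for m in scans for c in scans[m]
)
# Cost without sorting: every image is interpreted at the base tariff.
without_sorting = sum(sum(scans[m].values()) * BASE_TARIFF for m in scans)
savings = without_sorting - with_sorting

print(f"with sorting:    {with_sorting:,.2f} RUB")
print(f"without sorting: {without_sorting:,.2f} RUB")
print(f"savings:         {savings:,.2f} RUB ({savings / without_sorting:.1%})")
```

The saving comes almost entirely from the normal category, where the automated tariff replaces the base tariff for radiologist interpretation.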

The economic efficiency of the proposed autonomous AI model has been proven.

DISCUSSION

Our data show that 54.8% of the preventive radiological examinations of the chest sorted as normal may not require interpretation by a radiologist, reducing the workload and costs in the healthcare system. Autonomous sorting of preventive radiological examinations ensures high-quality and safe medical care because images are correctly sorted in 99.95% of cases. Clinically significant discrepancies were determined in 0.05% of cases (95% CI: 0.04; 0.06%).

The quality of diagnostic decisions made by radiologists is well-studied. Various classifications of mistakes and discrepancies are proposed, mechanisms behind incorrect conclusions and omissions are carefully studied, and preventive actions are recommended. Table 7 summarizes the percentage of clinically significant omissions made by radiologists who interpreted radiological images, both overall and for chest X-ray specifically [19, 27–37].

The range of the specific weight of defects varies depending on sample, modality, and patient selection.

  • Minimum values (≥4.0%): for a typical representative sample with a significant number of normal image sets;
  • Maximum (≤30.0%): when all image sets in the sample are abnormal [31].

One of the papers presents questionable data: the authors report an average discrepancy level of 0.05%, the lowest value found, for significant and insignificant discrepancies combined. The sample of over 300,000 chest X-ray images is highly representative; however, only six radiologists participated in this study, three of whom were thoracic radiologists [32]. Interpretation of chest X-ray images is often associated with high rates of detection errors both for individual conditions (e.g., non-small cell lung cancer) and for normal cases (false-positive and false-negative conclusions), which is especially relevant for our study. This indicates that radiologists can overdiagnose up to 18% of cases [35].

Data obtained from the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow City Health Department were considered to be the most representative (Table 7) [28, 36]. Thus, we used a reference range of 3%–4% to identify clinically significant discrepancies in chest X-ray.

 

Table 7. Percentage of discrepancies in the interpretation of images by radiologists

| Authors | Specific weight of discrepancies, % | Notes |
|---|---|---|
| All modalities | | |
| Morozov et al. [33] | 6.0 | Data from the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow City Health Department; results of internal quality control |
| Berlin [29] | 4.0 | Mean |
| Brady [30] | 3.0–5.0 | |
| Bruno et al. [31] | 4.0–30.0 | |
| Chest X-ray | | |
| Arzamasov et al. [19]; Arzamasov et al. [28] | 3.0–4.0 | Data from the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow City Health Department; experimental imaging: X-ray, fluorography (8815 scans, 403 specialists) |
| Satia et al. [35] | 18.0 | Specific weight of errors in classification of images as normal; omissions of pneumothorax cases |
| Cascade et al. [32] | 0.05 | Images were interpreted by 6 specialists |
| Quekel et al. [34] | 19.0 | Omissions of non-small cell lung cancer cases |

 

As shown above, the rate of clinically significant defects in autonomous sorting is 0.05% of the total number of sorted image sets. This value is lower than the lower reference limit.

Notably, the quality of AI-assisted sorting of the results of preventive radiological examinations of the chest is significantly higher than that of the typical radiologist. Moreover, the quality of AI services can greatly vary, as shown by the different specific weight of defects.

In medicine, the occurrence of an event cannot be predicted with 100% accuracy. However, there are two stereotypes:

  • Healthcare professionals are expected to demonstrate 100% accuracy and quality in their actions and decisions, setting a gold standard.
  • AI-assisted medical devices are expected to demonstrate 100% accuracy and reliability in their actions and decisions.

The second stereotype emerged from the gradual rejection of the first and the realization that healthcare professionals should be permitted to make mistakes under certain circumstances.

The above evidence confirming the very high medical effectiveness of autonomous sorting supports the second stereotype. Therefore, in practice, even the 0.05% of cases with clinically significant discrepancies should be reduced to zero.

The proposed solution is similar to the double check used in the arrangement of mass screenings for other conditions.

This prospective clinical study resulted in proposals to optimize the original organizational model and eliminate clinically significant discrepancies.

The updated autonomous sorting model for the results of preventive radiological examinations of the chest is implemented using an automated double check. In this model, two independent AI-assisted medical devices perform a parallel analysis of the diagnostic image, and the final decision favors the screened patient. The economic reasons behind this approach are explained below.
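A minimal sketch of the double-check rule is given below. It assumes that "the final decision favors the screened patient" means an image is accepted as normal only when both devices independently report it as normal; the source does not spell out the rule, and the function name is hypothetical.

```python
def double_check(ai_a_normal: bool, ai_b_normal: bool) -> str:
    """Combine two independent AI verdicts; any doubt sends the image to a radiologist."""
    if ai_a_normal and ai_b_normal:
        return "normal"
    return "radiologist interpretation"


# Every combination of the two verdicts.
for a in (True, False):
    for b in (True, False):
        print(a, b, "->", double_check(a, b))
```

Under this assumption, a false-negative result requires both devices to miss the same finding, at the cost of sending more images to a radiologist.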

The data included the number of relevant imaging scans performed in 2023. Calculations were made for autonomous sorting and the specific percentage distribution of normal and non-normal screening images. In addition, the simulation was performed for autonomous sorting according to the experimental design (Table 8) and double-check scheme (Table 9).

 

Table 8. A model of expenses from compulsory medical insurance funds for interpretation and description of preventive radiological examinations of the chest during autonomous sorting (experiment design)

| | Normal image | Non-normal image |
|---|---|---|
| Fluorography, n = 1,149,810 | | |
| Number of scans (n) | 496,718 | 653,092 |
| Tariff, RUB | 155.62 | 6.9 |
| Amount, RUB | 77,299,255.16 | 4,506,334.8 |
| Radiography, n = 840,032 | | |
| Number of scans (n) | 404,055 | 435,977 |
| Tariff, RUB | 157.56 | 8.8 |
| Amount, RUB | 63,662,905.8 | 3,836,597.6 |
| Total amount, RUB | 140,962,160.96 | 8,342,932.4 |
| Total amount, RUB (both columns) | 149,305,093.36 | |

Note. Non-normal images are images with abnormal changes. Normal images are images without abnormal changes.

 

Table 9. A model of expenses from compulsory medical insurance funds for interpretation and description of preventive radiological examinations of the chest during autonomous sorting (double-check scheme)

| | Normal image | Non-normal image |
|---|---|---|
| Fluorography, n = 1,149,810 | | |
| Number of scans (n) | 496,718 | 653,092 |
| Tariff, RUB | 155.62 | 13.8 |
| Amount, RUB | 77,299,255.16 | 9,012,669.6 |
| Radiography, n = 840,032 | | |
| Number of scans (n) | 404,055 | 435,977 |
| Tariff, RUB | 157.56 | 17.6 |
| Amount, RUB | 63,662,905.8 | 7,673,195.2 |
| Total amount, RUB | 140,962,160.16 | 16,685,864.8 |
| Total amount, RUB (both columns) | 157,648,024.96 | |

Note. A model of expenses with double check. Non-normal images are images with abnormal changes. Normal images are images without abnormal changes. The tariff for autonomous sorting is doubled.
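The amounts in Tables 8 and 9 follow from multiplying scan counts by tariffs, and a short sketch can recompute them. The code below copies the counts and tariffs from Table 8 as printed; the names `TABLE_8` and `column_totals` are illustrative, and treating the lower tariff (6.9 and 8.8 RUB) as the one that is doubled is an assumption based on the note to Table 9. Recomputed this way, the double-check total comes to 157,648,025.76 RUB, which is 0.80 RUB above the printed 157,648,024.96 RUB; the gap appears to trace to the printed first-column subtotal in Table 9 ending in .16 rather than .96.

```python
from decimal import Decimal

# (scans, tariff in RUB) for the two table columns, copied from Table 8 as printed.
TABLE_8 = {
    "fluorography": [(496_718, Decimal("155.62")), (653_092, Decimal("6.9"))],
    "radiography": [(404_055, Decimal("157.56")), (435_977, Decimal("8.8"))],
}


def column_totals(table, sorting_multiplier=1):
    """Return (first-column total, second-column total, overall total) in RUB.

    sorting_multiplier scales only the second-column (lower) tariff;
    a value of 2 models the doubled tariff of the double-check scheme.
    """
    first = sum(n * tariff for (n, tariff), _ in table.values())
    second = sum(n * tariff * sorting_multiplier for _, (n, tariff) in table.values())
    return first, second, first + second


experiment = column_totals(TABLE_8)                         # Table 8 scenario
double_check = column_totals(TABLE_8, sorting_multiplier=2)  # Table 9 scenario
```

`Decimal` is used instead of floating point so that the monetary sums stay exact to the kopeck.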

 

Based on the annual number of preventive radiological examinations of the chest (1,149,810 fluorograms and 840,032 X-ray images) and the base rate of 132.98 RUB, 264,609,189.16 RUB is required for all images to be interpreted by radiologists.

Autonomous sorting according to the experimental design reduces costs by 43.6% (115,304,095.8 RUB), to 149,305,093.36 RUB.

Using a double check (with a doubled tariff for automated interpretation) reduces costs by 40.4% (106,961,164.2 RUB), to 157,648,024.96 RUB.

Finally, the costs of quality control should be considered. Assuming that 2% of the total number of relevant scans for 2023 are reviewed at a tariff of 132.98 RUB, the additional cost of quality control is 5,292,183.8 RUB. The total savings could then be up to 101,668,936.2 RUB, reducing annual costs for interpreting the results of preventive radiological examinations by 38.4%. The importance of this effect is clear.
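The savings figures above can be retraced with a few lines of arithmetic. This is a sketch under stated assumptions: the scenario costs are taken as printed in the text, the quality-control cost is computed as 2% of the baseline, and all names (`BASELINE`, `savings`, `qc_cost`) are illustrative. Subtracting the quality-control cost from the double-check savings in this way gives about 101,668,980.4 RUB, slightly different from the 101,668,936.2 RUB printed above, although both round to the same 38.4%.

```python
from decimal import Decimal, ROUND_HALF_UP

BASE_TARIFF = Decimal("132.98")    # RUB per radiologist interpretation (from the text)
TOTAL_SCANS = 1_149_810 + 840_032  # annual scans across both modalities
BASELINE = TOTAL_SCANS * BASE_TARIFF  # cost if radiologists read every image


def percent_of_baseline(amount: Decimal) -> Decimal:
    """Express an amount as a percentage of the baseline, rounded to 0.1."""
    return (amount / BASELINE * 100).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def savings(cost_with_sorting: Decimal) -> tuple[Decimal, Decimal]:
    """Return (RUB saved, percentage saved) relative to the baseline."""
    saved = BASELINE - cost_with_sorting
    return saved, percent_of_baseline(saved)


EXPERIMENT_COST = Decimal("149305093.36")    # as printed in the text
DOUBLE_CHECK_COST = Decimal("157648024.96")  # as printed in the text

# Quality control: 2% of all scans re-read at the base tariff.
qc_cost = Decimal("0.02") * BASELINE

saved_double_check, _ = savings(DOUBLE_CHECK_COST)
net_saved = saved_double_check - qc_cost
net_share = percent_of_baseline(net_saved)
```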

The proposed improved model for autonomous sorting of preventive radiological examination results using AI-assisted medical devices is highly cost-effective.

Hypothesis 1 was not rejected: at least half of the results from preventive radiological examinations are classified as normal through autonomous sorting, which proportionally reduces the labor costs of radiologists. Hypothesis 2 was also not rejected: for certain age and sex groups, the number of cases of clinically significant discrepancies during autonomous sorting may be zero. However, only hypothesis 1 provides a scientific basis for a new model for organizing mass screenings. The results of hypothesis 2 testing are not applicable in real-world settings and are only of theoretical interest.

Based on the results obtained, the following updated real-world organizational model was proposed:

  • Autonomous sorting of the results of preventive radiological examinations of the chest should use two AI-assisted medical devices for double checking.
  • Each medical device should be set to a sensitivity of 1.0 (95% CI: 1.0; 1.0).
  • A parallel analysis should be conducted.
  • Images should be sent to a radiologist for interpretation if at least one AI-assisted medical device has classified images as non-normal.
  • When designing process flowcharts and payment tariffs for autonomous sorting, the cost of purchasing and commissioning two independent medical devices with the necessary functionality should be taken into account.
  • Internal quality control is recommended for the results of preventive radiological examinations of the chest that were classified as normal through autonomous sorting. This can be achieved by randomly selecting 0.5%–2.0% of the scans performed during the year, drawing on scans performed no later than the previous quarter.
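The random selection for internal quality control could be sketched as follows. This is an illustration only: the function name `qc_sample`, the use of scan identifiers, and the choice to sample from the scans classified as normal are assumptions, and filtering by the quarter in which a scan was performed is left out.

```python
import random


def qc_sample(normal_scan_ids, fraction=0.02, seed=None):
    """Randomly pick a share of normal-classified scans for radiologist review.

    fraction must lie in the recommended 0.5%-2.0% range; seed makes the
    draw reproducible for auditing (both parameters are hypothetical).
    """
    if not 0.005 <= fraction <= 0.02:
        raise ValueError("fraction must be between 0.005 and 0.02")
    ids = list(normal_scan_ids)
    # At least one scan is reviewed whenever any scans exist.
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return random.Random(seed).sample(ids, k)
```

For example, `qc_sample(range(1000), fraction=0.02, seed=1)` returns 20 distinct scan identifiers.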

The implementation of the proposed model could solve key healthcare issues:

  • Increase the availability and coverage of preventive care for the population several-fold without an additional burden on healthcare resources;
  • Reduce costs;
  • Free up additional human resources by reassigning radiologists to more complex and in-demand types of imaging, such as computed tomography and magnetic resonance imaging;
  • Improve the timely detection of socially significant diseases.

The following research areas were identified to implement the proposed model:

  • Develop and substantiate proposals to update regulatory and legal acts ensuring the legitimacy of autonomous use of certain types of AI-assisted medical devices under established conditions for prevention care;
  • Evaluate the applicability and effectiveness of an updated model for the autonomous sorting of chest screening images with automated double-checking.

Study Limitations

This study was limited by the lack of valid data on the accuracy and quality of radiologists’ interpretations of the results of preventive radiological examinations of the chest obtained in large-sample, multicenter, prospective studies.

CONCLUSION

Autonomous sorting of the results from preventive radiological examinations of the chest was completely correct in 99.95% of cases. In total, 54.8% of fluorograms and chest X-ray images were classified as normal, reducing costs by 43.7% (saving 33,447,696.86 RUB over 5 months) in the Moscow healthcare system due to the difference in tariffs for image interpretation. Clinically significant discrepancies were identified in 0.05% of cases (95% CI: 0.04; 0.06). The original organizational model should be optimized to eliminate clinically significant discrepancies. Autonomous sorting should be double-checked by two independent, AI-assisted medical devices. Maximum sensitivity should be set to 1.0 (95% CI: 1.0; 1.0). This approach will be tested in the next prospective study.

ADDITIONAL INFORMATION

Funding source. The study was conducted in accordance with Decree of the Government of Moscow No. 869-PP “On Conducting an Experiment on Automated Interpretation of X-ray Studies Using Artificial Intelligence Technologies,” dated April 24, 2024, and Order of the Moscow Health Care Department No. 360 “On the Rules for Conducting an Experiment on Automated Interpretation of X-ray Studies Using Artificial Intelligence Technologies,” dated April 26, 2024.

Disclosure of interests. The authors declare that they have no relationships, activities or interests (personal, professional or financial) with third parties (commercial, non-commercial, private) whose interests may be affected by the content of the article, as well as no other relationships, activities or interests over the past three years that must be reported.

Authors’ contribution. Yu.A. Vasilev: study concept, approval of the final version of the manuscript text; D.A. Sychev: editing and approval of the final version of the manuscript text; A.V. Bazhin: study concept and design, collection and analysis of literature data, writing the manuscript text; I.M. Shulkin, A.Yu. Golikova, A.V. Mishchenko, G.A. Bekdzhanyan, L.G. Rodionova: editing the manuscript text; A.V. Vladzymyrskyy: writing the manuscript text; K.M. Arzamasov: collection and analysis of literature data, writing the manuscript text; A.S. Goldberg: final proofreading of the manuscript text. All authors approved the version to be published and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Acknowledgments. The authors are grateful to the employees of the Moscow Center for Diagnostics and Telemedicine: V.G. Klyashtorny for consultations on statistical analysis, D.M. Anikina, R.N. Akhmetov, M.K. Balashov, D.I. Doronin, N.D. Kudryavtsev for consultations on the methodology of the experiment, I.A. Blokhin, A.A. Borisov, Y.S. Kirpichev for consultations on the evaluation of AI services, V.A. Emir-Useinova for consultations on the evaluation of economic efficiency, L.N. Arzamasova, V.P. Gamarina for assistance in manuscript preparation, M.M. Ivakaeva, N.K. Makhmudova, A.A. Skobel, L.D. Stetsyuk, I.V. Truten for expert review of the studies; employees of the Russian Medical Academy of Postgraduate Education of the Ministry of Health of the Russian Federation G.E. Ginetullina, S.V. Guskova, M.Y. Drankova, D.S. Kolyuzhny, S.A. Svoevolin, M.I. Tibiev, A.A. Trefilova for contributing to the study.

1 Artificial Intelligence Technologies in Healthcare. In: Center for Diagnostics and Telemedicine Technologies [Internet]. Moscow: Center for Diagnostics and Telemedicine Technologies, 2020–2024. Available at: https://mosmed.ai/ Accessed on October 8, 2024.


About the authors

Yuriy A. Vasilev

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: npcmr@zdrav.mos.ru
ORCID iD: 0000-0002-5283-5961
SPIN-code: 4458-5608

MD, Dr. Sci. (Medicine)

Russian Federation, Moscow

Dmitry A. Sychev

Medical Academy of Continuous Professional Education

Email: dimasychev@mail.ru
ORCID iD: 0000-0002-4496-3680
SPIN-code: 4525-7556

MD, Dr. Sci. (Medicine), Professor, academician of the Russian Academy of Sciences

Russian Federation, Moscow

Alexander V. Bazhin

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: BazhinAV@zdrav.mos.ru
ORCID iD: 0000-0003-3198-1334
SPIN-code: 6122-5786

MD, Cand. Sci. (Medicine)

Russian Federation, Moscow

Igor M. Shulkin

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: ShulkinIM@zdrav.mos.ru
ORCID iD: 0000-0002-7613-5273
SPIN-code: 5266-0618

MD, Cand. Sci. (Medicine)

Russian Federation, Moscow

Anton V. Vladzymyrskyy

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Author for correspondence.
Email: vladzimirskijAV@zdrav.mos.ru
ORCID iD: 0000-0002-2990-7736
SPIN-code: 3602-7120

MD, Dr. Sci. (Medicine)

Russian Federation, Moscow

Alexandra Yu. Golikova

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: GolikovaAY1@zdrav.mos.ru
ORCID iD: 0009-0001-5020-2765
Russian Federation, Moscow

Kirill M. Arzamasov

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: ArzamasovKM@zdrav.mos.ru
ORCID iD: 0000-0001-7786-0349
SPIN-code: 3160-8062

MD, Cand. Sci. (Medicine)

Russian Federation, Moscow

Andrei V. Mishchenko

Medical Academy of Continuous Professional Education

Email: dr.mishchenko@mail.ru
ORCID iD: 0000-0001-7921-3487
SPIN-code: 8825-4704

MD, Dr. Sci. (Medicine)

Russian Federation, Moscow

Gevorg A. Bekdzhanyan

Medical Academy of Continuous Professional Education

Email: rmapo@rmapo.ru
ORCID iD: 0009-0007-7150-7166
SPIN-code: 4579-9457
Russian Federation, Moscow

Arcadiy S. Goldberg

Medical Academy of Continuous Professional Education

Email: goldarcadiy@gmail.com
ORCID iD: 0000-0002-2787-4731
SPIN-code: 8854-0469

MD, Cand. Sci. (Medicine)

Russian Federation, Moscow

Larisa G. Rodionova

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: RodionovaLG@zdrav.mos.ru
ORCID iD: 0009-0008-9862-8205
Russian Federation, Moscow

References

  1. Boenk EA, Roginko NI, Dzeranova NG, et al. All-Russian medical examination of adult population within the framework of the national project "Healthcare". Vestnik Roszdravnadzora. 2021;(1):21–29. EDN: FIPEZH
  2. Garifullin TYu, Avdeeva MV, Filatov VN, et al. Improvement of medical check-up process on the basis of lean technologies in outpatient settings. Russian Journal of Preventive Medicine and Public Health. 2023;26(3):30–38. doi: 10.17116/profmed20232603130 EDN: AGSZJF
  3. Zakharchenko OO, Shikina IB, Terentyeva DS. Results of the medical examination of the adult population over 60 years in the Russian Federation (2016–2021). Preventive And Clinical Medicine. 2023;3(88):103–114. doi: 10.47843/2074-9120_2023_3_103 EDN: YNHXOE
  4. Ignatyeva VI, Kontsevaya AV, Kalinina AM, et al. Socio-economic effectiveness of early cancer detection during medical checkup. Russian Journal of Preventive Medicine and Public Health. 2024;27(1):36–44. doi: 10.17116/profmed20242701136 EDN: CNVQRC
  5. Levshin VF, Slepchenko NI, Ryzhova NI, et al. Study of the attitude and participation of the population in the preventive and screening examinations and implementation of these examinations in the health care system. Lechaschi Vrach. 2022;25(10):81–87. doi: 10.51793/OS.2022.25.10.013 EDN: UFZZEB
  6. Stupina MI, Selezneva PA, Khaptanova VA. Medical screening of patients with coronary heart disease in outpatient settings. Nauchnyy Aspekt. 2024;34(4):4436–4455. (In Russ.) EDN: XOTAYU
  7. Golubev NA, Ogryzko EV, Tyurina EM, et al. Features of the development of the radiation diagnostics service in the Russian Federation for 2014–2019. Current Problems of Health Care and Medical Statistics. 2021;(2):356–376. doi: 10.24412/2312-2935-2021-2-356-376 EDN: EHSADW
  8. Ivashikin YM. (2024). Lung imaging screening during preventive medical examinations and medical screening. In: Higher education: scientific research. Proceedings of the Interuniversity International Congress. Moscow: Infinity Publishing House, 2024. P. 139–141. (In Russ.) doi: 10.34660/INF.2024.94.91.106 EDN: BXEOLF
  9. Trofimova TN, Kozlova OV. Radiology in Saint-Petersburg 2019. Diagnostic radiology and radiotherapy. 2021;4(11):96–99. doi: 10.22328/2079-5343-2020-11-4-96-99 EDN: HTVSUZ
  10. Tyurin IE. Radiology in the Russian Federation. Journal of Oncology: Diagnostic Radiology and Radiotherapy. 2018;1(4):43–51. EDN: QZSWYK
  11. Zubova NA. Effectiveness of mass preventive examinations in subjects of the Russian Federation with low morbidity rates of tuberculosis. Social Aspects of Population Health. 2016;4(50):8. doi: 10.21045/2071-5021-2016-50-4-8 EDN: WGIKUN
  12. Rubis LV. Efficiency of mass preventive examinations of the urban population for the purpose of early diagnosis of tuberculosis in primary health care institutions. Current Problems of Health Care and Medical Statistics. 2021;(3):1–13. doi: 10.24412/2312-2935-2021-3-1-13 EDN: VPLCTZ
  13. Shelekhov PV. Personnel situation in radiative diagnostics. Current Problems of Health Care and Medical Statistics. 2019;(1):265–275. doi: 10.24411/2312-2935-2019-10018 EDN: ZGZFPV
  14. Bobrovskaya TM, Vasilev YuA, Nikitin NYu, Arzamasov KM. Approaches to building radiology datasets. Medical Doctor and IT. 2023;(4):14–23. doi: 10.25881/18110193_2023_4_14 EDN: EQHEKE
  15. Vasiliev YuA, Vlazimirsky AV, Omelyanskaya OV, et al. Methodology for testing and monitoring artificial intelligence-based software for medical diagnostics. Digital Diagnostics. 2023;4(3):252–267. doi: 10.17816/DD321971 EDN: UEDORU
  16. Vasilev YuA, Arzamasov KM, Kolsanov AV, et al. Experience of application artificial intelligence software on 800 thousand fluorographic studies. Medical Doctor and IT. 2023;(4):54–65. doi: 10.25881/18110193_2023_4_54 EDN: MHCTUB
  17. Vasiliev YuA, Vladzimirsky AV, Arzamasov KM, et al. The first 10,000 mammography exams performed as part of the “Description and interpretation of mammography data using artificial intelligence” service. Manager Zdravookhranenia. 2023;(8):54–67. doi: 10.21045/1811-0185-2023-8-54-67 EDN: KZHPVW
  18. Vasilev YuA, Vladzimirsky AV, Arzamasov KM, et al. Computer vision in radiology: stage one of the Moscow experiment. 2nd ed. Moscow: Publishing solutions; 2023. (In Russ.)
  19. Arzamasov KM, Semenov SS, Kokina DY, et al. Criteria for the applicability of computer vision for preventive studies on the example of chest X-ray and fluorography. Meditsinskaya Fizika. 2022;4(96):56–63. doi: 10.52775/1810-200X-2022-96-4-56-63 EDN: MXKUVL
  20. Vasilev YuA, Tyrov IA, Vladzymyrskyy AV, et al. A new model of organizing mass screening based on stand-alone artificial intelligence used for fluorography image triage. Public Health and Life Environment. 2023;31(11):23–32. doi: 10.35627/2219-5238/2023-31-11-23-32 EDN: SYIQBX
  21. Vasilev YuA, Tyrov IA, Vladzymyrskyy AV, et al. Autonomous artificial intelligence for sorting the preventive imaging studies’ results. Russian Journal of Preventive Medicine. 2024;27(7):23–29. doi: 10.17116/profmed20242707123 EDN: ODGHNM
  22. Morozov SP, Vetsheva NN, Ledikhova NV, et al. Assessing the quality of radiologic studies. Moscow: Moscow Center for Diagnostics and Telemedicine; 2019. (In Russ.)
  23. Alekseeva TR, Amosov VI, Anikeeva OYu, et al. Chest radiology: national guidelines. Moscow: GEOTAR-Media; 2014. (In Russ.) EDN: VRXFKX
  24. Vasilev YuA, Vladzymyrskyy AV, Omelyanskaya OV, et al. Assessing the maturity of artificial intelligence technologies for healthcare. Moscow: Moscow Center for Diagnostics and Telemedicine; 2023. (In Russ.)
  25. Orlov EM, Sokolova ON. Efficiency category in public health services system. Fundamental’nye issledovaniya. 2010;(4):70–75. EDN: MSPQTJ
  26. Kucherenko VZ, Fleck VO, Putin ME, et al. Evaluation of the effectiveness of medical organizations. Vyalkov AI, editor. Moscow: GEOTAR-Med; 2004.
  27. Arzamasov KM, Vasilev YuA, Vladzymyrskyy AV, et al. The use of computer vision for the mammography preventive research. Russian Journal of Preventive Medicine and Public Health. 2023;26(6):117–123. doi: 10.17116/profmed202326061117 EDN: YBKHPS
  28. Arzamasov KM, Vasilev YuA, Vladzymyrskyy AV, et al. An international non-inferiority study for the benchmarking of AI for routine radiology cases: chest X-ray, fluorography and mammography. Healthcare. 2023;11(10):1684. doi: 10.3390/healthcare11121684 EDN: FWVMPQ
  29. Berlin L. Radiologic errors and malpractice: a burry distinction. American Journal of Roentgenology. 2007;189(3):517–522. doi: 10.2214/AJR.07.2209
  30. Brady AP. Error and discrepancy in radiology: inevitable or avoidable? Insights into Imaging. 2017;8(1):171–182. doi: 10.1007/s13244-016-0534-1 EDN: FSSDNE
  31. Bruno MA, Walker EA, Abujudeh HH. Understanding and confronting our mistakes: the epidemiology of error in radiology and strategies for error reduction. RadioGraphics. 2015;35(6):1668–1676. doi: 10.1148/rg.2015150023
  32. Cascade PN, Kazerooni EA, Gross BH, et al. Evaluation of competence in the interpretation of chest radiographs. Academic Radiology. 2001;8(4):315–321. doi: 10.1016/S1076-6332(03)80500-7
  33. Morozov S, Guseva E, Ledikhova N, et al. Telemedicine-based system for quality management and peer review in radiology. Insights into Imaging. 2018;9(3):337–341. doi: 10.1007/s13244-018-0629-y EDN: YCIRMT
  34. Quekel LGBA, Kessels AGH, Goei R, van Engelshoven JMA. Miss rate of lung cancer on the chest radiograph in clinical practice. Chest. 1999;115(3):720–724. doi: 10.1378/chest.115.3.720
  35. Satia I, Bashagha S, Bibi A, et al. Assessing the accuracy and certainty in interpreting chest X-rays in the medical division. Clinical Medicine. 2013;13(4):349–352. doi: 10.7861/clinmedicine.13-4-349
  36. Topff L, Steltenpool S, Ranschaert ER, et al. Artificial intelligence-assisted double reading of chest radiographs to detect clinically relevant missed findings: a two-centre evaluation. European Radiology. 2024;34(9):5876–5885. doi: 10.1007/s00330-024-10676-w EDN: RUJICB
  37. Vasilev YuA, Vladzymyrskyy AV, Omelyanskaya OV, et al. AI-based CXR first reading: current limitations to ensure practical value. Diagnostics. 2023;13(8):1430. doi: 10.3390/diagnostics13081430 EDN: MPQYUP

Supplementary files

Fig. 1. General study procedure. AI, artificial intelligence; RMACPE MoH of Russia, Russian Medical Academy of Continuous Professional Education of the Ministry of Health of Russia; RCRPCC D&TT MCHD, Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow City Health Department.

Fig. 2. A presentation example for artificial intelligence conclusions in the electronic medical record in the Unified Medical Information and Analytical System as an automatically generated conclusion about the absence of abnormal changes in an electronic medical record format.

Fig. 3. Presentation example for the artificial intelligence-generated results in the Unified Radiological Information Service of the Unified Medical Information and Analytical System of Moscow (image, DICOM SR).

Fig. 4. A set of images for a male patient aged 53 years. A clinically significant discrepancy: signs of small, focal, polysegmental dissemination in the lungs.

Fig. 5. Images for a female patient aged 47 years. A clinically significant discrepancy: lesion in the lower lobe of the left lung (red arrow).

Copyright (c) 2025 Eco-Vector

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

This mass media outlet is registered by the Federal Service for Supervision of Communications, Information Technology, and Mass Media (Roskomnadzor).
Registration number and date of the registration decision: series PI No. FS 77 - 79539, dated November 9, 2020.