Consensus-based labeling algorithms for texture analysis of prostate lesions

Abstract

BACKGROUND: Texture analysis improves the diagnostic accuracy of magnetic resonance imaging and differential diagnosis of prostate lesions, which are primarily segmented through manual labeling, resulting in significant inter-expert variability of masks. A consensus-based technique can help reduce inconsistencies in prostate lesion segmentation. However, global scientific studies have not described any standardized, consensus-based labeling protocols.

AIM: This study aimed to develop a consensus algorithm for manual labeling of prostate lesions by several independent experts and evaluate inter-expert consistency in lesion segmentation.

METHODS: This retrospective study included 60 biparametric magnetic resonance imaging scans of the prostate gland performed according to PI-RADS 2.1 technical specification. The scans showed PI-RADS 3, 4, and 5 lesions. Two independent radiologists manually segmented the prostate lesions using 3D Slicer. Then, the resulting masks were compared using the Dice–Sørensen coefficient (DSC). For lesions with DSC ≥ 0.75, the final mask was based on the overlap between the two original masks. Conversely, for lesions with DSC < 0.75, the final mask was determined using the proposed consensus algorithm.

RESULTS: The proposed consensus algorithm significantly increased the DSC values, from 0.61 [0.48; 0.73] for primary labeling to 0.74 [0.62; 0.79] for labeling using the proposed algorithm (p = 0.01).

CONCLUSION: The proposed consensus-based algorithm for labeling prostate lesions using magnetic resonance imaging data is crucial in addressing inadequate approaches to objective segmentation in research and clinical settings.

Full Text

BACKGROUND

Radiomic analysis is a promising method for the differential diagnosis of prostate neoplasms, given that the diagnostic accuracy of magnetic resonance imaging in clinical practice may vary depending on the imaging protocol and the experience of the interpreting radiologist [1, 2]. Radiomic analysis may reduce the number of unnecessary biopsies and associated complications [3], which is particularly important in indeterminate prostate lesions [4]. However, the introduction of radiomics in routine clinical practice is associated with several challenges, with the lack of a standardized and optimal labeling tool remaining one of the most critical issues [5, 6].

At present, manual segmentation remains the most accessible and widely used method for delineating prostate lesions for subsequent texture analysis [7].

One of its principal limitations is the high degree of inter-observer variability of segmentation masks, which leads to instability of the extracted texture features [8]. To quantify the similarity between segmentation masks, the Dice–Sørensen similarity coefficient (DSC) is commonly used, with values ranging from 0 to 1, where 1 indicates complete overlap. According to Chen et al. [9], during segmentation of prostate lesions on two magnetic resonance imaging (MRI) scans by four radiologists, the median DSC was 0.81 for the peripheral zone and 0.58 for the transitional zone. This demonstrates high variability in segmentation results even among experienced specialists. Jeganathan et al. [10] reported even greater variability: when three radiologists segmented prostate lesions across 64 examinations, the mean DSC was 0.55. The low values may be explained by low contrast or the small size of the lesions included in the studies. However, published data indicate no substantial improvement in DSC even for lesions with a very high probability of clinically relevant cancer (Prostate Imaging Reporting and Data System category 5, PI-RADS 5), with clearer margins and larger size [10, 11].
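As an illustration of the coefficient described above, the following minimal sketch computes the DSC as 2|A ∩ B| / (|A| + |B|). It assumes, purely for illustration, that each mask is stored as a set of voxel coordinates; the function name `dice` and the toy coordinates are hypothetical and do not come from the study.

```python
def dice(mask_a, mask_b):
    """Dice-Sorensen coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(mask_a) + len(mask_b)
    if total == 0:
        # Assumption: two empty masks are treated as identical.
        return 1.0
    return 2 * len(mask_a & mask_b) / total


# Toy masks: sets of (row, column, slice) voxel coordinates (hypothetical data).
expert_1 = {(10, 12, 5), (10, 13, 5), (11, 12, 5), (11, 13, 5)}
expert_2 = {(10, 13, 5), (11, 13, 5), (12, 13, 5)}

score = dice(expert_1, expert_2)
print(round(score, 2))  # 0.57: two shared voxels, 4 + 3 voxels in total
```

A value of 1 is returned only when the two sets coincide, and 0 when they share no voxels.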

To standardize segmentation of lesions, the European Society of Radiology (ESR) and the European Organisation for Research and Treatment of Cancer (EORTC) have developed guidelines [12] stating that manual segmentation of biomedical images should involve iterative independent evaluation by multiple annotators until a presumed consensus and a final mask are obtained. However, the guidelines do not specify the number of iterations or the required number of operators.

The limited methodological detail regarding the consensus procedure is also reflected in original studies [13–15]. According to a review, publications available at the time of writing lack sufficient detail about the methodology for generating the final segmentation mask, which substantially limits the reproducibility of the reported results.

Thus, despite the rapidly growing body of research in prostate radiomics [7], international publications still lack clearly defined algorithms for achieving consensus when segmenting prostate lesions by several annotators. This highlights the need to develop and evaluate the effectiveness of an approach for generating a final segmentation mask based on agreement among several experts. To address this issue, two hypotheses were formulated (see Table 1).

 

Table 1. Null and alternative hypotheses of the study

Null hypothesis (H0) | Alternative hypothesis (Ha)
The DSC for prostate lesion labeling by two experts is < 0.75¹ | The DSC for prostate lesion labeling by two experts is ≥ 0.75¹
The correlation coefficient between DSC and the PI-RADS lesion category does not differ significantly from zero | The correlation coefficient between DSC and the PI-RADS lesion category differs significantly from zero

Note. ¹ The rationale for the selected cutoff value is provided in the Methods section; DSC, Dice–Sørensen similarity coefficient; PI-RADS, Prostate Imaging Reporting and Data System.

 

AIM

To develop a consensus-forming algorithm for independent manual labeling of prostate lesions by several experts and to assess inter-expert agreement in the segmentation of focal prostate changes.

METHODS

Study Design

An observational, single-center, retrospective study was conducted (see Fig. 1).

 

Fig. 1. Study design. DSC, Dice–Sørensen similarity coefficient; mpReview (Multiparametric Review), software extension for multiparametric study analysis and segmentation; MRI, magnetic resonance imaging; PI-RADS, Prostate Imaging Reporting and Data System.

 

Eligibility Criteria

Inclusion criteria: biparametric prostate MRI (bpMRI) images acquired in accordance with PI-RADS 2.1, containing lesions corresponding to the following PI-RADS categories: 3, intermediate likelihood of clinically significant prostate cancer; 4, high likelihood of clinically significant prostate cancer; 5, very high likelihood of clinically significant prostate cancer.

Non-inclusion criteria: focal prostate lesions corresponding to PI-RADS category 1 (very low likelihood of clinically significant prostate cancer) or 2 (low likelihood of clinically significant prostate cancer), according to PI-RADS 2.1, as well as examinations performed with deviations from the specified standard.

Exclusion criteria: low-quality examinations (with pronounced artifacts hindering interpretation).

Data Collection

The study used a registered dataset with histological verification¹ comprising 103 anonymized bpMRI images acquired on a MAGNETOM® Aera 1.5 T 4G scanner (Siemens Healthcare, Germany) in accordance with the PI-RADS 2.1 standard. Only bpMRI examinations were selected for the present study because, according to the METRICS criteria (a scoring system for radiomics study quality) [16], radiomic analysis should preferably use fewer pulse sequences to minimize the risk of model overfitting. After reviewing all images in the dataset, 43 bpMRI images containing focal lesions classified as PI-RADS 1 and 2 (very low and low likelihood of clinically significant prostate cancer, respectively) were excluded. As a result, the final dataset included 60 images with 69 lesions corresponding to PI-RADS 3 and higher.

Experts

Two radiologists (experts) with 9 and 12 years of experience in diagnostic radiology and prior experience in medical image segmentation participated in dataset labeling.

Segmentation

When creating the original dataset,¹ one expert prepared reference images in which the target pathological lesions were schematically marked on a single slice, based on the PI-RADS category and multifocal fusion biopsy findings (see Fig. 2). More than 6 months had passed between that expert’s previous work with the data and the present study, an interval that, according to the Ebbinghaus forgetting curve, is sufficient to minimize the influence of prior experience on current labeling [17].

 

Fig. 2. Reference images with a schematically marked lesion in the peripheral zone of the left prostate lobe: a, T2-weighted image; b, apparent diffusion coefficient map.

 

In the present study, segmentation was performed following these reference images across all slices that, in the annotator's opinion, contained a lesion. We considered this step justified both for annotator convenience and for reducing inter-observer discrepancies during result processing.

bpMRI images were segmented using the open-source software 3D Slicer² (version 5.6.2) with the Multiparametric Review (mpReview) extension for multiparametric study analysis and segmentation. The radiologists independently created a separate mask for each prostate lesion manually, using the contour brush (Draw) tool. T2-weighted images were selected as the reference pulse sequence because images used for mask comparison must have identical spatial resolution. Segmentation was performed slice by slice with evaluation of the entire lesion volume. The resulting masks were saved in the NIfTI format (Neuroimaging Informatics Technology Initiative, .nii).

Ethics Approval

The study was approved by the Independent Ethics Committee of the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies (Minutes No. 06/2025 of June 19, 2025).

Statistical Analysis

Sample size justification. Under the worst-case scenario, assuming maximal disagreement between segmentations (DSC = 0), a sample of 8 lesions provides a statistical power of 80%, with a type I error probability of 0.05 [18].

Statistical methods. The masks obtained from the experts were compared pairwise for each lesion, and the DSC was calculated. The results are presented as Me [Q1; Q3], where Me is the median and Q1 and Q3 are the first and third quartiles, respectively. The Shapiro–Wilk test was used for normality testing. The relationship between the PI-RADS lesion category and DSC value was evaluated using Spearman’s rank correlation coefficient (ρ). The level of significance for hypothesis testing was set at 0.05. All calculations were performed using RStudio®, version 4.1.2 (Posit, PBC, USA) [19].
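The descriptive and correlation statistics named above can be sketched in a few lines. The study itself used R; the Python functions below (`me_q1_q3`, `average_ranks`, `spearman_rho`) are hypothetical illustrations, the data are invented, and quartile conventions differ between software packages, so values may not match another tool digit for digit.

```python
from statistics import quantiles


def me_q1_q3(values):
    """Return (median, Q1, Q3) using linear interpolation between data points."""
    q1, me, q3 = quantiles(values, n=4, method="inclusive")
    return me, q1, q3


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented example: PI-RADS category and DSC for six lesions.
pirads = [3, 3, 4, 4, 5, 5]
dsc = [0.40, 0.55, 0.50, 0.65, 0.60, 0.80]

me, q1, q3 = me_q1_q3(dsc)
rho = spearman_rho(pirads, dsc)
print(f"{me:.2f} [{q1:.2f}; {q3:.2f}], rho = {rho:.2f}")
```

Ranking with averaged ties matters here because the PI-RADS category takes only three distinct values, so most lesions are tied on that variable.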

Developing the Consensus Labeling Algorithm

Given that no generally accepted DSC threshold values for consensus labeling are available [12], we established a cutoff value of 0.75. For lesions with DSC ≥ 0.75, the overlapping area of the two masks was retained as the final mask. For lesions with DSC < 0.75, the two masks were re-evaluated by both experts to achieve consensus.
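The threshold rule can be expressed as a short routing step. This is only a sketch under stated assumptions: masks are represented as sets of voxel identifiers, and the names `DSC_THRESHOLD`, `dice`, and `initial_decision` are hypothetical.

```python
DSC_THRESHOLD = 0.75  # cutoff value stated in the text


def dice(mask_a, mask_b):
    """Dice-Sorensen coefficient for two voxel sets."""
    total = len(mask_a) + len(mask_b)
    return 1.0 if total == 0 else 2 * len(mask_a & mask_b) / total


def initial_decision(mask_a, mask_b, threshold=DSC_THRESHOLD):
    """Return (dsc, final_mask); final_mask is None when re-evaluation is needed."""
    score = dice(mask_a, mask_b)
    if score >= threshold:
        # Agreement is sufficient: keep the overlap of the two masks.
        return score, mask_a & mask_b
    # Agreement is insufficient: the lesion goes to the consensus procedure.
    return score, None


# Toy masks with integer voxel identifiers (hypothetical data).
mask_1 = set(range(0, 10))
mask_2 = set(range(2, 10))   # large overlap with mask_1
mask_3 = set(range(5, 15))   # partial overlap with mask_1

print(initial_decision(mask_1, mask_2)[1] is not None)  # True
print(initial_decision(mask_1, mask_3)[1] is None)      # True
```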

We developed an action algorithm that explicitly regulates the consensus procedure for lesions with DSC values below the threshold (see Fig. 3). The masks corresponding to these lesions were anonymized and arranged in random order. After one month, the experts re-evaluated the masks (both their own and the other annotator’s) in 3D Slicer² without discussing them with each other. Each expert independently scored every lesion mask with either 1 point (mask appropriate) or 0 points (mask inappropriate). Here, “mask appropriate” meant that, in the expert’s opinion, the mask contours sufficiently matched the lesion boundaries. If both experts unanimously selected a mask (total score: 2 points), the mask was used for further analysis. If the total score was 0 or 1, the mask was excluded.

 

Fig. 3. Consensus achievement algorithm.

 

After both masks of a specific lesion passed through the expert voting stage, the following consensus scenarios were possible:

  • If both masks were selected, their intersection was considered the final mask (DSC calculation shown in Fig. 4a);
  • If only one mask was selected, it was used as the final mask;
  • If both masks were rejected, the final mask was generated jointly by both experts.
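The voting rule and the three scenarios listed above can be sketched as follows. The function name `consensus_mask`, the scenario labels, and the set-based mask representation are assumptions made for illustration; the jointly drawn mask is passed in as an argument because it is produced by the experts, not computed.

```python
def consensus_mask(mask_a, mask_b, votes_a, votes_b, joint_mask=None):
    """Pick the final mask from two candidates and the experts' 0/1 votes.

    votes_a and votes_b each hold the two experts' scores for one mask.
    A mask is accepted only with a total score of 2 (unanimous approval).
    """
    a_accepted = sum(votes_a) == 2
    b_accepted = sum(votes_b) == 2
    if a_accepted and b_accepted:
        return "intersection", mask_a & mask_b
    if a_accepted:
        return "single", mask_a
    if b_accepted:
        return "single", mask_b
    if joint_mask is None:
        raise ValueError("both masks rejected: a jointly created mask is required")
    return "joint", joint_mask


# Toy masks with integer voxel identifiers (hypothetical data).
mask_a = {1, 2, 3, 4}
mask_b = {3, 4, 5}

print(consensus_mask(mask_a, mask_b, (1, 1), (1, 1)))  # both approved
print(consensus_mask(mask_a, mask_b, (1, 1), (1, 0)))  # only mask_a approved
print(consensus_mask(mask_a, mask_b, (0, 1), (0, 0), joint_mask={2, 3, 4}))
```

Raising an error when both masks are rejected and no joint mask is supplied keeps the open discussion step explicit instead of silently falling back to one of the rejected masks.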

 

Fig. 4. Calculation of the Dice–Sørensen similarity coefficient after completion of the algorithm: a, if both masks are selected by the experts (the original Dice–Sørensen similarity coefficient is retained); b, if only one of the two masks is selected; c, the newly created mask intersects with both previous masks; d, the newly created mask intersects with only one of the previous masks.

 

In the second scenario, after completion of the algorithm, the DSC value was recalculated as the ratio of the intersection volume of the two masks to the volume of the selected (final) mask (see Fig. 4b). In the third scenario, the DSC value was calculated as the ratio of the volume of the newly created mask to the combined volume of the two previous masks (see Figs. 4c, 4d).
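The recalculation rules can be written out as a literal reading of the description above. Two points are assumptions and not statements from the source: volumes are counted as numbers of voxels, and the "combined volume" of the two previous masks is interpreted as the volume of their union. The function name `post_consensus_dsc` and the scenario labels are hypothetical.

```python
def post_consensus_dsc(scenario, mask_a, mask_b, final_mask, original_dsc=None):
    """Agreement value after consensus, following the three described scenarios."""
    if scenario == "intersection":
        # Both masks approved: the original DSC is retained (Fig. 4a).
        return original_dsc
    if scenario == "single":
        # One mask approved: intersection volume over the selected mask volume.
        return len(mask_a & mask_b) / len(final_mask)
    if scenario == "joint":
        # New mask: its volume over the combined volume of the previous masks.
        # Assumption: "combined volume" is read as the union of the two masks.
        return len(final_mask) / len(mask_a | mask_b)
    raise ValueError(f"unknown scenario: {scenario}")


# Toy masks with integer voxel identifiers (hypothetical data).
mask_a = set(range(1, 11))   # 10 voxels
mask_b = set(range(6, 14))   # 8 voxels, 5 shared with mask_a
new_mask = set(range(5, 13)) # 8 voxels drawn jointly

print(post_consensus_dsc("single", mask_a, mask_b, final_mask=mask_a))  # 0.5
print(round(post_consensus_dsc("joint", mask_a, mask_b, final_mask=new_mask), 2))  # 0.62
```

Under this reading, the value in the third scenario is not a Dice coefficient in the strict sense, which is worth keeping in mind when comparing it with baseline values.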

RESULTS

A total of 60 bpMRI images obtained according to the PI-RADS 2.1 standard were analyzed, comprising 69 prostate lesions classified as PI-RADS 3, 4, or 5. The distribution of lesions by PI-RADS category was as follows:

  • PI-RADS 3: 27 lesions (39%);
  • PI-RADS 4: 22 lesions (32%);
  • PI-RADS 5: 20 lesions (29%).

After the initial dataset labeling, the baseline DSC was calculated for each pair of masks. The distribution of baseline DSC values is shown in Fig. 5. The DSC for the entire sample was 0.61 [0.48; 0.73].

 

Fig. 5. Distribution of baseline Dice–Sørensen similarity coefficient values.

 

The number of lesions with DSC ≥ 0.75 was 14 (20.2%). The masks of the remaining lesions (n = 55, 79.8%) were re-evaluated according to the algorithm. Lesion categories in the re-evaluation group were as follows:

  • PI-RADS 3: 38% (21 cases);
  • PI-RADS 4: 36% (20 cases);
  • PI-RADS 5: 25% (14 cases).

For the re-evaluation group, consensus (see Fig. 3) was achieved as follows:

  • For 43 lesions, one of the two original masks was unanimously selected as the final mask;
  • For 11 lesions, a new final mask was created because neither original mask received unanimous approval;
  • For one lesion (DSC = 0.56), the final mask was obtained by intersecting the two original masks, both of which were unanimously approved by the experts.

Recalculation of DSC values for the entire sample after re-evaluation of the 55 cases demonstrated changes in agreement. In 27% of cases (n = 15), DSC values decreased after applying the algorithm. For the entire sample, the DSC after consensus reached 0.74 [0.62; 0.79].

The distribution of DSC values across PI-RADS categories deviated significantly from normality (Shapiro–Wilk test, p <  0.001); therefore, nonparametric methods were used for comparison. The distribution of baseline DSC values by PI-RADS category is presented in Table 2.

 

Table 2. Distribution of baseline Dice–Sørensen similarity coefficient values by PI-RADS category of prostate lesions

PI-RADS category

Dice–Sørensen similarity coefficient

PI-RADS 3 (intermediate likelihood of clinically significant prostate cancer)

0.54 [0.37; 0.67]

PI-RADS 4 (high likelihood of clinically significant prostate cancer)

0.61 [0.52; 0.71]

PI-RADS 5 (very high likelihood of clinically significant prostate cancer)

0.68 [0.59; 0.76]

Note. Data are presented as Me [Q1; Q3], where Me is the median and Q1 and Q3 are the 1st and 3rd quartiles, respectively; PI-RADS, Prostate Imaging Reporting and Data System.

 

A correlation analysis of the relationship between segmentation agreement (DSC) and the severity of lesions (PI-RADS category) demonstrated a weak, significant positive association (ρ = 0.3, p = 0.01): the higher the PI-RADS category, the greater the agreement of the segmentation masks.

After re-evaluation, statistical analysis demonstrated no significant association between lesion category and DSC (ρ = 0.09, p = 0.42). Thus, the proposed consensus algorithm reduces variability in segmentation agreement and increases DSC values, including for PI-RADS 3 lesions with indistinct boundaries.

A comparison of baseline and post-consensus DSC values (see Fig. 6a) showed that the median DSC increased significantly after applying the consensus algorithm (one-sided paired Wilcoxon test, p = 0.01). A correlation analysis between the absolute difference of baseline and new DSC values and PI-RADS category (see Fig. 6b) revealed a weak, significant negative correlation (ρ = −0.2, p = 0.04). Thus, as the PI-RADS category increases, the absolute DSC gain decreases.
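For readers unfamiliar with the paired test mentioned above, the sketch below computes a one-sided Wilcoxon signed-rank test using the normal approximation, without tie or continuity corrections. It is not the authors' R code: the function name `wilcoxon_greater` is hypothetical, the data are invented, and statistical packages may use an exact method for small samples, so p values can differ. The example stores DSC values in hundredths so that the differences are exact integers.

```python
import math


def wilcoxon_greater(before, after):
    """One-sided paired signed-rank test (alternative: after > before).

    Returns (W+, p), where W+ is the rank sum of positive differences and
    p comes from the normal approximation.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences; ties share the mean of their positions.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return w_plus, p


# Invented DSC values in hundredths for eight lesions.
baseline = [50, 60, 55, 70, 45, 65, 40, 58]
consensus = [60, 65, 75, 68, 60, 73, 70, 70]

w_plus, p_value = wilcoxon_greater(baseline, consensus)
print(w_plus, p_value < 0.05)  # 35.0 True
```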

 

Fig. 6. Comparison of Dice–Sørensen similarity coefficient values before and after applying the consensus labeling algorithm: a, median comparison; b, association with PI-RADS category. PI-RADS, Prostate Imaging Reporting and Data System.

 

DISCUSSION

Summary of Primary Results

A consensus-based algorithm for labeling prostate lesions was developed and validated, enabling standardization and improved transparency of the consensus procedure.

Inter-expert agreement in manual segmentation of prostate lesions was evaluated. The proposed consensus labeling algorithm resulted in a significant increase in the DSC (p = 0.01).

Moreover, the consensus algorithm made segmentation agreement independent of lesion category: the weak positive association between PI-RADS lesion category and DSC observed in the initial dataset was no longer present after re-evaluation of cases using the proposed algorithm. The majority of cases requiring re-evaluation belonged to PI-RADS categories 3 and 4 (74%).

Discussion of Primary Results

This work describes an algorithm for consensus-based labeling of focal prostate lesions. The necessity for such an algorithm stems from the lack of standards for achieving consensus in international publications, as well as the need to prepare datasets for various purposes, including the development and validation of radiomic models.

The proposed algorithm is primarily aimed at segmentation of objects with low contrast relative to surrounding tissues [20], which in this case refers to focal prostate lesions. The use of automated segmentation algorithms in routine clinical practice remains limited, especially for indeterminate lesions classified as PI-RADS 3.

Manual segmentation of prostate lesions is characterized by high operator dependence, which is confirmed by our findings (DSC 0.61 [0.48; 0.73]) and is consistent with previously published data [9, 10]. Subgroup analysis of the baseline dataset demonstrated a weak positive association between DSC values and PI-RADS lesion category (ρ = 0.3, p = 0.01), indicating that with increasing lesion grade and contrast [4], inter-expert agreement increases slightly. Nevertheless, even for lesions with a high likelihood of clinically significant cancer, DSC values remain moderate (see Table 2). Radiomic analysis is most clinically valuable for prostate lesions with intermediate likelihood of clinically significant cancer (PI-RADS 3), which are also the most challenging objects for segmentation. Therefore, the findings underscore the necessity of developing new labeling approaches for such lesions aimed at reducing operator dependence.

The high variability of masks generated by several experts hampers the reproducibility of texture features [8]. Consequently, to minimize this variability and obtain a reliable ground truth, involvement of several annotators followed by a consensus procedure is required.

At present, no standardized protocols for consensus labeling are available in the international scientific community. In many studies, the description of the agreement process is limited to the use of the term “consensus” without further clarification of how final mask boundaries were determined [13–15]. For example, in the study by Cuocolo et al. [13], an additional expert was invited in complex cases and could, if necessary, modify the proposed masks or create new ones. Such an approach is entirely dependent on the expertise of the invited specialist and allows substantial subjectivity that remains unverified and uncontested.

The above-mentioned ESR and EORTC guidelines [12] propose the sequential correction of a single mask by two or more experts until a presumed consensus is reached. However, this method does not specify the number of iterations required and has no clear criteria for determining which mask should be considered final.

In the study by Kocak et al. [21], iterative correction was applied to generate consensus masks when analyzing lesions of the pituitary gland, breast, and kidneys to assess the reproducibility of radiomic features. Twelve radiologists with varying levels of experience participated in the study, and those with greater professional experience corrected the masks produced by their less experienced colleagues. Consequently, the final decision remained with the most experienced physician (the head of the radiology department). However, according to Chen et al. [9], DSC values exhibited substantial variability even among experienced radiologists, supporting the hypothesis that experience in biomedical image annotation may be more important than subspecialty clinical experience [12].

A distinguishing feature of the proposed algorithm is the detailed documentation of all stages of consensus labeling, which renders the segmentation process transparent and comprehensible for radiologists. Moreover, the absence of a requirement to involve additional experts reduces the risk of bias and improves labeling reproducibility. The algorithm significantly increases inter-annotator agreement: the median DSC increased from 0.61 to 0.74 when consensus labeling was applied. As expected, the majority of re-evaluated cases (74%) consisted of lesions classified as PI-RADS 3 and 4. The significant increase in DSC observed both for the entire dataset and for these categories is confirmed by group comparisons and correlation analysis (see Fig. 6). The proposed algorithm may therefore serve as a practical tool for medical image segmentation when preparing annotated datasets.

Further research is needed to determine the optimal duration of the “forgetting” period and to explore strategies to shorten it. To our knowledge, neither Russian nor international publications currently address the forgetting period concept in biomedical image segmentation.

STUDY LIMITATIONS

Our study has several limitations:

  • First, the DSC threshold was set at 0.75 based on the authors' consensus in the absence of reported reference values;
  • Second, the one-month forgetting period does not fully eliminate potential expert bias;
  • Third, we were unable to completely avoid the open discussion stage in one of the algorithm scenarios, specifically when both initial masks were rejected by the experts. A possible solution to this issue, given sufficient time resources, would be to repeat the algorithm for such lesions;
  • Fourth, only two radiologists participated in the study; therefore, further validation of the proposed algorithm is required with a larger number of experts, including application to lesions from other anatomical regions.

Furthermore, an analysis of published works on radiomic analysis and medical image segmentation reveals substantial variability in how the term “consensus” is interpreted by different authors. A publication by Jones et al. [22], which discusses consensus methods in healthcare, defines their primary objective as determining the degree of agreement among a group of experts on a specific issue or problem. One of the most important qualities of consensus methods is anonymity, which helps prevent dominance of a single participant’s opinion. Other key aspects are the iterative nature of the process, access to comments from other experts, and feedback. Based on this definition, the consensus methods currently applied in medical image segmentation do not fully comply with these principles, particularly regarding anonymity. Thus, it can be concluded that only various forms of consensus-like methods are used in medical image annotation, including in the present study.

CONCLUSION

The developed consensus achievement algorithm for manual segmentation of prostate lesions provides a detailed description of each step and helps reduce the subjective influence of annotators on the final result by eliminating the stage in which the final decision is made by a single expert. This algorithm may become a useful tool in texture analysis, facilitating the implementation of radiomics in routine practice. Furthermore, it highlights the need for new approaches to biomedical image segmentation to reduce the influence of the human factor.

The findings confirm the substantial degree of operator dependence in manual segmentation of prostate lesions. At the same time, a slight increase in inter-expert agreement is observed with increasing PI-RADS category.

ADDITIONAL INFORMATION

Author contributions: M.O. Romanenko: writing—original draft, formal analysis; M.R. Kodenko: formal analysis; P.B. Gelezhe: investigation, formal analysis; I.A. Blokhin: conceptualization, methodology, formal analysis, writing—review & editing; R.V. Reshetnikov: writing—review & editing. All the authors approved the version of the manuscript to be published and agreed to be accountable for all aspects of the work, ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Ethics approval: The study was approved by the Independent Ethics Committee at the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies (Minutes No. 06/2025, dated June 19, 2025).

Funding sources: This article was prepared as part of the Scientific Justification of Radiology Modalities for Tumor Diseases Using Radiomics Analysis research project (Unified State Information Accounting System No. 123031500005-2), in accordance with Order of the Moscow City Health Department No. 1196 On Approval of State Assignments Funded by the Budget of the City of Moscow for State Budgetary (Autonomous) Institutions Under the Jurisdiction of the Moscow City Health Department for 2023 and the planned period of 2024–2025, dated December 21, 2022.

Disclosure of interests: The authors have no relationships, activities, or interests for the last three years related to for-profit or not-for-profit third parties whose interests may be affected by the content of the article.

Statement of originality: No previously obtained or published material (text, images, or data) was used in this study or article.

Data availability statement: The editorial policy regarding data sharing does not apply to this work.

Generative AI: No generative artificial intelligence technologies were used to prepare this article.

Provenance and peer-review: This article was submitted unsolicited and reviewed following the standard procedure. The peer review process involved two external reviewers, a member of the Editorial Board, and the in-house science editor.

 

1 State registration certificate of database No. 2024620575 of February 6, 2024. Bull. No. 2. Vasiliev Y.A., Blokhin I.A., Gelezhe P.B. et al. Biparametric prostate MRI dataset with histological verification. Available at: https://www.elibrary.ru/download/elibrary_60779494_94785287.PDF Accessed on: October 21, 2024.

2 3D Slicer image computing platform. In: 3D Slicer [Internet]. 2005–2024. Available at: https://slicer.org/. Accessed on: September 21, 2024.


About the authors

Maria O. Romanenko

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Author for correspondence.
Email: RomanenkoMO@zdrav.mos.ru
ORCID iD: 0009-0006-1557-0374
SPIN-code: 8204-5924
Russian Federation, Moscow

Maria R. Kodenko

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies; Bauman Moscow State Technical University

Email: KodenkoM@zdrav.mos.ru
ORCID iD: 0000-0002-0166-3768
SPIN-code: 5789-0319

Cand. Sci. (Engineering)

Russian Federation, Moscow; Moscow

Pavel B. Gelezhe

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies; European Medical Center

Email: gelezhe.pavel@gmail.com
ORCID iD: 0000-0003-1072-2202
SPIN-code: 4841-3234

MD, Cand. Sci. (Medicine)

Russian Federation, Moscow

Ivan A. Blokhin

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: BlokhinIA@zdrav.mos.ru
ORCID iD: 0000-0002-2681-9378
SPIN-code: 3306-1387

MD, Cand. Sci. (Medicine)

Russian Federation, Moscow

Roman V. Reshetnikov

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies

Email: ReshetnikovRV1@zdrav.mos.ru
ORCID iD: 0000-0002-9661-0254
SPIN-code: 8592-0558

Cand. Sci. (Physics and Mathematics)

Russian Federation, Moscow

References

  1. Smith CP, Harmon SA, Barrett T, et al. Intra- and interreader reproducibility of PI-RADSv2: a multireader study. Journal of Magnetic Resonance Imaging. 2018;49(6):1694–1703. doi: 10.1002/jmri.26555
  2. Vasilev YuA, Omelyanskaya OV, Vladzymyrskyy AV, et al. Comparison of multiparametric and biparametric magnetic resonance imaging protocols for prostate cancer diagnosis by radiologists with different experience. Digital Diagnostics. 2023;4(4):455–466. doi: 10.17816/dd322816 EDN: PVEPWX
  3. Borghesi M, Ahmed H, Nam R, et al. Complications after systematic, random, and image-guided prostate biopsy. European Urology. 2017;71(3):353–365. doi: 10.1016/j.eururo.2016.08.004 EDN: YXGSZX
  4. Nikolaev AE, Blohin IA, Shapiev AN, et al. Application of the PI-RADS system in MR diagnostics of the prostate gland: methodological recommendations. Moscow: Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies; 2019. (In Russ.) EDN: TTGQTA
  5. Zhong J, Lu J, Zhang G, et al. An overview of meta-analyses on radiomics: more evidence is needed to support clinical translation. Insights into Imaging. 2023;14(1):111. doi: 10.1186/s13244-023-01437-2 EDN: SMPQCJ
  6. Chiacchio G, Castellani D, Nedbal C, et al. Radiomics vs radiologist in prostate cancer. Results from a systematic review. World Journal of Urology. 2023;41(3):709–724. doi: 10.1007/s00345-023-04305-2 EDN: HPNNUD
  7. Telecan T, Andras I, Crisan N, et al. More than meets the eye: using textural analysis and artificial intelligence as decision support tools in prostate cancer diagnosis—a systematic review. Journal of Personalized Medicine. 2022;12(6):983. doi: 10.3390/jpm12060983 EDN: TIZZEK
  8. Whybra P, Spezi E. Sensitivity of standardised radiomics algorithms to mask generation across different software platforms. Scientific Reports. 2023;13(1):14419. doi: 10.1038/s41598-023-41475-w EDN: LVNVNQ
  9. Chen MY, Woodruff MA, Dasgupta P, Rukin NJ. Variability in accuracy of prostate cancer segmentation among radiologists, urologists, and scientists. Cancer Medicine. 2020;9(19):7172–7182. doi: 10.1002/cam4.3386 EDN: LCYCRN
  10. Jeganathan T, Salgues E, Schick U, et al. Inter-rater variability of prostate lesion segmentation on multiparametric prostate MRI. Biomedicines. 2023;11(12):3309. doi: 10.3390/biomedicines11123309 EDN: ZCDYWR
  11. Ghafoor S, Steinebrunner F, Stocker D, et al. Index lesion contouring on prostate MRI for targeted MRI/US fusion biopsy – Evaluation of mismatch between radiologists and urologists. European Journal of Radiology. 2023;162:110763. doi: 10.1016/j.ejrad.2023.110763 EDN: QLSWGX
  12. deSouza NM, van der Lugt A, Deroose CM, et al; European Society of Radiology. Standardised lesion segmentation for imaging biomarker quantitation: a consensus recommendation from ESR and EORTC. Insights into Imaging. 2022;13(1):159. doi: 10.1186/s13244-022-01287-4 EDN: ONUHSE
  13. Cuocolo R, Stanzione A, Ponsiglione A, et al. Clinically significant prostate cancer detection on MRI: a radiomic shape features study. European Journal of Radiology. 2019;116:144–149. doi: 10.1016/j.ejrad.2019.05.006 EDN: XBKHNN
  14. Cuocolo R, Comelli A, Stefano A, et al. Deep learning whole-gland and zonal prostate segmentation on a public MRI dataset. Journal of Magnetic Resonance Imaging. 2021;54(2):452–459. doi: 10.1002/jmri.27585 EDN: GNOJHL
  15. Schelb P, Kohl S, Radtke JP, et al. Classification of cancer at prostate MRI: deep learning versus clinical PI-RADS assessment. Radiology. 2019;293(3):607–617. doi: 10.1148/radiol.2019190938
  16. Kocak B, Akinci D’Antonoli T, Mercaldo N, et al. METhodological RadiomICs Score (METRICS): a quality scoring tool for radiomics research endorsed by EuSoMII. Insights into Imaging. 2024;15(1):8. doi: 10.1186/s13244-023-01572-w EDN: CINMDC
  17. Murre JMJ, Chessa AG. Why Ebbinghaus' savings method from 1885 is a very ‘pure' measure of memory performance. Psychonomic Bulletin & Review. 2023;30(1):303–307. doi: 10.3758/s13423-022-02172-3
  18. Chow SC, Wang H, Shao J. Sample size calculations in clinical research. 2nd ed. Chapman and Hall/CRC; 2007.
  19. Blokhin IA, Kodenko MR, Shumskaya YF, et al. Hypothesis testing using R. Digital Diagnostics. 2023;4(2):238–247. doi: 10.17816/DD121368 EDN: OEKDAG
  20. Kallie CS, Legge GE, Yu D. Identification and detection of simple 3D objects with severely blurred vision. Investigative Opthalmology & Visual Science. 2012;53(13):7997. doi: 10.1167/iovs.12-10013
  21. Kocak B, Yardimci AH, Nazli MA, et al. REliability of consensus-based segMentatIoN in raDiomic feature reproducibility (REMIND): A word of caution. European Journal of Radiology. 2023;165:110893. doi: 10.1016/j.ejrad.2023.110893 EDN: VBDFCG
  22. Jones J, Hunter D. Qualitative Research: Consensus methods for medical and health services research. BMJ. 1995;311(7001):376–380. doi: 10.1136/bmj.311.7001.376 EDN: CBNBSJ

Copyright (c) 2025 Eco-Vector

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

This mass media outlet is registered by the Federal Service for Supervision of Communications, Information Technology, and Mass Media (Roskomnadzor).
Registration number and date of the registration decision: series PI No. FS 77 - 79539, dated November 9, 2020.