Purpose or Learning Objective:
To evaluate and compare the performance of two AI tools for intracranial hemorrhage (ICH) detection on a large retrospective dataset, with emphasis on ensemble and sequential strategies. The two tools were a commercially available triage tool (prioritizing specificity and workflow efficiency) and a quantitative research algorithm (prioritizing high sensitivity and volumetric analysis).
Crucially, this study aims to determine the recommended workflow strategy, and specifically whether combining these models via logical operators can overcome the inherent "sensitivity-specificity trade-off" of single-model systems. By investigating Parallel (Boolean 'OR') and Sequential (Multi-stage Pipeline) architectures, the study seeks to establish a clinical integration framework that maximizes the detection of subtle, life-threatening hemorrhages while minimizing false positives to reduce radiologist alert fatigue.
Methods or Background:
A total of 1,152 non-contrast computed tomography (NCCT) head studies were retrospectively collected and analyzed to form the dataset. To ensure the assessment focused on acute pathology, 160 cases of chronic hemorrhage were excluded from the study. The remaining studies were processed using two distinct models: AI 1, the CINA-ICH v1.0.10 commercial triage tool, and AI 2, the CINA-Quantix quantitative research tool. Ground truth labels were established by board-certified radiologists, with any discordant cases resolved through a consensus process to ensure high diagnostic accuracy.
The AI outputs were stratified by hemorrhage volume, categorizing results into no hemorrhage, hemorrhage ≥5 ml, and small-volume hemorrhage <5 ml. Furthermore, a secondary review was conducted to analyze the specific causes of False Positives (FP) and False Negatives (FN). The core of the methodology involved evaluating two ensemble approaches: an "OR" strategy, where a case was flagged if either tool detected a hemorrhage, and a "Sequential" pipeline, which applied the high-specificity AI 1 first and, only if positive, utilized the high-sensitivity AI 2 for confirmation and quantification.
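The two combination rules and the volume stratification described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation: the `Study` record, its field names, and the function names are hypothetical, and the sketch assumes that "confirmation" in the Sequential pipeline means a case is flagged only when AI 2 also returns a positive result, which the text does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Study:
    """Hypothetical per-study record of the two tools' outputs."""
    ai1_positive: bool                     # AI 1 (triage tool) flag
    ai2_positive: bool                     # AI 2 (quantitative tool) flag
    ai2_volume_ml: Optional[float] = None  # AI 2 volume estimate, if any


def or_strategy(study: Study) -> bool:
    """Parallel (OR): flag the case if either tool detects a hemorrhage."""
    return study.ai1_positive or study.ai2_positive


def sequential_strategy(study: Study) -> bool:
    """Sequential: run AI 1 first; consult AI 2 only if AI 1 is positive.

    Assumption: the case is flagged only when AI 2 confirms AI 1.
    """
    if not study.ai1_positive:
        return False  # AI 2 is never consulted
    return study.ai2_positive


def volume_stratum(volume_ml: Optional[float]) -> str:
    """Assign the volume category used for stratification (5 ml cutoff)."""
    if volume_ml is None or volume_ml <= 0:
        return "no hemorrhage"
    return ">=5 ml" if volume_ml >= 5 else "<5 ml"
```

Under this reading, a study that only AI 2 flags is positive under the OR rule but negative under the Sequential rule, which is where the two strategies diverge.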
Results or Findings:
The analysis of error modes revealed distinct patterns for each tool. False Negative cases (n=32) were predominantly associated with subtle hemorrhages (38%), infarcts with hemorrhagic transformation (28%), and post-operative cases (16%). Conversely, False Positive cases (n=72) were largely driven by mimics such as hyperdense brain masses (31%) and post-operative changes (17%), which the algorithms struggled to differentiate from acute blood.
Regarding performance metrics, the standalone tools displayed characteristic trade-offs: AI 1 achieved an accuracy of 0.774 with a specificity of 0.845, but a lower sensitivity of 0.684. In contrast, AI 2 demonstrated superior sensitivity at 0.928 but suffered from a lower specificity of 0.518 and a Positive Predictive Value (PPV) of 0.609.
The ensemble strategies differed in their effect. The Parallel (OR) strategy maximized sensitivity at 0.93, ensuring fewer missed cases, but failed to improve specificity (0.51). The Sequential pipeline performed best overall, with an accuracy of 0.85, a PPV of 0.82, a sensitivity of 0.99, and a specificity of 0.63. The primary source of error for AI 1 remained subtle hemorrhages, whereas AI 2 was most frequently confounded by hyperdense lesions mimicking hemorrhage.
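For reference, the reported metrics follow the standard confusion-matrix definitions, sketched below. The function name and the example counts are illustrative assumptions; the counts are not taken from the study, whose per-strategy confusion matrices are not given here.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up counts for illustration only (not study data).
example = diagnostic_metrics(tp=8, fp=2, tn=6, fn=4)
```

Because PPV depends on how many positive cases the dataset contains, values computed this way on an enriched retrospective cohort may not carry over to a setting with a different hemorrhage prevalence.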
Conclusion:
Both AI tools demonstrated robust performance within their respective design parameters, yet neither functioned as a perfect standalone solution. The quantitative tool (AI 2) exhibited superior sensitivity alongside its volumetric analysis, while the triage tool (AI 1) provided the high specificity necessary for efficient workflow management.
The study demonstrates that a Sequential application strategy, using the tools in a pipeline rather than in isolation or in parallel, improves diagnostic accuracy and reduces false positives. This suggests that the clinical integration of such multi-stage pipelines can optimize ICH detection by balancing patient safety with radiologist efficiency. However, challenges persist, particularly in the reliable detection of very subtle or small-volume hemorrhages and the differentiation of hemorrhage from hyperdense mimics, indicating areas for future algorithmic refinement.
Limitations: Clinical use of a sequential AI framework will require stringent regulatory clearance. Combining separate frameworks may also amplify bias.
References:
1. Takala J, Peura H, Pirinen R, et al. High sensitivity in spontaneous intracranial hemorrhage detection from emergency head CT scans using ensemble-learning approach. Sci Rep. 2025;15(1):29919. Published 2025 Aug 15. doi:10.1038/s41598-025-15835-7
2. Khoruzhaya AN, Sakharova PA, Arzamasov KM, Kremneva EI, Burenchev DV, Erizhokov RA, Omelyanskaya OV, Vladzymyrskyy AV, Vasilev YA. Standalone AI Versus AI-Assisted Radiologists in Emergency ICH Detection: A Prospective, Multicenter Diagnostic Accuracy Study. Journal of Clinical Medicine. 2025; 14(16):5700. https://doi.org/10.3390/jcm14165700
3. Weissflog, J.S., Keller, E.J., Neymeyer, M.L. et al. Systematic review of commercial artificial intelligence tools for the detection and volume quantification in intracerebral hemorrhage. Eur Radiol (2025). https://doi.org/10.1007/s00330-025-11834-4
4. Alhasan MS, Azzam AY, Alhasan AS, et al. Diagnostic performance and clinical applications of artificial intelligence for intracranial bleeding detection: a meta-analysis. Brain Spine. 2025;5:105866. doi:10.1016/j.bas.2025.105866
5. Agarwal, S., Wood, D., Grzeda, M. et al. Systematic Review of Artificial Intelligence for Abnormality Detection in High-volume Neuroimaging and Subgroup Meta-analysis for Intracranial Hemorrhage Detection. Clin Neuroradiol 33, 943–956 (2023). https://doi.org/10.1007/s00062-023-01291-1
6. Matsoukas, S., Scaggiante, J., Schuldt, B.R. et al. Accuracy of artificial intelligence for the detection of intracranial hemorrhage and chronic cerebral microbleeds: a systematic review and pooled analysis. Radiol med 127, 1106–1123 (2022). https://doi.org/10.1007/s11547-022-01530-4
7. Supriyadi, M., Samah, A., Muliadi, J. et al. A systematic literature review: exploring the challenges of ensemble model for medical imaging. BMC Med Imaging 25, 128 (2025). https://doi.org/10.1186/s12880-025-01667-4
8. Aryendu and Y. Wang, "RAIDER: Rapid AI Diagnosis at Edge Using Ensemble Models for Radiology," in IEEE Access, vol. 12, pp. 115546-115560, 2024, doi: 10.1109/ACCESS.2024.3444601.