Purpose or Learning Objective:
The purpose of this study was to evaluate the diagnostic performance of a commercially available artificial intelligence (AI) system in detecting brain pathologies on non-contrast head CT scans, using radiologist consensus as the reference standard. Early iterations of such software were designed primarily for simple triage, often flagging a single condition such as intracranial hemorrhage; modern algorithms have evolved into comprehensive diagnostic support tools capable of identifying a broad spectrum of "critical" brain pathologies. AI systems that detect multiple brain pathologies on non-contrast CT can improve diagnostic speed and accuracy, particularly in emergency settings. By automating the detection of critical acute findings, these tools can reduce reporting delays, support radiologists, and enhance care delivery in high-demand or resource-constrained hospitals. Although sensitivity for chronic abnormalities is lower, post-deployment optimisation and threshold adjustment can increase reliability and broaden the system’s clinical utility.
Methods or Background:
This retrospective validation study utilized a dataset of 67 anonymized non-contrast head CT (NCCT) studies. The AI model (BrainScan) was evaluated on its ability to detect 15 distinct brain pathologies, categorized into four primary etiological groups:
Hemorrhagic: e.g., intraparenchymal hemorrhage, subarachnoid hemorrhage, subdural hematoma, epidural hematoma.
Vascular: e.g., ischemic stroke/infarcts.
Traumatic: e.g., cerebral contusion.
Structural/Mass Effect: e.g., midline shift, mass effect, hydrocephalus.
Ground truth was established through the blinded consensus of two radiologists.
Diagnostic Accuracy: Sensitivity and specificity were calculated.
Predictive Values: Positive predictive value (PPV) and negative predictive value (NPV) were computed to estimate the probability that a positive or negative AI output is correct in a clinical setting.
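These metrics follow directly from the 2x2 table of AI output against radiologist consensus. The Python sketch below shows the standard definitions using hypothetical counts that are not taken from this study. Because PPV and NPV depend on disease prevalence, it also applies Bayes' theorem to estimate PPV and NPV at a different prevalence; the names `Confusion`, `ppv_at_prevalence` and `npv_at_prevalence` are illustrative assumptions, not part of the BrainScan software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 counts for one pathology versus the radiologist-consensus reference."""
    tp: int  # AI positive, reference positive
    fp: int  # AI positive, reference negative
    fn: int  # AI negative, reference positive
    tn: int  # AI negative, reference negative


def sensitivity(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    return c.tn / (c.tn + c.fp)


def ppv(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)


def npv(c: Confusion) -> float:
    return c.tn / (c.tn + c.fn)


def ppv_at_prevalence(sens: float, spec: float, prev: float) -> float:
    """Bayes' theorem: expected PPV for a given sensitivity, specificity and prevalence."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv_at_prevalence(sens: float, spec: float, prev: float) -> float:
    """Bayes' theorem: expected NPV for a given sensitivity, specificity and prevalence."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical counts for one pathology in a 67-study set (not study data).
    c = Confusion(tp=8, fp=4, fn=2, tn=53)
    print(f"sens={sensitivity(c):.2f} spec={specificity(c):.2f} "
          f"PPV={ppv(c):.2f} NPV={npv(c):.2f}")
    # Same kind of operating point applied to a rarer condition (assumed 2% prevalence).
    print(f"PPV at 2% prevalence: {ppv_at_prevalence(0.85, 0.94, 0.02):.2f}")
```

With these hypothetical counts PPV is about 0.67, whereas applying the acute-subgroup sensitivity (0.85) and specificity (0.94) at an assumed 2% prevalence gives a PPV of only about 0.22. This prevalence dependence is one reason positive AI outputs still need radiologist verification.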
Global Performance:
F1-Score: Used to evaluate the balance between precision (PPV) and recall (sensitivity), as their harmonic mean.
Matthews Correlation Coefficient (MCC): Selected as a robust metric for binary classification because it remains informative when classes are imbalanced (e.g., rare pathologies).
Area Under the Receiver Operating Characteristic Curve (AUC-ROC): Calculated to assess the model's aggregate performance across all decision thresholds.
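The global metrics can also be computed from scratch. The sketch below is illustrative only: it assumes continuous per-study AI scores are available for the AUC (the abstract does not state how BrainScan exposes its scores), and it uses the Mann-Whitney formulation of AUC, which equals the trapezoidal area under the empirical ROC curve. All function names are hypothetical.

```python
import math
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient.

    Returns 0.0 when any row or column of the 2x2 table is empty,
    the usual convention when MCC is otherwise undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive case outscores a random negative case.

    Ties count as one half (Mann-Whitney formulation of AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A model that never flags a rare finding, such as `mcc(tp=0, fp=0, fn=5, tn=62)`, still reaches about 93% accuracy on those 67 hypothetical cases, yet this sketch returns an MCC of 0.0, which illustrates why MCC is preferred when classes are imbalanced.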
Statistical Analysis:
McNemar’s test was used to assess the statistical significance of differences between the paired binary classifications of the AI system and the radiologist consensus.
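McNemar’s test uses only the discordant pairs, i.e. cases on which the two paired classifications disagree. The abstract does not state whether the exact or the asymptotic version was used, so this sketch implements both with the Python standard library (`math.comb` requires Python 3.8 or later). The function name and the counts in the usage note are hypothetical.

```python
import math


def mcnemar(b: int, c: int, exact: bool = True) -> float:
    """Two-sided McNemar p-value from the discordant pair counts.

    b: cases positive by method A but negative by method B
    c: cases negative by method A but positive by method B
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    if exact:
        # Exact binomial test: under H0, b ~ Binomial(n, 0.5); double the smaller tail.
        k = min(b, c)
        tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
        return min(1.0, 2 * tail)
    # Chi-square approximation (1 df) with continuity correction.
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical: 3 cases flagged only by the AI, 5 only by the reference.
    print(f"exact p = {mcnemar(3, 5):.3f}")
```

For the hypothetical counts shown, `mcnemar(3, 5)` gives p ≈ 0.73. A non-significant result indicates no detectable systematic imbalance between the two discordant counts; on its own it does not establish formal equivalence.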
Results or Findings:
The AI system showed a mean sensitivity of 0.61, a mean specificity of 0.90, and an area under the curve (AUC) of approximately 0.88. Top-performing categories, each with an AUC ≥ 0.95, included intraventricular haemorrhage, cerebral contusion, oedema, subarachnoid haemorrhage and tumours with a haemorrhagic component. McNemar’s test showed no significant difference (p > 0.05) between AI predictions and radiologist consensus, indicating comparable overall diagnostic performance. Positive predictive value varied (mean: 0.57), highlighting the importance of radiologist verification of positive AI outputs. Subgroup analysis revealed stronger AI performance on acute pathologies (accuracy approximately 0.90, sensitivity approximately 0.85, specificity approximately 0.94) than on chronic conditions (accuracy approximately 0.82, sensitivity approximately 0.61, specificity approximately 0.85). False-negative analysis showed that most missed cases were chronic subdural haematomas. Lower sensitivity in chronic conditions was attributed to subtle imaging features, small lesion sizes, and the AI system’s high decision threshold, which optimised specificity at the expense of sensitivity.
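The threshold trade-off described above can be made concrete by sweeping the threshold applied to a per-study AI score. This sketch assumes a continuous score in [0, 1] is available (the abstract does not describe the system's output format) and uses hypothetical scores in which subtle chronic positives score lower than acute ones. `highest_threshold_for_sensitivity` is a hypothetical helper that returns the operating point with the best specificity that still meets a target sensitivity.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical per-study scores: two acute positives score high,
# three subtle chronic positives score lower (not study data).
SCORES = [0.95, 0.90, 0.62, 0.55, 0.40, 0.70, 0.35, 0.20, 0.10, 0.05]
LABELS = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def sens_spec_at(threshold: float, scores: Sequence[float],
                 labels: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity when a score >= threshold counts as positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def highest_threshold_for_sensitivity(
        target: float, scores: Sequence[float],
        labels: Sequence[int]) -> Optional[Tuple[float, float, float]]:
    """Scan candidate thresholds from high to low; return the first meeting the target.

    Lowering the threshold can only add positive calls, so specificity never
    rises as we descend: the first qualifying threshold has the best specificity.
    """
    for t in sorted(set(scores), reverse=True):
        sens, spec = sens_spec_at(t, scores, labels)
        if sens >= target:
            return t, sens, spec
    return None


if __name__ == "__main__":
    print("threshold 0.80:", sens_spec_at(0.80, SCORES, LABELS))
    print("target sens 0.80:", highest_threshold_for_sensitivity(0.80, SCORES, LABELS))
```

With these hypothetical scores, a high threshold of 0.80 gives sensitivity 0.4 and specificity 1.0, while the helper selects 0.55, raising sensitivity to 0.8 at a specificity of 0.8.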
Conclusion:
The AI system demonstrated strong diagnostic performance across acute and chronic brain pathologies, with high specificity and strong agreement with radiologist consensus. It performed well in detecting critical acute conditions, such as cerebral contusion and subarachnoid haemorrhage, making it a valuable tool for triage or second reading in emergency settings. The system’s lower sensitivity for chronic pathologies highlights areas for improvement, particularly in detecting subtle findings; optimising the decision threshold post-deployment could improve detection of chronic abnormalities. Compared with previous studies, this AI system represents a significant advancement in neuroimaging, with the potential to enhance diagnostic speed, accuracy, and patient care in high-volume, resource-constrained environments. The study also sought to investigate the system's potential as a dual-purpose adjunct that supports both clinical workflow and resident education. In the high-pressure environment of emergency radiology, the AI acts as an "always-on" second reader, prioritizing critical cases on the worklist to shorten time to detection and reduce turnaround times for life-threatening conditions. Beyond efficiency, this integration serves a vital educational function for radiology residents and trainees.
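One way to picture worklist prioritisation is a priority queue in which AI-flagged studies move ahead of routine ones while arrival order is preserved within each tier. This is a simplified, hypothetical sketch: the class and field names are invented for illustration, and it does not model any particular RIS/PACS integration.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class QueuedStudy:
    priority: int  # 0 = flagged critical by the AI, 1 = routine
    arrival: int   # arrival order; keeps first-in, first-out within a tier
    accession: str = field(compare=False)  # identifier, excluded from ordering


class Worklist:
    """Hypothetical reading worklist in which AI-flagged studies are read first."""

    def __init__(self) -> None:
        self._heap: List[QueuedStudy] = []
        self._arrivals = itertools.count()

    def add(self, accession: str, ai_critical: bool) -> None:
        priority = 0 if ai_critical else 1
        heapq.heappush(self._heap, QueuedStudy(priority, next(self._arrivals), accession))

    def next_study(self) -> str:
        """Pop the most urgent study (raises IndexError if the worklist is empty)."""
        return heapq.heappop(self._heap).accession

    def __len__(self) -> int:
        return len(self._heap)
```

Adding CT001 (routine), CT002 (flagged), CT003 (routine) and CT004 (flagged) in that order yields the reading order CT002, CT004, CT001, CT003.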
Limitations: The study used a dataset of only 67 non-contrast head CT studies, a very small sample for validating an AI model intended to detect 15 distinct pathologies; this limits the statistical power and the generalizability of the results to a broader population. As a retrospective study, it evaluated the AI on past data rather than in a live clinical workflow, and so may not fully reflect real-world challenges such as technical artifacts, variable scan quality, or real-time integration issues. Prospective assessment in an active clinical workflow will be required to confirm these findings.
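To illustrate the statistical-power limitation, the sketch below computes a 95% Wilson score interval for a sensitivity estimate. The counts are hypothetical: a pathology with 10 positive cases among the 67 studies, of which the AI detects 6 (sensitivity 0.60). The function name is illustrative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical: 6 of 10 positive cases detected versus 60 of 100.
    print("n=10: ", wilson_interval(6, 10))
    print("n=100:", wilson_interval(60, 100))
```

For 6 of 10 the interval spans roughly 0.31 to 0.83, whereas the same 0.60 estimate from 100 positive cases narrows to roughly 0.50 to 0.69.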
References:
1. Mansour RF, Escorcia-Gutierrez J, Gamarra M, et al. Artificial intelligence with big data analytics-based brain intracranial hemorrhage e-diagnosis using CT images. Neural Comput Appl. 2023;35:16037-16049. doi:10.1007/s00521-021-06240-y
2. Gilotra K, Sujith S, Racheed M, Jade B, Dashti R. Role of artificial intelligence and machine learning in the diagnosis of cerebrovascular disease. Front Hum Neurosci. 2023;17. doi:10.3389/fnhum.2023.1254417
3. Yadav J, More A, Ghosh B, Sinha D, Chavane N, Kumari A, Datta A, Borah A, Bhattacharya P. Implications of artificial intelligence in stroke intervention and care. iRADIOLOGY. 2025;3:115-131. doi:10.1002/ird3.70005
4. AbuAlrob MA, Mesraoua B. Harnessing artificial intelligence for the diagnosis and treatment of neurological emergencies: a comprehensive review of recent advances and future directions. Front Neurol. 2024;15:1485799. doi:10.3389/fneur.2024.1485799
5. Philip AK, Samuel BA, Bhatia S, Khalifa SAM, El-Seedi HR. Artificial intelligence and precision medicine: a new frontier for the treatment of brain tumors. Life. 2023;13(1):24. doi:10.3390/life13010024
6. Vimalesvaran K, Robert D, Kumar S, et al. Assessing the effectiveness of artificial intelligence (AI) in prioritising CT head interpretation: study protocol for a stepped-wedge cluster randomised trial (ACCEPT-AI). BMJ Open. 2024;14:e078227. doi:10.1136/bmjopen-2023-078227
7. Jiang B, et al. Assessing the performance of artificial intelligence models: insights from the American Society of Functional Neuroradiology Artificial Intelligence Competition. AJNR Am J Neuroradiol. 2024;45(9):1276-1283.