Automatic pre-population of normal chest x-ray reports using a high-sensitivity deep learning algorithm: a prospective study of clinical AI deployment (RPS1005b)

Purpose:

To evaluate a high-sensitivity deep learning algorithm for normal/abnormal chest x-ray (CXR) classification by deploying it in a real clinical setting.

Methods and materials:

A commercially available deep learning algorithm (QXR, Qure.ai, India) was integrated into the clinical workflow for 3 months at an outpatient imaging facility. The algorithm, deployed on-premise, was integrated with PACS and RIS such that it automatically analysed all adult CXRs; reports for those determined to be “normal” were automatically pre-populated in the RIS using HL7 messaging. Radiologists reviewed the CXRs as part of their regular workflow and ‘accepted’ or changed the pre-populated reports. Changes in reports were divided into ‘clinically insignificant’ and ‘clinically significant’, following which those CXRs with clinically significant changes were reviewed by a specialist chest radiologist with 8 years’ experience.
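The routing step described above can be sketched as follows. This is a minimal illustration only: the function names, the confidence threshold, and the report text are hypothetical, not taken from the deployed system, and the HL7 message is reduced to the bare MSH/OBR/OBX segments of an ORU^R01 result message.

```python
from typing import Optional

def build_oru_message(accession: str, report_text: str) -> str:
    """Build a minimal HL7 v2 ORU^R01 message carrying a draft report."""
    segments = [
        "MSH|^~\\&|AI|SITE|RIS|SITE|202301010000||ORU^R01|1|P|2.5",
        f"OBR|1|{accession}||CXR^Chest X-ray",
        f"OBX|1|TX|REPORT||{report_text}||||||P",  # P = preliminary result
    ]
    return "\r".join(segments)  # HL7 v2 segments are separated by <CR>

def route_study(accession: str, normal_score: float,
                threshold: float = 0.95) -> Optional[str]:
    """Pre-populate a draft normal report only for high-confidence normals.

    The threshold value is illustrative; abnormal or uncertain studies
    return None and follow the usual reporting workflow.
    """
    if normal_score >= threshold:
        return build_oru_message(accession, "Normal chest radiograph.")
    return None
```

The draft remains preliminary until a radiologist accepts or changes it, matching the review step in the workflow above.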

Results:

A total of 1,970 adult CXRs were analysed by the AI, of which 388 (19.69%) were identified as normal. Of these, 361/388 (93.04%) were accepted by radiologists, and in 14/388 (3.60%) clinically insignificant changes (e.g. increased broncho-vascular markings) were made to the reports. On review of the remaining 13/388 (3.35%) CXRs, 12 were found to have clinically significant findings missed by the AI, including 3 with opacities, 3 with lymphadenopathy, 3 with blunted costophrenic angles, 2 with nodules, and 1 with consolidation.

Conclusion:

This study shows that the identification of normal CXRs can be automated to a substantial degree with very high sensitivity.

Validation of a high precision semantic search tool using a curated dataset containing related and unrelated reports of clinically relevant search terms (RPS 1005b)

Purpose:

To validate a semantic search tool by testing its search results for complex terms.

Methods and materials:

The tool consists of two pipelines: an offline indexing pipeline and a querying pipeline. The raw text of both reports and queries was first passed through a set of pre-processing steps: sentence tokenisation, spelling correction, negation detection, and word sense disambiguation. The text was then transformed into a concept plane, followed by indexing or querying. During querying, additional concepts were added using a query expansion technique to include nearby related concepts. Validation was performed on a set of 30 search queries carefully curated by two radiologists. Reports related to the search queries were randomly selected with the help of keyword search, and the text was re-read to determine its suitability to the queries; these reports formed the “related” group. Similarly, reports that did not satisfy the context of the search queries were categorised as the “not related” group. A set of 5 search queries and 250 reports was used for initial tuning of the model. A total of 500 reports for the 10 test queries formed the corpus of the test set. The search results for each test query were evaluated and appropriate statistical analysis was performed.
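The pre-processing chain above can be sketched as a sequence of small steps. Every component here is a deliberately trivial stand-in for the real one (e.g. a NegEx-style rule set for negation detection, a clinical concept mapper for the concept plane); the concept IDs and rules are illustrative assumptions, not the tool's actual implementation.

```python
import re

def tokenise_sentences(text: str) -> list:
    """Split report text into sentences (toy tokeniser)."""
    return [s.strip() for s in re.split(r"[.?!]", text) if s.strip()]

def detect_negation(sentence: str) -> bool:
    """Stand-in for a NegEx-style negation rule set."""
    return bool(re.search(r"\b(no|without|absent)\b", sentence.lower()))

# Toy term-to-concept map standing in for a UMLS-style concept mapper.
CONCEPT_MAP = {"effusion": "C0013687", "pneumothorax": "C0032326"}

def to_concepts(sentence: str) -> list:
    """Map surface terms in a sentence onto the concept plane."""
    return [cid for term, cid in CONCEPT_MAP.items()
            if term in sentence.lower()]

def index_report(text: str) -> list:
    """Run the chain: sentence tokenisation -> negation -> concept mapping."""
    return [
        {"sentence": s, "negated": detect_negation(s),
         "concepts": to_concepts(s)}
        for s in tokenise_sentences(text)
    ]
```

At query time the same chain would run on the query text, with query expansion adding nearby related concepts before matching against the index.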

Results:

The average precision and recall on the 10 unseen queries, evaluated on a small per-query corpus containing related and unrelated reports, were 0.54 and 0.42, respectively. On a larger corpus containing 60,000 reports, the average precision for these 15 queries was 0.6.
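The per-query precision and recall figures above follow the standard set-based definitions, computed against the curated “related”/“not related” labels. A minimal sketch (the example sets below are illustrative only):

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Set-based precision and recall for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall
```

The reported averages are then the mean of these per-query values across the test queries.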

Conclusion:

We describe a method to clinically validate a semantic search tool with high precision.