Validation of expert system enhanced deep learning algorithm for automated screening for COVID-Pneumonia on chest X-rays

Abstract

The SARS-CoV-2 pandemic exposed the limitations of artificial intelligence-based medical imaging systems. Early in the pandemic, the absence of sufficient training data prevented effective deep learning (DL) solutions for the diagnosis of COVID-19 from X-ray data. Here, to address the gaps in the existing literature and algorithms caused by the paucity of initial training data, we describe CovBaseAI, an explainable tool that uses an ensemble of three DL models and an expert decision system (EDS) for COVID-Pneumonia diagnosis, trained entirely on pre-COVID-19 datasets. The performance and explainability of CovBaseAI were primarily validated on two independent datasets: first, 1401 randomly selected chest X-rays (CxR) from an Indian quarantine center, to assess effectiveness in excluding radiological COVID-Pneumonia requiring higher care; second, a curated dataset of 434 RT-PCR positive cases and 471 non-COVID/normal historical scans, to assess performance in advanced medical settings. CovBaseAI had an accuracy of 87% with a negative predictive value of 98% in the quarantine-center data. However, sensitivity was 0.66–0.90 taking RT-PCR/radiologist opinion as ground truth. This work provides new insights on the usage of EDS with DL methods and the ability of algorithms to confidently predict COVID-Pneumonia, while reinforcing the established learning that benchmarking based on RT-PCR may not serve as reliable ground truth in radiological diagnosis. Such tools can pave the path for multi-modal high throughput detection of COVID-Pneumonia in screening and referral.
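For readers less familiar with the screening metrics quoted above, the following is a minimal sketch of how accuracy, sensitivity and negative predictive value follow from confusion-matrix counts. The function name and the counts are hypothetical illustrations and are not taken from the study.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity and negative predictive value from confusion counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fn) == 0:
        raise ValueError("counts do not allow all metrics to be computed")
    return {
        "accuracy": (tp + tn) / total,       # all correct calls over all cases
        "sensitivity": tp / (tp + fn),       # detected positives over all positives
        "npv": tn / (tn + fn),               # true negatives over all negative calls
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = screening_metrics(tp=8, fp=2, tn=88, fn=2)
print(example)
```

A high negative predictive value is the property that matters most when the aim is to rule out disease at a quarantine center, because it reflects how often a negative call is correct.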



For full paper: http://www.nature.com/articles/s41598-021-02003-w

Device for Assessing Knee Joint Dynamics During Magnetic Resonance Imaging

Abstract

Background:

Knee assessment with and without load using magnetic resonance imaging (MRI) can provide information on knee joint dynamics and improve the diagnosis of knee joint diseases. Performing such studies on a routine MRI scanner requires a load-exerting device during scanning. There is a need for more studies on developing loading devices and evaluating their clinical potential.



Purpose:

To design and develop a portable and easy-to-use axial loading device to evaluate knee joint dynamics during an MRI study.


Study Type:

Prospective study.


Subjects:

Nine healthy subjects.


Field Strength/Sequence:

A 0.25 T standing-open MRI and 3.0 T MRI. PD-T2-weighted FSE, 3D-fast-spoiled-gradient-echo, FS-PD, and CartiGram sequences.


Assessment:

Design and development of the loading device, calibration of loads, and MR safety assessment (using projectile angular displacement, torque, and temperature tests). Scoring system for ease of use. Qualitative (by radiologist) and quantitative (using structural similarity index measure [SSIM]) image-artifact assessment. Evaluation of repeatability, comparison with the load in various standing stances, and loading effect on knee MR parameters (tibiofemoral bone gap [TFBG], femoral cartilage thickness [FCT], tibial cartilage thickness [TCT], femoral cartilage T2-value [FCT2], and tibial cartilage T2-value [TCT2]). The relative percentage change (RPC) in parameters due to the device load was computed.
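The relative percentage change mentioned above can be sketched as follows. The abstract does not state the formula, so the definition used here (change relative to the unloaded value) and its sign convention are assumptions, and the example values are hypothetical.

```python
def relative_percentage_change(unloaded: float, loaded: float) -> float:
    """Percentage change of a parameter under load, relative to the unloaded value.

    Assumed definition: 100 * (loaded - unloaded) / unloaded.
    """
    if unloaded == 0:
        raise ValueError("unloaded value must be non-zero")
    return 100.0 * (loaded - unloaded) / unloaded


# Hypothetical tibiofemoral bone gap values in mm (not taken from the study).
tfbg_unloaded_mm = 5.0
tfbg_loaded_mm = 4.6
print(relative_percentage_change(tfbg_unloaded_mm, tfbg_loaded_mm))
```

With this convention a narrowing of the joint gap yields a negative value; a study may equally report the magnitude of the reduction as a positive number.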


Statistical Test:

Pearson’s correlation coefficient (r).



Results:


The developed device is conditionally MR safe (details in the manuscript and supplementary materials), measures 15 × 15 × 45 cm³, and weighs <3 kg. The ease-of-use score for the device was 4.9/5. The device introduced no visible image artifacts, and an SSIM of 0.9889 ± 0.0153 was observed. The TFBG intraobserver variability (absolute difference) was <0.1 mm. Interobserver variability of all regions of interest was <0.1 mm. The load exerted by the device was close to the load during standing on both legs in the 0.25 T scanner, with r > 0.9. Loading resulted in RPC of 1.5%–11.0%, 7.9%–8.5%, and −1.5% to 13.0% in the TFBG, FCT, and TCT, respectively. FCT2 and TCT2 were reduced by 1.5–2.7 msec and 0.5–2.3 msec, respectively, due to load.



Data Conclusion:


The proposed device is conditionally MR safe, low cost (material cost < INR 6000), portable, and effective in loading the knee joint with up to 50% of body weight.



Evidence Level:


1


Technical Efficacy:


Stage 1



For full paper: https://onlinelibrary.wiley.com/doi/abs/10.1002/jmri.27877

Model for in-vivo estimation of stiffness of tibiofemoral joint using MR imaging and FEM analysis

Abstract

Background:

Appropriate structural and material properties are essential for finite-element-modeling (FEM). In knee FEM, structural information can be extracted through 3D-imaging, but the individual subject’s tissue material properties are inaccessible.


Purpose:

The current study’s purpose was to develop a methodology to estimate the subject-specific stiffness of the tibiofemoral joint using finite-element-analysis (FEA) and MRI data of the knee joint with and without load.


Methods:

In this study, six Magnetic Resonance Imaging (MRI) datasets were acquired from 3 healthy volunteers with axially loaded and unloaded knee joint. The strain was computed from the tibiofemoral bone gap difference (ΔmBGFT) using the knee MR images with and without load. The knee FEM study was conducted using a subject-specific knee joint 3D-model and various soft-tissue stiffness values (1 to 50 MPa) to develop subject-specific stiffness versus strain models.


Results:

Less than 1.02% absolute convergence error was observed during the simulation. The subject-specific combined stiffness of weight-bearing tibiofemoral soft tissue was estimated, with a mean value of 2.40 ± 0.17 MPa. Intra-subject variability during the repeat scan was 0.27, 0.12, and 0.15 MPa in the 3 subjects, respectively. All subject-specific stiffness-strain relationship data were fitted well by a power function (R² = 0.997).
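The idea of a stiffness versus strain model fitted with a power function can be illustrated with a minimal sketch. It assumes a relation of the form stiffness = a · strain^b fitted by least squares in log-log space; the fitting procedure, the function names and the synthetic data are assumptions for illustration and do not reproduce the study's FEM output.

```python
import math


def fit_power_law(strain, stiffness):
    """Fit stiffness = a * strain**b by ordinary least squares in log-log space."""
    xs = [math.log(s) for s in strain]
    ys = [math.log(e) for e in stiffness]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space is the exponent
    a = math.exp(mean_y - b * mean_x)  # intercept gives the scale factor
    return a, b


def stiffness_from_strain(a, b, measured_strain):
    """Read a stiffness estimate off the fitted curve for a measured strain."""
    return a * measured_strain ** b


# Synthetic stand-in for FEM runs: one simulated strain per trial stiffness (MPa).
trial_stiffness_mpa = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
simulated_strain = [0.2 * e ** -0.9 for e in trial_stiffness_mpa]

a, b = fit_power_law(simulated_strain, trial_stiffness_mpa)

# A hypothetical strain, as would be derived from loaded and unloaded MR images.
measured_strain = 0.2 * 2.4 ** -0.9
print(round(stiffness_from_strain(a, b, measured_strain), 3))
```

In this reading, the simulations supply the curve once per subject, and the strain measured from the loaded and unloaded scans selects the point on that curve.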


Conclusion:

The current study proposed a generalized mathematical model and a methodology to estimate subject-specific stiffness of the tibiofemoral joint for FEM analysis. Such a method might enhance the efficacy of FEM in implant design optimization and biomechanics for subject-specific studies.

Trial registration: The institutional ethics committee (IEC), Indian Institute of Technology, Delhi, India, approved the study on 20th September 2017, with reference number P-019; it was a pilot study, and no clinical trial registration was recommended.



For full paper: http://link.springer.com/article/10.1186/s12967-021-02977-1

Automatic pre-population of normal chest x-ray reports using a high-sensitivity deep learning algorithm: a prospective study of clinical AI deployment (RPS1005b)

Purpose:

To evaluate a high-sensitivity deep learning algorithm for normal/abnormal chest x-ray (CXR) classification by deploying it in a real clinical setting.

Methods and materials:

A commercially available deep learning algorithm (QXR, Qure.ai, India) was integrated into the clinical workflow for a period of 3 months at an outpatient imaging facility. The algorithm, deployed on-premise, was integrated with PACS and RIS such that it automatically analysed all adult CXRs, and reports for those determined to be “normal” were automatically pre-populated in the RIS using HL7 messaging. Radiologists reviewed the CXRs as part of their regular workflow and ‘accepted’ or changed the pre-populated reports. Changes in reports were divided into ‘clinically insignificant’ and ‘clinically significant’, following which the CXRs with clinically significant changes were reviewed by a specialist chest radiologist with 8 years’ experience.

Results:

A total of 1,970 adult CXRs were analysed by AI, out of which 388 (19.69%) were identified as normal. Of these, 361/388 (93.04%) were accepted by radiologists, and in 14/388 (3.60%) clinically less significant changes (e.g. increased broncho-vascular markings) were made in reports. Upon review of the remaining 13/388 (3.35%) CXRs, it was found that 12 had truly clinically significant findings missed by AI, including 3 with opacities, 3 with lymphadenopathy, 3 with blunted CP angle, 2 with nodules, and 1 with consolidation.

Conclusion:

This study shows that there is a great potential to automate the identification of normal CXRs to a great degree, with very high sensitivity.

Validation of a high precision semantic search tool using a curated dataset containing related and unrelated reports of clinically relevant search terms (RPS 1005b)

Purpose:

To validate a semantic search tool by testing the search results for complex terms.

Methods and materials:

The tool consists of two pipelines: an offline indexing pipeline and a querying pipeline. The raw text from both reports and queries was first passed through a set of pre-processing steps: sentence tokenisation, spelling correction, negation detection, and word sense disambiguation. It was then transformed into a concept plane, followed by indexing or querying. During querying, additional concepts were added using a query expansion technique to include nearby related concepts. The validation was done on a set of 30 search queries, carefully curated by two radiologists. The reports related to the search queries were randomly selected with the help of keyword search, and the text was re-read to determine its suitability to the queries. These reports formed the “related” group. Similarly, reports that did not exactly satisfy the context of the search queries were categorised as the “not related” group. A set of 5 search queries and 250 reports was used for tuning the model initially. A total of 500 reports for the 10 search queries formed the corpus of the test set. The search results for each test query were evaluated and appropriate statistical analysis was performed.

Results:

On the 10 unseen test queries, each evaluated against a small corpus of related and unrelated reports, the average precision and recall were 0.54 and 0.42, respectively. On a larger corpus containing 60 K reports, the average precision for these 15 queries was 0.6.
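As an illustration of how per-query precision and recall can be computed and then averaged over queries, consider the minimal sketch below. The report identifiers, the query names and the use of a simple unweighted mean are hypothetical; the abstract does not specify the averaging scheme.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query from retrieved and relevant report IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def average_over_queries(results):
    """Unweighted mean of (precision, recall) pairs across queries."""
    n = len(results)
    mean_precision = sum(p for p, _ in results) / n
    mean_recall = sum(r for _, r in results) / n
    return mean_precision, mean_recall


# Hypothetical search results: query -> (retrieved IDs, IDs in the "related" group).
queries = {
    "query_a": (["r1", "r2", "r3", "r4"], ["r2", "r4", "r7"]),
    "query_b": (["r5", "r6"], ["r5", "r6", "r8", "r9"]),
}

per_query = [precision_recall(ret, rel) for ret, rel in queries.values()]
mean_precision, mean_recall = average_over_queries(per_query)
print(per_query, mean_precision, mean_recall)
```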

Conclusion:

We describe a method to clinically validate a semantic search tool with high precision.

Estimating AI-generated Bias in Radiology Reporting by Measuring the Change in the Kellgren-Lawrence Grades of Knee Arthritis Before and After Knowledge of AI Results—A Multi-reader Retrospective Study

PURPOSE:

To estimate the extent of bias generated by AI in the radiologists’ reporting of grades of osteoarthritis on Knee X-rays by observing the change in grading after the knowledge of predictions of a deep learning algorithm.

METHOD AND MATERIALS:

Anteroposterior views of 271 knee x-rays (542 joints) were randomly extracted from PACS and anonymized. These x-rays were analyzed using DeepKnee, an open-source algorithm based on the Deep Siamese CNN architecture that automatically predicts the presence of osteoarthritis on knee x-rays on the 5-grade Kellgren and Lawrence (KL) system, along with an attention map. These x-rays were independently read by three sub-specialist MSK radiologists on the CARPL AI research platform (CARING Research, India). The KL grade for each x-ray was recorded by the radiologists, following which the AI algorithm grade was shown and radiologists were given the option to change their result. The pre-AI and post-AI results were both recorded. The change in the scores of all three readers was calculated, and the modulus of change in the score was estimated using the incongruence rate. The consensus shift before and after the knowledge of the AI results was also estimated.

RESULTS:

There were a total of 542 knee joints that were analyzed by the algorithm and read by the three radiologists, giving a total of 1,626 “instances”. There were 139 instances (8.5%) of readers changing their results. The number of shifts was 13, 44, 31, 32 and 19 for grades 0 to 4, respectively. Reader 1, reader 2 and reader 3 changed their estimations in 52 (all single shifts), 34 (all single shifts) and 53 (50 single shifts, 2 two-grade shifts, 1 three-grade shift) instances, respectively. The intra-reader incongruence rates were 9.6%, 6.3% and 9.8%, respectively. Krippendorff’s alpha among the readers before and after knowledge of AI results was 0.84 and 0.87, respectively, implying minimal convergence towards AI results. Three-reader, two-reader, and no consensus were found in 219, 296, and 27 cases before and 248, 279, and 15 cases after knowledge of AI results (see Figure 1).
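The incongruence rates can be reproduced from the reported counts with a short sketch. It assumes the rate is the number of changed readings divided by the number of joints read, expressed as a percentage; this definition is an assumption, although it agrees with the figures reported above.

```python
JOINTS_READ = 542  # joints read by each radiologist

# Number of readings each reader changed after seeing the AI grade (from the results).
changed_readings = {"reader1": 52, "reader2": 34, "reader3": 53}


def incongruence_rate(changed: int, total: int) -> float:
    """Percentage of readings that differ before and after seeing the AI result."""
    return 100.0 * changed / total


per_reader = {
    reader: round(incongruence_rate(count, JOINTS_READ), 1)
    for reader, count in changed_readings.items()
}
overall = round(
    incongruence_rate(sum(changed_readings.values()),
                      JOINTS_READ * len(changed_readings)),
    1,
)

print(per_reader)  # {'reader1': 9.6, 'reader2': 6.3, 'reader3': 9.8}
print(overall)     # 8.5
```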


Figure 1

CONCLUSION:

We demonstrate that there is a tendency of readers to converge towards AI results which, as expected, occurs more often in the ‘middle’ or ‘median’ grades rather than the extremes of grade.

CLINICAL RELEVANCE/APPLICATION:

With an increase in the number and variety of AI applications in radiology, it is important to consider the extent and relevance of the behavior-modifying effect of AI algorithms on radiologists.

Can AI Help Read Pediatric Chest X-rays? An independent Evaluation on 3,000+ Scans

PURPOSE:

To evaluate the performance of a commercially available deep learning-based AI algorithm on pediatric chest X-rays (CXRs).

METHOD AND MATERIALS:

3,319 frontal (PA and AP) CXRs of patients aged 6 to 18 years were pulled from PACS and anonymised at a tertiary care pediatric hospital in Brazil. Labels (normal, abnormal) were ascertained from the radiology reports. The data was loaded onto the CARPL AI Research platform (CARING Research, India) for AI inference and validation-related statistical analysis. The algorithm under test was QXR Version 3.0 (Qure.ai, India). The algorithmic output consisted of three categories: “normal”, “abnormal” and “to be read”. The “to be read” scans, which refer to cases where the scans are meant to be read by a radiologist directly, were excluded from calculation of summary statistics. False negative scans were re-read by a specialized pediatric radiologist with 6 years of experience.

RESULTS:

Out of the 3,319 cases, 1,802 were labeled as “to be read” and excluded from analysis. On the remaining 1,517 cases the algorithm gave a sensitivity of 91% and specificity of 96%. The 38 false negatives were reviewed and only 9 truly missed findings existed out of which 7 cases had consolidation, 1 had atelectasis and 1 had vascular engorgement.
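The exclusion of “to be read” cases before computing summary statistics can be sketched as follows. The function and the toy case list are hypothetical; only the three output categories and the treatment of “abnormal” as the positive class follow the description above.

```python
def sensitivity_specificity(cases):
    """Sensitivity and specificity with "abnormal" as the positive class.

    `cases` is an iterable of (report_label, ai_output) pairs. Cases whose AI
    output is "to be read" are excluded and counted separately.
    """
    tp = fp = tn = fn = excluded = 0
    for label, output in cases:
        if output == "to be read":
            excluded += 1
        elif label == "abnormal" and output == "abnormal":
            tp += 1
        elif label == "abnormal":   # abnormal report, AI said normal
            fn += 1
        elif output == "abnormal":  # normal report, AI said abnormal
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, excluded


# Hypothetical toy data, not the study data.
toy_cases = (
    [("abnormal", "abnormal")] * 3
    + [("abnormal", "normal")] * 1
    + [("normal", "normal")] * 4
    + [("normal", "abnormal")] * 1
    + [("abnormal", "to be read")] * 2
    + [("normal", "to be read")] * 3
)
print(sensitivity_specificity(toy_cases))
```

Because the excluded cases never enter the confusion matrix, the reported sensitivity and specificity describe only the subset on which the algorithm committed to a normal or abnormal call.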


Figure 1


CONCLUSION:

Our independent evaluation provides evidence of AI’s ability to accurately read and triage normal pediatric CXRs thereby saving significant time and effort on part of radiologists.

CLINICAL RELEVANCE/APPLICATION:

Most AI algorithms are trained on adult data and hence have poor performance on pediatric cases where lack of trained radiologists is a constant problem, especially in the developing and underdeveloped world.

High throughput detection and genetic epidemiology of SARS-CoV-2 using COVIDSeq next generation sequencing

Abstract

The rapid emergence of coronavirus disease 2019 (COVID-19) as a global pandemic affecting millions of individuals globally has necessitated sensitive and high-throughput approaches for the diagnosis, surveillance and for determining the genetic epidemiology of SARS-CoV-2. In the present study, we used the COVIDSeq protocol, which involves multiplex-PCR, barcoding and sequencing of samples for high-throughput detection and deciphering the genetic epidemiology of SARS-CoV-2. We used the approach on 752 clinical samples in duplicate, amounting to a total of 1536 samples which could be sequenced on a single S4 sequencing flow cell on NovaSeq 6000. Our analysis suggests a high concordance between technical duplicates and a high concordance of detection of SARS-CoV-2 between the COVIDSeq and RT-PCR approaches. An in-depth analysis revealed a total of six samples in which COVIDSeq detected SARS-CoV-2 with high confidence which were negative in RT-PCR. Additionally, the assay could detect SARS-CoV-2 in 21 samples and 16 samples which were classified inconclusive and pan-sarbeco positive respectively, suggesting that COVIDSeq could be used as a confirmatory test. The sequencing approach also enabled insights into the evolution and genetic epidemiology of the SARS-CoV-2 samples. The samples were classified into a total of 3 clades. This study reports two lineages B.1.112 and B.1.99 for the first time in India. This study also revealed 1,143 unique single nucleotide variants and added a total of 73 novel variants identified for the first time. To the best of our knowledge, this is the first report of the COVIDSeq approach for detection and genetic epidemiology of SARS-CoV-2. Our analysis suggests that COVIDSeq could be a potential high sensitivity assay for detection of SARS-CoV-2, with an additional advantage of enabling genetic epidemiology of SARS-CoV-2.

Link: https://www.biorxiv.org/content/10.1101/2020.08.10.242677v1

Clinical Explainability Failure (CEF) & Explainability Failure Ratio (EFR): changing the way we validate classification algorithms?

Abstract

Adoption of Artificial Intelligence (AI) algorithms into the clinical realm will depend on their inherent trustworthiness, which is built not only by robust validation studies but is also deeply linked to the explainability and interpretability of the algorithms. Most validation studies for medical imaging AI report performance of algorithms on study level labels and lay little emphasis on measuring the accuracy of explanations generated by these algorithms in the form of heat maps or bounding boxes, especially in true positive cases. We propose a new metric, Explainability Failure Ratio (EFR), derived from Clinical Explainability Failure (CEF) to address this gap in AI evaluation. We define an Explainability Failure as a case where the classification generated by an AI algorithm matches with study level ground truth but the explanation output generated by the algorithm is inadequate to explain the algorithm’s output. We measured EFR for two algorithms that automatically detect consolidation on chest X-rays to determine the applicability of the metric and observed a lower EFR for the model that had lower sensitivity for identifying consolidation on chest X-rays, implying that trustworthiness of a model should be determined not only by routine statistical metrics but also by novel clinically oriented models.
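One possible reading of the metric is sketched below. The abstract does not give the formula, so this sketch assumes that EFR is the number of explainability failures divided by the number of true positive cases; the `Case` structure and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Case:
    ground_truth: bool          # study-level label: finding present
    predicted: bool             # study-level AI classification
    explanation_adequate: bool  # reader's judgement of the heat map or bounding box


def explainability_failure_ratio(cases):
    """Assumed definition: explainability failures / true positive cases."""
    true_positives = [c for c in cases if c.ground_truth and c.predicted]
    if not true_positives:
        raise ValueError("no true positive cases to evaluate")
    failures = sum(1 for c in true_positives if not c.explanation_adequate)
    return failures / len(true_positives)


# Hypothetical toy data: 4 true positives, 1 with an inadequate explanation.
toy = [
    Case(True, True, True),
    Case(True, True, True),
    Case(True, True, False),
    Case(True, True, True),
    Case(True, False, False),   # false negative, ignored by the ratio
    Case(False, False, True),   # true negative, ignored by the ratio
]
print(explainability_failure_ratio(toy))  # 0.25
```

Under this reading, two models with similar study-level accuracy can differ in EFR, which is the gap the metric is meant to expose.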

Link: https://www.medrxiv.org/content/10.1101/2020.08.12.20169607v1

Giant glomus tumor of the knee mimicking soft-tissue sarcoma

ABSTRACT

Glomangiomas (glomus tumors) are benign vascular tumors commonly located at the distal extremities, are usually subungual lesions, and account for 2% of all soft-tissue tumors. Patients with digital glomus tumors present with hypersensitivity to cold, paroxysmal severe pain, and point tenderness. These tumors are infrequent in the knee area and, when seen, are superficial and usually have a diameter of less than 1 cm, which makes their radiological diagnosis arduous. We report a noteworthy, unusual case of a large glomus tumor in the popliteal fossa showing biceps femoris infiltration, in a 51-year-old female patient who experienced severe intermittent posterior knee pain for the past 2 years. Magnetic resonance imaging revealed a large popliteal inhomogeneous soft-tissue lesion with irregular margins insinuating the posterolateral musculature mimicking soft-tissue sarcoma. Histopathology revealed a glomus tumor.
Keywords: Glomus tumor, Soft-tissue sarcoma, Glomangioma, Knee

INTRODUCTION

Glomus tumor is usually a benign neoplasm. It is a perivascular mesenchymal tumor arising from the glomus body which is a thermoregulatory apparatus within the dermis. It is most commonly seen in subungual region of the finger but may occur anywhere. It is one of the characteristic fingertip masses that can be diagnosed with magnetic resonance imaging (MRI). These lesions occur in adults aged 20–40 years with subungual lesions showing female predominance. These tumors are infrequent in the knee area, and when seen are superficial, usually subcentimetric.

For full paper – https://mss-ijmsr.com/giant-glomus-tumor-of-the-knee-mimicking-soft-tissue-sarcoma/