Comparison of 90 and 180 Micron Resolution Cone Beam CT Scans in Patients with Artefact-Causing Root Canal Filling Material

Oral Presentation at the European Congress of Radiology, Vienna, 2019

Purpose

Many dental patients have root-canal-filled teeth, which cause artefacts on cone beam CT (CBCT) scans. We studied the effect of voxel size on the image quality of CBCT scans in the presence and absence of root-canal filling material.

Methods and Materials

CBCT scans of 30 patients, 15 with (group-1) and 15 without (group-2) root-canal filling, each acquired at both 90 µm and 180 µm voxel size, were obtained. In group-1, the scans contained root-canal filling material and no other artefact-causing material in the field of view. In group-2, the scans contained no artefact-causing material. In all scans, the contrast-to-noise ratio (CNR) was calculated from the DICOM data by determining the mean and standard deviation of grey values adjacent to the root-canal filling material (group-1) or in the central region of the scan volume (group-2), and the grey values at the edge of the scan volume (both groups). CNRs were compared using a paired Student’s t-test.
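The abstract does not give the exact CNR formula. A minimal sketch, assuming CNR = |mean(ROI) - mean(edge)| / SD(ROI) computed from raw DICOM grey values, together with the paired t statistic used to compare the 90 µm and 180 µm scans of the same patients; the function names and the sample grey values are hypothetical.

```python
import math
from statistics import mean, stdev


def cnr(roi_values, edge_values):
    """CNR = |mean(ROI) - mean(edge)| / SD(ROI), from raw DICOM grey values.

    Assumed definition: the abstract does not state the exact formula.
    """
    noise = stdev(roi_values)  # sample SD of grey values in the ROI
    return abs(mean(roi_values) - mean(edge_values)) / noise


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched values a[i], b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical grey values for one scan, for illustration only.
roi = [1010, 990, 1005, 995]
edge = [200, 210, 190, 200]
print(round(cnr(roi, edge), 2))
```

In practice `paired_t(cnr_90, cnr_180)` would be applied to the 15 per-patient CNR pairs of each group; the p-value then comes from a t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`, which performs the whole test).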

Results

A significant difference in CNR (paired t-test, p=0.018) was found between the 90 µm and 180 µm scans of the same patient in group-1. No significant difference in CNR (p=0.135) was found between the 90 µm and 180 µm scans of the same patient in group-2.

Conclusion

In the presence of root-canal filling material, the image quality of 90 µm scans is significantly better than that of 180 µm scans. In the absence of artefact-causing material, image quality is not significantly affected by voxel size, although it is still better in the 90 µm scans.

Are radiologists bad teachers for AI algorithms? – Differences in the interobserver variability between consensus-defined labelling and free labelling of the NIH ChestX-ray14 dataset

Oral Presentation at the European Congress of Radiology, Vienna, 2019

Purpose

To assess differences in interobserver variability before and after a consensus-based definition of the NIH ChestX-ray14 dataset labels.

Methods and Materials

We randomly extracted 800 X-rays from the NIH ChestX-ray14 dataset. They were read by three radiologists with more than ten years’ experience. Of the 14 NIH labels, atelectasis, consolidation and pneumonia were grouped under ‘opacity’; the other labels were used as is. The study was divided into two parts. In the first part, the radiologists assigned labels to 400 X-rays based on their prior domain knowledge. In the second part, the labels were first defined by consensus and then assigned to the remaining 400 cases. Interobserver variability was assessed with Krippendorff’s alpha coefficient (corrected for chance) and graded using Fleiss bounds.
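Chance-corrected agreement of this kind can be sketched with Krippendorff's alpha for nominal data, computed from the coincidence matrix, followed by the Fleiss bands used in this abstract. This is an illustrative implementation, not the authors' code; the `MERGE` dictionary and all function names are hypothetical, and only the NIH label names come from the dataset.

```python
from collections import Counter
from itertools import permutations

# Hypothetical merge of three NIH labels into 'Opacity', as described above.
MERGE = {"Atelectasis": "Opacity", "Consolidation": "Opacity", "Pneumonia": "Opacity"}


def merge_label(label):
    return MERGE.get(label, label)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: one list per image holding each rater's value (None = missing).
    """
    o = Counter()  # coincidence matrix o[(c, k)]
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # units rated by fewer than two raters are not pairable
        for c, k in permutations(values, 2):  # ordered pairs from different raters
            o[(c, k)] += 1.0 / (m - 1)
    n_c = Counter()
    for (c, _), weight in o.items():
        n_c[c] += weight
    n = sum(n_c.values())
    disagree_obs = sum(w for (c, k), w in o.items() if c != k)
    disagree_exp = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if disagree_exp == 0:
        return float("nan")  # alpha is undefined when every value is identical
    return 1.0 - (n - 1) * disagree_obs / disagree_exp


def fleiss_band(alpha):
    """Bands as used in this abstract; values from 0.40 to 0.75 count as fair to good."""
    if alpha > 0.75:
        return "very good"
    if alpha >= 0.40:
        return "fair to good"
    return "poor"
```

For one label, each unit would hold the three radiologists' present/absent calls for one X-ray, for example `[[1, 1, 0], [0, 0, 0]]` for two X-rays.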

Results

In general, interobserver variability did not differ between free and consensus labelling. Opacity, pneumothorax, effusion, nodule, mass and ‘no finding’ were in the ‘fair to good’ (0.41-0.75) range, while infiltration, emphysema, fibrosis and pleural thickening were in the ‘poor’ (<0.40) band in both parts. Interestingly, agreement on effusion and cardiomegaly worsened to ‘poor’ post-consensus. Notably, no label reached the ‘very good’ (>0.75) category.

Conclusion

Our assessment of the ChestX-ray14 dataset suggests that no label has ‘very good’ agreement in either free or consensus-based labelling. There is evidence that some labels may require stricter expert definitions. We therefore advise training AI only on labels with alpha close to 0.75 (for example, ‘Normal’ vs ‘Abnormal’, or ‘pneumothorax’) when a purely image-based interobserver metric is used as ground truth.

Automated multiregional Prostate Segmentation in Magnetic Resonance using deeply supervised Convolutional Neural Networks

Oral Presentation at the European Congress of Radiology, Vienna, 2019

Purpose

A convolutional neural network (CNN)-based automatic prostate segmentation method is proposed to identify and differentiate the central-transitional and peripheral prostate glands, as well as the seminal vesicles.

Methods and Materials

A total of 131 axial T2-weighted MR prostate examinations were acquired on different 3T scanners with different acquisition protocols. To train the models, an expert manually labelled the central and peripheral glands and the seminal vesicles in all acquired T2-weighted series. A deeply supervised U-Net-based architecture was trained with the Dice score coefficient as the cost function and Adam as the optimization algorithm. To maximize the performance of the CNN, a cyclic learning rate was used during training. Image processing algorithms were also used to refine the predicted segmentation masks during inference.
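The abstract names a Dice-based cost function and a cyclic learning rate without giving formulas. A minimal sketch, assuming the common soft Dice loss averaged over classes and the "triangular" cyclic policy (the rate rises linearly from a base to a maximum and back); the flattened-list inputs, `eps` and all function names are illustrative assumptions, and the deep-supervision weighting of auxiliary outputs is omitted.

```python
import math


def soft_dice(pred, target, eps=1e-6):
    """Soft Dice between predicted probabilities and a binary mask (flattened lists)."""
    intersection = sum(p * g for p, g in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def multiclass_dice_loss(preds, targets):
    """1 - mean Dice over classes; preds/targets map class name -> flattened values."""
    scores = [soft_dice(preds[c], targets[c]) for c in targets]
    return 1.0 - sum(scores) / len(scores)


def triangular_clr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclic learning rate: base -> max -> base every 2 * step_size iterations."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

With binary predicted masks, `soft_dice` reduces to the ordinary Dice score used for the validation figures; the value returned by `triangular_clr` would be handed to the Adam optimizer at each training step.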

Results

Clinical validation was performed on a separate set of 25 T2-weighted cases, not included in the training set, from an external centre. The network’s segmentations were reviewed and corrected by an expert radiologist to obtain the best ground truth. Finally, the Dice score coefficient between the model’s predictions and the expert-corrected masks was calculated. The scores for the central-transitional gland, peripheral gland, seminal vesicles and background were 0.92±0.03, 0.90±0.05, 0.91±0.05 and 0.99±0.00, respectively.

Conclusion

Fully automated multiregional segmentation of the prostate gland and seminal vesicles can be achieved with a deeply supervised CNN. This step will help localize prostatic lesions and characterize the pattern of prostatic enlargement.

Time to replace T2-STIR with Diffusion-Weighted imaging for visualisation of nerve disorders?

Oral Presentation at the European Congress of Radiology, Vienna, 2019

Purpose

To determine whether diffusion-weighted imaging of nerves can provide additional information or change the diagnosis compared with T2/STIR imaging.

Methods and Materials

Eighty-eight MRI scans (48 lumbar plexus, 24 brachial plexus, 16 peripheral nerve) performed on a 3.0T MR750w (GE Healthcare, USA) to assess nerve damage/disease were extracted from PACS, anonymised and divided into T2w STIR structural nerve imaging (SNI) series (2.5-5 mm slices, 0-1 mm gap; TR 5150 ms, TE 48 ms, TI 187-188 ms, bandwidth 62.6 kHz, FOV 24×24, matrix 288×128) and diffusion-weighted functional nerve imaging (FNI) series (4 mm slices, 1 mm overlap; TR 8743, TE 63.5, TI 249.5, bandwidth 250 kHz, FOV 40×40 cm, matrix 64×128). Scans were read by a senior specialist radiologist with 16 years’ experience. The visibility of focal signal abnormality, diffuse signal abnormality, nerve fibre continuity and muscular changes was recorded for each scan (present vs absent). Results from both sets of images were compared to evaluate the added value of functional nerve imaging.

Results

Overall, FNI changed the diagnosis compared with SNI in 58 (66%) cases. The number of cases in which the diagnosis changed was 36 (75%) for the lumbar plexus, 16 (67%) for the brachial plexus and 5 (31%) for peripheral nerves. Findings visualised on FNI but not on SNI were focal signal abnormalities (24, 27%) and diffuse signal abnormalities (39, 45%). Nerve fibre continuity and muscular changes appeared similar on both, with no change reported in 82 (93%) and 85 (97%) scans, respectively.

Conclusion

Diffusion-weighted imaging of nerves on 3.0T MR, especially for visualisation of the lumbar and brachial plexus, adds to the diagnosis and could replace T2w STIR structural imaging.

Opening the “Black Box” – Radiological Insights into a Deep Neural Network for Lung Nodule Characterisation

Oral Presentation at the European Congress of Radiology, Vienna, 2019

Purpose

To explain the predictions of a deep residual convolutional network for lung nodule characterization by analyzing heatmaps.

Methods and Materials

A 20-layer deep residual CNN was trained on 1,245 chest CTs from the NLST trial to predict the malignancy risk of a nodule. We used occlusion to systematically block regions of a nodule and mapped the resulting drops in malignancy risk score to generate clinical attribution heatmaps for 160 nodules from the LIDC-IDRI dataset, which were analysed by a thoracic radiologist. The heatmap features were defined as: heat inside the nodule (IH), bright areas inside the nodule; peripheral heat (PH), continuous or interrupted bright areas along the nodule contour; heat in adjacent planes (AH), brightness in scan planes adjacent to the nodule; satellite heat (SH), a smaller bright spot close to the nodule in the same scan plane; heatmap larger than the nodule (LH), bright areas corresponding to the shape of the nodule seen outside the nodule margins; and heat in calcification (CH).
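Occlusion attribution of this kind can be sketched as follows: slide a patch over the input, replace it with a fill value, and record how much the model's score drops. This is a 2-D, pure-Python illustration with a hypothetical `model` callable; the study applied occlusion to CT nodule volumes, and the patch size, stride and fill value here are assumptions.

```python
def occlusion_heatmap(image, model, patch=2, stride=1, fill=0.0):
    """Mean drop in model score when each pixel is covered by an occluding patch.

    image: 2-D list of floats; model: hypothetical callable returning a risk score.
    """
    h, w = len(image), len(image[0])
    base_score = model(image)
    heat = [[0.0] * w for _ in range(h)]
    counts = [[0] * w for _ in range(h)]
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            occluded = [row[:] for row in image]  # copy, then blank one patch
            for i in range(top, top + patch):
                for j in range(left, left + patch):
                    occluded[i][j] = fill
            drop = base_score - model(occluded)  # large drop = influential region
            for i in range(top, top + patch):
                for j in range(left, left + patch):
                    heat[i][j] += drop
                    counts[i][j] += 1
    # Average over all patches that covered each pixel; uncovered pixels stay 0.
    return [[heat[i][j] / counts[i][j] if counts[i][j] else 0.0 for j in range(w)]
            for i in range(h)]


# Toy example: a "model" whose score depends only on the centre pixel.
toy_image = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
heatmap = occlusion_heatmap(toy_image, lambda img: img[1][1], patch=1)
```

Pixels with the largest mean drop correspond to the bright regions of such heatmaps, which is what the radiologist's feature definitions above describe.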

Results

Each of the six features was assigned a binary value (present or absent). The resulting feature vector was fed into a standard J48 decision tree with 10-fold cross-validation, which gave an 85% weighted classification accuracy, with a 77.8% TP rate and 8% FP rate for benign cases and a 91.8% TP rate and 22.2% FP rate for malignant cases. IH was more frequently observed in nodules classified as malignant, whereas PH, AH and SH were more common in nodules classified as benign.

Conclusion

We discuss the potential for a radiologist to visually parse the heatmaps generated by the deep learning algorithm to identify features that aid classification.

Evaluating variability of T2 values of the cartilage, menisci and muscles around the knee joint on the CartiGram sequence at 1.5 T and 3.0 T MR

Oral Presentation at the European Congress of Radiology, Vienna, 2019

Purpose

To evaluate the temporal and inter-field-strength variability of T2 values of the cartilage, menisci and muscles of the knee joint using data from healthy volunteers.

Methods and Materials

The T2 CartiGram (CG) sequence of the knee joint was acquired in four healthy asymptomatic volunteers (age 32±3 years) on 3.0T and 1.5T (GE Healthcare) MRI scanners, in addition to the standard MRI. In the first part of the study, to evaluate same-day variability at 1.5T and 3.0T, CG was performed twice, 5 minutes apart, with the subject lying in the same position inside the scanner. In the second part, to evaluate inter-day variability on the same scanner, CG was performed twice, 1 month apart, on the 3.0T scanner. Mean T2 values and coefficients of variation were calculated from the T2 maps.
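The coefficient of variation (CV) is the standard deviation divided by the mean, expressed as a percentage. A minimal sketch, assuming the CV is computed per subject over the repeated T2 measurements of one region and then averaged across subjects (the abstract does not state how subjects were pooled); the function names and sample values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


def mean_within_subject_cv(repeats_per_subject):
    """Average of per-subject CVs; each entry holds repeated T2 values (ms) for one region."""
    return mean(coefficient_of_variation(r) for r in repeats_per_subject)


# Hypothetical T2 values (ms) from two acquisitions in two volunteers.
print(round(mean_within_subject_cv([[40.0, 41.0], [38.0, 38.5]]), 2))
```

The same functions serve both parts of the study: the inputs are either the two same-day acquisitions or the two acquisitions one month apart.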

Results

The intra-day coefficients of variation (lateral, medial) at 1.5 T were 0.49% and 0.77% for cartilage, 1.57% and 1.60% for muscle, and 2.2% and 2.7% for meniscus; the corresponding values at 3 T were 0.69% and 0.6% for cartilage, 0.66% and 1.2% for muscle, and 3.0% and 1.8% for meniscus. The temporal (inter-day) coefficients of variation (lateral, medial) were 3.2% and 4.7% for cartilage, 1.17% and 0.46% for muscle, and 7.8% and 4.3% for meniscus.

Conclusion

The intra-day variability of T2 values was lowest for cartilage on both the 1.5 T and 3 T scanners, whereas the temporal variability was lowest for muscle.

FuzzyPACS: Linking Large Unorganised Image and Report Databases for Development and Validation of Deep Learning Algorithms

Oral Presentation at the European Congress of Radiology, Vienna, 2019

Purpose

Developing and validating deep learning (DL) algorithms for medical imaging requires access to large, organised datasets of images and their corresponding reports. Currently, most medical imaging data in the world is unorganised, and images and text reports must be linked manually. We present an approach for linking patients’ medical images and reports when no unique identifier exists to link them.

Methods and Materials

A DICOM image database of 311,694 studies and a separate MySQL database of 296,938 reports had to be matched at study level. No unique identifier linked the two databases, not all reports had matching images, and the databases overlapped only partially. In addition, patient names were entered inexactly and in varying formats in the two databases, making direct matching impossible. After filtering by date and modality, patient names in the two databases were matched using the FuzzyWuzzy Python library, which implements fuzzy string matching, a technique that estimates text similarity from the Levenshtein distance between strings. Four fuzzy matching techniques (simple, partial, token-set and token-sort ratios) were evaluated.
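The matching pipeline can be sketched in plain Python. This is a simplified re-implementation of the four ratio ideas, not the FuzzyWuzzy library itself: scores here come from a length-normalised Levenshtein distance, so they will not equal FuzzyWuzzy's values. The record fields (`name`, `date`, `modality`), the 0-100 score scale and treating "95% match confidence" as a score threshold of 95 are assumptions.

```python
import re


def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def simple_ratio(a, b):
    """Similarity 0..100 from the edit distance normalised by the longer string."""
    if not a and not b:
        return 100
    return round(100 * (1 - levenshtein(a, b) / max(len(a), len(b))))


def partial_ratio(a, b):
    """Best simple_ratio of the shorter string against equal-length windows of the longer."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 100 if not long_ else 0
    return max(simple_ratio(short, long_[i:i + len(short)])
               for i in range(len(long_) - len(short) + 1))


def _tokens(s):
    return re.findall(r"[a-z0-9]+", s.lower())


def token_sort_ratio(a, b):
    """Compare names after sorting their words, so word order is ignored."""
    return simple_ratio(" ".join(sorted(_tokens(a))), " ".join(sorted(_tokens(b))))


def token_set_ratio(a, b):
    """Compare shared words plus each side's leftovers; extra words are penalised less."""
    ta, tb = set(_tokens(a)), set(_tokens(b))
    common = " ".join(sorted(ta & tb))
    t1 = (common + " " + " ".join(sorted(ta - tb))).strip()
    t2 = (common + " " + " ".join(sorted(tb - ta))).strip()
    return max(simple_ratio(common, t1), simple_ratio(common, t2), simple_ratio(t1, t2))


def best_match(report, studies, scorer=token_set_ratio, threshold=95):
    """Best-scoring study with the same date and modality as the report, or None."""
    candidates = [s for s in studies
                  if s["date"] == report["date"] and s["modality"] == report["modality"]]
    scored = [(scorer(report["name"], s["name"]), s) for s in candidates]
    if not scored:
        return None
    score, study = max(scored, key=lambda pair: pair[0])
    return study if score >= threshold else None
```

Filtering on date and modality first keeps the quadratic edit-distance comparisons to a small candidate set, which is what makes name matching feasible across hundreds of thousands of records.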

Results

The simple, partial, token-set and token-sort ratios matched 4.56%, 46.45%, 57.37% and 7.97% of reports, respectively, with 95% match confidence. The token-set ratio, which had the highest match percentage, matched 170,336 reports to their corresponding studies.

Conclusion

Fuzzy matching is a promising technique for merging independent datasets without unique identifiers; it saves thousands of man-hours, which is critical for the development and validation of DL algorithms.

Why are Guidelines Important? Interobserver variability in assessing MRI signs of Endometriosis between a reader following ESUR guidelines and a reader using prior domain knowledge

Oral Presentation at the European Congress of Radiology, Vienna, 2019

Purpose

To assess the interobserver variability in identifying the MRI signs of endometriosis as described by the European Society of Urogenital Radiology (ESUR).

Methods and Materials

This retrospective study included 77 randomly selected cases of endometriosis diagnosed on MRI. The cases were retrieved from PACS, anonymized and assigned to two radiologists with 16 and 11 years of experience in body imaging. The radiologists recorded the presence or absence of the following signs in each case: retroflexion of the uterus, torus uterinus involvement, uterosacral ligament thickening, tethering of the rectum, ovaries adherent to the uterus, posterior position of the ovaries, cluster of haemorrhagic cysts in the ovaries, thickened round ligament, uterus and ovaries posterior to the inter-ischial line, T2 shading, restricted diffusion, hematosalpinx, pulled-up vaginal vault, bladder involvement and superficial peritoneal implants. Radiologist 1 followed the definitions in the ESUR guidelines, whereas Radiologist 2 assigned labels based on prior domain knowledge. Interobserver variability was assessed with Krippendorff’s alpha coefficient (corrected for chance) and graded using Fleiss bounds.

Results

Interobserver agreement was ‘fair to good’ (Fleiss bounds, 0.41-0.75) for 9 labels (retroflexion, ovaries adherent to uterus, posterior position of ovaries, cluster of haemorrhagic cysts in ovaries, T2 shading, hematosalpinx, pulled-up vaginal vault, bladder involvement, superficial peritoneal implants). Six labels (torus uterinus involvement, uterosacral ligament thickening, tethering of rectum, thickened round ligament, uterus and ovaries posterior to inter-ischial line, restricted diffusion) were rated ‘poor’ (<0.40).

Conclusion

We identified the MRI signs of endometriosis with high interobserver variability. Agreement could improve if the guidelines were followed universally.

Combined traditional image processing and deep learning approach for automated detection of tears of the anterior cruciate ligament: Is it the game changer for AI in Musculoskeletal MRI?

Oral Presentation at the European Congress of Radiology, Vienna, 2019

Purpose

We propose a novel ensemble approach combining traditional image processing with deep learning to detect anterior cruciate ligament (ACL) tears on knee MRI.

Methods and Materials

Fat-suppressed proton density (FSPD) knee MRI images of 66 patients (16 normal, 50 with full- or partial-thickness ACL tears) acquired on a 3.0 Tesla MRI scanner were extracted. Image intensities were standardized by histogram matching followed by intensity normalization. Automatic delineation of the tibia, the femur and a bounding box around the ACL bundle was achieved by a three-dimensional CNN implemented in the research version of HealthSuite Insights (Philips HealthTech). A total of 88 features, comprising first-order statistics and texture measures from the Gray Level Co-occurrence Matrix (GLCM) and Gray Level Run Length Matrix (GLRLM), were computed from the image volume within the ACL bounding box. Feature subset selection was performed with a two-sample t-test, retaining features with p < 0.05.
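A grey level co-occurrence matrix counts how often pairs of grey levels occur at a fixed pixel offset; texture features are weighted sums over its normalised entries. A minimal 2-D sketch, assuming an already-quantised image and a single horizontal offset (the study worked on the 3-D volume inside the ACL bounding box and did not list its offsets or quantisation); the three features shown are standard GLCM features, not necessarily among the 88 used, and the function names are illustrative.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised, symmetric GLCM for one offset on a 2-D image of ints in [0, levels)."""
    counts = [[0.0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                i, j = image[y][x], image[ny][nx]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions (symmetric)
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts] if total else counts


def glcm_features(p):
    """Three common texture features from a normalised GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }
```

Each feature would then be compared between torn and intact ACLs with a two-sample t-test (for example `scipy.stats.ttest_ind`), keeping features with p < 0.05.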

Results

The performance of the deep learning approach for segmenting the femur and tibia was evaluated on 20 randomly selected MRI datasets. The ground truth was created by manual segmentation of the femur and tibia by an MSK radiologist. The Dice score was 0.91 ± 0.084. The performance of the predictive model for discriminating ACL tears was assessed by k-fold cross-validation; the accuracy, sensitivity and specificity were 89.5%, 93.3% and 81%, respectively.

Conclusion

The proposed machine learning technique performed well in delineating knee anatomical structures and detecting ACL tears on knee MRI.

Automated classification of chest X-rays as normal/abnormal using a high sensitivity deep learning algorithm

Oral Presentation at the European Congress of Radiology, Vienna, 2019

Purpose

The majority of chest X-rays (CXRs) performed globally are normal, and radiologists spend significant time ruling out abnormality on these scans. We present a deep learning (DL) model trained specifically to classify CXRs as normal or abnormal, potentially reducing the time and cost associated with reporting normal studies.

Methods and Materials

A DL algorithm trained on 1,150,084 CXRs and their corresponding reports was developed. A retrospectively acquired independent test set of 430 CXRs (285 abnormal, 145 normal) was analysed by the algorithm, which classified each X-ray as normal or abnormal. Ground truth for the independent test set was established by a sub-specialist chest radiologist with 8 years’ experience, who reviewed every CXR with reference to the existing report. The algorithm output was compared against the ground truth and summary statistics were calculated.

Results

The algorithm correctly classified 376 (87.44%) CXRs, with a sensitivity of 97.19% (95% CI 94.54% to 98.78%) and a specificity of 68.28% (95% CI 60.04% to 75.75%). There were 46 (10.70%) false positives and 8 (1.86%) false negatives (FNs). Of the 8 FNs, 3 were designated clinically insignificant (mild, inactive fibrosis) and 5 significant (rib fractures, pneumothorax).
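The headline figures follow directly from the confusion matrix. Taking the counts implied by the abstract (285 abnormal and 145 normal CXRs, 46 false positives, 8 false negatives), the sketch below recomputes accuracy, sensitivity, specificity and the false-positive and false-negative shares; the function name is illustrative, and the confidence intervals are not reproduced because the abstract does not state which binomial interval method was used.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard summary statistics for a binary (abnormal = positive) classifier."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fp_share": fp / total,
        "fn_share": fn / total,
    }


# Counts implied by the abstract: 285 abnormal, 145 normal, 46 FP, 8 FN.
metrics = binary_metrics(tp=285 - 8, fp=46, tn=145 - 46, fn=8)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

The printed values (87.44%, 97.19%, 68.28%, 10.70% and 1.86%) match the percentages reported above.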

Conclusion

High-sensitivity DL algorithms could potentially be deployed for the primary read of CXRs, enabling radiologists to spend appropriate time on abnormal cases and thereby reducing the time and cost of reporting CXRs, especially in non-emergency situations. More in-depth prospective trials are required to ascertain the overall impact of such algorithms.