How to Lie with Statistics: Things To Keep in Mind While Evaluating a Deep Learning Claim
2019-08-14
TEACHING POINTS
In today's age of deep learning and artificial intelligence, a radiologist must know what to watch out for when evaluating a deep learning algorithm's claims
What is the ground truth, and how was it established?
Specific points to keep in mind while evaluating:
What is the medium of communication? Is it a video, a pre-print or a reputed peer-reviewed journal article?
What is the performance metric? Accuracy alone is a poor metric, especially on imbalanced datasets.
What data was the algorithm developed on? Generally, algorithms developed on poor ground truth have poor performance
What data was the algorithm validated on? Algorithms validated on data from the same institution that supplied the training data tend to show inflated performance
How much data was it tested on? Test data should not only be independent, but also adequate, both in number and disease heterogeneity.
What are the implications of the algorithm failing - for example, what if a chest X-ray algorithm misses a critical finding?
Try to get access to the actual algorithm and run it in your department
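The warning about accuracy can be made concrete with a minimal numeric sketch. The counts below are invented for illustration (not from the exhibit): a test set with heavy class imbalance, and a hypothetical algorithm that labels every study "normal".

```python
# Hypothetical counts for illustration: a test set of 1,000 chest X-rays,
# 950 normal and 50 abnormal. The "algorithm" predicts normal for every study.
tp, fn = 0, 50    # all 50 abnormal studies are missed
tn, fp = 950, 0   # all 950 normal studies are labelled correctly

accuracy = (tp + tn) / (tp + tn + fp + fn)   # looks impressive
sensitivity = tp / (tp + fn)                 # clinically useless
specificity = tn / (tn + fp)

print(f"accuracy={accuracy:.2f}, sensitivity={sensitivity:.2f}, "
      f"specificity={specificity:.2f}")
```

Here accuracy is 0.95 even though the algorithm detects no disease at all, which is why sensitivity, specificity, and ROC-based metrics matter more than headline accuracy.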
TABLE OF CONTENTS/OUTLINE
Why should a radiologist know how to evaluate a deep learning algorithm?
Performance metrics for evaluating algorithms
Data - training and testing
When an algorithm fails - implications
Run AI in your department!