“Deep learning” algorithm from Google demonstrates high sensitivity and specificity in diabetic retinopathy screening.

A research paper by a team of Google Inc. scientists, published in the Journal of the American Medical Association, reports that a “deep learning” algorithm, adapted to screen retinal fundus images, can detect referable diabetic retinopathy with a sensitivity and specificity significantly above the 80% recommended by most screening guidelines. If reproducible and transferable to clinical practice, the technology could bring about a step change in how national screening systems are adapted to manage the growing epidemic of diabetic retinopathy across the globe.

The research project used a large set of 128,175 retinal fundus images to “train” a computer algorithm to recognize and “learn” the most predictive features of the images. The images had been graded by a pool of 54 US-licensed ophthalmologists or ophthalmology trainees in their last year of residency. According to the Google scientists, deep learning is a machine learning process which, in this instance, was used to train a neural network to grade retinal fundus images obtained from EyePACS in the United States and from three eye hospitals in India. During training, the algorithm assigns each image a severity grade, which is then compared with the known grade assigned by a human ophthalmologist. The parameters of the algorithm are then modified slightly to decrease the error on that image. This process is repeated for every image in the training set, multiple times, with each pass improving on the one before. Given a sufficiently large training set, the end result is an algorithm capable of assessing a previously unseen image and computing the diabetic retinopathy severity from the pixel intensities of the image.
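The training loop described above can be illustrated with a minimal Python sketch. It is not the network used in the study: a tiny logistic model stands in for the neural network, each "image" is a made-up list of three pixel intensities, and the names `predict`, `train`, `toy_images`, and `toy_grades`, as well as the learning rate and number of passes, are assumptions chosen purely for illustration.

```python
import math
import random


def predict(weights, bias, pixels):
    """Return the model's estimated probability that an image is referable."""
    z = bias + sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-z))


def train(images, grades, epochs=200, learning_rate=0.5, seed=0):
    """Repeatedly nudge the parameters to reduce the error on each image."""
    rng = random.Random(seed)
    weights = [0.0] * len(images[0])
    bias = 0.0
    order = list(range(len(images)))
    for _ in range(epochs):              # multiple passes over the training set
        rng.shuffle(order)
        for i in order:                  # one image at a time
            # Difference between the model's grade and the human grade.
            error = predict(weights, bias, images[i]) - grades[i]
            # Modify each parameter slightly to decrease that error.
            weights = [w - learning_rate * error * p
                       for w, p in zip(weights, images[i])]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical toy data: 1 = referable, 0 = not referable (human grades).
toy_images = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2], [0.1, 0.2, 0.9], [0.2, 0.1, 0.8]]
toy_grades = [1, 1, 0, 0]

weights, bias = train(toy_images, toy_grades)

# Score two previously unseen "images" from their pixel intensities alone.
unseen_referable = predict(weights, bias, [0.85, 0.85, 0.15])
unseen_normal = predict(weights, bias, [0.15, 0.15, 0.85])
```

In this toy setting the model should score the first unseen image above 0.5 and the second below it. A real system would learn from whole images and from many thousands of graded examples rather than from four hand-written rows.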

Following fine-tuning of the algorithm, it was tested on two independent datasets: “EyePACS-1”, consisting of 9,963 images, and “Messidor-2”, consisting of 1,748 images. Both datasets were also graded by at least 7 US board-certified ophthalmologists selected for high intragrader consistency. A simple majority decision was used for each image: if an image was classified as referable by ≥50% of ophthalmologists, it was deemed referable. The algorithm was evaluated at two operating points, one selected for high specificity and another for high sensitivity. The algorithm had an area under the receiver operating characteristic curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and 0.990 (95% CI, 0.986-0.995) for Messidor-2. At the first operating point, selected for high specificity, the sensitivity for EyePACS-1 was 90.3% (95% CI, 87.5%-92.7%) and the specificity was 98.1% (95% CI, 97.8%-98.5%). For the Messidor-2 dataset, the sensitivity was 87.0% (95% CI, 81.1%-91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). At the second operating point, selected for high sensitivity, the sensitivity for EyePACS-1 was 97.5% and the specificity was 93.4%, while for Messidor-2 the sensitivity was 96.1% and the specificity was 93.9%.

The study had a number of limitations, and uncertainties persist about how such technology might be integrated into clinical care. Nevertheless, an encouraging editorial in the same issue of JAMA commented that “deep machine learning provides a thoughtful analysis of data. The push of artificial intelligence into the health care arena is timely, welcomed, and much needed, as all available resources will be required to address the most pressing health care problems globally in an efficient, timely, and cost-effective manner.”
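The majority-vote reference standard and the two operating points described above can be sketched in a few lines of Python. The votes, algorithm scores, and the two thresholds (0.5 and 0.2) below are invented for illustration and are not taken from the study; the function names are likewise hypothetical.

```python
def is_referable_by_majority(grader_votes):
    """True if at least 50% of graders called the image referable."""
    return sum(grader_votes) >= 0.5 * len(grader_votes)


def sensitivity_specificity(scores, reference, threshold):
    """Compare thresholded algorithm scores with the reference standard."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, reference):
        called_referable = score >= threshold
        if truth and called_referable:
            tp += 1
        elif truth:
            fn += 1
        elif called_referable:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical votes from 7 graders per image (1 = referable).
votes = [
    [1, 1, 1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0],
]
reference = [is_referable_by_majority(v) for v in votes]

# Hypothetical algorithm outputs for the same six images.
scores = [0.95, 0.40, 0.30, 0.05, 0.80, 0.10]

# A stricter threshold favours specificity; a looser one favours sensitivity.
high_spec = sensitivity_specificity(scores, reference, threshold=0.5)
high_sens = sensitivity_specificity(scores, reference, threshold=0.2)
```

On this toy data the stricter threshold misses one referable image but raises no false alarms, while the looser threshold catches every referable image at the cost of one false alarm. The same trade-off underlies the two sets of figures reported for each dataset.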