Deep Learning as an Adjunct to Large-Scale Teleretinal Screening Programs
Despite well-established guidelines for screening and potential early detection of DR by an eye care provider, between 30% and 50% of individuals with diabetes do not adhere to these recommendations for a multitude of reasons.18,19 Teleretinal screening programs for DR may help close this gap and are already demonstrating success in select regional markets with nonmydriatic cameras being deployed in various settings.
Prior to the onset of the global coronavirus disease 2019 (COVID-19) pandemic, the majority of telemedicine-related initiatives in ophthalmology involving diabetic eye care centered around “store-and-forward” models of screening for disease. With this approach, fundus cameras are strategically set up in locations outside of the eye care provider’s primary place of service, in places that a patient with diabetes is likely to visit. Pilot programs were being planned and deployed in various settings: primary care physicians’ and endocrinologists’ offices, urgent care centers, laboratories, radiology imaging suites, dialysis centers, pharmacies, and even grocery stores where an optical shop may be present. While these models are likely to continue expanding and to gain widespread adoption in the long run, the arrival of the novel coronavirus, along with the threat of its endemic persistence, has likely changed the near-term trajectory and timetable for how and when DR screening is performed.
Nevertheless, with an already strong reliance on diagnostic multimodal imaging, coupled with a willingness to adopt novel, transformative technologies into the clinical workspace, ophthalmology is well positioned to utilize telemedicine successfully in the coming years. As this infrastructure continues to develop, a deep learning overlay may augment remote imaging diagnostic capabilities by reducing the degree of human involvement needed (ie, shifting from direct human interpretation of every image to primarily human oversight, with grading/affirmation needed only for referable, abnormal cases). Potential benefits of deep learning-based screening programs include: 1) increasing efficiency and coverage (ie, algorithms can process images repetitively, can work in parallel, and do not fatigue), 2) reducing barriers to access in areas where an eye care provider may not be present, 3) providing earlier detection of referable eye disease, and 4) decreasing overall healthcare costs through earlier intervention for treatable disease rather than more costly interventions in the advanced phases of pathology.
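The shift from interpreting every image to overseeing only flagged cases can be sketched in a few lines of Python. This is an illustrative sketch only, not the workflow of any system described in this article: the `ScreeningResult` fields, the `triage` function, and the 0.5 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    image_id: str
    referable_score: float  # hypothetical model output in [0, 1]
    gradable: bool          # hypothetical image-quality flag


def triage(results, threshold=0.5):
    """Split results into a human-review queue and an auto-cleared list.

    Assumption: ungradable images and images scoring at or above the
    threshold go to a human grader; everything else is cleared.
    """
    needs_review, auto_cleared = [], []
    for r in results:
        if not r.gradable or r.referable_score >= threshold:
            needs_review.append(r.image_id)
        else:
            auto_cleared.append(r.image_id)
    return needs_review, auto_cleared


# Made-up example inputs, not data from any study.
results = [
    ScreeningResult("img-001", 0.05, True),
    ScreeningResult("img-002", 0.91, True),
    ScreeningResult("img-003", 0.10, False),
    ScreeningResult("img-004", 0.20, True),
]
needs_review, auto_cleared = triage(results)
print(needs_review)  # ['img-002', 'img-003']
print(auto_cleared)  # ['img-001', 'img-004']
```

In this toy example, a human grader would see two of the four images rather than all four. The actual reduction in workload depends on disease prevalence, image quality, and the threshold chosen.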
Equally important as the disease being screened for is the imaging modality used to do the screening. Numerous nonmydriatic fundus camera systems are currently available, but limited investigations have been conducted thus far using wide-field imaging, which may offer unique advantages for future teleretinal screening programs, as suggested by the collaboration announced in late 2016 between Nikon, the parent company of Optos, and Verily, Google’s sister company under Alphabet.20 Currently, the greatest limitation of studies utilizing wide-field cameras is the relatively low number of images used for training (n<500), as deep learning requires large volumes of data for optimal training. Moving forward, larger sets of classified, labeled wide-field images will need to be procured for deep learning algorithm development.
Current Limitations and the Road Ahead
Although there is a rapidly growing body of literature supporting a role for deep learning applications within ophthalmology, and much reason for optimism, significant work remains as the next steps are taken towards clinical validation and eventual implementation. Many of the studies to date retrospectively utilized training sets from relatively homogeneous patient populations. Moving forward, the goal will be to continue training on larger image sets that are diverse not only in patient demographics but also in the type of images obtained (ie, different fundus cameras, wide-field imaging, mydriatic vs nonmydriatic images, etc).
A separate area of concern is the “black box” nature of deep learning, whereby the rationale for the outputs generated by the algorithms is not entirely understood, not only by physicians but also by the engineers who programmed them. This has created some apprehension in the public eye and raises the ethical dilemma of how to build public trust in a technology we do not fully comprehend. Nevertheless, groups have been attempting to fill these gaps in knowledge by generating heat maps that highlight the regions of each image that contributed to the algorithm’s conclusion.9
Should we arrive at a future where automated image analysis has been integrated into clinical practice, there are concerns over whether this may eventually lead to a reduction in physician skills and clinical acumen due to an overreliance on technology.21,22 This phenomenon is known as deskilling, where the skill level required to complete a task is reduced when components of the task become automated, leading to inefficiencies whenever the technology fails or breaks down.21,22
Bridging the “Real-World” Chasm
At the 2018 Human Intelligence and Artificial Intelligence in Medicine Symposium held at Stanford University School of Medicine in California, numerous speakers cautioned about the lack of published prospective, peer-reviewed data and about the potential for patient harm if this technology is rushed into the clinic without first undergoing sufficient testing and regulation.23 Even with the pivotal IDx-DR results from Abràmoff and colleagues, which formed the basis for FDA approval of the system, the question of clinical effectiveness remains unanswered.24 In other words, are patients directly benefitting from the use of these systems and demonstrating at least noninferior visual outcomes with these screening algorithms as opposed to traditional screening measures?
Current requirements for the deployment of machine learning-based software in the clinical setting, such as the standards set for clearance by the FDA in the United States or a CE mark in Europe, focus primarily on accuracy. However, there are no explicit requirements that a machine learning platform must improve the outcome for patients, largely because such trials have not yet been conducted.
Recently, a team at Google Health published their experience on evaluating a deep learning algorithm for the detection of DR in a real-world setting in Thailand, becoming one of the first published studies to look at the effect of a deep learning system on clinical and operations workflow.25 For this study, the team at Google Health observed pre-existing DR screening and intake protocols at 11 clinics over an 8-month period. In Thailand’s system, registered nurses acquire fundus photos of patients’ eyes during regular check-ups and send them off to be evaluated by a remote specialist. This process can be long and cumbersome, sometimes taking up to 10 weeks.
Google’s deep learning algorithm, which currently has CE mark clearance extending to Thailand, was then deployed in the clinics. This deep learning system has been previously shown to have specialist-level accuracy (>90% sensitivity and specificity) for the detection of referable DR. With the addition of the deep learning platform, patients followed largely the same journey through the clinic as they did previously, except that they could now be notified immediately whether a referral to an eye specialist was needed.
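For readers less familiar with these accuracy metrics, sensitivity and specificity can be computed from a confusion matrix as shown below. The counts are made up for illustration and are not taken from the Google Health study or any other study cited here; the function name is likewise hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of diseased eyes correctly flagged
    specificity = TN / (TN + FP): share of healthy eyes correctly cleared
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical validation set: 100 eyes with referable DR, 900 without.
sens, spec = sensitivity_specificity(tp=92, fn=8, tn=855, fp=45)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# sensitivity=0.92, specificity=0.95

# Check against the >90% bar mentioned in the text.
print(sens > 0.90 and spec > 0.90)  # True
```

Note that both metrics are computed only on images the algorithm agrees to grade. They say nothing about how many images are rejected as ungradable.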
Numerous challenges were encountered and reported by the team; chief among them was how to handle suboptimal, or ungradable, images. Like similar deep learning systems, their model had been trained on high-quality photographs. Thus, to ensure accuracy, these algorithms have been designed to reject images that are difficult to grade. In a real-world setting, with a busy clinic and nurses taking many photos during the course of the day, sometimes in poor lighting conditions, over 20% of the images were deemed ungradable and thus rejected from the study. The handling of these patients ended up adding inefficiencies to the system rather than removing them. Sometimes, extra time was spent retaking an image that the deep learning model had previously rejected. In the event that a nurse was unable to capture a gradable image, the patient was instructed to visit a specialist at another clinic on another day (sometimes over an hour’s commute away). In some cases, these patients turned out to have no referable disease, making the added clinic visit unnecessary. The nurses expressed frustration to the Google Health team in these instances, especially when they felt the rejected images showed no signs of disease and that a referral to a specialist was unwarranted.
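The cost of rejected images can be made concrete with a simple calculation. The sketch below assumes, purely for illustration, that each capture attempt is independently ungradable with a fixed probability and that a patient is referred elsewhere once a maximum number of attempts is exhausted. Neither assumption comes from the study, and the 0.2 value only loosely echoes the “over 20%” figure above.

```python
def ungradable_referral_fraction(p_ungradable, max_attempts):
    """Fraction of patients referred solely because every attempt failed.

    Assumes independent attempts, each ungradable with probability p.
    """
    return p_ungradable ** max_attempts


def expected_attempts(p_ungradable, max_attempts):
    """Expected number of capture attempts per patient.

    Attempt k+1 happens only if the first k attempts were all ungradable,
    so the expectation is the sum of p**k for k = 0 .. max_attempts - 1.
    """
    return sum(p_ungradable ** k for k in range(max_attempts))


p = 0.2  # hypothetical per-attempt ungradable probability
for n in (1, 2, 3):
    referred = ungradable_referral_fraction(p, n)
    attempts = expected_attempts(p, n)
    print(f"max_attempts={n}: referred={referred:.3f}, attempts={attempts:.2f}")
```

Under these assumptions, allowing retakes lowers the share of patients sent to another clinic but raises the time spent per patient. In practice, image failures are unlikely to be independent (for example, a patient with a small pupil or cataract may fail repeatedly), so a calculation like this is best treated as an optimistic bound.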
A second issue that affected the efficiency of this deep learning-augmented screening protocol was the available internet connection. Images needed to be uploaded to the cloud prior to obtaining an assessment while the patient waited for the results. If the internet connection was strong, the results would return within a few seconds. However, poor internet connections in a number of clinics resulted in significant time delays and reduced the maximal throughput of patients able to be screened.
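A back-of-the-envelope calculation shows how upload delays translate into lost throughput. The capture and upload times below are hypothetical and were not reported by the study; the model simply assumes that each patient occupies the camera station for the capture time plus the cloud round trip.

```python
def patients_per_hour(capture_seconds, upload_seconds):
    """Throughput when each patient waits for capture plus cloud round trip."""
    return 3600 / (capture_seconds + upload_seconds)


# Hypothetical timings: 5 minutes per capture session, with a fast
# connection (5 s round trip) vs a slow one (2 min round trip).
fast = patients_per_hour(capture_seconds=300, upload_seconds=5)
slow = patients_per_hour(capture_seconds=300, upload_seconds=120)
print(f"{fast:.1f} vs {slow:.1f} patients per hour")
```

Uploading in the background while the next patient is imaged would change this arithmetic, but it would also remove the immediate, in-visit result that motivated the deployment.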
The Google team has since been working together with these clinics to modify the workflows and possibly augment the deep learning algorithm to be more tolerant of imperfect images. The results of this study were critical, however, in that they illustrated there is much more to applying a deep learning algorithm to a real-world clinical setting than just its sensitivity and specificity. It is imperative to understand how the system integrates into the existing clinical operations workflow and whether it improves efficiency and the experience of all stakeholders (the patient, physician, and staff).
Deep learning has shown substantial promise to date in automated image analysis for the accurate diagnosis of DR from fundus photographs. Moving forward, this technology appears poised to augment larger-scale DR screening efforts, especially as the COVID-19 pandemic has disrupted access to prevention and treatment services for patients with retinal diseases. However, additional testing and research are required to further validate this technology clinically and to integrate it into clinical workflows with minimal disruption.
Disclosure: Dr Rahimy is a physician consultant for Google and Regeneron.
1. American Diabetes Association. Economic costs of diabetes in the U.S. in 2012. Diabetes Care. 2013;36(4):1033-1046.
2. New CDC report: more than 100 million Americans have diabetes or prediabetes. News release. Centers for Disease Control and Prevention; July 18, 2017. Accessed June 14, 2020. https://www.cdc.gov/media/releases/2017/p0718-diabetes-report.html
3. Zimmet PZ. Diabetes and its drivers: the largest epidemic in human history? Clin Diabetes Endocrinol. 2017;3:1.
4. Fong DS, Aiello L, Gardner TW, et al. Retinopathy in diabetes. Diabetes Care. 2004;27(suppl 1):S84-S87.
5. PathAI. Accessed June 14, 2020. https://www.pathai.com
6. Enlitic. Accessed June 14, 2020. http://www.enlitic.com
7. Abràmoff MD, Lou Y, Erginay A, et al. Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning. Invest Ophthalmol Vis Sci. 2016;57:5200-5206.
8. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402-2410.
9. Gargeya R, Leng T. Automated identification of diabetic retinopathy using deep learning. Ophthalmology. 2017;124(7):962-969.
10. Ting DSW, Cheung CY, Lim G, et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA. 2017;318(22):2211-2223.
11. Ramachandran N, Hong SC, Sime MJ, Wilson GA. Diabetic retinopathy screening using deep neural network. Clin Exp Ophthalmol. 2018;46(4):412-416.
12. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med. 2018;1:39.
13. Bhaskaranand M, Ramachandra C, Bhat S, et al. The value of automated diabetic retinopathy screening with the EyeArt system: a study of more than 100,000 consecutive encounters from people with diabetes. Diabetes Technol Ther. 2019;21(11):635-643.
14. Gulshan V, Rajan RP, Widner K, et al. Performance of a deep learning algorithm vs manual grading for detecting diabetic retinopathy in India. JAMA Ophthalmol. 2019;137(9):987-993.
15. Abràmoff MD, Folk JC, Han DP, et al. Automated analysis of retinal images for detection of referable diabetic retinopathy. JAMA Ophthalmol. 2013;131(3):351-357.
16. Shen D, Wu G, Suk H-I. Deep learning in medical image analysis. Annu Rev Biomed Eng. 2017;19:221-248.
17. Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60-88.
18. Kuo S, Fleming BB, Gittings NS, et al. Trends in care practices and outcomes among Medicare beneficiaries with diabetes. Am J Prev Med. 2005;29(5):396-403.
19. Brechner RJ, Cowie CC, Howie LJ, Herman WH, Will JC, Harris MI. Ophthalmic examination among adults with diagnosed diabetes mellitus. JAMA. 1993;270(14):1714-1718.
20. Nikon and Verily establish strategic alliance to develop machine learning-enabled solutions for diabetes-related eye disease. News release. Nikon; December 27, 2016. https://www.nikon.com/news/2016/1227_01.htm
21. Cabitza F, Rasoini R, Gensini GF. Unintended consequences of machine learning in medicine. JAMA. 2017;318(6):517-518.
22. Hoff T. Deskilling and adaptation among primary care physicians using two work innovations. Health Care Manage Rev. 2011;36(4):338-348.
23. Human Intelligence & Artificial Intelligence in Medicine Symposium. Stanford University School of Medicine. Accessed September 9, 2018. https://med.stanford.edu/presence/initiatives/hiai-symposium.html
24. Keane PA, Topol EJ. With an eye to AI and autonomous diagnosis. NPJ Digit Med. 2018;1:40.
25. Beede E, Baylor E, Hersch F, et al. A human-centered evaluation of a deep learning system deployed in clinics for the detection of diabetic retinopathy. CHI ’20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, April 2020. 2020:1-12.