Neural Transfer Learning for Assigning Diagnosis Codes to EMRs
Anthony Rios and Ramakanth Kavuluru.
Artificial Intelligence in Medicine (2019)
Electronic medical records (EMRs) are manually annotated by healthcare professionals and specialized medical coders with a standardized set of alphanumeric diagnosis and procedure codes, specifically the International Classification of Diseases (ICD). Annotating EMRs with ICD codes is important for medical billing and downstream epidemiological studies. However, manually annotating EMRs is both time-consuming and error prone. In this paper, we explore the use of convolutional neural networks (CNNs) for automatic ICD coding. Because many codes occur infrequently, CNN performance is inhibited. Therefore, we propose supplementing EMR data with PubMed indexed biomedical research abstracts through neural transfer learning.
Materials and Methods
Transfer learning is the process of “transferring” knowledge acquired from one task (the source task) to a different (target) task. For the source task, we train a CNN to predict medical subject headings (MeSH) using 1.6 million PubMed indexed biomedical abstracts. For the target task, we train a CNN on 71,463 real-world EMRs collected from the University of Kentucky (UKY) medical center to predict ICD diagnosis codes. We introduce a simple, yet effective, transfer learning methodology which avoids forgetting knowledge gained from the source task.
Compared to our prior work using EMRs from the UKY medical center, we improve both the micro and macro F-scores by more than 8%. Likewise, compared to other transfer learning methods, our approach results in nearly 2% improve in macro F-score.
We show that transfer learning can improve CNN performance for EMR coding in the presence of data sparsity issues. Furthermore, we find that our proposed transfer learning approach outperforms other methods with respect to macro F-score. Finally, we analyze how transfer learning impacts codes with respect to code frequency. We find that we achieve greater improvement on infrequent codes compared to improvements in most frequent codes.