Edinburgh Information Extraction for Electronic Healthcare Records


We are developing natural language processing (NLP) technology for raw text in electronic healthcare records (EHR) in collaboration with clinicians and epidemiologists at the University of Edinburgh. The first system (EdIE-R), a rule-based information extraction and classification system for stroke phenotypes and types, was developed based on extensive analysis of Scottish brain imaging reports and their manually enriched annotations created by domain experts. The datasets annotated for system development and evaluation are the Edinburgh Stroke Study and brain imaging reports from NHS Tayside (Alex et al., 2019). We are also exploring machine learning and deep learning approaches to the same tasks as EdIE-R (Gorinski et al., 2019).

We have created a demo to showcase how our NLP tools extract concepts from text and identify negation (Grivas et al., 2020 and Sykes et al., 2020). We have also published two systematic reviews of the use and application of NLP methods on radiology reports (see Casey et al., 2021 and Davidson et al., 2021).

Output displayed in our EdIE-viz demo after recognising entities and negation in a synthetic radiology report.
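As a rough illustration of what recognising entities and negation in a report involves, here is a minimal Python sketch. It is not the EdIE-R implementation: the lexicon, the negation cues, the labels and the `extract` function are all hypothetical, and the rule that a cue negates everything after it in the same sentence is a simplifying assumption.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical toy lexicon mapping surface terms to labels (not the EdIE-R lexicon).
ENTITY_TERMS = {
    "infarct": "ischaemic_stroke",
    "haemorrhage": "haemorrhagic_stroke",
    "atrophy": "atrophy",
}

# Hypothetical negation cues; assumed to scope over the rest of the sentence.
NEGATION_CUES = ("no", "not", "without")


@dataclass
class Entity:
    text: str
    label: str
    negated: bool


def extract(report: str) -> List[Entity]:
    """Return lexicon matches, flagging those that follow a negation cue."""
    entities = []
    # Split into sentences after terminal punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", report.strip()):
        negation_seen = False
        for token in re.findall(r"[a-z]+", sentence.lower()):
            if token in NEGATION_CUES:
                negation_seen = True
            elif token in ENTITY_TERMS:
                entities.append(Entity(token, ENTITY_TERMS[token], negation_seen))
    return entities


if __name__ == "__main__":
    synthetic = "There is an old infarct in the left frontal lobe. No acute haemorrhage."
    for entity in extract(synthetic):
        print(entity)
```

On the synthetic sentence pair above, the sketch marks "infarct" as affirmed and "haemorrhage" as negated. Real reports need much richer handling (multi-word terms, spelling variants, scope-ending words such as "but"), which is where the rule-based and neural systems described here come in.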

We are currently benchmarking our own NLP tools, as well as external systems developed by other research groups and in industry, on radiology reports from NHS Fife and on reports for Generation Scotland patients (from NHS Tayside, Lothian and Greater Glasgow and Fife). Our next goal is to link the information extracted using NLP to other data available for each patient. A longer-term goal is large-scale analysis of radiology reports across Scotland for epidemiological research and potential cohort selection.

This work has been carried out and applied in a number of projects, including an MRC Mental Health Data Pathfinder study (PI: Prof Andrew McIntosh, CoI: Dr Heather Whalley) and a Turing project (PI: Dr Beatrice Alex).


  • Dr Beatrice Alex, Chancellor’s Fellow at the Edinburgh Futures Institute and the School of Literatures, Languages and Cultures and Turing Fellow at the School of Informatics, University of Edinburgh and the Alan Turing Institute
  • Dr Arlene Casey, Research Fellow, Usher Institute
  • Dr Claire Grover, Senior Research Fellow at the School of Informatics, University of Edinburgh and Turing Fellow at the Alan Turing Institute
  • Richard Tobin, Research Fellow at the School of Informatics, University of Edinburgh
  • Andreas Grivas, Research Assistant at the School of Informatics, University of Edinburgh
  • Emma Davidson, PhD student, Centre for Clinical Brain Sciences
  • Michael Poon, Neurosurgeon, PhD student, Usher Institute
  • Dr Daniel Duma
  • Dr Víctor Suárez-Paniagua, HDR-UK
  • Dr Hang Dong, Research Fellow, Usher Institute
  • Dr Honghan Wu, Assistant Professor, UCL
  • Prof Catherine Sudlow, Professor of Neurology and Clinical Epidemiology, Director of Centre for Medical Informatics, Usher Institute, University of Edinburgh, and Chief Scientist, UK Biobank, University of Edinburgh
  • Dr Heather Whalley, Senior Research Fellow, Division of Psychiatry, University of Edinburgh
  • Dr Grant Mair, Senior Clinical Lecturer in Neuroradiology at the Centre for Clinical Brain Sciences, University of Edinburgh
  • Dr William Whiteley, Scottish Senior Clinical Fellow and Consultant Neurologist at the Centre for Clinical Brain Sciences, University of Edinburgh


You can try out our EdIE-viz demo here. It shows the output of EdIE-R, EdIE-BiLSTM and EdIE-BERT (see our EMNLP LOUHI 2020 paper) when extracting information (entities and negation) from brain imaging reports.


Here is the code on GitHub that accompanies our EMNLP LOUHI 2020 paper. This repository contains the following systems and tools for information extraction from brain radiology reports:

  • EdIE-R: Our rule-based system
  • EdIE-BiLSTM: Our neural network system with a character-aware BiLSTM sentence encoder
  • EdIE-BERT: Our neural network system with a BERT encoder
  • EdIE-viz: Contains code to run our web interface
  • paper: Contains scripts related to extracting results and plots


  • Emma M Davidson, Michael T.C. Poon, Arlene Casey, Andreas Grivas, Daniel Duma, Hang Dong, Víctor Suárez-Paniagua, Claire Grover, Richard Tobin, Heather Whalley, Honghan Wu, Beatrice Alex and William Whiteley (2021). The reporting quality of natural language processing studies: systematic review of studies of radiology reports. BMC Medical Imaging, 21, 142. [pdf, DOI]
  • Arlene Casey, Emma Davidson, Michael Poon, Hang Dong, Daniel Duma, Andreas Grivas, Claire Grover, Víctor Suárez-Paniagua, Richard Tobin, William Whiteley, Honghan Wu and Beatrice Alex (2021). A Systematic Review of Natural Language Processing Applied to Radiology Reports. BMC Medical Informatics and Decision Making, 21, 179. [arXiv, pdf, DOI]
  • Andreas Grivas, Beatrice Alex, Claire Grover, Richard Tobin and William Whiteley (2020). Not a cute stroke: Analysis of Rule- and Neural Network-Based Information Extraction Systems for Brain Radiology Reports. In Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis (LOUHI 2020) at EMNLP 2020, November 2020. [pdf]
  • Dominic Sykes, Andreas Grivas, Claire Grover, Richard Tobin, Cathie Sudlow, William Whiteley, Andrew McIntosh, Heather Whalley and Beatrice Alex (2020). Comparison of Rule-based and Neural Network Models for Negation Detection in Radiology Reports. Accepted to appear in a special issue of the Journal of Natural Language Engineering. [DOI, accepted manuscript]
  • Beatrice Alex, Claire Grover, Richard Tobin, Cathie Sudlow, Grant Mair and William Whiteley (2019). Text Mining Brain Imaging Reports. Journal of Biomedical Semantics, 10, 23, 2019, doi:10.1186/s13326-019-0211-7. [html, pdf]
  • Emily Wheater, Grant Mair, Cathie Sudlow, Beatrice Alex, Claire Grover and William Whiteley (2019). A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records. BMC Medical Informatics and Decision Making, 19, 184, 2019, doi:10.1186/s12911-019-0908-7. [html, pdf]
  • Philip John Gorinski, Honghan Wu, Claire Grover, Richard Tobin, Conn Talbot, Heather Whalley, Cathie Sudlow, William Whiteley and Beatrice Alex (2019). Named Entity Recognition for Electronic Health Records: A Comparison of Rule-based and Machine Learning Approaches, accepted for presentation at the HealTAC 2019 Conference, 24-25th of April 2019. [arXiv.org]
  • Claire Grover, Richard Tobin, Beatrice Alex, Catherine Sudlow, Grant Mair and William Whiteley (2018). Text Mining Brain Imaging Reports, HealTAC-2018, Manchester, UK.
  • William Whiteley, Claire Grover, Beatrice Alex, Cathie Sudlow and Grant Mair (2016). A natural language processing algorithm to identify stroke in brain imaging reports on a large scale. Poster presented at the 2nd European Stroke Organisation Conference (ESOC 2016), Barcelona, Spain. [pdf]


This work has been funded by:

  • The Alan Turing Institute (CG & BA, EPSRC grant EP/N510129/1)
  • MRC Pathfinder (MRC – MCPC17209)
  • The Medical Research Council (WW, MRC Clinician Scientist Award G0902303)
  • Chief Scientist Office (WW, Scottish Senior Clinical Fellowship, CAF/17/01)
  • Stroke Association Edith Murphy Foundation (GM, Senior Clinical Lectureship, SA L-SMP 18n1000)