Edinburgh Information Extraction for Electronic Healthcare Records

Project Summary

We are developing natural language processing technology for raw text in electronic healthcare records (EHRs), in collaboration with clinicians and epidemiologists at the University of Edinburgh. The first system, EdIE-R, is a rule-based information extraction and classification system for stroke phenotypes and types. It was developed from extensive analysis of Scottish brain imaging reports and the manual annotations that domain experts added to them. The datasets annotated for system development and evaluation are the Edinburgh Stroke Study and brain imaging reports from NHS Tayside. EdIE-R is currently being prepared for an official release to the research community. We are also exploring machine learning approaches to the same tasks.


Project Team

  • Dr. Beatrice Alex, Chancellor’s Fellow at the Edinburgh Futures Institute and the School of Literatures, Languages and Cultures and Turing Fellow at the School of Informatics and the Alan Turing Institute
  • Dr. Claire Grover, Senior Research Fellow at the School of Informatics and Turing Fellow at the Alan Turing Institute
  • Richard Tobin, Research Fellow at the School of Informatics
  • Prof. Catherine Sudlow, Professor of Neurology and Clinical Epidemiology, Director of the Centre for Medical Informatics, Usher Institute, University of Edinburgh, and Chief Scientist, UK Biobank
  • Dr. Grant Mair, Senior Clinical Lecturer in Neuroradiology at the Centre for Clinical Brain Sciences
  • Dr. William Whiteley, Scottish Senior Clinical Fellow and Consultant Neurologist at the Centre for Clinical Brain Sciences


So far this work has been funded by:

  • The Alan Turing Institute (CG & BA, EPSRC grant EP/N510129/1)
  • The Medical Research Council (WW, MRC Clinician Scientist Award G0902303)
  • Chief Scientist Office (WW, Scottish Senior Clinical Fellowship, CAF/17/01)
  • Stroke Association Edith Murphy Foundation (GM, Senior Clinical Lectureship, SA L-SMP 18n1000)


Publications

  • Beatrice Alex, Claire Grover, Richard Tobin, Cathie Sudlow, Grant Mair and William Whiteley, Text Mining Brain Imaging Reports, accepted for a special issue to appear in the Journal of Biomedical Semantics in 2019. [preprint available on request]
  • Claire Grover, Richard Tobin, Beatrice Alex, Catherine Sudlow, Grant Mair and William Whiteley (2018). Text Mining Brain Imaging Reports, HealTAC-2018, Manchester, UK.
  • William Whiteley, Claire Grover, Beatrice Alex, Cathie Sudlow and Grant Mair (2016). A natural language processing algorithm to identify stroke in brain imaging reports on a large scale. Poster presented at the 2nd European Stroke Organisation Conference (ESOC 2016), Barcelona, Spain. [pdf]