Edinburgh Information Extraction for Electronic Healthcare Records


We are developing natural language processing technology for raw text in electronic healthcare records (EHR) in collaboration with clinicians and epidemiologists at the University of Edinburgh. The first system, EdIE-R, is a rule-based information extraction and classification system for stroke phenotypes and types. It was developed through extensive analysis of Scottish brain imaging reports and the manual annotations created for them by domain experts. The datasets annotated for system development and evaluation are the Edinburgh Stroke Study and brain imaging reports from NHS Tayside. EdIE-R is currently being prepared for an official release to the research community. We are also exploring machine learning and deep learning approaches to the same tasks.

This work is being applied in a number of projects, including an MRC Mental Health Data Pathfinder study (PI: Prof. Andrew McIntosh, CoI: Dr. Heather Whalley) and a Turing project (PI: Dr. Beatrice Alex).


  • Dr. Beatrice Alex, Chancellor’s Fellow at the Edinburgh Futures Institute and the School of Literatures, Languages and Cultures and Turing Fellow at the School of Informatics, University of Edinburgh and the Alan Turing Institute
  • Dr. Claire Grover, Senior Research Fellow at the School of Informatics, University of Edinburgh and Turing Fellow at the Alan Turing Institute
  • Richard Tobin, Research Fellow at the School of Informatics, University of Edinburgh
  • Andreas Grivas, Research Assistant at the School of Informatics, University of Edinburgh
  • Prof. Catherine Sudlow, Professor of Neurology and Clinical Epidemiology, Director of the Centre for Medical Informatics, Usher Institute, University of Edinburgh, and Chief Scientist, UK Biobank
  • Dr. Heather Whalley, Senior Research Fellow, Division of Psychiatry, University of Edinburgh
  • Dr. Grant Mair, Senior Clinical Lecturer in Neuroradiology at the Centre for Clinical Brain Sciences, University of Edinburgh
  • Dr. William Whiteley, Scottish Senior Clinical Fellow and Consultant Neurologist at the Centre for Clinical Brain Sciences, University of Edinburgh


So far this work has been funded by:

  • The Alan Turing Institute (CG & BA, EPSRC grant EP/N510129/1)
  • The Medical Research Council (WW, MRC Clinician Scientist Award G0902303)
  • Chief Scientist Office (WW, Scottish Senior Clinical Fellowship, CAF/17/01)
  • Stroke Association Edith Murphy Foundation (GM, Senior Clinical Lectureship, SA L-SMP 18n1000)


  • Philip John Gorinski, Honghan Wu, Claire Grover, Richard Tobin, Conn Talbot, Heather Whalley, Cathie Sudlow, William Whiteley and Beatrice Alex (2019). Named Entity Recognition for Electronic Health Records: A Comparison of Rule-based and Machine Learning Approaches, accepted for presentation at the HealTAC 2019 Conference, 24-25 April 2019. [arxiv.org pdf]
  • Beatrice Alex, Claire Grover, Richard Tobin, Cathie Sudlow, Grant Mair and William Whiteley (2019). Text Mining Brain Imaging Reports, accepted for a special issue of the Journal of Biomedical Semantics, to appear in 2019. [preprint]
  • Claire Grover, Richard Tobin, Beatrice Alex, Catherine Sudlow, Grant Mair and William Whiteley (2018). Text Mining Brain Imaging Reports, HealTAC-2018, Manchester, UK.
  • William Whiteley, Claire Grover, Beatrice Alex, Cathie Sudlow and Grant Mair (2016). A natural language processing algorithm to identify stroke in brain imaging reports on a large scale. Poster presented at the 2nd European Stroke Organisation Conference (ESOC 2016), Barcelona, Spain. [pdf]