Edinburgh Information Extraction for Electronic Healthcare Records
Summary
We are developing natural language processing (NLP) technology for raw text in electronic healthcare records (EHR) in collaboration with clinicians and epidemiologists at the University of Edinburgh. The first system (EdIE-R), a rule-based information extraction and classification system for stroke phenotypes and types, was developed through extensive analysis of Scottish brain imaging reports and the manual annotations that domain experts added to them. The datasets annotated for system development and evaluation are brain imaging reports from the Edinburgh Stroke Study and from NHS Tayside (Alex et al., 2019). We have also explored machine learning and deep learning approaches for the same tasks as EdIE-R (Gorinski et al., 2019; Grivas et al., 2020), and we have validated EdIE-R on data from different Scottish NHS cohorts (Casey et al., 2023), comparing it with similar external systems on radiology reports from NHS Fife and on reports for Generation Scotland patients (from NHS Tayside, Lothian and Greater Glasgow and Fife).
We have created a demo to showcase how our NLP tools extract concepts from text and identify negation (Grivas et al., 2020; Sykes et al., 2020). We have also published two systematic reviews of the use and application of NLP methods for radiology reports (see Casey et al., 2021 and Davidson et al., 2021).
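EdIE-R's actual rules are not reproduced here. As a rough illustration of the two steps mentioned above (extracting concepts, then deciding whether each one is negated), the following is a minimal NegEx-style sketch. It assumes a tiny hypothetical lexicon (`PHENOTYPES`), hypothetical trigger and terminator lists, and a fixed look-back window (`WINDOW`); it is not the EdIE-R grammar, nor the EdIE-BiLSTM or EdIE-BERT models.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon mapping phenotype labels to surface terms.
# This is NOT EdIE-R's lexicon or grammar; it only illustrates the idea.
PHENOTYPES = {
    "ischaemic_stroke": ["ischaemic stroke", "infarct", "infarction"],
    "haemorrhagic_stroke": ["haemorrhagic stroke", "haemorrhage"],
    "tumour": ["tumour", "meningioma", "glioma"],
}

# Hypothetical single-token negation triggers (NegEx-style, simplified).
NEGATION_TRIGGERS = {"no", "not", "without"}

# Tokens that close a negation scope before it reaches the concept.
SCOPE_TERMINATORS = {"but", "however", "although"}

WINDOW = 5  # assumed maximum distance (in tokens) from trigger to concept


@dataclass
class Mention:
    label: str
    term: str
    start: int  # token index within its sentence
    negated: bool


def tokenize(sentence):
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z]+", sentence.lower())


def is_negated(tokens, start):
    """Look back up to WINDOW tokens for a trigger, stopping at a terminator."""
    for j in range(start - 1, max(start - 1 - WINDOW, -1), -1):
        if tokens[j] in SCOPE_TERMINATORS:
            return False
        if tokens[j] in NEGATION_TRIGGERS:
            return True
    return False


def find_mentions(sentence):
    """Match every lexicon term as a token sequence and flag negation."""
    tokens = tokenize(sentence)
    mentions = []
    for label, terms in PHENOTYPES.items():
        for term in terms:
            term_tokens = term.split()
            n = len(term_tokens)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == term_tokens:
                    mentions.append(Mention(label, term, i, is_negated(tokens, i)))
    return sorted(mentions, key=lambda m: m.start)


def extract(report):
    """Split the report into sentences so negation never crosses a boundary."""
    mentions = []
    for sentence in re.split(r"[.;\n]+", report):
        mentions.extend(find_mentions(sentence))
    return mentions


if __name__ == "__main__":
    report = "No evidence of haemorrhage. Old infarct in the left MCA territory."
    for m in extract(report):
        print(m.label, m.term, "NEGATED" if m.negated else "present")
```

A fixed window with explicit scope terminators is the simplest rule-based treatment of negation scope; real reports contain longer-range and nested negation, which is one reason rule-based and neural negation detectors are worth comparing.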

Most recently, we have linked the information extracted from radiology reports using NLP to other structured data available for patients, using the Scottish Medical Imaging data provided to us within the Scottish National Safe Haven. The goal is to conduct large-scale analysis of radiology reports across Scotland for epidemiological research and potential cohort selection, and to use the NLP output as part of a dementia prediction algorithm.
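Linking report-level NLP output to patient-level structured data usually means aggregating over each patient's reports before joining. The sketch below illustrates only that step, under stated assumptions: the field names (`patient_id`, `phenotypes`, `dementia_diagnosis`), the toy records and the helpers `link` and `select_cohort` are all hypothetical and do not describe the Scottish Medical Imaging data or the Safe Haven environment.

```python
from collections import defaultdict

# Hypothetical report-level NLP output: phenotypes judged present
# (negated mentions already discarded), one row per radiology report.
nlp_output = [
    {"patient_id": "P001", "report_id": "R1", "phenotypes": {"ischaemic_stroke"}},
    {"patient_id": "P001", "report_id": "R2", "phenotypes": {"small_vessel_disease"}},
    {"patient_id": "P002", "report_id": "R3", "phenotypes": set()},
]

# Hypothetical structured records keyed by the same (hypothetical) patient ID.
structured = {
    "P001": {"age": 78, "dementia_diagnosis": True},
    "P002": {"age": 65, "dementia_diagnosis": False},
    "P003": {"age": 70, "dementia_diagnosis": False},  # no imaging report
}


def link(nlp_rows, patients):
    """Union report-level phenotypes per patient, then left-join onto structured data."""
    per_patient = defaultdict(set)
    for row in nlp_rows:
        per_patient[row["patient_id"]] |= row["phenotypes"]
    linked = {}
    for pid, record in patients.items():
        linked[pid] = {
            **record,
            "phenotypes": sorted(per_patient.get(pid, set())),
            "has_report": pid in per_patient,
        }
    return linked


def select_cohort(linked, phenotype):
    """Return IDs of patients with at least one report mentioning the phenotype."""
    return sorted(pid for pid, rec in linked.items() if phenotype in rec["phenotypes"])


linked = link(nlp_output, structured)
print(select_cohort(linked, "ischaemic_stroke"))
```

Keeping patients without any report (`has_report` is `False`) rather than dropping them is a deliberate choice in this sketch: whether missing imaging should count as "no phenotype" depends on the analysis. NLP rows for patients absent from the structured data are dropped by the left join.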
Collaborators
- Dr Beatrice Alex, Chancellor’s Fellow at the Edinburgh Futures Institute and the School of Literatures, Languages and Cultures and Turing Fellow at the School of Informatics, University of Edinburgh and the Alan Turing Institute
- Dr Arlene Casey, Research Fellow, Usher Institute
- Dr Claire Grover, Senior Research Fellow at the School of Informatics, University of Edinburgh and Turing Fellow at the Alan Turing Institute
- Richard Tobin, Research Fellow at the School of Informatics, University of Edinburgh
- Andreas Grivas, Research Assistant at the School of Informatics, University of Edinburgh
- Emma Davidson, PhD student, Centre for Clinical Brain Sciences
- Michael Poon, Neurosurgeon, PhD student, Usher Institute
- Dr Daniel Duma
- Dr Víctor Suárez-Paniagua, HDR-UK
- Dr Hang Dong, Research Fellow, Usher Institute
- Dr Honghan Wu, Assistant Professor, UCL
- Prof Catherine Sudlow, Professor of Neurology and Clinical Epidemiology and Director of the Centre for Medical Informatics, Usher Institute, University of Edinburgh, and Chief Scientist, UK Biobank
- Dr Heather Whalley, Senior Research Fellow, Division of Psychiatry, University of Edinburgh
- Dr Grant Mair, Senior Clinical Lecturer in Neuroradiology at the Centre for Clinical Brain Sciences, University of Edinburgh
- Dr William Whiteley, Scottish Senior Clinical Fellow and Consultant Neurologist at the Centre for Clinical Brain Sciences, University of Edinburgh
Demo
You can try out our EdIE-viz demo here. It shows the output for EdIE-R, EdIE-BiLSTM and EdIE-BERT (see Grivas et al., 2020) for extracting information (entities and negation) from brain imaging reports.
Software
Here is the code on GitHub that accompanies our EMNLP LOUHI 2020 paper (Grivas et al., 2020). This repository contains the following systems and tools for information extraction from brain radiology reports:
- EdIE-R: Our rule-based NLP system, which extracts 24 phenotypes, including types of stroke and tumour
- EdIE-BiLSTM: Our neural network system with a character-aware BiLSTM sentence encoder
- EdIE-BERT: Our neural network system with a BERT encoder
- EdIE-viz: Contains code to run our web interface
- paper: Contains scripts related to extracting results and plots
Publications
- Arlene Casey, Emma Davidson, Claire Grover, Richard Tobin, Andreas Grivas, Huayu Zhang, Patrick Schrempf, Alison Q. O’Neil, Liam Lee, Michael Walsh, Freya Pellie et al. (2023). Understanding the performance and reliability of NLP tools: a comparison of four NLP tools predicting stroke phenotypes in radiology reports. Frontiers in Digital Health, 5, 1184919. [pdf, DOI]
- Emma M. Davidson, Michael T.C. Poon, Arlene Casey, Andreas Grivas, Daniel Duma, Hang Dong, Víctor Suárez-Paniagua, Claire Grover, Richard Tobin, Heather Whalley, Honghan Wu, Beatrice Alex and William Whiteley (2021). The reporting quality of natural language processing studies: systematic review of studies of radiology reports. BMC Medical Imaging, 21, 142. [pdf, DOI]
- Arlene Casey, Emma Davidson, Michael Poon, Hang Dong, Daniel Duma, Andreas Grivas, Claire Grover, Víctor Suárez-Paniagua, Richard Tobin, William Whiteley, Honghan Wu and Beatrice Alex (2021). A Systematic Review of Natural Language Processing Applied to Radiology Reports. BMC Medical Informatics and Decision Making, 21, 179. [arXiv, pdf, DOI]
- Andreas Grivas, Beatrice Alex, Claire Grover, Richard Tobin, William Whiteley (2020). Not a cute stroke: Analysis of Rule- and Neural Network-Based Information Extraction Systems for Brain Radiology Reports, in Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis (LOUHI 2020) at EMNLP 2020, November 2020. [pdf]
- Dominic Sykes, Andreas Grivas, Claire Grover, Richard Tobin, Cathie Sudlow, William Whiteley, Andrew McIntosh, Heather Whalley, Beatrice Alex (2020). Comparison of Rule-based and Neural Network Models for Negation Detection in Radiology Reports. Natural Language Engineering, 27(2), pp.203-224. [DOI, accepted manuscript]
- Beatrice Alex, Claire Grover, Richard Tobin, Cathie Sudlow, Grant Mair and William Whiteley (2019). Text Mining Brain Imaging Reports. Journal of Biomedical Semantics, 10, 23. doi:10.1186/s13326-019-0211-7. [html, pdf]
- Emily Wheater, Grant Mair, Cathie Sudlow, Beatrice Alex, Claire Grover and William Whiteley (2019). A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records. BMC Medical Informatics and Decision Making, 19, 184. doi:10.1186/s12911-019-0908-7. [html, pdf]
- Philip John Gorinski, Honghan Wu, Claire Grover, Richard Tobin, Conn Talbot, Heather Whalley, Cathie Sudlow, William Whiteley and Beatrice Alex (2019). Named Entity Recognition for Electronic Health Records: A Comparison of Rule-based and Machine Learning Approaches, accepted for presentation at the HealTAC 2019 Conference, 24-25th of April 2019. [arXiv.org]
- Claire Grover, Richard Tobin, Beatrice Alex, Catherine Sudlow, Grant Mair and William Whiteley (2018). Text Mining Brain Imaging Reports, HealTAC-2018, Manchester, UK.
- William Whiteley, Claire Grover, Beatrice Alex, Cathie Sudlow and Grant Mair (2016). A natural language processing algorithm to identify stroke in brain imaging reports on a large scale. Poster presented at the 2nd European Stroke Organisation Conference (ESOC 2016), Barcelona, Spain. [pdf]
Funding
This work has been funded by:
- The Alan Turing Institute Project and Fellowships (CG & BA, EPSRC grant EP/N510129/1)
- MRC Pathfinder (MRC – MCPC17209)
- The Medical Research Council (WW, MRC Clinician Scientist Award G0902303)
- Chief Scientist Office (WW, Scottish Senior Clinical Fellowship, CAF/17/01)
- The Alzheimer’s Society
- HDR-UK
- Stroke Association Edith Murphy Foundation (GM, Senior Clinical Lectureship, SA L-SMP 18n1000)
- Innovate UK on behalf of UKRI (iCAIRD, project number: 104690)
- Generation Scotland (Chief Scientist Office of the Scottish Government Health Directorates (CZD/16/6), Scottish Funding Council (HR03006) and the Wellcome Trust (216767/Z/19/Z))
- The Advanced Care Research Centre (L&G)
- NEURii (Eisai, Gates Ventures, Health Data Research UK and LifeArc)