The Ordnance Gazetteer of Scotland: A survey of Scottish Topography, Statistical, Biographical and Historical (1803-1901) is a collection of twenty volumes of the most popular descriptive historical gazetteers of Scotland in the 19th century. They are considered to be geographical dictionaries and include an alphabetic list of principal places in Scotland, including towns, counties, castles, glens, antiquities and parishes. Each entry gives detailed historical information and a geographical description of the place. Descriptive gazetteers such as these are written complements to maps, atlases and cartographic works.
This dataset was recently made available by the National Library of Scotland (NLS) in the form of over 13,000 page images, the corresponding optical character recognition (OCR) text in XML format, and metadata for each item in the collection. In total the OCRed text, which has not been post-corrected, comprises almost 14.5 million words. Collectively these gazetteers provide a comprehensive geographical encyclopaedia of Scotland in the 19th century. While this is a valuable resource, geoparsing this data manually would be too time-consuming.
This project will focus on devising automatic methods to geoparse the Gazetteers of Scotland. Previous related work on geoparsing The Survey of English Placenames involved the painstaking development of rule-based geoparsing methods [1,2] by adapting the Edinburgh Geoparser. We will use the Edinburgh Geoparser as the baseline and starting point for this project. As part of this work we are also creating a new version of the Edinburgh Geoparser for historical text, designed to work in conjunction with the GB1900 gazetteer [3], which we are planning to make available for research and teaching purposes. The Edinburgh Geoparser’s resolution component has also been integrated with the DEFOE text analysis tool [4,5], a Spark-based library for running text mining queries across large datasets such as historical newspapers and datasets made available by NLS, including the Gazetteers of Scotland. We will compare different geotagging and resolution pipelines and formally evaluate their geoparsing performance against a manually annotated gold standard.
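As an illustration of what evaluating against a gold standard can look like, the sketch below scores the place-name mentions found by a pipeline against manually annotated ones using precision, recall and F1. It is a minimal example under stated assumptions, not the project's evaluation code: the `Toponym` type, the `precision_recall_f1` function, the exact-span matching criterion and the sample data are all hypothetical.

```python
from typing import NamedTuple, Set, Tuple


class Toponym(NamedTuple):
    """A place-name mention (hypothetical type): a character span on one page."""
    page_id: str
    start: int
    end: int


def precision_recall_f1(
    gold: Set[Toponym], predicted: Set[Toponym]
) -> Tuple[float, float, float]:
    """Score predicted mentions against gold ones using exact span matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Invented sample data: three gold mentions, three predictions, two in common.
gold = {Toponym("p1", 0, 9), Toponym("p1", 20, 27), Toponym("p2", 5, 12)}
predicted = {Toponym("p1", 0, 9), Toponym("p2", 5, 12), Toponym("p2", 30, 36)}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact span matching is the strictest choice. A real evaluation might also credit partial overlaps, and would score the resolution step separately, for example by the distance between predicted and gold coordinates.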
The main goal of the project is to improve the quality of geoparsing for mapping historical text. We are also looking to partner with humanities and digital humanities scholars who are interested in applying such methods to their particular use cases.
Collaborators
Vasilis Karaiskos, PPLS, University of Edinburgh
Claire Grover, School of Informatics
Richard Tobin, School of Informatics
Rosa Filgueira Vicente, Heriot-Watt University
Melissa Terras, Centre for Data, Culture and Society, Edinburgh Futures Institute, University of Edinburgh
Sarah Ames, National Library of Scotland
Chris Fleet, National Library of Scotland
Beatrice Alex, Edinburgh Futures Institute, School of Literatures, Languages and Cultures, University of Edinburgh
Publications
Filgueira, Rosa, Claire Grover, Melissa Terras, and Beatrice Alex. “Geoparsing the historical Gazetteers of Scotland: accurately computing location in mass digitised texts.” In Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora, pp. 24-30. 2020. [URL]
Funding
This work has been supported by the School of Informatics and by the Edinburgh Futures Institute.