Geoparsing the Gazetteers of Scotland

The Ordnance Gazetteer of Scotland: A survey of Scottish Topography, Statistical, Biographical and Historical (1803-1901) is a collection of twenty volumes of the most popular descriptive historical gazetteers of Scotland in the 19th century. They are considered to be geographical dictionaries and contain an alphabetic list of principal places in Scotland, such as towns, counties, castles, glens, antiquities and parishes. Each entry also includes detailed historical information and a geographical description of the place. Descriptive gazetteers such as these are written complements to maps, atlases and cartographic works.

This dataset was recently made available by the National Library of Scotland (NLS) in the form of over 13,000 page images, the corresponding optically character recognised (OCRed) text in XML format, and metadata for each item in the collection. In total, the OCRed text (which has not been post-corrected) comprises almost 14.5 million words, and collectively these gazetteers provide a comprehensive geographical encyclopaedia of Scotland in the 19th century. While this is a valuable resource, geoparsing this much data manually would be too time-consuming.

This project will focus on devising automatic methods to geoparse the Gazetteers of Scotland. Previous related work on geoparsing The Survey of English Placenames has involved the painstaking development of rule-based geoparsing methods [1,2], using and adapting the Edinburgh Geoparser. We will also use the Edinburgh Geoparser as the baseline and starting point for this project. As part of this work we are creating a new version of the Edinburgh Geoparser for historical text, which works in conjunction with the GB1900 gazetteer [3] and which we are planning to make available for research and teaching purposes. The Edinburgh Geoparser’s resolution component has also been integrated with the DEFOE text analysis tool [4,5], a Spark-based library which allows running text mining queries across large datasets such as historical newspapers and datasets made available by NLS, including the Gazetteers of Scotland. We will compare different geotagging and resolution pipelines and formally evaluate their geoparsing performance against a manually annotated gold standard.
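As an illustration of what evaluating a pipeline against a gold standard can involve, the following is a minimal sketch and not the project's actual evaluation protocol. It assumes, purely for illustration, that both the gold standard and the pipeline output are available as mappings from a place-name mention's character span to a latitude/longitude pair. The function names, the exact-span matching rule and the 10 km error threshold are all hypothetical choices.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_km * asin(sqrt(a))


def evaluate(gold, predicted, max_error_km=10.0):
    """Score a geoparsing pipeline against gold-standard annotations.

    gold, predicted: dict mapping (start, end) character spans of place-name
    mentions to (lat, lon) tuples. This data layout is an assumption made
    for the sketch, as is the default distance threshold.
    """
    gold_spans = set(gold)
    pred_spans = set(predicted)
    # Geotagging: a mention counts as found only if its span matches exactly.
    matched = gold_spans & pred_spans

    precision = len(matched) / len(pred_spans) if pred_spans else 0.0
    recall = len(matched) / len(gold_spans) if gold_spans else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Resolution: of the correctly tagged mentions, how many were placed
    # within max_error_km of the gold coordinates?
    resolved = sum(
        1 for span in matched
        if haversine_km(*gold[span], *predicted[span]) <= max_error_km
    )
    resolution_accuracy = resolved / len(matched) if matched else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "resolution_accuracy": resolution_accuracy,
    }
```

Scoring geotagging (was the mention found?) separately from resolution (was it placed correctly?) makes it possible to tell whether a pipeline's errors come from missed place names or from wrong coordinates. Calling `evaluate(gold, predicted)` once per pipeline gives figures that can be compared side by side.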

The main goal of the project is to improve the quality of geoparsing for mapping historical text. We are also looking to partner with humanities and digital humanities scholars who are interested in applying such methods to their particular use cases.

Collaborators

Vasilis Karaiskos, PPLS, University of Edinburgh

Claire Grover, School of Informatics

Richard Tobin, School of Informatics

Rosa Filgueira Vicente, Heriot-Watt University

Melissa Terras, Centre for Data, Culture and Society, Edinburgh Futures Institute, University of Edinburgh

Sarah Ames, National Library of Scotland

Chris Fleet, National Library of Scotland

Beatrice Alex, Edinburgh Futures Institute, School of Literatures, Languages and Cultures, University of Edinburgh

Publications

Filgueira, Rosa, Claire Grover, Melissa Terras, and Beatrice Alex. “Geoparsing the historical Gazetteers of Scotland: accurately computing location in mass digitised texts.” In Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora, pp. 24-30. 2020. [URL]

Funding

This work has been supported by the School of Informatics and by the Edinburgh Futures Institute.

References

[1] Grover, Claire, and Richard Tobin. “A gazetteer and georeferencing for historical English documents.” In Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pp. 119-127. 2014. [URL]

[2] Alex, Beatrice, Kate Byrne, Claire Grover, and Richard Tobin. “Adapting the Edinburgh geoparser for historical georeferencing.” International Journal of Humanities and Arts Computing 9, no. 1 (2015): 15-35. [URL, pre-print]

[3] Southall, Humphrey, Paula Aucott, Chris Fleet, Tom Pert, and Michael Stoner. “GB1900: Engaging the public in very large scale gazetteer construction from the Ordnance Survey “County series” 1:10,560 mapping of Great Britain.” Journal of Map & Geography Libraries 13, no. 1 (2017): 7-28. [URL]

[4] https://github.com/alan-turing-institute/defoe

[5] Filgueira Vicente, R, Jackson, M, Roubickova, A, Krause, A, Terras, M, Hauswedell, T, Nyhan, J, Beavan, D, Hobson, T, Coll Ardanuy, M, Colavizza, G, Hetherington, J & Ahnert, R. “defoe: A Spark-based Toolbox for Analysing Digital Historical Textual Data.” In 2019 IEEE 15th International Conference on e-Science (e-Science), San Diego, United States, 24/09/19. 2019. [URL]