Happy Geoparsing: The Edinburgh Geoparser v1.3 is out

Photo credit: Markus Winkler (Unsplash)

New Release

We have released version 1.3 of the Edinburgh Geoparser and updated the accompanying lesson on the Programming Historian. The Geoparser now runs with a free OpenStreetMap visualisation by default. Anita Hawes, Publishing Assistant at Programming Historian, recently made us aware that users of the Geoparser who followed our lesson were asked to enter credit card details when creating the Mapbox key needed for the map visualisation. We want our language technology to be open and free, so we reacted quickly to fix that.

We have now changed the Geoparser’s visualisation component to use OpenStreetMap tiles by default. OpenStreetMap tiles are free of charge for light use (and require no sign-up) in accordance with their Tile Usage Policy.

If you have a Mapbox account, you can continue to use it with the Geoparser by setting the GEOPARSER_MAP_KEY environment variable as before. Be aware, though, that Mapbox may charge you if you have given them a credit card number and you exceed their limits on free use.
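
The choice between the two map providers can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the Geoparser's own code: the function name, the provider labels and the rule that a blank value counts as "not set" are all assumptions made for the example. Only the GEOPARSER_MAP_KEY variable name comes from the Geoparser.

```python
import os

# Hypothetical labels for illustration; not identifiers used by the Geoparser.
OSM_PROVIDER = "openstreetmap"
MAPBOX_PROVIDER = "mapbox"


def choose_map_provider(environ=None):
    """Return (provider, key) depending on GEOPARSER_MAP_KEY.

    Assumption: an unset or blank variable means the free
    OpenStreetMap default; any other value is treated as a Mapbox key.
    """
    env = os.environ if environ is None else environ
    key = env.get("GEOPARSER_MAP_KEY", "").strip()
    if key:
        return MAPBOX_PROVIDER, key
    return OSM_PROVIDER, None


if __name__ == "__main__":
    provider, _ = choose_map_provider()
    print(f"Map tiles would come from: {provider}")
```

Running the script in a shell where GEOPARSER_MAP_KEY is not set reports the OpenStreetMap default, which is a quick way to check that an old Mapbox key is no longer being picked up from your environment.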

This is the only change we made in v1.3 compared to v1.2.  If you don’t use the visualisation component there is no need to update.

Figure 1: Examples of some geo-parsed exonyms (Vienna for Wien, Munich for München, Copenhagen for København, Venice for Venezia, Milan for Milano and Florence for Firenze).

Watch out for Exonyms

An exonym is a name for a place used by outsiders that differs from the name used locally, like Munich for München. The main disadvantage of using OpenStreetMap tiles, from the point of view of an English-language geoparser, is that they generally display maps in the language of the area or country rather than in English. This is a problem for exonyms, as a place name on the map might not coincide with the name in the text. Despite this mismatch, it’s actually compelling to see how place names vary in different languages. For example, check out the place name for Hungary.

To help you keep track of locations, the Geoparser visualisation centres the map on the corresponding pin when you click on a recognised place name highlighted in the text. It also displays the recognised place name when you hover over the pin (see Figure 1).

Happy New Geoparsing!

Volunteer to Help Save Ukrainian Cultural Heritage Online (SUCHO)

Here is an urgent message from Prof Melissa Terras on how to help preserve Ukrainian Cultural Heritage … please spread the word.

Dear Colleagues,

Trusted friends of mine have set up SUCHO, Save Ukrainian Cultural Heritage Online (SUCHO) https://www.sucho.org

They are asking for volunteers to help identify and archive sites and content, while they are still online. You do not have to read Ukrainian or Russian to help. 

You can submit items to be saved: https://docs.google.com/forms/d/e/1FAIpQLSffa64-l6qXqEumAcf38OEOrTFeYZEmF531PNv9ZgzNFbcgxQ/viewform

And Volunteer to help put things in the internet archive, or use more advanced archiving software: https://docs.google.com/forms/d/e/1FAIpQLSc6KbhtEOI8zKsQmKT_waE1XlYEF1E6t-HzJ7Gc1EBfMvMg_A/viewform

Please do share with colleagues, and your students, and your networks. It’s one concrete thing we can do to help Ukraine, from afar.

You may also be interested in following the work of the Ukrainian Library Association, who are coordinating a National Digital Library in Ukraine: https://www.facebook.com/ula.org.ua/

Best wishes, and I hope you are doing ok at this difficult time.
Melissa 
————
Professor Melissa Terras
University of Edinburgh, College of Arts, Humanities and Social Sciences
@melissaterras

Edinburgh Geoparser Back On The Map

Photo credit: Timo Wielink on Unsplash

We are delighted to announce the release of the new version (v1.2) of the Edinburgh Geoparser, a tool to geoparse contemporary English text. Most importantly, it now comes with a new map display using OpenStreetMap.

We also made a few fixes to make it run on the latest versions of macOS and have added instructions on how to view the timeline display in different browsers.

The new geoparser also incorporates a gazetteer lookup that’s now supported by the University of Edinburgh Digital Library team.  We continue to support queries to all gazetteers that were distributed with the previous release of the geoparser (see the list here).

We are a small research team, so updating this technology regularly can be challenging, but we hope that with this new release the Edinburgh Geoparser will continue to be useful for place-based research and teaching. More information on how to download the new version and on its updated documentation can be found here.

Reflections on my first PhD Publication at the Second Workshop on Gender Bias in Natural Language Processing

This post originally appeared on Lucy Havens’ blog on 10 February 2021.

This winter (December 2020), I published a new research methodology for Natural Language Processing (NLP) researchers to consider, which I refer to as a bias-aware methodology. Earlier in the year, a couple months into my PhD research on using NLP to detect biases in language, I’d been relieved to see Blodgett et al.’s ‘Critical Survey’ confirm what I’d begun to suspect: NLP bias research was missing the human element.  As a researcher new to the NLP domain, I’d been shifting between frustration with the vagueness of existing NLP bias research and doubt in my own understanding.  Soon after reading the Survey, I came across Kate Crawford’s 2017 keynote, The Trouble with Bias.  Both the Survey and keynote discuss the harmful consequences of siloed technology research, and they both call for interdisciplinary and stakeholder collaboration throughout the development of technology systems.  The Survey was published three years after the keynote.  Why was there still a need to make the same calls?

I realized that, although there was a wealth of evidence supporting the need for interdisciplinary and stakeholder collaboration, there wasn’t guidance on how to go about engaging in such collaboration.  Drawing on my background working at the intersection of multiple disciplines, I went to work creating a new methodology that would outline how to collaborate across disciplines and with system stakeholders.  Though my work and studies have fallen under many different names (to name a few: Information Systems, Human-Computer Interaction, Customer Experience, Design Informatics), I consistently situate myself in the same sort of place: at the intersection of groups of people who do not typically work together.  I enjoy adapting the tools of one discipline to another to enable new types of research questions to be asked and new insights to be discovered.  To adapt one discipline’s tools for another, I listen closely to how people communicate, adopting distinct vocabularies and presentation styles depending on my audience.  I employ human-centered design methods, observing and interviewing, even if only informally, to gather information about the goals and concerns of my collaborators.  As those involved in anything participatory, user-centered, or customer experience-related have likely experienced, once you’re exposed to the methods, it’s difficult to stop yourself from seeing everything through a human-centered design lens.  So, my PhD was inevitably going to include some form of human-centered design.

In the new methodology I propose with my co-authors in Situated Data, Situated Systems: A Methodology to Engage with Power Relations in Natural Language Processing Research, I’ve embedded interdisciplinary concepts and practices into three activities for researchers to execute in parallel: (1) Examining Power Relations, (2) Explaining the Bias of Focus, and (3) Applying NLP Methods.  The practice of participatory action research, which plays a part in all three activities, embeds stakeholder collaboration into the methodology as well.  I’m in the process of executing these three activities during my PhD research, so I will certainly refine the methodology over time (I’d also love feedback on how it suits your work and how you’d adjust it!).  That being said, the methodology does provide a starting point for all types of NLP research and development, facilitating critical reflection on power relations and their resulting biases that impact all NLP datasets and systems.  If your dataset or system has a huge community of potential stakeholders, the methodology asks you to make decisions based on the people at the margins of that stakeholder community, assembling as diverse a group of people as possible with whom you can collaborate.  If your project timeline does not allow adequate time for stakeholder collaboration, the methodology asks you to be detailed in the documentation of your work, stating the time, place and people that make up your project context, and the power relations between people in your project context.

NLP uses human language as a data source, meaning NLP datasets are inherently biased, and NLP systems built on those datasets are inherently biased.  Everyone has a unique combination of experiences that give them a particular perspective, or bias, and this isn’t necessarily a bad thing.  The problems arise when a particular perspective is presented as universal or neutral.  If we identify which perspectives are present in our research and, to the best of our ability, which perspectives are absent, we can help people who visit our work realize how they should adapt it to suit their context.  Adopting the bias-aware methodology requires a mindset shift, where the human element has just as much weight as the technological element.   We must set project timelines and funding models that allow for collaboration with adequately diverse groups of people. 

For more on why and how to use a bias-aware NLP research methodology, check out the published paper in the ACL Anthology or read the preprint on arXiv!

Citation:

Havens, Lucy, Melissa Terras, Benjamin Bach, and Beatrice Alex. 2020. “Situated Data, Situated Systems: A Methodology to Engage with Power Relations in Natural Language Processing Research.” Proceedings of the Second Workshop on Gender Bias in Natural Language Processing. Barcelona, Spain (Online), December 13, 2020, pp. 107-124. Association for Computational Linguistics. Available: https://www.aclweb.org/anthology/2020.gebnlp-1.10

By Lucy Havens