Reflections on my first PhD Publication at the Second Workshop on Gender Bias in Natural Language Processing

This post originally appeared on Lucy Havens’ blog on 10 February 2021.

This winter (December 2020), I published a new research methodology for Natural Language Processing (NLP) researchers to consider, which I refer to as a bias-aware methodology. Earlier in the year, a couple months into my PhD research on using NLP to detect biases in language, I’d been relieved to see Blodgett et al.’s ‘Critical Survey’ confirm what I’d begun to suspect: NLP bias research was missing the human element. As a researcher new to the NLP domain, I’d been shifting between frustration with the vagueness of existing NLP bias research and doubt in my own understanding. Soon after reading the Survey, I came across Kate Crawford’s 2017 keynote, The Trouble with Bias. Both the Survey and keynote discuss the harmful consequences of siloed technology research, and they both call for interdisciplinary and stakeholder collaboration throughout the development of technology systems. The Survey was published three years after the keynote. Why was there still a need to make the same calls?

I realized that, although there was a wealth of evidence supporting the need for interdisciplinary and stakeholder collaboration, there wasn’t guidance on how to go about engaging in such collaboration. Drawing on my background working at the intersection of multiple disciplines, I went to work creating a new methodology that would outline how to collaborate across disciplines and with system stakeholders. Though my work and studies have fallen under many different names (to name a few: Information Systems, Human-Computer Interaction, Customer Experience, Design Informatics), I consistently situate myself in the same sort of place: at the intersection of groups of people who do not typically work together. I enjoy adapting the tools of one discipline to another to enable new types of research questions to be asked and new insights to be discovered. To adapt one discipline’s tools for another, I listen closely to how people communicate, adopting distinct vocabularies and presentation styles depending on my audience. I employ human-centered design methods, observing and interviewing, even if only informally, to gather information about the goals and concerns of my collaborators. As those involved in anything participatory, user-centered, or customer experience-related have likely experienced, once you’re exposed to the methods, it’s difficult to stop yourself from seeing everything through a human-centered design lens. So, my PhD was inevitably including some form of human-centered design.

In the new methodology I propose with my co-authors in Situated Data, Situated Systems: A Methodology to Engage with Power Relations in Natural Language Processing Research, I’ve embedded interdisciplinary concepts and practices into three activities for researchers to execute in parallel: (1) Examining Power Relations, (2) Explaining the Bias of Focus, and (3) Applying NLP Methods. The practice of participatory action research, which plays a part in all three activities, embeds stakeholder collaboration into the methodology as well. I’m in the process of executing these three activities during my PhD research, so I will certainly refine the methodology over time (I’d also love feedback on how it suits your work and how you’d adjust it!). That being said, the methodology does provide a starting point for all types of NLP research and development, facilitating critical reflection on power relations and their resulting biases that impact all NLP datasets and systems. If your dataset or system has a huge community of potential stakeholders, the methodology asks you to make decisions based on the people at the margins of that stakeholder community, assembling as diverse a group of people as possible with whom you can collaborate. If your project timeline does not allow adequate time for stakeholder collaboration, the methodology asks you to be detailed in the documentation of your work, stating the time, place and people that make up your project context, and the power relations between people in your project context.

NLP uses human language as a data source, meaning NLP datasets are inherently biased, and NLP systems built on those datasets are inherently biased. Everyone has a unique combination of experiences that give them a particular perspective, or bias, and this isn’t necessarily a bad thing. The problems arise when a particular perspective is presented as universal or neutral. If we identify which perspectives are present in our research and, to the best of our ability, which perspectives are absent, we can help people who visit our work realize how they should adapt it to suit their context. Adopting the bias-aware methodology requires a mindset shift, where the human element has just as much weight as the technological element. We must set project timelines and funding models that allow for collaboration with adequately diverse groups of people.

For more on why and how to use a bias-aware NLP research methodology, check out the published paper in the ACL Anthology or read the preprint on ArXiv!

Citation:

Havens, Lucy, Melissa Terras, Benjamin Bach, and Beatrice Alex. 2020 “Situated Data, Situated Systems: A Methodology to Engage with Power Relations in Natural Language Processing Research.” Proceedings of the Second Workshop on Gender Bias in Natural Language Processing. Barcelona, Spain (Online), December 13, 2020, pp. 107-124. Association for Computational Linguistics. Available: https://www.aclweb.org/anthology/2020.gebnlp-1.10

By Lucy Havens