Gaelic Language Technology

Two years ago, the Edinburgh LTG teamed up with researchers working on Gaelic language technology. Together they operate under the banner of the Gaelic Algorithmic Research Group (GARG), led by Dr William Lamb from Celtic and Scottish Studies. To date, the group has:

  • used Transkribus to develop an automatic handwriting recognition model that converts manuscripts held at the School of Scottish Studies Archives (University of Edinburgh) into text, and
  • built a working automatic speech recogniser (ASR) for Scottish Gaelic, which turns Gaelic speech recordings into text.

The group is currently working on automatic orthographic normalisation, building a large Gaelic language model, and evaluating the ASR algorithms on a blind test set. This research builds on previous work led by Dr Lamb, including a Gaelic part-of-speech (POS) tagged corpus (ARCOSG) and a Gaelic POS-tagger and lemmatiser, both useful tools for preprocessing Gaelic text.

Collaborators

PI: Dr William Lamb

Co-Is: Dr Beatrice Alex; Prof James Loxley

Consultant: Dr Marc Sinclair (Sr Speech Scientist, Quorate Technology Ltd)

Research Assistants: Dr Sharon Arbuthnot, Michael Bauer, Dr Samuel Danso, Lucy Evans, Susanna Naismith, Robert Thomas

Funding

The work has been supported by the following funders:

  • Carnegie Trust for the Universities of Scotland
  • Bòrd na Gàidhlig
  • Soillse
  • Scottish Funding Council / Data-Driven Innovation
  • University of Edinburgh (Challenge Investment Fund)