Gaelic Language Technology

Two years ago, the Edinburgh LTG teamed up with researchers working on Scottish Gaelic language technology. We operate under the banner of the Gaelic Algorithmic Research Group (GARG), which is led by Dr William Lamb from Celtic and Scottish Studies.

GARG started out by developing an automatic handwriting recognition model, built with Transkribus, that converts manuscripts held at the School of Scottish Studies Archives (University of Edinburgh) to text. More information on this work can be found here.

Automatic transcription of Gaelic handwriting using Transkribus

We then built a working automatic speech recogniser (ASR) for Scottish Gaelic, which turns Gaelic speech recordings into text.

Example of time-aligned Scottish Gaelic speech and its corresponding transcript

Building a speech recogniser crucially requires correct alignment of Scottish Gaelic speech and text; this alignment was developed by Quorate Technology Ltd. The speech recogniser currently works at 70-75% accuracy, and we plan to improve on this with additional Scottish Gaelic training data. For more detail on this work, see this blog; to see examples of our existing speech recogniser's output displayed as time-aligned subtitles, click here. We’ve also made a demo available for you to try out yourself.

Automatically subtitled video
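To make the 70-75% accuracy figure quoted above more concrete, here is a minimal sketch of how such a figure is commonly computed. It assumes that "accuracy" means word accuracy, i.e. 1 minus the word error rate (WER); the page does not say which metric GARG uses, so treat that as an assumption. The function names and the example sentence are illustrative only and are not taken from the project's code or data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed definition of accuracy: 1 - WER (can be negative for very poor output)."""
    return 1.0 - word_error_rate(reference, hypothesis)


if __name__ == "__main__":
    # Illustrative example only: the recogniser output drops one accent,
    # which counts as one substituted word out of five.
    reference = "tha an latha brèagha an-diugh"
    hypothesis = "tha an latha breagha an-diugh"
    print(f"word accuracy: {word_accuracy(reference, hypothesis):.0%}")
```

Under this definition, an accuracy of 70-75% would correspond to roughly one word in four needing correction, which is why additional training data matters.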

The group has also developed an automatic orthographic normalisation tool, which creates more data for building a large Gaelic language model. More information on this normalisation work can be found here, and we demo the system here.

This research builds on previous work led by Dr Lamb, which includes developing a Gaelic POS-tagged corpus (ARCOSG) and a Gaelic POS-tagger and lemmatiser, which are useful tools for preprocessing Gaelic text.

Collaborators

PI: Dr William Lamb

Co-Is: Dr Beatrice Alex; Prof James Loxley

Consultant: Dr Marc Sinclair (Sr Speech Scientist, Quorate Technology Ltd)

Research Assistants: Dr Sharon Arbuthnot, Michael Bauer, Dr Samuel Danso, Lucy Evans, Susanna Naismith, Robert Thomas 

Demos

Automatic Scottish Gaelic speech recogniser: https://www.garg.ed.ac.uk/asr_demo

Automatic Scottish Gaelic text normaliser: https://www.garg.ed.ac.uk/an_gocair

Funding

The work has been supported by the following funders: 

  • Carnegie Trust for the Universities of Scotland
  • Bòrd na Gàidhlig
  • Soillse
  • Scottish Funding Council / Data-Driven Innovation
  • University of Edinburgh (Challenge Investment Fund)