Two years ago, the Edinburgh LTG teamed up with researchers working on Gaelic language technology. Together they operate under the banner of the Gaelic Algorithmic Research Group (GARG), which is led by Dr William Lamb from Celtic and Scottish Studies.
GARG started out by using Transkribus to develop an automatic handwriting recognition model that converts manuscripts held at the School of Scottish Studies Archives (University of Edinburgh) to text. More information on this work can be found here:
Automatic transcription of Gaelic handwriting using Transkribus
We then built a working automatic speech recogniser (ASR) for Scottish Gaelic, which turns Gaelic speech recordings into text.
Building a speech recogniser crucially requires correct alignment of Scottish Gaelic speech and text; this alignment was developed by Quorate Technology Ltd. Currently the speech recogniser works at 70-75% accuracy, but we hope to improve this further with additional Scottish Gaelic training data. For more detail on this work, see this blog here, and to see examples of our existing speech recogniser displayed as time-aligned subtitles, click here!
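This page does not say how the accuracy figure is measured. One common convention for speech recognisers is word accuracy, defined as 1 minus the word error rate (WER), where WER counts the word substitutions, deletions and insertions needed to turn the recogniser output into the reference transcript. The sketch below illustrates that convention only; the function names and the sample sentence are made up for illustration and are not taken from the GARG system.

```python
def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (edit distance in words, number of reference words).

    Uses the standard Levenshtein dynamic programme over whitespace-split
    tokens. Tokenisation by whitespace is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference prefix processed so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - WER. Can be negative if there are many insertions."""
    errors, n_ref = word_errors(reference, hypothesis)
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    return 1 - errors / n_ref


# Illustrative sentence only (not from the project's data):
# one of five reference words is misrecognised.
reference = "tha an latha brèagha an-diugh"
hypothesis = "tha an latha brèagha diugh"
print(word_accuracy(reference, hypothesis))
```

Under this definition, a figure of 70-75% would correspond to roughly one word error in every three to four reference words, although the project may use a different measure.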
The group is also currently working on automatic orthographic normalisation, building a large Gaelic language model and evaluating the ASR algorithms on a blind test set. This research builds on previous work led by Dr Lamb, which includes developing a Gaelic POS-tagged corpus (ARCOSG) and a Gaelic POS-tagger and lemmatiser, which are useful tools for preprocessing Gaelic text.
PI: Dr William Lamb
Co-Is: Dr Beatrice Alex; Prof James Loxley
Consultant: Dr Marc Sinclair (Sr Speech Scientist, Quorate Technology Ltd)
Research Assistants: Dr Sharon Arbuthnot, Michael Bauer, Dr Samuel Danso, Lucy Evans, Susanna Naismith, Robert Thomas
The work has been supported by the following funders:
- Carnegie Trust for the Universities of Scotland
- Bòrd na Gàidhlig
- Scottish Funding Council / Data-Driven Innovation
- University of Edinburgh (Challenge Investment Fund)