Two years ago, the Edinburgh LTG teamed up with researchers working on Scottish Gaelic language technology. We operate under the banner of the Gaelic Algorithmic Research Group (GARG), which is led by Dr William Lamb from Celtic and Scottish Studies.
GARG started out by developing an automatic handwriting recognition model in Transkribus to convert manuscripts held at the School of Scottish Studies Archives (University of Edinburgh) to text. More information on this work can be found here.
Automatic transcription of Gaelic handwriting using Transkribus
We then built a working automatic speech recogniser (ASR) for Scottish Gaelic, which turns Gaelic speech recordings into text.
Building a speech recogniser crucially requires correct alignment of Scottish Gaelic speech and text; this alignment was developed by Quorate Technology Ltd. The speech recogniser currently works at 70-75% accuracy, but we are planning to improve this further with additional Scottish Gaelic training data. To find out more about this work, see this blog; to see examples of our existing speech recogniser's output displayed as time-aligned subtitles, click here. We’ve also made a demo available to try out yourself.
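For readers unfamiliar with how a figure such as "70-75% accuracy" can be computed, here is a minimal sketch. It assumes accuracy means word accuracy, that is, one minus the word error rate; this page does not state which metric was used, so treat that as an assumption. The function names and the Gaelic example sentence are illustrative only and are not taken from the GARG system.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions and deletions
    needed to turn the hypothesis into the reference (Levenshtein
    distance computed over whitespace-separated words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy, assumed here to be 1 - (errors / reference length)."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference transcript must not be empty")
    return 1.0 - word_errors(reference, hypothesis) / n_ref


# Illustrative (made-up) reference transcript and recogniser output.
reference = "tha an latha brèagha an-diugh"
hypothesis = "tha an latha breagha"
print(word_errors(reference, hypothesis))    # one substitution, one deletion
print(word_accuracy(reference, hypothesis))
```

Under this convention, a missing accent counts as a full word error, which is one reason accuracy figures depend on how the text is normalised before scoring.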
The group has also developed an automatic orthographic normalisation tool for creating more data to build a large Gaelic language model. More information on this normalisation work can be found here, and we demo the system here.
This research builds on previous work led by Dr Lamb, which includes developing a Gaelic POS-tagged corpus (ARCOSG) and a Gaelic POS-tagger and lemmatiser, which are useful tools for preprocessing Gaelic text.
Collaborators
PI: Dr William Lamb
Co-Is: Dr Beatrice Alex; Prof James Loxley
Consultant: Dr Marc Sinclair (Sr Speech Scientist, Quorate Technology Ltd)
Research Assistants: Dr Sharon Arbuthnot, Michael Bauer, Dr Samuel Danso, Lucy Evans, Susanna Naismith, Robert Thomas
Demos
Automatic Scottish Gaelic speech recogniser: https://www.garg.ed.ac.uk/asr_demo
Automatic Scottish Gaelic text normaliser: https://www.garg.ed.ac.uk/an_gocair
Funding
The work has been supported by the following funders:
- Carnegie Trust for the Universities of Scotland
- Bòrd na Gàidhlig
- Soillse
- Scottish Funding Council / Data-Driven Innovation
- University of Edinburgh (Challenge Investment Fund)