This week at Lunch ‘n Learn we were fortunate to enjoy a talk from Ms Katy Wigdahl, CEO of Speechmatics, whose speech recognition software is billed as the most powerful, inclusive, and accurate ever released. Ms Wigdahl has overseen the growth of the company from its inception in 2006 as a consultancy-based speech company to the launch this year of the URSA model with 100+ languages.

Speech is the default mode of communication for humans. We’re hardwired to talk to each other, but historically speech has been underserved by technology. With the advent of the large language model (LLM), new possibilities arose. An LLM is a type of AI for text generation: it takes an input text prompt and generates an output. So, what’s the point? Well, the input could be just a sentence or two, with an output of hundreds or thousands of words. LLMs are an example of natural language processing. The neural network of an LLM is trained on billions of words, broken down into smaller pieces called tokens, and, through a process of machine learning, an artificial understanding of these words (and the relationships between them) is built up. That learned understanding is stored as parameters: the numerical weights that the network adjusts during training. As a loose analogy, if a token is thought of as a neuron in a human brain, the parameters are the synapses, the connections in between. Without connections, you have a static database of information; with connections, you have a contextual understanding of that information.
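For pupils who like to see ideas in code, here is a minimal Python sketch of the two terms above. It is a toy illustration only and is not how Speechmatics or any real LLM is built: the `tokenize`, `build_vocab`, and `count_parameters` functions are invented for this example, real systems usually split text into sub-word tokens rather than whole words, and the tiny "model" being counted (one lookup table plus one output layer) is an assumption chosen to keep the arithmetic easy to follow.

```python
import re


def tokenize(text):
    # Toy tokenizer (an assumption for illustration): each lowercase word
    # and each punctuation mark becomes its own token.
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())


def build_vocab(tokens):
    # Give every distinct token a number, in order of first appearance.
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def count_parameters(vocab_size, embedding_dim):
    # Hypothetical tiny model: a lookup table with one row of numbers per
    # token, plus an output layer that scores every token in the vocabulary.
    embedding_weights = vocab_size * embedding_dim
    output_weights = embedding_dim * vocab_size
    output_biases = vocab_size
    return embedding_weights + output_weights + output_biases


text = "We talk to each other, and we listen to each other."
tokens = tokenize(text)
vocab = build_vocab(tokens)
token_ids = [vocab[token] for token in tokens]

print(tokens)      # the sentence split into tokens
print(token_ids)   # the same tokens as numbers, which is what a network sees
print(count_parameters(len(vocab), embedding_dim=4))
```

Even this tiny sentence needs a few dozen numbers to describe it, which gives a feel for why models trained on billions of words end up with billions of parameters.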

In the case of Speechmatics, the AI uses automatic speech recognition to transcribe a spoken broadcast live and can translate it into languages that you don’t speak yourself (like code!). During the presentation, Ms Wigdahl gave us a live demonstration of the software translating a BBC World Service broadcast into Korean in real time. She also told us that the software recently picked up the Ukrainian language in just three weeks! We were all blown away, and a Year 9 pupil later wrote, ‘I was excited and learned a lot of things, such as voice recognition software is often adjusted to represent male American voices, and this excludes about 80% of the world’s population.’ Another incredible instalment of Lunch ‘n Learn concludes.

Dr Price, STEAM Champion