By Arjun Chandran
Communication is a basic human necessity, essential for connecting with others and sharing thoughts, feelings, and emotions. Yet people with certain neurological conditions often find it extremely difficult or cumbersome to fulfill this need. To help those struggling with communication, a team of researchers from UC Berkeley and UC San Francisco has developed a way to convert brain activity into audible speech using neural networks.
Traditional communication devices often rely on nonverbal movements, or on brain-computer interfaces, that let the user pick letters one by one to spell out what they want to say. Although these methods work, they are often hard to use, inconvenient, and slow. Rather than go letter by letter, the researchers decided to focus on vocal tract movements, and the sounds they produce, to mimic the rate of natural speech. They accomplished this using two neural networks. The first neural network (stage 1) converts neural activity into the associated vocal tract movements. This network was trained on neural activity recorded from the ventral sensorimotor cortex (vSMC), superior temporal gyrus (STG), and inferior frontal gyrus (IFG) of five epilepsy patients who were tasked with speaking several hundred sentences aloud. The second neural network (stage 2) converts vocal tract movements into their associated sounds and was trained on the output of the first network.
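The two-stage decoding pipeline described above can be sketched in code. This is only a minimal illustration of the composition (neural activity → kinematics → acoustics): the actual study used recurrent networks trained on high-density cortical recordings, and all dimensions, weights, and function names below are hypothetical placeholders.

```python
import numpy as np

# Hypothetical feature sizes -- placeholders, not the paper's values.
N_NEURAL = 256      # neural features per time step (e.g. electrode channels)
N_KINEMATIC = 33    # vocal tract (articulatory) movement features
N_ACOUSTic = None   # (removed; see N_ACOUSTIC below)
N_ACOUSTIC = 32     # acoustic features driving a speech synthesizer

rng = np.random.default_rng(0)

# Stage 1: neural activity -> vocal tract movements.
# A simple linear layer stands in for the trained network.
W1 = rng.standard_normal((N_NEURAL, N_KINEMATIC)) * 0.01

def stage1(neural):
    """Map (T, N_NEURAL) neural activity to (T, N_KINEMATIC) movements."""
    return np.tanh(neural @ W1)

# Stage 2: vocal tract movements -> sounds, trained (as in the article)
# on the output of stage 1 rather than on measured movements.
W2 = rng.standard_normal((N_KINEMATIC, N_ACOUSTIC)) * 0.01

def stage2(kinematics):
    """Map (T, N_KINEMATIC) movements to (T, N_ACOUSTIC) acoustics."""
    return kinematics @ W2

def decode(neural):
    """Full prosthetic pipeline: brain activity in, acoustic features out."""
    return stage2(stage1(neural))

# 100 time steps of simulated neural activity.
neural = rng.standard_normal((100, N_NEURAL))
acoustics = decode(neural)
print(acoustics.shape)  # (100, 32)
```

The key design point the article highlights is that the two stages are separable: stage 1 can be swapped or reused across individuals while stage 2 stays fixed.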
To test the efficacy of this method, the researchers had two groups of listeners transcribe pre-determined sentences generated by the neural prosthetic. The results showed that while the system was not perfect, listeners reconstructed the correct sentences fairly accurately. Although the stage 1 networks were trained on individuals who could speak, their learned decoding was highly conserved across speakers. A trained stage 1 network can therefore be applied directly to an individual who cannot communicate and, combined with a stage 2 network, provide a speech prosthetic that is more intuitive to learn and easier to use than traditional speech prosthetics.
Ultimately, by demonstrating the viability of this two-stage process, the researchers have laid the foundation for easy-to-use, efficient speech prosthetics. Such devices could drastically improve the quality of life of people who have lost the ability to speak or communicate due to injury or disease, restoring a free-flowing voice to those who need it most.
Anumanchipalli, Gopala K., et al. “Speech Synthesis from Neural Decoding of Spoken Sentences.” Nature, vol. 568, 24 Apr. 2019, pp. 493–498, www.nature.com/articles/s41586-019-1119-1.