Image source: Pixabay/CC0 Public Domain
Researchers have developed a new deep learning model that promises to significantly improve audio quality in real-world scenarios by leveraging a previously underutilized tool: human perception.
The researchers found that they could take people’s subjective ratings of sound quality and combine them with speech enhancement models to achieve better speech quality as measured by objective metrics.
The new model outperforms other standard methods at suppressing noisy audio, the unwanted sound that can disrupt what the listener actually wants to hear. Importantly, the quality scores the model predicted correlated closely with the judgments made by human listeners.
The traditional approach to limiting background noise is to use artificial intelligence algorithms to separate the desired signal from the noise. But these objective methods don’t always line up with listeners’ assessments of how intelligible speech actually is, said study co-author Donald Williamson, an associate professor in the Department of Computer Science and Engineering at The Ohio State University.
“What differentiates this study from others is that we tried to use perception to train the model to eliminate unwanted sounds,” Williamson said. “If humans can perceive something about the signal quality, then our model can use that as additional information to learn and better cancel the noise.”
The research, published in the journal IEEE/ACM Transactions on Audio, Speech, and Language Processing, focuses on improving monaural speech enhancement, that is, speech coming from a single audio channel, such as one microphone.
The study trained the new model on two datasets from previous research involving recordings of people talking. In some recordings, background noise such as a television or music obscured the conversation. Listeners rated the speech quality of each recording on a scale of 1 to 100.
The team’s model achieved its performance through a joint-learning approach that combines a specialized speech enhancement module with a prediction model that anticipates the mean opinion score a human listener would likely give a noisy signal.
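The joint-learning idea described above can be sketched in a few lines. The toy NumPy code below is only an illustration of the general technique, not the authors’ attention-based architecture: the linear “networks” and the names `enhance`, `predict_mos`, and `joint_loss` are all hypothetical stand-ins, and the perceptual weight `alpha` is an invented parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "enhancement network": one linear layer mapping noisy -> enhanced features.
W_enh = rng.standard_normal((64, 64)) * 0.1

# Toy "MOS predictor": a frozen linear probe standing in for a network
# pretrained on the listeners' 1-100 quality ratings.
w_mos = rng.standard_normal(64) * 0.1

def enhance(noisy):
    """Stand-in for the speech enhancement module."""
    return noisy @ W_enh

def predict_mos(features):
    """Predict a mean opinion score on the 1-100 scale used by the listeners."""
    raw = features @ w_mos
    return 1.0 + 99.0 / (1.0 + np.exp(-raw))  # squash into [1, 100]

def joint_loss(noisy, clean, alpha=0.01):
    """Signal-level reconstruction error plus a perceptual term that
    rewards enhanced output the MOS predictor scores highly."""
    est = enhance(noisy)
    recon = np.mean((est - clean) ** 2)     # objective, signal-level term
    perceptual = -alpha * predict_mos(est)  # human-perception term
    return recon + perceptual

noisy = rng.standard_normal(64)
clean = rng.standard_normal(64)
print(joint_loss(noisy, clean))
```

In a real training loop, both terms would be minimized jointly by gradient descent, so the enhancement network is pushed toward outputs that score well on the learned model of human perception, not just toward low signal error.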
The results showed that the new approach outperformed other models in improving speech quality, as measured by perceptual quality, intelligibility, and human ratings.
But exploiting human perception of sound quality has its own problems, Williamson said.
“Noisy audio is difficult to assess because it’s very subjective. It depends on your hearing ability and your listening experience,” he said. Factors such as hearing aids or cochlear implants can also affect how a person perceives the sound environment, he added.
Because improving speech quality in noise is critical for hearing aids, speech recognition programs, speaker verification applications, and hands-free communication systems, accounting for these differences in perception matters for keeping such technologies usable.
As the complex relationship between artificial intelligence and the real world continues to evolve, Williamson envisions that, much as augmented reality devices do for images, future technologies could enhance audio in real time, adding or removing parts of the sound environment to improve listeners’ overall experience.
To help achieve this, the researchers plan to keep using human subjective evaluations to train their model, so that it can handle more complex audio and keep pace with the changing expectations of human users.
“In general, the entire machine learning artificial intelligence process requires more human participation,” he said. “I hope the field will recognize this importance and continue to support going down this path.”
More information:
Khandokar Md. Nayem et al., Attention-based speech enhancement using a human quality perception model, IEEE/ACM Transactions on Audio, Speech and Language Processing (2023). DOI: 10.1109/TASLP.2023.3328282