How the Listen, Attend and Spell (LAS) neural network was a gigantic breakthrough in speech AI

4 min · Jan 6, 2025

Description

In this episode, we dive into the Listen, Attend and Spell (LAS) model, which changed how speech-to-text systems are built. Unlike traditional methods that split recognition into separately trained stages (acoustic model, pronunciation dictionary, language model), LAS folds everything into one neural network trained end to end. The system has two key parts: a 'listener' that encodes the audio input into higher-level features, and a 'speller' that uses an attention mechanism over those features to emit the transcript one character at a time. Tune in to learn how LAS compares with conventional speech recognition pipelines, reaching competitive accuracy without relying on a pronunciation dictionary or a separate language model! Link to research paper: https://arxiv.org/abs/1508.01211 Follow us on social media: Linkedin: https://www.linkedin.com/company/smallest/ Twitter: https://x.com/smallest_AI Instagram: https://www.instagram.com/smallest.ai/ Discord: https://smallest.ai/discord
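
To make the 'speller attends to the listener' idea concrete, here is a minimal sketch of one attention step in plain Python. It is a simplification and not the paper's exact formulation: scores here are raw dot products, whereas the paper first passes both vectors through small learned networks. The function name `attend` and the toy vectors are hypothetical and chosen only for illustration.

```python
import math

def attend(speller_state, listener_features):
    """One attention step (simplified sketch, hypothetical names).

    speller_state: list of floats, the decoder's current state.
    listener_features: list of feature vectors, one per encoded audio step.
    Returns (weights, context): a softmax distribution over audio steps and
    the weighted average of the listener features.
    """
    # Score each encoded audio step against the current speller state.
    scores = [sum(s * h for s, h in zip(speller_state, feat))
              for feat in listener_features]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the listener features.
    dim = len(listener_features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, listener_features))
               for d in range(dim)]
    return weights, context

# Toy example: the first audio step matches the speller state best,
# so it receives most of the attention weight.
weights, context = attend([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0]])
```

In a full model, the context vector would be fed to the speller together with the previously emitted character to predict the next one.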

All episodes

20 episodes

What are NN-grams?

What happens when you combine the best of old-school language models and the power of neural networks? You get NN-grams! In this episode, we break down how this new model blends n-grams (which remember word patterns) with neural networks (which can generalize like a pro). The result? More accurate and faster speech recognition. NN-grams are already outperforming traditional models on tasks like Italian speech recognition, and they’re faster too. Want to know how this hybrid model is changing the speech AI game? Tune in to learn more! Link to research paper: https://arxiv.org/abs/1606.07470

Jan 7, 2025 · 4 min

How the Listen, Attend and Spell (LAS) neural network was a gigantic breakthrough in speech AI

Jan 6, 2025 · 4 min

What is scheduled sampling? Improving sequence prediction in RNNs

In this episode, we explore how Scheduled Sampling helps Recurrent Neural Networks (RNNs) make better predictions for tasks like machine translation and image captioning. Normally, during training, RNNs use the actual previous word or token to predict the next one. But at inference time, the model has to use its own previous predictions, so early mistakes can compound. Scheduled Sampling addresses this by gradually shifting the model, over the course of training, from being fed the correct token to being fed its own predictions, so it learns to recover from its own errors. Tune in to learn how this approach helped improve results in a major image captioning competition! Link to research paper: https://arxiv.org/abs/1506.03099

Jan 5, 2025 · 4 min
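
For readers who want to see the scheduled sampling idea from the episode above in code, here is a minimal sketch, assuming an inverse-sigmoid decay schedule (one of the schedules the paper describes). The function names, the default `k=100.0`, and the overflow cap are hypothetical choices made for this illustration.

```python
import math
import random

def teacher_forcing_prob(step, k=100.0):
    """Probability of feeding the ground-truth token at a given training step.

    Inverse-sigmoid decay: starts near 1 and falls toward 0 as training
    proceeds. k (assumed value) controls how late the drop happens.
    """
    exponent = min(step / k, 700.0)  # cap to avoid float overflow in exp
    return k / (k + math.exp(exponent))

def choose_inputs(targets, predictions, step, rng, k=100.0):
    """Pick, per position, which 'previous token' the model is fed.

    With probability eps the ground-truth token is used (teacher forcing);
    otherwise the model's own earlier prediction is used.
    """
    eps = teacher_forcing_prob(step, k)
    return [t if rng.random() < eps else p
            for t, p in zip(targets, predictions)]

# Early in training the inputs are almost always ground truth;
# late in training they are almost always the model's own predictions.
rng = random.Random(0)
late_inputs = choose_inputs(["the", "cat"], ["a", "dog"], step=100_000, rng=rng)
```

The design choice here is the schedule: a slow start keeps early training stable while the model's predictions are still poor, and the later decay exposes it to its own mistakes.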

How batch normalization led to faster, smarter AI training

How do you speed up deep neural network training and improve its performance at the same time? Batch Normalization is the answer. By reducing internal covariate shift, it lets models train with higher learning rates and reach the same accuracy in far fewer steps. In this episode, we break down how this technique was applied to a state-of-the-art image classification model, matching its accuracy with 14 times fewer training steps, and how an ensemble of batch-normalized networks exceeded the accuracy of human raters on ImageNet classification. Tune in to learn how Batch Normalization is transforming deep learning and setting new benchmarks in AI research. Link to research paper: https://arxiv.org/abs/1502.03167

Jan 4, 2025 · 3 min
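
As a rough illustration of the batch normalization episode above, the sketch below normalizes one feature across a mini-batch and then applies a learned scale and shift. It covers only the training-time forward pass for a single scalar feature; the function name `batch_norm` and the default values for `gamma`, `beta`, and `eps` are assumptions for this example, and the running statistics used at inference time are left out.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch (training-time forward pass).

    batch: list of floats, the same feature taken from each example.
    gamma, beta: scale and shift (learned in a real network, fixed here).
    eps: small constant that keeps the division numerically stable.
    """
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# After normalization the batch has roughly zero mean and unit variance,
# whatever the scale of the incoming activations.
normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```

Keeping each layer's inputs in a stable range like this is what makes the higher learning rates mentioned in the episode practical.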

Teaching AI to Move: GRUs in Sequence Modeling

How does AI learn to predict and generate realistic human motion? In this episode, we dive into the power of Gated Recurrent Units (GRUs) for sequence modeling. Discover how this gated RNN architecture captures long-term dependencies, predicts motion data point by point, and generates lifelike movements. From speech synthesis to machine translation, GRUs are proving their versatility. Tune in to see how they’re reshaping AI’s ability to understand and create dynamic sequences. Link to research paper: https://arxiv.org/abs/1501.00299

Jan 3, 2025 · 3 min
