Whisper is a general-purpose speech recognition model trained on a large and diverse set of audio. It is a multitask model: it can perform multilingual speech recognition, speech translation, and language identification.

Note: Generative AI services were used as assistants in writing this blog post.

Introduction

Pre-trained audio encoders can learn high-quality representations of speech, but because they are trained without supervision, they lack an equally high-quality decoder. As a result, they need a complex fine-tuning stage before they are useful for tasks like speech recognition. Fine-tuning…