Whisper is a general-purpose speech transcription model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech transcription as well as speech translation and language identification.

A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. All of these tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many different stages of a traditional speech processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.

Model Size

We've created a version of Whisper which only runs the most recent Whisper model, large-v2. We still host all other model sizes in a previous version. Links to both versions are below; check out more details on the Versions page.
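The multi-task design is easiest to see in code. Below is a minimal sketch using the open-source `whisper` Python package (`pip install openai-whisper`); the input filename `audio.mp3` is a placeholder. The three calls exercise transcription, translation, and language identification with a single model, with the task selected via the special task tokens described above.

```python
import whisper

# Load a checkpoint; "large-v2" matches the version hosted here, but any
# size ("tiny", "base", "small", "medium") works the same way.
model = whisper.load_model("large-v2")

# Task 1: multilingual transcription (the source language is auto-detected).
result = model.transcribe("audio.mp3")  # "audio.mp3" is a placeholder path
print(result["language"], result["text"])

# Task 2: speech translation into English -- the same model, switched to
# the translation task via a special task token.
translated = model.transcribe("audio.mp3", task="translate")
print(translated["text"])

# Task 3: explicit language identification on a 30-second mel spectrogram.
audio = whisper.pad_or_trim(whisper.load_audio("audio.mp3"))
mel = whisper.log_mel_spectrogram(audio).to(model.device)
_, probs = model.detect_language(mel)
print("detected language:", max(probs, key=probs.get))
```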