Free large voice datasets for machine learning



If you work on speech-to-text, whether you are developing your own code or training models with existing speech recognition engines, these datasets could be helpful. All of them are large English datasets. Several smaller datasets, such as a digits dataset and a commands dataset, are deliberately left out of this post because they would not be very helpful.



Over a million utterances.


The Spoken Wikipedia Corpora

Hundreds of hours, over 16 GB in size.




Over 450 hours of audio. 


Over 1400 hours and 50 GB in size.


LibriSpeech

Over 500 hours and 30 GB.
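To train on LibriSpeech you first need to pair each audio file with its transcript. Below is a minimal sketch that assumes the standard LibriSpeech layout: each `<speaker>/<chapter>/` folder holds the `.flac` utterances plus one `<speaker>-<chapter>.trans.txt` file whose lines look like `103-1240-0000 CHAPTER ONE ...`. The helper names `parse_trans_line` and `iter_librispeech` are hypothetical and not part of any official tooling.

```python
from pathlib import Path
from typing import Iterator, Tuple


def parse_trans_line(line: str) -> Tuple[str, str]:
    """Split a transcript line like '103-1240-0000 CHAPTER ONE' into (utterance_id, text)."""
    utt_id, _, text = line.strip().partition(" ")
    return utt_id, text


def iter_librispeech(root: str) -> Iterator[Tuple[Path, str]]:
    """Yield (flac_path, transcript) pairs for every utterance found under root.

    Assumes the usual LibriSpeech layout, where audio files sit in the same
    folder as the chapter's *.trans.txt file and are named <utterance_id>.flac.
    """
    for trans_file in sorted(Path(root).rglob("*.trans.txt")):
        with trans_file.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines, if any
                utt_id, text = parse_trans_line(line)
                yield trans_file.parent / f"{utt_id}.flac", text
```

For example, `for audio_path, text in iter_librispeech("LibriSpeech/train-clean-100"): ...` walks one subset. A generator keeps memory use flat, which matters when the corpus has hundreds of thousands of utterances.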


