Showing posts from November, 2020

Free large voice datasets for machine learning

If you work on speech-to-text, whether you are writing your own code or training models on top of existing speech recognition engines, these datasets could be helpful. All of them are large English datasets. Several smaller datasets, such as a digits dataset and a commands dataset, are omitted from this post on purpose because they wouldn't be of much help.

VoxCeleb: over a million utterances.

The Spoken Wikipedia Corpora: hundreds of hours, over 16 GB in size.

TED-LIUM Release 3: over 450 hours of audio.

CommonVoice: over 1400 hours and 50 GB in size.

LibriSpeech: over 500 hours and 30 GB.