Kaldi - Speech to text using Aspire
This post assumes Kaldi is already installed and shows how to install the Aspire model and use it to transcribe a WAV file. A list of available Kaldi models can be found here. If you don't have Kaldi installed yet, follow this page first: How to install Kaldi-ASR on Ubuntu 18
Navigate to egs/aspire/s5
wget https://kaldi-asr.org/models/1/0001_aspire_chain_model_with_hclg.tar.bz2
tar xfv 0001_aspire_chain_model_with_hclg.tar.bz2
steps/online/nnet3/prepare_online_decoding.sh --mfcc-config conf/mfcc_hires.conf data/lang_chain exp/nnet3/extractor exp/chain/tdnn_7b exp/tdnn_7b_chain_online
wget https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav
. ./cmd.sh
. ./path.sh
ffmpeg -i LDC93S1.wav -acodec pcm_s16le -ac 1 -ar 8000 output.wav
Navigate to src/online2bin in your Kaldi directory
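The ffmpeg step above converts the sample to 8 kHz, mono, 16-bit PCM. As a quick sanity check before decoding, the header of output.wav can be read back with standard tools. The sketch below is only an illustration: the helper names (read_le, wav_format, is_8k_mono_16bit) are made up for this post, and it assumes a canonical WAV layout in which the fmt chunk starts at byte 12. That is the usual layout, but it is not guaranteed for every WAV file.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Kaldi or ffmpeg.

# Read a little-endian unsigned integer from a file.
# $1 = file, $2 = byte offset, $3 = length in bytes
read_le() {
    od -An -tu1 -j "$2" -N "$3" "$1" |
        awk '{ v = 0; for (i = NF; i >= 1; i--) v = v * 256 + $i; print v }'
}

# Print "channels sample_rate bits_per_sample".
# Assumes the fmt chunk starts at byte 12 (canonical WAV header).
wav_format() {
    printf '%s %s %s\n' \
        "$(read_le "$1" 22 2)" \
        "$(read_le "$1" 24 4)" \
        "$(read_le "$1" 34 2)"
}

# Succeed only if the file is mono, 8000 Hz, 16-bit,
# i.e. the format requested in the ffmpeg command.
is_8k_mono_16bit() {
    [ "$(wav_format "$1")" = "1 8000 16" ]
}
```

For example, is_8k_mono_16bit /home/ubuntu/kaldi/egs/aspire/s5/output.wav || echo "unexpected format" would flag a file that was not converted as intended.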
In the following command, the paths to online.conf, words.txt, final.mdl, HCLG.fst, and output.wav may need to be adjusted to match your installation.
./online2-wav-nnet3-latgen-faster \
--online=false \
--do-endpointing=false \
--frame-subsampling-factor=3 \
--config=/home/ubuntu/kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/conf/online.conf \
--max-active=7000 \
--beam=15.0 \
--lattice-beam=6.0 \
--acoustic-scale=1.0 \
--word-symbol-table=/home/ubuntu/kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/graph_pp/words.txt \
/home/ubuntu/kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/final.mdl \
/home/ubuntu/kaldi/egs/aspire/s5/exp/tdnn_7b_chain_online/graph_pp/HCLG.fst \
'ark:echo utterance-id1 utterance-id1|' \
'scp:echo utterance-id1 /home/ubuntu/kaldi/egs/aspire/s5/output.wav|' \
'ark:/dev/null'
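Because the command above hard-codes several absolute paths, a missing or misplaced file is the most likely cause of a failed run. A small pre-flight check along the following lines can report which file is missing before the decoder starts. This is a sketch under assumptions: check_decoder_inputs is a hypothetical helper, and the KALDI_ROOT default simply mirrors the /home/ubuntu/kaldi prefix used in this post.

```shell
#!/bin/sh
# Hypothetical helper: report every argument that is not a readable regular file.
# Returns 0 if all files are present, 1 otherwise.
check_decoder_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumed layout, copied from the paths in the command above; adjust as needed.
KALDI_ROOT=${KALDI_ROOT:-/home/ubuntu/kaldi}
MODEL_DIR=$KALDI_ROOT/egs/aspire/s5/exp/tdnn_7b_chain_online

# Example call (commented out so that loading this file has no side effects):
# check_decoder_inputs \
#     "$MODEL_DIR/conf/online.conf" \
#     "$MODEL_DIR/final.mdl" \
#     "$MODEL_DIR/graph_pp/HCLG.fst" \
#     "$MODEL_DIR/graph_pp/words.txt" \
#     "$KALDI_ROOT/egs/aspire/s5/output.wav"
```

If the check returns non-zero, the "missing:" lines on standard error show which path to fix.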
The output contains several log statements and one line that starts with "utterance-id1". The text after "utterance-id1" on that line is the transcription.
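To pull only the transcription out of the surrounding log lines, the output can be filtered on the utterance id. The following is a minimal sketch: extract_transcript is a hypothetical helper name, and it assumes the transcription appears on a single line whose first field is exactly the utterance id.

```shell
#!/bin/sh
# Hypothetical helper: print the text that follows the given utterance id.
# $1 = utterance id; the decoder output is read from standard input.
extract_transcript() {
    awk -v id="$1" '
        ($1 "") == (id "") {
            sub(/^[ \t]*[^ \t]+[ \t]*/, "")   # drop the utterance id
            sub(/[ \t]+$/, "")                # drop trailing whitespace
            print
        }'
}
```

It could be used by appending 2>&1 | extract_transcript utterance-id1 to the decoding command. The 2>&1 is there on the assumption that the decoder writes the transcription line to standard error together with its logs; if it goes to standard output in your build, the redirect is harmless.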