Whisper

These notes cover using whisper and whisper.cpp, and how to train a new model that either of them can use.

Creating a Custom Model

Creating the Training Data

You need audio files in the correct format, text files (CSV) containing the text for each audio file, and a model file which lists how to match them up.

Making Audio Files

Record the files on any device you may use. Speech recognition should improve if the recording source is the same one used for real speech (this is an assumption). The files can then be converted using ffmpeg, e.g.

ffmpeg -i ~/Documents/_record_filename.???_ -ar 16k ExampleText.wav
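When there are many recordings, the same conversion can be wrapped in a loop. This is only a minimal sketch: the function names `wav_name` and `convert_all`, the `FFMPEG` variable and the directory arguments are made up for illustration. `-ar 16000` is the spelled-out form of `-ar 16k`, `-y` overwrites existing output files, and `-nostdin` stops ffmpeg from reading the terminal while the loop runs.

```shell
# Hypothetical helper names; only the ffmpeg conversion itself comes from these notes.
FFMPEG="${FFMPEG:-ffmpeg}"   # override to point at a different ffmpeg binary

# Print the output name for a recording: same base name, .wav extension.
wav_name() {
    base=$(basename "$1")
    printf '%s.wav\n' "${base%.*}"
}

# Convert every regular file in directory $1 to a 16 kHz WAV in directory $2.
convert_all() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and unmatched globs
        "$FFMPEG" -nostdin -y -i "$f" -ar 16000 "$out_dir/$(wav_name "$f")" || return 1
    done
}
```

It could be called as `convert_all ~/Documents/recordings wav`, where both directory names are placeholders. Two recordings that share a base name but differ in extension would overwrite each other, so the sketch assumes base names are unique.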

The output must be a WAV file with a 16 kHz sample rate so that whisper.cpp can run speech recognition on it, which is useful when creating training data:

cd WebDev/whisper.cpp
./main -f ExampleText.wav --model models/ggml-small.en.bin
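To run the same check over a whole directory of converted files, the command can be looped. Again this is a sketch under assumptions: `transcribe_all`, `WHISPER_BIN` and `WHISPER_MODEL` are hypothetical names, and it is meant to be run from inside the whisper.cpp directory. It uses `-f` to pass the input file and adds `-otxt`, which asks whisper.cpp to also save each transcript as a text file.

```shell
# Hypothetical helper; the defaults mirror the single-file command in these notes.
WHISPER_BIN="${WHISPER_BIN:-./main}"
WHISPER_MODEL="${WHISPER_MODEL:-models/ggml-small.en.bin}"

# Transcribe every .wav file in directory $1, stopping at the first failure.
transcribe_all() {
    for wav in "$1"/*.wav; do
        [ -f "$wav" ] || continue        # nothing to do if the glob matched no files
        "$WHISPER_BIN" --model "$WHISPER_MODEL" -f "$wav" -otxt || return 1
    done
}
```

For example, `transcribe_all ~/wav` would process every WAV file in a (placeholder) `~/wav` directory. The saved transcripts can then be corrected by hand and used as a starting point for the training text.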