STT Options for OpenVoiceOS
OpenVoiceOS, or OVOS, is the spiritual successor to (and a fork of) the Mycroft voice assistant. It is a privacy-focused, open-source voice assistant that you can run on your own hardware. OVOS has a plugin system that allows you to swap out the default speech-to-text (STT) engine for one you prefer.
The plugin system, while powerful, can be confusing due to the sheer number of options available. This post will cover some of the STT options available to you and when you might use them.
Running Plugins Directly
While the open source community typically runs voice assistants on Raspberry Pi or other low-cost hardware, some run their voice assistants on more powerful hardware. If you have a powerful machine, or the model doesn’t take much to run, you can run your STT plugin directly on your voice assistant hardware.
You may notice that there aren’t any recommendations for STT plugins that run on Raspberry Pi. While you could run citrinet, VOSK, or smaller fasterwhisper models on a Raspberry Pi, the performance will be too slow to use comfortably in a voice assistant. If you have strong privacy needs, hosting a local STT server on a more powerful machine and pointing your voice assistant to it is a better option. Otherwise, you might consider using a cloud-based STT service such as Microsoft Azure STT.
Here are the STT plugins directly maintained by OVOS as of this writing (late November 2024):
Hardware/Environment | Model | GitHub Link | Notes |
---|---|---|---|
GPU-Optimized | NeMo with GPU | ovos-stt-plugin-nemo | Best performance with GPU acceleration, though CPU-optimized NeMo models also exist |
GPU-Optimized | fasterwhisper | ovos-stt-plugin-fasterwhisper | Fast and accurate with GPU acceleration, multilingual models available |
GPU-Optimized | Meta MMS | ovos-stt-plugin-mms | GPU recommended for optimal performance |
GPU-Optimized | wav2vec2 | ovos-stt-plugin-wav2vec2 | Benefits from GPU acceleration, but not required |
CPU-Capable | VOSK | ovos-stt-plugin-vosk | Works well on CPU and suitable for smaller devices, but not accurate; not recommended for most use cases |
CPU-Capable | citrinet | ovos-stt-plugin-citrinet | ONNX-converted NeMo model for CPU usage. Only citrinet can be exported to ONNX, and this plugin cannot run other NeMo models. |
API-Based/Cloud | Chromium | ovos-stt-plugin-chromium | Uses deprecated API, very fast but unsupported by Google |
API-Based/Cloud | Microsoft Azure STT | ovos-stt-plugin-azure | Cloud-based service, usage-based cost, fast and high quality but not private |
Language-Specific | projectAINA-remote | ovos-stt-plugin-projectAINA-remote | Catalan models |
Language-Specific | HiTZ | ovos-stt-plugin-HiTZ | Basque models |
Language-Specific | Nòs | ovos-stt-plugin-nos | Galician models |
Language-Specific | Iberian peninsula fasterwhisper | ovos-stt-plugin-fasterwhisper-zuazo | Optimized for Iberian languages |
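Whichever plugin you install, you select it in the `stt` section of your OVOS configuration file (typically `~/.config/mycroft/mycroft.conf`). Here is a minimal sketch assuming fasterwhisper; the `model` key and its valid values depend on the plugin, so check that plugin's README for the exact options it supports:

```json
{
  "stt": {
    "module": "ovos-stt-plugin-fasterwhisper",
    "ovos-stt-plugin-fasterwhisper": {
      "model": "small"
    }
  }
}
```

The pattern is the same for any plugin in the table above: the `module` value is the plugin's package name, and plugin-specific settings go in a nested object keyed by that same name.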
OVOS offers many STT plugins as proofs-of-concept in their OVOS Hatchery Git organization. The plugins in that organization worked as of their last commit, but are not guaranteed to work in perpetuity or with the latest changes to OVOS. If you didn't see an STT engine you wanted to try above, you might find it in the Hatchery, ready to be maintained by someone with time and interest.
OVOS-STT-HTTP-Server
If you want to run multiple STT engines at once, or if you want to run your STT engine on a different machine from your voice assistant, you can use the OVOS STT HTTP Server. This server puts a consistent API in front of any supported OVOS STT plugin, enabling you to run your personal, private speech-to-text on a different machine from your voice assistant. It also means you are not constrained by the hardware on your voice assistant - you could run FasterWhisper on a machine with a GPU, for example, and use it as your primary STT engine. Finally, you can use a single STT server for a number of local assistants, allowing you to take advantage of a more powerful machine for all your STT needs.
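Once the server is running on your more powerful machine (the ovos-stt-http-server package provides the launch command; see its README for the exact invocation and flags), you point each assistant at it with the server plugin. A sketch of the assistant-side configuration, where the IP address and port are placeholders for your own deployment:

```json
{
  "stt": {
    "module": "ovos-stt-plugin-server",
    "ovos-stt-plugin-server": {
      "url": "http://192.168.1.50:8080/stt"
    }
  }
}
```

Every assistant on your network can share this one entry, which is what makes the single-server, many-assistants setup work.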
In addition to usage with voice assistants, the REST API can be used for any custom application that requires speech-to-text. For example, you could use it to automatically transcribe podcasts overnight when no one would typically be using the server. You could also use it to transcribe notes you took that day from meetings or lectures.
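As a sketch of that kind of custom use, here is a small stdlib-only Python client for the server's transcription endpoint. The `/stt` path, the `lang` query parameter, and the default port are assumptions based on the server's documentation, so verify them against your own deployment:

```python
import urllib.request


def stt_endpoint(base_url: str, lang: str = "en-US") -> str:
    """Build the transcription URL; /stt and ?lang= are assumed from the server docs."""
    return f"{base_url.rstrip('/')}/stt?lang={lang}"


def transcribe(wav_path: str, base_url: str = "http://localhost:8080") -> str:
    """POST raw WAV bytes to the STT server and return the transcript text."""
    with open(wav_path, "rb") as f:
        audio = f.read()
    req = urllib.request.Request(
        stt_endpoint(base_url),
        data=audio,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# Hypothetical usage, e.g. batch-transcribing the day's recordings overnight:
# text = transcribe("meeting_notes.wav", "http://192.168.1.50:8080")
```

Wrapping this in a cron job or a simple watch-folder script is all it takes to get the overnight podcast or lecture transcription described above.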
STT Plugins From Neon
Neon.AI is a downstream partner of OVOS and a former Mycroft channel partner. They maintain a number of STT plugins that are compatible with OVOS, though some are for specialized use cases. Here are some of the plugins available from Neon:
Hardware/Environment | Model | GitHub Link | Notes |
---|---|---|---|
API-Based/Cloud | Custom NeMo citrinet | neon-stt-plugin-nemo-remote | Point at a remote server with a custom trained NeMo model which can run on a Raspberry Pi 4 or higher, though it’s a bit slow |
API-Based/Cloud | Google Cloud STT | neon-stt-plugin-google_cloud_streaming | Google Cloud STT models through their API, not private |
CPU-Optimized | Custom NeMo citrinet | neon-stt-plugin-nemo | Custom trained NeMo model which can run on a Raspberry Pi 4 or higher, though it’s a bit slow |
API-Based/CPU-Optimized | Custom NeMo citrinet | neon-stt-nemo | Custom trained NeMo model which can run on a Raspberry Pi 4 or higher, though it’s a bit slow. This plugin has streaming capabilities to work on low-powered hardware (think ESP32 or Raspberry Pi Zero 2W) |
CPU-Optimized | Silero | neon-stt-plugin-silero | Silero models, optimized for CPU usage, can run on a Raspberry Pi 4 or higher but not very accurate. Not recommended for anything but POCs. |
There are a number of other Neon STT plugins, but they are either archived or use models that are no longer maintained or supported.
Conclusion
Questions? Comments? Feedback? Let me know on the Open Conversational AI Forums or OVOS support chat on Matrix. I’m available to help and so is the rest of the community.