A speech-to-text (STT) system is what its name implies: a way of transforming spoken words into text files that can be used later for any purpose.

Speech-to-text technology is extremely useful. It can be used for many applications, such as automating transcription, writing books and other texts using only your voice, and running complex analyses on the generated text files.

In the past, speech-to-text technology was dominated by proprietary software and libraries; open source alternatives either didn’t exist or came with severe limitations and no community around them. This is changing: today there are many open source speech-to-text tools and libraries that you can use right now.

Here we list 5 of them.

Open Source Speech Recognition Libraries

Project DeepSpeech

[Image: Project DeepSpeech, via Mozilla.]

This project is made by Mozilla, the organization behind the Firefox browser. It’s a 100% free and open source speech-to-text library that uses machine learning, built on the TensorFlow framework, to do its job.

In other words, you can use it to train models yourself to improve the underlying speech-to-text technology and get better results, or even to bring it to other languages if you want. You can also easily integrate it into other machine learning projects you have on TensorFlow. Sadly, it looks like the project currently supports only English by default.

It’s also available for several programming languages, such as Python (3.6), which lets you have it working in seconds:

pip3 install deepspeech
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav
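
If you would rather drive the command-line tool from a script, the invocation above can be wrapped in a few lines of Python. This is only a minimal sketch and not part of DeepSpeech itself: the helper names `check_wav`, `build_command` and `transcribe` are hypothetical, the flags are the ones shown above, and the format check assumes the pre-trained English model expects 16 kHz, mono, 16-bit WAV input.

```python
import subprocess
import wave

# Assumption: the pre-trained English model expects 16 kHz, mono, 16-bit PCM audio.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample


def check_wav(path):
    """Return a list of format problems found in the WAV file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            problems.append("sample rate is %d Hz, expected %d Hz"
                            % (wav.getframerate(), EXPECTED_RATE))
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append("%d channels, expected %d"
                            % (wav.getnchannels(), EXPECTED_CHANNELS))
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append("%d bytes per sample, expected %d"
                            % (wav.getsampwidth(), EXPECTED_SAMPLE_WIDTH))
    return problems


def build_command(audio_path, model_dir="models"):
    """Build the deepspeech command line shown above as an argument list."""
    return [
        "deepspeech",
        "--model", model_dir + "/output_graph.pbmm",
        "--alphabet", model_dir + "/alphabet.txt",
        "--lm", model_dir + "/lm.binary",
        "--trie", model_dir + "/trie",
        "--audio", audio_path,
    ]


def transcribe(audio_path, model_dir="models"):
    """Run the deepspeech CLI and return whatever it prints to standard output."""
    problems = check_wav(audio_path)
    if problems:
        raise ValueError("unsupported audio: " + "; ".join(problems))
    result = subprocess.run(
        build_command(audio_path, model_dir),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,               # raise if the CLI exits with an error
    )
    return result.stdout.strip()
```

Calling `transcribe("my_audio_file.wav")` would then return the recognized text, provided the `deepspeech` command and the model files are installed as described above.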

You can also install it using npm:

npm install deepspeech

For more information, refer to the project’s homepage.

Kaldi

Kaldi is open source speech recognition software written in C++ and released under the Apache License 2.0. It works on Windows, macOS and Linux. Its development started back in 2009.

Kaldi’s main advantage over some other speech recognition software is that it’s extensible and modular: the community provides many third-party modules that you can use for your tasks. Kaldi also supports deep neural networks and offers excellent documentation on its website.

While the code is mainly written in C++, it’s “wrapped” by Bash and Python scripts. So if you are only looking for the basic usage of converting speech to text, you’ll find it easy to accomplish that via either Python or Bash.

Project’s homepage.

Julius

Julius is probably one of the oldest speech recognition programs around: its development started in 1991 at Kyoto University, and its ownership was transferred to an independent project team in 2005.

Julius’s main features include real-time STT processing, low memory usage (less than 64 MB for a 20,000-word vocabulary), N-best and word-graph output, the ability to run as a server unit, and a lot more. The software was mainly built for academic and research purposes. It is written in C, and works on Linux, Windows, macOS and even Android (on smartphones).
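
To make the N-best feature concrete: instead of a single transcript, the recognizer returns several candidate transcripts ranked by score, and the application chooses among them. The sketch below is illustrative only and does not use Julius’s actual output format; the hypotheses, the scores and the `pick_hypothesis` helper are all made up for the example. It prefers the best-ranked candidate that contains a word the application is listening for, and otherwise falls back to the top candidate.

```python
def pick_hypothesis(nbest, expected_words):
    """Choose a transcript from a non-empty N-best list.

    nbest: list of (transcript, score) pairs; a higher score means more likely.
    expected_words: set of lowercase words the application is listening for.
    """
    ranked = sorted(nbest, key=lambda pair: pair[1], reverse=True)
    for transcript, _score in ranked:
        if expected_words & set(transcript.lower().split()):
            return transcript
    # No candidate mentions an expected word: fall back to the best-scoring one.
    return ranked[0][0]


# Hypothetical 3-best output for one utterance (scores are invented).
nbest = [
    ("turn of the light", -120.4),
    ("turn off the light", -121.0),
    ("turn off the night", -125.7),
]
print(pick_hypothesis(nbest, {"off", "on"}))  # prints: turn off the light
```

Here the top-scoring candidate is skipped because it contains neither "off" nor "on", which is the kind of correction a single-best output would not allow.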

Currently it supports only English and Japanese. The software is probably available in your Linux distribution’s repository; just search for the julius package in your package manager. The latest version was released about a month and a half ago.

Project’s homepage.

Wav2Letter++

[Image: Wav2Letter++, via Facebook blog.]

If you are looking for something modern, then this one is for you. Wav2Letter++ is open source speech recognition software that was released by Facebook’s AI Research team just 2 months ago. The code is released under the BSD license.

Facebook describes its library as “the fastest state-of-the-art speech recognition system available”. The concepts on which this tool is built make it optimized for performance by default; Facebook’s also-new machine learning library, FlashLight, is used as the underlying core of Wav2Letter++.

Wav2Letter++ requires you to first train a model yourself for the language you want. No pre-built support for any language (including English) is available; it’s just a machine-learning-driven tool to convert speech to text. It was written in C++, hence the name (Wav2Letter++).

Project’s homepage.

DeepSpeech2

Researchers at the Chinese giant Baidu are also working on their own speech-to-text engine, called DeepSpeech2. It’s an end-to-end open source engine that uses the “PaddlePaddle” deep learning framework to convert both English and Mandarin Chinese speech into text. The code is released under the BSD license.

The engine can be trained on any data set and for any language you desire. The models are not released with the code; you’ll have to build them yourself, just like with the other software. DeepSpeech2’s source code is written in Python, so it should be easy to get familiar with if that’s the language you use.

Project’s homepage.

Conclusion

The speech recognition category is still mainly dominated by proprietary software giants like Google and IBM (which provide their own closed-source commercial services for this), but the open source alternatives are promising. These 5 open source speech recognition engines should get you going in building your application, and all of them are still under heavy development. In a few years, we expect open source to become the norm for these technologies, just as in other industries.

If you have any other recommendations for this list, or comments in general, we’d love to hear them below!

6 Comments

  1. Is the Android speech to text app going to be ported to, at least, Linux (which I use)? I have it on my phone and it’s really good!

    Also are there any text to speech programs available, again for at least Linux?

    • M.Hanny Sabbagh

      February 20, 2019 at 7:20 am

      As far as I know, nobody is working on porting individual applications from Android to GNU/Linux.

      There’s a program called KDE Simon; you can check it out.

  2. Bob Putnam

    February 21, 2019 at 2:30 pm

    There’s a Chrome browser extension that works extraordinarily well.

  3. Lootosee

    April 11, 2019 at 4:11 am

    All these projects seem pretty useless if they aren’t packaged in an executable or binary format for use on a particular OS. Short of techie or geek types, regular people are not going to tweak or compile source code. The Windows OS already has SAPI, so what is the incentive to try one of these projects? These projects are not making themselves accessible to the masses.

    • M.Hanny Sabbagh

      April 11, 2019 at 9:58 am

      Those projects are simply not for regular people; they are for programmers and those who are building a system that requires speech recognition, who can then use those systems instead of the proprietary ones.

      • Sarah

        June 20, 2019 at 2:11 am

        All fine and good, except even for stuff like pocketsphinx, nobody bothers to explain how to write out a terminal command for it in Linux.

        Rather than saying it’s for programmers, why not say “it’s for a subset of programmers that can self-learn their own terminal commands”.

        I am a programmer, and there are no tutorials on it worth anything.
