Create voice-controlled applications with Jasper
Jasper is an open source, Python-based platform for creating voice-controlled applications. Combined with some hardware, Jasper provides an always-on tool that you can ask questions or use to control your home by voice from several meters away.
Jasper's hardware and software architecture
Jasper’s design was specifically tailored for the Raspberry Pi (Model B). It requires several off-the-shelf components: a USB microphone, a Wi-Fi adapter or Ethernet cable, speakers, a micro-USB cable, and an SD card. All of the hardware connects to the Raspberry Pi. The Ethernet cable is needed to log in to the Pi from a computer during the software installation. Once the system is ready, you can use the wireless adapter instead of the wired connection.
Jasper can be installed on the Raspberry Pi in three ways:
- Use the pre-compiled disk image for Model B: write the image to the SD card, then clone the repository and install the Python dependencies.
- Use a package manager (not available for Debian or Raspbian).
- Install manually, following the instructions, if you need to compile the Jasper software from scratch.
Jasper’s software architecture is composed of several components:
- jasper.py contains the code that manages Jasper as a whole.
- Profile is the user configuration that the tool needs to work correctly.
- Conversation instance takes the mic and the profile as input.
- Brain is an interface between developer-written modules and the core framework.
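To make the brain's role more concrete, here is a minimal sketch of a developer-written module. It assumes the convention described in Jasper's developer documentation, where a module defines a WORDS list together with isValid() and handle() functions. The "lights on" command, the FakeMic class, and the first_name profile key are hypothetical stand-ins, chosen so the example runs without any hardware.

```python
import re

# Vocabulary this module listens for (hypothetical example command).
WORDS = ["LIGHTS", "ON"]


def isValid(text):
    """Return True when the transcribed text is a request for this module."""
    return bool(re.search(r"\blights on\b", text, re.IGNORECASE))


def handle(text, mic, profile):
    """Answer the request through the mic, personalised from the profile."""
    name = profile.get("first_name", "there")  # assumed profile key
    mic.say("Turning the lights on, %s." % name)


class FakeMic(object):
    """Hypothetical stand-in for the microphone object; records what is said."""

    def __init__(self):
        self.spoken = []

    def say(self, phrase):
        self.spoken.append(phrase)


if __name__ == "__main__":
    mic = FakeMic()
    command = "Jasper, turn the lights on"
    if isValid(command):
        handle(command, mic, {"first_name": "Ada"})
    print(mic.spoken)
```

In a real installation the module would not call itself like this; the core framework would decide which module receives the transcribed text.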
The profile created by Jasper holds information that Jasper modules use to customize their responses, for instance to give accurate weather notifications for your area, time-zone-aware reporting, and text messaging. Profile data is private and not shared with third parties unless configured otherwise. jasper.py creates the profile, mic, and conversation instances. The conversation instance takes the mic and the profile as input and uses them to create a notifier and a brain. The brain loads all interactive components into memory.
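The wiring between these components can be sketched with toy classes. This is an illustration under stated assumptions, not Jasper's actual code: it assumes the conversation object receives the mic and the profile and builds the brain from them, it leaves out the notifier, and every class name, method name, and the location profile key is hypothetical.

```python
class Mic(object):
    """Hypothetical mic stand-in: replays scripted phrases, records replies."""

    def __init__(self, phrases):
        self.phrases = list(phrases)
        self.spoken = []

    def listen(self):
        # Return the next scripted phrase, or None when nothing is left.
        return self.phrases.pop(0) if self.phrases else None

    def say(self, phrase):
        self.spoken.append(phrase)


class WeatherModule(object):
    """Toy module whose answer is customised by the profile."""

    def is_valid(self, text):
        return "weather" in text.lower()

    def handle(self, text, mic, profile):
        mic.say("Here is the weather for %s." % profile["location"])


class Brain(object):
    """Holds the loaded modules and routes text to the first one that matches."""

    def __init__(self, mic, profile, modules):
        self.mic = mic
        self.profile = profile
        self.modules = modules

    def query(self, text):
        for module in self.modules:
            if module.is_valid(text):
                module.handle(text, self.mic, self.profile)
                return True
        return False


class Conversation(object):
    """Takes the mic and profile as input and builds the brain from them."""

    def __init__(self, mic, profile, modules):
        self.mic = mic
        self.brain = Brain(mic, profile, modules)

    def run(self):
        # Process input until the scripted phrases run out.
        while True:
            text = self.mic.listen()
            if text is None:
                break
            if not self.brain.query(text):
                self.mic.say("Sorry, I did not understand that.")


def main():
    # Plays the role of jasper.py: create profile, mic, and conversation.
    profile = {"location": "Boston"}  # assumed profile key
    mic = Mic(["what is the weather", "sing a song"])
    Conversation(mic, profile, [WeatherModule()]).run()
    return mic.spoken


if __name__ == "__main__":
    print(main())
```

The point of the layering is that modules never touch the audio hardware or the configuration file directly; they only see the mic and profile objects handed to them.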
Speech recognition within Jasper
An important part of Jasper is the Speech-To-Text (STT) engine, the software that transcribes recorded spoken commands into written text. You can choose among several STT engines:
- Pocketsphinx is an open source speech decoder. Advantages: lightweight and quick; developed especially for mobile devices and embedded systems like the Raspberry Pi; recognition is performed offline, so mic data is not sent over the internet and your personal information stays on the device. Disadvantages: the recognition rate is not very high; it has a lot of dependencies.
- Google STT is an STT engine developed by Google. Advantages: very accurate; offers control and flexibility over the speech recognition capabilities. Disadvantages: the amount of speech transcribed per day is limited; it always needs an active internet connection.
- AT&T STT is a speech decoder developed by AT&T. Advantages: low cost; highly accurate speech functionality via a RESTful API; vocabulary and grammar configuration. Disadvantages: commercial; always needs an active internet connection because recognition is performed online.
- Wit.ai STT helps developers work with natural language, e.g. for mobile apps, home automation, wearable devices, messenger agents, and robots. Advantages: stable; relies on the wit.ai cloud services; customizable vocabulary and language; trains speech recognition algorithms via crowdsourcing. Disadvantages: recognition is performed online, so it needs an active internet connection.
- Julius is an open source speech recognition engine. Advantages: high-performance; does not need an active internet connection. Disadvantages: complex to set up, because you have to train your own acoustic model.
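The online/offline trade-off in this list can be captured in a few lines. The sketch below is only an illustration: the engine identifiers, the function names, and the fallback policy are assumptions made for the example and are not taken from Jasper's configuration format.

```python
# True means recognition runs on the device, per the list above.
# The short identifiers are hypothetical labels for this sketch.
RUNS_OFFLINE = {
    "pocketsphinx": True,
    "google": False,
    "att": False,
    "witai": False,
    "julius": True,
}


def usable_engines(internet_available):
    """Return the engines that can work with the given connectivity, sorted by name."""
    return sorted(
        name for name, offline in RUNS_OFFLINE.items()
        if offline or internet_available
    )


def pick_engine(preferred, internet_available, fallback="pocketsphinx"):
    """Use the preferred engine when it can run, else fall back to an offline one."""
    if preferred not in RUNS_OFFLINE:
        raise ValueError("unknown STT engine: %s" % preferred)
    if preferred in usable_engines(internet_available):
        return preferred
    return fallback


if __name__ == "__main__":
    print(usable_engines(internet_available=False))
    print(pick_engine("google", internet_available=False))
```

Falling back to an offline engine keeps an always-on device responsive when the network drops, at the cost of the lower recognition rate noted above.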
Jasper is used to develop always-on, voice-controlled applications on the Raspberry Pi. You can fine-tune such apps using out-of-the-box modules or write your own with the help of a very simple developer interface. Find more information on the Jasper website.