Julius is a lightweight open-source Speech Recognition Engine

"Julius" is a high-performance, small-footprint large vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers. It is written primarily in C.

The algorithm is based on a 2-pass tree-trellis search, which fully incorporates major decoding techniques such as a tree-organized lexicon, 1-best / word-pair context approximation, rank/score pruning, N-gram factoring, cross-word context dependency handling, enveloped beam search, Gaussian pruning, and Gaussian selection.
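
As a rough illustration of what rank/score pruning means in a beam search, the sketch below keeps only hypotheses whose log score is within a fixed margin of the best one (score pruning) and then caps the number of survivors (rank pruning). It is a generic Python sketch, not Julius's C implementation; the function name `beam_prune` and the parameters `score_width` and `max_rank` are hypothetical names chosen for this example.

```python
def beam_prune(hyps, score_width, max_rank):
    """Return the hypotheses that survive score pruning, then rank pruning.

    hyps        : list of (label, log_score) pairs; a higher score is better
    score_width : keep hypotheses within this log-score margin of the best
    max_rank    : keep at most this many hypotheses
    (All names here are hypothetical, chosen for illustration only.)
    """
    if not hyps:
        return []
    best = max(score for _, score in hyps)
    # Score pruning: drop anything too far below the best hypothesis.
    survivors = [(label, s) for label, s in hyps if s >= best - score_width]
    # Rank pruning: of the remainder, keep only the top max_rank by score.
    survivors.sort(key=lambda h: h[1], reverse=True)
    return survivors[:max_rank]


# Made-up scores for four competing hypotheses at one time frame.
hyps = [("a", -10.0), ("b", -12.5), ("c", -30.0), ("d", -11.0)]
kept = beam_prune(hyps, score_width=5.0, max_rank=2)
print(kept)
```

With these made-up scores, "c" falls outside the score margin and "b" is cut by the rank limit, so "a" and "d" survive. A wider margin or a larger rank limit trades speed for a lower risk of pruning the correct hypothesis.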

Julius's main platforms are Linux and other Unix-based systems. It also runs on Windows, macOS, Android, and other platforms.

Features

  • Open-source LVCSR software (BSD 3-clause license).
  • Real-time, high-speed, accurate recognition based on a 2-pass strategy.
  • Low memory requirement: less than 32 MB for the work area (less than 64 MB for 20k-word dictation with an in-memory 3-gram LM).
  • Supports N-gram LMs with arbitrary N, as well as rule-based grammars and word lists for isolated word recognition.
  • Language- and unit-independent: any LM in ARPA standard format and any AM in HTK ASCII HMM definition format can be used.
  • Highly configurable: various search parameters can be set, and alternative decoding algorithms (1-best/word-pair approximation, word trellis/word graph intermediates, etc.) can be chosen.
  • List of major supported features:
      • On-the-fly recognition for microphone and network input
      • GMM-based input rejection
      • Successive decoding, delimiting input by short pauses
      • N-best output
      • Word graph output
      • Forced alignment on word, phoneme, and state level
      • Confidence scoring
      • Server mode and control API
      • Many search parameters for performance tuning
      • Character code conversion for result output
      • (Rev. 4) Engine available as a library with a simple API
      • (Rev. 4) Long N-gram support
      • (Rev. 4) Run with forward / backward N-gram only
      • (Rev. 4) Confusion network output
      • (Rev. 4) Arbitrary multi-model decoding in a single thread
      • (Rev. 4) Rapid isolated word recognition
      • (Rev. 4) User-defined LM function embedding
  • DNN-based decoding, using a front-end module for frame-wise state probability calculation for flexibility.
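
For readers unfamiliar with the ARPA N-gram format mentioned in the feature list, the following Python sketch parses a tiny, made-up ARPA model. It assumes the common layout: a `\data\` header with `ngram N=count` lines, one `\N-grams:` section per order with lines of the form `log10_prob word(s) [backoff]`, and a closing `\end\`. The sample probabilities and the helper `parse_arpa` are invented for illustration; this is a simplified reader, not the loader Julius uses.

```python
# A tiny, made-up ARPA model (values are illustrative only).
SAMPLE_ARPA = "\n".join([
    "\\data\\",
    "ngram 1=3",
    "ngram 2=2",
    "",
    "\\1-grams:",
    "-0.5 <s> -0.3",
    "-0.7 hello -0.2",
    "-0.6 </s>",
    "",
    "\\2-grams:",
    "-0.2 <s> hello",
    "-0.1 hello </s>",
    "",
    "\\end\\",
])


def parse_arpa(text):
    """Parse ARPA text into (counts, ngrams). Hypothetical helper.

    counts : {order: declared number of n-grams}
    ngrams : {order: {word_tuple: (log10_prob, backoff_or_None)}}
    """
    counts, ngrams = {}, {}
    order = None      # None until we enter an "\N-grams:" section
    in_header = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "\\data\\":
            in_header = True
        elif line == "\\end\\":
            break
        elif line.startswith("\\") and line.endswith("-grams:"):
            order = int(line[1:-len("-grams:")])
            ngrams[order] = {}
            in_header = False
        elif in_header and line.startswith("ngram "):
            n, count = line[len("ngram "):].split("=")
            counts[int(n)] = int(count)
        elif order is not None:
            fields = line.split()
            logp = float(fields[0])
            words = tuple(fields[1:1 + order])
            # The backoff weight is optional and follows the words.
            has_backoff = len(fields) > 1 + order
            backoff = float(fields[1 + order]) if has_backoff else None
            ngrams[order][words] = (logp, backoff)
    return counts, ngrams


counts, ngrams = parse_arpa(SAMPLE_ARPA)
print(counts)
print(ngrams[2])
```

The declared counts in the header can be checked against the number of entries actually read per order, which is a cheap sanity test before handing a model file to a decoder.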

Licenses

This code is made available under the modified BSD License (BSD-3-Clause License).

Resources