
CoquiTTS: An Open-Source Text-To-Speech Library

A deep learning toolkit for Text-to-Speech, battle-tested in research and production

Hazem Abbas

Nov 20, 2022 — 1 min read


CoquiTTS is a library for advanced Text-to-Speech generation. It is built on the latest research and was designed to achieve the best trade-off among ease of training, speed, and quality.

It comes with pretrained models and tools for measuring dataset quality, and it is already used in 20+ languages for products and research projects.

CoquiTTS is written in Python, and it can be a handy tool for video game development, post-production, dubbing, and creating educational videos.
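
Because the library ships with pretrained models and a tts command-line tool, a first synthesis can be scripted from Python. The sketch below is only an illustration: it assumes CoquiTTS has been installed (for example with pip install TTS) so that the tts command is on the PATH, and the helper names build_tts_command and synthesize are hypothetical, not part of the library.

```python
import shutil
import subprocess


def build_tts_command(text, out_path, model_name=None):
    """Build the argument list for the `tts` command-line tool."""
    command = ["tts", "--text", text, "--out_path", out_path]
    if model_name is not None:
        # Optional: select a specific released model instead of the default.
        command += ["--model_name", model_name]
    return command


def synthesize(text, out_path, model_name=None):
    """Run the `tts` CLI if it is installed; return True on success."""
    if shutil.which("tts") is None:
        # Assumption: the CLI is missing because CoquiTTS is not installed.
        return False
    result = subprocess.run(
        build_tts_command(text, out_path, model_name), check=False
    )
    return result.returncode == 0
```

Calling synthesize("Hello from CoquiTTS.", "hello.wav") should write a WAV file when the CLI is available, and running tts --list_models in a terminal lists the released models that can be passed as model_name.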

The CoquiTTS developers are now working on Coqui Studio, which will offer a simple, user-friendly interface for cloning voices and creating text-to-speech audio in MP3 format.

Features

High-performance Deep Learning models for Text2Speech tasks.
Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech).
Speaker Encoder to compute speaker embeddings efficiently.
Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN).
Fast and efficient model training.
Detailed training logs on the terminal and Tensorboard.
Support for Multi-speaker TTS.
Efficient, flexible, lightweight but feature-complete Trainer API.
Released and ready-to-use models.
Tools to curate Text2Speech datasets under dataset_analysis.
Utilities to use and test your models.
Modular (but not too much) code base enabling easy implementation of new ideas.

Implemented Models

Spectrogram models

Tacotron: paper
Tacotron2: paper
Glow-TTS: paper
Speedy-Speech: paper
Align-TTS: paper
FastPitch: paper
FastSpeech: paper
SC-GlowTTS: paper
Capacitron: paper

End-to-End Models

VITS: paper
YourTTS: paper

Attention Methods

Guided Attention: paper
Forward Backward Decoding: paper
Graves Attention: paper
Double Decoder Consistency: blog
Dynamic Convolutional Attention: paper
Alignment Network: paper
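
To give a feel for what one of these methods does, here is a minimal sketch of the guided attention idea: attention that strays far from the text-to-audio diagonal is penalized. The weight formula and the default g = 0.2 follow the guided attention paper as I understand it; this is not CoquiTTS's own implementation, and the function names are hypothetical.

```python
import math


def guided_attention_weights(num_text_steps, num_audio_steps, g=0.2):
    """Penalty weights: near 0 on the diagonal, near 1 far away from it."""
    weights = []
    for n in range(num_text_steps):
        row = []
        for t in range(num_audio_steps):
            # Distance between the relative text and audio positions.
            offset = n / num_text_steps - t / num_audio_steps
            row.append(1.0 - math.exp(-(offset ** 2) / (2.0 * g ** 2)))
        weights.append(row)
    return weights


def guided_attention_loss(attention, g=0.2):
    """Mean of attention multiplied element-wise by the penalty weights."""
    num_text_steps = len(attention)
    num_audio_steps = len(attention[0])
    weights = guided_attention_weights(num_text_steps, num_audio_steps, g)
    total = sum(
        attention[n][t] * weights[n][t]
        for n in range(num_text_steps)
        for t in range(num_audio_steps)
    )
    return total / (num_text_steps * num_audio_steps)


# Toy attention maps: one perfectly diagonal, one reversed.
diagonal = [[1.0 if n == t else 0.0 for t in range(4)] for n in range(4)]
reversed_map = [[1.0 if n + t == 3 else 0.0 for t in range(4)] for n in range(4)]

print(guided_attention_loss(diagonal))      # no penalty on the diagonal
print(guided_attention_loss(reversed_map))  # clearly larger penalty
```

Adding such a term to the training loss nudges the model toward monotonic alignments, which is one reason attention-based models can converge faster with it.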

Speaker Encoder

GE2E: paper
Angular Loss: paper
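
A speaker encoder maps an utterance to a fixed-size embedding vector, and embeddings are commonly compared with cosine similarity. The sketch below illustrates that comparison with made-up 4-dimensional vectors; the values are purely illustrative and are not output from CoquiTTS, whose real embeddings are much larger.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: two utterances from speaker A, one from speaker B.
speaker_a_utt1 = [0.9, 0.1, 0.0, 0.4]
speaker_a_utt2 = [0.8, 0.2, 0.1, 0.5]
speaker_b_utt1 = [0.0, 0.9, 0.7, 0.1]

same_speaker = cosine_similarity(speaker_a_utt1, speaker_a_utt2)
different_speaker = cosine_similarity(speaker_a_utt1, speaker_b_utt1)

# Utterances from the same speaker should score closer to 1.
print(f"same speaker:      {same_speaker:.3f}")
print(f"different speaker: {different_speaker:.3f}")
```

In a multi-speaker setup, an embedding like this is what conditions the TTS model on the target voice.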

Vocoders

MelGAN: paper
MultiBandMelGAN: paper
ParallelWaveGAN: paper
GAN-TTS discriminators: paper
WaveRNN: origin
WaveGrad: paper
HiFiGAN: paper
UnivNet: paper

License

The project is released under the MPL-2.0 License.

Resources
