Vocode - Build Voice-enabled AI Apps with this Amazing Open-source Python Framework

Vocode is an open-source library that simplifies building voice-enabled applications powered by large language models (LLMs). It allows developers to create real-time, voice-based conversations with LLMs and deploy them to phone calls, Zoom meetings, and beyond.

With Vocode, you can build interactive apps like personal assistants or voice-controlled games such as chess.

It provides easy-to-use integrations and abstractions that streamline development and make LLM interactions accessible through voice interfaces.

Features

  • 🗣 Real-time Conversations: Initiate voice-based conversations directly with your system audio using LLMs.
  • ➡️ 📞 Phone Number Setup: Set up phone numbers that answer incoming calls with LLM-based agents.
  • 📞 ➡️ Outbound Phone Calls: Place outbound calls from your managed phone numbers, powered by LLM-based agents (see the sketch after this list).
  • 🧑‍💻 Zoom Call Integration: Dial into Zoom meetings with voice-enabled LLMs.
  • 🤖 Langchain Agent Integration: Trigger outbound calls to real phone numbers from within Langchain agents.
  • Ultra-Realistic Voices: Choose from a wide selection of highly realistic voices to enhance customer experiences.
  • Multilingual Capability: Configure bots to communicate in multiple languages, offering native language support for your customers.
  • Custom Language Models: Utilize your own language models to create bots with personalized, context-specific communication.
  • External Action Integration: Automate tasks such as scheduling, payments, and more by integrating with external systems.
  • Phone Menu Navigation: Bots can navigate phone menus to reach the appropriate department or individual.
  • Hold Waiting: Bots can wait on hold for you and transfer the call once a human agent is available.
  • Knowledge Base Connection: Enhance bots with knowledge from your own resources to improve their responses and intelligence.
  • Analytics and Monitoring: Track bot performance and customer interactions for valuable insights.
  • Available 24/7: Bots are always active, scaling up or down as needed, eliminating the need for hiring, training, or managing human agents.
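
For the outbound-calling feature, vocode ships a telephony layer on top of the same agent abstractions used in the Usage example below. The snippet that follows is only a rough sketch modeled on the library's telephony quickstart pattern: the module paths, the OutboundCall and RedisConfigManager names, their parameters, and the assumption that a vocode telephony server is publicly reachable at BASE_URL may all differ between releases, so treat it as illustrative and consult the official docs.

import asyncio
import os

from vocode.streaming.models.agent import ChatGPTAgentConfig
from vocode.streaming.models.message import BaseMessage

# Assumed module paths (based on the telephony quickstart pattern); they may
# differ between vocode releases.
from vocode.streaming.telephony.config_manager.redis_config_manager import (
    RedisConfigManager,
)
from vocode.streaming.telephony.conversation.outbound_call import OutboundCall

# A vocode telephony server must already be running and publicly reachable here.
BASE_URL = os.environ["BASE_URL"]


async def main():
    # Stores the call configuration so the telephony server can pick it up.
    config_manager = RedisConfigManager()

    outbound_call = OutboundCall(
        base_url=BASE_URL,
        to_phone="+15555555555",    # placeholder: the number to dial
        from_phone="+15555555555",  # placeholder: your managed number
        config_manager=config_manager,
        # The OpenAI key can be passed here as in the Usage example, or read
        # from the environment.
        agent_config=ChatGPTAgentConfig(
            initial_message=BaseMessage(text="Hello!"),
            prompt_preamble="Have a pleasant conversation about life",
        ),
    )

    input("Press enter to start the call...")
    await outbound_call.start()


if __name__ == "__main__":
    asyncio.run(main())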

Out-of-the-box Integrations

  • Transcription Services:
    • AssemblyAI
    • Deepgram
    • Gladia
    • Google Cloud
    • Microsoft Azure
    • RevAI
    • Whisper
    • Whisper.cpp
  • LLMs:
    • OpenAI
    • Anthropic
  • Synthesis Services (see the provider-swap sketch after this list):
    • Rime.ai
    • Microsoft Azure
    • Google Cloud
    • Play.ht
    • Eleven Labs
    • Cartesia
    • Coqui (OSS)
    • gTTS
    • StreamElements
    • Bark
    • AWS Polly
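
Each provider in these lists maps to a matching config class and implementation class under vocode.streaming, so swapping providers is mostly a configuration change. As a hedged sketch (the ElevenLabsSynthesizerConfig / ElevenLabsSynthesizer names and the api_key / voice_id fields are assumptions that may differ between versions), using Eleven Labs instead of Azure for synthesis could look roughly like this:

from vocode.helpers import create_streaming_microphone_input_and_speaker_output
from vocode.streaming.models.synthesizer import ElevenLabsSynthesizerConfig
from vocode.streaming.synthesizer.eleven_labs_synthesizer import ElevenLabsSynthesizer

# Same audio-device helper used in the Usage example below.
_microphone_input, speaker_output = create_streaming_microphone_input_and_speaker_output(
    use_default_devices=True,
)

# Hypothetical provider swap: Eleven Labs instead of Azure.
# Field names below are assumptions and may differ between vocode releases.
synthesizer = ElevenLabsSynthesizer(
    ElevenLabsSynthesizerConfig.from_output_device(
        speaker_output,
        api_key="ENTER_YOUR_ELEVEN_LABS_API_KEY",
        voice_id="ENTER_A_VOICE_ID",
    )
)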

Install

pip install vocode

Usage

import asyncio
import signal

from pydantic_settings import BaseSettings, SettingsConfigDict

from vocode.helpers import create_streaming_microphone_input_and_speaker_output
from vocode.logging import configure_pretty_logging
from vocode.streaming.agent.chat_gpt_agent import ChatGPTAgent
from vocode.streaming.models.agent import ChatGPTAgentConfig
from vocode.streaming.models.message import BaseMessage
from vocode.streaming.models.synthesizer import AzureSynthesizerConfig
from vocode.streaming.models.transcriber import (
    DeepgramTranscriberConfig,
    PunctuationEndpointingConfig,
)
from vocode.streaming.streaming_conversation import StreamingConversation
from vocode.streaming.synthesizer.azure_synthesizer import AzureSynthesizer
from vocode.streaming.transcriber.deepgram_transcriber import DeepgramTranscriber

configure_pretty_logging()


class Settings(BaseSettings):
    """
    Settings for the streaming conversation quickstart.
    These parameters can be configured with environment variables.
    """

    openai_api_key: str = "ENTER_YOUR_OPENAI_API_KEY_HERE"
    azure_speech_key: str = "ENTER_YOUR_AZURE_KEY_HERE"
    deepgram_api_key: str = "ENTER_YOUR_DEEPGRAM_API_KEY_HERE"

    azure_speech_region: str = "eastus"

    # This means a .env file can be used to override these settings
    # ex: "OPENAI_API_KEY=my_key" will set openai_api_key over the default above
    model_config = SettingsConfigDict(
        env_file=".env",
        env_file_encoding="utf-8",
        extra="ignore",
    )


settings = Settings()


async def main():
    (
        microphone_input,
        speaker_output,
    ) = create_streaming_microphone_input_and_speaker_output(
        use_default_devices=False,
    )

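    # Wire the Deepgram transcriber, ChatGPT agent, and Azure synthesizer into a single pipeline.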
    conversation = StreamingConversation(
        output_device=speaker_output,
        transcriber=DeepgramTranscriber(
            DeepgramTranscriberConfig.from_input_device(
                microphone_input,
                endpointing_config=PunctuationEndpointingConfig(),
                api_key=settings.deepgram_api_key,
            ),
        ),
        agent=ChatGPTAgent(
            ChatGPTAgentConfig(
                openai_api_key=settings.openai_api_key,
                initial_message=BaseMessage(text="What up"),
                prompt_preamble="""The AI is having a pleasant conversation about life""",
            )
        ),
        synthesizer=AzureSynthesizer(
            AzureSynthesizerConfig.from_output_device(speaker_output),
            azure_speech_key=settings.azure_speech_key,
            azure_speech_region=settings.azure_speech_region,
        ),
    )
    await conversation.start()
    print("Conversation started, press Ctrl+C to end")
    signal.signal(signal.SIGINT, lambda _0, _1: asyncio.create_task(conversation.terminate()))
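    # Stream microphone audio into the conversation until it is terminated.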
    while conversation.is_active():
        chunk = await microphone_input.get_audio()
        conversation.receive_audio(chunk)


if __name__ == "__main__":
    asyncio.run(main())
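
Because the Settings class loads a .env file, the API keys can be supplied without editing the script. A .env file in the working directory along these lines (all values are placeholders) overrides the defaults defined above:

OPENAI_API_KEY=your-openai-api-key
DEEPGRAM_API_KEY=your-deepgram-api-key
AZURE_SPEECH_KEY=your-azure-speech-key
AZURE_SPEECH_REGION=eastus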

License

MIT License

Resources & Downloads

  • GitHub - vocodedev/vocode-core: 🤖 Build voice-based LLM agents. Modular + open source.
  • Vocode - Open source voice AI agents (official website)






