StoryTeller is a free, open-source multimodal AI story teller built with Stable Diffusion, GPT, and neural text-to-speech (TTS).
Given a prompt as an opening line of a story, GPT writes the rest of the plot; Stable Diffusion draws an image for each sentence; and a TTS model narrates each line, resulting in a fully animated video of a short story, replete with audio and visuals.
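Conceptually, the pipeline is a loop over generated sentences. The sketch below is purely illustrative: the function names and stubbed model calls are placeholders standing in for GPT, Stable Diffusion, and the TTS model, not the project's actual implementation.

```python
from typing import List

def write_plot(opening_line: str) -> List[str]:
    # Placeholder for GPT: extend the opening line into a short plot.
    return [opening_line, "The unicorns discovered a hidden valley."]

def draw_image(sentence: str) -> bytes:
    # Placeholder for Stable Diffusion: one image per sentence.
    return f"<image: {sentence}>".encode()

def narrate(sentence: str) -> bytes:
    # Placeholder for the TTS model: one narration clip per sentence.
    return f"<audio: {sentence}>".encode()

def tell_story(opening_line: str) -> None:
    for index, sentence in enumerate(write_plot(opening_line)):
        image = draw_image(sentence)   # visual frame for this sentence
        audio = narrate(sentence)      # narration for this sentence
        # The real pipeline stitches frames, audio, and subtitles
        # into a single video file.
        print(index, sentence, len(image), len(audio))

tell_story("Once upon a time, unicorns roamed the Earth.")
```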
StoryTeller is available on PyPI, and the quickest way to run a demo is through the CLI: a single command renders the complete video (see the command sketch below). The final video is saved as /out/out.mp4, alongside the intermediate images, audio files, and subtitles. You can override the defaults by passing CLI flags as needed. For more advanced use cases, you can interface with StoryTeller directly from Python and configure the model with custom settings (see the Python sketch below).

To develop locally, install the dev dependencies and pre-commit hooks; this ensures that linting and code quality checks run before each commit (a sample setup follows the code examples below).
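A demo run from the command line might look like the following. The package name and flag name here are assumptions for illustration; check PyPI and the CLI's help output for the exact names.

```bash
# Install from PyPI (package name assumed for illustration)
pip install storyteller-core

# Run a demo with the default settings; output lands in /out/out.mp4
storyteller

# Override a default via a CLI flag (flag name assumed for illustration)
storyteller --writer_prompt "Once upon a time, unicorns roamed the Earth."
```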
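In Python, usage might look like the sketch below. The class names, config fields, and methods (`StoryTeller`, `StoryTellerConfig`, `from_default`, `generate`) are assumptions based on the description above, not a confirmed API; consult the package documentation for the real interface.

```python
from storyteller import StoryTeller, StoryTellerConfig  # names assumed

# Quick path: run the full pipeline with default settings.
story_teller = StoryTeller.from_default()
story_teller.generate("Once upon a time, unicorns roamed the Earth.")

# Advanced path: configure the underlying models explicitly.
config = StoryTellerConfig(
    writer="gpt2-large",                       # plot-writing model (assumed field)
    painter="stabilityai/stable-diffusion-2",  # image model (assumed field)
    max_new_tokens=100,                        # plot length (assumed field)
)
story_teller = StoryTeller(config)
story_teller.generate("Once upon a time, unicorns roamed the Earth.")
```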
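A typical local development setup, assuming the project follows the common editable-install-plus-pre-commit pattern (the `.[dev]` extra is an assumption; the repository may use a requirements file instead):

```bash
# From a local checkout of the repository:
# install the package in editable mode with dev dependencies
pip install -e ".[dev]"

# install the pre-commit hooks so linting and code quality checks
# run automatically before each commit
pre-commit install
```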
Features
- Available on PyPI
- Quick demo via the CLI
- Intermediate images, audio files, and subtitles saved alongside the final video
- Customizable parameters via CLI flags
- Advanced use cases supported through a Python interface
- Model configurable with custom settings
License
- MIT License