Kokoro-TTS

High-quality neural text-to-speech with local processing

Kokoro-TTS is a high-quality neural text-to-speech engine that runs locally on your machine. It produces more natural, expressive speech than Edge-TTS, which makes it well suited to professional video narration.

Available on: Standard and Pro plans

Features

  • High-quality neural voices — natural-sounding speech with emotion and intonation
  • Local processing — no internet required, runs entirely on your machine
  • GPU acceleration — significantly faster with an NVIDIA CUDA-compatible GPU
  • Multiple voice models — choose from different voice styles
  • Speech rate control — adjust speed from 50% to 200%
  • Silence removal — automatically trim silence from generated audio
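
As a rough illustration of what silence removal does, the sketch below trims leading and trailing samples whose amplitude stays under a threshold. The function name `trim_silence`, the threshold value, and the float sample format are assumptions made for this example, not Kokoro-TTS internals; the app may also treat pauses inside the audio differently.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples quieter than `threshold`.

    Hypothetical helper: samples are assumed to be floats in [-1.0, 1.0].
    Quiet samples between the first and last loud sample are kept.
    """
    # Indices of every sample loud enough to count as speech.
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole clip is silence
    return list(samples[loud[0]:loud[-1] + 1])


clip = [0.0, 0.001, 0.3, -0.4, 0.0, 0.2, 0.002, 0.0]
trimmed = trim_silence(clip)
print(trimmed)
```

Only the edges are trimmed here, so natural pauses inside a sentence survive.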

Requirements

  • Standard or Pro subscription
  • eSpeak-NG (installed automatically on first use; required to convert text to phonemes)
  • GPU (recommended) — NVIDIA GPU with CUDA support for fast processing
  • CPU fallback — works without GPU, but processing is slower
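
If you want to check in advance which path your machine is likely to take, a minimal sketch follows. It assumes a PyTorch-based environment, which this page does not confirm for the app itself; `pick_device` is a hypothetical helper name.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable CUDA GPU, else "cpu"."""
    # PyTorch may not be installed where this sketch runs; treat that as CPU-only.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Processing device: {pick_device()}")
```

A result of `cpu` on a machine with an NVIDIA card usually points to missing or outdated CUDA drivers.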

How to Use

Single Generation

  1. Open the Tool Suite panel → select Kokoro-TTS
  2. Type or paste your text in the input field
  3. Select a Voice Model from the dropdown
  4. Adjust Speech Rate if desired
  5. Toggle Remove Silence on/off as needed
  6. Click Generate to create the audio
  7. Preview with Play, then Save to export
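
The settings chosen in the steps above can be pictured as a small record that is validated before generation. This is only a sketch: the `TtsSettings` class, its field names, and the `example_voice` placeholder are hypothetical, and the only value taken from this page is the 50% to 200% speech rate range.

```python
from dataclasses import dataclass


@dataclass
class TtsSettings:
    """Hypothetical container mirroring the controls in the panel."""

    text: str
    voice_model: str
    rate_percent: int = 100      # Speech Rate, 50 to 200
    remove_silence: bool = True  # Remove Silence toggle

    def __post_init__(self) -> None:
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if not 50 <= self.rate_percent <= 200:
            raise ValueError("rate_percent must be between 50 and 200")

    @property
    def speed_multiplier(self) -> float:
        """Convert the percentage into a multiplier (100% -> 1.0)."""
        return self.rate_percent / 100


settings = TtsSettings(
    text="Welcome to the tutorial.",
    voice_model="example_voice",  # placeholder, not a real voice name
    rate_percent=125,
)
print(settings.speed_multiplier)
```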

Batch Generation

  1. Prepare a text file with one line per audio segment
  2. Click Load Text File in Kokoro-TTS
  3. Configure voice model and settings
  4. Select an Output Folder
  5. Click Batch Generate and follow the progress, which updates in real time
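
To make the "one line per audio segment" layout concrete, the sketch below splits a script into segments and pairs each with an output file. The `plan_batch` helper, the `segment_001` naming scheme, and the skipping of blank lines are assumptions for illustration; the app's actual file names may differ.

```python
from pathlib import Path
from typing import List, Tuple


def plan_batch(
    script_text: str, output_dir: Path, extension: str = "wav"
) -> List[Tuple[Path, str]]:
    """Pair each non-empty line of the script with a numbered output path."""
    # One audio segment per line; blank lines are ignored in this sketch.
    segments = [line.strip() for line in script_text.splitlines() if line.strip()]
    return [
        (output_dir / f"segment_{index:03d}.{extension}", text)
        for index, text in enumerate(segments, start=1)
    ]


script = (
    "Welcome to the course.\n"
    "\n"
    "In this lesson we cover the basics.\n"
    "Thanks for watching.\n"
)
jobs = plan_batch(script, Path("narration"))
for path, text in jobs:
    print(path.name, "<-", text)
```

Keeping each line to a single sentence or short paragraph makes it easier to line segments up with video cuts later.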

Output Formats

  • MP3 — compressed, good for general use
  • WAV — uncompressed, best quality for video production
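
To show what "uncompressed" means in practice, here is a minimal sketch that encodes one second of audio as 16-bit mono WAV using Python's standard `wave` module. The 24 kHz sample rate and the `encode_wav` helper are assumptions for the example, not settings documented on this page.

```python
import io
import math
import struct
import wave
from typing import Iterable

SAMPLE_RATE = 24_000  # assumed for illustration; check your generated files


def encode_wav(samples: Iterable[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode float samples in [-1.0, 1.0] as mono 16-bit PCM WAV bytes."""
    # Clamp each sample, scale to the signed 16-bit range, pack little-endian.
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return buffer.getvalue()


# One second of a 440 Hz tone at half volume.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
wav_bytes = encode_wav(tone)
print(len(wav_bytes), "bytes for", len(tone) / SAMPLE_RATE, "second(s)")
```

Every sample is stored in full, so file size grows in direct proportion to duration; MP3 trades some of that fidelity for a much smaller file.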

Performance

| Setup | Processing Speed |
|-------|------------------|
| NVIDIA GPU (CUDA) | Real-time or faster |
| CPU only | 2-5x slower than real-time |
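
As a worked example of the CPU figure, the sketch below estimates how long a narration might take to generate without a GPU. It assumes "2-5x slower than real-time" means processing takes two to five times the length of the audio; `estimate_cpu_seconds` is a hypothetical helper, and actual times depend on your hardware.

```python
from typing import Tuple


def estimate_cpu_seconds(audio_seconds: float) -> Tuple[float, float]:
    """Return (best, worst) processing time for the assumed 2x to 5x range."""
    return (audio_seconds * 2, audio_seconds * 5)


# A 90-second narration clip on a CPU-only machine.
best, worst = estimate_cpu_seconds(90)
print(f"Estimated processing time: {best:.0f} to {worst:.0f} seconds")
```

For a long script on a CPU-only machine, batch generation lets this run unattended.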

On first use, Kokoro-TTS downloads the voice model files and installs eSpeak-NG. This one-time setup needs an internet connection and takes a few minutes; after that, generation runs entirely offline.

Tips

  • For the best video narration quality, use Kokoro-TTS over Edge-TTS
  • Enable Remove Silence for tighter audio timing in videos
  • If you have an NVIDIA GPU, make sure CUDA drivers are up to date for best performance
  • Process audio in batch to save time — configure settings once, then process the entire script