Kokoro-TTS
High-quality neural text-to-speech with local processing
Kokoro-TTS is a high-quality neural text-to-speech engine that runs locally on your machine. It produces more natural and expressive speech than Edge-TTS, which makes it well suited to professional video narration.
Available on: Standard and Pro plans
Features
- High-quality neural voices — natural-sounding speech with emotion and intonation
- Local processing — no internet required, runs entirely on your machine
- GPU acceleration — significantly faster with an NVIDIA CUDA-compatible GPU
- Multiple voice models — choose from different voice styles
- Speech rate control — adjust speed from 50% to 200%
- Silence removal — automatically trim silence from generated audio
Requirements
- Standard or Pro subscription
- eSpeak-NG — automatically installed on first use (required for phoneme synthesis)
- GPU (recommended) — NVIDIA GPU with CUDA support for fast processing
- CPU fallback — works without GPU, but processing is slower
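To find out ahead of time whether the GPU path is likely to be available, a quick check outside the app can help. The following is a minimal sketch, assuming the NVIDIA driver's `nvidia-smi` utility is on the PATH whenever a working driver is installed. The function name `has_nvidia_gpu` is hypothetical, and this is not necessarily how Kokoro-TTS itself detects the GPU.

```python
import shutil
import subprocess


def has_nvidia_gpu() -> bool:
    """Return True if nvidia-smi exists and reports at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # driver utility not found on PATH
    try:
        result = subprocess.run(
            [exe, "-L"],  # -L lists the GPUs the driver can see
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    if has_nvidia_gpu():
        print("NVIDIA GPU detected")
    else:
        print("No NVIDIA GPU detected; expect the slower CPU fallback")
```

A `False` result only means the driver utility did not report a GPU; generation should still work on the CPU, just more slowly.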
How to Use
Single Generation
- Open the Tool Suite panel → select Kokoro-TTS
- Type or paste your text in the input field
- Select a Voice Model from the dropdown
- Adjust Speech Rate if desired
- Toggle Remove Silence on/off as needed
- Click Generate to create the audio
- Preview with Play, then Save to export
Batch Generation
- Prepare a text file with one line per audio segment
- Click Load Text File in Kokoro-TTS
- Configure voice model and settings
- Select an Output Folder
- Click Batch Generate — progress is tracked in real-time
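Batch generation expects one line per audio segment, so a script written in wrapped paragraphs needs flattening first. The sketch below is one way to prepare such a file, assuming each blank-line-separated paragraph should become one segment; that segmentation rule and the helper names `to_segments` and `write_batch_file` are illustrative assumptions, not part of Kokoro-TTS.

```python
import re
from pathlib import Path


def to_segments(script: str) -> list[str]:
    """Split a script on blank lines and collapse each paragraph to one line."""
    blocks = re.split(r"\n\s*\n", script.replace("\r\n", "\n"))
    segments = []
    for block in blocks:
        line = " ".join(block.split())  # joins wrapped lines, trims extra spaces
        if line:
            segments.append(line)
    return segments


def write_batch_file(script: str, out_path: str) -> int:
    """Write one segment per line as UTF-8 and return the segment count."""
    segments = to_segments(script)
    text = "".join(segment + "\n" for segment in segments)
    Path(out_path).write_text(text, encoding="utf-8")
    return len(segments)
```

For example, `write_batch_file(Path("script.txt").read_text(encoding="utf-8"), "batch.txt")` would produce a `batch.txt` that can be opened with Load Text File (both file names are placeholders).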
Output Formats
- MP3 — compressed, good for general use
- WAV — uncompressed, best quality for video production
Performance
| Setup | Processing Speed |
|-------|-----------------|
| NVIDIA GPU (CUDA) | Real-time or faster |
| CPU only | 2-5x slower than real-time |
On first use, Kokoro-TTS downloads the voice model files and installs eSpeak-NG. This one-time setup takes a few minutes and needs an internet connection; after that, generation runs offline.
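The CPU figure in the table above can be turned into a rough planning estimate. This is a back-of-the-envelope sketch that assumes "2-5x slower than real-time" means processing takes between 2 and 5 times the duration of the finished audio; the function name `cpu_time_range` is hypothetical, and actual times depend on your hardware.

```python
def cpu_time_range(
    audio_seconds: float, low: float = 2.0, high: float = 5.0
) -> tuple[float, float]:
    """Return (best, worst) processing time in seconds for CPU-only synthesis."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds * low, audio_seconds * high


best, worst = cpu_time_range(10 * 60)  # a 10-minute narration
print(f"CPU-only estimate: {best / 60:.0f} to {worst / 60:.0f} minutes")
# prints "CPU-only estimate: 20 to 50 minutes"
```

With a CUDA GPU, the same narration should take about 10 minutes or less, since processing is listed as real-time or faster.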
Tips
- For the best video narration quality, use Kokoro-TTS over Edge-TTS
- Enable Remove Silence for tighter audio timing in videos
- If you have an NVIDIA GPU, make sure CUDA drivers are up to date for best performance
- Process audio in batch to save time — configure settings once, then process the entire script