Core Pipeline Overview
Understanding the 3-step AI pipeline that powers WTN Suite
WTN Suite's core workflow is a 3-step AI pipeline that transforms webtoon and manga pages into narrated content. Each step builds on the previous one, taking you from raw images to a complete narration script.
The 3-Step Workflow
Step 1: Panel Detection
The first step uses a YOLO (You Only Look Once) AI model to automatically detect and crop individual panels from your source images.
How it works:
- Load your webtoon/manga pages into the project
- Select the YOLO model that matches your content style:
- Manhwa Style — optimized for Korean webtoons (vertical scroll format)
- Manhua Style — optimized for Chinese manhua and Japanese manga (page-based format)
- Click Run Step 1 — the AI detects panel boundaries automatically
- Review the detected panels and adjust if needed
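A minimal sketch of this step, assuming the detector is an Ultralytics-format YOLO checkpoint. The model file name, margin value, and image paths are illustrative, not WTN Suite's actual names:

```python
# Panel detection sketch: "manhwa_panels.pt" and the margin value are
# illustrative, not WTN Suite's actual model file or settings.
from PIL import Image
from ultralytics import YOLO

def detect_panels(page_path: str, model: YOLO, margin: int = 8) -> list[Image.Image]:
    """Detect panel boxes on one page and crop each with a small margin."""
    page = Image.open(page_path)
    result = model(page_path)[0]             # one page in, one Results object out
    panels = []
    for box in result.boxes.xyxy.tolist():   # [x1, y1, x2, y2] per detection
        x1, y1, x2, y2 = (int(v) for v in box)
        # Pad the crop slightly so artwork touching the panel edge is kept.
        x1, y1 = max(x1 - margin, 0), max(y1 - margin, 0)
        x2, y2 = min(x2 + margin, page.width), min(y2 + margin, page.height)
        panels.append(page.crop((x1, y1, x2, y2)))
    return panels

model = YOLO("manhwa_panels.pt")  # pick the checkpoint matching your content style
for i, panel in enumerate(detect_panels("page_001.png", model)):
    panel.save(f"panel_{i:03d}.png")
```

The small margin in the sketch mirrors the smart-cropping behavior listed below: boxes are padded before cropping so artwork at the panel edge is not clipped.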
Features:
- Automatic detection of panel boundaries with high accuracy
- Smart cropping that preserves panel content without cutting off edges
- Manual adjustment tools for fine-tuning detection results
- Supports both vertical scroll (webtoon) and page-based (manga) formats
- Batch processing of multiple pages at once
Step 2: Vision Analysis
Each detected panel is sent to an AI vision model for detailed analysis. WTN Suite supports two providers:
- Google Gemini — Google's multimodal AI model
- OpenAI GPT-4V — OpenAI's vision-capable model
What the AI analyzes:
- Scene description — what's happening in each panel (actions, setting, atmosphere)
- Character identification — who is present in the scene, using the Character Cast
- Emotion detection — character expressions, body language, and mood
- Dialog extraction — text from speech bubbles and narration boxes
- Visual context — background elements, props, time of day, weather
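As a rough illustration of what one analysis request looks like, here is a sketch using the OpenAI provider. The model name, prompt wording, and character-cast string are assumptions for illustration, not WTN Suite's internal prompts:

```python
# One panel's analysis request via the OpenAI Python SDK. The model name,
# prompt text, and cast string here are assumptions, not WTN Suite's own.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def analyze_panel(panel_path: str, character_cast: str) -> str:
    with open(panel_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    "Describe this comic panel: scene, characters present, "
                    "emotions, dialog from speech bubbles, and visual context.\n"
                    f"Known characters:\n{character_cast}"
                )},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(analyze_panel("panel_000.png", "Jin: protagonist, black hair; Mira: ally, red cloak"))
```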
Tips for better results:
- Build a Character Cast before running Step 2 for accurate character identification
- Use high-resolution source images for better vision analysis
- Review and edit analysis results before moving to Step 3
Step 3: Narrative Generation
Based on the vision analysis, AI generates a cohesive narrative for the entire chapter or episode.
Features:
- Story coherence — maintains narrative flow and continuity across all panels
- Character voices — assigns distinct speaking styles to each character
- Customizable tone — choose from multiple narrator prompt templates
- In-UI editing — modify any panel's narrative text directly in the app
- Per-panel regeneration — rerun AI on specific panels without reprocessing everything
Narrator Prompt Templates:
- Master Prompt v4.1 — Cinematic storytelling with strong continuity rules (recommended)
- Template 2 v4 — Scenario-based narration (Pure Narration, Character Interaction, Visuals Only, Mixed)
- Template 3 v3 — Simplified version of Template 2 for faster processing
- Khmer Language — Full Cambodian language template for Khmer content creators
- Custom Prompt — Write your own prompt (minimum 10 characters)
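Step 3 can be framed as a single text-generation call: all panel analyses are concatenated and sent together with a narrator prompt. In this sketch, NARRATOR_TEMPLATE and the model name are placeholders standing in for the bundled templates, not their actual text:

```python
# Step 3 framed as one text-generation call. NARRATOR_TEMPLATE is a
# placeholder standing in for a bundled template such as Master Prompt v4.1.
from openai import OpenAI

client = OpenAI()

NARRATOR_TEMPLATE = (
    "You are a cinematic narrator. Keep continuity between panels, give each "
    "character a distinct voice, and output one numbered narration per panel."
)

def generate_narrative(panel_analyses: list[str]) -> str:
    panels_text = "\n\n".join(
        f"Panel {i + 1}:\n{analysis}" for i, analysis in enumerate(panel_analyses)
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": NARRATOR_TEMPLATE},
            {"role": "user", "content": panels_text},
        ],
    )
    return response.choices[0].message.content
```

Because each panel is numbered in the prompt, one plausible way per-panel regeneration can work is to resend a single panel's analysis with the neighboring panels included as read-only context.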
After the Pipeline
Once all three steps are complete, you have several export options:
| Output | Description | Tool |
|--------|-------------|------|
| Text Export | Save narrative scripts as text files | Built-in export |
| Panel Export | Save cropped panels as individual images | Built-in export |
| Audio Generation | Convert narration to speech | Edge-TTS or Kokoro-TTS |
| Video Creation | Combine panels + audio into video | Video Compositor |
| CapCut Export | Export as CapCut-compatible project | CapCut Director |
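For example, the audio step maps directly onto the edge-tts package (one of the two TTS back ends above). This minimal sketch assumes an English Neural voice; it is not WTN Suite's exact invocation:

```python
# Audio sketch with the edge-tts package; the voice name is one example of
# the many Neural voices edge-tts exposes.
import asyncio

import edge_tts

async def narrate(text: str, out_path: str, voice: str = "en-US-GuyNeural") -> None:
    await edge_tts.Communicate(text, voice).save(out_path)

asyncio.run(narrate("Rain hammers the rooftop as Jin waits.", "panel_001.mp3"))
```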
Best Practices
- Start with quality source material — higher resolution images produce better results
- Build your Character Cast first — this significantly improves character recognition in Step 2
- Review each step's output before moving to the next — fixing issues early saves time
- Experiment with prompt templates — different templates suit different content styles
- Use batch processing — run the pipeline on multiple chapters for efficiency
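A loop sketch for the batch-processing practice above, reusing the hypothetical detect_panels, analyze_panel, and generate_narrative helpers from the earlier sketches; the chapters/<name>/*.png layout on disk is an assumption, not a requirement:

```python
# Batch loop reusing detect_panels, analyze_panel, and generate_narrative
# from the sketches above. The chapters/<name>/*.png layout is an assumption.
from pathlib import Path

from ultralytics import YOLO

model = YOLO("manhwa_panels.pt")  # illustrative checkpoint name
CAST = "Jin: protagonist, black hair; Mira: ally, red cloak"  # example cast

for chapter in sorted(Path("chapters").iterdir()):
    analyses = []
    for page in sorted(chapter.glob("*.png")):
        for i, panel in enumerate(detect_panels(str(page), model)):   # Step 1
            panel_path = chapter / f"{page.stem}_panel_{i:03d}.png"
            panel.save(panel_path)
            analyses.append(analyze_panel(str(panel_path), CAST))     # Step 2
    script = generate_narrative(analyses)                             # Step 3
    (chapter / "narration.txt").write_text(script, encoding="utf-8")
```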