Core Pipeline Overview
Understanding the 3-step AI pipeline that powers WTN Suite
WTN Suite's core workflow is a 3-step AI pipeline that transforms webtoon and manga pages into narrated content. Each step builds on the previous one, taking you from raw images to a complete narration script.
The 3-Step Workflow
Step 1: Panel Detection
The first step uses a YOLO (You Only Look Once) AI model to automatically detect and crop individual panels from your source images.
How it works:
- Load your webtoon/manga pages into the project
- Select the YOLO model that matches your content style:
- Manhwa Style — optimized for Korean webtoons (vertical scroll format)
- Manhua Style — optimized for Chinese manhua and Japanese manga (page-based format)
- Click Run Step 1 — the AI detects panel boundaries automatically
- Review the detected panels and adjust if needed
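A minimal sketch of this step, assuming the detector is an Ultralytics-format YOLO checkpoint. The model file name, margin value, and image paths are illustrative, not WTN Suite's actual names:

```python
# Panel detection sketch: "manhwa_panels.pt" and the margin value are
# illustrative, not WTN Suite's actual model file or settings.
from PIL import Image
from ultralytics import YOLO

def detect_panels(page_path: str, model: YOLO, margin: int = 8) -> list[Image.Image]:
    """Detect panel boxes on one page and crop each with a small margin."""
    page = Image.open(page_path)
    result = model(page_path)[0]             # one page in, one Results object out
    panels = []
    for box in result.boxes.xyxy.tolist():   # [x1, y1, x2, y2] per detection
        x1, y1, x2, y2 = (int(v) for v in box)
        # Pad the crop slightly so artwork touching the panel edge is kept.
        x1, y1 = max(x1 - margin, 0), max(y1 - margin, 0)
        x2, y2 = min(x2 + margin, page.width), min(y2 + margin, page.height)
        panels.append(page.crop((x1, y1, x2, y2)))
    return panels

model = YOLO("manhwa_panels.pt")  # pick the checkpoint matching your content style
for i, panel in enumerate(detect_panels("page_001.png", model)):
    panel.save(f"panel_{i:03d}.png")
```

The small margin in the sketch mirrors the smart-cropping behavior listed below: boxes are padded before cropping so artwork at the panel edge is not clipped.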
Features:
- Automatic detection of panel boundaries with high accuracy
- Smart cropping that preserves panel content without cutting off edges
- Manual adjustment tools for fine-tuning detection results
- Supports both vertical scroll (webtoon) and page-based (manga) formats
- Batch processing of multiple pages at once
Step 2: Vision Analysis
Each detected panel is sent to an AI vision model for detailed analysis. WTN Suite supports two providers:
- Google Gemini — Google's multimodal AI model
- OpenAI GPT-4V — OpenAI's vision-capable model
What the AI analyzes:
- Scene description — what's happening in each panel (actions, setting, atmosphere)
- Character identification — who is present in the scene, using the Character Cast
- Emotion detection — character expressions, body language, and mood
- Dialog extraction — text from speech bubbles and narration boxes
- Visual context — background elements, props, time of day, weather
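As a rough illustration of what one analysis request looks like, here is a sketch using the OpenAI provider. The model name, prompt wording, and character-cast string are assumptions for illustration, not WTN Suite's internal prompts:

```python
# One panel's analysis request via the OpenAI Python SDK. The model name,
# prompt text, and cast string here are assumptions, not WTN Suite's own.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def analyze_panel(panel_path: str, character_cast: str) -> str:
    with open(panel_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": (
                    "Describe this comic panel: scene, characters present, "
                    "emotions, dialog from speech bubbles, and visual context.\n"
                    f"Known characters:\n{character_cast}"
                )},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(analyze_panel("panel_000.png", "Jin: protagonist, black hair; Mira: ally, red cloak"))
```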
Tips for better results:
- Build a Character Cast before running Step 2 for accurate character identification
- Use high-resolution source images for better vision analysis
- Review and edit analysis results before moving to Step 3
Step 3: Narrative Generation
Based on the vision analysis, AI generates a cohesive narrative for the entire chapter or episode.
Features:
- Story coherence — maintains narrative flow and continuity across all panels
- Character voices — assigns distinct speaking styles to each character
- Customizable tone — choose from multiple narrator prompt templates
- In-UI editing — modify any panel's narrative text directly in the app
- Per-panel regeneration — rerun AI on specific panels without reprocessing everything
Narrator Prompt Templates:
- Master Prompt v4.1 — Cinematic storytelling with strong continuity rules (recommended)
- Template 2 v4 — Scenario-based narration (Pure Narration, Character Interaction, Visuals Only, Mixed)
- Template 3 v3 — Simplified version of Template 2 for faster processing
- Khmer Language — Full Cambodian language template for Khmer content creators
- Custom Prompt — Write your own prompt (minimum 10 characters)
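Step 3 can be framed as a single text-generation call: all panel analyses are concatenated and sent together with a narrator prompt. In this sketch, NARRATOR_TEMPLATE and the model name are placeholders standing in for the bundled templates, not their actual text:

```python
# Step 3 framed as one text-generation call. NARRATOR_TEMPLATE is a
# placeholder standing in for a bundled template such as Master Prompt v4.1.
from openai import OpenAI

client = OpenAI()

NARRATOR_TEMPLATE = (
    "You are a cinematic narrator. Keep continuity between panels, give each "
    "character a distinct voice, and output one numbered narration per panel."
)

def generate_narrative(panel_analyses: list[str]) -> str:
    panels_text = "\n\n".join(
        f"Panel {i + 1}:\n{analysis}" for i, analysis in enumerate(panel_analyses)
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": NARRATOR_TEMPLATE},
            {"role": "user", "content": panels_text},
        ],
    )
    return response.choices[0].message.content
```

Because each panel is numbered in the prompt, one plausible way per-panel regeneration can work is to resend a single panel's analysis with the neighboring panels included as read-only context.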
After the Pipeline
Once all three steps are complete, you have several export options:
| Output | Description | Tool |
|--------|-------------|------|
| Text Export | Save narrative scripts as text files | Built-in export |
| Panel Export | Save cropped panels as individual images | Built-in export |
| Audio Generation | Convert narration to speech | Edge-TTS or Kokoro-TTS |
| Video Creation | Combine panels + audio into video | Video Compositor |
| CapCut Export | Export as CapCut-compatible project | CapCut Director |
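For example, the audio step maps directly onto the edge-tts package (one of the two TTS back ends above). This minimal sketch assumes an English Neural voice; it is not WTN Suite's exact invocation:

```python
# Audio sketch with the edge-tts package; the voice name is one example of
# the many Neural voices edge-tts exposes.
import asyncio

import edge_tts

async def narrate(text: str, out_path: str, voice: str = "en-US-GuyNeural") -> None:
    await edge_tts.Communicate(text, voice).save(out_path)

asyncio.run(narrate("Rain hammers the rooftop as Jin waits.", "panel_001.mp3"))
```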
Best Practices
- Start with quality source material — higher resolution images produce better results
- Build your Character Cast first — this significantly improves character recognition in Step 2
- Review each step's output before moving to the next — fixing issues early saves time
- Experiment with prompt templates — different templates suit different content styles
- Use batch processing — run the pipeline on multiple chapters for efficiency
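A loop sketch for the batch-processing practice above, reusing the hypothetical detect_panels, analyze_panel, and generate_narrative helpers from the earlier sketches; the chapters/<name>/*.png layout on disk is an assumption, not a requirement:

```python
# Batch loop reusing detect_panels, analyze_panel, and generate_narrative
# from the sketches above. The chapters/<name>/*.png layout is an assumption.
from pathlib import Path

from ultralytics import YOLO

model = YOLO("manhwa_panels.pt")  # illustrative checkpoint name
CAST = "Jin: protagonist, black hair; Mira: ally, red cloak"  # example cast

for chapter in sorted(Path("chapters").iterdir()):
    analyses = []
    for page in sorted(chapter.glob("*.png")):
        for i, panel in enumerate(detect_panels(str(page), model)):   # Step 1
            panel_path = chapter / f"{page.stem}_panel_{i:03d}.png"
            panel.save(panel_path)
            analyses.append(analyze_panel(str(panel_path), CAST))     # Step 2
    script = generate_narrative(analyses)                             # Step 3
    (chapter / "narration.txt").write_text(script, encoding="utf-8")
```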