Core Pipeline Overview

Understanding the 3-step AI pipeline that powers WTN Suite

WTN Suite's core workflow is a 3-step AI pipeline that transforms webtoon and manga pages into narrated content. Each step builds on the previous one, taking you from raw images to a complete narration script.

The 3-Step Workflow

Step 1: Panel Detection

The first step uses a YOLO (You Only Look Once) AI model to automatically detect and crop individual panels from your source images.

How it works:

  1. Load your webtoon/manga pages into the project
  2. Select the YOLO model that matches your content style:
    • Manhwa Style — optimized for Korean webtoons (vertical scroll format)
    • Manhua Style — optimized for Chinese/Japanese manga (page-based format)
  3. Click Run Step 1 — the AI detects panel boundaries automatically
  4. Review the detected panels and adjust if needed

Features:

  • Automatic detection of panel boundaries with high accuracy
  • Smart cropping that preserves panel content without cutting off edges
  • Manual adjustment tools for fine-tuning detection results
  • Supports both vertical scroll (webtoon) and page-based (manga) formats
  • Batch processing of multiple pages at once
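Under the hood, detection reduces to ordering the boxes YOLO returns and cropping with a safety margin. A minimal sketch of those two ideas (the helper names are illustrative, not WTN Suite's actual API):

```python
def order_panels(boxes, mode="vertical"):
    """Sort detected panel boxes (x1, y1, x2, y2) into reading order.

    'vertical' suits scroll-format webtoons (top to bottom);
    'page' is a rough row-then-right-to-left order for page-based manga.
    """
    if mode == "vertical":
        return sorted(boxes, key=lambda b: b[1])       # by top edge
    return sorted(boxes, key=lambda b: (b[1], -b[0]))  # row, then right-to-left


def pad_box(box, pad, page_width, page_height):
    """Expand a crop box by `pad` pixels, clamped to the page bounds,
    so panel content near the edges isn't cut off."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - pad), max(0, y1 - pad),
            min(page_width, x2 + pad), min(page_height, y2 + pad))
```

For example, two stacked webtoon panels detected out of order come back top-first, and a box flush with the page border is padded without leaving the image.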

Step 2: Vision Analysis

Each detected panel is sent to an AI vision model for detailed analysis. WTN Suite supports two providers:

  • Google Gemini — Google's multimodal AI model
  • OpenAI GPT-4V — OpenAI's vision-capable model

What the AI analyzes:

  • Scene description — what's happening in each panel (actions, setting, atmosphere)
  • Character identification — who is present in the scene, using the Character Cast
  • Emotion detection — character expressions, body language, and mood
  • Dialog extraction — text from speech bubbles and narration boxes
  • Visual context — background elements, props, time of day, weather
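For the OpenAI provider, each panel is sent as a base64 data URL inside a chat request. A sketch of how such a payload could be assembled (the prompt wording and model name are assumptions, not WTN Suite's exact request):

```python
import base64


def build_vision_request(image_bytes, character_cast, model="gpt-4o"):
    """Build an OpenAI-style chat payload asking for panel analysis.

    The Character Cast names are included in the prompt so the model
    can identify recurring characters consistently.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    prompt = (
        "Describe this comic panel: scene, characters present, emotions, "
        "dialog text from speech bubbles, and visual context. "
        "Known characters: " + ", ".join(character_cast)
    )
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

The same panel bytes and cast list would be reshaped differently for the Gemini API, but the structure (image + instruction + known characters) is the same idea.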

Tips for better results:

  • Build a Character Cast before running Step 2 for accurate character identification
  • Use high-resolution source images for better vision analysis
  • Review and edit analysis results before moving to Step 3

Step 3: Narrative Generation

Based on the vision analysis, the AI generates a cohesive narrative for the entire chapter or episode.

Features:

  • Story coherence — maintains narrative flow and continuity across all panels
  • Character voices — assigns distinct speaking styles to each character
  • Customizable tone — choose from multiple narrator prompt templates
  • In-UI editing — modify any panel's narrative text directly in the app
  • Per-panel regeneration — rerun AI on specific panels without reprocessing everything
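Per-panel regeneration only works well if the rerun panel still "knows" its neighbors. One way to sketch that continuity context (a hypothetical helper, not WTN Suite's internal function):

```python
def regeneration_context(panel_analyses, index, window=2):
    """Gather neighboring panel analyses as continuity context when
    regenerating a single panel's narration, so the rewritten text
    still flows with the surrounding story."""
    start = max(0, index - window)
    return {
        "target": panel_analyses[index],
        "context_before": panel_analyses[start:index],
        "context_after": panel_analyses[index + 1:index + 1 + window],
    }
```

The target panel's prompt would then be built from its own analysis plus the before/after summaries, rather than from the whole chapter.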

Narrator Prompt Templates:

  1. Master Prompt v4.1 — Cinematic storytelling with strong continuity rules (recommended)
  2. Template 2 v4 — Scenario-based narration (Pure Narration, Character Interaction, Visuals Only, Mixed)
  3. Template 3 v3 — Simplified version of Template 2 for faster processing
  4. Khmer Language — Full Cambodian language template for Khmer content creators
  5. Custom Prompt — Write your own prompt (minimum 10 characters)
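Template selection plus the 10-character minimum on custom prompts can be sketched like this (the template keys and placeholder text are illustrative only):

```python
# Placeholder template registry -- real templates are full narrator prompts.
TEMPLATES = {
    "master_v4_1": "You are a cinematic narrator. Maintain continuity...",
    "khmer": "Khmer-language narrator template...",
}


def resolve_prompt(choice, custom_text=None):
    """Return the narrator prompt for Step 3, enforcing the
    10-character minimum on custom prompts."""
    if choice == "custom":
        if custom_text is None or len(custom_text.strip()) < 10:
            raise ValueError("Custom prompt must be at least 10 characters")
        return custom_text
    return TEMPLATES[choice]
```

A custom prompt shorter than 10 characters is rejected up front, before any API call is made.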

After the Pipeline

Once all three steps are complete, you have several export options:

| Output | Description | Tool |
|--------|-------------|------|
| Text Export | Save narrative scripts as text files | Built-in export |
| Panel Export | Save cropped panels as individual images | Built-in export |
| Audio Generation | Convert narration to speech | Edge-TTS or Kokoro-TTS |
| Video Creation | Combine panels + audio into video | Video Compositor |
| CapCut Export | Export as CapCut-compatible project | CapCut Director |
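When combining panels and audio into a video, each panel's on-screen duration is driven by its narration clip. A minimal timing sketch under that assumption (not the Video Compositor's actual algorithm):

```python
def panel_timeline(audio_durations, gap=0.25):
    """Compute (start, end) times in seconds for each panel clip from
    its narration audio duration, with a short gap between panels."""
    timeline, t = [], 0.0
    for duration in audio_durations:
        timeline.append((t, t + duration))
        t += duration + gap
    return timeline
```

For two narration clips of 2.0 s and 3.0 s with a 0.5 s gap, the panels would run 0.0-2.0 s and 2.5-5.5 s.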

Best Practices

  1. Start with quality source material — higher resolution images produce better results
  2. Build your Character Cast first — this significantly improves character recognition in Step 2
  3. Review each step's output before moving to the next — fixing issues early saves time
  4. Experiment with prompt templates — different templates suit different content styles
  5. Use batch processing — run the pipeline on multiple chapters for efficiency