ByteDance Users Manual
A practical guide to ByteDance Seed Team's two flagship media models — Seedream 4.5 for image generation/editing (up to 4K) and Seedance 2.0 for unified audio-video generation (up to 15-second multi-shot clips with dual-channel audio). Both are accessible via Higgsfield, fal.ai, Runware, attap.ai, and other inference platforms.
ByteDance (the company behind TikTok) runs a serious AI research org called Seed. Their two big creative models in 2026: Seedream 4.5 for images (up to 4K, multi-image editing, strong typography) and Seedance 2.0 for video — and Seedance is unusual: it generates video AND audio at the same time, with up to 9 reference images, 3 reference clips, and 3 reference audio inputs as guidance.
Getting started in 60 seconds
- Pick your platform. ByteDance Seed models aren't sold via a single first-party developer console — they're distributed via partner inference platforms: Higgsfield, fal.ai, Runware, attap.ai, and others.
- Pick the model: `seedream-4.5` for images, `seedance-2.0` for video+audio. Earlier versions (Seedream 4.0, Seedance 1.0) remain available for legacy paths.
- Bring references. Both models reward strong reference inputs: Seedream takes multiple input images for consistency; Seedance accepts up to 9 image / 3 video / 3 audio references.
- Plan around output specs. Seedream caps at 4MP (~4K). Seedance 2.0 outputs 4–15s clips at 480p / 720p with dual-channel audio.
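The steps above can be sketched as a single request payload. Everything in this sketch (the model ID, field names, and the client-side size check) is an illustrative assumption, not a documented schema; check your chosen platform's model page for the real one.

```python
# Sketch of a Seedream 4.5 image request assembled from the
# getting-started steps. Field names and the model ID are assumptions.

MAX_SIDE = 4096  # the "up to 4K" output cap described above

def build_image_request(prompt: str, reference_images: list[str],
                        width: int = 2048, height: int = 2048) -> dict:
    """Assemble a hypothetical Seedream 4.5 generation payload."""
    if max(width, height) > MAX_SIDE:
        raise ValueError(f"Seedream output caps at {MAX_SIDE}px on a side")
    return {
        "model": "seedream-4.5",            # assumed model identifier
        "prompt": prompt,
        "reference_images": reference_images,  # consistency anchors
        "width": width,
        "height": height,
    }
```

Submit the payload with whatever client your platform provides; the point is that reference images travel with every request, not just the first one.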
Surfaces
ByteDance Seed publishes documentation at seed.bytedance.com. Practical access:
- Higgsfield — direct UI access for both models, integrated with their creative pipeline.
- fal.ai — Seedance 2.0 API live since April 2026.
- Runware — multi-vendor API; ByteDance models exposed alongside competitors.
- attap.ai — credit-priced access (300 credits per Seedance 2.0 generation).
Seedream 4.5 — image generation & editing
| Area | What Seedream 4.5 does |
|---|---|
| Architecture | Diffusion Transformer + VAE. Native high-resolution generation. |
| Resolution | Up to 4K (4MP) native output. Inference reportedly ~1.8s for a 2K image (a figure reported for Seedream 4.0 under its stated test conditions). |
| Multi-image editing | Accurately identifies main subjects across multiple input images; preserves reference details. Strong at compositing scenes from multiple sources. |
| Typography | Enhanced dense-text and typography rendering — explicitly improved over Seedream 4.0. |
| Best for | High-fidelity creative imagery, brand-consistent multi-image series, layout-heavy outputs (posters, ads, slides). |
Seedance 2.0 — unified audio-video
| Area | What Seedance 2.0 does |
|---|---|
| Released | 2026-02-12. |
| Modalities | Unified multimodal joint generation — text + image + audio + video inputs, multi-shot audio-video output in a single pass. |
| Inputs | Mixed references: up to 9 images, 3 video clips, 3 audio clips in one prompt. |
| Output | 4–15 second clips, multi-shot, with dual-channel audio. Native resolution 480p / 720p. |
| Editing | Targeted modifications to specified clips, characters, actions, storylines. Video extension generates continuous shots. |
| Best for | Short-form story content, marketing/ad creative with sync audio, character-driven scenes that need internal consistency. |
Release timeline
| Date | Release | What changed |
|---|---|---|
| 2024 | Seedream 1-3 | Iteration of the image-gen line. |
| 2025 | Seedance 1.0 | First public Seedance video model. |
| 2025 | Seedream 4.0 | Diffusion Transformer + VAE; 4K native; fast inference. |
| 2026-02-12 | Seedance 2.0 | Unified audio-video, 9-image / 3-video / 3-audio refs, 4-15s output, dual audio. |
| 2026 | Seedream 4.5 | "All-round improvement" — typography, multi-image consistency, fidelity. |
| 2026-04 | Seedance 2.0 on fal.ai | Production API live. |
Access & pricing
- fal.ai — pay-per-generation pricing; rates published on the model page.
- Higgsfield — subscription + per-credit; bundles other video models alongside.
- Runware — pay-as-you-go developer API.
- attap.ai — credit pricing; Seedream 4.5 (image) at 4 credits, Seedance 2.0 at 300 credits per generation as of writing.
Image creation workflow (Seedream 4.5)
- Brief. Define style, subject, composition, mood, output size (up to 4K).
- Reference upload. Provide multi-image references for subject consistency or style transfer.
- Generate. Send prompt + references to Seedream 4.5; iterate at small sizes to lock the look.
- Upscale to final. Once happy at iteration size, regenerate at 4K.
- Edit. For tweaks — color, layout, text — describe the change in natural language and re-run with the prior output as a reference.
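The iterate-small-then-upscale loop in steps 3-4 can be sketched as a simple pass plan. The draft size and dict fields here are assumptions; plug each pass into whatever generation client your platform provides.

```python
# Sketch of the "lock the look small, then commit at 4K" workflow.
# Sizes and field names are illustrative assumptions.

DRAFT_SIZE = 1024   # fast, cheap iteration size
FINAL_SIZE = 4096   # the single full-resolution commit pass

def plan_passes(n_drafts: int) -> list[dict]:
    """Return n cheap draft passes followed by one 4K final pass."""
    passes = [
        {"pass": f"draft-{i + 1}", "width": DRAFT_SIZE, "height": DRAFT_SIZE}
        for i in range(n_drafts)
    ]
    passes.append({"pass": "final", "width": FINAL_SIZE, "height": FINAL_SIZE})
    return passes
```

Carry the winning draft's prompt and references unchanged into the final pass so only resolution changes.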
Video + audio creation (Seedance 2.0)
- Storyboard. Plan a 4-15s clip; identify the shots you want, the mood, the audio character.
- Gather references. Up to 9 images (style, subject, set), 3 video clips (motion reference), 3 audio clips (tone, ambient).
- Single prompt. Compose one prompt describing the visual + audio + flow. Submit with references.
- Review. Check shot continuity, audio-visual sync. Most issues become visible immediately.
- Iterate or extend. Use video-extension feature to generate continuous shots beyond the initial clip.
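The single-prompt submission in step 3 can be sketched as a payload builder that enforces the documented limits (9 image / 3 video / 3 audio references, 4-15s output at 480p/720p). Field names and the model ID are assumptions, not a published schema.

```python
# Sketch of a Seedance 2.0 request enforcing the documented caps.
# Field names and the model ID are illustrative assumptions.

REF_CAPS = {"reference_images": 9, "reference_videos": 3, "reference_audio": 3}

def build_video_request(prompt: str, *, reference_images=(),
                        reference_videos=(), reference_audio=(),
                        duration_s: int = 8, resolution: str = "720p") -> dict:
    """Assemble a hypothetical Seedance 2.0 generation payload."""
    refs = {
        "reference_images": list(reference_images),
        "reference_videos": list(reference_videos),
        "reference_audio": list(reference_audio),
    }
    for field, cap in REF_CAPS.items():
        if len(refs[field]) > cap:
            raise ValueError(f"{field}: {len(refs[field])} exceeds cap of {cap}")
    if not 4 <= duration_s <= 15:
        raise ValueError("Seedance 2.0 outputs 4-15 second clips")
    if resolution not in ("480p", "720p"):
        raise ValueError("native output resolutions are 480p and 720p")
    return {"model": "seedance-2.0", "prompt": prompt,
            "duration_s": duration_s, "resolution": resolution, **refs}
```

Note the prompt should describe audio as well as visuals; the payload has no separate audio-prompt field in this sketch.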
Editing & extension
Seedance 2.0 supports targeted modification of specific elements (clips, characters, actions, storylines) without regenerating from scratch. Practical pattern:
- Mark the target. Reference the prior clip + describe specifically what to change ("change the character's coat from red to navy in shots 2-3").
- Preserve everything else. Explicit instructions to preserve other elements reduce regeneration drift.
- Extend to longer sequences. Chain extensions to build past the 15-second cap; quality degrades gradually so plan around 30-60s as practical max.
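The chaining pattern above can be sketched as an extension plan that respects the 15-second initial cap and the ~60-second practical ceiling. The step size and "preserve" note are assumptions about how you would phrase each extension request.

```python
# Sketch of chaining video-extension calls past the 15-second cap,
# per the editing pattern above. All names here are assumptions.

MAX_CLIP_S = 15       # longest single generation
PRACTICAL_MAX_S = 60  # quality degrades gradually; plan around 30-60s

def plan_extensions(target_s: int, step_s: int = 10) -> list[dict]:
    """Plan the generate + extend calls needed to reach target_s."""
    if target_s > PRACTICAL_MAX_S:
        raise ValueError("past ~60s, gradual quality loss makes chaining impractical")
    elapsed = min(target_s, MAX_CLIP_S)
    steps = [{"op": "generate", "length_s": elapsed}]
    while elapsed < target_s:
        chunk = min(step_s, target_s - elapsed)
        steps.append({"op": "extend", "length_s": chunk,
                      "note": "preserve characters, set, and audio bed"})
        elapsed += chunk
    return steps
```

Each extension prompt should restate what must stay fixed; explicit preservation instructions reduce drift across the chain.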
Prompt library
Brand-consistent product imagery (Seedream)
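An illustrative payload for this pattern. The prompt wording, field names, model ID, and file paths are my own assumptions for demonstration, not official examples.

```python
# Hypothetical Seedream 4.5 payload for a brand-consistent product
# series: references carry the brand look; the prompt carries layout.

product_series = {
    "model": "seedream-4.5",  # assumed model identifier
    "prompt": ("Studio product shot of the sneaker from the reference "
               "images, white seamless background, soft key light from "
               "the left, brand logo crisp and legible, 3:4 crop for a "
               "catalog page"),
    "reference_images": ["brand/sneaker_front.png", "brand/sneaker_side.png"],
    "width": 1536,
    "height": 2048,
}
```

Reuse the same `reference_images` across the whole series; vary only the layout portion of the prompt.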
Multi-shot character video with audio (Seedance)
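An illustrative payload for this pattern, with the audio described inside the prompt. All wording, field names, and paths are assumptions.

```python
# Hypothetical Seedance 2.0 payload for a multi-shot character clip.
# Note the explicit per-shot structure and the described soundscape.

character_clip = {
    "model": "seedance-2.0",  # assumed model identifier
    "prompt": ("Shot 1: the courier from the reference images rides "
               "through rain-slick streets at night. Shot 2: close-up "
               "as she checks a glowing map. Audio: steady rain, "
               "distant traffic, low synth pulse, no dialogue."),
    "reference_images": ["char/courier_face.png", "char/courier_outfit.png",
                         "env/street.png"],
    "reference_audio": ["refs/rain_ambience.wav"],
    "duration_s": 12,
    "resolution": "720p",
}
```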
Targeted edit (preserve everything else)
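An illustrative edit payload: name one change precisely, then explicitly list what must not change, per the editing pattern above. Field names and paths are assumptions.

```python
# Hypothetical Seedance 2.0 targeted-edit payload: one scoped change,
# everything else explicitly preserved to reduce regeneration drift.

targeted_edit = {
    "model": "seedance-2.0",  # assumed model identifier
    "source_clip": "outputs/courier_v1.mp4",
    "prompt": ("In shots 2-3, change the courier's coat from red to "
               "navy. Keep all camera moves, lighting, character "
               "faces, pacing, and the existing audio track unchanged."),
}
```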
Patterns
"References do the heavy lifting"
Seedream and Seedance both perform substantially better with strong reference inputs than from text-only prompts. Build a small library of reference assets per project and reuse them.
"Lock the look at small size, then upscale"
Iterating at 1K is fast and cheap; regenerating at 4K is slow and expensive. Burn iteration cycles small, commit at full resolution.
"Audio is part of the prompt"
Seedance 2.0 generates audio alongside video. If you don't describe it, you'll get default audio that may not match. Always describe the soundscape — even just "ambient room tone, no music" prevents surprises.