ByteDance Users Manual
A practical guide to ByteDance Seed Team's two flagship media models — Seedream 4.5 for image generation/editing (up to 4K) and Seedance 2.0 for unified audio-video generation (up to 15-second multi-shot clips with dual-channel audio). Both are accessible via Higgsfield, fal.ai, Runware, attap.ai, and other inference platforms.
ByteDance (the company behind TikTok) runs a serious AI research org called Seed. Their two big creative models in 2026: Seedream 4.5 for images (up to 4K, multi-image editing, strong typography) and Seedance 2.0 for video — and Seedance is unusual: it generates video AND audio at the same time, with up to 9 reference images, 3 reference clips, and 3 reference audio inputs as guidance.
Getting started in 60 seconds
- Pick your platform. ByteDance Seed models aren't sold via a single first-party developer console — they're distributed via partner inference platforms: Higgsfield, fal.ai, Runware, attap.ai, and others.
- Pick the model: `seedream-4.5` for images, `seedance-2.0` for video+audio. Earlier versions (Seedream 4.0, Seedance 1.0) remain available for legacy paths.
- Bring references. Both models reward strong reference inputs: Seedream takes multiple input images for consistency; Seedance accepts up to 9 image / 3 video / 3 audio references.
- Plan around output specs. Seedream caps at 4MP (~4K). Seedance 2.0 outputs 4–15s clips at 480p / 720p with dual-channel audio.
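The steps above can be sketched as a single request payload. Everything in this sketch (the model ID, field names, and the client-side size check) is an illustrative assumption, not a documented schema; check your chosen platform's model page for the real one.

```python
# Sketch of a Seedream 4.5 image request assembled from the
# getting-started steps. Field names and the model ID are assumptions.

MAX_SIDE = 4096  # the "up to 4K" output cap described above

def build_image_request(prompt: str, reference_images: list[str],
                        width: int = 2048, height: int = 2048) -> dict:
    """Assemble a hypothetical Seedream 4.5 generation payload."""
    if max(width, height) > MAX_SIDE:
        raise ValueError(f"Seedream output caps at {MAX_SIDE}px on a side")
    return {
        "model": "seedream-4.5",            # assumed model identifier
        "prompt": prompt,
        "reference_images": reference_images,  # consistency anchors
        "width": width,
        "height": height,
    }
```

Submit the payload with whatever client your platform provides; the point is that reference images travel with every request, not just the first one.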
Surfaces
ByteDance Seed publishes documentation at seed.bytedance.com. Practical access:
- Higgsfield — direct UI access for both models, integrated with their creative pipeline.
- fal.ai — Seedance 2.0 API live since April 2026.
- Runware — multi-vendor API; ByteDance models exposed alongside competitors.
- attap.ai — credit-priced access (300 credits per Seedance 2.0 generation).
Seedream 4.5 — image generation & editing
| Area | What Seedream 4.5 does |
|---|---|
| Architecture | Diffusion Transformer + VAE. Native high-resolution generation. |
| Resolution | Up to 4K (4MP) native output. Inference reportedly ~1.8s for a 2K image (a figure reported for Seedream 4.0 under its stated test conditions). |
| Multi-image editing | Accurately identifies main subjects across multiple input images; preserves reference details. Strong at compositing scenes from multiple sources. |
| Typography | Enhanced dense-text and typography rendering — explicitly improved over Seedream 4.0. |
| Best for | High-fidelity creative imagery, brand-consistent multi-image series, layout-heavy outputs (posters, ads, slides). |
Seedance 2.0 — unified audio-video
| Area | What Seedance 2.0 does |
|---|---|
| Released | 2026-02-12. |
| Modalities | Unified multimodal joint generation — text + image + audio + video inputs, multi-shot audio-video output in a single pass. |
| Inputs | Mixed references: up to 9 images, 3 video clips, 3 audio clips in one prompt. |
| Output | 4–15 second clips, multi-shot, with dual-channel audio. Native resolution 480p / 720p. |
| Editing | Targeted modifications to specified clips, characters, actions, storylines. Video extension generates continuous shots. |
| Best for | Short-form story content, marketing/ad creative with sync audio, character-driven scenes that need internal consistency. |
Release timeline
| Date | Release | What changed |
|---|---|---|
| 2024 | Seedream 1-3 | Iteration of the image-gen line. |
| 2025 | Seedance 1.0 | First public Seedance video model. |
| 2025 | Seedream 4.0 | Diffusion Transformer + VAE; 4K native; fast inference. |
| 2026-02-12 | Seedance 2.0 | Unified audio-video, 9-image / 3-video / 3-audio refs, 4-15s output, dual audio. |
| 2026 | Seedream 4.5 | "All-round improvement" — typography, multi-image consistency, fidelity. |
| 2026-04 | Seedance 2.0 on fal.ai | Production API live. |
Access & pricing
- fal.ai — pay-per-generation pricing; rates published on the model page.
- Higgsfield — subscription + per-credit; bundles other video models alongside.
- Runware — pay-as-you-go developer API.
- attap.ai — credit pricing; Seedream 4.5 (image) at 4 credits, Seedance 2.0 at 300 credits per generation as of writing.
Image creation workflow (Seedream 4.5)
- Brief. Define style, subject, composition, mood, output size (up to 4K).
- Reference upload. Provide multi-image references for subject consistency or style transfer.
- Generate. Send prompt + references to Seedream 4.5; iterate at small sizes to lock the look.
- Upscale to final. Once happy at iteration size, regenerate at 4K.
- Edit. For tweaks — color, layout, text — describe the change in natural language and re-run with the prior output as a reference.
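The iterate-small-then-upscale loop in steps 3-4 can be sketched as a simple pass plan. The draft size and dict fields here are assumptions; plug each pass into whatever generation client your platform provides.

```python
# Sketch of the "lock the look small, then commit at 4K" workflow.
# Sizes and field names are illustrative assumptions.

DRAFT_SIZE = 1024   # fast, cheap iteration size
FINAL_SIZE = 4096   # the single full-resolution commit pass

def plan_passes(n_drafts: int) -> list[dict]:
    """Return n cheap draft passes followed by one 4K final pass."""
    passes = [
        {"pass": f"draft-{i + 1}", "width": DRAFT_SIZE, "height": DRAFT_SIZE}
        for i in range(n_drafts)
    ]
    passes.append({"pass": "final", "width": FINAL_SIZE, "height": FINAL_SIZE})
    return passes
```

Carry the winning draft's prompt and references unchanged into the final pass so only resolution changes.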
Video + audio creation (Seedance 2.0)
- Storyboard. Plan a 4-15s clip; identify the shots you want, the mood, the audio character.
- Gather references. Up to 9 images (style, subject, set), 3 video clips (motion reference), 3 audio clips (tone, ambient).
- Single prompt. Compose one prompt describing the visual + audio + flow. Submit with references.
- Review. Check shot continuity, audio-visual sync. Most issues become visible immediately.
- Iterate or extend. Use video-extension feature to generate continuous shots beyond the initial clip.
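The single-prompt submission in step 3 can be sketched as a payload builder that enforces the documented limits (9 image / 3 video / 3 audio references, 4-15s output at 480p/720p). Field names and the model ID are assumptions, not a published schema.

```python
# Sketch of a Seedance 2.0 request enforcing the documented caps.
# Field names and the model ID are illustrative assumptions.

REF_CAPS = {"reference_images": 9, "reference_videos": 3, "reference_audio": 3}

def build_video_request(prompt: str, *, reference_images=(),
                        reference_videos=(), reference_audio=(),
                        duration_s: int = 8, resolution: str = "720p") -> dict:
    """Assemble a hypothetical Seedance 2.0 generation payload."""
    refs = {
        "reference_images": list(reference_images),
        "reference_videos": list(reference_videos),
        "reference_audio": list(reference_audio),
    }
    for field, cap in REF_CAPS.items():
        if len(refs[field]) > cap:
            raise ValueError(f"{field}: {len(refs[field])} exceeds cap of {cap}")
    if not 4 <= duration_s <= 15:
        raise ValueError("Seedance 2.0 outputs 4-15 second clips")
    if resolution not in ("480p", "720p"):
        raise ValueError("native output resolutions are 480p and 720p")
    return {"model": "seedance-2.0", "prompt": prompt,
            "duration_s": duration_s, "resolution": resolution, **refs}
```

Note the prompt should describe audio as well as visuals; the payload has no separate audio-prompt field in this sketch.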
Editing & extension
Seedance 2.0 supports targeted modification of specific elements (clips, characters, actions, storylines) without regenerating from scratch. Practical pattern:
- Mark the target. Reference the prior clip + describe specifically what to change ("change the character's coat from red to navy in shots 2-3").
- Preserve everything else. Explicit instructions to preserve other elements reduce regeneration drift.
- Extend to longer sequences. Chain extensions to build past the 15-second cap; quality degrades gradually so plan around 30-60s as practical max.
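The chaining pattern above can be sketched as an extension plan that respects the 15-second initial cap and the ~60-second practical ceiling. The step size and "preserve" note are assumptions about how you would phrase each extension request.

```python
# Sketch of chaining video-extension calls past the 15-second cap,
# per the editing pattern above. All names here are assumptions.

MAX_CLIP_S = 15       # longest single generation
PRACTICAL_MAX_S = 60  # quality degrades gradually; plan around 30-60s

def plan_extensions(target_s: int, step_s: int = 10) -> list[dict]:
    """Plan the generate + extend calls needed to reach target_s."""
    if target_s > PRACTICAL_MAX_S:
        raise ValueError("past ~60s, gradual quality loss makes chaining impractical")
    elapsed = min(target_s, MAX_CLIP_S)
    steps = [{"op": "generate", "length_s": elapsed}]
    while elapsed < target_s:
        chunk = min(step_s, target_s - elapsed)
        steps.append({"op": "extend", "length_s": chunk,
                      "note": "preserve characters, set, and audio bed"})
        elapsed += chunk
    return steps
```

Each extension prompt should restate what must stay fixed; explicit preservation instructions reduce drift across the chain.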
Prompt library
Brand-consistent product imagery (Seedream)
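An illustrative payload for this pattern. The prompt wording, field names, model ID, and file paths are my own assumptions for demonstration, not official examples.

```python
# Hypothetical Seedream 4.5 payload for a brand-consistent product
# series: references carry the brand look; the prompt carries layout.

product_series = {
    "model": "seedream-4.5",  # assumed model identifier
    "prompt": ("Studio product shot of the sneaker from the reference "
               "images, white seamless background, soft key light from "
               "the left, brand logo crisp and legible, 3:4 crop for a "
               "catalog page"),
    "reference_images": ["brand/sneaker_front.png", "brand/sneaker_side.png"],
    "width": 1536,
    "height": 2048,
}
```

Reuse the same `reference_images` across the whole series; vary only the layout portion of the prompt.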
Multi-shot character video with audio (Seedance)
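An illustrative payload for this pattern, with the audio described inside the prompt. All wording, field names, and paths are assumptions.

```python
# Hypothetical Seedance 2.0 payload for a multi-shot character clip.
# Note the explicit per-shot structure and the described soundscape.

character_clip = {
    "model": "seedance-2.0",  # assumed model identifier
    "prompt": ("Shot 1: the courier from the reference images rides "
               "through rain-slick streets at night. Shot 2: close-up "
               "as she checks a glowing map. Audio: steady rain, "
               "distant traffic, low synth pulse, no dialogue."),
    "reference_images": ["char/courier_face.png", "char/courier_outfit.png",
                         "env/street.png"],
    "reference_audio": ["refs/rain_ambience.wav"],
    "duration_s": 12,
    "resolution": "720p",
}
```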
Targeted edit (preserve everything else)
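An illustrative edit payload: name one change precisely, then explicitly list what must not change, per the editing pattern above. Field names and paths are assumptions.

```python
# Hypothetical Seedance 2.0 targeted-edit payload: one scoped change,
# everything else explicitly preserved to reduce regeneration drift.

targeted_edit = {
    "model": "seedance-2.0",  # assumed model identifier
    "source_clip": "outputs/courier_v1.mp4",
    "prompt": ("In shots 2-3, change the courier's coat from red to "
               "navy. Keep all camera moves, lighting, character "
               "faces, pacing, and the existing audio track unchanged."),
}
```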
Patterns
"References do the heavy lifting"
Seedream and Seedance both perform substantially better with strong reference inputs than from text-only prompts. Build a small library of reference assets per project and reuse them.
"Lock the look at small size, then upscale"
Iterating at 1K is fast and cheap; regenerating at 4K is slow and expensive. Burn iteration cycles small, commit at full resolution.
"Audio is part of the prompt"
Seedance 2.0 generates audio alongside video. If you don't describe it, you'll get default audio that may not match. Always describe the soundscape — even just "ambient room tone, no music" prevents surprises.