Alibaba Users Manual
A practical guide to Alibaba's AI lineup — Qwen 3.6 Max for agentic + reasoning + coding, Wan 2.7 for video, HappyHorse 1.0 for image-to-video, Qwen Image for stills, Qwen Omni for full-modality work, and the open-weights Qwen family for self-hosting.
Alibaba is the parent company behind Qwen — China's most active frontier-AI brand, and one of the most prolific open-weights labs in the world. The Qwen family covers everything: text + reasoning (Qwen 3.6 Max, Plus, and open variants), image generation (Qwen Image), video generation (Wan 2.7, HappyHorse 1.0), and full multimodal (Qwen Omni for text+audio+image+video).
The April–May 2026 wave is the most aggressive push yet: Qwen 3.6 Max with 1M+ context for complex agentic workflows, Qwen 3.6 Plus for cost-efficient speed, Wan 2.7 for video, and HappyHorse 1.0 as a top-ranked image-to-video model.
Getting started in 60 seconds
- Pick your door: chat.qwen.ai for the consumer chat app, Alibaba Cloud Model Studio for the developer console, huggingface.co/Qwen for open weights.
- Sign in with an Alibaba Cloud account (developer) or Qwen account (consumer chat). The chat app is free with limits; the API is pay-as-you-go on Model Studio.
- Pick a model: qwen3.6-max for the hardest tasks and agentic work, qwen3.6-plus for fast/cheap, qwen3.5-omni for audio + multimodal. For video: wan2.7 or happyhorse-1. For image gen: qwen-image.
- Tell Qwen what good looks like. Goal, audience, format, constraints. Qwen 3.6 Max responds especially well to structured agentic prompts — explicit plan-then-execute patterns.
Which Alibaba surface should I use?
chat.qwen.ai
Free consumer chat
- Free with rate limits
- Default model is the latest Qwen flagship
- Web search, deep thinking, file upload
- Image / video gen tools surfaced
Model Studio (API)
Alibaba Cloud developer console
- OpenAI-compatible chat completions
- DashScope SDK (native)
- All Qwen + Wan + HappyHorse + Image models
- SG / CN / international regions
Self-host (open weights)
Hugging Face / Ollama / vLLM
- Qwen 3.6 (open variants), Qwen 3.5 series, Qwen 3.5 Omni
- Run on your own GPUs / clouds
- One of the deepest open-source families anywhere
- Note: Qwen 3.6 Max is proprietary; its weights are not released
Prompt fundamentals (Qwen edition)
Three things that matter most when prompting Qwen models:
- Lean into the agentic frame. Qwen 3.6 Max is explicitly tuned for autonomous agent workflows — app development, visual browsing, multi-step planning. Prompts that frame the task as "you are an agent that does X, here are your tools, here's the goal" outperform raw chat-style prompts on complex tasks. (A sketch of this prompt shape follows the list.)
- Use the long context. 1M+ tokens means you can fit entire codebases, multi-document research bundles, or long visual sequences. Don't artificially chunk what you don't need to.
- Pick the right tier. Qwen 3.6 Max for hard reasoning / coding / agentic / visual reasoning. Qwen 3.6 Plus for high-volume, latency-sensitive work where Max-grade quality isn't required. Qwen 3.5 Omni when you genuinely need audio + visual + text in one model.
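To make the agentic frame concrete, here is one possible message shape in Python. This is a hedged sketch, not an official Qwen template; the goal, tool names, and repo path are hypothetical placeholders.

```python
# One illustrative plan-then-execute prompt frame (hypothetical tools and paths).
AGENT_SYSTEM_PROMPT = """You are an autonomous engineering agent.
Goal: {goal}
Tools available: {tool_list}
Process: (1) output a numbered plan, (2) wait for approval, (3) execute step by step.
Constraints: never modify files outside {repo_path}; flag anything ambiguous."""

messages = [
    {
        "role": "system",
        "content": AGENT_SYSTEM_PROMPT.format(
            goal="Add rate limiting to the public API",
            tool_list="read_file, write_file, run_tests",  # hypothetical tool names
            repo_path="services/api",
        ),
    },
    {"role": "user", "content": "Begin with the plan only."},
]
```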
Alibaba's text/code/reasoning lineup is the Qwen family. Right now (May 2026) there are two flagship proprietary models: Qwen 3.6 Max (smartest, biggest context, agentic-tuned) and Qwen 3.6 Plus (fast and cheap). Underneath, there's a deep open-weights family — Qwen 3.6 (35B / 27B), Qwen 3.5 (up to 397B), and Qwen 3.5 Omni for full multimodal — all on Hugging Face.
If you only remember one thing: Qwen 3.6 Max is the agentic flagship. It's designed to act as an autonomous AI agent — run app-development tasks, browse visually, chain reasoning across million-token contexts.
The current Qwen lineup
As of 2026-05-05, Qwen 3.6 Max is Alibaba's flagship for complex agentic workflows. The lineup splits into proprietary (Max, Plus) and open-weights (3.6 / 3.5 / Omni) tiers, plus dedicated multimodal models (Wan, HappyHorse, Image).
Qwen text/reasoning lineup
| Model | Type | Released | Best for | Context |
|---|---|---|---|---|
| Qwen 3.6 Max | Proprietary (flagship) | April/May 2026 | Complex agent workflows, high-level coding, visual reasoning. Top of the lineup for the hardest tasks. | 1M+ tokens |
| Qwen 3.6 Plus | Proprietary (cheap + fast) | April 2026 | High performance, optimized for speed and cost-efficiency. Default for high-volume production. | Long context |
| Qwen 3.6 (open) | Open weights | April 2026 | Self-host. 35B-A3B (MoE) and 27B variants — image-text-to-text capable. | Long context |
| Qwen 3.5 Omni | Open weights (multimodal) | Feb 2026 | Native text + audio + image + video in one model — the most multimodal open model in the family. | Multimodal |
| Qwen 3.5 (397B-A17B) | Open weights | Feb 2026 | Largest open Qwen 3.5 — 397B total / 17B active (MoE). 60% cheaper / 8× faster than 3.0-era predecessors. | Long context |
Qwen 3.6 Max — deep dive
This is the model the rest of the manual is centered on. Qwen 3.6 Max is Alibaba's flagship as of April/May 2026, designed end-to-end for autonomous agent workflows.
| Area | What Qwen 3.6 Max does |
|---|---|
| Context window | 1M+ tokens — fits an entire codebase, a long research bundle, or a multi-document agentic task in a single context. |
| Agentic focus (flagship capability) | Designed to act as an autonomous AI agent — explicitly tuned for app development and visual browsing tasks. This is the headline capability and what differentiates Max from Plus. |
| High-level coding | Strong on real engineering tasks: refactors, multi-file edits, debugging, planning. Pairs with the agentic frame for end-to-end coding workflows. |
| Visual reasoning | Built-in vision understanding — read screenshots, diagrams, dashboards, document layouts. Combined with agentic frame, supports visual-browsing-style tasks. |
| Proprietary | Closed weights — accessed via the Model Studio API only. (Open-weights options sit at the 3.6-open / 3.5 tiers; Plus is also proprietary.) |
| Pricing | Pay-as-you-go on Alibaba Cloud Model Studio. Pricing varies by region (China, Singapore, international) — verify in the Model Studio console before locking in capacity. |
Qwen 3.6 Max vs the competition (positioning)
| Model | Open? | Context | Headline strength |
|---|---|---|---|
| Qwen 3.6 Max | Proprietary | 1M+ | Agentic workflows, visual reasoning, app dev |
| Claude Opus 4.7 | Proprietary | 200K | Ambiguous reasoning, coding agent ecosystem |
| GPT-5.5 | Proprietary | ~272K-1M | Long-running goal completion, breadth |
| Gemini 3.1 Pro | Proprietary | 2M | Largest context, abstract reasoning, multimodal |
| Grok 4.3 | Proprietary | 1M | Cheapest hosted frontier, real-time X |
| DeepSeek V4 Pro | Open (MIT) | 1M | Open-weights agentic-coding SOTA |
Release timeline (chronological)
| Date | Release | What changed |
|---|---|---|
| 2023-08 | Qwen-7B / Qwen-14B | First public Qwen models. Open-weighted from day one. |
| 2024-09 | Qwen 2.5 | Major capability lift; large open-weights coverage; coding-specialised variants. |
| 2025-04 | Qwen 3.0 | Multimodal-native (text + image), longer context. |
| 2026-02 | Qwen 3.5 Series | Open-source + API. 60% cheaper / 8× faster than 3.0. Native multimodal (text/image/video). Includes Qwen 3.5 Omni for full audio + multimodal. |
| 2026-04 | Qwen 3.6 Plus | High-performance proprietary tier optimized for speed and cost. |
| 2026-04 | Wan 2.7 | New video gen model added to Model Studio lineup. |
| 2026-04 | HappyHorse 1.0 | Top-ranked image-to-video model — high-fidelity, realistic dynamic rendering. |
| 2026-04 / 05 | Qwen 3.6 Max | Current flagship. 1M+ context. Agentic focus. High-level coding + visual reasoning. |
How to pick a Qwen model
Pick Qwen 3.6 Max when…
- The task is agentic — multi-step planning, tool use, visual browsing.
- App-development workflows where the model drives the work.
- You need million-token context for codebase-wide reasoning.
- Visual reasoning — screenshots, diagrams, dashboards as input.
- Hard coding / refactor tasks across multiple files.
Pick Qwen 3.6 Plus when…
- High-volume production where cost dominates.
- Latency-sensitive UX — chat, classification, summarization.
- Quality requirements are moderate; you don't need agentic depth.
- You want Qwen-family consistency across hot and cold paths.
Pick Qwen 3.5 Omni when…
- You need text + audio + image + video in one model.
- Open weights matter — self-host or fine-tune.
- Audio is part of the input or output (the only Qwen tier with first-party audio).
- Building multimodal agents that span sensory channels.
Pick open Qwen 3.6 / 3.5 when…
- Compliance / data residency — must run on your own infra.
- Fine-tuning for a domain or task.
- Cost floor below any hosted API.
- Edge deployment (smaller variants).
Pricing
Alibaba Cloud Model Studio is pay-as-you-go in USD or CNY depending on the region (China, Singapore, international). Headline characteristics of the Qwen pricing structure:
- Qwen 3.5 Series shipped with a 60% price cut over the prior generation, and Qwen 3.6 continues that trajectory.
- Qwen 3.6 Plus is positioned as the cost-efficient tier — the explicit "match larger predecessors at smaller cost" model.
- Open-weights variants (Qwen 3.6 / 3.5) have zero per-token cost when self-hosted; pay only for compute.
Open-weights variants
Alibaba is one of the most active open-weights labs in the world. The current open Qwen family at huggingface.co/Qwen includes:
| Series | Variants | Modalities |
|---|---|---|
| Qwen 3.6 | 35B-A3B (MoE), 27B, plus FP8 quantized | Image-Text-to-Text |
| Qwen 3.5 | 397B-A17B (largest open Qwen 3.5), Omni, plus SAE research variants | Text, Multimodal (Omni: text+audio+image+video) |
| Qwen 3 | 30B-A3B, 8B base + sparse autoencoder research variants | Text |
| Qwen Image | 2512 (current generation) | Text-to-Image |
Alibaba doesn't just do text. They have four distinct multimodal models, each focused on a different media type: Wan 2.7 (text-to-video), HappyHorse 1.0 (image-to-video, top-ranked for fidelity), Qwen Image (text-to-image), and Qwen 3.5 Omni (audio + everything else).
This breadth is closer to Google's stack than to DeepSeek's text-only profile. If you need one vendor for text + image + video + audio, Alibaba is one of three real options (alongside Google and OpenAI).
All-modalities overview
Wan 2.7 — text-to-video
New video generation model added to Model Studio in April 2026. Native text-to-video pipeline; production-ready API.
HappyHorse 1.0 — image-to-video
Top-ranked image-to-video model focused on high-fidelity, realistic dynamic rendering. Released April 2026.
Qwen Image — text-to-image
Open-weights image-gen model on Hugging Face (Qwen Image 2512). Text-to-image generation; strong on Chinese-language prompts.
Qwen 3.5 Omni — full multimodal
Single open-weights model accepting text, audio, image, and video. The "everything" tier — supports cross-modal reasoning in one forward pass.
Wan 2.7 (video generation)
Wan is Alibaba's video-generation line. Wan 2.7 is the latest entry in Model Studio (April 2026).
- Modalities: Text-to-video.
- Surface: Available via Alibaba Cloud Model Studio.
- Use cases: Marketing creative, product demos, education, social-media content.
- Compared to: Veo 3 (Google) is more polished cinematically; Grok Imagine Video is cheaper. Wan 2.7 is the China-native option with first-party Alibaba Cloud distribution.
HappyHorse 1.0 (image-to-video)
Released April 2026. Distinct from Wan: HappyHorse takes an existing image as input and animates it.
- Modality: Image-to-video — start from a still, get a video out.
- Strength: High-fidelity, realistic dynamic rendering. Top-ranked among image-to-video models per Alibaba's release notes.
- Use cases: Animating product photography, bringing static art to life, character animation.
- Alternative use: Pair with Qwen Image (or any image gen) to go text → image → video in two steps.
Qwen Image (text-to-image)
Open-weights image-generation family on Hugging Face. The current generation is Qwen Image 2512.
- Modality: Text-to-image.
- Distribution: Open weights on Hugging Face; available via Model Studio API.
- Strengths: Strong on Chinese-language prompts; competent on photorealism and stylized output.
- Pair with: HappyHorse 1.0 to animate the output, or feed into Wan 2.7 for video continuation.
Qwen 3.5 Omni (audio + multimodal)
The "everything in one model" tier. Qwen 3.5 Omni is the only Qwen variant that natively handles audio input/output alongside the standard text + image + video.
- Modalities: Text, audio, image, video — all in one open-weights model.
- Distribution: Open weights on Hugging Face; demos available online and offline.
- Use cases: Voice agents that can also see, multimodal accessibility, cross-modal search ("find the moment in this video where someone says X").
- Tradeoff: Generalist — for hardest-tier text reasoning, Qwen 3.6 Max wins; Omni's edge is breadth across modalities.
Practical workflows
Three end-to-end pipelines you can copy. Each names which Qwen model handles which step.
Workflow A — Marketing video (15s) from a product brief
- Qwen 3.6 Max — plan
Send the product brief and target audience. Ask for: 3 still-frame prompts, motion prompts for each, voiceover script with timestamps. (Use the "Marketing video pipeline" template under Use-case library.)
- Qwen Image — generate stills
Run each still-frame prompt to produce 3 hero frames at high resolution.
- HappyHorse 1.0 — animate each still
Send each still + its motion prompt. Get 3 short animated clips.
- Wan 2.7 — optional bridge shots
If you need transitions or B-roll between hero shots, generate them text-to-video.
- Off-Alibaba — assemble
Stitch in any video editor; layer the voiceover. Total runtime 15s; total Qwen calls ~7 if you don't iterate.
Workflow B — Animate an existing product photo
- Qwen 3.6 Max — direction
Upload the product photo. Use the "Image-to-video direction" template to get a tuned motion prompt for HappyHorse.
- HappyHorse 1.0 — render
Send the photo + motion prompt; get the looped clip back.
- Qwen 3.6 Max — review
Optionally send screenshots from the rendered clip back to Max and ask for QC notes — does the motion match brand guidelines? Anything to revise?
Workflow C — Multilingual product launch creative
- Qwen 3.6 Max — base creative
Generate the headline, body copy, and CTA in your source language.
- Qwen 3.6 Max — localize
Use the "Localize marketing copy" template (under Use-case library) for each target market — it goes beyond literal translation.
- Qwen Image — localized stills
For markets where image text matters, regenerate stills with localized typography. (Note: text-in-image rendering is improving but verify each output.)
- HappyHorse / Wan — localized motion
Animate the localized stills. The motion prompt itself usually doesn't need localization.
chat.qwen.ai is Alibaba's free consumer chat website. Sign in, start chatting. The default model is the latest Qwen flagship; specialized buttons in the input area let you toggle web search, deep reasoning ("Thinking"), and image / video generation tools.
It's free with rate limits — useful for testing prompts before wiring them into your product on the Model Studio API.
chat.qwen.ai — setup
- Visit chat.qwen.ai and sign in (Qwen / Alibaba account).
- Default model is the latest Qwen flagship. The chat surface routes to the current best Qwen automatically; specific model selection (Max vs Plus vs others) may be available via a model picker depending on region.
- Toggles to know:
- Thinking — turn on chain-of-thought reasoning for hard tasks.
- Web search — ground responses in current web results.
- File upload — drop in PDFs, code, docs, images. With long context you can upload sizable files.
- Image / Video tools — generate images via Qwen Image, video via Wan or HappyHorse, depending on the surface area of the chat experience in your region.
- Rate limits apply. For guaranteed throughput or production access, use Model Studio API.
Modes & tools — when to use each
| Toggle | Turn it on for | Skip it for |
|---|---|---|
| Thinking | Math, multi-step reasoning, code review, agentic planning, ambiguous questions. | Quick chat, simple summaries, formatting tasks. Adds latency. |
| Web search | Current events, recent product info, time-sensitive queries, fact-checking. | Math, code, abstract reasoning that doesn't need fresh data. |
| Image gen | Visual ideation, marketing concepts, illustrative diagrams. | Tasks that don't need an image — wastes tokens and credits. |
| Video gen | Short marketing clips, animated stills, visual prototypes. | Anything where a static image suffices — video is much heavier on cost. |
Optimal prompts for chat.qwen.ai
Qwen models reward structure. Below are tested templates for the chat surface.
Agentic app-dev plan
Visual reasoning over screenshots
Long-context codebase review
The Alibaba Cloud Model Studio API is OpenAI-compatible — same request shape, just point at the Alibaba endpoint and pass qwen3.6-max (or your chosen model) as the model field. There's also a native DashScope SDK if you prefer first-party tooling.
The two things to use early: function calling (Qwen 3.6 Max is tuned for agentic tool use) and the open-weights self-host path via Hugging Face (for compliance, fine-tuning, or cost-floor deployment).
Account & keys
- Visit Alibaba Cloud Model Studio and sign in with your Alibaba Cloud account. Choose the region that matches your latency / data-residency needs (China, Singapore, international).
- Add a payment method. Pay-as-you-go in CNY or USD depending on region.
- Generate an API key from the dashboard. Treat as a password — store in env vars, not in code.
- Verify access: hit the chat completions endpoint with a small test request to confirm the key and region.
First API call
The Model Studio chat completions endpoint is OpenAI-compatible. If you've used the OpenAI SDK, the only changes are base_url and model.
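A minimal sketch using the official openai Python SDK, assuming the international compatible-mode endpoint; verify the base URL for your region and the exact model ID in the Model Studio console.

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    # International endpoint; the CN region uses dashscope.aliyuncs.com instead.
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
    model="qwen3.6-max",  # model ID as named in this manual; confirm in your console
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize MoE tradeoffs in 3 bullets."},
    ],
)
print(resp.choices[0].message.content)
```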
OpenAI-compatible vs DashScope native
| | OpenAI-compatible | DashScope native |
|---|---|---|
| Base URL pattern | .../compatible-mode/v1 | dashscope[-intl].aliyuncs.com/api/v1 |
| SDK | openai | dashscope |
| Best when | Migrating from OpenAI / minimal code change | You want first-party features and tight Alibaba Cloud integration |
| Multimodal models (Wan, HappyHorse) | Image / video gen via dedicated DashScope endpoints, not chat-completions | Native — these are first-class in DashScope |
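For the native path, the same call via the dashscope SDK looks roughly like this; result_format="message" returns OpenAI-style message objects. Treat the base-URL override as an assumption and check the DashScope docs for your region.

```python
import os
import dashscope
from dashscope import Generation

dashscope.api_key = os.environ["DASHSCOPE_API_KEY"]
# Assumed override for the international region; omit for the CN default.
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

resp = Generation.call(
    model="qwen3.6-max",  # confirm the exact ID in Model Studio
    messages=[{"role": "user", "content": "Name three uses for a 1M-token context."}],
    result_format="message",  # OpenAI-style message objects in the response
)
print(resp.output.choices[0].message.content)
```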
Function calls & JSON
Qwen 3.6 Max supports OpenAI-style tool calls and structured output. This is the foundation for agentic workflows.
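A hedged sketch of an OpenAI-style tool call, reusing the client from the first API call above. The get_order_status tool is a hypothetical example you would implement yourself, not a built-in.

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical tool, implemented by you
        "description": "Look up the status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.6-max",
    messages=[{"role": "user", "content": "Where is order A-1042?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:  # the model chose to call a tool instead of answering directly
    call = msg.tool_calls[0]
    print(call.function.name, call.function.arguments)  # dispatch to your own code
```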
Agentic tasks (Qwen 3.6 Max's flagship use case)
Qwen 3.6 Max is explicitly tuned for autonomous agent workflows. The minimum viable loop: call the model with a tool list, execute whatever tool calls it returns, append the results, and repeat until it answers without requesting a tool.
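A minimal sketch of that loop against the OpenAI-compatible endpoint; tool_impls maps tool names to your own Python functions, and everything else follows the standard tool-calling protocol.

```python
import json

def run_agent(client, model, tools, tool_impls, user_goal, max_steps=10):
    """Minimal agent loop: the model proposes tool calls; we execute and feed results back."""
    messages = [{"role": "user", "content": user_goal}]
    for _ in range(max_steps):
        resp = client.chat.completions.create(model=model, messages=messages, tools=tools)
        msg = resp.choices[0].message
        if not msg.tool_calls:       # no tool calls left: this is the final answer
            return msg.content
        messages.append(msg)         # keep the assistant turn that requested the tools
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            result = tool_impls[call.function.name](**args)  # your implementations
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    raise RuntimeError("agent did not finish within max_steps")
```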
Self-host (open weights)
Qwen 3.6 (open variants) and Qwen 3.5 Omni / 397B are downloadable from Hugging Face. Common deployment paths (a vLLM sketch follows the list):
- Ollama for local laptop dev.
- vLLM for production GPU serving.
- Text Generation Inference (TGI) for Hugging Face-native serving.
- llama.cpp for quantized inference on commodity hardware.
- Alibaba Cloud PAI / EAS for first-party self-managed serving on Alibaba infra.
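As one example of the vLLM path, a minimal offline-inference sketch. The Hugging Face repo ID is a hypothetical placeholder; check huggingface.co/Qwen for the real names. Note that `vllm serve <repo-id>` also exposes an OpenAI-compatible server if you prefer the API shape used elsewhere in this manual.

```python
# pip install vllm
from vllm import LLM, SamplingParams

# Hypothetical repo ID for an open Qwen 3.6 variant; verify on huggingface.co/Qwen.
llm = LLM(model="Qwen/Qwen3.6-27B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Explain MoE routing in two sentences."], params)
print(outputs[0].outputs[0].text)
```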
Streaming responses
For chat UIs and long generations, stream responses so users see progress instead of waiting for the full payload. The OpenAI-compatible endpoint supports stream=true exactly like the OpenAI SDK.
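A minimal streaming sketch, same client as above:

```python
stream = client.chat.completions.create(
    model="qwen3.6-max",
    messages=[{"role": "user", "content": "Write a 200-word product blurb."}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:          # some chunks carry no choices (keep-alives)
        continue
    delta = chunk.choices[0].delta.content
    if delta:                      # role/finish chunks have no text
        print(delta, end="", flush=True)
```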
Rate limits, retries & errors
Treat the API as eventually reliable, not perfectly reliable. The minimum viable error-handling shape (a backoff sketch follows the list):
- 429 (rate limit): back off with jitter. Start at 1s, double up to ~30s. Don't retry forever — give up after 5 attempts and surface to the caller.
- 5xx (transient): retry up to 3 times with backoff; then fail loud.
- 4xx other than 429: don't retry — these are bugs in your request shape (bad model name, malformed JSON, missing field). Log and fix.
- Timeouts: set a sensible client timeout (e.g. 60s for Max non-streaming, longer for streaming). Don't inherit the default — it's usually too long.
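A minimal backoff wrapper implementing the rules above. It assumes the SDK raises exceptions carrying a status_code attribute (as the openai SDK does); adapt the predicate to your client.

```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Retry 429/5xx with exponential backoff and jitter; fail fast on other 4xx."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            retryable = status == 429 or (status is not None and status >= 500)
            if not retryable or attempt == max_attempts - 1:
                raise              # other 4xx, or out of attempts: surface it
            delay = min(base_delay * 2 ** attempt, max_delay)
            time.sleep(delay + random.uniform(0, delay / 2))

# Usage: with_retries(lambda: client.chat.completions.create(...))
```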
Cost-control patterns
- Two-tier routing. Send simple work (classification, extraction, formatting) to qwen3.6-plus; reserve qwen3.6-max for actual reasoning / agentic / visual tasks. Often cuts cost 60-80% with no quality drop. (A routing sketch follows this list.)
- Cap output tokens. Set max_tokens deliberately. With long-context inputs it's tempting to let output run unbounded — costs add up fast.
- Short system prompts. A 5-line system prompt that defines role and constraints crisply outperforms a 30-line one in most evals — and costs less per call.
- Stop iterating in chat for large workflows. Convert "tell me more about X" loops into one well-scoped prompt — each iteration replays the entire context.
- Choose the right region. Latency and pricing differ across China / Singapore / international Model Studio regions. Match to where your users (and data) live.
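A sketch of two-tier routing plus an output cap, reusing the client from earlier. The task categories are illustrative; tune the split to your own traffic.

```python
SIMPLE_TASKS = {"classify", "extract", "format"}  # illustrative categories

def pick_model(task_kind: str) -> str:
    """Cheap, well-specified work goes to Plus; reasoning-heavy calls go to Max."""
    return "qwen3.6-plus" if task_kind in SIMPLE_TASKS else "qwen3.6-max"

resp = client.chat.completions.create(
    model=pick_model("classify"),
    max_tokens=200,  # cap output deliberately, per the note above
    messages=[{"role": "user", "content": "Label this ticket: 'refund not received'"}],
)
```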
Multi-region considerations
Alibaba Cloud Model Studio runs in multiple regions. The same model ID (e.g. qwen3.6-max) can have different latency, availability, and per-1M-token pricing depending on the region your endpoint lives in. Practical rules:
- Pin a region per environment (dev / staging / prod). Don't let it drift via DNS. (A region-pinning sketch follows this list.)
- Test at least one alternate region so you have a documented failover path.
- Data-residency-sensitive workloads: the Singapore (international) region is typically the right choice for non-PRC data flows; verify the current terms before committing.
- Don't assume PII-handling parity across regions. The contracts and certifications differ.
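A small region-pinning sketch. The endpoint map is an assumption based on the compatible-mode URL pattern shown earlier; verify current regional URLs in the Model Studio docs.

```python
import os
from openai import OpenAI

# Assumed regional endpoints; confirm against the Model Studio documentation.
REGION_BASE_URLS = {
    "intl": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # Singapore / international
    "cn": "https://dashscope.aliyuncs.com/compatible-mode/v1",         # China
}

def make_client() -> OpenAI:
    """One env var per deploy environment pins the region explicitly."""
    region = os.environ.get("QWEN_REGION", "intl")
    return OpenAI(api_key=os.environ["DASHSCOPE_API_KEY"],
                  base_url=REGION_BASE_URLS[region])
```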
Good prompts give Qwen four things: a role (who to be), a goal (what good looks like), an audience (who's reading), and a format (how to lay it out). For Qwen 3.6 Max specifically: add plan-execute-verify structure for hard agentic work.
Use-case library
Tested templates organized by what you're trying to accomplish. Each one assumes Qwen 3.6 Max unless noted; drop down to qwen3.6-plus when the task is simpler and cost matters.
Agentic app development (Qwen 3.6 Max's headline)
Build a feature end-to-end (plan → execute → verify)
Add a new endpoint with tests (smaller-scoped)
Visual browsing & UI reasoning (the other Max headline)
Walk through a UI workflow from screenshots
UX audit of a single screen
Long-context analysis (1M+ tokens)
Whole-codebase or whole-document review
Cross-document research synthesis with citations
Coding & engineering
Code review against your own conventions
Debug a failing test
API / interface design review
Multimodal pipelines (Qwen Image / Wan / HappyHorse)
Marketing video pipeline (Max → Image → HappyHorse)
Animate a single product photo (HappyHorse)
Data extraction & structured output
Extract a typed schema from messy text
Normalize free-form data into rows
Translation & multilingual (Qwen's strength)
High-fidelity Chinese ↔ English translation
Localize marketing copy (not just translate)
Research & decision-making
Steelman opposing positions before deciding
Risk pre-mortem
Writing & editing
Tighten prose without losing voice
Outline a long-form piece before writing
Prompt builder
Patterns library
Reusable shapes that improve quality independent of the specific task. Most apply to any model; a few are tuned to Qwen 3.6 Max specifically.
The "plan-execute-verify" pattern (Qwen 3.6 Max)
For any non-trivial agentic task, ask Max to output a plan first, wait for approval, then execute. Catches misunderstandings early and uses Max's tuning correctly.
The "two-tier" pattern (Plus → Max)
For multi-step pipelines, route simple steps to Qwen 3.6 Plus (cheap, fast) and escalate only the hardest to Max. Often cuts cost 60-80% with no quality loss. Good rule of thumb: triage / classify / route / format → Plus. Reason / plan / write the substantive answer → Max.
The "modality handoff" pattern
Qwen 3.6 Max plans → Qwen Image generates stills → HappyHorse / Wan animates. Each model handles its strongest modality. Single-vendor pipeline keeps integration simple. Bonus: Qwen 3.5 Omni in the loop if audio is involved.
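A skeleton of the handoff in Python. Only the planning call uses the documented chat endpoint; the image and video helpers are hypothetical stubs, since Wan and HappyHorse are served via dedicated DashScope endpoints (see the API comparison table) whose exact shapes you should take from the Model Studio docs.

```python
import json

def plan_with_max(client, brief: str) -> dict:
    """Step 1: Max plans still-frame and motion prompts from a brief."""
    resp = client.chat.completions.create(
        model="qwen3.6-max",
        messages=[{"role": "user", "content":
                   f"Plan 3 still prompts and matching motion prompts for: {brief}. "
                   "Reply as JSON with a 'shots' list."}],
    )
    return json.loads(resp.choices[0].message.content)  # assumes the model returns clean JSON

def generate_still(prompt: str) -> bytes: ...               # hypothetical qwen-image call
def animate_still(image: bytes, motion: str) -> bytes: ...  # hypothetical happyhorse-1 call

def marketing_pipeline(client, brief: str) -> list[bytes]:
    plan = plan_with_max(client, brief)
    clips = []
    for shot in plan["shots"]:
        still = generate_still(shot["still_prompt"])
        clips.append(animate_still(still, shot["motion_prompt"]))
    return clips  # stitch off-platform, per Workflow A
```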
The "schema first" pattern
When you need structured output, define the schema in the prompt before showing the input. Qwen follows declared schemas reliably; ad-hoc "give me JSON" prompts produce more drift.
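A minimal schema-first sketch: the schema is declared in the system turn before the messy input ever appears. Some endpoints also accept an OpenAI-style response_format parameter for JSON output; treat that as an assumption to verify.

```python
import json

SCHEMA = """Return ONLY JSON matching this shape:
{"invoices": [{"vendor": "str", "amount_usd": 0.0, "due_date": "YYYY-MM-DD"}]}"""

messy_email_text = "Acme owes $1,200 by March 3rd; Globex invoice is 450 USD due 2026-06-01."

resp = client.chat.completions.create(
    model="qwen3.6-max",
    messages=[
        {"role": "system", "content": SCHEMA},          # schema first
        {"role": "user", "content": messy_email_text},  # then the input
    ],
)
data = json.loads(resp.choices[0].message.content)
```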
The "self-critique" pattern
After a substantive answer, ask the model to critique its own output before you accept it. Often catches subtle errors a second pass would have caught.
The "show your work first" pattern
For reasoning-heavy tasks, ask for the work before the answer. Reduces sycophantic / shortcut answers and makes errors easier to spot.
The "bounded context" pattern
1M tokens is a tool, not an obligation. If you're using only a fraction of the context, say so explicitly — it lowers the chance of the model anchoring on irrelevant earlier turns.
The "force the disagreement" pattern
Qwen models can be agreeable. To stress-test a plan, ask it to argue the opposite case before recommending.
Anti-patterns
Things that look helpful but consistently make outputs worse.
❌ Padding the prompt with "please" / "kindly" / role flattery
"You are the world's leading expert on…" doesn't make Qwen smarter — it just makes the prompt longer and the response more likely to flatter back. State the role in 1 sentence; spend the saved tokens on the actual task and constraints.
❌ Asking for "comprehensive" without bounds
"Give me a comprehensive analysis of X" produces a structureless wall of text. Replace with explicit sections, length limits, and the audience: "Give me 3 paragraphs for a CFO who skims, covering: (1) what changed, (2) financial impact, (3) what to do."
❌ Stacking instructions in one paragraph
Buried instructions get dropped. Use ordered lists, headings, or numbered steps. Qwen reliably follows structure but less reliably parses 6 demands hidden in one comma-laden sentence.
❌ Fighting the model when a tool would do
Don't try to talk Qwen into reasoning over data it can't actually see. If the answer requires a database query / web search / file read, expose those as tools (function calling) instead of pasting screenshots of console output and hoping. Max's agentic tuning shines when tools exist.
❌ Defaulting to Max for trivial work
Triage, classification, simple rewrites — these belong on Qwen 3.6 Plus. Using Max for them burns budget for no quality lift and slows latency-sensitive paths.
❌ Mixing modalities in one turn when you don't have to
If the task is "describe this image", great — vision in one turn. But "describe this image, then design an entire app, then write a poem" produces shallow output across all three. Split into focused turns.
Universal rescue prompts
When an answer is off, these one-liners reliably get a better one without needing to rewrite the original prompt.