Moonshot AI Users Manual
As of …, a practical guide to Kimi K2.6 — Moonshot AI's open-weight 1T-parameter MoE flagship released April 20, 2026, designed for long-horizon coding, multi-agent orchestration, and agentic UI/UX generation. Available via Moonshot's own API, OpenRouter, and direct download from Hugging Face.
Moonshot AI is a Beijing-based lab behind the Kimi model family. Kimi K2.6 (April 2026) is the latest — a 1-trillion-parameter Mixture-of-Experts model with 32B active parameters, released open-weight under a Modified MIT License. It's built for coding agents that run long, deeply orchestrated workflows.
The headline number: K2.6's Agent Swarm can coordinate up to 300 concurrent sub-agents across 4,000 coordinated steps in one run.
Getting started in 60 seconds
- Pick your door: kimi.com for free chat (English + Chinese), platform.moonshot.ai for the API, huggingface.co/moonshotai for open weights.
- Sign in — Kimi web/app uses email or phone. API uses Moonshot Platform account.
- Pick the model: kimi-k2.6 for the flagship; older Kimi models are still available for legacy code paths.
- Bring an agent harness. Kimi K2.6 is tuned for tool-use loops more than for plain chat; the deeper the agent loop, the more its tuning pays off.
Which Moonshot surface should I use?
| Surface | Where | Highlights |
|---|---|---|
| kimi.com (chat) | kimi.com (free consumer chat) | Free with rate limits; web search and file upload; long-context document analysis; Chinese + English first-class |
| Moonshot Platform API | platform.moonshot.ai | OpenAI-compatible chat completions; pay-as-you-go (lowest first-party rates); function calling and JSON mode; Agent Swarm tooling |
| Open weights / hosted | HF, OpenRouter, attap.ai, etc. | Modified MIT license; ~1T total params (heavy hardware); hosted variants on OpenRouter / DeepInfra / Together; self-host for compliance / data residency |
Prompt fundamentals (Kimi edition)
- Lean into agent loops. Kimi K2.6's tuning shines when there are tools to call and steps to coordinate. Treat it as an orchestrator, not just a chat brain.
- Use the long context (262K). Big enough for substantial codebases or multi-document research bundles, but not infinite — chunk if you genuinely have more.
- Mind the output cap. Output is capped at 16,384 tokens per request, which matters for production planning. For very long generation, chain calls.
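One way to chain calls past the output cap is a continue-on-truncation loop. This is a sketch only: `complete` below stands in for a real kimi-k2.6 chat-completions call (it is not a Moonshot API), so the loop shape itself is what's shown.

```python
# Chain requests when generation stops for length rather than finishing.
from typing import Callable, List, Tuple

def generate_long(prompt: str,
                  complete: Callable[[List[dict]], Tuple[str, str]],
                  max_rounds: int = 8) -> str:
    """Call `complete` repeatedly while it reports a length-capped stop.

    `complete` takes a message list and returns (text, finish_reason),
    mirroring the OpenAI-style response fields.
    """
    messages = [{"role": "user", "content": prompt}]
    chunks = []
    for _ in range(max_rounds):
        text, finish_reason = complete(messages)
        chunks.append(text)
        if finish_reason != "length":  # "stop" means the model finished
            break
        # Feed the partial output back and ask for a seamless continuation.
        messages.append({"role": "assistant", "content": text})
        messages.append({"role": "user",
                         "content": "Continue exactly where you left off."})
    return "".join(chunks)
```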
The Kimi family currently revolves around Kimi K2.6 — a Mixture-of-Experts model where each token activates only ~3% of the weights. That's why a 1T-param model serves at competitive prices and on attainable hardware. Older Kimi versions (K2, K2.5) still exist; for new builds use K2.6.
Current Kimi lineup
As of 2026-05-05, K2.6 is the flagship. Earlier models are deprecated for new code paths.
| Model | API ID | Released | Best for | Context |
|---|---|---|---|---|
| Kimi K2.6 (flagship · open) | kimi-k2.6 | 2026-04-20 | Long-horizon coding, multi-agent orchestration, agentic UI/UX gen | 262,144 in / 16,384 out |
| Kimi K2.5 | kimi-k2.5 | 2026-Q1 | Predecessor; agent swarm capped at 100 sub-agents / 1,500 steps | ~262K |
| Kimi K2 (legacy) | kimi-k2 | 2025-H2 | Original K2; still available, migrate when convenient | Long-context |
Kimi K2.6 — deep dive
| Area | What K2.6 does |
|---|---|
| Architecture | 1 trillion total parameters, 32 billion active per token via Mixture-of-Experts routing. Only ~3% of weights fire per forward pass. |
| Context | 262,144 input tokens; 16,384 max output tokens per request. |
| Multimodality | Text, images, and video processed in the same architecture without separate vision modules. |
| Agent Swarm flagship feature | Up to 300 concurrent sub-agents across 4,000 coordinated steps per run (up from 100 / 1,500 on K2.5). Designed for end-to-end coding tasks across Python, Rust, Go. |
| License | Modified MIT — open weights with permissive commercial use; check the model card for the exact license text. |
| Pricing | $0.60 / $2.50 per 1M input/output tokens on the official Moonshot API; $0.75 / $3.50 on OpenRouter; available on 9+ providers. |
Release timeline
| Date | Release | What changed |
|---|---|---|
| 2023 | Moonshot AI founded | Beijing-based lab; long-context Kimi chat launches. |
| 2024 | Kimi 1.5 / Kimi Long-Context | Pioneered ~2M-token context in production chat. |
| 2025-H2 | Kimi K2 | First MoE-class flagship; agent tuning begins. |
| 2026-Q1 | Kimi K2.5 | Agent Swarm v1 (100 sub-agents, 1,500 steps). |
| 2026-04-20 | Kimi K2.6 | 1T MoE, 32B active. Agent Swarm v2 (300 sub-agents, 4,000 steps). Multimodal in one architecture. |
Pricing
| Provider | Input ($/1M) | Output ($/1M) | Notes |
|---|---|---|---|
| Moonshot API | $0.60 | $2.50 | Direct, lowest hosted rate |
| OpenRouter | $0.75 | $3.50 | Pay-as-you-go via OpenRouter |
| Other providers | varies | varies | ~$1.15–$2.15 per 1M blended, across 9+ tracked providers |
| Self-host | compute only | compute only | 1T MoE with a heavy GPU footprint; vLLM / SGLang typical |
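The per-1M rates translate into per-request cost with simple arithmetic. A small helper (defaults hard-coded from the Moonshot API row above) makes it concrete:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float = 0.60, out_rate: float = 2.50) -> float:
    """Dollar cost of one request; rates are $ per 1M tokens."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A maxed-out call (262,144 in / 16,384 out) costs about $0.20
# on the first-party rates above.
print(round(request_cost(262_144, 16_384), 3))  # → 0.198
```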
Open weights
Kimi K2.6 is downloadable from huggingface.co/moonshotai under a Modified MIT License. Practical paths:
- vLLM / SGLang for production GPU serving.
- Quantized variants (FP8 / INT4) on inference providers — most third-party hosts run quantized.
- llama.cpp — community quantizations exist; run on smaller hardware with quality tradeoffs.
kimi.com is Moonshot's free chat website. Chinese-first interface but English works fine. Long-context document Q&A is the strongest consumer surface — drop in a 200-page PDF and ask questions.
kimi.com — setup
- Visit kimi.com and sign in (email or phone).
- Default model is the latest Kimi flagship; specific model selection depends on region/account tier.
- Upload PDFs, code files, docs — large files welcome thanks to the long context.
- Toggle web search when you need fresh data; Kimi's first-party search is competent.
Optimal prompts for kimi.com
Long-document Q&A with citations
Codebase walk-through
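As an illustration (the wording is ours, not an official Moonshot template), a long-document Q&A prompt of the first kind might look like:

```text
Attached is a 200-page PDF. Answer the questions below using only the
document. Cite the page number in [p. N] form for every claim, and say
"not in the document" when the answer isn't there.

1. What are the reported quarterly revenue figures?
2. Which risks does the filing flag as new this year?
```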
Moonshot's API is OpenAI-compatible — point at api.moonshot.ai and use the OpenAI SDK. The interesting tooling sits on top: Agent Swarm for orchestrating up to 300 sub-agents across multi-thousand-step workflows.
Account & keys
- Visit platform.moonshot.ai and sign in.
- Add a payment method; pay-as-you-go.
- Generate an API key. Treat as password — env vars only.
First API call (OpenAI-compatible)
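A minimal sketch using only the Python standard library. It assumes the OpenAI-compatible route lives at `api.moonshot.ai/v1/chat/completions` and that your key is in the `MOONSHOT_API_KEY` environment variable; check the platform docs for the exact base URL.

```python
import json
import os
import urllib.request

API_URL = "https://api.moonshot.ai/v1/chat/completions"  # assumed route

def build_request(prompt: str, model: str = "kimi-k2.6") -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }

def call_kimi(prompt: str) -> str:
    payload = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Key comes from the environment — never hard-code it.
            "Authorization": f"Bearer {os.environ['MOONSHOT_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(call_kimi("In one sentence, what is a Mixture-of-Experts model?"))
```

The official OpenAI SDK works the same way: point its `base_url` at Moonshot's endpoint and keep the rest of your code unchanged.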
Agent Swarm
K2.6's headline tooling. The minimum viable shape:
- Orchestrator — Kimi K2.6 plans the work, breaks into sub-tasks.
- Sub-agents — up to 300 concurrent, each with a scoped goal and tool set.
- Coordinated steps — up to 4,000 across the swarm.
- Handoff — sub-agents return structured results to the orchestrator for assembly.
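Moonshot's Agent Swarm tooling provides this loop natively. As an illustration of the shape only (the sub-agent call below is a stub, not the real SDK), the fan-out/assemble step can be sketched as:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_SUBAGENTS = 300  # K2.6 ceiling; most real tasks fan out to 5-30

def run_subagent(task: str) -> dict:
    # Stub: in production this would be one scoped K2.6 sub-agent
    # with its own goal and tool set.
    return {"task": task, "result": f"done: {task}"}

def swarm(tasks: list, fanout: int = 30) -> list:
    """Fan tasks out to concurrent sub-agents, then assemble results."""
    fanout = max(1, min(fanout, MAX_SUBAGENTS, len(tasks)))
    with ThreadPoolExecutor(max_workers=fanout) as pool:
        # pool.map preserves task order, so assembly is deterministic.
        results = list(pool.map(run_subagent, tasks))
    # Handoff: the orchestrator merges the structured results here.
    return results
```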
Self-host (open weights)
Pull weights from huggingface.co/moonshotai. Common deployment paths: vLLM, SGLang, llama.cpp (quantized). Plan for substantial multi-GPU compute at full precision.
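A typical vLLM serving command might look like the following. The Hugging Face repo id and the parallelism settings here are assumptions for illustration, not confirmed values — take the real ones from the model card and your GPU count.

```shell
pip install vllm

# Serve the open weights behind an OpenAI-compatible endpoint.
# moonshotai/kimi-k2.6 is a hypothetical repo id; adjust to the actual
# model card, and size --tensor-parallel-size to your hardware.
vllm serve moonshotai/kimi-k2.6 \
  --tensor-parallel-size 8 \
  --max-model-len 262144
```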
Use-case library
Long-horizon coding task (Agent Swarm)
UI generation from a brief
Multi-agent code review
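As an illustration of the multi-agent code review use case (the prompt wording is ours, not an official template), an orchestrator brief might look like:

```text
You are the review orchestrator. Fan out one sub-agent per changed file
in this PR. Each sub-agent checks correctness, error handling, test
coverage, and style for its file. Assemble the findings into a single
review, ordered by severity, with file:line references.
```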
Patterns
"Plan, delegate, assemble" (the core Agent Swarm shape)
For any task with parallelizable sub-work: have K2.6 plan, fan out to sub-agents, then assemble. This typically cuts wall-clock time (and often cost) by roughly 3-10× versus a serial loop.
"Use 262K, don't pad it"
Long context is a tool, not a flex. If you only need 30K tokens, send 30K. The wider the context, the more careful K2.6 has to be about anchoring — narrow when you can.
"Cap the swarm to the work"
300 sub-agents is the ceiling, not the goal. Most real tasks fan out to 5-30 sub-agents. Over-fanning produces noise and bigger merges.