DeepSeek User's Manual
A practical guide to DeepSeek's V4 lineup: what's in it, when to pick V4 Pro vs V4 Flash, how to call the API (OpenAI- or Anthropic-compatible), and how to get the most out of the cheapest open-weights frontier model on the market.
DeepSeek is a Chinese AI lab whose superpower is making smart models that are really, really cheap and open-weights (you can download them and run them yourself). Their newest model is DeepSeek V4 Pro — released April 24, 2026. It's a Mixture-of-Experts model with 1.6 trillion total parameters but only 49 billion "active" at a time, which is why it's so cheap to run.
If you're picking from the menu: V4 Pro for hard reasoning, agentic coding, deep research. V4 Flash for cheap, fast, simple stuff. Both have a 1 million-token context window and support thinking mode (where the model "thinks out loud" before answering).
Getting started in 60 seconds
- Pick your door: chat.deepseek.com for the free chat app, platform.deepseek.com for an API key, api-docs.deepseek.com for the developer docs.
- Sign in with email, Google, or Apple. The chat app is free with rate limits; the API is pay-as-you-go.
- Pick a model: deepseek-v4-pro (flagship) or deepseek-v4-flash (cheap + fast). Both support thinking (set thinking.type = "enabled") and non-thinking modes.
- Tell DeepSeek what good looks like. Goal, audience, format. V4 Pro responds well to structured prompts with explicit constraints; it's a strong instruction-follower at frontier quality.
Which DeepSeek surface should I use?
chat.deepseek.com
Free consumer chat
- Free with rate limits
- DeepThink (reasoning) toggle
- Web search toggle
- File upload
API (platform.deepseek.com)
Pay-as-you-go developer
- OpenAI ChatCompletions compatible
- Anthropic Messages compatible
- 1M context, 384K max output
- Function calls, JSON mode, prefix caching
Self-host (open weights)
MIT-licensed, downloadable
- V4 Pro and V4 Flash both released open
- Run on your own GPUs or via providers (OpenRouter, DeepInfra, etc.)
- Useful for compliance / data residency
- Heavy hardware needs for V4 Pro
Prompt fundamentals (DeepSeek edition)
Three things to remember when prompting V4 models:
- Toggle thinking mode deliberately. For hard reasoning, math, coding, agentic planning — turn it on. For simple chat, classification, formatting — turn it off (you save tokens and latency, and quality is still strong).
- The 1M context is real. You can fit an entire codebase or a 1000-page document. Use prefix caching so repeated long prefixes (system prompt + reference doc) cost a small fraction of cache-miss input (roughly 1/50 to 1/120, depending on model).
- Direct, structured prompts win. V4 Pro likes explicit goals, constraints, and output formats. Few-shot examples help it match style. Chain-of-thought is mostly redundant if thinking mode is on.
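For example, a minimal structured prompt (an illustrative sketch, not an official DeepSeek template) might look like:

```
Goal: Summarize the attached incident report for an executive audience.
Constraints: Max 150 words. No jargon. Call out unresolved risks explicitly.
Format: One paragraph, then a bulleted "Open risks" list.
```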
DeepSeek's V4 lineup has two models right now, both released April 24, 2026: V4 Pro (the flagship — 1.6 trillion parameters, frontier-class brain) and V4 Flash (smaller — 284 billion params, much cheaper, surprisingly close in quality).
Both have a 1 million-token context and can output up to 384,000 tokens at a time. The most surprising thing about V4 Pro: at the current 75% discount, it's about 1/7 the price of Claude Opus 4.7 for similar frontier-class reasoning quality.
The current V4 lineup
As of 2026-05-05, V4 Pro is DeepSeek's flagship and the open-weights SOTA on agentic coding benchmarks. V4 Flash is the cheap-and-fast tier — its reasoning capability "closely approaches" V4 Pro per DeepSeek's own release notes.
deepseek-chat and deepseek-reasoner model IDs retire 2026-07-24.
V4 lineup at a glance
| Model | API ID | Total / active params | Context | Best for |
|---|---|---|---|---|
| DeepSeek V4 Pro (flagship) | deepseek-v4-pro | 1.6T / 49B (MoE) | 1M in, 384K out | Hardest reasoning, agentic coding, deep research, math/STEM. Open-source SOTA on agentic coding benchmarks. |
| DeepSeek V4 Flash (cheap + fast) | deepseek-v4-flash | 284B / 13B (MoE) | 1M in, 384K out | Reasoning that closely approaches V4 Pro at roughly 1/3 the cost. On par with V4 Pro for simple agent tasks. |
V4 Pro — deep dive
This is the model the rest of the manual is centered on. V4 Pro is DeepSeek's flagship as of April 24, 2026. Three things make it noteworthy:
| Area | What V4 Pro does |
|---|---|
| Architecture | 1.6T total parameters, 49B active via Mixture-of-Experts routing. Only ~3% of weights fire per forward pass — that's why inference is so cheap relative to dense models of comparable capability. |
| Context window | 1,000,000 input tokens with up to 384,000 output tokens per request — among the largest output limits anywhere. Output cap is meaningful: it lets you generate full-length codebases or long-form research reports in one call. |
| Agentic coding SOTA (open) | Per DeepSeek's release notes, V4 Pro achieves open-source SOTA on Agentic Coding benchmarks — multi-step planning, tool use, codebase navigation, and execution. This is the headline capability. |
| World knowledge | "Leads all current open models, trailing only Gemini 3.1 Pro" per DeepSeek. Strong for research-style queries and broad-knowledge tasks. |
| Reasoning | "Beats all current open models in Math/STEM/Coding." Both thinking and non-thinking modes available — toggle via thinking.type in the API. |
| License | MIT — most permissive license available. You can use the weights commercially, fork them, fine-tune, redistribute. The most capable open-weights frontier model in the world as of mid-2026. |
| Pricing | 75% discount through 2026-05-31 15:59 UTC. Discounted rates: $0.435 cache-miss input, $0.0036 cache-hit input, $0.87 output (per 1M tokens). List rates: $1.74 / $0.0145 / $3.48. At the discounted price, V4 Pro is roughly 1/7 the cost of Claude Opus 4.7. |
V4 Pro vs the competition (USD per 1M tokens)
| Model | Cache-miss input | Output | Context |
|---|---|---|---|
| DeepSeek V4 Pro (discounted) | $0.435 | $0.87 | 1M |
| DeepSeek V4 Pro (list) | $1.74 | $3.48 | 1M |
| Claude Opus 4.7 | $5.00 | $25.00 | 200K |
| GPT-5.4 | $2.50 | $15.00 | ~400K |
| Gemini 3.1 Pro | $2.00 | $12.00 | 2M |
| Grok 4.3 | $1.25 | $2.50 | 1M |
At list price, V4 Pro is the cheapest frontier model on the table. At the current 75% discount it undercuts even Grok 4.3, the next-cheapest row, by roughly 3x on both input and output. Verify current pricing before making billing-sensitive decisions.
Release timeline (chronological)
| Date | Release | What changed |
|---|---|---|
| 2023-07 | DeepSeek founded | Spun out of High-Flyer Quant; mission to build open AGI from China. |
| 2023-11 | DeepSeek LLM (67B) | First public release. Dense 67B model. |
| 2024-05 | DeepSeek V2 | First MoE architecture from DeepSeek; major efficiency lift. 236B total / 21B active. |
| 2024-12 | DeepSeek V3 | 671B total / 37B active MoE. SOTA among open models on coding/math. |
| 2025-01 | DeepSeek R1 | Reasoning model with chain-of-thought training. Drove the "DeepSeek moment" in markets. |
| 2025-Q3 | V3.1 / V3.2 | Incremental V3 refresh; longer context, cheaper. |
| 2026-04-24 | DeepSeek V4 (Pro + Flash) | Current generation. 1.6T/49B (Pro), 284B/13B (Flash). 1M context default. SOTA open-source on agentic coding. |
| 2026-04-26 | Cache hit price cut | Cache-hit input prices reduced to 1/10 of launch pricing — strong incentive to use prefix caching. |
How to pick a DeepSeek model
Pick V4 Pro when…
- The task is genuinely hard reasoning, math, or scientific.
- You're doing agentic coding — multi-step planning, tool calls, codebase navigation.
- You need world-class research-grade knowledge, not just chat.
- You want frontier-class quality at the cheapest frontier price (especially during the 75% discount window).
- You need open weights — compliance, data residency, on-prem deployment.
Pick V4 Flash when…
- Cost dominates and quality requirements are moderate.
- Tasks are simpler — chat, classification, summarization, formatting.
- You need fast response times for user-facing features.
- You want thinking-mode reasoning at the lowest possible cost ($0.14/M cache-miss input).
- You're running simple agent tasks, where V4 Flash is "on par with V4 Pro" per DeepSeek.
Pricing details (USD per 1M tokens)
| Model | Cache hit | Cache miss | Output | Notes |
|---|---|---|---|---|
| V4 Pro (discounted) | $0.0036 | $0.435 | $0.87 | 75% off through 2026-05-31 15:59 UTC |
| V4 Pro (list) | $0.0145 | $1.74 | $3.48 | Rate after the discount window ends |
| V4 Flash | $0.0028 | $0.14 | $0.28 | Same price in thinking and non-thinking modes |
Legacy & deprecated
| Model ID | Status | Migrate to |
|---|---|---|
| deepseek-chat | Maps to V4 Flash non-thinking. Retires 2026-07-24 15:59 UTC. | deepseek-v4-flash (non-thinking) |
| deepseek-reasoner | Maps to V4 Flash thinking. Retires 2026-07-24 15:59 UTC. | deepseek-v4-flash (thinking) |
| DeepSeek V3 / V3.1 / V3.2 | Older generation; available via providers. | deepseek-v4-pro or deepseek-v4-flash |
| DeepSeek R1 | Standalone reasoning model from V3-era; superseded. | deepseek-v4-pro with thinking enabled |
chat.deepseek.com is DeepSeek's free chat website. You don't need an API key — just sign in and chat. Two big toggles: DeepThink (turns on the model's "thinking out loud" reasoning, slower but smarter) and Search (lets the model look up current info on the web).
It's free with rate limits. Useful for testing prompts before you wire them into your app.
chat.deepseek.com — setup
- Visit chat.deepseek.com and sign in (email, Google, or Apple).
- Default model is V4 Flash. The chat interface routes to V4 Flash by default for cost reasons. To get closer to V4 Pro behavior, enable DeepThink, which turns on thinking-mode reasoning on whichever model is serving you.
- Toggles to know:
- DeepThink — turns on chain-of-thought reasoning. Slower, smarter. Use for math, hard reasoning, agentic planning.
- Search — turns on web grounding. The model will search and cite sources before answering.
- File upload — paste in PDFs, code files, docs. With 1M context you can upload large docs.
- No paid tier on chat.deepseek.com as of 2026-05. The chat surface is intentionally free; if you want guaranteed throughput or V4 Pro, use the API.
DeepThink & Search — when to use each
| Toggle | Turn it on for | Skip it for |
|---|---|---|
| DeepThink | Math problems, multi-step reasoning, code review, agentic planning, ambiguous questions, anything where "thinking" likely helps. | Quick chat, simple summaries, formatting tasks, casual Q&A. Wastes tokens; can over-think simple questions. |
| Search | Current events, recent product info, anything time-sensitive, fact-checking, citations needed. | Code, math, reasoning that doesn't need fresh data. Search adds latency and can introduce noise from low-quality sources. |
Optimal prompts for chat.deepseek.com
V4 models reward structure. Here are tested templates for the chat surface.
Hard reasoning / proof / math — DeepThink ON
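An illustrative shape for this kind of prompt (a sketch, not a published DeepSeek template):

```
You are a careful mathematician.
Goal: prove the claim below, or give a counterexample.
Claim: the sum of the first n odd numbers is n².
Constraints: state assumptions, show every step, then sanity-check the
result for n = 1..4 before giving the final answer.
Format: numbered steps, then a one-line conclusion.
```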
Codebase review — DeepThink ON, file upload
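A sketch of the same idea for code review (illustrative; adapt the scope line to your repo):

```
Role: senior reviewer for the uploaded codebase.
Goal: find correctness bugs, security issues, and risky patterns.
Scope: prioritize the request handlers; ignore style nits.
Format: a table of findings (file, line, severity, issue, suggested fix),
ordered by severity, then the three highest-impact refactors.
```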
Research synthesis — Search ON
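And an illustrative sketch for search-grounded synthesis:

```
Goal: summarize the current state of <topic> as of this month.
Sources: search the web; prefer primary sources; cite every claim with a link.
Audience: a technical reader new to the topic.
Format: a 5-bullet summary, a short "what changed recently" section, then
open questions. Flag anything you could not verify.
```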
The DeepSeek API is designed to be a drop-in replacement for OpenAI and Anthropic — same request shape, just point at api.deepseek.com and pass deepseek-v4-pro or deepseek-v4-flash as the model.
The two things you'll want to use right away: thinking mode (turn it on for hard problems) and prefix caching (massive savings if you reuse a long system prompt: cache hits cost roughly 1/50 to 1/120 the price of cache misses, depending on model).
Account & keys
- Visit platform.deepseek.com, sign in with your DeepSeek account.
- Add a payment method. Pay-as-you-go, billed in USD.
- Generate an API key from the dashboard. Treat it like a password — store in env vars, not in code.
- Verify access: hit the models endpoint to confirm your key works (see "First API call" below).
First API call
The DeepSeek endpoint is OpenAI-compatible. If you've used the OpenAI Python or JavaScript SDK, the only changes are base_url and model.
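A minimal sketch in Python, assuming the OpenAI-compatible shape described here (the model IDs and base URL are the ones from this manual; the prompt is illustrative):

```python
import os
from openai import OpenAI

# Point the standard OpenAI SDK at DeepSeek's endpoint.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # keep keys in env vars, not code
    base_url="https://api.deepseek.com/v1",
)

# Sanity check: list available models to confirm the key works.
for model in client.models.list():
    print(model.id)

# A simple chat completion against V4 Flash.
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize prefix caching in one sentence."},
    ],
)
print(response.choices[0].message.content)
```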
OpenAI vs Anthropic-compatible endpoints
DeepSeek exposes the same model behind two API shapes. Pick whichever your existing tooling is wired for — no quality difference between them.
| OpenAI-compatible | Anthropic-compatible | |
|---|---|---|
| Base URL | https://api.deepseek.com/v1 | https://api.deepseek.com/anthropic |
| Endpoint | /chat/completions | /v1/messages |
| SDK | openai | anthropic |
| System prompt | First message with role: system | Top-level system field |
| Tool calls | OpenAI tool-call schema | Anthropic tool-use blocks |
| Best when | You're already using OpenAI SDKs / proxies | You're already using Anthropic SDKs / Claude code paths |
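For the Anthropic-compatible path, a minimal sketch using the anthropic SDK (base URL per the table above; the prompt is illustrative):

```python
import os
from anthropic import Anthropic

# Point the standard Anthropic SDK at DeepSeek's Anthropic-compatible endpoint.
client = Anthropic(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/anthropic",
)

message = client.messages.create(
    model="deepseek-v4-pro",
    max_tokens=1024,
    system="You are a concise assistant.",  # top-level system field, per the table
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences."}],
)
print(message.content[0].text)
```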
Thinking vs non-thinking mode
Both V4 Pro and V4 Flash support a thinking mode where the model emits internal reasoning before the visible answer. You toggle it per-request.
| When | Mode | Why |
|---|---|---|
| Math, proofs, hard reasoning | Thinking | Materially better answers; the cost is more output tokens. |
| Agentic coding (multi-step plan + tool calls) | Thinking | Plan quality improves; mistakes drop. |
| Classification, formatting, simple chat | Non-thinking | Latency and cost win; quality nearly identical. |
| High-throughput pipelines | Non-thinking | Predictable token counts; faster end-to-end. |
Set max_tokens deliberately. With V4 Pro's 384K output cap you have headroom, but you also have a bill.
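A sketch of toggling thinking per request. The thinking.type field is the one this manual describes; with the OpenAI SDK, non-standard fields ride along in extra_body:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com/v1")

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    # Thinking on for hard reasoning; omit (or set "disabled") for simple tasks.
    extra_body={"thinking": {"type": "enabled"}},
    # Set deliberately: thinking mode emits extra reasoning tokens.
    max_tokens=8192,
)
print(response.choices[0].message.content)
```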
Function calls & JSON mode
Both V4 models support function/tool calls and a strict JSON output mode using OpenAI-compatible schemas.
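A sketch of both features, assuming the OpenAI-compatible schema (get_weather is a hypothetical tool for illustration, not a DeepSeek built-in):

```python
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com/v1")

# Function calling: declare the tool with an OpenAI-style schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)
# Assuming the model elects to call the tool:
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))

# JSON mode: force a parseable JSON object as output.
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content":
               'Return a JSON object {"sentiment": "positive" | "negative"} for: Great docs!'}],
    response_format={"type": "json_object"},
)
print(json.loads(response.choices[0].message.content))
```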
Context caching (prefix caching)
This is the biggest cost lever in the API. DeepSeek caches the prefix of your input automatically; repeated calls with the same opening prompt pay roughly 1/50 to 1/120 of the cache-miss input rate for matched tokens.
| Model | Cache miss input | Cache hit input | Ratio |
|---|---|---|---|
| V4 Pro (discounted) | $0.435 /M | $0.0036 /M | ~120x cheaper |
| V4 Pro (list) | $1.74 /M | $0.0145 /M | ~120x cheaper |
| V4 Flash | $0.14 /M | $0.0028 /M | ~50x cheaper |
How to maximize hit rate:
- Stable system prompt first. Put your unchanging instructions and reference docs at the top — they get cached.
- Long context = bigger savings. If you're loading a 100K-token reference document on every call, caching turns that from $0.0435/call → $0.00036/call on V4 Pro discounted.
- Avoid leading with timestamps or per-request data. Anything that varies per call breaks the cache prefix. Move dynamic data to the end of the prompt.
- The cache hit price was reduced 10x on 2026-04-26 — DeepSeek is actively encouraging this pattern.
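Putting those rules together, a minimal sketch (reference.txt stands in for any large, stable document):

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com/v1")

SYSTEM_PROMPT = "Answer questions strictly from the reference document below."
REFERENCE_DOC = open("reference.txt").read()  # e.g. a 100K-token document

def ask(question: str) -> str:
    messages = [
        # Identical across calls -> matched by the prefix cache after call one.
        {"role": "system", "content": SYSTEM_PROMPT + "\n\n" + REFERENCE_DOC},
        # Varies per call -> kept after the stable prefix so it can't break it.
        {"role": "user", "content": question},
    ]
    response = client.chat.completions.create(
        model="deepseek-v4-pro", messages=messages
    )
    return response.choices[0].message.content

print(ask("What does section 3 say about data retention?"))
```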
Migrating from V3 / R1 / deepseek-chat / deepseek-reasoner
| Old | New | Action |
|---|---|---|
| deepseek-chat | deepseek-v4-flash (non-thinking) | Update the model string. Behavior is similar; quality is up. |
| deepseek-reasoner | deepseek-v4-flash (thinking enabled) | Update the model string; set thinking.type = "enabled". |
| DeepSeek V3 / V3.1 / V3.2 | deepseek-v4-pro or -flash | Major quality jump, especially on agentic / coding tasks. Re-tune prompts that relied on V3 quirks. |
| DeepSeek R1 | deepseek-v4-pro with thinking enabled | R1's role is now subsumed by V4 Pro's thinking mode — better reasoning, much wider knowledge. |
deepseek-chat and deepseek-reasoner are fully retired after 2026-07-24 15:59 UTC. Migrate before then; pin deepseek-v4-flash or deepseek-v4-pro in your config.
Good prompts have a role (who you want the AI to be), a goal (what good looks like), an audience (who's reading), and a format (how to lay it out). The patterns below are battle-tested templates you can copy.
Use-case library
Agentic coding (V4 Pro's headline strength)
Agentic code task — plan + execute + verify
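An illustrative sketch of the shape (not a published template):

```
Role: autonomous coding agent with access to the repo and a test runner.
Goal: implement <feature> end to end.
Process: (1) write a short plan listing the files you will touch;
(2) implement the change; (3) run the tests and fix any failures;
(4) summarize what changed and list follow-ups.
Constraints: keep the diff minimal; flag any public API change explicitly.
```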
Long-document analysis (using 1M context)
Document Q&A with citations
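An illustrative sketch:

```
Using only the uploaded document, answer the questions below.
For every answer, quote the supporting passage and cite its section or page.
If the document does not answer a question, say so; do not guess.
Questions:
1. <question>
2. <question>
```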
Hard reasoning / math
Step-by-step proof with sanity check
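An illustrative sketch:

```
Prove the statement below. Work step by step, state every assumption, and
number each step. After the proof, sanity-check it: test the claim on two
concrete examples and flag any step that leans on an unproven lemma.
Statement: <statement>
```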
Patterns library
The "spec-then-code" pattern (V4 Pro)
Before asking V4 Pro to write any non-trivial code, ask it to write the spec first. Then ask it to implement against the spec. Quality goes up dramatically because the spec catches missing requirements.
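A minimal sketch of the pattern as two chained API calls (the prompts and example task are illustrative):

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com/v1")

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

task = "A rate limiter: 100 requests/minute per API key, with burst allowance."

# Pass 1: the spec surfaces missing requirements before any code exists.
spec = ask(f"Write a concise spec for: {task}\n"
           "Cover inputs, outputs, edge cases, error handling, and open "
           "questions. Do not write code yet.")

# Pass 2: implement strictly against the spec.
print(ask("Implement this spec in Python. Mark any ambiguity with a TODO "
          f"comment rather than guessing.\n\n{spec}"))
```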
The "two-pass review" pattern
Use V4 Flash for the first pass (cheap, fast, catches obvious issues), then V4 Pro for the second pass (deep, expensive, catches subtle ones).
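A sketch of the pattern (the review prompt and file name are illustrative):

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com/v1")

def review(model: str, code: str, prior_findings: str = "") -> str:
    prompt = ("Review this code for bugs, security issues, and risky patterns.\n"
              + (f"Already flagged, skip these:\n{prior_findings}\n" if prior_findings else "")
              + f"\n{code}")
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

code = open("handler.py").read()
first_pass = review("deepseek-v4-flash", code)             # cheap: obvious issues
second_pass = review("deepseek-v4-pro", code, first_pass)  # deep: subtle ones
print(second_pass)
```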
The "force the disagreement" pattern
V4 Pro can be agreeable. To stress-test a plan, ask it to argue the opposite case, then synthesize.
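An illustrative sketch:

```
Here is my plan: <plan>
First, argue the strongest case AGAINST this plan, as a skeptical reviewer.
Then argue the strongest case FOR it.
Finally, synthesize: what should change in the plan, and what stands?
```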