DeepSeek User's Manual
A practical guide to DeepSeek's V4 lineup: what's in it, when to pick V4 Pro vs V4 Flash, how to call the API (OpenAI- or Anthropic-compatible), and how to get the most out of the cheapest open-weights frontier model on the market.
DeepSeek is a Chinese AI lab whose superpower is making smart models that are really, really cheap and open-weights (you can download them and run them yourself). Their newest model is DeepSeek V4 Pro — released April 24, 2026. It's a Mixture-of-Experts model with 1.6 trillion total parameters but only 49 billion "active" at a time, which is why it's so cheap to run.
If you're picking from the menu: V4 Pro for hard reasoning, agentic coding, deep research. V4 Flash for cheap, fast, simple stuff. Both have a 1 million-token context window and support thinking mode (where the model "thinks out loud" before answering).
Getting started in 60 seconds
- Pick your door: chat.deepseek.com for the free chat app, platform.deepseek.com for an API key, api-docs.deepseek.com for the developer docs.
- Sign in with email, Google, or Apple. The chat app is free with rate limits; the API is pay-as-you-go.
- Pick a model: deepseek-v4-pro (flagship) or deepseek-v4-flash (cheap + fast). Both support thinking (set thinking.type = "enabled") and non-thinking modes.
- Tell DeepSeek what good looks like. Goal, audience, format. V4 Pro responds well to structured prompts with explicit constraints; it's a strong instruction-follower at frontier quality.
Which DeepSeek surface should I use?
chat.deepseek.com
Free consumer chat
- Free with rate limits
- DeepThink (reasoning) toggle
- Web search toggle
- File upload
API (platform.deepseek.com)
Pay-as-you-go developer
- OpenAI ChatCompletions compatible
- Anthropic Messages compatible
- 1M context, 384K max output
- Function calls, JSON mode, prefix caching
Self-host (open weights)
MIT-licensed, downloadable
- V4 Pro and V4 Flash both released open
- Run on your own GPUs or via providers (OpenRouter, DeepInfra, etc.)
- Useful for compliance / data residency
- Heavy hardware needs for V4 Pro
Prompt fundamentals (DeepSeek edition)
Three things to remember when prompting V4 models:
- Toggle thinking mode deliberately. For hard reasoning, math, coding, agentic planning — turn it on. For simple chat, classification, formatting — turn it off (you save tokens and latency, and quality is still strong).
- The 1M context is real. You can fit an entire codebase or a 1000-page document. Use prefix caching so repeated long prefixes (system prompt + reference doc) cost a small fraction of cache-miss input (roughly 1/50 to 1/120, depending on model).
- Direct, structured prompts win. V4 Pro likes explicit goals, constraints, and output formats. Few-shot examples help it match style. Chain-of-thought is mostly redundant if thinking mode is on.
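For example, a minimal structured prompt (an illustrative sketch, not an official DeepSeek template) might look like:

```
Goal: Summarize the attached incident report for an executive audience.
Constraints: Max 150 words. No jargon. Call out unresolved risks explicitly.
Format: One paragraph, then a bulleted "Open risks" list.
```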
DeepSeek's V4 lineup has two models right now, both released April 24, 2026: V4 Pro (the flagship — 1.6 trillion parameters, frontier-class brain) and V4 Flash (smaller — 284 billion params, much cheaper, surprisingly close in quality).
Both have a 1 million-token context and can output up to 384,000 tokens at a time. The most surprising thing about V4 Pro: at the current 75% discount, it's about 1/7 the price of Claude Opus 4.7 for similar frontier-class reasoning quality.
The current V4 lineup
As of 2026-05-05, V4 Pro is DeepSeek's flagship and the open-weights SOTA on agentic coding benchmarks. V4 Flash is the cheap-and-fast tier — its reasoning capability "closely approaches" V4 Pro per DeepSeek's own release notes.
deepseek-chat and deepseek-reasoner model IDs retire 2026-07-24.
V4 lineup at a glance
| Model | API ID | Total / active params | Context | Best for |
|---|---|---|---|---|
| DeepSeek V4 Pro (flagship) | deepseek-v4-pro | 1.6T / 49B (MoE) | 1M in, 384K out | Hardest reasoning, agentic coding, deep research, math/STEM. Open-source SOTA on agentic coding benchmarks. |
| DeepSeek V4 Flash (cheap + fast) | deepseek-v4-flash | 284B / 13B (MoE) | 1M in, 384K out | Reasoning that closely approaches V4 Pro at roughly 1/3 the cost. On par with V4 Pro for simple agent tasks. |
V4 Pro — deep dive
This is the model the rest of the manual is centered on. V4 Pro is DeepSeek's flagship as of April 24, 2026. Three things make it noteworthy:
| Area | What V4 Pro does |
|---|---|
| Architecture | 1.6T total parameters, 49B active via Mixture-of-Experts routing. Only ~3% of weights fire per forward pass — that's why inference is so cheap relative to dense models of comparable capability. |
| Context window | 1,000,000 input tokens with up to 384,000 output tokens per request — among the largest output limits anywhere. Output cap is meaningful: it lets you generate full-length codebases or long-form research reports in one call. |
| Agentic coding SOTA (open) | Per DeepSeek's release notes, V4 Pro achieves open-source SOTA on Agentic Coding benchmarks — multi-step planning, tool use, codebase navigation, and execution. This is the headline capability. |
| World knowledge | "Leads all current open models, trailing only Gemini 3.1 Pro" per DeepSeek. Strong for research-style queries and broad-knowledge tasks. |
| Reasoning | "Beats all current open models in Math/STEM/Coding." Both thinking and non-thinking modes available — toggle via thinking.type in the API. |
| License | MIT — most permissive license available. You can use the weights commercially, fork them, fine-tune, redistribute. The most capable open-weights frontier model in the world as of mid-2026. |
| Pricing | 75% discount through 2026-05-31 15:59 UTC. Discounted rates: $0.435 cache-miss input, $0.0036 cache-hit input, $0.87 output (per 1M tokens). List rates: $1.74 / $0.0145 / $3.48. At the discounted price, V4 Pro is roughly 1/7 the cost of Claude Opus 4.7. |
V4 Pro vs the competition (USD per 1M tokens)
| Model | Cache-miss input | Output | Context |
|---|---|---|---|
| DeepSeek V4 Pro (discounted) | $0.435 | $0.87 | 1M |
| DeepSeek V4 Pro (list) | $1.74 | $3.48 | 1M |
| Claude Opus 4.7 | $5.00 | $25.00 | 200K |
| GPT-5.4 | $2.50 | $15.00 | ~400K |
| Gemini 3.1 Pro | $2.00 | $12.00 | 2M |
| Grok 4.3 | $1.25 | $2.50 | 1M |
At list price, V4 Pro is the cheapest frontier model on the table. At the current 75% discount it undercuts even Grok 4.3, the next-cheapest row, by roughly 3x on both input and output. Verify current pricing before making billing-sensitive decisions.
Release timeline (chronological)
| Date | Release | What changed |
|---|---|---|
| 2023-07 | DeepSeek founded | Spun out of High-Flyer Quant; mission to build open AGI from China. |
| 2023-11 | DeepSeek LLM (67B) | First public release. Dense 67B model. |
| 2024-05 | DeepSeek V2 | First MoE architecture from DeepSeek; major efficiency lift. 236B total / 21B active. |
| 2024-12 | DeepSeek V3 | 671B total / 37B active MoE. SOTA among open models on coding/math. |
| 2025-01 | DeepSeek R1 | Reasoning model with chain-of-thought training. Drove the "DeepSeek moment" in markets. |
| 2025-Q3 | V3.1 / V3.2 | Incremental V3 refresh; longer context, cheaper. |
| 2026-04-24 | DeepSeek V4 (Pro + Flash) | Current generation. 1.6T/49B (Pro), 284B/13B (Flash). 1M context default. SOTA open-source on agentic coding. |
| 2026-04-26 | Cache hit price cut | Cache-hit input prices reduced to 1/10 of launch pricing — strong incentive to use prefix caching. |
How to pick a DeepSeek model
Pick V4 Pro when…
- The task is genuinely hard reasoning, math, or scientific.
- You're doing agentic coding — multi-step planning, tool calls, codebase navigation.
- You need world-class research-grade knowledge, not just chat.
- You want frontier-class quality at the cheapest frontier price (especially during the 75% discount window).
- You need open weights — compliance, data residency, on-prem deployment.
Pick V4 Flash when…
- Cost dominates and quality requirements are moderate.
- Tasks are simpler — chat, classification, summarization, formatting.
- You need fast response times for user-facing features.
- You want thinking-mode reasoning at the lowest possible cost ($0.14/M cache-miss input).
- You're running simple agent tasks, where V4 Flash is "on par with V4 Pro" per DeepSeek.
Pricing details (USD per 1M tokens)
| Model | Cache hit | Cache miss | Output | Notes |
|---|---|---|---|---|
| V4 Pro (discounted) | $0.0036 | $0.435 | $0.87 | 75% off through 2026-05-31 15:59 UTC |
| V4 Pro (list) | $0.0145 | $1.74 | $3.48 | Rate after the discount window ends |
| V4 Flash | $0.0028 | $0.14 | $0.28 | Same price in thinking and non-thinking modes |
Legacy & deprecated
| Model ID | Status | Migrate to |
|---|---|---|
| deepseek-chat | Maps to V4 Flash non-thinking. Retires 2026-07-24 15:59 UTC. | deepseek-v4-flash (non-thinking) |
| deepseek-reasoner | Maps to V4 Flash thinking. Retires 2026-07-24 15:59 UTC. | deepseek-v4-flash (thinking) |
| DeepSeek V3 / V3.1 / V3.2 | Older generation; available via providers. | deepseek-v4-pro or deepseek-v4-flash |
| DeepSeek R1 | Standalone reasoning model from V3-era; superseded. | deepseek-v4-pro with thinking enabled |
chat.deepseek.com is DeepSeek's free chat website. You don't need an API key — just sign in and chat. Two big toggles: DeepThink (turns on the model's "thinking out loud" reasoning, slower but smarter) and Search (lets the model look up current info on the web).
It's free with rate limits. Useful for testing prompts before you wire them into your app.
chat.deepseek.com — setup
- Visit chat.deepseek.com and sign in (email, Google, or Apple).
- Default model is V4 Flash. The chat interface routes to V4 Flash by default for cost reasons. To get closer to V4 Pro behavior, enable DeepThink, which turns on thinking-mode reasoning on whichever model is serving you.
- Toggles to know:
- DeepThink — turns on chain-of-thought reasoning. Slower, smarter. Use for math, hard reasoning, agentic planning.
- Search — turns on web grounding. The model will search and cite sources before answering.
- File upload — paste in PDFs, code files, docs. With 1M context you can upload large docs.
- No paid tier on chat.deepseek.com as of 2026-05. The chat surface is intentionally free; if you want guaranteed throughput or V4 Pro, use the API.
DeepThink & Search — when to use each
| Toggle | Turn it on for | Skip it for |
|---|---|---|
| DeepThink | Math problems, multi-step reasoning, code review, agentic planning, ambiguous questions, anything where "thinking" likely helps. | Quick chat, simple summaries, formatting tasks, casual Q&A. Wastes tokens; can over-think simple questions. |
| Search | Current events, recent product info, anything time-sensitive, fact-checking, citations needed. | Code, math, reasoning that doesn't need fresh data. Search adds latency and can introduce noise from low-quality sources. |
Optimal prompts for chat.deepseek.com
V4 models reward structure. Here are tested templates for the chat surface.
Hard reasoning / proof / math — DeepThink ON
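An illustrative shape for this kind of prompt (a sketch, not a published DeepSeek template):

```
You are a careful mathematician.
Goal: prove the claim below, or give a counterexample.
Claim: the sum of the first n odd numbers is n².
Constraints: state assumptions, show every step, then sanity-check the
result for n = 1..4 before giving the final answer.
Format: numbered steps, then a one-line conclusion.
```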
Codebase review — DeepThink ON, file upload
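A sketch of the same idea for code review (illustrative; adapt the scope line to your repo):

```
Role: senior reviewer for the uploaded codebase.
Goal: find correctness bugs, security issues, and risky patterns.
Scope: prioritize the request handlers; ignore style nits.
Format: a table of findings (file, line, severity, issue, suggested fix),
ordered by severity, then the three highest-impact refactors.
```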
Research synthesis — Search ON
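And an illustrative sketch for search-grounded synthesis:

```
Goal: summarize the current state of <topic> as of this month.
Sources: search the web; prefer primary sources; cite every claim with a link.
Audience: a technical reader new to the topic.
Format: a 5-bullet summary, a short "what changed recently" section, then
open questions. Flag anything you could not verify.
```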
The DeepSeek API is designed to be a drop-in replacement for OpenAI and Anthropic — same request shape, just point at api.deepseek.com and pass deepseek-v4-pro or deepseek-v4-flash as the model.
The two things you'll want to use right away: thinking mode (turn it on for hard problems) and prefix caching (massive savings if you reuse a long system prompt: cache hits cost roughly 1/50 to 1/120 the price of cache misses, depending on model).
Account & keys
- Visit platform.deepseek.com, sign in with your DeepSeek account.
- Add a payment method. Pay-as-you-go, billed in USD.
- Generate an API key from the dashboard. Treat it like a password — store in env vars, not in code.
- Verify access: hit the models endpoint to confirm your key works (see "First API call" below).
First API call
The DeepSeek endpoint is OpenAI-compatible. If you've used the OpenAI Python or JavaScript SDK, the only changes are base_url and model.
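A minimal sketch in Python, assuming the OpenAI-compatible shape described here (the model IDs and base URL are the ones from this manual; the prompt is illustrative):

```python
import os
from openai import OpenAI

# Point the standard OpenAI SDK at DeepSeek's endpoint.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # keep keys in env vars, not code
    base_url="https://api.deepseek.com/v1",
)

# Sanity check: list available models to confirm the key works.
for model in client.models.list():
    print(model.id)

# A simple chat completion against V4 Flash.
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize prefix caching in one sentence."},
    ],
)
print(response.choices[0].message.content)
```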
OpenAI vs Anthropic-compatible endpoints
DeepSeek exposes the same model behind two API shapes. Pick whichever your existing tooling is wired for — no quality difference between them.
| OpenAI-compatible | Anthropic-compatible | |
|---|---|---|
| Base URL | https://api.deepseek.com/v1 | https://api.deepseek.com/anthropic |
| Endpoint | /chat/completions | /v1/messages |
| SDK | openai | anthropic |
| System prompt | First message with role: system | Top-level system field |
| Tool calls | OpenAI tool-call schema | Anthropic tool-use blocks |
| Best when | You're already using OpenAI SDKs / proxies | You're already using Anthropic SDKs / Claude code paths |
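For the Anthropic-compatible path, a minimal sketch using the anthropic SDK (base URL per the table above; the prompt is illustrative):

```python
import os
from anthropic import Anthropic

# Point the standard Anthropic SDK at DeepSeek's Anthropic-compatible endpoint.
client = Anthropic(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/anthropic",
)

message = client.messages.create(
    model="deepseek-v4-pro",
    max_tokens=1024,
    system="You are a concise assistant.",  # top-level system field, per the table
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences."}],
)
print(message.content[0].text)
```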
Thinking vs non-thinking mode
Both V4 Pro and V4 Flash support a thinking mode where the model emits internal reasoning before the visible answer. You toggle it per-request.
| When | Mode | Why |
|---|---|---|
| Math, proofs, hard reasoning | Thinking | Materially better answers; the cost is more output tokens. |
| Agentic coding (multi-step plan + tool calls) | Thinking | Plan quality improves; mistakes drop. |
| Classification, formatting, simple chat | Non-thinking | Latency and cost win; quality nearly identical. |
| High-throughput pipelines | Non-thinking | Predictable token counts; faster end-to-end. |
Set max_tokens deliberately. With V4 Pro's 384K output cap you have headroom, but you also have a bill.
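A sketch of toggling thinking per request. The thinking.type field is the one this manual describes; with the OpenAI SDK, non-standard fields ride along in extra_body:

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com/v1")

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    # Thinking on for hard reasoning; omit (or set "disabled") for simple tasks.
    extra_body={"thinking": {"type": "enabled"}},
    # Set deliberately: thinking mode emits extra reasoning tokens.
    max_tokens=8192,
)
print(response.choices[0].message.content)
```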
Function calls & JSON mode
Both V4 models support function/tool calls and a strict JSON output mode using OpenAI-compatible schemas.
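A sketch of both features, assuming the OpenAI-compatible schema (get_weather is a hypothetical tool for illustration, not a DeepSeek built-in):

```python
import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com/v1")

# Function calling: declare the tool with an OpenAI-style schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)
# Assuming the model elects to call the tool:
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))

# JSON mode: force a parseable JSON object as output.
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content":
               'Return a JSON object {"sentiment": "positive" | "negative"} for: Great docs!'}],
    response_format={"type": "json_object"},
)
print(json.loads(response.choices[0].message.content))
```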
Context caching (prefix caching)
This is the biggest cost lever in the API. DeepSeek caches the prefix of your input automatically; repeated calls with the same opening prompt pay roughly 1/50 to 1/120 of the cache-miss input rate for matched tokens.
| Model | Cache miss input | Cache hit input | Ratio |
|---|---|---|---|
| V4 Pro (discounted) | $0.435 /M | $0.0036 /M | ~120x cheaper |
| V4 Pro (list) | $1.74 /M | $0.0145 /M | ~120x cheaper |
| V4 Flash | $0.14 /M | $0.0028 /M | ~50x cheaper |
How to maximize hit rate:
- Stable system prompt first. Put your unchanging instructions and reference docs at the top — they get cached.
- Long context = bigger savings. If you're loading a 100K-token reference document on every call, caching turns that from $0.0435/call → $0.00036/call on V4 Pro discounted.
- Avoid leading with timestamps or per-request data. Anything that varies per call breaks the cache prefix. Move dynamic data to the end of the prompt.
- The cache hit price was reduced 10x on 2026-04-26 — DeepSeek is actively encouraging this pattern.
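Putting those rules together, a minimal sketch (reference.txt stands in for any large, stable document):

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com/v1")

SYSTEM_PROMPT = "Answer questions strictly from the reference document below."
REFERENCE_DOC = open("reference.txt").read()  # e.g. a 100K-token document

def ask(question: str) -> str:
    messages = [
        # Identical across calls -> matched by the prefix cache after call one.
        {"role": "system", "content": SYSTEM_PROMPT + "\n\n" + REFERENCE_DOC},
        # Varies per call -> kept after the stable prefix so it can't break it.
        {"role": "user", "content": question},
    ]
    response = client.chat.completions.create(
        model="deepseek-v4-pro", messages=messages
    )
    return response.choices[0].message.content

print(ask("What does section 3 say about data retention?"))
```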
Migrating from V3 / R1 / deepseek-chat / deepseek-reasoner
| Old | New | Action |
|---|---|---|
| deepseek-chat | deepseek-v4-flash (non-thinking) | Update the model string. Behavior is similar; quality is up. |
| deepseek-reasoner | deepseek-v4-flash (thinking enabled) | Update the model string; set thinking.type = "enabled". |
| DeepSeek V3 / V3.1 / V3.2 | deepseek-v4-pro or -flash | Major quality jump, especially on agentic / coding tasks. Re-tune prompts that relied on V3 quirks. |
| DeepSeek R1 | deepseek-v4-pro with thinking enabled | R1's role is now subsumed by V4 Pro's thinking mode — better reasoning, much wider knowledge. |
deepseek-chat and deepseek-reasoner are fully retired after 2026-07-24 15:59 UTC. Migrate before then; pin deepseek-v4-flash or deepseek-v4-pro in your config.
Good prompts have a role (who you want the AI to be), a goal (what good looks like), an audience (who's reading), and a format (how to lay it out). The patterns below are battle-tested templates you can copy.
Use-case library
Agentic coding (V4 Pro's headline strength)
Agentic code task — plan + execute + verify
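An illustrative sketch of the shape (not a published template):

```
Role: autonomous coding agent with access to the repo and a test runner.
Goal: implement <feature> end to end.
Process: (1) write a short plan listing the files you will touch;
(2) implement the change; (3) run the tests and fix any failures;
(4) summarize what changed and list follow-ups.
Constraints: keep the diff minimal; flag any public API change explicitly.
```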
Long-document analysis (using 1M context)
Document Q&A with citations
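An illustrative sketch:

```
Using only the uploaded document, answer the questions below.
For every answer, quote the supporting passage and cite its section or page.
If the document does not answer a question, say so; do not guess.
Questions:
1. <question>
2. <question>
```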
Hard reasoning / math
Step-by-step proof with sanity check
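An illustrative sketch:

```
Prove the statement below. Work step by step, state every assumption, and
number each step. After the proof, sanity-check it: test the claim on two
concrete examples and flag any step that leans on an unproven lemma.
Statement: <statement>
```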
Patterns library
The "spec-then-code" pattern (V4 Pro)
Before asking V4 Pro to write any non-trivial code, ask it to write the spec first. Then ask it to implement against the spec. Quality goes up dramatically because the spec catches missing requirements.
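A minimal sketch of the pattern as two chained API calls (the prompts and example task are illustrative):

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com/v1")

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

task = "A rate limiter: 100 requests/minute per API key, with burst allowance."

# Pass 1: the spec surfaces missing requirements before any code exists.
spec = ask(f"Write a concise spec for: {task}\n"
           "Cover inputs, outputs, edge cases, error handling, and open "
           "questions. Do not write code yet.")

# Pass 2: implement strictly against the spec.
print(ask("Implement this spec in Python. Mark any ambiguity with a TODO "
          f"comment rather than guessing.\n\n{spec}"))
```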
The "two-pass review" pattern
Use V4 Flash for the first pass (cheap, fast, catches obvious issues), then V4 Pro for the second pass (deep, expensive, catches subtle ones).
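A sketch of the pattern (the review prompt and file name are illustrative):

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com/v1")

def review(model: str, code: str, prior_findings: str = "") -> str:
    prompt = ("Review this code for bugs, security issues, and risky patterns.\n"
              + (f"Already flagged, skip these:\n{prior_findings}\n" if prior_findings else "")
              + f"\n{code}")
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

code = open("handler.py").read()
first_pass = review("deepseek-v4-flash", code)             # cheap: obvious issues
second_pass = review("deepseek-v4-pro", code, first_pass)  # deep: subtle ones
print(second_pass)
```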
The "force the disagreement" pattern
V4 Pro can be agreeable. To stress-test a plan, ask it to argue the opposite case, then synthesize.
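An illustrative sketch:

```
Here is my plan: <plan>
First, argue the strongest case AGAINST this plan, as a skeptical reviewer.
Then argue the strongest case FOR it.
Finally, synthesize: what should change in the plan, and what stands?
```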