<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Anthropic on MailMiner Agent Blog</title><link>https://mailmineragent.com/tags/anthropic/</link><description>Recent content in Anthropic on MailMiner Agent Blog</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 27 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://mailmineragent.com/tags/anthropic/index.xml" rel="self" type="application/rss+xml"/><item><title>Why ClaudeCode / OpenCode + DeepSeek Cannot Unlock DeepSeek's Ultra-Low Cache Discounts</title><link>https://mailmineragent.com/posts/why-claudecode-opencode-deepseek-cache-mismatch/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://mailmineragent.com/posts/why-claudecode-opencode-deepseek-cache-mismatch/</guid><description>A critical architecture mismatch between segmented cache_control agents and strict full-prefix automatic caching — and why mixing these stacks wastes your biggest cost-saving feature.</description><content:encoded><![CDATA[<h2 id="introduction">Introduction</h2>
<p>DeepSeek&rsquo;s disk-based automatic context caching is famous for <strong>near 90% input token savings</strong>: cached prefix tokens cost just a tiny fraction of standard input pricing, with zero manual configuration required. Thousands of developers switch to DeepSeek chasing this aggressive discount for long system prompts, code rules, and repeated tool definitions.</p>
<p>But a costly reality hits teams running <strong>ClaudeCode / OpenCode (code agent runtimes built for Anthropic-style <code>cache_control</code>)</strong> against the DeepSeek API:</p>
<blockquote>
<p>Even with DeepSeek caching enabled globally, your cache hit rate collapses to near-zero, and you never see the promised ultra-low cached token billing.</p>
</blockquote>
<p>This is not a bug, nor misconfiguration. It is a fundamental <strong>architectural incompatibility</strong> between two entirely different caching paradigms: Anthropic&rsquo;s manual segmented block caching, and DeepSeek&rsquo;s rigid full-sequence prefix-only matching.</p>
<p>In this post, we break down the mechanics, agent workflow pain points, and why mixing these stacks wastes your biggest cost-saving feature.</p>
<hr>
<h2 id="1-core-background-how-each-caching-system-works">1. Core Background: How Each Caching System Works</h2>
<h3 id="11-deepseek-automatic-prefix-cache-strict-rule-set">1.1 DeepSeek Automatic Prefix Cache (Strict Rule Set)</h3>
<p>DeepSeek enables caching for all API keys by default, with one non-negotiable matching rule:</p>
<p>✅ <strong>A cache hit only triggers when the full token sequence starts identical from index <code>0</code> (the very first token).</strong></p>
<ul>
<li>The entire <code>messages[]</code> array must be an exact prefix extension: new content can only be <strong>appended to the END</strong> of the list.</li>
<li>Any insertion, deletion, or content change <em>anywhere before the final position</em> breaks the full prefix hash → <strong>full cache miss</strong>.</li>
<li>No manual tagging, no custom breakpoints, no separate cache segments; the entire message chain is treated as one single prefix unit.</li>
<li>Pricing benefit: Miss = full standard input cost; Hit = ultra-low discounted rate for matched prefix tokens.</li>
</ul>
<h3 id="12-anthropic-cache_control-segmented-block-caching-what-opencode--claudecode-relies-on">1.2 Anthropic <code>cache_control</code> Segmented Block Caching (What OpenCode / ClaudeCode Relies On)</h3>
<p>Anthropic designed <code>cache_control</code> explicitly for dynamic agent workflows:</p>
<ul>
<li>Developers add a <code>cache_control</code> tag inside <strong>individual content blocks</strong> (system prompts, tool definitions, static rule chunks) to create independent cache segments.</li>
<li>Up to four isolated cache breakpoints per request; each segment has its own TTL (<code>ephemeral</code> / <code>long_lived</code>) and independent storage.</li>
<li>Critical advantage: <strong>Modifications to later blocks do NOT invalidate earlier cached segments</strong>. If you insert <code>tool_use</code> / <code>tool_result</code> messages between marked static blocks, the pre-tagged system/tool definitions remain cached at discounted pricing.</li>
</ul>
<p>OpenCode / ClaudeCode are hardcoded to inject these <code>cache_control</code> markers automatically for long system rules, code guidelines, and tool schemas — this is their core cost-optimization logic for multi-turn code agents.</p>
<h3 id="the-first-hard-block-deepseek-ignores-cache_control-entirely">The First Hard Block: DeepSeek Ignores <code>cache_control</code> Entirely</h3>
<p>When OpenCode sends requests with embedded <code>cache_control</code> fields:</p>
<ol>
<li>DeepSeek&rsquo;s API silently <strong>drops the unknown field</strong> and does not parse any manual segment tags.</li>
<li>No independent blocks are created; the entire <code>messages</code> list is still evaluated as one single full prefix.</li>
<li>LiteLLM / proxy gateways also strip the field before forwarding to avoid invalid parameter errors.</li>
</ol>
<p>Your agent&rsquo;s intelligent segmented caching logic becomes completely invisible to DeepSeek.</p>
<hr>
<h2 id="2-why-code-agent-workflows-destroy-deepseeks-prefix-match">2. Why Code Agent Workflows Destroy DeepSeek&rsquo;s Prefix Match</h2>
<p>Standard code agents (ClaudeCode / OpenCode) run a repeating loop that <strong>guarantees middle-position message insertion</strong> — the exact scenario that breaks full-prefix caching.</p>
<h3 id="step-by-step-agent-loop-breakdown">Step-by-Step Agent Loop Breakdown</h3>
<ol>
<li>
<p><strong>Initial request:</strong>
<code>[SystemPrompt (code rules) → User task]</code>
DeepSeek caches this full 2-block prefix after first call.</p>
</li>
<li>
<p>Model returns <code>assistant</code> with <code>tool_calls</code> (file read, shell run, code edit).</p>
</li>
<li>
<p><strong>Critical breaking step</strong>: Your agent appends a standalone <code>role: tool</code> message <strong>between the last assistant and the next user message</strong>, not only at the list tail.</p>
<p>New full sequence:
<code>[System → User → Assistant(tool_call) → ToolResult]</code></p>
</li>
</ol>
<h4 id="what-happens-on-deepseek-side">What happens on DeepSeek side:</h4>
<ul>
<li>The original cached prefix was length <code>3</code> items; new request has <code>4</code> items total.</li>
<li>Even though the first three messages are textually identical, the <strong>overall sequence length and array structure differ</strong> from the stored prefix hash.</li>
<li>Result: <strong>100% cache miss</strong>; you pay full price for the entire long system prompt every round.</li>
</ul>
<h3 id="additional-failure-modes-in-multi-agent-setups">Additional Failure Modes in Multi-Agent Setups</h3>
<p>Most code platforms run <strong>multiple specialized agents</strong>, each with its own unique system prompt:</p>
<ul>
<li>Agent A: Code writer system rules</li>
<li>Agent B: Linter &amp; reviewer system rules</li>
<li>Agent C: Shell executor rules</li>
</ul>
<p>With DeepSeek prefix caching:</p>
<ul>
<li>Every unique system prompt creates a separate cache entry.</li>
<li>No cross-agent sharing of common content (shared tool definitions, global coding constraints), because the starting <code>system</code> block differs per agent.</li>
<li>Cache storage fills rapidly with fragmented entries; LRU eviction purges frequently used static prompts, worsening miss rates further.</li>
</ul>
<h3 id="dynamic-variables-inside-system-prompts-kill-consistency">Dynamic Variables Inside System Prompts Kill Consistency</h3>
<p>OpenCode commonly injects real-time variables into system prompts:</p>
<ul>
<li>Current date (resets daily)</li>
<li>Project working directory / file paths (switches per workspace)</li>
</ul>
<p>Even minor text changes at the <strong>start of the system block</strong> rewrite the full prefix hash. DeepSeek cannot isolate the fixed rule portion; the entire thousands-of-token prompt misses cache overnight or on workspace switch.</p>
<blockquote>
<p>With Anthropic segmented caching: only the small dynamic date/path segment re-runs the write premium; massive static code rules stay cached daily.</p>
</blockquote>
<hr>
<h2 id="3-the-cost-gap-real-world-comparison">3. The Cost Gap: Real-World Comparison</h2>
<h3 id="scenario-15-turn-code-agent-run--12k-token-static-system-prompt">Scenario: 15-turn code agent run | 12k-token static system prompt</h3>
<h4 id="a-native-claude--cache_control">A) Native Claude + <code>cache_control</code></h4>
<ul>
<li>1x write premium for system/tool blocks</li>
<li>Next 14 rounds: static segments hit 10% discounted read pricing</li>
<li>Total input cost: ~2.15 × base price</li>
</ul>
<h4 id="b-opencode--deepseek-default-deployment">B) OpenCode + DeepSeek (default deployment)</h4>
<ul>
<li>Every tool insertion = full cache miss on all turns</li>
<li>You pay full standard input cost for the 12k system prompt <strong>15 times consecutively</strong></li>
<li>Total input cost: 15 × base price → <strong>~7x more expensive</strong> than expected DeepSeek discount</li>
</ul>
<h4 id="c-pure-deepseek-simple-chat-only-tail-appended-messages">C) Pure DeepSeek simple chat (only tail-appended messages)</h4>
<ul>
<li>Stable full prefix hit every turn</li>
<li>Total input cost: ~1.8 × base price (maxed DeepSeek discount)</li>
</ul>
<p>The agent workflow eliminates all DeepSeek economic benefits entirely.</p>
<hr>
<h2 id="4-can-we-fix-this-with-workarounds">4. Can We Fix This With Workarounds?</h2>
<h3 id="workaround-1-remove-all-cache_control-injection">Workaround 1: Remove all <code>cache_control</code> injection</h3>
<p>Disabling automatic tagging makes requests valid for DeepSeek, but does <strong>not solve the core prefix-break issue</strong> during tool calls. Hit rates remain extremely low.</p>
<h3 id="workaround-2-force-all-dynamic-content-to-the-very-end-of-messages">Workaround 2: Force all dynamic content to the very end of <code>messages[]</code></h3>
<p>Move dates, paths, and variable data strictly after all static system rules and history. This slightly improves hit rates for simple chats, but <strong>cannot fix middle <code>tool</code> message insertion</strong> in agent loops.</p>
<h3 id="workaround-3-pre-warm-fixed-prefixes">Workaround 3: Pre-warm fixed prefixes</h3>
<p>Pre-send requests for all agent system templates to populate cache ahead of traffic. This helps static one-off calls but fails for tool loops, as insertion still invalidates matches.</p>
<h3 id="hard-truth">Hard Truth</h3>
<p>There is <strong>no reliable workaround</strong> to make Anthropic-style segmented agents work with DeepSeek full-prefix caching. The two systems have opposing design constraints.</p>
<hr>
<h2 id="5-two-valid-deployment-options">5. Two Valid Deployment Options</h2>
<h3 id="option-a-keep-opencode--claudecode--use-anthropic--minimax-natively">Option A: Keep OpenCode / ClaudeCode → Use Anthropic / MiniMax natively</h3>
<p>These models natively support <code>cache_control</code> block segmentation. Tool insertions only affect variable segments; static system/tool definitions retain discounted reads. This matches your agent runtime&rsquo;s built-in optimization logic perfectly.</p>
<h3 id="option-b-keep-deepseek--rewrite-agent-logic-for-strict-full-prefix-workflow">Option B: Keep DeepSeek → Rewrite agent logic for strict full-prefix workflow</h3>
<p>Mandate these rules for your agent:</p>
<ol>
<li>Never insert <code>tool</code> messages anywhere except the absolute end of the message array.</li>
<li>Freeze the full system prompt structure; avoid dynamic dates/paths inside the leading system block.</li>
<li>Disable all <code>cache_control</code> injection in OpenCode.</li>
</ol>
<p>This enables DeepSeek&rsquo;s automatic caching, but sacrifices flexible multi-agent &amp; complex tool workflows.</p>
<h3 id="never-recommend-hybrid-opencode--deepseek">Never Recommend: Hybrid <code>OpenCode + DeepSeek</code></h3>
<p>It combines the overhead of an agent built for segmented caching with a model that cannot honor that logic — you pay double engineering cost with zero discount gains.</p>
<hr>
<h2 id="conclusion">Conclusion</h2>
<p>DeepSeek&rsquo;s automatic prefix caching delivers industry-leading savings <strong>only for simple sequential conversations where new messages are exclusively appended to the end</strong>.</p>
<p>Runtimes like ClaudeCode / OpenCode are engineered around Anthropic&rsquo;s flexible <code>cache_control</code> block tagging, designed for dynamic agent loops with mid-sequence tool message insertion. When paired with DeepSeek:</p>
<ol>
<li><code>cache_control</code> tags are ignored; no segmented caching occurs.</li>
<li>Tool result insertion breaks the mandatory full-start prefix match every turn.</li>
<li>Cache hit rates plummet, and you never receive DeepSeek&rsquo;s advertised ultra-low cached token pricing.</li>
</ol>
<p>Choose your stack based on caching architecture, not just per-token sticker price:</p>
<ul>
<li><strong>Complex code agents with frequent tool calls</strong> → Anthropic / MiniMax (<code>cache_control</code> native support)</li>
<li><strong>Simple long-chat workloads with fixed trailing history</strong> → DeepSeek automatic prefix cache</li>
</ul>
<hr>
<h3 id="author-note">Author Note</h3>
<p>If you audit your API usage dashboards and see <code>prompt_cache_hit_tokens</code> near zero despite enabling DeepSeek caching, this architecture mismatch is almost certainly the root cause.</p>
]]></content:encoded></item></channel></rss>