Author’s note (February 2026): The Batch API architecture described in this article worked reliably for the initial vault processing (782 files, 100% success rate). However, in ongoing production use for new video processing, the Batch API proved unreliable — 4+ hour completion times with no per-item progress, no cancellation support, and opaque failures. During a later migration of this project, the Batch API was replaced with asyncio.TaskGroup parallel processing, reducing batch times from hours to minutes with per-item WebSocket progress and individual cancellation. The engineering lessons in this article — progressive scale testing, the indexing bug discovery, the dual-mode routing pattern — remain valid regardless of the underlying API.
782 files. 8 batches. 25 minutes. One indexing bug that nearly ruined everything.
By the Dotzlaw Team
The Moment of Truth
We submitted 782 files to Anthropic’s Batch API across 8 batches. Twenty-five minutes later: 100% success rate, 50% cost savings, every file processed and matched back to its source.
But getting there required catching a bug that would have silently corrupted every result.
This is the story of building a dual-mode API architecture that automatically chooses between real-time and batch processing — and the progressive testing strategy that saved us from shipping broken data to production.
The Deal: 50% Off
Anthropic’s Batch API offers a straightforward trade: accept asynchronous processing (up to 24 hours, though usually around 30 minutes), and pay half price.
| Feature | Standard API | Batch API |
|---|---|---|
| Cost | Full price | 50% discount |
| Processing | Immediate | Within 24 hours (usually ~30 min) |
| Rate limits | Per-minute throttling | Up to 100,000 requests per batch |
| Timeout | 10 minutes | No timeout |
| Use case | Interactive | Background processing |
For a user waiting on a single video transcript, you need real-time results. But for batch processing — overnight jobs, bulk content cleanup, vault-wide curation — there is no reason to pay full price.
We built our system to use both, automatically.
Dual-Mode Architecture
The architecture makes the decision for you. One video? Synchronous API, instant results, full price. Two or more? Batch API, asynchronous processing, half the cost. The user never has to think about it.
Figure 1 — Dual-mode architecture: the system automatically routes single files to the real-time API and batches of 2+ files to the Batch API, automating the cost-saving decision.
The synchronous path is a standard Anthropic API call — nothing special. The interesting engineering is all on the batch side.
KEY INSIGHT: Let the system choose sync vs. batch automatically based on workload size. Users get the best price without making infrastructure decisions.
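The routing rule itself is tiny. A minimal sketch of the decision (the function name is hypothetical; in the real system this wraps the actual sync and batch clients):

```python
def route_workload(files: list[str]) -> str:
    """Choose the processing mode from workload size alone.

    One file -> synchronous API (instant, full price).
    Two or more -> Batch API (asynchronous, 50% discount).
    The caller never makes this decision manually.
    """
    return "sync" if len(files) == 1 else "batch"
```

Because the rule lives in one place, changing the threshold later (say, batching only at 10+ files) is a one-line edit rather than a hunt through call sites.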
The Batch Workflow
The Batch API follows a four-phase cycle: prepare, submit, poll, retrieve.
Figure 2 — The batch lifecycle and the custom_id: each request cycles through prepare, submit, poll, and retrieve, with the custom_id encoding both position index and filename for traceability.
Prepare — Each file gets packaged into a JSONL request with a custom_id that encodes both its position and its name. This ID is the only thread connecting a result back to its source file, so we made it informative:
file_00042_Building_RAG_Systems

That format — a zero-padded global index plus a sanitized filename — means that when something goes wrong at position 42, you know immediately which file is affected without cross-referencing a lookup table.
KEY INSIGHT: Your custom_id format is your debugging lifeline. Encode enough information to diagnose problems without needing to consult external mappings.
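A sketch of how such an ID can be built and parsed. The sanitization rule here (collapse non-alphanumerics to underscores) is an illustrative assumption, not the exact production code:

```python
import re

def make_custom_id(global_index: int, filename: str) -> str:
    """Encode a zero-padded global index plus a sanitized filename."""
    stem = filename.rsplit(".", 1)[0]                      # drop the extension
    stem = re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_")  # sanitize for the API
    return f"file_{global_index:05d}_{stem}"

def parse_custom_id(custom_id: str) -> tuple[int, str]:
    """Recover the global index and filename stem from a custom_id."""
    _prefix, index, stem = custom_id.split("_", 2)
    return int(index), stem
```

The round trip is what matters: a result that comes back with this ID can be traced to its source file with no external state.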
Submit — One API call per batch. Anthropic returns a batch ID immediately and queues the work. For 782 files, we split into 8 batches of up to 100 requests each.
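Splitting 782 requests into batches of at most 100 is a simple chunking step. A sketch (each chunk would then be submitted in one API call — in the Anthropic Python SDK that is the batches-create endpoint, though the exact call is outside this snippet):

```python
def chunk_requests(requests: list, batch_size: int = 100) -> list[list]:
    """Split a flat request list into batches of at most batch_size items."""
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]
```

For 782 requests this yields 7 full batches of 100 plus a final batch of 82 — the 8 batches from the production run.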
Poll — Check the batch status every 30 seconds. The API reports how many requests have succeeded, how many are still processing, and how many have errored. Our batches typically completed in 5 to 15 minutes each.
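The polling loop for multiple batches can be sketched as below. Here get_status is a hypothetical stand-in for the SDK's batch-retrieve call, assumed to return a dict whose processing_status becomes "ended" when the batch finishes; the sleep parameter is injectable so the loop is testable:

```python
import time

def poll_until_done(batch_ids, get_status, interval: int = 30, sleep=time.sleep):
    """Poll every `interval` seconds until every batch reports 'ended'.

    get_status(batch_id) -> {"processing_status": "in_progress" | "ended", ...}
    (a simplified assumption about the status payload).
    """
    pending = set(batch_ids)
    while pending:
        for batch_id in sorted(pending):
            if get_status(batch_id)["processing_status"] == "ended":
                pending.discard(batch_id)  # stop polling finished batches
        if pending:
            sleep(interval)  # the article's cadence: every 30 seconds
```

Polling all batch IDs in one loop (rather than one batch at a time) is what lets the 8 batches complete in parallel on Anthropic's side while the client simply waits.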
Retrieve — Stream the results back and match each one to its source file using the custom_id. Apply the generated content. Log any individual failures (batch requests can fail independently — always design for partial failure).
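Matching results back is where the custom_id pays off. A simplified sketch — the (custom_id, status, content) tuples stand in for the SDK's richer result objects, and the metadata dict shape is an assumption:

```python
def apply_results(results, metadata):
    """Match each batch result to its source file via custom_id.

    results:  iterable of (custom_id, status, content) triples.
    metadata: custom_id -> {"path": ...} mapping built at prepare time.

    Failures are collected, never raised: batch requests fail
    independently, so one bad item must not abort the other 99.
    """
    applied, failed = {}, []
    for custom_id, status, content in results:
        if status != "succeeded":
            failed.append(custom_id)  # log and continue: partial failure by design
            continue
        applied[metadata[custom_id]["path"]] = content
    return applied, failed
```

Returning the failures alongside the successes makes retry logic straightforward: re-submit just the failed custom_ids as a new, smaller batch.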
The multi-batch challenge is worth highlighting. When your job exceeds a single batch, you need to submit multiple batches, track their IDs, poll them in parallel, and merge the results. The workflow scales linearly, but the bookkeeping gets more complex — and that complexity is exactly where our worst bug was hiding.
The Bug That Almost Ruined Everything
Here is what was supposed to happen: 122 files split across two batches, each file tagged with a globally unique custom_id, results matched back perfectly.
Here is what actually happened.
The first batch processed files 0 through 99. The second batch processed files 100 through 121. When results came back, we matched each custom_id to our metadata dictionary to find the corresponding source file. Batch 1 worked perfectly. Batch 2 returned garbage.
Not errors. Not failures. Wrong files.
The results for file 100 were being applied to file 0. File 101’s content overwrote file 1. Every single result from the second batch was silently landing on the wrong file.
Figure 3 — The silent killer: the per-batch index reset caused File 100’s data to overwrite File 0. No errors were thrown — results silently landed on the wrong files.
The cause: our metadata dictionary was being rebuilt per-batch instead of maintained globally. When batch 2 was prepared, it reset its internal counter to zero. The custom_id said file_00100, but the metadata lookup table only knew about indices 0 through 21. The zero-padded index in the custom_id was correct, but the dictionary it mapped into was scoped to the wrong batch.
The fix was conceptually simple — build the metadata dictionary once across all batches using global indices, not per-batch local indices. But the implications of missing it were severe. If we had run this on the full 782-file production job without catching it first, every file after the 100th would have received the wrong content. No errors. No warnings. Just silently corrupted data across the entire vault.
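The shape of the fix can be sketched in a few lines. The custom_id here is simplified to the index alone (the real IDs also carry the filename), and the helper name is illustrative:

```python
def build_metadata(all_files: list[str]) -> dict[str, str]:
    """Build the custom_id -> source file mapping ONCE, over ALL files.

    The bug: rebuilding this dictionary per batch reset enumeration to
    zero, so batch 2's file_00100 looked up slot 0 in a 22-entry table.
    Enumerating globally, before any batch is cut, makes the zero-padded
    index in the custom_id and the dictionary key the same number.
    """
    return {f"file_{i:05d}": path for i, path in enumerate(all_files)}
```

The invariant worth asserting in production: the metadata dictionary must contain every custom_id you are about to submit, across all batches, before the first submission goes out.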
We caught it because we tested at 122 files before going to 782. That was not an accident.
The Unicode Surprise
The second bug was less dangerous but more baffling.
Figure 4 — The Unicode surprise: an emoji in a filename crashed the entire batch run on Windows. The processing was fine — the logging killed it.
During a test run on Windows, the batch processing crashed mid-flight. Not a logic error, not an API failure — a console encoding error. Some of our Obsidian note filenames contained emoji characters. When the progress logger tried to print those filenames to the Windows console, Python’s default encoding choked and threw an exception.
The processing itself was fine. The API calls were fine. The results were fine. But the logging killed the entire run because an uncaught encoding error propagated up and terminated the process.
The fix was a few lines of encoding-safe output handling. But without progressive testing, we would have discovered this bug in the middle of an 8-batch production run, potentially with some batches completed and others abandoned — a messy partial state to recover from.
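Those few lines look roughly like this — a print wrapper that degrades unencodable characters instead of letting the exception kill the run (the function name is illustrative; the fallback encoding choice is an assumption):

```python
import sys

def safe_log(message: str) -> None:
    """Print without letting a console encoding error abort the batch.

    Windows consoles may use a code page that cannot represent emoji in
    filenames. Rather than crash, replace unencodable characters with
    '?' and keep processing.
    """
    try:
        print(message)
    except UnicodeEncodeError:
        encoding = sys.stdout.encoding or "ascii"
        print(message.encode(encoding, errors="replace").decode(encoding))
```

The lesson generalizes: logging is part of the pipeline, and an uncaught exception in a progress message is just as fatal as one in the processing code.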
Progressive Testing
Both of those bugs — the silent data corruption and the console crash — were caught because we never ran our first batch on production data.
Figure 5 — Progressive testing: we tested at 122 files specifically to break the batch boundary. That decision saved the data.
| Run | Items | Purpose |
|---|---|---|
| 1 | 2 | Validate the pipeline works end-to-end |
| 2 | 6 | Test edge cases (short files, special characters) |
| 3 | 52 | First real folder at moderate scale |
| 4 | 72 | Parallel execution test |
| 5 | 122 | Multi-batch boundary test (caught the index bug) |
| 6 | 782 | Full production run |
Each escalation level was chosen deliberately. Run 5 at 122 items specifically targeted the multi-batch boundary — and that is exactly where the index mismatch surfaced. If we had jumped from 6 files straight to 782, we would have shipped corrupted data.
KEY INSIGHT: Progressive testing is non-negotiable for batch operations. You cannot inspect 782 results by hand. Test at every boundary where the system’s behavior changes.
Figure 6 — Artifacts of intelligence: the old note (left) had flat tags like “AI, python, pydanticAI” and a basic description. The new note (right) has hierarchical tags (ai/agents/frameworks, coding/languages/python), a comprehensive AI-generated description, and automatic bidirectional links to semantically related notes.
Figure 7 — Operationalizing the workflow: the production dashboard combines ingest (paste a URL), human-in-the-loop review (approve or reject suggested tags and topics), and semantic linking (auto-generated related notes) into a single interface.
The Numbers
Real numbers from our production run processing 1,028 files with Claude Haiku 3.5:
Figure 8 — The economics of intelligence: 50% savings on Haiku are modest in absolute terms, but the principle scales dramatically with more expensive models like Sonnet and Opus.
| Approach | Estimated Cost |
|---|---|
| Standard API | ~$3.00 |
| Batch API | ~$1.50 |
| Savings | $1.50 (50%) |
For a one-time vault cleanup, saving $1.50 is not life-changing. But this system processes new content continuously. At a pace of 100 videos per week, the savings compound:
| Timeframe | Standard API | Batch API | Savings |
|---|---|---|---|
| Weekly | $0.50 | $0.25 | $0.25 |
| Monthly | $2.00 | $1.00 | $1.00 |
| Yearly | $24.00 | $12.00 | $12.00 |
The absolute numbers are modest because Haiku is already cheap. The principle scales: if you are using Sonnet or Opus for heavier processing tasks, those 50% savings become substantial fast. The Batch API makes ongoing AI processing economically viable — especially for personal projects where every dollar matters.
The Series
This is Part 2 of a 5-part series on building an AI-powered knowledge management system:
- From YouTube to Knowledge Graph — Turning 1,000+ videos into an interconnected knowledge base for $1.50
- Anthropic Batch API in Production (this article) — 50% cost savings at scale, and the bug that almost corrupted everything
- Building a Semantic Note Network — Vector search turned 1,024 isolated notes into a dense knowledge graph
- Obsidian Vault Curation at Scale — Three years of tag chaos, fixed in 30 minutes for $1.50
- Ask Your Vault Anything — A RAG chatbot that answers from your notes in 2.5 seconds
Next: Building a Semantic Note Network — What happens when you teach a vector database to find connections humans would miss