Two performance changes shipped together: the worker runs N concurrent claim slots (default 3) so the 10 generate_asset jobs per package don't serialize, and the OCR sample rate drops to 0.5 fps for the standard profile (premium stays at 1 fps). Combined effect: pipeline wall time falls from ~3.5–4 min to ~2–2.5 min on a typical 8-minute video.
A single package produces 10 generate_asset jobs (one per asset type — title set, description, chapters, tags, linkedin post, x post, x thread, article brief, newsletter summary, short clip plan). Each is a single LLM call. None of them touch any shared mutable state. Serializing them through a single worker slot was leaving ~65 s on the table per package.
| Phase | Before (single slot) | After (3 slots) | Savings |
|---|---|---|---|
| generate_asset × 10 | ~100 s | ~35 s | ~65 s (~3×) |
| analyze_intelligence (1 call) | ~10 s | ~10 s | — |
| thumbnail_concepts | ~7 s | ~7 s | — |
| Total pipeline wall | ~3.5–4 min | ~2–2.5 min | ~30% |
No new tables, no semaphores, no in-process locks. The Postgres queue already does the right thing — we just spawn N independent claim loops in the same process and let them race.
When three slots call claim() at the same instant, each opens its own transaction and runs the identical SELECT … FOR UPDATE SKIP LOCKED LIMIT 1. The first to reach a row locks it; the others skip past the locked row and grab the next unlocked one. No slot ever blocks, and no two slots ever return the same job.
// workers/runner.ts — abridged async function main() { const { kinds, concurrency, … } = parseArgs(process.argv.slice(2)); // One stale-lock reclaim timer for the whole process setInterval(reclaim, 30_000); // Spawn N independent slots — SKIP LOCKED handles mutex await Promise.all( Array.from({ length: concurrency }, (_, slot) => runSlot({ slot, lockedBy, kinds, … }) ) ); } async function runSlot({ slot, kinds, … }) { while (!shuttingDown) { const job = await claim(kinds, lockedBy); // SKIP LOCKED inside if (!job) { await sleep(idleMs); continue; } try { await HANDLERS[job.kind](job); await complete(job.id); } catch (err) { … } } }
With N parallel slots, the same handler kind can run concurrently. We checked each handler against the "could two of these stomp on each other?" question.
| Handler | Concurrency-safe? | Why |
|---|---|---|
generate_asset |
✓ yes | Each job writes ONE asset row per (package_id, asset_type). No two parallel jobs ever target the same row. |
analyze_intelligence |
✓ yes | Idempotency key analyze_intelligence:{sourceId}:{profile} means at most one runs per source. |
analyze_visual / fuse / transcribe_audio |
✓ yes | Same — single-source idempotency keys prevent any two from racing on the same source. |
dispatch |
✓ yes | One asset = one dispatch row (idempotency key dispatch:{assetId}). The youtube_direct branch reads sibling asset payloads but only runs post-approval, so generate_asset jobs aren't writing concurrently. |
markReadyForReviewIfComplete (lifecycle helper) |
✓ yes | Pure status-update — multiple calls produce the same final state regardless of ordering. |
recomputePackageDispatchState (lifecycle helper) |
✓ yes | Derives the package's dispatch state from the current assets table — order-independent. |
3 slots. Set in scripts/dev.sh:
WORKER_CONCURRENCY=3 # env var, picked up by --concurrency default
If you see lots of generate_asset jobs queued in the Jobs page:
# env at process start WORKER_CONCURRENCY=6 pnpm dev:all # OR via flag pnpm exec tsx workers/runner.ts --kinds … --concurrency 6
If you see 429 too many requests in the dispatch log or your provider's dashboard:
WORKER_CONCURRENCY=1 # back to serial — no parallelism
| Provider | Reasonable max concurrency | Notes |
|---|---|---|
| Anthropic (Claude) | 5–10 | Default RPM is high enough for a single-creator workload. Bumps available on request. |
| OpenAI | 3–8 | Depends on tier. Tier-1 accounts hit 429 around 3–4 concurrent for gpt-4o. |
| OpenRouter | 3–5 | Varies by upstream model; conservative default recommended. |
| Codex CLI (local subprocess) | 2–4 | Bounded by local CPU more than by API. 3 is comfortable on M-series. |
| LM Studio (local HTTP) | 1–2 | Most local servers queue requests internally; bumping concurrency doesn't help and may slow each call. |
max_concurrent column on llm_providers would let the worker auto-cap based on which provider is serving each call. Not built today — for now, the global WORKER_CONCURRENCY setting applies to all kinds equally.
The standard profile drops OCR from 1 fps → 0.5 fps. Premium stays at 1 fps. Halves the OCR subprocess wall on standard content with no measurable quality loss on overlay/lower-third detection. Today this saves time only on its own (since VLM is currently the bottleneck of the Visual phase), but it pairs perfectly with any future VLM-speed work.
fast_audio_onlystandard_audio_visualpremium_multimodal// workers/kinds/analyze_visual.ts const OCR_FPS_BY_PROFILE: Record<string, number> = { standard_audio_visual: 0.5, premium_multimodal: 1, }; const ocrFps = OCR_FPS_BY_PROFILE[profile] ?? 0.5; const ocrFrames = await sampleFrames({ inputPath: videoPath, outputDir: ocrFramesDir, fps: ocrFps, });
| Round | What landed | Pipeline wall | Δ from previous |
|---|---|---|---|
| baseline | fps=1 OCR + VLM, sequential | ~25–30 min | — |
| round 1 | A + B + C (scene-cut VLM, 768 px downscale, OCR ∥ VLM) | ~3.5–4 min | ~8× |
| round 2 (today) | D + E (worker concurrency, OCR 0.5 fps) | ~2–2.5 min | +30% on top |
Bars are pipeline wall time on a typical 8-minute video, log-spaced so the 25-min baseline and the 2-min current state both stay legible. Lower is better.