Worker concurrency + OCR fps

Why concurrency mattered most

generate_asset is LLM-bound, not CPU-bound.

A single package produces 10 generate_asset jobs (one per asset type — title set, description, chapters, tags, linkedin post, x post, x thread, article brief, newsletter summary, short clip plan). Each is a single LLM call. None of them touch any shared mutable state. Serializing them through a single worker slot was leaving ~65 s on the table per package.

Phase	Before (single slot)	After (3 slots)	Savings
generate_asset × 10	~100 s	~35 s	~65 s (~3×)
analyze_intelligence (1 call)	~10 s	~10 s	—
thumbnail_concepts	~7 s	~7 s	—
Total pipeline wall	~3.5–4 min	~2–2.5 min	~30%

How it works

SKIP LOCKED is the only mutex.

No new tables, no semaphores, no in-process locks. The Postgres queue already does the right thing — we just spawn N independent claim loops in the same process and let them race.

The architectural picture

N claim slots → one queue → one provider. The queue is the synchronisation primitive; nothing in the worker process needs a lock.

Why SKIP LOCKED is enough

When three slots call claim() at the same instant, each opens its own transaction and runs the identical SELECT … FOR UPDATE SKIP LOCKED LIMIT 1. The first to reach a row locks it; the others skip past the locked row and grab the next unlocked one. No slot ever blocks, and no two slots ever return the same job.

Same millisecond, three claims, three distinct rows. No retries, no deadlocks, no application-level mutex.

The timeline difference, visualised

BEFORE — single slot (~100 s)

slot 0

203

204

205

206

207

208

209

210

211

212

AFTER — 3 slots (~35 s)

slot 0

203

206

209

212

idle

slot 1

204

207

210

idle

slot 2

205

208

211

idle

Same 10 jobs, same provider, ~3× faster wall. The "idle" tails happen because there are no more generate_asset jobs queued — the slots wait for the next package or transition kind.

The code change

// workers/runner.ts — abridged

async function main() {
  const { kinds, concurrency, … } = parseArgs(process.argv.slice(2));

  // One stale-lock reclaim timer for the whole process
  setInterval(reclaim, 30_000);

  // Spawn N independent slots — SKIP LOCKED handles mutex
  await Promise.all(
    Array.from({ length: concurrency }, (_, slot) =>
      runSlot({ slot, lockedBy, kinds, … })
    )
  );
}

async function runSlot({ slot, kinds, … }) {
  while (!shuttingDown) {
    const job = await claim(kinds, lockedBy);   // SKIP LOCKED inside
    if (!job) { await sleep(idleMs); continue; }
    try {
      await HANDLERS[job.kind](job);
      await complete(job.id);
    } catch (err) { … }
  }
}

Why it's safe

Every handler audited for idempotency.

With N parallel slots, the same handler kind can run concurrently. We checked each handler against the "could two of these stomp on each other?" question.

Handler	Concurrency-safe?	Why
`generate_asset`	✓ yes	Each job writes ONE asset row per (package_id, asset_type). No two parallel jobs ever target the same row.
`analyze_intelligence`	✓ yes	Idempotency key `analyze_intelligence:{sourceId}:{profile}` means at most one runs per source.
`analyze_visual` / `fuse` / `transcribe_audio`	✓ yes	Same — single-source idempotency keys prevent any two from racing on the same source.
`dispatch`	✓ yes	One asset = one dispatch row (idempotency key `dispatch:{assetId}`). The youtube_direct branch reads sibling asset payloads but only runs post-approval, so generate_asset jobs aren't writing concurrently.
`markReadyForReviewIfComplete` (lifecycle helper)	✓ yes	Pure status-update — multiple calls produce the same final state regardless of ordering.
`recomputePackageDispatchState` (lifecycle helper)	✓ yes	Derives the package's dispatch state from the current assets table — order-independent.

Rule for new handlers When adding a new worker kind, set an idempotency key on the enqueue (per CLAUDE.md), make all writes per-row scoped (no read-modify-write on shared JSONB without conflict handling), and you're concurrency-safe by construction.

Tuning

How to raise or lower the slot count.

Default

3 slots. Set in scripts/dev.sh:

WORKER_CONCURRENCY=3  # env var, picked up by --concurrency default

Bump it (more parallelism)

If you see lots of generate_asset jobs queued in the Jobs page:

# env at process start
WORKER_CONCURRENCY=6 pnpm dev:all

# OR via flag
pnpm exec tsx workers/runner.ts --kinds … --concurrency 6

Drop it (hitting provider rate limits)

If you see 429 too many requests in the dispatch log or your provider's dashboard:

WORKER_CONCURRENCY=1  # back to serial — no parallelism

Provider	Reasonable max concurrency	Notes
Anthropic (Claude)	5–10	Default RPM is high enough for a single-creator workload. Bumps available on request.
OpenAI	3–8	Depends on tier. Tier-1 accounts hit 429 around 3–4 concurrent for gpt-4o.
OpenRouter	3–5	Varies by upstream model; conservative default recommended.
Codex CLI (local subprocess)	2–4	Bounded by local CPU more than by API. 3 is comfortable on M-series.
LM Studio (local HTTP)	1–2	Most local servers queue requests internally; bumping concurrency doesn't help and may slow each call.

Future work A per-provider max_concurrent column on llm_providers would let the worker auto-cap based on which provider is serving each call. Not built today — for now, the global WORKER_CONCURRENCY setting applies to all kinds equally.

Side-quest

OCR fps is now profile-aware.

The standard profile drops OCR from 1 fps → 0.5 fps. Premium stays at 1 fps. Halves the OCR subprocess wall on standard content with no measurable quality loss on overlay/lower-third detection. Today this saves time only on its own (since VLM is currently the bottleneck of the Visual phase), but it pairs perfectly with any future VLM-speed work.

`fast_audio_only`

— skipped —

Visual phase doesn't run at all under this profile.

`standard_audio_visual`

0.5 fps

One OCR frame every 2 seconds. Catches overlay text changes; misses sub-2s flashes (rare in talking-head content).

`premium_multimodal`

1 fps

Dense OCR for high-stakes content where a flashed lower-third or single-frame graphic might matter.

Effective wall savings

~30 s

On the standard 7:18 measurement video. Currently bounded by VLM (65 s) — surfaces as direct savings once VLM gets faster too.

// workers/kinds/analyze_visual.ts

const OCR_FPS_BY_PROFILE: Record<string, number> = {
  standard_audio_visual: 0.5,
  premium_multimodal: 1,
};

const ocrFps = OCR_FPS_BY_PROFILE[profile] ?? 0.5;
const ocrFrames = await sampleFrames({
  inputPath: videoPath,
  outputDir: ocrFramesDir,
  fps: ocrFps,
});

End-to-end impact

Where we are now.

Round	What landed	Pipeline wall	Δ from previous
baseline	fps=1 OCR + VLM, sequential	~25–30 min	—
round 1	A + B + C (scene-cut VLM, 768 px downscale, OCR ∥ VLM)	~3.5–4 min	~8×
round 2 (today)	D + E (worker concurrency, OCR 0.5 fps)	~2–2.5 min	+30% on top

Throughput, drawn to scale

Bars are pipeline wall time on a typical 8-minute video, log-spaced so the 25-min baseline and the 2-min current state both stay legible. Lower is better.

Round 1 did the heavy lifting (~8× from the visual rewrite); round 2's concurrency + OCR change shaves a further ~30% off an already-fast pipeline.

Cumulative From ~25–30 min down to ~2–2.5 min — roughly 10–12× faster end-to-end. The pipeline is now in "by the time you switch tabs, your video is ready" territory.

Three slots, one queue, no locks.

generate_asset is LLM-bound, not CPU-bound.

SKIP LOCKED is the only mutex.

The architectural picture

Why SKIP LOCKED is enough

The timeline difference, visualised

The code change

Every handler audited for idempotency.

How to raise or lower the slot count.

Default

Bump it (more parallelism)

Drop it (hitting provider rate limits)

OCR fps is now profile-aware.

fast_audio_only

standard_audio_visual

premium_multimodal

Effective wall savings

Where we are now.

Throughput, drawn to scale

`fast_audio_only`

`standard_audio_visual`

`premium_multimodal`