LLM routing

Which LLM for which task.

Every ChannelHelm LLM call — the background pipeline and the four synchronous Studio call sites — funnels through one resolver and routes by processing_profile. One decision per tier picks the model for everything below it. This is the field guide.

Three provider classes

Everything plugs in the same way.

Each row in the llm_providers table is one of these three types. The provider abstraction (a single chat() method) is identical across all of them — DojoClaw-style.

openai-compatible

The HTTP shape POST /v1/chat/completions spoken by OpenAI, OpenRouter, Ollama, LM Studio, OpenClaw, vLLM, and most local stacks. Set a base URL + (optional) key + model name — done.

  • Cloud: OpenAI · OpenRouter
  • Local: LM Studio · Ollama · vLLM · OpenClaw
  • Best for: the bulk of the pipeline — fast iteration, cheap local runs, easy model swaps.

anthropic

Native Claude Messages API. Used directly (not via the OpenAI shape) so you get full feature parity with Sonnet/Opus — long context, thinking budgets where applicable, top-tier instruction following.

  • Models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5
  • Best for: headline copy, article briefs, anything where prose quality matters most.
  • Premium tier: the obvious pick for premium_multimodal.

codex-cli

Spawns the local codex CLI as a subprocess (no HTTP, no API key). Useful as a free local fallback or when you want offline operation. Slower than HTTP — best for non-interactive jobs.

  • Model: whatever Codex CLI is configured for
  • Best for: background pipeline runs on a Mac with no cloud quota.
  • Avoid for: Studio Regenerate (the operator is waiting on it).
How the pick happens

One resolver, every call site.

Every LLM call — wherever it originates — goes through complete({ profile, … }) in workers/integrations/lm_studio.ts, which calls getProvider(profile) in workers/integrations/llm/get_provider.ts. The selection rule (#17, locked) is below — eligibility is strict, scoring is deterministic.

Where the calls come from

Six call sites, two execution contexts, one resolver. The background pipeline enqueues jobs; the Studio runs the LLM synchronously inside a Server Action (the documented carve-out in CLAUDE.md) so the operator isn't waiting on a worker to be alive.

LLM call sites converging on one resolver Background pipeline · worker analyze_intelligence scene log → topics · hooks · retention generate_asset × 10 titles · desc · tags · social · clip plans thumbnail_concepts concept + image-gen prompt Studio · synchronous Server Action regenerateAsset() generateSection() generateClipDescription() the one resolver complete({ profile }) → getProvider(profile) → selectProvider(rows, …) → provider.chat() + provenance openai-compatible anthropic codex-cli
Six call sites · one complete() · one getProvider() · three provider classes. The Studio's three sync Server Actions are the documented "LLM in a Server Action" carve-out — bounded, text-only, operator-waiting.
Provider resolution decision flow entry getProvider(profile) loads enabled rows, ordered rows exist? no yes selectProvider(rows, purpose) — score, then pick the highest ① exact purpose match r.purpose === profile · score 3 3 ② purpose = "all" a universal provider for every profile · score 2 2 ③ is_default flag last-resort cross-profile fallback · score 1 1 ✗ unrelated profile — never eligible premium_multimodal won't serve standard_audio_visual tiebreak — is_default DESC, then id ASC deterministic; no flaky test runs any eligible? fallback envDefaultConfig OPENCLAW / LM_STUDIO_* env no resolved provider.chat(messages) + provenance{ provider, type, model, host, prompt_version } yes first run · table empty → seedDefaultProviderIfEmpty() inserts an LM Studio row + uses envDefault
selectProvider is a pure, exported, unit-tested function. The env fallback (and one-time seed) is what made the "zero-config first run" work before you added any providers.
Practical implication: if you want a single model to handle everything, set its purpose to all. If you want a different model per tier, create three rows with purposes fast_audio_only / standard_audio_visual / premium_multimodal. A row tagged with one specific profile will only serve that profile — it won't quietly bleed into others.
Two categories, one table

Text providers for chat. Image providers for thumbnails.

The llm_providers table carries a category column: text (chat/LLM — everything above, the default) and image (text-to-image, for AI thumbnail generation). Same table, same /providers editor (a Text/Image toggle), encrypted keys at rest — but two disjoint resolvers that never cross. An image provider is never picked for chat; an LLM is never picked to paint a thumbnail.

The category fork — one table, two resolvers, no crossing one source llm_providers rows split by category column category='text' category='image' LLM resolver getProvider(profile) selectProvider · text rows only analyze_intelligence · generate_asset Studio Regenerate · Generate · clip desc image resolver getImageProvider(profile) selectProvider · image rows only thumbnail_concepts worker Runware → AI thumbnail images null → frame extraction fallback when no image provider they never cross
Two resolvers over the same table, filtered by category. getImageProvider reuses the exact same selectProvider() scoring as the LLM path — only the candidate rows differ. is_default is scoped per category.

How image resolution works

getImageProvider(purpose) loads only category='image' rows, then runs the same selectProvider() used for LLMs: exact purpose match (3) → all (2) → is_default (1), tiebreak is_default DESC then id ASC. So a per-profile image provider works identically to a per-profile LLM. Returns null when no image row exists.

  • Resolver: workers/integrations/image/get_image_provider.ts
  • Scoring: the shared selectProvider() from the LLM path
  • Never crosses: getProvider filters category='text'; getImageProvider filters category='image'

runware

The first (and today only) image provider type. Text-to-image over Runware's HTTP REST API (fetch, no SDK), serving Flux / Z-Image models like runware:z-image@turbo. Add it at /providers with the Image-category Runware preset.

  • Base URL: https://api.runware.ai/v1
  • Model: runware:z-image@turbo (or runware:100@1, …)
  • Call site: the thumbnail_concepts worker — the only image-provider consumer
Optional by design: with no image provider configured, thumbnail_concepts falls back to frame extraction — stills pulled from the video at the best hook timestamps. Configure Runware only when you want AI-painted thumbnails. The LLM still writes the image-gen prompt either way (that's a category='text' call); the image provider only renders it.
Task × provider matrix

What each task actually needs.

All LLM calls go through complete() from one of a handful of entry points: analyze_intelligence (the pipeline brief), generate.ts::generateAssetContent (every asset + the Studio Regenerate / per-section Generate buttons), and generateClipDescription (the Shorts editor's per-clip description generator). The right column is what works well enough; the stars are what's worth paying for.

Task Where it runs Profile in OpenAI-compatible (local) OpenAI-compatible (cloud) Anthropic Codex CLI
analyze_intelligencetopics · hooks · retention from scene log worker package.profile ★★ Qwen3-32B / Llama-3.3-70B ★★ gpt-4o-mini · Sonnet-4.6 via OR ★★★ Claude Sonnet 4.6 ◯ slow but works
generate_asset: youtube_title_set5 hook variants, scored 0–100 worker + Studio package.profile ○ Qwen3-32B usable ★★★ gpt-4o · Sonnet ★★★ Claude Sonnet/Opus — too slow for taste-driven copy
generate_asset: youtube_descriptionhook → body → chapters → CTA → hashtags worker + Studio package.profile ★★ Qwen3-32B ★★ gpt-4o-mini ★★★ Sonnet (best chapter timing) ◯ ok for batch backfill
generate_asset: youtube_tagsscored pills worker + Studio package.profile ★★ any local 7B+ works ★★ gpt-4o-mini ★★ Haiku 4.5 (fast + cheap) ◯ fine
generate_asset: linkedin_post · x_threadplatform-shaped copy worker package.profile ○ local can sound generic ★★ gpt-4o ★★★ Sonnet 4.6 — skip
generate_asset: article_brief→ DojoClaw worker package.profile ○ depends on local model ★★ gpt-4o · OR/Sonnet ★★★ Opus 4.7 if budget allows ◯ workable
generate_asset: short_clip_plan / long_clip_planstructured JSON — captions + cuts worker package.profile ★★ Qwen3-32B (great at JSON) ★★ gpt-4o-mini ★★ Haiku 4.5 (fast) ◯ ok
thumbnail_conceptsconcept + image-gen prompt (text) · render via image provider → worker package.profile ★★ Qwen3-32B ★★ gpt-4o-mini ★★★ Sonnet (best at visual ideation)
Studio · Regeneratesingle asset, ~10–30s, operator waiting server action package.profile ○ only if local model is loaded + warm ★★★ gpt-4o (low latency) ★★★ Haiku 4.5 or Sonnet 4.6 — too slow, kills the UX
Studio · Generate (per-section)title / description / tags from transcript server action package.profile ○ same as Regenerate ★★★ gpt-4o-mini · Haiku ★★★ Haiku 4.5 (best latency/price) — skip
Shorts · generateClipDescriptionper-clip post body, ≤280 chars, auto-fired server action package.profile ★★ short + bounded — local is fine ★★★ gpt-4o-mini (cheap, snappy) ★★★ Haiku 4.5 (best latency/price) — too slow for an editor auto-fire
★★★ = the obvious pick · ★★ = solid · ○ = workable but not first choice · — = don't.
Pick one model per profile

Three tiers, three opinions.

Profiles are how you trade quality for cost and latency. Set the profile on the brand (default) or per package — every LLM call in that package then follows it.

Per-profile routing — packages to provider rows package.processing_profile matched llm_providers row (by purpose) provider class fast_audio_only audio-only · batch · cheap purpose = "fast_audio_only" Haiku 4.5 · gpt-4o-mini · Qwen3-8B/14B standard_audio_visual default · full pipeline purpose = "standard_audio_visual" Sonnet 4.6 · gpt-4o · Qwen3-32B @16k premium_multimodal marquee · max reasoning purpose = "premium_multimodal" Opus 4.7 · gpt-4o long-ctx · Qwen3-32B purpose = "all" · is_default serves ANY profile with no exact match openai-compatible anthropic codex-cli
Solid arrows: exact purpose match (score 3). Dashed: fallthrough to a purpose="all" / is_default row when no exact row exists. A row tagged with one profile never bleeds into another.
transcription_only

Cheapest tier — Backlog Revival re-mining

Audio-only, no visual phase at all. Built for re-mining old material under current prompts (Backlog Revival). Same shape as fast_audio_only today but kept distinct so the two can diverge. Pick the cheapest fast model.

Anthropic · Haiku 4.5 OpenAI · gpt-4o-mini LM Studio · Qwen3-8B/14B
fast_audio_only

Audio-only, batch volume, cost-sensitive

Podcasts, webinars, daily uploads where you just need decent titles + description + tags. No visual pipeline. Aim for cheap + fast over polish.

Anthropic · Haiku 4.5 OpenAI · gpt-4o-mini LM Studio · Qwen3-8B/14B
standard_audio_visual

The default for YouTube long-form

Full pipeline — audio + visual + fusion + intelligence + clip plans + thumbnails. You want this to feel like a thoughtful human draft. Latency: minutes per package is fine, seconds for Studio actions matters.

Anthropic · Sonnet 4.6 (recommended) OpenAI · gpt-4o LM Studio · Qwen3-32B at 16k+ ctx
premium_multimodal

Marquee episodes, deep multi-platform plans

Big interviews, paid sponsorships, the videos you want featured in every channel. Maximum reasoning + nuance per call; budget is not the constraint.

Anthropic · Opus 4.7 (recommended) OpenAI · gpt-4o (long ctx) LM Studio · Qwen3-32B (if offline-required)
all

One model for everything (simplest setup)

Skip the tiering. A single purpose="all" provider handles every pipeline run + every Studio action. Best while you're still finding what you like.

Anthropic · Sonnet 4.6 (best all-rounder) OpenAI · gpt-4o
Mixed setup pattern: a lot of operators run local for the pipeline, cloud for the Studio. Tag the local LM Studio row with standard_audio_visual (cheap batch drafts), and add a Haiku/Sonnet row tagged all — the Studio's profile="all" calls then go to Anthropic for snappy responses, while bulk pipeline jobs stay local.
Quick presets

Three setups, ranked by setup time.

All three are valid. Pick the one that matches what you've already paid for and how much fiddling you want to do.

① All Anthropic

~2 minutes

One row, purpose all, Sonnet 4.6, API key. Done. Pipeline + Studio all go to Claude. The best default if you don't want to think about it.

② Tiered Anthropic

~5 minutes

Three rows: Haiku → fast_audio_only, Sonnet → standard_audio_visual, Opus → premium_multimodal. Quality scales with package profile; cost stays in check.

③ Local + cloud

~15 minutes

LM Studio (Qwen3-32B at 16k ctx) for the pipeline, Anthropic Haiku for Studio actions. Cheapest steady state on a Mac Studio; needs LM Studio loaded with enough context.

Configure it.

The Providers settings page lets you add, test, and tag rows per purpose. API keys are encrypted at rest and never sent back to the browser.

⚙ Open /providers