Detailed roadmap

What shipped, and what remains.

The roadmap overview gives you the one-line version. This page is the deep dive: for each shipped or remaining item, what it is, why it matters, and what it unlocks. ✓ SHIPPED marks what has landed.

Just shipped

The vNext Development lane is complete. Read the vNext development summary for the full batch: Plan, Produce, Publish, Channel, auth, roles, GSC signals, and RTMP/CTV targets.

The v1.7 Verticals & Business Intelligence wave followed on June 10: Helm Ledger (costs + budgets), Helm Atlas (catalogue intelligence), auto-approve rules, bulk backlog ingest, and brand vertical presets. The full guide is in the Command Deck.

v1.8 Quality, Reach & Decisions shipped June 11, with all ten candidates in one wave: prosody, glossary, claim guard, Atlas semantic search, quote cards, best-time, prompt A/B, Shorts Direct, Helm Briefing, and client reports. See the v1.8 summary.

v1.5 — Signal & Intelligence (shipped)

Close the Helm Signal loop.

Stop generating-and-forgetting: measure what performs and feed it back into generation. A/B routing was the first slice; these are the rest.

🖼️

Thumbnail feedback loop

✓ Shipped · ~ hours
What it is
Extends the A/B work to thumbnails. Today, when a title experiment decides, the winner is written to voice_examples so future titles drift toward what won. Thumbnails currently record nothing.
Why it matters
The thumbnail is half the click decision. Without this, thumbnail experiments tell you a winner but don't make the next video's thumbnails any better.
How it'd work
On a decided thumbnail (or title+thumbnail) experiment, capture the winning concept's visual_prompt + style traits and inject them as positive guidance into prompts/thumbnail_image.v1.md context for subsequent packages (a per-brand "what wins" exemplar set, mirroring voice examples).
Effort
Small — the experiment, decision, and feedback plumbing already exist; this adds a thumbnail-side writer + prompt wiring.
Unlocks
Thumbnails that compound the same way titles now do — the other half of the packaging loop.
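
As a rough illustration of the thumbnail-side writer described above, the sketch below turns a brand's recent winning concepts into a positive-guidance section that could be appended to the thumbnail prompt context. The `ThumbnailWinner` shape, the `buildThumbnailGuidance` name, and the three-exemplar limit are assumptions made for this example, not ChannelHelm's actual schema.

```typescript
// Hypothetical record of a decided thumbnail experiment's winner.
interface ThumbnailWinner {
  brandId: string;
  visualPrompt: string; // the winning concept's visual_prompt
  styleTraits: string[]; // e.g. "high contrast", "bold text"
  decidedAt: string; // ISO timestamp of the decision
}

// Render the most recent winners for one brand as positive guidance text.
// Returns an empty string when the brand has no decided experiments yet.
function buildThumbnailGuidance(
  winners: ThumbnailWinner[],
  brandId: string,
  limit = 3,
): string {
  const recent = winners
    .filter((w) => w.brandId === brandId)
    .sort((a, b) => b.decidedAt.localeCompare(a.decidedAt)) // newest first
    .slice(0, limit);
  if (recent.length === 0) return "";
  const lines = recent.map(
    (w, i) => `${i + 1}. ${w.visualPrompt} [traits: ${w.styleTraits.join(", ")}]`,
  );
  return ["What wins for this brand (recent thumbnail winners):", ...lines].join("\n");
}
```

Capping the exemplar set keeps the prompt short and lets older winners age out, which mirrors how a small rolling set of voice examples would behave.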
📈

Sentiment-over-time curves

✓ Shipped · ~ low
What it is
An emotion curve across the video, derived from the fused scene log (the timestamped text + visual descriptions ChannelHelm already produces) — no new model inference.
Why it matters
The best Shorts come from emotional spikes, not arbitrary timestamps. An explicit curve makes "where's the energy" a first-class signal.
How it'd work
A light pass (lexicon or a single cheap LLM call) over each scene-log window scores valence/arousal; the curve is stored on the package. The clip planner prefers high-arousal windows; the Studio shows an emotion sparkline.
Effort
Low — reuses data already on disk; no new ML dependency.
Unlocks
Better-selected Shorts moments and an at-a-glance emotional map of every video.
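
The lexicon variant of that light pass could look roughly like the sketch below: each scene-log window is scored for valence and arousal, and the planner then ranks windows by arousal. The `SceneWindow` shape, the four-word lexicon, and the function names are placeholders invented for this example; a real lexicon would be much larger.

```typescript
// Hypothetical scene-log window: a time range plus its fused text.
interface SceneWindow {
  startSec: number;
  endSec: number;
  text: string;
}

interface EmotionPoint {
  startSec: number;
  endSec: number;
  valence: number; // -1 (negative) .. 1 (positive)
  arousal: number; // 0 (calm) .. 1 (intense)
}

// Tiny illustrative lexicon (assumed values, not a published word list).
const LEXICON: Record<string, { valence: number; arousal: number }> = {
  amazing: { valence: 0.9, arousal: 0.8 },
  shocking: { valence: -0.3, arousal: 0.9 },
  calm: { valence: 0.4, arousal: 0.1 },
  boring: { valence: -0.6, arousal: 0.1 },
};

// Average the lexicon scores of the words found in one window.
function scoreWindow(w: SceneWindow): EmotionPoint {
  const words = w.text.toLowerCase().match(/[a-z']+/g) ?? [];
  let valence = 0;
  let arousal = 0;
  let hits = 0;
  for (const word of words) {
    const entry = LEXICON[word];
    if (entry) {
      valence += entry.valence;
      arousal += entry.arousal;
      hits += 1;
    }
  }
  return {
    startSec: w.startSec,
    endSec: w.endSec,
    valence: hits > 0 ? valence / hits : 0,
    arousal: hits > 0 ? arousal / hits : 0,
  };
}

// Return the n windows with the highest arousal, strongest first.
function topArousalWindows(curve: EmotionPoint[], n: number): EmotionPoint[] {
  return [...curve].sort((a, b) => b.arousal - a.arousal).slice(0, n);
}
```

The array of `EmotionPoint` values is the curve itself, so the same data could drive both the clip planner's ranking and a sparkline.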
🎯

Retention calibration model

✓ Shipped · ~ days
What it is
Replaces the current LLM-only retention guess with a small model calibrated against your channel's real retention curves.
Why it matters
Hook scoring and clip selection lean on predicted retention. Grounding those predictions in actual audience behavior makes every downstream choice sharper — and it improves as more videos accrue.
How it'd work
Use the YouTube Analytics scope (just added for A/B) to pull per-video retention + average-view-percentage into signals, accumulate, then fit a lightweight calibration that corrects the LLM's predicted-retention score toward measured truth.
Effort
Days — needs a data-accumulation window plus a modeling pass. Highest long-term payoff.
Unlocks
Retention scores you can trust; a flywheel that gets more accurate with every upload.
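
One simple form a "lightweight calibration" could take is an ordinary least-squares line that maps the LLM's predicted retention onto measured retention. The sketch below assumes that approach; the source does not say which model is used, and the `RetentionSample` shape and function names are invented for the example.

```typescript
// One accumulated data point: what the LLM predicted vs. what was measured.
// Both values are fractions in the range 0..1.
interface RetentionSample {
  predicted: number;
  measured: number;
}

interface Calibration {
  slope: number;
  intercept: number;
}

// Fit measured ≈ slope * predicted + intercept by ordinary least squares.
function fitCalibration(samples: RetentionSample[]): Calibration {
  const n = samples.length;
  if (n < 2) return { slope: 1, intercept: 0 }; // too little data: identity
  const meanX = samples.reduce((s, p) => s + p.predicted, 0) / n;
  const meanY = samples.reduce((s, p) => s + p.measured, 0) / n;
  let sxx = 0;
  let sxy = 0;
  for (const p of samples) {
    sxx += (p.predicted - meanX) ** 2;
    sxy += (p.predicted - meanX) * (p.measured - meanY);
  }
  // All predictions identical: the best constant fit is the measured mean.
  if (sxx === 0) return { slope: 0, intercept: meanY };
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

// Correct a raw LLM prediction and clamp it to a valid fraction.
function calibrate(predicted: number, c: Calibration): number {
  return Math.min(1, Math.max(0, c.slope * predicted + c.intercept));
}
```

Refitting whenever new retention data arrives is what would make the correction improve with every upload; until two samples exist, the identity mapping leaves the LLM score unchanged.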
⚙️

Per-provider concurrency limits

✓ Shipped · ~ quick
What it is
A max_concurrent cap per row in llm_providers, enforced by the worker pool.
Why it matters
The queue runs N slots against whatever provider a job resolves to. A rate-limited provider (OpenAI/OpenRouter tiers) gets hammered and 429s when you raise WORKER_CONCURRENCY.
How it'd work
Add the column to the schema + /providers editor; the provider resolver holds a per-provider semaphore so in-flight requests never exceed the cap, independent of total worker slots.
Effort
Quick — one column, one semaphore.
Unlocks
Safely cranking worker concurrency without tripping provider limits.
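
A per-provider semaphore of the kind described above can be quite small. The following is a minimal sketch, assuming one in-process registry keyed by provider id; the `Semaphore` class, `semaphoreFor`, and `withProviderSlot` are illustrative names rather than the actual resolver code.

```typescript
// Counting semaphore: at most `max` holders at a time, FIFO for waiters.
class Semaphore {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  get inFlight(): number {
    return this.active;
  }

  get queued(): number {
    return this.waiters.length;
  }

  acquire(): Promise<void> {
    if (this.active < this.max) {
      this.active += 1;
      return Promise.resolve();
    }
    // No free slot: park the caller until release() hands one over.
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // slot passes directly to the next waiter; count is unchanged
    } else {
      this.active -= 1;
    }
  }
}

// One semaphore per provider row, created lazily from its max_concurrent.
const providerSemaphores = new Map<string, Semaphore>();

function semaphoreFor(providerId: string, maxConcurrent: number): Semaphore {
  let sem = providerSemaphores.get(providerId);
  if (!sem) {
    sem = new Semaphore(maxConcurrent);
    providerSemaphores.set(providerId, sem);
  }
  return sem;
}

// Run a request while holding one of the provider's slots.
async function withProviderSlot<T>(
  providerId: string,
  maxConcurrent: number,
  task: () => Promise<T>,
): Promise<T> {
  const sem = semaphoreFor(providerId, maxConcurrent);
  await sem.acquire();
  try {
    return await task();
  } finally {
    sem.release();
  }
}
```

Because the cap lives on the provider rather than on the worker pool, raising WORKER_CONCURRENCY only adds waiters for a saturated provider instead of adding in-flight requests.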
v2 — Scale & Identity

Bigger structural moves.

For when single-operator throughput is no longer the constraint. Larger efforts, taken on when they unblock real volume.

📲

YouTube Direct for Shorts

v2 · ~ medium
What it is
Upload Shorts per-clip via the YouTube Data API directly, instead of only routing them through Zernio.
Why it matters
Native uploads give finer control (privacy, scheduling, metadata) and remove a dependency for the YouTube destination specifically.
How it'd work
The dispatch worker fires two dispatches per rendered Short — YouTube Direct for the YouTube target, Zernio for TikTok/Instagram — reusing the per-clip publish options.
Effort
Medium — dual-dispatch logic + quota considerations.
Unlocks
First-class YouTube Shorts publishing under your own OAuth.
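
The dual-dispatch split described above might be planned as in the sketch below: YouTube goes to a direct route, everything else stays with Zernio, and both share the clip's publish options. The type names, route labels, and the `planShortDispatches` function are assumptions for illustration only.

```typescript
type Network = "youtube" | "tiktok" | "instagram";
type Route = "youtube_direct" | "zernio";

// Hypothetical per-clip publish options shared by both routes.
interface PublishOptions {
  privacy: "public" | "unlisted" | "private";
  scheduledAt?: string; // ISO timestamp, if scheduled
}

interface Dispatch {
  clipId: string;
  route: Route;
  targets: Network[];
  options: PublishOptions;
}

// Split one rendered Short's targets into at most two dispatches.
function planShortDispatches(
  clipId: string,
  targets: Network[],
  options: PublishOptions,
): Dispatch[] {
  const direct = targets.filter((t) => t === "youtube");
  const viaZernio = targets.filter((t) => t !== "youtube");
  const dispatches: Dispatch[] = [];
  if (direct.length > 0) {
    dispatches.push({ clipId, route: "youtube_direct", targets: direct, options });
  }
  if (viaZernio.length > 0) {
    dispatches.push({ clipId, route: "zernio", targets: viaZernio, options });
  }
  return dispatches;
}
```

Keeping the split in a pure planning function would let quota checks for the direct route run before any upload starts.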
🎬

B-roll insertion

✓ Shipped · done
What it is
Honours the existing b_roll_enabled flag by compositing b-roll into rendered clips.
Why it matters
B-roll lifts retention on talking-head clips, and the existing UI flag now has a real rendered output path.
How it works
The word-snap/b-roll planner resolves clip-local cutaway segments and clip_render composites them through the ffmpeg filter graph.
Effort
Shipped in the vNext batch.
Unlocks
Visually richer clips without manual editing.
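
To make the compositing step more concrete, here is one way clip-local cutaways could be turned into an ffmpeg filter graph: each b-roll input is time-shifted with `setpts` and laid over the main video with `overlay`, enabled only inside its window. This is a sketch, not the graph clip_render actually builds; the `Cutaway` shape and `buildBrollFilterGraph` are invented for the example, and it assumes the b-roll inputs are already scaled to the clip's frame size.

```typescript
// Hypothetical cutaway: which ffmpeg input to show, and when (clip-local seconds).
interface Cutaway {
  inputIndex: number; // 1, 2, ... (input 0 is the main clip)
  startSec: number;
  endSec: number;
}

interface FilterGraph {
  filter: string; // value for -filter_complex ("" when there are no cutaways)
  outputLabel: string; // label of the final video stream
}

// Chain one setpts + overlay pair per cutaway.
function buildBrollFilterGraph(cutaways: Cutaway[]): FilterGraph {
  const parts: string[] = [];
  let last = "0:v";
  cutaways.forEach((c, i) => {
    const shifted = `b${i}`;
    const out = `v${i}`;
    // Delay the b-roll so its first frame lands at startSec.
    parts.push(`[${c.inputIndex}:v]setpts=PTS-STARTPTS+${c.startSec}/TB[${shifted}]`);
    // Show it only inside the window; pass the main video through otherwise.
    parts.push(
      `[${last}][${shifted}]overlay=enable='between(t,${c.startSec},${c.endSec})':eof_action=pass[${out}]`,
    );
    last = out;
  });
  return { filter: parts.join(";"), outputLabel: last };
}
```

With one cutaway, the result would be passed as `-filter_complex "<filter>"` together with `-map "[v0]"` for video; audio mapping is left out of this sketch.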
🗄️

Object storage (S3 / R2)

v2 · ~ medium
What it is
An optional cloud object-storage backend for media, beyond the local NAS / archive export.
Why it matters
Local storage is plenty for v1 throughput; at higher volume an offload tier keeps the master lean.
How it'd work
A storage adapter behind the existing path helpers; the archive worker targets the bucket; mediaUrlFor resolves signed URLs.
Effort
Medium — adapter + path-resolution changes.
Unlocks
Effectively unbounded media retention without local disk pressure.
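
A storage adapter behind the path helpers could be as thin as the sketch below. The `StorageAdapter` interface, both classes, and this `mediaUrlFor` signature are assumptions for illustration; the URL signer is injected so the example does not depend on any particular S3 or R2 SDK.

```typescript
// Anything that can turn a media-relative path into a fetchable URL.
interface StorageAdapter {
  urlFor(relPath: string): string;
}

// Local NAS / disk: serve from a fixed base URL.
class LocalDiskStorage implements StorageAdapter {
  private readonly baseUrl: string;

  constructor(baseUrl: string) {
    this.baseUrl = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  }

  urlFor(relPath: string): string {
    return `${this.baseUrl}/${relPath}`;
  }
}

// Hypothetical signer: a real one would wrap the bucket SDK's presign call.
type UrlSigner = (objectKey: string, expiresInSec: number) => string;

// Object storage: every URL is signed and time-limited.
class BucketStorage implements StorageAdapter {
  private readonly sign: UrlSigner;
  private readonly ttlSec: number;

  constructor(sign: UrlSigner, ttlSec: number) {
    this.sign = sign;
    this.ttlSec = ttlSec;
  }

  urlFor(relPath: string): string {
    return this.sign(relPath, this.ttlSec);
  }
}

// Illustrative stand-in for the existing helper: normalize, then delegate.
function mediaUrlFor(adapter: StorageAdapter, relPath: string): string {
  return adapter.urlFor(relPath.replace(/^\/+/, ""));
}
```

Callers keep passing relative paths either way, so switching a brand from local disk to a bucket would only change which adapter is constructed.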
🗣️

Speaker ID by name

v2 · ~ large
What it is
Replace generic speaker_01 labels with named identification via a per-brand face/voice index.
Why it matters
Named speakers make transcripts, chapters, and clip captions far more useful for multi-person content.
How it'd work
A per-brand enrollment index keyed off the existing diarization output; match voice/face embeddings to known identities.
Effort
Large — plus storage and privacy considerations.
Unlocks
Named diarization across the whole pipeline.
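
Matching a diarized segment against an enrollment index usually comes down to a nearest-neighbour lookup over embeddings. The sketch below uses cosine similarity with a minimum threshold and falls back to the generic label when nothing is close enough; the `EnrolledSpeaker` shape, the 0.8 threshold, and the function names are assumptions, and real embeddings would come from whichever voice or face model is chosen.

```typescript
// Hypothetical enrollment entry for one known identity within a brand.
interface EnrolledSpeaker {
  name: string;
  embedding: number[];
}

// Cosine similarity of two vectors (0 if either has zero length).
function cosineSimilarity(a: number[], b: number[]): number {
  const n = Math.min(a.length, b.length);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Return the best-matching enrolled name, or the diarization label
// (e.g. "speaker_01") when no identity clears the threshold.
function nameSpeaker(
  embedding: number[],
  fallbackLabel: string,
  enrolled: EnrolledSpeaker[],
  minSimilarity = 0.8,
): string {
  let bestName = fallbackLabel;
  let bestScore = minSimilarity;
  for (const speaker of enrolled) {
    const score = cosineSimilarity(embedding, speaker.embedding);
    if (score >= bestScore) {
      bestName = speaker.name;
      bestScore = score;
    }
  }
  return bestName;
}
```

Falling back to the generic label keeps unknown or ambiguous voices from being misattributed, which matters for the privacy considerations noted under Effort.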
🔎

GSC article signals

✓ Shipped · done
What it is
Pulls Google Search Console position + page metrics for DojoClaw-published articles into the signals table.
Why it matters
Completes cross-surface performance data — the editorial half of the Helm Signal loop, alongside YouTube/social.
How it works
Brands connect GSC through local OAuth; collect_signal combines DojoClaw article data with clicks, impressions, CTR, and position.
Effort
Shipped in the vNext batch.
Unlocks
Article generation that learns from search performance.
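
The combining step can be pictured as a join on article URL, with CTR derived from clicks and impressions. In the sketch below, the row shapes, the `pageviews` field standing in for DojoClaw's article data, and `combineArticleSignals` are all hypothetical; only the four Search Console metrics come from the description above.

```typescript
// Hypothetical DojoClaw-side stats for one published article.
interface ArticleStats {
  url: string;
  pageviews: number;
}

// Search Console metrics for one page.
interface GscRow {
  page: string;
  clicks: number;
  impressions: number;
  position: number; // average search position
}

interface ArticleSignal {
  url: string;
  pageviews: number;
  clicks: number;
  impressions: number;
  ctr: number; // clicks / impressions, 0 when there are no impressions
  position: number | null; // null when GSC has no data for the page
}

// Join article stats with GSC rows by URL; articles without GSC data
// still produce a signal with zeroed search metrics.
function combineArticleSignals(articles: ArticleStats[], gscRows: GscRow[]): ArticleSignal[] {
  const byPage = new Map<string, GscRow>();
  for (const row of gscRows) byPage.set(row.page, row);
  return articles.map((a) => {
    const g = byPage.get(a.url);
    const clicks = g ? g.clicks : 0;
    const impressions = g ? g.impressions : 0;
    return {
      url: a.url,
      pageviews: a.pageviews,
      clicks,
      impressions,
      ctr: impressions > 0 ? clicks / impressions : 0,
      position: g ? g.position : null,
    };
  });
}
```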
👥

Local users and brand roles

✓ Shipped · done
What it is
Local dashboard users, signed sessions, and brand-scoped owner/editor/reviewer memberships.
Why it matters
Reviewers and collaborators can participate without weakening worker/API bearer-token control paths.
How it works
Dashboard pages use session auth; protected server actions assert brand roles before approving, dispatching, or changing member state.
Effort
Shipped in the vNext batch.
Unlocks
More than one person running the command center while staying local-first.
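
A role assertion of the kind a protected server action would call first might look like the sketch below. The `Membership` shape and `assertBrandRole` name are assumptions; each action passes its own list of allowed roles, so the example does not presume any particular ordering between owner, editor, and reviewer.

```typescript
type BrandRole = "owner" | "editor" | "reviewer";

// Hypothetical membership row: one user's role within one brand.
interface Membership {
  userId: string;
  brandId: string;
  role: BrandRole;
}

// Throw unless the user holds one of the allowed roles for this brand.
// Returns the matched role so callers can log or branch on it.
function assertBrandRole(
  memberships: Membership[],
  userId: string,
  brandId: string,
  allowed: BrandRole[],
): BrandRole {
  const membership = memberships.find(
    (m) => m.userId === userId && m.brandId === brandId,
  );
  if (!membership) {
    throw new Error(`user ${userId} is not a member of brand ${brandId}`);
  }
  if (!allowed.includes(membership.role)) {
    throw new Error(`role ${membership.role} may not perform this action`);
  }
  return membership.role;
}
```

A dispatch action could then start with something like `assertBrandRole(memberships, session.userId, brandId, ["owner", "editor"])`, where the allowed list is whatever policy that action needs.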
Ideas — unscheduled

Worth doing eventually.

A themed backlog of candidates. Each is tagged grounded (scaffolding already exists in the codebase) or bet (a new product direction). ✅ shipped marks items already built straight from this backlog (extended-network generation, long-clip planning, pinned comments, the unified performance dashboard, DojoClaw article signals, comment mining, brand-voice bootstrap, multi-language subtitles).

Reach multipliers — more output from one video

| Idea | What it is | Type | Effort |
| --- | --- | --- | --- |
| Generate for the 8 un-wired networks | ✅ Shipped. Per-network post generation for Facebook, Pinterest, Bluesky, Threads, Reddit, Telegram, Discord & Google Business, gated by the brand's connected Zernio accounts so nothing un-shippable gets drafted. | ✅ shipped | |
| Long-clip planning | ✅ Shipped. long_clip_plan generates horizontal highlight segments; the renderer produces rendered_long_clip and dispatch routes it to YouTube via Zernio. | ✅ shipped | |
| Multi-language subtitles | ✅ Shipped. Translate a Short's subtitles to other languages → per-language SRT + ASS sidecar files, reusing the transcript + ASS pipeline. TTS dubbing and a burned-in per-language re-render are deferred. | ✅ shipped | |
| Quote cards / carousels | Shipped (v1.8). See the v1.8 summary. | ✅ shipped | |
| Per-platform Short captions | Tailored caption + hashtags per destination. Deferred: captions belong to clips, so this is better built as a short_clip_plan enhancement than as standalone asset types. | bet | M |

Deeper feedback loop — extend Helm Signal

| Idea | What it is | Type | Effort |
| --- | --- | --- | --- |
| Comment mining → content loop | ✅ Shipped. Post-publish, on-demand: mine a video's top YouTube comments → content_ideas + faq assets, from the Studio's "Mine comments" panel. (youtube_pinned_comment already generates from the analysis; this makes follow-up content audience-driven.) | ✅ shipped | |
| Best-time-to-post | Learn per-platform optimal windows from the signals already collected; pre-fill the scheduler. | bet | S–M |
| Unified performance dashboard | ✅ Shipped. A new /performance route: one cross-surface view of how dispatched/published assets performed (signals + A/B results). | ✅ shipped | |
| DojoClaw + GSC article signals | ✅ Shipped. collect_signal combines DojoClaw article analytics with Search Console clicks, impressions, CTR, and position when a GSC connection and published article URL are available. | ✅ shipped | |
| Prompt-version A/B | Shipped (v1.8). Reported winner on /performance; never auto-pinned. | ✅ shipped | |
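
For the Best-time-to-post entry, "learning optimal windows" could start as simply as averaging engagement per platform and hour of day. The sketch below does that in UTC; the `PostSignal` shape, the single `engagement` number, and `bestHourByPlatform` are assumptions for illustration, and a real version would likely need time zones and a minimum sample size.

```typescript
// Hypothetical collected signal for one published post.
interface PostSignal {
  platform: string;
  postedAt: string; // ISO timestamp
  engagement: number; // any comparable score, e.g. views or likes
}

// For each platform, return the UTC hour (0..23) with the highest
// average engagement. Ties keep the hour that was seen first.
function bestHourByPlatform(signals: PostSignal[]): Map<string, number> {
  const buckets = new Map<string, Map<number, { sum: number; count: number }>>();
  for (const s of signals) {
    const hour = new Date(s.postedAt).getUTCHours();
    if (Number.isNaN(hour)) continue; // skip unparseable timestamps
    let hours = buckets.get(s.platform);
    if (!hours) {
      hours = new Map();
      buckets.set(s.platform, hours);
    }
    const bucket = hours.get(hour) ?? { sum: 0, count: 0 };
    bucket.sum += s.engagement;
    bucket.count += 1;
    hours.set(hour, bucket);
  }
  const best = new Map<string, number>();
  buckets.forEach((hours, platform) => {
    let bestHour = -1;
    let bestAvg = -Infinity;
    hours.forEach((bucket, hour) => {
      const avg = bucket.sum / bucket.count;
      if (avg > bestAvg) {
        bestAvg = avg;
        bestHour = hour;
      }
    });
    best.set(platform, bestHour);
  });
  return best;
}
```

The resulting hour per platform is the kind of value a scheduler could pre-fill, leaving the operator free to override it.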

Quality & trust

| Idea | What it is | Type | Effort |
| --- | --- | --- | --- |
| Prosodic analysis | Shipped (v1.8). Pure-TS energy/emphasis pass at transcription time fills the scene log for real. | ✅ shipped | |
| Audio-event detection | Laughter / music / applause (YAMNet on the Neural Engine); good for podcasts and a cheap music-presence signal. | grounded | M |
| Brand glossary | Shipped (v1.8). Canonical spellings in transcripts + prompts. | ✅ shipped | |
| Fact-check / claim guard | Shipped (v1.8). Per-claim verdicts + Studio badge. | ✅ shipped | |
| Music / copyright detection | Flag clips likely to carry copyrighted audio before non-YouTube syndication (worked through below). | bet | M |

Operator & business

| Idea | What it is | Type | Effort |
| --- | --- | --- | --- |
| Cost tracking & budgets | Shipped (v1.7). Helm Ledger; see the Command Deck guide. | ✅ shipped | |
| Brand-voice bootstrap | ✅ Shipped. /brands/[id]/voice seeds voice_examples from pasted samples or the brand's existing published assets, so voice is good from upload #1. | ✅ shipped | |
| Bulk / batch ingest | Shipped (v1.7). /sources/new bulk panel: URLs or local folder, per-line outcomes. | ✅ shipped | |
| Auto-approve rules | Shipped (v1.7). Per-brand thresholds with a payload.auto_approved audit trail. | ✅ shipped | |

One idea, worked through in full — the depth each entry above reaches once it's picked up:

🎵

Music / copyright detection

Idea · ~ medium
What it is
Flag clips that likely contain copyrighted audio before they syndicate to TikTok / Instagram, and give an early warning before a render + dispatch slot is spent (most useful on ingested third-party sources such as podcasts and webinars).
Why it's only an idea
It can only ever be a risk predictor, not a verdict. The authoritative judge is YouTube Content ID / each platform's fingerprinting, and none is queryable before publishing.
Accuracy ceiling
A fully-local build can detect music presence well (~90%) but not copyright status. Genuine identification needs an opt-in commercial fingerprint API (e.g. ACRCloud, Audible Magic) — accurate (~95%+ on catalog music) but an external-cloud dependency that breaks local-first, and still misses long-tail rights while over-flagging royalty-free.
Already covered
For the YouTube destination, YouTube's own pre-publish Checks run Content ID for free — so the value is narrow (non-YouTube syndication + early warning).
If built
Local music-presence flag by default (advisory "⚠ music at 0:12–0:34"), with an optional fingerprint provider for real identification. Always a risk score, never a green light.
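
The advisory format in that last line is easy to pin down in code. The sketch below turns detected music segments into advisory strings and deliberately never emits an "all clear"; the `MusicSegment` shape, the 0.5 confidence cut-off, and the function names are assumptions, and the detector that produces the segments is out of scope here.

```typescript
// Hypothetical output of a local music-presence detector.
interface MusicSegment {
  startSec: number;
  endSec: number;
  confidence: number; // 0..1 likelihood that music is present
}

// Format seconds as m:ss, e.g. 12 -> "0:12", 75 -> "1:15".
function formatTimestamp(totalSec: number): string {
  const s = Math.max(0, Math.floor(totalSec));
  return `${Math.floor(s / 60)}:${String(s % 60).padStart(2, "0")}`;
}

// One advisory line per sufficiently confident segment.
// An empty result means "nothing flagged", not "cleared for copyright".
function musicAdvisories(segments: MusicSegment[], minConfidence = 0.5): string[] {
  return segments
    .filter((seg) => seg.confidence >= minConfidence)
    .map(
      (seg) =>
        `⚠ music at ${formatTimestamp(seg.startSec)}–${formatTimestamp(seg.endSec)}`,
    );
}
```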

How priorities are set

ChannelHelm optimizes for one person turning more videos into more on-brand output with less manual work. Items move up when they unblock that; the feedback loop (v1.5) is weighted heavily because it compounds. Have a use case that isn't covered? It belongs on this list, so say so.