STU MASON
CAPTURE · CRUNCH · CUT

Record once.
Your agent cuts it.

roll captures the take. Crunch makes it searchable. EdAtor makes the cut — EDL, overlays, export. AI does the judgement, FFmpeg does the work.

roll → crunch → edator → finished video

it works on my agent's machine.

WHAT

One take in. A finished cut — and a week of shorts — out.

You hit record and talk. Three tools you own do the rest. roll captures screen, camera and mic on a single clock, plus every click, keystroke and on-screen element. Crunch reads all of it — OCR and transcription — and turns the take into something searchable. EdAtor makes the editorial calls a human editor would, writes them as an edit list, dresses the cut in the Signal overlay kit, and exports. No timeline. No scrubbing. No “I’ll fix it in post”.

WHY

The edit is a decision. So make the footage RAG-able.

click × on-screen text × transcript = labelled action events. Once the take is searchable, an LLM can make real editorial decisions from it — what to cut, where to zoom, when to bleep — instead of guessing from raw pixels. Crunch does the heavy read once, so every downstream cycle is cheap.
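A minimal sketch of that join in Python. The record shapes (`Click`, `OcrLine`, `Word` and their fields) are hypothetical, chosen for illustration rather than taken from roll's or Crunch's real schema. Each click picks up the OCR line under the cursor from the latest keyframe, plus the words spoken within a couple of seconds either side.

```python
from dataclasses import dataclass

# Hypothetical shapes for illustration; not roll's or Crunch's real schema.
@dataclass
class Click:
    t: float        # seconds on the shared clock
    x: int          # cursor position in screen pixels
    y: int
    ax_label: str   # Accessibility label of the element clicked


@dataclass
class OcrLine:
    t: float        # time of the keyframe this line was read from
    text: str
    box: tuple      # (left, top, right, bottom) in screen pixels


@dataclass
class Word:
    start: float
    end: float
    text: str


def label_clicks(clicks, ocr_lines, words, window=2.0):
    """Join each click with the text under the cursor and the words said around it."""
    events = []
    for c in clicks:
        # On-screen text: use the latest keyframe at or before the click.
        frame_times = [line.t for line in ocr_lines if line.t <= c.t]
        frame_t = max(frame_times) if frame_times else None
        under = [
            line.text for line in ocr_lines
            if line.t == frame_t
            and line.box[0] <= c.x <= line.box[2]
            and line.box[1] <= c.y <= line.box[3]
        ]
        # Transcript: any word overlapping [t - window, t + window].
        said = [w.text for w in words if w.end >= c.t - window and w.start <= c.t + window]
        events.append({
            "t": c.t,
            "target": c.ax_label,
            "on_screen": under[0] if under else None,
            "said": " ".join(said),
        })
    return events
```

Each returned dict is one labelled action event: what was clicked, what the screen said there, and what was being said at the time. That is the kind of row an LLM can reason over without ever seeing a pixel.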

And it all runs on a box you own: near-zero marginal cost, nothing leaving your estate. The scary bit isn’t the AI — it’s the plumbing. The plumbing’s done.

HOW

Three tools. One clock. Your stack.

01 · ROLL · CAPTURE

Capture the whole truth, on one clock.

A native macOS recorder. Screen, camera and mic — sub-frame synced on one shared clock, so there’s no drift to chase and nothing to re-align by hand. But roll captures more than pixels: every click, drag, keystroke and scroll, plus the Accessibility role and label of whatever you touched — a full input-and-semantic telemetry track running alongside the video.

screen + camera + mic · one t0
clicks · keys · cursor · focus · AX labels
keyframes snapshotted on click for clean OCR
deterministic “pack” out

The output is a self-describing pack (screen.mp4, camera.mp4, mic.m4a, metadata.jsonl and manifest.json), all on one host clock. That pack is the contract with everything downstream. roll’s job is capture and inspect, not edit.
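A minimal sketch of checking a pack against that contract before anything downstream touches it. The five file names come from the pack above; the assumption that each metadata.jsonl line is a JSON object with a numeric `t` (seconds from the shared t0) is hypothetical.

```python
import json
from pathlib import Path

# The five files named in the pack above. The event shape checked below is an assumption.
PACK_FILES = ["screen.mp4", "camera.mp4", "mic.m4a", "metadata.jsonl", "manifest.json"]


def check_pack(pack_dir):
    """Return a list of problems with a roll pack; an empty list means it looks usable."""
    root = Path(pack_dir)
    problems = [f"missing {name}" for name in PACK_FILES if not (root / name).is_file()]
    meta = root / "metadata.jsonl"
    if not meta.is_file():
        return problems
    last_t = float("-inf")
    with meta.open(encoding="utf-8") as fh:
        for n, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                problems.append(f"metadata.jsonl line {n}: not JSON")
                continue
            # Hypothetical field "t": seconds since the pack's shared t0.
            t = event.get("t") if isinstance(event, dict) else None
            if isinstance(t, bool) or not isinstance(t, (int, float)):
                problems.append(f"metadata.jsonl line {n}: no numeric 't'")
            elif t < last_t:
                problems.append(f"metadata.jsonl line {n}: time goes backwards")
            else:
                last_t = t
    return problems
```

Because every track shares one clock, a single monotonic-time check on the event stream is enough to catch a broken pack early, rather than discovering drift halfway through an edit.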

02 · CRUNCH · READ

Reads the take. Makes it searchable.

Crunch is a self-hosted, OpenAI-compatible inference API for the boring, brilliant models under most apps — OCR, speech-to-text, embeddings, reranking, captioning. Point roll’s pack at POST /pack and it reads the take: line-level OCR of the screen, a word-timed transcript of the mic, every click joined to what was under the cursor and what was being said. The encoder models stay warm in memory, so calls come back in tens of milliseconds.

POST /pack → crunch.json
OCR · transcribe · embed · rerank · caption
Whisper-turbo + Florence-2 sidecars · Tesseract OCR
models warm in memory · near-zero marginal cost

The output is one lean crunch.json: the join of what was on screen, what was said and what was done, all on roll’s shared clock. Not a video dump, but a queryable index with scored edit moments and a beat-by-beat outline. No GPU, no per-call meter, no data leaving the box. That file is the contract EdAtor reads: the authoring index it works from instead of watching the footage.
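If the index stored an embedding per scored moment, querying it could be as small as the sketch below. The `moments`, `t`, `score`, `text` and `embedding` fields are assumptions for illustration, not crunch.json's documented shape; the vectors would come from Crunch's embed model, and a rerank pass could reorder the top hits afterwards.

```python
import json
import math


def load_index(path):
    """Read a crunch.json-style index from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search_moments(index, query_vec, k=3, min_score=0.0):
    """Rank scored edit moments by similarity to a query embedding.

    Hypothetical schema: index["moments"] is a list of dicts with
    "t" (seconds), "score" (edit-worthiness), "text" and "embedding".
    """
    candidates = [m for m in index["moments"] if m["score"] >= min_score]
    ranked = sorted(candidates, key=lambda m: cosine(m["embedding"], query_vec), reverse=True)
    return [(m["t"], m["text"]) for m in ranked[:k]]
```

Filtering on the edit score first and ranking by meaning second keeps the query cheap: the heavy read already happened once in Crunch, so each question EdAtor asks is just arithmetic over a small file.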

03 · EDATOR · CUT

It cuts. And it has opinions.

EdAtor reads the crunched take and makes the calls a human editor makes — which take to keep, where to kill the dead air, when to punch in on the face, which word to bleep — and writes them down as a JSON edit pack: a plain edit-decision list, not a render. A deterministic FFmpeg pipeline executes it to the frame — same pack, same cut, every time. AI does the judgement, FFmpeg does the work.
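A minimal sketch of both halves, assuming a hypothetical edit pack that is just a list of (start, end) spans to keep, in seconds on the shared clock, and assuming screen.mp4 and mic.m4a both start at t0. The judgement half is a crude stand-in (pauses in the word-timed transcript longer than `max_gap` become cuts), not EdAtor's actual logic. The execution half builds a standard FFmpeg `trim`/`atrim` + `concat` filtergraph, so the same spans always produce the same command.

```python
def keep_spans(words, max_gap=0.8, pad=0.15):
    """Turn (start, end) word timings into spans to keep.

    Each word is padded by `pad` seconds; a pause still longer than
    `max_gap` after padding is treated as dead air and cut.
    """
    spans = []
    for start, end in sorted(words):
        s, e = max(0.0, start - pad), end + pad
        if spans and s - spans[-1][1] <= max_gap:
            spans[-1][1] = max(spans[-1][1], e)  # close enough: extend the current span
        else:
            spans.append([s, e])                 # long pause: start a new span
    return [(round(s, 3), round(e, 3)) for s, e in spans]


def ffmpeg_cut_args(video, audio, spans, out):
    """Build an ffmpeg argv that keeps only `spans` of the video and audio, in order."""
    parts, pairs = [], []
    for i, (start, end) in enumerate(spans):
        # trim/atrim cut each span; setpts/asetpts restart its timestamps at zero.
        parts.append(f"[0:v]trim=start={start}:end={end},setpts=PTS-STARTPTS[v{i}]")
        parts.append(f"[1:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]")
        pairs.append(f"[v{i}][a{i}]")
    # concat joins the segments back to back: n segments, one video and one audio stream each.
    parts.append("".join(pairs) + f"concat=n={len(spans)}:v=1:a=1[outv][outa]")
    return [
        "ffmpeg", "-i", video, "-i", audio,
        "-filter_complex", ";".join(parts),
        "-map", "[outv]", "-map", "[outa]", out,
    ]
```

Passing the returned list to `subprocess.run(args, check=True)` would render the cut. Keeping the decision (a plain list of spans) separate from the command that executes it is what makes the render reproducible: change the judgement, and only the spans change.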

JSON edit pack: the one hard contract
cuts · roll-switch · face-zoom · native bleeps
Signal kit: teach panels · callouts · registration frame
face-tracked 9:16 shorts · loudness-safe master
likeness-locked AI thumbnail

But it doesn’t stop at the cut. EdAtor is a deadpan co-star with opinions: it corrects you when you fumble a term, flags the beats worth calling out, turns a rambling point into a teach panel, and plays your own outtakes back at you. Then it dresses the clean cut in the private Signal kit, masters to a safe loudness, reframes the same source rolls into face-tracked 9:16 shorts, and generates the thumbnail with your likeness locked. One take in; a finished cut, a week of verticals and the thumbnail out.
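A minimal sketch of the reframe and master, assuming a per-frame horizontal face centre (in source pixels) is already available from some face tracker. It smooths the track, centres one full-height 9:16 crop per shot on it, and pairs that with FFmpeg's `loudnorm` filter. The -16 LUFS / -1.5 dBTP / 11 LU targets are a common streaming choice, assumed here rather than taken from EdAtor.

```python
def smooth(xs, alpha=0.2):
    """Exponential moving average of face centres, to stop the crop jittering."""
    out, prev = [], None
    for x in xs:
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out


def vertical_crop(src_w, src_h, face_cx):
    """Return (w, h, x, y) for a full-height 9:16 window centred on face_cx, kept inside the frame."""
    h = src_h
    w = int(src_h * 9 / 16) // 2 * 2      # even width, as many encoders require
    x = int(round(face_cx - w / 2))
    x = max(0, min(x, src_w - w))         # clamp so the window never leaves the frame
    return w, h, x, 0


def short_filters(src_w, src_h, face_track):
    """Return (video, audio) filter strings for one shot; assumes a non-empty face track."""
    track = smooth(face_track)
    cx = sum(track) / len(track)          # one static crop per shot: the simplest version
    w, h, x, y = vertical_crop(src_w, src_h, cx)
    vf = f"crop={w}:{h}:{x}:{y},scale=1080:1920"
    # Assumed loudness targets: -16 LUFS integrated, -1.5 dBTP true peak, 11 LU range.
    af = "loudnorm=I=-16:TP=-1.5:LRA=11"
    return vf, af
```

The two strings would go to ffmpeg as `-vf` and `-af` for each shot of the short. A static crop per shot avoids the seasick drift of a crop that chases every head movement; a moving crop is possible, but needs a smoother track than this.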

COPY IT

Own the stack. Take the code.

Every piece runs on hardware I own — and the pieces are yours to copy. Same pattern, your box.

stu mason.

Folkestone · it works on my agent’s machine · agents → /llms.txt