# Record once. Your agent cuts it.

**roll captures. Crunch reads. EdAtor edits.** The whole video pipeline, on a box you own.

*it works on my agent's machine.*

---

## What

One take in. A finished cut — and a week of shorts — out.

You hit record and talk. Three tools you own do the rest. **roll** captures screen, camera
and mic on a single clock, plus every click and keystroke and the on-screen element you touched. **Crunch**
reads all of it — OCR and transcription — and turns the take into something searchable.
**EdAtor** makes the editorial calls a human editor would, writes them as an edit list,
dresses the cut in the Signal overlay kit, and exports. No timeline. No scrubbing.

## Why

The edit is a decision — so make the footage RAG-able.

`click × on-screen text × transcript` = labelled action events. Once the take is
searchable, an LLM can make real editorial decisions from it — what to cut, where to zoom,
when to bleep — instead of guessing from raw pixels. Crunch does the heavy read *once*, so
every downstream cycle is cheap. And it all runs on a box you own: near-zero marginal cost,
nothing leaving your estate. The scary bit isn't the AI — it's the plumbing. The plumbing's
done.
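
To make the `click × on-screen text × transcript` join concrete, here is a minimal sketch in Python. It assumes every record already carries a timestamp in seconds on the shared clock. The `Click`, `OcrLine` and `Word` types, their fields and the two-second window are all hypothetical, chosen for illustration rather than taken from roll or Crunch.

```python
from dataclasses import dataclass

@dataclass
class Click:
    t: float                         # seconds from the shared t0
    x: int
    y: int

@dataclass
class OcrLine:
    t: float                         # time of the keyframe the line was read from
    text: str
    box: tuple[int, int, int, int]   # x0, y0, x1, y1 in screen pixels

@dataclass
class Word:
    start: float
    end: float
    text: str

def label_click(click, ocr_lines, words, window=2.0):
    """Join one click to the text under the cursor and the words spoken nearby."""
    # On-screen text: OCR lines whose box contains the click point.
    under = [l for l in ocr_lines
             if l.box[0] <= click.x <= l.box[2] and l.box[1] <= click.y <= l.box[3]]
    # If several keyframes match, prefer the one closest in time to the click.
    target = min(under, key=lambda l: abs(l.t - click.t)).text if under else None
    # Transcript: every word overlapping [t - window, t + window].
    said = " ".join(w.text for w in words
                    if w.end >= click.t - window and w.start <= click.t + window)
    return {"t": click.t, "target": target, "said": said}
```

Each returned dict is one labelled action event: when it happened, what was clicked, and what was being said around it. That is the kind of row a retrieval step can search over.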

## How

### 01 · roll — capture

A native macOS recorder. Screen, camera and mic, sub-frame synced on one shared clock — no
drift to chase, nothing to re-align by hand. But roll captures more than pixels: every
click, drag, keystroke and scroll, plus the Accessibility role and label of whatever you
touched — a full input-and-semantic telemetry track alongside the video.

- screen + camera + mic on one `t0`, sub-frame synced
- telemetry: clicks, keys, cursor, focus, AX labels
- keyframes snapshotted on click for clean OCR
- deterministic "pack" out: `screen.mp4` · `camera.mp4` · `mic.m4a` · `metadata.jsonl` · `manifest.json`

That pack is the contract with everything downstream. roll's job is capture + inspect —
not edit. → https://github.com/StuMason/roll
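
Because the pack is a contract, a downstream tool can check it before doing any work. A minimal sketch, assuming the five files sit flat in one directory (the layout is an assumption; only the file names come from the list above). `missing_pack_files` is a hypothetical helper, not part of roll.

```python
from pathlib import Path

# File names as listed for roll's pack; the flat layout is assumed.
PACK_FILES = ("screen.mp4", "camera.mp4", "mic.m4a", "metadata.jsonl", "manifest.json")

def missing_pack_files(pack_dir):
    """Return the expected pack files that are absent from pack_dir."""
    root = Path(pack_dir)
    return [name for name in PACK_FILES if not (root / name).is_file()]
```

An empty list means the pack is complete enough to hand on. Anything else is a capture problem, not an edit problem.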

### 02 · Crunch — read

A self-hosted, OpenAI-compatible inference API for the boring, brilliant models under most
apps — OCR, speech-to-text, embeddings, reranking, captioning — on a single box you own.
Point roll's pack at `POST /pack` and Crunch reads the take: line-level OCR of the screen, a
word-timed transcript of the mic, every click joined to what was under the cursor and what
was being said. The encoder models stay warm in memory, so calls to them come back in tens of
milliseconds.

- `POST /pack` in, one `crunch.json` out — async: submit, poll the job
- OCR · transcribe · embed · rerank · caption, all behind one key and base URL
- Whisper-turbo (ASR) + Florence-2 (caption/detect) sidecars; Tesseract for dense UI OCR
- pure PHP/ONNX core, models warm in memory — no GPU, no per-call meter, near-zero marginal cost

The output is one lean `crunch.json` — the join of what was on screen, what was said and what
was done, all on roll's shared clock. Not a video dump: a queryable index with scored edit
moments and a beat-by-beat outline. That file is the contract EdAtor reads — the authoring
index it works from instead of watching the footage. →
https://github.com/StuMason/crunch · https://crunch.stumason.dev
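
The submit-then-poll flow can be sketched without guessing at Crunch's wire format. In the Python below, `submit` and `poll` are hypothetical callables you would write around your HTTP client: `submit` wraps `POST /pack` and returns a job id, `poll` returns a `(status, result)` pair. The status strings `"pending"`, `"done"` and `"failed"` are assumptions, not Crunch's documented values.

```python
import time

def crunch_pack(submit, poll, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Submit a pack, then poll until the job finishes or the timeout passes."""
    job_id = submit()                      # hypothetical wrapper around POST /pack
    waited = 0.0
    while waited <= timeout:
        status, result = poll(job_id)      # hypothetical: (status, payload)
        if status == "done":
            return result                  # e.g. the parsed crunch.json
        if status == "failed":
            raise RuntimeError(f"crunch job {job_id} failed")
        sleep(interval)                    # wait before asking again
        waited += interval
    raise TimeoutError(f"crunch job {job_id} still running after {timeout}s")
```

Passing the transport in as functions keeps the loop testable offline and leaves the real endpoint shapes to the Crunch docs.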

### 03 · EdAtor — cut

Reads the crunched take and makes the calls a human editor makes — which take to keep, where
to kill the dead air, when to punch in on the face, which word needs a bleep — and writes
them down as a JSON edit pack. That pack is the one hard contract: a plain edit-decision
list, not a render. A deterministic FFmpeg pipeline executes it to the frame — same pack,
same cut, every time.

- JSON edit pack (EDL) — the one hard contract; AI does the judgement, FFmpeg does the work
- cuts, roll-switch, face-zoom, PiP and native bleeps — rendered to the frame
- dresses the clean cut in the private "Signal" overlay kit; masters to a safe loudness
- reframes the *same source rolls* into face-tracked 9:16 shorts — not a re-crop of the cut

One take in; a finished, on-brand cut and a week of verticals out. No timeline, no scrubbing
— the whole edit is a file you can read. → https://github.com/StuMason/edator
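
To show why "same pack, same cut" holds, here is a minimal sketch of turning an edit pack into one FFmpeg invocation. The pack shape (`video`, `audio`, and a `keep` list of `[start, end]` ranges in seconds) is hypothetical and far simpler than a real EDL; it is not EdAtor's schema. The filters used (`trim`, `atrim`, `setpts`, `asetpts`, `concat`) are standard FFmpeg filters.

```python
def ffmpeg_args(edit_pack, out_path):
    """Build an ffmpeg argument list that keeps only the listed ranges."""
    keep = edit_pack["keep"]               # hypothetical: [[start, end], ...]
    parts, labels = [], []
    for i, (start, end) in enumerate(keep):
        # Cut the same time range from video (input 0) and audio (input 1),
        # then reset timestamps so each segment starts at zero.
        parts.append(f"[0:v]trim=start={start}:end={end},setpts=PTS-STARTPTS[v{i}]")
        parts.append(f"[1:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]")
        labels.append(f"[v{i}][a{i}]")
    # Join the segments back to back into one video and one audio stream.
    joined = "".join(labels)
    parts.append(f"{joined}concat=n={len(keep)}:v=1:a=1[v][a]")
    return ["ffmpeg", "-i", edit_pack["video"], "-i", edit_pack["audio"],
            "-filter_complex", ";".join(parts),
            "-map", "[v]", "-map", "[a]", out_path]
```

The function is pure: the same pack always yields the same argument list, so the judgement lives in the JSON and the render is just execution. You could pass the result to `subprocess.run` to perform the cut.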

---

## Copy it

Every piece runs on hardware I own — and the pieces are yours to copy. Same pattern, your box.

- roll → https://github.com/StuMason/roll
- crunch → https://github.com/StuMason/crunch
- stu mason → https://ai.stumason.dev
