The One-Command Video Editing Pipeline That Replaces CapCut and Descript
Record once, hand the file to Claude, run one command. This three-tool stack — HyperFrames, video-use, and Whisper — handles trimming, captions, and motion graphics automatically.
Last updated: June 12, 2026
What this replaces
The average short-form video takes 60–90 minutes to edit manually: trimming dead air, adding captions, and building motion graphics. At one video a day, that's roughly 7 to 10 hours a week spent repeating the same work.
This pipeline cuts that to one command after you record. You hand a raw video file to Claude, run the command, and get a finished short with dead air removed, captions synced to your words, and motion graphics applied — no timeline editing required.
What gets replaced: CapCut, Descript, Premiere, and any per-video editor fees. All three tools in the stack are open source.
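As a quick check on the weekly figure above, the arithmetic can be written out. This is a small sketch that assumes one video per day, seven days a week; the function and constant names are illustrative.

```python
# Weekly manual editing time at one video per day.
MINUTES_PER_VIDEO = (60, 90)  # low and high estimates quoted in this section
VIDEOS_PER_WEEK = 7           # assumption: "daily" means seven posts a week


def weekly_hours(minutes_per_video: float, videos_per_week: int = VIDEOS_PER_WEEK) -> float:
    """Convert per-video editing minutes into hours per week."""
    return minutes_per_video * videos_per_week / 60


low, high = (weekly_hours(m) for m in MINUTES_PER_VIDEO)
print(f"{low:.1f} to {high:.1f} hours per week")  # 7.0 to 10.5 hours per week
```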
The three-tool stack
- HyperFrames — the motion graphics layer. An open-source Apache 2.0 framework that turns HTML, CSS, and GSAP animations into deterministic video output via headless Chrome and FFmpeg.
- video-use — the trimming layer. Auto-removes filler words, awkward pauses, and retakes from raw footage. Uses ElevenLabs Scribe transcription for accurate cut detection.
- Whisper — the caption and sync layer. Generates word-level timestamps so every caption, overlay, and motion graphic lands exactly when you say the word it corresponds to. Local Whisper runs free.
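To make the trimming idea concrete, here is a minimal sketch of how filler words and dead air could be located once word-level timestamps exist. It is not video-use's actual implementation; the word format, the 0.6-second gap threshold, and the filler list are all assumptions chosen for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed word shape: {"word": str, "start": float, "end": float}, times in seconds.
FILLERS: FrozenSet[str] = frozenset({"um", "uh", "erm"})  # illustrative list


def _normalize(word: str) -> str:
    """Lowercase a word and drop surrounding whitespace and punctuation."""
    return word.strip().lower().strip(".,!?")


def keep_segments(
    words: List[Dict],
    max_gap: float = 0.6,
    fillers: FrozenSet[str] = FILLERS,
) -> List[Tuple[float, float]]:
    """Return (start, end) spans of speech worth keeping.

    A span is closed whenever a filler word appears or the silence
    before the next word is longer than max_gap seconds.
    """
    segments: List[List[float]] = []
    force_break = True  # the first spoken word always opens a new span
    for w in words:
        if _normalize(w["word"]) in fillers:
            force_break = True  # cut the filler out by ending the current span
            continue
        if not force_break and w["start"] - segments[-1][1] <= max_gap:
            segments[-1][1] = w["end"]  # short pause: extend the current span
        else:
            segments.append([w["start"], w["end"]])
        force_break = False
    return [(s, e) for s, e in segments]


sample = [
    {"word": " So", "start": 0.0, "end": 0.3},
    {"word": " um,", "start": 0.4, "end": 0.7},
    {"word": " today", "start": 0.8, "end": 1.2},
    {"word": " we", "start": 3.0, "end": 3.2},
    {"word": " edit", "start": 3.25, "end": 3.6},
]
print(keep_segments(sample))  # [(0.0, 0.3), (0.8, 1.2), (3.0, 3.6)]
```

The kept spans would then be handed to a video tool such as FFmpeg for cutting and joining; that step is left out here.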
The five-step workflow
Only the first step requires real work from you. Steps 2 through 4 run from a single command, and step 5 is a quick review.
- Step 1 — Record: phone selfie camera, talking head. Don't worry about filler words or pauses; the system strips them. Record one continuous take. Typically, 60–90 seconds of raw footage becomes a 30–45 second finished short.
- Step 2 — Trim with video-use: drop the raw recording into your video-use directory and run one command. ElevenLabs Scribe transcribes the audio, video-use identifies filler words and dead air, and the result is a clean cut.
- Step 3 — Transcribe with Whisper: generates word-level timestamps from the trimmed audio so every caption and graphic is frame-perfect.
- Step 4 — Apply motion graphics with HyperFrames: Claude reads your render config and the Whisper transcript, then generates the finished video with captions, overlays, and transitions applied.
- Step 5 — Review and post: the output is a finished short ready to upload. No timeline, no export settings.
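Steps 3 and 4 hinge on word-level timestamps. The sketch below shows one way to turn them into caption cues in SRT format. The calls to `whisper.load_model` and `transcribe(..., word_timestamps=True)` assume the open-source `openai-whisper` Python package; the three-words-per-caption grouping and the choice of SRT are assumptions, since this guide does not specify how HyperFrames consumes the transcript.

```python
from typing import Dict, List


def transcribe_words(path: str, model_name: str = "base") -> List[Dict]:
    """Return a flat list of word dicts with 'word', 'start', and 'end' keys.

    Assumes the openai-whisper package and FFmpeg are installed.
    """
    import whisper  # imported lazily so the helpers below work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path, word_timestamps=True)
    return [w for segment in result["segments"] for w in segment["words"]]


def group_words(words: List[Dict], max_words: int = 3) -> List[Dict]:
    """Bundle consecutive words into short caption cues."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"].strip() for w in chunk)
        cues.append({"start": chunk[0]["start"], "end": chunk[-1]["end"], "text": text})
    return cues


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: List[Dict]) -> str:
    """Render caption cues as the text of an SRT file."""
    blocks = []
    for n, cue in enumerate(cues, start=1):
        timing = f"{srt_timestamp(cue['start'])} --> {srt_timestamp(cue['end'])}"
        blocks.append(f"{n}\n{timing}\n{cue['text']}\n")
    return "\n".join(blocks)


# Hand-written sample in the same shape transcribe_words returns.
words = [
    {"word": " Record", "start": 0.0, "end": 0.4},
    {"word": " once", "start": 0.4, "end": 0.8},
    {"word": " then", "start": 0.9, "end": 1.1},
    {"word": " post", "start": 1.1, "end": 1.5},
]
print(to_srt(group_words(words)))
```

On a real file you would replace the sample list with something like `transcribe_words("trimmed.mp4")`, where the filename is a placeholder for the output of the trimming step.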
Install the full stack with Claude Code
Use this prompt inside Claude Code to install and configure all three tools in one session.
Prompt:
Set up my AI video editing pipeline with three tools: HyperFrames, video-use, and Whisper.
1. Install HyperFrames: npx skills add heygen-com/hyperframes
   Verify the /hyperframes skill is active.
2. Install video-use for dead-air removal. Check if it's already installed; if not, install it and configure it to use ElevenLabs Scribe for transcription.
3. Install Whisper for word-level caption timestamps. Use the local version (free). Confirm it can process a short audio file.
4. Create a run.sh script in my project folder that:
   - Takes a raw video file as input
   - Runs video-use to trim dead air
   - Runs Whisper on the trimmed audio
   - Passes the transcript to HyperFrames for caption and motion graphic rendering
   - Outputs the finished video to an /output folder
5. Test the full pipeline on a sample file and show me the output.
Tell me if any dependencies (FFmpeg, Node.js, Python) need to be installed first.
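If you would rather check the prerequisites yourself before running the prompt, a small script can look for them on your PATH. This is a sketch: the executable names (`ffmpeg`, `node`, `python3` or `python`) are common defaults and may differ on your system.

```python
import shutil
from typing import Callable, Dict, List, Optional, Tuple

# Display name -> executable names to try (assumed defaults, adjust as needed).
REQUIRED: Dict[str, Tuple[str, ...]] = {
    "FFmpeg": ("ffmpeg",),
    "Node.js": ("node",),
    "Python": ("python3", "python"),
}


def missing_dependencies(
    required: Dict[str, Tuple[str, ...]] = REQUIRED,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the display names of tools that cannot be found on PATH."""
    missing = []
    for label, candidates in required.items():
        if not any(which(name) for name in candidates):
            missing.append(label)
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Install these first: " + ", ".join(missing))
    else:
        print("FFmpeg, Node.js, and Python were all found.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use the default `shutil.which` is what you want.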
Set it up in Codex
The same setup works in Codex. Use this prompt after installing the three tools locally (HyperFrames via npx):
Prompt:
I've installed HyperFrames (npx skills add heygen-com/hyperframes), video-use, and Whisper locally. Create a one-command video editing pipeline that:
1. Accepts a raw .mp4 or .mov file as input
2. Trims dead air using video-use
3. Generates word-level timestamps using Whisper
4. Renders captions and motion graphics using HyperFrames
5. Outputs a finished short to /output
Wrap this in a single shell script called edit.sh. Confirm all three tools are connected and test with a sample file.
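Whatever the agent generates, the wrapper it writes has a simple shape: validate the input, create the output folder, then run each stage in order. The Python sketch below shows that shape only. The stage functions are hypothetical placeholders, because the actual command-line invocations for video-use and HyperFrames are not covered in this guide; a real version would call each tool where the comments indicate.

```python
from pathlib import Path
from typing import Callable, Dict, Sequence, Union

Artifacts = Dict[str, object]             # everything produced so far, by name
Stage = Callable[[Artifacts], Artifacts]  # a stage reads artifacts and adds its own
ALLOWED_INPUTS = {".mp4", ".mov"}


def run_pipeline(
    raw: Union[str, Path],
    stages: Sequence[Stage],
    out_dir: Union[str, Path],
) -> Artifacts:
    """Validate the input file, then pass the artifacts through each stage in order."""
    raw = Path(raw)
    if raw.suffix.lower() not in ALLOWED_INPUTS:
        raise ValueError(f"expected a .mp4 or .mov file, got {raw.name}")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    artifacts: Artifacts = {"raw": raw, "out_dir": out_dir}
    for stage in stages:
        artifacts = stage(artifacts)
    return artifacts


# Hypothetical placeholder stages. They only record where outputs would go.
def trim(a: Artifacts) -> Artifacts:
    # A real stage would invoke video-use on a["raw"] here.
    return {**a, "trimmed": a["out_dir"] / "trimmed.mp4"}


def transcribe(a: Artifacts) -> Artifacts:
    # A real stage would run Whisper on a["trimmed"] and store the word list.
    return {**a, "words": []}


def render(a: Artifacts) -> Artifacts:
    # A real stage would pass a["trimmed"] and a["words"] to HyperFrames.
    return {**a, "final": a["out_dir"] / "final.mp4"}
```

A call such as `run_pipeline("raw.mov", [trim, transcribe, render], "output")` returns a dictionary whose `"final"` entry is the path where the finished short would be written.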
Frequently asked questions
- Is this free to use? HyperFrames is open source (Apache 2.0). Local Whisper is free. video-use uses ElevenLabs Scribe for transcription by default — check current ElevenLabs pricing for your volume. FFmpeg and headless Chrome are free.
- Does it work on Windows as well as Mac? The stack runs on any system with Node.js, Python, and FFmpeg installed. The shell script can be adapted for Windows using PowerShell or WSL.
- How is this different from the HyperFrames content creation guide? That guide covers using HyperFrames to generate new videos from scratch using prompts. This pipeline is for editing raw recordings — it automates the trimming, caption, and motion graphics workflow on footage you've already shot.
