Hypd AI automation agency logo

Hypd

AI Tools / Published June 15, 2026

Give Claude Code Eyes: The /watch Skill That Lets Your Agent Watch Any Video

A Claude Code skill that downloads any video, extracts frames, transcribes audio, and hands everything to Claude — so it can answer questions about what it actually saw and heard, not what the title says.

Last updated: June 15, 2026

claude codeskillsvideoautomationai toolsplugins
Bottom line

A Claude Code skill that downloads any video, extracts frames, transcribes audio, and hands everything to Claude — so it can answer questions about what it actually saw and heard, not what the title says.

This guide is reviewed for clarity, service accuracy, and AI-search readability. The next quarterly content review is tracked internally before unsupported metrics or client proof are added.

The gap Claude Code couldn't close

Claude Code can read a webpage, run a script, browse a repo, analyze an image. What it can't do natively is watch a video. Paste a YouTube link and it has to guess from the title, pull a transcript that misses half of what's on screen, or admit it can't help.

The

prompt
/watch
skill closes that gap. Paste a URL or a local file path, ask a question, and Claude downloads the video, extracts frames at an auto-scaled rate, pulls a timestamped transcript, and reads every frame as an image. When it answers, it has seen the video and heard the audio.

Installing the skill

bash
# Claude Code (CLI)
/plugin marketplace add bradautomates/claude-video
/plugin install watch@claude-video

# Codex / generic skills runtime
git clone https://github.com/bradautomates/claude-video.git ~/.codex/skills/watch

For claude.ai (web): download

prompt
watch.skill
from the GitHub releases page and drop it into Settings → Capabilities → Skills.

How to use it

  • Analyze a video:
    prompt
    /watch https://youtu.be/dQw4w9WgXcQ what happens at the 30 second mark?
  • Reverse-engineer content:
    prompt
    /watch https://youtu.be/abc123 what hook did they open with?
  • Debug from a recording:
    prompt
    /watch bug-repro.mov what's going wrong?
  • Summarize:
    prompt
    /watch https://youtu.be/xyz summarize this
  • Focused window:
    prompt
    /watch https://youtu.be/abc --start 0:45 --end 1:15 what changed on screen?

What the skill actually does

When you call

prompt
/watch
, Claude executes a five-step pipeline:

  1. prompt
    yt-dlp
    downloads the video into a temp directory (local files are probed in place)
  2. prompt
    ffmpeg
    extracts frames at a duration-aware rate
  3. Transcript comes from native captions via
    prompt
    yt-dlp
    (free), with Whisper API as fallback
  4. Frames + transcript are handed to Claude, which reads each frame as an image in parallel
  5. Claude answers grounded in what's on screen and in the audio

Zero config on first run —

prompt
yt-dlp
and
prompt
ffmpeg
install automatically via
prompt
brew
on macOS. Whisper API key is only needed if a video has no captions.

The frame budget — why it matters for token cost

Every frame is an image token. The skill's auto-fps logic exists to avoid blowing your context budget on a long video:

  • ≤30 seconds: ~30 frames — dense, essentially every key moment
  • 30s–1 min: ~40 frames — still dense
  • 1–3 min: ~60 frames — comfortable
  • 3–10 min: ~80 frames — sparse but workable
  • >10 min: 100 frames — use
    prompt
    --start
    /
    prompt
    --end
    for focused questions

What people actually use it for

  • Content analysis — break down a competitor's hook structure, pacing, and CTA from the actual video, not the title
  • Bug reproduction — watch a screen recording of something broken and identify the frame where the issue appears
  • Video summarization — structure and key moments faster than 2x speed
  • Ad creative analysis — paste a TikTok or Reel URL and ask what made the first 3 seconds work

Frequently asked questions

What video sources does the skill support? Anything yt-dlp supports — YouTube, Loom, TikTok, X, Instagram, Vimeo, and hundreds of other platforms. Local

prompt
.mp4
,
prompt
.mov
,
prompt
.mkv
, and
prompt
.webm
files also work.

Do I need a Whisper API key? No, not for most videos. The skill first tries native captions via yt-dlp — free and instant. Whisper only kicks in for videos with no captions.

How do I get more detail on a specific moment in a long video? Use

prompt
--start
and
prompt
--end
flags:
prompt
/watch https://youtu.be/abc --start 2:30 --end 3:00 what's happening here?
Focused mode gets a denser frame budget for the specified range.