basher83 / video-processor
Install for your project team
Run this command in your project directory to install the skill for your entire team:
mkdir -p .claude/skills/video-processor && curl -o .claude/skills/video-processor/SKILL.md https://fastmcp.me/Skills/DownloadRaw?id=266
Project Skills
This skill will be saved in .claude/skills/video-processor/ and checked into git. All team members will have access to it automatically.
Important: Please verify the skill by reviewing its instructions before using it.
Process video files with audio extraction, format conversion (mp4, webm), and Whisper
1 views
0 installs
Skill Content
--- name: Video Processor description: Process video files with audio extraction, format conversion (mp4, webm), and Whisper transcription. Use when user mentions video conversion, audio extraction, transcription, mp4, webm, ffmpeg, or whisper transcription. --- # Video Processor ## Instructions This skill provides video processing utilities including audio extraction, format conversion, and audio transcription using FFmpeg and OpenAI's Whisper model. ### Prerequisites **Required tools** (must be installed in your environment): - **FFmpeg**: Multimedia framework for video/audio processing ```bash # macOS brew install ffmpeg # Ubuntu/Debian apt-get install ffmpeg # Verify installation ffmpeg -version ``` - **OpenAI Whisper**: Speech-to-text transcription model ```bash # Install via pip pip install -U openai-whisper # Verify installation whisper --help ``` **Python packages** (included in script via PEP 723): - click (CLI framework) - ffmpeg-python (Python wrapper for FFmpeg) ### Workflow Use the `scripts/video_processor.py` script for all video processing tasks. The script provides a simple CLI with the following commands: #### 1. **Extract Audio from Video** Extract the audio track from a video file: ```bash uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio input.mp4 output.wav ``` Options: - `--format`: Output audio format (default: wav). Supports: wav, mp3, aac, flac - Output is suitable for transcription or standalone audio use #### 2. **Convert Video to MP4** Convert any video file to MP4 format: ```bash uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 input.avi output.mp4 ``` Options: - `--codec`: Video codec (default: libx264). Common options: libx264, libx265, h264 - `--preset`: Encoding speed/quality preset (default: medium). Options: ultrafast, fast, medium, slow, veryslow #### 3. **Convert Video to WebM** Convert any video file to WebM format (web-optimized): ```bash uv run .claude/skills/video-processor/scripts/video_processor.py to-webm input.mp4 output.webm ``` Options: - `--codec`: Video codec (default: libvpx-vp9). Options: libvpx, libvpx-vp9 - WebM is optimized for web playback and streaming #### 4. **Transcribe Audio with Whisper** Transcribe audio or video files to text using OpenAI's Whisper model: ```bash # Transcribe video file (audio will be extracted automatically) uv run .claude/skills/video-processor/scripts/video_processor.py transcribe input.mp4 transcript.txt # Transcribe audio file directly uv run .claude/skills/video-processor/scripts/video_processor.py transcribe audio.wav transcript.txt ``` Options: - `--model`: Whisper model size (default: base). Options: - `tiny`: Fastest, lowest accuracy (~1GB RAM) - `base`: Fast, good accuracy (~1GB RAM) **[DEFAULT]** - `small`: Balanced (~2GB RAM) - `medium`: High accuracy (~5GB RAM) - `large`: Best accuracy, slowest (~10GB RAM) - `--language`: Language code (default: auto-detect). Examples: en, es, fr, de, zh - `--format`: Output format (default: txt). Options: txt, srt, vtt, json **Transcription workflow:** 1. If input is video, FFmpeg extracts audio to temporary WAV file 2. Whisper processes the audio file 3. Transcription is saved in requested format 4. Temporary files are cleaned up automatically #### 5. **Combined Workflow Example** Process a video end-to-end: ```bash # 1. Extract audio for analysis uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav # 2. Transcribe to SRT subtitles uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.mp4 lecture.srt --format srt --model small # 3. Convert to web format uv run .claude/skills/video-processor/scripts/video_processor.py to-webm lecture.mp4 lecture.webm ``` ### Key Technical Details **FFmpeg and Whisper Integration:** - FFmpeg doesn't transcribe audio itself - it prepares audio for external transcription - The workflow is: Extract audio (FFmpeg) → Transcribe (Whisper) → Optional: Re-integrate with video - FFmpeg can pipe audio directly to Whisper for real-time processing (advanced use case) **Audio Format for Transcription:** - Whisper works best with WAV or MP3 formats - Sample rate: 16kHz is optimal (script handles conversion automatically) - The script extracts audio with optimal settings for Whisper **Output Formats:** - **txt**: Plain text transcript - **srt**: SubRip subtitle format (includes timestamps) - **vtt**: WebVTT subtitle format (web standard) - **json**: Detailed JSON with word-level timestamps ### Error Handling The script includes comprehensive error handling: - Validates input files exist - Checks FFmpeg and Whisper are installed - Provides clear error messages for missing dependencies - Handles temporary file cleanup on errors ### Performance Tips - Use `tiny` or `base` models for quick drafts - Use `small` or `medium` for production transcriptions - Use `large` only when maximum accuracy is required - For long videos, consider extracting audio first, then transcribe in segments - WebM conversion with VP9 takes longer but produces smaller files ## Examples ### Example 1: Quick Video to MP4 Conversion User request: ```text I have an AVI file from my old camera. Can you convert it to MP4? ``` You would: 1. Use the to-mp4 command with default settings: ```bash uv run .claude/skills/video-processor/scripts/video_processor.py to-mp4 old_video.avi output.mp4 ``` 2. Confirm the conversion completed successfully 3. Inform the user about the output file location ### Example 2: Extract Audio and Transcribe User request: ```text I recorded a lecture video and need a transcript. Can you extract the audio and transcribe it? ``` You would: 1. First extract the audio: ```bash uv run .claude/skills/video-processor/scripts/video_processor.py extract-audio lecture.mp4 lecture.wav ``` 2. Then transcribe using the base model (good balance of speed/accuracy): ```bash uv run .claude/skills/video-processor/scripts/video_processor.py transcribe lecture.mp4 transcript.txt --model base ``` 3. Share the transcript.txt file with the user ### Example 3: Create Web-Optimized Video with Subtitles User request: ```text I need to put this video on my website with subtitles. Can you help? ``` You would: 1. Convert to WebM for web optimization: ```bash uv run .claude/skills/video-processor/scripts/video_processor.py to-webm presentation.mp4 presentation.webm ``` 2. Generate SRT subtitle file: ```bash uv run .claude/skills/video-processor/scripts/video_processor.py transcribe presentation.mp4 subtitles.srt --format srt --model small ``` 3. Inform user they now have: - presentation.webm (web-optimized video) - subtitles.srt (subtitle file for embedding) ### Example 4: High-Quality Transcription with Language Specification User request: ```text I have a Spanish interview video that needs an accurate transcript for publication. ``` You would: 1. Use a larger model with language specified for best accuracy: ```bash uv run .claude/skills/video-processor/scripts/video_processor.py transcribe interview.mp4 transcript.txt --model medium --language es ``` 2. Optionally create SRT for review: ```bash uv run .claude/skills/video-processor/scripts/video_processor.py transcribe interview.mp4 transcript.srt --format srt --model medium --language es ``` 3. Review the transcript with the user and make any necessary corrections ### Example 5: Batch Processing Multiple Videos User request: ```text I have a folder of training videos that all need to be converted to WebM and transcribed. ``` You would: 1. List all video files in the directory: ```bash ls training_videos/*.mp4 ``` 2. For each video file, run the conversion and transcription: ```bash # For each video: video1.mp4, video2.mp4, etc. uv run .claude/skills/video-processor/scripts/video_processor.py to-webm training_videos/video1.mp4 output/video1.webm uv run .claude/skills/video-processor/scripts/video_processor.py transcribe training_videos/video1.mp4 output/video1.txt --model base # Repeat for each file ``` 3. Confirm all conversions and transcriptions completed 4. Provide summary of output files ## Summary The video-processor skill provides a unified interface for common video processing tasks: - **Audio extraction**: Extract audio tracks in various formats - **Format conversion**: Convert to MP4 (universal) or WebM (web-optimized) - **Transcription**: Speech-to-text with multiple output formats - **Flexible**: CLI arguments for model selection, language, and output formats All operations are handled through a single, well-documented script with sensible defaults and comprehensive error handling.