benchflow-ai / whisper-transcription

Install for your project team

Run one of these commands in your project directory to install the skill for your entire team (the first is for macOS/Linux shells, the second for Windows PowerShell):

mkdir -p .claude/skills/whisper-transcription && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/3254" && unzip -o skill.zip -d .claude/skills/whisper-transcription && rm skill.zip

New-Item -Path ".claude/skills/whisper-transcription" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/3254" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath ".claude/skills/whisper-transcription" -Force; Remove-Item "skill.zip"

Project Skills

The skill is saved in .claude/skills/whisper-transcription/. Once that directory is committed to git, all team members who pull the repository have access to it.

Important: Please verify the skill by reviewing its instructions before using it.
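
One way to act on this advice is to list the archive's contents before extracting it. The following is a minimal Python sketch, assuming you ran only the download step so that skill.zip still exists; the helper name list_skill_files is hypothetical and not part of the skill.

```python
import zipfile


def list_skill_files(zip_path):
    """Return the sorted file names inside a skill archive without extracting it.

    zip_path may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


if __name__ == "__main__":
    # Assumes skill.zip was downloaded to the current directory.
    for name in list_skill_files("skill.zip"):
        print(name)
```

After checking the listing, open the instruction file it contains and read it before running the extraction step.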

Install skill for Codex

Run one of these commands to install the skill depending on your needs:

Project Local ($CWD/.codex/skills)

mkdir -p .codex/skills/whisper-transcription && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/3254" && unzip -o skill.zip -d .codex/skills/whisper-transcription && rm skill.zip

New-Item -Path ".codex/skills/whisper-transcription" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/3254" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath ".codex/skills/whisper-transcription" -Force; Remove-Item "skill.zip"

User Global (~/.codex/skills)

mkdir -p ~/.codex/skills/whisper-transcription && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/3254" && unzip -o skill.zip -d ~/.codex/skills/whisper-transcription && rm skill.zip

New-Item -Path "$HOME/.codex/skills/whisper-transcription" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/3254" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath "$HOME/.codex/skills/whisper-transcription" -Force; Remove-Item "skill.zip"

| Scope | Location | Suggested Use |
|-------|----------|---------------|
| REPO | `$CWD/.codex/skills` | Project directory. Teams can check in skills most relevant to a working folder here. |
| REPO | `$CWD/../.codex/skills` | A folder above CWD. Organizations can check in skills relevant to a shared area. |
| REPO | `$REPO_ROOT/.codex/skills` | Top-most root folder. Relevant to everyone using the repository. |
| USER | `$CODEX_HOME/skills` | Personal folder (`~/.codex/skills`). Curate skills that apply to any repository. |

Install skill for GitHub Copilot

Run one of these commands to install the skill depending on your needs:

Project (.github/skills)

mkdir -p .github/skills/whisper-transcription && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/3254" && unzip -o skill.zip -d .github/skills/whisper-transcription && rm skill.zip

New-Item -Path ".github/skills/whisper-transcription" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/3254" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath ".github/skills/whisper-transcription" -Force; Remove-Item "skill.zip"

Personal (~/.copilot/skills)

mkdir -p ~/.copilot/skills/whisper-transcription && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/3254" && unzip -o skill.zip -d ~/.copilot/skills/whisper-transcription && rm skill.zip

New-Item -Path "$HOME/.copilot/skills/whisper-transcription" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/3254" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath "$HOME/.copilot/skills/whisper-transcription" -Force; Remove-Item "skill.zip"

| Scope | Location | Suggested Use |
|-------|----------|---------------|
| Project | `.github/skills/` | Repository-specific skills. Checked into git for the whole team. |
| Personal | `~/.copilot/skills/` | Personal skills available across all your projects. |

Install skill for Google Antigravity

Run one of these commands to install the skill depending on your needs:

Workspace (.agent/skills)

mkdir -p .agent/skills/whisper-transcription && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/3254" && unzip -o skill.zip -d .agent/skills/whisper-transcription && rm skill.zip

New-Item -Path ".agent/skills/whisper-transcription" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/3254" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath ".agent/skills/whisper-transcription" -Force; Remove-Item "skill.zip"

Global (~/.gemini/antigravity/skills)

mkdir -p ~/.gemini/antigravity/skills/whisper-transcription && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/3254" && unzip -o skill.zip -d ~/.gemini/antigravity/skills/whisper-transcription && rm skill.zip

New-Item -Path "$HOME/.gemini/antigravity/skills/whisper-transcription" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/3254" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath "$HOME/.gemini/antigravity/skills/whisper-transcription" -Force; Remove-Item "skill.zip"

| Scope | Location | Suggested Use |
|-------|----------|---------------|
| Workspace | `.agent/skills/` | Workspace-specific skills for project workflows and conventions. |
| Global | `~/.gemini/antigravity/skills/` | Personal skills available across all workspaces. |

Transcribe audio/video to text with word-level timestamps using OpenAI Whisper. Use when you need speech-to-text with accurate timing information for each word.


Source: https://github.com/benchflow-ai/skillsbench/tree/main/tasks-no-skills/video-filler-word-remover/environment/skills/whisper-transcription

Skill Content

---
name: whisper-transcription
description: "Transcribe audio/video to text with word-level timestamps using OpenAI Whisper. Use when you need speech-to-text with accurate timing information for each word."
---

# Whisper Transcription

OpenAI Whisper is an open-source speech-to-text model. Passing `word_timestamps=True` to `transcribe()` adds a start and end time for every word.

## Installation

```bash
pip install openai-whisper
```

## Model Selection

**Use the `tiny` model for fast transcription** - it is usually sufficient for clear speech and runs much faster than the larger models:

| Model | Parameters | Speed | Accuracy |
|-------|------------|-------|----------|
| tiny | 39 M | Fastest | Good for clear speech |
| base | 74 M | Fast | Better accuracy |
| small | 244 M | Medium | High accuracy |

**Recommendation: Start with `tiny` - it handles clear interview/podcast audio well.**

## Basic Usage with Word Timestamps

```python
import whisper
import json

def transcribe_with_timestamps(audio_path, output_path):
    """
    Transcribe audio and get word-level timestamps.

    Args:
        audio_path: Path to audio/video file
        output_path: Path to save JSON output
    """
    # Use tiny model for speed
    model = whisper.load_model("tiny")

    # Transcribe with word timestamps
    result = model.transcribe(
        audio_path,
        word_timestamps=True,
        language="en"  # Specify language for better accuracy
    )

    # Extract words with timestamps
    words = []
    for segment in result["segments"]:
        if "words" in segment:
            for word_info in segment["words"]:
                words.append({
                    "word": word_info["word"].strip(),
                    "start": word_info["start"],
                    "end": word_info["end"]
                })

    with open(output_path, "w") as f:
        json.dump(words, f, indent=2)

    return words
```
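
To check the saved output, it can help to print each entry as one line. The sketch below assumes the JSON layout written by `transcribe_with_timestamps` (a list of dicts with `word`, `start`, `end`); the helper names `load_words` and `format_word` and the file name `words.json` are hypothetical.

```python
import json


def load_words(json_path):
    """Load the word list saved as JSON (a list of word/start/end dicts)."""
    with open(json_path) as f:
        return json.load(f)


def format_word(entry):
    """Render one word entry as '[start-end] word' with two decimals."""
    return f"[{entry['start']:.2f}-{entry['end']:.2f}] {entry['word']}"


if __name__ == "__main__":
    # Assumes words.json was produced by a function like the one above.
    for entry in load_words("words.json"):
        print(format_word(entry))
```

For an entry such as `{"word": "Hello", "start": 0.0, "end": 0.42}`, `format_word` returns `[0.00-0.42] Hello`.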

## Detecting Specific Words

```python
def find_words(transcription, target_words):
    """
    Find specific words in transcription with their timestamps.

    Args:
        transcription: List of word dicts with 'word', 'start', 'end'
        target_words: Set of words to find (lowercase)

    Returns:
        List of matches with word and timestamp
    """
    matches = []
    target_lower = {w.lower() for w in target_words}

    for item in transcription:
        word = item["word"].lower().strip()
        # Remove punctuation for matching
        clean_word = ''.join(c for c in word if c.isalnum())

        if clean_word in target_lower:
            matches.append({
                "word": clean_word,
                "timestamp": item["start"]
            })

    return matches
```
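
The matching in `find_words` depends on how each word is normalized. The sketch below isolates that step (the helper name `normalize` is hypothetical) to show what it does to punctuation and contractions.

```python
def normalize(word):
    """Lowercase a word and keep only its letters and digits."""
    return "".join(c for c in word.lower().strip() if c.isalnum())


if __name__ == "__main__":
    for raw in [" Um,", "Okay.", "don't"]:
        print(repr(raw), "->", repr(normalize(raw)))
```

Because the apostrophe is removed, `"don't"` becomes `"dont"`. Target words that contain apostrophes therefore need to be listed without them (or normalized the same way) to match.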

## Complete Example: Find Filler Words

```python
import whisper
import json

# Filler words to detect
FILLER_WORDS = {
    "um", "uh", "hum", "hmm", "mhm",
    "like", "so", "well", "yeah", "okay",
    "basically", "actually", "literally"
}

def detect_fillers(audio_path, output_path):
    # Load tiny model (fast!)
    model = whisper.load_model("tiny")

    # Transcribe
    result = model.transcribe(audio_path, word_timestamps=True, language="en")

    # Find fillers
    fillers = []
    for segment in result["segments"]:
        for word_info in segment.get("words", []):
            word = word_info["word"].lower().strip()
            clean = ''.join(c for c in word if c.isalnum())

            if clean in FILLER_WORDS:
                fillers.append({
                    "word": clean,
                    "timestamp": round(word_info["start"], 2)
                })

    with open(output_path, "w") as f:
        json.dump(fillers, f, indent=2)

    return fillers

# Usage
detect_fillers("/root/input.mp4", "/root/annotations.json")
```
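
Once the fillers are detected, a per-word count gives a quick overview. This is a minimal sketch, assuming the list-of-dicts shape returned by `detect_fillers`; the helper name `summarize_fillers` and the sample data are made up for illustration.

```python
from collections import Counter


def summarize_fillers(fillers):
    """Count detections per filler word, most frequent first."""
    counts = Counter(item["word"] for item in fillers)
    return counts.most_common()


if __name__ == "__main__":
    # Hand-written sample in the same shape as the detect_fillers output.
    sample = [
        {"word": "um", "timestamp": 1.2},
        {"word": "like", "timestamp": 3.4},
        {"word": "um", "timestamp": 5.0},
    ]
    for word, count in summarize_fillers(sample):
        print(f"{word}: {count}")
```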

## Audio Extraction (if needed)

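Whisper decodes its input with ffmpeg, so it can read video files directly. To extract the audio track ahead of time (for example, to reuse it across runs), convert it to 16 kHz mono WAV, the format Whisper resamples to internally: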

```bash
# Extract audio as 16kHz mono WAV
ffmpeg -i input.mp4 -vn -acodec pcm_s16le -ar 16000 -ac 1 audio.wav
```
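
The same extraction can be run from Python with `subprocess`. This is a sketch, assuming `ffmpeg` is on the PATH; the helper names `build_extract_command` and `extract_audio` are hypothetical, and the `-y` flag (overwrite the output without prompting) is an addition to the command above.

```python
import subprocess


def build_extract_command(video_path, audio_path):
    """Build the ffmpeg argument list for a 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                    # Overwrite audio_path if it exists (added here)
        "-i", video_path,        # Input file
        "-vn",                   # Drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM audio
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # Mono
        audio_path,
    ]


def extract_audio(video_path, audio_path):
    """Run ffmpeg and raise CalledProcessError if it fails."""
    subprocess.run(build_extract_command(video_path, audio_path), check=True)


if __name__ == "__main__":
    extract_audio("input.mp4", "audio.wav")
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.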

## Multi-Word Phrases

For detecting phrases like "you know" or "I mean":

```python
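def find_phrases(transcription, phrases):
    """Find multi-word phrases in a word-level transcription."""

    def normalize(word):
        # Lowercase and drop punctuation so "know," still matches "know"
        return "".join(c for c in word.lower() if c.isalnum())

    matches = []
    words = [normalize(w["word"]) for w in transcription]

    for phrase in phrases:
        phrase_words = [normalize(p) for p in phrase.split()]
        phrase_len = len(phrase_words)
        if phrase_len == 0:
            continue  # An empty phrase would otherwise match at every index

        # Slide a window of phrase_len words over the transcription
        for i in range(len(words) - phrase_len + 1):
            if words[i:i + phrase_len] == phrase_words:
                matches.append({
                    "word": phrase,
                    "timestamp": transcription[i]["start"]
                })

    return matches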
```
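
A phrase match records only where the phrase starts. If the full time range is needed, it can be read from the first and last word of the match. A minimal sketch, assuming the same list of word dicts; the helper name `phrase_span` and the sample data are hypothetical.

```python
def phrase_span(transcription, start_index, phrase_len):
    """Return the start of the first word and the end of the last word."""
    first = transcription[start_index]
    last = transcription[start_index + phrase_len - 1]
    return {"start": first["start"], "end": last["end"]}


if __name__ == "__main__":
    # Hand-written sample in the word/start/end shape used above.
    sample = [
        {"word": "so", "start": 0.0, "end": 0.2},
        {"word": "you", "start": 0.3, "end": 0.45},
        {"word": "know,", "start": 0.45, "end": 0.8},
    ]
    # "you know" starts at index 1 and spans two words.
    print(phrase_span(sample, 1, 2))
```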
