benchflow-ai / filler-word-processing
Install for your project team
Run this command in your project directory to install the skill for your entire team:
mkdir -p .claude/skills/filler-word-processing && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/4409" && unzip -o skill.zip -d .claude/skills/filler-word-processing && rm skill.zip
Project Skills
This skill will be saved in .claude/skills/filler-word-processing/ and checked into git. All team members will have access to it automatically.
Important: Please verify the skill by reviewing its instructions before using it.
Process filler word annotations to generate video edit lists. Use when working with timestamp annotations for removing speech disfluencies (um, uh, like, you know) from audio/video content.
0 views
0 installs
Skill Content
---
name: filler-word-processing
description: "Process filler word annotations to generate video edit lists. Use when working with timestamp annotations for removing speech disfluencies (um, uh, like, you know) from audio/video content."
---
# Filler Word Processing
## Annotation Format
Typical annotation JSON structure:
```json
[
{"word": "um", "timestamp": 12.5},
{"word": "like", "timestamp": 25.3},
{"word": "you know", "timestamp": 45.8}
]
```
## Converting Annotations to Cut Segments
Each filler word annotation marks when the word starts. To remove it, use word-specific durations since different fillers have different lengths:
```python
import json
# Word-specific durations (in seconds)
WORD_DURATIONS = {
"uh": 0.3,
"um": 0.4,
"hum": 0.6,
"hmm": 0.6,
"mhm": 0.55,
"like": 0.3,
"yeah": 0.35,
"so": 0.25,
"well": 0.35,
"okay": 0.4,
"basically": 0.55,
"you know": 0.55,
"i mean": 0.5,
"kind of": 0.5,
"i guess": 0.5,
}
DEFAULT_DURATION = 0.4
def annotations_to_segments(annotations_file, buffer=0.05):
"""
Convert filler word annotations to (start, end) cut segments.
Args:
annotations_file: Path to JSON annotations
buffer: Small buffer before the word (seconds)
Returns:
List of (start, end) tuples representing segments to remove
"""
with open(annotations_file) as f:
annotations = json.load(f)
segments = []
for ann in annotations:
word = ann.get('word', '').lower().strip()
timestamp = ann['timestamp']
# Use word-specific duration, fall back to default
word_duration = WORD_DURATIONS.get(word, DEFAULT_DURATION)
# Cut starts slightly before the word
start = max(0, timestamp - buffer)
# Cut ends after word duration
end = timestamp + word_duration
segments.append((start, end))
return segments
```
## Merging Overlapping Segments
When filler words are close together, merge their cut segments:
```python
def merge_overlapping_segments(segments, min_gap=0.1):
"""
Merge segments that overlap or are very close together.
Args:
segments: List of (start, end) tuples
min_gap: Minimum gap to keep segments separate
Returns:
Merged list of segments
"""
if not segments:
return []
# Sort by start time
sorted_segs = sorted(segments)
merged = [sorted_segs[0]]
for start, end in sorted_segs[1:]:
prev_start, prev_end = merged[-1]
# If this segment overlaps or is very close to previous
if start <= prev_end + min_gap:
# Extend the previous segment
merged[-1] = (prev_start, max(prev_end, end))
else:
merged.append((start, end))
return merged
```
## Complete Processing Pipeline
```python
def process_filler_annotations(annotations_file, word_duration=0.4):
"""Full pipeline: load annotations -> create segments -> merge overlaps"""
# Load and create initial segments
segments = annotations_to_segments(annotations_file, word_duration)
# Merge overlapping cuts
merged = merge_overlapping_segments(segments)
return merged
```
## Tuning Parameters
| Parameter | Typical Value | Notes |
|-----------|---------------|-------|
| word_duration | varies | Short fillers (um, uh) ~0.25-0.3s, single words (like, yeah) ~0.3-0.4s, phrases (you know, i mean) ~0.5-0.6s |
| buffer | 0.05s | Small buffer captures word onset |
| min_gap | 0.1s | Prevents micro-segments between close fillers |
### Word Duration Guidelines
| Category | Words | Duration |
|----------|-------|----------|
| Quick hesitations | uh, um | 0.3-0.4s |
| Sustained hums (drawn out while thinking) | hum, hmm, mhm | 0.55-0.6s |
| Quick single words | like, yeah, so, well | 0.25-0.35s |
| Longer single words | okay, basically | 0.4-0.55s |
| Multi-word phrases | you know, i mean, kind of, i guess | 0.5-0.55s |
## Quality Considerations
- **Too aggressive**: Cuts into adjacent words, sounds choppy
- **Too conservative**: Filler words partially audible
- **Sweet spot**: Clean cuts with natural-sounding result
Test with a few samples before processing full video.