openclaw / office-to-md

Install for your project team

Run this command in your project directory to install the skill for your entire team:

mkdir -p .claude/skills/office-to-md && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1354" && unzip -o skill.zip -d .claude/skills/office-to-md && rm skill.zip

New-Item -Path ".claude/skills/office-to-md" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1354" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath ".claude/skills/office-to-md" -Force; Remove-Item "skill.zip"

Project Skills

This skill will be saved in .claude/skills/office-to-md/ and checked into git. All team members will have access to it automatically.

Important: Please verify the skill by reviewing its instructions before using it.

Install skill for Codex

Run one of these commands to install the skill depending on your needs:

Project Local ($CWD/.codex/skills)

mkdir -p .codex/skills/office-to-md && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1354" && unzip -o skill.zip -d .codex/skills/office-to-md && rm skill.zip

New-Item -Path ".codex/skills/office-to-md" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1354" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath ".codex/skills/office-to-md" -Force; Remove-Item "skill.zip"

User Global (~/.codex/skills)

mkdir -p ~/.codex/skills/office-to-md && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1354" && unzip -o skill.zip -d ~/.codex/skills/office-to-md && rm skill.zip

New-Item -Path "$HOME/.codex/skills/office-to-md" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1354" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath "$HOME/.codex/skills/office-to-md" -Force; Remove-Item "skill.zip"

Scope	Location	Suggested Use
REPO	`$CWD/.codex/skills`	Project directory. Teams can check in skills most relevant to a working folder here.
REPO	`$CWD/../.codex/skills`	A folder above CWD. Organizations can check in skills relevant to a shared area.
REPO	`$REPO_ROOT/.codex/skills`	Top-most root folder. Relevant to everyone using the repository.
USER	`$CODEX_HOME/skills`	Personal folder (`~/.codex/skills`). Curate skills that apply to any repository.

Install skill for GitHub Copilot

Run one of these commands to install the skill depending on your needs:

Project (.github/skills)

mkdir -p .github/skills/office-to-md && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1354" && unzip -o skill.zip -d .github/skills/office-to-md && rm skill.zip

New-Item -Path ".github/skills/office-to-md" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1354" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath ".github/skills/office-to-md" -Force; Remove-Item "skill.zip"

Personal (~/.copilot/skills)

mkdir -p ~/.copilot/skills/office-to-md && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1354" && unzip -o skill.zip -d ~/.copilot/skills/office-to-md && rm skill.zip

New-Item -Path "$HOME/.copilot/skills/office-to-md" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1354" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath "$HOME/.copilot/skills/office-to-md" -Force; Remove-Item "skill.zip"

Scope	Location	Suggested Use
Project	`.github/skills/`	Repository-specific skills. Checked into git for the whole team.
Personal	`~/.copilot/skills/`	Personal skills available across all your projects.

Install skill for Google Antigravity

Run one of these commands to install the skill depending on your needs:

Workspace (.agent/skills)

mkdir -p .agent/skills/office-to-md && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1354" && unzip -o skill.zip -d .agent/skills/office-to-md && rm skill.zip

New-Item -Path ".agent/skills/office-to-md" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1354" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath ".agent/skills/office-to-md" -Force; Remove-Item "skill.zip"

Global (~/.gemini/antigravity/skills)

mkdir -p ~/.gemini/antigravity/skills/office-to-md && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1354" && unzip -o skill.zip -d ~/.gemini/antigravity/skills/office-to-md && rm skill.zip

New-Item -Path "$HOME/.gemini/antigravity/skills/office-to-md" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1354" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath "$HOME/.gemini/antigravity/skills/office-to-md" -Force; Remove-Item "skill.zip"

Scope	Location	Suggested Use
Workspace	`.agent/skills/`	Workspace-specific skills for project workflows and conventions.
Global	`~/.gemini/antigravity/skills/`	Personal skills available across all workspaces.

Convert Office documents (Word, Excel, PowerPoint, PDF) to Markdown using Microsoft's markitdown

Productivity Writing

0 views

0 installs

Tools I Recommend

ClaudeKit

Sponsor

Production-ready AI subagents automate your development & marketing workflows. Build in hours, not weeks.

Source: https://github.com/openclaw/skills/tree/main/skills/lijie420461340/office-to-md

Skill Content

---
name: office-to-md
description: Convert Office documents (Word, Excel, PowerPoint, PDF) to Markdown using Microsoft's markitdown
author: claude-office-skills
version: "1.0"
tags: [markdown, conversion, markitdown, microsoft, office]
models: [claude-sonnet-4, claude-opus-4]
tools: [computer, code_execution, file_operations]
library:
  name: markitdown
  url: https://github.com/microsoft/markitdown
  stars: 86k
---

# Office to Markdown Skill

## Overview

This skill enables conversion from various Office formats to Markdown using **markitdown** - Microsoft's open-source tool for converting documents to Markdown. Perfect for making Office content searchable, version-controllable, and AI-friendly.

## How to Use

1. Provide the Office file (Word, Excel, PowerPoint, PDF, etc.)
2. Optionally specify conversion options
3. I'll convert it to clean Markdown

**Example prompts:**
- "Convert this Word document to Markdown"
- "Turn this PowerPoint into Markdown notes"
- "Extract content from this PDF as Markdown"
- "Convert this Excel file to Markdown tables"

## Domain Knowledge

### markitdown Fundamentals

```python
from markitdown import MarkItDown

# Initialize converter
md = MarkItDown()

# Convert file
result = md.convert("document.docx")
print(result.text_content)

# Save to file
with open("output.md", "w") as f:
    f.write(result.text_content)
```

### Supported Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| Word | .docx | Full text, tables, basic formatting |
| Excel | .xlsx | Converts to Markdown tables |
| PowerPoint | .pptx | Slides as sections |
| PDF | .pdf | Text extraction |
| HTML | .html | Clean markdown |
| Images | .jpg, .png | OCR with vision model |
| Audio | .mp3, .wav | Transcription |
| ZIP | .zip | Processes contained files |

### Basic Usage

#### Python API
```python
from markitdown import MarkItDown

# Simple conversion
md = MarkItDown()
result = md.convert("document.docx")

# Access content
markdown_text = result.text_content

# With options
md = MarkItDown(
    llm_client=None,      # Optional LLM for enhanced processing
    llm_model=None        # Model name if using LLM
)
```

#### Command Line
```bash
# Install
pip install markitdown

# Convert file
markitdown document.docx > output.md

# Or with output file
markitdown document.docx -o output.md
```

### Word Document Conversion

```python
from markitdown import MarkItDown

md = MarkItDown()

# Convert Word document
result = md.convert("report.docx")

# Output preserves:
# - Headings (as # headers)
# - Bold/italic formatting
# - Lists (bulleted and numbered)
# - Tables (as markdown tables)
# - Hyperlinks

print(result.text_content)
```

**Example Output:**
```markdown
# Annual Report 2024

## Executive Summary

This report summarizes the key achievements and challenges...

### Key Metrics

| Metric | 2023 | 2024 | Change |
|--------|------|------|--------|
| Revenue | $10M | $12M | +20% |
| Users | 50K | 75K | +50% |

## Detailed Analysis

The following sections provide...
```

### Excel Conversion

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("data.xlsx")

# Each sheet becomes a section
# Data becomes markdown tables
print(result.text_content)
```

**Example Output:**
```markdown
## Sheet1

| Name | Department | Salary |
|------|------------|--------|
| John | Engineering | $80,000 |
| Jane | Marketing | $75,000 |

## Sheet2

| Product | Q1 | Q2 | Q3 | Q4 |
|---------|----|----|----|----|
| Widget A | 100 | 120 | 150 | 180 |
```

### PowerPoint Conversion

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("presentation.pptx")

# Each slide becomes a section
# Speaker notes included if present
print(result.text_content)
```

**Example Output:**
```markdown
# Slide 1: Company Overview

Our mission is to...

## Key Points
- Innovation first
- Customer focused
- Global reach

---

# Slide 2: Market Analysis

The market opportunity is significant...

**Notes:** Mention the competitor analysis here
```

### PDF Conversion

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("document.pdf")

# Extracts text content
# Tables converted where detected
print(result.text_content)
```

### Image Conversion (with Vision Model)

```python
from markitdown import MarkItDown
import anthropic

# Use Claude for image description
client = anthropic.Anthropic()

md = MarkItDown(
    llm_client=client,
    llm_model="claude-sonnet-4-20250514"
)

result = md.convert("diagram.png")
print(result.text_content)

# Output: Description of the image content
```

### Batch Conversion

```python
from markitdown import MarkItDown
from pathlib import Path

def batch_convert(input_dir, output_dir):
    """Convert all Office files to Markdown."""
    md = MarkItDown()
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    output_path.mkdir(exist_ok=True)
    
    extensions = ['.docx', '.xlsx', '.pptx', '.pdf']
    
    for ext in extensions:
        for file in input_path.glob(f'*{ext}'):
            try:
                result = md.convert(str(file))
                output_file = output_path / f"{file.stem}.md"
                
                with open(output_file, 'w') as f:
                    f.write(result.text_content)
                
                print(f"Converted: {file.name}")
            except Exception as e:
                print(f"Error converting {file.name}: {e}")

batch_convert('./documents', './markdown')
```

## Best Practices

1. **Check Output Quality**: Review converted Markdown for accuracy
2. **Handle Tables**: Complex tables may need manual adjustment
3. **Preserve Structure**: Use consistent heading levels in source docs
4. **Image Handling**: Consider using vision models for important images
5. **Version Control**: Store converted Markdown in Git for tracking

## Common Patterns

### Document Archive
```python
import os
from datetime import datetime
from markitdown import MarkItDown

def archive_document(doc_path, archive_dir):
    """Convert and archive Office document to Markdown."""
    md = MarkItDown()
    result = md.convert(doc_path)
    
    # Create archive structure
    date_str = datetime.now().strftime('%Y-%m-%d')
    filename = os.path.basename(doc_path)
    base_name = os.path.splitext(filename)[0]
    
    # Save with metadata
    output_content = f"""---
source: {filename}
converted: {date_str}
---

{result.text_content}
"""
    
    output_path = os.path.join(archive_dir, f"{base_name}.md")
    with open(output_path, 'w') as f:
        f.write(output_content)
    
    return output_path
```

### AI-Ready Corpus
```python
from markitdown import MarkItDown
from pathlib import Path
import json

def create_ai_corpus(doc_folder, output_file):
    """Convert documents to JSON corpus for AI training/RAG."""
    md = MarkItDown()
    corpus = []
    
    for doc in Path(doc_folder).glob('**/*'):
        if doc.suffix in ['.docx', '.pdf', '.pptx', '.xlsx']:
            try:
                result = md.convert(str(doc))
                corpus.append({
                    'source': str(doc),
                    'filename': doc.name,
                    'content': result.text_content,
                    'type': doc.suffix[1:]
                })
            except Exception as e:
                print(f"Skipped {doc.name}: {e}")
    
    with open(output_file, 'w') as f:
        json.dump(corpus, f, indent=2)
    
    print(f"Created corpus with {len(corpus)} documents")
    return corpus
```

## Examples

### Example 1: Convert Documentation Suite
```python
from markitdown import MarkItDown
from pathlib import Path

def convert_docs_to_wiki(docs_folder, wiki_folder):
    """Convert all Office docs to markdown wiki structure."""
    md = MarkItDown()
    docs_path = Path(docs_folder)
    wiki_path = Path(wiki_folder)
    
    # Create wiki structure
    wiki_path.mkdir(exist_ok=True)
    
    # Create index
    index_content = "# Documentation Index\n\n"
    
    for doc in sorted(docs_path.glob('**/*.docx')):
        try:
            result = md.convert(str(doc))
            
            # Create relative path in wiki
            rel_path = doc.relative_to(docs_path)
            output_file = wiki_path / rel_path.with_suffix('.md')
            output_file.parent.mkdir(parents=True, exist_ok=True)
            
            # Write markdown
            with open(output_file, 'w') as f:
                f.write(result.text_content)
            
            # Add to index
            link = str(rel_path.with_suffix('.md')).replace('\\', '/')
            index_content += f"- [{doc.stem}]({link})\n"
            
            print(f"Converted: {doc.name}")
            
        except Exception as e:
            print(f"Error: {doc.name} - {e}")
    
    # Write index
    with open(wiki_path / 'index.md', 'w') as f:
        f.write(index_content)

convert_docs_to_wiki('./company_docs', './wiki')
```

### Example 2: Meeting Notes Processor
```python
from markitdown import MarkItDown
import re
from datetime import datetime

def process_meeting_notes(pptx_path):
    """Extract and structure meeting notes from PowerPoint."""
    md = MarkItDown()
    result = md.convert(pptx_path)
    
    # Parse the markdown
    content = result.text_content
    
    # Extract sections
    sections = {
        'attendees': [],
        'agenda': [],
        'decisions': [],
        'action_items': []
    }
    
    current_section = None
    
    for line in content.split('\n'):
        line_lower = line.lower()
        
        if 'attendee' in line_lower or 'participant' in line_lower:
            current_section = 'attendees'
        elif 'agenda' in line_lower:
            current_section = 'agenda'
        elif 'decision' in line_lower:
            current_section = 'decisions'
        elif 'action' in line_lower:
            current_section = 'action_items'
        elif line.strip().startswith(('-', '*', '•')) and current_section:
            sections[current_section].append(line.strip()[1:].strip())
    
    # Generate structured output
    output = f"""# Meeting Notes

**Date:** {datetime.now().strftime('%Y-%m-%d')}
**Source:** {pptx_path}

## Attendees
{chr(10).join('- ' + a for a in sections['attendees'])}

## Agenda
{chr(10).join('- ' + a for a in sections['agenda'])}

## Decisions Made
{chr(10).join('- ' + d for d in sections['decisions'])}

## Action Items
{chr(10).join('- [ ] ' + a for a in sections['action_items'])}
"""
    
    return output

notes = process_meeting_notes('team_meeting.pptx')
print(notes)
```

### Example 3: Excel to Documentation
```python
from markitdown import MarkItDown

def excel_to_data_dictionary(xlsx_path):
    """Convert Excel data model to data dictionary documentation."""
    md = MarkItDown()
    result = md.convert(xlsx_path)
    
    # Add documentation structure
    doc = f"""# Data Dictionary

Generated from: `{xlsx_path}`

{result.text_content}

## Usage Notes

- All tables are derived from the source Excel file
- Review data types and constraints before use
- Contact data team for clarifications

## Change Log

| Date | Change | Author |
|------|--------|--------|
| {datetime.now().strftime('%Y-%m-%d')} | Initial generation | Auto |
"""
    
    return doc

documentation = excel_to_data_dictionary('data_model.xlsx')
with open('data_dictionary.md', 'w') as f:
    f.write(documentation)
```

## Limitations

- Complex formatting may be simplified
- Images are not embedded (use vision model for descriptions)
- Some table structures may not convert perfectly
- Track changes in Word are not preserved
- Comments may not be extracted

## Installation

```bash
pip install markitdown

# For image/audio processing
pip install markitdown[all]

# For specific features
pip install markitdown[images]  # Image OCR
pip install markitdown[audio]   # Audio transcription
```

## Resources

- [GitHub Repository](https://github.com/microsoft/markitdown)
- [PyPI Package](https://pypi.org/project/markitdown/)
- [Supported Formats](https://github.com/microsoft/markitdown#supported-formats)

openclaw / office-to-md

Install for your project team

Download skill

Enable skills in Claude

Upload to Claude

Install skill for Codex

Install skill for GitHub Copilot

Install skill for Google Antigravity

Tools I Recommend

ClaudeKit

Skill Content