hugohe3 / ocr-image-to-markdown

Install for your project team

Run this command in your project directory to install the skill for your entire team:

mkdir -p .claude/skills/ocr-image-to-markdown && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1744" && unzip -o skill.zip -d .claude/skills/ocr-image-to-markdown && rm skill.zip

New-Item -Path ".claude/skills/ocr-image-to-markdown" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1744" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath ".claude/skills/ocr-image-to-markdown" -Force; Remove-Item "skill.zip"

Project Skills

This skill will be saved in .claude/skills/ocr-image-to-markdown/ and checked into git. All team members will have access to it automatically.

Important: Please verify the skill by reviewing its instructions before using it.
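
One way to do that review before anything is unpacked is to list the archive's entries first. The sketch below is only an illustration, not part of the skill: the helper name `inspect_archive` and the demo archive contents (`SKILL.md`, `../outside.txt`) are hypothetical. It uses Python's standard `zipfile` module to list entry names and flag any that would land outside the target directory.

```python
import io
import zipfile


def inspect_archive(source):
    """Return (names, suspicious) for a zip given as a path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
    # Flag entries with absolute paths or ".." components, which could
    # escape the destination directory when extracted.
    suspicious = [
        name for name in names
        if name.startswith(("/", "\\"))
        or ".." in name.replace("\\", "/").split("/")
    ]
    return names, suspicious


# Demo with a hypothetical in-memory archive; in practice, pass "skill.zip".
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("SKILL.md", "---\nname: demo\n---\n")
    demo.writestr("../outside.txt", "unexpected")
buffer.seek(0)

names, suspicious = inspect_archive(buffer)
print(names)
print(suspicious)
```

To use this with the commands above, download `skill.zip` without the `unzip` and `rm` steps, inspect it, and open the listed instruction files before extracting.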

Install skill for Codex

Run one of these commands to install the skill depending on your needs:

Project Local ($CWD/.codex/skills)

mkdir -p .codex/skills/ocr-image-to-markdown && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1744" && unzip -o skill.zip -d .codex/skills/ocr-image-to-markdown && rm skill.zip

New-Item -Path ".codex/skills/ocr-image-to-markdown" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1744" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath ".codex/skills/ocr-image-to-markdown" -Force; Remove-Item "skill.zip"

User Global (~/.codex/skills)

mkdir -p ~/.codex/skills/ocr-image-to-markdown && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1744" && unzip -o skill.zip -d ~/.codex/skills/ocr-image-to-markdown && rm skill.zip

New-Item -Path "$HOME/.codex/skills/ocr-image-to-markdown" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1744" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath "$HOME/.codex/skills/ocr-image-to-markdown" -Force; Remove-Item "skill.zip"

| Scope | Location | Suggested Use |
| --- | --- | --- |
| REPO | `$CWD/.codex/skills` | Project directory. Teams can check in skills most relevant to a working folder here. |
| REPO | `$CWD/../.codex/skills` | A folder above CWD. Organizations can check in skills relevant to a shared area. |
| REPO | `$REPO_ROOT/.codex/skills` | Top-most root folder. Relevant to everyone using the repository. |
| USER | `$CODEX_HOME/skills` | Personal folder (`~/.codex/skills`). Curate skills that apply to any repository. |

Install skill for GitHub Copilot

Run one of these commands to install the skill depending on your needs:

Project (.github/skills)

mkdir -p .github/skills/ocr-image-to-markdown && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1744" && unzip -o skill.zip -d .github/skills/ocr-image-to-markdown && rm skill.zip

New-Item -Path ".github/skills/ocr-image-to-markdown" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1744" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath ".github/skills/ocr-image-to-markdown" -Force; Remove-Item "skill.zip"

Personal (~/.copilot/skills)

mkdir -p ~/.copilot/skills/ocr-image-to-markdown && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1744" && unzip -o skill.zip -d ~/.copilot/skills/ocr-image-to-markdown && rm skill.zip

New-Item -Path "$HOME/.copilot/skills/ocr-image-to-markdown" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1744" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath "$HOME/.copilot/skills/ocr-image-to-markdown" -Force; Remove-Item "skill.zip"

| Scope | Location | Suggested Use |
| --- | --- | --- |
| Project | `.github/skills/` | Repository-specific skills. Checked into git for the whole team. |
| Personal | `~/.copilot/skills/` | Personal skills available across all your projects. |

Install skill for Google Antigravity

Run one of these commands to install the skill depending on your needs:

Workspace (.agent/skills)

mkdir -p .agent/skills/ocr-image-to-markdown && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1744" && unzip -o skill.zip -d .agent/skills/ocr-image-to-markdown && rm skill.zip

New-Item -Path ".agent/skills/ocr-image-to-markdown" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1744" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath ".agent/skills/ocr-image-to-markdown" -Force; Remove-Item "skill.zip"

Global (~/.gemini/antigravity/skills)

mkdir -p ~/.gemini/antigravity/skills/ocr-image-to-markdown && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/1744" && unzip -o skill.zip -d ~/.gemini/antigravity/skills/ocr-image-to-markdown && rm skill.zip

New-Item -Path "$HOME/.gemini/antigravity/skills/ocr-image-to-markdown" -ItemType Directory -Force; Invoke-WebRequest -Uri "https://fastmcp.me/Skills/Download/1744" -OutFile "skill.zip"; Expand-Archive -Path "skill.zip" -DestinationPath "$HOME/.gemini/antigravity/skills/ocr-image-to-markdown" -Force; Remove-Item "skill.zip"

| Scope | Location | Suggested Use |
| --- | --- | --- |
| Workspace | `.agent/skills/` | Workspace-specific skills for project workflows and conventions. |
| Global | `~/.gemini/antigravity/skills/` | Personal skills available across all workspaces. |

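Because no local OCR tools are available, this skill uses the agent's multimodal capabilities to view images (PNG, JPG, etc.) and transcribe their content (text, tables, logic diagrams) into formatted Markdown.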

Writing Data & Analytics

Source: https://github.com/hugohe3/ppt-master/tree/main/.agent/skills/ocr_image_to_markdown

Skill Content

---
name: OCR Image to Markdown
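description: Because no local OCR tools are available, this skill uses the agent's multimodal capabilities to view images (PNG, JPG, etc.) and transcribe their content (text, tables, logic diagrams) into formatted Markdown.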
---

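# OCR Image to Markdown

This skill lets you "read" images and convert their content into editable Markdown text. It is particularly useful for extracting data tables, slide content, or document screenshots, especially when external OCR libraries are unavailable.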

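## Usage Guide

1.  **Identify the target image**:
    *   Locate the image file you need to process.
    *   If necessary, use `list_dir` to browse the directory.

2.  **View the image**:
    *   Use the `view_file` tool to "see" the image content. The system lets you process image data directly.
    *   **Critical**: You must call `view_file` on the image path so that your vision model can ingest it.

3.  **Transcribe the content**:
    *   Based on what you see, transcribe the text into Markdown.
    *   **Tables**: Convert the tables you see into standard Markdown tables (`| Header | ... |`).
    *   **Headings**: Use `#`, `##`, etc. to mark headings in the image, preserving their hierarchy.
    *   **Text**: Transcribe paragraphs as plain text.
    *   **Numbers**: Double-check every number, especially figures in financial statements.

4.  **Save the output**:
    *   Use `write_to_file` to write the transcribed content to a `.md` file (for example, `ocr_results.md`).
    *   When processing multiple images, consider appending them to the same file or organizing them logically.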
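
As a rough illustration of step 4, the sketch below shows one possible way to organize several transcriptions in a single Markdown document, with one heading per source image. The function name `combine_transcriptions` and the sample content are hypothetical; in the skill itself the agent would pass the combined text to `write_to_file` rather than run Python.

```python
def combine_transcriptions(sections):
    """Join per-image Markdown transcriptions into one document.

    `sections` maps an image file name to the Markdown transcribed from it.
    """
    parts = []
    for image_name, markdown in sections.items():
        # One heading per source image keeps the combined file navigable.
        parts.append(f"## {image_name}\n\n{markdown.strip()}\n")
    return "\n".join(parts)


# Hypothetical placeholder transcriptions, not real OCR results.
combined = combine_transcriptions({
    "img1.png": "| Item | Value |\n| --- | --- |\n| A | 1 |",
    "img2.png": "Some paragraph text.",
})
print(combined)
```

Keeping the image name as a heading makes it easy to trace each transcribed block back to its source when checking numbers.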

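## Best Practices

*   **Tables**: Align rows and columns carefully. Standard Markdown tables do not support merged cells (rowspan/colspan). Expand merged cells according to the logical flow, or leave them blank.
*   **Complex layouts**: If the image has a complex layout (for example, side-by-side columns), serialize it in logical reading order (top to bottom, left to right).
*   **Charts/graphics**: If the image contains a chart, describe the trend, or extract the visible data points as a list or table.
*   **No code execution**: Do **not** try to write or use Python libraries (such as `pytesseract`, `easyocr`, `PIL`) for text extraction. Use your own vision capabilities directly.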
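
To make the merged-cell rule concrete, here is a minimal sketch of the expansion logic only. It does not extract text from images. It assumes a vertically merged cell is represented as `None` ("same as the cell above") and repeats the value downward; the helper names and the sample rows are hypothetical.

```python
def fill_merged_cells(rows):
    """Expand vertically merged cells: None means 'same as the cell above'."""
    filled = []
    for row in rows:
        previous = filled[-1] if filled else [""] * len(row)
        filled.append([
            previous[i] if cell is None else cell
            for i, cell in enumerate(row)
        ])
    return filled


def to_markdown_table(header, rows):
    """Render a header and rows of strings as a standard Markdown table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


# Hypothetical placeholder data: "North" spans two rows in the source image.
header = ["Region", "Quarter", "Status"]
rows = [
    ["North", "Q1", "open"],
    [None, "Q2", "closed"],
    ["South", "Q1", "open"],
]
table = to_markdown_table(header, fill_merged_cells(rows))
print(table)
```

Repeating the value keeps every row self-contained; leaving the cell blank instead is the other option the guideline allows, at the cost of rows that depend on their neighbors.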

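## Example Scenario

**Request**: "Convert these 3 screenshots of a financial report to markdown."

**Execution**:
1.  `list_dir` to list the files: `img1.png`, `img2.png`, `img3.png`.
2.  `view_file` to read `img1.png`.
3.  (Internal processing): Identify the header "Q1 Revenue" and the table row data.
4.  `view_file` to read `img2.png` and `img3.png`.
5.  `write_to_file` to create `financial_report.md` and write the combined content.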
