benchflow-ai / marker
Install for your project team
Run this command in your project directory to install the skill for your entire team:
mkdir -p .claude/skills/marker && curl -L -o skill.zip "https://fastmcp.me/Skills/Download/2569" && unzip -o skill.zip -d .claude/skills/marker && rm skill.zip
Project Skills
This skill will be saved in .claude/skills/marker/ and checked into git. All team members will have access to it automatically.
Important: Please verify the skill by reviewing its instructions before using it.
Convert PDF documents to Markdown using marker_single. Use when Claude needs to extract text content from PDFs while preserving LaTeX formulas, equations, and document structure. Ideal for academic papers and technical documents containing mathematical notation.
Skill Content
---
name: marker
description: Convert PDF documents to Markdown using marker_single. Use when Claude needs to extract text content from PDFs while preserving LaTeX formulas, equations, and document structure. Ideal for academic papers and technical documents containing mathematical notation.
---
# Marker PDF-to-Markdown Converter
Convert PDFs to Markdown while preserving LaTeX formulas and document structure. Uses the `marker_single` CLI from the marker-pdf package.
## Dependencies
- `marker_single` on PATH (`pip install marker-pdf` if missing)
- Python 3.10+ (available in the task image)
## Quick Start
```python
from scripts.marker_to_markdown import pdf_to_markdown
markdown_text = pdf_to_markdown("paper.pdf")
print(markdown_text)
```
## Python API
- `pdf_to_markdown(pdf_path, *, timeout=600, cleanup=True) -> str`
  - Runs `marker_single --output_format markdown --disable_image_extraction`
  - `cleanup=True`: use a temp directory and delete it after reading the Markdown
  - `cleanup=False`: keep outputs in `<pdf_stem>_marker/` next to the PDF
  - Exceptions: `FileNotFoundError` if the PDF is missing, `RuntimeError` for marker failures, `TimeoutError` if the run exceeds `timeout`
  - Tips: raise `timeout` for large PDFs; set `cleanup=False` to inspect intermediate files
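The documented exceptions suggest a simple calling pattern: retry on `TimeoutError` with a larger timeout, and let the other errors propagate. The sketch below is an illustration, not part of the skill. `convert_with_retry` and its `timeouts` parameter are hypothetical names, and the converter is passed in as an argument so the helper relies only on the `pdf_to_markdown(pdf_path, *, timeout, cleanup)` signature listed above.

```python
from typing import Callable, Sequence


def convert_with_retry(
    convert: Callable[..., str],
    pdf_path: str,
    timeouts: Sequence[int] = (600, 1800),  # hypothetical escalation schedule
) -> str:
    """Call `convert` with increasing timeouts, retrying only on TimeoutError.

    `convert` is expected to behave like the documented pdf_to_markdown.
    FileNotFoundError and RuntimeError are not caught here: a missing PDF or
    a marker failure is unlikely to be fixed by waiting longer.
    """
    if not timeouts:
        raise ValueError("timeouts must contain at least one value")
    last_error = None
    for timeout in timeouts:
        try:
            return convert(pdf_path, timeout=timeout, cleanup=True)
        except TimeoutError as exc:
            last_error = exc  # fall through to the next, larger timeout
    raise TimeoutError(
        f"{pdf_path}: conversion timed out after {len(timeouts)} attempt(s)"
    ) from last_error
```

With the skill's script importable, this would be called as `convert_with_retry(pdf_to_markdown, "paper.pdf")`. The default schedule is an arbitrary choice for illustration; pick values that suit the size of your PDFs.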
## Command-Line Usage
```bash
# Basic conversion (prints markdown to stdout)
python scripts/marker_to_markdown.py paper.pdf
# Keep temporary files
python scripts/marker_to_markdown.py paper.pdf --keep-temp
# Set the timeout explicitly (600 is also the Python API default)
python scripts/marker_to_markdown.py paper.pdf --timeout 600
```
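The script's own argument handling is not shown in this document. As a rough sketch of how the documented flags could map onto the Python API, assuming `argparse` and assuming the CLI shares the API's default of 600, a parser might look like the following. `build_parser` and `to_api_kwargs` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser accepting the flags documented above (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Convert a PDF to Markdown with marker_single."
    )
    parser.add_argument("pdf", help="path to the input PDF")
    parser.add_argument(
        "--keep-temp",
        action="store_true",
        help="keep marker outputs instead of deleting them",
    )
    parser.add_argument(
        "--timeout",
        type=int,
        default=600,  # assumption: same default as pdf_to_markdown
        help="timeout passed through to pdf_to_markdown",
    )
    return parser


def to_api_kwargs(args: argparse.Namespace) -> dict:
    """Translate parsed flags into pdf_to_markdown keyword arguments."""
    # --keep-temp corresponds to cleanup=False in the Python API.
    return {"timeout": args.timeout, "cleanup": not args.keep_temp}
```

The only non-obvious step is the inversion: `--keep-temp` on the command line corresponds to `cleanup=False` in the API.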
## Output Locations
- `cleanup=True`: outputs stored in a temporary directory and removed automatically
- `cleanup=False`: outputs saved to `<pdf_stem>_marker/`; markdown lives at `<pdf_stem>_marker/<pdf_stem>/<pdf_stem>.md` when present (otherwise the first `.md` file is used)
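When outputs are kept, the lookup rule above can be reproduced by hand to find the Markdown file. The helper below is a minimal sketch of that rule, not the script's actual code. `find_marker_markdown` is a hypothetical name, and "first `.md` file" is interpreted here as first in sorted path order, which is an assumption.

```python
from pathlib import Path
from typing import Optional


def find_marker_markdown(pdf_path: str) -> Optional[Path]:
    """Locate the Markdown output kept next to `pdf_path`, if any."""
    pdf = Path(pdf_path)
    out_dir = pdf.parent / f"{pdf.stem}_marker"
    if not out_dir.is_dir():
        return None  # nothing was kept (or the conversion never ran)

    # Preferred location: <pdf_stem>_marker/<pdf_stem>/<pdf_stem>.md
    preferred = out_dir / pdf.stem / f"{pdf.stem}.md"
    if preferred.is_file():
        return preferred

    # Fallback: any .md file under the output folder, in sorted order
    # (the ordering is an assumption; the document only says "first").
    candidates = sorted(out_dir.rglob("*.md"))
    return candidates[0] if candidates else None
```

For `paper.pdf` converted with `cleanup=False`, `find_marker_markdown("paper.pdf")` would return `paper_marker/paper/paper.md` when that file exists, and `None` when no Markdown was produced.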
## Troubleshooting
- `marker_single` not found: install `marker-pdf` or ensure the CLI is on PATH
- No Markdown output: re-run with `--keep-temp`/`cleanup=False` and check `stdout`/`stderr` saved in the output folder
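The first troubleshooting item can be checked up front, before any conversion is attempted. This is a small sketch using the standard library's `shutil.which`; `require_cli` is a hypothetical helper name, and the skill's script may perform its own check differently.

```python
import shutil


def require_cli(name: str = "marker_single") -> str:
    """Return the resolved path of `name`, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name} not found on PATH; try `pip install marker-pdf`"
        )
    return path
```

Calling `require_cli()` at startup turns a confusing subprocess failure into an immediate, actionable message.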