Best MCP Servers for Web Scraping
Turn any website into structured data with MCP servers built for web scraping. Extract content, crawl pages, and convert HTML to clean Markdown, all from your AI assistant.
DeepWiki
Turns any DeepWiki article into clean, structured Markdown you can use anywhere. The DeepWiki MCP Server safely crawls deepwiki.com pages, removes clutter such as ads and navigation, rewrites links for Markdown, and offers fast performance with customizable output formats. Output can be a single document or organized by page, which makes it easy to extract documentation or guides for any supported library. It is designed for secure, high-speed conversion and clear, easy-to-read results.
Jina AI
Integrates with Jina AI's web services to enable web content extraction, search, and fact-checking through natural language interactions.
Web Fetcher
Fetches and extracts web content using Playwright's headless browser capabilities, delivering clean, readable content from JavaScript-heavy websites in HTML or Markdown format for research and information gathering.
DuckDuckGo Search
Integrates with DuckDuckGo to provide web search capabilities, content fetching, and parsing, with results formatted for large language model consumption.
Chinese Trends Hub
Provides real-time access to trending topics and content from major Chinese platforms including Weibo, Zhihu, Douyin, Bilibili, Douban, Toutiao, and 36kr. Each platform is exposed as a separate tool, with temporary caching to improve performance.
FetchSERP
Integrates with FetchSERP API to provide SEO analysis, SERP data retrieval, web scraping, keyword research, backlink analysis, and domain intelligence across Google, Bing, Yahoo, and DuckDuckGo search engines.
Documentation Scraper
Provides specialized documentation scraping and retrieval from GitHub, NPM, PyPI, and web pages, enabling accurate reference to up-to-date library documentation without disrupting workflow.
Selenium WebDriver
Enables browser automation through Selenium WebDriver with support for Chrome, Firefox, and Edge browsers, providing navigation, element interaction, form handling, screenshot capture, JavaScript execution, and advanced actions for automated testing and web scraping tasks.
Airbnb
Integrates with Airbnb to enable vacation rental search and detailed property information retrieval without requiring API keys.
DuckDuckGo Search
Provides web search capabilities through DuckDuckGo, enabling content retrieval, URL processing, and metadata extraction with customizable filtering options.
Deep Research (Tavily)
Enables comprehensive web research by leveraging Tavily's Search and Crawl APIs to aggregate information from multiple sources, extract detailed content, and structure data specifically for generating technical documentation and research reports.
Baidu Search
Provides web search capabilities through Baidu's search engine, enabling retrieval of search results and webpage content with robust error handling and content parsing.
Serper Search and Scrape
Integrates with the Serper API to enable web searches and webpage content extraction, supporting research, content aggregation, and data mining tasks.
Serper (Google Search)
Enables AI to perform Google searches via the Serper API with support for location, language, and time period filters.
YouTube Transcript
Fetches and analyzes YouTube video transcripts by accepting URLs or video IDs and returning formatted transcript data with timestamps, so video content can be analyzed without watching it.
Google News & Trends
Integrates with Google News RSS feeds and Google Trends to provide news article search, trending topic retrieval, and optional content summarization for news monitoring and trend analysis workflows.
Playwright
Automate web browsers for testing, scraping, and visual analysis.
Apify Actor
Use 4,000+ pre-built cloud tools, known as Actors, to extract data from websites, e-commerce, social media, search engines, maps, and more.
Web UI Copy
Transforms webpage content into a fully inlined, script-free HTML document with base64-encoded resources, enabling comprehensive web page analysis and extraction.
One Search
Provides a unified search and web scraping platform that integrates multiple search providers like SearxNG and Tavily, along with Firecrawl for advanced web content extraction, enabling flexible web data retrieval and structured information gathering.
Playwright Browser Automation
Enables LLM-powered browser automation for web tasks including navigation, interaction, and content extraction through Playwright's comprehensive browser control capabilities.
Puppeteer Real Browser
Provides stealth browser automation using puppeteer-real-browser with anti-detection features, human-like interactions, proxy support, and captcha solving for web scraping, testing, and form automation that bypasses bot detection mechanisms.
Fetch and Convert
Fetches and converts web content to Markdown using JSDOM and Turndown.
YggTorrent
Provides secure access to YggTorrent through an unofficial API wrapper, enabling torrent searching with category filtering, detailed metadata retrieval, and magnet link generation with automatic passkey injection for authenticated downloads.
YouTube Transcripts
Extract and analyze video captions and subtitles in multiple languages.
YouTube Subtitles
Integrates YouTube subtitle retrieval for natural language queries about video content.
Read Website Fast
Extracts web content and converts it to clean Markdown format using Mozilla Readability for intelligent article detection, with disk-based caching, robots.txt compliance, and concurrent crawling capabilities for fast content processing workflows.
GitHub Repo Extractor
Connects to GitHub repositories, enabling natural language queries about code structure, dependencies, and development history.
RSS Feed Parser
Provides RSS feed parsing and retrieval with RSSHub integration. It automatically tries other instances when one fails and supports custom rsshub:// protocol URLs, giving access to current content from websites, social platforms, and news sources that don't natively provide RSS feeds.
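The instance fallback described for RSS Feed Parser can be pictured with a short Python sketch. This is not the server's actual code: the function names, the instance list, and the way an rsshub:// URL is mapped onto an instance are all assumptions made for illustration, and the second instance is a placeholder hostname.

```python
from typing import Callable, Sequence
from urllib.request import urlopen

# Illustrative instance list (an assumption, not taken from the server).
# The second entry is a placeholder for a self-hosted mirror.
DEFAULT_INSTANCES = ["https://rsshub.app", "https://rsshub.example.org"]


def resolve_rsshub_url(url: str, instance: str) -> str:
    """Map rsshub://some/route onto a concrete instance; pass other URLs through."""
    prefix = "rsshub://"
    if not url.startswith(prefix):
        return url
    route = url[len(prefix):].lstrip("/")
    return f"{instance.rstrip('/')}/{route}"


def http_get(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw feed bytes with the standard library."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_fallback(
    url: str,
    instances: Sequence[str] = DEFAULT_INSTANCES,
    fetch: Callable[[str], bytes] = http_get,
) -> bytes:
    """Try each instance in order and return the first successful response."""
    if not url.startswith("rsshub://"):
        return fetch(url)  # ordinary feed URL: no fallback needed
    errors = []
    for instance in instances:
        candidate = resolve_rsshub_url(url, instance)
        try:
            return fetch(candidate)
        except OSError as exc:  # covers URLError, HTTPError, and timeouts
            errors.append(f"{candidate}: {exc}")
    raise RuntimeError("all RSSHub instances failed: " + "; ".join(errors))
```

Passing the fetch function as a parameter keeps the fallback loop testable without network access; a caller would normally rely on the default and hand the returned bytes to whatever feed parser it uses.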