RAG Pipelines
The most common Firecrawl integration pattern is feeding web content into retrieval-augmented generation (RAG) systems. Firecrawl's clean markdown output removes the need for custom HTML parsing and produces consistently structured content that chunks well for vector databases.
Map
Discover all URLs on the target site using sitemap scanning
Batch Scrape
Scrape relevant pages and get clean markdown output
Chunk
Split markdown by heading hierarchy into semantic sections
Embed
Embed each chunk and store the vectors in Pinecone, Weaviate, or Chroma
Retrieve
Query relevant chunks when the LLM needs context
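
The Chunk step can be sketched in plain Python. This is a minimal illustration, not Firecrawl's own chunker: it assumes the scraped markdown uses ATX-style headings (`#` through `######`), and the names `Chunk` and `chunk_markdown` are hypothetical helpers introduced here. Each chunk keeps its heading path so the section context can be stored as metadata next to the embedding.

```python
import re
from dataclasses import dataclass

# Matches ATX headings such as "## Install" (1 to 6 leading '#' characters).
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    """Hypothetical container: one section of text plus its heading trail."""
    heading_path: list[str]
    text: str


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split markdown into one chunk per heading-delimited section."""
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []  # stack of (level, title) for open headings
    body: list[str] = []
    in_fence = False  # lines starting with '#' inside code fences are not headings

    def flush() -> None:
        # Emit the accumulated body under the heading path that was active.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk([title for _, title in path], text))
        body.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Close headings at the same or deeper level before opening this one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    sample = "# Guide\nIntro text.\n## Install\nRun the installer.\n"
    for chunk in chunk_markdown(sample):
        print(" > ".join(chunk.heading_path), "|", chunk.text)
```

Splitting on headings keeps each chunk aligned with a topic the page author already defined. Very long sections may still need a secondary size-based split before embedding.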
MCP setup for AI coding agents
Setup command:

    npx -y firecrawl-cli@latest init --all --browser

The MCP server exposes all endpoints as tools (firecrawl_scrape, firecrawl_map, firecrawl_search, firecrawl_crawl, firecrawl_extract, firecrawl_agent).

Remote hosted URL: https://mcp.firecrawl.dev/{API_KEY}/v2/mcp
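
For clients that connect to the remote hosted server, the URL has to be built from an API key. The sketch below fills in the `{API_KEY}` placeholder from the URL pattern above. Everything else is an assumption for illustration: the helper names, the `FIRECRAWL_API_KEY` environment variable, and the `mcpServers` config shape, which some MCP clients accept but which should be checked against the documentation of the client in use.

```python
import json
import os

# URL pattern taken from the text above; {api_key} stands in for {API_KEY}.
MCP_URL_TEMPLATE = "https://mcp.firecrawl.dev/{api_key}/v2/mcp"


def remote_mcp_url(api_key: str) -> str:
    """Return the remote hosted MCP URL for the given API key."""
    if not api_key or not api_key.strip():
        raise ValueError("API key must not be empty")
    return MCP_URL_TEMPLATE.format(api_key=api_key.strip())


def mcp_client_entry(api_key: str, server_name: str = "firecrawl") -> dict:
    """Build a client config entry.

    Assumption: the client reads a top-level "mcpServers" mapping whose
    entries carry a "url" field. Not every MCP client uses this shape.
    """
    return {"mcpServers": {server_name: {"url": remote_mcp_url(api_key)}}}


if __name__ == "__main__":
    # Assumption: the key is supplied through FIRECRAWL_API_KEY.
    key = os.environ.get("FIRECRAWL_API_KEY", "")
    if key:
        print(json.dumps(mcp_client_entry(key), indent=2))
    else:
        print("Set FIRECRAWL_API_KEY to generate a config entry.")
```

Because the key is embedded in the URL, the generated config should be treated as a secret and kept out of version control.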