NEW ScrapingAnt MCP for Claude Code, Cursor & Windsurf — try it free →
★★★★★ 5.0 on Capterra

LLM-ready Markdown. Web pages in one call.

A Markdown extraction API for LLM workflows. Pass a URL to /v2/markdown and get back clean, token-efficient Markdown — ~5-10× fewer tokens than raw HTML. No HTML cleanup, no XPath, no boilerplate. Drop straight into RAG pipelines, fine-tunes, or live agent context.

10 credits per call · failed requests cost 0 · cancel anytime

# Any URL → clean LLM-ready Markdown
cURL

$ curl -G 'https://api.scrapingant.com/v2/markdown' \
    --data-urlencode 'url=https://example.com/article' \
    -H 'x-api-key: YOUR_API_KEY'

Python

import requests

r = requests.get(
    'https://api.scrapingant.com/v2/markdown',
    params={'url': 'https://example.com/article'},
    headers={'x-api-key': 'YOUR_API_KEY'},
)
r.raise_for_status()

doc = r.json()
print(doc['markdown'])  # clean Markdown, no boilerplate

JavaScript

const res = await fetch(
  'https://api.scrapingant.com/v2/markdown?' +
  new URLSearchParams({
    url: 'https://example.com/article',
  }),
  { headers: { 'x-api-key': 'YOUR_API_KEY' } },
);

const { markdown } = await res.json();
console.log(markdown);  // ready to feed your LLM
[Illustration: a raw page (~80k tokens, with ad banners, sponsored blocks, comments, footer) goes through /v2/markdown and comes out as clean, LLM-ready Markdown (~9k tokens, ~9× smaller). No parsing, no XPath, no boilerplate.]
[Illustration: removed: <script> tags, ads & sponsored blocks, navigation & menus, cookie banners, comments & footers. Kept: headings (H1–H6), paragraphs & lists, links & tables, code blocks, images (with alt). Content-first extraction: readability-style heuristics + LLM-tuned cleanup.]
Cleanup

Drop the noise. Keep the signal.

Every page passes through a content-first extraction step that strips scripts, ads, sponsored blocks, navigation, cookie banners, comments, and footers. What's left is the article body — headings, paragraphs, lists, links, code, and tables — formatted as clean Markdown.

  • No HTML cleanup pass needed in your pipeline
  • Headings, lists, links, and tables preserved
  • Image alt text retained for vision-aware models
[Chart: average article tokenized with cl100k_base: raw HTML ≈ 80k tokens vs. /v2/markdown ≈ 9k tokens, ~9× smaller. Results vary by source; the savings hold. RAG impact: cheaper embeds, longer context.]
Token efficiency

Tokens that count stay in your context.

A typical article page costs ~80k tokens as raw HTML and around 9k as cleaned Markdown — roughly a 9× reduction. That means tighter context windows, lower embedding cost, and more documents per query in your RAG retriever.

  • ~9× smaller than raw HTML on typical articles
  • Cleaner chunks tokenize and embed predictably
  • Lower inference + retrieval costs at scale
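To see what the reduction means for your own corpus, a back-of-envelope estimator is enough. This sketch uses the common ~4-characters-per-token heuristic for cl100k_base-family tokenizers; the heuristic, the `estimate_tokens`/`savings_ratio` names, and the stand-in page sizes are all illustrative, not part of the API.

```python
# Rough token-savings estimate, assuming the common ~4-chars-per-token
# heuristic for cl100k_base-family tokenizers (real counts vary by source).

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def savings_ratio(raw_html: str, markdown: str) -> float:
    """How many times smaller the Markdown is, in estimated tokens."""
    return estimate_tokens(raw_html) / estimate_tokens(markdown)

# Stand-in payloads sized to match the figures above:
raw_html = "<div>" * 64000            # ~320k chars, i.e. ~80k tokens
markdown = "# Title\n\nBody. " * 2400  # ~36k chars, i.e. ~9k tokens

print(f"~{savings_ratio(raw_html, markdown):.0f}x smaller")
```

Swap in a real page and its Markdown output to check the ratio for your own sources before sizing embedding budgets.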
[Illustration: live mode: an agent calls /v2/markdown mid-thought and gets clean Markdown back in ~700ms (200 OK), then returns to the agent loop. Bulk mode: parallel jobs fan out a 12,000-line urls.txt, one Markdown response per call. Same endpoint, synchronous response, no concurrency cap; push your queue at the cluster.]
Live or bulk

On-demand for agents. Bulk for fine-tunes.

Same endpoint, two modes. Agents call /v2/markdown synchronously to read a page mid-thought — sub-second response, ready-to-tokenize Markdown back. Training pipelines fan out URL lists in parallel; we don't cap concurrency, so you collect Markdown for each page on your end at whatever rate your worker pool can drive. For MCP-aware clients (Claude, Cursor, Windsurf), the same Markdown tool is exposed through the ScrapingAnt MCP server as get_web_page_markdown — and added to Claude Code with a single claude mcp add command.

  • Drop straight into LangChain / LlamaIndex / your custom retriever
  • No concurrency cap — push as many parallel calls as your code can drive
  • Failed fetches don't cost credits — clean fanout retries
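A bulk fan-out can be as simple as a thread pool plus one retry pass for failures (which cost no credits). This is a sketch: `fan_out` and `fetch_markdown` are illustrative names, and `fetch_markdown` stands in for the GET request shown earlier, injected so you can wire in your own HTTP client and API key.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fan_out(urls, fetch_markdown, workers=32, retries=1):
    """Fetch Markdown for many URLs in parallel, re-queuing failures.

    `fetch_markdown(url)` should return the Markdown string or raise
    on failure. Failed requests cost 0 credits, so retries are cheap.
    """
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_markdown, u): u for u in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception:
                failed.append(url)
    if failed and retries > 0:
        # One clean retry pass over only the failures.
        retried, still_failed = fan_out(
            failed, fetch_markdown, workers, retries - 1
        )
        results.update(retried)
        failed = still_failed
    return results, failed
```

Because the endpoint is synchronous and uncapped, throughput is governed entirely by the `workers` value your own infrastructure can sustain.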
Markdown endpoint docs →
[Illustration: same stack as /v2/general: headless Chrome, rotating proxies, CAPTCHA avoidance, TLS fingerprinting, Cloudflare bypass, JS rendering, plus a Markdown extraction layer on top.]
Built on the cluster

Same anti-bot. Same proxy fleet.

Every /v2/markdown call routes through the same headless Chrome cluster, rotating proxies, TLS fingerprinting, and CAPTCHA avoidance that backs the JavaScript rendering API. JavaScript-rendered pages, Cloudflare / Akamai-protected sites, and SPA targets all return clean Markdown — same call, same reliability. Switch to residential proxies via proxy_type=residential for tougher targets.

  • Real headless Chrome — handles JS-rendered content
  • Rotating proxies + CAPTCHA avoidance out of the box
  • Switch to residential for tougher targets — same call
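Switching proxy pools is one extra query parameter. The `proxy_type=residential` name comes from the text above; the `markdown_url` helper below is an illustrative sketch of how the request URL changes, nothing more.

```python
from urllib.parse import urlencode

API_BASE = "https://api.scrapingant.com/v2/markdown"

def markdown_url(target_url: str, residential: bool = False) -> str:
    """Build the /v2/markdown request URL, optionally via residential proxies."""
    params = {"url": target_url}
    if residential:
        # Same call, tougher targets: just add proxy_type=residential.
        params["proxy_type"] = "residential"
    return API_BASE + "?" + urlencode(params)

print(markdown_url("https://example.com/article", residential=True))
```

Everything else about the call (the x-api-key header, the response shape) stays the same.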
Pricing

Industry-leading pricing that scales with your business.

Compare plans side by side. Every tier includes 10,000 free credits to start.
Plans

| | Enthusiast | Startup (★ Most Popular) | Business | Business Pro | Custom |
|---|---|---|---|---|---|
| Price | $19/mo | $49/mo | $249/mo | $599/mo | $699+/mo |
| Monthly API credits | 100,000 | 500,000 | 3,000,000 | 8,000,000 | 10M+ |
| Support channel | Email | Priority email | Priority email | Priority email | Priority + dedicated |
| Integration help | Docs only | Custom code snippets | Debug sessions | Priority debug sessions | Full enterprise onboarding |
| Expert assistance | – | Included | Included | Included | Included |
| Custom proxy pools | – | – | Included | Included | Included |
| Custom anti-bot avoidances | – | – | Included | Included | Included |
| Dedicated account manager | – | – | Included | Included | Included |
Hit your limit mid-month?
Restart your plan instantly — no waiting for the next billing cycle. Credits refresh the moment you pay, so scraping never has to stop.
10,000 free credits every month
No credit card required
Pay only for successful scrapes — failed requests cost 0
Customers

What teams are saying.

From solo developers shipping side projects to enterprise pipelines at Fortune 500s.

★★★★★ 5.0 on Capterra →
★★★★★

“Onboarding and API integration was smooth and clear. Everything works great. The support was excellent.”

Illia K.
Android Software Developer
★★★★★

“Great communication with co-founders helped me to get the job done. Great proxy diversity and good price.”

Andrii M.
Senior Software Engineer
★★★★★

“This product helps me to scale and extend my business. The setup is easy and support is really good.”

Dmytro T.
Senior Software Engineer
FAQ

Frequently asked questions.

Still curious? Get in touch with our team — we usually reply within hours.

What is LLM-ready data extraction?

LLM-ready data extraction is the workflow of fetching a web page, stripping the boilerplate (navigation, ads, scripts, footers, cookie banners), and returning the article content as clean Markdown that an LLM can consume directly — no HTML cleanup, no XPath rules, no token waste. ScrapingAnt's /v2/markdown endpoint is a Markdown extraction API: pass a URL, get back token-efficient Markdown ready for RAG pipelines, fine-tunes, or live agent context. Built on the same Headless Chrome and proxy stack as our JavaScript rendering API.

What does the Markdown extraction response look like?

JSON with two main fields: url (the URL we fetched) and markdown (the cleaned page content). Status code 200 on success. The Markdown preserves headings, lists, links, code blocks, and tables — but strips scripts, ads, navigation, footers, cookie banners, and other boilerplate. Same shape across every URL, so your RAG chunker doesn't need per-source rules.
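Because the shape is identical for every URL, a single source-agnostic chunker covers the whole corpus. A minimal sketch, assuming the two-field response described above; splitting on H1-H3 headings (and the `chunk_markdown` name) is our own heuristic for RAG chunking, not part of the API.

```python
import re

def chunk_markdown(doc: dict) -> list[dict]:
    """Split a {url, markdown} response into per-section chunks for RAG.

    Splits just before each H1-H3 heading so every chunk keeps its
    heading as context, and tags each chunk with the source URL.
    """
    parts = re.split(r"(?m)^(?=#{1,3} )", doc["markdown"])
    return [
        {"url": doc["url"], "text": part.strip()}
        for part in parts if part.strip()
    ]

doc = {
    "url": "https://example.com/article",
    "markdown": "# Title\n\nIntro.\n\n## Section\n\nBody text.",
}
for chunk in chunk_markdown(doc):
    print(chunk["url"], "->", chunk["text"].splitlines()[0])
```

Each chunk carries its URL, so retrieval hits can cite their source without per-site parsing rules.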

How is /v2/markdown different from /v2/general?

The /v2/general endpoint returns the rendered HTML — useful when you want full control over parsing or DOM access.

/v2/markdown takes the same rendered page and returns clean LLM-ready Markdown. ~5-10× fewer tokens than raw HTML, no boilerplate to strip, no XPath selectors to maintain. Same proxy / anti-bot stack underneath; different output format.

How is LLM-ready Markdown different from the AI data scraper?

Different shapes for different jobs. /v2/markdown returns the whole page as cleaned Markdown — best when you want full content for RAG indexing or fine-tuning. The AI data scraper (/v2/extract) returns typed JSON keyed to a plain-English schema you describe — best when you know exactly which fields you need (price, rating, address). Pick by output shape, not by capability.

Does Markdown extraction handle JavaScript-rendered pages?

Yes — every request runs through real headless Chrome by default. Single-page apps, lazy-loaded content, and React / Vue / Next.js pages all return clean Markdown of the final visible content. Same engine that powers our JavaScript rendering API.

Can I run /v2/markdown for live agents and bulk corpus jobs?

Yes — same endpoint, both modes. Agents call /v2/markdown synchronously and get a sub-second response. Training pipelines fan out URL lists in parallel; there's no concurrency cap, so you push as many simultaneous calls as your worker pool can drive. Same auth, same response shape, same credit cost per call. For agent integrations, the same Markdown tool is also exposed through the ScrapingAnt MCP server as get_web_page_markdown. Markdown endpoint docs →

How are credits charged for LLM-ready Markdown?

Each /v2/markdown call costs the same as a regular /v2/general request — based on your proxy choice. Default datacenter routing is 10 credits per request; switch to residential proxies when the source page is anti-bot-protected and the cost adjusts. Failed requests cost 0. Every account starts with 10,000 free credits per month, no card required.

Is LLM-ready Markdown good enough for fine-tuning?

For most public-web content, yes — the Markdown output is human-readable and tokenizes cleanly. For very structured datasets (catalogs, schema-rich pages), combine it with the AI data scraper so each row carries explicit fields. Mixed approaches work great for RAG: Markdown for narrative pages, typed JSON for tables.

Talk to us

Building an LLM pipeline at scale?

Volume crawls, custom extraction schemas, dedicated capacity, or a one-off RAG corpus — drop us a line and a real human gets back within a few hours.

“Our clients are pleasantly surprised by the response speed of our team.”

Oleg Kulyk
Founder, ScrapingAnt

A real human replies within a few hours · we don't share your email


Ready to scrape the web?

10,000 free credits every month. No credit card. Pay only for successful requests.

Sign up in under 30 seconds — no card, no commitment.