NEW ScrapingAnt MCP for Claude Code, Cursor & Windsurf — try it free →
★★★★★ 5.0 on Capterra

AI data scraper. Plain English in. Typed JSON out.

An AI scraping API for structured data extraction. Send a URL plus a plain-English schema — title, price(number), reviews(list: title, body) — and our LLM returns typed JSON. No CSS selectors, no XPath, no parsers to maintain when the site re-skins.

failed requests cost 0 · cancel anytime · NDA on request

# Plain-English schema in. Typed JSON out.
$ curl -G 'https://api.scrapingant.com/v2/extract' \
    --data-urlencode 'url=https://shop.example.com/widget-pro' \
    --data-urlencode 'extract_properties=title, price(number), reviews(list: title, body)' \
    -H 'x-api-key: YOUR_API_KEY'
import requests

r = requests.get(
    'https://api.scrapingant.com/v2/extract',
    params={
        'url': 'https://shop.example.com/widget-pro',
        'extract_properties':
            'title, price(number), reviews(list: title, body)',
    },
    headers={'x-api-key': 'YOUR_API_KEY'},
)

data = r.json()  # { title, price, reviews: [...] }
print(data['price'])
const res = await fetch(
  'https://api.scrapingant.com/v2/extract?' +
  new URLSearchParams({
    url: 'https://shop.example.com/widget-pro',
    extract_properties:
      'title, price(number), reviews(list: title, body)',
  }),
  { headers: { 'x-api-key': 'YOUR_API_KEY' } },
);

const { title, price, reviews } = await res.json();
console.log(title, price, reviews.length);
[Diagram: raw page (shop.example.com, $29.99, reviews) → /v2/extract → LLM extracts → response.json { "title": "…", "price": 29.99, "reviews": [{ title: …, body: … }] } — typed JSON, camelCase keys, types coerced; no selectors, no XPath, no broken parsers]
Schema examples:
  • Simple — title, price, description
  • With types — title, price(number), in_stock(boolean)
  • Nested list — reviews(list: title, body, rating(number))
  • Nested object — specs(object: weight(number), dimensions)
Schema syntax

Describe what you want, in plain English.

Pass a comma-separated list of fields you want to extract. Add types in parentheses when it matters: price(number), in_stock(boolean). Nest lists or objects when the data is hierarchical: reviews(list: title, body, rating(number)).

  • Free-form names get camelCased automatically in the response
  • Type hints: (number), (boolean), (list), (object)
  • Nest as deep as you need — the LLM unrolls the structure
AI extractor docs →
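For programmatic use, the schema string can be composed from ordinary Python data. The helper below is purely illustrative — it is not part of any ScrapingAnt SDK — but it emits strings that follow the grammar described above:

```python
def schema(fields: dict) -> str:
    """Compose an extract_properties string from a dict.

    Values: None for untyped fields, a type name like "number",
    or a ("list" / "object", subfields) tuple for nesting.
    Illustrative helper only — not part of the ScrapingAnt API or SDK.
    """
    parts = []
    for name, spec in fields.items():
        if spec is None:
            parts.append(name)                        # title
        elif isinstance(spec, str):
            parts.append(f"{name}({spec})")           # price(number)
        else:
            kind, sub = spec                          # reviews(list: ...)
            parts.append(f"{name}({kind}: {schema(sub)})")
    return ", ".join(parts)

print(schema({
    "title": None,
    "price": "number",
    "reviews": ("list", {"title": None, "body": None, "rating": "number"}),
}))
# -> title, price(number), reviews(list: title, body, rating(number))
```

Pass the resulting string straight into the extract_properties parameter shown in the examples above.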
[Diagram: source v1 ($29.99) and re-skinned v2 (29.99 USD, sale) both yield the same JSON shape { "title": "…", "price": 29.99, "description": … } — no selectors, no XPath or CSS rules to drift, so re-skins don't break extraction]
Resilience

Site changes don't break the AI scraper.

Traditional scrapers depend on CSS selectors and XPath rules — the moment a target re-skins, you spend a sprint patching parsers. The AI data scraper reads the rendered DOM each call, so a moved element, renamed class, or restructured layout still produces the same JSON shape. That self-healing behaviour is the main reason teams move long-running pipelines off bespoke parsers and onto LLM extraction.

  • No selectors to maintain or test against
  • Schema stays consistent across visual variants
  • Missing fields return null — never crash your pipeline
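What that looks like downstream can be sketched in a few lines — the response dict here is hypothetical, standing in for a /v2/extract result after a re-skin hid the price:

```python
# Hypothetical /v2/extract result where "price" vanished after a re-skin:
data = {"title": "Widget Pro", "price": None, "reviews": []}

# Missing fields come back as null (None in Python) — no KeyError,
# no parser exception — so the pipeline degrades instead of crashing.
price = data.get("price")
if price is None:
    print("price missing — skip or log this page")
else:
    print(f"price: {price:.2f}")
```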
Coercion examples (page text → JSON value):
  • price — "$29.99 USD" → 29.99 (number)
  • in stock — "In stock — ships today" → true (boolean)
  • tags — "hot, sale, new" → ["hot", "sale"] (list)
Same coercion for (boolean), (object), and nested fields. Describe once, get the right type back every call.
Typed output

Numbers stay numbers. Lists stay lists.

Annotate types in your schema and the extractor coerces values into the right shape — strings to numbers, "in stock — ships today" to true, comma-separated text to arrays, nested data to objects. No post-processing on your end.

  • Strings, numbers, booleans, lists, objects
  • Coerces dirty input — currency strings, status phrases, free-text lists
  • Stable JSON shape across pages — easy to feed downstream
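The practical payoff is that the response can be used without defensive casting. The dict below is illustrative, mirroring the coercion examples above for a schema like price(number), in_stock(boolean), tags(list) — note in_stock arrives camelCased as inStock:

```python
# Illustrative /v2/extract response shape — values mirror the coercion
# examples above; the API returns these already typed, not as strings.
data = {"price": 29.99, "inStock": True, "tags": ["hot", "sale"]}

assert isinstance(data["price"], float)  # "$29.99 USD"             -> 29.99
assert data["inStock"] is True           # "In stock — ships today" -> true
assert isinstance(data["tags"], list)    # "hot, sale, new"         -> array

subtotal = data["price"] * 3             # safe arithmetic, no casting
```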
[Diagram: same stack as /v2/general — headless Chrome, rotating proxies, CAPTCHA avoidance, TLS fingerprinting, Cloudflare bypass, JS rendering — with an LLM extraction layer on top]
Built on the cluster

Same anti-bot. Same proxy fleet.

Every /v2/extract call routes through the same headless Chrome cluster, rotating proxies, and CAPTCHA avoidance that backs the JavaScript rendering API. JavaScript-rendered grids, lazy-loaded tables, and Cloudflare-protected sources all extract correctly because the LLM sees the final rendered DOM, not the initial HTML.

  • Real headless Chrome — extracts from the rendered DOM
  • Rotating proxies + CAPTCHA avoidance out of the box
  • Switch to residential proxies for tougher targets — same call
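Switching pools is a one-parameter change. The sketch below only builds the request URL rather than sending it; proxy_type='residential' follows the FAQ on this page, so treat the exact option set as something to confirm against the API reference:

```python
from urllib.parse import urlencode

# Same /v2/extract call, routed through the residential pool for a
# tougher target. Send the resulting URL with your HTTP client of
# choice plus the x-api-key header, as in the earlier examples.
params = {
    "url": "https://shop.example.com/widget-pro",
    "extract_properties": "title, price(number)",
    "proxy_type": "residential",  # residential proxies, same JSON shape
}
endpoint = "https://api.scrapingant.com/v2/extract?" + urlencode(params)
print(endpoint)
```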
Pricing

Industry-leading pricing that scales with your business.

Compare plans side by side. Every tier includes 10,000 free credits to start.
Plans

| | Enthusiast | Startup (★ Most Popular) | Business | Business Pro | Custom |
| --- | --- | --- | --- | --- | --- |
| Price | $19/mo | $49/mo | $249/mo | $599/mo | $699+/mo |
| Monthly API credits | 100,000 | 500,000 | 3,000,000 | 8,000,000 | 10M+ |
| Support channel | Email | Priority email | Priority email | Priority email | Priority + dedicated |
| Integration help | Docs only | Custom code snippets | Debug sessions | Priority debug sessions | Full enterprise onboarding |
| Expert assistance | — | ✓ | ✓ | ✓ | ✓ |
| Custom proxy pools | — | — | ✓ | ✓ | ✓ |
| Custom anti-bot avoidances | — | — | ✓ | ✓ | ✓ |
| Dedicated account manager | — | — | ✓ | ✓ | ✓ |
| Get started | Start Free | Start Free | Start Free | Start Free | Talk to Sales |
Hit your limit mid-month?
Restart your plan instantly — no waiting for the next billing cycle. Credits refresh the moment you pay, so scraping never has to stop.
10,000 free credits every month
No credit card required
Pay only for successful scrapes — failed requests cost 0
Customers

What teams are saying.

From solo developers shipping side projects to enterprise pipelines at Fortune 500s.

★★★★★ 5.0 on Capterra →
★★★★★

“Onboarding and API integration was smooth and clear. Everything works great. The support was excellent.”

Illia K.
Android Software Developer
★★★★★

“Great communication with co-founders helped me to get the job done. Great proxy diversity and good price.”

Andrii M.
Senior Software Engineer
★★★★★

“This product helps me to scale and extend my business. The setup is easy and support is really good.”

Dmytro T.
Senior Software Engineer
FAQ

Frequently asked questions.

Still curious? Get in touch with our team — we usually reply within hours.

What is an AI data scraper?

An AI data scraper takes a URL plus a plain-English description of the fields you want, then returns structured JSON — no CSS selectors, no XPath, no parser code. ScrapingAnt's /v2/extract endpoint is an AI scraping API: send url and extract_properties, an LLM reads the rendered page, and you get typed JSON back. It pairs the same headless-Chrome rendering and rotating-proxy stack that powers our JavaScript rendering API with an LLM extraction layer on top.

How does the AI extraction schema work?

extract_properties takes free-form text describing the fields you want. Simple example: title, price, description. Typed example: title, price(number), description. Lists with nested fields: reviews(list: title, body, rating(number)).

Free-form names get camelCased in the response — "product title" becomes productTitle. See the AI extractor docs for the full grammar.
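If you need to predict response keys ahead of time, the camelCasing can be approximated locally. This tiny helper is an assumption about the server-side rule, not an official client function:

```python
def camel(name: str) -> str:
    """Approximate the API's server-side camelCasing of free-form
    field names (illustrative — the real rule lives in the service)."""
    first, *rest = name.replace("_", " ").split()
    return first.lower() + "".join(w.capitalize() for w in rest)

print(camel("product title"))  # -> productTitle
print(camel("in_stock"))       # -> inStock
```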

How is the AI data scraper different from LLM-ready Markdown?

The LLM-ready Markdown endpoint (/v2/markdown) returns the whole page as clean Markdown — best for RAG pipelines or fine-tunes when you want full content.

The AI data scraper (/v2/extract) returns typed JSON keyed to the fields you described — best when you know exactly which data points you need (price, rating, address) and don't want to write a parser. Same proxy + browser stack underneath; different output format.

What happens when the source site changes its layout?

Most of the time, nothing — because there are no CSS selectors or XPath rules to break. The LLM works from the rendered DOM each call, so a re-skinned page usually still produces the same JSON shape. If a field truly disappears, you'll get null for that field, not a parser error. This is what makes AI scraping more resilient than traditional scraping for long-running data pipelines.

Can fields return numbers, booleans, lists, or nested objects?

Yes. Annotate types in the schema: price(number), in_stock(boolean), tags(list), specs(object: weight, dimensions). The AI extractor coerces values into the requested types where possible — currency strings become numbers, status phrases become booleans, comma-separated text becomes arrays.

How are credits charged for AI extraction?

AI extraction is metered in API credits and costs more per call than a plain /v2/general request — the LLM inference is the expensive part. Exact credit cost depends on the proxy and rendering options you choose; see the AI extractor docs for the current rate. Failed requests cost 0 credits, and every account starts with 10,000 free credits.

Does the AI scraper handle JavaScript-heavy pages?

Yes — every request runs through real headless Chrome by default. SPAs, lazy-loaded grids, and React/Vue-rendered pages all extract correctly because the LLM sees the final rendered DOM, not the initial HTML. For pages with anti-bot walls, switch proxy_type to residential to route through our residential proxy pool — same call, same JSON shape back.

Can I call the AI scraper from an AI agent?

Yes. The same extraction is exposed through the ScrapingAnt MCP server so Claude, Cursor, Windsurf, Claude Code, and other MCP clients can call /v2/extract mid-conversation. Pass the URL and a plain-English schema; the agent gets typed JSON back without leaving the chat.

Talk to us

Extracting millions of pages?

Volume crawls, custom schemas across many domains, dedicated capacity, or a one-shot dataset — drop us a line and a real human gets back within a few hours.

“Our clients are pleasantly surprised by the response speed of our team.”

Oleg Kulyk
Founder, ScrapingAnt

A real human replies within a few hours · we don't share your email

Thanks — we'll be in touch shortly.
Something went wrong submitting the form. Please try again or email us directly.

Ready to scrape the web?

10,000 free credits every month. No credit card. Pay only for successful requests.

Sign up in under 30 seconds — no card, no commitment.