AI data scraper. Plain English in. Typed JSON out.
An AI scraping API for structured data extraction. Send a URL plus a plain-English schema — title, price(number), reviews(list: title, body) — and our LLM returns typed JSON. No CSS selectors, no XPath, no parsers to maintain when the site re-skins.
failed requests cost 0 · cancel anytime · NDA on request
# Plain-English schema in. Typed JSON out.
```shell
curl 'https://api.scrapingant.com/v2/extract' \
  --data-urlencode 'url=https://shop.example.com/widget-pro' \
  --data-urlencode 'extract_properties=title, price(number), reviews(list: title, body)' \
  -H 'x-api-key: YOUR_API_KEY'
```

```python
import requests

r = requests.get(
    'https://api.scrapingant.com/v2/extract',
    params={
        'url': 'https://shop.example.com/widget-pro',
        'extract_properties':
            'title, price(number), reviews(list: title, body)',
    },
    headers={'x-api-key': 'YOUR_API_KEY'},
)
data = r.json()  # { title, price, reviews: [...] }
print(data['price'])
```

```javascript
const res = await fetch(
  'https://api.scrapingant.com/v2/extract?' +
    new URLSearchParams({
      url: 'https://shop.example.com/widget-pro',
      extract_properties:
        'title, price(number), reviews(list: title, body)',
    }),
  { headers: { 'x-api-key': 'YOUR_API_KEY' } },
);
const { title, price, reviews } = await res.json();
console.log(title, price, reviews.length);
```

Why teams pick AI extraction over parsers.
Describe the data once in plain English. Let the LLM handle the messy DOM.
Plain English schema
Comma-separated fields. Optional types. Nested lists if you need them.
See the syntax →

Self-healing
Layout changes don't break extraction — there's no selector to drift.
Why it holds →

Typed output
Numbers, booleans, lists, nested objects. Coerced to the types you asked for.
Type docs →

Describe what you want, in plain English.
Pass a comma-separated list of fields you want to extract. Add types in parentheses when it matters: price(number), in_stock(boolean). Nest lists or objects when the data is hierarchical: reviews(list: title, body, rating(number)).
- Free-form names — get camelCased automatically in the response
- Type hints: (number), (boolean), (list), (object)
- Nest as deep as you need — the LLM unrolls the structure
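Because the schema is just a string, it can also be assembled programmatically. A minimal sketch, using a hypothetical `build_schema` helper (not part of the API, which only ever sees the final string):

```python
def build_schema(fields):
    """Render a schema dict into the comma-separated extract_properties
    syntax, e.g. {'price': 'number'} -> 'price(number)'.
    Hypothetical helper for illustration only."""
    parts = []
    for name, spec in fields.items():
        if spec is None:                     # untyped field: just the name
            parts.append(name)
        elif isinstance(spec, dict):         # nested list/object
            kind = spec.get('_kind', 'list')
            inner = build_schema({k: v for k, v in spec.items() if k != '_kind'})
            parts.append(f'{name}({kind}: {inner})')
        else:                                # simple type hint
            parts.append(f'{name}({spec})')
    return ', '.join(parts)

schema = build_schema({
    'title': None,
    'price': 'number',
    'reviews': {'_kind': 'list', 'title': None, 'body': None},
})
# schema == 'title, price(number), reviews(list: title, body)'
```

The resulting string is what goes into `extract_properties`, exactly as in the examples above.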
Site changes don't break the AI scraper.
Traditional scrapers depend on CSS selectors and XPath rules — the moment a target re-skins, you spend a sprint patching parsers. The AI data scraper reads the rendered DOM each call, so a moved element, renamed class, or restructured layout still produces the same JSON shape. That self-healing behaviour is the main reason teams move long-running pipelines off bespoke parsers and onto LLM extraction.
- No selectors to maintain or test against
- Schema stays consistent across visual variants
- Missing fields return null — never a crash in your pipeline
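Since a vanished field comes back as null rather than raising a parser error, downstream code only needs a default value. A small illustration in Python (the response dict is made up for the example):

```python
# Illustrative response for a page where the price element disappeared:
data = {'title': 'Widget Pro', 'price': None, 'reviews': []}

# Missing fields are null (None in Python), so a plain default suffices;
# there is no parser exception to catch.
price = data['price'] if data['price'] is not None else 0.0
rows = [(data['title'], price, len(data['reviews']))]
# rows == [('Widget Pro', 0.0, 0)]
```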
Numbers stay numbers. Lists stay lists.
Annotate types in your schema and the extractor coerces values into the right shape — strings to numbers, "in stock — ships today" to true, comma-separated text to arrays, nested data to objects. No post-processing on your end.
- Strings, numbers, booleans, lists, objects
- Coerces dirty input — currency strings, status phrases, free-text lists
- Stable JSON shape across pages — easy to feed downstream
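The coercion happens on the API side, but its effect is easy to picture. The sketch below locally mimics the kinds of conversions described, as an illustration only, not the extractor's actual logic:

```python
import re

def coerce_number(raw):
    """Pull a number out of a currency string, e.g. '$1,299.00' -> 1299.0."""
    m = re.search(r'[\d,]+(?:\.\d+)?', raw)
    return float(m.group().replace(',', '')) if m else None

def coerce_boolean(raw):
    """Map a stock-status phrase to True/False."""
    negatives = ('out of stock', 'unavailable', 'sold out')
    return not any(neg in raw.lower() for neg in negatives)

def coerce_list(raw):
    """Split comma-separated free text into an array."""
    return [item.strip() for item in raw.split(',') if item.strip()]

coerce_number('$1,299.00')                 # -> 1299.0
coerce_boolean('in stock — ships today')   # -> True
coerce_list('red, green, blue')            # -> ['red', 'green', 'blue']
```

With the API, you never write these converters yourself; annotating `(number)`, `(boolean)`, or `(list)` in the schema requests the same outcome.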
Same anti-bot. Same proxy fleet.
Every /v2/extract call routes through the same headless Chrome cluster, rotating proxies, and CAPTCHA avoidance that backs the JavaScript rendering API. JavaScript-rendered grids, lazy-loaded tables, and Cloudflare-protected sources all extract correctly because the LLM sees the final rendered DOM, not the initial HTML.
- Real headless Chrome — extracts from the rendered DOM
- Rotating proxies + CAPTCHA avoidance out of the box
- Switch to residential proxies for tougher targets — same call
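Switching proxies is one extra query parameter. A sketch of building such a request URL, assuming the `proxy_type=residential` option described in the FAQ below (check the docs for current option names):

```python
from urllib.parse import urlencode

params = {
    'url': 'https://shop.example.com/widget-pro',
    'extract_properties': 'title, price(number)',
    'proxy_type': 'residential',  # the only change vs. a default call
}
request_url = 'https://api.scrapingant.com/v2/extract?' + urlencode(params)
# Send request_url with any HTTP client, x-api-key header as usual;
# the JSON shape of the response is unchanged.
```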
What teams extract with the AI scraper.
Common briefs we see — yours doesn't have to fit any of these.
Product catalogs
Title, price, SKU, specs, variants, stock — described once in English, returned as typed JSON across thousands of SKUs.
Talk to us →

Reviews & ratings
Star ratings, review titles + bodies, reviewer names, dates — extracted as a list, ready for sentiment or aggregation.
Talk to us →

Real estate listings
Address, price, beds, baths, square footage, list date, agent — clean rows from any portal, ready for normalization.
Talk to us →

Job & talent listings
Title, company, location, salary, requirements, posted date — pulled from boards, company pages, or regional portals.
Talk to us →

News & articles
Headline, byline, publish date, body text, tags — structured output ready for downstream filtering or summarisation.
Talk to us →

Lead enrichment
Take public profiles, return job title, company, tech stack, location — predictable schema across messy source pages.
Talk to us →Pricing
Industry-leading pricing that scales with your business.
| Plans | Enthusiast | Startup (★ Most Popular) | Business | Business Pro | Custom |
|---|---|---|---|---|---|
| Price | $19/mo | $49/mo | $249/mo | $599/mo | $699+/mo |
| Monthly API credits | 100,000 | 500,000 | 3,000,000 | 8,000,000 | 10M+ |
| Support channel | Priority email | Priority email | Priority email | Priority + dedicated | |
| Integration help | Docs only | Custom code snippets | Debug sessions | Priority debug sessions | Full enterprise onboarding |
| Expert assistance | — | ✓ | ✓ | ✓ | ✓ |
| Custom proxy pools | — | — | ✓ | ✓ | ✓ |
| Custom anti-bot avoidances | — | — | ✓ | ✓ | ✓ |
| Dedicated account manager | — | — | ✓ | ✓ | ✓ |
| | Start Free | Start Free | Start Free | Start Free | Talk to Sales |
What teams are saying.
From solo developers shipping side projects to enterprise pipelines at Fortune 500s.
★★★★★ 5.0 on Capterra →

★★★★★ “Onboarding and API integration was smooth and clear. Everything works great. The support was excellent.”
★★★★★“Great communication with co-founders helped me to get the job done. Great proxy diversity and good price.”
★★★★★“This product helps me to scale and extend my business. The setup is easy and support is really good.”
Frequently asked questions.
Still curious? Get in touch with our team — we usually reply within hours.
What is an AI data scraper?
An AI data scraper takes a URL plus a plain-English description of the fields you want, then returns structured JSON — no CSS selectors, no XPath, no parser code. ScrapingAnt's /v2/extract endpoint is an AI scraping API: send url and extract_properties, an LLM reads the rendered page, and you get typed JSON back. It pairs the same headless-Chrome rendering and rotating-proxy stack that powers our JavaScript rendering API with an LLM extraction layer on top.
How does the AI extraction schema work?
extract_properties takes free-form text describing the fields you want. Simple example: title, price, description. Typed example: title, price(number), description. Lists with nested fields: reviews(list: title, body, rating(number)).
Free-form names get camelCased in the response — "product title" becomes productTitle. See the AI extractor docs for the full grammar.
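The renaming follows ordinary camelCase rules. A hypothetical helper that mirrors the documented convention (illustrative, not the API's own code):

```python
def camel_case(name):
    """'product title' -> 'productTitle', matching the response key convention."""
    words = name.strip().split()
    return words[0].lower() + ''.join(w.capitalize() for w in words[1:])

camel_case('product title')    # -> 'productTitle'
camel_case('list price USD')   # -> 'listPriceUsd'
```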
How is the AI data scraper different from LLM-ready Markdown?
The LLM-ready Markdown endpoint (/v2/markdown) returns the whole page as clean Markdown — best for RAG pipelines or fine-tunes when you want full content.
The AI data scraper (/v2/extract) returns typed JSON keyed to the fields you described — best when you know exactly which data points you need (price, rating, address) and don't want to write a parser. Same proxy + browser stack underneath; different output format.
What happens when the source site changes its layout?
Most of the time, nothing — because there are no CSS selectors or XPath rules to break. The LLM works from the rendered DOM each call, so a re-skinned page usually still produces the same JSON shape. If a field truly disappears, you'll get null for that field, not a parser error. This is what makes AI scraping more resilient than traditional scraping for long-running data pipelines.
Can fields return numbers, booleans, lists, or nested objects?
Yes. Annotate types in the schema: price(number), in_stock(boolean), tags(list), specs(object: weight, dimensions). The AI extractor coerces values into the requested types where possible — currency strings become numbers, status phrases become booleans, comma-separated text becomes arrays.
How are credits charged for AI extraction?
AI extraction is metered in API credits and costs more per call than a plain /v2/general request — the LLM inference is the expensive part. Exact credit cost depends on the proxy and rendering options you choose; see the AI extractor docs for the current rate. Failed requests cost 0 credits, and every account starts with 10,000 free credits.
Does the AI scraper handle JavaScript-heavy pages?
Yes — every request runs through real headless Chrome by default. SPAs, lazy-loaded grids, and React/Vue-rendered pages all extract correctly because the LLM sees the final rendered DOM, not the initial HTML. For pages with anti-bot walls, switch proxy_type to residential to route through our residential proxy pool — same call, same JSON shape back.
Can I call the AI scraper from an AI agent?
Yes. The same extraction is exposed through the ScrapingAnt MCP server so Claude, Cursor, Windsurf, Claude Code, and other MCP clients can call /v2/extract mid-conversation. Pass the URL and a plain-English schema; the agent gets typed JSON back without leaving the chat.
Extracting millions of pages?
Volume crawls, custom schemas across many domains, dedicated capacity, or a one-shot dataset — drop us a line and a real human gets back within a few hours.
“Our clients are pleasantly surprised by the response speed of our team.”