What is DIY scraping?
DIY scraping means you build and maintain the entire pipeline yourself:
- A scraper (BeautifulSoup, Playwright, Puppeteer, Scrapy)
- A proxy layer for IP rotation
- Retry and error-handling logic
- A scheduler to run jobs on a recurring basis
What is Bright Data Web Scraper API?
Bright Data Web Scraper API is a collection of 650+ pre-built scrapers maintained by Bright Data’s engineering team, covering top sites including LinkedIn, Amazon, Instagram, YouTube, TikTok, Google Maps, and many more. You send a URL and get structured JSON back: no parsing, no selectors, no proxy configuration. Each scraper returns an average of 220+ data fields, covering granular details like rich snippets, map coordinates, ad extensions, and structured metadata that most DIY scrapers miss.
If your target site isn’t covered, you can build a custom scraper in minutes using Bright Data Scraper Studio: just pass a URL and a plain-language description of the data you need. And when a site changes its frontend and breaks your scraper, the Self-healing tool rewrites the affected code based on a prompt, so you don’t need to dig into the script manually.
The same scrape, two ways
Here’s what scraping an Amazon product page looks like with each approach. First, DIY with Playwright, where you write and maintain every selector:
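A minimal sketch of the DIY path. The `productTitle` element id is an assumption about Amazon's current markup, and it is exactly the kind of detail you have to keep up to date yourself. The fetch step requires `pip install playwright` plus a browser download; the parsing step is stdlib-only:

```python
# DIY sketch: fetch with Playwright, parse with a hand-written selector.
# Both halves are yours to maintain when the site changes.
from html.parser import HTMLParser


def fetch_html(url):
    # Lazy import: needs `pip install playwright` and `playwright install`.
    from playwright.sync_api import sync_playwright
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="domcontentloaded")
        html = page.content()
        browser.close()
        return html


class TitleParser(HTMLParser):
    """Extracts the text of the element with id="productTitle" --
    the selector you rewrite whenever the frontend changes."""

    def __init__(self):
        super().__init__()
        self._capturing = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == "productTitle":
            self._capturing = True

    def handle_data(self, data):
        if self._capturing and data.strip():
            self.title = data.strip()
            self._capturing = False


def parse_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title
```

In a real pipeline you would still need to wrap this in retries, proxy rotation, and anti-bot handling, which is the bulk of the maintenance burden.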
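With Web Scraper API, the same page is a single authenticated HTTP call. A hedged stdlib sketch: the endpoint path, the `dataset_id` parameter, and the payload shape below are illustrative, so check Bright Data's API reference for the exact values:

```python
import json
import urllib.request

API_BASE = "https://api.brightdata.com"  # assumed base URL; see the docs


def build_scrape_request(token, dataset_id, url):
    # Synchronous /scrape call: one URL in, structured JSON out.
    payload = json.dumps({"url": url}).encode()
    return urllib.request.Request(
        f"{API_BASE}/datasets/v3/scrape?dataset_id={dataset_id}",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape(token, dataset_id, url):
    # Returns structured fields -- no selectors or proxies to maintain.
    with urllib.request.urlopen(build_scrape_request(token, dataset_id, url)) as resp:
        return json.load(resp)
```

The response is already structured JSON (title, price, rating, and the other ~220 fields), so there is no parsing step on your side.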
Key differences
| | DIY (proxies + custom code) | Web Scraper API |
|---|---|---|
| Scraping logic | You write and maintain selectors | Pre-built for each site |
| Anti-bot handling | You manage CAPTCHAs, fingerprinting, stealth | Automatic |
| When sites change | You debug and redeploy | Bright Data updates the scraper |
| Output format | Raw HTML you parse into structured data | Structured JSON/CSV with named fields |
| Data fields | Only what you code selectors for | 220+ fields on average per scraper |
| Time to first result | Hours to days | Minutes |
| Ongoing maintenance | You, indefinitely | Bright Data’s scraper team |
| Success rate | Typically 60–85% depending on anti-bot investment | 98.44% average |
| Supported sites | Any site you can write a parser for | 650+ pre-built; custom via Scraper Studio |
Supported sites
The Scrapers Library includes ready-made scrapers across categories:
| Category | Example sites |
|---|---|
| E-commerce | Amazon, Walmart, eBay, Shopify |
| Social media | LinkedIn, Instagram, TikTok, X, Facebook |
| Search engines | Google, Bing, Yahoo, DuckDuckGo |
| Real estate | Zillow, Realtor, Redfin |
| Travel | Booking.com, Tripadvisor, Airbnb |
| Jobs & B2B | Indeed, Glassdoor, Crunchbase |
Sync vs async collection
Web Scraper API supports two collection modes:
| Mode | Endpoint | Best for | Concurrency limit |
|---|---|---|---|
| Synchronous | /scrape | Single-URL lookups, price checks, CRM enrichment | 5,000 concurrent requests |
| Asynchronous | /trigger | Batch jobs with hundreds or thousands of URLs | 100 concurrent jobs, 1 GB input each |
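The async flow comes down to a trigger-and-poll loop. In the sketch below, `get_status` is a placeholder for whatever call you make to Bright Data's progress endpoint (its name and the status strings are assumptions, not the SDK's actual API):

```python
import time


def poll_snapshot(get_status, snapshot_id, interval=5.0, max_tries=120):
    """Poll an async job until it is ready.

    get_status: callable taking a snapshot_id and returning a status
    string -- here assumed to be 'running', 'ready', or 'failed'.
    """
    for _ in range(max_tries):
        status = get_status(snapshot_id)
        if status == "ready":
            return True  # results can now be downloaded or delivered
        if status == "failed":
            raise RuntimeError(f"snapshot {snapshot_id} failed")
        time.sleep(interval)
    raise TimeoutError(f"snapshot {snapshot_id} not ready after {max_tries} polls")
```

In production you would usually prefer the webhook delivery option over polling, since it removes the loop entirely.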
Async jobs return a snapshot_id: you poll for progress or receive results via webhook. Delivery options include webhooks (JSON, NDJSON, CSV), S3, Google Cloud, and Snowflake. Sync requests have a 1-minute timeout; if the scrape takes longer, the request auto-converts to async and returns a snapshot_id.
When to use what
| If you need… | Use | You write code? | You maintain scrapers? |
|---|---|---|---|
| Structured data from popular sites | Web Scraper API | API calls only | No |
| Raw HTML from any site (custom parsing) | Web Unlocker | Yes | Yes |
| JS-heavy pages (clicks, scrolls, forms) | Bright Data Browser API | Yes (Playwright/Puppeteer) | Yes |
| Full control over your existing stack | Proxies | Yes (everything) | Yes |
Limitations and tradeoffs
Predefined data fields. Each pre-built scraper returns an average of 220+ structured fields, which covers most use cases. If you need a field that isn’t included, you can use Bright Data Scraper Studio to customize the scraper’s output, or fall back to Web Unlocker for raw HTML.
Latency. Sync scrapes typically return in seconds, but complex sites may take longer and auto-convert to async. If you need sub-second responses, you may want to cache results or use pre-scraped Datasets.
FAQs
How much does Web Scraper API cost?
Pricing starts at $1 per 1,000 records for standard domains and $2.50 per 1,000 for premium targets. New accounts receive $2 in free credits (no credit card required), plus a matched deposit of up to $500. See the pricing page for full details.
Can I use Web Scraper API with Python, Node.js, or other languages?
Yes. Web Scraper API is a standard REST API, so any language that can make HTTP requests works. Bright Data also provides an official Python SDK and a CLI tool for terminal-based workflows. See the Quickstart guide for examples.
What output formats does Web Scraper API support?
JSON, NDJSON (newline-delimited JSON), JSON Lines, and CSV. Results can be delivered via webhook (up to 1 GB), API download (up to 5 GB), or pushed to external storage (S3, Google Cloud, Snowflake).
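With NDJSON delivery, each line is one self-contained JSON record, which makes large result sets easy to process as a stream. A minimal stdlib reader (the field names in the sample are made up for illustration):

```python
import json


def read_ndjson(text):
    # NDJSON: one complete JSON object per line; blank lines are skipped.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


sample = '{"asin": "B01EXAMPLE", "price": 9.99}\n{"asin": "B02EXAMPLE", "price": 4.5}\n'
records = read_ndjson(sample)
print(len(records))  # 2
```

Because records are independent per line, you can also read a multi-gigabyte delivery file line by line without loading it all into memory.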
Should I use Web Scraper API or Web Unlocker?
Use Web Scraper API when you want structured data from a supported site with zero scraper maintenance. Use Web Unlocker when you need raw HTML from any site and want to write your own custom parsing logic. Web Unlocker handles anti-bot bypass but returns HTML, not structured fields.
Does Web Scraper API work for AI and LLM training pipelines?
Yes. The structured JSON output is directly ingestable by AI pipelines without HTML cleaning or parsing. Bright Data also offers integration with MCP servers, LlamaIndex, Google ADK, Dify, and many more.
Is web scraping with Bright Data compliant and ethical?
Bright Data operates under strict compliance standards. All scrapers collect only publicly available data. See the Trust Center for their ethical web data collection policies, KYC process, and compliance framework.