Every scraping project starts the same way: write a parser, add proxy rotation, deploy. It works until the target site changes its DOM, adds a CAPTCHA, or starts fingerprinting headless browsers. Then you’re maintaining scrapers instead of using the data. This guide compares the DIY approach (proxies + custom code) with Bright Data Web Scraper API (650+ pre-built, maintained scrapers), so you can decide which fits your use case.

What is DIY scraping?

DIY scraping means you build and maintain the entire pipeline yourself:
  • A scraper (BeautifulSoup, Playwright, Puppeteer, Scrapy)
  • A proxy layer for IP rotation
  • Retry and error-handling logic
  • A scheduler to run jobs on a recurring basis
This gives you full control. You choose your selectors and handle edge cases your way. The tradeoff is maintenance. Every target site becomes a separate codebase that breaks independently when the site changes its DOM, adds anti-bot measures, or starts fingerprinting headless browsers.
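As a rough sketch of what the proxy-rotation and retry pieces of a DIY pipeline involve (the proxy URLs are placeholders, and exponential backoff is one common policy, not a prescription):

```python
import itertools
import random
import time

import requests

# Placeholder proxy endpoints -- substitute your own pool.
PROXIES = [
    "http://proxy1.example.com:8000",
    "http://proxy2.example.com:8000",
]
_proxy_cycle = itertools.cycle(PROXIES)

def backoff_delay(attempt: int) -> float:
    """Exponential backoff: 2, 4, 8... seconds (caller adds jitter)."""
    return float(2 ** attempt)

def fetch_with_retries(url: str, max_attempts: int = 3) -> str:
    """Fetch a URL, rotating proxies and backing off between failures."""
    for attempt in range(1, max_attempts + 1):
        proxy = next(_proxy_cycle)
        try:
            resp = requests.get(
                url,
                proxies={"http": proxy, "https": proxy},
                timeout=10,
            )
            resp.raise_for_status()
            return resp.text
        except requests.RequestException:
            if attempt == max_attempts:
                raise
            time.sleep(backoff_delay(attempt) + random.random())
    raise RuntimeError("unreachable")
```

Even this sketch omits CAPTCHA solving, browser fingerprinting, and scheduling, which is exactly the maintenance burden described above.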

What is Bright Data Web Scraper API?

Bright Data Web Scraper API is a collection of 650+ pre-built scrapers maintained by Bright Data’s engineering team, covering top sites including LinkedIn, Amazon, Instagram, YouTube, TikTok, Google Maps, and many more. You send a URL and get structured JSON back: no parsing, no selectors, no proxy configuration. Each scraper returns an average of 220+ data fields, covering granular details like rich snippets, map coordinates, ad extensions, and structured metadata that most DIY scrapers miss. If your target site isn’t covered, you can build a custom scraper in minutes with Bright Data Scraper Studio: pass a URL and a plain-language description of the data you need. When a site changes its frontend and breaks your scraper, the Self-healing tool rewrites the affected code from a prompt, so you don’t need to dig into the script manually.

The same scrape, two ways

Here’s what scraping an Amazon product page looks like with each approach. DIY with Playwright: you write and maintain every selector:
Python
from playwright.sync_api import sync_playwright

def scrape_amazon_product(url: str, proxy: str) -> dict:
    with sync_playwright() as p:
        browser = p.chromium.launch(proxy={"server": proxy})
        page = browser.new_page()
        page.goto(url, wait_until="domcontentloaded")

        # Each selector here is fragile: if the element is missing or
        # renamed, query_selector() returns None and the chained call raises.
        title = page.query_selector("#productTitle").inner_text().strip()
        price = page.query_selector(".a-price .a-offscreen").inner_text()
        rating = page.query_selector("#acrPopover span").inner_text()

        browser.close()
        return {"title": title, "price": price, "rating": rating}
Those CSS selectors (#productTitle, .a-price .a-offscreen) break whenever Amazon updates its frontend. When that happens, your scraper either crashes or silently returns wrong data.
Web Scraper API: one API call, structured output:
cURL
curl "https://api.brightdata.com/datasets/v3/scrape?dataset_id=gd_l7q7dkf244hwjntr0&format=json" \
  -H "Authorization: Bearer API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[{"url": "https://www.amazon.com/dp/B0EXAMPLE"}]'
Response
{
  "title": "Wireless Bluetooth Headphones",
  "price": 49.99,
  "currency": "USD",
  "rating": 4.5,
  "reviews_count": 12847,
  "seller": "TechBrand Official",
  "availability": "In Stock"
}
Find the dataset_id for your target site in the Scrapers Library. Each site has its own ID.
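The same sync call can be made from Python. This is a minimal sketch using the requests library against the /scrape endpoint shown in the cURL example (API_TOKEN is a placeholder; the dataset ID is the Amazon one from above):

```python
import requests

API_TOKEN = "YOUR_API_TOKEN"  # placeholder; use your real token
DATASET_ID = "gd_l7q7dkf244hwjntr0"  # Amazon products, from the cURL example

def build_scrape_request(url: str) -> requests.PreparedRequest:
    """Build the same POST request as the cURL example."""
    return requests.Request(
        "POST",
        "https://api.brightdata.com/datasets/v3/scrape",
        params={"dataset_id": DATASET_ID, "format": "json"},
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
        json=[{"url": url}],
    ).prepare()

def scrape_product(url: str) -> list[dict]:
    """Send the request and return the parsed JSON records."""
    with requests.Session() as session:
        resp = session.send(build_scrape_request(url))
        resp.raise_for_status()
        return resp.json()
```

Splitting request construction from sending keeps the call easy to inspect and test before spending credits on real requests.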

Key differences

| | DIY (proxies + custom code) | Web Scraper API |
|---|---|---|
| Scraping logic | You write and maintain selectors | Pre-built for each site |
| Anti-bot handling | You manage CAPTCHAs, fingerprinting, stealth | Automatic |
| When sites change | You debug and redeploy | Bright Data updates the scraper |
| Output format | Raw HTML you parse into structured data | Structured JSON/CSV with named fields |
| Data fields | Only what you code selectors for | 220+ fields on average per scraper |
| Time to first result | Hours to days | Minutes |
| Ongoing maintenance | You, indefinitely | Bright Data’s scraper team |
| Success rate | Typically 60–85% depending on anti-bot investment | 98.44% average |
| Supported sites | Any site you can write a parser for | 650+ pre-built; custom via Scraper Studio |

Supported sites

The Scrapers Library includes ready-made scrapers across categories:
| Category | Example sites |
|---|---|
| E-commerce | Amazon, Walmart, eBay, Shopify |
| Social media | LinkedIn, Instagram, TikTok, X, Facebook |
| Search engines | Google, Bing, Yahoo, DuckDuckGo |
| Real estate | Zillow, Realtor, Redfin |
| Travel | Booking.com, Tripadvisor, Airbnb |
| Jobs & B2B | Indeed, Glassdoor, Crunchbase |
If your target isn’t in the library, Bright Data Scraper Studio can generate a custom scraper from a URL and a natural language description of the data you need.

Sync vs async collection

Web Scraper API supports two collection modes:
| Mode | Endpoint | Best for | Concurrency limit |
|---|---|---|---|
| Synchronous | /scrape | Single-URL lookups, price checks, CRM enrichment | 5,000 concurrent requests |
| Asynchronous | /trigger | Batch jobs with hundreds or thousands of URLs | 100 concurrent jobs, 1 GB input each |
Sync returns results in the same HTTP response. Async returns a snapshot_id: you poll for progress or receive results via webhook. Delivery options include webhooks (JSON, NDJSON, CSV), S3, Google Cloud, and Snowflake.
Sync requests have a 1-minute timeout. If the scrape takes longer, it auto-converts to async and returns a snapshot_id.
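The async trigger-and-poll flow can be sketched in Python as follows. The /trigger endpoint and snapshot_id workflow are described above, but the /progress and /snapshot polling paths used here are assumptions; confirm the exact endpoints in the Quickstart guide before relying on them:

```python
import time

import requests

API_TOKEN = "YOUR_API_TOKEN"  # placeholder
BASE = "https://api.brightdata.com/datasets/v3"
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}

def build_payload(urls: list[str]) -> list[dict]:
    """One {"url": ...} object per input URL, as in the sync example."""
    return [{"url": u} for u in urls]

def trigger_batch(dataset_id: str, urls: list[str]) -> str:
    """Start an async job; the response carries a snapshot_id."""
    resp = requests.post(
        f"{BASE}/trigger",
        params={"dataset_id": dataset_id, "format": "json"},
        headers=HEADERS,
        json=build_payload(urls),
    )
    resp.raise_for_status()
    return resp.json()["snapshot_id"]

def wait_for_results(snapshot_id: str, poll_interval: float = 10.0) -> list[dict]:
    """Poll until the snapshot is ready, then download it.

    NOTE: the /progress/<id> and /snapshot/<id> paths are assumed here,
    not taken from the text above; verify them in the Quickstart guide.
    """
    while True:
        status = requests.get(f"{BASE}/progress/{snapshot_id}", headers=HEADERS)
        status.raise_for_status()
        if status.json().get("status") == "ready":
            break
        time.sleep(poll_interval)
    resp = requests.get(
        f"{BASE}/snapshot/{snapshot_id}",
        params={"format": "json"},
        headers=HEADERS,
    )
    resp.raise_for_status()
    return resp.json()
```

For production batch jobs, webhook delivery avoids polling entirely.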
For endpoint examples and request/response formats, see the Quickstart guide.

When to use what

| If you need… | Use | You write code? | You maintain scrapers? |
|---|---|---|---|
| Structured data from popular sites | Web Scraper API | API calls only | No |
| Raw HTML from any site (custom parsing) | Web Unlocker | Yes | Yes |
| JS-heavy pages (clicks, scrolls, forms) | Bright Data Browser API | Yes (Playwright/Puppeteer) | Yes |
| Full control over your existing stack | Proxies | Yes (everything) | Yes |

Limitations and tradeoffs

Predefined data fields. Each pre-built scraper returns an average of 220+ structured fields, which covers most use cases. If you need a field that isn’t included, you can use Bright Data Scraper Studio to customize the scraper’s output or fall back to Web Unlocker for raw HTML.

Latency. Sync scrapes typically return in seconds, but complex sites may take longer and auto-convert to async. If you need sub-second responses, you may want to cache results or use pre-scraped Datasets.

FAQs

How much does Web Scraper API cost?
Pricing starts at $1 per 1,000 records for standard domains and $2.50 per 1,000 for premium targets. New accounts receive $2 in free credits (no credit card required), plus a matched deposit of up to $500. See the pricing page for full details.
Can I use Web Scraper API from any programming language?
Yes. Web Scraper API is a standard REST API, so any language that can make HTTP requests works. Bright Data also provides an official Python SDK and a CLI tool for terminal-based workflows. See the Quickstart guide for examples.
What output formats and delivery options are supported?
JSON, NDJSON (newline-delimited JSON), JSON Lines, and CSV. Results can be delivered via webhook (up to 1 GB), API download (up to 5 GB), or pushed to external storage (S3, Google Cloud, Snowflake).
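NDJSON is convenient for large deliveries because each line is an independent JSON record that can be parsed as it streams in. A minimal parser sketch (the sample records are invented for illustration):

```python
import json
from typing import Iterator

def iter_ndjson(text: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of an NDJSON payload."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)

# Invented two-record sample, mimicking structured scraper output.
sample = '{"title": "A", "price": 10}\n{"title": "B", "price": 20}\n'
records = list(iter_ndjson(sample))
```

Unlike a single JSON array, this lets you process a multi-gigabyte delivery without loading the whole file into memory.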
When should I use Web Scraper API versus Web Unlocker?
Use Web Scraper API when you want structured data from a supported site with zero scraper maintenance. Use Web Unlocker when you need raw HTML from any site and want to write your own custom parsing logic. Web Unlocker handles anti-bot bypass but returns HTML, not structured fields.
Can the output be used in AI pipelines?
Yes. The structured JSON output is directly ingestable by AI pipelines without HTML cleaning or parsing. Bright Data also offers integration with MCP servers, LlamaIndex, Google ADK, Dify, and many more.
How does Bright Data handle compliance?
Bright Data operates under strict compliance standards. All scrapers collect only publicly available data. See the Trust Center for their ethical web data collection policies, KYC process, and compliance framework.