The Web Scraper API lets you extract data from websites programmatically. It offers both synchronous and asynchronous scraping methods for different use cases, from quick data retrieval to complex, large-scale extraction jobs.

The API handles real-time processing for up to 20 URL inputs and batch processing for larger collections, accommodating a wide range of scraping requirements.

Scraping methods

Synchronous scraping (/scrape)

Synchronous scraping allows you to initiate a scrape and receive results in a single request, ideal for quick data retrieval.

curl "https://api.brightdata.com/datasets/v3/scrape?dataset_id=gd_l1viktl72bvl7bjuj0&format=json" \
  -H "Authorization: Bearer API_KEY" \
  -H "Content-Type: application/json" \
  -d '[{"url": "https://www.linkedin.com/in/elad-moshe-05a90413/"}]'

Key features of synchronous scraping:

  • Immediate results in the same request
  • Perfect for single URL and quick extractions
  • Simplified error handling
  • 1-minute timeout (automatically switches to async if exceeded)

Asynchronous scraping (/trigger)

Asynchronous scraping initiates a job that runs in the background, allowing you to handle larger and more complex scraping tasks in batch mode. This approach can process batches of up to roughly 5,000 URLs per request, making it ideal for high-volume data collection projects.

Discovery tasks (finding related products, scraping multiple pages) require asynchronous scraping (/trigger) because they need to navigate and extract data across multiple web pages.

curl "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_l1viktl72bvl7bjuj0&format=json&uncompressed_webhook=true" \
  -H "Authorization: Bearer API_KEY" \
  -H "Content-Type: application/json" \
  -d '[
    {"url": "https://www.linkedin.com/in/elad-moshe-05a90413/"},
    {"url": "https://www.linkedin.com/in/jonathan-myrvik-3baa01109"},
    {"url": "https://www.linkedin.com/in/aviv-tal-75b81/"},
    {"url": "https://www.linkedin.com/in/bulentakar/"},
    {"url": "https://www.linkedin.com/in/nnikolaev/"}
  ]'
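
A successful /trigger call doesn't return the scraped data directly; it responds with a snapshot ID that you pass to the management endpoints described below. The response looks roughly like this (the ID shown is illustrative):

  {"snapshot_id": "s_lkw3example123"}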

Key features of asynchronous scraping:

  • Handles multiple URLs in batch processing (up to roughly 5,000 URLs per request)
  • No timeout limitations for long-running jobs
  • Progress monitoring via status checks
  • Ideal for large datasets
  • Required for “discovery” tasks that need to crawl multiple pages or perform complex data extraction

Choosing between synchronous and asynchronous scraping

Use case                                           Recommended method
Quick data checks                                  Synchronous (/scrape)
Single page extraction                             Synchronous (/scrape)
Multiple pages or URLs (up to ~5,000 per batch)    Asynchronous (/trigger)
Complex scraping patterns                          Asynchronous (/trigger)
Large datasets                                     Asynchronous (/trigger)

How To Collect?

Trigger a Collection (Demo)

  1. Choose your target website from our API offerings
  2. Select the appropriate scraper for your needs
  3. Decide between synchronous or asynchronous scraping based on your requirements:
    • Use synchronous (/scrape) for immediate results and simple extractions
    • Use asynchronous (/trigger) for complex scraping, multiple URLs, or large datasets
  4. Provide your input URLs via JSON or CSV
  5. Enable error reporting to track any issues
  6. Select your preferred delivery method

Via Webhook:

  1. Set your webhook URL and Authorization header if needed
  2. Select your preferred file format (JSON, NDJSON, JSON lines, CSV)
  3. Choose whether to send the data compressed or uncompressed
  4. Test webhook to validate that the operation runs successfully (using sample data)
  5. Copy the code and run it.
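
As a sketch, a trigger request that delivers results to a webhook could look like the following. The endpoint and auth_header query parameters carry the webhook URL and its Authorization header value; the webhook URL and header value here are placeholders:

curl "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_l1viktl72bvl7bjuj0&format=json&uncompressed_webhook=true&endpoint=https%3A%2F%2Fexample.com%2Fwebhook&auth_header=YOUR_AUTH_HEADER" \
  -H "Authorization: Bearer API_KEY" \
  -H "Content-Type: application/json" \
  -d '[{"url": "https://www.linkedin.com/in/elad-moshe-05a90413/"}]'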

Via Deliver to external storage:

  1. Select your preferred delivery location (S3, Google Cloud, Snowflake, or any other available option)
  2. Fill out the needed credentials according to your pick
  3. Select your preferred file format (JSON, NDJSON, JSON lines, CSV)
  4. Copy the code and run it.
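
As a rough sketch, delivering a finished snapshot to Amazon S3 could look like the request below. The /deliver path and the field names in the body are assumptions, so verify them against the endpoint reference before use:

# NOTE: hypothetical sketch; the path and field names are assumptions, check the endpoint reference
curl "https://api.brightdata.com/datasets/v3/deliver/SNAPSHOT_ID" \
  -H "Authorization: Bearer API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "deliver": {
      "type": "s3",
      "filename": {"template": "{[snapshot_id]}", "extension": "json"},
      "bucket": "my-bucket",
      "directory": "scrapes",
      "credentials": {
        "aws-access-key": "YOUR_AWS_ACCESS_KEY",
        "aws-secret-key": "YOUR_AWS_SECRET_KEY"
      }
    }
  }'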

Limit records

While running a discovery API, you can set a limit on the number of results returned per input.

In the example below, we've set a limit of 10 results per input.
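
A minimal sketch of such a request, assuming the limit is passed via a limit_per_input query parameter (verify the exact name in the endpoint reference); the discovery parameters and keyword input are illustrative and vary by dataset:

curl "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_l1viktl72bvl7bjuj0&type=discover_new&discover_by=keyword&limit_per_input=10" \
  -H "Authorization: Bearer API_KEY" \
  -H "Content-Type: application/json" \
  -d '[{"keyword": "data engineer"}]'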

Management APIs

Additional actions you can perform using our various API endpoints.

Get snapshot list

Check your snapshot history with this API. It returns a list of all available snapshots, including each snapshot's ID, creation date, and status.
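
For example (the status filter is optional, and the dataset ID is the one used above):

curl "https://api.brightdata.com/datasets/v3/snapshots?dataset_id=gd_l1viktl72bvl7bjuj0&status=ready" \
  -H "Authorization: Bearer API_KEY"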

Monitor Progress

Check your data collection status with this API. It returns “collecting” while data is being gathered, “digesting” while the data is being processed, and “ready” when the snapshot is available.
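
For example, using the snapshot ID returned by /trigger:

curl "https://api.brightdata.com/datasets/v3/progress/SNAPSHOT_ID" \
  -H "Authorization: Bearer API_KEY"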

Cancel snapshot

Stop a running collection before it finishes with this API. It returns “ok” when the collection is successfully canceled.
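
A sketch of the cancel call; the exact path is worth confirming in the endpoint reference:

curl -X POST "https://api.brightdata.com/datasets/v3/snapshot/SNAPSHOT_ID/cancel" \
  -H "Authorization: Bearer API_KEY"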

Monitor Delivery

Check your delivery status with this API. It returns “done” when the delivery has completed, “canceled” when the delivery was canceled, and “failed” when the delivery did not complete.
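
A hypothetical sketch, assuming a delivery-status path of this shape (the real path may differ; see the endpoint reference):

curl "https://api.brightdata.com/datasets/v3/delivery/DELIVERY_ID" \
  -H "Authorization: Bearer API_KEY"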

System limitations

File size

Input               up to 1 GB
Webhook delivery    up to 1 GB
API download        up to 5 GB (for bigger files, use the Delivery API)
Delivery API        unlimited