Web Scraper API FAQs
Find answers to FAQs about Bright Data’s Web Scraper API, covering setup, authentication, data formats, pricing, and large-scale data extraction.
The Web Scraper API allows users to extract fresh data on demand from websites using pre-built scrapers. It can be used to automate data collection and integrate with other systems.
Scraper APIs are particularly useful for data analysts, data scientists, engineers, and developers, and for anyone who needs an efficient way to collect and analyze web data for AI, ML, big data applications, and more without building scrapers themselves.
Getting started with Scraper APIs is straightforward. Once you open your Bright Data account, generate an API token from your account settings. With the token in hand, refer to our API documentation for detailed instructions on making your first API call.
Each scraper can require different inputs. There are 2 main types of scrapers:
- PDP: These scrapers require URLs as inputs. A PDP scraper extracts detailed product information like specifications, pricing, and features from web pages.
- Discovery / Discovery+PDP: Discovery scrapers allow you to explore and find new entities/products through search, categories, keywords, and more.
Each discovery API lets you find the desired data using a different method: by keyword, by category URL, or even by location.
Authentication is done using an API token. Include the token in the Authorization header of your requests as follows: Authorization: Bearer YOUR_API_TOKEN
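As a minimal sketch of the header format described above, the helper below builds the Authorization header from a token. The function name `auth_headers` is only illustrative and is not part of any Bright Data SDK.

```python
def auth_headers(api_token: str) -> dict[str, str]:
    """Return the Authorization header in the 'Bearer <token>' form."""
    token = api_token.strip()
    if not token:
        raise ValueError("API token must not be empty")
    return {"Authorization": f"Bearer {token}"}


# Replace the placeholder with the token generated in your account settings.
headers = auth_headers("YOUR_API_TOKEN")
print(headers)
```

You can pass the resulting dictionary as the headers of whichever HTTP client you use.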
Once you have picked the API you want to run, you can customize your request using our detailed API parameters documentation, which specifies the expected input types and responses.
You get 20 free API calls at the account level to experiment with the product. The free calls apply to PDP-type scrapers with up to 10 inputs per call (Discovery-type scrapers are not included in the trial).
- Calls 1-5 will return full results
- Calls 6-15 will return partially censored results (e.g., AB*****YZ)
You can quickly test the product by customizing the code on the control panel (Demo video):
- Pick your desired API from the variety of APIs
- Enter your inputs
- Enter your API token
- Select your preferred delivery method:
  - Using a webhook: update the webhook URL, then copy the “trigger data collection” code and run it on your client.
  - Using an API: fill out the needed credentials and information based on the specific setting you chose (S3, GCP, pubsub and more), then copy the code and run it after the collection ends.
- Copy the code and run it on your client
All of the above can also be done with free tools such as Webhook-site and Postman.
We also offer additional management APIs for checking the collection status and fetching a list of all your snapshots; see the Management APIs tab.
The Web Scraper API supports data extraction in various formats including JSON, NDJSON, JSONL, and CSV. Specify your desired format in the request parameters.
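The formats listed above differ mainly in how records are laid out in the response body. The sketch below shows one way to turn a response body into a list of records on the client side. It assumes you already have the body as text and know which format you requested; the name of the request parameter that selects the format is not covered here, and `parse_records` is a hypothetical helper.

```python
import csv
import io
import json


def parse_records(text: str, fmt: str) -> list[dict]:
    """Parse a response body into a list of records based on its format."""
    fmt = fmt.lower()
    if fmt == "json":
        data = json.loads(text)
        # A JSON body may be a single object or an array of objects.
        return data if isinstance(data, list) else [data]
    if fmt in ("ndjson", "jsonl"):
        # One JSON object per line; blank lines are skipped.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if fmt == "csv":
        # The first row is treated as the header; all values stay strings.
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported format: {fmt}")


print(parse_records('{"id": 1}\n{"id": 2}\n', "ndjson"))
```

NDJSON and JSONL are handled identically because both place one JSON object on each line, which makes them convenient for streaming large result sets.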
We charge based on the number of records delivered, so you only pay for what you get. Note that unsuccessful attempts resulting from incorrect user inputs are still billed: because the failure to retrieve data was due to the input rather than our system’s performance, resources were still consumed in processing the request. The rate per record depends on your subscription plan (starting from $0.70 per 1,000 records). Check our pricing plans or your account details for specific rates.
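For a rough sense of the per-record pricing arithmetic, here is a small sketch. The default rate is the starting rate quoted above; your plan's actual rate may differ, and `estimate_cost` is an illustrative helper rather than an official calculator.

```python
from decimal import Decimal


def estimate_cost(records_delivered: int,
                  rate_per_1000: Decimal = Decimal("0.70")) -> Decimal:
    """Estimate the charge in dollars for a number of delivered records."""
    if records_delivered < 0:
        raise ValueError("records_delivered must be non-negative")
    # Decimal avoids binary floating-point rounding in money arithmetic.
    return rate_per_1000 * records_delivered / 1000


print(estimate_cost(250_000))
```

At the starting rate, 250,000 delivered records come to 175 dollars.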
For account admins: If your API token expires, you need to create a new one in your account settings.
For account users: If your API token expires, please contact your account admin to issue a new token.
Featuring capabilities for high concurrency and batch processing, Scraper APIs excel in large-scale data extraction scenarios. This ensures developers can scale their scraping operations efficiently, accommodating massive volumes of requests with high throughput.
To upgrade your subscription plan, visit the billing section of your account dashboard and select the desired plan. For further assistance, contact our support team.
The Web Scraper APIs support a wide range of use cases, including competitive benchmarking, market trend analysis, dynamic pricing algorithms, sentiment extraction, and feeding data into machine learning pipelines. Essential for e-commerce, fintech, and social media analytics, these APIs empower developers to implement data-driven strategies effectively.
We offer real-time support for scrapers using URLs as inputs, with up to 20 URL inputs, and batch support for more than 20 inputs, regardless of the scraper type.
The Web Scraper API delivers real-time data for up to 20 inputs per call, with response times varying by domain, ensuring fresh data without relying on cached information.
Scrapers that discover new records (e.g., “Discover by keyword,” “Discover by hashtag”) generally take longer and use batch support.
Actual response times are influenced by several factors, including the target URL’s load time and the execution duration of user-defined page interactions. An indication of the average response time for each scraper can be found on the specific scraper page.
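The real-time versus batch rule described above can be sketched as a small decision helper. It assumes that the 20-input limit is the only threshold and treats every scraper without URL inputs as batch, which simplifies the "generally" in the description; `choose_mode` and its parameter names are illustrative.

```python
REALTIME_MAX_INPUTS = 20  # limit quoted above for real-time calls


def choose_mode(num_inputs: int, url_inputs: bool) -> str:
    """Return 'real-time' or 'batch' for a planned request."""
    if num_inputs < 1:
        raise ValueError("at least one input is required")
    # Real-time applies only to URL-based scrapers within the input limit.
    if url_inputs and num_inputs <= REALTIME_MAX_INPUTS:
        return "real-time"
    # Larger requests, and discovery-style scrapers, go through batch.
    return "batch"


print(choose_mode(5, url_inputs=True))
print(choose_mode(50, url_inputs=True))
```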
You can cancel a run using the following endpoint:
curl -H "Authorization: Bearer TOKEN" -H "Content-Type: application/json" -k "https://api.brightdata.com/datasets/v3/snapshot/SNAPSHOT_ID/cancel" -X POST
Make sure the snapshot ID is the one you want to cancel.
Note: If you cancel the run, no data will be delivered to you. A snapshot can’t be canceled after it has finished collecting.
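If you prefer Python to curl, the same cancel call can be sketched with the standard library. The URL and headers mirror the curl command above; the helper name `build_cancel_request` is an assumption for illustration. The example only builds the request, so nothing is canceled until you send it.

```python
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.brightdata.com/datasets/v3"


def build_cancel_request(snapshot_id: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that cancels a snapshot."""
    if not snapshot_id:
        raise ValueError("snapshot_id must not be empty")
    url = f"{API_BASE}/snapshot/{quote(snapshot_id, safe='')}/cancel"
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


req = build_cancel_request("SNAPSHOT_ID", "TOKEN")
print(req.get_method(), req.full_url)
# To actually cancel the run, send it: urllib.request.urlopen(req)
```

Unlike the curl command with -k, urllib verifies TLS certificates by default.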
The key difference between a notify URL and a webhook URL in API configurations lies in their purpose and usage:
Notify URL:
Typically used for asynchronous communication. The system sends a notification to the specified URL when a task is completed or when an event occurs. The notification is often lightweight and doesn’t include detailed data but may provide a reference or status for further action (e.g., “Job completed, check logs for details”).
Webhook URL:
Also used for asynchronous communication but is more data-centric. The system pushes detailed, real-time data payloads to the specified URL when a specific event occurs. Webhooks provide direct, actionable information without requiring the client to poll the system.
Example Use Case:
A notify URL might be used to inform you that a scraping job is finished. A webhook URL could send the actual scraped data or detailed metadata about the completion directly to you.
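To make the distinction concrete, the sketch below shows how a receiving service might treat the two kinds of callbacks differently. The payload shapes are assumptions made for illustration only: a webhook body is taken to be a JSON array of records, and a notification is taken to be a small JSON object with hypothetical `snapshot_id` and `status` fields. Check the delivery documentation for the real payloads.

```python
import json


def handle_callback(body: str) -> dict:
    """Decide what to do with an incoming callback body (hypothetical shapes)."""
    payload = json.loads(body)
    if isinstance(payload, list):
        # Webhook-style delivery: the data itself arrived, so process it now.
        return {"action": "process", "records": payload}
    if isinstance(payload, dict) and "snapshot_id" in payload:
        # Notify-style message: only a reference arrived, so fetch the data later.
        return {
            "action": "fetch",
            "snapshot_id": payload["snapshot_id"],
            "status": payload.get("status"),
        }
    raise ValueError("unrecognized callback payload")


print(handle_callback('{"snapshot_id": "SNAPSHOT_ID", "status": "ready"}'))
print(handle_callback('[{"title": "example record"}]'))
```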
The snapshot is available for 30 days. During this period you can retrieve it through the delivery API options using the snapshot ID.
There are certain limitations on these platforms:
Posts (by profile URL) | up to 900 posts per input |
Comments | up to 50 comments per input |
Reels | up to 1600 per input |
Posts (by keyword) | up to 150,000 |
Posts (by profile URL) | up to 43,000 |
Comments | up to 9 per input |
Reels | up to 9000 |
Media links expire after 24 hours.
Profiles | up to 1000 records per input |
Posts (by keyword) | up to 1000 |
Posts (by profile URL) | up to 5000 |
Posts (by keyword) | up to 4000 per input |
Comments | all 1st level comments with no limit |
Profiles (by search URL) | up to 2000 per input |
Comments | up to 1000 per input |
Posts (by keyword) | up to 200 per input |
Posts (by profile URL) | up to 5000 per input |
Posts | up to 1000 per input |
Posts (by keyword) | up to 4000 per input |
Posts (by URL) | up to 9000 per input |
Posts | up to 1000 per input |
Profiles | up to 500 per input |
Posts (by keyword) | up to 600 per input |
Posts (by URL) | up to 20,000 per input |
Posts (by search filters) | up to 700 per input |
Media only accessible with a generated token in the same session.
Posts are limited to the amount shown publicly on the profile (e.g. 10).
When a snapshot is marked as empty, it means there are no valid or usable records in the snapshot. However, this does not imply the snapshot is completely devoid of content. In most cases, it contains information such as errors or dead pages:
- Errors: Issues encountered during the data collection process, such as invalid inputs, system errors, or access restrictions.
- Dead Pages: Pages that could not be accessed for reasons like 404 errors (page not found), removed content (e.g., unavailable products), or restricted access.
To view these details, you can use the parameter include_errors=true in your request, which will display the errors and information about the dead pages in the snapshot. This helps you diagnose and understand the issues within the snapshot.
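As a sketch of adding the include_errors=true parameter to a request, the helper below appends it to the query string of any URL. Which endpoint accepts the parameter is not stated here, so the example.com URL is a placeholder and `with_include_errors` is an illustrative name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_include_errors(url: str) -> str:
    """Return the URL with include_errors=true set in its query string."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any earlier include_errors value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "include_errors"
    ]
    query.append(("include_errors", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL; substitute the request URL you are actually using.
print(with_include_errors("https://example.com/request?format=json"))
```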