Web Scraper API FAQs

Find answers to FAQs about Bright Data’s Web Scraper API, covering setup, authentication, data formats, pricing, and large-scale data extraction.

The Web Scraper API allows users to extract fresh data on demand from websites using pre-built scrapers. It can be used to automate data collection and integrate with other systems.

Data analysts, scientists, engineers, and developers seeking efficient ways to collect and analyze web data for AI, ML, and big-data applications, without any scraping development effort, will find the Scraper APIs particularly useful.

Getting started with Scraper APIs is straightforward: once you open your Bright Data account, generate an API key from your account settings. Once you have your key, refer to our API documentation for detailed instructions on making your first API call.

Each scraper can require different inputs. There are two main types of scrapers:

  1. PDP
    These scrapers take URLs as inputs. A PDP scraper extracts detailed product information such as specifications, pricing, and features from product pages.
  2. Discovery / Discovery+PDP
    Discovery scrapers let you explore and find new entities or products through search, categories, keywords, and more.

Each discovery API lets you find the desired data using a different method: by keyword, by category URL, or even by location.

Authentication is done using an API key. Include the API key in the Authorization header of your requests as follows: Authorization: Bearer YOUR_API_KEY.
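A minimal Python sketch of where the key goes; the snapshot URL and key value below are placeholders, and the request is built but not sent:

```python
from urllib.request import Request

API_KEY = "YOUR_API_KEY"  # placeholder: generate a real key in account settings

# Build (but do not send) a request to show where the key belongs.
req = Request(
    "https://api.brightdata.com/datasets/v3/snapshot/SNAPSHOT_ID",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
```

The same header is attached to every endpoint shown on this page.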

Once you have picked the API you want to run, you can customize your request using our detailed API parameters documentation, which specifies the expected inputs, their types, and the responses.

You get 20 free API calls at the account level for experimenting with the product, to be used with PDP-type scrapers with up to 10 inputs per call (Discovery-type scrapers are not included in the trial).

  • Calls 1-5 will return full results
  • Calls 6-15 will return partially censored results (e.g., AB*****YZ)

You can quickly test the product by customizing the code in the control panel (Demo video):

  1. Pick your desired API from the variety of APIs.
  2. Enter your inputs.
  3. Enter your API key.
  4. Select your preferred delivery method:
    • Using a webhook: update the webhook URL, copy the “trigger data collection” code, and run it on your client.
    • Using the delivery APIs: fill out the needed credentials and information for the specific storage you chose (S3, GCP, Pub/Sub, and more), then copy the code and run it after the collection ends.
  5. Copy the code and run it on your client.

All of the above can also be done with free tools such as Webhook.site and Postman.
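The webhook flow above can be sketched in Python. The dataset ID, webhook URL, and input field below are placeholders to replace with values from the control panel, and the delivery parameter names (`endpoint`, `format`) should be verified against the API Request Builder for your scraper:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_KEY = "YOUR_API_KEY"                  # placeholder
DATASET_ID = "gd_xxxxxxxxxxxxxxx"         # placeholder: your scraper's dataset ID
WEBHOOK_URL = "https://example.com/hook"  # placeholder: your webhook receiver

# Delivery settings ride on the query string; inputs go in the JSON body.
params = urlencode({"dataset_id": DATASET_ID, "endpoint": WEBHOOK_URL, "format": "json"})
inputs = [{"url": "https://example.com/product/1"}]  # one record per input

req = Request(
    f"https://api.brightdata.com/datasets/v3/trigger?{params}",
    data=json.dumps(inputs).encode(),
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    method="POST",
)
```

Sending `req` (for example with `urllib.request.urlopen`) triggers the collection; results are then pushed to the webhook when the run finishes.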

We also offer additional management APIs for checking collection status and fetching a list of all your snapshots; see the Management APIs tab.

The Web Scraper API supports data extraction in various formats, including JSON, NDJSON, JSONL, and CSV. Specify your desired format in the request parameters.

We charge based on the number of records delivered: you only pay for what you get. Do note that unsuccessful attempts resulting from incorrect user inputs are still billed, since resources were consumed processing the request even though the failure was due to the input rather than our system’s performance. The rate per record depends on your subscription plan (starting from $0.70 per 1,000 records). Check our pricing plans or your account details for specific rates.

For account admins: If your API key expires, you need to create a new one in your account settings.

For account users: If your API key expires, please contact your account admin to issue a new API key.

Featuring capabilities for high concurrency and batch processing, Scraper APIs excel in large-scale data extraction scenarios. This ensures developers can scale their scraping operations efficiently, accommodating massive volumes of requests with high throughput.

To upgrade your subscription plan, visit the billing section on your dashboard account and select the desired plan. For further assistance, contact our support team.

The Web Scraper APIs support a vast range of Use cases including competitive benchmarking, market trend analysis, dynamic pricing algorithms, sentiment extraction, and feeding data into machine learning pipelines. Essential for e-commerce, fintech, and social media analytics, these APIs empower developers to implement data-driven strategies effectively.

We offer real-time support for scrapers using URLs as inputs, with up to 20 URL inputs, and batch support for more than 20 inputs, regardless of the scraper type.

The Web Scraper API delivers real-time data for up to 20 inputs per call, with response times varying by domain, ensuring fresh data without relying on cached information.

Scrapers that discover new records (e.g., “Discover by keyword,” “Discover by hashtag”) generally take longer and use batch support, as the actual response times can be influenced by several factors, including the target URL’s load time and the execution duration of user-defined Page Interactions. An indication of the average response time for each scraper can be found on the specific Scraper page.

You can cancel a run using the following endpoint:

curl -X POST "https://api.brightdata.com/datasets/v3/snapshot/SNAPSHOT_ID/cancel" -H "Authorization: Bearer API_KEY" -H "Content-Type: application/json"

Make sure the snapshot ID is the one you want to cancel.

Note: If you cancel the run, no data will be delivered to you, and a snapshot can’t be canceled after it has finished collecting.

The key difference between a notify URL and a webhook URL in API configurations lies in their purpose and usage:

Notify URL:

Typically used for asynchronous communication. The system sends a notification to the specified URL when a task is completed or when an event occurs. The notification is often lightweight and doesn’t include detailed data but may provide a reference or status for further action (e.g., “Job completed, check logs for details”).

Webhook URL:

Also used for asynchronous communication but is more data-centric. The system pushes detailed, real-time data payloads to the specified URL when a specific event occurs. Webhooks provide direct, actionable information without requiring the client to poll the system.

Example Use Case:

A notify URL might be used to inform you that a scraping job is finished. A webhook URL could send the actual scraped data or detailed metadata about the completion directly to you.
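As an illustration of the receiving side, here is a minimal stdlib webhook receiver. It assumes deliveries arrive as a JSON array of records (verify this against your chosen delivery format); the port and response handling are illustrative:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_payload(body: bytes) -> list:
    """Deliveries are assumed to be a JSON array of records; wrap a lone object."""
    records = json.loads(body)
    return records if isinstance(records, list) else [records]

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        records = parse_payload(self.rfile.read(length))
        print(f"received {len(records)} records")
        self.send_response(200)  # acknowledge receipt
        self.end_headers()

# To listen locally: HTTPServer(("", 8000), WebhookHandler).serve_forever()
```

A tool like Webhook.site serves the same purpose when you just want to inspect deliveries without running code.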

The snapshot is available for 30 days; during this period you can retrieve it via the delivery API options using the snapshot ID.

There are certain limitations on some platforms’ scrapers (posts, comments, reels, and profile scrapers):

  • Media links expire after 24 hours.
  • Media is only accessible with a generated token in the same session.
  • Posts are limited to the amount shown publicly on the profile (e.g. 10).

When a snapshot is marked as empty, it means there are no valid or usable records in the snapshot. However, this does not imply the snapshot is completely devoid of content. In most cases, it contains information such as errors or dead pages:

  • Errors: Issues encountered during the data collection process, such as invalid inputs, system errors, or access restrictions.
  • Dead Pages: Pages that could not be accessed for reasons like 404 errors (page not found), removed content (e.g., unavailable products), or restricted access.

To view these details, you can use the parameter include_errors=true in your request, which will display the errors and information about the dead pages in the snapshot. This helps you diagnose and understand the issues within the snapshot.
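For example, a snapshot download request with errors included might be built like this (the snapshot ID and key are placeholders):

```python
from urllib.request import Request

API_KEY = "YOUR_API_KEY"      # placeholder
SNAPSHOT_ID = "s_xxxxxxxxxx"  # placeholder snapshot ID

# include_errors=true adds error and dead-page records to the download.
req = Request(
    f"https://api.brightdata.com/datasets/v3/snapshot/{SNAPSHOT_ID}"
    "?format=json&include_errors=true",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
```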

You can stop a running collection by utilizing the following API call: https://docs.brightdata.com/scraping-automation/web-scraper-api/management-apis/cancel-snapshot

Pre-built scrapers are currently available for the following domains:

ae.com

airbnb.com

amazon.com

apps.apple.com

ashleyfurniture.com

asos.com

balenciaga.com

bbc.com

berluti.com

bestbuy.com

booking.com

bottegaveneta.com

bsky.app

carsales.com.au

carters.com

celine.com

chanel.com

chileautos.cl

crateandbarrel.com

creativecommons.org

crunchbase.com

delvaux.com

digikey.com

dior.com

ebay.com

edition.cnn.com

en.wikipedia.org

enricheddata.com

espn.com

etsy.com

example.com

facebook.com

fanatics.com

fendi.com

finance.yahoo.com

g2.com

github.com

glassdoor.com

global.llbean.com

goodreads.com

google.com

hermes.com

homedepot.ca

homedepot.com

ikea.com

imdb.com

indeed.com

infocasas.com.uy

inmuebles24.com

instagram.com

la-z-boy.com

lazada.com.my

lazada.sg

lazada.vn

lego.com

linkedin.com

loewe.com

lowes.com

manta.com

martindale.com

massimodutti.com

mattressfirm.com

mediamarkt.de

metrocuadrado.com

montblanc.com

mouser.com

moynat.com

mybobs.com

myntra.com

news.google.com

nordstrom.com

olx.com

otodom.pl

owler.com

ozon.ru

pinterest.com

pitchbook.com

play.google.com

prada.com

properati.com.co

raymourflanigan.com

realestate.com.au

reddit.com

reuters.com

revenuebase.ai

sephora.fr

shop.mango.com

shopee.co.id

sleepnumber.com

slintel.com

target.com

tiktok.com

toctoc.com

tokopedia.com

toysrus.com

trustpilot.com

trustradius.com

unashamedcataddicts.quora.com

us.shein.com

ventureradar.com

vimeo.com

walmart.com

wayfair.com

webmotors.com.br

wildberries.ru

worldpopulationreview.com

worldpostalcode.com

www2.hm.com

x.com

xing.com

yapo.cl

yelp.com

youtube.com

ysl.com

zalando.de

zara.com

zarahome.com

zillow.com

zonaprop.com.ar

zoominfo.com

zoopla.co.uk

If your target domain is not on this list, we can develop a custom scraper specifically for you.

We don’t provide dedicated scrapers specifically for hotels, but we do offer a Booking.com scraper and the option to create a custom scraper tailored to your specific requirements.

Here’s a quick guide to help you get started and choose the right solution for your needs:

  • Option 1: Enriched, Pre-Collected Data – Explore Our Datasets Marketplace

If you’re looking for ready-to-use, high-quality data, our Datasets Marketplace is the perfect place to start. We’ve already done the heavy lifting by collecting and enriching vast amounts of data from a variety of sources. These datasets are designed to save you time and effort, so you can focus on analyzing the data and making smarter decisions.

Simply browse our marketplace, find the dataset that fits your needs, and start using it right away.

  • Option 2: Web Scrapers for Fresh and Real-Time Data

If your project requires fresh data or highly specific information that isn’t available in our Datasets Marketplace, we offer powerful tools to help you collect fresh and real-time data directly from the web. Here’s how you can get started:

Pre-Built Web Scrapers

We offer a wide range of pre-built web scrapers for popular websites, allowing you to collect data quickly and efficiently. These scrapers are ready to use and require minimal setup, making them a great choice for users who want to hit the ground running.

Custom Scrapers

Can’t find your target website in our list of pre-built scrapers? No problem! We can create a custom scraper tailored specifically to your needs. Our team of experts will work with you to design a solution that collects the exact data you’re looking for.

Build Your Own Scraper

For users with JavaScript knowledge or access to developer resources, we also offer the option to build your own scraper using our Integrated Development Environment (IDE). This gives you full control and flexibility to create a scraper that meets your unique requirements.

Have questions or need assistance? Our team of experts is always here to help. Let’s get started!

  1. Find the “Google Maps reviews” scraper on the dashboard and choose whether to run it as an API request or initiate it using the “No code” option from the control panel
  2. Enter the input parameters (the place page URL and the number of days to retrieve reviews from)
  3. Configure the needed request parameters if using the API
  4. Initiate the run and collect the data
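The steps above can be sketched as an API request. The dataset ID and the input field names ("url", "days_limit") below are illustrative placeholders; check the scraper’s API Request Builder tab for the exact input schema:

```python
import json
from urllib.request import Request

API_KEY = "YOUR_API_KEY"           # placeholder
DATASET_ID = "gd_xxxxxxxxxxxxxxx"  # placeholder: the Google Maps reviews scraper's ID

# Input field names here are illustrative; verify them in the Request Builder.
inputs = [{"url": "https://www.google.com/maps/place/PLACE_PAGE", "days_limit": 30}]

req = Request(
    f"https://api.brightdata.com/datasets/v3/trigger?dataset_id={DATASET_ID}",
    data=json.dumps(inputs).encode(),
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    method="POST",
)
```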

To cancel a running snapshot, use one of the following methods:

  1. API Request:
    • Send a POST request to the endpoint:

      POST /datasets/v3/snapshot/{snapshot_id}/cancel

    • Replace {snapshot_id} with the ID of the snapshot you want to cancel.

  2. Control Panel:
    • Go to the Logs tab of the scraper.
    • Locate the running snapshot.
    • Hover over the specific run and click the “X” to cancel it.

Both methods will stop the snapshot process if it is currently running.

Yes, the Bright Data GPT scraper always works with the “Search” function active.

Scrapers available in the Web Scrapers Library are pre-built solutions, and their underlying code is not accessible for modification or viewing.
For those interested in seeing how scrapers work, the Web Scraper IDE provides several example templates when you create a new scraper. These examples serve as practical references to help you understand scraping techniques and build your own custom solutions.

Yes, using the Web Scraper API you can have the scraped data returned directly to the request point, via the following endpoint: POST api.brightdata.com/datasets/v3/scrape
This endpoint allows you to fetch data efficiently and ensures seamless integration with your applications or workflows.

How does it work?
The API enables you to send a scraping request and receive the results directly at the request point. This eliminates the need for a separate data-retrieval step or delivery to external storage, streamlining your data collection process.

Limitations

  • For long collection operations, the best practice is to use our /trigger endpoint. (If a collection request takes too long while using the /scrape endpoint, you will get a snapshot ID instead, which you can use to download the data once it is ready.)
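A sketch of calling /scrape and handling the long-running case. The dataset ID is a placeholder, and the assumed response shape for the deferred case (a body carrying `snapshot_id`) should be verified against the API reference:

```python
import json
from urllib.request import Request

API_KEY = "YOUR_API_KEY"           # placeholder
DATASET_ID = "gd_xxxxxxxxxxxxxxx"  # placeholder

# /scrape sends inputs and returns the records in the same response.
req = Request(
    f"https://api.brightdata.com/datasets/v3/scrape?dataset_id={DATASET_ID}",
    data=json.dumps([{"url": "https://example.com/item"}]).encode(),
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    method="POST",
)

def result_or_snapshot(body):
    """Illustrative: a long-running collection is assumed to answer with a
    snapshot_id instead of records; verify the exact shape in the API reference."""
    if isinstance(body, dict) and "snapshot_id" in body:
        return ("snapshot", body["snapshot_id"])  # download later, once ready
    return ("records", body)                      # records delivered directly
```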

A Dataset ID is a unique identifier used in Web Scraper API requests. It’s included in the request URL to specify which particular Web Scraper you want to access. This ID ensures that your API call retrieves data from the correct scraper in our system. Here is how it is used: https://api.brightdata.com/datasets/v3/trigger?dataset_id=DATASET_ID_HERE

A dataset id will look like: gd_XXXXXXXXXXXXXXXXX For example: gd_l1viktl72bvl7bjuj0

You can find the exact dataset ID on the Web Scraper API page for the scraper you are interested in, under the API Request Builder tab; it will already be inserted in the API request for you to copy.

Note: An ID that looks like s_XXXXXXXXXXXXXXXXXX (for example: s_m7hm4et0141r2rhojq) is not a dataset ID; it is a snapshot ID. A snapshot is the collection of data produced by a single Web Scraper API request.
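A tiny illustrative helper that tells the two identifier kinds apart by prefix:

```python
def id_kind(identifier: str) -> str:
    """Classify a Bright Data identifier by its prefix."""
    if identifier.startswith("gd_"):
        return "dataset"   # names a Web Scraper; goes in trigger URLs
    if identifier.startswith("s_"):
        return "snapshot"  # names one collection run's results
    return "unknown"
```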
