POST /datasets/v3/trigger
curl --request POST \
  --url https://api.brightdata.com/datasets/v3/trigger \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "url": "https://il.linkedin.com/company/bright-data"
  }
]'
{
  "snapshot_id": "s_m4x7enmven8djfqak"
}

How It Works

By default, scraping requests are processed asynchronously. When a request is submitted, the system begins processing the job in the background and immediately returns a snapshot ID. Once the scraping task is complete, the results can be retrieved at your convenience by using the snapshot ID to download the data via the API. Alternatively, you can configure the request to automatically deliver the results to an external storage destination, such as an S3 bucket or Azure Blob Storage. This approach is well-suited for handling larger jobs or integrating with automated data pipelines.
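The flow above can be sketched in Python. The trigger endpoint is taken from the example at the top of this page; the snapshot-download path (`/datasets/v3/snapshot/<snapshot_id>`) is an assumption for illustration and may differ from the actual retrieval endpoint.

```python
import json

API_BASE = "https://api.brightdata.com/datasets/v3"

def trigger_url(dataset_id: str) -> str:
    # Endpoint that starts an asynchronous collection job.
    return f"{API_BASE}/trigger?dataset_id={dataset_id}"

def snapshot_url(snapshot_id: str, fmt: str = "json") -> str:
    # Hypothetical download endpoint: once the job completes, the
    # snapshot ID returned by the trigger call is used to fetch results.
    return f"{API_BASE}/snapshot/{snapshot_id}?format={fmt}"

# The trigger response carries only the snapshot ID:
response_body = '{"snapshot_id": "s_m4x7enmven8djfqak"}'
snapshot_id = json.loads(response_body)["snapshot_id"]
print(snapshot_url(snapshot_id))
```

In practice you would poll or wait for the `notify` callback before requesting the download.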

Body

The inputs to be used by the scraper. They can be provided either as a JSON array or as a CSV file:

Content-Type
string

A JSON array of inputs

Example: [{"url":"https://www.airbnb.com/rooms/50122531"}]

A CSV file, uploaded in a form field called data

Example (curl): data=@path/to/your/file.csv
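A minimal sketch of preparing the CSV variant of the body; the column name mirrors the JSON example above, and the resulting file would be uploaded in the `data` form field (e.g. `curl -F 'data=@inputs.csv'`).

```python
import csv
import io

# The same inputs that would otherwise be sent as a JSON array.
inputs = [
    {"url": "https://www.airbnb.com/rooms/50122531"},
    {"url": "https://www.airbnb.com/rooms/50127677"},
]

# Serialize them as a CSV with a header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["url"])
writer.writeheader()
writer.writerows(inputs)
print(buf.getvalue())
```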

Web Scraper Types

Each scraper requires its own inputs. There are two main types of scrapers:

1. PDP

These scrapers take URLs as input. A PDP (product detail page) scraper extracts detailed product information such as specifications, pricing, and features from the pages you point it at.

2. Discovery

Discovery scrapers allow you to explore and find new entities/products through search, categories, keywords, and more.

Request examples

PDP with URL input

The input for a PDP scraper is always a URL pointing to the page to be scraped.

Sample Request
curl -H "Authorization: Bearer API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[{"url":"https://www.airbnb.com/rooms/50122531"},{"url":"https://www.airbnb.com/rooms/50127677"}]' \
  "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_ld7ll037kqy322v05&format=json&uncompressed_webhook=true"
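The same request can also be assembled programmatically. This Python sketch only builds the URL and body (e.g. for passing to an HTTP client such as `requests`); it does not send anything.

```python
import json
from urllib.parse import urlencode

# Query parameters from the sample request above.
params = {
    "dataset_id": "gd_ld7ll037kqy322v05",
    "format": "json",
    "uncompressed_webhook": "true",
}

# PDP input: always a list of page URLs.
body = json.dumps([
    {"url": "https://www.airbnb.com/rooms/50122531"},
    {"url": "https://www.airbnb.com/rooms/50127677"},
])

url = "https://api.brightdata.com/datasets/v3/trigger?" + urlencode(params)
print(url)
print(body)
```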

Discovery input based on the discovery method

Sample Request
curl -H "Authorization: Bearer x2x3fdaaddrer" \
  -H "Content-Type: application/json" \
  -d '[{"keyword":"light bulb"},{"keyword":"dog toys"},{"keyword":"home decor"}]' \
  "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_l7q7dkf244hwjntr0&endpoint=https://webhook-url.com&auth_header=QWxhZGRpbjpPcGVuU2VzYW1l&notify=https://notify-me.com/&format=ndjson&uncompressed_webhook=true&type=discover_new&discover_by=keyword&limit_per_input=10"

Input format for discovery can vary according to the specific scraper. Inputs can be:

[{"keyword": "light bulb"},{"keyword": "dog toys"},{"keyword": "home decor"}]

And more. Refer to each scraper's documentation to find out which inputs it requires.
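Compared to a PDP request, a discovery request adds a few query parameters (`type`, `discover_by`, `limit_per_input`). A hedged Python sketch, again only constructing the request pieces; the dataset ID is the example value from the sample request above.

```python
import json
from urllib.parse import urlencode

# Discovery collections add these parameters on top of the PDP case:
params = {
    "dataset_id": "gd_l7q7dkf244hwjntr0",  # example dataset ID from above
    "type": "discover_new",                # enables the discovery phase
    "discover_by": "keyword",              # discovery method for this scraper
    "limit_per_input": 10,                 # cap results per keyword
    "format": "ndjson",
}

# Inputs follow the discovery method rather than being page URLs.
inputs = json.dumps([{"keyword": "light bulb"}, {"keyword": "dog toys"}])

query = urlencode(params)
print("https://api.brightdata.com/datasets/v3/trigger?" + query)
print(inputs)
```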

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Query Parameters

dataset_id
string
required

Dataset ID for which data collection is triggered.

Example:

"gd_l1vikfnt1wgvvqz95w"

type
enum<string>

Set it to "discover_new" to trigger a collection that includes a discovery phase.

Available options:
discover_new
discover_by
string

Specifies which discovery method to use. Available options: "keyword", "best_sellers_url", "category_url", "location" and more (according to the specific API). Relevant only for collections that include a discovery phase.

include_errors
boolean

Include errors report with the results.

limit_per_input
number

Limit the number of results per input. Relevant only for collections that include a discovery phase.

Required range: x >= 1
limit_multiple_results
number

Limit the total number of results.

Required range: x >= 1
notify
string

URL where the notification will be sent once the collection is finished. Notification will contain snapshot_id and status.
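Assuming the notification arrives as a JSON body with the two documented fields, the receiving endpoint might handle it as follows. The `"ready"` status value is an assumption for illustration; the actual status vocabulary is not specified here.

```python
import json

# Example notification body; the field names follow the description
# above, but the real payload may include additional fields.
notification = '{"snapshot_id": "s_m4x7enmven8djfqak", "status": "ready"}'

msg = json.loads(notification)
if msg["status"] == "ready":  # assumed status value, not from the docs
    print("download snapshot", msg["snapshot_id"])
```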

endpoint
string

Webhook URL where data will be delivered.

format
enum<string>

Specifies the format of the data to be delivered to the webhook endpoint.

Available options:
json
ndjson
jsonl
csv
auth_header
string

Authorization header to be used when sending notification to notify URL or delivering data via webhook endpoint.

uncompressed_webhook
boolean

By default, the data will be sent to the webhook compressed. Pass true to send it uncompressed.
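A sketch of handling a compressed delivery on the receiving side. Gzip is assumed here, since the documentation only says the payload is compressed by default; the compression step simulates what the webhook sender would do.

```python
import gzip
import json

# Simulate a compressed webhook delivery (gzip is an assumption;
# the default delivery is documented only as "compressed").
payload = [{"url": "https://il.linkedin.com/company/bright-data"}]
compressed = gzip.compress(json.dumps(payload).encode())

# Receiver side: decompress before parsing. This step is unnecessary
# when uncompressed_webhook=true was set on the trigger request.
records = json.loads(gzip.decompress(compressed))
print(records[0]["url"])
```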

Body

{key}
any

Response

200 - application/json
Collection job successfully started
snapshot_id
string

ID of the request; use it with subsequent API calls to check progress and download the results.

Example:

"s_m4x7enmven8djfqak"