POST /datasets/v3/trigger
curl --request POST \
  --url 'https://api.brightdata.com/datasets/v3/trigger?dataset_id=<dataset_id>' \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '[
  {
    "url": "https://il.linkedin.com/company/bright-data"
  }
]'

Example response:
{
  "snapshot_id": "s_m4x7enmven8djfqak"
}

Request body

The inputs for the scraper to use. They can be provided as JSON or as a CSV file:

Content-Type
string

Content-Type: application/json

A JSON array of inputs

Example: [{"url":"https://www.airbnb.com/rooms/50122531"}]


Content-Type: multipart/form-data

A CSV file sent in a form field named data

Example (curl): -F data=@path/to/your/file.csv
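As a rough sketch of the two body formats, the Python below builds (but does not send) a JSON body and a multipart/form-data body whose single field is named data. The helper names json_body and csv_multipart_body are hypothetical, and the sketch assumes the CSV header row uses the same field names as the JSON input objects (here url).

```python
import json
import uuid


def json_body(inputs):
    """Hypothetical helper: serialize input dicts as a JSON array."""
    return json.dumps(inputs).encode("utf-8"), "application/json"


def csv_multipart_body(csv_bytes, filename="inputs.csv"):
    """Hypothetical helper: wrap CSV bytes in multipart/form-data under the field 'data'."""
    boundary = uuid.uuid4().hex  # random boundary that will not occur in the CSV
    parts = [
        f"--{boundary}\r\n".encode(),
        (f'Content-Disposition: form-data; name="data"; filename="{filename}"\r\n'
         "Content-Type: text/csv\r\n\r\n").encode(),
        csv_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    body, ctype = json_body([{"url": "https://www.airbnb.com/rooms/50122531"}])
    print(ctype, body)
    # Assumption: the CSV header names the input field, as the JSON keys do.
    csv_body, csv_ctype = csv_multipart_body(b"url\nhttps://www.airbnb.com/rooms/50122531\n")
    print(csv_ctype)
```

Whichever body you send, the matching Content-Type value returned by the helper goes into the request header.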

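Web Scraper types

Each scraper may require different inputs. There are two main types of scrapers:

1. PDP

These scrapers require URLs as input. A PDP (product detail page) scraper extracts detailed product information from web pages, such as specifications, pricing, and features.

2. Discovery

Discovery scrapers let you explore and find new entities or products through search, categories, keywords, and more.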

Request examples

PDP with URL input

The input for a PDP scraper is always a URL pointing to the page to scrape.

Sample Request
curl -H "Authorization: Bearer API_TOKEN" -H "Content-Type: application/json" -d '[{"url":"https://www.airbnb.com/rooms/50122531"},{"url":"https://www.airbnb.com/rooms/50127677"}]' "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_ld7ll037kqy322v05&format=json&uncompressed_webhook=true"
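The PDP sample request can be reproduced with Python's standard library. This is a minimal sketch: build_pdp_trigger is a hypothetical helper that only constructs a urllib.request.Request object, and boolean query parameters are passed as the lowercase string "true" because urlencode would otherwise render Python's True as "True".

```python
import json
import urllib.parse
import urllib.request

TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"


def build_pdp_trigger(token, dataset_id, urls, **params):
    """Hypothetical helper: build (not send) a trigger request for a PDP scraper.

    Each URL becomes one input object {"url": ...}; extra keyword arguments
    become query parameters (for example format="json").
    """
    query = {"dataset_id": dataset_id, **params}
    body = json.dumps([{"url": u} for u in urls]).encode("utf-8")
    return urllib.request.Request(
        f"{TRIGGER_URL}?{urllib.parse.urlencode(query)}",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_pdp_trigger(
    "API_TOKEN",
    "gd_ld7ll037kqy322v05",
    ["https://www.airbnb.com/rooms/50122531", "https://www.airbnb.com/rooms/50127677"],
    format="json",
    uncompressed_webhook="true",  # lowercase string, as in the curl sample
)
# urllib.request.urlopen(req) would send it; a 200 response carries a snapshot_id.
```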

Discovery input based on the discovery method

Sample Request
curl -H "Authorization: Bearer API_TOKEN" -H "Content-Type: application/json" -d '[{"keyword":"light bulb"},{"keyword":"dog toys"},{"keyword":"home decor"}]' "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_l7q7dkf244hwjntr0&endpoint=https://webhook-url.com&auth_header=QWxhZGRpbjpPcGVuU2VzYW1l&notify=https://notify-me.com/&format=ndjson&uncompressed_webhook=true&type=discover_new&discover_by=keyword&limit_per_input=10"

The input format for discovery varies by scraper. For example, the input can be:

[{"keyword": "light bulb"},{"keyword": "dog toys"},{"keyword": "home decor"}]

among other formats. To find out which inputs each scraper requires, see here.
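A sketch of assembling a discovery request in Python, assuming the input key matches the discover_by method (keyword inputs use "keyword", as in the sample). build_discovery_trigger is a hypothetical helper; urlencode percent-encodes values such as the webhook URL, which the curl sample passes unencoded inside a quoted string.

```python
import json
import urllib.parse

TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"


def build_discovery_trigger(dataset_id, discover_by, inputs, **params):
    """Hypothetical helper: return (url, json_body) for a discovery-phase collection.

    type=discover_new enables the discovery phase and discover_by names the method.
    Assumption: each input object uses the key that method expects.
    """
    query = {"dataset_id": dataset_id, "type": "discover_new",
             "discover_by": discover_by, **params}
    url = f"{TRIGGER_URL}?{urllib.parse.urlencode(query)}"
    return url, json.dumps(inputs)


url, body = build_discovery_trigger(
    "gd_l7q7dkf244hwjntr0",
    "keyword",
    [{"keyword": k} for k in ("light bulb", "dog toys", "home decor")],
    limit_per_input=10,
    endpoint="https://webhook-url.com",
)
print(url)
print(body)
```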

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Query Parameters

dataset_id
string
required

Dataset ID for which data collection is triggered.

Example:

"gd_l1vikfnt1wgvvqz95w"

type
enum<string>

Set it to "discover_new" to trigger a collection that includes a discovery phase.

Available options:
discover_new
discover_by
string

Specifies which discovery method to use. Available options include "keyword", "best_sellers_url", "category_url", and "location"; the full set depends on the specific API. Relevant only for collections that include a discovery phase.

include_errors
boolean

Include an errors report with the results.

limit_per_input
number

Limit the number of results per input. Relevant only for collections that include a discovery phase.

Required range: x >= 1
limit_multiple_results
number

Limit the total number of results.

Required range: x >= 1
notify
string

URL to which a notification is sent once the collection finishes. The notification contains the snapshot_id and status.

endpoint
string

Webhook URL where data will be delivered.

format
enum<string>

Specifies the format of the data to be delivered to the webhook endpoint.

Available options:
json, ndjson, jsonl, csv
auth_header
string

Authorization header to use when sending the notification to the notify URL or delivering data to the webhook endpoint.

uncompressed_webhook
boolean

By default, the data will be sent to the webhook compressed. Pass true to send it uncompressed.
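Before triggering a collection, you might check the query parameters against the constraints listed above. The sketch below is hypothetical (check_trigger_params is not part of any Bright Data library) and covers only the documented rules; it assumes the limits are whole numbers and that booleans are either Python bools or the lowercase strings used in the samples. The API itself may reject other combinations.

```python
# Values taken from the parameter list above.
FORMATS = {"json", "ndjson", "jsonl", "csv"}
LIMITS = ("limit_per_input", "limit_multiple_results")
DISCOVERY_ONLY = ("discover_by", "limit_per_input")
BOOLEANS = ("include_errors", "uncompressed_webhook")


def check_trigger_params(params):
    """Hypothetical helper: return a list of problems with a query-parameter dict."""
    problems = []
    if not params.get("dataset_id"):
        problems.append("dataset_id is required")
    if "type" in params and params["type"] != "discover_new":
        problems.append('type must be "discover_new"')
    discovery = params.get("type") == "discover_new"
    for name in DISCOVERY_ONLY:
        if name in params and not discovery:
            problems.append(f"{name} is only relevant with type=discover_new")
    for name in LIMITS:
        value = params.get(name)
        # Assumption: limits are whole numbers; bool is rejected since it subclasses int.
        if value is not None and (
            isinstance(value, bool) or not isinstance(value, int) or value < 1
        ):
            problems.append(f"{name} must be a number >= 1")
    if "format" in params and params["format"] not in FORMATS:
        problems.append(f"format must be one of {sorted(FORMATS)}")
    for name in BOOLEANS:
        value = params.get(name)
        # Assumption: accept Python bools or the strings "true"/"false".
        if value is not None and not isinstance(value, bool) and value not in ("true", "false"):
            problems.append(f"{name} must be true or false")
    return problems


print(check_trigger_params({"dataset_id": "gd_l1vikfnt1wgvvqz95w", "format": "xml"}))
```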

Body

{key}
any

Each input object may contain any keys the selected scraper expects, for example url or keyword.

Response

200 - application/json
Collection job successfully started
snapshot_id
string

The ID of your request, which you can use in subsequent API calls.

Example:

"s_m4x7enmven8djfqak"
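A minimal sketch of reading the snapshot_id from a successful (200) response body; snapshot_id_from_response is a hypothetical helper name, and raising ValueError on a missing ID is a design choice of this sketch, not documented API behavior.

```python
import json


def snapshot_id_from_response(raw):
    """Hypothetical helper: extract snapshot_id from a trigger response body."""
    payload = json.loads(raw)
    snapshot_id = payload.get("snapshot_id") if isinstance(payload, dict) else None
    if not isinstance(snapshot_id, str) or not snapshot_id:
        raise ValueError(f"no snapshot_id in response: {payload!r}")
    return snapshot_id


print(snapshot_id_from_response('{"snapshot_id": "s_m4x7enmven8djfqak"}'))
```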