This guide walks you through sending your first request to the Bright Data Scraper Studio API. By the end, you will trigger a published collector from your own code and receive structured JSON back. The Bright Data Scraper Studio API is built around two HTTP calls:
  1. POST /dca/trigger, which queues one or more inputs and returns a snapshot ID.
  2. GET /dca/dataset?id=<snapshot_id>, which serves the snapshot once it is ready.
If you do not yet have a published collector, build one first with the AI Agent or the IDE.
Typical time to first record is 30 to 90 seconds for a collector with one to ten inputs.

Prerequisites

You need a Bright Data API token and the ID of a published collector. Set both values as environment variables once and reuse them across every snippet below:
export BRIGHT_DATA_API_TOKEN="your_api_token_here"
export BRIGHT_DATA_COLLECTOR_ID="c_xxxxxxxxxxxxxxxx"
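
If you call the API from Python rather than the shell, a minimal sketch for reading the same two variables could look like this. The `load_config` helper and its return shape are hypothetical names chosen for illustration; only the two environment variable names come from this guide.

```python
import os


def load_config(env=os.environ):
    """Read the API token and collector ID, failing fast if either is missing.

    `load_config` is an illustrative helper, not part of any Bright Data SDK.
    """
    names = ("BRIGHT_DATA_API_TOKEN", "BRIGHT_DATA_COLLECTOR_ID")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "token": env["BRIGHT_DATA_API_TOKEN"],
        "collector_id": env["BRIGHT_DATA_COLLECTOR_ID"],
    }
```

Failing at startup keeps a missing token from surfacing later as a 401 response.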

Make your first request

Step 1: Authenticate every call

Every Bright Data Scraper Studio API call uses bearer-token authentication. Add this header to every request:
Authorization: Bearer YOUR_BRIGHT_DATA_API_TOKEN
A missing or invalid token returns 401 Unauthorized.
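
As a rough illustration, the header can be attached with Python's standard `urllib.request` module. The `authed_request` helper below is a hypothetical name; it only builds the request object and does not send it.

```python
import urllib.request

API_BASE = "https://api.brightdata.com"


def authed_request(path, token, data=None, method="GET"):
    """Build (but do not send) a request that carries the bearer-token header.

    `authed_request` is an illustrative helper, not a Bright Data function.
    """
    req = urllib.request.Request(API_BASE + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    return req
```

Passing the result to `urllib.request.urlopen` sends it; with a bad token, `urlopen` raises `urllib.error.HTTPError` with `code == 401`.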

Step 2: Trigger your collector

Send the inputs you want the collector to process as a JSON array in the request body. Each object in the array must match the input schema you defined when you built the collector. The default schema is a single url field.
curl -X POST \
  "https://api.brightdata.com/dca/trigger?collector=$BRIGHT_DATA_COLLECTOR_ID&queue_next=1" \
  -H "Authorization: Bearer $BRIGHT_DATA_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[
    {"url": "https://ecommerce-shop-brd.vercel.app/product/echo-portable-speaker"},
    {"url": "https://ecommerce-shop-brd.vercel.app/product/nimbus-cloud-storage"}
  ]'
The Bright Data Scraper Studio API responds with a single snapshot ID:
{
  "collection_id": "j_abc123def456"
}
Keep this ID. You will use it as the snapshot_id in step 3.
queue_next=1 runs your inputs immediately. Omit it (or set 0) to enqueue them behind any in-flight work for the same collector.
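
The same trigger call can be sketched in Python with only the standard library. The function names, the 30-second timeout and the `queue_next` keyword argument are assumptions made for this example; the URL, query parameters, headers and `collection_id` response field are the ones shown in the curl command above.

```python
import json
import urllib.parse
import urllib.request

TRIGGER_URL = "https://api.brightdata.com/dca/trigger"


def build_trigger_request(token, collector_id, inputs, queue_next=True):
    """Build the POST request for /dca/trigger without sending it."""
    params = {"collector": collector_id}
    if queue_next:
        params["queue_next"] = "1"
    url = TRIGGER_URL + "?" + urllib.parse.urlencode(params)
    body = json.dumps(inputs).encode("utf-8")  # inputs is a list of dicts
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def trigger(token, collector_id, inputs, queue_next=True):
    """Send the trigger request and return the collection_id from the response."""
    req = build_trigger_request(token, collector_id, inputs, queue_next)
    with urllib.request.urlopen(req, timeout=30) as resp:  # timeout is an assumption
        return json.load(resp)["collection_id"]
```

Splitting request construction from sending makes it possible to inspect the URL and body in a test without touching the network.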

Step 3: Poll until ready, then download

The same /dca/dataset endpoint serves both the in-progress and ready responses. Poll it every five seconds until the response is a JSON array.
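# Set this to the collection_id returned by the trigger call in step 2.
SNAPSHOT_ID="j_abc123def456"

# Poll every 5 seconds until the response is a JSON array (not an object).
while :; do
  response=$(curl -s \
    "https://api.brightdata.com/dca/dataset?id=$SNAPSHOT_ID" \
    -H "Authorization: Bearer $BRIGHT_DATA_API_TOKEN")
  # A ready snapshot starts with "["; a status object starts with "{".
  if [[ "${response:0:1}" == "[" ]]; then
    echo "$response" > results.json
    break
  fi
  sleep 5
done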
While the snapshot is building, the response is a status object:
{ "status": "building" }
When the snapshot is ready, the response is a JSON array with one row per successful input by default:
[
  {
    "url": "https://ecommerce-shop-brd.vercel.app/product/echo-portable-speaker",
    "title": "Echo Portable Speaker",
    "price": 49.99,
    "availability": "in stock",
    "input": { "url": "https://ecommerce-shop-brd.vercel.app/product/echo-portable-speaker" }
  },
  {
    "url": "https://ecommerce-shop-brd.vercel.app/product/nimbus-cloud-storage",
    "title": "Nimbus Cloud Storage",
    "price": 12.99,
    "availability": "in stock",
    "input": { "url": "https://ecommerce-shop-brd.vercel.app/product/nimbus-cloud-storage" }
  }
]
The exact field set depends on the output schema you defined when you built the collector.
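
A Python version of the polling loop might look like the sketch below. It treats any non-list response as "not ready yet", mirroring the shell loop. The `max_wait` cap is an assumption added so the loop cannot run forever, and both function names are hypothetical.

```python
import json
import time
import urllib.request

DATASET_URL = "https://api.brightdata.com/dca/dataset"


def fetch_snapshot(token, snapshot_id):
    """GET /dca/dataset once and return the decoded JSON (dict or list)."""
    req = urllib.request.Request(
        f"{DATASET_URL}?id={snapshot_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def poll_until_ready(fetch, interval=5.0, max_wait=600.0, sleep=time.sleep):
    """Call fetch() every `interval` seconds until it returns a list of rows.

    `max_wait` is an assumed safety cap, not an API limit.
    """
    waited = 0.0
    while True:
        payload = fetch()
        if isinstance(payload, list):  # ready: the response is the dataset
            return payload
        if waited >= max_wait:
            raise TimeoutError(f"snapshot not ready after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

A typical call would be `rows = poll_until_ready(lambda: fetch_snapshot(token, snapshot_id))`.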

How long does this take?

The first record usually arrives within a minute, but total time depends on the collector and the target site. Measured against a typical e-commerce product page collector:
| Input count | Typical wall-clock time |
| --- | --- |
| 1 to 10 URLs | 30 to 90 seconds |
| 11 to 100 URLs | 2 to 5 minutes |
| 101+ URLs | 5+ minutes. Use push delivery instead of polling. |
For long-running jobs, swap polling for a push delivery destination (webhook, S3, GCS, Azure, SFTP or email) so Bright Data calls you when the snapshot is ready.

What do the IDs mean?

Three identifiers appear in Bright Data Scraper Studio. They are easy to confuse because the trigger response uses one name for a value that another endpoint reads under a different name.
| Term | Looks like | What it identifies |
| --- | --- | --- |
| Collector ID | c_xxxxxxxxxxxxxxxx | The published scraper definition. Stable. You pass it as the collector query parameter on /dca/trigger. |
| Collection ID (returned as collection_id) | j_xxxxxxxxxxxxxxxx | One run of the collector. The trigger response field is collection_id, but every other endpoint refers to the same value as snapshot_id. They are the same string. |
| Dataset | a JSON array | The result rows produced by one run. The /dca/dataset endpoint returns this when the run is finished. |
Treat collection_id from the trigger response as your snapshot_id everywhere else. They are the same value under two names.
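
One way to keep the two names straight in client code is to rename the value once, right where the trigger response is parsed. The helper names below are hypothetical; the field name and the dataset path are the ones used in this guide.

```python
def snapshot_id_from_trigger(response):
    """Return the run ID from a trigger response.

    The trigger response calls it collection_id; everywhere else it is the
    snapshot ID, so rename it once here.
    """
    return response["collection_id"]


def dataset_path(snapshot_id):
    """Path of the dataset endpoint for one run."""
    return f"/dca/dataset?id={snapshot_id}"
```

After this point the rest of the client only ever deals with `snapshot_id`.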

What errors might I see?

| Status | Meaning | Fix |
| --- | --- | --- |
| 401 Unauthorized | Token missing, malformed or revoked | Re-copy from Account Settings → API Tokens |
| 404 Not Found | Collector ID does not exist or your account does not have access | Open the collector in Scraper Studio and re-copy the ID |
| 422 Unprocessable Entity | The objects in your request body do not match the collector’s input schema | Confirm field names against the Inputs tab of your collector |
| 5xx | Transient Bright Data API error | Retry with exponential backoff, for example 1s, 2s, 4s |
| [] (empty array) | Snapshot has no rows, or the snapshot expired | Snapshots are retained for 90 days by default. See Specifications |
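
The status codes in the table can be handled with a small retry wrapper. This is a sketch under stated assumptions: `send` is a hypothetical callable that performs one HTTP call and returns a `(status, body)` tuple, and the action labels are invented for this example. Only 5xx responses are retried, using the 1s, 2s, 4s schedule suggested above.

```python
import time


def backoff_delays(retries=3, base=1.0):
    """Exponential delays in seconds: base, 2*base, 4*base, ..."""
    return [base * 2 ** attempt for attempt in range(retries)]


def classify(status):
    """Map an HTTP status code to the action suggested in the error table."""
    if status == 401:
        return "fix_token"
    if status == 404:
        return "fix_collector_id"
    if status == 422:
        return "fix_inputs"
    if 500 <= status <= 599:
        return "retry"
    return "ok" if 200 <= status <= 299 else "unexpected"


def call_with_retry(send, retries=3, base=1.0, sleep=time.sleep):
    """Call send() and retry only on 5xx, sleeping between attempts.

    send() is a hypothetical callable returning (status, body).
    """
    for delay in backoff_delays(retries, base):
        status, body = send()
        if classify(status) != "retry":
            return status, body
        sleep(delay)
    return send()  # final attempt after the last delay
```

Client errors such as 401, 404 and 422 are returned immediately, since retrying them cannot succeed until the request itself is fixed.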

Use a production-grade starter template

These open-source repositories implement the same calls shown above, hardened with environment-variable configuration, retry and backoff for transient failures, library helpers and a complete README. Fork either one to get a runnable client in about 30 seconds.

Node.js starter

Node 18+, ES modules, dotenv, retry/backoff, ~150 LOC

Python starter

Python 3.8+, requests, python-dotenv, retry/backoff, ~150 LOC
Both repositories ship with a CodeSandbox devcontainer so you can fork and run in your browser without any local setup.

Next steps

Choose a delivery type

Skip polling. Have Bright Data push results to a webhook, S3, GCS or email when the snapshot is ready.

Trigger a batch collection

Send hundreds or thousands of inputs in a single request and receive results in batches.

Run a synchronous real-time job

For low-input, latency-sensitive workloads. Trigger and receive results in a single HTTP call.

Build a new collector

Need a collector that does not exist yet? Build one with the AI Agent or the IDE.

Frequently asked questions

What is the difference between the Collection API and the AI Flow API?
The Collection API (/dca/*, this page) runs an existing collector to get data. The AI Flow API runs the AI Agent to create or self-heal a collector. Most developers integrating Bright Data Scraper Studio into a product use the Collection API.

Can I send inputs with different fields in the same request?
Yes, as long as every object in the array conforms to the collector’s input schema. If your collector accepts both url and keyword as input fields, you can mix them in one request. Fields you do not include are treated as null.
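
To make the null behavior concrete, the sketch below checks mixed inputs on the client side and shows the shape each one would effectively take. It assumes a collector whose input schema has the two fields `url` and `keyword`; `effective_inputs` is a hypothetical helper, and the real validation happens on the Bright Data side.

```python
def effective_inputs(inputs, schema_fields):
    """Reject unknown fields and fill omitted schema fields with None.

    Illustrates "fields you do not include are treated as null";
    this is a client-side approximation, not the server's validator.
    """
    result = []
    for index, item in enumerate(inputs):
        unknown = set(item) - set(schema_fields)
        if unknown:
            raise ValueError(
                f"input {index} has fields not in the schema: {sorted(unknown)}"
            )
        result.append({field: item.get(field) for field in schema_fields})
    return result
```

Running a check like this before the trigger call surfaces a misspelled field name locally instead of as a 422 response.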

How do I find and retry failed inputs?
Open the snapshot in My Scrapers and click Last errors to see which inputs failed and why. Re-trigger just those inputs in a new POST /dca/trigger call.
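
If you prefer to work out the retry list in code, one possible approach is to compare what you sent with the `input` object echoed in each result row. This assumes every successful row carries an `input` field, as in the sample output in step 3; your collector's output may differ. `failed_inputs` is a hypothetical helper, and it does not tell you why an input failed.

```python
import json


def failed_inputs(sent, rows):
    """Return the sent inputs that have no matching row in the dataset.

    Assumes each successful row echoes its input under the "input" key.
    """
    def key(obj):
        # Canonical JSON so that dict key order does not affect matching.
        return json.dumps(obj, sort_keys=True)

    succeeded = {
        key(row["input"]) for row in rows if isinstance(row.get("input"), dict)
    }
    return [item for item in sent if key(item) not in succeeded]
```

The returned list can be passed unchanged as the body of a new trigger call.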

Are there rate or concurrency limits?
Yes. Concurrency limits are set per account and apply to each collector. See Specifications for current limits. The starter templates linked above already implement exponential backoff for transient 5xx responses.