This page defines every term the Bright Data Scraper Studio API uses so you can read the rest of the reference without ambiguity. Skim it once. Come back when an endpoint mentions an ID and you want to confirm what it represents.

What are the core objects?

The Bright Data Scraper Studio data model has four objects: collector, input, collection and dataset. The relationships are strictly hierarchical: one collector produces many collections, each collection runs against one batch of inputs, and one collection produces one dataset.
  Collector (c_...)            -> the recipe
       |
       v
  Collection (j_...)           -> one execution of the recipe
       |
       v
  Dataset (JSON array)         -> the rows the execution produced
| Term | Looks like | What it is | Where you see it |
| --- | --- | --- | --- |
| Collector | c_xxxxxxxxxxxxxxxx | The published scraper definition you built in Scraper Studio. Stable: same ID across runs. | ?collector=<id> query param on every trigger call. |
| Input | A JSON object | One row of data you want the collector to process, typically {"url": "..."}. Multiple inputs are sent as a JSON array. | Request body on POST /dca/trigger. |
| Collection (also called snapshot) | j_xxxxxxxxxxxxxxxx | One execution of a collector against one batch of inputs. Has a lifecycle: building, then ready or failed. | Returned by POST /dca/trigger. Passed to GET /dca/dataset?id=<id>. |
| Dataset | A JSON array | The output rows produced by a collection. One row per successful input by default. | Body of the GET /dca/dataset response when the collection is ready. |
The trigger response calls the collection identifier collection_id, while every other endpoint calls it a snapshot ID. POST /dca/trigger returns { "collection_id": "j_..." }, and GET /dca/dataset?id=<...> expects that same value as a snapshot ID. They are the same string under two names. Treat the collection_id from the trigger response as your snapshot_id for every downstream call.
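
The naming mismatch is easy to handle in one place. The following is a minimal sketch, not an official client: the host in API_BASE is an assumption (this page does not state it), the helper names are hypothetical, and the ID j_abc123 is made up for illustration. It reads collection_id from a trigger response body and reuses it as the id parameter of the dataset call.

```python
import json
from urllib.parse import urlencode

# Assumed host; this page only documents the /dca/* paths.
API_BASE = "https://api.brightdata.com"


def snapshot_id_from_trigger(response_body: str) -> str:
    """Return the collection ID from a POST /dca/trigger response body."""
    payload = json.loads(response_body)
    collection_id = payload["collection_id"]
    # Sanity check based on the j_ prefix shown in the terms table.
    if not collection_id.startswith("j_"):
        raise ValueError(f"unexpected collection ID: {collection_id!r}")
    return collection_id


def dataset_url(snapshot_id: str) -> str:
    """Build the GET /dca/dataset URL; the snapshot ID goes in the id param."""
    return f"{API_BASE}/dca/dataset?{urlencode({'id': snapshot_id})}"


# Made-up response body for illustration only.
trigger_body = '{"collection_id": "j_abc123"}'
snapshot_id = snapshot_id_from_trigger(trigger_body)
print(dataset_url(snapshot_id))
```

Keeping the translation in a single helper means the rest of your code can use one name for the identifier.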

Which API surface do I use?

Bright Data Scraper Studio exposes two API surfaces. They serve different goals and are usually called by different people on a team.
| Surface | Base path | Use it to | Typical caller |
| --- | --- | --- | --- |
| Collection API | /dca/* | Run a collector that already exists and get data back. | Application developers integrating scraping into a product. |
| AI Flow API | /api/scraper-studio/* | Create or self-heal a collector programmatically using the AI Agent. | Platform teams building dynamic-collector workflows. |
If you have a published collector and want results, use the Collection API. The rest of this reference focuses there. The AI Flow overview covers the other surface.

How are results delivered?

Within the Collection API there are two delivery models, batch and real-time. Batch results can be polled or pushed, and real-time jobs can be asynchronous or synchronous, for four modes in total. Pick based on input size and latency tolerance.
| Mode | How results arrive | Best for | Reference |
| --- | --- | --- | --- |
| Batch + API polling | You call GET /dca/dataset?id=<id> until the response is a JSON array. | Quickstarts, ad-hoc scripts, small input counts. | Quickstart |
| Batch + push delivery | Bright Data delivers the dataset to a destination you configured (webhook, S3, GCS, Azure, SFTP, email). | Large jobs, scheduled production workloads. | Choose a delivery type |
| Real-time, asynchronous | Trigger returns immediately. You fetch results when ready. Use for 1 to 10 inputs that need to start now. | Latency-sensitive workflows that can fan out polling. | Async real-time job |
| Real-time, synchronous | A single HTTP call. The response body contains the results. | Single-input, request-response workloads, for example an autocomplete API. | Sync real-time job |
Start with Batch + API polling. It is the simplest model and the one that the Quickstart, the Node.js starter and the Python starter all use. Move to push delivery once your input counts make polling impractical.
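
The polling rule from the table (call the dataset endpoint until the response is a JSON array) can be sketched as a small loop. This is an illustration under stated assumptions, not the starter projects' code: fetch is a hypothetical callable standing in for an authenticated GET /dca/dataset?id=<id> call, the not-ready body is assumed to be valid JSON that is not an array, and the {"status": "building"} shape in the demo is invented. How a failed collection is reported is not covered here, so the loop simply gives up after max_attempts.

```python
import json
import time
from typing import Callable


def poll_dataset(fetch: Callable[[], str], interval_s: float = 5.0,
                 max_attempts: int = 60) -> list:
    """Call fetch until its body parses to a JSON array, then return the rows."""
    for attempt in range(max_attempts):
        body = json.loads(fetch())
        if isinstance(body, list):
            return body  # the collection is ready; these are the dataset rows
        if attempt < max_attempts - 1:
            time.sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"dataset not ready after {max_attempts} attempts")


# Offline demo: a fake fetch that is "building" once, then returns one row.
responses = iter(['{"status": "building"}', '[{"url": "https://example.com"}]'])
rows = poll_dataset(lambda: next(responses), interval_s=0)
print(rows)
```

Passing the fetch function in keeps the loop independent of any HTTP library and makes it easy to test without network access.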

What worker types are available?

Every collection runs on one of two worker types, chosen when you built the collector. The choice affects speed, cost and which scraping techniques are available. The API does not let you change the worker type at trigger time. If you need a different one, edit the collector in Bright Data Scraper Studio.
| Worker | Best for | Speed | Detail |
| --- | --- | --- | --- |
| Code worker | Static HTML pages, JSON endpoints | Fast | Worker types |
| Browser worker | JavaScript-rendered pages, clicks, scrolling, captured background traffic | Slower | Worker types |

How do I authenticate a request?

All Bright Data Scraper Studio API calls use bearer-token authentication with a single token issued per account.
Authorization: Bearer YOUR_BRIGHT_DATA_API_TOKEN
Issue and revoke tokens from Account Settings → API Tokens. The same token works for both the Collection API and the AI Flow API.
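
As a rough sketch of how the header fits into a trigger call, the snippet below builds (but does not send) a POST /dca/trigger request with Python's standard library. The host in API_BASE and the Content-Type header are assumptions not stated on this page, build_trigger_request is a hypothetical helper, and c_abc123 is a made-up collector ID.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed host; this page only documents the /dca/* paths.
API_BASE = "https://api.brightdata.com"


def build_trigger_request(token: str, collector_id: str,
                          inputs: list) -> urllib.request.Request:
    """Build an authenticated POST /dca/trigger request without sending it."""
    query = urlencode({"collector": collector_id})
    return urllib.request.Request(
        url=f"{API_BASE}/dca/trigger?{query}",
        data=json.dumps(inputs).encode("utf-8"),  # inputs go in as a JSON array
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumed for a JSON body
        },
        method="POST",
    )


request = build_trigger_request(
    "YOUR_BRIGHT_DATA_API_TOKEN",
    "c_abc123",  # made-up collector ID
    [{"url": "https://example.com"}],
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
print(request.get_method(), request.full_url)
```

In real code, load the token from an environment variable or a secrets manager rather than hard-coding it.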

How long is a collection retained?

| Object | Lifecycle | Retention |
| --- | --- | --- |
| Collector | Stable until you delete or unpublish it. | Indefinite. |
| Collection | building, then ready or failed. Terminal state is permanent. | 90 days from creation. Results are deleted after that. |
| Dataset | Materialized when the collection reaches ready. | Tied to its collection. 90 days. |
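
If you store collection IDs for later retrieval, it can help to track when their results stop being available. The sketch below applies the 90-day window to a creation timestamp that you record yourself at trigger time. It assumes nothing about what the API returns, and the exact cutoff moment is an assumption. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=90)  # collection retention stated on this page


def expires_at(created_at: datetime) -> datetime:
    """Return the moment after which results should be treated as deleted."""
    return created_at + RETENTION


def is_expired(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """Check whether a collection created at created_at is past retention."""
    now = now or datetime.now(timezone.utc)
    return now >= expires_at(created_at)


# Timestamp recorded locally when the trigger call was made (illustrative).
created = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(expires_at(created).date())  # 2025-04-01
```
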
Push delivery destinations (webhook, S3, GCS, etc.) receive results once per collection. If you need to re-deliver, re-trigger the collector with the same inputs. See Specifications for the current limits on concurrency, payload size and per-account quotas.

Quickstart

Trigger a collector in about three minutes with cURL, Node.js or Python.

Choose a delivery type

Polling vs webhook vs S3 vs GCS. Per-request and per-collector options.

AI Flow API

Programmatically create or self-heal a collector with the AI Agent.

Build a collector

Need a collector that does not exist yet? Start in Bright Data Scraper Studio.