Bright Data Scraper Studio API concepts

This page defines every term the Bright Data Scraper Studio API uses so you can read the rest of the reference without ambiguity. Skim it once. Come back when an endpoint mentions an ID and you want to confirm what it represents.

What are the core objects?

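The Bright Data Scraper Studio data model has four objects: collector, input, collection and dataset. Three of them form a strict hierarchy: one collector produces many collections, and one collection produces one dataset. Inputs are the rows you feed into a collection when you trigger it.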

  Collector (c_...)            -> the recipe
       |
       v
  Collection (j_...)           -> one execution of the recipe
       |
       v
  Dataset (JSON array)         -> the rows the execution produced

Term	Looks like	What it is	Where you see it
Collector	`c_xxxxxxxxxxxxxxxx`	The published scraper definition you built in Scraper Studio. Stable: same ID across runs.	`?collector=<id>` query param on every trigger call.
Input	A JSON object	One row of data you want the collector to process, typically `{"url": "..."}`. Multiple inputs are sent as a JSON array.	Request body on `POST /dca/trigger`.
Collection (also called snapshot)	`j_xxxxxxxxxxxxxxxx`	One execution of a collector against one batch of inputs. Has a lifecycle: `building`, then `ready` or `failed`.	Returned by `POST /dca/trigger`. Passed to `GET /dca/dataset?id=<id>`.
Dataset	A JSON array	The output rows produced by a collection. One row per successful input by default.	Body of the `GET /dca/dataset` response when the collection is ready.

The collection identifier goes by two names. `POST /dca/trigger` returns `{ "collection_id": "j_..." }`, while `GET /dca/dataset?id=<id>` expects the same value as a snapshot ID. Both names refer to the same string. Treat the `collection_id` from the trigger response as your `snapshot_id` for every downstream call.
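
The handoff between the two names can be sketched in Python as follows. This is a minimal illustration, not an official client: the `BASE_URL` host and both helper function names are assumptions, since this page only gives the `/dca/*` paths. The helpers build URLs and a request body but send nothing.

```python
import json
from urllib.parse import urlencode

# Assumed host; this page only documents the /dca/* paths.
BASE_URL = "https://api.brightdata.com"


def build_trigger_request(collector_id, inputs):
    """Return (url, body) for POST /dca/trigger with a batch of inputs."""
    url = f"{BASE_URL}/dca/trigger?{urlencode({'collector': collector_id})}"
    body = json.dumps(inputs)  # multiple inputs travel as one JSON array
    return url, body


def dataset_url(trigger_response):
    """Reuse the trigger's collection_id as the id of the dataset call."""
    snapshot_id = trigger_response["collection_id"]
    return f"{BASE_URL}/dca/dataset?{urlencode({'id': snapshot_id})}"


if __name__ == "__main__":
    url, body = build_trigger_request("c_example", [{"url": "https://example.com"}])
    print(url)
    print(dataset_url({"collection_id": "j_example"}))
```

The point of `dataset_url` is that no translation step exists: the value under `collection_id` is passed through unchanged as the `id` query parameter.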

Which API surface do I use?

Bright Data Scraper Studio exposes two API surfaces. They serve different goals and are usually called by different people on a team.

Surface	Base path	Use it to	Typical caller
Collection API	`/dca/*`	Run a collector that already exists and get data back.	Application developers integrating scraping into a product.
AI Flow API	`/api/scraper-studio/*`	Create or self-heal a collector programmatically using the AI Agent.	Platform teams building dynamic-collector workflows.

If you have a published collector and want results, use the Collection API. The rest of this reference focuses there. The AI Flow overview covers the other surface.

How are results delivered?

Within the Collection API there are two delivery models, batch and real-time, and each comes in two variants, for four modes in total. Pick based on input size and latency tolerance.

Mode	How results arrive	Best for	Reference
Batch + API polling	You call `GET /dca/dataset?id=<id>` until the response is a JSON array.	Quickstarts, ad-hoc scripts, small input counts.	Quickstart
Batch + push delivery	Bright Data delivers the dataset to a destination you configured (webhook, S3, GCS, Azure, SFTP, email).	Large jobs, scheduled production workloads.	Choose a delivery type
Real-time, asynchronous	Trigger returns immediately. You fetch results when ready. Use for 1 to 10 inputs that need to start now.	Latency-sensitive workflows that can fan out polling.	Async real-time job
Real-time, synchronous	A single HTTP call. The response body contains the results.	Single-input, request-response workloads, for example an autocomplete API.	Sync real-time job

Start with Batch + API polling. It is the simplest model and the one the Quickstart, the Node.js starter and the Python starter all use. Move to push delivery once your input counts make polling impractical.
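
A polling loop for the Batch + API polling mode might look like the sketch below. It relies only on the rule stated in the table: keep calling until the response is a JSON array. The `fetch` callable, the interval and the attempt limit are hypothetical choices, not documented values. This page does not specify what the response looks like while a collection is still `building` or after it has `failed`, so the sketch treats anything that is not a list as "not ready" and gives up by timeout.

```python
import time


def poll_dataset(fetch, snapshot_id, interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch(snapshot_id) until it returns a list of dataset rows.

    `fetch` is a hypothetical callable that performs
    GET /dca/dataset?id=<snapshot_id> and returns the decoded JSON body.
    """
    for attempt in range(max_attempts):
        payload = fetch(snapshot_id)
        if isinstance(payload, list):
            return payload  # a JSON array means the collection is ready
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"{snapshot_id} not ready after {max_attempts} attempts")
```

Passing `fetch` and `sleep` in as parameters keeps the loop independent of any HTTP library and makes it easy to exercise with a stub before pointing it at the real endpoint.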

What worker types are available?

Every collection runs on one of two worker types, chosen when you built the collector. The choice affects speed, cost and which scraping techniques are available. The API does not let you change the worker type at trigger time. If you need a different one, edit the collector in Bright Data Scraper Studio.

Worker	Best for	Speed	Detail
Code worker	Static HTML pages, JSON endpoints	Fast	Worker types
Browser worker	JavaScript-rendered pages, clicks, scrolling, captured background traffic	Slower	Worker types

How do I authenticate a request?

All Bright Data Scraper Studio API calls use bearer-token authentication with a single token issued per account.

Authorization: Bearer YOUR_BRIGHT_DATA_API_TOKEN

Issue and revoke tokens from Account Settings → API Tokens. The same token works for both the Collection API and the AI Flow API.
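
As an illustration, the header above can be attached to a request with Python's standard library. The environment variable name `BRIGHT_DATA_API_TOKEN`, the helper name `authed_request` and the JSON `Content-Type` on request bodies are assumptions made for this sketch. The function only builds the request object and sends nothing.

```python
import os
import urllib.request


def authed_request(url, token=None, data=None):
    """Build a urllib Request carrying the bearer token; nothing is sent.

    `data`, when given, should be bytes (for example an encoded JSON array).
    """
    # BRIGHT_DATA_API_TOKEN is an assumed environment variable name.
    token = token or os.environ["BRIGHT_DATA_API_TOKEN"]
    headers = {"Authorization": f"Bearer {token}"}
    if data is not None:
        # Assumed content type for JSON request bodies.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers)
```

Reading the token from the environment keeps it out of source control. Because one token covers both API surfaces, the same helper can serve `/dca/*` and `/api/scraper-studio/*` calls.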

How long is a collection retained?

Object	Lifecycle	Retention
Collector	Stable until you delete or unpublish it.	Indefinite.
Collection	`building`, then `ready` or `failed`. Terminal state is permanent.	90 days from creation. Results are deleted after that.
Dataset	Materialized when the collection reaches `ready`.	Tied to its collection. 90 days.
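
If you track collections on your side, the 90-day window can be computed with a few lines of Python. This is a sketch under two assumptions: `created_at` is a timestamp you recorded yourself when you triggered the collection, and results become unavailable exactly 90 days after it. Check Specifications for the authoritative behavior.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # retention window from the table above


def expires_at(created_at):
    """Return the moment a collection's results are assumed to be deleted."""
    return created_at + RETENTION


def is_expired(created_at, now=None):
    """True once the assumed retention window has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= expires_at(created_at)
```

A check like `is_expired` is useful before calling `GET /dca/dataset` for an old ID, so that a re-trigger can be scheduled instead of a fetch for results that are no longer there.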

Push delivery destinations (webhook, S3, GCS, etc.) receive results once per collection. If you need to re-deliver, re-trigger the collector with the same inputs. See Specifications for the current limits on concurrency, payload size and per-account quotas.

Related resources

Quickstart

Trigger a collector in about three minutes with cURL, Node.js or Python.

Choose a delivery type

Polling vs webhook vs S3 vs GCS. Per-request and per-collector options.

AI Flow API

Programmatically create or self-heal a collector with the AI Agent.

Build a collector

Need a collector that does not exist yet? Start in Bright Data Scraper Studio.
