The Bright Data CLI builds Bright Data Scraper Studio scrapers from your terminal. After a one-time install, it takes three commands: bdata login, bdata scraper create and bdata scraper run. This tutorial walks you through building a Hacker News top-stories scraper end to end. The CLI runs unchanged inside the embedded terminal of any coding agent, such as Claude Code, Cursor or Codex.

Time to complete: about 10 to 25 minutes (AI generation runs in the background).

Prerequisites

  • A Bright Data account (sign up free, no card required)
  • Node.js 18 or later (node --version)
  • A terminal. Any embedded terminal works too: Claude Code, Cursor, Codex, VS Code, JetBrains, iTerm

Install the Bright Data CLI

npm install -g @brightdata/cli
bdata --version
The CLI is published as @brightdata/cli on npm. The bdata and brightdata commands are interchangeable.

Build your first scraper from the terminal

1. Log in

Run bdata login. The CLI opens a browser tab so you can authorize it against your Bright Data account, then stores your API key locally. You do not paste or copy a key.
bdata login
Expected result:
Opening browser for Bright Data authentication...
Logged in successfully. Key: 2e75****12bf
Checking for required zones...
Zone "cli_unlocker" already exists.
Zone "cli_browser" already exists.
The two zones (cli_unlocker and cli_browser) are the Web Unlocker API and Browser API endpoints the CLI uses when running scrapers. Bright Data creates them automatically on first login.
2. Create the scraper

Pass a target URL and one sentence describing the data you want. Bright Data’s AI Agent generates the output schema, writes the scraper code and returns a Collector ID.
bdata scraper create https://news.ycombinator.com \
  "Extract top stories: title, url, points, author, comment count"
The AI pipeline runs in eight stages, printed live: user_intent_analyzer, planner, collector_maintainer, output_schema_generator, code_generator, input_schema_generator, preview_runner and preview_picker. Typical wall-clock time is 5 to 15 minutes; complex targets can take up to 25 minutes.
Expected result:
Template created: c_mpohus372o5tmid1jk
Triggering AI generation...
Generating scraper...
Step: user_intent_analyzer — polling (attempt 1/600)
...
Done in 280 poll attempts.
{"status":"done","completed_steps":[...],"step":"preview_picker"}
Save the Collector ID (the c_* string). It is the stable handle for every subsequent run, schedule or API call on this scraper.
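If you script the create step, you may want to capture the Collector ID from the CLI output instead of copying it by hand. The following is a minimal sketch, assuming the ID always has the form c_ followed by lowercase letters and digits, as in the sample above; the function name and the pattern are illustrative and not part of the CLI.

```python
import re
from typing import Optional

# Assumption: Collector IDs look like "c_" plus lowercase letters and digits,
# as in the sample ID shown in this tutorial. Adjust if your IDs differ.
COLLECTOR_ID_PATTERN = re.compile(r"\bc_[a-z0-9]+\b")


def extract_collector_id(cli_output: str) -> Optional[str]:
    """Return the first c_* identifier found in the CLI output, or None."""
    match = COLLECTOR_ID_PATTERN.search(cli_output)
    return match.group(0) if match else None


# First lines of the sample output from this tutorial.
sample_output = """Template created: c_mpohus372o5tmid1jk
Triggering AI generation...
Generating scraper...
"""

collector_id = extract_collector_id(sample_output)
print(collector_id)
```

The word boundary in the pattern keeps it from matching the tail of a longer identifier that happens to contain c_.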
3. Run the scraper

Pass the Collector ID and a URL. Use --pretty to format the JSON output.
bdata scraper run c_mpohus372o5tmid1jk https://news.ycombinator.com --pretty
The CLI tries realtime mode first. If the scraper triggers more pages than the realtime limit allows, the CLI falls back to batch mode automatically (POST /dca/trigger, then poll GET /dca/dataset) and continues. No flag is needed.
Expected result: a JSON array, one row per result.
[
  {
    "title": "Last.fm is now independent",
    "url": "https://support.last.fm/t/last-fm-is-now-independent/118591",
    "points": 447,
    "author": "twistslider",
    "comment_count": 131
  },
  {
    "title": "DuckDuckGo search saw 28% more visits after Google said people love AI mode",
    "url": "https://www.pcgamer.com/hardware/duckduckgos-ai-free-search-saw-nearly-28-percent-more-visits-in-the-week-following-googles-insistence-that-people-love-ai-mode/",
    "points": 418,
    "author": "HelloUsername",
    "comment_count": 212
  }
]
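To consume the rows from a script rather than reading them in the terminal, one option is to call the CLI as a subprocess and parse what it prints. The sketch below assumes that bdata is on your PATH and that, without --pretty, standard output contains only the JSON array; neither detail is confirmed on this page, and the helper names are hypothetical.

```python
import json
import subprocess


def build_run_command(collector_id: str, url: str) -> list:
    """Build the argument list for `bdata scraper run <collector_id> <url>`."""
    return ["bdata", "scraper", "run", collector_id, url]


def parse_rows(stdout: str) -> list:
    """Parse CLI output into rows, checking that it is a JSON array."""
    rows = json.loads(stdout)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array, one row per result")
    return rows


def run_scraper(collector_id: str, url: str) -> list:
    """Run the scraper through the CLI and return the parsed rows.

    Assumption: stdout holds nothing but the JSON array.
    """
    completed = subprocess.run(
        build_run_command(collector_id, url),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return parse_rows(completed.stdout)
```

A call such as run_scraper("c_mpohus372o5tmid1jk", "https://news.ycombinator.com") would then return a list of dictionaries with the keys shown in the sample output.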

How do I use this from Claude Code, Cursor or Codex?

The Bright Data CLI runs inside any embedded terminal as-is. The coding agent is not building the scraper itself; the CLI calls Bright Data’s AI Agent, and the coding agent calls the CLI on your behalf. Two integrations make the CLI feel native inside a coding agent.

Pin the Collector ID in the agent’s rules file (CLAUDE.md or CODEX.md) so the agent re-uses your scraper across sessions instead of building a fresh one every time:
SCRAPER_STUDIO_COLLECTOR_ID=c_mpohus372o5tmid1jk
HACKER_NEWS_SCRAPER_USAGE="bdata scraper run $SCRAPER_STUDIO_COLLECTOR_ID <url> --pretty"
Wire Bright Data’s MCP server into your agent with brightdata add mcp. The MCP server is separate from the Scraper Studio CLI but gives the agent additional scraping tools (scrape_as_markdown, search_engine and others) it can call directly:
brightdata add mcp                # interactive: pick Claude Code, Cursor or Codex
See the Bright Data MCP server quickstart for what the MCP exposes.

What just happened?

The three CLI commands map to six Bright Data Scraper Studio API endpoints. Use this table to translate the CLI flow into raw HTTP calls when you are ready to integrate without the CLI:
You ran | Bright Data API endpoint behind it
bdata login | Local credential store. Stores the API key from Account Settings.
bdata scraper create | POST /dca/collector then POST /dca/collectors/{c_*}/automate_template
bdata scraper run (small input) | POST /dca/trigger_immediate then GET /dca/get_result
bdata scraper run (large input) | POST /dca/trigger then poll GET /dca/dataset?id=j_*
For a worked example of the underlying API in cURL, Python and Node.js, see the Bright Data Scraper Studio API quickstart. For every endpoint, see the Scraper Studio API reference.
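As a rough illustration of the batch row in the table, the sketch below builds (but does not send) the two HTTP requests with Python's standard library. Several details are assumptions that this page does not state and that you should check against the API reference: the base URL https://api.brightdata.com, Bearer-token authentication, the collector query parameter, and a request body made of {"url": ...} objects. The job ID j_example is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the API is served from this host. Verify in the API reference.
API_BASE = "https://api.brightdata.com"


def build_trigger_request(api_key, collector_id, urls):
    """Build the POST /dca/trigger request for a batch run (not sent here)."""
    # Assumption: the collector is selected with a "collector" query parameter.
    query = urllib.parse.urlencode({"collector": collector_id})
    # Assumption: the body is a JSON array with one {"url": ...} object per input.
    body = json.dumps([{"url": u} for u in urls]).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/dca/trigger?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: Bearer auth
            "Content-Type": "application/json",
        },
    )


def build_dataset_request(api_key, job_id):
    """Build the GET /dca/dataset?id=j_* request used to poll for results."""
    query = urllib.parse.urlencode({"id": job_id})
    return urllib.request.Request(
        f"{API_BASE}/dca/dataset?{query}",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


trigger = build_trigger_request(
    "your_api_key_here", "c_mpohus372o5tmid1jk", ["https://news.ycombinator.com"]
)
dataset = build_dataset_request("your_api_key_here", "j_example")
print(trigger.get_method(), trigger.full_url)
print(dataset.get_method(), dataset.full_url)
```

Passing either object to urllib.request.urlopen would send it; the builders are kept separate from the network call so the request shape can be inspected first.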

Frequently asked questions

Why does AI generation take so long?
AI generation timing depends on target complexity. Simple single-page scrapers finish in 5 to 10 minutes. Pages with lazy loading, pagination or anti-bot challenges can take 15 to 25 minutes. The CLI polls Bright Data’s AI Flow API every five seconds and prints the current stage, so you can leave it running and check back. No action is needed while you wait.
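The polling behaviour described above can be sketched as a small loop. This is not the CLI's actual implementation: the function name is hypothetical, the fetch and is_ready callables stand in for whatever request and completion check you use, and the defaults simply mirror the five-second interval and the 600-attempt ceiling visible in the sample output.

```python
import time


def poll_until_ready(fetch, is_ready, interval_seconds=5.0, max_attempts=600,
                     sleep=time.sleep):
    """Call fetch() until is_ready(result) is true, sleeping between attempts.

    Raises TimeoutError when max_attempts is reached without a ready result.
    """
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if is_ready(result):
            return result
        if attempt < max_attempts:
            sleep(interval_seconds)  # wait before the next attempt
    raise TimeoutError(f"not ready after {max_attempts} attempts")


# Usage with canned responses instead of real API calls.
# "running" is a made-up intermediate status; "done" appears in the sample output.
responses = iter([{"status": "running"}, {"status": "running"}, {"status": "done"}])
recorded_sleeps = []
final = poll_until_ready(
    fetch=lambda: next(responses),
    is_ready=lambda r: r["status"] == "done",
    sleep=recorded_sleeps.append,  # record the waits instead of sleeping
)
print(final, recorded_sleeps)
```

With these defaults, a loop of this shape gives up after roughly 50 minutes (600 attempts at five-second intervals), which comfortably covers the 25-minute upper bound quoted above.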
What happens when a run exceeds the realtime page limit?
Realtime mode caps the number of page loads per request. When a scraper triggers more pages than the realtime limit allows, the CLI prints Realtime page limit exceeded, switching to batch mode..., submits the same inputs to POST /dca/trigger, and polls GET /dca/dataset?id=j_* until the snapshot is ready. The switch is automatic and the final JSON shape is identical. See Scraper Studio specifications for the page-load limits.
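If you reproduce this behaviour without the CLI, the control flow might look like the sketch below. How the API actually signals that the realtime limit was exceeded is not described on this page, so the RealtimeLimitExceeded exception and both runner callables are hypothetical stand-ins for your own HTTP code.

```python
class RealtimeLimitExceeded(Exception):
    """Hypothetical signal that a run needs more pages than realtime allows."""


def run_with_fallback(run_realtime, run_batch, inputs):
    """Try realtime first; resubmit the same inputs in batch mode on overflow.

    Returns (rows, mode) so the caller can log which path was taken.
    """
    try:
        return run_realtime(inputs), "realtime"
    except RealtimeLimitExceeded:
        print("Realtime page limit exceeded, switching to batch mode...")
        return run_batch(inputs), "batch"


# Usage with stand-in runners instead of real API calls.
def realtime_over_limit(inputs):
    raise RealtimeLimitExceeded("too many page loads")


def batch_runner(inputs):
    return [{"url": u} for u in inputs]


rows, mode = run_with_fallback(
    realtime_over_limit, batch_runner, ["https://news.ycombinator.com"]
)
print(mode, rows)
```

Because both paths return rows of the same shape, downstream code does not need to know which mode produced them.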
Why are some fields missing from some rows?
The AI Agent’s generated schema is per-row best-effort, not strict. Job posts, “Show HN” entries and very new submissions on Hacker News do not always have a points value or a comment count yet, so the scraper returns the row with those fields omitted rather than inventing a value. Treat missing fields as null in your own code. To enforce a stricter schema, open the scraper in Scraper Studio or rewrite the schema with the Self-Healing tool.
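One way to treat missing fields as null is to normalize every row to the same set of keys before further processing. The sketch below assumes the five field names from the sample output in this tutorial; the function name is illustrative.

```python
# Field names taken from the sample output in this tutorial.
EXPECTED_FIELDS = ("title", "url", "points", "author", "comment_count")


def normalize_row(row):
    """Return a copy of row with every expected field present (None if omitted)."""
    return {field: row.get(field) for field in EXPECTED_FIELDS}


# A job post might come back without points, author or comment_count.
partial_row = {"title": "Example job post", "url": "https://example.com/jobs"}
print(normalize_row(partial_row))
```

Note that this version also drops any keys outside EXPECTED_FIELDS; merge the original row back in if you need to keep extra fields.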
Can I trigger the scraper from the API instead of the CLI?
Yes. The Collector ID returned by bdata scraper create (the c_* string) is the same handle the Bright Data Scraper Studio API uses. Pass it to POST /dca/trigger from any HTTP client. See the Bright Data Scraper Studio API quickstart for cURL, Python and Node.js examples.
How do I fix a scraper from the CLI when the target site changes?
Self-healing is not yet a CLI subcommand. You have three options:
  • Control panel: open the scraper in Scraper Studio and use the Self-Healing tool to describe the fix in plain language.
  • Direct API (three-call loop):
    1. POST /dca/collectors/{c_*}/refactor_template with the heal prompt.
    2. Poll GET /dca/collectors/{c_*}/refactor_template/progress until status is pending_answer and the response includes the proposed diff.
    3. POST /dca/collectors/{c_*}/resume_automation_job to approve or reject the diff.
    See Trigger Self-Healing and Resume Self-Healing Job.
  • Worked example: the Scraper Studio Self-Healing demo is a Node.js implementation of the full healing loop, including the pending_answer approval step.
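The three-call loop from the Direct API option can be sketched as follows. Only the endpoint paths and the pending_answer status come from this page. The request and response payloads are not documented here, so they are passed through opaquely, and post, get and decide are hypothetical caller-supplied callables wrapping your own HTTP client. The interval and attempt limit are arbitrary defaults.

```python
import time


def refactor_path(collector_id):
    return f"/dca/collectors/{collector_id}/refactor_template"


def progress_path(collector_id):
    return f"/dca/collectors/{collector_id}/refactor_template/progress"


def resume_path(collector_id):
    return f"/dca/collectors/{collector_id}/resume_automation_job"


def run_self_healing(collector_id, heal_payload, decide, post, get, *,
                     interval_seconds=5.0, max_attempts=120, sleep=time.sleep):
    """Drive the heal loop: start it, wait for a proposal, then answer it.

    post(path, payload) and get(path) are supplied by the caller.
    decide(progress) returns the approve-or-reject payload; its shape is
    not specified on this page, so consult the API reference.
    """
    # 1. Submit the heal prompt.
    post(refactor_path(collector_id), heal_payload)
    # 2. Poll progress until the job is waiting for an answer.
    for _ in range(max_attempts):
        progress = get(progress_path(collector_id))
        if progress.get("status") == "pending_answer":
            # 3. Approve or reject the proposed diff.
            return post(resume_path(collector_id), decide(progress))
        sleep(interval_seconds)
    raise TimeoutError("self-healing job never reached pending_answer")
```

Keeping the HTTP calls behind post and get makes it possible to exercise the flow with stand-in functions before pointing it at the real API.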
How do I use the CLI in a headless environment, such as CI?
The bdata login command requires a browser callback. For headless environments, export your API key as BRIGHTDATA_API_KEY and the CLI uses it directly without a login step:
export BRIGHTDATA_API_KEY="your_api_key_here"
bdata scraper run c_mpohus372o5tmid1jk https://news.ycombinator.com
Copy the key from Account Settings.
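When a script launches the CLI in a headless job, it can help to fail fast if the variable is missing rather than let the CLI attempt a browser login. The check below is a minimal sketch; the function name is hypothetical, and only the BRIGHTDATA_API_KEY variable name comes from this page.

```python
import os


def env_with_api_key(base_env=None):
    """Return an environment mapping for a subprocess, ensuring the key is set."""
    env = dict(os.environ if base_env is None else base_env)
    if not env.get("BRIGHTDATA_API_KEY"):
        raise RuntimeError(
            "BRIGHTDATA_API_KEY is not set; export it before running bdata "
            "in a headless environment"
        )
    return env


# Usage with an explicit mapping instead of the real process environment.
checked_env = env_with_api_key({"BRIGHTDATA_API_KEY": "your_api_key_here"})
print(sorted(checked_env))
```

The returned mapping can be passed as the env argument of subprocess.run when invoking bdata.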

Build with the AI Agent

Build the same scraper from the Bright Data control panel instead of the terminal

Scraper Studio API quickstart

Trigger an existing scraper from cURL, Python or Node.js

Self-Healing tool

Fix a scraper with a plain-language prompt when a target site changes

Bright Data CLI overview

Every bdata command, with examples