The Bright Data CLI builds Bright Data Scraper Studio scrapers from your terminal in three commands. Install it once, log in with `bdata login`, then run `bdata scraper create` and `bdata scraper run`. This tutorial walks you through building a Hacker News top-stories scraper end to end. The CLI runs unchanged inside the embedded terminal of any coding agent, such as Claude Code, Cursor or Codex.
Time to complete: about 10 to 25 minutes (AI generation runs in the background)
Prerequisites
- A Bright Data account (sign up free, no card required)
- Node.js 18 or later (check with `node --version`)
- A terminal. Any embedded terminal works too: Claude Code, Cursor, Codex, VS Code, JetBrains, iTerm
Install the Bright Data CLI
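The CLI is published as `@brightdata/cli` on npm; a global install (`npm install -g @brightdata/cli`) puts it on your PATH. The `bdata` and `brightdata` commands are interchangeable.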
Build your first scraper from the terminal
Log in
Run `bdata login`. The CLI opens a browser tab so you can authorize it against your Bright Data account, then stores your API key locally. You do not paste or copy a key.
Expected result: the login output lists two zones. The two zones (`cli_unlocker` and `cli_browser`) are the Web Unlocker API and Browser API endpoints the CLI uses when running scrapers. Bright Data creates them automatically on first login.
Create the scraper
Pass a target URL and one sentence describing the data you want. Bright Data’s AI Agent generates the output schema, writes the scraper code and returns a Collector ID.
The AI pipeline runs in the following stages, printed live: `user_intent_analyzer`, `planner`, `collector_maintainer`, `output_schema_generator`, `code_generator`, `input_schema_generator`, `preview_runner` and `preview_picker`. Typical wall-clock time is 5 to 15 minutes; complex targets can take up to 25 minutes.
Expected result: the CLI prints the new Collector ID. Save the Collector ID (the `c_*` string). It is the stable handle for every subsequent run, schedule or API call on this scraper.
Run the scraper
Pass the Collector ID and a URL. Use `--pretty` to format the JSON output.
The CLI tries realtime mode first. If the scraper triggers more pages than the realtime limit allows, the CLI automatically falls back to batch mode (`POST /dca/trigger`, then poll `GET /dca/dataset`) and continues. No flag is needed.
Expected result: a JSON array, one row per result.
How do I use this from Claude Code, Cursor or Codex?
The Bright Data CLI runs inside any embedded terminal as-is. The coding agent is not building the scraper itself; the CLI calls Bright Data’s AI Agent, and the coding agent calls the CLI on your behalf. Two integrations make the CLI feel native inside a coding agent:
- Pin the Collector ID in the agent’s rules file (for example CLAUDE.md or CODEX.md) so the agent re-uses your scraper across sessions instead of building a fresh one every time.
- Add the Bright Data MCP server with `brightdata add mcp`. The MCP server is separate from the Scraper Studio CLI but gives the agent additional scraping tools (`scrape_as_markdown`, `search_engine` and others) it can call directly.
What just happened?
The three CLI commands map onto Bright Data Scraper Studio API endpoints. Use this table to translate the CLI flow into raw HTTP calls when you are ready to integrate without the CLI:
| You ran | Bright Data API endpoint behind it |
|---|---|
| `bdata login` | Local credential store. Stores the API key from Account Settings. |
| `bdata scraper create` | `POST /dca/collector` then `POST /dca/collectors/{c_*}/automate_template` |
| `bdata scraper run` (small input) | `POST /dca/trigger_immediate` then `GET /dca/get_result` |
| `bdata scraper run` (large input) | `POST /dca/trigger` then poll `GET /dca/dataset?id=j_*` |
Frequently asked questions
Why did `bdata scraper create` take longer than 10 minutes?
AI generation timing depends on target complexity. Simple single-page scrapers finish in 5 to 10 minutes. Pages with lazy-load, pagination or anti-bot challenges can take 15 to 25 minutes. The CLI polls Bright Data’s AI Flow API every five seconds and prints the current stage, so you can leave it running and check back. No action is needed while you wait.
Why did the CLI switch from realtime to batch mode mid-run?
Realtime mode caps the number of page loads per request. When a scraper triggers more pages than the realtime limit allows, the CLI prints `Realtime page limit exceeded, switching to batch mode...`, submits the same inputs to `POST /dca/trigger`, and polls `GET /dca/dataset?id=j_*` until the snapshot is ready. The switch is automatic and the final JSON shape is identical. See Scraper Studio specifications for the page-load limits.
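The polling half of that fallback can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CLI's actual implementation: `poll_until_ready`, the `{"status": "building"}` placeholder reply and the default interval are all hypothetical, and the real "not ready" response of `GET /dca/dataset` may look different. The HTTP call is injected as `fetch` so the loop itself stays independent of any client library.

```python
import time
from typing import Any, Callable


def poll_until_ready(
    fetch: Callable[[], Any],
    is_ready: Callable[[Any], bool],
    interval_s: float = 5.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until is_ready(response) is true, then return that response."""
    for attempt in range(max_attempts):
        response = fetch()
        if is_ready(response):
            return response
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"snapshot not ready after {max_attempts} attempts")


# Stand-in for GET /dca/dataset?id=j_*: two hypothetical "not ready" replies,
# then the finished rows. Replace the lambda with a real HTTP call.
_replies = iter([{"status": "building"}, {"status": "building"}, [{"title": "Example"}]])
rows = poll_until_ready(
    fetch=lambda: next(_replies),
    is_ready=lambda reply: isinstance(reply, list),  # assumption: rows arrive as a JSON array
    sleep=lambda _seconds: None,  # skip real waiting in this demo
)
```

Bounding the loop with `max_attempts` keeps a stuck snapshot from blocking a script forever.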
Why are some rows missing fields like `points` or `comment_count`?
The AI Agent’s generated schema is per-row best-effort, not strict. Job posts, “Show HN” entries and very new submissions on Hacker News do not always have a point score or comment count yet, so the scraper returns the row with those fields omitted rather than inventing a value. Treat missing fields as `null` in your own code. To enforce a stricter schema, open the scraper in Scraper Studio or rewrite the schema with the Self-Healing tool.
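One way to treat omitted fields as null on the consuming side is to fill them in before further processing. The sketch below is a minimal example, assuming rows arrive as a list of dicts; `normalize_row` is a hypothetical helper, and apart from `points` and `comment_count` the field names and sample rows are invented for illustration.

```python
from typing import Any

# "points" and "comment_count" come from the FAQ above;
# "title" and "url" are hypothetical extras for this example.
EXPECTED_FIELDS = ("title", "url", "points", "comment_count")


def normalize_row(row: dict[str, Any], fields: tuple[str, ...] = EXPECTED_FIELDS) -> dict[str, Any]:
    """Return a copy of row in which every expected field exists (None if omitted)."""
    normalized = dict(row)
    for name in fields:
        normalized.setdefault(name, None)
    return normalized


# Made-up sample rows; the second one has no points or comment count yet.
rows = [
    {"title": "Story A", "url": "https://example.com/a", "points": 120, "comment_count": 45},
    {"title": "Hiring post", "url": "https://example.com/b"},
]
clean = [normalize_row(row) for row in rows]
```

Copying each row instead of mutating it keeps the raw scraper output available for debugging.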
Can I trigger this scraper from my own code instead of the CLI?
Yes. The Collector ID returned by `bdata scraper create` (the `c_*` string) is the same handle the Bright Data Scraper Studio API uses. Pass it to `POST /dca/trigger` from any HTTP client. See the Bright Data Scraper Studio API quickstart for cURL, Python and Node.js examples.
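As a rough sketch of what such a call could look like with Python's standard library, the snippet below builds the request without sending it. Several details are assumptions that this page does not confirm: the `https://api.brightdata.com` host, the `collector` query parameter, Bearer-token authentication and the `[{"url": ...}]` body shape. `build_trigger_request` and `c_example` are placeholders. Check the API quickstart for the authoritative request format.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed API host; not stated on this page.
API_BASE = "https://api.brightdata.com"


def build_trigger_request(collector_id: str, urls: list[str], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /dca/trigger request for one collector."""
    # Assumption: the collector is selected with a `collector` query parameter.
    query = urllib.parse.urlencode({"collector": collector_id})
    # Assumption: inputs are a JSON array with one {"url": ...} object per page.
    body = json.dumps([{"url": url} for url in urls]).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/dca/trigger?{query}",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_trigger_request(
    "c_example",  # placeholder; use the c_* ID printed by `bdata scraper create`
    ["https://news.ycombinator.com/"],
    os.environ.get("BRIGHTDATA_API_KEY", "YOUR_API_KEY"),
)
# To send it: urllib.request.urlopen(request)
```

Separating request construction from sending makes the call easy to inspect or log before it touches the network.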
How do I fix the scraper when the target site changes?
Self-healing is not yet a CLI subcommand. You have three options:
- Control panel: open the scraper in Scraper Studio and use the Self-Healing tool to describe the fix in plain language.
- Direct API (three-call loop):
  1. `POST /dca/collectors/{c_*}/refactor_template` with the heal prompt.
  2. Poll `GET /dca/collectors/{c_*}/refactor_template/progress` until `status` is `pending_answer` and the response includes the proposed diff.
  3. `POST /dca/collectors/{c_*}/resume_automation_job` to approve or reject the diff.
- Worked example: the Scraper Studio Self-Healing demo is a Node.js implementation of the full healing loop, including the `pending_answer` approval step.
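The three-call loop can be outlined in Python as follows. This is a hedged sketch, not the demo's code: the endpoint paths and the `pending_answer` status come from the list above, but `heal_collector`, the `{"prompt": ...}` and `{"approve": ...}` payload keys, and the polling defaults are hypothetical. The HTTP layer is passed in as `call(method, path, body)` so no request format is implied beyond the paths.

```python
import time
from typing import Any, Callable

# call(method, path, body) -> parsed JSON response, supplied by the caller.
Call = Callable[[str, str, Any], dict]


def heal_collector(
    call: Call,
    collector_id: str,
    heal_prompt: str,
    approve: Callable[[dict], bool],
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Run the refactor, wait for the proposed diff, then approve or reject it."""
    base = f"/dca/collectors/{collector_id}"
    # 1. Start the refactor. The "prompt" key is a hypothetical payload shape.
    call("POST", f"{base}/refactor_template", {"prompt": heal_prompt})
    # 2. Poll until the job is waiting for an answer.
    for _ in range(max_polls):
        progress = call("GET", f"{base}/refactor_template/progress", None)
        if progress.get("status") == "pending_answer":
            break
        sleep(interval_s)
    else:
        raise TimeoutError("refactor never reached pending_answer")
    # 3. Approve or reject the diff. The "approve" key is also hypothetical.
    return call("POST", f"{base}/resume_automation_job", {"approve": approve(progress)})
```

Passing `approve` as a callback leaves room for a human review step, since the function receives the full progress response, including whatever diff the API returns.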
Does `bdata login` work without a browser, for example in CI?
The `bdata login` command requires a browser callback. For headless environments, copy your API key from Account Settings and export it as `BRIGHTDATA_API_KEY`; the CLI then uses it directly without a login step.
Related
Build with the AI Agent
Build the same scraper from the Bright Data control panel instead of the terminal
Scraper Studio API quickstart
Trigger an existing scraper from cURL, Python or Node.js
Self-Healing tool
Fix a scraper with a plain-language prompt when a target site changes
Bright Data CLI overview
Every `bdata` command, with examples