
## Scraping workflows

### Get clean content from any website

```shell
# Clean markdown - great for reading or feeding to LLMs
brightdata scrape https://news.ycombinator.com

# Save to file
brightdata scrape https://docs.python.org/3/tutorial/ -o python-tutorial.md

# Get raw HTML for parsing
brightdata scrape https://example.com -f html -o page.html
```

### Geo-targeted scraping

```shell
# See Amazon prices as a US customer
brightdata scrape https://amazon.com/dp/B09V3KXJPB --country us

# Scrape a German news site from Germany
brightdata scrape https://spiegel.de --country de

# Mobile user agent for mobile-optimized pages
brightdata scrape https://example.com --mobile
```

### Take screenshots

```shell
# Full-page screenshot
brightdata scrape https://example.com -f screenshot -o homepage.png

# Screenshot from a specific country
brightdata scrape https://amazon.co.uk -f screenshot --country gb -o uk-amazon.png
```

## Search workflows

```shell
# Google search with formatted table output
brightdata search "best web scraping tools 2025"

# Get raw JSON for processing
brightdata search "typescript best practices" --json

# Pretty-print for inspection
brightdata search "AI startups" --pretty

# Local restaurant search from Germany, in German
brightdata search "restaurants berlin" --country de --language de

# News-only results
brightdata search "AI regulation 2025" --type news

# Shopping results
brightdata search "wireless headphones" --type shopping

# Image search
brightdata search "mountain landscape wallpaper" --type images
```

### Pagination

```shell
# First page (default)
brightdata search "web scraping tutorials"

# Second page
brightdata search "web scraping tutorials" --page 1

# Third page
brightdata search "web scraping tutorials" --page 2
```

## Structured data extraction

### E-Commerce

```shell
# Amazon product details
brightdata pipelines amazon_product "https://amazon.com/dp/B09V3KXJPB"

# Amazon product reviews
brightdata pipelines amazon_product_reviews "https://amazon.com/dp/B09V3KXJPB"

# Amazon search - requires keyword + domain
brightdata pipelines amazon_product_search "wireless headphones" "https://amazon.com"

# Walmart product
brightdata pipelines walmart_product "https://walmart.com/ip/123456"

# Export as CSV
brightdata pipelines amazon_product "https://amazon.com/dp/B09V3KXJPB" --format csv -o product.csv
```

### Social media profiles

```shell
# LinkedIn person
brightdata pipelines linkedin_person_profile "https://linkedin.com/in/username"

# LinkedIn company
brightdata pipelines linkedin_company_profile "https://linkedin.com/company/bright-data"

# Instagram profile
brightdata pipelines instagram_profiles "https://instagram.com/username"

# TikTok profile
brightdata pipelines tiktok_profiles "https://tiktok.com/@username"
```

### Reviews and comments

```shell
# Google Maps reviews - last 7 days
brightdata pipelines google_maps_reviews "https://maps.google.com/maps/place/..." 7

# YouTube comments - top 50
brightdata pipelines youtube_comments "https://youtube.com/watch?v=dQw4w9WgXcQ" 50

# Facebook company reviews - 25 reviews
brightdata pipelines facebook_company_reviews "https://facebook.com/company" 25

# Instagram comments
brightdata pipelines instagram_comments "https://instagram.com/p/ABC123"
```

## Piping and automation

The CLI is designed to be pipe-friendly. When stdout is not a TTY, colors and spinners are automatically disabled.
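The TTY detection described above is the standard POSIX check; a minimal sketch of the same test in plain shell (`-t 1` asks whether file descriptor 1, stdout, is a terminal):

```shell
# Same kind of check the CLI performs before enabling colors and spinners
if [ -t 1 ]; then
  echo "stdout is a TTY: human-readable output"
else
  echo "stdout is piped: plain output"
fi
```

Running this snippet through a pipe (e.g. `sh snippet.sh | cat`) takes the second branch, which is why piped `brightdata` output stays machine-readable without extra flags.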

### Chain search → scrape

```shell
# Search Google, extract the first URL, then scrape it
brightdata search "best python frameworks 2025" --json \
  | jq -r '.organic[0].link' \
  | xargs brightdata scrape
```

### Scrape and read in terminal

```shell
# Pipe markdown output to a terminal reader
brightdata scrape https://docs.github.com | glow -

# Or use less
brightdata scrape https://docs.github.com | less
```

### Export to CSV for analysis

```shell
# Amazon product data to CSV
brightdata pipelines amazon_product "https://amazon.com/dp/B09V3KXJPB" --format csv > product.csv

# LinkedIn jobs to CSV
brightdata pipelines linkedin_job_listings "https://linkedin.com/jobs/view/123" --format csv -o jobs.csv
```
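Once exported, the CSV works with ordinary Unix tools. A sketch using a stand-in file with hypothetical columns (inspect the real header row first, since column names depend on the pipeline):

```shell
# Build a stand-in CSV with the rough shape of a product export
# (column names here are hypothetical)
printf 'title,final_price\nWidget,19.99\nGadget,5.49\n' > product-sample.csv

# Print just the price column, skipping the header row
cut -d, -f2 product-sample.csv | tail -n +2
```

`cut -d, -fN` is only safe when fields contain no embedded commas; for quoted CSV, reach for a real CSV parser instead.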

### Extract specific fields with jq

```shell
# Get just the titles and prices from Amazon search
brightdata pipelines amazon_product_search "laptop" "https://amazon.com" \
  | jq '[.[] | {title, price: .final_price}]'

# Get just organic result URLs from search
brightdata search "web scraping" --json | jq -r '.organic[].link'
```
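To see what these filters do without spending a request, you can run them against a hand-written sample. The shape below (a top-level `organic` array of objects with `title` and `link`) is an assumption inferred from the filters above, not a documented schema:

```shell
# Hypothetical sample with the shape the search filters assume
sample='{"organic":[{"title":"Result A","link":"https://example.com/a"},{"title":"Result B","link":"https://example.com/b"}]}'

# Same filter as the search example, applied to the sample
echo "$sample" | jq -r '.organic[].link'
```

`jq -r` emits raw strings (no surrounding quotes), which is what makes the output safe to feed into `xargs`.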

### Async jobs for heavy workloads

```shell
# Submit an async scrape
JOB_ID=$(brightdata scrape https://heavy-page.com --async --json | jq -r '.snapshot_id')

# Do other work...

# Check back later
brightdata status $JOB_ID --wait --pretty
```
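If you would rather not block on `--wait`, the status check can sit in an ordinary bounded polling loop. A sketch with a stub in place of the real status call (the stub, its `ready` value, and the retry limits are all illustrative, not part of the CLI):

```shell
# Stub standing in for a real status check such as
# `brightdata status "$JOB_ID" --json` piped through jq
check_job() { echo "ready"; }

attempts=0
until [ "$(check_job)" = "ready" ] || [ "$attempts" -ge 10 ]; do
  attempts=$((attempts + 1))
  sleep 5
done
echo "polled $attempts extra times"
```

The `-ge 10` guard keeps a stuck job from looping forever; tune the cap and the `sleep` interval to the workload.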

## Account management

### Monitor costs

```shell
# Quick balance check
brightdata budget

# Detailed balance with pending charges
brightdata budget balance

# Cost breakdown by zone
brightdata budget zones

# Specific zone in a date range
brightdata budget zone cli_unlocker --from 2024-01-01T00:00:00 --to 2024-02-01T00:00:00
```

### Manage configuration

```shell
# View current config
brightdata config

# Set default output to JSON
brightdata config set default_format json

# Use a custom zone for scraping
brightdata config set default_zone_unlocker my_custom_zone

# Override SERP zone
brightdata config set default_zone_serp my_serp_zone
```

## AI agent integration

### Install skills into coding agents

```shell
# Interactive picker - choose skills and target agents
brightdata skill add

# Install the scraping skill into your agent
brightdata skill add scrape

# Install search capabilities
brightdata skill add search

# See all available skills
brightdata skill list
```

Skills are pre-packaged bundles of prompts and configuration that teach AI coding agents how to use Bright Data effectively. See Skills for more details.

## Environment variables

Override any stored configuration with environment variables:

| Variable | Purpose |
| --- | --- |
| `BRIGHTDATA_API_KEY` | API key (skips login entirely) |
| `BRIGHTDATA_UNLOCKER_ZONE` | Default Web Unlocker zone |
| `BRIGHTDATA_SERP_ZONE` | Default SERP zone |
| `BRIGHTDATA_POLLING_TIMEOUT` | Polling timeout in seconds |
```shell
# Use in CI/CD without login
BRIGHTDATA_API_KEY=your_key brightdata scrape https://example.com

# Override timeout for large pipeline jobs
BRIGHTDATA_POLLING_TIMEOUT=1200 brightdata pipelines amazon_product "https://amazon.com/dp/..."
```
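The override behavior amounts to "environment variable, when set, wins over the stored value". A sketch of that precedence with plain parameter expansion (the `600` and the variable holding the stored value are illustrative, not real CLI internals):

```shell
# Stand-in for a value read from `brightdata config`
stored_timeout=600

# Environment variable takes precedence when set; otherwise fall back
effective="${BRIGHTDATA_POLLING_TIMEOUT:-$stored_timeout}"
echo "effective timeout: ${effective}s"
```

`${VAR:-fallback}` also treats an empty-but-set variable as unset, which is usually the behavior you want for configuration overrides.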