Every scraper you build in Bright Data Scraper Studio is made of two parts: interaction code that navigates the target site and parser code that extracts structured data from the resulting HTML. This page walks through the core concepts so you can read, write, and debug scrapers with confidence.

Prerequisites

  • Basic JavaScript familiarity (variables, functions, async control flow)
  • An active Bright Data account

What are the two phases of a scraper?

A Bright Data Scraper Studio scraper runs in two phases per page:
  1. Interaction moves through the site to reach the data. That means sending GET or POST requests, following links, handling pagination, submitting forms, and, on a Browser worker, clicking, typing, scrolling, and waiting for elements to appear.
  2. Parsing reads the page HTML (or a captured JSON payload) and returns a structured record.
Call parse() from interaction code once the target page has loaded. That runs the parser code and returns its result. Then call collect() to append the record to the final dataset:
let data = parse();
collect({
  url: new URL(location.href),
  title: data.title,
  links: data.links,
});
The parser code itself uses Cheerio, a jQuery-like API, to extract fields:
return {
  title: $('h1').text().trim(),
  // select only anchors with an href, and pass the page URL as the base
  // so relative hrefs resolve instead of throwing
  links: $('a[href]').toArray().map(e => new URL($(e).attr('href'), location.href)),
};

How do I structure a multi-stage scraper?

Many scrapes need more than one hop, for example “visit a search page, then follow each result URL, then extract each product”. Bright Data Scraper Studio handles this with stages. Each stage is a separate browser session, and next_stage({...}) queues a new input for the next stage. The example below scrapes an ecommerce search across all result pages, following each listing to its detail page. Stage 1, fan out search result pages:
let search_url = `https://example.com/search?q=${input.keyword}`;
navigate(search_url);
let max_page = parse().max_page;
for (let i = 1; i <= max_page; i++) {
  let search_page = new URL(search_url);
  if (i > 1)
    search_page.searchParams.set('page', i);
  next_stage({search_page});
}
Stage 2, fan out listing URLs from each result page:
navigate(input.search_page);
let listings = parse().listings;
for (let listing_url of listings)
  next_stage({listing_url});
Stage 3, collect the final product record:
navigate(input.listing_url);
collect(parse());
The flow:
  1. Stage 1 navigates to the search page and parses out the total number of pages.
  2. Stage 1 calls next_stage({search_page}) once per result page. Each call becomes a new stage-2 input.
  3. Stage 2 navigates to each result page and parses out all listing URLs.
  4. Stage 2 calls next_stage({listing_url}) once per listing. Each call becomes a new stage-3 input.
  5. Stage 3 navigates to each product page and calls collect(parse()) to add the record to the dataset.
Bright Data Scraper Studio parallelizes stages across workers automatically, so fanning out with next_stage() is much faster than walking pagination serially inside one stage.
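The fan-out above can be modeled in plain JavaScript by treating each next_stage() call as a push onto the next stage's input queue. This is only a sketch of the data flow, not the platform's implementation; the site structure, URLs, and page count below are made up for illustration:

```javascript
// Fake site: two search result pages, each listing two product hrefs.
const pages = {
  'https://example.com/search?q=mug': ['/item/1', '/item/2'],
  'https://example.com/search?q=mug&page=2': ['/item/3', '/item/4'],
};

const stage2Inputs = [];
const stage3Inputs = [];
const dataset = [];

// Stage 1: queue one stage-2 input per result page.
const searchUrl = 'https://example.com/search?q=mug';
const maxPage = 2; // would come from parse() in a real scraper
for (let i = 1; i <= maxPage; i++) {
  const pageUrl = new URL(searchUrl);
  if (i > 1) pageUrl.searchParams.set('page', i);
  stage2Inputs.push({ search_page: pageUrl.href });
}

// Stage 2: queue one stage-3 input per listing on each result page.
for (const { search_page } of stage2Inputs)
  for (const href of pages[search_page])
    stage3Inputs.push({ listing_url: new URL(href, search_page).href });

// Stage 3: collect one record per listing.
for (const { listing_url } of stage3Inputs)
  dataset.push({ url: listing_url });

console.log(dataset.length); // 4 records
```

Because each queued input is independent, the platform can hand stage-2 and stage-3 inputs to separate workers concurrently, which is where the speedup over serial pagination comes from.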

Which worker type should I use?

Bright Data Scraper Studio offers two worker types:
  • Browser worker: a real headless browser. Needed when the page renders data with JavaScript, or when you need to click, scroll, type, or capture network traffic.
  • Code worker: raw HTTP requests. Faster and cheaper, but cannot run JavaScript or interact with the page.
Start with a Code worker. Switch to a Browser worker only if the data you need is not in the raw HTTP response. You can change worker type on the same scraper at any time, but browser-only functions (wait, click, scroll_*, tag_*, type, and more) will throw errors if you run them on a Code worker. See Worker types for the full list.

How does Scraper Studio handle blocking and CAPTCHAs?

Scraping at scale runs into the same defenses every time: IP blocks, rate limits, CAPTCHAs, fingerprinting, and bot detection. Bright Data Scraper Studio runs every request through Bright Data’s proxy infrastructure and Web Unlocker API, so the scraper:
  • Rotates through residential, ISP, datacenter, or mobile IPs based on scraper settings
  • Retries blocked requests with a fresh peer session automatically
  • Solves common CAPTCHAs when you call solve_captcha()
  • Mimics real browser fingerprints on Browser worker
You do not manage proxies, sessions, or retries yourself. Focus the scraper code on extracting the data you need and let the platform handle access.

Worker types

Choose between Browser worker and Code worker

Scraper Studio functions

Full reference for interaction and parser commands

Develop a scraper

Step-by-step walkthrough of building a scraper in the IDE

Best practices

Recommended patterns for fast, reliable scrapers