# Web scraping basics in Scraper Studio

> Learn the core web scraping concepts used in Bright Data Scraper Studio: navigation, parsing, worker types, stages, and handling scale-related blocking.

Every scraper you build in Bright Data Scraper Studio is made of two parts: **interaction code** that navigates the target site and **parser code** that extracts structured data from the resulting HTML. This page walks through the core concepts so you can read, write, and debug scrapers with confidence.

## Prerequisites

* Basic JavaScript familiarity (variables, functions, async control flow)
* An active [Bright Data account](https://brightdata.com/)

## What are the two phases of a scraper?

A Bright Data Scraper Studio scraper runs in two phases per page:

1. **Interaction** moves through the site to reach the data. That means sending GET or POST requests, following links, handling pagination, submitting forms, and, on a Browser worker, clicking, typing, scrolling, and waiting for elements to appear.
2. **Parsing** reads the page HTML (or a captured JSON payload) and returns a structured record.

Call `parse()` from interaction code once the target page has loaded. That runs the parser code and returns its result. Then call `collect()` to append the record to the final dataset:

```js
// Run the parser code against the loaded page
let data = parse();
// Append one record to the final dataset
collect({
  url: new URL(location.href),
  title: data.title,
  links: data.links,
});
```

The parser code itself uses Cheerio, a jQuery-like API, to extract fields:

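```js
return {
  // Text of the h1 heading, without surrounding whitespace
  title: $('h1').text().trim(),
  // Skip anchors with no href; note that new URL() still throws on relative values
  links: $('a[href]').toArray().map(e => new URL($(e).attr('href'))),
};
```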
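The parser above calls `new URL()` on each raw `href`, which throws when a value is relative or malformed. The sketch below shows one way to resolve hrefs against a base URL and skip anything that cannot be parsed. `toAbsoluteUrls` is a hypothetical helper, not a Scraper Studio function, and the base URL is passed in explicitly because how parser code obtains the page URL is not covered on this page.

```javascript
// Hypothetical helper, not part of the Scraper Studio API.
// Resolves raw href values against a base URL and skips unparseable ones.
function toAbsoluteUrls(hrefs, baseUrl) {
  const urls = [];
  for (const href of hrefs) {
    if (!href) continue; // anchors without an href attribute yield undefined
    try {
      // The second argument lets relative paths resolve against the page URL
      urls.push(new URL(href, baseUrl).href);
    } catch (err) {
      // Malformed value: skip it rather than fail the whole record
    }
  }
  return urls;
}

// Sample inputs are made up for illustration
const sample = toAbsoluteUrls(
  ['/products/1', 'https://example.com/about', undefined, 'http://[bad'],
  'https://example.com/search?q=shoes'
);
console.log(sample);
```

Skipping bad values keeps one broken link from discarding an otherwise valid record. If you would rather notice them, collect the rejected values in a separate field.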

## How do I structure a multi-stage scraper?

Many scrapes need more than one hop, for example "visit a search page, then follow each result URL, then extract each product". Bright Data Scraper Studio handles this with stages. Each stage is a separate browser session, and `next_stage({...})` queues a new input for the next stage.

The example below scrapes an ecommerce search across all result pages, following each listing to its detail page.

**Stage 1, fan out search result pages:**

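```js
// Encode the keyword so spaces and reserved characters do not break the query string
let search_url = `https://example.com/search?q=${encodeURIComponent(input.keyword)}`;
navigate(search_url);
// The parser code is expected to return the total number of result pages
let max_page = parse().max_page;
for (let i = 1; i <= max_page; i++) {
  let search_page = new URL(search_url);
  // Page 1 is the bare search URL; later pages add a page parameter
  if (i > 1)
    search_page.searchParams.set('page', i);
  // Queue one stage-2 input per result page
  next_stage({search_page});
}
```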
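To preview which URLs a loop like the one above would queue, you can run the same URL logic outside Scraper Studio. This is a minimal sketch in plain Node.js: `buildSearchPages` is a hypothetical helper, and the keyword and page count are made-up inputs. It encodes the keyword with `encodeURIComponent` so that spaces and `&` stay inside the `q` parameter.

```javascript
// Hypothetical helper for local experimentation; not a Scraper Studio function.
function buildSearchPages(keyword, maxPage) {
  // Encode the keyword so spaces and '&' do not break the query string
  const firstPage = `https://example.com/search?q=${encodeURIComponent(keyword)}`;
  const pages = [];
  for (let i = 1; i <= maxPage; i++) {
    const page = new URL(firstPage);
    // Page 1 is the bare search URL; later pages carry a page parameter
    if (i > 1)
      page.searchParams.set('page', String(i));
    pages.push(page.href);
  }
  return pages;
}

const pages = buildSearchPages('shoes & socks', 3);
console.log(pages);
```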

**Stage 2, fan out listing URLs from each result page:**

```js
navigate(input.search_page);
// The parser code is expected to return an array of listing URLs
let listings = parse().listings;
// Queue one stage-3 input per listing
for (let listing_url of listings)
  next_stage({listing_url});
```

**Stage 3, collect the final product record:**

```js
navigate(input.listing_url);
// The parser output becomes the final record
collect(parse());
```

The flow:

1. Stage 1 navigates to the search page and parses out the total number of pages.
2. Stage 1 calls `next_stage({search_page})` once per result page. Each call becomes a new stage-2 input.
3. Stage 2 navigates to each result page and parses out all listing URLs.
4. Stage 2 calls `next_stage({listing_url})` once per listing. Each call becomes a new stage-3 input.
5. Stage 3 navigates to each product page and calls `collect(parse())` to add the record to the dataset.
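
The fan-out in the steps above can be modeled locally to see how one input grows into many records. This is a simulation only: `runStages`, `fakeSite`, and the stage functions are hypothetical stand-ins written for this sketch, and the `next_stage` and `collect` callbacks here merely imitate the platform functions of the same name. It also runs inputs one after another, whereas the platform distributes them across workers.

```javascript
// Local simulation only; nothing here is a Scraper Studio API.
const fakeSite = {
  maxPage: 2,
  listingsByPage: { 1: ['/item/a', '/item/b'], 2: ['/item/c'] },
};

// Each stage receives one input plus the callbacks it may call.
const stages = [
  // Stage 1: one keyword input -> one output per result page
  (input, { next_stage }) => {
    for (let page = 1; page <= fakeSite.maxPage; page++)
      next_stage({ page });
  },
  // Stage 2: one result page -> one output per listing
  (input, { next_stage }) => {
    for (const listing_url of fakeSite.listingsByPage[input.page])
      next_stage({ listing_url });
  },
  // Stage 3: one listing -> one collected record
  (input, { collect }) => {
    collect({ url: input.listing_url });
  },
];

function runStages(stageFns, initialInput) {
  const dataset = [];
  let inputs = [initialInput];
  for (const stage of stageFns) {
    const queued = [];
    const api = {
      next_stage: (nextInput) => queued.push(nextInput), // becomes an input of the following stage
      collect: (record) => dataset.push(record),         // lands in the final dataset
    };
    for (const input of inputs) stage(input, api);
    inputs = queued;
  }
  return dataset;
}

const dataset = runStages(stages, { keyword: 'shoes' });
console.log(dataset);
```

One stage-1 input produces two stage-2 inputs, three stage-3 inputs, and three records. The number of inputs at each stage is what the platform can spread across workers.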

Bright Data Scraper Studio parallelizes stages across workers automatically, so inputs queued with `next_stage()` can be processed concurrently. Fanning out this way is much faster than walking pagination serially inside one stage.

## Which worker type should I use?

Bright Data Scraper Studio offers two worker types:

* **Browser worker**: a real headless browser. Needed when the page renders data with JavaScript, or when you need to click, scroll, type, or capture network traffic.
* **Code worker**: raw HTTP requests. Faster and cheaper, but cannot run JavaScript or interact with the page.

Start with Code worker. Switch to Browser worker only if the data you need is not in the raw HTTP response. You can change worker type on the same scraper at any time, but browser-only functions (`wait`, `click`, `scroll_*`, `tag_*`, `type`, and more) will throw errors if you run them on a Code worker. See [Worker types](/datasets/scraper-studio/worker-types) for the full list.
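
One rough way to apply this rule is to take the raw HTTP response body for a sample page and check whether the values you can see in a browser are present in it. The sketch below assumes you already have that body as a string. `missingFromRawHtml` is a hypothetical helper and the HTML is made-up sample data, not output from a real site.

```javascript
// Hypothetical helper: reports which expected values are absent from raw HTML.
function missingFromRawHtml(rawHtml, expectedValues) {
  return expectedValues.filter((value) => !rawHtml.includes(value));
}

// Stand-in for the body of a plain HTTP response (no JavaScript executed).
// The price container is empty because a script would fill it in later.
const rawHtml =
  '<h1>Trail Shoe</h1><div id="price"></div><script src="/app.js"></script>';

const missing = missingFromRawHtml(rawHtml, ['Trail Shoe', '$89.99']);
console.log(
  missing.length === 0
    ? 'All values found: a Code worker is likely enough'
    : `Not in raw HTML: ${missing.join(', ')}`
);
```

A plain substring check can report false negatives when the page escapes characters as HTML entities or embeds the data in a JSON blob, so treat a miss as a prompt to inspect the response rather than as proof that you need a Browser worker.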

## How does Scraper Studio handle blocking and CAPTCHAs?

Scraping at scale runs into the same defenses every time: IP blocks, rate limits, CAPTCHAs, fingerprinting, and bot detection. Bright Data Scraper Studio runs every request through Bright Data's [proxy infrastructure](/proxy-networks/introduction) and [Web Unlocker API](/scraping-automation/web-unlocker), so the scraper:

* Rotates through residential, ISP, datacenter, or mobile IPs based on scraper settings
* Retries blocked requests with a fresh peer session automatically
* Solves common CAPTCHAs when you call `solve_captcha()`
* Mimics real browser fingerprints on Browser worker

You do not manage proxies, sessions, or retries yourself. Focus the scraper code on extracting the data you need and let the platform handle access.

## Related

<CardGroup cols={2}>
  <Card title="Worker types" icon="server" href="/datasets/scraper-studio/worker-types">
    Choose between Browser worker and Code worker
  </Card>

  <Card title="Scraper Studio functions" icon="code" href="/datasets/scraper-studio/functions">
    Full reference for interaction and parser commands
  </Card>

  <Card title="Develop a scraper" icon="wrench" href="/datasets/scraper-studio/develop-a-scraper">
    Step-by-step walkthrough of building a scraper in the IDE
  </Card>

  <Card title="Best practices" icon="list-check" href="/datasets/scraper-studio/best-practices">
    Recommended patterns for fast, reliable scrapers
  </Card>
</CardGroup>
