Every scraper you build in Bright Data Scraper Studio is made of two parts: interaction code that navigates the target site and parser code that extracts structured data from the resulting HTML. This page walks through the core concepts so you can read, write, and debug scrapers with confidence.

Prerequisites

  • Basic JavaScript familiarity (variables, functions, async control flow)
  • An active Bright Data account

What are the two phases of a scraper?

A Bright Data Scraper Studio scraper runs in two phases per page:
  1. Interaction moves through the site to reach the data. That means sending GET or POST requests, following links, handling pagination, submitting forms, and, on a Browser worker, clicking, typing, scrolling, and waiting for elements to appear.
  2. Parsing reads the page HTML (or a captured JSON payload) and returns a structured record.
Call parse() from interaction code once the target page has loaded. That runs the parser code and returns its result. Then call collect() to append the record to the final dataset:
let data = parse();
collect({
  url: new URL(location.href),
  title: data.title,
  links: data.links,
});
The parser code itself uses Cheerio, a jQuery-like API, to extract fields:
return {
  title: $('h1').text().trim(),
  // select only anchors with an href, and pass the page URL as the base
  // so relative hrefs resolve instead of throwing
  links: $('a[href]').toArray().map(e => new URL($(e).attr('href'), location.href)),
};

How do I structure a multi-stage scraper?

Many scrapes need more than one hop, for example “visit a search page, then follow each result URL, then extract each product”. Bright Data Scraper Studio handles this with stages. Each stage is a separate browser session, and next_stage({...}) queues a new input for the next stage. The example below scrapes an ecommerce search across all result pages, following each listing to its detail page. Stage 1, fan out search result pages:
let search_url = `https://example.com/search?q=${input.keyword}`;
navigate(search_url);
let max_page = parse().max_page;
for (let i = 1; i <= max_page; i++) {
  let search_page = new URL(search_url);
  if (i > 1)
    search_page.searchParams.set('page', i);
  next_stage({search_page});
}
Stage 2, fan out listing URLs from each result page:
navigate(input.search_page);
let listings = parse().listings;
for (let listing_url of listings)
  next_stage({listing_url});
Stage 3, collect the final product record:
navigate(input.listing_url);
collect(parse());
The flow:
  1. Stage 1 navigates to the search page and parses out the total number of pages.
  2. Stage 1 calls next_stage({search_page}) once per result page. Each call becomes a new stage-2 input.
  3. Stage 2 navigates to each result page and parses out all listing URLs.
  4. Stage 2 calls next_stage({listing_url}) once per listing. Each call becomes a new stage-3 input.
  5. Stage 3 navigates to each product page and calls collect(parse()) to add the record to the dataset.
Bright Data Scraper Studio parallelizes stages across workers automatically, so fanning out with next_stage() is much faster than walking pagination serially inside one stage.
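The fan-out above can be modeled in plain JavaScript by treating each next_stage() call as a push onto the next stage's input queue. This is only a sketch of the data flow, not the platform's implementation; the site structure, URLs, and page count below are made up for illustration:

```javascript
// Fake site: two search result pages, each listing two product hrefs.
const pages = {
  'https://example.com/search?q=mug': ['/item/1', '/item/2'],
  'https://example.com/search?q=mug&page=2': ['/item/3', '/item/4'],
};

const stage2Inputs = [];
const stage3Inputs = [];
const dataset = [];

// Stage 1: queue one stage-2 input per result page.
const searchUrl = 'https://example.com/search?q=mug';
const maxPage = 2; // would come from parse() in a real scraper
for (let i = 1; i <= maxPage; i++) {
  const pageUrl = new URL(searchUrl);
  if (i > 1) pageUrl.searchParams.set('page', i);
  stage2Inputs.push({ search_page: pageUrl.href });
}

// Stage 2: queue one stage-3 input per listing on each result page.
for (const { search_page } of stage2Inputs)
  for (const href of pages[search_page])
    stage3Inputs.push({ listing_url: new URL(href, search_page).href });

// Stage 3: collect one record per listing.
for (const { listing_url } of stage3Inputs)
  dataset.push({ url: listing_url });

console.log(dataset.length); // 4 records
```

Because each queued input is independent, the platform can hand stage-2 and stage-3 inputs to separate workers concurrently, which is where the speedup over serial pagination comes from.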

Which worker type should I use?

Bright Data Scraper Studio offers two worker types:
  • Browser worker: a real headless browser. Needed when the page renders data with JavaScript, or when you need to click, scroll, type, or capture network traffic.
  • Code worker: raw HTTP requests. Faster and cheaper, but cannot run JavaScript or interact with the page.
Start with a Code worker. Switch to a Browser worker only if the data you need is not in the raw HTTP response. You can change worker type on the same scraper at any time, but browser-only functions (wait, click, scroll_*, tag_*, type, and more) will throw errors if you run them on a Code worker. See Worker types for the full list.

How does Scraper Studio handle blocking and CAPTCHAs?

Scraping at scale runs into the same defenses every time: IP blocks, rate limits, CAPTCHAs, fingerprinting, and bot detection. Bright Data Scraper Studio runs every request through Bright Data’s proxy infrastructure and Web Unlocker API, so the scraper:
  • Rotates through residential, ISP, datacenter, or mobile IPs based on scraper settings
  • Retries blocked requests with a fresh peer session automatically
  • Solves common CAPTCHAs when you call solve_captcha()
  • Mimics real browser fingerprints on Browser worker
You do not manage proxies, sessions, or retries yourself. Focus the scraper code on extracting the data you need and let the platform handle access.

Worker types

Choose between Browser worker and Code worker

Scraper Studio functions

Full reference for interaction and parser commands

Develop a scraper

Step-by-step walkthrough of building a scraper in the IDE

Best practices

Recommended patterns for fast, reliable scrapers