This reference documents every function available in Bright Data Scraper Studio’s IDE: the interaction code that controls a browser session, and the parser code that turns HTML into structured records. Each function lists its parameters, return value, and a runnable example.
Functions marked with ⭐ are proprietary to Bright Data Scraper Studio and are not part of any upstream library. Functions listed under Browser-only functions throw an error when called from a Code worker.

How is Scraper Studio code organized?

A Bright Data Scraper Studio scraper uses two code types:
| Code type | Role | Language and libraries |
| --- | --- | --- |
| Interaction code | Navigates the target site: URL requests, clicks, scrolls, waits, and background traffic capture | JavaScript + Bright Data browser commands |
| Parser code | Extracts and structures data from the HTML returned by interaction code | JavaScript + Cheerio ($) |
You move data from one to the other with parse() (which runs the parser) and collect() (which appends a record to the final dataset).
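The hand-off can be pictured with a small stand-alone simulation. This is a sketch, not the Scraper Studio runtime: make_runtime, parser_fn, and the dataset array are hypothetical stand-ins so the flow runs in plain Node.js. In the IDE, parse() and collect() are provided for you.

```javascript
// Hypothetical stand-in for the IDE runtime: parse() runs the parser code,
// collect() appends one record to the final dataset.
function make_runtime(parser_fn, html) {
  const dataset = [];
  return {
    parse: () => parser_fn(html),
    collect: record => { dataset.push(record); },
    dataset,
  };
}

// "Parser code": turns HTML into a structured record.
// A real parser would use Cheerio ($); a regex keeps this sketch dependency-free.
function parser_fn(html) {
  const match = /<h1>(.*?)<\/h1>/.exec(html);
  return {title: match ? match[1] : null};
}

// "Interaction code": asks the parser for data, then collects it.
const rt = make_runtime(parser_fn, '<h1>Blue mug</h1>');
const page_data = rt.parse();
rt.collect({title: page_data.title});

console.log(rt.dataset); // [ { title: 'Blue mug' } ]
```

The point of the split is that interaction code decides when to parse and what to keep, while parser code only describes how HTML maps to fields.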

Interaction functions

Interaction functions run in the scraper’s main JavaScript context and drive the browser or HTTP client. Use them to navigate, wait for elements, interact with the page, capture network traffic, and hand off data to the parser.

Global objects

| Name | Type | Description |
| --- | --- | --- |
| input | object | Input for the current stage, set by the trigger or by a previous next_stage()/run_stage()/rerun_stage() call. |
| job | object | Metadata about the current job (for example job.created, the job start timestamp). |
| location | object | Info about the current browser location. Field: href. |
| parser | object | Values captured by tag_response, tag_script, and related tagging functions, available after wait_for_parser_value(). |
navigate(input.url);
let {created} = job;
console.log('current url', location.href);

navigate() navigates the browser to a URL. A 404 status throws a dead_page error by default; pass allow_status to accept it instead.
Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| url | string or URL | Yes | | Target URL |
| opt.wait_until | string | No | load | load, domcontentloaded, networkidle0, or networkidle2 |
| opt.timeout | number | No | 30000 | Navigation timeout in milliseconds |
| opt.referer | string | No | | Referer header to send |
| opt.allow_status | number[] | No | [] | HTTP status codes to accept without throwing |
| opt.fingerprint | object | No | | Override browser fingerprint (screen.width, screen.height) |
navigate(input.url);
navigate('https://example.com');
navigate('https://example.com', {wait_until: 'domcontentloaded'});
navigate('https://example.com', {referer: 'https://google.com'});
navigate('https://example.com', {timeout: 45000});
navigate('https://example.com', {allow_status: [404]});
navigate('https://example.com', {
  fingerprint: {screen: {width: 400, height: 400}},
});

request — Make a direct HTTP request

Sends an HTTP request without using a browser. Use on Code worker, or on Browser worker when you want to bypass the browser.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url or options | string or object | Yes | URL string, or an object with url, method, headers, body |
let res = request('https://www.example.com');
let post_res = request({
  url: 'https://www.example.com',
  method: 'POST',
  headers: {'Content-type': 'application/json'},
  body: {hello: 'world'},
});

next_stage — Queue input for the next stage

Runs the next stage of the scraper in a new browser session with the given input.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| input | object | Yes | Input object passed to the next stage |
next_stage({url: 'https://example.com', page: 1});

run_stage — Run a specific stage

Runs a specific stage of the scraper, identified by its index, in a new browser session.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| stage | number | Yes | Stage index (starts at 1) |
| input | object | Yes | Input object passed to that stage |
run_stage(2, {url: 'https://example.com', page: 1});

rerun_stage — Re-run the current stage with new input

Runs this stage again with a new input. Use it to fan out work (for example, one re-run per page of a paginated listing).
rerun_stage({url: 'https://example.com/other-page'});
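As a rough illustration of the fan-out idea, the helper below builds one stage input per page. It is a sketch under assumptions: page_inputs is a hypothetical helper, and the page query parameter is a guess at how a target site might paginate.

```javascript
// Build one stage input per page. The `page` query parameter is an assumption;
// adapt it to how the target site actually paginates.
function page_inputs(base_url, total_pages) {
  const inputs = [];
  for (let page = 1; page <= total_pages; page++) {
    const u = new URL(base_url);
    u.searchParams.set('page', String(page));
    inputs.push({url: u.href, page});
  }
  return inputs;
}

const inputs = page_inputs('https://example.com/search?q=mug', 3);
console.log(inputs.map(i => i.url));

// In interaction code you would then fan out, for example:
//   for (const next of inputs) rerun_stage(next);
```

When fanning out from inside the stage itself, guard the loop (for example, only when input.page is not set) so that each re-run does not trigger another full fan-out.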

load_sitemap — Read URLs from an XML sitemap

Loads a sitemap XML file and returns the URL list. Supports sitemap indexes and gzip-compressed sitemaps.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| options.url | string | Yes | Sitemap URL |
let {pages} = load_sitemap({url: 'https://example.com/sitemap.xml.gz'});
let {children} = load_sitemap({url: 'https://example.com/sitemap-index.xml'});

resolve_url — Follow a URL through redirects

Returns the final URL that the given URL argument leads to.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string or URL | Yes | URL to resolve |
let {href} = parse().anchor_elem_data;
collect({final_url: resolve_url(href)});

redirect_history — Get the redirect chain

Returns the history of URL redirects since the last navigate() call.
navigate('http://google.com');
let redirects = redirect_history();
// ['http://google.com', 'http://www.google.com', 'https://www.google.com/']
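One possible use of the returned chain is to summarize it before collecting. The sketch below assumes the history is an array of absolute URL strings, as in the example above; summarize_redirects is a hypothetical helper, not part of Scraper Studio.

```javascript
// Summarize a redirect chain given as an array of absolute URL strings.
function summarize_redirects(history) {
  if (!history.length)
    return null;
  const first = new URL(history[0]);
  const last = new URL(history[history.length - 1]);
  return {
    hops: history.length - 1,
    final_url: last.href,
    upgraded_to_https: first.protocol === 'http:' && last.protocol === 'https:',
    changed_host: first.hostname !== last.hostname,
  };
}

const summary = summarize_redirects([
  'http://google.com',
  'http://www.google.com',
  'https://www.google.com/',
]);
console.log(summary);

// In interaction code, for example:
//   collect({redirects: summarize_redirects(redirect_history())});
```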

response_headers — Read the last response headers

Returns the response headers from the last page load.
let headers = response_headers();
console.log('content-type', headers['content-type']);

status_code — Read the last response status

Returns the HTTP status code of the last page load.
collect({status_code: status_code()});
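The status code can also drive failure handling. The mapping below is only an illustrative assumption (which codes count as blocked or dead depends on the target site), and classify_status is a hypothetical helper.

```javascript
// Map an HTTP status code to a coarse outcome.
// The specific code-to-outcome mapping is an assumption for illustration.
function classify_status(code) {
  if (code >= 200 && code < 300)
    return 'ok';
  if (code === 404 || code === 410)
    return 'dead_page';
  if (code === 403 || code === 429)
    return 'blocked';
  return 'other';
}

console.log(classify_status(200)); // ok
console.log(classify_status(429)); // blocked

// In interaction code, for example:
//   navigate(input.url, {allow_status: [403, 404, 410, 429]});
//   let kind = classify_status(status_code());
//   if (kind == 'blocked') blocked('HTTP ' + status_code());
//   if (kind == 'dead_page') dead_page('HTTP ' + status_code());
```

Because navigate() throws on a 404 by default, the codes you want to inspect yourself need to be listed in allow_status first.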

Waiting on the page ⭐

All wait functions are Browser worker only.

wait — Wait for an element to appear

Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| selector | string | Yes | | CSS selector to wait for |
| opt.timeout | number | No | 30000 | Timeout in milliseconds |
| opt.hidden | boolean | No | false | Wait for the element to be hidden instead of visible |
| opt.inside | string | No | | Selector of an iframe to look inside |
wait('#welcome-splash');
wait('.search-results .product');
wait('[href^="/product"]');
wait('#welcome-splash', {timeout: 5000});
wait('#welcome-splash', {hidden: true});
wait('#welcome-splash', {inside: '#iframe_id'});

wait_any — Wait for any of several conditions

Waits for any matching condition to succeed. Returns when the first selector resolves.
wait_any(['#title', '#notfound']);

wait_visible — Wait for an element to be visible

Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| selector | string | Yes | | CSS selector |
| opt.timeout | number | No | 30000 | Timeout in milliseconds |
wait_visible('#welcome-splash');
wait_visible('#welcome-splash', {timeout: 5000});

wait_hidden — Wait for an element to disappear

Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| selector | string | Yes | | CSS selector |
| opt.timeout | number | No | 30000 | Timeout in milliseconds |
wait_hidden('#welcome-splash');
wait_hidden('#welcome-splash', {timeout: 5000});

wait_for_text — Wait for text content

Waits for an element on the page to contain the given text.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| selector | string | Yes | CSS selector |
| text | string | Yes | Text to wait for |
wait_for_text('.location', 'New York');

wait_for_parser_value — Wait for a parser field to be populated

Use after tag_response() or tag_script() to wait until the captured data is available.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| field | string | Yes | Parser field path to wait on |
| validate_fn | function | No | Optional callback returning true when the value is valid |
| opt.timeout | number | No | Timeout in milliseconds |
wait_for_parser_value('profile');
wait_for_parser_value('listings.0.price', v => parseInt(v) > 0, {timeout: 5000});
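To make the field path and validate_fn more concrete, here is a stand-alone sketch. It assumes, based only on the 'listings.0.price' example, that a path is dot-separated with numeric array indexes; get_path, parser_like, and is_valid_price are hypothetical names, and the real resolution logic inside Scraper Studio may differ.

```javascript
// Resolve a dot-separated path such as 'listings.0.price' against an object.
// Returns undefined when any segment along the way is missing.
function get_path(obj, path) {
  return path.split('.').reduce(
    (node, key) => (node == null ? undefined : node[key]),
    obj,
  );
}

// Stand-in for the tagged data that the parser object would hold.
const parser_like = {listings: [{price: '129'}, {price: '0'}]};

// Same shape as the validate_fn in the example: true when the value is usable.
const is_valid_price = v => parseInt(v, 10) > 0;

console.log(get_path(parser_like, 'listings.0.price'));                 // 129
console.log(is_valid_price(get_path(parser_like, 'listings.1.price'))); // false
```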

wait_network_idle — Wait until the browser network settles

Waits until the browser network has been idle for a given period.
Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| opt.timeout | number | No | 500 | Milliseconds of idleness required |
| opt.ignore | array | No | [] | Patterns (string or RegExp) for requests to exclude |
wait_network_idle();
wait_network_idle({
  timeout: 1e3,
  ignore: [/long_request/, 'https://example.com'],
});

wait_page_idle — Wait until DOM mutations stop

Waits until no changes are made to the DOM tree for a given period.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| opt.idle_timeout | number | No | Milliseconds of stability required |
| opt.ignore | array | No | Selectors to exclude from mutation monitoring |
wait_page_idle();
wait_page_idle({
  ignore: ['.live-clock', '.carousel'],
  idle_timeout: 1000,
});

Element interaction ⭐

All interaction functions require Browser worker.

click — Click an element

Clicks an element, waiting for it to appear first.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| selector | string or array | Yes | CSS selector or Shadow DOM selector path |
| opt.coordinates | {x, y} | No | Click the closest match to given page coordinates |
click('#show-more');
$('#show-more').click();

// Click the map pin closest to the center of a map
let box = bounding_box('#map');
let center = {x: (box.left + box.right) / 2, y: (box.top + box.bottom) / 2};
click('.map-pin', {coordinates: center});

right_click — Right-click an element

Same as click but uses the right mouse button.
right_click('#item');

hover — Hover over an element

Moves the cursor over an element, waiting for it to appear first.
hover('#item');

mouse_to — Move the cursor to a coordinate

Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| x | number | Yes | Target X position |
| y | number | Yes | Target Y position |
mouse_to(0, 0);

type — Enter text into an input

Waits for the input to appear, then types the given text.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| selector | string | Yes | CSS selector |
| text | string or array | Yes | Text to type, or an array of strings and special keys |
| opt.replace | boolean | No | Clear existing text before typing |
type('#location', 'New York');
type('#location', 'New York', {replace: true});
type('[id$=input-box]', 'search term');
type('#search', ['Some text', 'Enter']);
type('#search', ['Backspace']);

press_key — Press a special key

Types special keys like Enter or Backspace in the currently focused input.
press_key('Enter');
press_key('Backspace');

select — Pick a value from a select element

Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| selector | string | Yes | CSS selector of a <select> element |
| value | string | Yes | Option value or visible text |
select('#country', 'Canada');

scroll_to — Scroll an element into view

Scrolls the page so a target element is visible. Defaults to natural scrolling; pass immediate: true to jump.
scroll_to('.author-profile');
scroll_to('top');
scroll_to('bottom');
scroll_to('top', {immediate: true});

scroll_to_all — Scroll through every matching element

scroll_to_all('.author-profiles');

load_more — Trigger lazy-loaded content

Scrolls to the bottom of a list to trigger infinite-scroll loading.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| selector | string | Yes | Container element holding the lazy-loaded items |
| opt.children | string | No | Selector for the individual items |
| opt.trigger_selector | string | No | Selector for an explicit “load more” button |
| opt.timeout | number | No | Timeout in milliseconds |
load_more('.search-results');
load_more('.search-results', {
  children: '.result-item',
  trigger_selector: '.btn-load-more',
  timeout: 10000,
});

close_popup — Auto-close popups in the background

Registers a background watcher that closes a popup whenever it appears. See Best practices for the recommended pattern.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| popup_selector | string | Yes | Selector for the popup container |
| close_selector | string | Yes | Selector for the element that closes it |
| opt.click_inside | string | No | Parent iframe selector, if the close button is inside an iframe |
close_popup('.popup', '.popup_close');
close_popup('iframe.with-popup', '.popup_close', {click_inside: 'iframe.with-popup'});

solve_captcha — Solve CAPTCHAs on the page

solve_captcha();
solve_captcha({type: 'simple', selector: '#image', input: '#input'});

bounding_box — Get an element’s page coordinates

Returns the page-relative bounding box of the first matched element.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| selector | string | Yes | CSS selector |
let box = bounding_box('.product-list');
// box == {top, right, bottom, left, x, y, width, height}

el_exists — Check if an element is on the page

Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| selector | string | Yes | | CSS selector |
| timeout | number | No | 0 | Wait up to N milliseconds for the element |
el_exists('#example');            // true
el_exists('.does_not_exist');     // false
el_exists('.does_not_exist', 5e3); // false after 5 seconds

el_is_visible — Check if an element is visible

Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| selector | string | Yes | | CSS selector |
| timeout | number | No | 0 | Wait up to N milliseconds for visibility |
el_is_visible('#example');
el_is_visible('.is_not_visible', 5e3);

track_event_listeners — Start tracking browser event listeners

Must be called before disable_event_listeners().
track_event_listeners();

disable_event_listeners — Disable event listeners

Stops all event listeners from running on the page.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| event_types | string[] | No | Specific event types to disable |
disable_event_listeners();
disable_event_listeners(['hover', 'click']);

freeze_page — Stop further page changes

Forces the page to stop changing, so HTML snapshots reflect exactly what the scraper saw. Experimental.
freeze_page();

Network and response tagging ⭐

Tagging captures background network traffic and exposes it to the parser. All tag_* functions are Browser worker only.

tag_response — Save one matching response

Saves the response data from a matching browser request.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| field | string | Yes | Name of the parser field to populate |
| pattern | RegExp or function | Yes | URL pattern or match function |
| opt.jsonp | boolean | No | Parse JSONP response bodies (auto-detected when possible) |
| opt.allow_error | boolean | No | Capture responses with non-2xx status codes |
tag_response('resp', /url/, {jsonp: true});
tag_response('resp', /url/, {allow_error: true});

tag_response('resp', (req, res) => {
  if (req.url.includes('/api/')) {
    return {
      request_body: req.body,
      request_headers: req.headers,
      response_body: res.body,
      response_headers: res.headers,
    };
  }
});

tag_response('teams', /\/api\/teams/);
navigate('https://example.com/sports');
let teams = parse().teams;
for (let team of teams)
  collect(team);

tag_all_responses — Save every matching response

Saves the response data from every matching request as an array.
tag_all_responses('profiles', /\/api\/profile/);
navigate('https://example.com/sports');
let profiles = parse().profiles;
for (let profile of profiles)
  collect(profile);

tag_script — Extract JSON embedded in a <script> tag

Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| field | string | Yes | Parser field name |
| selector | string | Yes | Script tag selector |
tag_script('ssr_state', '#__SSR_DATA__');
navigate('https://example.com/');
collect(parse().ssr_state);

tag_window_field — Tag a value on the browser window

Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| field | string | Yes | Parser field name |
| key | string | Yes | window property to read |
tag_window_field('initData', '__INIT_DATA__');

tag_image — Capture an image URL from a DOM element

tag_image('image', '#product-image');

tag_video — Capture a video URL from a DOM element

Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| field | string | Yes | Parser field name |
| selector | string | Yes | Element selector |
| opt.download | boolean | No | Download the video file |
tag_video('video', '#product-video', {download: true});

tag_screenshot — Save a screenshot of the page

Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| field | string | Yes | Parser field name |
| opt.filename | string | No | Output filename |
| opt.full_page | boolean | No | Defaults to true |
tag_screenshot('html_screenshot', {filename: 'screen'});
tag_screenshot('view', {full_page: false});

tag_download — Capture files downloaded by the browser

Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| url | string or RegExp | Yes | Pattern to match download requests |
let SEC = 1000;
let download = tag_download(/example.com\/foo\/bar/);
click('button#download');
let file1 = download.next_file({timeout: 10 * SEC});
let file2 = download.next_file({timeout: 20 * SEC});
collect({file1, file2});

tag_serp — Parse the page as a search engine result page

Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| field | string | Yes | Parser field name |
| type | string | Yes | Parser type: google, bing, etc. |
tag_serp('serp_bing_results', 'bing');
tag_serp('serp_google_results', 'google');

capture_graphql — Capture and replay GraphQL queries

Captures a GraphQL request so you can replay it with different variables.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| options.payload | object | Yes | Key-value pairs that match the target request payload |
| options.url | RegExp | No | URL pattern for the GraphQL endpoint (defaults to /graphql/) |
let q = capture_graphql({
  payload: {id: 'ProfileQuery'},
});
navigate('https://example.com');

let [first_query, first_response] = q.wait_captured();
collect(first_response.data.profile);

let second = q.replay({
  variables: {other_id: 2},
});
collect(second.data.profile);

Data collection

parse — Run the parser code

Runs the parser code and returns the structured result.
let page_data = parse();
collect({
  title: page_data.title,
  price: page_data.price,
});

collect — Append a record to the dataset

Adds one record to the scraper’s output.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| data_line | object | Yes | Fields to collect |
| validate_fn | function | No | Callback that throws on invalid data |
collect({price: data.price});
collect(product, p => {
  if (!p.title)
    throw new Error('Product is missing a title');
});
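A validation callback can be written as a plain named function and reused across calls. The sketch below is one possible shape: validate_product and its specific checks (a non-empty title, a positive numeric price) are assumptions for illustration, not rules that Scraper Studio enforces.

```javascript
// Throws when a record is not fit for output; returns nothing otherwise.
// The required fields here are illustrative assumptions.
function validate_product(p) {
  if (!p.title)
    throw new Error('Product is missing a title');
  if (typeof p.price !== 'number' || !(p.price > 0))
    throw new Error('Product has an invalid price');
}

// Helper for demonstration only: report whether a record passes validation.
function passes(record) {
  try {
    validate_product(record);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(passes({title: 'Blue mug', price: 12.5})); // true
console.log(passes({title: '', price: 12.5}));         // false

// In interaction code, for example:
//   collect(product, validate_product);
```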

set_lines — Set output lines, overriding previous calls

Each call to set_lines() overrides the previous one. Useful when the scraper collects partial data and you want the last known state delivered if a later step throws.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| lines | object[] | Yes | Array of records |
| validate_fn | function | No | Validation callback, run once per line |
set_lines(products_so_far);
set_lines(products_so_far, p => {
  if (!p.price)
    throw new Error('Missing price');
});

load_html — Load an HTML string into Cheerio

Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| html | string | Yes | HTML to parse |
let $$ = load_html('<p id="p1">p1</p><p id="p2">p2</p>');
collect({data: $$('#p2').text()});

Marking a crawl as a failure

bad_input — Mark the input as invalid

Prevents any retries and reports error_code=bad_input.
bad_input();
bad_input('Missing search term');

blocked — Mark the page as blocked

Reports that the site refused access. error_code=blocked.
blocked();
blocked('Login page was shown');

dead_page() flags the page so it can be filtered from future collections. Reports error_code=dead_page.
dead_page();
dead_page('Product was removed');

detect_block — Detect blocking conditions on the page

Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| resource.selector | string | Yes | Element to check |
| condition.exists | boolean | No | Fail if the element exists |
| condition.has_text | string or RegExp | No | Fail if the element contains matching text |
detect_block({selector: '.foo'}, {exists: true});
detect_block({selector: '.bar'}, {has_text: 'text'});
detect_block({selector: '.baz'}, {has_text: /regex_pattern/});

Session and routing

country — Route through a specific country

Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| code | string | Yes | Two-character ISO country code |
country('us');

proxy_location — Fine-grained proxy location

Prefer country() unless you need precise geographic control.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| configuration.country | string | No | Two-character ISO country code |
| configuration.lat | number | No | Latitude, range [-85, 85] |
| configuration.long | number | No | Longitude, range [-180, 180] |
| configuration.radius | number | No | Radius in kilometers |
proxy_location({country: 'us'});
proxy_location({lat: 37.7749, long: -122.4194});
proxy_location({lat: 37.7749, long: -122.4194, country: 'US', radius: 100});
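Because the coordinate ranges are easy to get wrong (a missing minus sign moves a point to the other side of the globe), it can help to check a configuration before passing it on. The sketch below uses the ranges from the parameter table; check_proxy_location is a hypothetical helper, and the requirement that radius be positive is an assumption.

```javascript
// Return a list of problems found in a proxy_location-style configuration.
// An empty list means the values are within the documented ranges.
function check_proxy_location({country, lat, long, radius} = {}) {
  const errors = [];
  if (country !== undefined && !/^[A-Za-z]{2}$/.test(country))
    errors.push('country must be a two-character code');
  if (lat !== undefined && !(lat >= -85 && lat <= 85))
    errors.push('lat must be within [-85, 85]');
  if (long !== undefined && !(long >= -180 && long <= 180))
    errors.push('long must be within [-180, 180]');
  // Assumption: a radius, when given, should be a positive number.
  if (radius !== undefined && !(radius > 0))
    errors.push('radius must be a positive number of kilometers');
  return errors;
}

// San Francisco: note the negative (west) longitude.
console.log(check_proxy_location({lat: 37.7749, long: -122.4194, country: 'US', radius: 100}));
console.log(check_proxy_location({lat: 95, long: 200}));
```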

preserve_proxy_session — Reuse the proxy session across child stages

preserve_proxy_session();
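
set_session_cookie() sets a cookie for the session.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| domain | string | Yes | Cookie domain |
| name | string | Yes | Cookie name |
| value | string | Yes | Cookie value |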
set_session_cookie('example.com', 'session_id', 'abc123');

set_session_headers — Set extra HTTP headers

Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| headers | object | Yes | Header key-value pairs |
set_session_headers({'X-Custom-Header': 'value'});

Browser configuration ⭐

Browser worker only.

browser_size — Get the current browser window size

Returns {width, height} in pixels.
let size = browser_size();
console.log(size.width, size.height);

emulate_device — Emulate a mobile device

Switches the user agent, screen resolution, and device pixel ratio to match a named device.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| device | string | Yes | Device name, e.g. iPhone X, Pixel 2 |
emulate_device('iPhone X');
emulate_device('Pixel 2');
Device names accepted by emulate_device() include:
  • Blackberry PlayBook / landscape
  • BlackBerry Z30 / landscape
  • Galaxy Note 3 / landscape
  • Galaxy Note II / landscape
  • Galaxy S III / S5 / S8 / S9+ (each with landscape)
  • Galaxy Tab S4 / landscape
  • iPad / iPad Mini / iPad Pro / iPad Pro 11 / iPad (gen 6) / iPad (gen 7) (each with landscape)
  • iPhone 4, 5, 6, 6 Plus, 7, 7 Plus, 8, 8 Plus, SE, X, XR, 11, 11 Pro, 11 Pro Max, 12 / 12 Mini / 12 Pro / 12 Pro Max, 13 / 13 Mini / 13 Pro / 13 Pro Max (each with landscape)
  • JioPhone 2 / landscape
  • Kindle Fire HDX / landscape
  • LG Optimus L70 / landscape
  • Microsoft Lumia 550, 950 (950 with landscape)
  • Nexus 4, 5, 5X, 6, 6P, 7, 10 (each with landscape)
  • Nokia Lumia 520 / landscape, Nokia N9 / landscape
  • Pixel 2, 2 XL, 3, 4, 4a (5G), 5 (each with landscape)
  • Moto G4 / landscape

font_exists — Check browser font support

Asserts that the browser can render the given font family.
font_exists('Liberation Mono');

html_capture_options — Configure HTML capture

Controls how the HTML snapshot is captured.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| options.coordinate_attributes | boolean | No | Embed element coordinates as attributes |
html_capture_options({
  coordinate_attributes: true,
});

embed_html_comment — Inject a comment into the page HTML

Embeds metadata inside HTML snapshots.
embed_html_comment('trace-id: asdf123');

Debugging and observability

console — Log from interaction code

console.log(1, 'brightdata', [1, 2], {key: 'value'});
console.error('something went wrong');

verify_requests — Monitor failed browser requests

Fires a callback on every failed browser request.
Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| callback | function | Yes | Called with {url, error, type, response} for each failed request |
verify_requests(({url, error, type, response}) => {
  if (response.status != 404 && type == 'Font')
    throw new Error('Font failed to load');
});

Value constructors

Bright Data Scraper Studio provides typed constructors for structured output fields.

Image, Video, Money

| Constructor | Arguments | Use |
| --- | --- | --- |
| Image(src) | src: image URL or data URI | Collect image data |
| Video(src) | src: video URL | Collect video data |
| Money(value, currency) | value: number, currency: ISO code | Collect monetary values |
let img = new Image('https://example.com/image.png');
let vid = new Video('https://example.com/video.mp4');
let price = new Money(10, 'USD');

collect({image: img, video: vid, product_price: price});
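Scraped prices usually arrive as display strings, so they need converting before they fit Money(value, currency). The sketch below shows one way to do that. parse_price and the SYMBOLS table are hypothetical, and mapping "$" to USD is an assumption, since several currencies share that symbol.

```javascript
// Symbol-to-ISO-code table. Treating '$' as USD is an assumption.
const SYMBOLS = {'$': 'USD', '€': 'EUR', '£': 'GBP'};

// Extract {value, currency} from text like 'Now $1,299.99'.
// Returns null when no supported price pattern is found.
function parse_price(text) {
  const match = /([$€£])\s*([\d,]+(?:\.\d+)?)/.exec(text);
  if (!match)
    return null;
  return {
    value: parseFloat(match[2].replace(/,/g, '')),
    currency: SYMBOLS[match[1]],
  };
}

console.log(parse_price('Now $1,299.99')); // { value: 1299.99, currency: 'USD' }
console.log(parse_price('Out of stock'));  // null

// In scraper code, for example:
//   let p = parse_price(raw_price_text);
//   if (p) collect({product_price: new Money(p.value, p.currency)});
```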

URL

Standard Node.js URL class.
let u = new URL('https://example.com');

Parser functions

Parser code runs after interaction code calls parse(). It receives the captured HTML and any tagged data, and returns a single record (or array of records) to the interaction code. Parser code uses Cheerio, a jQuery-compatible HTML parser.

Globals available in parser code

| Name | Type | Description |
| --- | --- | --- |
| $ | Cheerio instance | Loaded with the page HTML |
| input | object | Current stage input |
| location | object | Current browser location; field: href |
| parser | object | Values tagged during interaction (from tag_response, tag_script, and related) |
let url = input.url;
let current_url = location.href;
$('#example').text();

Cheerio helpers

Bright Data Scraper Studio adds custom Cheerio methods on top of the standard API.

$(selector).text_sane() — Normalize whitespace

Returns text() with all whitespace runs collapsed to a single space and trimmed.
let name = $('a').text_sane();                   // "foo bar baz"
let raw  = $('a').text();                        // "foo   bar\n\n\t baz"
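The described behavior can be approximated on a plain string. This is a rough equivalent for illustration only, not the actual implementation of the Cheerio helper; the stand-alone text_sane function below is hypothetical.

```javascript
// Approximate the described behavior: collapse every run of whitespace
// (spaces, tabs, newlines) into one space, then trim both ends.
function text_sane(raw) {
  return raw.replace(/\s+/g, ' ').trim();
}

console.log(text_sane('foo   bar\n\n\t baz')); // foo bar baz
console.log(text_sane('  padded  '));          // padded
```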

$(selector).filter_includes(text) — Filter elements by text content

Filters a selection to elements whose text includes the given substring. Chainable with the rest of the Cheerio API.
$('.selector').filter_includes('text').click();

Parser value constructors

Image, Video, and Money are also available in parser code and work the same way.
let img = new Image('https://example.com/image.png');
let price = new Money(10, 'USD');
collect({image: img, product_price: price});
For full Cheerio API documentation, see the Cheerio website.

Shadow DOM support

Interaction commands that accept a selector also accept an array of selectors, letting you reach into Shadow DOM trees. Use this with click, wait, type, and other interaction functions. When you pass an array:
  • One selector must target the shadow host element
  • Every selector after it is resolved inside that shadow root
click(['body', 'my-shadow-host', 'button.submit']);
In that example, my-shadow-host is the element with the shadow root attached, and button.submit is resolved inside that shadow root.

Browser-only functions

The following functions require Browser worker and throw not_supported_in_code_worker when called from a Code worker. Use this list to decide which worker your scraper needs.
| Category | Functions |
| --- | --- |
| Waits | wait, wait_any, wait_for_text, wait_visible, wait_hidden, wait_network_idle, wait_page_idle |
| Interaction | click, right_click, hover, mouse_to, type, press_key, select, scroll_to, scroll_to_all, load_more, close_popup, solve_captcha |
| Tagging | tag_response, tag_all_responses, tag_script, tag_window_field, tag_image, tag_video, tag_screenshot, tag_download, tag_serp, capture_graphql |
| Browser config | browser_size, emulate_device, font_exists, html_capture_options, freeze_page, track_event_listeners, disable_event_listeners |
See Worker types to choose between Browser worker and Code worker.
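One way to apply this list is a quick text scan of interaction code before choosing a worker. The sketch below is deliberately rough: browser_only_calls is a hypothetical helper, it only finds bare calls such as wait(...) (not method calls such as $('#x').click()), and it does not skip comments or string literals.

```javascript
// Function names taken from the Browser-only functions table.
const BROWSER_ONLY = [
  'wait', 'wait_any', 'wait_for_text', 'wait_visible', 'wait_hidden',
  'wait_network_idle', 'wait_page_idle',
  'click', 'right_click', 'hover', 'mouse_to', 'type', 'press_key', 'select',
  'scroll_to', 'scroll_to_all', 'load_more', 'close_popup', 'solve_captcha',
  'tag_response', 'tag_all_responses', 'tag_script', 'tag_window_field',
  'tag_image', 'tag_video', 'tag_screenshot', 'tag_download', 'tag_serp',
  'capture_graphql',
  'browser_size', 'emulate_device', 'font_exists', 'html_capture_options',
  'freeze_page', 'track_event_listeners', 'disable_event_listeners',
];

// Return the browser-only functions that appear as bare calls in `source`.
// A name matches only when it is not preceded by a word character, '$', or '.',
// and is followed by an opening parenthesis.
function browser_only_calls(source) {
  return BROWSER_ONLY.filter(name =>
    new RegExp('(^|[^\\w.$])' + name + '\\s*\\(').test(source));
}

const with_browser = "navigate(input.url); wait('#title'); collect(parse());";
const code_only = 'let res = request(input.url); collect({body: res});';

console.log(browser_only_calls(with_browser)); // [ 'wait' ]
console.log(browser_only_calls(code_only));    // []
```

An empty result suggests the code may be a candidate for Code worker, though a scan like this is only a hint and not a guarantee.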

Best practices

Recommended patterns for writing fast, reliable scrapers

Worker types

When to use Browser worker vs Code worker

Basics of web scraping

Core concepts: interaction, parsing, stages, and scale

Develop a scraper

Step-by-step walkthrough of building a scraper in the IDE