Functions
Interaction Functions
This article lists and explains the available commands within the Interaction code for writing a scraper using the IDE.
Commands marked with a star ⭐ are proprietary functions developed by Bright Data.
bad_input
Mark the scraper input as bad. This will prevent any crawl retries (error_code=bad_input)
bad_input();
bad_input('Missing search term');
blocked
Mark the page as failed because the website refused access (error_code=blocked)
blocked();
blocked('Login page was shown');
⭐ bounding_box
Returns the box of coordinates that describes the area of an element (relative to the page, not the browser viewport). Only the first matched element will be measured
selector
: A valid CSS selector for the element
let box = bounding_box('.product-list');
// box == {
// top: 10,
// right: 800,
// bottom: 210,
// left: 200,
// x: 200,
// y: 10,
// width: 600,
// height: 200,
// }
⭐ browser_size
Returns the current browser window size
TBD
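A minimal illustrative sketch, assuming browser_size() returns an object with width and height fields (the return shape is not documented here):
// Assumption: the returned object exposes width and height
let size = browser_size();
console.log('window size', size.width, size.height);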
⭐ capture_graphql
Capture and replay GraphQL requests with changed variables
- options: Parameters that control which GraphQL request to capture
  - url
  - payload
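No example is listed here; a hedged sketch, assuming options takes a url pattern and a payload matcher to identify the GraphQL request (both option shapes are assumptions):
// Assumption: url/payload identify which GraphQL request to capture for later replay
capture_graphql({
    url: '/api/graphql',
    payload: {operationName: 'SearchQuery'},
});
navigate('https://example.com/search');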
⭐ click
Click on an element (will wait for the element to appear before clicking on it)
selector
: Element selector
click('#show-more');
$('#show-more').click()
// Click the closest match to the passed coordinates
// (relative to the page).
// For example, clicking the center pin in a map
let box = bounding_box('#map')
let center = {x: (box.left+box.right)/2, y: (box.top+box.bottom)/2};
click('.map-pin', {coordinates: center});
⭐ close_popup
Popups can appear at any time during a crawl, and it’s not always clear when you should wait for or close them. Add close_popup() at the top of your code to add a background watcher that will close the popup when it appears. If a popup appears multiple times, it will always be closed
- popup selector: A valid CSS selector
- close selector: A valid CSS selector
- options:
click_inside
: Selector of the parent iframe that contains the close button selector
close_popup('.popup', '.popup_close');
close_popup('iframe.with-popup', '.popup_close', {click_inside: 'iframe.with-popup'});
collect
Adds a line of data to the dataset created by the crawler
syntax: collect(<data_line>[, <validate_fn>]);
data_line
: An object with the fields you want to collect
validate_fn
: Optional function to check that the line data is valid
collect({price: data.price});
collect(product, p=>{
if (!p.title)
throw new Error('Product is missing a title');
})
console
Log messages from the interaction code
console.log(1, 'brightdata', [1, 2], {key: 'value'});
console.error(1, 'brightdata', [1, 2], {key: 'value'});
country
Configure your crawl to run from a specific country
syntax: country(<code>);
code
: 2-character ISO country code
country('us');
dead_page
Mark a page as a dead link so you can filter it from your future collections (error_code=dead_page)
dead_page();
dead_page('Product was removed');
⭐ detect_block
Detects a block on the page
resource
: An object specifying the resource required for the detection (e.g. selector)
condition
: An object specifying how the resource should be processed for detection (e.g. exists, has_text)
detect_block({selector: '.foo'}, {exists: true});
detect_block({selector: '.bar'}, {has_text: 'text'});
detect_block({selector: '.baz'}, {has_text: /regex_pattern/});
⭐ disable_event_listeners
Stop all event listeners on the page from running. track_event_listeners() must have been called first
event_types
: Specific event types that should be disabled
disable_event_listeners();
disable_event_listeners(['hover', 'click']);
el_exists
Check if an element exists on page, and return a boolean accordingly
selector
: Valid CSS selector
timeout
: Timeout duration to wait for the element to appear on the page
el_exists('#example'); // => true
el_exists('.does_not_exist'); // => false
el_exists('.does_not_exist', 5e3); // => false (after 5 seconds)
el_is_visible
Check if an element is visible on the page
- selector: Valid CSS selector
- timeout: Timeout duration to wait for the element to be visible on the page
el_is_visible('#example');
el_is_visible('.is_not_visible', 5e3); // false (after 5 seconds)
embed_html_comment
Add a comment in the page HTML. Can be used to embed metadata inside HTML snapshots.
comment
: Body of the comment
embed_html_comment('trace-id: asdf123');
⭐ font_exists
Assert that the browser is capable of rendering the given font family on the page
syntax: font_exists(<font-family>);
font_exists('Liberation Mono');
⭐ freeze_page
Force the page to stop making changes. This can be used to save the page in a particular state, so page snapshots that run after the crawl won’t see a different page state than you see now. This command is experimental. If you see problems, please report them to support
freeze_page();
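A short usage sketch with a hypothetical selector: freeze the page only after the content you care about has rendered, so later snapshots match what you collected:
wait('.product-details'); // hypothetical selector
freeze_page(); // lock the DOM in its current state for post-crawl snapshots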
⭐ hover
Hover over an element (will wait for the element to appear before hovering over it)
syntax: hover(<selector>);
selector
: Element selector
hover('#item');
⭐ html_capture_options
Influence how the page HTML is captured
options
: An object which accepts options defining how HTML capturing should be processed (e.g. coordinate_attributes)
html_capture_options({
coordinate_attributes: true,
});
Image
Collect image data
src
: Image URL or data:image URI string
let i = new Image('https://example.com/image.png');
collect({image: i});
input
Global object available to the interaction code. Provided by trigger input or next_stage() calls
navigate(input.url);
job
Global object available to the interaction code. Provided by trigger input or next_stage() calls
let {created} = job;
load_html
Load HTML and return a Cheerio instance
html
: Any HTML string
let $$ = load_html('<p id="p1">p1</p><p id="p2">p2</p>');
collect({data: $$('#p2').text()});
⭐ load_more
Scroll to the bottom of a list to trigger loading more items. Useful for lazy-loaded infinite-scroll sites
selector
: Selector for the element that contains the lazy-loaded items
load_more(<selector>);
load_more('.search-results');
load_more('.search-results', {children: '.result-item', trigger_selector: '.btn-load-more', timeout: 10000});
load_sitemap
Read a list of URLs from a sitemap XML (supports sitemap indexes and .gz compressed sitemaps; see examples)
let {pages} = load_sitemap({url: 'https://example.com/sitemap.xml.gz'});
let {children} = load_sitemap({url: 'https://example.com/sitemap-index.xml'});
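A usage sketch, assuming pages is an array of URL strings: queue every sitemap URL as its own page crawl.
// Assumption: pages is an array of URL strings
let {pages: sitemap_pages} = load_sitemap({url: 'https://example.com/sitemap.xml'});
for (let url of sitemap_pages)
    rerun_stage({url});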
location
Object with info about current location. Available fields: href
navigate('https://example.com');
location.href; // "https://example.com/"
Money
Collect price/money data
value
: Amount of money
currency
: Currency code
let p = new Money(10, 'USD');
collect({product_price: p});
⭐ mouse_to
Move the mouse to the specified (x,y) position
syntax: mouse_to(<x>, <y>);
x
: Target x position
y
: Target y position
mouse_to(0, 0);
navigate
Navigate the browser to a URL
syntax: navigate(<url>);
- A 404 status code will throw a dead_page error by default. Use opt.allow_status to override this
url
: A URL to navigate to
opt
: Navigate options (see examples)
navigate(input.url);
navigate('https://example.com');
// waits until DOM content loaded event is fired in the browser
navigate(`url`, {wait_until: 'domcontentloaded'});
// adds a referer to the navigation
navigate(`url`, {referer: `url`});
// the number of milliseconds to wait for. Default is 30000 ms
navigate(`url`, {timeout: 45000});
// Don't throw an error if this URL sends a 404 status code
navigate(`url`, {allow_status: [404]});
// Specify browser width/height
navigate(`url`, {
fingerprint: {screen: {width: 400, height: 400}},
});
next_stage
Run the next stage of the crawler with the specified input
input
: Input object to pass to the next browser session
next_stage({url: 'http://example.com', page: 1});
parse
Parse the page data
let page_data = parse();
collect({
title: page_data.title,
price: page_data.price,
});
preserve_proxy_session
Preserve proxy session across children of this page
preserve_proxy_session();
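A usage sketch, assuming pages queued from here (for example via next_stage()) count as children of this page:
preserve_proxy_session(); // assumed: the child pages queued below reuse this proxy session
next_stage({url: 'https://example.com/product/1'});
next_stage({url: 'https://example.com/product/2'});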
⭐ press_key
Type special characters like Enter or Backspace in the currently focused input (usually used after typing something in a search box)
press_key('Enter');
press_key('Backspace');
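A typical sequence with hypothetical selectors: type a query, submit it with Enter, then wait for the results:
type('#search-box', 'vacuum cleaner'); // hypothetical selector
press_key('Enter');
wait('.search-results'); // hypothetical selector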
⭐ proxy_location
Configure your crawl to run from a specific location. Unless you need high-resolution control over where your crawl is running from, you probably want to use country(code) instead
configuration
: Object with a desired proxy location, check examples for more info
proxy_location({country: 'us'});
// lat in range: [-85, 85], long in range: [-180, 180]
proxy_location({lat: 37.7749, long: -122.4194});
// radius in km
proxy_location({lat: 37.7749, long: -122.4194, country: 'US', radius: 100});
⭐ redirect_history
Returns the history of URL redirects since the last navigate
navigate('http://google.com');
let redirects = redirect_history();
// returns:
// [
// 'http://google.com',
// 'http://www.google.com',
// 'https://www.google.com/',
// ]
rerun_stage
Run this stage of the crawler again with new input
rerun_stage({url: 'http://example.com/other-page'});
resolve_url
Returns the final URL that the given url argument leads to
url
: URL string/instance
let {href} = parse().anchor_elem_data;
collect({final_url: resolve_url(href)});
response_headers
Returns the response headers of the last page load
let headers = response_headers();
console.log('content-type', headers['content-type']);
request
Make a direct HTTP request
url
|options
: The URL to make the request to, or request options (see examples)
let res = request('http://www.example.com');
let res = request({url: 'http://www.example.com', method: 'POST', headers: {'Content-type': 'application/json'}, body: {hello: 'world'}})
⭐ right_click
The same as click, but uses the right mouse button instead (will wait for the element to appear before clicking on it)
syntax: right_click(<selector>);
selector
: Element selector
right_click('#item');
run_stage
Run a specific stage of the crawler with a new browser session
- stage: Which stage to run (1 is first stage)
- input: Input object to pass to the next browser session
run_stage(2, {url: 'http://example.com', page: 1});
⭐ scroll_to
Scroll the page so that an element is visible. If you’re doing this to trigger loading more elements from a lazy-loaded list, use load_more() instead. Defaults to scrolling in a natural way, which may take several seconds. If you want to jump immediately, use {immediate: true}
syntax: scroll_to(<selector>);
selector
: Selector of the element you want to scroll to
scroll_to('.author-profile');
scroll_to('top'); // scroll to the top of the page
scroll_to('bottom'); // scroll to the bottom of the page
scroll_to('top', {immediate: true}); // jump to top of page immediately
⭐ scroll_to_all
Scroll through the page so that all the elements matching the selector will be visible on screen
syntax: scroll_to_all(<selector>);
selector
: Selector of the elements you want to scroll through
scroll_to_all('.author-profiles');
⭐ select
Pick a value from a select element
syntax: select(<selector>, <value>);
selector
: Element selector
value
: Value to select
select('#country', 'Canada');
set_lines
An array of lines to add to your dataset at the end of this page crawl. Each call to set_lines() will override previous ones, and only the last set of lines will be added into the dataset (tracked per page crawl). This is a good fit when the scraper is set to collect partial results on errors: you can keep calling set_lines() with the data you gathered so far, and the last call will be used if the page crawl throws an error
syntax: set_lines(<data_line>[, <validate_fn>]);
lines
: An array of data lines to add to your final dataset
validate_fn
: Optional function to check that the line data is valid (run once per line)
set_lines(products_so_far);
set_lines(products_so_far, i=>{
if (!i.price)
throw new Error('Missing price');
});
set_session_cookie
Sets a cookie with the given cookie data; may overwrite equivalent cookies if they exist
set_session_cookie(`domain`, `name`, `value`);
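With illustrative values (hypothetical domain, name, and value):
set_session_cookie('example.com', 'session_id', 'abc123'); // hypothetical cookie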
set_session_headers
Set extra headers for all the HTTP requests
headers
: Object with extra headers in key-value format
set_session_headers({'HEADER_NAME': 'HEADER_VALUE'});
⭐ solve_captcha
Solve any captchas shown on the page
solve_captcha();
solve_captcha({type: 'simple', selector: '#image', input: '#input'});
status_code
Returns the status code of the last page load
collect({status_code: status_code()});
⭐ tag_all_responses
Save the responses from all browser requests that match
field
: The name of the tagged field
pattern
: The URL pattern to match
options
: Set options.jsonp=true to parse response bodies that are in jsonp format. This will be automatically detected when possible
tag_all_responses(<field>, <pattern>, <options>);
tag_all_responses('resp', /url/, {jsonp: true});
tag_all_responses('resp', /url/, {allow_error: true});
tag_all_responses('profiles', /\/api\/profile/);
navigate('https://example.com/sports');
let profiles = parse().profiles;
for (let profile of profiles)
collect(profile);
⭐ tag_download
Allows getting files downloaded by the browser
url
: A pattern or a string to match requests against
let SEC = 1000;
let download = tag_download(/example.com\/foo\/bar/);
click('button#download');
let file1 = download.next_file({timeout: 10*SEC});
let file2 = download.next_file({timeout: 20*SEC});
collect({file1, file2});
⭐ tag_image
Save the image url from an element
- field: The name of the tagged field
- selector: A valid CSS selector
tag_image(field, selector);
tag_image('image', '#product-image');
⭐ tag_response
Save the response data from a browser request
syntax: tag_response(<field>, <pattern>, <options>);
name
: The name of the tagged field
pattern
: The URL pattern to match
options
: Set options.jsonp=true to parse response bodies that are in jsonp format. This will be automatically detected when possible
tag_response('resp', /url/, {jsonp: true});
tag_response('resp', /url/, {allow_error: true});
tag_response('resp', (req, res)=>{
if (req.url.includes('/api/'))
{
let request_body = req.body;
let request_headers = req.headers;
let response_body = res.body;
let response_headers = res.headers;
}
});
tag_response('teams', /\/api\/teams/);
navigate('https://example.com/sports');
let teams = parse().teams;
for (let team of teams)
collect(team);
⭐ tag_screenshot
Save a screenshot of the page HTML
syntax: tag_screenshot(<field>, <options>);
- field: The name of the tagged field
- options: Download options (see example)
tag_screenshot('html_screenshot', {filename: 'screen'});
tag_screenshot('view', {full_page: false}); // full_page defaults to true
⭐ tag_script
Extract some JSON data saved in a script on the page
syntax: tag_script(<field>, <selector>);
- name: The name of the tagged script
- selector: The selector of the script to tag
tag_script('teams', '#preload-data');
tag_script('ssr_state', '#__SSR_DATA__');
navigate('https://example.com/');
collect(parse().ssr_state);
⭐ tag_serp
Parse the current page as a search engine result page
field
: The name of the tagged field
type
: Parser type (e.g. bing, google)
tag_serp('serp_bing_results', 'bing')
tag_serp('serp_google_results', 'google')
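A usage sketch, assuming the tagged field is exposed through parse() after navigation, following the pattern shown for the other tag_ commands:
tag_serp('serp_google_results', 'google');
navigate('https://www.google.com/search?q=example'); // hypothetical search URL
collect({results: parse().serp_google_results}); // assumed field access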
⭐ tag_video
Save the video url from an element
field
: The name of the tagged field
selector
: A valid CSS selector
opt
: Download options (see example)
tag_video(field, selector);
tag_video('video', '#product-video', {download: true});
⭐ tag_window_field
Tag a JS value from the browser page
field
: The path to the relevant data
tag_window_field(field, key);
tag_window_field('initData', '__INIT_DATA__');
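A usage sketch, assuming the tagged value is exposed through parse() after navigation, as in the tag_script example:
tag_window_field('initData', '__INIT_DATA__');
navigate('https://example.com');
collect({init: parse().initData}); // assumed field access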
⭐ track_event_listeners
Start tracking the event listeners that the browser creates. It’s needed to run disable_event_listeners() later
track_event_listeners();
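A combined sketch showing the required order, with hypothetical event types:
track_event_listeners(); // must run before disable_event_listeners()
navigate('https://example.com');
disable_event_listeners(['scroll', 'click']); // hypothetical event types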
⭐ type
Enter text into an input (will wait for the input to appear before typing)
selector
: Element selector
text
: Text to enter
type(<selector>, <text>);
type('#location', 'New York');
// replacing text in input if it is not empty
type(<selector>, <text>, {replace: true});
// type text to an element with id ending "input-box" (e.g. <input id="c2E57-input-box">)
type('[id$=input-box]', <text>);
// dispatching 'Enter' key press
type(<selector>, ['Enter']);
// typing text and then dispatching 'Enter' key press
type(<selector>, ['Some text', 'Enter']);
// deleting 1 char from input
type(<selector>, ['Backspace']);
URL
URL class from NodeJS standard “url” module
url
: URL string
let u = new URL('https://example.com');
⭐ verify_requests
Monitor failed requests with a callback function
callback
: A function which will be called on each failed request with an object in the format: {url, error, type, response}
verify_requests(({url, error, type, response})=>{
if (response.status!=404 && type=='Font')
throw new Error('Font failed to load');
});
Video
Collect video data
src
: Video URL
let v = new Video('https://example.com/video.mp4');
collect({video: v});
⭐ wait
Wait for an element to appear on the page
selector
: Element selector
opt
: Wait options (see examples)
wait(<selector>);
wait('#welcome-splash');
wait('.search-results .product');
wait('[href^="/product"]');
// the number of milliseconds to wait for. Default is 30000 ms
wait(<selector>, {timeout: 5000});
// wait for element to be hidden
wait(<selector>, {hidden: true});
// wait for element inside an iframe
wait(<selector>, {inside: '#iframe_id'});
⭐ wait_any
Wait for any matching condition to succeed
wait_any(['#title', '#notfound']);
wait_for_parser_value
Wait for a parser field to contain a value. This can be useful after you click something to wait for some data to appear
field
: The parser value path to wait on
validate_fn
: An optional callback function to validate that the value is correct
opt
: Extra options (e.g. timeout)
wait_for_parser_value(<field>[, <validate_fn>][, opt]);
wait_for_parser_value('profile');
wait_for_parser_value('listings.0.price', v=>{
return parseInt(v)>0;
}, {timeout: 5000});
⭐ wait_for_text
Wait for an element on the page to include some text
selector
: Element selector
text
: The text to wait for
wait_for_text(<selector>, <text>);
wait_for_text('.location', 'New York');
⭐ wait_hidden
Wait for an element to not be visible on the page (removed or hidden)
selector
: Element selector
wait_hidden(<selector>);
wait_hidden('#welcome-splash');
wait_hidden(<selector>, {timeout: 5000});
⭐ wait_network_idle
Wait until the browser network has been idle for a given time
timeout
: Wait for browser network to be idle for X milliseconds
options
: ignore: an array of patterns to exclude requests from monitoring; timeout: how long the network needs to be idle in milliseconds (default 500)
wait_network_idle();
wait_network_idle({
timeout: 1e3,
ignore: [/long_request/, 'https://example.com'],
});
⭐ wait_page_idle
Wait until no changes are being made on the DOM tree for a given time
timeout
: Milliseconds to wait for no changes
options
: An object which can accept an ignore argument to exclude some elements from monitoring
wait_page_idle();
wait_page_idle({
ignore: [<selector1>, <selector2>],
idle_timeout: 1000,
});
⭐ wait_visible
Wait for an element to be visible on the page
selector
: Element selector
wait_visible(<selector>);
wait_visible('#welcome-splash');
wait_visible(<selector>, {timeout: 5000});
$
Helper for jQuery-like expressions
selector
: Element selector
$(<selector>);
wait($('.store-card'))
⭐ emulate_device
View pages as a mobile device. This command will change user agent and screen parameters (resolution and device pixel ratio)
device
: A string with the name of the device
emulate_device('iPhone X');
emulate_device('Pixel 2');
Parser Functions
This article lists and explains the available commands within the Parser code for writing a scraper using the IDE.
input
Global variable available to the parser code
let url = input.url;
$
An instance of cheerio
$('#example').text();
$('#example').attr('href');
$('#example').text_sane(); /* Like $().text(), but also trims the text and replaces all whitespace runs with a single space: "a b \t\n\n c" -> "a b c" */
location
A global variable available to the parser code. Object with info about current location
let current_url = location.href;
Image
Collect image data
let i = new Image('https://example.com/image.png');
collect({image: i});
Video
Collect video data
let v = new Video('https://example.com/video.mp4');
collect({video: v});
Money
Collect price/money data
let p = new Money(10, 'USD');
collect({product_price: p});