Interaction Functions

This article lists and explains the available commands within the Interaction code for writing a scraper using the IDE.

Commands marked with a star are proprietary functions developed by Bright Data.

bad_input

Mark the scraper input as bad. This will prevent any crawl retries (error_code=bad_input)

bad_input();
bad_input('Missing search term');

blocked

Mark the page as failed because the website refused access (error_code=blocked)

blocked();
blocked('Login page was shown');

bounding_box

The box of coordinates that describes the area of an element (relative to the page, not the browser viewport). Only the first element matched will be measured

  • selector: A valid CSS selector for the element
let box = bounding_box('.product-list');
// box == {
//   top: 10,
//   right: 800,
//   bottom: 210,
//   left: 200,
//   x: 200,
//   y: 10,
//   width: 600,
//   height: 200,
// }

browser_size

Returns current browser window size

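A minimal usage sketch; the exact shape of the returned object is an assumption (the command is only documented above as returning the window size):

let size = browser_size();
// assumed return shape, e.g. {width: 1280, height: 720}
collect({browser_width: size.width, browser_height: size.height});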

capture_graphql

Capture and replay GraphQL requests with changed variables

  • options: Parameters controlling which GraphQL request to capture
    • url
    • payload
let q = capture_graphql({
    payload: {id: 'ProfileQuery'},
    // you may need to pass the url option as a RegExp when the
    // GraphQL endpoint does not match the default "*/graphql"
    // url: /\bgraphql\b/ // default
});

navigate('https://example.com');

let [first_query, first_response] = q.wait_captured();

collect(first_response.data.profile);

let second = q.replay({
    variables: {other_id: 2},
});

collect(second.data.profile);

click

Click on an element (will wait for the element to appear before clicking on it)

  • selector: Element selector
click('#show-more');
$('#show-more').click()
// Click the closest match to the passed coordinates
// (relative to the page).
// For example, clicking the center pin in a map
let box = bounding_box('#map')
let center = {x: (box.left+box.right)/2, y: (box.top+box.bottom)/2};
click('.map-pin', {coordinates: center});

close_popup

Popups can appear at any time during a crawl, and it’s not always clear when you should wait for or close them. Add close_popup() at the top of your code to install a background watcher that closes the popup whenever it appears. If a popup appears multiple times, it will always be closed

  • popup selector: A valid CSS selector
  • close selector: A valid CSS selector
  • options: click_inside: selector of the parent iframe that contains the close button selector
close_popup('.popup', '.popup_close');
close_popup('iframe.with-popup', '.popup_close', {click_inside: 'iframe.with-popup'});

collect

Adds a line of data to the dataset created by the crawler

syntax: collect(<data_line>[, <validate_fn>]);

  • data_line: An object with the fields you want to collect
  • validate_fn: Optional function to check that the line data is valid
collect({price: data.price});
collect(product, p=>{
    if (!p.title)
        throw new Error('Product is missing a title');
})

console

Log messages from the interaction code

console.log(1, 'brightdata', [1, 2], {key: 'value'});
console.error(1, 'brightdata', [1, 2], {key: 'value'});

country

Configure your crawl to run from a specific country

syntax: country(<code>);

  • code: 2-character ISO country code
country('us');

dead_page

Mark a page as a dead link so you can filter it from your future collections (error_code=dead_page)

dead_page();
dead_page('Product was removed');

detect_block

Detects a block on the page

  • resource: An object specifying the resource required for the detection
    • selector
  • condition: An object specifying how the resource should be processed for detection
    • exists
    • has_text
detect_block({selector: '.foo'}, {exists: true});
detect_block({selector: '.bar'}, {has_text: 'text'});
detect_block({selector: '.baz'}, {has_text: /regex_pattern/});

disable_event_listeners

Stop all event listeners on the page from running. track_event_listeners() must have been called first

  • event_types: Specific event types that should be disabled
disable_event_listeners();
disable_event_listeners(['hover', 'click']);

el_exists

Check if an element exists on page, and return a boolean accordingly

  • selector: Valid CSS selector
  • timeout: Timeout duration to wait for the element to appear on the page
el_exists('#example'); // => true
el_exists('.does_not_exist'); // => false
el_exists('.does_not_exist', 5e3); // => false (after 5 seconds)

el_is_visible

Check if an element is visible on the page

  • selector: Valid CSS selector
  • timeout: Timeout duration to wait for the element to be visible on the page
el_is_visible('#example');
el_is_visible('.is_not_visible', 5e3); // false (after 5 seconds)

embed_html_comment

Add a comment in the page HTML. Can be used to embed metadata inside HTML snapshots.

  • comment: Body of the comment
embed_html_comment('trace-id: asdf123');

font_exists

Assert that the browser is capable of rendering the given font family on the page

syntax: font_exists(<font-family>);

font_exists('Liberation Mono');

freeze_page

Force the page to stop making changes. This can be used to save the page in a particular state, so page snapshots that run after the crawl won’t see a different page state than you see now. This command is experimental; if you see problems, please report them to support

freeze_page();

hover

Hover over an element (will wait for the element to appear before hovering over it)

syntax: hover(<selector>);

  • selector: Element selector
hover('#item');

html_capture_options

Control how the page HTML is captured

  • options: An object which accepts options defining how HTML capturing should be processed
    • coordinate_attributes
html_capture_options({
    coordinate_attributes: true,
});

Image

Collect image data

  • src: Image URL or data:image URI string
let i = new Image('https://example.com/image.png');
collect({image: i});

input

Global object available to the interaction code. Provided by trigger input or next_stage() calls

navigate(input.url);

job

Global object available to the interaction code. Provided by trigger input or next_stage() calls

let {created} = job;

load_html

Load an HTML string and return a Cheerio instance

  • html: Any HTML string
let $$ = load_html('<p id="p1">p1</p><p id="p2">p2</p>');
collect({data: $$('#p2').text()});

load_more

Scroll to the bottom of a list to trigger loading more items. Useful for lazy-loaded infinite-scroll sites

  • selector: Selector for the element that contains the lazy-loaded items
load_more(<selector>);
load_more('.search-results');
load_more('.search-results', {children: '.result-item', trigger_selector: '.btn-load-more', timeout: 10000});

load_sitemap

Read a list of URLs from a sitemap XML (supports sitemap indexes and .gz-compressed sitemaps; see examples)

let {pages} = load_sitemap({url: 'https://example.com/sitemap.xml.gz'});
let {children} = load_sitemap({url: 'https://example.com/sitemap-index.xml'});
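The returned URLs can then be fed back into the crawl; the loop below is an illustrative pattern that assumes pages is an array of URL strings (rerun_stage() is documented further down):

let {pages} = load_sitemap({url: 'https://example.com/sitemap.xml.gz'});
// queue each sitemap URL as another run of this stage (illustrative)
for (let url of pages)
    rerun_stage({url});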

location

Object with info about current location. Available fields: href

navigate('https://example.com');
location.href; // "https://example.com/"

Money

Collect price/money data

  • value: Amount of money
  • currency: Currency code
let p = new Money(10, 'USD');
collect({product_price: p});

mouse_to

Move the mouse to the specified (x,y) position

syntax: mouse_to(<x>, <y>);

  • x: Target x position
  • y: Target y position
mouse_to(0, 0);

navigate

Navigate the browser to a URL

syntax: navigate(<url>);

  • A 404 status code will throw a dead_page error by default. Use opt.allow_status to override this
  • url: A URL to navigate to
  • opt: navigate options (see examples)
navigate(input.url);
navigate('https://example.com');

// waits until DOM content loaded event is fired in the browser
navigate(`url`, {wait_until: 'domcontentloaded'}); 

// adds a referer to the navigation
navigate(`url`, {referer: `url`}); 

// the number of milliseconds to wait for. Default is 30000 ms
navigate(`url`, {timeout: 45000}); 

// Don't throw an error if this URL sends a 404 status code
navigate(`url`, {allow_status: [404]});

// Specify browser width/height
navigate(`url`, {
    fingerprint: {screen: {width: 400, height: 400}},
});

next_stage

Run the next stage of the crawler with the specified input

  • input: Input object to pass to the next browser session
next_stage({url: 'http://example.com', page: 1});

parse

Parse the page data

let page_data = parse();
collect({
    title: page_data.title,
    price: page_data.price,
});


preserve_proxy_session

Preserve proxy session across children of this page

preserve_proxy_session();
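A short sketch of a typical pairing with next_stage(); the combination is illustrative, based on the "children of this page" wording above:

preserve_proxy_session();
// child pages spawned from here are assumed to reuse the same proxy session
next_stage({url: 'http://example.com/details'});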

press_key

Type special characters like Enter or Backspace in the currently focused input (usually used after typing something in a search box)

press_key('Enter');
press_key('Backspace');

proxy_location

Configure your crawl to run from a specific location. Unless you need high resolution control over where your crawl is running from, you probably want to use `country(code)` instead

  • configuration: Object with a desired proxy location, check examples for more info
proxy_location({country: 'us'});

// lat in range: [-85, 85], long in range: [-180, 180]
proxy_location({lat: 37.7749, long: -122.4194});

// radius in km
proxy_location({lat: 37.7749, long: -122.4194, country: 'US', radius: 100});


redirect_history

Returns the history of URL redirects since the last navigate

navigate('http://google.com');
let redirects = redirect_history();
// returns:
// [
//   'http://google.com',
//   'http://www.google.com',
//   'https://www.google.com/',
// ]

rerun_stage

Run this stage of the crawler again with new input

rerun_stage({url: 'http://example.com/other-page'});

resolve_url

Returns the final URL that the given url argument leads to

  • url: URL string/instance
let {href} = parse().anchor_elem_data;
collect({final_url: resolve_url(href)});

response_headers

Returns the response headers of the last page load

let headers = response_headers();
console.log('content-type', headers['content-type']);

request

Make a direct HTTP request

  • url | options: the url to make the request to, or request options (see examples)
let res = request('http://www.example.com');
let res = request({url: 'http://www.example.com', method: 'POST', headers: {'Content-type': 'application/json'}, body: {hello: 'world'}})

right_click

The same as click(), but with the right mouse button (will wait for the element to appear before clicking on it)

syntax: right_click(<selector>);

  • selector: Element selector
right_click('#item');

run_stage

Run a specific stage of the crawler with a new browser session

  • stage: Which stage to run (1 is first stage)
  • input: Input object to pass to the next browser session
run_stage(2, {url: 'http://example.com', page: 1});

scroll_to

Scroll the page so that an element is visible. If you’re doing this to trigger loading more elements from a lazy-loaded list, use load_more(). Defaults to scrolling in a natural way, which may take several seconds. If you want to jump immediately, use {immediate: true}

syntax: scroll_to(<selector>);

  • selector: Selector of the element you want to scroll to
scroll_to('.author-profile');
scroll_to('top'); // scroll to the top of the page
scroll_to('bottom'); // scroll to the bottom of the page
scroll_to('top', {immediate: true}); // jump to top of page immediately

scroll_to_all

Scroll through the page so that all the elements matching the selector will be visible on screen

syntax: scroll_to_all(<selector>);

  • selector: Selector of the elements you want to scroll through
scroll_to_all('.author-profiles');

select

Pick a value from a select element

syntax: select(<selector>, <value>);

  • selector: Element selector
select('#country', 'Canada');

set_lines

Set the array of lines to add to your dataset at the end of this page crawl. Each call to set_lines() overrides previous ones, and only the last set of lines will be added to the dataset (tracked per page crawl). This is a good fit when the scraper is set to collect partial data on errors: you can keep calling set_lines() with the data gathered so far, and the last call will be used if the page crawl throws an error

syntax: set_lines(<lines>[, <validate_fn>]);

  • lines: An array of data lines to add to your final dataset
  • validate_fn: Optional function to check that the line data is valid (run once per line)
set_lines(products_so_far);
set_lines(products_so_far, i=>{
    if (!i.price)
        throw new Error('Missing price');
});

set_session_cookie

Sets a cookie with the given cookie data; may overwrite equivalent cookies if they exist

set_session_cookie(`domain`, `name`, `value`);
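A concrete sketch with placeholder values (the domain, cookie name, and value below are illustrative):

set_session_cookie('example.com', 'session_id', 'abc123');
navigate('https://example.com/account');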

set_session_headers

Set extra headers for all the HTTP requests

  • headers: Object with extra headers in key-value format
set_session_headers({'HEADER_NAME': 'HEADER_VALUE'});

solve_captcha

Solve any captchas shown on the page

solve_captcha();
solve_captcha({type: 'simple', selector: '#image', input: '#input'});

status_code

Returns the status code of the last page load

collect({status_code: status_code()});
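A sketch combining status_code() with navigate()'s allow_status option (both documented on this page); treating a 404 as dead_page is just an illustrative policy:

navigate(input.url, {allow_status: [404]});
if (status_code()==404)
    dead_page('Page returned 404');
else
    collect({status_code: status_code()});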

tag_all_responses

Save the responses from all browser requests that match a pattern

  • field: The name of the tagged field
  • pattern: The URL pattern to match
  • options: Set options.jsonp=true to parse response bodies that are in jsonp format. This will be automatically detected when possible
tag_all_responses(<field>, <pattern>, <options>);
tag_all_responses('resp', /url/, {jsonp: true});
tag_all_responses('resp', /url/, {allow_error: true});
tag_all_responses('profiles', /\/api\/profile/);
navigate('https://example.com/sports');
let profiles = parse().profiles;
for (let profile of profiles)
    collect(profile);

tag_download

Allows you to get files downloaded by the browser

  • url: A pattern or a string to match requests against
let SEC = 1000;
let download = tag_download(/example.com\/foo\/bar/);
click('button#download');
let file1 = download.next_file({timeout: 10*SEC});
let file2 = download.next_file({timeout: 20*SEC});
collect({file1, file2});

tag_image

Save the image URL from an element

  • field: The name of the tagged field
  • selector: A valid CSS selector
tag_image(field, selector);
tag_image('image', '#product-image');

tag_response

Save the response data from a browser request

syntax: tag_response(<field>, <pattern>, <options>);

  • name: The name of the tagged field
  • pattern: The URL pattern to match
  • options: Set options.jsonp=true to parse response bodies that are in jsonp format. This will be automatically detected when possible
tag_response('resp', /url/, {jsonp: true});
tag_response('resp', /url/, {allow_error: true});
tag_response('resp', (req, res)=>{
    if (req.url.includes('/api/'))
    {
        let request_body = req.body;
        let request_headers = req.headers;
        let response_body = res.body;
        let response_headers = res.headers;
    }
});

tag_response('teams', /\/api\/teams/);
navigate('https://example.com/sports');
let teams = parse().teams;
for (let team of teams)
    collect(team);

tag_screenshot

Save a screenshot of the page HTML

syntax: tag_screenshot(<field>, <options>);

  • field: The name of the tagged field
  • options: Download options (see example)
tag_screenshot('html_screenshot', {filename: 'screen'});
tag_screenshot('view', {full_page: false}); // full_page defaults to true

tag_script

Extract some JSON data saved in a script on the page

syntax: tag_script(<field>, <selector>);

  • name: The name of the tagged script
  • selector: The selector of the script to tag
tag_script('teams', '#preload-data');
tag_script('ssr_state', '#__SSR_DATA__');
navigate('https://example.com/');
collect(parse().ssr_state);

tag_serp

Parse the current page as a search engine result page

  • field: The name of the tagged field
  • type: Parser type (e.g. bing, google)
tag_serp('serp_bing_results', 'bing')
tag_serp('serp_google_results', 'google')

tag_video

Save the video URL from an element

  • field: The name of the tagged field
  • selector: A valid CSS selector
  • opt: download options (see example)
tag_video(field, selector);
tag_video('video', '#product-video', {download: true});

tag_window_field

Tag a JS value from the browser page

  • field: The path to the relevant data
tag_window_field(field, key);
tag_window_field('initData', '__INIT_DATA__');

track_event_listeners

Start tracking the event listeners that the browser creates. It’s needed to run disable_event_listeners() later

track_event_listeners();
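A combined sketch with disable_event_listeners(); the event types and the ordering relative to navigate() are assumptions:

track_event_listeners();
navigate(input.url);
// once tracking has started, specific listener types can be disabled
disable_event_listeners(['hover', 'click']);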

type

Enter text into an input (will wait for the input to appear before typing)

  • selector: Element selector
  • text: Text to enter
type(<selector>, <text>);
type('#location', 'New York');

// replacing text in input if it is not empty
type(<selector>, <text>, {replace: true}); 

// type text to an element with id ending "input-box" (e.g. <input id="c2E57-input-box">)
type('[id$=input-box]', <text>); 

// dispatching 'Enter' key press
type(<selector>, ['Enter']); 

// typing text and then dispatching 'Enter' key press
type(<selector>, ['Some text', 'Enter']); 

// deleting 1 char from input
type(<selector>, ['Backspace']); 

URL

URL class from the Node.js standard “url” module

  • url: URL string
let u = new URL('https://example.com');

verify_requests

Monitor failed requests with a callback function

  • callback: A function which will be called on each failed request with an object in format: {url, error, type, response}
verify_requests(({url, error, type, response})=>{
    if (response.status!=404 && type=='Font')
        throw new Error('Font failed to load');
});

Video

Collect video data

  • src: Video URL
let v = new Video('https://example.com/video.mp4');
collect({video: v});

wait

Wait for an element to appear on the page

  • selector: Element selector
  • opt: wait options (see examples)
wait(<selector>);
wait('#welcome-splash');
wait('.search-results .product');
wait('[href^="/product"]');

// the number of milliseconds to wait for. Default is 30000 ms
wait(<selector>, {timeout: 5000}); 

// wait for element to be hidden
wait(<selector>, {hidden: true}); 

// wait for element inside in an iframe
wait(<selector>, {inside: '#iframe_id'}); 

wait_any

Wait for any matching condition to succeed

wait_any(['#title', '#notfound']);

wait_for_parser_value

Wait for a parser field to contain a value. This can be useful after you click something to wait for some data to appear

  • field: The parser value path to wait on
  • validate_fn: An optional callback function to validate that the value is correct
  • opt: Extra options (e.g. timeout)
wait_for_parser_value(<field>[, <validate_fn>][, opt]);
wait_for_parser_value('profile');
wait_for_parser_value('listings.0.price', v=>{
    return parseInt(v)>0;
}, {timeout: 5000});

wait_for_text

Wait for an element on the page to include some text

  • selector: Element selector
  • text: The text to wait for
wait_for_text(<selector>, <text>);
wait_for_text('.location', 'New York');

wait_hidden

Wait for an element to not be visible on the page (removed or hidden)

  • selector: Element selector
wait_hidden(<selector>);
wait_hidden('#welcome-splash');
wait_hidden(<selector>, {timeout: 5000});

wait_network_idle

Wait until the browser network has been idle for a given time

  • timeout: Wait for the browser network to be idle for X milliseconds
  • options: An object which accepts:
    • ignore: an array of patterns to exclude requests from monitoring
    • timeout: how long the network needs to be idle, in milliseconds (default 500)
wait_network_idle();
wait_network_idle({
    timeout: 1e3,
    ignore: [/long_request/, 'https://example.com'],
});

wait_page_idle

Wait until no changes are being made on the DOM tree for a given time

  • timeout: Milliseconds to wait for no changes
  • options: An object which can accept an ignore argument to exclude some elements from monitoring
wait_page_idle();
wait_page_idle({
    ignore: [<selector1>, <selector2>],
    idle_timeout: 1000,
});

wait_visible

Wait for an element to be visible on the page

  • selector: Element selector
wait_visible(<selector>);
wait_visible('#welcome-splash');
wait_visible(<selector>, {timeout: 5000});

$

Helper for jQuery-like expressions

  • selector: Element selector
$(<selector>);
wait($('.store-card'))

emulate_device

View pages as a mobile device. This command will change the user agent and screen parameters (resolution and device pixel ratio)

  • device: A string with the name of device
emulate_device('iPhone X');
emulate_device('Pixel 2');

Parser Functions

This article lists and explains the available commands within the Parser code for writing a scraper using the IDE.

input

Global variable available to the parser code

let url = input.url;

$

An instance of cheerio

$('#example').text();
$('#example').attr('href');
$('#example').text_sane(); /* This is like $().text() but also trims text and replaces all whitespace characters with a single space: "a b \t\n\n c" -> "a b c" */
Find more information on the cheerio website.

location

A global variable available to the parser code. Object with info about current location

let current_url = location.href; 

Image

Collect image data

let i = new Image('https://example.com/image.png');
collect({image: i});

Video

Collect video data

let v = new Video('https://example.com/video.mp4');
collect({video: v});

Money

Collect price/money data

let p = new Money(10, 'USD');
collect({product_price: p});