Functions
Interaction Functions
This article lists and explains the available commands within the Interaction code for writing a scraper using the IDE.
Commands marked with a star ⭐ are proprietary functions developed by Bright Data.
bad_input
Mark the scraper input as bad. Will prevent any crawl retries (error_code=bad_input)
blocked
Mark the page as failed because of the website refusing access (error_code=blocked)
⭐ bounding_box
The box of coordinates that describes the area of an element (relative to the page, not the browser viewport). Only the first element matched will be measured
selector
: A valid CSS selector for the element
⭐ browser_size
Returns current browser window size
⭐ capture_graphql
Capture and replay GraphQL requests with changed variables
- options: Params to control the GraphQL request to capture
- url
- payload
⭐ click
Click on an element (will wait for the element to appear before clicking on it)
selector
: Element selector
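A minimal usage sketch (the selector is hypothetical):

```js
// Waits for the button to appear, then clicks it
click('#show-more');
```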
⭐ close_popup
Popups can appear at any time during a crawl, and it's not always clear when you should wait for or close them. Add close_popup()
at the top of your code to add a background watcher that closes the popup when it appears. If a popup appears multiple times, it will always be closed
- popup selector: A valid CSS selector
- close selector: A valid CSS selector
- options:
click_inside
: Selector of the parent iframe which contains the close button selector
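A minimal sketch, assuming a hypothetical newsletter popup; the watcher is registered before the main flow:

```js
// Background watcher: closes the popup whenever it appears during the crawl
close_popup('.newsletter-modal', '.newsletter-modal .close');
navigate(input.url);
```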
collect
Adds a line of data to the dataset created by the crawler
syntax: collect(<data_line>[, <validate_fn>]);
data_line
: An object with the fields you want to collect
validate_fn
: Optional function to check that the line data is valid
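A minimal sketch (field names and selectors are hypothetical):

```js
// Add one line to the dataset; the optional validator rejects lines
// with an empty title
collect({
    title: $('h1').text(),
    price: $('.price').text(),
}, line=>!!line.title);
```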
console
Log messages from the interaction code
country
Configure your crawl to run from a specific country
syntax: country(<code>);
code
: 2-character ISO country code
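For example, to run the crawl from the United States:

```js
country('us');
```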
dead_page
Mark a page as a dead link so you can filter it from your future collections (error_code=dead_page)
⭐ detect_block
Detects a block on the page
resource
: An object specifying the resource required for the detection
selector
condition
: An object specifying how the resource should be processed for detection
exists
has_text
⭐ disable_event_listeners
Stop all event listeners on the page from running. track_event_listeners()
must have been called first
event_types
: Specific event types that should be disabled
el_exists
Check if an element exists on page, and return a boolean accordingly
selector
: Valid CSS selector
timeout
: Timeout duration to wait for the element to appear on the page
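A minimal sketch (the selector is hypothetical, and the timeout is assumed to be in milliseconds):

```js
// Dismiss a cookie banner only if it shows up within 5 seconds
if (el_exists('#cookie-banner', 5000))
    click('#cookie-banner .accept');
```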
el_is_visible
Check if element is visible on page
- selector: Valid CSS selector
- timeout: Timeout duration to wait for the element to be visible on the page
embed_html_comment
Add a comment in the page HTML. Can be used to embed metadata inside HTML snapshots.
comment
: Body of the comment
⭐ font_exists
Assert the capability of the browser to render the given font family on the page
syntax: font_exists(<font-family>);
⭐ freeze_page
Force the page to stop making changes. This can be used to save the page in a particular state so page snapshots that run after crawl won’t see a different page state than you see now. This command is experimental. If you see problems, please report them to support
⭐ hover
Hover over an element (will wait for the element to appear before hovering over it)
syntax: hover(<selector>);
selector
: Element selector
⭐ html_capture_options
Influence the process of the HTML capturing
options
: An object which accepts options defining how HTML capturing should be processed
coordinate_attributes
Image
Collect image data
src
: Image URL or data:image URI string
input
Global object available to the interaction code. Provided by trigger input or next_stage()
calls
job
Global object available to the interaction code. Provided by trigger input or next_stage()
calls
load_html
Load HTML and return a Cheerio instance
html
: Any HTML string
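A minimal sketch:

```js
// Parse an HTML string and query it with Cheerio
let $doc = load_html('<ul><li>first</li><li>second</li></ul>');
let first = $doc('li').first().text(); // 'first'
```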
⭐ load_more
Scroll to the bottom of a list to trigger loading more items. Useful for lazy-loaded infinite-scroll sites
selector
: Selector for the element that contains the lazy-loaded items
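A minimal sketch (the selectors and target count are hypothetical):

```js
// Keep triggering the lazy-loaded list until enough items are present
while ($('.search-result').length < 100)
    load_more('.results-list');
```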
load_sitemap
Read a list of URLs from a sitemap XML (supports sitemap indexes and .gz-compressed sitemaps; see examples)
location
Object with info about current location. Available fields: href
Money
Collect price/money data
value
: Amount of money
currency
: Currency code
: Currency code
⭐ mouse_to
Move the mouse to the specified (x,y) position
syntax: mouse_to(<x>, <y>);
x
: Target x position
y
: Target y position
: Target y position
navigate
Navigate the browser to a URL
syntax: navigate(<url>);
- A 404 status code will throw a dead_page error by default. Use opt.allow_status to override this
url
: A URL to navigate to
opt
: Navigate options (see examples)
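A minimal sketch; the exact shape of opt.allow_status is an assumption (see the examples in the IDE):

```js
// Navigate without treating a 404 as a dead_page error
navigate(input.url, {allow_status: [404]});
```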
next_stage
Run the next stage of the crawler with the specified input
input
: Input object to pass to the next browser session
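A minimal sketch of a listing page feeding product pages to the next stage (selectors are hypothetical):

```js
// Queue one next-stage crawl per product link found on this page
for (let el of $('a.product-link').toArray())
    next_stage({url: $(el).attr('href')});
```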
parse
Parse the page data
preserve_proxy_session
Preserve proxy session across children of this page
⭐ press_key
Type special characters like Enter or Backspace in the currently focused input (usually used after typing something in a search box)
⭐ proxy_location
Configure your crawl to run from a specific location. Unless you need high resolution control over where your crawl is running from, you probably want to use country(code)
instead
configuration
: Object with a desired proxy location, check examples for more info
⭐ redirect_history
Returns history of URL redirects since last navigate
rerun_stage
Run this stage of the crawler again with new input
resolve_url
Returns the final URL that the given url argument leads to
url
: URL string/instance
response_headers
Returns the response headers of the last page load
request
Make a direct HTTP request
url|options
: The URL to make the request to, or request options (see examples)
⭐ right_click
The same as click() but using the right mouse button instead (will wait for the element to appear before clicking on it)
syntax: right_click(<selector>);
selector
: Element selector
run_stage
Run a specific stage of the crawler with a new browser session
- stage: Which stage to run (1 is first stage)
- input: Input object to pass to the next browser session
⭐ scroll_to
Scroll the page so that an element is visible. If you're doing this to trigger loading some more elements from a lazy-loaded list, use load_more()
. Defaults to scrolling in a natural way, which may take several seconds. If you want to jump immediately, use {immediate: true}
syntax: scroll_to(<selector>);
selector
: Selector of the element you want to scroll to
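A minimal sketch; passing the option as a second argument is an assumption:

```js
// Jump straight to the reviews section instead of scrolling naturally
scroll_to('#reviews', {immediate: true});
```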
⭐ scroll_to_all
Scroll through the page so that all the elements matching the selector will be visible on screen
syntax: scroll_to_all(<selector>);
selector
: Selector of the elements you want to scroll through
⭐ select
Pick a value from a select element
syntax: select(<select>, <value>);
selector
: Element selector
set_lines
Set an array of lines to add to your dataset at the end of this page crawl. Each call to set_lines()
overrides previous ones, and only the last set of lines will be added to the dataset (tracked per page crawl). This is a good fit when the scraper is set to collect partial data on errors: you can keep calling set_lines()
with the data you have gathered so far, and the last call will be used if the page crawl throws an error
syntax: set_lines(<data_line>[, <validate_fn>]);
lines
: An array of data lines to add to your final dataset
validate_fn
: Optional function to check that the line data is valid (run once per line)
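A minimal sketch (selectors and fields are hypothetical):

```js
// Update the candidate dataset after each item; if the crawl later throws,
// the lines from the last call are still collected
let lines = [];
for (let el of $('.product').toArray()){
    lines.push({name: $(el).find('.name').text()});
    set_lines(lines);
}
```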
set_session_cookie
Sets a cookie with the given cookie data; may overwrite equivalent cookies if they exist
set_session_headers
Set extra headers for all the HTTP requests
headers
: Object with extra headers in key-value format
⭐ solve_captcha
Solve any captchas shown on the page
status_code
Returns the status code of the last page load
⭐ tag_all_responses
Save the responses from all browser requests that match
field
: The name of the tagged field
pattern
: The URL pattern to match
options
: Set options.jsonp=true to parse response bodies that are in JSONP format. This will be automatically detected when possible
⭐ tag_download
Get files downloaded by the browser
url
: A pattern or a string to match requests against
⭐ tag_image
Save the image url from an element
- field: The name of the tagged field
- selector: A valid CSS selector
⭐ tag_response
Save the response data from a browser request
syntax: tag_response(<field>, <pattern>, <options>);
field
: The name of the tagged field
pattern
: The URL pattern to match
options
: Set options.jsonp=true to parse response bodies that are in JSONP format. This will be automatically detected when possible
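A minimal sketch (the field name and URL pattern are hypothetical):

```js
// Tag the body of an API request the page makes while loading
tag_response('products', /\/api\/products/);
navigate(input.url);
```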
⭐ tag_screenshot
Save a screenshot of the page HTML
syntax: tag_screenshot(<field>, <options>);
- field: The name of the tagged field
- options: Download options (see example)
⭐ tag_script
Extract some JSON data saved in a script on the page
syntax: tag_script(<field>, <selector>);
- field: The name of the tagged field
- selector: The selector of the script to tag
⭐ tag_serp
Parse the current page as a search engine result page
field
: The name of the tagged field
type
: Parser type (e.g. bing, google)
⭐ tag_video
Save the video url from an element
field
: The name of the tagged field
selector
: A valid CSS selector
opt
: Download options (see example)
⭐ tag_window_field
Tag a JS value from the browser page
field
: The path to the relevant data
⭐ track_event_listeners
Start tracking the event listeners that the browser creates. It’s needed to run disable_event_listeners()
later
⭐ type
Enter text into an input (will wait for the input to appear before typing)
selector
: Element selector
text
: Text to enter
URL
URL class from NodeJS standard “url” module
url
: URL string
⭐ verify_requests
Monitor failed requests with a callback function
callback
: A function which will be called on each failed request with an object in the format: {url, error, type, response}
Video
Collect video data
src
: Video URL
⭐ wait
Wait for an element to appear on the page
selector
: Element selector
opt
: Wait options (see examples)
⭐ wait_any
Wait for any matching condition to succeed
wait_for_parser_value
Wait for a parser field to contain a value. This can be useful after you click something to wait for some data to appear
field
: The parser value path to wait on
validate_fn
: An optional callback function to validate that the value is correct
opt
: Extra options (e.g. timeout)
⭐ wait_for_text
Wait for an element on the page to include some text
selector
: Element selector
text
: The text to wait for
⭐ wait_hidden
Wait for an element to not be visible on the page (removed or hidden)
selector
: Element selector
⭐ wait_network_idle
Wait until the browser network has been idle for a given time
timeout
: Wait for the browser network to be idle for X milliseconds
options
: ignore: an array of patterns to exclude requests from monitoring; timeout: how long the network needs to be idle in milliseconds (default 500)
⭐ wait_page_idle
Wait until no changes are being made on the DOM tree for a given time
timeout
: Milliseconds to wait for no changes
options
: An object which can accept an ignore argument to exclude some elements from monitoring
⭐ wait_visible
Wait for an element to be visible on the page
selector
: Element selector
$
Helper for jQuery-like expressions
selector
: Element selector
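A minimal sketch (selectors and field names are hypothetical):

```js
// Read text and attributes with jQuery-like expressions
let title = $('h1').text().trim();
let next = $('a.next').attr('href');
collect({title, next});
```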
⭐ emulate_device
View pages as a mobile device. This command will change user agent and screen parameters (resolution and device pixel ratio)
device
: A string with the name of the device
Parser Functions
This article lists and explains the available commands within the Parser code for writing a scraper using the IDE.
input
Global variable available to the parser code
$
An instance of cheerio
location
A global variable available to the parser code. Object with info about current location
Image
Collect image data
Video
Collect video data
Money
Collect price/money data