Overview
Scraper Studio IDE uses two types of code to build web scrapers:
- Interaction Code - Controls browser automation and navigation
- Parser Code - Extracts and structures data from HTML
IDE Interaction code
What It Does
Interaction code controls a real browser session to:
- Navigate to URLs
- Wait for elements to load
- Click buttons and links
- Fill out forms
- Scroll pages
- Handle dynamic content (JavaScript-heavy sites)
input - Global object available to the interaction code. Provided by trigger input or next_stage() calls
navigate - Navigate the browser session to a URL
url: A URL to navigate to
navigate options
parse - Parse the page data
collect - Adds a line of data to the dataset created by the crawler
data_line: An object with the fields you want to collect
validate_fn: Optional function to validate that the line data is valid
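As an illustration of how a validate_fn might be paired with collect, here is a minimal sketch. The `collect` defined below is only a stand-in so the snippet runs outside the IDE, and it assumes that a validator rejects a line by throwing an error; the `validateProduct` helper and the `title`/`price` fields are hypothetical.

```javascript
// Stand-in for the IDE's collect(), defined here only so the sketch runs
// outside Scraper Studio. Assumption: a validate_fn rejects a line by throwing.
const dataset = [];
function collect(data_line, validate_fn) {
  if (validate_fn) validate_fn(data_line);
  dataset.push(data_line);
}

// Hypothetical validator: require a non-empty title and a numeric price.
function validateProduct(line) {
  if (!line.title) throw new Error('Product is missing a title');
  if (typeof line.price !== 'number' || Number.isNaN(line.price)) {
    throw new Error('Product price must be a number');
  }
}

// A valid line is added to the dataset.
collect({ title: 'Blue mug', price: 12.5 }, validateProduct);

// An invalid line is rejected before it reaches the dataset.
let rejected = null;
try {
  collect({ title: '', price: 12.5 }, validateProduct);
} catch (err) {
  rejected = err.message;
}
console.log(dataset.length, rejected);
```

Keeping the checks in a named function makes it easy to reuse the same rules for every line a stage collects.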
next_stage - Run the next stage of the crawler with the specified input
input: Input object to pass to the next browser session

rerun_stage - Run this stage of the crawler again with new input
input: Input object to pass to the next browser session
run_stage - Run a specific stage of the crawler with a new browser session
input: Input object to pass to the next browser session
stage: Which stage to run (1 is first stage)
country - Configure your crawl to run from a specific country
code: 2-character ISO country code
wait - Wait for an element to appear on the page
selector: Element selector
opt: wait options (see examples)
wait_for_text - Wait for an element on the page to include some text
selector: Element selector
text: The text to wait for
click - Click on an element (will wait for the element to appear before clicking on it)
selector: Element selector
type - Enter text into an input (will wait for the input to appear before typing)
selector: Element selector
text: The text to enter
select - Pick a value from a select element
selector: Element selector
URL - URL class from NodeJS standard “url” module
url: URL string
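A short sketch of what the URL class can be used for, assuming the IDE's URL behaves like the standard URL class in Node.js. The base address and query parameters are made up for illustration.

```javascript
// URL is also a global in modern Node.js, so no import is needed here.
// The base address and query parameters below are made up for illustration.
const target = new URL('/search', 'https://example.com');
target.searchParams.set('q', 'coffee mug');
target.searchParams.set('page', '2');

console.log(target.href);                  // https://example.com/search?q=coffee+mug&page=2
console.log(target.hostname);              // example.com
console.log(target.searchParams.get('q')); // coffee mug
```

Building URLs this way avoids hand-escaping query strings before passing them on to navigation or to another stage.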
location - Object with info about current location. Available fields: href
url: URL string
tag_response - Save the response data from a browser request
name: The name of the tagged field
pattern: The URL pattern to match
response_header - Returns the response headers of the last page load
console - Log messages from the interaction code
load_more - Scroll to the bottom of a list to trigger loading more items. Useful for lazy-loaded infinite-scroll sites
selector: Element selector
scroll_to - Scroll the page so that an element is visible
$ - Helper for jQuery-like expressions
selector: Element selector
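The functions above are typically combined into a short script. The following is a rough sketch of such a flow, not the IDE's actual implementation: every function is replaced by a stand-in that only records the call, so the snippet runs outside Scraper Studio. The selectors, the `keyword` input field, and the assumption that parse() returns the parser code's result are all hypothetical.

```javascript
// Stand-ins that only record calls, so the sketch runs outside Scraper Studio.
// In the IDE these names are provided for you; their real behaviour may differ.
const calls = [];
const dataset = [];
const input = { url: 'https://example.com/products', keyword: 'mug' }; // hypothetical trigger input
const navigate = (url) => calls.push(`navigate ${url}`);
const wait = (selector) => calls.push(`wait ${selector}`);
const type = (selector, text) => calls.push(`type ${selector} ${text}`);
const click = (selector) => calls.push(`click ${selector}`);
const parse = () => ({ title: 'Blue mug', price: 12.5 }); // assumed: returns the parser code's result
const collect = (data_line) => dataset.push(data_line);

// The interaction code itself: open the page, search, wait, then collect.
navigate(input.url);
type('#search', input.keyword);        // hypothetical selector
click('button[type=submit]');          // hypothetical selector
wait('.results');                      // hypothetical selector
const data = parse();
collect({ url: input.url, title: data.title, price: data.price });

console.log(calls.join('\n'));
console.log(dataset);
```

Inside the IDE only the six lines of interaction code would be written; the stand-ins exist purely to make the order of operations visible.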
IDE Parser code
Overview
Parser code is responsible for extracting and structuring data from HTML content. Scraper Studio parser code uses the pre-installed Cheerio library, which provides jQuery-like syntax for parsing HTML documents.
$ - An instance of cheerio
location - A global variable available to the parser code. An object containing information about the current location.
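To show how `$` and `location` might be used together, here is a minimal sketch of parser logic. It assumes the parser produces a plain object; the selectors and field names are hypothetical. So that it runs without installing Cheerio, `$` is replaced by a tiny stand-in that supports only `$(selector).text()`, which is a real Cheerio call.

```javascript
// Parser logic written as a function of `$` and `location`, which the IDE
// supplies as globals. Selectors and field names are hypothetical.
function parsePage($, location) {
  return {
    url: location.href,
    title: $('h1').text().trim(),
    // Strip currency symbols and other non-numeric characters before converting.
    price: Number($('.price').text().replace(/[^0-9.]/g, '')),
  };
}

// Minimal stand-in for Cheerio so the sketch runs without dependencies.
// It supports only `$(selector).text()`, backed by a lookup table.
const fakePage = { 'h1': '  Blue mug ', '.price': '$12.50' };
const $ = (selector) => ({ text: () => fakePage[selector] || '' });
const fakeLocation = { href: 'https://example.com/products/1' };

const result = parsePage($, fakeLocation);
console.log(result);
```

Trimming text and converting numbers in the parser keeps the collected data clean, so the interaction code only has to pass the result along.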