Coding Environment & Tutorials
Explore essential coding commands and best practices for using the Web Scraper IDE. Learn how to navigate, parse data, interact with elements, and optimize your scraping tasks efficiently.
IDE Interaction code
These are the commands available in IDE interaction code:
input
- Global object available to the interaction code. Provided by the trigger input or by next_stage() calls
navigate
- Navigate the browser session to a URL
url
: A URL to navigate to
navigate
options
parse
- Parse the page data
collect
- Adds a line of data to the dataset created by the crawler
data_line
: An object with the fields you want to collect
validate_fn
: Optional function to validate that the line data is valid
next_stage
- Run the next stage of the crawler with the specified input
input
: Input object to pass to the next browser session
rerun_stage
- Run this stage of the crawler again with new input
input
: Input object to pass to the next browser session
run_stage
- Run a specific stage of the crawler with a new browser session
input
: Input object to pass to the next browser session
stage
: Which stage to run (1 is first stage)
country
- Configure your crawl to run from a specific country
code
: 2-character ISO country code
wait
- Wait for an element to appear on the page
selector
: Element selector
opt
: wait options (see examples)
wait_for_text
- Wait for an element on the page to include some text
selector
: Element selector
text
: The text to wait for
click
- Click on an element (will wait for the element to appear before clicking on it)
selector
: Element selector
type
- Enter text into an input (will wait for the input to appear before typing)
selector
: Element selector
text
: The text to enter
select
- Pick a value from a select element
selector
: Element selector
URL
- URL class from the NodeJS standard “url” module
url
: URL string
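As a hedged illustration of the URL class, the sketch below derives the address of the next results page from the current one. The example address and the `page` query parameter are assumptions made for illustration; only standard URL and URLSearchParams calls are used, so the snippet also runs in plain Node.js.

```javascript
// Hypothetical search URL; in interaction code this would usually come from input.url.
const current = new URL('https://example.com/search?q=shoes&page=2');

// Read the current page number from the query string (falls back to 1 if absent).
const page = Number(current.searchParams.get('page') || 1);

// Build the URL of the following page without manual string concatenation.
const next = new URL(current.href);
next.searchParams.set('page', String(page + 1));
const nextUrl = next.href;
```

Inside the IDE, a value such as nextUrl could then be passed in the input object given to next_stage() or rerun_stage().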
location
- Object with info about current location. Available fields: href
url
: URL string
tag_response
- Save the response data from a browser request
name
: The name of the tagged field
pattern
: The URL pattern to match
response_header
- Returns the response headers of the last page load
console
- Log messages from the interaction code
load_more
- Scroll to the bottom of a list to trigger loading more items. Useful for lazy-loaded infinite-scroll sites
selector
: Element selector
scroll_to
- Scroll the page so that an element is visible
$
- Helper for jQuery-like expressions
selector
: Element selector
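The commands above can be combined into one interaction flow. The following is a minimal sketch, not a definitive implementation: the selectors, the input fields, and the shape of the object returned by parse() are all assumptions. The stubs at the top are stand-ins that only record calls so the snippet runs outside the IDE; inside the IDE these globals are already provided.

```javascript
// Stand-in stubs (assumption): they only record calls so the sketch runs
// outside the IDE. Inside the IDE, drop everything above the marker below.
const calls = [];
const record = (name) => (...args) => { calls.push([name, ...args]); };
const input = { url: 'https://example.com/products', keyword: 'shoes' }; // hypothetical input
const country = record('country');
const navigate = record('navigate');
const wait = record('wait');
const type = record('type');
const click = record('click');
const collect = record('collect');
const next_stage = record('next_stage');
// Fake parser output; the real shape depends on your parser code.
const parse = () => ({ title: 'Example results', links: ['https://example.com/p/1'] });

// --- Interaction code (hypothetical selectors and fields) ---
country('us');                      // run the crawl from a specific country
navigate(input.url);                // open the page given by the trigger input
wait('#search');                    // wait for the search box to appear
type('#search', input.keyword);     // enter the keyword
click('button[type=submit]');       // submit the form
wait('.results');                   // wait for the results to render

const data = parse();               // run the parser code on the current page
collect({ keyword: input.keyword, title: data.title });
for (const url of data.links) {
  next_stage({ url });              // queue each link for the next stage
}
```

Keeping navigation and waiting in interaction code and all field extraction in parser code keeps each stage short and easier to debug.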
IDE Parser code
These are the globals available in IDE parser code:
input
- Global variable available to the parser code
$
- An instance of cheerio
location
- A global variable available to the parser code. Object with info about current location
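To show how the three parser globals fit together, here is a minimal sketch under stated assumptions. The parsePage wrapper, the `h1` selector, the `keyword` input field, and the fake `$` are hypothetical and exist only so the snippet runs outside the IDE; in the IDE, input, $ and location are globals and $ is a real cheerio instance. How the IDE expects the parsed object to be returned is not covered in this section.

```javascript
// Hypothetical wrapper: in the IDE, input, $ and location are globals, so only
// the body of this function corresponds to parser code.
function parsePage($, input, location) {
  return {
    title: $('h1').text().trim(),   // cheerio: text content of the first-level heading
    url: location.href,             // address of the page being parsed
    keyword: input.keyword,         // value carried over from the stage input
  };
}

// Minimal stand-in for the cheerio instance, supporting only $(selector).text().
const fake$ = (selector) => ({
  text: () => (selector === 'h1' ? '  Example product ' : ''),
});

const result = parsePage(
  fake$,
  { keyword: 'shoes' },
  { href: 'https://example.com/p/1' }
);
```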