Coding Environment & Tutorials
IDE Interaction code
These are all of the functions and objects available to interaction code in the IDE:
input
- Global object available to the interaction code. Provided by trigger input or next_stage() calls
navigate(input.url);
navigate
- Navigate the browser session to a URL
url
: A URL to navigate to
navigate([url]);
navigate(input.url);
navigate('https://example.com');
navigate options
// waits until DOM content loaded event is fired in the browser
navigate([url], {wait_until: 'domcontentloaded'});
// adds a referer to the navigation
navigate([url], {referer: [url]});
// the number of milliseconds to wait for. Default is 30000 ms
navigate([url], {timeout: 45000});
// add headers to the navigation
navigate([url], {header : 'accept: text/html'});
// specify browser width/height
navigate([url], {fingerprint: {screen: {width: 400, height: 400}}});
parse
- Parse the page data
let page_data = parse();
collect
- Adds a line of data to the dataset created by the crawler
data_line
: An object with the fields you want to collect
validate_fn
: Optional function to validate that the line data is valid
collect([data_line], [validate_fn]);
collect({ title: page_data.title, price: page_data.price });
collect({ price: data.price });
collect(line, l => { if (!l) throw new Error('Empty line'); });
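The validate_fn argument can be any function that throws when a line should be rejected. The following is a minimal sketch, assuming collect() treats a thrown error as a validation failure, as the last example above suggests. The helper name require_fields and the field names are hypothetical and not part of the IDE API.

```javascript
// Hypothetical helper: builds a validator that throws when the line is
// empty or a required field is missing. Not part of the IDE API.
function require_fields(...fields) {
  return line => {
    if (!line)
      throw new Error('Empty line');
    for (const field of fields) {
      const value = line[field];
      if (value === undefined || value === null || value === '')
        throw new Error(`Missing field: ${field}`);
    }
  };
}

const validate_product = require_fields('title', 'price');

// Inside the IDE the validator would be passed as the second argument:
//   collect({ title: page_data.title, price: page_data.price }, validate_product);
```

A price of 0 passes this check on purpose, since only undefined, null and empty strings are treated as missing.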
next_stage
- Run the next stage of the crawler with the specified input
input
: Input object to pass to the next browser session
next_stage({url: 'http://example.com', page: 1});
rerun_stage
- Run this stage of the crawler again with new input
input
: Input object to pass to the next browser session
rerun_stage({url: 'http://example.com/other-page'});
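rerun_stage lends itself to pagination: the stage can call itself with the next page until some limit is reached. The sketch below only computes the next input object. The input fields url and page, the ?page= query parameter and the helper next_page_input are all assumptions made for illustration.

```javascript
// Hypothetical helper: returns the input for the next page, or null once
// max_pages has been reached. Assumes the site paginates via ?page=N.
function next_page_input(input, max_pages) {
  const page = input.page || 1;
  if (page >= max_pages)
    return null;
  const u = new URL(input.url);
  u.searchParams.set('page', String(page + 1));
  return {url: u.href, page: page + 1};
}

// Inside the IDE this might be used as:
//   navigate(input.url);
//   const next = next_page_input(input, 5);
//   if (next) rerun_stage(next);
```

Keeping the page counter in the input object means each rerun knows where it is without any shared state between browser sessions.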
run_stage
- Run a specific stage of the crawler with a new browser session
stage
: Which stage to run (1 is the first stage)
input
: Input object to pass to the new browser session
run_stage(2, {url: 'http://example.com', page: 1});
country
- Configure your crawl to run from a specific country
code
: 2-character ISO country code
country(<code>);
wait
- Wait for an element to appear on the page
selector
: Element selector
opt
: Wait options (see examples)
wait(<selector>);
wait('#welcome-splash');
wait('.search-results .product');
wait('[href^="/product"]');
wait(<selector>, {timeout: 5000});
wait(<selector>, {hidden: true});
wait_for_text
- Wait for an element on the page to include some text
selector
: Element selector
text
: The text to wait for
wait_for_text(<selector>, <text>);
wait_for_text('.location', 'New York');
click
- Click on an element (will wait for the element to appear before clicking on it)
selector
: Element selector
click(<selector>);
click('#show-more');
type
- Enter text into an input (will wait for the input to appear before typing)
selector
: Element selector
text
: The text to enter
type(<selector>, <text>);
type('#location', 'New York');
type(<selector>, ['Enter']);
type(<selector>, ['Backspace']);
select
- Pick a value from a select element
selector
: Element selector
value
: The value to pick
select(<selector>, <value>);
select('#country', 'Canada');
URL
- URL class from the Node.js standard "url" module
url
: URL string
let u = new URL('https://example.com');
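Because this is the standard URL class, its usual properties and the searchParams helper are available. Here is a short sketch that builds a search URL from parts. The endpoint and the query parameter names are made up for illustration.

```javascript
// Build a search URL from a path and a base; the endpoint and parameter
// names are hypothetical.
const u = new URL('/search', 'https://example.com');
u.searchParams.set('q', 'running shoes');
u.searchParams.set('page', '2');

const search_url = u.href;   // full URL string, query included
const host = u.hostname;     // host name without scheme or path
const path = u.pathname;     // path without the query string
```

In interaction code the resulting string could then be passed on, for example with navigate(search_url).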
location
- Object with info about current location. Available fields: href
url
: URL string
navigate('https://example.com');
location.href;
tag_response
- Save the response data from a browser request
name
: The name of the tagged field
pattern
: The URL pattern to match
tag_response(<name>, <pattern>);
tag_response('teams', /\/api\/teams/);
navigate('https://example.com/sports');
let teams = parse().teams;
for (let team of teams) collect(team);
response_headers
- Returns the response headers of the last page load
let headers = response_headers();
console.log('content-type', headers['content-type']);
console
- Log messages from the interaction code
console.log(1, 'luminati', [1, 2], {key: 'value'});
load_more
- Scroll to the bottom of a list to trigger loading more items. Useful for lazy-loaded infinite-scroll sites
selector
: Element selector
load_more(<selector>);
load_more('.search-results');
scroll_to
- Scroll the page so that an element is visible
scroll_to(<selector>);
scroll_to('.author-profile');
$
- Helper for jQuery-like expressions
selector
: Element selector
$(<selector>);
wait($('.store-card'));
IDE Parser code
These are all of the functions and objects available to parser code in the IDE:
input
- Global variable available to the parser code
let url = input.url;
$
- An instance of cheerio
$('#example').text();
$('#example').attr('href');
location
- A global variable available to the parser code. Object with info about current location
let current_url = location.href;