Explore comprehensive examples of web scraping using the Web Scraper IDE, including code for interaction, parsing, handling multiple results, and advanced techniques.
The collect and parse commands have been removed. The data will be returned from parser code as an object or array, and it will be automatically saved to the output:
```js
// Old code
navigate("https://example.com");
collect(parse());

// New code
navigate("https://example.com");

// New code alternative
navigate("https://example.com");
tag_html("html_key");
```
New commands have been added to provide access to data from the interaction code: tag_html, tag_request, and tag_graphql.
In addition, some existing commands have been updated: tag_response, tag_sitemap, and tag_all_responses. See the IDE documentation for more details.
When using any tag command, you can provide a custom name and then access the tagged data in the parser code under parser.YOUR_KEY.
For tag_html, the current browser URL is also saved under parser.YOUR_KEY_url.
In simple cases where only a single tag_html call is needed, the name can be omitted, and the page will be saved automatically under parser.page.
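A minimal sketch of the two sides, assuming a product page and a custom key named products (the URL, key, and selector are illustrative, not part of the IDE API):

```js
// Interaction code: load the page and tag its HTML under a custom key
navigate('https://example.com/products');
tag_html('products');

// Parser code (separate editor): read the tagged HTML and its URL
return {
  title: $('h1').first().text(),   // parsed from parser.products
  source_url: parser.products_url, // URL captured alongside tag_html
};
```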
Reparse is a new feature that lets you reparse data that has already been collected. It is useful when you want to change the parser code without rerunning the entire interaction code.
When a scraper has more than one step, parser code is available only in the last step; all other steps may contain only next_stage calls. To parse something from the page in an earlier step, load_html should be used:
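A sketch of an earlier step that collects links and feeds them to the next stage, assuming load_html returns a cheerio-like handle; the html() call that fetches the current page HTML, the selector, and the next-stage input shape are assumptions for illustration:

```js
// Earlier step: interaction code only, no parser code here
navigate(input.url);

// Parse the current page inside the interaction code
// (html() as the source of the page HTML is an assumption)
let $ = load_html(html());

// Queue each company link as a separate next-stage input
$('a.company-link').toArray().forEach((el) => {
  next_stage({ url: new URL($(el).attr('href'), location.href).href });
});
```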
```js
let url = new URL(input.url.replace('https://www.slintel.com', 'https://6sense.com'));
url = new URL(url.pathname, 'https://6sense.com');
navigate(url);
if (location.href === 'https://6sense.com/company')
  dead_page(`Page not found`);
tag_html('html');
```
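The URL rewriting at the top of that snippet is plain JavaScript and works outside the IDE; this standalone sketch reproduces it as a function (the sample input URL is illustrative):

```javascript
// Rewrite a slintel.com URL to its 6sense.com equivalent.
function rewriteUrl(inputUrl) {
  // First swap the host, then rebuild from the pathname alone,
  // which drops any query string and hash fragment.
  let url = new URL(inputUrl.replace('https://www.slintel.com', 'https://6sense.com'));
  url = new URL(url.pathname, 'https://6sense.com');
  return url.href;
}

console.log(rewriteUrl('https://www.slintel.com/company/acme?ref=home'));
// → https://6sense.com/company/acme
```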