The collect and parse commands have been removed. Instead, the parser code returns the data as an object or array, and it is saved to the output automatically:
```js
// Old code
navigate("https://example.com");
collect(parse());

// New code
navigate("https://example.com");

// New code alternative
navigate("https://example.com");
tag_html("html_key");
```
New commands have been added to capture data from the interaction code: tag_html, tag_request, and tag_graphql.
In addition, some existing commands have been updated: tag_response, tag_sitemap, and tag_all_responses. See the IDE documentation for more details.
When using any tag command, you can provide a custom name and then access the data in the parser code under parser.YOUR_KEY.
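For example (a minimal sketch; the URL and the key name listing are illustrative):

```js
// Interaction code: tag the page under a custom key
navigate('https://example.com/products');
tag_html('listing');

// Parser code: the tagged HTML is available under parser.listing,
// and the page URL under parser.listing_url
return { url: parser.listing_url, html: parser.listing };
```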
For tag_html, the current browser location URL is also saved, under parser.YOUR_KEY_url.
For simple cases where only a single tag_html is needed, the call can be skipped entirely, and the page will be saved automatically under parser.page.
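A minimal sketch of the implicit case, assuming parser.page holds the raw HTML string (the URL and output field name are illustrative):

```js
// Interaction code: no explicit tag_html needed
navigate('https://example.com');

// Parser code: the page HTML was saved automatically under parser.page
return { html_size: parser.page.length };
```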
Reparse is a new feature that allows you to reparse data that has already been collected. It is useful when you want to change the parser code without rerunning the entire interaction code:
When a scraper has more than one step, the parser code is only available in the last step. All other steps can only make next_stage calls. To parse something from the page in an earlier step, load_html should be used:
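A hedged sketch of an earlier step, assuming load_html parses an HTML string into a cheerio-style $ and next_stage({url}) queues input for the following step; the selector and the page_html variable are placeholders (check the IDE documentation for the exact signatures):

```js
// Earlier step: no parser code allowed here, only next_stage calls
navigate(input.url);
// page_html stands in for however this step obtained the HTML (assumption)
let $ = load_html(page_html);
$('a.company-link').toArray().forEach((el) => {
    next_stage({ url: $(el).attr('href') });
});
```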
```js
let url = new URL(input.url.replace('https://www.slintel.com', 'https://6sense.com'));
url = new URL(url.pathname, 'https://6sense.com');
navigate(url);
if (location.href === 'https://6sense.com/company')
    dead_page(`Page not found`);
tag_html('html');
```