Introduction

The collect and parse commands have been removed. Parser code now returns the data as an object or an array, and the result is saved to the output automatically.

New commands have been added to provide access to data from the interaction code: tag_html, tag_request, and tag_graphql.

In addition, some existing commands have been updated: tag_response, tag_sitemap, and tag_all_responses. See the IDE documentation for more details.

When using any tag command, you can provide a custom name. The data is then available in the parser code under parser.YOUR_KEY, where YOUR_KEY is the name you provided.

For tag_html, the current browser URL is saved under parser.YOUR_KEY_url.

In simple cases where only a single tag_html is needed, the call can be omitted, and the page is saved automatically under parser.page.
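
The naming rules above can be illustrated with a small, self-contained sketch. It does not run inside the IDE: the `parser` and `browser` objects, the body of `tag_html`, and the `finish_interaction` helper are hypothetical stand-ins written in plain JavaScript, and storing raw HTML strings under each key is an assumption made for illustration only. Only the key names (parser.YOUR_KEY, parser.YOUR_KEY_url, parser.page) come from the text above.

```javascript
// Hypothetical stand-ins for the IDE runtime (not the real implementation).
const parser = {};
const browser = {
  url: 'https://example.com/product/1',
  html: '<h1>Widget</h1>',
};

// Assumption: tag_html(name) stores the current HTML under `name`
// and the current browser URL under `${name}_url`.
function tag_html(name) {
  parser[name] = browser.html;
  parser[`${name}_url`] = browser.url;
}

// Assumption: when the interaction code never calls tag_html,
// the runtime saves the page under the default key "page".
function finish_interaction() {
  if (Object.keys(parser).length === 0) tag_html('page');
}

// Interaction code: tag the current page under a custom name.
tag_html('product');
finish_interaction();

// Parser code: return an object; the platform would save it to the output.
function parser_code(parser) {
  const match = /<h1>(.*?)<\/h1>/.exec(parser.product);
  return {
    title: match ? match[1] : null,
    url: parser.product_url,
  };
}

const record = parser_code(parser);
console.log(record);
```

Because the interaction code tagged the page as `product`, the sketch never creates the default `page` key. Removing the `tag_html('product')` call would leave the data under `parser.page` and `parser.page_url` instead.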

Sometimes it is necessary to get parsed data within the interaction code and use it to make a request. See the examples for how to do this.

Multiple results

To collect multiple results, return an array from the parser code.
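
As a rough illustration of returning an array, the sketch below builds one object per list item. The sample HTML, the `parser_code` wrapper, and the `to_records` helper are hypothetical. In particular, treating each array element as its own output record is an assumption about how the platform saves the result, not something the text above confirms.

```javascript
// Hypothetical tagged data, standing in for what the interaction code collected.
const parser = { page: '<li>Red</li><li>Green</li><li>Blue</li>' };

// Parser code: build one object per list item and return them as an array.
function parser_code(parser) {
  const items = [...parser.page.matchAll(/<li>(.*?)<\/li>/g)];
  return items.map((m, index) => ({ position: index + 1, color: m[1] }));
}

// Assumption: a single object is saved as one record,
// and an array is saved as one record per element.
function to_records(result) {
  return Array.isArray(result) ? result : [result];
}

const records = to_records(parser_code(parser));
console.log(records);
```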

Reparse

Reparse is a new feature that lets you run the parser code again on data that has already been collected. It is useful when you want to change the parser code without rerunning the entire interaction code.
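
The idea behind Reparse can be modelled in a few lines of plain JavaScript. This is only a conceptual sketch: `run_interaction`, `parser_v1`, and `parser_v2` are hypothetical names, and the real feature is triggered from the IDE rather than from code like this. The point it shows is that the collected data is produced once and then parsed by two different versions of the parser code.

```javascript
// Counts how many times the (expensive) interaction code runs.
let interaction_runs = 0;

// Hypothetical stand-in for the interaction code and its tagged data.
function run_interaction() {
  interaction_runs += 1;
  return { page: '<h1>Widget</h1><span class="price">19.99</span>' };
}

// First version of the parser code: title only.
function parser_v1(parser) {
  const title = /<h1>(.*?)<\/h1>/.exec(parser.page);
  return { title: title ? title[1] : null };
}

// Revised parser code that also extracts the price.
function parser_v2(parser) {
  const price = /<span class="price">(.*?)<\/span>/.exec(parser.page);
  return { ...parser_v1(parser), price: price ? Number(price[1]) : null };
}

const collected = run_interaction(); // collect once
const first = parser_v1(collected);  // initial parse
const second = parser_v2(collected); // reparse: no new interaction run
console.log(interaction_runs, first, second);
```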

next_stage and rerun_stage

When a scraper has more than one step, the parser code is only available in the last step. All other steps can only make next_stage calls. To parse something from the page in those steps, use load_html.
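
The flow between steps can be sketched with a small in-memory model. Everything here is hypothetical: the `site` map, the `queue`, the body of `next_stage`, and the `step_1` and `step_2` wrappers only imitate the behaviour described above, and a regular expression stands in for the place where load_html would be used in the IDE. The assumption is that each next_stage call passes its argument as the input of the following step.

```javascript
// Hypothetical in-memory site and stage queue (not the real IDE runtime).
const site = {
  'https://example.com/list':
    '<a href="https://example.com/p/1"></a><a href="https://example.com/p/2"></a>',
  'https://example.com/p/1': '<h1>First</h1>',
  'https://example.com/p/2': '<h1>Second</h1>',
};
const queue = [];

// Assumption: next_stage(input) schedules the following step with `input`.
function next_stage(input) {
  queue.push(input);
}

// Step 1: no parser code is available here, so the links are read in the
// interaction code. In the IDE this is where load_html would be used.
function step_1(input) {
  const html = site[input.url];
  for (const m of html.matchAll(/href="(.*?)"/g)) {
    next_stage({ url: m[1] });
  }
}

// Step 2 (the last step): the parser code returns the record.
function step_2(input) {
  const parser = { page: site[input.url], page_url: input.url };
  const title = /<h1>(.*?)<\/h1>/.exec(parser.page);
  return { title: title ? title[1] : null, url: parser.page_url };
}

step_1({ url: 'https://example.com/list' });
const output = queue.map(step_2);
console.log(output);
```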

Basic PDP scraper

Multiple navigates example

Multiple tag_response