Complete Web Scraper IDE Examples
Explore comprehensive examples of web scraping using the Web Scraper IDE, including code for interaction, parsing, handling multiple results, and advanced techniques.
Introduction
The collect and parse commands have been removed. Data is now returned from the parser code as an object or array, and it is automatically saved to the output.
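As a minimal sketch of parser code that returns an object: this assumes the parser code has access to a cheerio-style $ selector for the page. The h1 and .price selectors, the fakePage data, and the stand-in $ are hypothetical and exist only so the sketch runs outside the IDE.

```javascript
// Stand-in for a cheerio-style `$` (hypothetical; only for running outside the IDE).
const fakePage = { h1: '  Example product ', '.price': '$10.00' };
const $ = (selector) => ({ text: () => fakePage[selector] || '' });

// Parser code sketch: inside the IDE, the body of this function
// would be the parser code itself.
function parserCode() {
  // The returned object is the result; no collect() call is involved.
  return {
    title: $('h1').text().trim(),
    price: $('.price').text().trim(),
  };
}

const record = parserCode();
console.log(record);
```

The function wrapper is only needed here because a bare return is not valid at the top level of a standalone script.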
New commands have been added to provide access to the data from the interaction code: tag_html, tag_request, tag_graphql.
In addition, some existing commands have been updated: tag_response, tag_sitemap, tag_all_responses. See the IDE documentation for more details.
When using any tag command, you can provide a custom name. You can then access the data in the parser code under that name, as parser.YOUR_KEY.
For tag_html, the current browser location URL is also saved under parser.YOUR_KEY_url.
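The naming rule above can be sketched as follows. The navigate and tag_html functions below are hypothetical stand-ins written for this sketch, not the IDE's implementations. The single-argument tag_html(name) call and the plain HTML string stored under the key are assumptions, and the URL is a placeholder.

```javascript
// Hypothetical stand-ins for IDE globals, so the sketch runs outside the IDE.
const browser = { url: null, html: null };
const parser = {};

function navigate(url) {
  browser.url = url;
  browser.html = '<h1>Example product</h1>'; // canned page for the sketch
}

function tag_html(name) {
  // Mirrors the rule described above: data under NAME, URL under NAME_url.
  parser[name] = browser.html;
  parser[name + '_url'] = browser.url;
}

// Interaction code sketch.
navigate('https://example.com/product/1');
tag_html('product');

// What the parser code could then read.
console.log(parser.product);
console.log(parser.product_url);
```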
In simple cases where only a single tag_html is needed, the call can be omitted; the page is then saved automatically under parser.page.
Sometimes it is necessary to get parsed data within the interaction code and use it to make a request. See the examples on this page for how to do this.
Multiple results
To collect multiple results, return an array from the parser code.
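A minimal sketch of parser code that returns an array: it assumes some data was tagged under the custom name products and is available as parsed JSON with an items array. Both the name and the data shape are hypothetical.

```javascript
// Hypothetical tagged data; inside the IDE this would be filled by a tag command.
const parser = {
  products: {
    items: [
      { name: 'Red mug', price: 8.5 },
      { name: 'Blue mug', price: 9 },
    ],
  },
};

// Parser code sketch: inside the IDE, the body of this function
// would be the parser code itself.
function parserCode() {
  // Returning an array produces multiple results, one per element.
  return parser.products.items.map((item) => ({
    title: item.name,
    price: item.price,
  }));
}

const records = parserCode();
console.log(records.length);
```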
Reparse
Reparse is a new feature that lets you run the parser again on data that has already been collected. It is useful when you want to change the parser code without rerunning the entire interaction code.
next_stage and rerun_stage
When a scraper has more than one step, the parser code is only available in the last step. All other steps can only make next_stage calls. To parse something from the page in those steps, use load_html.
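The pattern described above can be sketched as an intermediate step that parses links with load_html and passes each one on with next_stage. It assumes load_html takes an HTML string and returns a cheerio-style selector function, and that next_stage accepts an input object with a url field. The stand-ins below are hypothetical: the load_html stub only supports the 'a' selector, and the HTML and base URL are made up. rerun_stage is not shown.

```javascript
// Hypothetical stand-ins for IDE globals, so the sketch runs outside the IDE.
const queued = [];
function next_stage(input) {
  queued.push(input); // record what would be passed to the next step
}

// Minimal stand-in for load_html: supports only $('a').toArray().
function load_html(html) {
  return (selector) => {
    if (selector !== 'a') throw new Error('stub only supports the "a" selector');
    const hrefs = [...html.matchAll(/<a\s+href="([^"]*)"/g)].map((m) => m[1]);
    // Cheerio elements expose their attributes via `attribs`.
    return { toArray: () => hrefs.map((href) => ({ attribs: { href } })) };
  };
}

// Interaction code sketch for an intermediate step (no parser code here).
const listingHtml = '<a href="/p/1">One</a><a href="/p/2">Two</a>';
const $ = load_html(listingHtml);
for (const link of $('a').toArray()) {
  // Resolve relative links against a placeholder base URL.
  next_stage({ url: new URL(link.attribs.href, 'https://example.com').href });
}

console.log(queued);
```

How the page HTML is obtained inside the IDE is not covered by this sketch; listingHtml stands in for it.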
Basic PDP scraper
Multiple navigates example
Multiple tag_response