Trigger data collection API
Learn how to trigger data collection using the Web Scraper API with options for discovery and PDP scrapers. Customize requests, set delivery options, and retrieve data efficiently.
Endpoint: POST /datasets/v3/trigger
Creates a request for data collection.
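As a sketch, a trigger request can be assembled with Python's standard library. The base URL and token below are placeholder assumptions (not given in this page); the dataset ID and input URL come from the examples in this document. The request is constructed but not sent.

```python
import json
import urllib.parse
import urllib.request

# Assumed values -- replace with your real API base URL and token.
BASE_URL = "https://api.example.com"  # placeholder host, not the real one
API_TOKEN = "YOUR_API_TOKEN"          # placeholder bearer token
DATASET_ID = "gd_l1vikfnt1wgvvqz95w"  # dataset ID from the docs example

# Query parameters for the trigger endpoint.
params = urllib.parse.urlencode({
    "dataset_id": DATASET_ID,
    "include_errors": "true",
})

# JSON body: an array of scraper inputs (see the Body section below).
body = json.dumps([{"url": "https://www.airbnb.com/rooms/50122531"}]).encode()

req = urllib.request.Request(
    f"{BASE_URL}/datasets/v3/trigger?{params}",
    data=body,
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would actually send it; omitted here.
print(req.full_url)
```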
Request
Dataset ID for which data collection is triggered. You can see our available datasets here.
Example:
dataset_id=gd_l1vikfnt1wgvvqz95w
If you want to trigger a collection that includes a discovery phase, pass type=discover_new. This must always be sent when discover_by is provided.
Example:
type=discover_new
Specifies which discovery method to use.
Example:
discover_by=keyword
Available options: keyword, best_sellers_url, category_url, location, and more (according to the specific API)
Limits the number of results per input when a collection includes a discovery phase.
Example: when discovering by keyword, limit to 10 results per keyword.
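Putting the discovery parameters together, the query string might look like the sketch below. The name limit_per_input for the per-input limit is an assumption (this page does not name the parameter); check the API reference for the exact name.

```python
import urllib.parse

# Query parameters for a discovery-phase collection.
# "limit_per_input" is an ASSUMED name for the per-input limit parameter.
params = {
    "dataset_id": "gd_l1vikfnt1wgvvqz95w",  # dataset ID from the docs example
    "type": "discover_new",                 # required whenever discover_by is sent
    "discover_by": "keyword",
    "limit_per_input": 10,                  # e.g. at most 10 results per keyword
}
query = urllib.parse.urlencode(params)
print(query)
```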
Ensures the output includes an errors report for easier troubleshooting.
Example:
include_errors=true
URL where the notification will be sent once the collection is finished. The notification will contain snapshot_id and status.
Example:
notify=https://notify-me.com/
Authorization header to be used when sending a notification to the notify URL or delivering data via the webhook endpoint.
Example:
auth_header=QWxhZGRpbjpPcGVuU2VzYW1l
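The example auth_header value above is the Base64 encoding of the Basic-auth-style credential string "Aladdin:OpenSesame". A minimal sketch of producing such a value:

```python
import base64

# The docs' example auth_header value is the Base64 encoding of
# the credential string "Aladdin:OpenSesame".
credentials = "Aladdin:OpenSesame"
auth_header = base64.b64encode(credentials.encode()).decode()
print(auth_header)  # QWxhZGRpbjpPcGVuU2VzYW1l
```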
Webhook URL where data will be delivered.
Example:
endpoint=https://webhook-url.com
Specifies the format of the data to be delivered to the webhook endpoint.
Supported formats: JSON, NDJSON, JSONL, CSV
Example:
format=json
By default, the data is sent to the webhook compressed. Pass true to send it uncompressed.
Example:
uncompressed_webhook=true
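The webhook-related options above can be combined into a single query string, as in this sketch (the URLs are the placeholder examples from this page):

```python
import urllib.parse

# Webhook delivery options combined into one query string.
params = {
    "dataset_id": "gd_l1vikfnt1wgvvqz95w",  # dataset ID from the docs example
    "endpoint": "https://webhook-url.com",   # webhook URL for data delivery
    "format": "json",                        # one of JSON, NDJSON, JSONL, CSV
    "uncompressed_webhook": "true",          # deliver uncompressed (default is compressed)
    "notify": "https://notify-me.com/",      # notification URL for completion
}
query = urllib.parse.urlencode(params)
print(query)
```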
Additional delivery methods: You can use the snapshot_id returned from this API call to trigger a delivery to a specific storage (Amazon S3, Microsoft Azure, etc.) via the delivery API, or use the download API to download it directly.
Body
The inputs to be used by the scraper. Can be provided either as JSON or as a CSV file:
Content-Type: application/json
A JSON array of inputs
Example:
[{"url":"https://www.airbnb.com/rooms/50122531"}]
Content-Type: multipart/form-data
A CSV file, in a field called data
Example (curl):
data=@path/to/your/file.csv
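For the CSV variant, the file uploaded in the data field would contain one column per input key. Assuming the CSV columns mirror the JSON input keys (here, a single url column), a sketch of building such a file in memory:

```python
import csv
import io

# Build the CSV input that would be uploaded in the "data" field
# (curl equivalent: -F 'data=@inputs.csv').
# Assumption: CSV column names mirror the JSON input keys.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["url"])  # header row matching the scraper's input field
writer.writerow(["https://www.airbnb.com/rooms/50122531"])
csv_body = buf.getvalue()
print(csv_body)
```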
To learn more about scraper inputs, see each scraper's documentation.
Web Scraper Types
Each scraper can require different inputs. There are two main types of scrapers:
1. PDP
These scrapers require URLs as inputs. A PDP scraper extracts detailed product information such as specifications, pricing, and features from web pages.
2. Discovery
Discovery scrapers allow you to explore and find new entities/products through search, categories, keywords, and more.
Request examples
PDP with URL input
The input format for PDP is always a URL, pointing to the page to be scraped.
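A sketch of a PDP input body: a JSON array of objects, each holding the URL of a page to scrape. The first URL is the example from this page; the second is hypothetical.

```python
import json

# PDP inputs: a JSON array of objects, each pointing at a page to scrape.
pdp_inputs = [
    {"url": "https://www.airbnb.com/rooms/50122531"},
    {"url": "https://www.airbnb.com/rooms/12345678"},  # hypothetical second URL
]
body = json.dumps(pdp_inputs)
print(body)
```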
Discovery input based on the discovery method
The input format for discovery can vary according to the specific scraper. Inputs can be keywords, category URLs, locations, and more. Find out what inputs each scraper requires in the scraper's documentation.
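As an illustration, a keyword-based discovery input might look like the sketch below. The keyword field name and the values are assumptions for illustration; each scraper defines its own input fields.

```python
import json

# Discovery inputs vary by discover_by method; a keyword-based sketch.
# The "keyword" field name and values are ASSUMPTIONS -- check the
# specific scraper's documentation for its actual input fields.
keyword_inputs = [
    {"keyword": "espresso machine"},
    {"keyword": "milk frother"},
]
print(json.dumps(keyword_inputs))
```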
Returns
An object containing snapshot_id, which represents the ID of your request and can be used in subsequent API calls.
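A minimal sketch of extracting snapshot_id from the trigger response; the response text and ID below are placeholders, not real API output.

```python
import json

# A placeholder trigger response; the real call returns an object
# containing snapshot_id. "s_example123" is a made-up ID.
response_text = '{"snapshot_id": "s_example123"}'
snapshot_id = json.loads(response_text)["snapshot_id"]

# snapshot_id can then be passed to the delivery or download APIs.
print(snapshot_id)
```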