Overview
Discover the Web Scraper API and use our ready-made scrapers to streamline data collection and dataset generation. Learn how to initiate scrapes, manage data, and work within the system limitations.
We are thrilled to introduce the Web Scraper API, designed to simplify and enrich your data acquisition process. This new service offers a more robust and streamlined way to collect data according to your specific needs. Real-time collection supports up to 20 URL inputs, and batch collection handles more than 20 inputs, regardless of the scraper type.
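The 20-input threshold can be expressed as a small helper. This is a minimal sketch: the function name `choose_mode` and the labels "realtime" and "batch" are illustrative assumptions, not identifiers taken from the API.

```python
REALTIME_MAX_INPUTS = 20  # from the overview: real-time handles up to 20 inputs


def choose_mode(inputs: list[str]) -> str:
    """Return "realtime" for 1 to 20 inputs and "batch" for more than 20."""
    if not inputs:
        raise ValueError("at least one input URL is required")
    return "realtime" if len(inputs) <= REALTIME_MAX_INPUTS else "batch"


urls = [f"https://example.com/item/{i}" for i in range(25)]
print(choose_mode(urls[:20]))  # exactly 20 inputs still fits real-time
print(choose_mode(urls))       # 25 inputs needs batch
```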
How to collect (high level)
Trigger a Collection (Demo)
- Choose the target website from our range of API offerings.
- Pick the specific scraper you need.
- Provide the list of inputs as JSON or CSV.
- Enable the “Include errors report with the results” toggle.
- Enable either the “Deliver results to external storage” toggle or the “Send to webhook” toggle, depending on where you want the data delivered.
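As an illustration of the input step, the sketch below converts a CSV list of inputs into the equivalent JSON array. The `url` column name is an assumption; the actual input fields depend on the scraper you picked.

```python
import csv
import io
import json


def csv_to_inputs(csv_text: str) -> list[dict]:
    """Parse CSV text (header row plus one input per row) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep only rows that contain at least one non-empty cell.
    return [dict(row) for row in reader if any(row.values())]


sample_csv = "url\nhttps://example.com/a\nhttps://example.com/b\n"
inputs = csv_to_inputs(sample_csv)
print(json.dumps(inputs, indent=2))
```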
Via webhook:
- Set your webhook URL and, if needed, an Authorization header.
- Select your preferred file format (JSON, NDJSON, JSON lines, CSV).
- Choose whether to send the data compressed or uncompressed.
- Test the webhook with sample data to confirm that the operation runs successfully.
- Copy the code and run it.
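On the receiving side, a webhook handler has to account for the format and compression choices made above. The following is a rough sketch only: it assumes that compressed payloads use gzip and that the body arrives as raw bytes, neither of which is stated on this page, and `parse_webhook_body` is a hypothetical helper name.

```python
import gzip
import json


def parse_webhook_body(body: bytes, fmt: str = "json", compressed: bool = False) -> list[dict]:
    """Decode a webhook payload into a list of records."""
    if compressed:
        body = gzip.decompress(body)  # assumption: gzip compression
    text = body.decode("utf-8")
    if fmt == "json":
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if fmt in ("ndjson", "jsonl"):
        # One JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt}")


sample = b'{"title": "A"}\n{"title": "B"}\n'
print(parse_webhook_body(gzip.compress(sample), fmt="ndjson", compressed=True))
```

CSV payloads are left out of the sketch; they could be handled with the standard `csv` module in the same way.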
Via external storage:
- Select your preferred delivery location (S3, Google Cloud, Snowflake, or any other available option).
- Enter the credentials required for the location you picked.
- Select your preferred file format (JSON, NDJSON, JSON lines, CSV).
- Copy the code and run it.
Limit records
When running a discovery API, you can limit the number of results returned per input. For example, a limit of 10 returns at most 10 results for each input provided.
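A request with such a limit might be assembled as shown here. Treat every name as a placeholder: the base URL `https://api.example.com/trigger` and the query parameters `scraper_id` and `limit_per_input` are assumptions made for illustration, so check the endpoint reference for the real names.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/trigger"  # hypothetical placeholder URL


def build_trigger_url(scraper_id: str, limit_per_input: int) -> str:
    """Build a trigger URL that caps the number of results per input."""
    if limit_per_input < 1:
        raise ValueError("limit_per_input must be a positive integer")
    # Parameter names below are assumed, not taken from the API reference.
    query = urlencode({"scraper_id": scraper_id, "limit_per_input": limit_per_input})
    return f"{BASE_URL}?{query}"


print(build_trigger_url("demo_scraper", 10))
```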
Management APIs
Additional actions you can perform using our other API endpoints.
Get snapshot list
Check your snapshot history with this API. It returns a list of all available snapshots, including the snapshot ID, creation date, and status. (link to endpoint playground)
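Once the list is retrieved, it can be filtered locally. The sketch below picks the most recent snapshot that is ready; the field names `id`, `created`, and `status`, the ISO 8601 date format, and the reuse of the “ready” status value are all assumptions about the response shape.

```python
from datetime import datetime
from typing import Optional


def latest_ready(snapshots: list[dict]) -> Optional[dict]:
    """Return the most recently created snapshot whose status is "ready"."""
    ready = [s for s in snapshots if s["status"] == "ready"]
    if not ready:
        return None
    return max(ready, key=lambda s: datetime.fromisoformat(s["created"]))


# Hypothetical response data, for illustration only.
sample_snapshots = [
    {"id": "snap_1", "created": "2024-05-01T10:00:00", "status": "ready"},
    {"id": "snap_2", "created": "2024-05-03T09:30:00", "status": "ready"},
    {"id": "snap_3", "created": "2024-05-04T08:00:00", "status": "collecting"},
]
print(latest_ready(sample_snapshots))
```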
Monitor Progress
Check your data collection status with this API. It returns “collecting” while data is being gathered, “digesting” while it is being processed, and “ready” once it is available. (link to endpoint playground)
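A typical way to use these status values is a polling loop. The sketch below is deliberately transport-agnostic: `get_status` is a caller-supplied function (hypothetical here) that performs the actual API call and returns the status string, and the interval and retry count are arbitrary defaults.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status is "ready"; fail on unknown statuses or timeout."""
    for _ in range(max_polls):
        status = get_status()
        if status == "ready":
            return status
        if status not in ("collecting", "digesting"):
            raise RuntimeError(f"unexpected status: {status!r}")
        sleep(interval_s)
    raise TimeoutError(f"snapshot not ready after {max_polls} polls")


# Simulated status sequence instead of a real API call.
fake_statuses = iter(["collecting", "digesting", "ready"])
print(wait_until_ready(lambda: next(fake_statuses), sleep=lambda _s: None))
```

Injecting `sleep` keeps the helper easy to test without real delays.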
Cancel snapshot
Stop a running data collection before it finishes with this API. It returns “ok” when the collection is stopped successfully. (link to endpoint playground)
Monitor Delivery
Check your delivery status with this API. It returns “done” when the delivery completed, “canceled” when the delivery was canceled, and “Failed” when the delivery did not complete. (link to endpoint playground)
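Client code usually reduces these three values to a success flag. A minimal sketch, assuming the status arrives as a plain string; it lowercases the value before comparing because the casing shown above is mixed, and `delivery_succeeded` is an illustrative name.

```python
def delivery_succeeded(status: str) -> bool:
    """Map a delivery status string to True (done) or False (canceled/failed)."""
    normalized = status.strip().lower()
    if normalized == "done":
        return True
    if normalized in ("canceled", "failed"):
        return False
    raise ValueError(f"unknown delivery status: {status!r}")


for value in ("done", "canceled", "Failed"):
    print(value, delivery_succeeded(value))
```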
System limitations
File size
| Input | up to 1 GB |
| Webhook delivery | up to 1 GB |
| API download | up to 5 GB (for larger files, use the Delivery API) |
| Delivery API | unlimited |
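These limits can guide which retrieval method to use for a given result size. The sketch below encodes the three output-side limits; the method labels are illustrative, and treating 1 GB as 1024³ bytes is an assumption, since the page does not say which unit is meant.

```python
GB = 1024 ** 3  # assumption: binary gigabytes

# Maximum result size per retrieval method (None means unlimited).
RETRIEVAL_LIMITS = {
    "webhook": 1 * GB,
    "api_download": 5 * GB,
    "delivery_api": None,
}


def retrieval_options(size_bytes: int) -> list[str]:
    """List the retrieval methods whose size limit can hold the result."""
    return [
        name
        for name, limit in RETRIEVAL_LIMITS.items()
        if limit is None or size_bytes <= limit
    ]


print(retrieval_options(3 * GB))
```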