Discover the Web Scraper API and our ready-made scrapers designed to streamline data collection and enhance dataset generation. Learn about initiating scrapes, managing data, and system limitations.
The Web Scraper API lets you extract data from websites programmatically. It offers both synchronous and asynchronous scraping methods for different use cases, from quick data retrieval to complex, large-scale extraction jobs.
The API handles real-time processing for up to 20 URL inputs and batch processing for larger collections, accommodating various scraping requirements.
Synchronous scraping (`/scrape`) lets you initiate a scrape and receive the results in a single request, ideal for quick data retrieval.
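For illustration, a synchronous call might look like the sketch below. The base URL, `dataset_id` parameter, and authentication header are placeholders following common REST conventions, not confirmed endpoint details; check the endpoint playground for the exact request shape.

```python
import requests

API_TOKEN = "YOUR_API_TOKEN"  # hypothetical credential placeholder

# Synchronous scrape: the response body contains the scraped records directly.
response = requests.post(
    "https://api.example.com/scrape",  # placeholder base URL (assumption)
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    params={"dataset_id": "YOUR_DATASET_ID"},  # assumed parameter name
    json=[{"url": "https://example.com/product/123"}],  # up to 20 URL inputs
    timeout=60,
)
response.raise_for_status()
print(response.json())
```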
Asynchronous scraping (`/trigger`) initiates a job that runs in the background, letting you handle larger and more complex scraping tasks in batch mode. Batch mode allows up to 100 concurrent requests, and each batch can process an input file of up to 1GB, making it ideal for high-volume data collection projects.
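A minimal sketch of an asynchronous trigger, under the same assumptions as above (placeholder base URL and parameter names); the response is assumed to contain a snapshot ID you can use to track the job:

```python
import requests

API_TOKEN = "YOUR_API_TOKEN"  # hypothetical credential placeholder

# Asynchronous trigger: returns immediately with a job identifier
# instead of the scraped data itself.
response = requests.post(
    "https://api.example.com/trigger",  # placeholder base URL (assumption)
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    params={"dataset_id": "YOUR_DATASET_ID"},  # assumed parameter name
    json=[
        {"url": "https://example.com/category/a"},
        {"url": "https://example.com/category/b"},
    ],
    timeout=60,
)
response.raise_for_status()
snapshot_id = response.json()["snapshot_id"]  # field name is an assumption
print("Job started, snapshot:", snapshot_id)
```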
Discovery tasks (finding related products, scraping multiple pages) require asynchronous scraping via `/trigger`, since they need to navigate and extract data across multiple web pages.
| Use case | Recommended method |
|---|---|
| Quick data checks | Synchronous (`/scrape`) |
| Single-page extraction | Synchronous (`/scrape`) |
| Multiple pages or URLs (batch processing of up to 5K URLs on average) | Asynchronous (`/trigger`) |
| Complex scraping patterns | Asynchronous (`/trigger`) |
| Large datasets | Asynchronous (`/trigger`) |
- Use `/scrape` for immediate results and simple extractions
- Use `/trigger` for complex scraping, multiple URLs, or large datasets

While running a discovery API, you can set a limit on the number of results per input provided.
In the example below, we've set a limit of 10 results per input.
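A sketch of what that request might look like, assuming a `limit_per_input` query parameter and a discovery-style job type; both names are illustrative, not confirmed endpoint parameters:

```python
import requests

API_TOKEN = "YOUR_API_TOKEN"  # hypothetical credential placeholder

# Discovery run: each input (e.g., a keyword) can fan out into many results,
# so we cap the fan-out at 10 results per input.
response = requests.post(
    "https://api.example.com/trigger",  # placeholder base URL (assumption)
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    params={
        "dataset_id": "YOUR_DATASET_ID",
        "type": "discover_new",   # assumed flag marking a discovery job
        "limit_per_input": 10,    # assumed parameter name for the cap
    },
    json=[{"keyword": "wireless headphones"}],
    timeout=60,
)
response.raise_for_status()
print(response.json())
```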
Additional actions are available through our other API endpoints:
- Check your snapshot history with this API. It returns a list of all available snapshots, including the snapshot ID, creation date, and status.
- Check your data collection status with this API. It returns "collecting" while gathering data, "digesting" while processing, and "ready" when the data is available.
- Cancel a running collection with this API to stop your data collection before it finishes. It returns "ok" when the collection is successfully stopped.
- Check your delivery status with this API. It returns "done" when the delivery has completed, "canceled" when the delivery was canceled, and "failed" when the delivery did not complete.
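For example, a simple polling loop over the collection-status endpoint might look like the sketch below; the `/progress/{snapshot_id}` path and `status` field name are assumptions for illustration, while the status values come from the list above.

```python
import time

import requests

API_TOKEN = "YOUR_API_TOKEN"  # hypothetical credential placeholder
snapshot_id = "SNAPSHOT_ID_FROM_TRIGGER"

# Poll until the collection is ready, sleeping between checks.
while True:
    resp = requests.get(
        f"https://api.example.com/progress/{snapshot_id}",  # assumed path
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()
    status = resp.json()["status"]  # "collecting" -> "digesting" -> "ready"
    if status == "ready":
        break
    time.sleep(10)  # avoid hammering the endpoint

print("Snapshot is ready for download")
```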
| Type | Size limit |
|---|---|
| Input | up to 1GB |
| Webhook delivery | up to 1GB |
| API download | up to 5GB (for bigger files, use the Delivery API) |
| Delivery API | unlimited |
To ensure stable performance and fair usage, the Web Scraper API enforces rate limits based on the type of request: single input or batch input. Exceeding these limits will result in a 429 error response.
The Web Scraper API supports the following maximum number of concurrent requests:
| Method | Rate limit |
|---|---|
| Single inputs | up to 500 concurrent requests |
| Batch inputs | up to 100 concurrent requests |
If your application exceeds these limits, the API will return the following error:
429 Client Error: Too Many Requests for URL
This error indicates that your request rate has surpassed the allowed threshold.
To reduce the number of concurrent requests and stay within the rate limits, consider the approach sketched below.
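One client-side pattern (a sketch, not an approach prescribed by these docs) is to cap in-flight requests with a bounded thread pool, keeping concurrency below the documented limits:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

API_TOKEN = "YOUR_API_TOKEN"  # hypothetical credential placeholder
URLS = [f"https://example.com/page/{i}" for i in range(50)]

def scrape_one(url: str) -> dict:
    """Scrape a single URL; placeholder endpoint, see the /scrape sketch above."""
    resp = requests.post(
        "https://api.example.com/scrape",  # placeholder base URL (assumption)
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json=[{"url": url}],
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

# max_workers caps concurrent requests well below the 500 single-input limit.
with ThreadPoolExecutor(max_workers=50) as pool:
    futures = [pool.submit(scrape_one, u) for u in URLS]
    for future in as_completed(futures):
        print(future.result())
```

Batching inputs into a single `/trigger` job instead of many parallel `/scrape` calls also lowers concurrency, and retrying with exponential backoff after a 429 response helps keep the client within the limits.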