We are thrilled to introduce a new product, Web Scraper API, designed to simplify and enrich your data acquisition process. This new service offers a more robust and streamlined way to collect data, facilitating more effective dataset generation tailored to your specific needs.

Overview

Data Collection APIs

Initiate a scrape

  1. Choose the target website from our variety of API offerings
  2. Provide the desired list of inputs via JSON or CSV
  3. Select whether to deliver the data by webhook or by API

Via Webhook:

  1. Select your preferred file format (JSON, NDJSON, JSON Lines, CSV)
  2. Set your webhook URL and Authorization header if needed
  3. Choose whether to send it compressed or not
  4. Test the webhook to validate that the operation runs successfully (using sample data)
  5. Copy the code and run it.

delivery-options.png
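As a rough sketch, the webhook steps above might translate into a single trigger request. The endpoint URL, query parameter names, and token below are illustrative placeholders, not the product's documented API; check your account's API reference for the exact values:

```python
import json
import urllib.parse
import urllib.request

# The endpoint URL and parameter names are illustrative assumptions.
API_TOKEN = "YOUR_API_TOKEN"

# Step 2 of "Initiate a scrape": the list of inputs, here as JSON
inputs = [
    {"url": "https://example.com/product/1"},
    {"url": "https://example.com/product/2"},
]

# Webhook delivery settings from the steps above
params = {
    "format": "ndjson",                               # preferred file format
    "endpoint": "https://my-server.example/webhook",  # your webhook URL
    "auth_header": "Bearer my-webhook-secret",        # optional Authorization header
    "uncompressed_webhook": "true",                   # send uncompressed
}

url = "https://api.example.com/scraper/trigger?" + urllib.parse.urlencode(params)
req = urllib.request.Request(
    url,
    data=json.dumps(inputs).encode(),
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment with real credentials to send
print(req.get_method(), req.full_url)
```

The request is only constructed here, not sent, so the snippet can be run safely while exploring the parameters.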

Via API:

  1. Select your preferred delivery location (S3, Google Cloud, Snowflake, or any other available option)
  2. Fill in the required credentials for your chosen destination
  3. Set your webhook URL and Authorization header if needed
  4. Select your preferred file format (JSON, NDJSON, JSON lines, CSV)
  5. Copy the code and run it.

deliver-snapshot.png
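For the API delivery path, the destination and credentials from steps 1-2 would typically be expressed as a delivery configuration object. The field names below are assumptions for illustration, not the exact schema:

```python
import json

# Hypothetical delivery config for an S3 destination; every field name
# here is an assumption -- consult the API reference for the real schema.
delivery = {
    "deliver": {
        "type": "s3",                                  # step 1: S3, Google Cloud, Snowflake, ...
        "bucket": "my-bucket",                         # step 2: destination details
        "credentials": {
            "aws-access-key": "YOUR_KEY",
            "aws-secret-key": "YOUR_SECRET",
        },
        "directory": "scrapes/",
        "filename": {"extension": "csv"},              # step 4: preferred file format
    }
}

print(json.dumps(delivery, indent=2))
```

Swapping the destination (e.g. to Google Cloud or Snowflake) would change the `type` and the credential fields, while the rest of the request stays the same.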

Limit records

While running a discovery API, you can set a limit on the number of results per input.

limit-per-input-disabled.png

In the example below, we’ve set a limit of 10 results per input.

limit-per-input-10.png
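A per-input cap like the one above would plausibly be passed as a request parameter. The parameter name `limit_per_input` is an assumption for illustration; consult the API reference for the exact name used by discovery APIs:

```python
import urllib.parse

# "limit_per_input" is an illustrative parameter name, not a documented one.
params = {"limit_per_input": 10, "format": "json"}
query = urllib.parse.urlencode(params)
print(query)  # limit_per_input=10&format=json
```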

Management APIs

management-apis.png

Get snapshot list

Check your snapshot history with this API. It returns a list of all available snapshots, including the snapshot ID, creation date, and status.

get-snapshot-list.png
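A minimal sketch of calling the snapshot-list API, assuming a simple bearer-token GET request (the endpoint path is a placeholder, not the documented URL):

```python
import json
import urllib.request

API_TOKEN = "YOUR_API_TOKEN"  # placeholder

# Illustrative endpoint path; the real path may differ.
req = urllib.request.Request(
    "https://api.example.com/scraper/snapshots",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
# with urllib.request.urlopen(req) as resp:   # uncomment with real credentials
#     snapshots = json.loads(resp.read())
# Each entry would include a snapshot ID, creation date, and status.
print(req.get_method())  # GET
```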

Monitor Progress

Check your data collection status with this API. It should return “collecting” while gathering data, “digesting” when processing, and “ready” when available.

monitor-progress.png
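Since the status moves through “collecting”, “digesting”, and “ready”, a client would typically poll this API until the snapshot is ready. A minimal sketch of such a loop, where `fetch_status` stands in for the actual HTTP GET:

```python
import time

def wait_until_ready(fetch_status, interval=5, max_tries=60):
    """Poll until the status is 'ready'.

    Statuses, per the docs: 'collecting' while gathering data,
    'digesting' when processing, 'ready' when available.
    fetch_status is a caller-supplied function wrapping the status API.
    """
    for _ in range(max_tries):
        status = fetch_status()
        if status == "ready":
            return status
        time.sleep(interval)
    raise TimeoutError("snapshot not ready within the allotted tries")

# Simulated status sequence for demonstration
states = iter(["collecting", "digesting", "ready"])
print(wait_until_ready(lambda: next(states), interval=0))  # ready
```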

System limitations

File size

Input: up to 1GB
Webhook delivery: up to 1GB
API download: up to 5GB (for bigger files, use API delivery)
Delivery API: unlimited