Custom Dataset API
The Custom Dataset API lets you request and manage data collections with finer control, so that the datasets you generate match your specific needs.
Understanding When to Use Each API:
Initial Collection Without Customer-Defined View:
Three API endpoints cover this workflow, each with a distinct purpose: one requests a collection, one checks its status, and one initiates it.
Requesting a Collection:
Endpoint: POST
https://api.brightdata.com/datasets/request_collection
Parameters:
Dataset ID
Collection type: discover_new OR url_collection
Input format: JSON (sent as an array) or CSV (sent as multipart)
Processing may take several minutes, depending on the number of inputs. When you request discovery (‘discover_new’), finding all links to product detail pages (PDPs) takes additional time.
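The request described above can be sketched in Python as follows. This is a minimal illustration, not the documented contract: the query parameter names `dataset_id` and `type`, the bearer-token header, and the lowercase `url` input key are assumptions that this page does not confirm, and the token and ID values are placeholders. The request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request

REQUEST_URL = "https://api.brightdata.com/datasets/request_collection"


def build_collection_request(api_token, dataset_id, collection_type, inputs):
    """Build (but do not send) a POST request carrying a JSON array of inputs."""
    if collection_type not in ("discover_new", "url_collection"):
        raise ValueError("collection_type must be 'discover_new' or 'url_collection'")
    # ASSUMPTION: 'dataset_id' and 'type' are illustrative parameter names.
    query = urllib.parse.urlencode({"dataset_id": dataset_id, "type": collection_type})
    return urllib.request.Request(
        f"{REQUEST_URL}?{query}",
        data=json.dumps(inputs).encode("utf-8"),
        headers={
            # ASSUMPTION: bearer-token authentication.
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_collection_request(
    api_token="YOUR_API_TOKEN",
    dataset_id="YOUR_DATASET_ID",
    collection_type="url_collection",
    inputs=[{"url": "https://example.com/product/1"}],  # ASSUMPTION: lowercase 'url' key
)
print(request.get_method(), request.full_url)
```

Once the parameter names are confirmed against the API reference, the request can be sent with `urllib.request.urlopen(request)`.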
Checking Status of the Collection Above:
Endpoint: GET
https://api.brightdata.com/datasets/request_collection
Parameters:
Request ID: obtain it from the response to the previous API call.
Freshness (in milliseconds): sets how recent the data must be.
If existing data falls within this period (e.g., you requested 1 week and the data was collected 5 days ago), no new scrape occurs. If the data is not fresh, we scrape it now.
- 1 week: 604,800,000 ms
- 1 month: 2,592,000,000 ms
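The freshness arithmetic above can be checked with a short Python sketch. The helper names `to_ms` and `needs_new_scrape` are hypothetical and only mirror the rule described here; how the service treats data whose age exactly equals the window is an assumption (treated as still fresh below).

```python
from datetime import datetime, timedelta, timezone

ONE_WEEK_MS = 604_800_000       # 7 days
ONE_MONTH_MS = 2_592_000_000    # 30 days


def to_ms(delta):
    """Convert a timedelta to whole milliseconds."""
    return delta // timedelta(milliseconds=1)


def needs_new_scrape(collected_at, freshness_ms, now=None):
    """Return True when the data is older than the freshness window."""
    now = now or datetime.now(timezone.utc)
    age_ms = to_ms(now - collected_at)
    # ASSUMPTION: data exactly as old as the window still counts as fresh.
    return age_ms > freshness_ms


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
print(to_ms(timedelta(days=7)))                                        # 604800000
print(needs_new_scrape(now - timedelta(days=5), ONE_WEEK_MS, now))     # False
print(needs_new_scrape(now - timedelta(days=10), ONE_WEEK_MS, now))    # True
```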
Response Indicating Number of Records and Freshness Found:
The request is still running:
Issue with one or more inputs (in this case, the url field was sent in uppercase, as URL):
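Because a status check can report that the request is still running, a client will usually poll. The sketch below shows one way to do that in Python. It deliberately assumes nothing about the response format: the caller supplies both the function that performs the status call and the predicate that decides whether a response means "still running". The function name and its defaults are hypothetical.

```python
import time


def poll_until_done(fetch_status, is_running, interval_s=30.0, max_attempts=20,
                    sleep=time.sleep):
    """Call fetch_status() until is_running(response) is false or attempts run out."""
    for attempt in range(max_attempts):
        response = fetch_status()
        if not is_running(response):
            return response
        # Wait before the next check, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"request still running after {max_attempts} status checks")


# Stand-in for a real status call, so the sketch runs without network access.
_fake_responses = iter(["running", "running", "done"])
result = poll_until_done(
    fetch_status=lambda: next(_fake_responses),
    is_running=lambda response: response == "running",
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(result)  # done
```

Passing the predicate in keeps the loop independent of the actual response fields, which are not shown on this page.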
Initiating a Collection:
Endpoint: POST
https://api.brightdata.com/datasets/initiate_collection
Parameters:
The unique identifier for the collection request you are inquiring about.
The time in milliseconds indicating the desired data freshness.
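As a rough illustration of this step, the Python sketch below builds the initiating request from a request identifier and a freshness value. The field names `request_id` and `freshness_ms`, the choice of a JSON body, and the bearer-token header are all assumptions; this page does not state the actual parameter names or where they are passed. The request is built but not sent.

```python
import json
import urllib.request

INITIATE_URL = "https://api.brightdata.com/datasets/initiate_collection"


def build_initiate_request(api_token, request_id, freshness_ms):
    """Build (but do not send) the POST that starts a previously requested collection."""
    if freshness_ms < 0:
        raise ValueError("freshness_ms must be non-negative")
    # ASSUMPTION: hypothetical field names, sent as a JSON body.
    payload = {"request_id": request_id, "freshness_ms": freshness_ms}
    return urllib.request.Request(
        INITIATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",  # ASSUMPTION: bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


# 604_800_000 ms is the 1-week freshness value.
initiate = build_initiate_request("YOUR_API_TOKEN", "YOUR_REQUEST_ID", 604_800_000)
print(json.loads(initiate.data))
```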
Collection After Defining a View:
Initiating a Collection:
Endpoint: POST
https://api.brightdata.com/datasets/initiate
Parameters:
Collection type: discover_new OR url_collection
Input format: JSON (sent as an array) or CSV (sent as multipart)
The dataset will be delivered according to the delivery settings configured for this view.
With these endpoints, you can tailor each data collection so that the resulting datasets match your project requirements.
How to retrieve results of snapshot that was already collected