Our Archive API allows you to access and retrieve Data Snapshots from Bright Data’s cached data collections in a seamless and efficient manner.
To access this API, you will need a Bright Data API key.
To initiate a search of our Archive, use the following /search endpoint. Endpoint: POST api.brightdata.com/webarchive/search
Request
POST api.brightdata.com/webarchive/search
{
    filters: {
        max_age?: Duration,
        min_date?: yyyy-mm-dd,
        max_date?: yyyy-mm-dd,
        domain_whitelist?: ['example.com'],
        domain_blacklist?: ['example.com'],
        domain_regex_whitelist?: ['.*example..*'],
        domain_regex_blacklist?: ['.*example..*'],
        domain_like_whitelist?: ['%.example.%', 'example%'],
        domain_like_blacklist?: ['%.example.ca'],
        category_whitelist?: ['Automotive'],
        category_blacklist?: ['Automotive'],
        url_regex_whitelist?: ['.*/products/.*'],
        url_regex_blacklist?: ['.*/products/.*'],
        url_like_whitelist?: ['%/products/%', '%/search%'],
        url_like_blacklist?: ['%/review/%'],
        language_whitelist?: ['eng'], // ISO 639-3 language codes (three letters)
        language_blacklist?: ['eng'],
        ip_country_whitelist?: ['us', 'ie', 'in'],
        ip_country_blacklist?: ['mx', 'ae', 'ca'],
        captcha?: true,
        robots_block?: true,
    }
}
You can run up to 100 searches per day without triggering a dump. Once you trigger a dump, that search no longer counts against your limit.
LIKE vs Regex Filters: Use LIKE filters (domain_like_*, url_like_*) for simple pattern matching with % (any sequence) and _ (single character). LIKE patterns are case-insensitive and often faster than regex for simple prefix/suffix matching like %.com or amazon%. Use regex filters (domain_regex_*, url_regex_*) for complex patterns requiring full regex syntax. LIKE patterns use backslash escaping: \% for literal %, \_ for literal _.
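For example, the request below issues a search over a one-day window using a mix of LIKE and language filters. This is a minimal sketch in TypeScript (Node 18+ with the global fetch API); the Bearer authorization scheme and the search_id field in the response are assumptions, not confirmed by this page.

// Hedged sketch: start an Archive search. Assumes the API key is sent as a
// Bearer token and that the response body includes a search_id field.
const API_KEY = process.env.BRIGHTDATA_API_KEY!;

async function startSearch(): Promise<string> {
  const res = await fetch('https://api.brightdata.com/webarchive/search', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${API_KEY}`, // assumption: Bearer auth
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      filters: {
        max_age: '1d', // recommended for initial testing (see below)
        domain_like_whitelist: ['%.example.com'],
        url_like_whitelist: ['%/products/%'],
        language_whitelist: ['eng'],
      },
    }),
  });
  if (!res.ok) throw new Error(`search failed: ${res.status}`);
  return (await res.json()).search_id; // assumption: field name
}

startSearch().then((id) => console.log('search_id:', id));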

Get Search Status

Check the status of a specific query. Endpoint: GET api.brightdata.com/webarchive/search/<search_id> When successful, it returns:
  • The number of entries for your query
  • The estimated size and cost of the full Data Snapshot
GET api.brightdata.com/webarchive/search/<search_id>
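As a sketch, the status check is a plain GET under the same assumed Bearer auth; the response shape noted in the comment is illustrative, since this page does not spell out the schema.

// Hedged sketch: fetch the status of one search by its search_id.
const API_KEY = process.env.BRIGHTDATA_API_KEY!;

async function getSearchStatus(searchId: string) {
  const res = await fetch(
    `https://api.brightdata.com/webarchive/search/${searchId}`,
    { headers: { Authorization: `Bearer ${API_KEY}` } }, // assumption: Bearer auth
  );
  if (!res.ok) throw new Error(`status check failed: ${res.status}`);
  // Illustrative shape: entry count plus estimated snapshot size and cost.
  return res.json();
}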

Get All Search Statuses

Check the status of all current searches. Endpoint: GET api.brightdata.com/webarchive/searches
GET api.brightdata.com/webarchive/searches
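The same pattern works for the list endpoint; a short sketch under the same auth assumption:

// Hedged sketch: list the statuses of all current searches.
async function listSearches() {
  const res = await fetch('https://api.brightdata.com/webarchive/searches', {
    headers: { Authorization: `Bearer ${process.env.BRIGHTDATA_API_KEY}` },
  });
  if (!res.ok) throw new Error(`listing failed: ${res.status}`);
  return res.json();
}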

How date range affects delivery time

If your query matches data from within the last 72 hours, your snapshot will start processing and delivering immediately. If some of your matched data is older than 72 hours, it needs to be retrieved from a colder archive before delivery, which may take up to 72 hours.
We recommend using max_age = 1d for initial testing.
Warning: Avoid queries that span the retention boundary (approximately 72 hours from now). Requests with max_age or time ranges that fall within ~72h ± 6h of the current time may include files that have already been migrated to the cold archive. Attempting a dump for such queries can cause the dump to stall or remain incomplete because the files are transitioning between storage classes. We are working on a fix. Recommendations:
  • For real-time data needs: use max_age: "48h" or a narrower window to avoid the retention edge.
  • For historical data (older than 72h): use explicit min_date/max_date filters rather than max_age. Both patterns are sketched after this list.
  • If a dump appears stalled: we usually retry automatically; please open a ticket if the retry doesn’t happen.
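In filter terms, the two recommendations above look like this (values are illustrative):

// Real-time needs: stay well inside the retention window, away from the ~72h edge.
const realtimeFilters = { filters: { max_age: '48h' } };

// Historical data: pin an explicit date range instead of max_age so the query
// never straddles the cold-archive boundary.
const historicalFilters = {
  filters: { min_date: '2024-01-01', max_date: '2024-01-31' },
};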

Deliver Snapshot to Amazon S3 Storage

To use S3 storage delivery, you will first need to do the following:
  • Create an AWS role that gives Bright Data access to your S3 bucket.
    • During this setup, Amazon will ask you for an “external ID” to use with the role.
    • Your external ID for S3 is your Bright Data Account ID, which can be found in Account Settings.
  • Once the role is created, allow our system delivery role to assume it (sts:AssumeRole); a trust-policy sketch follows this list.
    • Our system delivery role is: arn:aws:iam::422310177405:role/brd.ec2.zs-dca-delivery
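As a sketch, a trust policy for such a role typically follows the standard AWS cross-account pattern below; the external-ID condition mirrors the setup steps above, but verify the exact policy against the AWS console wizard.

// Hypothetical IAM trust policy, expressed as a TypeScript object literal.
// Replace <your_brightdata_account_id> with the external ID from Account Settings.
const trustPolicy = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Principal: {
        AWS: 'arn:aws:iam::422310177405:role/brd.ec2.zs-dca-delivery',
      },
      Action: 'sts:AssumeRole',
      Condition: {
        StringEquals: { 'sts:ExternalId': '<your_brightdata_account_id>' },
      },
    },
  ],
};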
To deliver a specific Snapshot from a specific search_id to S3 storage, use the following /dump endpoint. Endpoint: POST api.brightdata.com/webarchive/dump
POST api.brightdata.com/webarchive/dump
{
    search_id: <search_id>,
    max_entries?: 1000000, // (optional) limit how many files you purchase
    delivery: {
        strategy: 's3', // also supports 'azure' and 'webhook'
        settings: {
            bucket: <your_bucket_name>,
            prefix: <your_custom_prefix>, // (optional) Customize top-level export folder
            assume_role: {
                role_arn: <role_you_created_above>,
            },
        },
    },
}
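The dump call can be wrapped in a small helper that works for every delivery strategy; the sketch below mirrors the S3 body above, with the Bearer auth and the dump_id response field as assumptions.

// Hedged sketch: trigger a dump for a finished search.
const API_KEY = process.env.BRIGHTDATA_API_KEY!;

async function startDump(searchId: string, delivery: object): Promise<string> {
  const res = await fetch('https://api.brightdata.com/webarchive/dump', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${API_KEY}`, // assumption: Bearer auth
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ search_id: searchId, delivery }),
  });
  if (!res.ok) throw new Error(`dump failed: ${res.status}`);
  return (await res.json()).dump_id; // assumption: field name
}

// S3 delivery, matching the request body shown above:
startDump('<search_id>', {
  strategy: 's3',
  settings: {
    bucket: '<your_bucket_name>',
    assume_role: { role_arn: '<role_you_created_above>' },
  },
}).then((id) => console.log('dump_id:', id));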

Deliver Snapshot to Azure Blob Storage

Deliver a specific Snapshot from a specific search_id directly into an Azure Blob Storage container using the same /dump endpoint. Endpoint: POST api.brightdata.com/webarchive/dump
POST api.brightdata.com/webarchive/dump
{
  search_id: <search_id>,
  max_entries?: 1000000, // (optional) limit how many files you purchase
  delivery: {
    strategy: 'azure',
    settings: {
      container: <your_container>,
      prefix: <your_custom_prefix>, // (optional) customize top-level export folder
      credentials: {
        account: <your_account_name>,
        key: <your_account_key>, // use a key with write permission to the container
      },
    },
  },
}
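Using the hypothetical startDump helper from the S3 section, the Azure variant only swaps the delivery object:

// Azure delivery via the startDump sketch above (same assumptions apply).
startDump('<search_id>', {
  strategy: 'azure',
  settings: {
    container: '<your_container>',
    credentials: { account: '<your_account_name>', key: '<your_account_key>' },
  },
}).then((id) => console.log('dump_id:', id));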

Collect Snapshot via Webhook

Collect a Data Snapshot via webhook from a specific search_id. Endpoint: POST api.brightdata.com/webarchive/dump
{
    search_id: <search_id>,
    max_entries?: 1000000,
    delivery: {
        strategy: 'webhook',
        settings: {
            url: string(),
            auth?: string(), // will be added as an Authorization header
        },
    }
}
If you’re running a Linux/macOS machine, you can simulate one of our delivery webhooks with the code on this page.
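That simulation code is not reproduced here, but as a minimal stand-in, the sketch below runs a local TypeScript (Node 18+) listener that verifies the Authorization header against the auth value you configured and logs each delivery; the port and token are illustrative.

// Minimal local webhook receiver using Node's built-in http module.
import http from 'node:http';

const EXPECTED_AUTH = 'my-secret-token'; // must match the `auth` field in /dump

http
  .createServer((req, res) => {
    if (req.headers.authorization !== EXPECTED_AUTH) {
      res.writeHead(401).end();
      return;
    }
    let body = '';
    req.on('data', (chunk) => (body += chunk));
    req.on('end', () => {
      console.log(`received ${body.length} bytes at ${req.url}`);
      res.writeHead(200).end('ok');
    });
  })
  .listen(8080, () => console.log('listening on :8080'));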

Get Status of Data Snapshot

Check the status of a specific Data Snapshot (dump) using the dump_id. Endpoint: GET api.brightdata.com/webarchive/dump/<dump_id>
GET api.brightdata.com/webarchive/dump/<dump_id>
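A polling sketch, with Bearer auth assumed and the 'done' status value taken from the sample response in the next section:

// Hedged sketch: block until a dump finishes, polling once a minute.
async function waitForDump(dumpId: string): Promise<void> {
  for (;;) {
    const res = await fetch(
      `https://api.brightdata.com/webarchive/dump/${dumpId}`,
      { headers: { Authorization: `Bearer ${process.env.BRIGHTDATA_API_KEY}` } },
    );
    if (!res.ok) throw new Error(`dump status failed: ${res.status}`);
    const dump = await res.json();
    if (dump.status === 'done') return;
    await new Promise((r) => setTimeout(r, 60_000));
  }
}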

Get the Status of all Data Snapshots

Endpoint: GET api.brightdata.com/webarchive/dumps
200 OK
[
    {
        dump_id: 'ID',
        status: 'in_progress',
        batches_total: 130,
        batches_uploaded: 29,
        files_total: 1241241251,
        estimate_finish: Date
    },
    {
        dump_id: 'ID',
        status: 'done',
        batches_total: 130,
        files_total: 1241241251,
        files_uploaded: 2412515,
        completed_at: Date
    }
    // ... rest of the dumps
]
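For example, a small progress report can be derived from the fields in the sample above (batches_total, batches_uploaded, status); Bearer auth remains an assumption.

// Hedged sketch: print a one-line progress summary per dump.
async function reportDumps() {
  const res = await fetch('https://api.brightdata.com/webarchive/dumps', {
    headers: { Authorization: `Bearer ${process.env.BRIGHTDATA_API_KEY}` },
  });
  const dumps: any[] = await res.json();
  for (const d of dumps) {
    const pct = d.batches_uploaded != null
      ? Math.round((100 * d.batches_uploaded) / d.batches_total)
      : 100; // completed dumps omit batches_uploaded in the sample above
    console.log(`${d.dump_id}: ${d.status} (${pct}% of batches)`);
  }
}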

High-level process flow diagram

[Figure: high-level process flow diagram]