Our Archive API allows you to access and retrieve Data Snapshots from Bright Data’s cached data collections in a seamless and efficient manner.
To access this API, you will need a Bright Data API key.
To initiate a search of our Archive, use the following /search endpoint. Endpoint: POST api.brightdata.com/webarchive/search
Request
POST api.brightdata.com/webarchive/search
{
    filters: {
        max_age?: Duration,
        min_date?: yyyy-mm-dd,
        max_date?: yyyy-mm-dd,
        domain_whitelist?: ['example.com'],
        domain_blacklist?: ['example.com'],
        domain_regex_whitelist?: ['.*example..*'],
        domain_regex_blacklist?: ['.*example..*'],
        domain_like_whitelist?: ['%.example.%', 'example%'],
        domain_like_blacklist?: ['%.example.ca'],
        category_whitelist?: ['Automotive'],
        category_blacklist?: ['Automotive'],
        url_regex_whitelist?: ['.*/products/.*'],
        url_regex_blacklist?: ['.*/products/.*'],
        url_like_whitelist?: ['%/products/%', '%/search%'],
        url_like_blacklist?: ['%/review/%'],
        language_whitelist?: ['eng'], // ISO 639-3 language codes (three letters)
        language_blacklist?: ['eng'],
        ip_country_whitelist?: ['us', 'ie', 'in'],
        ip_country_blacklist?: ['mx', 'ae', 'ca'],
        captcha?: true,
        robots_block?: true,
    }
}
You can run up to 100 searches per day without triggering a dump. Once you trigger a dump, that search no longer counts against your limit.
LIKE vs Regex Filters: Use LIKE filters (domain_like_*, url_like_*) for simple pattern matching with % (any sequence) and _ (single character). LIKE patterns are case-insensitive and often faster than regex for simple prefix/suffix matching like %.com or amazon%. Use regex filters (domain_regex_*, url_regex_*) for complex patterns requiring full regex syntax. LIKE patterns use backslash escaping: \% for literal %, \_ for literal _.
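For example, the request below issues a search over a one-day window using a mix of LIKE and language filters. This is a minimal sketch in TypeScript (Node 18+ with the global fetch API); the Bearer authorization scheme and the search_id field in the response are assumptions, not confirmed by this page.

// Hedged sketch: start an Archive search. Assumes the API key is sent as a
// Bearer token and that the response body includes a search_id field.
const API_KEY = process.env.BRIGHTDATA_API_KEY!;

async function startSearch(): Promise<string> {
  const res = await fetch('https://api.brightdata.com/webarchive/search', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${API_KEY}`, // assumption: Bearer auth
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      filters: {
        max_age: '1d', // recommended for initial testing (see below)
        domain_like_whitelist: ['%.example.com'],
        url_like_whitelist: ['%/products/%'],
        language_whitelist: ['eng'],
      },
    }),
  });
  if (!res.ok) throw new Error(`search failed: ${res.status}`);
  return (await res.json()).search_id; // assumption: field name
}

startSearch().then((id) => console.log('search_id:', id));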

Get Search Status

Check the status of a specific query. Endpoint: GET api.brightdata.com/webarchive/search/<search_id> When successful, it returns:
  • The number of entries for your query
  • The estimated size and cost of the full Data Snapshot
GET api.brightdata.com/webarchive/search/<search_id>
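As a sketch, the status check is a plain GET under the same assumed Bearer auth; the response shape noted in the comment is illustrative, since this page does not spell out the schema.

// Hedged sketch: fetch the status of one search by its search_id.
const API_KEY = process.env.BRIGHTDATA_API_KEY!;

async function getSearchStatus(searchId: string) {
  const res = await fetch(
    `https://api.brightdata.com/webarchive/search/${searchId}`,
    { headers: { Authorization: `Bearer ${API_KEY}` } }, // assumption: Bearer auth
  );
  if (!res.ok) throw new Error(`status check failed: ${res.status}`);
  // Illustrative shape: entry count plus estimated snapshot size and cost.
  return res.json();
}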

Get All Search Statuses

Check the status of all current searches. Endpoint: GET api.brightdata.com/webarchive/searches
GET api.brightdata.com/webarchive/searches
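The same pattern works for the list endpoint; a short sketch under the same auth assumption:

// Hedged sketch: list the statuses of all current searches.
async function listSearches() {
  const res = await fetch('https://api.brightdata.com/webarchive/searches', {
    headers: { Authorization: `Bearer ${process.env.BRIGHTDATA_API_KEY}` },
  });
  if (!res.ok) throw new Error(`listing failed: ${res.status}`);
  return res.json();
}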

How date range affects delivery time

If your query matches data from within the last 72 hours, your snapshot will start processing and delivering immediately. If some of your matched data is older than 72 hours, it needs to be retrieved from a colder archive before delivery, which may take up to 72 hours.
We recommend using max_age = 1d for initial testing.
Warning: Avoid queries that span the retention boundary (approximately 72 hours from now). Requests with max_age or time ranges that fall within ~72h ± 6h of the current time may include files that have already been migrated to the cold archive. Attempting a dump for such queries can cause the dump to stall or remain incomplete because the files are transitioning between storage classes. We are working on a fix. Recommendations:
  • For real-time data needs: use max_age: "48h" or a narrower window to avoid the retention edge.
  • For historical data (older than 72h): use explicit min_date/max_date filters rather than max_age. Both patterns are sketched after this list.
  • If a dump appears stalled: we usually retry automatically; please open a ticket if the retry doesn’t happen.
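In filter terms, the two recommendations above look like this (values are illustrative):

// Real-time needs: stay well inside the retention window, away from the ~72h edge.
const realtimeFilters = { filters: { max_age: '48h' } };

// Historical data: pin an explicit date range instead of max_age so the query
// never straddles the cold-archive boundary.
const historicalFilters = {
  filters: { min_date: '2024-01-01', max_date: '2024-01-31' },
};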

Deliver Snapshot to Amazon S3 Storage

To use S3 storage delivery, you will first need to do the following:
  • Create an AWS role that gives Bright Data access to your S3 bucket.
    • During this setup, Amazon will ask you for an “external ID” to use with the role.
    • Your external ID for S3 is your Bright Data Account ID, which can be found in Account Settings.
  • Once the role is created, allow our system delivery role to assume it (sts:AssumeRole); a trust-policy sketch follows this list.
    • Our system delivery role is: arn:aws:iam::422310177405:role/brd.ec2.zs-dca-delivery
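As a sketch, a trust policy for such a role typically follows the standard AWS cross-account pattern below; the external-ID condition mirrors the setup steps above, but verify the exact policy against the AWS console wizard.

// Hypothetical IAM trust policy, expressed as a TypeScript object literal.
// Replace <your_brightdata_account_id> with the external ID from Account Settings.
const trustPolicy = {
  Version: '2012-10-17',
  Statement: [
    {
      Effect: 'Allow',
      Principal: {
        AWS: 'arn:aws:iam::422310177405:role/brd.ec2.zs-dca-delivery',
      },
      Action: 'sts:AssumeRole',
      Condition: {
        StringEquals: { 'sts:ExternalId': '<your_brightdata_account_id>' },
      },
    },
  ],
};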
To deliver a specific Snapshot from a specific search_id to S3 storage, use the following /dump endpoint. Endpoint: POST api.brightdata.com/webarchive/dump
POST api.brightdata.com/webarchive/dump
{
    search_id: <search_id>,
    max_entries?: 1000000, // (optional) limit how many files you purchase
    delivery: {
        strategy: 's3', // also supports 'azure' and 'webhook'
        settings: {
            bucket: <your_bucket_name>,
            prefix: <your_custom_prefix>, // (optional) Customize top-level export folder
            assume_role: {
                role_arn: <role_you_created_above>,
            },
        },
    },
}
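The dump call can be wrapped in a small helper that works for every delivery strategy; the sketch below mirrors the S3 body above, with the Bearer auth and the dump_id response field as assumptions.

// Hedged sketch: trigger a dump for a finished search.
const API_KEY = process.env.BRIGHTDATA_API_KEY!;

async function startDump(searchId: string, delivery: object): Promise<string> {
  const res = await fetch('https://api.brightdata.com/webarchive/dump', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${API_KEY}`, // assumption: Bearer auth
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ search_id: searchId, delivery }),
  });
  if (!res.ok) throw new Error(`dump failed: ${res.status}`);
  return (await res.json()).dump_id; // assumption: field name
}

// S3 delivery, matching the request body shown above:
startDump('<search_id>', {
  strategy: 's3',
  settings: {
    bucket: '<your_bucket_name>',
    assume_role: { role_arn: '<role_you_created_above>' },
  },
}).then((id) => console.log('dump_id:', id));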

Deliver Snapshot to Azure Blob Storage

Deliver a specific Snapshot from a specific search_id directly into an Azure Blob Storage container using the same /dump endpoint. Endpoint: POST api.brightdata.com/webarchive/dump
POST api.brightdata.com/webarchive/dump
{
  search_id: <search_id>,
  max_entries?: 1000000, // (optional) limit how many files you purchase
  delivery: {
    strategy: 'azure',
    settings: {
      container: <your_container>,
      prefix: <your_custom_prefix>, // (optional) customize top-level export folder
      credentials: {
        account: <your_account_name>,
        key: <your_account_key>, // use a key with write permission to the container
      },
    },
  },
}
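Using the hypothetical startDump helper from the S3 section, the Azure variant only swaps the delivery object:

// Azure delivery via the startDump sketch above (same assumptions apply).
startDump('<search_id>', {
  strategy: 'azure',
  settings: {
    container: '<your_container>',
    credentials: { account: '<your_account_name>', key: '<your_account_key>' },
  },
}).then((id) => console.log('dump_id:', id));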

Collect Snapshot via Webhook

Collect a Data Snapshot via webhook from a specific search_id. Endpoint: POST api.brightdata.com/webarchive/dump
{
    search_id: <search_id>,
    max_entries?: 1000000,
    delivery: {
        strategy: 'webhook',
        settings: {
            url: string(),
            auth?: string(), // will be added as an Authorization header
        },
    }
}
If you’re running a Linux/macOS machine, you can simulate one of our delivery webhooks with the code on this page.
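That simulation code is not reproduced here, but as a minimal stand-in, the sketch below runs a local TypeScript (Node 18+) listener that verifies the Authorization header against the auth value you configured and logs each delivery; the port and token are illustrative.

// Minimal local webhook receiver using Node's built-in http module.
import http from 'node:http';

const EXPECTED_AUTH = 'my-secret-token'; // must match the `auth` field in /dump

http
  .createServer((req, res) => {
    if (req.headers.authorization !== EXPECTED_AUTH) {
      res.writeHead(401).end();
      return;
    }
    let body = '';
    req.on('data', (chunk) => (body += chunk));
    req.on('end', () => {
      console.log(`received ${body.length} bytes at ${req.url}`);
      res.writeHead(200).end('ok');
    });
  })
  .listen(8080, () => console.log('listening on :8080'));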

Get Status of Data Snapshot

Check the status of a specific Data Snapshot (dump) using the dump_id. Endpoint: GET api.brightdata.com/webarchive/dump/<dump_id>
GET api.brightdata.com/webarchive/dump/<dump_id>
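A polling sketch, with Bearer auth assumed and the 'done' status value taken from the sample response in the next section:

// Hedged sketch: block until a dump finishes, polling once a minute.
async function waitForDump(dumpId: string): Promise<void> {
  for (;;) {
    const res = await fetch(
      `https://api.brightdata.com/webarchive/dump/${dumpId}`,
      { headers: { Authorization: `Bearer ${process.env.BRIGHTDATA_API_KEY}` } },
    );
    if (!res.ok) throw new Error(`dump status failed: ${res.status}`);
    const dump = await res.json();
    if (dump.status === 'done') return;
    await new Promise((r) => setTimeout(r, 60_000));
  }
}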

Get the Status of all Data Snapshots

Endpoint: GET api.brightdata.com/webarchive/dumps
200 OK
[
    {
        dump_id: 'ID',
        status: 'in_progress',
        batches_total: 130,
        batches_uploaded: 29,
        files_total: 1241241251,
        estimate_finish: Date
    },
    {
        dump_id: 'ID',
        status: 'done',
        batches_total: 130,
        files_total: 1241241251,
        files_uploaded: 2412515,
        completed_at: Date
    }
    // ... rest of the dumps
]
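For example, a small progress report can be derived from the fields in the sample above (batches_total, batches_uploaded, status); Bearer auth remains an assumption.

// Hedged sketch: print a one-line progress summary per dump.
async function reportDumps() {
  const res = await fetch('https://api.brightdata.com/webarchive/dumps', {
    headers: { Authorization: `Bearer ${process.env.BRIGHTDATA_API_KEY}` },
  });
  const dumps: any[] = await res.json();
  for (const d of dumps) {
    const pct = d.batches_uploaded != null
      ? Math.round((100 * d.batches_uploaded) / d.batches_total)
      : 100; // completed dumps omit batches_uploaded in the sample above
    console.log(`${d.dump_id}: ${d.status} (${pct}% of batches)`);
  }
}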

High-level process flow diagram

[Figure: high-level process flow diagram]