Our Archive API lets you search Bright Data's cached data collections and retrieve Data Snapshots from them efficiently.
To access this API, you will need a Bright Data API key.
To initiate a search of our Archive, use the following /search endpoint. Endpoint: POST api.brightdata.com/webarchive/search
If the search takes longer than 30 seconds, the response returns only a search_id and you should poll the status asynchronously. If the search completes within 30 seconds, the response returns the full search result object (same as GET /webarchive/search/<search_id>).
POST api.brightdata.com/webarchive/search
{
  "filters": {
    "max_age": "Duration", // mandatory: use max_age OR (min_date + max_date)
    "min_date": "YYYY-MM-DD", // mandatory if max_age is not set
    "max_date": "YYYY-MM-DD", // mandatory if max_age is not set
    "domain_whitelist": ["example.com"],
    "domain_blacklist": ["example.com"],
    "domain_regex_whitelist": [".*example..*"],
    "domain_regex_blacklist": [".*example..*"],
    "domain_like_whitelist": ["%.example.%", "example%"],
    "domain_like_blacklist": ["%.example.ca"],
    "category_whitelist": ["Automotive"],
    "category_blacklist": ["Automotive"],
    "url_regex_whitelist": [".*/products/.*"],
    "url_regex_blacklist": [".*/products/.*"],
    "url_like_whitelist": ["%/products/%", "%/search%"],
    "url_like_blacklist": ["%/review/%"],
    "language_whitelist": ["eng"], // ISO 639-3 letter language codes
    "language_blacklist": ["eng"],
    "ip_country_whitelist": ["us", "ie", "in"],
    "ip_country_blacklist": ["mx", "ae", "ca"],
    "captcha": true,
    "robots_block": true
  }
}
You can run up to 100 searches per day without triggering a dump. Once you trigger a dump, that search no longer counts against your limit.
LIKE vs Regex Filters: Use LIKE filters (domain_like_*, url_like_*) for simple pattern matching with % (any sequence) and _ (single character). LIKE patterns are case-insensitive and often faster than regex for simple prefix/suffix matching like %.com or amazon%. Use regex filters (domain_regex_*, url_regex_*) for complex patterns requiring full regex syntax. LIKE patterns use backslash escaping: \% for literal %, \_ for literal _.
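The search request above can be sketched in code. This is a minimal example, assuming Bearer-token auth with your API key; the helper simply assembles the documented body and enforces the max_age OR (min_date + max_date) rule, so any filter values shown are placeholders.

```python
"""Sketch: build and submit an Archive search request."""
import json
import urllib.request

API_BASE = "https://api.brightdata.com/webarchive"

def build_search_payload(max_age=None, min_date=None, max_date=None, **extra_filters):
    """Assemble the filters object, enforcing the documented rule:
    use max_age OR (min_date + max_date), never both, never neither."""
    if max_age is not None and (min_date or max_date):
        raise ValueError("use max_age OR min_date+max_date, not both")
    if max_age is None and not (min_date and max_date):
        raise ValueError("either max_age or both min_date and max_date are required")
    filters = dict(extra_filters)  # e.g. domain_whitelist, captcha, ...
    if max_age is not None:
        filters["max_age"] = max_age
    else:
        filters["min_date"] = min_date
        filters["max_date"] = max_date
    return {"filters": filters}

def submit_search(api_key, payload):
    """POST the search; per the docs, the response is either just a
    search_id (slow search) or the full result object (fast search)."""
    req = urllib.request.Request(
        f"{API_BASE}/search",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_search_payload(
    max_age="12h",
    domain_whitelist=["example.com"],
    language_whitelist=["eng"],
)
```

Call `submit_search(api_key, payload)` to run the search; keep the returned `search_id` for the status and dump endpoints below.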

Get Search Status

Check the status of a specific search. Endpoint: GET api.brightdata.com/webarchive/search/<search_id> When successful, it returns:
  • The number of entries for your query
  • The estimated size and cost of the full Data Snapshot
Pricing & size: estimate_batch_size is measured in bytes. dump_cost_usd is an estimated total cost based on files_count and your current cache/archive pricing tier. The cost_breakdown object shows separate costs for cache vs archive pages.
GET api.brightdata.com/webarchive/search/<search_id>
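For searches that returned only a search_id, you can poll this endpoint until the result is ready. A minimal polling sketch follows; the "pending" status value and response fields are assumptions, not the documented schema, and the `fetch` parameter exists only so the loop can be exercised without network access.

```python
"""Sketch: poll GET /webarchive/search/<search_id> until it completes."""
import json
import time
import urllib.request

def get_search_status(api_key, search_id,
                      base="https://api.brightdata.com/webarchive"):
    """Fetch the current status object for one search."""
    req = urllib.request.Request(
        f"{base}/search/{search_id}",
        headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_search(api_key, search_id, poll_seconds=15, timeout=1800,
                    fetch=get_search_status):
    """Poll until the search is no longer pending, or raise on timeout.

    'pending' is an assumed status value; adjust to the actual field
    your responses contain."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch(api_key, search_id)
        if result.get("status") != "pending":
            return result  # entry count, estimated size/cost, etc.
        time.sleep(poll_seconds)
    raise TimeoutError(f"search {search_id} still pending after {timeout}s")
```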

Get All Search Statuses

Check the status of all current searches. Endpoint: GET api.brightdata.com/webarchive/searches
GET api.brightdata.com/webarchive/searches

How data range affects delivery time

If your query matches data from the last 24 hours, your snapshot starts processing and delivering immediately. If some of your matched data is older than 24 hours, it must first be retrieved from the S3 Glacier Deep Archive storage tier before delivery, which may take up to 72 hours.
Starting February 9, 2026, the hot storage retention period changed from 72 hours to 24 hours.
We recommend using max_age = 12h for initial testing to ensure fast delivery.
Warning: Avoid queries that span the retention boundary (approximately 24 hours from now). Requests with max_age or time ranges that fall within ~24h ± 2h of the current time may include files that have already been migrated to the archive storage tier. Attempting a dump for such queries can cause the dump to stall or remain incomplete because of the files' storage-class transition. Recommendations:
  • For real-time data needs: use max_age: "12h" or a narrower window to avoid the retention edge.
  • For historical data (older than 24h): use explicit min_date/max_date filters rather than max_age.
  • If a dump appears stalled: we usually retry automatically; open a support ticket if we don't.
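The recommendations above can be encoded as a small helper that picks a safe time filter for a given lookback window. This is a sketch: the 2-hour safety margin mirrors the "± 2h" in the warning, and the function name and rejection behavior are illustrative choices, not part of the API.

```python
"""Sketch: choose a time filter that avoids the ~24h retention boundary."""
from datetime import datetime, timedelta, timezone

RETENTION_H = 24  # hot-storage retention (as of February 9, 2026)
MARGIN_H = 2      # safety margin around the boundary, per the warning

def choose_time_filter(lookback_hours):
    """Return a filters fragment for a lookback window:
    - clearly-hot windows use max_age,
    - clearly-archived windows use explicit min_date/max_date,
    - windows straddling the boundary are rejected."""
    if lookback_hours <= RETENTION_H - MARGIN_H:
        return {"max_age": f"{lookback_hours}h"}
    if lookback_hours >= RETENTION_H + MARGIN_H:
        now = datetime.now(timezone.utc)
        return {
            "min_date": (now - timedelta(hours=lookback_hours)).strftime("%Y-%m-%d"),
            "max_date": now.strftime("%Y-%m-%d"),
        }
    raise ValueError(
        f"a {lookback_hours}h window spans the ~{RETENTION_H}h retention "
        "boundary; widen or narrow it")
```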

Deliver Snapshot to Amazon S3 Storage

To use S3 storage delivery, you will first need to do the following:
  • Create an AWS role that gives Bright Data access to your bucket.
    • During this setup, Amazon will ask you for an “external ID” to associate with the role.
    • Your external ID for S3 is your Bright Data Account ID, which can be found in Account Settings.
  • Once the role is created, allow our system delivery role to assume it (sts:AssumeRole).
    • Our system delivery role is: arn:aws:iam::422310177405:role/brd.ec2.zs-dca-delivery
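Putting those two requirements together, your role's trust policy would look roughly like the sketch below; the overall shape follows standard AWS cross-account trust policies, and YOUR_BRIGHTDATA_ACCOUNT_ID is a placeholder for the Account ID from your Account Settings.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::422310177405:role/brd.ec2.zs-dca-delivery"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "YOUR_BRIGHTDATA_ACCOUNT_ID"
        }
      }
    }
  ]
}
```

The role itself still needs a permissions policy granting write access (e.g. s3:PutObject) on the destination bucket.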
To deliver a specific Snapshot from a specific search_id to S3 storage, use the following /dump endpoint. Endpoint: POST api.brightdata.com/webarchive/dump
Common dump parameters:
  • search_id (required): The search ID from a completed search
  • max_entries (optional): Limit the number of files to include in the dump
  • delivery (required): Delivery configuration (S3, Azure, or webhook)
{
  "search_id": "ucd_abc123xyz",
  "max_entries": 1000000,
  "delivery": {
    "strategy": "s3",
    "settings": {
      "bucket": "your-bucket-name",
      "prefix": "optional/custom/prefix",
      "assume_role": {
        "role_arn": "arn:aws:iam::YOUR_ACCOUNT:role/your-role"
      }
    }
  }
}
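As a minimal sketch, the S3 dump request above can be assembled and submitted like this; the builder only fills in the documented body fields, so the bucket and role values are placeholders.

```python
"""Sketch: trigger an S3 dump for a completed search via POST /webarchive/dump."""
import json
import urllib.request

def build_s3_dump(search_id, bucket, role_arn, prefix=None, max_entries=None):
    """Assemble the documented dump body for strategy 's3'.
    Optional fields are omitted rather than sent empty."""
    settings = {"bucket": bucket, "assume_role": {"role_arn": role_arn}}
    if prefix:
        settings["prefix"] = prefix
    body = {"search_id": search_id,
            "delivery": {"strategy": "s3", "settings": settings}}
    if max_entries is not None:
        body["max_entries"] = max_entries
    return body

def trigger_dump(api_key, body, base="https://api.brightdata.com/webarchive"):
    """POST the dump request; the response is expected to include a
    dump_id usable with the dump-status endpoints below."""
    req = urllib.request.Request(
        f"{base}/dump",
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The same builder pattern applies to the Azure and webhook strategies below; only the `strategy` value and `settings` keys change.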

Deliver Snapshot to Azure Blob Storage

Deliver a specific Snapshot from a specific search_id directly into an Azure Blob Storage container using the same /dump endpoint. Endpoint: POST api.brightdata.com/webarchive/dump
{
  "search_id": "ucd_abc123xyz",
  "max_entries": 1000000,
  "delivery": {
    "strategy": "azure",
    "settings": {
      "container": "your-container-name",
      "prefix": "optional/custom/prefix",
      "credentials": {
        "account": "your-storage-account",
        "key": "your-account-key"
      }
    }
  }
}

Collect Snapshot via Webhook

Collect a Data Snapshot via webhook from a specific search_id. Endpoint: POST api.brightdata.com/webarchive/dump
{
  "search_id": "ucd_abc123xyz",
  "max_entries": 1000000,
  "delivery": {
    "strategy": "webhook",
    "settings": {
      "url": "https://your-domain.com/webhook",
      "auth": "Bearer your-optional-auth-token"
    }
  }
}
If you’re running a Linux/macOS machine, you can simulate one of our delivery webhooks with the code on this page.
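For local testing, a minimal receiver can stand in for your webhook endpoint. This sketch assumes the `auth` value from the dump settings arrives as an Authorization header, and the delivery payload shape is not specified here, so the handler just echoes what it receives.

```python
"""Sketch: minimal local webhook receiver for testing delivery."""
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Must match the "auth" value configured in the dump's delivery settings.
EXPECTED_AUTH = "Bearer your-optional-auth-token"

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Reject deliveries that don't carry the configured auth token.
        if self.headers.get("Authorization") != EXPECTED_AUTH:
            self.send_response(401)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        print("received delivery with keys:", sorted(body))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

def run(port=8080):
    """Listen for deliveries until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Call `run()` and point the dump's `url` at your machine (e.g. via a tunnel) to watch deliveries arrive.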

Get Status of Data Snapshot

Check the status of a specific Data Snapshot (dump) using the dump_id. Endpoint: GET api.brightdata.com/webarchive/dump/<dump_id>
GET api.brightdata.com/webarchive/dump/<dump_id>

Get the Status of all Data Snapshots

Endpoint: GET api.brightdata.com/webarchive/dumps
GET api.brightdata.com/webarchive/dumps

High-level process flow diagram

flow diagram