# Filter Dataset (BETA)

> Use the Bright Data Marketplace Dataset API to filter a dataset and create a snapshot of the matching records (BETA). The marketplace spans 250+ domains.

<Tip>
  Paste your API key into the authorization field. To get an API key, [create an account](https://brightdata.com/?hs_signup=1&utm_source=docs&utm_campaign=playground) and learn [how to generate an API key](/api-reference/authentication#how-do-i-generate-a-new-api-key%3F).
</Tip>

## General Description

* Calling this endpoint starts an asynchronous job that filters the dataset and creates a snapshot of the filtered data in your account.
* The job must finish within 5 minutes; otherwise it is cancelled.
* Creating the dataset snapshot is subject to charges based on the snapshot size and record price.
* Filter groups can be nested to a maximum depth of 3.

## Limits

| Limit                     | Value             | Description                                                                                                                                       |
| ------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Max rows per file**     | 10,000            | Each uploaded CSV/JSON file can contain up to 10,000 data rows. The header row is not counted.                                                    |
| **Max files per request** | No limit          | You can attach as many files as needed in a single multipart/form-data request, as long as the total request size stays within the 200 MiB limit. |
| **Max request size**      | 200 MiB           | Total size of all uploaded files and form data combined. Requests exceeding 200 MiB will be rejected.                                             |
| **Job timeout**           | 5 minutes         | If filtering doesn't complete within 5 minutes the job is cancelled.                                                                              |
| **Filter nesting depth**  | 3 levels          | Maximum depth for nested filter groups using `and`/`or` operators.                                                                                |
| **Rate limit**            | 120 requests/hour | Maximum number of Filter API calls per hour.                                                                                                      |
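
The file-related limits in the table can be checked locally before a request is sent. The following is a minimal sketch of such a pre-flight check. The helper name `check_upload_limits` and its return shape are hypothetical and not part of the API, and the row count assumes simple CSV files with one record per line and a single header line.

```python
MAX_ROWS_PER_FILE = 10_000           # data rows, header excluded
MAX_REQUEST_BYTES = 200 * 1024 ** 2  # 200 MiB


def check_upload_limits(files):
    """Return a list of problems found in {filename: csv_bytes}; empty means OK.

    Hypothetical helper: only file bytes are counted, so the real request
    (which also carries form fields and multipart framing) is slightly larger.
    """
    problems = []
    total = 0
    for name, content in files.items():
        total += len(content)
        lines = [ln for ln in content.decode("utf-8").splitlines() if ln.strip()]
        data_rows = max(len(lines) - 1, 0)  # first non-empty line is the header
        if data_rows > MAX_ROWS_PER_FILE:
            problems.append(f"{name}: {data_rows} rows exceeds {MAX_ROWS_PER_FILE}")
    if total > MAX_REQUEST_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_REQUEST_BYTES}")
    return problems


# One file exactly at the limit, one file a single row over it.
ok = b"company_id\n" + b"\n".join(str(i).encode() for i in range(10_000))
too_big = ok + b"\n10000"
print(check_upload_limits({"ok.csv": ok, "big.csv": too_big}))
```

Because multipart framing adds overhead, it is safer to leave some headroom below 200 MiB rather than rely on this byte count alone.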

## Modes of Use

### 1. JSON Mode (No File Uploads)

Use this when you are not uploading any files.

* All parameters (`dataset_id`, `records_limit`, and `filter`) are sent in the JSON request body.
* `Content-Type` must be `application/json`.
* No query parameters are used.

```bash Example
curl --request POST \
  --url https://api.brightdata.com/datasets/filter \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "dataset_id": "gd_l1viktl72bvl7bjuj0",
    "records_limit": 100,
    "filter": {
      "name": "name",
      "operator": "=",
      "value": "John"
    }
  }'
```
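
The same JSON-mode request can be assembled in Python. This sketch uses only the standard library and builds the request without sending it. The helper name `build_json_request` is an assumption for illustration, and `<token>` is a placeholder for your API key.

```python
import json
import urllib.request

API_URL = "https://api.brightdata.com/datasets/filter"


def build_json_request(token, dataset_id, filter_spec, records_limit=None):
    """Build (but do not send) a JSON-mode request for the filter endpoint."""
    body = {"dataset_id": dataset_id, "filter": filter_spec}
    if records_limit is not None:
        body["records_limit"] = records_limit
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_json_request(
    "<token>",
    "gd_l1viktl72bvl7bjuj0",
    {"name": "name", "operator": "=", "value": "John"},
    records_limit=100,
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it, pass the request to urllib.request.urlopen(req).
```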

***

### 2. Multipart/Form-Data Mode (File Uploads)

Use this when uploading CSV or JSON files containing filter values.

* `dataset_id` and `records_limit` must be sent as **query parameters** in the URL.
* The `filter` and any uploaded files are included in the **form-data body**.
* `Content-Type` must be `multipart/form-data`.

<Note>
  Each uploaded file can contain up to 10,000 data rows (header row excluded). There is no limit on the number of files per request. The total request size must not exceed 200 MiB.
</Note>

```bash Example
curl --request POST \
  --url "https://api.brightdata.com/datasets/filter?dataset_id=gd_l1vijqt9jfj7olije&records_limit=100" \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'filter={"operator":"and","filters":[{"name":"industries:value","operator":"includes","value":"industries.csv"}]}' \
  --form 'files[]=@/path/to/industries.csv'
```
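
The split between query parameters and form fields can be sketched in Python as follows. The helper `build_multipart_parts` is hypothetical. It only prepares the URL and the `filter` form field, and leaves the actual multipart encoding and file attachment to whichever HTTP client you use.

```python
import json
from urllib.parse import urlencode

API_URL = "https://api.brightdata.com/datasets/filter"


def build_multipart_parts(dataset_id, records_limit, filter_spec):
    """Return (url, form_fields) for a multipart-mode request.

    dataset_id and records_limit go into the query string; the filter is
    serialized to a compact JSON string for the `filter` form field.
    """
    query = urlencode({"dataset_id": dataset_id, "records_limit": records_limit})
    url = f"{API_URL}?{query}"
    form_fields = {"filter": json.dumps(filter_spec, separators=(",", ":"))}
    return url, form_fields


url, fields = build_multipart_parts(
    "gd_l1vijqt9jfj7olije",
    100,
    {
        "operator": "and",
        "filters": [
            {"name": "industries:value", "operator": "includes", "value": "industries.csv"}
        ],
    },
)
print(url)
print(fields["filter"])
```

The `value` of the file-backed filter is the name of the uploaded file, so the file attached under `files[]` must be sent with that same filename.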

**Example: Excluding 100k+ values using multiple files**

Split your values into files of up to 10,000 rows each, then attach them all in a single request:

```bash Example
curl --request POST \
  --url "https://api.brightdata.com/datasets/filter?dataset_id=gd_l1vijqt9jfj7olije&records_limit=5000" \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form 'filter={"operator":"and","filters":[{"name":"company_id","operator":"not_in","value":"exclude1.csv"},{"name":"company_id","operator":"not_in","value":"exclude2.csv"},{"name":"company_id","operator":"not_in","value":"exclude3.csv"}]}' \
  --form 'files[]=@exclude1.csv' \
  --form 'files[]=@exclude2.csv' \
  --form 'files[]=@exclude3.csv'
```

Each CSV file should have a header row matching the field name, followed by one value per line:

```csv exclude1.csv
company_id
12345
67890
...
```
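
Splitting a long exclusion list into files and building the matching filter group can be automated. The sketch below is one possible approach. The helper names and the `exclude` filename prefix are assumptions, and the values are written verbatim, so it only suits values without commas, quotes, or newlines.

```python
def split_into_csv_files(field, values, prefix="exclude", max_rows=10_000):
    """Split values into CSV texts of at most max_rows data rows each.

    Returns {filename: csv_text}; every file starts with the field name
    as its header row.
    """
    files = {}
    for start in range(0, len(values), max_rows):
        chunk = values[start:start + max_rows]
        name = f"{prefix}{len(files) + 1}.csv"
        files[name] = "\n".join([field, *map(str, chunk)]) + "\n"
    return files


def build_exclusion_filter(field, filenames):
    """Combine one not_in filter per file into a single `and` group."""
    return {
        "operator": "and",
        "filters": [
            {"name": field, "operator": "not_in", "value": name}
            for name in filenames
        ],
    }


# 25,000 illustrative IDs end up in three files.
files = split_into_csv_files("company_id", list(range(25_000)))
filter_spec = build_exclusion_filter("company_id", files)
print(list(files))
print(len(filter_spec["filters"]))
```

Using an `and` group of `not_in` filters means a record is kept only if its value appears in none of the files.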

***

## Filter Syntax

### Operators

The following table shows operators that can be used in the field filter.

| Operator             | Field Types  | Description                                                                                                                                                                                                                                                                                                         |
| -------------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| =                    | Any          | Equal to                                                                                                                                                                                                                                                                                                            |
| !=                   | Any          | Not equal to                                                                                                                                                                                                                                                                                                        |
| \<                   | Number, Date | Lower than                                                                                                                                                                                                                                                                                                          |
| \<=                  | Number, Date | Lower than or equal                                                                                                                                                                                                                                                                                                 |
| >                    | Number, Date | Greater than                                                                                                                                                                                                                                                                                                        |
| >=                   | Number, Date | Greater than or equal                                                                                                                                                                                                                                                                                               |
| in                   | Any          | Tests if field value is equal to any of the values provided in filter's value                                                                                                                                                                                                                                       |
| not\_in              | Any          | Tests if field value is not equal to any of the values provided in filter's value                                                                                                                                                                                                                                   |
| includes             | Array, Text  | Tests if the field value contains the filter value. If the filter value is a single string, it matches records where the field value contains that string. If the filter value is an array of strings, it matches records where the field value contains at least one string from the array.                        |
| not\_includes        | Array, Text  | Tests if the field value does not contain the filter value. If the filter value is a single string, it matches records where the field value does not contain that string. If the filter value is an array of strings, it matches records where the field value does not contain any of the strings from the array. |
| array\_includes      | Array        | Tests if filter value is in field value (exact match)                                                                                                                                                                                                                                                               |
| not\_array\_includes | Array        | Tests if filter value is not in field value (exact match)                                                                                                                                                                                                                                                           |
| is\_null             | Any          | Tests if the field value is equal to NULL. Operator does not accept any value.                                                                                                                                                                                                                                      |
| is\_not\_null        | Any          | Tests if the field value is not equal to NULL. Operator does not accept any value.                                                                                                                                                                                                                                  |

### Combining Multiple Filters

Multiple field filters can be combined into a filter group using one of two logical operators: `and` or `or`.
The API supports filter groups with a maximum nesting depth of 3.
Example of a filter group:

```json
{
    // operator can be one of ["and", "or"]
    "operator": "and",
    // an array of field filters
    "filters": [
        {
            "name": "reviews_count",
            "operator": ">",
            "value": "200"
        },
        {
            "name": "rating",
            "operator": ">",
            "value": "4.5"
        }
    ]
}
```
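
The nesting limit can be checked locally with a short recursive function. This is a sketch under an assumed counting convention: a single field filter has depth 0 and each enclosing group adds 1. The API's exact definition of depth is not spelled out here, so treat the result as a guide rather than a guarantee.

```python
MAX_DEPTH = 3


def filter_depth(node):
    """Depth of a filter: 0 for a field filter, 1 + deepest child for a group.

    Assumption: this counting convention is illustrative only.
    """
    if "filters" not in node:
        return 0
    return 1 + max((filter_depth(child) for child in node["filters"]), default=0)


nested = {
    "operator": "and",
    "filters": [
        {"name": "reviews_count", "operator": ">", "value": "200"},
        {
            "operator": "or",
            "filters": [
                {"name": "rating", "operator": ">", "value": "4.5"},
                {"name": "rating", "operator": "is_null"},
            ],
        },
    ],
}
print(filter_depth(nested))               # 2
print(filter_depth(nested) <= MAX_DEPTH)  # True
```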


## OpenAPI

````yaml api-reference/dca-api POST /datasets/filter
openapi: 3.1.0
info:
  title: Brightdata API
  description: API for interaction with datasets marketplace
  version: 1.0.0
servers:
  - url: https://api.brightdata.com
security:
  - bearerAuth: []
paths:
  /datasets/filter:
    post:
      description: Create a dataset snapshot based on a provided filter
      parameters:
        - name: dataset_id
          in: query
          description: ID of the dataset to filter (required in multipart/form-data mode)
          required: false
          schema:
            type: string
            example: gd_l1viktl72bvl7bjuj0
        - name: records_limit
          description: Limit the number of records to be included in the snapshot
          in: query
          required: false
          schema:
            type: integer
            example: 1000
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - dataset_id
                - filter
              properties:
                dataset_id:
                  type: string
                  description: ID of the dataset to filter
                  example: gd_l1viktl72bvl7bjuj0
                records_limit:
                  type: integer
                  description: Limit the number of records to be included in the snapshot
                  example: 1000
                filter:
                  $ref: '#/components/schemas/DatasetFilter'
          multipart/form-data:
            schema:
              $ref: '#/components/schemas/FilterDatasetBody'
      responses:
        '200':
          description: Job of creating the snapshot successfully started
          content:
            application/json:
              schema:
                type: object
                properties:
                  snapshot_id:
                    type: string
                    description: ID of the snapshot
        '400':
          description: Bad request
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ValidationErrorBody'
              example:
                validation_errors:
                  - '"filter.filters[0].invalid_prop" is not allowed'
                  - '"records_limit" must be a positive number'
        '402':
          description: Not enough funds to create the snapshot
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorBody'
              example:
                error: >-
                  Your current balance is insufficient to process this data
                  collection request. Please add funds to your account or adjust
                  your request to continue. ($1 is missing)
        '422':
          description: Provided filter did not match any records
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorBody'
              example:
                error: Provided filter did not match any records
        '429':
          description: Too many parallel jobs
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorBody'
              example:
                error: Maximum limit of 100 jobs per dataset has been exceeded
      x-codeSamples:
        - lang: shell
          label: cURL
          source: |-
            curl --request POST \
              --url 'https://api.brightdata.com/datasets/filter?dataset_id=gd_l1vikfnt1wgvvqz95w' \
              --header "Authorization: Bearer YOUR_API_KEY" \
              --header "Content-Type: application/json" \
              --data '{"filter": {"name": "url", "operator": "=", "value": "https://www.instagram.com/natgeo/"}, "records_limit": 10}'
        - lang: python
          label: Python
          source: >-
            import requests


            url =
            "https://api.brightdata.com/datasets/filter?dataset_id=gd_l1vikfnt1wgvvqz95w"

            headers = {
                "Authorization": "Bearer YOUR_API_KEY",
                "Content-Type": "application/json",
            }

            payload = {
                "filter": {
                    "name": "url",
                    "operator": "=",
                    "value": "https://www.instagram.com/natgeo/"
                },
                "records_limit": 10
            }


            response = requests.post(url, headers=headers, json=payload)

            print(response.text)
        - lang: py
          label: Python SDK
          source: |-
            # Install: pip install brightdata-sdk
            import asyncio

            from brightdata import BrightDataClient


            async def main():
                async with BrightDataClient(api_key="YOUR_API_KEY") as client:
                    # Quick sample: no filter needed
                    snapshot_id = await client.datasets.imdb_movies.sample(records_limit=5)

                    # Or filter with criteria
                    snapshot_id = await client.datasets.instagram_profiles.query(
                        url="https://www.instagram.com/natgeo/",
                        records_limit=10,
                    )

                    # Same pattern works on all 126+ datasets
                    await client.datasets.amazon_products.sample(records_limit=10)
                    await client.datasets.linkedin_profiles.sample(records_limit=10)


            # `async with` is only valid inside a coroutine, so run it via asyncio
            asyncio.run(main())
        - lang: javascript
          label: JavaScript
          source: >-
            const response = await
            fetch("https://api.brightdata.com/datasets/filter?dataset_id=gd_l1vikfnt1wgvvqz95w",
            {
              method: "POST",
              headers: {
                "Authorization": "Bearer YOUR_API_KEY",
                "Content-Type": "application/json",
              },
              body: JSON.stringify({
                "filter": {
                    "name": "url",
                    "operator": "=",
                    "value": "https://www.instagram.com/natgeo/"
                },
                "records_limit": 10
            }),

            });


            const data = await response.text();

            console.log(data);
        - lang: js
          label: JavaScript SDK
          source: >-
            // Install: npm install @brightdata/sdk

            import { bdclient } from '@brightdata/sdk';


            const client = new bdclient({ apiKey: 'YOUR_API_KEY' });


            const ds = client.datasets;


            // Query a dataset and return a snapshot_id you can download

            const snapshotId = await ds.instagramProfiles.query(
              { url: 'https://www.instagram.com/natgeo/' },
              { records_limit: 10 },
            );


            // Same pattern works on all 126+ datasets

            await ds.amazonProducts.query({ url: 'https://amazon.com/dp/B123'
            });

            await ds.imdbMovies.query({}, { records_limit: 50 });


            await client.close();
        - lang: php
          label: PHP
          source: >-
            <?php

            $ch =
            curl_init("https://api.brightdata.com/datasets/filter?dataset_id=gd_l1vikfnt1wgvvqz95w");

            curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");

            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

            curl_setopt($ch, CURLOPT_HTTPHEADER, [
                "Authorization: Bearer YOUR_API_KEY",
                "Content-Type: application/json",
            ]);

            curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
                "filter" => [
                    "name" => "url",
                    "operator" => "=",
                    "value" => "https://www.instagram.com/natgeo/"
                ],
                "records_limit" => 10
            ]));


            $response = curl_exec($ch);

            curl_close($ch);

            echo $response;
        - lang: go
          label: Go
          source: "package main\n\nimport (\n\t\"bytes\"\n\t\"fmt\"\n\t\"io\"\n\t\"net/http\"\n)\n\nfunc main() {\n\tpayload := []byte(\"{\\\"filter\\\": {\\\"name\\\": \\\"url\\\", \\\"operator\\\": \\\"=\\\", \\\"value\\\": \\\"https://www.instagram.com/natgeo/\\\"}, \\\"records_limit\\\": 10}\")\n\treq, _ := http.NewRequest(\"POST\", \"https://api.brightdata.com/datasets/filter?dataset_id=gd_l1vikfnt1wgvvqz95w\", bytes.NewBuffer(payload))\n\treq.Header.Set(\"Authorization\", \"Bearer YOUR_API_KEY\")\n\treq.Header.Set(\"Content-Type\", \"application/json\")\n\n\tres, err := http.DefaultClient.Do(req)\n\tif err != nil { panic(err) }\n\tdefer res.Body.Close()\n\n\tbody, _ := io.ReadAll(res.Body)\n\tfmt.Println(string(body))\n}"
        - lang: java
          label: Java
          source: |-
            import java.net.URI;
            import java.net.http.HttpClient;
            import java.net.http.HttpRequest;
            import java.net.http.HttpResponse;

            public class Main {
                public static void main(String[] args) throws Exception {
                    String body = "{\"filter\": {\"name\": \"url\", \"operator\": \"=\", \"value\": \"https://www.instagram.com/natgeo/\"}, \"records_limit\": 10}";
                    HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create("https://api.brightdata.com/datasets/filter?dataset_id=gd_l1vikfnt1wgvvqz95w"))
                        .header("Authorization", "Bearer YOUR_API_KEY")
                        .header("Content-Type", "application/json")
                        .method("POST", HttpRequest.BodyPublishers.ofString(body))
                        .build();

                    HttpResponse<String> response = HttpClient.newHttpClient()
                        .send(request, HttpResponse.BodyHandlers.ofString());
                    System.out.println(response.body());
                }
            }
        - lang: ruby
          label: Ruby
          source: >-
            require 'net/http'

            require 'json'

            require 'uri'


            uri =
            URI.parse("https://api.brightdata.com/datasets/filter?dataset_id=gd_l1vikfnt1wgvvqz95w")

            request = Net::HTTP::Post.new(uri)

            request["Authorization"] = "Bearer YOUR_API_KEY"

            request["Content-Type"] = "application/json"

            request.body = {"filter": {"name": "url", "operator": "=", "value":
            "https://www.instagram.com/natgeo/"}, "records_limit": 10}.to_json


            response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) {
            |http| http.request(request) }

            puts response.body
components:
  schemas:
    DatasetFilter:
      anyOf:
        - $ref: '#/components/schemas/DatasetFilterItem'
          title: Single field filter
        - $ref: '#/components/schemas/DatasetFilterGroup'
          title: Filters group
        - $ref: '#/components/schemas/DatasetFilterItemNoVal'
          title: Single field filter w/out value
    FilterDatasetBody:
      type: object
      required:
        - filter
      properties:
        filter:
          $ref: '#/components/schemas/DatasetFilter'
    ValidationErrorBody:
      type: object
      properties:
        validation_errors:
          type: array
          items:
            type: string
    ErrorBody:
      type: object
      properties:
        error:
          type: string
    DatasetFilterItem:
      type: object
      required:
        - name
        - operator
        - value
      additionalProperties: false
      properties:
        name:
          type: string
          description: Field name to filter by
        operator:
          type: string
          enum:
            - '='
            - '!='
            - '>'
            - <
            - '>='
            - <=
            - in
            - not_in
            - includes
            - not_includes
            - array_includes
            - not_array_includes
        value:
          description: Value to filter by
          oneOf:
            - type: string
            - type: number
            - type: boolean
            - type: object
            - type: array
              items:
                oneOf:
                  - type: string
                  - type: number
                  - type: boolean
      example:
        name: name
        operator: '='
        value: John
    DatasetFilterGroup:
      type: object
      required:
        - operator
        - filters
      additionalProperties: false
      properties:
        operator:
          type: string
          enum:
            - and
            - or
        combine_nested_fields:
          type: boolean
          description: >-
            For arrays of objects: if true, all filters must match within a
            single object
        filters:
          type: array
          items:
            $ref: '#/components/schemas/DatasetFilter'
      example:
        operator: and
        filters:
          - name: name
            operator: '='
            value: John
          - name: age
            operator: '>'
            value: '30'
    DatasetFilterItemNoVal:
      type: object
      required:
        - name
        - operator
      additionalProperties: false
      properties:
        name:
          type: string
          description: Field name to filter by
        operator:
          type: string
          enum:
            - is_null
            - is_not_null
      example:
        name: reviews_count
        operator: is_not_null
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: >-
        Use your Bright Data API Key as a Bearer token in the Authorization
        header.


        **How to authenticate:**

        1. Obtain your API Key from the Bright Data account settings at
        https://brightdata.com/cp/setting/users

        2. Include the API Key in the Authorization header of your requests

        3. Format: `Authorization: Bearer YOUR_API_KEY`


        **Example:**

        ```

        Authorization: Bearer
        b5648e1096c6442f60a6c4bbbe73f8d2234d3d8324554bd6a7ec8f3f251f07df

        ```


        Learn how to get your Bright Data API key:
        https://docs.brightdata.com/api-reference/authentication
      bearerFormat: API Key

````
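
The response codes in the specification above can be handled with a small dispatcher. This is a minimal sketch: the function name and the `(ok, message)` return shape are assumptions, and `<snapshot_id>` is a placeholder rather than a real ID.

```python
def interpret_response(status, body):
    """Map a filter-endpoint response to (ok, message) using the documented codes."""
    if status == 200:
        # Job started; the snapshot ID identifies the result.
        return True, body["snapshot_id"]
    if status == 400:
        # Validation errors arrive as a list of strings.
        return False, "; ".join(body.get("validation_errors", []))
    if status in (402, 422, 429):
        # Insufficient funds, no matching records, or too many jobs.
        return False, body.get("error", "")
    return False, f"unexpected status {status}"


print(interpret_response(200, {"snapshot_id": "<snapshot_id>"}))
print(interpret_response(422, {"error": "Provided filter did not match any records"}))
```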