The input and output schema defines the data contract for a Bright Data Scraper Studio IDE collector: which fields a collection run accepts and which structured fields the collector returns.
  • Input schema defines the fields a collection run accepts, such as url, keyword, country, date, or any custom field your interaction code reads from input.
  • Output schema defines the structured fields the scraper returns, based on the data emitted by collect().
Both schemas are configured in the Bright Data Scraper Studio IDE. Schema changes are applied to the production collector when you click Save to Production.

What is the input schema?

The input schema defines the values your scraper can receive at runtime. A scraper often uses a url input, but inputs are not limited to URLs. Depending on the collector logic, inputs can be keywords, locations, dates, IDs, countries, or any custom parameter. Your interaction code reads input values through the input object:
navigate(input.url);
wait('.product-title');

const data = parse();
collect(data);
For a keyword-based scraper:
navigate(`https://example.com/search?q=${encodeURIComponent(input.keyword)}`);
wait('.search-results');

collect(parse());
A scraper can also run without user-provided input if the target URL or collection logic is hardcoded in the scraper code.

Define input parameters

To define the input schema in the Bright Data Scraper Studio IDE:
  1. Open your collector in the Scraper Studio IDE.
  2. Go to the Code tab.
  3. Click Add input parameter.
  4. Enter a field name, for example url, keyword, country, or date.
  5. Add an optional description.
  6. Select the field type.
  7. Mark the field as Required if the collector cannot run without it.
  8. Click Save.
  9. Click Save to Production when the collector is ready.
After a collector has already been saved, click Edit schema in the IDE to update its input schema.

What are the input parameter settings?

| Setting | Description |
| --- | --- |
| Field name | The key used in code as input.<field_name>. |
| Description | Optional explanation of what value the user should provide. |
| Type | The expected value type, such as text/string, boolean, date, or country. |
| Required | If enabled, each collection input must include this field. |
| Predefined values | Optional fixed choices, when supported by the selected type. Example: country type. |
| Case-insensitive | Treats matching values as case-insensitive, when supported by the field configuration. |

What do collection inputs look like?

A URL-based collector can accept one or more URLs:
[
  { "url": "https://example.com/product/1" },
  { "url": "https://example.com/product/2" }
]
A collector can also accept multiple input fields:
[
  {
    "url": "https://example.com/search",
    "keyword": "standing desk",
    "country": "US"
  },
  {
    "url": "https://example.com/search",
    "keyword": "monitor arm",
    "country": "GB"
  }
]
Only fields marked as Required must be provided for every input object. Optional fields can be omitted.
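The Required rule can be sketched in plain JavaScript, outside the IDE. This is an illustration only, not Scraper Studio's own validation: the inputSchema shape and the findMissingRequired helper are hypothetical names introduced for this example.

```javascript
// Hypothetical input schema: field name -> settings (shape assumed for this sketch).
const inputSchema = {
  url: { type: 'text', required: true },
  keyword: { type: 'text', required: false },
  country: { type: 'country', required: false },
};

// Return the names of required fields that are missing or empty in one input object.
function findMissingRequired(input, schema) {
  return Object.entries(schema)
    .filter(([, settings]) => settings.required)
    .map(([name]) => name)
    .filter((name) => input[name] === undefined || input[name] === null || input[name] === '');
}

const inputs = [
  { url: 'https://example.com/search', keyword: 'standing desk', country: 'US' },
  { keyword: 'monitor arm' }, // url is required but missing
];

// One list of missing field names per input object.
const report = inputs.map((input) => findMissingRequired(input, inputSchema));
console.log(report);
```

Running a check like this before triggering a collection makes it easier to spot input objects that omit a required field.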

What is the output schema?

The output schema defines the structure of each collected record: which fields it contains and how they are organized. In the Bright Data Scraper Studio IDE, the output schema is usually generated from the object passed to collect().
collect({
  title: $('.product-title').text_sane(),
  price: new Money(+$('.price').text().replace(/[^\d.]+/g, ''), 'USD'),
  availability: $('.stock-status').text_sane(),
});
This produces output fields such as:
{
  "title": "ErgoDesk Pro",
  "price": {
    "value": 349.99,
    "currency": "USD"
  },
  "availability": "In stock"
}
When the scraper is saved, Scraper Studio detects the collected data structure and creates or updates the output schema.

Update the output schema

There are two ways to update the output schema: automatically from parser code or manually in the schema editor.

Update the schema automatically

  1. Add or change fields in your parser code.
  2. Run a preview to confirm that required fields return as expected.
  3. Click Save to Production.
  4. If Scraper Studio detects schema changes, click Update schema.
  5. Click Save to Production again.

Update the schema manually

  1. Click Edit schema in the IDE.
  2. Add or edit fields by name and type.
  3. Configure required flags, default values, formatting, validation, or PII settings.
  4. Save the schema.
  5. Click Save to Production.

What is the Output Schema Editor?

The Output Schema Editor defines exactly which fields your collector returns and how each field is validated, formatted and delivered. The editor has two views:
| View | Description |
| --- | --- |
| Table view | Visual list of fields with toggles and field configuration. |
| JSON view | Direct JSON editing for the schema object. |
Clicking a field row opens the configuration side panel for that field.

How is an output schema structured?

An output schema is a JSON object with a top-level type and a fields object:
{
  "type": "object",
  "fields": {
    "title": {
      "type": "text",
      "active": true
    },
    "price": {
      "type": "price",
      "active": true
    }
  }
}
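To make the shape concrete, the following sketch builds a schema object of this form from a sample record. It is a rough heuristic written for illustration and does not describe how Scraper Studio detects the schema: inferType and inferSchema are hypothetical helpers, and treating any { value, currency } object as a price is an assumption of this example.

```javascript
// Guess a schema field type from a collected value (illustrative heuristic only).
function inferType(value) {
  if (Array.isArray(value)) return 'array';
  if (typeof value === 'boolean') return 'boolean';
  if (typeof value === 'number') return 'number';
  if (value !== null && typeof value === 'object') {
    // Assumption: an object with value and currency keys is a price.
    return 'value' in value && 'currency' in value ? 'price' : 'object';
  }
  return 'text';
}

// Build { type: 'object', fields: { ... } } with every field marked active.
function inferSchema(record) {
  const fields = {};
  for (const [name, value] of Object.entries(record)) {
    fields[name] = { type: inferType(value), active: true };
  }
  return { type: 'object', fields };
}

const schema = inferSchema({
  title: 'ErgoDesk Pro',
  price: { value: 349.99, currency: 'USD' },
  availability: 'In stock',
});
console.log(JSON.stringify(schema, null, 2));
```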

What properties can output fields have?

These properties apply to user-defined output fields.
| Property | Type | Description |
| --- | --- | --- |
| type | string | Field type, such as text, number, price, image, or object. |
| active | boolean | Whether the field is included in the output. Default: true. |
| required | boolean | If true, rows with no valid value for this field are marked as errors. |
| default_value | string | Value used when the field cannot be populated. |
| description | string | Human-readable explanation of the field. |
| pii | boolean | Marks the field as containing personally identifiable information. |
| custom_formatting | object | Custom JavaScript formatter for advanced output shaping. |
| custom_validation | object | Define validation rules that run on every collected record. |
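As a minimal sketch of how the active and required properties affect a row, the snippet below applies them to one record in plain JavaScript. The applyFlags helper and its return shape are hypothetical and only mirror the descriptions in the table. The platform's real processing may differ.

```javascript
// Apply active and required flags from a schema to one collected record.
// Returns the filtered record plus the names of required fields with no value.
function applyFlags(record, schema) {
  const output = {};
  const errors = [];
  for (const [name, field] of Object.entries(schema.fields)) {
    const value = record[name];
    const missing = value === undefined || value === null || value === '';
    if (field.required && missing) errors.push(name);
    // active defaults to true, so only an explicit false excludes the field.
    if (field.active !== false && !missing) output[name] = value;
  }
  return { output, errors };
}

const schema = {
  type: 'object',
  fields: {
    url: { type: 'url', required: true },
    title: { type: 'text', active: true },
    internal_note: { type: 'text', active: false },
  },
};

const result = applyFlags({ title: 'ErgoDesk Pro', internal_note: 'draft' }, schema);
console.log(result);
```

Here the inactive internal_note field is dropped from the output, and the missing url is reported because it is required.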

Configure a field in the side panel

The side panel contains field-specific settings.
| Setting | Description |
| --- | --- |
| Field name | The key used in the output JSON. Available for user-defined fields. |
| Display name | Optional UI label, separate from the output key. |
| Data type | The field type. Changing the type resets type-specific settings. |
| Active | Includes or excludes the field from output. |
| Required | Marks rows as errors when this field is missing or invalid. |
| Default value | Fallback value when the field cannot be populated. |
| Description | Optional human-readable description of the field. |
| Contains PII | Marks the field as containing personally identifiable information. |
| Format | Type-specific output formatting. Example: price/money type. |
| Download | For media/file fields, downloads the file to configured storage. |
| Array values | Defines the item type for array fields. |
| Subfields | Defines nested fields for object fields. |
| Normalize | Controls empty array behavior. |
| Set as quick filter | Exposes the field as a filter in the dataset viewer. |
| Quick filter operator | Defines the comparison operator used by the quick filter. |

What default values are available?

Available default values depend on the field type.
| Option | Output behavior | Available for |
| --- | --- | --- |
| undefined | Field is omitted from the output. | All types |
| null | Field is returned as null. | All types |
| "" | Empty string. | text |
| false | Boolean false. | boolean |
| 0 | Numeric zero. | number, price |
| [] | Empty array. | array |
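The schema examples on this page write default_value as string keys such as "null", "zero", "false", and "empty_array". The sketch below assumes those keys map to the output values in the table. That mapping is inferred from the examples rather than confirmed, and the withDefault helper is hypothetical.

```javascript
// Assumed mapping from default_value keys (as they appear in this page's schema
// examples) to the output values listed in the table.
const DEFAULTS = new Map([
  ['null', () => null],
  ['zero', () => 0],
  ['false', () => false],
  ['empty_array', () => []],
]);

// Fill in a field that could not be populated. With no recognized default,
// the field is left out of the record, mirroring the undefined option.
function withDefault(record, name, field) {
  if (record[name] !== undefined) return record;
  const makeDefault = DEFAULTS.get(field.default_value);
  return makeDefault ? { ...record, [name]: makeDefault() } : record;
}

let row = { title: 'ErgoDesk Pro' };
row = withDefault(row, 'price', { type: 'price', default_value: 'zero' });
row = withDefault(row, 'images', { type: 'array', default_value: 'empty_array' });
row = withDefault(row, 'rating', { type: 'number' }); // no default: stays omitted
console.log(row);
```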

What output field types are available?

Scraper Studio supports the following user-defined output field types.

text

Free-form text.
{
  "type": "text",
  "active": true,
  "required": false,
  "default_value": "null"
}
Example value:
"Laptop 15-inch Pro"

number

Integer or decimal number. Numeric strings can be converted to numbers.
{
  "type": "number",
  "active": true,
  "format": {
    "decimal_places": 2
  },
  "default_value": "zero"
}
Example value:
11.23
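As a rough illustration of numeric-string conversion combined with decimal_places, consider the plain JavaScript sketch below. The toNumber helper is hypothetical, and returning null for non-numeric input is this sketch's own choice, not documented platform behavior.

```javascript
// Convert a collected value to a number and round it to decimalPlaces.
// Returns null when the value is not numeric.
function toNumber(value, decimalPlaces) {
  const text = typeof value === 'string' ? value.trim() : value;
  if (text === '' || text === null || typeof text === 'boolean') return null;
  const parsed = Number(text);
  if (!Number.isFinite(parsed)) return null;
  return decimalPlaces === undefined ? parsed : Number(parsed.toFixed(decimalPlaces));
}

console.log(toNumber('11.234', 2)); // 11.23
console.log(toNumber(' 42 ', 0));   // 42
console.log(toNumber('n/a', 2));    // null
```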

url

A URL string. Only http:// and https:// URLs are accepted.
{
  "type": "url",
  "active": true,
  "required": true
}
Example value:
"https://example.com/product/123"

price

A monetary value represented as a numeric value and currency code.
{
  "type": "price",
  "active": true,
  "format": {
    "preset": "us_style"
  }
}
Example value:
{
  "value": 99.99,
  "currency": "USD"
}
Price format presets:
| Preset | Description | Example |
| --- | --- | --- |
| us_style | US-style formatting. | $1,234.56 |
| locale | Locale-aware formatting. Requires locale. | 1.234,56 € |
| number | Numeric value only. | 1234.56 |
| raw | Raw object. | { "value": 1234.56, "currency": "USD" } |
| custom | Template using {[symbol]}, {[value]}, {[currency]}. | USD 1234.56 |
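The presets can be approximated in plain JavaScript with the standard Intl.NumberFormat API. This is a sketch of the idea, not the platform's formatter: the formatPrice helper, the locale and template property names, and the small SYMBOLS table are assumptions made for the example.

```javascript
// Small symbol table for the sketch; a real formatter would cover more currencies.
const SYMBOLS = { USD: '$', EUR: '€', GBP: '£' };

// Format a { value, currency } price according to a preset name.
function formatPrice(price, format = { preset: 'raw' }) {
  const { value, currency } = price;
  switch (format.preset) {
    case 'us_style':
      return new Intl.NumberFormat('en-US', { style: 'currency', currency }).format(value);
    case 'locale':
      // format.locale is an assumed property name, for example 'de-DE'.
      return new Intl.NumberFormat(format.locale, { style: 'currency', currency }).format(value);
    case 'number':
      return value;
    case 'custom':
      // format.template is an assumed property name holding the template string.
      return format.template
        .replaceAll('{[symbol]}', () => SYMBOLS[currency] ?? '')
        .replaceAll('{[value]}', () => String(value))
        .replaceAll('{[currency]}', () => currency);
    default:
      return price; // raw: return the object unchanged
  }
}

const price = { value: 1234.56, currency: 'USD' };
console.log(formatPrice(price, { preset: 'us_style' }));
console.log(formatPrice(price, { preset: 'number' }));
console.log(formatPrice(price, { preset: 'custom', template: '{[currency]} {[value]}' }));
```

The replacer functions in the custom branch avoid the special meaning that $ has in string replacement patterns.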

boolean

A true/false value.
{
  "type": "boolean",
  "active": true,
  "default_value": "false"
}
Example value:
true

date

Date or timestamp value.
{
  "type": "date",
  "active": true,
  "format": {
    "preset": "iso"
  }
}
Date format presets:
| Preset | Description | Example |
| --- | --- | --- |
| iso | ISO 8601 string. | 2024-03-15T10:30:00.000Z |
| timestamp | Unix timestamp in milliseconds. | 1710498600000 |
| locale | Locale-aware readable date. | March 15, 2024 at 10:30:00 AM UTC |
Locale formatting can include:
  • Locale, for example en-US, fr-FR, ru-RU
  • Date style: long, medium, short
  • Time style: long, medium, short
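The three date presets correspond closely to standard JavaScript date APIs. The sketch below is an approximation under assumptions: formatDate is a hypothetical helper, and the locale, date_style, and time_style property names are invented for the example and are not taken from the schema reference.

```javascript
// Format a Date according to a preset name. The time zone is pinned to UTC so
// the result does not depend on the machine running the sketch.
function formatDate(date, format) {
  switch (format.preset) {
    case 'timestamp':
      return date.getTime(); // milliseconds since the Unix epoch
    case 'locale':
      return new Intl.DateTimeFormat(format.locale, {
        dateStyle: format.date_style,
        timeStyle: format.time_style,
        timeZone: 'UTC',
      }).format(date);
    case 'iso':
    default:
      return date.toISOString();
  }
}

const collectedAt = new Date('2024-03-15T10:30:00.000Z');
console.log(formatDate(collectedAt, { preset: 'iso' }));
console.log(formatDate(collectedAt, { preset: 'timestamp' }));
console.log(
  formatDate(collectedAt, {
    preset: 'locale',
    locale: 'en-US',
    date_style: 'long',
    time_style: 'long',
  })
);
```

The exact wording of the locale output depends on the locale data bundled with the JavaScript runtime.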

country

A two-letter ISO 3166-1 alpha-2 country code.
{
  "type": "country",
  "active": true
}
Example value:
"US"

phone

A phone number parsed into structured components.
{
  "type": "phone",
  "active": true
}
Example value:
{
  "area_code": 1,
  "number": 5555555555,
  "extension": "1234"
}

image

A downloaded or referenced image.
{
  "type": "image",
  "active": true,
  "download": true,
  "format": {
    "behavior": "object",
    "content_type": true
  }
}
Image behavior options:
| Behavior | Description |
| --- | --- |
| simple | Returns the filename if downloaded, or the source URL if not downloaded. |
| object | Returns an object with file path, remote URL, and optionally content type. |
Example object output:
{
  "file_path": "file_xxxxxxxxxxx.img",
  "remote_url": "https://example.com/image.png",
  "content_type": "image/png"
}
When Download is enabled, the file is stored in the configured delivery destination. File downloads are billed separately from page loads where applicable.
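The difference between the two behaviors can be sketched as follows. The shapeImage helper and its input object (file, sourceUrl, contentType) are hypothetical. Only the output keys file_path, remote_url, and content_type come from the example above.

```javascript
// Shape an image field value according to the behavior setting.
// file is the stored filename, or null when the image was not downloaded.
function shapeImage({ file, sourceUrl, contentType }, format) {
  if (format.behavior === 'object') {
    const out = { file_path: file, remote_url: sourceUrl };
    if (format.content_type) out.content_type = contentType;
    return out;
  }
  // simple: filename when downloaded, otherwise the source URL
  return file ?? sourceUrl;
}

const image = {
  file: 'file_xxxxxxxxxxx.img',
  sourceUrl: 'https://example.com/image.png',
  contentType: 'image/png',
};

console.log(shapeImage(image, { behavior: 'simple' }));
console.log(shapeImage({ ...image, file: null }, { behavior: 'simple' }));
console.log(shapeImage(image, { behavior: 'object', content_type: true }));
```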

video, pdf and doc

These file types use the same download and behavior settings as image.
{
  "type": "video",
  "active": true,
  "download": true
}
Supported file field types:
| Type | Description |
| --- | --- |
| video | Downloaded or referenced video file. |
| pdf | Downloaded or referenced PDF file. |
| doc | Downloaded or referenced document file. |

array

An ordered list of values. The element type is defined with items.
{
  "type": "array",
  "active": true,
  "normalize": {
    "empty": "keep"
  },
  "default_value": "empty_array",
  "items": {
    "type": "text"
  }
}
Empty array behavior:
| Option | Description |
| --- | --- |
| keep | Keep empty arrays as []. |
| drop | Replace empty arrays with the configured default value. |
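A minimal sketch of the keep and drop options, assuming a hypothetical normalizeArray helper that receives the normalize setting and the already-resolved default value:

```javascript
// Apply the empty-array setting: keep returns [] unchanged, drop substitutes
// the configured default value. Non-empty arrays are never changed.
function normalizeArray(value, normalize, defaultValue) {
  const isEmpty = Array.isArray(value) && value.length === 0;
  if (isEmpty && normalize.empty === 'drop') return defaultValue;
  return value;
}

console.log(normalizeArray([], { empty: 'keep' }, null));        // []
console.log(normalizeArray([], { empty: 'drop' }, null));        // null
console.log(normalizeArray(['a.png'], { empty: 'drop' }, null)); // [ 'a.png' ]
```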
Array of nested objects:
{
  "type": "array",
  "active": true,
  "items": {
    "type": "object",
    "fields": {
      "name": {
        "type": "text",
        "active": true
      },
      "price": {
        "type": "price",
        "active": true
      }
    }
  }
}

object

A nested object with its own subfields.
{
  "type": "object",
  "active": true,
  "fields": {
    "title": {
      "type": "text",
      "active": true
    },
    "price": {
      "type": "price",
      "active": true
    },
    "in_stock": {
      "type": "boolean",
      "active": true
    }
  }
}

What HTML conversion field types are available?

| Type | Description | Example value |
| --- | --- | --- |
| html2text | HTML converted to readable text. | Product title\nDescription text |
| html2markdown | HTML converted to Markdown. | ## Product title |
| html2html | Raw HTML content. | <div class="product"><h1>Title</h1></div> |
| html2ldjson | Structured data from application/ld+json scripts. | {"@type":"Product","name":"Widget"} |
Example:
{
  "type": "html2markdown",
  "active": true
}

How do I validate field values?

Custom validation lets you define JavaScript rules that run on every collected value for a field. Throw an error to mark the value as invalid:
function validate(v) {
  if (!v)
    throw new Error('Value is required');

  return true;
}
Rows that fail validation are treated as error rows when validation is configured for required output quality.
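To try a rule locally before pasting it into the editor, a small harness like the one below can help. The validate function follows the shape shown above. The runValidation helper is hypothetical, as is the convention of attaching the message to the row's error field.

```javascript
// Example rule in the same shape as the validate function above:
// throw to reject a value, return true to accept it.
function validate(v) {
  if (typeof v !== 'string' || v.trim() === '')
    throw new Error('Value is required');
  if (v.length > 200)
    throw new Error('Value is longer than 200 characters');
  return true;
}

// Hypothetical runner: apply a rule to one field of every row and attach the
// error message to rows that fail.
function runValidation(rows, fieldName, rule) {
  return rows.map((row) => {
    try {
      rule(row[fieldName]);
      return row;
    } catch (err) {
      return { ...row, error: `${fieldName}: ${err.message}` };
    }
  });
}

const checked = runValidation(
  [{ title: 'ErgoDesk Pro' }, { title: '   ' }],
  'title',
  validate
);
console.log(checked);
```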

How do I format field values?

Custom formatting lets you transform a field value before output delivery.
function process(value) {
  return value;
}
Use custom formatting when built-in formatting options do not match the required output shape.
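As one possible formatter, the sketch below collapses whitespace in a text value. It keeps the process(value) signature shown above. The surrounding formatTitle constant exists only so the snippet can run in Node.js without shadowing Node's global process object.

```javascript
// Collapse runs of whitespace and trim the ends; non-string values pass through.
// Written as a named function expression so the name process stays local.
const formatTitle = function process(value) {
  if (typeof value !== 'string') return value;
  return value.replace(/\s+/g, ' ').trim();
};

console.log(formatTitle('  ErgoDesk \n  Pro  ')); // ErgoDesk Pro
console.log(formatTitle(42));                     // 42
```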

When do I use collect() vs set_lines()?

The way records are emitted affects the output dataset.
| Function | Behavior | Use when |
| --- | --- | --- |
| collect(data) | Appends one record to the dataset. | Most scrapers. |
| set_lines(data) | Replaces previously emitted records with the latest set. | Progressive collection where the latest snapshot should be preserved. |
Example with collect():
collect({
  title,
  price,
  availability,
});
Example with set_lines():
set_lines(products);
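The two behaviors can be compared with in-memory stand-ins. These are not the IDE implementations: createAppendStore and createSnapshotStore are hypothetical helpers that only mimic the append and replace semantics described in the table, and the sketch does not model mixing both functions in one run.

```javascript
// Stand-in for collect(): every call appends one record.
function createAppendStore() {
  const records = [];
  return {
    records,
    collect(data) {
      records.push(data);
    },
  };
}

// Stand-in for set_lines(): every call replaces the previously emitted set.
function createSnapshotStore() {
  let records = [];
  return {
    get records() {
      return records;
    },
    set_lines(lines) {
      records = [...lines];
    },
  };
}

const appended = createAppendStore();
appended.collect({ title: 'A' });
appended.collect({ title: 'B' });

const snapshot = createSnapshotStore();
snapshot.set_lines([{ title: 'A' }]);
snapshot.set_lines([{ title: 'A' }, { title: 'B' }, { title: 'C' }]);

console.log(appended.records.length); // 2
console.log(snapshot.records.length); // 3
```

With the snapshot store, only the records from the last call remain, which matches the progressive collection case in the table.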

What system fields can I add?

System fields are generated by Scraper Studio. Their names and types are fixed. You can toggle them on or off in the output schema configuration under Additional data.
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| input | string / object | On | The input value or object that triggered the crawl. |
| prime_input | string / object | Off | The original root input when discovery or pagination is used. |
| error | string | On | Explanation of why collection failed for the row. |
| error_code | string | On | Structured error code, such as validation or timeout. |
| warning | string | On | System-level warning for the row. |
| warning_code | string | On | Structured warning code. |
| status_code | number | Off | HTTP-like crawl result code, such as 200 or 404. |
| timestamp | date | Off | Date and time the page was collected. |
| requested_timestamp | date | Off | Date and time the job was triggered. |
| page_id | string | Off | Unique identifier for the page crawl. |
| job_id | string | Off | ID of the job that produced the row. |
| collector_id | string | Off | ID of the collector. |
| collector_queue | string | Off | Queue the job was submitted to. |
| crawl_type | string | Off | Crawl or parser type used for the row. |
| screenshot | file | Off | Screenshot of the browser page at collection time. |
| html | file | Off | Full HTML snapshot of the page. |
| warc | file | Off | WARC archive of the page. |
When screenshot, html, or warc is active, the corresponding file is downloaded to the configured storage destination.

How do I add a screenshot watermark?

When the screenshot system field is enabled, a watermark can be added to the screenshot. Each watermark item has a label and a data source.
| Source | Description |
| --- | --- |
| Browser URL | URL the browser was on when the screenshot was taken. |
| Timestamp | Timestamp of the screenshot capture. |
| Input value | Value from the collector input, such as url or config.country. |

What does a full output schema look like?

{
  "type": "object",
  "fields": {
    "url": {
      "type": "url",
      "active": true,
      "required": true
    },
    "title": {
      "type": "text",
      "active": true
    },
    "price": {
      "type": "price",
      "active": true,
      "format": {
        "preset": "us_style"
      },
      "default_value": "zero"
    },
    "rating": {
      "type": "number",
      "active": true,
      "format": {
        "decimal_places": 1
      }
    },
    "in_stock": {
      "type": "boolean",
      "active": true,
      "default_value": "false"
    },
    "listed_date": {
      "type": "date",
      "active": true,
      "format": {
        "preset": "iso"
      }
    },
    "country": {
      "type": "country",
      "active": true
    },
    "images": {
      "type": "array",
      "active": true,
      "normalize": {
        "empty": "keep"
      },
      "default_value": "empty_array",
      "items": {
        "type": "image",
        "download": true
      }
    },
    "seller": {
      "type": "object",
      "active": true,
      "fields": {
        "name": {
          "type": "text",
          "active": true
        },
        "phone": {
          "type": "phone",
          "active": true
        }
      }
    }
  }
}
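When reviewing a large schema like this one, it can help to flatten it into a list of field paths. The sketch below is a hypothetical listFields helper that walks the fields and items keys used throughout this page. It runs on a shortened copy of the schema, and the path notation is this example's own.

```javascript
// Recursively list every field as "path: type", descending into object fields
// (fields) and into arrays (items), including arrays of objects.
function listFields(schema, prefix = '') {
  const lines = [];
  for (const [name, field] of Object.entries(schema.fields ?? {})) {
    const path = prefix + name;
    lines.push(`${path}: ${field.type}`);
    if (field.type === 'object') lines.push(...listFields(field, `${path}.`));
    if (field.type === 'array' && field.items) {
      lines.push(`${path}[]: ${field.items.type}`);
      if (field.items.type === 'object') {
        lines.push(...listFields(field.items, `${path}[].`));
      }
    }
  }
  return lines;
}

// Shortened copy of the schema above, enough to exercise each branch.
const schema = {
  type: 'object',
  fields: {
    title: { type: 'text', active: true },
    images: { type: 'array', active: true, items: { type: 'image', download: true } },
    seller: {
      type: 'object',
      active: true,
      fields: {
        name: { type: 'text', active: true },
        phone: { type: 'phone', active: true },
      },
    },
  },
};

console.log(listFields(schema).join('\n'));
```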
