- Input schema defines the fields a collection run accepts, such as `url`, `keyword`, `country`, `date`, or any custom field your interaction code reads from `input`.
- Output schema defines the structured fields the scraper returns, based on the data emitted by `collect()`.
What is the input schema?
The input schema defines the values your scraper can receive at runtime. A scraper often uses a `url` input, but inputs are not limited to URLs. Depending on the collector logic, inputs can be keywords, locations, dates, IDs, countries, or any custom parameter.
Your interaction code reads input values through the input object:
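A minimal sketch of reading inputs, assuming a collector with `url` and `keyword` fields. In Scraper Studio the `input` object is provided by the runtime; the stand-in object and the `buildSearchUrl` helper below are hypothetical and exist only so the snippet runs on its own.

```javascript
// Stand-in for the runtime-provided `input` object (assumption for this sketch).
const input = { url: "https://example.com/search", keyword: "wireless mouse" };

// Hypothetical helper: builds a search URL from two input fields.
function buildSearchUrl(inp) {
  const target = new URL(inp.url);
  // Fields are read by name, matching the keys defined in the input schema.
  target.searchParams.set("q", inp.keyword);
  return target.toString();
}

console.log(buildSearchUrl(input));
```

Each key read from `input` should match a field name defined in the input schema.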
Define input parameters
To define the input schema in the Bright Data Scraper Studio IDE:
- Open your collector in the Scraper Studio IDE.
- Go to the Code tab.
- Click Add input parameter.
- Enter a field name, for example `url`, `keyword`, `country`, or `date`.
- Add an optional description.
- Select the field type.
- Mark the field as Required if the collector cannot run without it.
- Click Save.
- Click Save to Production when the collector is ready.
What are the input parameter settings?
| Setting | Description |
|---|---|
| Field name | The key used in code as input.<field_name>. |
| Description | Optional explanation of what value the user should provide. |
| Type | The expected value type, such as text/string, boolean, date, or country. |
| Required | If enabled, each collection input must include this field. |
| Predefined values | Optional fixed choices, when supported by the selected type. Example: country type. |
| Case-insensitive | Treats matching values as case-insensitive, when supported by the field configuration. |
What do collection inputs look like?
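As an illustration, collection inputs can be pictured as a list of objects keyed by input field name. The sketch below also checks each input against the Required, Predefined values, and Case-insensitive settings from the table above. The `params` shape and the `checkInput` helper are hypothetical; they are not the format Scraper Studio stores or the checks it actually runs.

```javascript
// Hypothetical parameter definitions mirroring the settings table above.
const params = [
  { name: "url", required: true },
  { name: "country", required: false, values: ["US", "DE"], caseInsensitive: true },
];

// Example collection inputs: one object per item to collect.
const inputs = [
  { url: "https://example.com/p/1", country: "us" },
  { url: "https://example.com/p/2" },
  { country: "FR" },
];

// Returns a list of problems for one input; an empty list means it passes.
function checkInput(inp, defs) {
  const problems = [];
  for (const def of defs) {
    const value = inp[def.name];
    if (value === undefined || value === "") {
      if (def.required) problems.push(`missing required field: ${def.name}`);
      continue;
    }
    if (def.values) {
      // Compare in lower case only when the field is case-insensitive.
      const norm = (s) => (def.caseInsensitive ? String(s).toLowerCase() : String(s));
      if (!def.values.map(norm).includes(norm(value))) {
        problems.push(`unexpected value for ${def.name}: ${value}`);
      }
    }
  }
  return problems;
}

inputs.forEach((inp, i) => console.log(i, checkInput(inp, params)));
```

In this sketch the first two inputs pass, and the third is reported twice: it has no `url` and its `country` is not one of the predefined values.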
A URL-based collector can accept one or more URLs.
What is the output schema?
The output schema defines the data point structure and how the data is organized. In the Bright Data Scraper Studio IDE, the output schema is usually generated from the object passed to `collect()`.
Update the output schema
There are two ways to update the output schema: automatically from parser code or manually in the schema editor.
Update the schema automatically
- Add or change fields in your parser code.
- Run a preview to confirm that required fields return as expected.
- Click Save to Production.
- If Scraper Studio detects schema changes, click Update schema.
- Click Save to Production again.
Update the schema manually
- Click Edit schema in the IDE.
- Add or edit fields by name and type.
- Configure required flags, default values, formatting, validation, or PII settings.
- Save the schema.
- Click Save to Production.
What is the Output Schema Editor?
The Output Schema Editor defines exactly which fields your collector returns and how each field is validated, formatted and delivered. The editor has two views:
| View | Description |
|---|---|
| Table view | Visual list of fields with toggles and field configuration. |
| JSON view | Direct JSON editing for the schema object. |
How is an output schema structured?
An output schema is a JSON object with a top-level `type` and a `fields` object:
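A rough sketch of that shape, written as a JavaScript object so it can be run. The value of the top-level `type`, the field names, and the `activeFields` helper are assumptions made for illustration; only the `type` and `fields` keys and the per-field property names come from this page.

```javascript
// Sketch of an output schema. The "object" value for the top-level type
// and all field names are assumptions, not confirmed platform values.
const schema = {
  type: "object",
  fields: {
    title: { type: "text", required: true },
    price: { type: "price" },
    in_stock: { type: "boolean", active: false },
  },
};

// Hypothetical helper: lists fields included in the output.
// `active` is treated as true when omitted, matching its documented default.
function activeFields(s) {
  return Object.entries(s.fields)
    .filter(([, def]) => def.active !== false)
    .map(([name]) => name);
}

console.log(activeFields(schema));
```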
What properties can output fields have?
These properties apply to user-defined output fields.
| Property | Type | Description |
|---|---|---|
| `type` | string | Field type, such as text, number, price, image, or object. |
| `active` | boolean | Whether the field is included in the output. Default: true. |
| `required` | boolean | If true, rows with no valid value for this field are marked as errors. |
| `default_value` | string | Value used when the field cannot be populated. |
| `description` | string | Human-readable explanation of the field. |
| `pii` | boolean | Marks the field as containing personally identifiable information. |
| `custom_formatting` | object | Custom JavaScript formatter for advanced output shaping. |
| `custom_validation` | object | Define validation rules that run on every collected record. |
Configure a field in the side panel
The side panel contains field-specific settings.
| Setting | Description |
|---|---|
| Field name | The key used in the output JSON. Available for user-defined fields. |
| Display name | Optional UI label, separate from the output key. |
| Data type | The field type. Changing the type resets type-specific settings. |
| Active | Includes or excludes the field from output. |
| Required | Marks rows as errors when this field is missing or invalid. |
| Default value | Fallback value when the field cannot be populated. |
| Description | Optional human-readable description of the field. |
| Contains PII | Marks the field as containing personally identifiable information. |
| Format | Type-specific output formatting. Example: price/money type. |
| Download | For media/file fields, downloads the file to configured storage. |
| Array values | Defines the item type for array fields. |
| Subfields | Defines nested fields for object fields. |
| Normalize | Controls empty array behavior. |
| Set as quick filter | Exposes the field as a filter in the dataset viewer. |
| Quick filter operator | Defines the comparison operator used by the quick filter. |
What default values are available?
Available default values depend on the field type.
| Option | Output behavior | Available for |
|---|---|---|
| `undefined` | Field is omitted from the output. | All types |
| `null` | Field is returned as null. | All types |
| `""` | Empty string. | text |
| `false` | Boolean false. | boolean |
| `0` | Numeric zero. | number, price |
| `[]` | Empty array. | array |
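The difference between an `undefined` and a `null` default is easiest to see in code. The sketch below is a local illustration, not the platform's implementation: `applyDefaults` is a hypothetical helper, and treating `undefined` or `null` as "cannot be populated" is an assumption.

```javascript
// Applies each field's default when no value was collected.
// A default of `undefined` omits the key; `null` keeps the key with a null value.
function applyDefaults(record, fields) {
  const out = {};
  for (const [name, def] of Object.entries(fields)) {
    const value = record[name];
    const missing = value === undefined || value === null; // assumption
    const resolved = missing ? def.default_value : value;
    if (resolved !== undefined) out[name] = resolved;
  }
  return out;
}

const fields = {
  title: { type: "text", default_value: "" },
  rating: { type: "number", default_value: null },
  tags: { type: "array", default_value: [] },
  sku: { type: "text" }, // no default configured, so the key is omitted
};

console.log(applyDefaults({ title: "Widget" }, fields));
```

Here `rating` is delivered as `null`, `tags` as an empty array, and `sku` does not appear in the output at all.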
What output field types are available?
Scraper Studio supports the following user-defined output field types.
text
Free-form text.
number
Integer or decimal number. Numeric strings can be converted to numbers.
url
A URL string. Only http:// and https:// URLs are accepted.
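The `number` and `url` rules above can be approximated locally when you want to pre-check values in parser code. This is a sketch under stated assumptions: `toNumber` and `isAcceptedUrl` are hypothetical helpers, and the platform's own conversion rules may differ in edge cases.

```javascript
// Converts numeric strings to numbers; returns undefined when conversion fails.
function toNumber(value) {
  if (typeof value === "number") return Number.isFinite(value) ? value : undefined;
  if (typeof value !== "string" || value.trim() === "") return undefined;
  const n = Number(value.trim());
  return Number.isFinite(n) ? n : undefined;
}

// Accepts only absolute http:// and https:// URLs.
function isAcceptedUrl(value) {
  try {
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not parseable as an absolute URL
  }
}

console.log(toNumber("19.99"), toNumber("n/a"));
console.log(isAcceptedUrl("https://example.com/item"), isAcceptedUrl("ftp://example.com/file"));
```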
price
A monetary value represented as a numeric value and currency code.
| Preset | Description | Example |
|---|---|---|
| `us_style` | US-style formatting. | `$1,234.56` |
| `locale` | Locale-aware formatting. Requires locale. | `1.234,56 €` |
| `number` | Numeric value only. | `1234.56` |
| `raw` | Raw object. | `{ "value": 1234.56, "currency": "USD" }` |
| `custom` | Template using `{[symbol]}`, `{[value]}`, `{[currency]}`. | `USD 1234.56` |
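To make the presets concrete, here is a sketch that reproduces them with the standard `Intl.NumberFormat` API. It is an approximation, not Scraper Studio's formatter: `formatPrice`, its `options` argument, and the small `SYMBOLS` lookup are hypothetical.

```javascript
// Hypothetical symbol lookup, used only by the "custom" template in this sketch.
const SYMBOLS = { USD: "$", EUR: "€", GBP: "£" };

function formatPrice(price, preset, options = {}) {
  const { value, currency } = price;
  switch (preset) {
    case "us_style":
      return new Intl.NumberFormat("en-US", { style: "currency", currency }).format(value);
    case "locale":
      if (!options.locale) throw new Error("The locale preset requires a locale");
      return new Intl.NumberFormat(options.locale, { style: "currency", currency }).format(value);
    case "number":
      return value;
    case "raw":
      return { value, currency };
    case "custom":
      // Replacer functions avoid special handling of "$" in replacement strings.
      return options.template
        .replaceAll("{[symbol]}", () => SYMBOLS[currency] ?? currency)
        .replaceAll("{[value]}", () => String(value))
        .replaceAll("{[currency]}", () => currency);
    default:
      throw new Error(`Unknown preset: ${preset}`);
  }
}

const price = { value: 1234.56, currency: "USD" };
console.log(formatPrice(price, "us_style"));
console.log(formatPrice(price, "custom", { template: "{[currency]} {[value]}" }));
```

The exact spacing and symbol placement of the `locale` preset depend on the locale data of the JavaScript runtime.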
boolean
A true/false value.
date
Date or timestamp value.
| Preset | Description | Example |
|---|---|---|
| `iso` | ISO 8601 string. | `2024-03-15T10:30:00.000Z` |
| `timestamp` | Unix timestamp in milliseconds. | `1710494400000` |
| `locale` | Locale-aware readable date. | `March 15, 2024 at 10:30:00 AM UTC` |

Options for the `locale` preset:
- Locale, for example `en-US`, `fr-FR`, `ru-RU`
- Date style: `long`, `medium`, `short`
- Time style: `long`, `medium`, `short`
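The date presets map closely onto standard JavaScript date APIs. The sketch below shows one way to reproduce them; `formatDate` and its `options` argument are hypothetical, and rendering the `locale` preset in UTC is an assumption.

```javascript
function formatDate(value, preset, options = {}) {
  const date = new Date(value);
  if (Number.isNaN(date.getTime())) throw new Error("Invalid date");
  switch (preset) {
    case "iso":
      return date.toISOString();
    case "timestamp":
      return date.getTime(); // milliseconds since the Unix epoch
    case "locale":
      return new Intl.DateTimeFormat(options.locale ?? "en-US", {
        dateStyle: options.dateStyle ?? "long",
        timeStyle: options.timeStyle ?? "long",
        timeZone: "UTC", // assumption for this sketch
      }).format(date);
    default:
      throw new Error(`Unknown preset: ${preset}`);
  }
}

const sample = "2024-03-15T10:30:00Z";
console.log(formatDate(sample, "iso"));
console.log(formatDate(sample, "timestamp"));
console.log(formatDate(sample, "locale", { locale: "fr-FR", dateStyle: "medium", timeStyle: "short" }));
```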
country
A two-letter ISO 3166-1 alpha-2 country code.
phone
A phone number parsed into structured components.
image
A downloaded or referenced image.
| Behavior | Description |
|---|---|
| `simple` | Returns the filename if downloaded, or the source URL if not downloaded. |
| `object` | Returns an object with file path, remote URL, and optionally content type. |
video, pdf and doc
These file types use the same download and behavior settings as image.
| Type | Description |
|---|---|
| `video` | Downloaded or referenced video file. |
| `pdf` | Downloaded or referenced PDF file. |
| `doc` | Downloaded or referenced document file. |
array
An ordered list of values. The element type is defined with items.
| Option | Description |
|---|---|
| `keep` | Keep empty arrays as `[]`. |
| `drop` | Replace empty arrays with the configured default value. |
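The two normalize options can be sketched in a few lines. `normalizeArray` is a hypothetical helper written for illustration; it only mirrors the behavior described in the table.

```javascript
// mode "keep" leaves [] as is; mode "drop" swaps it for the field's default value.
function normalizeArray(value, mode, defaultValue) {
  const isEmpty = Array.isArray(value) && value.length === 0;
  return isEmpty && mode === "drop" ? defaultValue : value;
}

console.log(normalizeArray([], "keep", null));
console.log(normalizeArray([], "drop", null));
console.log(normalizeArray(["a"], "drop", null));
```

Non-empty arrays pass through unchanged in both modes.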
object
A nested object with its own subfields.
What HTML conversion field types are available?
| Type | Description | Example value |
|---|---|---|
| `html2text` | HTML converted to readable text. | `Product title\nDescription text` |
| `html2markdown` | HTML converted to Markdown. | `## Product title` |
| `html2html` | Raw HTML content. | `<div class="product"><h1>Title</h1></div>` |
| `html2ldjson` | Structured data from application/ld+json scripts. | `{"@type":"Product","name":"Widget"}` |
How do I validate field values?
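A validation rule is, in essence, a function that throws when a value is not acceptable. The sketch below illustrates that pattern for a numeric price. The names `validatePrice` and `runValidation` are hypothetical, and the way Scraper Studio registers and invokes rules is not shown here; `runValidation` is only a local stand-in so the example runs on its own.

```javascript
// Hypothetical rule for a numeric price: throws when the value is invalid.
function validatePrice(value) {
  if (typeof value !== "number" || Number.isNaN(value)) {
    throw new Error("price must be a number");
  }
  if (value < 0) throw new Error("price must not be negative");
}

// Local stand-in for the platform: runs a rule and reports the outcome.
function runValidation(rule, value) {
  try {
    rule(value);
    return { valid: true };
  } catch (err) {
    return { valid: false, error: err.message };
  }
}

console.log(runValidation(validatePrice, 19.99));
console.log(runValidation(validatePrice, -5));
```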
Custom validation lets you define JavaScript rules that run on every collected value for a field. Throw an error to mark the value as invalid.
How do I format field values?
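A formatter can be as small as a function that takes the collected value and returns the value to deliver. The sketch below is an illustration only: the name `formatTitle` and the assumption that a formatter receives a single value are hypothetical.

```javascript
// Hypothetical formatter: collapses runs of whitespace and trims a text value.
function formatTitle(value) {
  if (typeof value !== "string") return value; // leave non-text values untouched
  return value.replace(/\s+/g, " ").trim();
}

console.log(formatTitle("  Wireless \n  Mouse  "));
```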
Custom formatting lets you transform a field value before output delivery.
When do I use collect() vs set_lines()?
The way records are emitted affects the output dataset.
| Function | Behavior | Use when |
|---|---|---|
| `collect(data)` | Appends one record to the dataset. | Most scrapers. |
| `set_lines(data)` | Replaces previously emitted records with the latest set. | Progressive collection where the latest snapshot should be preserved. |
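The difference is easier to see with a small simulation. The functions below are local stand-ins that mimic the behavior described in the table; they are not the Scraper Studio implementations, and `createDataset` is a hypothetical helper.

```javascript
// Local stand-ins for collect() and set_lines(), for illustration only.
function createDataset() {
  let records = [];
  return {
    collect(data) { records.push(data); },         // appends one record
    set_lines(lines) { records = [...lines]; },    // replaces everything emitted so far
    records() { return records; },
  };
}

// collect(): every call adds a record.
const appended = createDataset();
appended.collect({ id: 1 });
appended.collect({ id: 2 });
console.log(appended.records().length);

// set_lines(): only the latest snapshot survives.
const snapshot = createDataset();
snapshot.set_lines([{ id: 1 }]);
snapshot.set_lines([{ id: 1 }, { id: 2 }, { id: 3 }]);
console.log(snapshot.records().length);
```

With `set_lines()`, a crawl that is interrupted midway still leaves the most recent complete snapshot in the dataset, which is why it suits progressive collection.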
What system fields can I add?
System fields are generated by Scraper Studio. Their names and types are fixed. You can toggle them on or off in the output schema configuration under Additional data.
| Field | Type | Default | Description |
|---|---|---|---|
| `input` | string / object | On | The input value or object that triggered the crawl. |
| `prime_input` | string / object | Off | The original root input when discovery or pagination is used. |
| `error` | string | On | Explanation of why collection failed for the row. |
| `error_code` | string | On | Structured error code, such as validation or timeout. |
| `warning` | string | On | System-level warning for the row. |
| `warning_code` | string | On | Structured warning code. |
| `status_code` | number | Off | HTTP-like crawl result code, such as 200 or 404. |
| `timestamp` | date | Off | Date and time the page was collected. |
| `requested_timestamp` | date | Off | Date and time the job was triggered. |
| `page_id` | string | Off | Unique identifier for the page crawl. |
| `job_id` | string | Off | ID of the job that produced the row. |
| `collector_id` | string | Off | ID of the collector. |
| `collector_queue` | string | Off | Queue the job was submitted to. |
| `crawl_type` | string | Off | Crawl or parser type used for the row. |
| `screenshot` | file | Off | Screenshot of the browser page at collection time. |
| `html` | file | Off | Full HTML snapshot of the page. |
| `warc` | file | Off | WARC archive of the page. |
When `screenshot`, `html`, or `warc` are active, the files are downloaded to the configured storage destination.
How do I add a screenshot watermark?
When the `screenshot` system field is enabled, a watermark can be added to the screenshot. Each watermark item has a label and a data source.
| Source | Description |
|---|---|
| Browser URL | URL the browser was on when the screenshot was taken. |
| Timestamp | Timestamp of the screenshot capture. |
| Input value | Value from the collector input, such as url or config.country. |
What does a full output schema look like?
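The sketch below combines the field properties and types described on this page into one schema, written as a JavaScript object so it can be checked with a small helper. It is an illustration under stated assumptions, not an exported schema: the top-level `type` value, the field names, and the shape of `items` are guesses, and system fields are left out because their configuration format is not described here. `unknownTypes` is a hypothetical helper.

```javascript
// Field types named on this page, including the HTML conversion types.
const DOCUMENTED_TYPES = [
  "text", "number", "url", "price", "boolean", "date", "country", "phone",
  "image", "video", "pdf", "doc", "array", "object",
  "html2text", "html2markdown", "html2html", "html2ldjson",
];

const fullSchema = {
  type: "object", // assumption: the actual top-level value may differ
  fields: {
    title: { type: "text", required: true, description: "Product title" },
    url: { type: "url", required: true },
    price: { type: "price", default_value: null },
    seller_email: { type: "text", pii: true, active: false },
    images: { type: "array", items: { type: "image" } }, // `items` shape is assumed
    description: { type: "html2text" },
  },
};

// Hypothetical check: returns the names of fields whose type is not documented.
function unknownTypes(schema) {
  return Object.entries(schema.fields)
    .filter(([, def]) => !DOCUMENTED_TYPES.includes(def.type))
    .map(([name]) => name);
}

console.log(unknownTypes(fullSchema));
```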
Related
- Develop a scraper: Step-by-step walkthrough of building a scraper in the IDE
- Functions reference: Interaction and parser functions with parameters and examples
- Initiate collection and delivery: Trigger collection and deliver output to your destination
- Trigger a scraper (API): Run a published collector for batch collection by API