What is Parsing?

Bright Data’s SERP API is a comprehensive solution that not only provides powerful scraping capabilities for various search engines but also includes advanced parsing functionality specifically for Google & Bing.

In SERP API, parsing is the process of transforming the raw HTML of a results page into structured data fields and values.

When parsing is activated, the data in the SERP HTML is structured into usable fields and values (such as “rank”, “link”, “title”, “description”, “rating”, and dozens more), enabling you to monitor competitor SERP rankings, analyze keyword trends, and gather market insights.

How to Send a Parsed Request

The following is the simplest GET parsed request with SERP API:

curl "https://www.google.com/search?q=pizza&brd_json=1" \
   --proxy brd.superproxy.io:22225 \
   --proxy-user brd-customer-<CUSTOMER_ID>-zone-<ZONE_NAME>:<ZONE_PASSWORD>

The above request is synchronous: the response is returned in real time. To send an asynchronous parsed request, see here.

Breakdown of a Basic Request

  • brd.superproxy.io: Address of our load balancer, which finds the fastest Super Proxy for your request.
  • 22225: Infrastructure port of our Super Proxies, used to receive your requests.
  • --proxy-user brd-customer-<CUSTOMER_ID>-zone-<ZONE_NAME>: Username authentication. In its most basic form, it defines your username and the zone to use for your request.
  • <ZONE_PASSWORD>: Zone password. All zones have passwords that are used for authentication.
  • brd_json=1: Returns parsed JSON instead of raw HTML.

By default, a SERP API response without the “brd_json=1” parameter returns the unparsed HTML of the targeted SERP. To receive a parsed JSON response, add the following parameter at the end of your search query:

# Returns a single parsed JSON (instead of raw HTML)
curl --proxy brd.superproxy.io:22225 \
  --proxy-user brd-customer-<CUSTOMER_ID>-zone-<ZONE_NAME>:<ZONE_PASSWORD> \
  -k "https://www.google.com/search?q=pizza&brd_json=1"

Parsing is supported for both the Google and Bing search engines.
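
The same parsed request can also be issued from a script. The following is a minimal Python sketch, not an official client. It assumes the proxy host and port shown in the curl examples, uses only the standard library, and disables certificate verification to mirror curl's -k flag. The function names build_parsed_url and fetch_parsed are hypothetical.

```python
import json
import ssl
import urllib.parse
import urllib.request

# Proxy address taken from the curl examples in this guide.
PROXY_HOST = "brd.superproxy.io:22225"


def build_parsed_url(query, engine_url="https://www.google.com/search"):
    """Build a search URL that asks for parsed JSON via brd_json=1."""
    params = {"q": query, "brd_json": 1}
    return f"{engine_url}?{urllib.parse.urlencode(params)}"


def fetch_parsed(query, customer_id, zone, password):
    """Send the request through the proxy and decode the JSON body.

    Assumption: credentials follow the
    brd-customer-<CUSTOMER_ID>-zone-<ZONE_NAME>:<ZONE_PASSWORD> pattern.
    """
    username = f"brd-customer-{customer_id}-zone-{zone}"
    proxy = f"http://{username}:{urllib.parse.quote(password, safe='')}@{PROXY_HOST}"

    # Equivalent of curl's -k: skip TLS certificate verification.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}),
        urllib.request.HTTPSHandler(context=context),
    )
    with opener.open(build_parsed_url(query), timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_parsed("pizza", "<CUSTOMER_ID>", "<ZONE_NAME>", "<ZONE_PASSWORD>") with real credentials should return the parsed response as a Python dictionary.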

Expected Parsed Output when using brd_json=1

The following is the exact JSON response received when sending the request above:

Next, we will examine a number of the important fields within the parsed JSON data to understand the type of structured data we have to offer. 

A Comprehensive Guide to SERP API’s Parsed JSON

At the top of the JSON response, you can find the “general” field which contains details about the search you ran and also includes the “results count” from the response.

The following fields can be found in the “general” field:

  • general.search_engine: the search engine used for the search.
  • general.query: the keywords used for the search.
  • general.results_cnt: the results count.
    Google doesn’t display a results count on mobile, so this field is supported only for desktop search results.
  • general.search_time: the response time to get the results page.
  • general.language: the language that was set for the search (default: hl=en).
  • general.location: the location that was targeted with the search (based on the “localization” and “geo-location” parameters).
  • general.mobile: the device the search was performed with (desktop or mobile).
  • general.basic_view: deprecated.
  • general.search_type: the type of search that was set in the request.
  • general.page_title: the results page title.
  • general.code_version: the Bright Data parser version.
  • general.timestamp: the time when the search was executed.
  • input.original_url: the URL used for the search, including all parameters applied to the search.
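
As an illustration of how these fields might be read, here is a small Python sketch. It assumes the response has already been decoded into a dictionary and that the keys are lowercase ("general", "input"); the helper name summarize_search is hypothetical.

```python
def summarize_search(parsed):
    """Pick a few search details out of the "general" and "input" nodes.

    Missing keys come back as None, for example results_cnt on mobile.
    """
    general = parsed.get("general", {})
    wanted = ("search_engine", "query", "results_cnt", "language", "mobile")
    summary = {key: general.get(key) for key in wanted}
    # Assumption: the original URL lives under a top-level "input" node.
    summary["original_url"] = parsed.get("input", {}).get("original_url")
    return summary


# Illustrative data only, not a real API response.
sample = {
    "general": {"search_engine": "google", "query": "pizza", "language": "en"},
    "input": {"original_url": "https://www.google.com/search?q=pizza&brd_json=1"},
}
print(summarize_search(sample))
```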

Starter fields to know

| JSON field | Description |
|---|---|
| type | The field type (site_link, text, rating, etc.) |
| title | The text header, mostly the link text. |
| description | The description under the link |
| referral_link | Redirection link |
| image | This field can contain the image base64 string or the image URL |
| image_alt | Image alternative name |

rank

  • “rank”: the position of the element relative to the other elements within the same component.
  • “global_rank”: the position of the element relative to all the elements in the SERP.
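
To make the difference between the two ranks concrete, the sketch below merges elements from several components and orders them by their page-wide position. It is written in Python and assumes each component is a list of dictionaries carrying a "global_rank" key; the function name and the default component list are illustrative.

```python
def order_by_global_rank(parsed, components=("organic", "top_ads", "bottom_ads")):
    """Merge elements from several components, sorted by position on the page."""
    merged = []
    for name in components:
        for element in parsed.get(name, []):
            # Skip elements that do not report a page-wide position.
            if "global_rank" in element:
                merged.append({"component": name, **element})
    return sorted(merged, key=lambda element: element["global_rank"])
```

Within one component the elements keep their own "rank" values (1, 2, 3, ...), while "global_rank" is what allows elements from different components to be compared.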

spelling

When your search terms are inaccurate, Google suggests other search terms, which will show under the “spelling” field.

Subfields:

  • original_text: the text that was searched.
  • original_empty: when true, no results were found for the original text.
  • auto_corrected_link: link to the suggested result.
  • auto_corrected_text: the text of the suggested link.

[Image: HTML view alongside the parsed JSON]
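
A script could use the “spelling” node to detect a corrected query. The Python sketch below assumes the subfield keys are lowercase and that the node is absent when Google makes no suggestion; the helper name spelling_suggestion is hypothetical.

```python
def spelling_suggestion(parsed):
    """Return the suggested search, or None when there is no "spelling" node."""
    spelling = parsed.get("spelling")
    if not spelling:
        return None
    return {
        "searched": spelling.get("original_text"),
        # True means the original text returned no results.
        "no_results": bool(spelling.get("original_empty", False)),
        "suggestion": spelling.get("auto_corrected_text"),
        "suggestion_link": spelling.get("auto_corrected_link"),
    }
```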

reviews, rating

Some of the components within a SERP can include the “reviews” and “rating” fields.

[Image: HTML view alongside the parsed JSON]

extensions

Some SERP results include site sub-links (also known as site links), displayed as a vertical or horizontal list.

Vertical - marked with “extended”: true

[Image: HTML view alongside the parsed JSON]

Horizontal - not marked with “extended”: true

[Image: HTML view alongside the parsed JSON]
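
The “extended” flag can be used to separate the two layouts. This Python sketch assumes that a result holds its site links in an "extensions" list of dictionaries, which is an assumption about the response shape; split_site_links is a hypothetical helper.

```python
def split_site_links(result):
    """Split a result's extensions into (vertical, horizontal) site links."""
    vertical, horizontal = [], []
    for extension in result.get("extensions", []):
        # Vertical site links are marked with "extended": true.
        if extension.get("extended") is True:
            vertical.append(extension)
        else:
            horizontal.append(extension)
    return vertical, horizontal
```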

These are breadcrumbs from the URL in the result:

[Image: HTML view alongside the parsed JSON]

organic

The main search results are called organic results and are located in the “organic” JSON node.

[Image: HTML view alongside the parsed JSON]
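
For rank monitoring, the organic node is often flattened into rows. The following Python sketch exports “rank”, “title”, and “link” as CSV text. It assumes "organic" is a list of dictionaries with those keys; organic_to_csv is a hypothetical helper.

```python
import csv
import io


def organic_to_csv(parsed):
    """Render the organic results as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["rank", "title", "link"])
    for item in parsed.get("organic", []):
        # Missing values are written as empty cells.
        writer.writerow([item.get("rank"), item.get("title"), item.get("link")])
    return buffer.getvalue()
```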

Ads

There are four different locations for ads within a SERP, and each is parsed separately:

  • top_ads: Ads that are located at the top of the SERP
  • top_pla: Ads that are located within a special carousel at the top of a SERP
  • jackpot_pla: Ads that are located within the right side panel in shopping ads. It usually appears when a particular product matches your search perfectly.
  • bottom_ads: Ads that are located at the bottom of the SERP
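
Because each location is parsed separately, a script usually has to look in all four nodes. The Python sketch below gathers them into one dictionary. It assumes each node is either a list of ads or a single ad object, which is not confirmed by this guide; collect_ads is a hypothetical helper.

```python
AD_LOCATIONS = ("top_ads", "top_pla", "jackpot_pla", "bottom_ads")


def collect_ads(parsed):
    """Return a dict mapping each ad location to a list of its ads."""
    ads = {}
    for location in AD_LOCATIONS:
        value = parsed.get(location)
        if value is None:
            ads[location] = []          # location not present on this SERP
        elif isinstance(value, dict):
            ads[location] = [value]     # single ad object, wrapped in a list
        else:
            ads[location] = list(value)
    return ads
```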

Please note: By default, SERP API displays “organic” ads (based on IP location, cookies, etc.). If you want SERP API to display ads differently (incognito, adtest), you can change this in your SERP API zone in the control panel.

top_ads

[Image: HTML view alongside the parsed JSON]

top_pla

jackpot_pla

bottom_ads

people_also_ask

The “people also ask” section includes questions Google automatically generates based on queries it believes are related to your question.

answers

In a SERP, the “people also ask” (PAA) box connects each question to answers that users can click to read. This can help people better understand their initial question without clicking on other results. Each question here has its answer under the “answers” element.

[Image: HTML view alongside the parsed JSON]
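
The question-to-answers relationship can be flattened into a lookup table. This Python sketch assumes "people_also_ask" is a list of dictionaries and that the question text is stored under a "question" key; only the "answers" element is named in this guide, so the other key is an assumption.

```python
def questions_with_answers(parsed):
    """Map each "people also ask" question to its list of answers."""
    lookup = {}
    for entry in parsed.get("people_also_ask", []):
        question = entry.get("question")  # assumed key name
        if question is None:
            continue
        lookup[question] = entry.get("answers", [])
    return lookup
```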

videos

[Image: HTML view alongside the parsed JSON]

twitter

[Image: HTML view alongside the parsed JSON]

Top stories

[Image: HTML view alongside the parsed JSON]

knowledge

Provides a brief overview of the searched topic in a knowledge panel (on the right side on desktop, at the top on mobile).

[Image: HTML view alongside the parsed JSON]

Widgets

knowledge.widgets.social media presence - includes profiles

knowledge.widgets.sideways refinements - people also ask

recipes

When you search for food items, the SERP may also contain recipes.

snack_pack_map & snack_pack

Relates to the Google Maps results displayed in a SERP.

snack_pack_map

The map itself is represented in the JSON and includes the coordinates of the map location.

[Image: HTML view alongside the parsed JSON]

snack_pack

If the map includes pins of specific locations, the JSON will include a snack_pack field for each location with additional details such as opening hours, contact details, and tags.

[Image: HTML view alongside the parsed JSON]

At the bottom of SERPs, Google also provides users with a “related searches” portion, prompting other search queries related to the initial search.

[Image: HTML view alongside the parsed JSON]

Please note:

  • list_group: true when the elements are grouped, as at the top of the following screenshot
  • list_group: false when the elements aren’t grouped, as at the bottom of the following screenshot

Flights

[Image: HTML view alongside the parsed JSON]

Hotels

Bright Data’s SERP API makes it easy to collect hotel data, like prices, availability, reviews, and more. 

Here’s how to collect the data from the hotel knowledge graph using Google Search and how to get even more details from the hotel page on Google Travel. 

When you search for a hotel using Google Search, its details and reviews appear in the resulting knowledge graph on the right.

Setting arrival and departure dates along with the number of guests lets you see and compare some of the hotel’s prices. 

With SERP API, you can set these fields to collect different price combinations using dedicated parameters. Go to the SERP API playground to learn more.

The SERP API also lets you target the hotel page in Google Travel, where you can find more prices and search by more parameters (arrival and departure dates, the number of adults and children, the children’s ages, and whether free cancellation is offered) to collect more price combinations. Go to the API Guide to learn more.

 

Pagination

[Image: HTML view alongside the parsed JSON]

Pagination indexes can be found in the bottom section of the JSON:

  • current_page: the position of the requested page within the search
  • first_page_link: link to the first page of results
  • prev_page_link and next_page_link: links to the pages before and after the requested page
  • prev_page_start and next_page_start: the number of the first search result on the previous and next pages
  • prev_page and next_page: the page numbers of the previous and next pages
  • page: page number
  • link: link to the page
  • start: the first result on the page
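
These indexes make it possible to walk through result pages. The Python sketch below reads the details of the next page. It assumes the indexes sit under a top-level "pagination" node, which is an assumption about the response layout; next_page_info is a hypothetical helper.

```python
def next_page_info(parsed):
    """Return the next page's number, first result index, and link, or None."""
    pagination = parsed.get("pagination", {})  # assumed node name
    link = pagination.get("next_page_link")
    if link is None:
        return None  # no further page reported
    return {
        "page": pagination.get("next_page"),
        "start": pagination.get("next_page_start"),
        "link": link,
    }
```

A crawler could request the returned link (with brd_json=1 still applied) and repeat until the function returns None.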