What is BeautifulSoup?

BeautifulSoup is a Python library that simplifies the process of extracting and organizing data from HTML and XML documents. Combined with Bright Data proxies, it enables you to scrape data securely and anonymously while reducing the risk of detection and blocking.

How to Integrate Bright Data with BeautifulSoup

Step 0. Prerequisites

Before you start:

  • Download and install the latest Python version from python.org.

  • Install BeautifulSoup and the requests library:

     pip install beautifulsoup4 requests

Step 1. Set Up the Proxy

Log in to your Bright Data account and select the proxy zone you wish to use. In the Overview, under Access details, you can find the connection details for that zone.

  1. Retrieve your proxy credentials:

    • Host: brd.superproxy.io

    • Port: 33335

    • Username: Your Bright Data username. Modify it for geo-specific proxies if needed (e.g., your-username-country-US).

    • Password: Your Bright Data proxy zone password.

  2. Define your proxy details in your script:

proxy = {
    "http": "http://[USERNAME]:[PASSWORD]@[HOST]:[PORT]",
    "https": "http://[USERNAME]:[PASSWORD]@[HOST]:[PORT]"
}
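
As a minimal sketch of the step above, the proxy dictionary can also be built from separate values instead of a hand-edited string. The helper name build_proxies and the environment variable names BRIGHTDATA_USERNAME and BRIGHTDATA_PASSWORD are assumptions made for illustration, not part of Bright Data's tooling. The credentials are percent-encoded so that characters such as "@" or ":" in a password do not break the proxy URL.

```python
import os
from urllib.parse import quote


def build_proxies(username, password, host="brd.superproxy.io", port=33335, country=None):
    """Return a requests-style proxies dict for a single proxy endpoint."""
    if country:
        # Geo-targeting suffix, following the username pattern described above
        username = f"{username}-country-{country}"
    # Percent-encode credentials so reserved characters do not break the URL
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    proxy_url = f"http://{user}:{pwd}@{host}:{port}"
    # The same proxy URL is used for both HTTP and HTTPS targets
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical variable names; the placeholders are used when they are unset
proxy = build_proxies(
    os.environ.get("BRIGHTDATA_USERNAME", "[USERNAME]"),
    os.environ.get("BRIGHTDATA_PASSWORD", "[PASSWORD]"),
)
```

Reading credentials from the environment keeps them out of the script itself, which helps if the script is shared or committed to version control.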

Step 2. Implement Proxy Settings with requests and Parse Data Using BeautifulSoup

The following script sends a request through the Bright Data proxy and passes the response to BeautifulSoup for parsing:

import requests
from bs4 import BeautifulSoup

# Bright Data Proxy Configuration
proxy = {
    "http": "http://[USERNAME]:[PASSWORD]@[HOST]:[PORT]",
    "https": "http://[USERNAME]:[PASSWORD]@[HOST]:[PORT]"
}

# Target URL to verify the proxy
url = "https://httpbin.org/ip"

try:
    # Send the request using the proxy
    response = requests.get(url, proxies=proxy, timeout=10)
    response.raise_for_status()  # Raise an exception on 4xx/5xx responses

    # Parse the response body; httpbin.org/ip returns JSON, which
    # BeautifulSoup passes through as plain text
    soup = BeautifulSoup(response.text, "html.parser")

    # Print the parsed content
    print("Response Content (IP Address):")
    print(soup.prettify())

except requests.exceptions.RequestException as e:
    print("Error occurred while using the proxy:", e)
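
The test URL above returns JSON, so the script does not show BeautifulSoup extracting anything from HTML. The sketch below illustrates that step on a small, made-up HTML string that stands in for response.text from a real page. The markup, the "item" class, and the helper name extract_items are all hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for response.text from a real page
html = """
<html>
  <head><title>Example Store</title></head>
  <body>
    <h1>Products</h1>
    <ul>
      <li class="item"><a href="/p/1">Widget</a></li>
      <li class="item"><a href="/p/2">Gadget</a></li>
    </ul>
  </body>
</html>
"""


def extract_items(markup):
    """Return the page title and a list of (text, href) pairs for item links."""
    soup = BeautifulSoup(markup, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    links = []
    # Walk every <li class="item"> and collect its first link, if any
    for li in soup.find_all("li", class_="item"):
        a = li.find("a", href=True)
        if a is not None:
            links.append((a.get_text(strip=True), a["href"]))
    return title, links


title, links = extract_items(html)
print(title)
for text, href in links:
    print(text, href)
```

In a real scraper, the markup argument would be the response.text of a page fetched with proxies=proxy, and the tag and class names would match the target site.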

Step 3. Verify the Output

If the Bright Data proxy is configured correctly, you should see the IP address of the proxy displayed in the output:

{
  "origin": "123.45.67.89"
}
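
One way to confirm that the address shown really belongs to the proxy is to compare it with the address reported for a request sent without the proxy. The sketch below works on response bodies only, so it runs without network access. The helper names and the sample "direct" address are assumptions made for illustration.

```python
import json


def extract_origin(body):
    """Pull the "origin" field out of an httpbin.org/ip response body."""
    return json.loads(body)["origin"]


def proxy_is_active(direct_body, proxied_body):
    """Return True when the proxied request reports a different IP than the direct one."""
    return extract_origin(direct_body) != extract_origin(proxied_body)


# Sample bodies with illustrative addresses. In practice, pass response.text
# from one request made without the proxy and one made with proxies=proxy.
direct = '{"origin": "198.51.100.7"}'
proxied = '{"origin": "123.45.67.89"}'
print(proxy_is_active(direct, proxied))
```

If both requests report the same address, the proxy settings were most likely not applied to the request.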

Integrating Bright Data proxies with BeautifulSoup lets you route scraping requests through a proxy and parse the results in a few lines of Python. The same setup applies whether you’re extracting structured data, accessing geo-restricted content, or managing large-scale scraping tasks.