This tool connects your LlamaIndex agent to Bright Data so it can crawl websites, search the web, and retrieve structured data from platforms such as LinkedIn, Amazon, and social media sites. Bright Data’s tools include built-in CAPTCHA solving and bot-detection avoidance, which helps you extract web data reliably.

Why Use Bright Data With LlamaIndex?

The Bright Data tool provides the following capabilities:
  • scrape_as_markdown
    Scrape a webpage and convert the content to Markdown format. This tool can bypass CAPTCHA and bot detection.
result = brightdata_tool.scrape_as_markdown("https://example.com")
print(result.text)
  • get_screenshot
    Take a screenshot of a webpage and save it to a file.
screenshot_path = brightdata_tool.get_screenshot(
    "https://example.com", output_path="example_screenshot.png"
)
  • search_engine
    Search Google, Bing, or Yandex and get structured search results as JSON or Markdown. Supports advanced parameters for more specific searches.
search_results = brightdata_tool.search_engine(
    query="climate change solutions",
    engine="google",
    language="en",
    country_code="us",
    num_results=20,
)
print(search_results.text)
  • web_data_feed
    Retrieve structured data from various platforms including LinkedIn, Amazon, Instagram, Facebook, X (Twitter), Zillow, and more.
linkedin_profile = brightdata_tool.web_data_feed(
    source_type="linkedin_person_profile",
    url="https://www.linkedin.com/in/username/",
)
print(linkedin_profile)

amazon_product = brightdata_tool.web_data_feed(
    source_type="amazon_product", url="https://www.amazon.com/dp/B08N5KWB9H"
)
print(amazon_product)
The Bright Data tool also offers configuration options for specialized use cases.

Search Engine Parameters

The search_engine function supports advanced parameters for:
  • Language targeting (language parameter)
  • Country-specific search (country_code parameter)
  • Different search types such as images, shopping, and news (search_type parameter)
  • Pagination controls
  • Mobile device emulation (device parameter)
  • Geolocation targeting
  • Hotel searches (hotel_dates and hotel_occupancy parameters)
results = brightdata_tool.search_engine(
    query="best hotels in paris",
    engine="google",
    language="fr",
    country_code="fr",
    search_type="shopping",
    device="mobile",
    hotel_dates="2025-06-01,2025-06-05",
    hotel_occupancy=2,
)
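
The hotel_dates value in the example above is a single string holding the check-in and check-out dates separated by a comma. The following is a minimal sketch of building that string from date objects. It assumes the format is exactly the one shown above (YYYY-MM-DD,YYYY-MM-DD); the helper name format_hotel_dates is hypothetical and is not part of the Bright Data tool.

```python
from datetime import date


def format_hotel_dates(check_in: date, check_out: date) -> str:
    """Build a "YYYY-MM-DD,YYYY-MM-DD" string like the hotel_dates value above.

    Hypothetical helper: the format is assumed from the example on this page.
    """
    if check_out <= check_in:
        raise ValueError("check_out must be after check_in")
    return f"{check_in.isoformat()},{check_out.isoformat()}"


hotel_dates = format_hotel_dates(date(2025, 6, 1), date(2025, 6, 5))
print(hotel_dates)  # 2025-06-01,2025-06-05
```

The resulting string can then be passed as the hotel_dates argument of search_engine, as in the example above.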

Supported Web Data Sources

The web_data_feed function supports retrieving structured data from:
  • LinkedIn (profiles and companies)
  • Amazon (products and reviews)
  • Instagram (profiles, posts, reels, comments)
  • Facebook (posts, marketplace listings, company reviews)
  • X/Twitter (posts)
  • Zillow (property listings)
  • Booking.com (hotel listings)
  • YouTube (videos)
  • ZoomInfo (company profiles)
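
When URLs come from mixed sources, it can help to pick the source_type from the URL before calling web_data_feed. The sketch below covers only the two source_type values that appear on this page (linkedin_person_profile and amazon_product); identifiers for the other platforms in the list above are not assumed here. The function name guess_source_type and the URL matching rules are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Only the two source_type values shown on this page are listed.
# Identifiers for other platforms are not assumed; look them up before adding.
KNOWN_SOURCE_TYPES = {
    ("linkedin.com", "/in/"): "linkedin_person_profile",
    ("amazon.com", "/dp/"): "amazon_product",
}


def guess_source_type(url: str) -> Optional[str]:
    """Return a source_type for a recognized URL, or None otherwise."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    for (domain, path_marker), source_type in KNOWN_SOURCE_TYPES.items():
        # Match on the domain plus a path fragment typical for that page type.
        if host == domain and path_marker in parsed.path:
            return source_type
    return None


print(guess_source_type("https://www.linkedin.com/in/username/"))
print(guess_source_type("https://www.amazon.com/dp/B08N5KWB9H"))
print(guess_source_type("https://example.com"))
```

A None result means the URL was not recognized, in which case scrape_as_markdown may be the more suitable call.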
For more information, visit the Bright Data documentation.

How to Integrate Bright Data With LlamaIndex?

1. Obtain Your Bright Data API Key
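
Once you have a key, one common pattern is to keep it out of source code and read it from an environment variable. This is a minimal sketch of that pattern, not a requirement of the tool. The variable name BRIGHTDATA_API_KEY and the helper load_api_key are assumptions made for illustration.

```python
import os


def load_api_key(env_var: str = "BRIGHTDATA_API_KEY") -> str:
    """Read an API key from the environment and fail early if it is missing.

    The variable name is an assumption; use whatever name your setup defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

The returned value can be passed wherever this guide uses a literal key, for example as the api_key argument of BrightDataToolSpec.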

2. Installation

Install the required packages:
pip install llama-index llama-index-core llama-index-tools-brightdata
3. Usage

Here’s an example of how to use the BrightDataToolSpec with LlamaIndex:
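from llama_index.agent.openai import OpenAIAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.brightdata import BrightDataToolSpec

# The two keys are separate credentials: one for OpenAI, one for Bright Data.
llm = OpenAI(model="gpt-4o", api_key="your-openai-api-key")

brightdata_tool = BrightDataToolSpec(
    api_key="your-brightdata-api-key", zone="unlocker"
)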

tool_list = brightdata_tool.to_tool_list()

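# Keep each tool's full description, then give the agent a short generic one.
# The full descriptions are passed to the agent in the prompt built below.
for tool in tool_list:
    tool.original_description = tool.metadata.description
    tool.metadata.description = "Bright Data web scraping tool"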

agent = OpenAIAgent.from_tools(tools=tool_list, llm=llm)

query = (
    "Find and summarize the latest news about AI from major tech news sites"
)
tool_descriptions = "\n\n".join(
    [
        f"Tool Name: {tool.metadata.name}\nTool Description: {tool.original_description}"
        for tool in tool_list
    ]
)

query_with_descriptions = f"{tool_descriptions}\n\nQuery: {query}"

response = agent.chat(query_with_descriptions)
print(response)
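
To see what the agent actually receives, the prompt-building step from the example above can be run on its own. The sketch below repeats that step with stand-in objects instead of real tools, so it needs no API keys. The function name build_query_with_descriptions and the descriptions in fake_tools are made up for illustration; the real descriptions come from BrightDataToolSpec.

```python
from types import SimpleNamespace


def build_query_with_descriptions(tools, query: str) -> str:
    """Prepend each tool's name and full description to the user query."""
    tool_descriptions = "\n\n".join(
        f"Tool Name: {tool.metadata.name}\n"
        f"Tool Description: {tool.original_description}"
        for tool in tools
    )
    return f"{tool_descriptions}\n\nQuery: {query}"


# Stand-in objects with made-up descriptions, used only to show the layout.
fake_tools = [
    SimpleNamespace(
        metadata=SimpleNamespace(name="scrape_as_markdown"),
        original_description="Scrape a webpage and return Markdown.",
    ),
    SimpleNamespace(
        metadata=SimpleNamespace(name="search_engine"),
        original_description="Search the web and return results.",
    ),
]

prompt = build_query_with_descriptions(fake_tools, "Summarize example.com")
print(prompt)
```

Each tool contributes a name line and a description line, tools are separated by a blank line, and the user query comes last.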