Building an AI startup?
You might be eligible for our Startup Program. Get fully funded access to the infrastructure you’re reading about right now (up to $50K value).
haystack-brightdata Python package is the official Haystack integration for Bright Data, including support for:
- Bright Data Web Scraper - Extract structured data from 45+ supported websites including Amazon, LinkedIn, Instagram, Facebook, TikTok, YouTube, and more using Bright Data’s Dataset API.
- Bright Data SERP - Query search engines (Google, Bing, Yahoo) with geo-targeting and language customization for real-time search results.
- Bright Data Unlocker - Access geo-restricted and bot-protected websites, bypass CAPTCHAs and anti-bot measures to extract content in multiple formats.
How to Integrate Bright Data With Haystack
1
Obtain Your Bright Data API Key
- Log in to your Bright Data dashboard.
- Go to Account Settings.
- Generate an API key if you haven’t already done so.
2
Install the Bright Data Integration
Install the Bright Data integration package for Haystack by running the following command:
3
Set the environment variable
Set your Bright Data API key as an environment variable:
4
Select your preferred Bright Data component
The Bright Data + Haystack integration currently supports:
- BrightDataWebScraper
- BrightDataSERP
- BrightDataUnlocker
Extract structured data from 45+ supported websites, including e-commerce, social media, and business intelligence platforms.
RAG Pipeline Examples
Product Data RAG Pipeline
Build a Retrieval-Augmented Generation (RAG) pipeline using Bright Data to extract product data from Amazon and answer questions about products:SERP + Web Content RAG Pipeline
Use SERP API to find relevant web pages, then use Web Unlocker to extract content for a RAG pipeline:Supported Datasets
TheBrightDataWebScraper component supports 45+ datasets across multiple categories:
| Category | Datasets |
|---|---|
| E-commerce | amazon_product, amazon_product_reviews, amazon_product_search, walmart_product, walmart_seller, ebay_product, homedepot_products, zara_products, etsy_products, bestbuy_products |
| linkedin_person_profile, linkedin_company_profile, linkedin_job_listings, linkedin_posts, linkedin_people_search | |
| instagram_profiles, instagram_posts, instagram_reels, instagram_comments | |
| facebook_posts, facebook_marketplace_listings, facebook_company_reviews, facebook_events | |
| TikTok | tiktok_profiles, tiktok_posts, tiktok_shop, tiktok_comments |
| YouTube | youtube_profiles, youtube_videos, youtube_comments |
| Search & Commerce | google_maps_reviews, google_shopping, google_play_store, apple_app_store, zillow_properties_listing, booking_hotel_listings |
| Business Intelligence | crunchbase_company, zoominfo_company_profile |
| Other | reuter_news, github_repository_file, yahoo_finance_business, x_posts, reddit_posts |
Use Cases
Bright Data’s Haystack integration enables powerful use cases:- E-commerce Intelligence: Price monitoring, product data extraction, and competitive analysis
- Social Media Analytics: Content monitoring and engagement analysis across platforms
- Business Intelligence: Company research and competitive landscape analysis
- Search Analysis: SEO/SEM research with geo-targeted search results
- Content Aggregation: Building RAG pipelines with real-time web data
- Market Research: Accessing geo-restricted content for global research