安装软件包
打开终端并运行:
pip install brightdata-sdk
In your code file, import the package and make your first request:
from brightdata import BrightDataClient

# Initialize client (auto-loads from BRIGHTDATA_API_TOKEN env var)
client = BrightDataClient()

# Search Google
results = client.search.google(query="best selling shoes")

if results.success:
    print(f"Found {len(results.data)} results")
    for item in results.data[:5]:
        print(f"{item['position']}. {item['title']}")
The SDK automatically loads your API token from the BRIGHTDATA_API_TOKEN environment variable or a .env file. You can also pass it in directly: BrightDataClient(token="your_token")
Start scraping and searching the web
Try the following examples to use Bright Data's SDK features in your IDE
Search Engines
Web Scraping
from brightdata import BrightDataClient

client = BrightDataClient()

# Google search
results = client.search.google(
    query="best shoes of 2025",
    location="United States",
    language="en",
    num_results=20
)

# Bing search
results = client.search.bing(
    query="python tutorial",
    location="United States"
)

# Yandex search
results = client.search.yandex(
    query="latest news",
    location="Germany"
)

if results.success:
    print(f"Cost: ${results.cost:.4f}")
    print(f"Time: {results.elapsed_ms():.2f}ms")
When you submit multiple queries or URLs, requests are processed concurrently for best performance.
Get structured data with platform-specific scrapers
Extract structured data from popular platforms such as Amazon, LinkedIn, ChatGPT, Facebook, and Instagram
Amazon Products
LinkedIn - Search & Scrape
Facebook & Instagram
ChatGPT Prompts
from brightdata import BrightDataClient
from brightdata.payloads import AmazonProductPayload

client = BrightDataClient()

# Scrape an Amazon product with a type-safe payload
payload = AmazonProductPayload(
    url="https://amazon.com/dp/B0CRMZHDG8",
    reviews_count=50
)
result = client.scrape.amazon.products(**payload.to_dict())

if result.success:
    product = result.data[0]
    print(f"Title: {product['title']}")
    print(f"Price: ${product['final_price']}")
    print(f"Rating: {product['rating']}")

# Scrape reviews with filters
result = client.scrape.amazon.reviews(
    url="https://amazon.com/dp/B0CRMZHDG8",
    pastDays=30,
    keyWord="quality",
    numOfReviews=100
)
In your IDE, hover over the BrightDataClient class or any of its methods to see available parameters, type hints, and usage examples. The SDK ships with full IntelliSense support!
Use dataclass payloads for type safety
The SDK includes dataclass payloads with runtime validation and helper properties
from brightdata import BrightDataClient
from brightdata.payloads import (
    AmazonProductPayload,
    LinkedInJobSearchPayload,
    ChatGPTPromptPayload
)

client = BrightDataClient()

# Amazon product with validation
amazon_payload = AmazonProductPayload(
    url="https://amazon.com/dp/B123456789",
    reviews_count=50  # validated at runtime!
)
print(f"ASIN: {amazon_payload.asin}")  # helper property
print(f"Domain: {amazon_payload.domain}")

# LinkedIn job search
linkedin_payload = LinkedInJobSearchPayload(
    keyword="python developer",
    location="San Francisco",
    remote=True
)
print(f"Remote search: {linkedin_payload.is_remote_search}")

# Use with the client
result = client.scrape.amazon.products(**amazon_payload.to_dict())
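To make the idea concrete, here is a minimal sketch of how such a validated payload could be built with a plain dataclass. This is hypothetical illustration only: `ProductPayload`, its fields, and its validation rules are assumptions for the example, not the SDK's actual implementation.

```python
import re
from dataclasses import dataclass

@dataclass
class ProductPayload:
    """Illustrative payload with runtime validation and helper properties."""
    url: str
    reviews_count: int = 10

    def __post_init__(self):
        # Runtime validation with helpful error messages
        if self.reviews_count < 0:
            raise ValueError("reviews_count must be non-negative")
        if "/dp/" not in self.url:
            raise ValueError("url must be an Amazon product URL containing /dp/")

    @property
    def asin(self) -> str:
        # Extract the ASIN segment that follows /dp/ in the URL
        return re.search(r"/dp/([A-Z0-9]+)", self.url).group(1)

    @property
    def domain(self) -> str:
        # Host portion of the URL, e.g. "amazon.com"
        return self.url.split("//", 1)[1].split("/", 1)[0]

    def to_dict(self) -> dict:
        return {"url": self.url, "reviews_count": self.reviews_count}

payload = ProductPayload(url="https://amazon.com/dp/B0CRMZHDG8", reviews_count=50)
print(payload.asin)    # B0CRMZHDG8
print(payload.domain)  # amazon.com
```

The pattern is the same one the SDK's payloads expose: validation happens once at construction time (`__post_init__`), so a bad value fails fast with a clear message instead of surfacing as an API error later.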
Connect to the Scraping Browser
Connect to Bright Data's Scraping Browser easily with the SDK
from brightdata import BrightDataClient
from playwright.sync_api import Playwright, sync_playwright

client = BrightDataClient(
    token="your_api_token",
    browser_username="username-zone-browser_zone1",
    browser_password="your_password"
)

def scrape(playwright: Playwright, url='https://example.com'):
    browser = playwright.chromium.connect_over_cdp(client.connect_browser())
    try:
        print(f'Connected! Navigating to {url}...')
        page = browser.new_page()
        page.goto(url, timeout=2 * 60_000)
        print('Navigated! Scraping page content...')
        data = page.content()
        print(f'Done! Data length: {len(data)}')
    finally:
        browser.close()

def main():
    with sync_playwright() as playwright:
        scrape(playwright)

if __name__ == '__main__':
    main()
Use the CLI tool
The SDK includes a powerful command-line interface available in your terminal
# Search operations
brightdata search google "python tutorial" --location "United States"
brightdata search linkedin jobs --keyword "python developer" --remote

# Scraping operations
brightdata scrape amazon products "https://amazon.com/dp/B123"
brightdata scrape linkedin profiles "https://linkedin.com/in/johndoe"

# Generic web scraping
brightdata scrape generic "https://example.com" --output-format pretty

# Save results to a file
brightdata search google "AI news" --output-file results.json
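Results saved with --output-file are plain JSON, so they can be post-processed with the standard library. A sketch follows; the schema of results.json (a list of objects with "position" and "title" keys) is an assumption here, standing in for whatever the CLI actually writes.

```python
import json
from pathlib import Path

# Write a sample file matching the assumed schema, so the sketch is self-contained
sample = [
    {"position": 1, "title": "AI breakthrough announced"},
    {"position": 2, "title": "New model released"},
]
path = Path("results.json")
path.write_text(json.dumps(sample))

# Load and print the top results
results = json.loads(path.read_text())
for item in results:
    print(f"{item['position']}. {item['title']}")
```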
Async usage for better performance
For concurrent operations, use the async API:
import asyncio
from brightdata import BrightDataClient

async def scrape_multiple():
    # Use the async context manager
    async with BrightDataClient() as client:
        # Scrape multiple URLs concurrently
        results = await client.scrape.generic.url_async([
            "https://example1.com",
            "https://example2.com",
            "https://example3.com"
        ])
        for result in results:
            if result.success:
                print(f"Success: {result.elapsed_ms():.2f}ms")

asyncio.run(scrape_multiple())
When using the *_async methods, always use the async context manager (async with BrightDataClient() as client). The sync wrappers handle this automatically.
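The concurrency pattern behind this kind of batch call can be illustrated with plain asyncio. This is a self-contained sketch: `fetch` is a stand-in coroutine simulating a network request, not an SDK call.

```python
import asyncio
import time

async def fetch(url: str) -> str:
    # Stand-in for a network request: each call "takes" 0.1 s
    await asyncio.sleep(0.1)
    return f"content of {url}"

async def scrape_all(urls: list[str]) -> list[str]:
    # gather() runs all coroutines concurrently, so total time is
    # roughly one request's latency rather than the sum of all of them
    return await asyncio.gather(*(fetch(u) for u in urls))

urls = ["https://example1.com", "https://example2.com", "https://example3.com"]
start = time.perf_counter()
pages = asyncio.run(scrape_all(urls))
elapsed = time.perf_counter() - start
print(f"Fetched {len(pages)} pages in {elapsed:.2f}s")  # ~0.1 s total, not ~0.3 s
```

This is why batches of queries or URLs cost roughly one request's latency: the waits overlap instead of adding up.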
Client initialization
Configure the BrightDataClient
client = BrightDataClient(
    token="your_token",                # auto-loads from BRIGHTDATA_API_TOKEN
    customer_id="your_customer_id",    # optional - auto-loads from BRIGHTDATA_CUSTOMER_ID
    timeout=30,                        # default timeout in seconds
    web_unlocker_zone="sdk_unlocker",  # Web Unlocker zone name
    serp_zone="sdk_serp",              # SERP API zone name
    browser_zone="sdk_browser",        # Browser API zone name
    auto_create_zones=False,           # auto-create missing zones
    validate_token=False               # validate the token on initialization
)
Search engines with structured results (Google, Bing, Yandex)

client.search.google(
    query="search term",       # search query (required)
    location="United States",  # geographic location
    language="en",             # language code
    num_results=20,            # number of results to return
    timeout=30                 # request timeout
)
client.search.bing(query="...", location="...")
client.search.yandex(query="...", location="...")
Platform scrapers
Amazon, LinkedIn, ChatGPT, Facebook, Instagram
Error handling and testing
from brightdata import BrightDataClient

client = BrightDataClient()

# Test the connection
is_valid = client.test_connection_sync()
print(f"Connection valid: {is_valid}")

# Get account information
info = client.get_account_info_sync()
print(f"Zone count: {info['zone_count']}")
print(f"Active zones: {[z['name'] for z in info['zones']]}")
The SDK can create the required zones automatically, or you can manage them manually:

# Enable auto-creation
client = BrightDataClient(auto_create_zones=True)

# List zones manually
zones = client.list_zones_sync()
for zone in zones:
    print(f"Zone: {zone['name']} (type: {zone.get('type', 'unknown')})")
from brightdata import BrightDataClient
from brightdata.exceptions import SSLError

try:
    client = BrightDataClient()
    result = client.scrape.generic.url("https://example.com")
    if result.success:
        print("Success!")
    else:
        print(f"Error: {result.error}")
except SSLError as e:
    # Provides platform-specific help
    print(f"SSL error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
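Beyond one-shot error handling, transient failures (timeouts, dropped connections) are often worth retrying. A generic retry helper is sketched below; it is not part of the SDK, and the retry count and backoff schedule are arbitrary choices for illustration.

```python
import time

def with_retries(fn, retries=3, base_delay=0.5, retry_on=(Exception,)):
    """Call fn(), retrying on the listed exceptions with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on as e:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Usage sketch: wrap any flaky call, e.g. a scrape request.
# This dummy fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # prints "ok" after two retried failures
```

In practice you would narrow `retry_on` to the transient exception types you actually expect, so programming errors still fail immediately.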
Create a Bright Data account and get an API token.

Option 1: environment variable
export BRIGHTDATA_API_TOKEN="your_token"

Option 2: .env file
# .env
BRIGHTDATA_API_TOKEN=your_token
BRIGHTDATA_CUSTOMER_ID=your_customer_id  # optional

Option 3: pass it directly
client = BrightDataClient(token="your_token")

Go to Account Settings and make sure your API key has Admin permissions.
What's new in v2.0.0
🎨 Dataclass Payloads
Type-safe request payloads
Runtime validation with helpful error messages
IDE autocomplete support
Helper properties (.asin, .is_remote_search, .domain)
Consistent with the result models
The brightdata command is available in your terminal
Scrape and search operations
Multiple output formats (JSON, pretty, minimal)
File output support
5 comprehensive notebooks
Pandas integration examples
Data analysis workflows
Batch processing guide
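For instance, because search and scrape results come back as lists of dicts, they drop straight into pandas. A sketch follows; the sample rows are made-up data standing in for a real results.data payload.

```python
import pandas as pd

# Made-up rows standing in for results.data from a search call
rows = [
    {"position": 1, "title": "Best running shoes", "link": "https://example.com/a"},
    {"position": 2, "title": "Top sellers 2025", "link": "https://example.com/b"},
]

# One line from raw results to a queryable DataFrame
df = pd.DataFrame(rows)
print(df[["position", "title"]])
```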
🆕 New platforms
Facebook & Instagram
Facebook scrapers (posts, comments, Reels)
Instagram scrapers (profiles, posts, comments, Reels)
Instagram search (post and Reels discovery)
A single shared AsyncEngine (8x efficiency improvement)
Reduced memory footprint
Better resource management
502+ comprehensive tests
An enterprise-grade SDK with 100% type safety, an async-first architecture, and comprehensive testing. Built for data scientists and developers.