Install the package
Open a terminal and run:
pip install brightdata-sdk
In your code file, import the package and make your first request:
from brightdata import BrightDataClient
# Initialize client (auto-loads from BRIGHTDATA_API_TOKEN env var)
client = BrightDataClient()

# Search Google
results = client.search.google(query="best selling shoes")

if results.success:
    print(f"Found {len(results.data)} results")
    for item in results.data[:5]:
        print(f"{item['position']}. {item['title']}")
The SDK automatically loads your API token from the BRIGHTDATA_API_TOKEN environment variable or a .env file. You can also pass it directly: BrightDataClient(token="your_token")
Start scraping and searching the web
Try the following examples to explore the Bright Data SDK's features in your IDE
Search Engines
Web Scraping
from brightdata import BrightDataClient
client = BrightDataClient()
# Google search
results = client.search.google(
    query="best shoes of 2025",
    location="United States",
    language="en",
    num_results=20
)

# Bing search
results = client.search.bing(
    query="python tutorial",
    location="United States"
)

# Yandex search
results = client.search.yandex(
    query="latest news",
    location="Russia"
)

if results.success:
    print(f"Cost: ${results.cost:.4f}")
    print(f"Time: {results.elapsed_ms():.2f} ms")
When you pass multiple queries or URLs, the requests are processed concurrently for best performance.
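The concurrency described above follows the standard asyncio fan-out pattern. A minimal stdlib sketch of that pattern (the `fetch` coroutine below is a stand-in for one network request, not an SDK function):

```python
import asyncio

async def fetch(url: str) -> str:
    # Stand-in for a single network request; sleeps instead of doing I/O
    await asyncio.sleep(0.1)
    return f"response from {url}"

async def fetch_all(urls: list[str]) -> list[str]:
    # Fan out all requests at once; total time is roughly the slowest
    # single request, not the sum of all of them
    return await asyncio.gather(*[fetch(u) for u in urls])

results = asyncio.run(fetch_all([
    "https://example1.com",
    "https://example2.com",
    "https://example3.com",
]))
print(results)
```

With this pattern, three 0.1 s "requests" finish in about 0.1 s total instead of 0.3 s.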
Get structured data with platform-specific scrapers
Extract structured data from popular platforms such as Amazon, LinkedIn, ChatGPT, Facebook, and Instagram
Amazon Products
LinkedIn - Search & Scrape
Facebook & Instagram
ChatGPT Prompts
from brightdata import BrightDataClient
from brightdata.payloads import AmazonProductPayload
client = BrightDataClient()
# Scrape Amazon product with type-safe payload
payload = AmazonProductPayload(
    url="https://amazon.com/dp/B0CRMZHDG8",
    reviews_count=50
)
result = client.scrape.amazon.products(**payload.to_dict())

if result.success:
    product = result.data[0]
    print(f"Title: {product['title']}")
    print(f"Price: ${product['final_price']}")
    print(f"Rating: {product['rating']}")

# Scrape reviews with filters
result = client.scrape.amazon.reviews(
    url="https://amazon.com/dp/B0CRMZHDG8",
    pastDays=30,
    keyWord="quality",
    numOfReviews=100
)
In your IDE, hover over the BrightDataClient class or any of its methods to see available parameters, type hints, and usage examples. The SDK ships with full IntelliSense support!
Use dataclass payloads for type safety
The SDK includes dataclass payloads with runtime validation and helper properties
from brightdata import BrightDataClient
from brightdata.payloads import (
    AmazonProductPayload,
    LinkedInJobSearchPayload,
    ChatGPTPromptPayload
)
client = BrightDataClient()
# Amazon product with validation
amazon_payload = AmazonProductPayload(
    url="https://amazon.com/dp/B123456789",
    reviews_count=50  # validated at runtime!
)
print(f"ASIN: {amazon_payload.asin}")  # helper property
print(f"Domain: {amazon_payload.domain}")

# LinkedIn job search
linkedin_payload = LinkedInJobSearchPayload(
    keyword="python developer",
    location="San Francisco",
    remote=True
)
print(f"Remote search: {linkedin_payload.is_remote_search}")

# Use with client
result = client.scrape.amazon.products(**amazon_payload.to_dict())
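The runtime validation and helper properties these payloads provide can be sketched with a plain dataclass and `__post_init__`. This illustrates the pattern only; the real `AmazonProductPayload` may use different field checks, error messages, and ASIN parsing:

```python
from dataclasses import dataclass

@dataclass
class ProductPayload:
    url: str
    reviews_count: int = 0

    def __post_init__(self):
        # Validate at construction time, with a helpful error message
        if not self.url.startswith(("http://", "https://")):
            raise ValueError(f"url must be an absolute URL, got {self.url!r}")
        if self.reviews_count < 0:
            raise ValueError("reviews_count must be non-negative")

    @property
    def asin(self) -> str:
        # Helper property: pull the ASIN out of a /dp/<ASIN> URL
        return self.url.rstrip("/").rsplit("/", 1)[-1]

payload = ProductPayload(url="https://amazon.com/dp/B0CRMZHDG8", reviews_count=50)
print(payload.asin)
```

Validating in `__post_init__` means a bad payload fails loudly at construction, before any (billable) request is sent.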
Connect to the Scraping Browser
Use the SDK to easily connect to Bright Data's Scraping Browser
from brightdata import BrightDataClient
from playwright.sync_api import Playwright, sync_playwright
client = BrightDataClient(
    token="your_api_token",
    browser_username="username-zone-browser_zone1",
    browser_password="your_password"
)

def scrape(playwright: Playwright, url='https://example.com'):
    browser = playwright.chromium.connect_over_cdp(client.connect_browser())
    try:
        print(f'Connected! Navigating to {url}...')
        page = browser.new_page()
        page.goto(url, timeout=2 * 60_000)
        print('Navigated! Scraping page content...')
        data = page.content()
        print(f'Done! Data length: {len(data)}')
    finally:
        browser.close()

def main():
    with sync_playwright() as playwright:
        scrape(playwright)

if __name__ == '__main__':
    main()
Use the CLI tool
The SDK includes a powerful command-line interface available in your terminal
# Search operations
brightdata search google "python tutorial" --location "United States"
brightdata search linkedin jobs --keyword "python developer" --remote

# Scrape operations
brightdata scrape amazon products "https://amazon.com/dp/B123"
brightdata scrape linkedin profiles "https://linkedin.com/in/johndoe"

# Generic web scraping
brightdata scrape generic "https://example.com" --output-format pretty

# Save results to a file
brightdata search google "AI news" --output-file results.json
Async usage for better performance
For concurrent operations, use the async API:
import asyncio
from brightdata import BrightDataClient
async def scrape_multiple():
    # Use the async context manager
    async with BrightDataClient() as client:
        # Scrape multiple URLs concurrently
        results = await client.scrape.generic.url_async([
            "https://example1.com",
            "https://example2.com",
            "https://example3.com"
        ])
        for result in results:
            if result.success:
                print(f"Success: {result.elapsed_ms():.2f} ms")

asyncio.run(scrape_multiple())
When using *_async methods, always use the async context manager (async with BrightDataClient() as client). The sync wrappers handle this automatically.
Client initialization
Configure the BrightDataClient
client = BrightDataClient(
    token="your_token",                # auto-loaded from BRIGHTDATA_API_TOKEN
    customer_id="your_customer_id",    # optional - auto-loaded from BRIGHTDATA_CUSTOMER_ID
    timeout=30,                        # default timeout in seconds
    web_unlocker_zone="sdk_unlocker",  # Web Unlocker zone name
    serp_zone="sdk_serp",              # SERP API zone name
    browser_zone="sdk_browser",        # Browser API zone name
    auto_create_zones=False,           # auto-create missing zones
    validate_token=False               # validate token on initialization
)
Search engines with structured results (Google, Bing, Yandex)

client.search.google(
    query="search term",       # search query (required)
    location="United States",  # geographic location
    language="en",             # language code
    num_results=20,            # number of results to return
    timeout=30                 # request timeout
)

client.search.bing(query="...", location="...")
client.search.yandex(query="...", location="...")
Extract data from URLs with multiple response formats

client.scrape.generic.url(
    url="https://example.com",  # single URL or list of URLs (required)
    response_format="raw",      # "raw" or "json"
    timeout=30,                 # request timeout
    country="US"                # two-letter country code
)
Platform scrapers
Amazon, LinkedIn, ChatGPT, Facebook, Instagram
Platform-specific scrapers with structured output

Amazon:
client.scrape.amazon.products(url="...", reviews_count=50)
client.scrape.amazon.reviews(url="...", pastDays=30, numOfReviews=100)
client.scrape.amazon.sellers(url="...")

LinkedIn:
# URL scraping (scrape namespace)
client.scrape.linkedin.profiles(url="...")
client.scrape.linkedin.companies(url="...")
client.scrape.linkedin.jobs(url="...")

# Search (search namespace)
client.search.linkedin.jobs(keyword="...", location="...", remote=True)
client.search.linkedin.profiles(firstName="...", lastName="...")
client.search.linkedin.posts(profile_url="...", start_date="...")

Facebook & Instagram:
client.scrape.facebook.posts_by_profile(url="...", num_of_posts=10)
client.scrape.facebook.comments(url="...", num_of_comments=100)
client.scrape.instagram.profiles(url="...")
client.search.instagram.posts(url="...", post_type="reel")

ChatGPT:
client.scrape.chatgpt.prompt(prompt="...", web_search=True)
client.scrape.chatgpt.prompts(prompts=["...", "..."])
All operations return a result object with timing and cost information

result = client.scrape.amazon.products(url="...")

# Access the data
result.success   # bool - whether the operation succeeded
result.data      # Any - the scraped data
result.error     # str | None - error message
result.cost      # float | None - cost in USD
result.platform  # str | None - platform name
result.method    # str | None - method used

# Timing information
result.elapsed_ms()            # total elapsed time in ms
result.get_timing_breakdown()  # detailed timing dict

# Serialization
result.to_dict()                    # convert to a dict
result.to_json(indent=2)            # JSON string
result.save_to_file("result.json")  # save to a file
Error handling and testing
from brightdata import BrightDataClient
client = BrightDataClient()
# Test the connection
is_valid = client.test_connection_sync()
print(f"Connection valid: {is_valid}")

# Get account info
info = client.get_account_info_sync()
print(f"Zone count: {info['zone_count']}")
print(f"Active zones: {[z['name'] for z in info['zones']]}")
The SDK can create the required zones automatically, or you can manage them manually

# Enable auto-creation
client = BrightDataClient(auto_create_zones=True)

# List zones manually
zones = client.list_zones_sync()
for zone in zones:
    print(f"Zone: {zone['name']} (type: {zone.get('type', 'unknown')})")
from brightdata import BrightDataClient
from brightdata.exceptions import SSLError
try:
    client = BrightDataClient()
    result = client.scrape.generic.url("https://example.com")
    if result.success:
        print("Success!")
    else:
        print(f"Error: {result.error}")
except SSLError as e:
    # Provides platform-specific help
    print(f"SSL error: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Create a Bright Data account and get an API token

Option 1: Environment variable
export BRIGHTDATA_API_TOKEN="your_token"

Option 2: .env file
# .env
BRIGHTDATA_API_TOKEN=your_token
BRIGHTDATA_CUSTOMER_ID=your_customer_id  # optional

Option 3: Pass the token directly
client = BrightDataClient(token="your_token")

Go to Account Settings and make sure your API key has admin permissions.
What's new in v2.0.0
🎨 Dataclass Payloads
Type-safe request payloads
Runtime validation with helpful error messages
IDE autocomplete support
Helper properties (.asin, .is_remote_search, .domain)
Consistent with the result models
brightdata command available in the terminal
Scrape and search operations
Multiple output formats (JSON, pretty, minimal)
File output support
5 comprehensive notebooks
Pandas integration examples
Data analysis workflows
Batch processing guide
🆕 New platforms
Facebook & Instagram
Facebook scrapers (posts, comments, Reels)
Instagram scrapers (profiles, posts, comments, Reels)
Instagram search (post and Reel discovery)
Single shared AsyncEngine (8x more efficient)
Reduced memory footprint
Better resource management
502+ comprehensive tests
An enterprise-grade SDK with 100% type safety, an async-first architecture, and comprehensive testing. Built for data scientists and developers.