How to Integrate Bright Data with the Vercel AI SDK

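The Vercel AI SDK is a TypeScript toolkit for building AI applications with frameworks such as React, Next.js, Vue, Svelte, and Node.js. It provides a unified API for working with different AI providers and includes utilities for streaming output, function calling, and building conversational interfaces.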

Getting started

Prerequisites

Bright Data API key
Node.js 20.18.1+
TypeScript (recommended)

Installation

Install the required dependencies:

npm install @brightdata/sdk ai zod

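Create the Bright Data tools

Create a file named brightdata-tools.ts with the following content. The Next.js example in this guide imports it from @/lib/brightdata-tools, so place it in your project's lib directory if you follow that example: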

brightdata-tools.ts

import { tool, type Tool } from 'ai'
import { z } from 'zod'
import { bdclient } from '@brightdata/sdk'

type BrightDataTools = 'scrape' | 'search' | 'amazonProduct' | 'linkedinCollectProfiles'

interface BrightDataToolsConfig {
  apiKey: string
  excludeTools?: BrightDataTools[]
}

export const brightDataTools = (
  config: BrightDataToolsConfig
): Partial<Record<BrightDataTools, Tool>> => {
  const client = new bdclient({
    apiKey: config.apiKey,
    autoCreateZones: true
  })

  const tools: Partial<Record<BrightDataTools, Tool>> = {
    scrape: tool({
      description:
        'Scrape website content and return it in clean markdown format. Bypasses anti-bot protection and CAPTCHAs.',
      inputSchema: z.object({
        url: z
          .string()
          .url()
          .describe('The URL of the website to scrape'),
        country: z
          .string()
          .length(2)
          .optional()
          .describe('Two-letter country code for proxy location (e.g., "us", "gb", "de")'),
      }),
      execute: async ({ url, country }) => {
        try {
          const result = await client.scrape(url, {
            dataFormat: 'markdown',
            format: 'raw',
            country: country?.toLowerCase(),
          })
          return result
        } catch (error) {
          return `Error scraping ${url}: ${String(error)}`
        }
      },
    }),

    search: tool({
      description:
        'Search the web using Google, Bing, or Yandex. Returns search results with anti-bot protection bypass.',
      inputSchema: z.object({
        query: z
          .string()
          .describe('The search query'),
        searchEngine: z
          .enum(['google', 'bing', 'yandex'])
          .optional()
          .default('google')
          .describe('Search engine to use'),
        country: z
          .string()
          .length(2)
          .optional()
          .describe('Two-letter country code for localized results'),
        dataFormat: z
          .enum(['html', 'markdown'])
          .optional()
          .default('markdown')
          .describe('Format of returned search results'),
      }),
      execute: async ({ query, searchEngine, country, dataFormat }) => {
        try {
          const result = await client.search(query, {
            searchEngine,
            dataFormat,
            format: 'raw',
            country: country?.toLowerCase(),
          })
          return result
        } catch (error) {
          return `Error searching for "${query}": ${String(error)}`
        }
      },
    }),

    amazonProduct: tool({
      description:
        'Get detailed Amazon product information including price, ratings, reviews, and specifications. Requires a valid Amazon product URL.',
      inputSchema: z.object({
        url: z
          .string()
          .url()
          .describe('Amazon product URL (must contain /dp/ or /gp/product/)'),
        zipcode: z
          .string()
          .optional()
          .describe('ZIP code for location-specific pricing and availability'),
      }),
      execute: async ({ url, zipcode }) => {
        try {
          const result = await client.datasets.amazon.collectProducts(
            [{ url, zipcode }],
            {
              format: 'json',
              async: false
            }
          )
          return JSON.stringify(result, null, 2)
        } catch (error) {
          return `Error fetching Amazon product data: ${String(error)}`
        }
      },
    }),

    linkedinCollectProfiles: tool({
      description:
        'Fetch LinkedIn profile data for one or more LinkedIn profile URLs. Returns detailed information including work experience, education, skills, and contact information.',
      inputSchema: z.object({
        urls: z
          .array(z.string().url())
          .min(1)
          .describe('Array of LinkedIn profile URLs to collect data from (e.g., ["https://www.linkedin.com/in/example"])'),
        format: z
          .enum(['json', 'jsonl'])
          .optional()
          .default('json')
          .describe('Output format for the results'),
      }),
      execute: async ({ urls, format }) => {
        try {
          const result = await client.datasets.linkedin.collectProfiles(
            urls,
            {
              format: format || 'json',
              async: false
            }
          )
          return JSON.stringify(result, null, 2)
        } catch (error) {
          return `Error fetching LinkedIn profiles: ${String(error)}`
        }
      },
    }),
  }

  // Remove excluded tools
  for (const toolName in tools) {
    if (config.excludeTools?.includes(toolName as BrightDataTools)) {
      delete tools[toolName as BrightDataTools]
    }
  }

  return tools
}
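
The exclusion step at the end of brightDataTools deletes keys from the object in place. As a possible alternative, the sketch below builds a filtered copy instead. The omitTools helper is hypothetical (it is not part of @brightdata/sdk or the AI SDK), and placeholder strings stand in for real tool definitions so the snippet runs on its own.

```typescript
// Hypothetical helper: returns a copy of `tools` without the excluded keys.
// Not part of @brightdata/sdk or the AI SDK.
function omitTools<K extends string, V>(
  tools: Partial<Record<K, V>>,
  exclude: readonly K[] = []
): Partial<Record<K, V>> {
  const kept: Partial<Record<K, V>> = {}
  for (const name of Object.keys(tools) as K[]) {
    if (!exclude.includes(name)) {
      kept[name] = tools[name]
    }
  }
  return kept
}

type ToolName = 'scrape' | 'search' | 'amazonProduct' | 'linkedinCollectProfiles'

// Placeholder values stand in for real tool() definitions.
const allTools: Partial<Record<ToolName, string>> = {
  scrape: 'scrape tool',
  search: 'search tool',
  amazonProduct: 'amazon tool',
  linkedinCollectProfiles: 'linkedin tool',
}

// Keep only the general-purpose web tools.
const webOnly = omitTools(allTools, ['amazonProduct', 'linkedinCollectProfiles'])
console.log(Object.keys(webOnly)) // [ 'scrape', 'search' ]
```

With the real factory, the same effect comes from the existing config option, for example excludeTools: ['amazonProduct', 'linkedinCollectProfiles'].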

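Usage examples

Next.js App Router

Create an API route that can work with any AI provider. This example uses OpenAI through the @ai-sdk/openai package, which is not included in the install command above and must be installed separately: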

app/api/chat/route.ts

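import { openai } from '@ai-sdk/openai'
import { streamText, stepCountIs, convertToModelMessages, type UIMessage } from 'ai'
import { brightDataTools } from '@/lib/brightdata-tools'

export const maxDuration = 60

export async function POST(req: Request) {
  // useChat sends UI messages; they are converted to model messages below
  const { messages }: { messages: UIMessage[] } = await req.json()

  const tools = brightDataTools({
    apiKey: process.env.BRIGHTDATA_API_KEY!,
  })

  const result = streamText({
    model: openai('gpt-4o'),
    // Awaiting is harmless in SDK versions where the conversion is synchronous
    messages: await convertToModelMessages(messages),
    tools,
    // Allow up to 10 model/tool steps per request
    stopWhen: stepCountIs(10),
  })

  // AI SDK 5 replaced toDataStreamResponse() with toUIMessageStreamResponse()
  return result.toUIMessageStreamResponse()
}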

Then use it in your component:

app/page.tsx

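'use client'

import { useState } from 'react'
// AI SDK 5 moved useChat to the separate @ai-sdk/react package
import { useChat } from '@ai-sdk/react'

export default function Chat() {
  // In AI SDK 5, useChat no longer manages the input field state
  const [input, setInput] = useState('')
  const { messages, sendMessage } = useChat()

  return (
    <div className="flex flex-col h-screen">
      <div className="flex-1 overflow-y-auto p-4">
        {messages.map((m) => (
          <div key={m.id} className="mb-4">
            <strong>{m.role === 'user' ? 'You: ' : 'AI: '}</strong>
            {/* Messages are made of parts; render only the text parts */}
            {m.parts.map((part, i) =>
              part.type === 'text' ? <span key={i}>{part.text}</span> : null
            )}
          </div>
        ))}
      </div>
      <form
        onSubmit={(e) => {
          e.preventDefault()
          if (!input.trim()) return
          sendMessage({ text: input })
          setInput('')
        }}
        className="p-4 border-t"
      >
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Try: 'Scrape https://example.com' or 'Search for best laptops 2024'"
          className="w-full p-2 border rounded"
        />
      </form>
    </div>
  )
}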

Example output

Scraping example

User input: "Scrape https://example.com and tell me what it's about"
AI response:

I've scraped the website. Here's what I found:

# Example Domain

This domain is for use in illustrative examples in documents. You may use this
domain in literature without prior coordination or asking for permission.

The website appears to be a placeholder domain used for documentation and
examples. It's maintained by IANA (Internet Assigned Numbers Authority) and
serves as a standard example domain that can be referenced in documentation
without needing permission.

Search example

User input: "Search for best mechanical keyboards 2024"
AI response:

I found several highly-rated mechanical keyboards for 2024:

1. **Keychron Q1 Pro** - A premium 75% layout keyboard with hot-swappable
   switches, wireless connectivity, and excellent build quality. Price around $200.

2. **Wooting 60HE** - Features analog switches that allow for adjustable
   actuation points and rapid trigger technology. Popular among gamers.

3. **GMMK Pro** - A versatile 75% gasket-mounted keyboard with extensive
   customization options and strong community support.

All three offer different features depending on whether you prioritize gaming,
typing, or customization.

Amazon product example

User input: "Get information about Amazon product B0D2Q9397Y"
AI response:

I've retrieved the product information:

**Product:** Logitech MX Master 3S Wireless Mouse
**Price:** $99.99
**Rating:** 4.6/5 stars (8,234 reviews)
**Key Features:**

* 8K DPI sensor for precise tracking
* Quiet clicks technology
* USB-C charging
* Multi-device connectivity (up to 3 devices)
* Customizable buttons

The product has excellent reviews, with users praising its ergonomics and
productivity features. However, some note it's on the pricier side. It's
particularly popular among creative professionals and developers.
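
The request above passes a bare product ID (ASIN), while the amazonProduct tool's schema expects a product URL whose path contains /dp/ or /gp/product/. The sketch below shows one way that gap could be bridged before the tool is called. Both helpers are hypothetical and not part of @brightdata/sdk; the sketch assumes ASINs are 10 alphanumeric characters, assumes the amazon.com storefront, and uses a deliberately loose hostname check.

```typescript
// Hypothetical helpers, not part of @brightdata/sdk.
// Assumption: ASINs are 10 uppercase alphanumeric characters.
const ASIN_PATTERN = /^[A-Z0-9]{10}$/

// Assumption: the amazon.com storefront is the one wanted.
function asinToProductUrl(asin: string): string {
  const normalized = asin.trim().toUpperCase()
  if (!ASIN_PATTERN.test(normalized)) {
    throw new Error(`Not a valid ASIN: ${asin}`)
  }
  return `https://www.amazon.com/dp/${normalized}`
}

// Mirrors the tool description: the path must contain /dp/ or /gp/product/.
// The hostname test is loose and only meant as a first-pass filter.
function isAmazonProductUrl(value: string): boolean {
  try {
    const { hostname, pathname } = new URL(value)
    const isAmazonHost = /(^|\.)amazon\.[a-z.]+$/.test(hostname)
    return isAmazonHost && /\/(dp|gp\/product)\//.test(pathname)
  } catch {
    return false // not a parseable URL
  }
}

const productUrl = asinToProductUrl('B0D2Q9397Y')
console.log(productUrl) // https://www.amazon.com/dp/B0D2Q9397Y
console.log(isAmazonProductUrl(productUrl)) // true
console.log(isAmazonProductUrl('https://example.com/dp/B0D2Q9397Y')) // false
```

In practice the model often constructs the URL itself from the ASIN, so a check like this is mainly useful for rejecting malformed input early instead of spending a dataset request on it.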

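Available tools

Tool	Description	Use cases
`scrape`	Scrapes any website and returns its content as markdown	Content extraction, monitoring, data collection
`search`	Searches Google, Bing, or Yandex	Research, competitor analysis, trend monitoring
`amazonProduct`	Retrieves Amazon product details	Price monitoring, product research, product comparison
`linkedinCollectProfiles`	Retrieves LinkedIn profile data	Data enrichment, people research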

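Best practices

Error handling: always wrap tool calls in try-catch
Rate limiting: be mindful of API rate limits when making many requests
Data format: use the markdown format when scraping to get cleaner content
Async operations: use async: true when processing large datasets to avoid timeouts
Geo-targeting: specify a country code when you need localized results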
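
For the rate-limiting point, one common approach is to retry a failed call after an increasing delay. The following is a minimal sketch under stated assumptions: withRetry is a hypothetical helper (not part of @brightdata/sdk), the attempt count and delays are arbitrary defaults, and it retries on any error because it does not inspect how the API actually signals a rate limit.

```typescript
// Hypothetical helper, not part of @brightdata/sdk: retries an async call
// with exponential backoff. It retries on every error; a real integration
// would likely retry only on rate-limit or transient failures.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (error) {
      lastError = error
      if (attempt < maxAttempts - 1) {
        // With the defaults, wait 500 ms and then 1000 ms between attempts.
        const delay = baseDelayMs * 2 ** attempt
        await new Promise<void>((resolve) => setTimeout(resolve, delay))
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError
}

// Inside a tool's execute function this could wrap the client call, e.g.:
//   const result = await withRetry(() => client.scrape(url, { format: 'raw' }))
```

Because the tools in brightdata-tools.ts already catch errors and return a message string, the wrapper would go inside the existing try block so that the final failure is still reported to the model.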

Environment variables

.env.local

BRIGHTDATA_API_KEY=your_api_key_here

Get your API key from the Bright Data Dashboard.
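
The route example reads the key with a non-null assertion (process.env.BRIGHTDATA_API_KEY!), which hides a missing variable until the first request fails. A small guard can fail fast with a clearer message. The requireEnv helper below is hypothetical and takes the environment as a parameter so the sketch runs without Node.js type definitions.

```typescript
// Hypothetical helper: throws a descriptive error when a required
// environment variable is missing or blank.
function requireEnv(
  name: string,
  env: Record<string, string | undefined>
): string {
  const value = env[name]
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

// In the route handler this would be called as:
//   const apiKey = requireEnv('BRIGHTDATA_API_KEY', process.env)
// A literal object is used here so the example is self-contained.
const apiKey = requireEnv('BRIGHTDATA_API_KEY', {
  BRIGHTDATA_API_KEY: 'your_api_key_here',
})
console.log(apiKey) // your_api_key_here
```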
