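CAPTCHA Solver
When you browse a page with Scraping Browser, our integrated CAPTCHA solver automatically solves all CAPTCHAs by default. You can monitor this auto-solving process in your code with the following custom CDP functions.
Once a CAPTCHA is solved, if there is a form to submit, it is submitted by default.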
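CAPTCHA Solver - Automatic Solve
Captcha.Solve
Use this command to return the status after the CAPTCHA has been solved, has failed, or was not detected. The command returns a SolveResult, whose status field reports the outcome.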
Captcha.Solve({
  detectTimeout?: number // Timeout in milliseconds for the solver to detect a CAPTCHA
  options?: CaptchaOptions[] // Configuration options for CAPTCHA solving
}) : SolveResult
Example (NodeJS - Puppeteer)
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await page.goto('https://site-with-captcha.com');
// Note 1: If no CAPTCHA is found, the command returns the not_detected status after detectTimeout
// Note 2: Once a CAPTCHA is solved, if there is a form to submit, it is submitted by default
const { status } = await client.send('Captcha.Solve', { detectTimeout: 30 * 1000 });
console.log(`Captcha solve status: ${status}`);
If CAPTCHA solving fails, try again. If the problem persists, submit a support request that describes the specific issue you encountered.
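The retry advice above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not part of the Scraping Browser API: solveWithRetry and its options are hypothetical names, client is assumed to be a Puppeteer CDP session, and because the status string reported for a failed solve is not listed in this section, the caller supplies an isFailure predicate.

```javascript
// Hypothetical helper: retry Captcha.Solve a few times before giving up.
// `client` is assumed to be a Puppeteer CDPSession (anything with send()).
// `isFailure` is supplied by the caller because the exact failure status
// string is not documented in this section.
async function solveWithRetry(client, { attempts = 3, detectTimeout = 30 * 1000, isFailure } = {}) {
  if (typeof isFailure !== 'function') {
    throw new TypeError('isFailure must be a function');
  }
  let status;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    // Same command and parameter as in the example above.
    ({ status } = await client.send('Captcha.Solve', { detectTimeout }));
    if (!isFailure(status)) {
      return { status, attempts: attempt };
    }
  }
  throw new Error(`Captcha still failing after ${attempts} attempts (last status: ${status})`);
}
```

If the helper still throws after the last attempt, that is a reasonable point to collect the last status and open a support request.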
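Use the following commands to pinpoint more specific stages of the CAPTCHA solving flow:
Captcha.detected – Scraping Browser encountered a CAPTCHA and started solving it
Captcha.SolveFinished – Scraping Browser solved the CAPTCHA successfully
Captcha.SolveFailed – Scraping Browser failed to solve the CAPTCHA
Captcha.waitForSolve – Scraping Browser waits for the CAPTCHA solver to finish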
Example: the following code sets up a CDP session, listens for CAPTCHA events, and handles the timeout (NodeJS - Puppeteer):
// Node.js - Puppeteer - waiting for CAPTCHA solving events
const client = await page.target().createCDPSession();
await new Promise((resolve, reject) => {
  client.on('Captcha.SolveFinished', resolve);
  client.on('Captcha.SolveFailed', () => reject(new Error('Captcha failed')));
  setTimeout(reject, 5 * 60 * 1000, new Error('Captcha solve timeout'));
});
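The listener pattern above leaves its timer and both event handlers registered after the promise settles. A minimal sketch of a variant that cleans up after itself follows. waitForCaptcha is a hypothetical helper name, the event names are the ones listed in this section, and client is assumed to expose on() and off() the way a Puppeteer CDP session does.

```javascript
// Sketch: wait for the solver to finish, then remove the listeners and the
// timer so nothing keeps running once the promise has settled.
function waitForCaptcha(client, timeoutMs = 5 * 60 * 1000) {
  return new Promise((resolve, reject) => {
    const cleanup = () => {
      clearTimeout(timer);
      client.off('Captcha.SolveFinished', onFinished);
      client.off('Captcha.SolveFailed', onFailed);
    };
    const onFinished = (event) => {
      cleanup();
      resolve(event);
    };
    const onFailed = () => {
      cleanup();
      reject(new Error('Captcha failed'));
    };
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error('Captcha solve timeout'));
    }, timeoutMs);
    client.on('Captcha.SolveFinished', onFinished);
    client.on('Captcha.SolveFailed', onFailed);
  });
}
```

Clearing the timer matters in short scripts: a pending five-minute timeout would otherwise keep the Node.js process alive after the work is done.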
Unlike Puppeteer and Playwright, Selenium does not support asynchronous server-driven events.
The Captcha.waitForSolve command waits for Scraping Browser's CAPTCHA solver to finish.
# Python Selenium - wait for the CAPTCHA to auto-solve after navigation
driver.execute('executeCdpCommand', {
    'cmd': 'Captcha.waitForSolve',
    'params': {},
})
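CAPTCHA Solver - Manual Control
If you want to configure our default CAPTCHA solver manually or disable it entirely, and instead call the solver yourself or solve CAPTCHAs on your own, see the following CDP commands and functions.
Captcha.setAutoSolve
This command controls automatic CAPTCHA solving. You can disable auto-solving or configure the algorithms for different CAPTCHA types, and then trigger solving manually. The options parameter takes a list of CaptchaOptions.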
Captcha.setAutoSolve({
  autoSolve: boolean // Whether to automatically solve CAPTCHAs after navigation
  options?: CaptchaOptions[] // Configuration options for CAPTCHA auto-solving
}) : void
Example CDP command that disables the auto-solver completely for the session (NodeJS - Puppeteer):
// Node.js Puppeteer - Disable Captcha auto-solver completely
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Captcha.setAutoSolve', { autoSolve: false });
Example that disables the auto-solver for reCAPTCHA only (NodeJS - Puppeteer):
// Node.js Puppeteer - Disable Captcha auto-solver for ReCaptcha only
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Captcha.setAutoSolve', {
  autoSolve: true,
  options: [{
    type: 'usercaptcha',
    disabled: true,
  }],
});
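When several CAPTCHA types need to be excluded, the payload from the example above can be built by a small function. This is a minimal sketch: autoSolveExcept is a hypothetical helper name, and 'usercaptcha' is the only type identifier confirmed in this section, so any other identifier passed in is an assumption to verify.

```javascript
// Sketch: build a Captcha.setAutoSolve payload that keeps auto-solving on
// but disables it for the given CAPTCHA types. The payload shape mirrors the
// reCAPTCHA-only example above.
function autoSolveExcept(disabledTypes) {
  return {
    autoSolve: true,
    options: disabledTypes.map((type) => ({ type, disabled: true })),
  };
}

// Usage with a CDP session (not executed here):
//   await client.send('Captcha.setAutoSolve', autoSolveExcept(['usercaptcha']));
```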
Example that solves the CAPTCHA manually after navigation (NodeJS - Puppeteer):
// Node.js Puppeteer - manually solving CAPTCHA after navigation
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Captcha.setAutoSolve', { autoSolve: false });
await page.goto('https://site-with-captcha.com', { timeout: 2 * 60 * 1000 });
const { status } = await client.send('Captcha.Solve', { detectTimeout: 30 * 1000 });
console.log('Captcha solve status:', status);
For the following three CAPTCHA types, we support additional options that control and configure our auto-solving algorithm. The configuration keys and their default values are listed per type.
CF Challenge
timeout: 40000
selector: '#challenge-body-text, .challenge-form'
check_timeout: 300
error_selector: '#challenge-error-title'
success_selector: '#challenge-success[style*=inline]'
check_success_timeout: 300
btn_selector: '#challenge-stage input[type=button]'
cloudflare_checkbox_frame_selector: '#turnstile-wrapper iframe'
checkbox_area_selector: '.ctp-checkbox-label .mark'
wait_timeout_after_solve: 500
wait_networkidle: { timeout: 500 }
HCaptcha
detect_selector: '#cf-hcaptcha-container, #challenge-hcaptcha-wrapper .hcaptcha-box, .h-captcha'
pass_proxy: true
submit_form: true
submit_selector: '#challenge-form body > form[action*="internalcaptcha/captchasubmit"]'
value_selector: '.h-captcha textarea[id^="h-captcha-response"]'
usercaptcha (reCAPTCHA)
{ // configuration keys and default values for reCAPTCHA (type=usercaptcha)
  type: 'usercaptcha',
  // selector to retrieve sitekey and/or action
  selector: '.g-recaptcha, .recaptcha',
  // attributes to search for sitekey
  sitekey_attributes: ['data-sitekey', 'data-key'],
  // attributes to search for action
  action_attributes: ['data-action'],
  // detect selectors
  detect_selector: `
    .g-recaptcha[data-sitekey] > *,
    .recaptcha > *,
    iframe[src*="www.google.com/recaptcha/api2"],
    iframe[src*="www.recaptcha.net/recaptcha/api2"],
    iframe[src*="www.google.com/recaptcha/enterprise"]`,
  // element to type response code into
  reponse_selector: '#g-recaptcha-response, .g-recaptcha-response',
  // should solver submit form automatically after captcha solved
  submit_form: true,
  // selector for submit button
  submit_selector: '[type=submit]',
}
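Because these keys are plain strings, a typo is easy to miss; note that the response-field key is spelled reponse_selector in the defaults above. The sketch below builds a reCAPTCHA options entry and rejects keys that are not in that list. recaptchaOptions is a hypothetical helper, and it assumes that an entry listing only some keys overrides just those keys and leaves the other defaults in place, which this section does not confirm.

```javascript
// Keys taken from the reCAPTCHA defaults above. `reponse_selector` is kept
// exactly as it is spelled in the documented configuration.
const RECAPTCHA_KEYS = new Set([
  'selector',
  'sitekey_attributes',
  'action_attributes',
  'detect_selector',
  'reponse_selector',
  'submit_form',
  'submit_selector',
]);

// Hypothetical helper: build a CaptchaOptions entry for reCAPTCHA and fail
// fast on keys that are not in the documented list.
function recaptchaOptions(overrides = {}) {
  for (const key of Object.keys(overrides)) {
    if (!RECAPTCHA_KEYS.has(key)) {
      throw new Error(`Unknown reCAPTCHA option: ${key}`);
    }
  }
  return { type: 'usercaptcha', ...overrides };
}

// Usage with a CDP session (not executed here): solve reCAPTCHA automatically
// but leave form submission to your own code.
//   await client.send('Captcha.setAutoSolve', {
//     autoSolve: true,
//     options: [recaptchaOptions({ submit_form: false })],
//   });
```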
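Emulation Functions
Emulation.getSupportedDevices
Use this command to get a list of all devices that can be emulated. It returns an array of device options that can be used with the setDevice command.
Emulation.getSupportedDevices().then(devices => { console.log(devices); });
After receiving the list of supported devices above, you can emulate a specific device with the Emulation.setDevice command. It changes the screen width, height, userAgent, and devicePixelRatio to match the specified device.
Emulation.setDevice({ device: '[device_name]' });
Landscape mode
To change the orientation to landscape (for devices that support it), append the string landscape after the device_name.
Emulation.setDevice({ device: 'iPhone X landscape' });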
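The landscape naming rule can be captured in a small payload builder. This is a minimal sketch: deviceParams is a hypothetical helper, and sending the payload through client.send on a CDP session is an assumption based on how the other custom commands in this document are called.

```javascript
// Hypothetical helper: build the Emulation.setDevice payload, appending
// "landscape" to the device name when landscape orientation is requested.
function deviceParams(deviceName, { landscape = false } = {}) {
  const name = deviceName.trim();
  return { device: landscape ? `${name} landscape` : name };
}

// Usage with a CDP session (not executed here):
//   await client.send('Emulation.setDevice', deviceParams('iPhone X', { landscape: true }));
```

The device name should come from the list returned by Emulation.getSupportedDevices, and landscape only applies to devices that support it.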
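Ad Blocker
Enabling our AdBlock feature can help reduce bandwidth usage and improve performance on ad-heavy websites.
CDP commands
Unblocker.enableAdBlock – enables the ad blocker (default: off)
Unblocker.disableAdBlock – disables the ad blocker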
// Enable ad blocking before navigating
const client = await page.createCDPSession();
try {
  await client.send('Unblocker.enableAdBlock', {
    list: [null],
  });
} catch (e) {
  console.error(e.stack || e);
}
await page.goto('https://www.w3schools.com/html/html_forms.asp');
See the full ad blocker example script.
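Session Persistence
Use this command to reuse the same proxy peer across multiple browsing sessions. This is useful when you need session consistency, such as preserving browser state or IP-based continuity.
CDP commands
Proxy.useSession – associates the session with a specific session ID.
sessionId – a string that uniquely identifies your session.
const client = await page.createCDPSession();
await client.send('Proxy.useSession', { sessionId });
await page.goto('https://geo.brdtest.com/mygeo.json');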
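The snippet above uses a sessionId variable without showing where it comes from. One way to handle it, sketched below, is to generate the ID once and attach it to every CDP session that should share the same peer. newSessionId and attachSession are hypothetical helpers, and the ID format is an assumption: this section only says the ID is a unique string.

```javascript
// Hypothetical helper: create a reasonably unique session ID once.
function newSessionId(prefix = 'session') {
  const random = Math.random().toString(36).slice(2, 10);
  return `${prefix}-${Date.now().toString(36)}-${random}`;
}

// Hypothetical helper: attach an existing session ID to a CDP session before
// navigating. `client` is assumed to be a Puppeteer CDPSession.
async function attachSession(client, sessionId) {
  if (typeof sessionId !== 'string' || sessionId.length === 0) {
    throw new TypeError('sessionId must be a non-empty string');
  }
  await client.send('Proxy.useSession', { sessionId });
  return sessionId;
}
```

Store the generated ID wherever later runs can read it, for example in a file or environment variable, and pass the same value to attachSession in each new browser session.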
File Download
You can automate file downloads in a Browser API flow with the custom Download CDP domain. This is useful for workflows that need to download files (for example CSV or PDF) directly during browser automation.
CDP commands
Download.enable – enables file downloads for the specified content types.
Download.downloadRequest – fired when a request results in a download.
Download.getLastCompleted – returns information about the last completed download.
Download.getDownloadedBody – returns the content of the downloaded file.
const fs = require('fs');
const client = await page.createCDPSession();
// Enable downloads of binary files (such as CSV)
await client.send('Download.enable', { allowedContentTypes: ['application/octet-stream'] });
// Start the file download; `selector` identifies the element that triggers it
await Promise.all([
  new Promise(resolve => client.once('Download.downloadRequest', resolve)),
  page.click(selector),
]);
// After the download completes:
const { id } = await client.send('Download.getLastCompleted');
const { body, base64Encoded } = await client.send('Download.getDownloadedBody', { id });
fs.writeFileSync('./downloaded_file.csv', base64Encoded ? Buffer.from(body, 'base64') : body);
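The last line above writes either a Buffer or a string, depending on base64Encoded. If later steps (hashing, size checks, uploading) expect one type, the result can be normalised first. A minimal sketch, where downloadedBodyToBuffer is a hypothetical helper and non-base64 bodies are assumed to be UTF-8 text:

```javascript
// Sketch: turn a Download.getDownloadedBody result into a Buffer, whether or
// not the body arrived base64-encoded. Plain bodies are assumed to be UTF-8.
function downloadedBodyToBuffer({ body, base64Encoded }) {
  return Buffer.from(body, base64Encoded ? 'base64' : 'utf8');
}

// Usage (not executed here):
//   const result = await client.send('Download.getDownloadedBody', { id });
//   fs.writeFileSync('./downloaded_file.csv', downloadedBodyToBuffer(result));
```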
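Faster Text Input
For scenarios that need fast or bulk text entry, use the custom Input.type CDP command. It is faster than the standard CDP text input methods, which makes it a good fit for automation tasks that require high-speed typing or handle large amounts of text.
CDP commands
Input.type – sends keystrokes to the currently focused element, or simulates typing the specified text.
const client = await page.createCDPSession();
// Focus the input element
await page.focus('input');
// Type the message
await client.send('Input.type', {
  text: 'what is the best place to try pizza and pasta?'
});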
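Custom Client SSL/TLS Certificates
Use this command to install a custom client SSL/TLS certificate when a specific domain requires one for authentication. The certificate applies for the duration of a single Browser API session and is removed automatically when the session ends.
Browser.addCertificate(params: {
  cert: string // base64-encoded certificate file
  pass: string // certificate password
}) : void
Replace the placeholder SBR_ZONE_FULL_USERNAME:SBR_ZONE_PASSWORD with valid Browser API credentials.
Replace client.pfx with the actual path to your certificate file, which should be a valid SSL/TLS client certificate in .pfx format.
Replace secret with the actual password for the certificate.
NodeJS - Puppeteer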
const puppeteer = require('puppeteer-core');
const fs = require('fs/promises');
const {
  AUTH = 'SBR_ZONE_FULL_USERNAME:SBR_ZONE_PASSWORD',
  TARGET_URL = 'https://example.com',
  CERT_FILE = 'client.pfx',
  CERT_PASS = 'secret',
} = process.env;
async function scrape(url = TARGET_URL, file = CERT_FILE, pass = CERT_PASS) {
  if (AUTH === 'SBR_ZONE_FULL_USERNAME:SBR_ZONE_PASSWORD') {
    throw new Error(`Provide Browser API credentials in AUTH`
      + ` environment variable or update the script.`);
  }
  console.log(`Connecting to Browser...`);
  const browserWSEndpoint = `wss://${AUTH}@brd.superproxy.io:9222`;
  const browser = await puppeteer.connect({ browserWSEndpoint });
  try {
    console.log(`Connected! Installing ${file} certificate...`);
    const page = await browser.newPage();
    const client = await page.createCDPSession();
    // Read the certificate given by the `file` argument and base64-encode it
    const cert = (await fs.readFile(file)).toString('base64');
    await client.send('Browser.addCertificate', { cert, pass });
    console.log(`Installed! Navigating to ${url}...`);
    await page.goto(url, { timeout: 2 * 60 * 1000 });
    console.log(`Navigated! Scraping page content...`);
    const data = await page.content();
    console.log(`Scraped! Data: ${data}`);
  } finally {
    await browser.close();
  }
}
scrape();