To get started, you need your API credentials: your Username and Password. You can find them in the Access Parameters tab of your newly created Scraping Browser.

Installation

npm i puppeteer-core

Sample Code

Try running the example script below (swap in your credentials, zone, and target URL):

const puppeteer = require('puppeteer-core');
const AUTH = 'USER:PASS';
const SBR_WS_ENDPOINT = `wss://${AUTH}@brd.superproxy.io:9222`;

async function main() {
    console.log('Connecting to Scraping Browser...');
    const browser = await puppeteer.connect({
        browserWSEndpoint: SBR_WS_ENDPOINT,
    });
    try {
        console.log('Connected! Navigating...');
        const page = await browser.newPage();
        await page.goto('https://example.com', { timeout: 2 * 60 * 1000 });
        console.log('Taking screenshot to page.png');
        await page.screenshot({ path: './page.png', fullPage: true });
        console.log('Navigated! Scraping page content...');
        const html = await page.content();
        console.log(html);
        // CAPTCHA solving: if you are likely to encounter a CAPTCHA on your
        // target page, add the following lines to get the status of
        // Scraping Browser's automatic CAPTCHA solver.
        // Note 1: If no CAPTCHA is found, it returns the not_detected status after detectTimeout.
        // Note 2: Once a CAPTCHA is solved, if there is a form to submit, it is submitted by default.
        // const client = await page.target().createCDPSession();
        // const {status} = await client.send('Captcha.solve', {detectTimeout: 30*1000});
        // console.log(`Captcha solve status: ${status}`);
    } finally {
        await browser.close();
    }
}

if (require.main === module) {
    main().catch(err => {
        console.error(err.stack || err);
        process.exit(1);
    });
}

Run the Script

Save the above code as script.js and run it using this command:

node script.js
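
If you would rather not hard-code AUTH in script.js, one option is to read the credentials from environment variables when the script starts. The sketch below assumes two hypothetical variable names, SBR_USERNAME and SBR_PASSWORD; they are not defined by Scraping Browser, so pick whatever names suit your setup. The helper buildEndpoint is also illustrative.

```javascript
// Sketch: build the WebSocket endpoint from environment variables instead of
// a hard-coded AUTH string. SBR_USERNAME and SBR_PASSWORD are hypothetical
// names chosen for this example.
function buildEndpoint(env = process.env) {
    const user = env.SBR_USERNAME;
    const pass = env.SBR_PASSWORD;
    if (!user || !pass) {
        throw new Error('Set SBR_USERNAME and SBR_PASSWORD before running the script');
    }
    // Same endpoint format as SBR_WS_ENDPOINT in the sample code.
    return `wss://${user}:${pass}@brd.superproxy.io:9222`;
}
```

In the sample script you could then write const SBR_WS_ENDPOINT = buildEndpoint(); and start it with the two variables set in your shell.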

View Live Browser Session

The Scraping Browser Debugger lets developers inspect, analyze, and fine-tune their code alongside Chrome DevTools, giving them better control, visibility, and efficiency. You can integrate the following code snippet to launch DevTools automatically for every session:

// Node.js Puppeteer - launch devtools locally

const { exec } = require('child_process');
// command or path of your local Chrome; adjust for your OS
const chromeExecutable = 'google-chrome';

const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
const openDevtools = async (page, client) => {
    // get current frameId (_id is an internal Puppeteer field)
    const frameId = page.mainFrame()._id;
    // get URL for devtools from scraping browser
    const { url: inspectUrl } = await client.send('Page.inspect', { frameId });
    // open devtools URL in local chrome
    exec(`"${chromeExecutable}" "${inspectUrl}"`, error => {
        if (error)
            throw new Error('Unable to open devtools: ' + error);
    });
    // wait for devtools ui to load
    await delay(5000);
};

// The lines below belong inside your async main(), after puppeteer.connect()
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await openDevtools(page, client);
await page.goto('http://example.com');

Single Navigation Per Session

Scraping Browser sessions are structured to allow one initial navigation per session. The initial navigation is the first load of the target site from which data is to be extracted. After that, you are free to move around the site with clicks, scrolls, and other interactive actions within the same session. To start a new scraping job from the initial navigation stage, whether on the same site or a different one, you must begin a new session.
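
One way to follow this rule is to open a fresh session for every target URL and close it when the job is done. The sketch below is illustrative only: scrapeInFreshSession, scrapeAll, and the connect and extract parameters are names invented for this example. It assumes connect is an async function that resolves to a Puppeteer browser, for example () => puppeteer.connect({ browserWSEndpoint: SBR_WS_ENDPOINT }).

```javascript
// Sketch: one Scraping Browser session per initial navigation.
// `connect` and `extract` are caller-supplied functions (hypothetical names).
async function scrapeInFreshSession(connect, url, extract) {
    const browser = await connect(); // new session for this job
    try {
        const page = await browser.newPage();
        // The single initial navigation allowed in this session.
        await page.goto(url, { timeout: 2 * 60 * 1000 });
        // Clicks, scrolls, and other interactions stay within the same session.
        return await extract(page);
    } finally {
        await browser.close(); // end the session before the next job starts
    }
}

// Runs the jobs one after another, each in its own session.
async function scrapeAll(connect, urls, extract) {
    const results = [];
    for (const url of urls) {
        results.push(await scrapeInFreshSession(connect, url, extract));
    }
    return results;
}
```

For example, scrapeAll(connect, ['https://example.com'], page => page.content()) would return the HTML of each URL, opening and closing one session per entry.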

Session Time Limits

Scraping Browser has two kinds of timeouts, intended to safeguard customers against uncontrolled usage.

  1. Idle Session Timeout: if a browser session stays open in idle mode (no usage going through it) for 5 minutes or more, Scraping Browser automatically times out the session.
  2. Maximum Session Length Timeout: a Scraping Browser session can last up to 30 minutes. Once the maximum session time is reached, the session automatically times out.
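
For long-running jobs it can help to estimate on the client side how much time a session has left, so you can reconnect before a timeout hits mid-task. The sketch below uses only the two limits stated above. The function names and the 30-second safety margin are assumptions made for illustration, and the actual timeouts are enforced by Scraping Browser, so treat the result as an estimate.

```javascript
const IDLE_LIMIT_MS = 5 * 60 * 1000;    // idle session timeout (5 minutes)
const MAX_SESSION_MS = 30 * 60 * 1000;  // maximum session length (30 minutes)

// Estimates how many milliseconds remain before either limit is reached.
// All arguments are timestamps in milliseconds, as returned by Date.now().
function remainingSessionMs(startedAt, lastActivityAt, now = Date.now()) {
    const untilIdle = lastActivityAt + IDLE_LIMIT_MS - now;
    const untilMax = startedAt + MAX_SESSION_MS - now;
    return Math.max(0, Math.min(untilIdle, untilMax));
}

// True when the estimated remaining time is within the safety margin
// (hypothetical default: 30 seconds), meaning a new session is advisable.
function shouldReconnect(startedAt, lastActivityAt, safetyMarginMs = 30 * 1000, now = Date.now()) {
    return remainingSessionMs(startedAt, lastActivityAt, now) <= safetyMarginMs;
}
```

A script could record Date.now() right after connecting and after each page action, then call shouldReconnect before starting the next step.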