Web scraping projects often require intricate interactions with target websites, and debugging is vital for identifying and resolving issues that surface during development.
The Scraping Browser Debugger serves as a valuable resource, enabling you to inspect, analyze, and fine-tune your code alongside Chrome Dev Tools, resulting in better control, visibility, and efficiency.
The Scraping Browser Debugger can be launched in two ways: manually via the Control Panel, or remotely via your script.
The Scraping Browser Debugger can be easily accessed within your Bright Data Control Panel. Follow these steps:
- Within the control panel, go to the My Proxies view
- Click on your Scraping Browser proxy
- Click on the Access parameters tab
- On the right side, click the “Chrome Dev Tools Debugger” button
Getting Started with the Debugger & Chrome Dev Tools
Open a Scraping Browser Session
- Ensure you have an active Scraping Browser session
- If you don’t yet know how to launch a Scraping Browser session, see our Quick Start guide or the minimal connection sketch below.
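For quick reference, here is a minimal Node.js Puppeteer sketch for opening a session. The USER:PASS credentials are placeholders for your own zone’s access parameters, and the full connection example appears further down this page.
// node.js puppeteer - minimal sketch: open a Scraping Browser session (placeholder credentials)
const puppeteer = require('puppeteer-core');
const SBR_WS_ENDPOINT = 'wss://USER:PASS@brd.superproxy.io:9222';
const browser = await puppeteer.connect({ browserWSEndpoint: SBR_WS_ENDPOINT });
const page = await browser.newPage();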
Launch the Debugger
- Once your session is up and running, you can launch the Debugger.
- Click the Debugger button within your Access parameters tab to launch the Scraping Browser Debugger interface.
Connect with your live browser sessions
- Within the Debugger interface, you will find a list of your live Scraping Browser sessions.
- Select the preferred session that you wish to debug
- Click the session link, or copy and paste it into your browser of choice; this establishes a connection between the Debugger and your selected session.
Leveraging Chrome Dev Tools
- With the Scraping Browser Debugger now connected to your live session, you gain access to the powerful features of Chrome Dev Tools.
- Utilize the Dev Tools interface to inspect HTML elements, analyze network requests, debug JavaScript code, and monitor performance. Leverage breakpoints, console logging, and other debugging techniques to identify and resolve issues within your code.
If you would like to automatically launch Dev Tools on every session so you can view your live browser session, integrate the following code snippet:
// Node.js Puppeteer - launch devtools locally
const { exec } = require('child_process');

const chromeExecutable = 'google-chrome';
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

const openDevtools = async (page, client) => {
    // get current frameId
    const frameId = page.mainFrame()._id;
    // get URL for devtools from scraping browser
    const { url: inspectUrl } = await client.send('Page.inspect', { frameId });
    // open devtools URL in local chrome
    exec(`"${chromeExecutable}" "${inspectUrl}"`, error => {
        if (error)
            throw new Error('Unable to open devtools: ' + error);
    });
    // wait for devtools ui to load
    await delay(5000);
};

const page = await browser.newPage();
const client = await page.target().createCDPSession();
await openDevtools(page, client);
await page.goto('http://example.com');
Debugger Walkthrough
Check out the Scraping Browser Debugger in action below
[Video: Scraping Browser Debugger walkthrough]
You can easily trigger a screenshot of the browser at any time by adding the following to your code:
// node.js puppeteer - Taking screenshot to file screenshot.png
await page.screenshot({ path: 'screenshot.png', fullPage: true });
To take screenshots in Python and C#, see here.
There is a lot of “behind the scenes” work that goes into unlocking your targeted site.
Some sites take just a few seconds to navigate, while others might take up to a minute or two, as they require more complex unlocking procedures. As such, we recommend setting your navigation timeout to 2 minutes to give the navigation enough time to succeed if needed.
You can set your navigation timeout to 2 minutes by passing a timeout option to your “page.goto” call, as shown below.
// node.js puppeteer - Navigate to site with 2 min timeout
await page.goto('https://example.com', { timeout: 2*60*1000 });
| Error Code | Meaning | What can you do about it? |
|---|---|---|
| Unexpected server response: 407 | An issue with the remote browser’s port | Check your remote browser’s port. The correct port for Scraping Browser is 9222. |
| Unexpected server response: 403 | Authentication error | Check your authentication credentials (username, password) and verify that you are using the correct “Browser API” zone from the Bright Data control panel. |
| Unexpected server response: 503 | Service unavailable | We are likely scaling browsers right now to meet demand. Try to reconnect in 1 minute. |
You can check your connection with the following curl:
curl -v -u USER:PASS https://brd.superproxy.io:9222/json/protocol
For any other issues please see our Troubleshooting guide or contact support.
When optimizing your web scraping projects, conserving bandwidth is key.
Explore our tips and guidelines below on effective bandwidth-saving techniques that you can utilize within your script to ensure efficient and resource-friendly scraping.
A typical inefficiency when scraping with browsers is unnecessarily downloading media content, such as images and videos, from your target domains. Learn below how to avoid this by excluding it right from within your script.
Given that anti-bot systems expect specific resources to load for particular domains, approach resource blocking cautiously, as it can directly impact Scraping Browser’s ability to successfully load your target domains. If you encounter issues after applying resource blocks, verify that they persist even after your blocking logic is reverted before contacting our support team.
// node.js puppeteer - block image requests via request interception
const page = await browser.newPage();

// Enable request interception
await page.setRequestInterception(true);

// Listen for requests
page.on('request', (request) => {
    if (request.resourceType() === 'image') {
        // If the request is for an image, block it
        request.abort();
    } else {
        // If it's not an image request, allow it to continue
        request.continue();
    }
});
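If other media is also irrelevant to your scraping job, the same interception hook can cover additional resource types. The following is a hedged sketch that extends the snippet above to also block video/audio media and fonts; the exact list of blocked types is an assumption you should tune per target site, keeping the anti-bot caution above in mind.
// node.js puppeteer - sketch: block several resource types (adjust the list per target site)
const BLOCKED_RESOURCE_TYPES = ['image', 'media', 'font'];

const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', (request) => {
    if (BLOCKED_RESOURCE_TYPES.includes(request.resourceType())) {
        // Block bandwidth-heavy resources we don't need
        request.abort();
    } else {
        // Let everything else through
        request.continue();
    }
});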
One common inefficiency in scraping jobs is the repeated downloading of the same page during a single session.
Leveraging cached pages - a version of a previously scraped page - can significantly increase your scraping efficiency, as it can be used to avoid repeated network requests to the same domain. Not only does it save on bandwidth by avoiding redundant fetches, but it also ensures faster and more responsive interactions with the preloaded content.
A single Scraping Browser session can persist for up to 20 minutes. This duration allows you ample opportunity to revisit and re-navigate the page as needed within the same session, eliminating the need for redundant sessions on identical pages during your scraping job.
Example: In a multi-step web scraping workflow, you often gather links from a page and then dive into each link for more detailed data extraction.
You’ll often need to revisit the initial page for cross-referencing or validation. By leveraging caching, these revisits don’t trigger new network requests as the data is simply loaded from the cache.
const puppeteer = require('puppeteer-core');

const AUTH = 'USER:PASS';
const SBR_WS_ENDPOINT = `wss://${AUTH}@brd.superproxy.io:9222`;

async function main() {
    console.log('Connecting to Scraping Browser...');
    const browser = await puppeteer.connect({
        browserWSEndpoint: SBR_WS_ENDPOINT,
    });
    try {
        console.log('Connected! Navigating...');
        const page = await browser.newPage();
        await page.goto('https://example.com', { timeout: 2 * 60 * 1000 });

        // Extract product links from the listing page
        const productLinks = await page.$$eval('.product-link', links => links.map(link => link.href));

        const productDetails = [];
        // Navigate to each individual product page
        for (let link of productLinks) {
            await page.goto(link);
            // Extract the product's name
            const productName = await page.$eval('.product-name', el => el.textContent);
            // Apply a coupon (assuming it doesn't navigate away)
            await page.click('.apply-coupon-button');
            // Extract the discounted product's price from the cached product detail page
            const productPrice = await page.$eval('.product-price', el => el.textContent);
            // Store product details
            productDetails.push({ productName, productPrice });
        }
        console.log(productDetails);
    } finally {
        await browser.close();
    }
}

main().catch(err => {
    console.error(err.stack || err);
    process.exit(1);
});
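To illustrate the revisit pattern described above, here is a hedged sketch that returns to the listing page within the same session. The LISTING_URL constant and the .product-link selector are illustrative assumptions; because the page was already loaded earlier in the session, the browser can serve much of it from cache on the second navigation instead of re-fetching every resource over the network.
// node.js puppeteer - sketch: revisit the listing page within the same session (illustrative)
const LISTING_URL = 'https://example.com';

// First visit: the page and its resources are fetched over the network
await page.goto(LISTING_URL, { timeout: 2 * 60 * 1000 });
const productLinks = await page.$$eval('.product-link', links => links.map(link => link.href));

// ... navigate into individual product pages here ...

// Revisit for cross-referencing: cached resources can be reused
// instead of being re-downloaded, saving bandwidth
await page.goto(LISTING_URL, { timeout: 2 * 60 * 1000 });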
- Limit Your Requests: Only scrape what you need, rather than downloading entire webpages or sites.
- Concurrency Control: Limit the number of concurrent pages or browsers you open; too many parallel processes can exhaust resources (see the sketch after this list).
- Session Management: Ensure you properly manage and close sessions after scraping. This prevents resource and memory leaks.
- Opt for APIs: If the target website offers an API, use it instead of direct scraping. APIs are typically more efficient and less bandwidth-intensive than scraping full web pages.
- Fetch Incremental Data: If scraping periodically, try to fetch only new or updated data rather than re-fetching everything.
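As a minimal concurrency-control sketch, the helper below processes a list of URLs with only a fixed number of pages open at a time. The batch size of 3 and the title extraction are illustrative assumptions rather than Scraping Browser requirements; adapt them to your own workload.
// node.js puppeteer - sketch: limit the number of concurrently open pages (illustrative)
async function scrapeInBatches(browser, urls, limit = 3) {
    const results = [];
    // Process URLs in chunks of `limit`, so at most `limit` pages are open at once
    for (let i = 0; i < urls.length; i += limit) {
        const batch = urls.slice(i, i + limit);
        const batchResults = await Promise.all(batch.map(async (url) => {
            const page = await browser.newPage();
            try {
                await page.goto(url, { timeout: 2 * 60 * 1000 });
                // Extract only what you need (here: the page title)
                return await page.title();
            } finally {
                // Always close the page to avoid resource and memory leaks
                await page.close();
            }
        }));
        results.push(...batchResults);
    }
    return results;
}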