Start with the cheapest IP type that works
The single biggest cost lever in web scraping is IP type. Datacenter IPs are significantly cheaper per gigabyte than residential or mobile IPs, yet many teams default to residential when datacenter would work fine. Think of it like shipping: you don’t overnight everything just because it’s easier. You start with ground shipping and only upgrade when the delivery window demands it.

| IP type | Relative cost | Best for |
|---|---|---|
| Datacenter | Lowest | Sites without advanced anti-bot systems |
| ISP | Medium | Sites that need residential-grade trust at datacenter speed |
| Residential | Higher | Sites with Cloudflare, Akamai, or PerimeterX protection |
| Mobile | Highest | Heavily protected targets that block all other IP types |
How to test if a site works with datacenter IPs
Before writing any code, test manually in a browser:

- Configure your browser (Firefox works well) to route traffic through a datacenter proxy with a single sticky session
- Navigate to the target URL
- If the page loads normally, datacenter IPs can work for this site
Making datacenter IPs work with headers and cookies
A bare HTTP request with no headers or cookies won’t work on most sites, even with a clean datacenter IP. But adding the right headers and cookies often gets you to near-perfect success rates. To capture a working set of headers for a target like an Amazon product page:
- Open the target site in a browser routed through your datacenter proxy
- Open DevTools (F12) and go to the Network tab
- Reload the page and click the main document request
- Copy the Request Headers section
- Use those exact headers in your scraping code
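The captured headers can then be replayed from code. Below is a minimal sketch using axios; every header value, cookie, and the proxy address are placeholders you would replace with what you copied from DevTools and your own proxy credentials:

```javascript
// Sketch: replay browser-captured headers through a datacenter proxy.
// All header values, cookies, and proxy details below are placeholders.
const headers = {
  'User-Agent':
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0',
  Accept: 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language': 'en-US,en;q=0.5',
  Cookie: 'session-id=PLACEHOLDER; session-token=PLACEHOLDER', // paste from DevTools
};

async function fetchProductPage(url) {
  const axios = require('axios'); // lazy require: only needed when actually fetching
  const response = await axios.get(url, {
    headers,
    proxy: {
      host: 'dc-proxy.example.com', // placeholder datacenter proxy endpoint
      port: 8080,
      auth: { username: 'USER', password: 'PASS' },
    },
  });
  return response.data; // raw HTML, no rendering, minimal bandwidth
}
```

The key point is that the headers are sent exactly as the browser sent them; trimming "unimportant" ones is a common reason requests that worked in the browser fail in code.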
Use a browser only when an HTTP client isn’t enough
There are two ways to fetch a web page through a proxy:

- HTTP client (axios, Python requests, cURL): sends a single request and downloads just the HTML. Fast, lightweight, and cheap. Everything in the previous section uses this approach.
- Browser automation (Puppeteer, Playwright, Selenium): launches a full browser that renders JavaScript, loads images, stylesheets, and every other resource on the page. Much heavier on bandwidth.
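For comparison, here is roughly what the browser path looks like in Puppeteer. The proxy address and credentials are placeholders; note that every resource the rendered page requests flows through your metered proxy:

```javascript
// Sketch: fetch a page with a full browser routed through a proxy.
// Proxy host and credentials are placeholders.
async function fetchWithBrowser(url) {
  const puppeteer = require('puppeteer'); // lazy require: only loaded when called
  const browser = await puppeteer.launch({
    args: ['--proxy-server=http://dc-proxy.example.com:8080'],
  });
  try {
    const page = await browser.newPage();
    await page.authenticate({ username: 'USER', password: 'PASS' });
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await page.content(); // HTML after JavaScript has run
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
```

Reach for this only when the data genuinely requires JavaScript rendering; otherwise the HTTP-client path above is an order of magnitude cheaper.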
If you don’t want to manage browsers at all, Bright Data Browser API runs real GUI browsers in the cloud with all anti-blocking built in. You connect via Puppeteer or Playwright and we handle the infrastructure.
Reduce bandwidth when using browsers
A single Amazon product page loads over 20 MB of resources. If you only need text data like titles, prices, and descriptions, most of that bandwidth is wasted money.

Block unnecessary resource types
Intercept network requests and abort anything you don’t need:
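A sketch of request interception in Puppeteer. The blocked-type list here is a starting point, not a universal rule; some sites need stylesheets or scripts to render the data you want:

```javascript
// Resource types that usually aren't needed for text extraction.
const BLOCKED_TYPES = new Set(['image', 'media', 'font', 'stylesheet']);

function shouldBlock(resourceType) {
  return BLOCKED_TYPES.has(resourceType);
}

// Wire the filter into a Puppeteer page (assumes an existing `page` object).
async function enableResourceBlocking(page) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (shouldBlock(request.resourceType())) {
      request.abort(); // never downloaded, never billed
    } else {
      request.continue();
    }
  });
}
```

Start with images only (the safest and biggest win), then expand the set one type at a time while checking that your target data still renders.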
Block requests by domain
Many pages load third-party scripts (analytics, social widgets, ad networks) that contribute nothing to the data you need:
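Domain blocking follows the same interception pattern. The blocklist below is illustrative; build yours from the domains you see in the DevTools Network tab for your target site:

```javascript
// Third-party domains that commonly add bandwidth without adding data.
// This list is illustrative; tailor it per target site.
const BLOCKED_DOMAINS = ['google-analytics.com', 'doubleclick.net', 'facebook.net'];

function isBlockedDomain(url) {
  const { hostname } = new URL(url);
  return BLOCKED_DOMAINS.some(
    (d) => hostname === d || hostname.endsWith('.' + d)
  );
}

// Sketch: combine with Puppeteer request interception.
async function enableDomainBlocking(page) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (isBlockedDomain(request.url())) request.abort();
    else request.continue();
  });
}
```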
Stop loading once you have the data
You don’t need to wait for the full page to finish loading. Wait for the specific elements you need, then stop:
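A sketch of this pattern in Puppeteer. The selectors below are hypothetical examples for a product page; substitute the ones your target actually uses:

```javascript
// Sketch: grab data as soon as the target elements appear, then stop loading.
// The selectors are hypothetical; inspect your target page for real ones.
async function scrapeTitleAndPrice(page, url) {
  // DOMContentLoaded is enough to start watching; don't wait for 'load'.
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  await page.waitForSelector('#productTitle');
  const data = await page.evaluate(() => ({
    title: document.querySelector('#productTitle')?.textContent.trim(),
    price: document.querySelector('.a-price .a-offscreen')?.textContent,
  }));
  // Cancel any still-pending resource loads before moving on.
  await page.evaluate(() => window.stop());
  return data;
}
```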
Navigate directly to the data
Every extra page navigation costs bandwidth and time. Build the shortest path to the data you need. Longer path (3 navigations):

- Search for “Xbox” on Amazon
- Click on the first product result
- Click “See all reviews”
Shorter path (1 navigation): go straight to https://www.amazon.com/product-reviews/B0EXAMPLE by substituting the product ID into the reviews URL.
Most sites have predictable URL patterns for product pages, review pages, and search results. Reverse-engineer the URL structure and skip the clicks entirely.
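The shorter path can be as simple as a template function. The pattern below matches the example URL above; verify it against the live site before relying on it:

```javascript
// Build the reviews URL directly from a product ID,
// skipping the search page and product page entirely.
function reviewsUrl(productId) {
  return `https://www.amazon.com/product-reviews/${productId}`;
}

// Usage: reviewsUrl('B0EXAMPLE')
// → 'https://www.amazon.com/product-reviews/B0EXAMPLE'
```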
Mix HTTP client and browser methods
Some sites require a browser session to generate authentication tokens, but the actual data pages work fine with an HTTP client. You can combine both approaches to get the best of each:

- Load one page in a headless browser to collect session cookies and authentication tokens
- Extract the cookies from the browser session
- Use those cookies with your HTTP client for all subsequent pages
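The three steps above can be sketched as follows, using Puppeteer for the one-time session setup and axios for the cheap data requests (libraries are lazily required so each half stays optional):

```javascript
// Sketch: harvest cookies with a headless browser once,
// then reuse them with a lightweight HTTP client.

// Serialize Puppeteer's cookie objects ([{ name, value, ... }]) into a header.
function cookiesToHeader(cookies) {
  return cookies.map((c) => `${c.name}=${c.value}`).join('; ');
}

async function getSessionCookies(url) {
  const puppeteer = require('puppeteer'); // lazy require
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.cookies();
  } finally {
    await browser.close();
  }
}

async function fetchDataPage(url, cookieHeader) {
  const axios = require('axios'); // lazy require
  const res = await axios.get(url, { headers: { Cookie: cookieHeader } });
  return res.data;
}
```

One browser launch can then serve hundreds of cheap HTTP requests until the cookies expire.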
Choose the right data collection approach
Before optimizing individual requests, consider whether you should be building scrapers at all.

| Approach | What you manage | Best for |
|---|---|---|
| In-house | Everything: proxies, unlocking, parsing, storage, servers, engineering team | Companies where data collection IS the core business |
| Hybrid | Parsing and storage; a service handles unlocking and IP rotation | Teams that want control over data processing but not infrastructure |
| Data as a service | Analysis only; you buy structured data from a provider | Teams whose core business is analyzing data, not collecting it |
Optimize your service plan
Technical optimizations reduce per-request costs. Service plan optimization reduces the price you pay for each gigabyte or request.

Choose the right pricing model. Some providers charge per gigabyte of bandwidth, others per request. A single browser-loaded product page can weigh 20+ MB of bandwidth but counts as one request. Compare both models against your actual usage pattern.

Commit to a monthly plan. Pay-as-you-go rates are the highest tier. Even a small monthly commitment can cut per-unit costs by 50% or more. If a large commitment feels risky, start smaller. The savings still outweigh pay-as-you-go.

Consolidate with one provider. Splitting volume across multiple providers means you get worse pricing from both. Bringing all your volume to one provider unlocks higher-tier discounts.
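The pricing-model comparison is simple arithmetic. Here is a sketch with illustrative rates; the dollar figures are assumptions for the example, not real quotes:

```javascript
// Compare bandwidth pricing vs per-request pricing for one page type.
function costPerThousandPages({ pageSizeMB, perGBRate, perRequestRate }) {
  const bandwidthCost = (pageSizeMB / 1024) * perGBRate * 1000;
  const requestCost = perRequestRate * 1000;
  return { bandwidthCost, requestCost };
}

// Example: a 20 MB browser-loaded page at an assumed $8/GB vs $0.002/request.
const { bandwidthCost, requestCost } = costPerThousandPages({
  pageSizeMB: 20,
  perGBRate: 8,          // assumed bandwidth rate
  perRequestRate: 0.002, // assumed per-request rate
});
// Heavy browser-rendered pages favor per-request pricing;
// small HTML-only pages flip the comparison the other way.
```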
Real-world example: consolidating providers
A company split traffic 50/50 between two proxy providers, paying a combined $24,000 per month. Consolidating all volume with one provider unlocked a higher discount tier and cut the bill by $7,000 per month. That’s $84,000 in annual savings from consolidation alone, with no code changes.
Cost optimization checklist
Use this checklist to audit your current scraping setup:

- Are you using datacenter IPs where possible, or defaulting to residential?
- Have you tested your target sites with datacenter IPs in a real browser?
- Are you sending proper headers and cookies with your HTTP client?
- If using a browser, are you blocking images and unnecessary resource types?
- Are you blocking third-party domains that don’t contribute to your data?
- Are you stopping page loads once target elements are present?
- Are you navigating directly to data URLs instead of clicking through multiple pages?
- Can you use a browser for authentication and an HTTP client for data pages?
- Is your pricing model (bandwidth vs. per-request) optimal for your usage pattern?
- Are you on a monthly plan instead of pay-as-you-go?
- Have you consolidated proxy volume with a single provider for better tier pricing?
Verify your optimizations
After applying these techniques, measure the impact:

- Compare bandwidth per request. Log the response size before and after blocking resources. If you went from 20 MB to 5 MB per page, you’ve cut costs by 75%.
- Track success rates. Send 10-20 test requests with your optimized headers and cookies. If success rate is below 90%, you’re likely missing a required header or cookie.
- Monitor cost per record. Divide your monthly proxy bill by the number of successful records collected. This is the metric that matters. Repeat after each optimization to confirm savings.
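All three metrics are easy to track in code. A minimal sketch of a meter (the $1,000 bill is an assumed example figure):

```javascript
// Track bytes downloaded and records collected, then report cost per record.
function makeMeter(monthlyBillUSD) {
  let bytes = 0;
  let records = 0;
  return {
    addResponse(bodyBytes) { bytes += bodyBytes; },
    addRecords(n) { records += n; },
    report() {
      return {
        totalMB: bytes / (1024 * 1024),
        costPerRecord: records ? monthlyBillUSD / records : null,
      };
    },
  };
}

const meter = makeMeter(1000);      // assumed $1,000 monthly proxy bill
meter.addResponse(5 * 1024 * 1024); // one 5 MB page
meter.addRecords(50);               // 50 records extracted from it
// meter.report() → { totalMB: 5, costPerRecord: 20 }
```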
Troubleshooting
Datacenter IPs return CAPTCHAs or 403 errors even with headers and cookies. The site likely uses Cloudflare, Akamai, or a similar system that blocks all datacenter IP ranges. Switch to ISP or residential IPs, or use Bright Data Web Unlocker. We handle detection automatically so you don’t have to debug it.

Blocking resources breaks the page and target data is missing. You blocked a JavaScript file that the page needs to render the data. Roll back to the last working block list and re-add resources one type at a time. Always block images first (safest), then expand incrementally.

Cookies expire and success rate drops after a few hours. Set up a cron job or scheduled task that loads the target domain in a headless browser, extracts fresh cookies, and stores them for your HTTP requests.

FAQs
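For the cookie-expiry case, the refresh job can be a plain timer inside a long-running Node process. A sketch; `collectCookies` stands in for the headless-browser step, and the interval is something you would tune to your target's session lifetime:

```javascript
// Sketch: refresh cookies on a fixed interval and cache them in memory.
// `collectCookies` is a placeholder for the headless-browser cookie harvest.
function startCookieRefresher(collectCookies, intervalMs, store) {
  const refresh = async () => {
    try {
      store.cookies = await collectCookies();
      store.updatedAt = Date.now();
    } catch (err) {
      // Keep the previous cookie set rather than serving none at all.
      console.error('cookie refresh failed, keeping previous set', err);
    }
  };
  const timer = setInterval(refresh, intervalMs);
  return { refreshNow: refresh, stop: () => clearInterval(timer) };
}
```

In production you would likely persist `store` to disk or a cache so restarts don't force a cold browser launch.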
Which sites work with datacenter IPs?
Many sites work with datacenter IPs when you send proper headers and cookies. To test, route your browser through a datacenter proxy and try loading the target page. If it loads, you can make it work in code. Sites with Cloudflare, Akamai, or PerimeterX set to block all datacenter IPs will require residential or ISP proxies.
How much bandwidth does blocking images save?
Blocking images typically reduces page load bandwidth by 50% or more. A product page that transfers 20 MB with all resources may drop to 7-8 MB with images blocked. Adding stylesheet and font blocking reduces it further.
How often do I need to refresh headers and cookies?
It varies by site; many sessions expire after a few hours. If your success rate degrades over time, automate the refresh with a scheduled headless-browser job that collects fresh cookies, as described in Troubleshooting above.
Should I use Puppeteer, Playwright, or Selenium?
All three have equivalent data collection capabilities. Choose based on your team’s language preference: Puppeteer for Node.js, Playwright for Node.js/Python/Java/.NET, Selenium for Python/Java. The cost optimization techniques in this guide apply to all of them.
What if I don't want to manage any of this?
If maintaining scrapers isn’t your core business, consider Bright Data Web Scraper API. We maintain 650+ pre-built scrapers that return structured JSON. You get data without managing proxies, browsers, or anti-blocking. Pricing starts at $1 per 1,000 records.