Browser workers and code workers are two technical approaches to scraping. Choose between them based on your needs, your budget, and the technical challenges posed by the website you are scraping.

Browser worker vs Code worker

Browser workers:

  • Simulate a user’s interaction with the website via a headless browser
  • More expensive to use, in terms of CPM (cost per thousand page loads)
  • Handle complex scraping tasks such as filling forms and loading dynamic content

Code workers:

  • Roughly equivalent to doing a curl or a Python requests.get(url) (see the sketch below)
  • Work by sending HTTP requests to the target website
  • Much cheaper
  • Only work in situations that don’t require interacting with the website’s UI
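
As a point of reference, a code worker request behaves much like the plain HTTP fetch below (a minimal sketch using the standard requests library; the URL is a placeholder, and the real worker adds its own plumbing around this):

    import requests

    # A code worker is roughly a plain HTTP GET: it receives the raw HTML the
    # server returns, with no JavaScript execution and no UI interaction.
    response = requests.get("https://example.com/products")  # placeholder URL
    html = response.text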

Choose the optimal worker type

Choose the worker type based on the technology used by the website you want to scrape and on the navigation required to reach the data you need.

It’s good to start with the cheaper code workers and switch to browser workers only if you find that you can’t get the data you want.
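
One practical way to apply this heuristic is to fetch the page with a plain HTTP request and check whether the data you want already appears in the raw HTML. The sketch below is illustrative only; the URL and the marker string are placeholders for your own target:

    import requests

    url = "https://example.com/products"    # placeholder URL
    marker = "Acme Widget"                  # placeholder value you expect on the page

    # Fetch the page the way a code worker would see it: raw HTML, no JavaScript.
    html = requests.get(url, timeout=30).text

    if marker in html:
        print("Data is in the static HTML: a code worker should be enough.")
    else:
        print("Data is likely rendered by JavaScript or behind an interaction:")
        print("consider switching to a browser worker.")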

For example, you need a browser worker when (see the sketch after this list):

  • You need to click on an element to load more data
  • You need to scroll to load more elements
  • You need to use tag_script or tag_response (capture network traffic from inside the browser)
  • You need to type text into the website, for example to run a search
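
To make these cases concrete, the sketch below shows the same interactions in a generic headless-browser script. It uses Playwright purely as a stand-in for illustration, not the worker API itself; in a browser worker these steps map to the click, scroll_*, type, wait_* and tag_* functions listed in the next section, and the URL and selectors are placeholders:

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        # Capture network traffic from inside the browser (what tag_response is for).
        page.on("response", lambda r: print(r.url, r.status))

        page.goto("https://example.com/products")   # placeholder URL
        page.click("button#load-more")               # click an element to load more data
        page.mouse.wheel(0, 2000)                    # scroll to load more elements
        page.fill("input[name=q]", "blue widget")    # type text to run a search
        page.keyboard.press("Enter")
        page.wait_for_selector(".search-result")     # wait for the results to render
        html = page.content()
        browser.close()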

Align your code with worker type

Some functions in our library are only available from browser workers and will throw an error if you try to call them from code workers.

Below is the list of functions that you can only use from browser workers:

  • wait_* (any wait function)
  • scroll_* (any scroll function)
  • tag_* (any tag function)
  • type
  • browser_size
  • emulate_device
  • freeze_page
  • click
  • hover
  • right_click
  • mouse_to
  • press_key
  • solve_captcha
  • capture_graphql
  • close_popup
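
If you want to check a script before running it under a code worker, a small helper like the one below can flag calls that require a browser worker. This is a hypothetical utility built from the list above, not part of the library:

    # Hypothetical helper: decide whether a function name requires a browser worker.
    # The prefixes cover the wait_*, scroll_* and tag_* families; the set holds the
    # individually listed browser-only functions.
    BROWSER_ONLY_PREFIXES = ("wait_", "scroll_", "tag_")
    BROWSER_ONLY_FUNCTIONS = {
        "type", "browser_size", "emulate_device", "freeze_page", "click",
        "hover", "right_click", "mouse_to", "press_key", "solve_captcha",
        "capture_graphql", "close_popup",
    }

    def requires_browser_worker(function_name: str) -> bool:
        """Return True if the function can only be used from a browser worker."""
        return (
            function_name.startswith(BROWSER_ONLY_PREFIXES)
            or function_name in BROWSER_ONLY_FUNCTIONS
        )

    print(requires_browser_worker("wait_for_element"))  # True (wait_* family)
    print(requires_browser_worker("click"))             # True
    print(requires_browser_worker("parse_html"))        # False (a made-up name outside the list)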