Pica is a platform designed to enhance AI agent workflows by connecting agents to external tools and APIs. Integrating Pica with Bright Data gives AI agents reliable, anonymous, and scalable web access through Bright Data's advanced web scraping and proxy network, enabling them to collect and process real-world web data efficiently and effectively.

Available Bright Data Tools

Bright Data offers the following tools for integration with Pica:

  • Web Scraper API: Automate web data extraction with Bright Data’s powerful Web Scraper API.
  • Web Unlocker: Access and retrieve data from websites that employ advanced anti-bot measures.
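
Both tools are also reachable directly over Bright Data's HTTP API; the Pica connector wraps these calls so an agent can invoke them. The snippet below is a minimal sketch of a direct Web Unlocker request, assuming a Web Unlocker zone named web_unlocker1 exists in your account and that your API key is exported as BRIGHT_DATA_API_KEY (both names are illustrative):

import os

import requests

# Minimal sketch: call Web Unlocker directly through Bright Data's request API.
# Assumes a zone named "web_unlocker1" and an API key in BRIGHT_DATA_API_KEY.
response = requests.post(
    "https://api.brightdata.com/request",
    headers={"Authorization": f"Bearer {os.environ['BRIGHT_DATA_API_KEY']}"},
    json={
        "zone": "web_unlocker1",       # your Web Unlocker zone name
        "url": "https://example.com",  # target page behind anti-bot measures
        "format": "raw",               # return the raw HTML response
    },
    timeout=60,
)
response.raise_for_status()
print(response.text[:500])  # first 500 characters of the unlocked page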

How to Integrate Bright Data With Pica

1

Obtain Your Bright Data API Key

Sign in to your Bright Data account and generate an API key; you will need it when connecting Bright Data to Pica.

2

Install the Bright Data Integration

Install the Pica LangChain SDK, which provides the Pica connectors (including Bright Data), by running the following command:

pip install pica-langchain
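
Before running the example in the next step, make sure the credentials it relies on are available. A minimal pre-flight check, assuming your Pica secret is exported as PICA_SECRET (used by the example below) and your OpenAI key as OPENAI_API_KEY (read by ChatOpenAI):

import os

# Pre-flight check for the environment variables used in the example below.
# PICA_SECRET is read by the Pica client; OPENAI_API_KEY is required by
# langchain_openai's ChatOpenAI.
for var in ("PICA_SECRET", "OPENAI_API_KEY"):
    if not os.environ.get(var):
        raise RuntimeError(f"Missing required environment variable: {var}")
print("Environment looks good.")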

Other integration packages are also available; see the Pica documentation for the full list.

3

Select Your Preferred Bright Data Tool

The Pica connector for Bright Data exposes the two tools described above: the Web Scraper API and the Web Unlocker. The following example creates a LangChain agent with Pica tools and uses the Bright Data connector to trigger a synchronous Web Scraper API run and retrieve the results:

import os

from langchain.agents import AgentType
from langchain_openai import ChatOpenAI
from pica_langchain import PicaClient, create_pica_agent
from pica_langchain.models import PicaClientOptions


def main():
    try:
        # Connect to Pica with your Pica secret and the Bright Data connector key
        pica_client = PicaClient(
            secret=os.environ["PICA_SECRET"],
            options=PicaClientOptions(
                # Initialize all available connections or pass specific connector keys
                connectors=["test::bright-data::default::fd583f2344fa414293bdda4f240258c1"]
            )
        )

        pica_client.initialize()

        llm = ChatOpenAI(
            temperature=0,
            model="gpt-4o",
        )

        # Create an agent with Pica tools
        agent = create_pica_agent(
            client=pica_client,
            llm=llm,
            agent_type=AgentType.OPENAI_FUNCTIONS,
        )

        # Execute a multi-step workflow using the Bright Data connector:
        # trigger a synchronous Web Scraper API run and retrieve the results
        result = agent.invoke({
            "input": (
                "Trigger Synchronous Web Scraping and Retrieve Results, "
                "use this dataset ID: gd_l7q7dkf244hwjntr0 and search for this URL: "
                "https://www.amazon.com/dp/B0D2Q9397Y?th=1&psc=1"
            )
        })

        print(f"\nWorkflow Result:\n {result}")

    except Exception as e:
        print(f"ERROR: An unexpected error occurred: {e}")


if __name__ == "__main__":
    main()
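
Once the agent is created, the Web Unlocker tool can be invoked the same way, through a natural-language instruction. A hypothetical follow-up call that reuses the agent object from the example above (the prompt wording and target URL are illustrative only):

# Hypothetical follow-up: ask the same agent to use the Web Unlocker tool.
# Reuses the `agent` object created in the example above.
unlock_result = agent.invoke({
    "input": (
        "Use the Bright Data Web Unlocker to fetch the HTML of "
        "https://example.com and summarize the page title."
    )
})
print(f"\nWeb Unlocker Result:\n {unlock_result}")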