You track prices and stock on a list of Amazon SKUs and you need fresh numbers every morning before your team starts work. You don’t want to run a server just for this. You don’t want to babysit a cron job on a laptop. In this tutorial we’ll build exactly that pipeline. You’ll commit a SKU list to a GitHub repo, write a small Python script that triggers the Bright Data Amazon Scraper API against it, wrap the script in a GitHub Actions workflow that runs on a daily cron, and configure Bright Data to deliver the results directly to an S3 bucket. Each morning a fresh JSON file lands in S3, keyed by snapshot ID, ready for your BI pipeline to pick up. No servers, no webhook handlers, no glue code. Just a workflow file, a script and a delivery config.
What you’ll build
A GitHub repository containing:
- A skus.json file listing the Amazon product URLs to monitor
- A Python script that POSTs the SKU list to the Bright Data Amazon Scraper API
- A GitHub Actions workflow that runs the script on a daily schedule
- Bright Data configured to deliver each snapshot to your S3 bucket
Prerequisites
- A Bright Data account with an API key (get your key)
- An S3 bucket with Bright Data delivery already configured. Follow Amazon to S3 delivery once, then come back. This tutorial assumes the delivery destination is already saved in your Amazon scraper’s settings.
- A GitHub account and a new (empty) repository
- Python 3.9+ installed locally
- Git installed locally
Part 1: Create the SKU list
Clone your empty GitHub repo locally and create a skus.json file at the repo root:
skus.json
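As a minimal sketch of what such a file might contain, the Python snippet below writes a skus.json and checks its shape before you commit it. It assumes the file is a JSON array of objects with a single url key; that layout, the placeholder product URLs, and the helper names validate_skus and write_skus are illustrative assumptions, so adapt them to the input format your scraper expects.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

# Placeholder URLs for illustration only; replace with the listings you monitor.
EXAMPLE_SKUS = [
    {"url": "https://www.amazon.com/dp/EXAMPLE0001"},
    {"url": "https://www.amazon.com/dp/EXAMPLE0002"},
]


def validate_skus(entries):
    """Return the entries unchanged, raising ValueError if any is malformed."""
    if not isinstance(entries, list) or not entries:
        raise ValueError("skus.json must be a non-empty JSON array")
    for index, entry in enumerate(entries):
        url = entry.get("url") if isinstance(entry, dict) else None
        if not isinstance(url, str):
            raise ValueError(f"entry {index} has no 'url' string")
        parsed = urlparse(url)
        if parsed.scheme != "https" or "amazon." not in parsed.netloc:
            raise ValueError(f"entry {index} is not an https Amazon URL: {url}")
    return entries


def write_skus(path, entries):
    """Validate the entries, then write them as pretty-printed JSON."""
    text = json.dumps(validate_skus(entries), indent=2) + "\n"
    Path(path).write_text(text, encoding="utf-8")


# Usage: write_skus("skus.json", EXAMPLE_SKUS)
```

Validating before writing keeps a malformed URL from reaching the scheduled run, where it would be harder to notice.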
Part 2: Write the trigger script
Create trigger_scrape.py at the repo root:
trigger_scrape.py
requirements.txt:
- The script does not wait for results. It fires the trigger and exits. Bright Data runs the scrape asynchronously and delivers the results directly to S3 via the delivery config you saved in your scraper settings. That’s the whole point: the script is a cheap, stateless trigger.
- The API key comes from an environment variable. Never commit keys to a repo. We’ll wire this to GitHub Actions Secrets in Part 4.
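The two points above can be sketched as follows. This is an illustration under stated assumptions, not the tutorial's exact script: the endpoint URL, the dataset ID placeholder, and the snapshot_id field in the response are assumptions to check against the Amazon async reference, and the sketch uses only the standard library, so it does not depend on whatever requirements.txt lists.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed endpoint and a hypothetical placeholder ID; confirm both in your
# Bright Data dashboard or the async trigger reference before relying on them.
TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"
DATASET_ID = "YOUR_AMAZON_DATASET_ID"


def build_trigger_request(skus, api_key, dataset_id=DATASET_ID):
    """Build (but do not send) the POST request that starts a scrape."""
    query = urllib.parse.urlencode(
        {"dataset_id": dataset_id, "include_errors": "true"}
    )
    return urllib.request.Request(
        f"{TRIGGER_URL}?{query}",
        data=json.dumps(skus).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def trigger_scrape(skus_path="skus.json"):
    """Fire the trigger and return the snapshot ID without waiting for data."""
    api_key = os.environ.get("BRIGHT_DATA_API_KEY")
    if not api_key:
        raise SystemExit("BRIGHT_DATA_API_KEY is not set")
    with open(skus_path, encoding="utf-8") as handle:
        skus = json.load(handle)
    request = build_trigger_request(skus, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    # Assumed response shape: {"snapshot_id": "..."}.
    return body.get("snapshot_id")


# Usage: print(trigger_scrape())
```

Separating request construction from sending keeps the network call in one place and lets you inspect the URL, headers and body locally without spending a scrape.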
Part 3: Run it locally
Install the dependency and run the script with your key:
The price field is final_price, and it can be null for products that are out of stock or for listings with an ambiguous currency. Your BI pipeline should handle that case explicitly rather than crashing on a null value.
Files are keyed by snapshot_id, not by date. That’s deliberate: each snapshot is immutable, and you can walk the bucket chronologically by listing creation timestamps or by enabling versioning. We’ll discuss naming conventions in “Next steps.”
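One way to handle a null final_price downstream is to separate priced and unpriced records before any arithmetic touches them. The sketch below assumes each delivered record is a JSON object; final_price comes from the note above, while the asin field, the sample values, and the split_by_price helper are made up for illustration.

```python
# Made-up sample records; only the final_price field name comes from the tutorial.
SAMPLE_RECORDS = [
    {"asin": "EXAMPLE0001", "final_price": 19.99},
    {"asin": "EXAMPLE0002", "final_price": None},  # e.g. out of stock
    {"asin": "EXAMPLE0003"},                       # key absent entirely
]


def split_by_price(records):
    """Return (priced, unpriced) lists, treating null and missing alike."""
    priced, unpriced = [], []
    for record in records:
        # .get() returns None for both a null value and an absent key.
        if record.get("final_price") is None:
            unpriced.append(record)
        else:
            priced.append(record)
    return priced, unpriced


priced, unpriced = split_by_price(SAMPLE_RECORDS)
average = sum(r["final_price"] for r in priced) / len(priced) if priced else None
```

Checking `is None` rather than truthiness means a legitimate price of 0 still counts as priced.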
Part 4: Schedule the workflow on GitHub Actions
Now let’s move the trigger off your laptop and onto a daily schedule. Create .github/workflows/daily-scrape.yml:
.github/workflows/daily-scrape.yml
- schedule: cron: "0 6 * * *" runs the job at 06:00 UTC every day. Adjust the cron expression for your timezone. GitHub’s scheduled workflows have no guaranteed precision, but daily runs typically fire within a few minutes of the scheduled time.
- workflow_dispatch adds a Run workflow button in the Actions tab so you can kick off the job on demand without waiting for the schedule.
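Because the schedule is evaluated in UTC, adjusting it for your timezone means subtracting your UTC offset from the local time you want. The helper below is a small sketch of that arithmetic; utc_cron_for_local_time is a hypothetical name, and it takes a fixed offset, so it does not follow daylight saving changes.

```python
def utc_cron_for_local_time(hour, minute, utc_offset_hours):
    """Return a daily cron expression in UTC for a local wall-clock time.

    utc_offset_hours is the local offset from UTC, e.g. 2 for UTC+2,
    -5 for UTC-5, 5.5 for UTC+5:30.
    """
    local_minutes = hour * 60 + minute
    offset_minutes = round(utc_offset_hours * 60)
    # Wrap around midnight so the result is always a valid time of day.
    utc_minutes = (local_minutes - offset_minutes) % (24 * 60)
    utc_hour, utc_minute = divmod(utc_minutes, 60)
    return f"{utc_minute} {utc_hour} * * *"


# 08:00 local time at UTC+2 is 06:00 UTC.
print(utc_cron_for_local_time(8, 0, 2))  # prints "0 6 * * *"
```

If your region observes daylight saving time, the job will shift by an hour in local terms twice a year unless you update the expression.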
- In your GitHub repo, go to Settings > Secrets and variables > Actions
- Click New repository secret
- Name it BRIGHT_DATA_API_KEY and paste your key
- Click Add secret
Part 5: Push and verify
Commit everything and push:
Congratulations
You’ve built a fully automated daily price monitor:
- A SKU list version-controlled in GitHub
- A Python trigger script that fires the Bright Data scrape and exits
- A GitHub Actions workflow that runs on a daily cron and authenticates via a repo secret
- Bright Data S3 delivery that drops each snapshot into your bucket asynchronously
Next steps
Stream large snapshots
Use stream_max_lines to start receiving batches as soon as the first records are ready.
Amazon async reference
Full parameter list for the async trigger endpoint, including include_errors and limit_per_input.
Monitor delivery status
Programmatically check snapshot status and delivery result from inside the workflow.
All delivery options
Swap S3 for GCS, Azure, Snowflake or SFTP with the same trigger call.
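If you later want the workflow to confirm that a snapshot finished, a polling loop along these lines is one option. Treat it as a sketch under assumptions: the progress URL and the "ready" and "failed" status values are not taken from this tutorial and should be checked against the delivery status documentation, and wait_for_snapshot and fetch_status are hypothetical helper names.

```python
import json
import time
import urllib.request

# Assumed endpoint; verify against the Bright Data API reference.
PROGRESS_URL = "https://api.brightdata.com/datasets/v3/progress/{snapshot_id}"


def fetch_status(snapshot_id, api_key):
    """Ask the API for the snapshot's current status string."""
    request = urllib.request.Request(
        PROGRESS_URL.format(snapshot_id=snapshot_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("status")


def wait_for_snapshot(snapshot_id, get_status, interval=30.0, max_checks=20,
                      sleep=time.sleep):
    """Poll get_status until a terminal status arrives or checks run out."""
    for attempt in range(max_checks):
        status = get_status(snapshot_id)
        if status in ("ready", "failed"):  # assumed terminal statuses
            return status
        if attempt < max_checks - 1:
            sleep(interval)
    return "timeout"


# Usage:
# status = wait_for_snapshot(
#     snapshot_id, lambda sid: fetch_status(sid, api_key))
```

Passing get_status and sleep as arguments keeps the loop testable without network access, and max_checks bounds how long a stuck snapshot can hold a GitHub Actions runner.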