This guide shows you how to scrape Reddit data at scale using the asynchronous /trigger endpoint. Use this when you have more than 20 URLs, need discovery by keyword or subreddit, or want delivery to a webhook or S3.
Not sure whether to use sync or async? Read Understanding sync vs. async requests.

Prerequisites

Before you begin, you need a Bright Data API token (sent as Authorization: Bearer YOUR_API_TOKEN in the examples below) and the dataset ID for the Reddit scraper (gd_lvz8ah06191smkebj4 in the examples below).

Step 1: Trigger the collection

Send a POST request to the /trigger endpoint with your input URLs. This example collects five Reddit posts in a single batch:
curl -X POST \
  "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_lvz8ah06191smkebj4&format=json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[
    {"url": "https://www.reddit.com/r/learnpython/comments/1asdf12/"},
    {"url": "https://www.reddit.com/r/python/comments/1bsdf34/"},
    {"url": "https://www.reddit.com/r/programming/comments/1csdf56/"},
    {"url": "https://www.reddit.com/r/datascience/comments/1dsdf78/"},
    {"url": "https://www.reddit.com/r/machinelearning/comments/1esdf90/"}
  ]'
You should see a 200 response with a snapshot_id:
{
  "snapshot_id": "s_m1a2b3c4d5e6f7g8h"
}
Save this ID. You need it to check progress and download results.
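The same trigger call can be sketched in Python using only the standard library. The helper names (build_trigger_request, trigger_collection) are hypothetical, not part of any Bright Data SDK:

```python
import json
import urllib.request

API_BASE = "https://api.brightdata.com/datasets/v3"

def build_trigger_request(urls, api_token, dataset_id):
    """Build the POST request for /trigger: one {"url": ...} record per input."""
    endpoint = f"{API_BASE}/trigger?dataset_id={dataset_id}&format=json"
    payload = json.dumps([{"url": u} for u in urls]).encode()
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def trigger_collection(urls, api_token, dataset_id):
    """Send the request and return the snapshot_id from the 200 response."""
    req = build_trigger_request(urls, api_token, dataset_id)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["snapshot_id"]
```

Batching all five URLs into one request counts as a single batch job against the concurrency limits described later in this guide.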

Discovery with async

The async endpoint is the best fit for discovery jobs, because Reddit discovery can return many results. Trigger a subreddit or keyword discovery by adding the relevant query parameters.
Discover by subreddit URL:
curl -X POST \
  "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_lvz8ah06191smkebj4&format=json&type=discover_new&discover_by=subreddit_url" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[{"url": "https://www.reddit.com/r/learnpython/", "sort_by": "hot"}]'
Discover by keyword:
curl -X POST \
  "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_lvz8ah06191smkebj4&format=json&type=discover_new&discover_by=keyword" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[{"keyword": "machine learning", "date": "Past week", "num_of_posts": 100}]'
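The two discovery modes differ only in the query parameters and the shape of the input records. A small Python sketch (the helper names are hypothetical; the parameter names mirror the curl examples above):

```python
from urllib.parse import urlencode

def discovery_url(dataset_id, discover_by):
    """Build the /trigger URL with discovery parameters added.
    discover_by is "subreddit_url" or "keyword"."""
    params = {
        "dataset_id": dataset_id,
        "format": "json",
        "type": "discover_new",
        "discover_by": discover_by,
    }
    return "https://api.brightdata.com/datasets/v3/trigger?" + urlencode(params)

def subreddit_input(subreddit_url, sort_by="hot"):
    """One input record for subreddit discovery."""
    return {"url": subreddit_url, "sort_by": sort_by}

def keyword_input(keyword, date="Past week", num_of_posts=100):
    """One input record for keyword discovery."""
    return {"keyword": keyword, "date": date, "num_of_posts": num_of_posts}
```

The resulting URL and record list plug directly into the trigger request shown in Step 1.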

Step 2: Monitor progress

Poll the snapshot status until it shows ready. This takes 30 seconds to several minutes depending on the number of URLs and whether discovery is involved.
curl "https://api.brightdata.com/datasets/v3/progress/s_m1a2b3c4d5e6f7g8h" \
  -H "Authorization: Bearer YOUR_API_TOKEN"
Status values:
collecting: Scraping is in progress
digesting: Data is being processed
ready: Results are available for download
failed: The collection encountered an error
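The polling loop can be sketched as follows. The get_status callable is an assumption standing in for a GET to the /progress/&lt;snapshot_id&gt; endpoint; injecting it keeps the loop testable:

```python
import time

def wait_until_ready(snapshot_id, get_status, poll_interval=10, timeout=600):
    """Poll until the snapshot reaches 'ready'.

    get_status(snapshot_id) should return one of 'collecting',
    'digesting', 'ready', or 'failed' (e.g. by calling the
    /progress endpoint and reading the status field).
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(snapshot_id)
        if status == "ready":
            return status
        if status == "failed":
            raise RuntimeError(f"snapshot {snapshot_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"snapshot {snapshot_id} not ready after {timeout}s")
        time.sleep(poll_interval)
```

A 10-second interval is a reasonable default given that jobs take from 30 seconds to several minutes; avoid polling more aggressively than that.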

Step 3: Download results

Once the status is ready, download the scraped data:
curl "https://api.brightdata.com/datasets/v3/snapshot/s_m1a2b3c4d5e6f7g8h?format=json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -o results.json
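Once results.json is on disk, you may want to separate successful records from per-URL failures before processing. A minimal sketch, assuming failed records carry an error field (the exact schema depends on the dataset):

```python
def split_results(records):
    """Partition downloaded records into successes and per-URL failures.

    Assumes failed records carry an "error" field; treat this as a
    sketch, since the exact field name depends on the dataset schema.
    """
    ok = [r for r in records if "error" not in r]
    failed = [r for r in records if "error" in r]
    return ok, failed
```

Retrying just the failed URLs in a fresh trigger request is cheaper than re-running the whole batch.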
You’ve successfully triggered, monitored and downloaded a batch Reddit scraping job.

Skip polling with webhooks

If you don’t want to poll for status, add a webhook parameter to receive results automatically:
curl -X POST \
  "https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_lvz8ah06191smkebj4&format=json&webhook=https://your-server.com/webhook" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[{"url": "https://www.reddit.com/r/learnpython/comments/1asdf12/"}]'
See Webhook delivery options for the full setup.
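On the receiving side, a minimal handler only needs to read the POSTed JSON body and acknowledge with 200. A stdlib sketch (the endpoint path, port, and payload shape are assumptions; with format=json the body is taken to be a list of records):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_webhook_body(raw):
    """Decode the delivered payload; assumed to be a JSON list of records."""
    return json.loads(raw or b"[]")

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the body and acknowledge with 200 so the delivery
        # is treated as successful.
        length = int(self.headers.get("Content-Length", 0))
        records = parse_webhook_body(self.rfile.read(length))
        print(f"received {len(records)} records")
        self.send_response(200)
        self.end_headers()

# To run the receiver on the port you registered as the webhook URL:
#   HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```

In production you would also verify the request comes from Bright Data (for example with a shared-secret query parameter on the webhook URL) before trusting the payload.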

Limits and constraints

Max input file size: 1 GB
Max concurrent batch requests: 100
Max concurrent single-input requests: 1,500
Webhook delivery size: up to 1 GB
API download size: up to 5 GB

Troubleshooting

Concurrent request limit exceeded: Reduce the number of parallel requests or combine inputs into fewer, larger batches. Each batch can include up to 1 GB of input data.
Collection failed: Check that all input URLs are valid, publicly accessible Reddit URLs. Review the error details in the snapshot response or in the Logs tab of your Bright Data dashboard.
Partial failures: Some URLs may fail individually while the overall job succeeds. Private subreddits, deleted posts and removed comments cannot be scraped. Check the snapshot response for an errors field and retry failed URLs in a separate request.
Empty or missing discovery results: For keyword discovery, make sure the date value matches one of Reddit’s accepted ranges (e.g. Past hour, Past day, Past week, Past month, Past year, All time). For subreddit discovery, confirm the subreddit URL is valid and not private.

Next steps

Delivery options

Webhooks, S3, Snowflake, Azure and GCS delivery.

API reference

Full endpoint specs, parameters and response schemas.