Overview
The Agent Traffic API lets you send web traffic logs to Scrunch from any platform or hosting environment — including CDNs and setups that don’t have a native Scrunch integration. Once data is flowing, Scrunch automatically classifies each request by bot type (retrieval, training, indexer) and agent source (GPTBot, ClaudeBot, and others).
Once connected, the Agent Traffic dashboard will show:
- Total bot traffic for the selected period and a comparison to the prior period
- Bot traffic over time and distribution across Retrieval, Indexer, and Training types
- Top bot agents and when they were last seen
- Top content pages accessed by LLM bots
- Recent bot requests
- A date filter for the last 24 hours, 7 days, or 30 days
This guide covers:
- Setting up a site with the API platform in the dashboard
- Sending single events and batches
- Backfilling historical log data
- Managing multiple sites (for agencies and multi-brand setups)
- Retry logic and error handling
Prerequisites
- A Scrunch account with Agent Traffic access
- Access to your web server or CDN access logs
- Your site’s domain (e.g., example.com)
Step 1: Connect your site
- In the Scrunch dashboard, open the Agent Traffic page.
- Click + Connect Site.
- Enter your domain and select API as the platform.
- A dedicated instructions page will appear showing your Site ID, Webhook URL, and API Key. Copy all three — you will need them for every request.
Each site has its own endpoint and key. Don’t reuse them across different sites or integrations.
Your site will show a pending status until the first valid request is received. It transitions to active automatically within 5–10 minutes.
Step 2: Send your first event
Endpoint
POST https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic
Authentication
Include the API key in the X-Api-Key header:
X-Api-Key: <your-jwt-token>
Required fields
| Field | Type | Description |
|---|---|---|
| domain | string | The domain of the site (e.g. example.com) |
| user_agent | string | The full, original User-Agent string from the request |
| url | string | Full URL (e.g. https://example.com/blog/post) |
| path | string | URL path only (e.g. /blog/post) |
| method | string | HTTP method (e.g. GET) |
| status_code | integer | HTTP response status code (e.g. 200) |
| timestamp | integer or float | Unix epoch in seconds (e.g. 1700000000) |
Optional fields
| Field | Type | Description |
|---|---|---|
| response_time | integer | Response time in milliseconds |
| ip | string | IP address of the requesting client |
Always pass the original, unmodified user_agent string from the incoming request. Scrunch’s bot classification runs entirely off this field. Truncating or transforming it will result in incorrect or missing bot detection.
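To keep the payload shape in one place, here is a minimal Python sketch of an event builder. make_event is a hypothetical helper, not part of any Scrunch SDK. It copies the User-Agent through untouched and leaves optional fields out when they are unknown, on the assumption that omitting them is safer than sending null.

```python
from typing import Optional


def make_event(
    *,
    domain: str,
    user_agent: str,
    url: str,
    path: str,
    method: str,
    status_code: int,
    timestamp: float,
    response_time: Optional[int] = None,
    ip: Optional[str] = None,
) -> dict:
    """Build one Agent Traffic event from values captured at request time."""
    event = {
        "domain": domain,
        "user_agent": user_agent,  # raw header value: never trim, lowercase, or parse it
        "url": url,
        "path": path,
        "method": method.upper(),
        "status_code": int(status_code),
        "timestamp": timestamp,  # Unix epoch seconds
    }
    # Assumption: optional fields are omitted entirely rather than sent as null.
    if response_time is not None:
        event["response_time"] = int(response_time)
    if ip is not None:
        event["ip"] = ip
    return event
```

Call it wherever your server or log shipper sees the request, passing the User-Agent header exactly as received.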
Single event (cURL)
Use Content-Type: application/json and send one JSON object per request:
```bash
curl -X POST "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic" \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -d '{
    "domain": "example.com",
    "user_agent": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "url": "https://example.com/blog/post",
    "path": "/blog/post",
    "method": "GET",
    "status_code": 200,
    "timestamp": 1700000000,
    "response_time": 120,
    "ip": "203.0.113.1"
  }'
```
A successful request returns HTTP 200 with the body { "status": "ok" }.
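For a Python pipeline, the same single-event request can be prepared with only the standard library. This is a sketch: build_single_event_request and send are hypothetical helper names, and the Site ID and API key are placeholders for the values from your instructions page.

```python
import json
import urllib.request

ENDPOINT_TEMPLATE = "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic"


def build_single_event_request(site_id: str, api_key: str, event: dict) -> urllib.request.Request:
    """Prepare (without sending) a POST that carries one JSON event."""
    return urllib.request.Request(
        ENDPOINT_TEMPLATE.format(site_id=site_id),
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and return the decoded JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so 401, 422, 429, and 500 surface as exceptions here.
    """
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any network call; send(build_single_event_request("your-site-id", "YOUR_API_KEY", event)) returns the parsed JSON body of a successful response.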
Step 3: Send batches with NDJSON
For production use, send multiple events per request using newline-delimited JSON (NDJSON). Each line is a complete JSON object. This reduces request overhead and is the recommended approach for any significant traffic volume.
Use Content-Type: application/x-ndjson:
```bash
curl -X POST "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic" \
  -H "Content-Type: application/x-ndjson" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -d '{"domain":"example.com","user_agent":"Mozilla/5.0 (compatible; GPTBot/1.0)","url":"https://example.com/page-1","path":"/page-1","method":"GET","status_code":200,"timestamp":1700000000}
{"domain":"example.com","user_agent":"Mozilla/5.0 (compatible; ClaudeBot/1.0)","url":"https://example.com/page-2","path":"/page-2","method":"GET","status_code":200,"timestamp":1700000060,"response_time":95}'
```
Keep each batch under 1 MB uncompressed. Split larger payloads into multiple requests.
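A minimal Python sketch of NDJSON serialization with the size check applied before sending. to_ndjson and MAX_BATCH_BYTES are hypothetical names, and "1 MB" is assumed to mean 1,000,000 bytes, the same figure the backfill script in Step 5 uses.

```python
import json

MAX_BATCH_BYTES = 1_000_000  # "under 1 MB uncompressed", taken here as 1,000,000 bytes


def to_ndjson(events: list[dict]) -> bytes:
    """Serialize events as NDJSON: one compact JSON object per line, newline-terminated."""
    # json.dumps escapes newlines inside strings as \n, so every object
    # occupies exactly one physical line.
    lines = [json.dumps(event, separators=(",", ":")) for event in events]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    if len(body) > MAX_BATCH_BYTES:
        raise ValueError(f"batch is {len(body)} bytes; split it into smaller requests")
    return body
```

The resulting bytes go in the request body with Content-Type: application/x-ndjson. Raising instead of silently truncating means an oversized batch is caught locally rather than rejected by the server.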
Step 4: Verify your integration
After sending your first request, allow 5–10 minutes for your site to show as Active in Scrunch. If no traffic appears, send a test event with a known bot User-Agent to confirm that your credentials and pipeline are working:
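```bash
curl -X POST "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic" \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -d '{
    "domain": "yourdomain.com",
    "user_agent": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "url": "https://yourdomain.com/test-page",
    "path": "/test-page",
    "method": "GET",
    "status_code": 200,
    "timestamp": '"$(date +%s)"'
  }'
```
The $(date +%s) substitution stamps the test event with the current Unix time. A fixed past value such as 1700000000 (November 2023) would likely fall outside the dashboard's 24-hour, 7-day, and 30-day date filters.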
If this returns { "status": "ok" } but traffic still doesn’t appear after 10 minutes, check the troubleshooting section below.
Step 5: Backfill historical data with Python
If you have existing access logs, use this script to send them in batches. It reads a CSV of log entries, maps fields to the API schema, and sends NDJSON batches with retry handling for rate limits.
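Your CSV needs a header row with the column names below. The script maps response_time_ms and ip_address to the API's response_time and ip fields, and both columns may be left empty. If url is empty, the script builds it from domain and path.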
```csv
timestamp,domain,user_agent,url,path,method,status_code,response_time_ms,ip_address
1700000000,example.com,"Mozilla/5.0 (compatible; GPTBot/1.0)",https://example.com/page,/page,GET,200,120,203.0.113.1
```
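The script expects CSV. If your server writes the NCSA combined log format (Apache's combined LogFormat, and Nginx's predefined combined log_format), a sketch like the following can turn each line directly into an event in the API schema. It assumes the site is served over HTTPS, that you pass the domain in yourself (combined format has no Host field), and that User-Agent values contain no escaped quotes; parse_combined_line and COMBINED_LINE are hypothetical names.

```python
import re
from datetime import datetime
from typing import Optional
from urllib.parse import urlsplit

# NCSA combined format:
# client - user [time] "METHOD target PROTOCOL" status bytes "referer" "user-agent"
COMBINED_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<user_agent>[^"]*)"'
)


def parse_combined_line(line: str, domain: str) -> Optional[dict]:
    """Convert one combined-format log line to an event, or return None if it doesn't match."""
    m = COMBINED_LINE.match(line)
    if m is None:
        return None
    target = m["target"]  # request target, possibly with a query string
    logged_at = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "domain": domain,  # combined format has no Host field, so the caller supplies it
        "user_agent": m["user_agent"],  # copied exactly as logged
        "url": f"https://{domain}{target}",  # assumes the site is served over HTTPS
        "path": urlsplit(target).path or "/",
        "method": m["method"],
        "status_code": int(m["status"]),
        "timestamp": int(logged_at.timestamp()),
        "ip": m["ip"],
    }
```

Events produced this way already use the API field names (ip rather than ip_address), so they can be batched and sent directly instead of being written back to CSV. Lines that don't match, such as malformed requests logged as "-", come back as None and can be skipped.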
Backfill script
```python
import csv
import json
import sys
import time

import requests

API_KEY = "your-jwt-token"
SITE_ID = "your-site-id"
ENDPOINT = f"https://webhooks.scrunchai.com/v1/sites/{SITE_ID}/platforms/custom/web-traffic"
BATCH_SIZE_BYTES = 1_000_000  # keep each batch under 1 MB uncompressed


def load_payloads(csv_path: str) -> list[dict]:
    """Read a CSV of access log rows and map them to the API payload format."""
    payloads = []
    with open(csv_path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            # Skip rows that lack the fields bot classification and ordering rely on.
            if not row.get("user_agent") or not row.get("timestamp"):
                continue
            domain = row.get("domain", "")
            path = row.get("path") or "/"
            payload = {
                "domain": domain,
                "user_agent": row["user_agent"],  # passed through unmodified
                "url": row.get("url") or f"https://{domain}{path}",
                "path": path,
                "method": row.get("method") or "GET",
                "status_code": int(row.get("status_code") or "200"),
                # int(float(...)) accepts both "1700000000" and "1700000000.5"
                "timestamp": int(float(row["timestamp"])),
            }
            # Optional fields: include them only when the CSV has a value,
            # instead of sending 0 or null.
            if row.get("response_time_ms"):
                payload["response_time"] = int(float(row["response_time_ms"]))
            if row.get("ip_address"):
                payload["ip"] = row["ip_address"]
            payloads.append(payload)
    return payloads


def build_batches(payloads: list[dict], max_bytes: int = BATCH_SIZE_BYTES) -> list[list[dict]]:
    """Split payloads into batches that fit within max_bytes uncompressed."""
    batches, current, current_size = [], [], 0
    for p in payloads:
        size = len(json.dumps(p).encode()) + 1  # +1 for newline
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(p)
        current_size += size
    if current:
        batches.append(current)
    return batches


def send_batch(batch: list[dict], retries: int = 3) -> None:
    """Send one NDJSON batch, retrying on 429 (rate limit) and 5xx (server error)."""
    body = ("\n".join(json.dumps(p) for p in batch) + "\n").encode("utf-8")
    for attempt in range(retries):
        response = requests.post(
            ENDPOINT,
            data=body,  # requests sends raw bytes via data=; it has no content= argument
            headers={
                "Content-Type": "application/x-ndjson",
                "X-Api-Key": API_KEY,
            },
            timeout=60,
        )
        if response.status_code == 200:
            return
        if response.status_code == 429:
            # Use Retry-After when it is a number of seconds; otherwise wait 5s.
            retry_after = response.headers.get("Retry-After", "")
            wait = int(retry_after) if retry_after.isdigit() else 5
            print(f"Rate limited. Retrying in {wait}s...")
        elif response.status_code >= 500:
            wait = 2 ** attempt  # exponential backoff: 1s, 2s, 4s, ...
            print(f"Server error {response.status_code}. Retrying in {wait}s...")
        else:
            # 401, 422, and other client errors will not succeed on retry.
            raise RuntimeError(f"Batch rejected ({response.status_code}): {response.text}")
        time.sleep(wait)
    raise RuntimeError(f"Failed to send batch after {retries} attempts")


def main(csv_path: str) -> None:
    payloads = load_payloads(csv_path)
    batches = build_batches(payloads)
    print(f"Loaded {len(payloads)} events across {len(batches)} batch(es)")
    for i, batch in enumerate(batches, 1):
        print(f"Sending batch {i}/{len(batches)} ({len(batch)} events)...")
        send_batch(batch)
        print(f"  Batch {i} sent successfully")
    print("Done.")


if __name__ == "__main__":
    main(sys.argv[1])
```
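The script uses the third-party requests library (install it with pip install requests). Run it:
```bash
python backfill.py your_logs.csv
```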
Managing multiple sites
If you are an agency or managing multiple brands, each domain requires its own site entry in the dashboard with its own Site ID and API key. Never reuse credentials across sites — each site’s key is scoped to that domain only.
The sending logic is identical across all sites — only the site_id in the URL and the X-Api-Key header change per site.
A common pattern for multi-site setups:
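```python
SITES = [
    {"site_id": "01ABC...", "api_key": "token-for-site-a", "domain": "brand-a.com"},
    {"site_id": "01DEF...", "api_key": "token-for-site-b", "domain": "brand-b.com"},
]

for site in SITES:
    # all_payloads: every event loaded from your logs (for example, via load_payloads)
    site_payloads = [p for p in all_payloads if p["domain"] == site["domain"]]
    endpoint = f"https://webhooks.scrunchai.com/v1/sites/{site['site_id']}/platforms/custom/web-traffic"
    # ... send site_payloads to endpoint with the X-Api-Key header set to site["api_key"]
```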
This approach scales well when onboarding many brands: provision each site in the dashboard, collect credentials, and run the same pipeline with different configuration per site.
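A runnable sketch of the routing step, assuming the SITES structure above. route_by_site is a hypothetical helper: it groups events by matching domain and pairs each group with that site's own endpoint and key, and it drops events for domains that have no provisioned site rather than sending them with another site's credentials.

```python
ENDPOINT_TEMPLATE = "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic"


def route_by_site(events: list[dict], sites: list[dict]) -> dict[str, dict]:
    """Group events under the site whose domain matches, keyed by site_id."""
    by_domain = {s["domain"]: s for s in sites}
    routed: dict[str, dict] = {}
    unmatched = 0
    for event in events:
        site = by_domain.get(event.get("domain"))
        if site is None:
            # No credentials for this domain: never borrow another site's key.
            unmatched += 1
            continue
        bucket = routed.setdefault(
            site["site_id"],
            {
                "endpoint": ENDPOINT_TEMPLATE.format(site_id=site["site_id"]),
                "api_key": site["api_key"],
                "events": [],
            },
        )
        bucket["events"].append(event)
    if unmatched:
        print(f"Skipped {unmatched} event(s) for domains with no configured site")
    return routed
```

Matching is exact, so www.brand-a.com and brand-a.com count as different domains; normalize hostnames before routing if your logs mix the two.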
Tips for better results
- Use NDJSON batching to reduce request overhead for high-traffic sites.
- Keep each batch under 1 MB uncompressed; oversized batches may be rejected, so split larger payloads into multiple requests.
- Always pass the original, unmodified User-Agent string — Scrunch uses it to classify the bot. Never transform or truncate it.
- Exclude static asset paths (CSS, JS, images) if you want cleaner data focused on content pages.
- Include paths that serve PDFs — AI bots frequently request them.
- Never reuse credentials across sites — provision a separate Site ID and API key for each domain.
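The static-asset and PDF tips above can be applied as a filter before events are sent. This is a sketch: is_content_request is a hypothetical helper, and STATIC_EXTENSIONS is an illustrative list you would adjust for your site. PDFs are kept simply because .pdf is not in the exclusion set.

```python
import posixpath
from urllib.parse import urlsplit

# Illustrative extensions treated as static assets; adjust for your site.
STATIC_EXTENSIONS = {
    ".css", ".js", ".mjs", ".map",
    ".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".ico",
    ".woff", ".woff2", ".ttf",
}


def is_content_request(path: str) -> bool:
    """Return False for static assets; keep pages and documents such as PDFs."""
    # Strip any query string, then compare the extension case-insensitively.
    ext = posixpath.splitext(urlsplit(path).path)[1].lower()
    return ext not in STATIC_EXTENSIONS
```

Filtering on the path extension is a heuristic: assets served from extensionless URLs pass through, which errs on the side of keeping data.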
Error handling
| Status | Meaning | Action |
|---|---|---|
| 200 | Accepted and queued | No action needed |
| 401 | Invalid or missing API key | Verify the X-Api-Key value and header name |
| 422 | Validation error | Check all required fields are present and correctly typed |
| 429 | Rate limited | Wait and retry; respect the Retry-After response header |
| 500 | Server error | Retry with exponential backoff; contact support if persistent |
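The table's retry guidance can be condensed into one decision function. A sketch under stated assumptions: retry_delay is a hypothetical helper, every 5xx status is treated like 500, and the backoff uses "full jitter" (a random wait up to the exponential bound), which is a common choice rather than anything this API specifies.

```python
import random
from typing import Optional


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> Optional[float]:
    """Seconds to wait before retrying after `status`, or None if retrying won't help.

    attempt is 0 for the first retry, 1 for the second, and so on.
    """
    if status == 429:
        # Prefer the server's Retry-After value when it is given in seconds.
        if retry_after is not None and retry_after.strip().isdigit():
            return float(retry_after.strip())
    elif status < 500:
        # 401, 422, and other client errors need a fixed request, not a retry.
        return None
    # Exponential backoff with full jitter: a random wait in [0, min(cap, base * 2**attempt)].
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A sending loop would call it after each failed attempt, sleep for the returned number of seconds, and stop when it returns None or after a fixed number of attempts.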
Troubleshooting
Site is stuck in pending status
The site activates within 5–10 minutes of the first valid request. If it remains pending, confirm a request was actually sent (not a dry run), check that the Site ID in the URL matches the one in the dashboard, and verify the API key is correct. Use the verification cURL in Step 4 to test with a known bot user-agent.
Bot traffic is not being classified
Bot classification is derived entirely from the user_agent field. Confirm you are passing the raw, original user-agent string from the incoming request without modification. Check your log format — some CDNs normalize or truncate user-agent strings before writing them to logs. If so, use a logging integration that captures the original header.
Getting 422 errors
The most common cause is a missing required field or an incorrect type. Check that timestamp is a Unix epoch number (not ISO 8601), status_code is an integer (not a string), and path starts with a /.
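A client-side pre-flight check can catch these mistakes before a request is sent. This sketch mirrors only the rules stated on this page (required fields, their types, and the leading / in path); validate_event and REQUIRED are hypothetical names, and the server's real validation may be stricter.

```python
REQUIRED = {
    "domain": str,
    "user_agent": str,
    "url": str,
    "path": str,
    "method": str,
    "status_code": int,
    "timestamp": (int, float),
}


def validate_event(event: dict) -> list[str]:
    """Return problems likely to cause a 422; an empty list means the event looks valid."""
    problems = []
    for field, expected in REQUIRED.items():
        if field not in event:
            problems.append(f"missing required field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(event[field], bool) or not isinstance(event[field], expected):
            problems.append(f"{field} has type {type(event[field]).__name__}")
    if isinstance(event.get("path"), str) and not event["path"].startswith("/"):
        problems.append("path must start with /")
    return problems
```

Running it over a sample of events during development surfaces string-typed status codes and ISO 8601 timestamps without spending API requests.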
NDJSON batches are being rejected
Each line must be a complete, valid JSON object with no embedded newlines. The Content-Type header must be exactly application/x-ndjson. Keep batch size under 1 MB uncompressed.
Don’t see traffic after 10 minutes
Confirm your Webhook URL and API Key match exactly what’s shown in your Scrunch app. Check that your Content-Type header matches the body format (application/json for single events, application/x-ndjson for batches). Confirm your timestamp is a Unix epoch in seconds, not milliseconds.
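If your logs mix timestamp formats, normalizing them before sending avoids the seconds-versus-milliseconds mistake. A sketch: to_epoch_seconds is a hypothetical helper, and the cutoff of 1e11 is an assumed heuristic (current dates are about 1.7e9 in seconds but about 1.7e12 in milliseconds).

```python
from datetime import datetime, timezone


def to_epoch_seconds(value) -> float:
    """Normalize a timestamp (epoch number, numeric string, ISO 8601 string, or datetime) to epoch seconds."""
    if isinstance(value, str):
        try:
            value = float(value)  # numeric strings such as "1700000000"
        except ValueError:
            value = datetime.fromisoformat(value)  # e.g. "2023-11-14T22:13:20+00:00"
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("naive datetime: attach a timezone before converting")
        return value.timestamp()
    seconds = float(value)
    # Heuristic: values above 1e11 would be thousands of years away in seconds,
    # so treat them as milliseconds.
    if seconds > 1e11:
        seconds /= 1000
    return seconds
```

Naive datetimes are rejected rather than guessed, since assuming the wrong timezone would shift every event by hours.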