Overview

The Agent Traffic API lets you send web traffic logs to Scrunch from any platform or hosting environment — including CDNs and setups that don’t have a native Scrunch integration. Once data is flowing, Scrunch automatically classifies each request by bot type (retrieval, training, indexer) and agent source (GPTBot, ClaudeBot, and others). This guide covers:
  • Setting up a site with the API platform in the dashboard
  • Sending single events and batches
  • Backfilling historical log data
  • Managing multiple sites (for agencies and multi-brand setups)
  • Retry logic and error handling

Prerequisites

  • A Scrunch account with Agent Traffic access
  • Access to your web server or CDN access logs
  • Your site’s domain (e.g., example.com)

Step 1: Create a site with the API platform

  1. In the Scrunch dashboard, open the Agent Traffic page.
  2. Click Add website.
  3. Enter your domain and select API as the platform.
  4. Copy the Site ID (ULID format) and API key (JWT token) that appear after saving.
You will need both values for every request. Each site has its own Site ID and API key — if you are managing multiple domains, repeat this step for each one.
Your site will show a pending status until the first valid request is received. It transitions to active automatically within 5 minutes.

Step 2: Send your first event

Endpoint

POST https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic

Authentication

Include the API key in the X-Api-Key header:
X-Api-Key: <your-jwt-token>

Required fields

Field        Type     Description
domain       string   The domain of the site (e.g. example.com)
user_agent   string   The full, original User-Agent string from the request
url          string   Full URL (e.g. https://example.com/blog/post)
path         string   URL path only (e.g. /blog/post)
method       string   HTTP method (e.g. GET)
status_code  integer  HTTP response status code (e.g. 200)
timestamp    integer  Unix epoch in seconds (e.g. 1700000000)

Optional fields

Field          Type     Description
response_time  integer  Response time in milliseconds
ip             string   IP address of the requesting client

Note: Always pass the original, unmodified user_agent string from the incoming request. Scrunch’s bot classification runs entirely off this field. Truncating or transforming it will result in incorrect or missing bot detection.
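To catch validation problems before sending, a minimal client-side check of the required fields can help. This sketch mirrors the field names and types in the tables above; the validation logic itself is illustrative and not part of the Scrunch API:

```python
# Required fields and their expected Python types, per the table above.
REQUIRED = {
    "domain": str,
    "user_agent": str,
    "url": str,
    "path": str,
    "method": str,
    "status_code": int,
    "timestamp": int,
}


def validate_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event looks sendable."""
    problems = []
    for field, expected in REQUIRED.items():
        if field not in event:
            problems.append(f"missing required field: {field}")
        elif not isinstance(event[field], expected):
            problems.append(
                f"{field} should be {expected.__name__}, "
                f"got {type(event[field]).__name__}"
            )
    if isinstance(event.get("path"), str) and not event["path"].startswith("/"):
        problems.append("path must start with '/'")
    return problems
```

Running events through a check like this before upload makes 422 responses (see Error handling below) much rarer.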

Single event (cURL)

curl -X POST "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic" \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -d '{
    "domain": "example.com",
    "user_agent": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "url": "https://example.com/blog/post",
    "path": "/blog/post",
    "method": "GET",
    "status_code": 200,
    "timestamp": 1700000000,
    "response_time": 120,
    "ip": "203.0.113.1"
  }'
A successful response returns:
{ "status": "ok" }
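The same request in Python, sketched with the requests library. The site ID and API key are placeholders for the values from Step 1:

```python
import requests


def send_event(site_id: str, api_key: str, event: dict) -> dict:
    """POST a single traffic event and return the parsed JSON response."""
    url = (
        "https://webhooks.scrunchai.com/v1/sites/"
        f"{site_id}/platforms/custom/web-traffic"
    )
    response = requests.post(
        url, json=event, headers={"X-Api-Key": api_key}, timeout=30
    )
    response.raise_for_status()
    return response.json()


event = {
    "domain": "example.com",
    "user_agent": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "url": "https://example.com/blog/post",
    "path": "/blog/post",
    "method": "GET",
    "status_code": 200,
    "timestamp": 1700000000,
}
# send_event("your-site-id", "your-jwt-token", event)
```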

Step 3: Send batches with NDJSON

For production use, send multiple events per request using newline-delimited JSON (NDJSON). Each line is a complete JSON object. This reduces request overhead and is the recommended approach for any significant traffic volume. Set Content-Type: application/x-ndjson:
curl -X POST "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic" \
  -H "Content-Type: application/x-ndjson" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -d '{"domain":"example.com","user_agent":"Mozilla/5.0 (compatible; GPTBot/1.0)","url":"https://example.com/page-1","path":"/page-1","method":"GET","status_code":200,"timestamp":1700000000}
{"domain":"example.com","user_agent":"Mozilla/5.0 (compatible; ClaudeBot/1.0)","url":"https://example.com/page-2","path":"/page-2","method":"GET","status_code":200,"timestamp":1700000060,"response_time":95}'
Keep each batch under 1 MB uncompressed. Split larger payloads into multiple requests.

Step 4: Backfill historical data with Python

If you have existing access logs, use this script to send them in batches. It reads a CSV of log entries, maps fields to the API schema, and sends NDJSON batches with retry handling for rate limits.

Expected CSV format

Your CSV should have columns matching the required and optional fields. At minimum:
timestamp,domain,user_agent,url,path,method,status_code,response_time_ms,ip_address
1700000000,example.com,"Mozilla/5.0 (compatible; GPTBot/1.0)",https://example.com/page,/page,GET,200,120,203.0.113.1

Backfill script

import csv
import json
import time
import requests

API_KEY = "your-jwt-token"
SITE_ID = "your-site-id"
ENDPOINT = f"https://webhooks.scrunchai.com/v1/sites/{SITE_ID}/platforms/custom/web-traffic"
BATCH_SIZE_BYTES = 1_000_000  # 1 MB per batch


def load_payloads(csv_path: str) -> list[dict]:
    """Read a CSV of access log rows and map to API payload format."""
    payloads = []
    with open(csv_path, encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            domain = row.get("domain", "")
            path = row.get("path", "/") or "/"
            payload = {
                "domain": domain,
                "user_agent": row.get("user_agent", ""),
                "url": row.get("url", "") or f"https://{domain}{path}",
                "path": path,
                "method": row.get("method", "GET") or "GET",
                "status_code": int(row.get("status_code", "200") or "200"),
                "timestamp": int(row.get("timestamp", "0") or "0"),
                "response_time": int(row.get("response_time_ms", "0") or "0"),
                "ip": row.get("ip_address") or None,
            }
            payloads.append(payload)
    return payloads


def build_batches(payloads: list[dict], max_bytes: int = BATCH_SIZE_BYTES) -> list[list[dict]]:
    """Split payloads into batches that fit within max_bytes uncompressed."""
    batches, current, current_size = [], [], 0
    for p in payloads:
        size = len(json.dumps(p).encode()) + 1  # +1 for newline
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(p)
        current_size += size
    if current:
        batches.append(current)
    return batches


def send_batch(batch: list[dict], retries: int = 3) -> None:
    """Send a single NDJSON batch with retry logic for rate limits."""
    ndjson = "\n".join(json.dumps(p) for p in batch) + "\n"
    for attempt in range(retries):
        response = requests.post(
            ENDPOINT,
            data=ndjson.encode("utf-8"),
            headers={
                "Content-Type": "application/x-ndjson",
                "X-Api-Key": API_KEY,
            },
            timeout=60,
        )
        if response.status_code == 200:
            return
        if response.status_code == 429:
            wait = int(response.headers.get("Retry-After", 5))
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
        else:
            response.raise_for_status()
    raise RuntimeError(f"Failed to send batch after {retries} attempts")


def main(csv_path: str) -> None:
    payloads = load_payloads(csv_path)
    batches = build_batches(payloads)
    print(f"Loaded {len(payloads)} events across {len(batches)} batch(es)")

    for i, batch in enumerate(batches, 1):
        print(f"Sending batch {i}/{len(batches)} ({len(batch)} events)...")
        send_batch(batch)
        print(f"  Batch {i} sent successfully")

    print("Done.")


if __name__ == "__main__":
    import sys
    main(sys.argv[1])
Run it:
python backfill.py your_logs.csv

Managing multiple sites

If you are an agency or managing multiple brands, each domain requires its own site entry in the dashboard with its own Site ID and API key. The sending logic is identical across all sites — only the site_id in the URL and the X-Api-Key header change per site. A common pattern for multi-site setups:
SITES = [
    {"site_id": "01ABC...", "api_key": "token-for-site-a", "domain": "brand-a.com"},
    {"site_id": "01DEF...", "api_key": "token-for-site-b", "domain": "brand-b.com"},
]

for site in SITES:
    # filter payloads for this domain, then send
    site_payloads = [p for p in all_payloads if p["domain"] == site["domain"]]
    # ... send using site["site_id"] and site["api_key"]
This approach scales well when onboarding many brands: provision each site in the dashboard, collect credentials, and run the same pipeline with different configuration per site.
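A fuller sketch of that loop, with illustrative function names; in production you would reuse build_batches and send_batch from the backfill script rather than sending one unbounded batch per site:

```python
import json

import requests


def events_for_domain(payloads: list[dict], domain: str) -> list[dict]:
    """Select only the events that belong to one site's domain."""
    return [p for p in payloads if p["domain"] == domain]


def send_for_site(site: dict, payloads: list[dict]) -> None:
    """Send one NDJSON batch using a single site's credentials."""
    endpoint = (
        "https://webhooks.scrunchai.com/v1/sites/"
        f"{site['site_id']}/platforms/custom/web-traffic"
    )
    ndjson = "\n".join(json.dumps(p) for p in payloads) + "\n"
    response = requests.post(
        endpoint,
        data=ndjson.encode("utf-8"),
        headers={
            "Content-Type": "application/x-ndjson",
            "X-Api-Key": site["api_key"],
        },
        timeout=60,
    )
    response.raise_for_status()


# for site in SITES:
#     site_payloads = events_for_domain(all_payloads, site["domain"])
#     if site_payloads:
#         send_for_site(site, site_payloads)
```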

Error handling

Status  Meaning                     Action
200     Accepted and queued         No action needed
401     Invalid or missing API key  Verify the X-Api-Key value and header name
422     Validation error            Check that all required fields are present and correctly typed
429     Rate limited                Wait and retry; respect the Retry-After response header
500     Server error                Retry with exponential backoff; contact support if it persists
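The retry guidance above can be folded into a single helper. This is a sketch rather than an official client, and the exponential delay schedule is an assumption:

```python
import time

import requests


def backoff_delay(attempt: int, cap: int = 60) -> int:
    """Exponential backoff delay in seconds: 1, 2, 4, ... capped at `cap`."""
    return min(2 ** attempt, cap)


def post_with_retries(url: str, body: bytes, headers: dict, retries: int = 5):
    """POST, honoring Retry-After on 429 and backing off on 5xx responses."""
    for attempt in range(retries):
        response = requests.post(url, data=body, headers=headers, timeout=60)
        if response.status_code == 200:
            return response
        if response.status_code == 429:
            # The server tells us how long to wait; default to 5s if absent.
            time.sleep(int(response.headers.get("Retry-After", 5)))
        elif response.status_code >= 500:
            time.sleep(backoff_delay(attempt))
        else:
            response.raise_for_status()  # 401/422: retrying will not help
    raise RuntimeError(f"Request failed after {retries} attempts")
```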

Troubleshooting

Site is stuck in pending status
The site activates within 5 minutes of the first valid request. If it remains pending, confirm a request was actually sent (not a dry run), check that the Site ID in the URL matches the one in the dashboard, and verify the API key is correct.

Bot traffic is not being classified
Bot classification is derived entirely from the user_agent field. Confirm you are passing the raw, original user-agent string from the incoming request without modification. Check your log format — some CDNs normalize or truncate user-agent strings before writing them to logs. If so, use a logging integration that captures the original header.

Getting 422 errors
The most common cause is a missing required field or an incorrect type. Check that timestamp is a Unix epoch integer (not ISO 8601), status_code is an integer (not a string), and path starts with a /.

NDJSON batches are being rejected
Each line must be a complete, valid JSON object with no embedded newlines. The Content-Type header must be exactly application/x-ndjson. Keep batch size under 1 MB uncompressed.
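For the 422 timestamp case, a quick way to convert timezone-aware ISO 8601 log timestamps into the Unix-epoch integers the API expects, using only the standard library:

```python
from datetime import datetime


def to_epoch(iso_ts: str) -> int:
    """Convert a timezone-aware ISO 8601 timestamp to Unix epoch seconds."""
    # Python's fromisoformat does not accept a trailing "Z" before 3.11,
    # so normalize it to an explicit UTC offset first.
    return int(datetime.fromisoformat(iso_ts.replace("Z", "+00:00")).timestamp())


print(to_epoch("2023-11-14T22:13:20Z"))  # 1700000000
```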