
Overview

The Agent Traffic API lets you send web traffic logs to Scrunch from any platform or hosting environment, including CDNs and setups that don’t have a native Scrunch integration. Once data is flowing, Scrunch automatically classifies each request by bot type (retrieval, training, indexer) and agent source (GPTBot, ClaudeBot, and others). The Agent Traffic dashboard then shows:
  • Total bot traffic for the selected period and a comparison to the prior period
  • Bot traffic over time and distribution across Retrieval, Indexer, and Training types
  • Top bot agents and when they were last seen
  • Top content pages accessed by LLM bots
  • Recent bot requests
  • A date filter for the last 24 hours, 7 days, or 30 days
This guide covers:
  • Setting up a site with the API platform in the dashboard
  • Sending single events and batches
  • Backfilling historical log data
  • Managing multiple sites (for agencies and multi-brand setups)
  • Retry logic and error handling

Prerequisites

  • A Scrunch account with Agent Traffic access
  • Access to your web server or CDN access logs
  • Your site’s domain (e.g., example.com)

Step 1: Create a site with the API platform

  1. In the Scrunch dashboard, open the Agent Traffic page.
  2. Click + Connect Site.
  3. Enter your domain and select API as the platform.
  4. A dedicated instructions page will appear showing your Site ID, Webhook URL, and API Key. Copy all three — you will need them for every request.
Each site has its own endpoint and key. Don’t reuse them across different sites or integrations.
Your site will show a pending status until the first valid request is received. It transitions to active automatically within 5–10 minutes.

Step 2: Send your first event

Endpoint

POST https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic

Authentication

Include the API key in the X-Api-Key header:
X-Api-Key: <your-jwt-token>
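For scripted integrations, the endpoint and header can be assembled from the Site ID and API key on the instructions page. The sketch below is illustrative Python; the helper name scrunch_request_parts is hypothetical and not part of any Scrunch SDK.

```python
from urllib.parse import quote


def scrunch_request_parts(site_id: str, api_key: str, ndjson: bool = False) -> tuple[str, dict[str, str]]:
    """Return the endpoint URL and headers for one site (hypothetical helper)."""
    # The Site ID goes in the URL path; quote() guards against stray characters
    url = (
        "https://webhooks.scrunchai.com/v1/sites/"
        f"{quote(site_id, safe='')}/platforms/custom/web-traffic"
    )
    headers = {
        # application/json for one event, application/x-ndjson for batches
        "Content-Type": "application/x-ndjson" if ndjson else "application/json",
        "X-Api-Key": api_key,  # the API key (a JWT) is sent in this header
    }
    return url, headers
```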

Required fields

| Field | Type | Description |
|---|---|---|
| domain | string | The domain of the site (e.g. example.com) |
| user_agent | string | The full, original User-Agent string from the request |
| url | string | Full URL (e.g. https://example.com/blog/post) |
| path | string | URL path only (e.g. /blog/post) |
| method | string | HTTP method (e.g. GET) |
| status_code | integer | HTTP response status code (e.g. 200) |
| timestamp | integer or float | Unix epoch in seconds (e.g. 1700000000) |

Optional fields

| Field | Type | Description |
|---|---|---|
| response_time | integer | Response time in milliseconds |
| ip | string | IP address of the requesting client |
Always pass the original, unmodified user_agent string from the incoming request. Scrunch’s bot classification is based entirely on this field, so truncating or transforming it leads to incorrect or missing bot detection.
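Before sending, it can help to check events locally against the two tables above. This is a minimal sketch that assumes exactly the types listed there; the "path must start with /" rule comes from the Troubleshooting section. Scrunch’s own server-side validation may be stricter or looser, so treat validate_event (a hypothetical helper) as a sanity check only.

```python
REQUIRED_FIELDS = {
    "domain": str,
    "user_agent": str,
    "url": str,
    "path": str,
    "method": str,
    "status_code": int,
    "timestamp": (int, float),
}
OPTIONAL_FIELDS = {"response_time": int, "ip": str}


def _wrong_type(value, expected) -> bool:
    # bool is a subclass of int in Python, so reject it explicitly
    return isinstance(value, bool) or not isinstance(value, expected)


def validate_event(event: dict) -> list[str]:
    """Return a list of problems with one event; an empty list means it looks valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in event:
            problems.append(f"missing required field: {name}")
        elif _wrong_type(event[name], expected):
            problems.append(f"{name} has type {type(event[name]).__name__}")
    for name, expected in OPTIONAL_FIELDS.items():
        if name in event and _wrong_type(event[name], expected):
            problems.append(f"{name} has type {type(event[name]).__name__}")
    if isinstance(event.get("path"), str) and not event["path"].startswith("/"):
        problems.append("path should start with /")
    return problems
```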

Single event (cURL)

Use Content-Type: application/json and send one JSON object per request:
curl -X POST "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic" \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -d '{
    "domain": "example.com",
    "user_agent": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "url": "https://example.com/blog/post",
    "path": "/blog/post",
    "method": "GET",
    "status_code": 200,
    "timestamp": 1700000000,
    "response_time": 120,
    "ip": "203.0.113.1"
  }'
A successful response returns:
{ "status": "ok" }
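The same single-event request in Python, sketched with the standard library’s urllib.request so it has no third-party dependencies (the backfill script later in this guide uses requests instead). The function names are hypothetical; the endpoint, headers, and body mirror the cURL example.

```python
import json
import urllib.request

ENDPOINT_TEMPLATE = "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic"


def build_single_event_request(site_id: str, api_key: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) a POST carrying one JSON event."""
    return urllib.request.Request(
        ENDPOINT_TEMPLATE.format(site_id=site_id),
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


def send_single_event(site_id: str, api_key: str, event: dict) -> dict:
    """Send one event and return the parsed JSON response body."""
    request = build_single_event_request(site_id, api_key, event)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())
```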

Step 3: Send batches with NDJSON

For production use, send multiple events per request using newline-delimited JSON (NDJSON). Each line is a complete JSON object. This reduces request overhead and is the recommended approach for any significant traffic volume. Use Content-Type: application/x-ndjson:
curl -X POST "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic" \
  -H "Content-Type: application/x-ndjson" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -d '{"domain":"example.com","user_agent":"Mozilla/5.0 (compatible; GPTBot/1.0)","url":"https://example.com/page-1","path":"/page-1","method":"GET","status_code":200,"timestamp":1700000000}
{"domain":"example.com","user_agent":"Mozilla/5.0 (compatible; ClaudeBot/1.0)","url":"https://example.com/page-2","path":"/page-2","method":"GET","status_code":200,"timestamp":1700000060,"response_time":95}'
Keep each batch under 1 MB uncompressed. Split larger payloads into multiple requests.
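Building an NDJSON body in code mostly comes down to one json.dumps call per event. A sketch, assuming the 1 MB limit means 1,000,000 bytes (the same figure the backfill script in Step 5 uses); to_ndjson is a hypothetical helper. Because json.dumps escapes newline characters inside strings, each event stays on a single line even if a field contains a line break.

```python
import json

MAX_BATCH_BYTES = 1_000_000  # assumed reading of "1 MB uncompressed"


def to_ndjson(events: list[dict]) -> bytes:
    """Serialize events as NDJSON: one compact JSON object per line."""
    # json.dumps writes "\n" inside strings as the two characters \ and n,
    # so the only real newlines are the line separators added here
    body = "".join(json.dumps(e, separators=(",", ":")) + "\n" for e in events).encode("utf-8")
    if len(body) >= MAX_BATCH_BYTES:
        raise ValueError(f"batch is {len(body)} bytes; split it into smaller requests")
    return body
```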

Step 4: Verify your integration

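After sending your first request, allow 5–10 minutes for your site to show as Active in Scrunch. If traffic doesn’t appear, send a test event using a known bot User-Agent to confirm that your credentials and pipeline are working. Replace the sample timestamp below with the current Unix time (for example, the output of date +%s); the dashboard’s date filter covers at most the last 30 days, so an older timestamp may fall outside it: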
curl -X POST "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic" \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: YOUR_API_KEY" \
  -d '{
    "domain": "yourdomain.com",
    "user_agent": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "url": "https://yourdomain.com/test-page",
    "path": "/test-page",
    "method": "GET",
    "status_code": 200,
    "timestamp": 1700000000
  }'
If this returns { "status": "ok" } but traffic still doesn’t appear after 10 minutes, check the troubleshooting section below.
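A scripted version of this check, sketched in Python with urllib.request. It stamps the event with the current time rather than a fixed value, since the dashboard’s date filter only covers the last 24 hours, 7 days, or 30 days. The function names are hypothetical, and treating {"status": "ok"} as success is an assumption based on the example responses above.

```python
import json
import time
import urllib.request
from typing import Optional

GPTBOT_UA = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"


def make_test_event(domain: str, now: Optional[float] = None) -> dict:
    """Build a verification event stamped with the current time (Unix seconds)."""
    ts = int(time.time() if now is None else now)
    return {
        "domain": domain,
        "user_agent": GPTBOT_UA,  # a known bot UA, so classification has something to match
        "url": f"https://{domain}/test-page",
        "path": "/test-page",
        "method": "GET",
        "status_code": 200,
        "timestamp": ts,
    }


def verify_integration(site_id: str, api_key: str, domain: str) -> bool:
    """Send one test event; return True if the API answers {"status": "ok"}."""
    url = f"https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic"
    request = urllib.request.Request(
        url,
        data=json.dumps(make_test_event(domain)).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )
    # A 401 or 422 raises urllib.error.HTTPError instead of returning False
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read()).get("status") == "ok"
```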

Step 5: Backfill historical data with Python

If you have existing access logs, use this script to send them in batches. It reads a CSV of log entries, maps fields to the API schema, and sends NDJSON batches with retry handling for rate limits.

Expected CSV format

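Your CSV needs a header row with the columns below. Most match the API field names; response_time_ms and ip_address map to the optional response_time and ip fields and may be left empty: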
timestamp,domain,user_agent,url,path,method,status_code,response_time_ms,ip_address
1700000000,example.com,"Mozilla/5.0 (compatible; GPTBot/1.0)",https://example.com/page,/page,GET,200,120,203.0.113.1
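If your logs are raw server access logs rather than a CSV, a converter along these lines could produce the file above. It is a sketch that assumes the Apache/nginx "combined" log format; that format records neither the domain nor a response time, so the domain is passed in and response_time_ms is left empty. Custom log formats, or user agents containing escaped quotes, would need a different pattern. The function names are hypothetical.

```python
import csv
import re
from datetime import datetime
from typing import Iterable, Optional, TextIO
from urllib.parse import urlsplit

FIELDS = ["timestamp", "domain", "user_agent", "url", "path", "method",
          "status_code", "response_time_ms", "ip_address"]

# Combined format: ip ident user [time] "METHOD target PROTOCOL" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def combined_line_to_row(line: str, domain: str, scheme: str = "https") -> Optional[dict]:
    """Map one combined-format log line to the CSV columns above; None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    # e.g. "14/Nov/2023:22:13:20 +0000"; %b expects English month names (C locale)
    when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "timestamp": int(when.timestamp()),
        "domain": domain,  # not in the combined format, so supplied by the caller
        "user_agent": m["ua"],  # copied verbatim for bot classification
        "url": f"{scheme}://{domain}{m['target']}",  # keeps any query string
        "path": urlsplit(m["target"]).path,  # path only, query string dropped
        "method": m["method"],
        "status_code": int(m["status"]),
        "response_time_ms": "",  # not recorded in the combined format
        "ip_address": m["ip"],
    }


def write_csv(log_lines: Iterable[str], domain: str, out: TextIO) -> None:
    """Write matching log lines as CSV rows; open `out` with newline="" if it is a file."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for line in log_lines:
        row = combined_line_to_row(line, domain)
        if row is not None:  # skip lines that don't parse
            writer.writerow(row)
```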

Backfill script

import csv
import json
import time
import requests

API_KEY = "your-jwt-token"
SITE_ID = "your-site-id"
ENDPOINT = f"https://webhooks.scrunchai.com/v1/sites/{SITE_ID}/platforms/custom/web-traffic"
BATCH_SIZE_BYTES = 1_000_000  # 1 MB per batch


def load_payloads(csv_path: str) -> list[dict]:
    """Read a CSV of access log rows and map to API payload format."""
    payloads = []
    with open(csv_path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            domain = row.get("domain", "")
            path = row.get("path", "/") or "/"
            payload = {
                "domain": domain,
                "user_agent": row.get("user_agent", ""),  # pass through unmodified
                "url": row.get("url", "") or f"https://{domain}{path}",
                "path": path,
                "method": row.get("method", "GET") or "GET",
                "status_code": int(row.get("status_code", "200") or "200"),
                # Required: Unix epoch seconds; a missing value raises instead of sending 0
                "timestamp": int(float(row["timestamp"])),
            }
            # Optional fields are only included when the CSV has a value,
            # instead of sending a placeholder 0 or null
            if row.get("response_time_ms"):
                payload["response_time"] = int(float(row["response_time_ms"]))
            if row.get("ip_address"):
                payload["ip"] = row["ip_address"]
            payloads.append(payload)
    return payloads


def build_batches(payloads: list[dict], max_bytes: int = BATCH_SIZE_BYTES) -> list[list[dict]]:
    """Split payloads into batches that fit within max_bytes uncompressed."""
    batches, current, current_size = [], [], 0
    for p in payloads:
        size = len(json.dumps(p).encode()) + 1  # +1 for newline
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(p)
        current_size += size
    if current:
        batches.append(current)
    return batches


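def send_batch(batch: list[dict], retries: int = 3) -> None:
    """Send a single NDJSON batch, retrying on rate limits (429) and server errors (5xx)."""
    ndjson = "\n".join(json.dumps(p) for p in batch) + "\n"
    for attempt in range(retries):
        response = requests.post(
            ENDPOINT,
            data=ndjson.encode("utf-8"),  # requests takes a raw body via data=, not content=
            headers={
                "Content-Type": "application/x-ndjson",
                "X-Api-Key": API_KEY,
            },
            timeout=60,
        )
        if response.status_code == 200:
            return
        if response.status_code == 429:
            # Honor Retry-After when given in seconds; otherwise fall back to backoff
            retry_after = response.headers.get("Retry-After", "")
            wait = int(retry_after) if retry_after.isdigit() else 5 * 2 ** attempt
            print(f"Rate limited. Retrying in {wait}s...")
        elif response.status_code >= 500:
            wait = 5 * 2 ** attempt  # exponential backoff: 5s, 10s, 20s
            print(f"Server error {response.status_code}. Retrying in {wait}s...")
        else:
            # 401, 422, etc. will not succeed on retry; include the body for details
            raise RuntimeError(f"Request failed ({response.status_code}): {response.text}")
        if attempt < retries - 1:  # no point sleeping after the final attempt
            time.sleep(wait)
    raise RuntimeError(f"Failed to send batch after {retries} attempts")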


def main(csv_path: str) -> None:
    payloads = load_payloads(csv_path)
    batches = build_batches(payloads)
    print(f"Loaded {len(payloads)} events across {len(batches)} batch(es)")
    for i, batch in enumerate(batches, 1):
        print(f"Sending batch {i}/{len(batches)} ({len(batch)} events)...")
        send_batch(batch)
        print(f"  Batch {i} sent successfully")
    print("Done.")


if __name__ == "__main__":
    import sys
    main(sys.argv[1])
Save the script as backfill.py, install the requests library (pip install requests), and run it:
python backfill.py your_logs.csv
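load_payloads reads the entire CSV into memory before batching, which may be a problem for very large log files. A generator-based variant could produce size-capped NDJSON bodies while streaming instead. This is a hypothetical sketch, not part of the script above, and it uses the same 1,000,000-byte cap and size check as build_batches.

```python
import json
from typing import Iterable, Iterator

MAX_BATCH_BYTES = 1_000_000


def iter_ndjson_batches(events: Iterable[dict], max_bytes: int = MAX_BATCH_BYTES) -> Iterator[bytes]:
    """Yield NDJSON request bodies of at most max_bytes, holding one batch in memory at a time."""
    lines: list[bytes] = []
    size = 0
    for event in events:
        line = (json.dumps(event) + "\n").encode("utf-8")
        # Flush the current batch before this line would push it over the cap
        if lines and size + len(line) > max_bytes:
            yield b"".join(lines)
            lines, size = [], 0
        lines.append(line)
        size += len(line)
    if lines:
        yield b"".join(lines)
```

Each yielded body can be posted with the same headers send_batch uses. Rows would still need to be mapped to payloads the way load_payloads does, for example with a generator expression over csv.DictReader.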

Managing multiple sites

If you are an agency or managing multiple brands, each domain requires its own site entry in the dashboard with its own Site ID and API key. Never reuse credentials across sites — each site’s key is scoped to that domain only. The sending logic is identical across all sites — only the site_id in the URL and the X-Api-Key header change per site. A common pattern for multi-site setups:
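SITES = [
    {"site_id": "01ABC...", "api_key": "token-for-site-a", "domain": "brand-a.com"},
    {"site_id": "01DEF...", "api_key": "token-for-site-b", "domain": "brand-b.com"},
]

all_payloads = load_payloads("all_sites_logs.csv")  # load_payloads() from the backfill script; file name is illustrative

for site in SITES:
    # Keep only the events for this site's domain
    site_payloads = [p for p in all_payloads if p["domain"] == site["domain"]]
    endpoint = f"https://webhooks.scrunchai.com/v1/sites/{site['site_id']}/platforms/custom/web-traffic"
    # ... send each batch from build_batches(site_payloads) to endpoint with site["api_key"] as X-Api-Key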
This approach scales well when onboarding many brands: provision each site in the dashboard, collect credentials, and run the same pipeline with different configuration per site.
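A self-contained sketch of that pattern groups events by domain and pairs each group with its own site’s endpoint and key. plan_site_requests is a hypothetical helper, and the SITE_A/SITE_B values used to exercise it are placeholders. Events whose domain has no provisioned site are reported and skipped rather than sent with another site’s credentials.

```python
from collections import defaultdict

ENDPOINT_TEMPLATE = "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic"


def plan_site_requests(events: list[dict], sites: list[dict]) -> list[tuple[str, dict, list[dict]]]:
    """Return (endpoint, headers, events) for each site that has events to send."""
    by_domain: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        by_domain[event["domain"]].append(event)

    # Never send a domain's events with another site's credentials
    unmatched = set(by_domain) - {site["domain"] for site in sites}
    if unmatched:
        print(f"Skipping events for domains without a site: {sorted(unmatched)}")

    plans = []
    for site in sites:
        site_events = by_domain.get(site["domain"], [])
        if not site_events:
            continue  # nothing to send for this site
        endpoint = ENDPOINT_TEMPLATE.format(site_id=site["site_id"])
        headers = {"Content-Type": "application/x-ndjson", "X-Api-Key": site["api_key"]}
        plans.append((endpoint, headers, site_events))
    return plans
```

Each plan’s events could then be split with build_batches and posted to that plan’s endpoint with its headers.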

Tips for better results

  • Use NDJSON batching to reduce request overhead for high-traffic sites.
  • Keep batch sizes under 1 MB uncompressed for optimal performance.
  • Always pass the original, unmodified User-Agent string — Scrunch uses it to classify the bot. Never transform or truncate it.
  • Exclude static asset paths (CSS, JS, images) if you want cleaner data focused on content pages.
  • Include paths that serve PDFs — AI bots frequently request them.
  • Never reuse credentials across sites — provision a separate Site ID and API key for each domain.
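The static-asset and PDF tips can be applied as a filter before sending. The extension list below is an illustrative assumption, not a Scrunch recommendation; extend or trim it for your site. is_content_request is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Illustrative static-asset extensions to drop; adjust for your site
STATIC_EXTENSIONS = {
    ".css", ".js", ".mjs", ".map", ".png", ".jpg", ".jpeg", ".gif",
    ".svg", ".webp", ".ico", ".woff", ".woff2", ".ttf",
}


def is_content_request(path: str) -> bool:
    """True for content pages (including PDFs); False for static assets."""
    # urlsplit drops any query string before the extension is checked
    suffix = PurePosixPath(urlsplit(path).path).suffix.lower()
    # .pdf is deliberately not in STATIC_EXTENSIONS: AI bots often request PDFs
    return suffix not in STATIC_EXTENSIONS
```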

Error handling

| Status | Meaning | Action |
|---|---|---|
| 200 | Accepted and queued | No action needed |
| 401 | Invalid or missing API key | Verify the X-Api-Key value and header name |
| 422 | Validation error | Check all required fields are present and correctly typed |
| 429 | Rate limited | Wait and retry; respect the Retry-After response header |
| 500 | Server error | Retry with exponential backoff; contact support if persistent |
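The Action column translates into a small retry policy. This sketch makes several assumptions: Retry-After is read only when it is a number of seconds (HTTP also allows a date, which falls back to backoff here), every 5xx is treated like the 500 row, and backoff starts at 1 s and is capped at 60 s. retry_delay and send_with_retries are hypothetical helpers; send is any function that performs the request and returns its status code and headers.

```python
import time
from typing import Callable, Optional


def retry_delay(status: int, attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before the next attempt, or None if retrying will not help."""
    if status == 429:
        if retry_after and retry_after.isdigit():
            return float(retry_after)  # server-specified wait, in seconds
        return min(cap, base * 2 ** attempt)
    if status >= 500:
        return min(cap, base * 2 ** attempt)  # exponential backoff: 1, 2, 4, 8 s, and so on
    return None  # 2xx needs no retry; 401, 422 and other 4xx need a fix, not a retry


def send_with_retries(send: Callable[[], tuple[int, dict]], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it stops returning a retryable status; return the last status."""
    status = 0
    for attempt in range(max_attempts):
        status, headers = send()
        delay = retry_delay(status, attempt, headers.get("Retry-After"))
        if delay is None:
            break
        if attempt < max_attempts - 1:  # don't sleep after the final attempt
            sleep(delay)
    return status
```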

Troubleshooting

Site is stuck in pending status
The site activates within 5–10 minutes of the first valid request. If it remains pending, confirm that a request was actually sent (not a dry run), check that the Site ID in the URL matches the one in the dashboard, and verify that the API key is correct. Use the verification cURL in Step 4 to test with a known bot user-agent.

Bot traffic is not being classified
Bot classification is derived entirely from the user_agent field. Confirm you are passing the raw, original user-agent string from the incoming request without modification. Check your log format: some CDNs normalize or truncate user-agent strings before writing them to logs. If so, use a logging integration that captures the original header.

Getting 422 errors
The most common cause is a missing required field or an incorrect type. Check that timestamp is a Unix epoch number (not ISO 8601), status_code is an integer (not a string), and path starts with a /.

NDJSON batches are being rejected
Each line must be a complete, valid JSON object with no embedded newlines. The Content-Type header must be exactly application/x-ndjson. Keep batch size under 1 MB uncompressed.

Don’t see traffic after 10 minutes
Confirm that your Webhook URL and API Key match exactly what’s shown in your Scrunch app. Check that your Content-Type header matches the body format (application/json for single events, application/x-ndjson for batches). Confirm that your timestamp is a Unix epoch in seconds, not milliseconds.
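Most of the 422 causes listed above are type problems that can be corrected before sending. A sketch with hypothetical helper names; it assumes any epoch value of 10,000,000,000 or more is in milliseconds (second-based epochs stay below that until the year 2286) and that ISO 8601 strings carry a UTC offset or a trailing Z. It never touches user_agent.

```python
from datetime import datetime

MS_THRESHOLD = 10_000_000_000  # second-based epochs stay below this until 2286


def normalize_timestamp(value) -> int:
    """Coerce numeric strings, ISO 8601 strings, and millisecond epochs to epoch seconds."""
    if isinstance(value, str):
        try:
            value = float(value)
        except ValueError:
            # fromisoformat only accepts a trailing "Z" from Python 3.11, so rewrite it
            iso = value[:-1] + "+00:00" if value.endswith("Z") else value
            # Strings without an offset would be read as local time
            value = datetime.fromisoformat(iso).timestamp()
    if value >= MS_THRESHOLD:
        value = value / 1000  # assume milliseconds
    return int(value)


def normalize_event(event: dict) -> dict:
    """Return a copy with common 422 causes fixed; user_agent is left untouched."""
    fixed = dict(event)
    fixed["timestamp"] = normalize_timestamp(event["timestamp"])
    fixed["status_code"] = int(event["status_code"])  # "200" -> 200
    if not str(fixed["path"]).startswith("/"):
        fixed["path"] = "/" + str(fixed["path"])
    return fixed
```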