Overview
The Agent Traffic API lets you send web traffic logs to Scrunch from any platform or hosting environment — including CDNs and setups that don’t have a native Scrunch integration. Once data is flowing, Scrunch automatically classifies each request by bot type (retrieval, training, indexer) and agent source (GPTBot, ClaudeBot, and others).
This guide covers:
- Setting up a site with the API platform in the dashboard
- Sending single events and batches
- Backfilling historical log data
- Managing multiple sites (for agencies and multi-brand setups)
- Retry logic and error handling
Prerequisites
- A Scrunch account with Agent Traffic access
- Access to your web server or CDN access logs
- Your site’s domain (e.g., example.com)
Step 1: Add your site in the dashboard
1. In the Scrunch dashboard, open the Agent Traffic page.
2. Click Add website.
3. Enter your domain and select API as the platform.
4. Copy the Site ID (ULID format) and API key (JWT token) that appear after saving.
You will need both values for every request. Each site has its own Site ID and API key — if you are managing multiple domains, repeat this step for each one.
Your site will show a pending status until the first valid request is received. It transitions to active automatically within 5 minutes.
Step 2: Send your first event
Endpoint
POST https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic
Authentication
Include the API key in the X-Api-Key header:
X-Api-Key: <your-jwt-token>
Required fields
| Field | Type | Description |
|---|---|---|
| domain | string | The domain of the site (e.g. example.com) |
| user_agent | string | The full, original User-Agent string from the request |
| url | string | Full URL (e.g. https://example.com/blog/post) |
| path | string | URL path only (e.g. /blog/post) |
| method | string | HTTP method (e.g. GET) |
| status_code | integer | HTTP response status code (e.g. 200) |
| timestamp | integer | Unix epoch in seconds (e.g. 1700000000) |
Optional fields
| Field | Type | Description |
|---|---|---|
| response_time | integer | Response time in milliseconds |
| ip | string | IP address of the requesting client |
Always pass the original, unmodified user_agent string from the incoming request. Scrunch’s bot classification runs entirely off this field. Truncating or transforming it will result in incorrect or missing bot detection.
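To make the "unmodified user_agent" rule concrete, here is a minimal sketch of building an event payload directly from a WSGI environ, where the header is available verbatim. The function name and the WSGI setting are illustrative assumptions, not part of the Scrunch API:

```python
import time

def event_from_wsgi(environ: dict, status_code: int, response_time_ms: int) -> dict:
    """Build an Agent Traffic payload from a WSGI environ dict.

    The User-Agent is copied verbatim from HTTP_USER_AGENT -- no trimming
    or normalization -- so Scrunch's bot classification stays accurate.
    """
    host = environ.get("HTTP_HOST", "")
    path = environ.get("PATH_INFO", "/")
    return {
        "domain": host,
        "user_agent": environ.get("HTTP_USER_AGENT", ""),  # raw, unmodified
        "url": f"https://{host}{path}",
        "path": path,
        "method": environ.get("REQUEST_METHOD", "GET"),
        "status_code": status_code,
        "timestamp": int(time.time()),
        "response_time": response_time_ms,
    }
```

The same principle applies to any framework: read the header from the request object, never from a log field that may have been normalized downstream.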
Single event (cURL)
curl -X POST "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic" \
-H "Content-Type: application/json" \
-H "X-Api-Key: YOUR_API_KEY" \
-d '{
"domain": "example.com",
"user_agent": "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
"url": "https://example.com/blog/post",
"path": "/blog/post",
"method": "GET",
"status_code": 200,
"timestamp": 1700000000,
"response_time": 120,
"ip": "203.0.113.1"
}'
A successful response returns HTTP 200, confirming the event was accepted and queued for processing.
Step 3: Send batches with NDJSON
For production use, send multiple events per request using newline-delimited JSON (NDJSON). Each line is a complete JSON object. This reduces request overhead and is the recommended approach for any significant traffic volume.
Set Content-Type: application/x-ndjson:
curl -X POST "https://webhooks.scrunchai.com/v1/sites/{site_id}/platforms/custom/web-traffic" \
-H "Content-Type: application/x-ndjson" \
-H "X-Api-Key: YOUR_API_KEY" \
-d '{"domain":"example.com","user_agent":"Mozilla/5.0 (compatible; GPTBot/1.0)","url":"https://example.com/page-1","path":"/page-1","method":"GET","status_code":200,"timestamp":1700000000}
{"domain":"example.com","user_agent":"Mozilla/5.0 (compatible; ClaudeBot/1.0)","url":"https://example.com/page-2","path":"/page-2","method":"GET","status_code":200,"timestamp":1700000060,"response_time":95}'
Keep each batch under 1 MB uncompressed. Split larger payloads into multiple requests.
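If you build NDJSON bodies yourself, it is worth enforcing the one-object-per-line format and the 1 MB limit at serialization time. A minimal sketch (the helper name is illustrative):

```python
import json

MAX_BATCH_BYTES = 1_000_000  # 1 MB uncompressed limit per request

def to_ndjson(events: list[dict]) -> bytes:
    """Serialize events as newline-delimited JSON, one object per line."""
    body = "\n".join(json.dumps(e) for e in events) + "\n"
    encoded = body.encode("utf-8")
    if len(encoded) > MAX_BATCH_BYTES:
        raise ValueError("batch exceeds 1 MB uncompressed; split into smaller requests")
    return encoded
```

Because `json.dumps` never emits literal newlines, each event is guaranteed to occupy exactly one line of the body.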
Step 4: Backfill historical data with Python
If you have existing access logs, use this script to send them in batches. It reads a CSV of log entries, maps fields to the API schema, and sends NDJSON batches with retry handling for rate limits.
Your CSV should have columns matching the required and optional fields. At minimum:
timestamp,domain,user_agent,url,path,method,status_code,response_time_ms,ip_address
1700000000,example.com,"Mozilla/5.0 (compatible; GPTBot/1.0)",https://example.com/page,/page,GET,200,120,203.0.113.1
Backfill script
import csv
import json
import time

import requests

API_KEY = "your-jwt-token"
SITE_ID = "your-site-id"
ENDPOINT = f"https://webhooks.scrunchai.com/v1/sites/{SITE_ID}/platforms/custom/web-traffic"
BATCH_SIZE_BYTES = 1_000_000  # 1 MB per batch


def load_payloads(csv_path: str) -> list[dict]:
    """Read a CSV of access log rows and map to API payload format."""
    payloads = []
    with open(csv_path, encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            domain = row.get("domain", "")
            path = row.get("path", "/") or "/"
            payload = {
                "domain": domain,
                "user_agent": row.get("user_agent", ""),
                "url": row.get("url", "") or f"https://{domain}{path}",
                "path": path,
                "method": row.get("method", "GET") or "GET",
                "status_code": int(row.get("status_code", "200") or "200"),
                "timestamp": int(row.get("timestamp", "0") or "0"),
                "response_time": int(row.get("response_time_ms", "0") or "0"),
                "ip": row.get("ip_address") or None,
            }
            # Drop empty optional fields rather than sending null values.
            payloads.append({k: v for k, v in payload.items() if v is not None})
    return payloads


def build_batches(payloads: list[dict], max_bytes: int = BATCH_SIZE_BYTES) -> list[list[dict]]:
    """Split payloads into batches that fit within max_bytes uncompressed."""
    batches, current, current_size = [], [], 0
    for p in payloads:
        size = len(json.dumps(p).encode()) + 1  # +1 for newline
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(p)
        current_size += size
    if current:
        batches.append(current)
    return batches


def send_batch(batch: list[dict], retries: int = 3) -> None:
    """Send a single NDJSON batch with retry logic for rate limits."""
    ndjson = "\n".join(json.dumps(p) for p in batch) + "\n"
    for attempt in range(retries):
        response = requests.post(
            ENDPOINT,
            data=ndjson.encode("utf-8"),
            headers={
                "Content-Type": "application/x-ndjson",
                "X-Api-Key": API_KEY,
            },
            timeout=60,
        )
        if response.status_code == 200:
            return
        if response.status_code == 429:
            wait = int(response.headers.get("Retry-After", 5))
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
        else:
            response.raise_for_status()
    raise RuntimeError(f"Failed to send batch after {retries} attempts")


def main(csv_path: str) -> None:
    payloads = load_payloads(csv_path)
    batches = build_batches(payloads)
    print(f"Loaded {len(payloads)} events across {len(batches)} batch(es)")
    for i, batch in enumerate(batches, 1):
        print(f"Sending batch {i}/{len(batches)} ({len(batch)} events)...")
        send_batch(batch)
        print(f"  Batch {i} sent successfully")
    print("Done.")


if __name__ == "__main__":
    import sys

    main(sys.argv[1])
Run it:
python backfill.py your_logs.csv
Managing multiple sites
If you are an agency or managing multiple brands, each domain requires its own site entry in the dashboard with its own Site ID and API key. The sending logic is identical across all sites — only the site_id in the URL and the X-Api-Key header change per site.
A common pattern for multi-site setups:
SITES = [
    {"site_id": "01ABC...", "api_key": "token-for-site-a", "domain": "brand-a.com"},
    {"site_id": "01DEF...", "api_key": "token-for-site-b", "domain": "brand-b.com"},
]

for site in SITES:
    # Filter payloads for this domain, then send.
    site_payloads = [p for p in all_payloads if p["domain"] == site["domain"]]
    # ... send using site["site_id"] and site["api_key"]
This approach scales well when onboarding many brands: provision each site in the dashboard, collect credentials, and run the same pipeline with different configuration per site.
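The filtering step in the pattern above can be factored into a reusable helper. This is a sketch, not Scrunch-provided tooling; the function name is illustrative:

```python
def group_by_site(all_payloads: list[dict], sites: list[dict]) -> dict[str, list[dict]]:
    """Map each site_id to the payloads whose domain belongs to that site.

    Payloads for domains with no configured site are dropped, so a stray
    domain in your logs cannot be sent with the wrong credentials.
    """
    by_domain = {site["domain"]: site["site_id"] for site in sites}
    grouped: dict[str, list[dict]] = {site["site_id"]: [] for site in sites}
    for p in all_payloads:
        site_id = by_domain.get(p["domain"])
        if site_id is not None:
            grouped[site_id].append(p)
    return grouped
```

Each resulting group is then sent to `.../v1/sites/{site_id}/platforms/custom/web-traffic` with that site's own `X-Api-Key`.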
Error handling
| Status | Meaning | Action |
|---|---|---|
| 200 | Accepted and queued | No action needed |
| 401 | Invalid or missing API key | Verify the X-Api-Key value and header name |
| 422 | Validation error | Check that all required fields are present and correctly typed |
| 429 | Rate limited | Wait and retry; respect the Retry-After response header |
| 500 | Server error | Retry with exponential backoff; contact support if persistent |
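The "exponential backoff" recommended for 500s can be wrapped around any send function. A minimal sketch (the helper name and defaults are illustrative):

```python
import time

def send_with_backoff(send, max_attempts: int = 5, base_delay: float = 1.0) -> None:
    """Retry a transient failure with exponentially growing delays.

    `send` is any zero-argument callable that raises on a retryable error
    (e.g. a wrapper that calls response.raise_for_status() on a 500).
    """
    for attempt in range(max_attempts):
        try:
            send()
            return
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted retries; surface the error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, 8s, ...
            time.sleep(delay)
```

In practice you would also cap the delay and add jitter so many clients do not retry in lockstep.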
Troubleshooting
Site is stuck in pending status
The site activates within 5 minutes of the first valid request. If it remains pending, confirm a request was actually sent (not a dry run), check that the Site ID in the URL matches the one in the dashboard, and verify the API key is correct.
Bot traffic is not being classified
Bot classification is derived entirely from the user_agent field. Confirm you are passing the raw, original user-agent string from the incoming request without modification. Check your log format — some CDNs normalize or truncate user-agent strings before writing them to logs. If so, use a logging integration that captures the original header.
Getting 422 errors
The most common cause is a missing required field or an incorrect type. Check that timestamp is a Unix epoch integer (not ISO 8601), status_code is an integer (not a string), and path starts with a /.
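These checks can be run client-side before sending, which turns opaque 422s into actionable messages. A sketch based on the required-fields table above (the function itself is illustrative, not part of the API):

```python
# Required fields and their expected JSON types, per the API schema.
REQUIRED = {
    "domain": str,
    "user_agent": str,
    "url": str,
    "path": str,
    "method": str,
    "status_code": int,
    "timestamp": int,
}

def validation_errors(payload: dict) -> list[str]:
    """Return a list of schema problems likely to trigger a 422."""
    errors = []
    for field, expected in REQUIRED.items():
        value = payload.get(field)
        if value is None:
            errors.append(f"missing required field: {field}")
        elif not isinstance(value, expected):
            errors.append(
                f"{field} should be {expected.__name__}, got {type(value).__name__}"
            )
    path = payload.get("path")
    if isinstance(path, str) and not path.startswith("/"):
        errors.append("path must start with /")
    return errors
```

Running this over a sample of your payloads before a large backfill catches type mistakes (string status codes, ISO 8601 timestamps) cheaply.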
NDJSON batches are being rejected
Each line must be a complete, valid JSON object with no embedded newlines. The Content-Type header must be exactly application/x-ndjson. Keep batch size under 1 MB uncompressed.