Overview

The Agent Traffic API provides aggregated analytics on AI bot activity across your websites. Track which AI platforms are crawling your content, what pages they access, and understand patterns in retrieval, training, and indexing behavior. This API is optimized for time-series analysis and bot classification, making it ideal for SEO teams, content strategists, and developers building AI visibility monitoring into their workflows. Each request returns aggregated bot traffic metrics grouped by the dimensions you select.

What the Agent Traffic API includes

The Agent Traffic API returns aggregated metrics including:
  • Request counts by bot source
  • Traffic patterns by date or week
  • Bot activity by page path
  • Classification by bot type (retrieval, training, indexer)
These metrics can be grouped by dimensions including:
  • Date (day or week buckets)
  • Site domain
  • URL path
  • Agent source (e.g., chatgpt-user, claudebot, gptbot)
  • Agent type (retrieval, training, indexer)
All results are aggregated summaries optimized for trend analysis and monitoring.

When to use the Agent Traffic API

Use the Agent Traffic API when you need:
  • Weekly or daily bot traffic reporting
  • Trend analysis of AI crawler behavior over time
  • Path-level breakdowns of bot activity
  • Bot classification by purpose (retrieval vs. training)
  • Data for SEO dashboards monitoring AI visibility
  • Automated alerts based on traffic patterns
The Agent Traffic API is designed to answer questions like: “Which AI bots are crawling my content?” and “How is bot traffic trending across different sections of my site?”

When not to use the Agent Traffic API

The Agent Traffic API is not appropriate if you need:
  • Raw access log entries
  • Individual request details (user agents, IP addresses, timestamps)
  • Real-time streaming data
  • Non-bot traffic analytics
For CDN setup and log configuration, see the Agent Traffic Integration Guide.

Example query

curl -X GET \
  "https://api.scrunchai.com/v1/1234/sites/01JW849S5DJZ3CCE4DA6TFMYEY/agent-traffic?start_date=2025-01-01&end_date=2025-01-31&fields=date,agent_source,agent_type&time_bucket=week" \
  -H "Authorization: Bearer $SCRUNCH_API_TOKEN"
Response:
{
  "meta": {
    "start_date": "2025-01-01",
    "end_date": "2025-01-31",
    "time_bucket": "week"
  },
  "data": [
    {
      "date": "2025W01",
      "agent_source": "chatgpt-user",
      "agent_type": "retrieval",
      "requests": 1247
    },
    {
      "date": "2025W01",
      "agent_source": "claudebot",
      "agent_type": "training",
      "requests": 892
    }
  ]
}
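The aggregated rows are straightforward to post-process. A minimal sketch (using the example response above as a literal) that totals requests per agent_type:

```python
import json
from collections import defaultdict

# The example response from above, embedded as a literal for illustration.
response = json.loads("""
{
  "meta": {"start_date": "2025-01-01", "end_date": "2025-01-31", "time_bucket": "week"},
  "data": [
    {"date": "2025W01", "agent_source": "chatgpt-user", "agent_type": "retrieval", "requests": 1247},
    {"date": "2025W01", "agent_source": "claudebot", "agent_type": "training", "requests": 892}
  ]
}
""")

# Sum request counts per bot category.
totals = defaultdict(int)
for row in response["data"]:
    totals[row["agent_type"]] += row["requests"]

print(dict(totals))  # {'retrieval': 1247, 'training': 892}
```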

Available dimensions

Field          Description        Example Values
-----          -----------        --------------
date           Timestamp bucket   20250115 (day) or 2025W03 (week)
site           Domain             example.com
path           URL path           /blog/article
agent_source   Bot identifier     chatgpt-user, claudebot, gptbot
agent_type     Bot category       retrieval, training, indexer
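When assembling requests programmatically, it helps to build the query string from a dict rather than by hand. A sketch (the parameter names match the example query above; the helper itself is hypothetical):

```python
from urllib.parse import urlencode

def build_agent_traffic_query(start_date, end_date, fields, time_bucket="day", **extra):
    """Assemble the query string for an agent-traffic request.

    `fields` is the list of dimensions to group by; extra keyword
    arguments (e.g. path="/blog/") become additional filters.
    """
    params = {
        "start_date": start_date,
        "end_date": end_date,
        "fields": ",".join(fields),
        "time_bucket": time_bucket,
        **extra,
    }
    return urlencode(params)  # percent-encodes values, e.g. commas in `fields`

qs = build_agent_traffic_query("2025-01-01", "2025-01-31",
                               ["date", "agent_source", "agent_type"],
                               time_bucket="week")
print(qs)
```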

Time bucketing

Control the granularity of date aggregation using the time_bucket parameter:
  • day (default): Daily aggregation with dates formatted as YYYYMMDD
  • week: Weekly aggregation with dates formatted as YYYYWww using the ISO week number (e.g., 2025W03)
Weekly buckets reduce result size and are recommended for long-range trend analysis.
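If you need to align your own data with the API's weekly labels, Python's isocalendar() produces the same ISO week numbering (the YYYYWww label shape is taken from the example response above):

```python
from datetime import date

def week_bucket(d: date) -> str:
    """Format a date as an ISO-week bucket label, e.g. 2025W03."""
    iso = d.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return f"{iso[0]}W{iso[1]:02d}"

print(week_bucket(date(2025, 1, 15)))  # 2025W03
print(week_bucket(date(2025, 1, 1)))   # 2025W01
```

Note that the ISO year can differ from the calendar year near January 1, which is why the label uses isocalendar()'s year rather than d.year.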

Path filtering

Use the path parameter to filter results by URL path prefix:
# Only show bot traffic to blog articles
?path=/blog/

# Only show traffic to a specific section
?path=/products/widgets
Path matching uses prefix-based filtering with SQL LIKE patterns (path LIKE '/blog/%'). All user input is properly escaped to prevent SQL injection.
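Server-side, prefix filtering of this kind is typically implemented by escaping the LIKE metacharacters in the user-supplied prefix before appending the trailing wildcard. A sketch of the idea (not Scrunch's actual implementation):

```python
def like_prefix_pattern(path_prefix: str) -> str:
    """Escape LIKE metacharacters in a user-supplied prefix, then
    append % so the pattern matches the prefix and everything under it."""
    escaped = (path_prefix
               .replace("\\", "\\\\")   # escape the escape character first
               .replace("%", "\\%")
               .replace("_", "\\_"))
    return escaped + "%"

print(like_prefix_pattern("/blog/"))      # /blog/%
print(like_prefix_pattern("/a_b/100%/"))  # /a\_b/100\%/%
```

Escaping % and _ matters because both are wildcards inside a LIKE pattern; without escaping, a path like /a_b/ would also match /axb/.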

Limits and performance considerations

  • Maximum rows per request: 100,000
  • Default limit: 10,000 rows
  • Results are pre-aggregated for fast retrieval
  • Use pagination (limit and offset) for large result sets
For best performance:
  • Use weekly bucketing when possible to reduce cardinality
  • Keep path filters specific to reduce result size
  • Request only the dimensions you need
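Offset-based pagination can be wrapped in a generator that keeps fetching until a short page signals the end. A sketch assuming a fetch_page callable that returns the parsed data array for a given limit and offset (swap in your real HTTP call):

```python
def paginate(fetch_page, limit=10_000):
    """Yield every row, advancing offset until a short page signals the end."""
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        yield from page
        if len(page) < limit:  # short (or empty) page: no more results
            return
        offset += limit

# Usage with a stand-in fetch function over 25 fake rows:
rows = list(range(25))
fake_fetch = lambda limit, offset: rows[offset:offset + limit]
print(len(list(paginate(fake_fetch, limit=10))))  # 25
```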

Security and validation

The Agent Traffic API implements strict security measures:
  • Site ID validation: All site IDs are validated against ULID format using regex
  • Parameter validation: All query parameters are validated before SQL generation
  • SQL injection prevention: Path filters use escaped LIKE patterns with no direct string concatenation
  • Authentication: All requests require valid bearer token authentication
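Site IDs are ULIDs: 26 characters drawn from the Crockford base32 alphabet (digits plus uppercase letters, excluding I, L, O, and U). A client-side validation sketch mirroring the format check described above:

```python
import re

# Crockford base32 alphabet used by ULIDs: 0-9 plus A-Z without I, L, O, U.
ULID_RE = re.compile(r"^[0-9A-HJKMNP-TV-Z]{26}$")

def is_valid_site_id(site_id: str) -> bool:
    return bool(ULID_RE.fullmatch(site_id))

print(is_valid_site_id("01JW849S5DJZ3CCE4DA6TFMYEY"))  # True
print(is_valid_site_id("not-a-ulid"))                  # False
```

Validating locally before sending a request gives a faster, clearer failure than a round trip to the API.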

Best practices

  • Use time_bucket=week for trend analysis spanning more than 30 days
  • Filter by path when analyzing specific site sections
  • Group by agent_type to distinguish retrieval bots from training crawlers
  • Run separate queries for different reporting needs rather than over-selecting dimensions
  • Monitor agent_source trends to identify new AI platforms crawling your content

Typical use cases

Teams commonly use the Agent Traffic API to:
  • Monitor which AI platforms are indexing their content
  • Identify pages with high bot traffic for SEO optimization
  • Track changes in crawler behavior after content updates
  • Build dashboards showing AI visibility by site section
  • Alert on unusual bot traffic patterns
  • Analyze the impact of robots.txt changes on AI crawler access
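For the alerting use case, a simple week-over-week ratio check is often enough to start with. A sketch assuming you already have per-week request totals (the 2x threshold is illustrative):

```python
def traffic_alerts(weekly_totals, threshold=2.0):
    """Given ordered (week_label, requests) pairs, flag any week whose
    traffic grew or shrank by more than `threshold`x versus the prior week."""
    alerts = []
    for (_, prev), (week, cur) in zip(weekly_totals, weekly_totals[1:]):
        if prev > 0 and (cur / prev > threshold or cur / prev < 1 / threshold):
            alerts.append(week)
    return alerts

totals = [("2025W01", 1200), ("2025W02", 1300), ("2025W03", 4100)]
print(traffic_alerts(totals))  # ['2025W03']
```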

Prerequisites

Before using the Agent Traffic API, you must:
  1. Configure your CDN or hosting provider to send access logs to Scrunch
  2. Verify your site is properly configured in your Scrunch account
  3. Obtain your site ID from the Scrunch dashboard

Set up Agent Traffic logging

Configure your CDN integration →