Build a data mart from the Scrunch Responses API by incrementally loading response-level data, normalizing it into fact and dimension tables, and computing metrics in your warehouse.

Prerequisites

  • A Scrunch API key with Responses access (provision one here)
  • A brand ID for the brand you want to load
  • A cloud data warehouse (BigQuery, Snowflake, Redshift, Databricks, etc.)

How the Responses API works

The Responses API returns one record per AI response. Each record includes the full response text, citations, brand and competitor evaluations, and prompt metadata. Key characteristics:
  • 1:1 response model — each item represents a single, distinct AI response
  • Denormalized — arrays like tags, competitors, and citations are embedded in each record
  • Reverse chronological — newest responses are returned first
  • Supports date filtering and pagination — ideal for incremental ETL
For the full response schema, see the Responses API overview.

Incremental loading strategy

Use date windows and pagination to pull new responses on a recurring schedule.
1. Choose your sync window

Use start_date and end_date (both YYYY-MM-DD, UTC) to define a date range. For daily loads, pull the previous UTC day after midnight to ensure completeness.
2. Paginate through all results

The API returns paginated results. Increment offset by limit until you have all records for the window.
curl "https://api.scrunchai.com/v1/$BRAND_ID/responses?start_date=2025-06-01&end_date=2025-06-02&limit=1000&offset=0" \
  -H "Authorization: Bearer $SCRUNCH_API_KEY"
The response includes a total count so you know when you’re done:
{
  "total": 2430,
  "offset": 0,
  "limit": 1000,
  "items": [ ... ]
}
3. Deduplicate on id

Each response has a globally unique id. Use it as a natural key and upsert into your warehouse to handle re-runs or overlapping date windows.
Some fields, such as brand_present, brand_sentiment, tags, and stage, may be re-evaluated if prompt metadata or brand configuration changes in Scrunch. Always upsert rather than append, so re-evaluated fields stay current.
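The upsert semantics can be illustrated in memory: later copies of the same id replace earlier ones, so repeated loads converge on one row per response. In a warehouse this becomes MERGE (or INSERT ... ON CONFLICT) keyed on id.

```python
def upsert(existing: dict[str, dict], incoming: list[dict]) -> dict[str, dict]:
    """Upsert records keyed by their globally unique `id`.

    A re-run or overlapping window overwrites rather than duplicates --
    the in-memory analogue of MERGE keyed on the response id.
    """
    for record in incoming:
        existing[record["id"]] = record  # replace, never append a second copy
    return existing
```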

Schema design

Each API record includes arrays (tags, competitors, citations) that should be normalized into separate tables to avoid row explosion.
responses (1 row per response)
   |
   |—— response_tags (1 row per response x tag)
   |
   |—— response_competitors (1 row per response x competitor)
   |
   |—— response_citations (1 row per response x citation)
If your warehouse has strong support for JSON or ARRAY columns (e.g., BigQuery, Snowflake, Databricks), you can skip the bridge tables and store tags, competitors, and citations as native array or JSON fields directly on the responses table. Use UNNEST or LATERAL FLATTEN at query time instead of pre-normalizing. This simplifies your ETL at the cost of slightly more complex analytics queries.
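A minimal normalization step, assuming `tags` is a list of strings and `competitors` / `citations` are lists of objects (matching the bridge-table columns defined below):

```python
def normalize(record: dict) -> dict[str, list[dict]]:
    """Split one denormalized API record into fact and bridge rows."""
    rid = record["id"]
    # Scalar fields become the fact row; arrays are stripped out.
    fact = {k: v for k, v in record.items()
            if k not in ("tags", "competitors", "citations")}
    return {
        "responses": [fact],
        "response_tags": [{"response_id": rid, "tag": t}
                          for t in record.get("tags", [])],
        "response_competitors": [{"response_id": rid, **c}
                                 for c in record.get("competitors", [])],
        "response_citations": [{"response_id": rid, **c}
                               for c in record.get("citations", [])],
    }
```

Each returned list maps onto one of the target tables, so a batch load is a flat insert per table.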

Responses fact table

Store one row per response with scalar fields:
Column           Type       Notes
id               STRING     Primary key (globally unique)
created_at       TIMESTAMP  When the response was captured
prompt_id        STRING     Foreign key to the prompt
prompt           STRING     Text of the prompt
platform         STRING     ChatGPT, Perplexity, Claude, etc.
persona_id       STRING
persona_name     STRING
stage            STRING     Funnel stage enum
branded          BOOLEAN    Whether the prompt includes brand names
brand_present    BOOLEAN
brand_sentiment  STRING     positive, mixed, negative, or null
brand_position   STRING     top, middle, bottom, or null
country          STRING
response_text    STRING     Full AI-generated answer (markdown)

Bridge tables

Unnest each array into its own table, keyed by response_id.

response_tags

Column       Type
response_id  STRING
tag          STRING

response_competitors

Column           Type
response_id      STRING
competitor_name  STRING
present          BOOLEAN
sentiment        STRING
position         STRING

response_citations

Column       Type
response_id  STRING
url          STRING
title        STRING
snippet      STRING
source_type  STRING
Never unnest multiple arrays in the same query. Crossing two arrays (e.g., tags x competitors) causes row explosion and inflates every metric.
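The arithmetic behind that warning is simple to demonstrate: crossing two arrays multiplies their lengths, so a single response is counted many times.

```python
from itertools import product

# One response with 3 tags and 2 competitors. Unnesting both arrays in the
# same query produces the cross product: 3 x 2 = 6 rows for one response,
# so COUNT(*) and every derived metric is inflated sixfold.
tags = ["pricing", "setup", "support"]
competitors = [{"competitor_name": "X"}, {"competitor_name": "Y"}]

exploded = list(product(tags, competitors))
```

Joining each bridge table to the fact table separately avoids this, because each join fans out on only one array at a time.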

Computing metrics

Once your data mart is loaded, you can aggregate responses into daily performance summaries. Group on DATE(created_at), prompt_id, prompt, platform to match the way Scrunch reports prompt-level performance by default. The following examples use PostgreSQL syntax. Translate as needed for your warehouse.

Response count

COUNT(*) AS responses

Brand presence percentage

COUNT(*) FILTER (WHERE brand_present = true)::float
  / NULLIF(COUNT(*)::float, 0.0) AS brand_presence_pct

Brand sentiment score

AVG(
  CASE
    WHEN brand_sentiment = 'positive' THEN 100
    WHEN brand_sentiment = 'mixed' THEN 50
    WHEN brand_sentiment = 'negative' THEN 0
    ELSE NULL
  END
) AS brand_sentiment_score

Brand position score

AVG(
  CASE
    WHEN brand_position = 'top' THEN 100
    WHEN brand_position = 'middle' THEN 50
    WHEN brand_position = 'bottom' THEN 0
    ELSE NULL
  END
) AS brand_position_score

Putting it together

SELECT
  DATE(created_at)  AS response_date,
  prompt_id,
  prompt,
  platform,
  COUNT(*)          AS responses,
  COUNT(*) FILTER (WHERE brand_present)::float
    / NULLIF(COUNT(*)::float, 0.0)
                    AS brand_presence_pct,
  AVG(CASE
    WHEN brand_sentiment = 'positive' THEN 100
    WHEN brand_sentiment = 'mixed'    THEN 50
    WHEN brand_sentiment = 'negative' THEN 0
  END)              AS brand_sentiment_score,
  AVG(CASE
    WHEN brand_position = 'top'    THEN 100
    WHEN brand_position = 'middle' THEN 50
    WHEN brand_position = 'bottom' THEN 0
  END)              AS brand_position_score
FROM responses
GROUP BY 1, 2, 3, 4;
These metric definitions match the ones used in the Scrunch dashboard and Query API. You can also define your own calculations if your use case requires it.
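To make the metric definitions explicit outside SQL, the same calculations can be sketched over in-memory rows. This mirrors the query above (nulls excluded from the averages, exactly as SQL's AVG skips NULL); it is an illustration, not a replacement for computing in your warehouse.

```python
SENTIMENT = {"positive": 100, "mixed": 50, "negative": 0}
POSITION = {"top": 100, "middle": 50, "bottom": 0}

def daily_metrics(rows: list[dict]) -> dict:
    """Compute the summary metrics for one (date, prompt, platform) group."""
    n = len(rows)
    present = sum(1 for r in rows if r.get("brand_present"))
    # Null sentiment/position rows are excluded, matching SQL AVG over CASE.
    sent = [SENTIMENT[r["brand_sentiment"]] for r in rows
            if r.get("brand_sentiment") in SENTIMENT]
    pos = [POSITION[r["brand_position"]] for r in rows
           if r.get("brand_position") in POSITION]
    return {
        "responses": n,
        "brand_presence_pct": present / n if n else None,
        "brand_sentiment_score": sum(sent) / len(sent) if sent else None,
        "brand_position_score": sum(pos) / len(pos) if pos else None,
    }
```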