GSC (Google Search Console)
Code: kvprocessor.cpp — PullFreshGSCData() / ProcessGSCData(). Per-keyword validity gate: processing.py:KB-ANCHOR:is-legit-impressions-3mo.
Last validated: 2026-05-21
What it carries
Per-keyword × domain × device × search-type × date — impressions, clicks, CTR, position. The richest and freshest of the four raw sources; the only one with intra-day granularity and per-domain/device/search-type breakdowns. Everything customer-facing that says "monthly volume" or "trend" leans on GSC as the primary signal once a keyword has enough history.
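As a rough illustration of that record shape, here is a minimal Python sketch. The class name, field names, and types are hypothetical; only the dimensions and metrics themselves come from the paragraph above. Whether CTR is stored or derived in the real tables is not stated here, so the sketch simply derives it from clicks and impressions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GscMetric:
    # Dimensions (names are illustrative, not the real column names).
    keyword: str
    domain: str
    device: str
    search_type: str
    day: date
    # Metrics.
    impressions: int
    clicks: int
    position: float

    @property
    def ctr(self) -> float:
        """Clicks divided by impressions; 0.0 when there were no impressions."""
        return self.clicks / self.impressions if self.impressions else 0.0
```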
Where it lives and how it gets here
The GSC daemon in backend/megaindex/gsc/ writes into gsc.gsc_main3_local and gsc.gsc_bykeyword_local on every laksa shard (see memory: the data comes from direct GSC API calls made by that OCaml daemon, not from S3). Materialised views aggregate those tables into gsc.aggregated_keywords_local (per-shard) and then re-shard by keyword hash into gsc.aggregated_keywords_global (the lookup table kvprocessor reads). kvprocessor.cpp pulls from aggregated_keywords_global on each peer shard, dedups, rolls up to monthly, and writes the 16 columns above into keywords.keywords_data_local.
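The dedup and monthly roll-up step can be pictured with the Python sketch below. It is not the kvprocessor.cpp logic: the row layout, the dedup key, the "first copy wins" rule, and the choice to sum only impressions and clicks are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical dedup key: one row per keyword, domain, device, search type and day.
DEDUP_KEY = ("keyword", "domain", "device", "search_type", "day")


def dedup_and_roll_up(rows):
    """Drop repeated rows, then sum impressions and clicks per (keyword, month)."""
    seen = set()
    monthly = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for row in rows:
        key = tuple(row[name] for name in DEDUP_KEY)
        if key in seen:  # assumed rule: the first copy of a key wins
            continue
        seen.add(key)
        month = row["day"].strftime("%Y-%m")
        bucket = monthly[(row["keyword"], month)]
        bucket["impressions"] += row["impressions"]
        bucket["clicks"] += row["clicks"]
    return dict(monthly)
```

Rows pulled from several peer shards would be concatenated and passed in as one iterable of dicts; the result is keyed by (keyword, "YYYY-MM").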
The freshness filter on aggregated_keywords_global is metrics_imported_at_max > DATE_SUB(curdate(), 45) — a 45-day window.
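For debugging, the same predicate can be mirrored in Python to check whether a given row would survive the filter. This is a sketch only: the function name is hypothetical, and it assumes metrics_imported_at_max is compared as a calendar date with a strict greater-than, as the expression above reads.

```python
from datetime import date, timedelta

FRESHNESS_WINDOW_DAYS = 45  # window size taken from the filter above


def is_fresh(metrics_imported_at_max: date, today: date) -> bool:
    """True when the row falls inside the 45-day read window (strict comparison)."""
    cutoff = today - timedelta(days=FRESHNESS_WINDOW_DAYS)
    return metrics_imported_at_max > cutoff
```

A row imported exactly 45 days ago sits on the cutoff and is excluded under this reading; one imported 44 days ago is kept.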
Why the 45-day window
The 45-day window is an operational performance knob, not a quality threshold. Older rows are filtered out at source-table read time to keep the cross-shard pull cheap enough that kvprocessor can complete in the hourly loop without blowing out the laksa server workers. It is not expressing "data older than 45 days is wrong". Keep this distinction in mind when debugging "where did my old data go?" — older GSC history is still in the upstream gsc_main3_local raw table, just not in the kvprocessor read path.
Validity gates in processing.py
- Single-month staleness: a keyword whose GSC data is single-month and >3 months old is treated as not-present (processing.py:KB-ANCHOR:is-legit-impressions-3mo).
- Spike corroboration: uncorroborated GSC impression spikes (e.g. bot-driven surges) are repaired in-place using GT/GKP as the truth signal. See the Decisions section for the spike-detection rules.
- Outer gate downstream: the customer-facing trend display gate processed_keyword_volume > 200 (memory: project_prod_trend_display_gate) is computed after the GSC blend, so unhealthy GSC doesn't sneak through.
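The first and third gates are simple enough to sketch in Python. This is not the code in processing.py: the function names are hypothetical, "months old" is assumed to mean a calendar-month difference from today, and an empty history is assumed to count as not-present. The spike-corroboration rules live in the Decisions section and are not reproduced here.

```python
from datetime import date

TREND_DISPLAY_MIN_VOLUME = 200  # threshold from the outer gate above


def is_legit_impressions(months_with_data, today: date) -> bool:
    """False when the history is a single month that is more than 3 months old."""
    distinct = {(d.year, d.month) for d in months_with_data}
    if not distinct:  # assumption: no data at all is treated as not-present
        return False
    if len(distinct) > 1:
        return True
    (year, month), = distinct
    age_in_months = (today.year - year) * 12 + (today.month - month)
    return age_in_months <= 3


def passes_trend_display_gate(processed_keyword_volume: float) -> bool:
    """Strictly greater than 200, applied after the GSC blend."""
    return processed_keyword_volume > TREND_DISPLAY_MIN_VOLUME
```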
See also
Reference docs (external):
- Google Search Console — Performance report metric definitions — canonical definitions of impressions (how often a link to the site appeared on Google), clicks (how often it was clicked), CTR (clicks ÷ impressions), and position (relative ranking; 1 = topmost).
Internal:
- Central hub table: where the columns above land (keywords.keywords_data_local)
- kvprocessor: the program that pulls GSC
- Decisions: bot/spike detection, freshness gates
- gsc-debugging Claude skill: how to drill into GSC issues for a specific keyword
- Column-level reference: TBD, a future per-column page lives under ../fields/