Keyword-Volume Knowledge Base

Sources

The pipeline blends four external data feeds. Each feed lands a set of raw arrays and scalars in the central hub table keywords.keywords_data_local; processing.py then derives every customer-facing metric from those raw inputs.

Source	Cadence	Scale	What makes it distinctive
GSC	hourly, 45-day source window	impressions	Per-domain / device / search-type breakdown. Richest and freshest.
GT	continuous, signal-driven	0–100 relative	The only signal showing trend shape back to ~2014.
GKP	daily-ish, signal-driven	absolute volume + CPC	Anchors absolute volume + serves as a 1.2× soft cap.
Jumpshot	NEVER (one-time historical)	absolute volume + organic % + click-stream	Frozen but irreplaceable clickstream signal.

How they interact

The blend logic is in processing.py (Phase 2 / 3 nodes coming). At a high level:

GSC is the primary trend / impression signal for keywords with enough history.
GT provides shape (seasonality, secular trend) and disambiguates spike vs. real change.
GKP anchors absolute volume and caps the blend per-month.
Jumpshot fills the long tail of older keywords where GSC alone is thin; also feeds processed_organic_p directly.
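
As a rough illustration of two of the roles above (GT supplying shape, GKP anchoring absolute volume and capping each month), here is a minimal sketch. It is not the logic in processing.py: the function name, the mean-matching rescale, and the use of a hard clip in place of the "soft" cap are all assumptions made for illustration. Only the 1.2× factor comes from the Sources table.

```python
from typing import List, Sequence

# The 1.2x factor is taken from the Sources table; everything else is hypothetical.
GKP_CAP_FACTOR = 1.2


def blend_shape_with_anchor(
    gt_index: Sequence[float],
    gkp_volume: Sequence[float],
    cap_factor: float = GKP_CAP_FACTOR,
) -> List[float]:
    """Rescale a relative GT series to GKP's absolute level, then cap per month.

    gt_index   -- monthly GT values on the 0-100 relative scale
    gkp_volume -- monthly GKP absolute volumes for the same months
    """
    if len(gt_index) != len(gkp_volume):
        raise ValueError("gt_index and gkp_volume must cover the same months")
    if not gt_index:
        return []

    gt_mean = sum(gt_index) / len(gt_index)
    gkp_mean = sum(gkp_volume) / len(gkp_volume)

    # With no usable shape signal, fall back to the GKP anchor unchanged.
    if gt_mean == 0:
        return [float(v) for v in gkp_volume]

    # Assumption: anchor by matching the series means.
    scale = gkp_mean / gt_mean

    # Assumption: a hard clip stands in for the "soft" cap described above.
    return [
        min(g * scale, cap_factor * v)
        for g, v in zip(gt_index, gkp_volume)
    ]


if __name__ == "__main__":
    gt = [25, 100, 25, 50]
    gkp = [1000, 1000, 1000, 1000]
    print(blend_shape_with_anchor(gt, gkp))
```

In this toy input the second month's rescaled value (2000) exceeds 1.2 × 1000, so it is clipped to the cap while the other months keep the GT shape. GSC and Jumpshot are left out of the sketch because the text does not say how they are weighted.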

The contradictions between sources are exactly where most of the Decisions live — that's the highest-value Phase 3 work.

See also

Architecture — full pipeline diagram
Central hub table — where all source columns land
kvprocessor — the program that pulls each source