Sources

These are the four external data feeds the pipeline blends together. Each lands a set of raw arrays and scalars in the central hub table keywords.keywords_data_local; processing.py then derives every customer-facing metric from those raw inputs.

| Source | Cadence | Scale | What makes it distinctive |
|---|---|---|---|
| GSC | hourly, 45-day source window | impressions | Per-domain / device / search-type breakdown. Richest and freshest. |
| GT | continuous, signal-driven | 0–100 relative | The only signal showing trend shape back to ~2014. |
| GKP | daily-ish, signal-driven | absolute volume + CPC | Anchors absolute volume + serves as a 1.2× soft cap. |
| Jumpshot | NEVER (one-time historical) | absolute volume + organic % + click-stream | Frozen but irreplaceable clickstream signal. |

How they interact

The blend logic is in processing.py (Phase 2 / 3 nodes coming). At a high level:

  1. GSC is the primary trend / impression signal for keywords with enough history.
  2. GT provides shape (seasonality, secular trend) and disambiguates spike vs. real change.
  3. GKP anchors absolute volume and caps the blend per-month.
  4. Jumpshot fills the long tail of older keywords where GSC alone is thin; also feeds processed_organic_p directly.
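
As a rough illustration of how steps 2 and 3 could fit together, the sketch below rescales a relative GT series to the GKP volume level and then clamps each month to 1.2× the GKP figure. It is not taken from processing.py: the function names, the mean-matching rescale, and the plain `min` clamp are all assumptions made for the example, and the real pipeline may soften the cap differently.

```python
from typing import List, Optional, Sequence

GKP_CAP_FACTOR = 1.2  # the 1.2x soft-cap factor listed for GKP above


def scale_shape_to_volume(gt_index: Sequence[float],
                          gkp_volume: Sequence[float]) -> List[float]:
    """Rescale a 0-100 relative series so its mean matches the GKP mean.

    Hypothetical helper: mean-matching is an assumption, not the
    documented anchoring method.
    """
    if len(gt_index) != len(gkp_volume):
        raise ValueError("series must cover the same months")
    gt_mean = sum(gt_index) / len(gt_index)
    if gt_mean == 0:
        # A flat-zero shape carries no information; return zeros.
        return [0.0 for _ in gt_index]
    scale = (sum(gkp_volume) / len(gkp_volume)) / gt_mean
    return [value * scale for value in gt_index]


def cap_to_gkp(blended: Sequence[float],
               gkp_volume: Sequence[Optional[float]],
               factor: float = GKP_CAP_FACTOR) -> List[float]:
    """Clamp each month's estimate to factor * GKP volume.

    Months with no GKP value (None) pass through unchanged. A hard
    clamp stands in here for whatever "soft" behaviour the pipeline uses.
    """
    if len(blended) != len(gkp_volume):
        raise ValueError("series must cover the same months")
    capped = []
    for estimate, gkp in zip(blended, gkp_volume):
        if gkp is None:
            capped.append(float(estimate))
        else:
            capped.append(min(float(estimate), factor * gkp))
    return capped


# Usage: a spiky GT shape anchored to a flat GKP volume of 1000 per month.
gt = [50, 100, 25, 25]
gkp = [1000, 1000, 1000, 1000]
scaled = scale_shape_to_volume(gt, gkp)
capped = cap_to_gkp(scaled, gkp)
```

In this example the second month is the only one that exceeds 1.2× its GKP value after rescaling, so it is the only one the cap changes.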

The contradictions between sources are exactly where most of the Decisions live; resolving them is the highest-value Phase 3 work.

See also