Sources
The pipeline blends four external data feeds. Each lands a set of raw arrays and scalars in the central hub table keywords.keywords_data_local; processing.py then derives every customer-facing metric from those raw inputs.
| Source | Cadence | Scale | What makes it distinctive |
|---|---|---|---|
| GSC | hourly, 45-day source window | impressions | Per-domain / device / search-type breakdown. Richest and freshest. |
| GT | continuous, signal-driven | 0–100 relative | The only signal showing trend shape back to ~2014. |
| GKP | daily-ish, signal-driven | absolute volume + CPC | Anchors absolute volume + serves as a 1.2× soft cap. |
| Jumpshot | NEVER (one-time historical) | absolute volume + organic % + click-stream | Frozen but irreplaceable clickstream signal. |
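GT reports a 0–100 relative index while GKP reports absolute volume, so the two scales have to be reconciled before they can be combined. The following is a minimal sketch of one way to do that, assuming a simple mean-matching rescale. The helper name `anchor_relative_to_absolute`, the mean-matching rule, and the sample values are all illustrative assumptions, not the logic in processing.py.

```python
from statistics import mean


def anchor_relative_to_absolute(relative, absolute):
    """Rescale a relative series so its mean matches an absolute series.

    Hypothetical helper: both inputs are equal-length lists covering the
    same months. Mean matching is an assumption made for illustration.
    """
    if len(relative) != len(absolute):
        raise ValueError("series must cover the same months")
    rel_mean = mean(relative)
    if rel_mean == 0:
        # A flat-zero shape carries no trend information; return zeros.
        return [0.0] * len(relative)
    scale = mean(absolute) / rel_mean
    return [r * scale for r in relative]


gt_shape = [50, 100, 50]          # relative 0-100 index (made-up values)
gkp_volume = [1000, 1000, 1000]   # absolute monthly volume (made-up values)
anchored = anchor_relative_to_absolute(gt_shape, gkp_volume)
```

The rescaled series keeps the shape of the relative input (the middle month is still twice its neighbours) while its total matches the absolute series.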
How they interact
The blend logic is in processing.py (Phase 2 / 3 nodes coming). At a high level:
- GSC is the primary trend / impression signal for keywords with enough history.
- GT provides shape (seasonality, secular trend) and disambiguates spike vs. real change.
- GKP anchors absolute volume and caps the blend per-month.
- Jumpshot fills the long tail of older keywords where GSC alone is thin; it also feeds processed_organic_p directly.
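The per-month GKP cap in the list above can be sketched as follows. This is a simplified illustration, not the code in processing.py: the function name `cap_to_gkp` and the sample values are hypothetical, and the cap is treated as a plain clamp at 1.2× the GKP volume because the "soft" behaviour is not specified here.

```python
def cap_to_gkp(blended, gkp_monthly, cap_factor=1.2):
    """Clamp each month's blended volume to cap_factor times the GKP volume.

    Hypothetical helper: both inputs are equal-length lists of monthly
    values. A hard clamp stands in for the pipeline's soft cap, whose
    exact behaviour is an unknown here.
    """
    if len(blended) != len(gkp_monthly):
        raise ValueError("series must cover the same months")
    return [min(b, cap_factor * g) for b, g in zip(blended, gkp_monthly)]


blended = [900.0, 1500.0, 700.0]   # blended monthly volume (made-up values)
gkp = [1000.0, 1000.0, 500.0]      # GKP monthly volume (made-up values)
capped = cap_to_gkp(blended, gkp)
```

In this example the first month passes through unchanged because it is below the cap, while the second and third months are pulled down to 1.2× their GKP volume.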
The contradictions between sources are exactly where most of the Decisions live; resolving them is the highest-value Phase 3 work.
See also
- Architecture — full pipeline diagram
- Central hub table — where all source columns land
- kvprocessor — the program that pulls each source