processed_keyword_volume
SQL type: SimpleAggregateFunction(anyLast, Nullable(UInt64))
Table: keywords.keywords_data_local
Last validated: 2026-05-21
What it is
The customer-facing headline monthly search volume for a (keyword, country) pair: a 12-month rolling average computed over the blended volume series (combined_volume_data_sorted) after spike repair and GT/GKP blending. This is the number shown as "Volume" in the product UI.
How it's computed
In processing.py, the value is computed in two phases:
- Pre-gate estimate (around the outer gate at processing.py:KB-ANCHOR:volume-200-display-gate): an initial 12-month average from the raw GSC impressions_array, used to evaluate the display gate.
- Post-blend final value: after GT and GKP blending (processing.py:KB-ANCHOR:gt-blend-weights, gkp-blend-weights) and bot-spike repair, the 12-month average is recomputed from the final combined_volume_data_sorted and written back.
Forecasted months (if added in the same iteration) are excluded from the average; only observed months that have a published value contribute.
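The averaging rule above can be illustrated with a minimal sketch. This is not the code in processing.py: the MonthPoint type, the rolling_volume name, the choice to take the last 12 observed months before dropping unpublished ones, and the rounding to an integer are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MonthPoint:
    """Hypothetical container for one month of the blended series."""
    value: Optional[float]   # None means no published value for that month
    forecast: bool = False   # True for forecasted (not observed) months


def rolling_volume(series: Sequence[MonthPoint], window: int = 12) -> Optional[int]:
    """Average the last `window` observed months of an oldest-first series.

    Forecasted months are dropped first; months without a published value
    are then skipped, so a short history is averaged over what exists.
    """
    observed = [p for p in series if not p.forecast]
    recent = observed[-window:]
    values = [p.value for p in recent if p.value is not None]
    if not values:
        return None  # nothing observed: leave the nullable column NULL
    # Rounding mode is an assumption; the column is an unsigned integer.
    return round(sum(values) / len(values))
```

With this shape, a keyword that has only one observed month returns that month's value unchanged, which is the short-history behaviour described under Edge cases.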
Gates
- Approval gate: only computed when processed_ke_approved == 1 (the processed_ke_approved field gets its own page in Phase 4 · sub-batch 4c).
- Display gate (processing.py:KB-ANCHOR:volume-200-display-gate): customer-facing volume is only shown when processed_keyword_volume > 200 OR last_12months_js_avg_sv > 100. Below the gate, the row is still written, but downstream consumers (PE update, growth metrics) treat it as "not displayed". Memory: project_prod_trend_display_gate.
- Single-month staleness (processing.py:KB-ANCHOR:is-legit-impressions-3mo): keywords whose GSC data consists of a single month that is more than 3 months old are flagged not-legit upstream of the volume computation.
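The display gate is a simple OR of two strict thresholds. The sketch below restates it as a predicate; the function and constant names are hypothetical, and treating a NULL input as failing its branch is an assumption, not confirmed behaviour of processing.py.

```python
from typing import Optional

# Thresholds as stated for the display gate; the names are illustrative.
VOLUME_THRESHOLD = 200
JS_THRESHOLD = 100


def passes_display_gate(
    processed_keyword_volume: Optional[int],
    last_12months_js_avg_sv: Optional[float],
) -> bool:
    """Return True when the volume should be shown to customers.

    Both comparisons are strict (>). A missing (None) input is assumed
    to fail its branch rather than raise.
    """
    volume_ok = (
        processed_keyword_volume is not None
        and processed_keyword_volume > VOLUME_THRESHOLD
    )
    js_ok = (
        last_12months_js_avg_sv is not None
        and last_12months_js_avg_sv > JS_THRESHOLD
    )
    return volume_ok or js_ok
```

A row that fails the predicate is still written; the gate only affects whether downstream consumers treat the value as displayed.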
Edge cases
- New keywords with < 12 months of GSC: the average is computed over whatever observed months exist, which can produce surprisingly high numbers for keywords with a single recent spike.
- All-zero GSC + JS present: the JS baseline feeds in via last_12months_js_avg_sv. The display gate's OR branch lets such keywords surface if JS > 100.
- Bot-spike repair: spike-inflated GSC months are replaced with GT-implied values before the volume average (processing.py:KB-ANCHOR:gsc-spike-factor, spike-corroboration-thresholds). Keywords with corrected spikes will have a lower processed_keyword_volume than raw GSC would suggest; the original raw values are preserved in processed_volume_trend_meta.bot_spike_months.
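To make the bot-spike repair step concrete, here is a minimal sketch of the replacement and bookkeeping only. Spike detection (the spike factor and corroboration thresholds) is deliberately left out and the flagged months are taken as input. The function name, the month-keyed dictionaries, and the return shape are assumptions for illustration.

```python
from typing import Dict, Iterable, Tuple


def repair_spikes(
    gsc: Dict[str, float],
    gt_implied: Dict[str, float],
    spike_months: Iterable[str],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Replace flagged GSC months with GT-implied values.

    Returns (repaired_series, original_values). The second mapping plays
    the role of the preserved raw values, here keyed by month.
    The input `gsc` mapping is not modified.
    """
    repaired = dict(gsc)
    originals: Dict[str, float] = {}
    for month in spike_months:
        # Only repair months that exist in GSC and have a GT-implied value.
        if month in gsc and month in gt_implied:
            originals[month] = gsc[month]
            repaired[month] = gt_implied[month]
    return repaired, originals
```

Averaging the repaired series rather than the raw one is what makes the final volume lower than raw GSC would suggest for affected keywords.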
Downstream
- PE update (pe_update.py) reads this directly for top/bottom 400K ranking (processing.py:KB-ANCHOR:pe-cutoff-count, pe-growth-weighted-score).
- Growth metrics (processed_growth) are computed only when this > 200 (processing.py:KB-ANCHOR:growth-metrics-display-gate).
- Global aggregation: summed across all countries to produce processed_global_keyword_volume.
- Stage 4 upload ships to ES (-all-20220613, -lite-20241025) and CH keywords_metrics_local for customer display.
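The global aggregation amounts to a sum over per-country values. The sketch below shows one way to handle the nullable column; skipping NULL countries and returning None when no country has a value are assumptions, as is the function name.

```python
from typing import Mapping, Optional


def global_volume(per_country: Mapping[str, Optional[int]]) -> Optional[int]:
    """Sum per-country volumes for one keyword.

    Countries with no value (None) are skipped. If no country has a
    value, None is returned instead of 0 so "unknown" stays distinct
    from "zero volume".
    """
    present = [v for v in per_country.values() if v is not None]
    return sum(present) if present else None
```

For example, `global_volume({"us": 1000, "gb": 250, "de": None})` sums only the two countries that have a value.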
See also
- Central hub table: column lives here
- processing.py: producer
- GSC source · JS source · GKP source · GT source
- Decisions (Phase 3): volume-200-display-gate, gt-blend-weights, gkp-blend-weights, gsc-spike-factor