processed_keyword_volume

SQL type: SimpleAggregateFunction(anyLast, Nullable(UInt64))
Table: keywords.keywords_data_local
Last validated: 2026-05-21

What it is

The customer-facing headline monthly search volume for a (keyword, country) pair. It is a 12-month rolling average computed over the blended volume series (combined_volume_data_sorted) after spike repair and GT/GKP blending. This is the number that appears as "Volume" in the product UI.

How it's computed

In processing.py, computed in two phases:

  1. Pre-gate estimate (around the outer gate at processing.py:KB-ANCHOR:volume-200-display-gate) — initial 12-month average from raw GSC impressions_array, used to evaluate the display gate.
  2. Post-blend final value — after GT and GKP blending (processing.py:KB-ANCHOR:gt-blend-weights, gkp-blend-weights) and bot-spike repair, the 12-month average is re-computed from the final combined_volume_data_sorted and written back.

Forecasted months (if added in the same iteration) are excluded from the average — only observed months that have a published value contribute.
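
The averaging rule can be sketched as follows. This is a minimal illustration, not the code in processing.py: the function name rolling_12mo_average, the MonthPoint record, and its is_forecast flag are hypothetical. Taking the 12 most recent observed points (rather than the last 12 calendar months) and rounding to an integer are both assumptions, the latter based only on the UInt64 column type.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MonthPoint:
    """One month of the blended series (hypothetical shape)."""
    month: str                 # "YYYY-MM", so string order is chronological
    volume: Optional[float]    # None when no value was published
    is_forecast: bool = False  # True for forecasted months


def rolling_12mo_average(series: Sequence[MonthPoint]) -> Optional[int]:
    """Average the most recent 12 observed months; None if there are none."""
    observed = [
        p for p in sorted(series, key=lambda p: p.month)
        if not p.is_forecast and p.volume is not None
    ]
    window = observed[-12:]  # fewer than 12 points: average what exists
    if not window:
        return None
    # Rounding is an assumption; the real pipeline may truncate instead.
    return round(sum(p.volume for p in window) / len(window))
```

With three observed months of 100, 200, and 300 plus one forecasted month of 10000, the sketch returns 200: the forecasted month never enters the window.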

Gates

  • Approval gate — only computed when processed_ke_approved == 1 (the processed_ke_approved field gets its own page in Phase 4 · sub-batch 4c).
  • Display gate (processing.py:KB-ANCHOR:volume-200-display-gate): customer-facing volume is shown only when processed_keyword_volume > 200 OR last_12months_js_avg_sv > 100. Below the gate, the row is still written but downstream consumers (PE update, growth metrics) treat it as "not displayed". Memory project_prod_trend_display_gate.
  • Single-month staleness (processing.py:KB-ANCHOR:is-legit-impressions-3mo): keywords with single-month GSC older than 3 months are flagged not-legit upstream of the volume computation.
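
The display gate reduces to a two-branch predicate. The sketch below uses the thresholds stated above; the function name passes_display_gate and the constant names are hypothetical, and treating a NULL input as failing its branch is an assumption, not confirmed behavior of processing.py.

```python
from typing import Optional

VOLUME_GATE = 200  # processed_keyword_volume must exceed this
JS_AVG_GATE = 100  # last_12months_js_avg_sv must exceed this


def passes_display_gate(
    processed_keyword_volume: Optional[int],
    last_12months_js_avg_sv: Optional[float],
) -> bool:
    """True when either branch of the gate is satisfied (strict '>')."""
    # Assumption: a missing (NULL) value simply fails its own branch.
    volume_ok = (
        processed_keyword_volume is not None
        and processed_keyword_volume > VOLUME_GATE
    )
    js_ok = (
        last_12months_js_avg_sv is not None
        and last_12months_js_avg_sv > JS_AVG_GATE
    )
    return volume_ok or js_ok
```

Both comparisons are strict, so a keyword sitting at exactly 200 volume and exactly 100 JS average does not pass.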

Edge cases

  • New keywords with < 12 months of GSC — average is computed over whatever observed months exist; can produce surprisingly high numbers for keywords with a single recent spike.
  • All-zero GSC + JS present — JS baseline feeds in via last_12months_js_avg_sv. The display gate's OR branch lets such keywords surface if last_12months_js_avg_sv > 100.
  • Bot-spike repair — spike-inflated GSC months are replaced with GT-implied values before the volume average (processing.py:KB-ANCHOR:gsc-spike-factor, spike-corroboration-thresholds). Keywords with corrected spikes will have lower processed_keyword_volume than raw GSC would suggest; the original raw values are preserved in processed_volume_trend_meta.bot_spike_months.
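
The replacement step of bot-spike repair might look like the sketch below. Spike detection (the spike factor and corroboration thresholds) is deliberately left out: the flagged months are passed in as an input. The function name repair_spikes, the month-keyed dictionaries, and the shape of bot_spike_months (month mapped to its original raw value) are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def repair_spikes(
    gsc: Dict[str, float],
    gt_implied: Dict[str, float],
    spike_months: Iterable[str],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Replace flagged GSC months with GT-implied values.

    Returns the repaired series and a record of the raw values that were
    overwritten (a stand-in for processed_volume_trend_meta.bot_spike_months).
    """
    repaired = dict(gsc)  # copy so the raw input is left untouched
    bot_spike_months: Dict[str, float] = {}
    for month in spike_months:
        # Assumption: a month is repaired only if a GT-implied value exists.
        if month in gsc and month in gt_implied:
            bot_spike_months[month] = gsc[month]
            repaired[month] = gt_implied[month]
    return repaired, bot_spike_months


raw = {"2025-01": 300.0, "2025-02": 9000.0, "2025-03": 300.0}
fixed, meta = repair_spikes(raw, {"2025-02": 330.0}, ["2025-02"])
raw_avg = sum(raw.values()) / len(raw)        # 3200.0
fixed_avg = sum(fixed.values()) / len(fixed)  # 310.0
```

In this made-up series, one inflated month lifts the raw average to 3200, while the repaired series averages 310. That gap is why corrected keywords report a lower volume than raw GSC suggests.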

Downstream

  • PE update (pe_update.py) reads this directly for top/bottom 400K ranking (processing.py:KB-ANCHOR:pe-cutoff-count, pe-growth-weighted-score).
  • Growth metrics (processed_growth) are computed only when this > 200 (processing.py:KB-ANCHOR:growth-metrics-display-gate).
  • Global aggregation — summed across all countries to produce processed_global_keyword_volume.
  • Stage 4 upload ships to ES (-all-20220613, -lite-20241025) and CH keywords_metrics_local for customer display.
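
Two of the downstream rules above are simple enough to sketch. The function names are hypothetical, and skipping NULL country values in the global sum is an assumption; the source says only that values are summed across all countries.

```python
from typing import Mapping, Optional

GROWTH_GATE = 200  # growth metrics require processed_keyword_volume > 200


def should_compute_growth(processed_keyword_volume: Optional[int]) -> bool:
    """Growth gate: volume only, with no JS-based OR branch."""
    return (
        processed_keyword_volume is not None
        and processed_keyword_volume > GROWTH_GATE
    )


def global_keyword_volume(per_country: Mapping[str, Optional[int]]) -> int:
    """Sum per-country volumes for one keyword, ignoring missing values."""
    # Assumption: NULL country values contribute nothing to the total.
    return sum(v for v in per_country.values() if v is not None)
```

The growth gate is narrower than the display gate: a keyword that is displayed only because of its JS average still gets no growth metrics under this reading.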

See also