Spike corroboration thresholds (GT / GKP confirmation)
The second half of the in-blend bot-spike detector. Once GSC spike factor has identified a candidate spike, this rule asks the other sources: "do you also see this month as elevated?" If yes → real spike, leave it alone. If no → bot, substitute with GT-implied volume.
What it is
CORROBORATION_FACTOR = 3 # GT/GKP must be >=3x their median to corroborate
GT_HIGH_SIGNAL = 80 # GT value always considered high (GT capped at 100)
GT corroborates when its month value is at least min(3 × gt_median, 80). The min() exists because GT is normalized 0–100 by construction — without the cap, a keyword with median 40 would need GT = 120 (impossible) to corroborate.
GKP corroborates when its month value is at least 3 × gkp_median (no cap — GKP is absolute volume, not normalized).
The candidate spike is judged uncorroborated (and thus bot-driven) only when both sources fail to corroborate:
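gt_thresh = GT_HIGH_SIGNAL if gt_median == 0 else min(CORROBORATION_FACTOR * gt_median, GT_HIGH_SIGNAL)
if gt_val >= gt_thresh: return False # GT confirms: real spike
if gkp_median > 0 and gkp_val >= CORROBORATION_FACTOR * gkp_median: return False # GKP confirms: real spike
return True # neither source confirms: bot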
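Putting the two thresholds and the final decision together, a minimal self-contained sketch might look like the following. The argument list and the helper name `gt_threshold` are assumptions made for illustration; the real `is_uncorroborated_gsc_spike()` in processing.py may be shaped differently.

```python
CORROBORATION_FACTOR = 3  # GT/GKP must be >= 3x their median to corroborate
GT_HIGH_SIGNAL = 80       # GT value always considered high (GT is capped at 100)


def gt_threshold(gt_median: float) -> float:
    """Hypothetical helper: the GT value needed to corroborate a spike."""
    if gt_median <= 0:
        # No GT history: fall back to the high-signal value directly.
        return GT_HIGH_SIGNAL
    return min(CORROBORATION_FACTOR * gt_median, GT_HIGH_SIGNAL)


def is_uncorroborated_gsc_spike(gt_val: float, gt_median: float,
                                gkp_val: float, gkp_median: float) -> bool:
    """Return True when neither GT nor GKP confirms the candidate spike.

    The signature is assumed for this sketch, not taken from processing.py.
    """
    if gt_val >= gt_threshold(gt_median):
        return False  # GT confirms: real spike
    if gkp_median > 0 and gkp_val >= CORROBORATION_FACTOR * gkp_median:
        return False  # GKP confirms: real spike
    return True  # neither source sees the spike: treat as bot


# Flat GT and flat GKP on the candidate month: flagged as bot.
flat_month = is_uncorroborated_gsc_spike(gt_val=10, gt_median=10,
                                         gkp_val=1000, gkp_median=1000)
# GT triples on the candidate month: corroborated, so not flagged.
gt_surge = is_uncorroborated_gsc_spike(gt_val=30, gt_median=10,
                                       gkp_val=1000, gkp_median=1000)
```

Either source on its own is enough to clear the candidate, which is why the checks return early instead of being combined with `and`.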
How it's computed
Defined at processing.py:KB-ANCHOR:spike-corroboration-thresholds as module-level constants alongside GSC_SPIKE_FACTOR, and used inside is_uncorroborated_gsc_spike().
Why this choice
Both 3× and 80 are empirical, calibrated on past bot incidents. When real bot traffic hits a keyword:
- GT (which measures relative search interest, not impressions) almost never moves — bots don't search for the keyword in numbers that move the GT signal, they just slam its result page.
- GKP (which estimates monthly searches from auction data) lags 1–2 months and has historically been the more reliable source of disagreement with GSC for in-blend single-month bot detection.
A 3× corroboration threshold on either source is high enough that random noise rarely fires the corroborator on a non-spike month, but low enough that a real surge (which moves all sources together) reliably triggers it.
The GT high-signal floor of 80 exists because GT is capped at 100, so a literal 3 × median test becomes useless when the median is already high: once the median exceeds about 33, the test would demand a GT value above 100. A GT value of 80 or more is unambiguous. It means GT itself places this month among the keyword's peak-interest months, regardless of its own median. So we use min(3 × median, 80) to handle both peak-shaped GT histories (low median) and steady-elevated ones (high median).
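To see where the 80 value takes over from the 3 × median rule, the two thresholds can be compared for a few medians. This is an illustrative sketch only: the helper name `compare_thresholds` and the sample medians are hypothetical, not taken from processing.py.

```python
CORROBORATION_FACTOR = 3
GT_HIGH_SIGNAL = 80
GT_MAX = 100  # GT is normalized to 0-100


def compare_thresholds(gt_median: float) -> tuple[float, float, bool]:
    """Return (literal 3x threshold, effective threshold, literal reachable?)."""
    literal = CORROBORATION_FACTOR * gt_median
    effective = min(literal, GT_HIGH_SIGNAL)
    return literal, effective, literal <= GT_MAX


# Hypothetical medians: two peak-shaped histories, two steady-elevated ones.
rows = {m: compare_thresholds(m) for m in (5, 20, 40, 60)}

for median, (literal, effective, reachable) in rows.items():
    note = "" if reachable else " (literal test impossible on a 0-100 scale)"
    print(f"median={median:>3}  literal={literal:>4}  effective={effective:>3}{note}")
```

For the low medians the two thresholds coincide; for the high ones the literal threshold exceeds 100 and only the 80 value remains usable.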
Edge cases
- gt_median == 0: fall back to the 80 floor directly. A keyword with no GT history at all gets the strictest threshold.
- gkp_median == 0: the GKP corroboration check is skipped entirely (if gkp_median > 0 and ...). GT alone has to corroborate. This matches reality: a keyword with no GKP can't be cross-validated by GKP.
- Both sources at zero on the spike month: neither corroborates; bot flag fires. This is the canonical "GSC says X, nobody else says X" pattern.
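The edge cases above can be exercised with a compact stand-in for the check. The function name `is_uncorroborated`, its arguments, and all input values are hypothetical and chosen only to trigger each case.

```python
def is_uncorroborated(gt_val, gt_median, gkp_val, gkp_median):
    # Hypothetical stand-in for is_uncorroborated_gsc_spike(); arguments are assumed.
    gt_thresh = min(3 * gt_median, 80) if gt_median > 0 else 80
    if gt_val >= gt_thresh:
        return False  # GT corroborates
    if gkp_median > 0 and gkp_val >= 3 * gkp_median:
        return False  # GKP corroborates
    return True       # nobody corroborates: bot


cases = {
    # No GT history: only a GT value of 80 or more can corroborate.
    "gt_median=0, gt_val=50": is_uncorroborated(50, 0, 100, 100),
    "gt_median=0, gt_val=85": is_uncorroborated(85, 0, 100, 100),
    # No GKP history: GKP is skipped even if its month value is large.
    "gkp_median=0, gkp_val=5000": is_uncorroborated(10, 20, 5000, 0),
    # Both sources report zero on the spike month.
    "gt_val=0, gkp_val=0": is_uncorroborated(0, 20, 0, 500),
}

for label, flagged in cases.items():
    print(f"{label}: {'bot' if flagged else 'real spike'}")
```

Without the `gkp_median > 0` guard, the third case would compare against a threshold of zero and any GKP value would count as corroboration.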
See also
- GSC spike factor — the candidate-detection step that feeds this
- Bot-like threshold — the parallel cross-source path with different semantics (masks, doesn't substitute)
- bot_spike_months: meta record of substitutions