
Spike corroboration thresholds (GT / GKP confirmation)

This is the second half of the in-blend bot-spike detector. Once the GSC spike factor has identified a candidate spike, this rule asks the other sources: "do you also see this month as elevated?" If yes → real spike, leave it alone. If no → bot, substitute with GT-implied volume.

What it is

CORROBORATION_FACTOR = 3       # GT/GKP must be >=3x their median to corroborate
GT_HIGH_SIGNAL = 80            # GT value always considered high (GT capped at 100)

GT corroborates when its month value is at least min(3 × gt_median, 80). The min() exists because GT is normalized to 0–100 by construction: without the cap, a keyword with a GT median of 40 would need GT = 120, which is impossible, to corroborate.

GKP corroborates when its month value is at least 3 × gkp_median (no cap — GKP is absolute volume, not normalized).
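The two thresholds can be sketched as small helpers. This is a minimal illustration, not the code in processing.py: the function names gt_threshold and gkp_threshold are hypothetical, and the zero-median fallback follows the behavior described under Edge cases.

```python
CORROBORATION_FACTOR = 3   # constant from this page
GT_HIGH_SIGNAL = 80        # constant from this page


def gt_threshold(gt_median: float) -> float:
    """Hypothetical helper: GT value a month must reach to corroborate."""
    if gt_median <= 0:
        # No GT history: use the high-signal level directly.
        return GT_HIGH_SIGNAL
    return min(CORROBORATION_FACTOR * gt_median, GT_HIGH_SIGNAL)


def gkp_threshold(gkp_median: float) -> float:
    """Hypothetical helper: GKP volume a month must reach (no cap)."""
    return CORROBORATION_FACTOR * gkp_median


print(gt_threshold(10))      # 30: the 3x rule applies
print(gt_threshold(40))      # 80: capped, instead of an unreachable 120
print(gkp_threshold(1000))   # 3000: absolute volume, uncapped
```

With a GT median of 10 the 3× rule sets the bar; with a median of 40 the cap takes over.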

The candidate spike is judged uncorroborated (and thus bot-driven) only when both sources fail to corroborate:

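# gt_thresh = min(CORROBORATION_FACTOR * gt_median, GT_HIGH_SIGNAL), or GT_HIGH_SIGNAL when gt_median == 0
if gt_val >= gt_thresh:
    return False   # GT confirms: real spike
if gkp_median > 0 and gkp_val >= CORROBORATION_FACTOR * gkp_median:
    return False   # GKP confirms: real spike (check skipped when gkp_median == 0)
return True        # neither source confirms: bot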

How it's computed

Defined at processing.py:KB-ANCHOR:spike-corroboration-thresholds as module-level constants alongside GSC_SPIKE_FACTOR, and used inside is_uncorroborated_gsc_spike().

Why this choice

Both 3× and 80 are empirical, calibrated on past bot incidents. When real bot traffic hits a keyword:

  • GT (which measures relative search interest, not impressions) almost never moves — bots don't search for the keyword in numbers that move the GT signal, they just slam its result page.
  • GKP (which estimates monthly searches from auction data) lags 1–2 months and has historically been the source that most reliably disagrees with GSC, which makes it useful for in-blend single-month bot detection.

A 3× corroboration threshold on either source is high enough that random noise rarely fires the corroborator on a non-spike month, but low enough that a real surge (which moves all sources together) reliably triggers it.

The 80 GT high-signal floor exists because GT is capped at 100, so a literal 3 × median test cannot be satisfied once the median exceeds about 33 (3 × 34 = 102). A GT value of 80 or more is unambiguous: it means GT itself places this month among the keyword's peak-interest months, regardless of its own median. So we use min(3 × median, 80) to handle both peak-shaped GT histories (low median) and steadily elevated ones (high median).
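To see where the cap starts to bind, it can help to express the GT threshold as a multiple of the median. The sketch below is only arithmetic on the two constants. The helper name effective_multiple is hypothetical, and it assumes gt_median > 0.

```python
CORROBORATION_FACTOR = 3   # constant from this page
GT_HIGH_SIGNAL = 80        # constant from this page

# Median at which 3 x median reaches the cap (about 26.7).
crossover = GT_HIGH_SIGNAL / CORROBORATION_FACTOR


def effective_multiple(gt_median: float) -> float:
    """Hypothetical helper: GT threshold as a multiple of the median (median > 0)."""
    threshold = min(CORROBORATION_FACTOR * gt_median, GT_HIGH_SIGNAL)
    return threshold / gt_median


for gt_median in (5, 20, 40, 60):
    print(gt_median, round(effective_multiple(gt_median), 2))
# 5 3.0
# 20 3.0
# 40 2.0
# 60 1.33
```

Below a median of roughly 27 the full 3× multiple is required. Above it, the required multiple shrinks as the median rises, because reaching 80 is treated as sufficient on its own.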

Edge cases

  • gt_median == 0: fall back to the 80 floor directly, since min(3 × 0, 80) = 0 would otherwise let any GT value corroborate. A keyword with no GT history at all gets the strictest threshold.
  • gkp_median == 0: the GKP corroboration check is skipped entirely (if gkp_median > 0 and ...), so GT alone has to corroborate. This matches reality: a keyword with no GKP history can't be cross-validated by GKP.
  • Both sources at zero on the spike month: neither corroborates, so the bot flag fires. This is the canonical "GSC says X, nobody else says X" pattern.
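The edge cases above can be exercised with a small stand-in for is_uncorroborated_gsc_spike(). This is a sketch under stated assumptions, not the real function: the name is_uncorroborated, its signature, and the use of statistics.median over plain lists of monthly history are all hypothetical.

```python
from statistics import median

CORROBORATION_FACTOR = 3   # constant from this page
GT_HIGH_SIGNAL = 80        # constant from this page


def is_uncorroborated(gt_val, gt_history, gkp_val, gkp_history):
    """Illustrative stand-in; the real signature is not shown on this page."""
    gt_median = median(gt_history) if gt_history else 0
    gkp_median = median(gkp_history) if gkp_history else 0

    if gt_median > 0:
        gt_thresh = min(CORROBORATION_FACTOR * gt_median, GT_HIGH_SIGNAL)
    else:
        gt_thresh = GT_HIGH_SIGNAL  # no GT history: strictest threshold

    if gt_val >= gt_thresh:
        return False  # GT confirms: real spike
    if gkp_median > 0 and gkp_val >= CORROBORATION_FACTOR * gkp_median:
        return False  # GKP confirms: real spike
    return True       # neither source confirms: bot


# gt_median == 0: only a GT value of 80 or more corroborates.
print(is_uncorroborated(85, [0, 0, 0], 0, []))           # False
# gkp_median == 0: the GKP check is skipped, GT alone decides.
print(is_uncorroborated(50, [0, 0, 0], 500, [0, 0, 0]))  # True
# Both sources at zero on the spike month.
print(is_uncorroborated(0, [0, 0, 0], 0, [0, 0, 0]))     # True
# GT flat, but GKP at 3.5x its median corroborates.
print(is_uncorroborated(10, [10, 12, 8], 3500, [1000, 1100, 900]))  # False
```

Guarding the GKP branch on gkp_median > 0 matters: without the guard, a zero median would make any GKP value, including 0, count as corroboration.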

See also