GSC spike factor (in-blend single-month detection)

The first half of the in-blend bot-spike detector (path 1 of 2 in the bot-repair system). It decides whether a single GSC month is anomalous enough to be suspected of being bot-driven. It pairs with the spike corroboration thresholds, which then decide whether other sources confirm or deny the suspicion.

This path is distinct from the upstream cross-source masking: it runs during the GT-blend loop, replaces individual months with GT-implied values, and records the substitutions in the bot_spike_months meta field for inspection.

What it is

GSC_SPIKE_FACTOR = 10          # GSC must be >10x its median to be a spike
GSC_SPIKE_MIN_ABS = 50         # ignore spikes below 50 impressions

A GSC month qualifies as a candidate spike when both of the following hold:

  1. raw_gsc > 10 × gsc_median: relatively, more than 10× the keyword's typical GSC baseline
  2. raw_gsc > 50: absolutely, more than 50 impressions

Candidate spikes then go through corroboration: if GT or GKP agrees that the month is high, the spike is treated as real and kept. If neither corroborates, the month is flagged as bot-driven and replaced with the GT-implied volume.
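The keep-or-substitute decision can be sketched as below. This is an illustration under stated assumptions, not the code in processing.py: the function name repair_month, its parameters, and the shape of the record appended to the bot_spike_months list are all hypothetical, and the corroboration result is passed in as a plain boolean because its thresholds are documented on a separate page.

```python
def repair_month(month, raw_gsc, is_candidate, corroborated, gt_implied, bot_spike_months):
    """Hypothetical sketch: return the GSC value to use for this month.

    is_candidate  -- result of the spike-factor test described on this page
    corroborated  -- True if GT or GKP agrees the month is high (assumed input)
    gt_implied    -- GT-implied volume used as the substitute (assumed input)
    """
    if is_candidate and not corroborated:
        # Uncorroborated spike: record the substitution, then replace the value.
        # The record layout here is an assumption for illustration only.
        bot_spike_months.append(
            {"month": month, "raw_gsc": raw_gsc, "replaced_with": gt_implied}
        )
        return gt_implied
    # Not a candidate, or a corroborated (real) spike: keep the raw value.
    return raw_gsc


log = []
print(repair_month("2024-03", 4000, True, False, 120, log))  # 120
print(repair_month("2024-04", 4000, True, True, 120, log))   # 4000
print(len(log))                                              # 1
```

Only the uncorroborated candidate is substituted and logged; a corroborated spike passes through unchanged.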

How it's computed

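Defined at processing.py:KB-ANCHOR:gsc-spike-factor (module-level constants, lines 878–881). The early-exit guard inside is_uncorroborated_gsc_spike() returns False as soon as the median is non-positive or either the relative or the absolute test fails, so only months that pass both tests continue to corroboration: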

if gsc_median <= 0 or raw_gsc <= GSC_SPIKE_FACTOR * gsc_median or raw_gsc <= GSC_SPIKE_MIN_ABS:
    return False
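The guard can be exercised in isolation. The following is a minimal sketch rather than the code in processing.py: the helper name is_candidate_spike is hypothetical and covers only the candidate test, whereas the real is_uncorroborated_gsc_spike() goes on to check corroboration.

```python
GSC_SPIKE_FACTOR = 10   # relative threshold, as in the constants above
GSC_SPIKE_MIN_ABS = 50  # absolute floor, in impressions


def is_candidate_spike(raw_gsc: float, gsc_median: float) -> bool:
    """Hypothetical helper: True when a month passes both candidate tests."""
    if gsc_median <= 0:
        return False  # no usable baseline
    if raw_gsc <= GSC_SPIKE_FACTOR * gsc_median:
        return False  # relative test failed
    if raw_gsc <= GSC_SPIKE_MIN_ABS:
        return False  # absolute test failed
    return True


print(is_candidate_spike(1200, 100))  # True
print(is_candidate_spike(1000, 100))  # False: exactly 10x is not more than 10x
print(is_candidate_spike(40, 3))      # False: 40 does not exceed the 50 floor
```

Both comparisons are strict, so a month at exactly 10× the median, or at exactly 50 impressions, is not a candidate.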

Why this choice

The cutoff is empirical: 10× the median is the point where real bot incursions clearly stand out from normal month-over-month variability. Below 10×, false positives swamp true positives, because a brand keyword's organic growth or a legitimate news cycle can easily produce 3–5× spikes.

The 50-impression absolute floor exists because a relative ratio is meaningless on low-volume keywords: a keyword going from 2 to 25 impressions is a 12.5× "spike" by ratio, but it's just noise on a tiny baseline. The floor ensures the spike has to be both proportionally and absolutely significant.
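The low-volume case can be checked by hand. This short sketch assumes the baseline of 2 impressions is the keyword's gsc_median and evaluates the two tests separately:

```python
GSC_SPIKE_FACTOR = 10
GSC_SPIKE_MIN_ABS = 50

gsc_median = 2   # tiny baseline (assumed to be the median)
raw_gsc = 25     # the "spike" month

ratio = raw_gsc / gsc_median
passes_relative = raw_gsc > GSC_SPIKE_FACTOR * gsc_median  # 25 > 20
passes_absolute = raw_gsc > GSC_SPIKE_MIN_ABS              # 25 > 50

print(ratio)                             # 12.5
print(passes_relative, passes_absolute)  # True False
```

The month clears the relative test but not the floor, so it never becomes a candidate.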

Edge cases

  • gsc_median <= 0: the early exit returns False immediately. Keywords with no GSC history cannot trigger this path.
  • Sustained high traffic without prior baseline: a newly popular keyword with consistently high recent values may already have a high median, which masks ongoing bot traffic from this single-month detector. The cross-source masking catches the multi-month case.
  • Coincides with a real news event: corroboration (next page) is what disambiguates this. The spike factor alone is intentionally permissive.
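The first two edge cases can be reproduced with made-up numbers. In this sketch the helper passes_candidate_tests and the idea of taking the median over a plain list of monthly values are assumptions; the window that processing.py actually uses for gsc_median is not described on this page.

```python
from statistics import median

GSC_SPIKE_FACTOR = 10
GSC_SPIKE_MIN_ABS = 50


def passes_candidate_tests(raw_gsc, history):
    """Hypothetical helper; `history` is an assumed list of monthly GSC values."""
    gsc_median = median(history) if history else 0
    if gsc_median <= 0:
        return False
    return raw_gsc > GSC_SPIKE_FACTOR * gsc_median and raw_gsc > GSC_SPIKE_MIN_ABS


# No real GSC history: the median is 0, so even a huge month is not a candidate.
print(passes_candidate_tests(5000, [0, 0, 0, 0, 0, 0]))  # False

# Sustained inflation: four high months have already pulled the median up,
# so another high month is nowhere near 10x the baseline.
inflated = [40, 45, 9000, 9500, 9200, 9800]
print(median(inflated))                        # 9100.0
print(passes_candidate_tests(9800, inflated))  # False
```

The second case is why the multi-month scenario is left to the cross-source masking path.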

See also

  • Spike corroboration thresholds — the GT/GKP check that confirms or denies the suspicion
  • Bot-like threshold — the other bot-detection path (cross-source, upstream of blending)
  • bot_spike_months — the meta JSON field where matches are recorded
  • Memory: project_bot_detection_daily_impressions — explains why daily-granularity bot detection didn't add value