GSC spike factor (in-blend single-month detection)
The first half of the in-blend bot-spike detector (path 1 of 2 in the bot-repair system). It decides whether a single GSC month is anomalous enough to be suspected as bot-driven. It pairs with the spike corroboration thresholds, which then decide whether other sources confirm or deny the suspicion.
This path is distinct from the upstream cross-source masking: it runs during the GT-blend loop, replaces individual months with GT-implied values, and records the substitutions in the bot_spike_months meta field for inspection.
What it is
GSC_SPIKE_FACTOR = 10 # GSC must be >10x its median to be a spike
GSC_SPIKE_MIN_ABS = 50 # ignore spikes below 50 impressions
A GSC month qualifies as a candidate spike when both of the following hold:
- raw_gsc > 10 × gsc_median: the relative test, more than 10× the keyword's typical GSC baseline
- raw_gsc > 50: the absolute test, more than 50 impressions
Candidate spikes then go through corroboration — if GT or GKP agrees this month is high, the spike is real and stays. If neither corroborates, it's flagged as bot-driven and substituted with GT-implied volume.
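The candidate test and the decision that follows it can be sketched as below. This is a minimal illustration, not the code in processing.py: the helper names `is_candidate_spike` and `resolve_gsc_month`, the `corroborated` flag, the `gt_implied` argument, and the sample numbers are all hypothetical stand-ins. The actual GT/GKP corroboration rules are described on the corroboration thresholds page and are reduced here to a boolean.

```python
GSC_SPIKE_FACTOR = 10   # value documented on this page
GSC_SPIKE_MIN_ABS = 50  # value documented on this page


def is_candidate_spike(raw_gsc: float, gsc_median: float) -> bool:
    """Both tests must pass: relative (> 10x median) and absolute (> 50)."""
    if gsc_median <= 0:
        return False  # no baseline, so there is no ratio to test
    return (raw_gsc > GSC_SPIKE_FACTOR * gsc_median
            and raw_gsc > GSC_SPIKE_MIN_ABS)


def resolve_gsc_month(month, raw_gsc, gsc_median, corroborated,
                      gt_implied, bot_spike_months):
    """Hypothetical helper: return the value to blend for one month.

    `corroborated` stands in for the GT/GKP check; `bot_spike_months`
    is a plain list used here to mimic the meta field.
    """
    if not is_candidate_spike(raw_gsc, gsc_median):
        return raw_gsc              # ordinary month, keep as-is
    if corroborated:
        return raw_gsc              # another source agrees, spike stays
    bot_spike_months.append(month)  # record the substitution
    return gt_implied               # replace with the GT-implied value


flagged = []
kept = resolve_gsc_month("2024-03", 1200, 40, True, 55, flagged)
replaced = resolve_gsc_month("2024-04", 1200, 40, False, 55, flagged)
```

In this sketch both months are candidates (1200 exceeds both 10 × 40 and 50), but only the uncorroborated one is substituted and recorded.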
How it's computed
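At processing.py:KB-ANCHOR:gsc-spike-factor (module-level constants, lines 878–881). The early-exit guard inside is_uncorroborated_gsc_spike() returns False as soon as there is no positive median or either the relative or the absolute test fails, so only months that pass both tests reach the rest of the function: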
if gsc_median <= 0 or raw_gsc <= GSC_SPIKE_FACTOR * gsc_median or raw_gsc <= GSC_SPIKE_MIN_ABS:
    return False
Why this choice
The cutoff is empirical: it marks the noise-versus-signal boundary. 10× the median is the inflection where real bot incursions clearly dominate normal month-over-month variability. Below 10×, false positives swamp true positives, because a brand keyword's organic growth or a legitimate news cycle can easily produce 3–5× spikes.
The 50-impression absolute floor exists because a relative ratio is meaningless on low-volume keywords: a keyword going from 2 to 25 impressions is a 12.5× "spike" by ratio, but it's just noise on a tiny baseline. The floor ensures the spike has to be both proportionally and absolutely significant.
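The arithmetic behind the two thresholds can be checked with a few illustrative cases. The sketch below uses the documented constants; the `check` helper and the sample values (other than the 2 to 25 case from the text) are made up for illustration.

```python
GSC_SPIKE_FACTOR = 10   # value documented on this page
GSC_SPIKE_MIN_ABS = 50  # value documented on this page


def check(raw_gsc, gsc_median):
    """Return (ratio, passes_relative_test, passes_absolute_test)."""
    ratio = raw_gsc / gsc_median
    relative = raw_gsc > GSC_SPIKE_FACTOR * gsc_median
    absolute = raw_gsc > GSC_SPIKE_MIN_ABS
    return ratio, relative, absolute


# (raw_gsc, gsc_median) pairs; illustrative numbers only
cases = {
    "tiny_baseline": (25, 2),       # the 2 -> 25 example from the text
    "organic_growth": (800, 200),   # a 4x month, e.g. a news cycle
    "large_jump": (5000, 200),      # a 25x month on a real baseline
}
results = {label: check(*pair) for label, pair in cases.items()}

for label, (ratio, relative, absolute) in results.items():
    print(f"{label}: {ratio:.1f}x relative={relative} absolute={absolute}")
```

Only the last case passes both tests: the tiny-baseline month clears the ratio but not the floor, and the organic-growth month clears the floor but not the ratio.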
Edge cases
- gsc_median == 0: the early exit returns False immediately. Keywords with no GSC history cannot trigger this path.
- Sustained high traffic without prior baseline: a newly popular keyword with consistently high recent values may already have a high median, masking ongoing bot traffic from this single-month detector. The cross-source masking catches the multi-month case.
- Coincides with a real news event: corroboration (next page) is what disambiguates this. The spike factor alone is intentionally permissive.
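The first two edge cases can be reproduced with small made-up series. This is a sketch under one notable assumption: the median is taken over the whole series, including the month under test. This page does not state how gsc_median is actually windowed, and `is_candidate_spike` is a hypothetical stand-in for the early-exit guard.

```python
from statistics import median

GSC_SPIKE_FACTOR = 10   # value documented on this page
GSC_SPIKE_MIN_ABS = 50  # value documented on this page


def is_candidate_spike(raw_gsc, gsc_median):
    """Hypothetical stand-in for the early-exit guard described above."""
    if gsc_median <= 0:
        return False
    return (raw_gsc > GSC_SPIKE_FACTOR * gsc_median
            and raw_gsc > GSC_SPIKE_MIN_ABS)


# Illustrative monthly GSC impressions (not real data)
one_off = [40, 38, 45, 42, 900, 41]         # one anomalous month
sustained = [40, 900, 950, 880, 920, 910]   # high traffic persists
no_history = [0, 0, 0, 0, 0, 300]           # median is zero

one_off_hit = is_candidate_spike(max(one_off), median(one_off))
sustained_hit = any(
    is_candidate_spike(v, median(sustained)) for v in sustained
)
no_history_hit = is_candidate_spike(max(no_history), median(no_history))
```

In the sustained series the high months pull the median up to 905, so no month exceeds 10× the median and the single-month detector stays silent, which is the gap the cross-source masking is meant to cover.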
See also
- Spike corroboration thresholds: the GT/GKP check that confirms or denies the suspicion
- Bot-like threshold: the other bot-detection path (cross-source, upstream of blending)
- bot_spike_months: the meta JSON field where matches are recorded
- Memory: project_bot_detection_daily_impressions, which explains why daily-granularity bot detection didn't add value