GT/GKP staleness refresh signal
Two of the five non-exclusive refresh signals: one for GT and one for GKP. Either fires when the corresponding source's latest data point is at least 3 months behind the pipeline's reference date.
What it is
Signal names in signals[]: "stale_gt" and "stale_gkp".
Each fires independently. A keyword can carry both, just stale_gt, just stale_gkp, or neither. They contribute to the separate per-source priority lists (gt_components and gkp_components) — a stale GT doesn't push a GKP refresh and vice versa.
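A minimal sketch of the per-source routing described above, assuming a simple container for one keyword's signals. `RefreshSignals` and `attach_staleness` are hypothetical names used only for illustration, not the pipeline's actual types. Here `None` stands for "the signal did not fire".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RefreshSignals:
    """Hypothetical per-keyword container; the real pipeline's structure may differ."""
    signals: list[str] = field(default_factory=list)
    gt_components: list[float] = field(default_factory=list)
    gkp_components: list[float] = field(default_factory=list)


def attach_staleness(
    out: RefreshSignals,
    gt_component: Optional[float],
    gkp_component: Optional[float],
) -> RefreshSignals:
    # Each source is checked on its own; neither result affects the other list.
    if gt_component is not None:
        out.signals.append("stale_gt")
        out.gt_components.append(gt_component)    # feeds GT priority only
    if gkp_component is not None:
        out.signals.append("stale_gkp")
        out.gkp_components.append(gkp_component)  # feeds GKP priority only
    return out


# A keyword whose GT data is 6 months stale but whose GKP data is fresh:
r = attach_staleness(RefreshSignals(), gt_component=2.0, gkp_component=None)
print(r.signals, r.gt_components, r.gkp_components)  # ['stale_gt'] [2.0] []
```

A stale GT value lands only in `gt_components`, so it cannot raise the GKP refresh priority.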
How it's computed
At processing.py:KB-ANCHOR:refresh-signal-staleness:
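```python
# Evaluated separately for GT and for GKP; source_month_array holds that source's data-point months.
months_stale = _months_between(one_mo_ago, max(source_month_array))
if months_stale >= 3:
    # Linear ramp: 1.0 at 3 months, +1/3 per additional month, capped at 3.0.
    component = min(3.0, 1.0 + (months_stale - 3) / 3.0)
```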
| Months stale | Component value |
|---|---|
| < 3 | does not fire |
| 3 | 1.0 |
| 6 | 2.0 |
| 9 | 3.0 (cap) |
| 12+ | 3.0 (cap) |
Linear ramp from 1.0 at 3 months to 3.0 at 9 months (+1/3 per month), then saturates. The cap matches the standard "max single-signal contribution" of 3.0 used across signals.
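A sketch of the ramp as a standalone function that reproduces the table above, with `None` meaning the signal does not fire. `staleness_component` and the two constants are illustrative names, not identifiers taken from processing.py.

```python
from typing import Optional

STALE_THRESHOLD_MONTHS = 3      # hypothetical name for the firing threshold
MAX_SIGNAL_CONTRIBUTION = 3.0   # hypothetical name for the shared 3.0 cap


def staleness_component(months_stale: int) -> Optional[float]:
    """Component value for stale_gt / stale_gkp, or None when the signal does not fire."""
    if months_stale < STALE_THRESHOLD_MONTHS:
        return None
    # +1/3 per month past the threshold: 1.0 at 3 months, 2.0 at 6, 3.0 at 9.
    ramp = 1.0 + (months_stale - STALE_THRESHOLD_MONTHS) / 3.0
    return min(MAX_SIGNAL_CONTRIBUTION, ramp)


for m in (2, 3, 6, 9, 12):
    print(m, staleness_component(m))
```

Intermediate months fall on the line between table rows; 4 months stale, for instance, gives roughly 1.33.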
Why this choice
Three months is the practical refresh cadence. GT data via SearchAPI updates on a monthly tick, but capacity constraints and extraction latency mean a keyword pulled in month N typically has its most recent point at month N-1 or N-2. A gap of 3 or more months therefore means the keyword hasn't been touched in at least one full extraction cycle, which is the right "needs attention" threshold.
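To make the cadence arithmetic concrete, here is a sketch of the month difference used in the processing.py excerpt. The definitions of `_months_between` and `one_mo_ago` aren't shown on this page, so the sketch assumes a whole-calendar-month difference (day of month ignored) and takes `one_mo_ago` to be the first day of the previous month. `months_between` and `first_of_previous_month` are hypothetical stand-ins, not the pipeline's helpers.

```python
from datetime import date


def months_between(later: date, earlier: date) -> int:
    """Whole calendar months from `earlier` to `later`, ignoring the day (assumed semantics)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def first_of_previous_month(today: date) -> date:
    """Assumed meaning of `one_mo_ago`: the first day of the month before `today`."""
    if today.month == 1:
        return date(today.year - 1, 12, 1)
    return date(today.year, today.month - 1, 1)


# A run in June 2024 for a keyword whose GT data ends in February 2024.
one_mo_ago = first_of_previous_month(date(2024, 6, 15))    # 2024-05-01
source_month_array = [date(2024, 1, 1), date(2024, 2, 1)]
months_stale = months_between(one_mo_ago, max(source_month_array))
print(months_stale)  # 3, so the staleness signal fires
```

Under the same assumptions, a keyword refreshed on the normal cadence (latest point in April or May 2024) would come out at 1 or 0 months and not fire.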
The continuous ramp (rather than a step function) means a barely stale keyword (3 months) contributes less than a very stale one (9+ months), so the priority formula naturally prioritizes the most-overdue keywords inside the refresh pool.
The 9-month saturation point reflects that, beyond 9 months, the keyword will be refreshed in the next eligible cycle anyway; adding more weight wouldn't change the outcome.
Edge cases
- GT/GKP source genuinely empty: handled by the missing-source signal, not this one. Staleness only fires when the source has at least one data point.
- GSC has no staleness signal: GSC is treated as always fresh; if it's stale, that's a pipeline incident, not a refresh-detection condition. Same reasoning as `_CW_FRESH_GSC_PRESENT` in the confidence freshness ladder.
- Newly-extracted keyword: the first pull may show only 1–2 months of source data due to API truncation; the staleness signal correctly does not fire because `max(month_array)` is current.
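The three edge cases can be sketched as early returns in one hypothetical function. The source keys (`"gt"`, `"gkp"`, `"gsc"`), the `(signal_name, component)` return shape, and `staleness_signal` itself are assumptions for illustration. The month gap is computed as a whole-calendar-month difference, which is an assumption about how `_months_between` behaves.

```python
from datetime import date
from typing import Optional

# Only GT and GKP carry a staleness signal; GSC is deliberately absent (assumed keys).
STALENESS_SIGNALS = {"gt": "stale_gt", "gkp": "stale_gkp"}


def staleness_signal(
    source: str, month_array: list[date], one_mo_ago: date
) -> Optional[tuple[str, float]]:
    if source not in STALENESS_SIGNALS:
        return None  # e.g. GSC: treated as always fresh
    if not month_array:
        return None  # empty source: left to the missing-source signal
    latest = max(month_array)
    months_stale = (one_mo_ago.year - latest.year) * 12 + (one_mo_ago.month - latest.month)
    if months_stale < 3:
        return None  # covers newly extracted keywords whose few months are recent
    return STALENESS_SIGNALS[source], min(3.0, 1.0 + (months_stale - 3) / 3.0)


ref = date(2024, 5, 1)
print(staleness_signal("gsc", [date(2023, 1, 1)], ref))                 # None
print(staleness_signal("gt", [], ref))                                  # None
print(staleness_signal("gkp", [date(2024, 4, 1), date(2024, 5, 1)], ref))  # None
print(staleness_signal("gt", [date(2023, 8, 1)], ref))                  # ('stale_gt', 3.0)
```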
See also
- Missing-source refresh signal — fires when there's no data at all
- Confidence freshness subscore — the customer-facing freshness penalty (separate ladder)
- Archive: `_archive/refresh_detection.md`