GT/GKP staleness refresh signal

Two of the five non-exclusive refresh signals: one for GT, one for GKP. Each fires when its source's latest data point is 3 or more months behind the pipeline's reference date.

What it is

Signal names in signals[]: "stale_gt" and "stale_gkp".

Each fires independently. A keyword can carry both, just stale_gt, just stale_gkp, or neither. They contribute to the separate per-source priority lists (gt_components and gkp_components) — a stale GT doesn't push a GKP refresh and vice versa.
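The independence described above can be sketched as follows. This is a minimal illustration, not the pipeline's code: the function name collect_staleness and the returned dict shape are hypothetical, and only the signal names and the gt_components / gkp_components list names come from this page.

```python
from typing import Optional


def collect_staleness(gt_component: Optional[float],
                      gkp_component: Optional[float]) -> dict:
    """Route each staleness component to its own source's list.

    Hypothetical helper: a component of None means that source's
    staleness signal did not fire.
    """
    signals = []
    gt_components = []
    gkp_components = []

    # GT staleness only ever touches the GT priority list.
    if gt_component is not None:
        signals.append("stale_gt")
        gt_components.append(gt_component)

    # GKP staleness only ever touches the GKP priority list.
    if gkp_component is not None:
        signals.append("stale_gkp")
        gkp_components.append(gkp_component)

    return {
        "signals": signals,
        "gt_components": gt_components,
        "gkp_components": gkp_components,
    }
```

Under these assumptions, a keyword with a stale GT source and a fresh GKP source carries only "stale_gt" and leaves gkp_components empty.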

How it's computed

At processing.py:KB-ANCHOR:refresh-signal-staleness:

months_stale = _months_between(one_mo_ago, max(source_month_array))
if months_stale >= 3:
    component = min(3.0, 1.0 + (months_stale - 3) / 3.0)

Months stale    Component value
< 3             does not fire
3               1.0
6               2.0
9               3.0 (cap)
12+             3.0 (cap)

Linear ramp from 1.0 at 3 months to 3.0 at 9 months, then saturates. Cap matches the standard "max single-signal contribution" of 3.0 used across signals.
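To check the ramp against the table, here is a self-contained sketch of the same arithmetic. It assumes months are represented as datetime.date values and that month differences are counted in whole calendar months; months_between is a stand-in for _months_between, whose actual implementation is not shown on this page, and the constant names are illustrative.

```python
from datetime import date
from typing import Optional

STALE_THRESHOLD_MONTHS = 3  # threshold from the snippet above
MAX_COMPONENT = 3.0         # cap from the snippet above


def months_between(later: date, earlier: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored).

    Assumed behaviour; the real _months_between may differ.
    """
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def staleness_component(reference_month: date,
                        latest_source_month: date) -> Optional[float]:
    """Return the staleness component, or None if the signal does not fire."""
    months_stale = months_between(reference_month, latest_source_month)
    if months_stale < STALE_THRESHOLD_MONTHS:
        return None
    # Linear ramp: 1.0 at 3 months, +1.0 per further 3 months, capped at 3.0.
    return min(MAX_COMPONENT,
               1.0 + (months_stale - STALE_THRESHOLD_MONTHS) / 3.0)
```

With a reference month of 2024-10, a latest point at 2024-07 gives 1.0, 2024-04 gives 2.0, and anything at or before 2024-01 gives the capped 3.0, matching the table rows.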

Why this choice

3 months is the practical refresh cadence. GT data via SearchAPI updates on a monthly tick, but capacity constraints and extraction latency mean a keyword pulled in month N typically has its most recent point at month N-1 or N-2. A keyword that is 3 or more months stale hasn't been touched in at least one full extraction cycle, which is the right "needs attention" threshold.

The continuous ramp (rather than a step function) means a barely stale keyword (3 months) contributes less than a very stale one (9+ months), so the priority formula naturally prioritizes the most-overdue keywords inside the refresh pool.

The 9-month saturation point reflects that beyond 9 months, the keyword is going to be refreshed in the next eligible cycle anyway — adding more weight wouldn't change the outcome.

Edge cases

  • GT/GKP source genuinely empty — handled by missing-source signal, not this one. Staleness only fires when the source has some data.
  • GSC has no staleness signal — GSC is treated as always-fresh; if it's stale, that's a pipeline incident, not a refresh-detection condition. Same reasoning as _CW_FRESH_GSC_PRESENT in the confidence freshness ladder.
  • Newly-extracted keyword — first pull may show only 1–2 months of source data due to API truncation; the staleness signal correctly does not fire because max(month_array) is current.
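The first and third edge cases can be illustrated with a small sketch. It assumes the source's months arrive as a sequence of datetime.date values; the function name source_staleness and the explicit empty-source guard are hypothetical, since this page does not show how the real code skips empty sources.

```python
from datetime import date
from typing import Optional, Sequence


def source_staleness(reference_month: date,
                     source_months: Sequence[date]) -> Optional[float]:
    """Staleness component for one source, or None if the signal does not fire."""
    if not source_months:
        # Empty source: left to the missing-source signal, not staleness.
        return None

    # Only the most recent data point matters, not how much history exists.
    latest = max(source_months)
    months_stale = ((reference_month.year - latest.year) * 12
                    + (reference_month.month - latest.month))
    if months_stale < 3:
        return None
    return min(3.0, 1.0 + (months_stale - 3) / 3.0)


# Newly extracted keyword: only two months of data, but the latest is current.
fresh_short_history = [date(2024, 8, 1), date(2024, 9, 1)]
print(source_staleness(date(2024, 9, 1), fresh_short_history))  # None
```

Because the check looks only at the latest month, a short but current history does not fire, while a long history that ends 3 or more months before the reference month does.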

See also