Skip to content

Refresh priority formula

How the five refresh signals' contributions get combined into the single per-keyword priority score that drives which keywords get GT/GKP refreshed first. Operates on the two separate component lists (gt_components, gkp_components) and produces three numeric outputs: gt_priority, gkp_priority, and the headline priority = max(both).

What it is

volume_factor = min(1.0, log2(volume + 1) / 17.0)
scale         = 0.3 + 0.7 * volume_factor              # range [0.3, 1.0]

gt_priority   = min(10.0, sum(gt_components)  * scale)
gkp_priority  = min(10.0, sum(gkp_components) * scale)
priority      = max(gt_priority, gkp_priority)

Two-step shape: sum the contributions, then scale by a volume-derived factor, then cap at 10.0.

How it's computed

At processing.py:KB-ANCHOR:refresh-priority-formula. The component lists are populated by the five signal blocks above; this block runs once after all five have had a chance to fire.

processed_keyword_volume volume_factor scale
0 0.0 0.30
100 ~0.39 ~0.57
1,000 ~0.59 ~0.71
10,000 ~0.78 ~0.85
131,072 (2^17) 1.00 1.00
≥ 131,072 1.00 1.00

A keyword at volume 0 retains 30% of its raw component sum; a keyword at ~131K volume retains 100%.

Why this choice

Additive (sum) over the components, not max or geometric mean. This is deliberate: each signal captures a different reason to refresh, and they're meant to stack. A keyword that's stale AND surging AND has missing GKP is more urgent than any single signal alone — additive composition reflects that.

The per-signal min(3.0, ...) cap on each contribution prevents any one signal from dominating; the overall min(10.0, ...) cap keeps the score bounded for tier assignment and downstream ordering.

Volume scaling exists because GT capacity is the binding constraint. The function log2(v+1) / 17 maps the volume range [0, ~131K+] smoothly onto [0, 1] — log because keyword volumes are heavy-tailed, and 17 because 2^17 ≈ 131K happens to be where the curve plateaus naturally (above that, more volume doesn't make a refresh more urgent).

The 0.3 + 0.7 * volume_factor mapping leaves a 30% floor for low-volume keywords: they still get some priority. A pure multiplication by volume_factor would have driven low-volume keywords to ~0 priority and they'd never refresh. The floor reflects that low-volume tail keywords still need to be touched eventually, just not at the front of the queue.

Edge cases

  • No signals fire at all — the function returns None before reaching the priority block (lines 800–814). Keyword carries no refresh metadata; nothing scheduled.
  • All contributions to one side, none to the other — both gt_priority and gkp_priority are computed independently, but only the side with components > 0 sets needs_gt / needs_gkp. The other side gets 0 priority and is ignored downstream.
  • Score hits the 10.0 cap — happens for keywords with several strong signals + high volume. Multiple keywords can share priority 10.0; ordering among them inside tier 1 falls back to insertion order in the extraction queue.

See also