Refresh priority formula
How the five refresh signals' contributions get combined into the single per-keyword priority score that drives which keywords get GT/GKP refreshed first. Operates on the two separate component lists (gt_components, gkp_components) and produces three numeric outputs: gt_priority, gkp_priority, and the headline priority = max(both).
What it is
volume_factor = min(1.0, log2(volume + 1) / 17.0)
scale = 0.3 + 0.7 * volume_factor # range [0.3, 1.0]
gt_priority = min(10.0, sum(gt_components) * scale)
gkp_priority = min(10.0, sum(gkp_components) * scale)
priority = max(gt_priority, gkp_priority)
Three-step shape: sum the contributions, scale by a volume-derived factor, then cap at 10.0.
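As a minimal sketch, the formula above can be wrapped in a standalone function. The name refresh_priority and its signature are assumptions for illustration; they are not taken from processing.py.

```python
import math


def refresh_priority(gt_components, gkp_components, volume):
    """Illustrative sketch of the formula; name and signature are hypothetical."""
    # Volume factor grows with log2(volume + 1) and is clamped at 1.0.
    volume_factor = min(1.0, math.log2(volume + 1) / 17.0)
    # Map the factor onto [0.3, 1.0] so low-volume keywords keep a floor.
    scale = 0.3 + 0.7 * volume_factor
    # Sum each side's contributions, scale, then cap at 10.0.
    gt_priority = min(10.0, sum(gt_components) * scale)
    gkp_priority = min(10.0, sum(gkp_components) * scale)
    # The headline priority is the larger of the two sides.
    return gt_priority, gkp_priority, max(gt_priority, gkp_priority)


# Example values are made up for illustration.
gt, gkp, priority = refresh_priority([3.0, 2.0], [1.5], volume=1000)
print(round(gt, 2), round(gkp, 2), round(priority, 2))
```

Both sides share the same scale, so the headline priority always comes from whichever side has the larger component sum.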
How it's computed
At processing.py:KB-ANCHOR:refresh-priority-formula. The component lists are populated by the five signal blocks above; this block runs once after all five have had a chance to fire.
| processed_keyword_volume | volume_factor | scale |
|---|---|---|
| 0 | 0.0 | 0.30 |
| 100 | ~0.39 | ~0.57 |
| 1,000 | ~0.59 | ~0.71 |
| 10,000 | ~0.78 | ~0.85 |
| 131,072 (2^17) | 1.00 | 1.00 |
| ≥ 131,072 | 1.00 | 1.00 |
A keyword at volume 0 retains 30% of its raw component sum; a keyword at ~131K volume retains 100%.
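The table values can be reproduced with a few lines of Python. This is a sketch that applies the two formulas directly; the helper name volume_scale is hypothetical.

```python
import math


def volume_scale(volume):
    """Return (volume_factor, scale) for a keyword volume. Hypothetical helper."""
    volume_factor = min(1.0, math.log2(volume + 1) / 17.0)
    return volume_factor, 0.3 + 0.7 * volume_factor


# Same volumes as the table, plus one value well past the clamp.
rows = [(v, *volume_scale(v)) for v in (0, 100, 1_000, 10_000, 131_072, 1_000_000)]

for volume, factor, scale in rows:
    print(f"{volume:>9,} | {factor:.2f} | {scale:.2f}")
```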
Why this choice
Additive (sum) over the components, not max or geometric mean. This is deliberate: each signal captures a different reason to refresh, and they're meant to stack. A keyword that's stale AND surging AND has missing GKP is more urgent than any single signal alone — additive composition reflects that.
The per-signal min(3.0, ...) cap on each contribution prevents any one signal from dominating; the overall min(10.0, ...) cap keeps the score bounded for tier assignment and downstream ordering.
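To make the stacking and the two caps concrete, the sketch below compares additive composition with a max-based one. The signal names are taken from this page, but the raw values and the helper cap_contribution are assumptions made up for illustration.

```python
def cap_contribution(raw_score):
    """Per-signal cap: no single signal contributes more than 3.0."""
    return min(3.0, raw_score)


# Hypothetical raw signal strengths for one keyword (illustrative values only).
raw_signals = {"staleness": 7.5, "surge": 2.0, "missing": 1.0}

components = [cap_contribution(v) for v in raw_signals.values()]
additive = sum(components)   # signals stack: 3.0 + 2.0 + 1.0
strongest = max(components)  # what a max-based composition would keep

# Five signals all at their cap sum to 15.0, so the overall cap still matters.
# (Scale is taken as 1.0 here, i.e. a keyword at or above ~131K volume.)
all_capped = min(10.0, sum([3.0] * 5))

print(components, additive, strongest, all_capped)
```

With these values the additive score is twice the max-based one, which is the stacking behaviour the paragraph above describes.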
Volume scaling exists because GT capacity is the binding constraint, so scarce refresh slots should go to higher-volume keywords first. The function log2(v+1) / 17 maps the volume range [0, ~131K] smoothly onto [0, 1]. It is logarithmic because keyword volumes are heavy-tailed, and the divisor is 17 because 2^17 ≈ 131K is where the factor reaches 1.0 and the min(1.0, ...) clamp takes over (above that, more volume doesn't make a refresh more urgent).
The 0.3 + 0.7 * volume_factor mapping leaves a 30% floor for low-volume keywords: they still get some priority. A pure multiplication by volume_factor would have driven low-volume keywords to ~0 priority and they'd never refresh. The floor reflects that low-volume tail keywords still need to be touched eventually, just not at the front of the queue.
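The effect of the floor is easiest to see side by side with the rejected alternative. The sketch below assumes a made-up component sum of 6.0 and compares the floored scale against a pure multiplication by volume_factor.

```python
import math


def volume_factor(volume):
    return min(1.0, math.log2(volume + 1) / 17.0)


component_sum = 6.0  # hypothetical raw component sum, for illustration only

rows = []
for volume in (0, 10, 100_000):
    vf = volume_factor(volume)
    with_floor = min(10.0, component_sum * (0.3 + 0.7 * vf))  # formula on this page
    no_floor = min(10.0, component_sum * vf)                  # rejected alternative
    rows.append((volume, with_floor, no_floor))
    print(volume, round(with_floor, 2), round(no_floor, 2))
```

At volume 0 the pure multiplication yields exactly 0, while the floored version still keeps 30% of the component sum, so the keyword stays in the queue.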
Edge cases
- No signals fire at all: the function returns None before reaching the priority block (lines 800–814). Keyword carries no refresh metadata; nothing scheduled.
- All contributions to one side, none to the other: both gt_priority and gkp_priority are computed independently, but only the side with components > 0 sets needs_gt/needs_gkp. The other side gets 0 priority and is ignored downstream.
- Score hits the 10.0 cap: happens for keywords with several strong signals + high volume. Multiple keywords can share priority 10.0; ordering among them inside tier 1 falls back to insertion order in the extraction queue.
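The three edge cases can be sketched as follows. This is not the code in processing.py: the function name score_keyword, the dict return shape, and the reading of "components > 0" as "the component list is non-empty" are all assumptions.

```python
import math


def score_keyword(gt_components, gkp_components, volume):
    """Hypothetical wrapper; the return shape is an assumption."""
    # Edge case 1: no signal fired on either side, so nothing is scheduled.
    if not gt_components and not gkp_components:
        return None

    volume_factor = min(1.0, math.log2(volume + 1) / 17.0)
    scale = 0.3 + 0.7 * volume_factor
    gt_priority = min(10.0, sum(gt_components) * scale)
    gkp_priority = min(10.0, sum(gkp_components) * scale)

    return {
        # Edge case 2: only a side with components sets its needs_* flag
        # (assumed here to mean a non-empty component list).
        "needs_gt": bool(gt_components),
        "needs_gkp": bool(gkp_components),
        "gt_priority": gt_priority,
        "gkp_priority": gkp_priority,
        # Edge case 3: both sides are capped at 10.0, so ties at 10.0 are possible.
        "priority": max(gt_priority, gkp_priority),
    }


print(score_keyword([], [], volume=500))
print(score_keyword([2.0], [], volume=0))
print(score_keyword([3.0, 3.0, 3.0, 3.0], [], volume=2 ** 17)["priority"])
```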
See also
- GT tier assignment — applies the priority and single-spike override
- The five contributing signals: surge · staleness · divergence · missing · shape-mismatch
- Archive: _archive/refresh_detection.md