Monitoring snapshots: record_*_snapshot.py
Code: record_confidence_snapshot.py, record_prediction_coverage_snapshot.py, record_bot_spike_counts_snapshot.py (all in project root).
Last validated: 2026-05-21
What they do
Three small Python scripts that run once per hourly cycle (after pe_update.py, before upload_backend.sh) and write one row-set per iteration to history tables on isog. They never modify the hub — they're pure observability emitters.
All three share the same patterns:
- Scatter-gather across all 200 shards × 256 ranges of keywords.keywords_data_local
- Idempotent DDL via CREATE TABLE IF NOT EXISTS on first run
- 3-retry insert with exponential back-off; on full failure the snapshot is skipped (history just misses one row for that cycle)
- Dated logs at /home/data/{confidence_snapshot_logs,prediction_coverage_snapshot_logs,bot_spike_counts_logs}/
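The retry-then-skip pattern from the list above can be sketched as follows. This is a minimal illustration, not the scripts' actual code: `insert_with_retry`, `insert_fn`, `base_delay_s` and `sleep` are hypothetical names, "3-retry" is read as three attempts in total, and the doubling schedule is an assumption (the 5 s starting delay is taken from the failure-modes table).

```python
import time
from typing import Callable, Sequence


def insert_with_retry(
    insert_fn: Callable[[Sequence[tuple]], None],
    rows: Sequence[tuple],
    retries: int = 3,
    base_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call insert_fn(rows) up to `retries` times.

    Returns True on the first success and False if every attempt fails,
    in which case the caller skips this cycle's snapshot.
    """
    for attempt in range(retries):
        try:
            insert_fn(rows)
            return True
        except Exception as exc:  # hypothetical: any insert error is retried
            print(f"insert attempt {attempt + 1}/{retries} failed: {exc}")
            if attempt < retries - 1:
                # Assumed schedule: the delay doubles each time (5 s, 10 s).
                sleep(base_delay_s * 2 ** attempt)
    return False
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; a caller would log and move on when it returns False.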
record_confidence_snapshot.py
Bins approved keywords by confidence score (0–100) into 10 buckets of width 10, computes counts and sum-of-confidence per (scope × bucket), inserts 110 rows per run (1 timestamp × 11 scopes [global + top-10 countries] × 10 buckets) into keywords_volume.confidence_distribution_history.
| Anchor | Constant | Note |
|---|---|---|
| confidence-snapshot-top-countries | TOP_N_COUNTRIES = 10 | Differs from the bot-spike snapshot's 25; the difference is deliberate |
Schema:
CREATE TABLE keywords_volume.confidence_distribution_history (
snapshot_at DateTime,
scope LowCardinality(String), -- 'global' or country code
bucket UInt8, -- 0–9 (× 10 = lower edge of the score bucket)
cnt UInt64,
sum_conf Float64
) ENGINE = MergeTree
ORDER BY (snapshot_at, scope, bucket)
Consumed by the demo app's api_confidence.py.
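The binning described in this section can be sketched in plain Python. This is an illustration under assumptions, not the script itself: the real aggregation runs as a scatter-gather query, `bucket_of` and `confidence_rows` are hypothetical names, a score of exactly 100 is assumed to fall into the top bucket, and empty buckets are assumed to be emitted as zero rows so that every run yields 11 × 10 = 110 rows.

```python
from collections import defaultdict
from typing import Iterable, List, Sequence, Tuple

N_BUCKETS = 10     # bucket indices 0..9
BUCKET_WIDTH = 10  # each bucket spans 10 confidence points


def bucket_of(confidence: float) -> int:
    """Map a 0-100 confidence score to a bucket index 0-9."""
    # Assumption: a score of exactly 100 is folded into the top bucket.
    return min(int(confidence // BUCKET_WIDTH), N_BUCKETS - 1)


def confidence_rows(
    snapshot_at: str,
    keywords: Iterable[Tuple[str, float]],  # (country, confidence) pairs
    top_countries: Sequence[str],
) -> List[tuple]:
    """Build (snapshot_at, scope, bucket, cnt, sum_conf) rows."""
    acc = defaultdict(lambda: [0, 0.0])  # (scope, bucket) -> [cnt, sum_conf]
    top = set(top_countries)
    for country, conf in keywords:
        bucket = bucket_of(conf)
        # Every keyword counts globally; only top countries get their own scope.
        for scope in ("global", country):
            if scope == "global" or scope in top:
                acc[(scope, bucket)][0] += 1
                acc[(scope, bucket)][1] += conf
    rows = []
    for scope in ["global", *top_countries]:
        for bucket in range(N_BUCKETS):
            cnt, sum_conf = acc[(scope, bucket)]
            rows.append((snapshot_at, scope, bucket, cnt, sum_conf))
    return rows
```

Storing `cnt` and `sum_conf` rather than a mean lets a consumer recompute the average confidence for any combination of buckets or scopes.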
record_prediction_coverage_snapshot.py
Counts approved keywords per country whose processed_volume_trend JSON contains any month key greater than the current YYYYMM — i.e. keywords with a usable 12-month forecast attached. One row per country per run.
No load-bearing constants — uses today.strftime('%Y%m') dynamically. Nothing to anchor here.
Schema:
CREATE TABLE keywords_volume.prediction_coverage_history (
snapshot_at DateTime,
country LowCardinality(String),
cnt UInt64
) ENGINE = MergeTree
ORDER BY (snapshot_at, country)
Consumed by the demo app's api_prediction_coverage.py.
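The "any month key greater than the current YYYYMM" test can be sketched as below. The helper names `has_forecast` and `coverage_by_country` are hypothetical, and the sketch assumes processed_volume_trend is a JSON object keyed by zero-padded YYYYMM strings, so that string comparison matches chronological order; the real script evaluates this inside its scatter-gather query.

```python
import json
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def has_forecast(trend_json: str, today: date) -> bool:
    """True if the trend JSON has at least one month key after the current month."""
    current = today.strftime("%Y%m")
    trend = json.loads(trend_json) if trend_json else {}
    # Assumption: keys are 'YYYYMM' strings, so '>' compares chronologically.
    return any(month > current for month in trend)


def coverage_by_country(
    keywords: Iterable[Tuple[str, str]],  # (country, processed_volume_trend JSON)
    today: date,
) -> Dict[str, int]:
    """Count keywords with a forecast per country (one history row each)."""
    counts: Counter = Counter()
    for country, trend_json in keywords:
        if has_forecast(trend_json, today):
            counts[country] += 1
    return dict(counts)
```

Because the cut-off is derived from the run date, the same keyword can drop out of the count once its last forecast month becomes the current month.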
record_bot_spike_counts_snapshot.py
Two parallel scatter-gather scans:
- spiked_kw_count: keywords matching the spike predicate (GSC > SPIKE_MIN_RATIO × median AND GSC > SPIKE_MIN_ABS AND GT/GKP < p75). Same predicate as the /spikes demo page so the snapshot and the page stay in sync.
- bot_corrected_kw_count: keywords with non-empty bot_spike_months in processed_volume_trend_meta (i.e. processing.py repaired a bot-driven spike for them).
Aggregated globally + per top-25 country. ~26 rows per run (1 global + 25 countries).
| Anchor | Constants | Note |
|---|---|---|
| bot-spike-snapshot-predicate | SPIKE_MIN_RATIO = 10, SPIKE_MIN_ABS = 50 | Same defaults as /spikes page; tight coupling |
| bot-spike-snapshot-top-countries | TOP_N_COUNTRIES = 25 | Wider than the confidence snapshot's 10 because bot patterns are country-localised |
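As a rough sketch of how the predicate's three conditions combine, consider the function below. The page does not say which series the median and p75 are computed over, nor how the GT/GKP term is formed, so the function takes them as precomputed inputs; `is_spike`, `gsc_median`, `reference` and `reference_p75` are hypothetical names, and only the two constants come from the table above.

```python
SPIKE_MIN_RATIO = 10  # GSC must exceed 10x its median
SPIKE_MIN_ABS = 50    # and exceed this absolute floor


def is_spike(
    gsc: float,
    gsc_median: float,
    reference: float,      # assumed: the GT/GKP value for the same keyword
    reference_p75: float,  # assumed: the 75th percentile that value is compared to
) -> bool:
    """All three conditions must hold for a keyword to count as spiked."""
    return (
        gsc > SPIKE_MIN_RATIO * gsc_median
        and gsc > SPIKE_MIN_ABS
        and reference < reference_p75
    )
```

The absolute floor keeps low-volume keywords from qualifying on ratio alone: a GSC value of 40 against a median of 1 passes the ratio test but fails `SPIKE_MIN_ABS`.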
Schema:
CREATE TABLE keywords_volume.bot_spike_counts_history (
snapshot_at DateTime,
scope LowCardinality(String),
spiked_kw_count UInt64,
bot_corrected_kw_count UInt64
) ENGINE = MergeTree
ORDER BY (snapshot_at, scope)
Failure modes (all 3)
| Symptom | Cause | Behaviour |
|---|---|---|
| Scatter-gather shard timeout | CH replica slow | Logged, that shard's contribution missing from the row |
| Insert into keywords_volume.*_history fails | isog write outage | 3 retries with 5 s back-off, then skip this cycle's snapshot |
| Snapshot script raises uncaught exception | Bug, OOM, etc. | keep_running.sh has no early-exit — next stages still run; one row missing from history |
A few missed snapshots are harmless; the time-series history just has a gap.
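The first row of the table (a slow shard is logged and its contribution is simply missing) can be sketched with a tolerant gather. This is an illustration only: `gather_counts` and `query_shard` are hypothetical names, a thread pool is assumed as the fan-out mechanism, and each shard is assumed to return a single count.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List, Tuple

log = logging.getLogger("snapshot")


def gather_counts(
    query_shard: Callable[[int], int],
    shards: Iterable[int],
    max_workers: int = 16,
) -> Tuple[int, List[int]]:
    """Sum per-shard counts, skipping shards whose query raises.

    Returns (total, failed_shards); a failed shard is logged and left out
    of the total rather than aborting the whole snapshot.
    """
    total = 0
    failed: List[int] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(query_shard, s): s for s in shards}
        for fut in as_completed(futures):
            shard = futures[fut]
            try:
                total += fut.result()
            except Exception as exc:  # e.g. a timeout from a slow replica
                log.warning("shard %s failed: %s", shard, exc)
                failed.append(shard)
    return total, sorted(failed)
```

Returning the failed shard list alongside the total lets the caller record how incomplete a given row is, which the history tables as documented here do not capture.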
See also
- keep_running.sh — orchestrator that invokes all three
- Central hub table — read surface
- demo-app — consumer of these history tables (Phase 5)
- record_bot_spike_counts_snapshot.py:KB-ANCHOR:bot-spike-snapshot-predicate is the same spike predicate used by /spikes in the demo app; changes need to be applied in both places.