Methodology

Exactly how the regime label for a given symbol is computed. If anything on this page becomes wrong, it's a bug — tell us.

The pipeline

For each (symbol, timeframe):

Fetch the N = 320 most recent OHLC bars from Binance (Coinbase and Bybit serve as fallback sources and also feed cross-exchange consensus).
Compute log returns: r_t = ln(close_t / close_{t-1}).
Rolling realized volatility over the last 30 bars: RV_t = stdev(r_{t-29}, …, r_t) × √(bars_per_year), where bars_per_year is the number of bars of this interval in one year. Annualizing makes RV comparable across timeframes.
Rolling baseline: the mean μ_t and standard deviation σ_t of the last 240 RV values.
z-score: z_t = (RV_t − μ_t) / σ_t.
Feed z_t to the FSM, whose state transitions are gated by thresholds plus hysteresis (see below).
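
A minimal Python sketch of the return, RV, baseline and z-score steps above. It assumes a 24/7 market (365 × 24 × 60 minutes per year), sample standard deviations, and a baseline window that includes the current RV value; those choices and all function names are illustrative assumptions, not the service's code.

```python
import math
import statistics

RV_WINDOW = 30         # returns in the realized-vol window (from the text)
BASELINE_WINDOW = 240  # RV values in the rolling baseline (from the text)


def bars_per_year(interval_minutes: float) -> float:
    # Assumption: crypto trades 24/7, so one year = 365 * 24 * 60 minutes.
    return 365 * 24 * 60 / interval_minutes


def log_returns(closes):
    # r_t = ln(close_t / close_{t-1}); yields one fewer value than closes.
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def realized_vol(returns, interval_minutes):
    # Annualized stdev of the last 30 returns; None until the window is full.
    scale = math.sqrt(bars_per_year(interval_minutes))
    out = []
    for t in range(len(returns)):
        if t + 1 < RV_WINDOW:
            out.append(None)
        else:
            out.append(statistics.stdev(returns[t - RV_WINDOW + 1 : t + 1]) * scale)
    return out


def z_scores(rv):
    # z_t = (RV_t - mu_t) / sigma_t over the last 240 RV values (assumed to include RV_t).
    out = []
    for t, value in enumerate(rv):
        history = [v for v in rv[max(0, t - BASELINE_WINDOW + 1) : t + 1] if v is not None]
        if value is None or len(history) < BASELINE_WINDOW:
            out.append(None)
            continue
        mu = statistics.fmean(history)
        sigma = statistics.stdev(history)
        out.append((value - mu) / sigma if sigma > 0 else 0.0)
    return out
```

With 320 closes this yields 319 returns, 290 RV values and 51 z-scores, so the fetch size leaves some headroom over the 30 + 240 bars of warm-up. Because √(bars_per_year) is a constant within a timeframe, it cancels out of z_t; annualization matters for comparing RV levels across timeframes, not for the z-score itself.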

The FSM

Three states: low, normal, high. The machine starts in normal. Entering an extreme state requires enter_k consecutive observations past the entry threshold; returning to normal requires exit_k consecutive observations back inside the exit threshold. This hysteresis prevents single-bar wicks from flipping the state.

normal → high   when z ≥ high_enter for enter_k bars
normal → low    when z ≤ low_enter  for enter_k bars
high   → normal when z ≤ high_exit  for exit_k  bars
low    → normal when z ≥ low_exit   for exit_k  bars
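
The transition table maps onto a small state machine. A minimal Python sketch, assuming the six calibration parameters arrive as a plain dataclass and that a streak resets as soon as one bar falls back across its threshold; the Thresholds and RegimeFSM names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    high_enter: float
    high_exit: float
    low_enter: float
    low_exit: float
    enter_k: int
    exit_k: int


class RegimeFSM:
    """Three-state hysteresis machine (low / normal / high), starting in normal."""

    def __init__(self, th: Thresholds):
        self.th = th
        self.state = "normal"
        self.pending = None  # state the current streak is counting toward
        self.streak = 0      # consecutive bars satisfying the pending condition

    def step(self, z: float) -> str:
        th = self.th
        if self.state == "normal":
            if z >= th.high_enter:
                target, needed = "high", th.enter_k
            elif z <= th.low_enter:
                target, needed = "low", th.enter_k
            else:
                target, needed = None, 0
        elif self.state == "high":
            target, needed = ("normal", th.exit_k) if z <= th.high_exit else (None, 0)
        else:  # low
            target, needed = ("normal", th.exit_k) if z >= th.low_exit else (None, 0)

        if target is None:
            # Condition broken: the streak resets.
            self.pending, self.streak = None, 0
        else:
            self.streak = self.streak + 1 if target == self.pending else 1
            self.pending = target
            if self.streak >= needed:
                self.state, self.pending, self.streak = target, None, 0
        return self.state
```

With enter_k = 3, a sequence like z = 3.0, 0.0, 3.0, 0.0 never leaves normal, which is the single-bar-wick suppression described above. Under the skip policy for gaps, a missing bar simply means no step() call, so an in-progress streak neither advances nor resets.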

Calibration

Per symbol, per timeframe, we grid-sweep 2,304 (high_enter, high_exit, low_enter, low_exit, enter_k, exit_k) combinations against 180 days of 1m klines, using a smoothed-percentile ground truth (top/bottom 10% of forward-window RV). For each combination we compute macro-F1 across the three classes and the mean alerts per day. The three named presets (conservative, balanced, aggressive) are picked from the Pareto front that trades higher F1 against fewer alerts per day.
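
The scoring side of the sweep can be sketched in Python as below. This is an illustration under stated assumptions: the forward-window RV is taken as already computed and smoothed (the smoothing method and window length are not specified here), percentile cut points come from the standard library, and all function names are hypothetical.

```python
import statistics

CLASSES = ("low", "normal", "high")


def label_ground_truth(forward_rv, pct=10):
    # Bottom pct% -> low, top pct% -> high, rest -> normal. pct must divide 100.
    cuts = statistics.quantiles(forward_rv, n=100 // pct)
    low_cut, high_cut = cuts[0], cuts[-1]
    return ["low" if v <= low_cut else "high" if v >= high_cut else "normal" for v in forward_rv]


def macro_f1(truth, pred):
    # Unweighted mean of per-class F1 over low / normal / high.
    scores = []
    for c in CLASSES:
        tp = sum(1 for t, p in zip(truth, pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(truth, pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(truth, pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(scores) / len(scores)


def pareto_front(results):
    """results: (params, f1, alerts_per_day). Keep combos no other combo dominates
    (higher-or-equal F1 with fewer-or-equal alerts, strictly better on one)."""
    front = []
    for i, (params, f1, apd) in enumerate(results):
        dominated = any(
            f1_b >= f1 and apd_b <= apd and (f1_b > f1 or apd_b < apd)
            for j, (_, f1_b, apd_b) in enumerate(results)
            if j != i
        )
        if not dominated:
            front.append((params, f1, apd))
    return sorted(front, key=lambda r: r[2])  # quietest first
```

Sorting the front by alerts per day puts the quieter, conservative-style picks first and the noisier, aggressive-style picks last; which exact points become the named presets is a separate choice.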

Per-symbol presets are stored at data/calibration/{symbol}/presets.json and loaded by the service at boot. Nightly CI reruns the sweep and opens a PR if F1 moves by more than 0.001 on any preset; see the change log.
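
The nightly drift check reduces to a per-preset comparison. A sketch in Python, assuming a hypothetical presets.json layout of {preset_name: {"f1": …, …thresholds}}; the real file schema and the helper names below are not documented here.

```python
import json
from pathlib import Path

F1_DRIFT_TOLERANCE = 0.001  # from the text: open a PR if F1 moves more than this


def load_presets(symbol: str, root: str = "data/calibration") -> dict:
    # Hypothetical schema: {"balanced": {"f1": 0.43, "high_enter": ..., ...}, ...}
    return json.loads(Path(root, symbol, "presets.json").read_text())


def presets_needing_pr(old: dict, new: dict) -> list:
    # Names of presets (present in both runs) whose F1 moved past the tolerance.
    return [
        name
        for name in sorted(set(old) & set(new))
        if abs(new[name]["f1"] - old[name]["f1"]) > F1_DRIFT_TOLERANCE
    ]
```

A non-empty result would be the trigger for opening the PR.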

Coverage disclosure. The initial 180-day sweep produced per-symbol presets for 10 symbols (BTC, ETH, BNB, XRP, POL, SOL, DOGE, ADA, AVAX, LINK) — the F1 table in the blog post is from that run. The other 10 tracked symbols (SUI, TON, DOT, APT, NEAR, ATOM, OP, ARB, HBAR, LTC) run live on the BTC/ETH-derived defaults until their own sweep ships. Scale-free z-scoring keeps the shared thresholds reasonable, and the next calibration pass will publish their F1 alongside the existing ten.

Alternative vol estimators

Alongside the close-to-close RV that drives the FSM, each /v1/regime/{symbol} response includes two OHLC-based estimators, computed over the same 30-bar window, that are more statistically efficient than close-to-close:

parkinson_vol: √( Σ (ln(H/L))² / (4 N ln 2) ), annualized. Uses the full high-low range of each bar; roughly 5× more statistically efficient than close-to-close, but assumes no drift within the bar.
yang_zhang_vol: Yang & Zhang (2000). Combines overnight (previous close to open) variance, open-to-close variance, and the Rogers-Satchell estimator. Robust to both drift and open/close gaps.

These are informational — useful for anyone who wants to sanity-check the RV signal or build their own thresholds off a more efficient estimator. The FSM deliberately stays on close-to-close because it's the broadly understood benchmark and every venue we read reports it cleanly. Values come through as null when the bar source (e.g. Coinbase fallback) didn't provide OHLC.
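
For anyone rebuilding these numbers, a Python sketch of both estimators written from their textbook definitions rather than from the service's code. The function names mirror the response fields; passing bars_per_year in directly, and feeding Yang-Zhang one extra leading bar for the previous close, are assumptions. The weight k = 0.34 / (1.34 + (n+1)/(n−1)) is the standard choice from the Yang-Zhang paper.

```python
import math
import statistics


def parkinson_vol(highs, lows, bars_per_year):
    # sqrt( sum(ln(H/L)^2) / (4 N ln 2) ), then annualized.
    n = len(highs)
    total = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return math.sqrt(total / (4 * n * math.log(2)) * bars_per_year)


def yang_zhang_vol(opens, highs, lows, closes, bars_per_year):
    # The first bar only supplies the previous close, so N + 1 bars yield N terms (N >= 2).
    idx = range(1, len(closes))
    n = len(idx)
    overnight = [math.log(opens[i] / closes[i - 1]) for i in idx]  # previous close -> open
    open_close = [math.log(closes[i] / opens[i]) for i in idx]     # open -> close
    rogers_satchell = [
        math.log(highs[i] / closes[i]) * math.log(highs[i] / opens[i])
        + math.log(lows[i] / closes[i]) * math.log(lows[i] / opens[i])
        for i in idx
    ]
    k = 0.34 / (1.34 + (n + 1) / (n - 1))
    variance = (
        statistics.variance(overnight)
        + k * statistics.variance(open_close)
        + (1 - k) * statistics.fmean(rogers_satchell)
    )
    return math.sqrt(variance * bars_per_year)
```

Both functions need genuine high and low prices, which is why a source that delivers no OHLC leaves these fields null rather than approximated.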

Cross-exchange consensus

A second engine polls Binance, Coinbase and Bybit independently, computes a stateless regime label (no FSM hysteresis) for each venue, and reports the majority. When all three agree, you are most likely seeing structural vol; when one diverges, that venue likely has an order-book anomaly.
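
A Python sketch of the per-venue labeling and majority vote. It assumes the stateless label reuses the FSM entry thresholds without any enter_k/exit_k streak, and that a three-way split reports no majority (the behavior in that case is not specified here); the function names are hypothetical.

```python
from collections import Counter


def stateless_label(z: float, high_enter: float, low_enter: float) -> str:
    # Assumption: same entry thresholds as the FSM, but no streak requirement.
    if z >= high_enter:
        return "high"
    if z <= low_enter:
        return "low"
    return "normal"


def consensus(labels: dict):
    """labels: {venue: label}. Returns (majority label or None, venues not in the majority)."""
    label, votes = Counter(labels.values()).most_common(1)[0]
    if votes * 2 <= len(labels):
        return None, sorted(labels)  # no strict majority
    return label, sorted(v for v, lab in labels.items() if lab != label)
```

Returning the dissenting venues alongside the majority makes the "one venue diverges" case directly visible.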

Honest limits

Ground truth is a smoothed percentile, not a physical truth. F1 has a ceiling around 0.5 on this framing. Published numbers (0.41–0.45 on balanced) are per-symbol from the 180-day sweep.
The FSM's output distribution intentionally differs from ground truth. Over the last 90 days on BTC 15m we observe roughly low 1% / normal 92% / high 7%, vs the 10/80/10 ground-truth labeling. Hysteresis (the enter_k/exit_k consecutive-bar requirement) suppresses short spikes on purpose, so the FSM spends less time in the extreme classes than a classifier built to reproduce the 10/10 split exactly would. That's a design trade-off: we accept lower recall on the extremes in exchange for fewer false-positive alerts. Because the F1 target is macro-averaged over three classes, a calibrator optimized to match the 10/10 split would raise recall on the extremes but lose precision and overall F1.
Gap handling in the backfill. Binance's 1m klines occasionally have gaps: maintenance windows, listing-day churn, rare API hiccups. Over 90 days of BTC 15m the archive shows ~8,581 bars against an expected 8,640 (90 × 96; ≈0.7% missing). Our policy is skip: a missing bar is dropped from the rolling window entirely, so the 30-bar RV window may briefly see 29 bars instead of 30 around a gap. We do not forward-fill or interpolate, because either would inject fake returns into the RV calculation, which is exactly the kind of synthetic bar we refuse to generate elsewhere. The FSM state at a gap boundary is whatever the last observed bar produced; if a gap interrupts an enter_k/exit_k streak, the hysteresis counters stall rather than reset. Every event in the archive therefore has a real exchange bar behind it.
Median detection lag is around 20 minutes; that is the cost of hysteresis. You can trade lag for noise by using the aggressive preset (or your own thresholds via a /regime/{sym}/custom call).
We do not forecast the direction of price. The regime label is a description of what the vol is doing now.
Per-symbol calibration is fit on 180 days of 1m klines. It is applied to all six timeframes because z-scoring makes the thresholds roughly scale-free (any constant scale factor, including annualization, cancels in z), but it is not yet re-optimized for 4h/1d.
Ground truth is arbitrary. The 10/90 percentile cutoff was chosen because it matches common vol-regime conventions; it's not derived from data. Tighter cutoffs (5/95) produce higher F1 on rare transitions but fewer labels. data/calibration/{symbol}/presets.json in the repo records the exact thresholds in use, and every calibration rerun updates them.
No walk-forward validation yet. Presets are fit on the full 180-day window, not on a rolling train/test split. This is an acknowledged overfit risk — the next calibration pass (already tracked in the changelog) will switch to an expanding-window scheme.
F1 0.45 as ceiling is a claim, not a proof. We haven't yet run the grid against HMM / MS-GARCH baselines to confirm 0.45 isn't just the FSM class's ceiling. That benchmark is on the roadmap; we'll publish the comparison once it's done.
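
The gap-handling policy can be sketched in Python as follows. Assumptions: bars are (timestamp, close) pairs on an aligned grid, the RV window is defined over the last 30 bar slots in time, and each present bar's return is taken against the previous present bar, so no price is ever synthesized. The function names are hypothetical.

```python
import math


def find_gaps(timestamps, interval_s):
    """Return (expected_slots, observed_bars, missing_timestamps) for sorted, aligned timestamps."""
    expected = (timestamps[-1] - timestamps[0]) // interval_s + 1
    present = set(timestamps)
    missing = [t for t in range(timestamps[0], timestamps[-1] + 1, interval_s) if t not in present]
    return expected, len(timestamps), missing


def window_returns(bars, t_end, interval_s, window=30):
    """Log returns for present bars in the last `window` slots ending at t_end.

    Missing slots contribute nothing (skip policy): no forward-fill, no interpolation.
    Assumption: a return is taken against the previous *present* bar.
    """
    t_start = t_end - (window - 1) * interval_s
    return [
        math.log(c1 / c0)
        for (_, c0), (t1, c1) in zip(bars, bars[1:])
        if t_start <= t1 <= t_end
    ]
```

Applied to 90 days of 15m bars, find_gaps would report 90 × 96 = 8,640 expected slots; 8,581 observed bars means 59 missing, about 0.68%. With one missing bar inside the window, window_returns yields 29 values instead of 30, and RV is simply computed over what remains.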

Change log

2026-04-17

Per-symbol calibration shipped

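Until this date every symbol ran on BTC/ETH-fit thresholds. After the sweep, the other eight symbols in the initial set (BNB, XRP, POL, SOL, DOGE, ADA, AVAX, LINK) get their own thresholds, lifting balanced F1 by 2–6 points for alts (LINK highest at 0.45). The remaining ten tracked symbols stay on the BTC/ETH-derived defaults until their own sweep ships.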

2026-04-17

4h / 1d timeframes added

Shares the same thresholds as the shorter timeframes for now. Per-tf calibration is on the list; the scale-free z framing makes the shared values reasonable in the meantime.

2026-04-16

Cross-exchange consensus

Added Coinbase and Bybit to the poller. Stateless classification only — hysteresis would require tracking FSM state per venue, which is on the list.

2026-04-16

Initial calibration

BTC/ETH swept over 180 days, three Pareto-front presets published. F1 0.37 / 0.39 / 0.41 at conservative / balanced / aggressive.