2026-04-17 · calibration

Calibrating regime thresholds for 10 crypto symbols

Short version: we stopped sharing BTC/ETH thresholds with the other eight symbols, ran the grid sweep per symbol, and recovered 2–6 points of macro-F1 on most of them. LINK gets to 0.45 at the balanced preset, which is close to the ceiling we'd expect on this framing.

The setup

For each symbol we pulled 180 days of 1-minute klines from Binance (180 × 1,440 = 259,200 bars), computed 30-bar realized vol, z-scored it against a 240-bar rolling baseline, and ran the regime FSM (the state machine that turns z-scores into low/normal/high regimes) across 2,304 threshold combinations. Ground truth comes from a smoothed-percentile label over forward-window realized vol: the top and bottom deciles become "high" and "low", everything else "normal".
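A minimal sketch of that feature and label pipeline in Python with NumPy. Several details are assumptions, not taken from the post: realized vol is the square root of summed squared 1-bar log returns, the 240-bar baseline excludes the current bar, the forward label window is also 30 bars, percentiles are taken over the whole sample, and the smoothing step is omitted. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

BARS_PER_DAY = 24 * 60           # 1-minute klines
N_BARS = 180 * BARS_PER_DAY      # 259,200 bars in a 180-day pull


def realized_vol(close: np.ndarray, window: int = 30) -> np.ndarray:
    """Rolling realized vol: sqrt of summed squared 1-bar log returns.

    Entry t covers returns t-window+1..t; entries without a full window are NaN.
    """
    r = np.diff(np.log(close), prepend=np.nan)   # r[0] is NaN (no previous bar)
    out = np.full(close.shape, np.nan)
    out[window - 1:] = np.sqrt(sliding_window_view(r**2, window).sum(axis=1))
    return out


def rolling_z(x: np.ndarray, window: int = 240) -> np.ndarray:
    """z-score of x[t] against the mean/std of the previous `window` values x[t-window:t]."""
    out = np.full(x.shape, np.nan)
    with np.errstate(divide="ignore", invalid="ignore"):
        base = sliding_window_view(x[:-1], window)   # base[i] == x[i:i+window]
        mu, sd = base.mean(axis=1), base.std(axis=1)
        out[window:] = (x[window:] - mu) / sd
    return out


def forward_labels(vol: np.ndarray, horizon: int = 30) -> np.ndarray:
    """0=low, 1=normal, 2=high from realized vol over the next `horizon` bars; -1 if unknown.

    Assumes `vol` was computed with window == horizon, so vol[t + horizon]
    covers exactly the returns of bars t+1..t+horizon (no smoothing applied).
    """
    fwd = np.full(vol.shape, np.nan)
    fwd[:-horizon] = vol[horizon:]
    valid = ~np.isnan(fwd)
    lo, hi = np.percentile(fwd[valid], [10, 90])    # bottom / top decile cut-offs
    labels = np.full(vol.shape, -1)
    labels[valid] = 1
    labels[valid & (fwd <= lo)] = 0
    labels[valid & (fwd >= hi)] = 2
    return labels
```

On a real pull, `close` would hold the 259,200 close prices; the z-scores feed the FSM and the labels are what each sweep point is scored against.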

Macro-F1 (average F1 across the three classes) was the primary metric. Alerts-per-day came along for the ride so we could pick points on the Pareto front and give them human names.
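Each sweep point is scored on these two numbers. A minimal sketch, assuming regimes are encoded 0/1/2 for low/normal/high and that an "alert" is any change of detected regime (the post doesn't define alerts precisely). Function names are hypothetical.

```python
import numpy as np

LOW, NORMAL, HIGH = 0, 1, 2   # assumed label encoding


def macro_f1(y_true, y_pred, classes=(LOW, NORMAL, HIGH)) -> float:
    """Unweighted mean of per-class F1, using F1 = 2TP / (2TP + FP + FN).

    A class with no true and no predicted instances contributes 0.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


def alerts_per_day(y_pred, bars_per_day: int = 1440) -> float:
    """Alerts per day, counting every change of detected regime as one alert (assumption)."""
    y_pred = np.asarray(y_pred)
    changes = np.count_nonzero(y_pred[1:] != y_pred[:-1])
    return changes / (len(y_pred) / bars_per_day)
```

Macro averaging weights the two 10% tail classes as heavily as "normal". With decile labels, a detector that always answers "normal" gets F1 ≈ 0.89 on that class and 0 on the other two, so its macro-F1 is roughly 0.30.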

What we used to do

Until this run every symbol used the BTC/ETH-derived preset:

balanced: high_enter=3.0, high_exit=0.5,
          low_enter=-2.5, low_exit=-0.5,
          enter_k=2, exit_k=8
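One plausible reading of how those six parameters drive the FSM, as a Python sketch. Assumptions: z must sit at or beyond an `*_enter` threshold for `enter_k` consecutive bars to enter a regime, and back inside the matching `*_exit` level for `exit_k` consecutive bars to leave it; NaN readings reset the counters. The `Preset` defaults mirror the balanced values above; `run_fsm` and the state encoding are hypothetical.

```python
import math
from dataclasses import dataclass

LOW, NORMAL, HIGH = 0, 1, 2


@dataclass(frozen=True)
class Preset:
    # Defaults copied from the shared "balanced" preset.
    high_enter: float = 3.0
    high_exit: float = 0.5
    low_enter: float = -2.5
    low_exit: float = -0.5
    enter_k: int = 2
    exit_k: int = 8


def run_fsm(z, p: Preset) -> list[int]:
    """Return the regime after each z-score reading."""
    state = NORMAL
    hi_run = lo_run = exit_run = 0
    out = []
    for v in z:
        ok = not math.isnan(v)
        if state == NORMAL:
            # Count consecutive bars beyond each entry threshold.
            hi_run = hi_run + 1 if ok and v >= p.high_enter else 0
            lo_run = lo_run + 1 if ok and v <= p.low_enter else 0
            if hi_run >= p.enter_k:
                state, hi_run, lo_run = HIGH, 0, 0
            elif lo_run >= p.enter_k:
                state, hi_run, lo_run = LOW, 0, 0
        elif state == HIGH:
            # Leave HIGH only after exit_k consecutive bars at or below high_exit.
            exit_run = exit_run + 1 if ok and v <= p.high_exit else 0
            if exit_run >= p.exit_k:
                state, exit_run = NORMAL, 0
        else:
            # Leave LOW only after exit_k consecutive bars at or above low_exit.
            exit_run = exit_run + 1 if ok and v >= p.low_exit else 0
            if exit_run >= p.exit_k:
                state, exit_run = NORMAL, 0
        out.append(state)
    return out
```

Under this reading, the gap between `high_enter` and `high_exit` (and between `low_enter` and `low_exit`) plus the asymmetric `enter_k`/`exit_k` is what keeps a regime from flickering once entered.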

This worked well enough to ship (macro-F1 landed at 0.39 on BTC and ETH), but it bothered us. Reusing a two-symbol fit for eight other symbols is exactly the kind of shortcut a paying quant would notice.

What we found

Symbol	macro-F1 (conservative)	macro-F1 (balanced)	macro-F1 (aggressive)	alerts/day (balanced)
BTC	0.37	0.39	0.41	5.6
ETH	0.37	0.39	0.41	5.6
BNB	0.38	0.41	0.42	5.0
XRP	0.40	0.42	0.43	5.2
POL	0.39	0.42	0.44	5.9
SOL	0.39	0.43	0.44	5.0
DOGE	0.40	0.43	0.43	5.1
ADA	0.41	0.44	0.44	5.2
AVAX	0.41	0.44	0.45	5.1
LINK	0.41	0.45	0.45	5.0

A few things worth noting:

BTC and ETH scored the worst. They're the most liquid and the most mean-reverting at the hourly scale, and mean reversion is exactly what makes the regime signal noisier: vol keeps snapping back across the thresholds before a regime can be confirmed. The alts don't revert as hard, so their regimes are more sustained and easier to detect.
BNB is among the quietest: 5.0 alerts/day at balanced, tied with SOL and LINK, consistent with its lower historical vol. Like the other alts, its preset ended up with enter_k=8 (it takes more consecutive observations to trigger), because short-lived spikes dominated its entry candidates and hurt precision.
LINK tops out at 0.45. Past that, the ground truth starts to matter more than the detector: 0.45–0.50 is roughly the ceiling on this problem given a smoothed-percentile ground truth, and anyone publishing 0.9+ is almost certainly leaking future information into the detector.

The threshold shifts themselves

The shared preset had low_enter=-2.5, enter_k=2: enter the low-vol regime as soon as two consecutive bars read at or below −2.5σ. For SOL, XRP, BNB, DOGE, ADA, AVAX, POL and LINK, the Pareto-front picks were all low_enter=-2.0, enter_k=8:

A looser threshold (−2.0 vs −2.5), because those markets' vol has heavier tails. The upside spikes inflate the rolling standard deviation, so a genuinely quiet stretch scores a smaller negative z than it would on BTC/ETH; −2.5 is reached less often, and the old preset was missing real transitions.
A longer confirmation window (8 vs 2) because alt-market microstructure generates more false starts. An alt can wick −2σ on a single bar and come right back; requiring 8 bars filters those.
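Both effects show up on toy z-score paths. A sketch with made-up numbers: `wick` dips below −2σ for two bars and snaps back, `sustained` sits at −2.1 for twelve bars. The helper `first_confirmed_entry` is hypothetical and treats `enter_k` as a count of consecutive bars at or below the threshold, which is an assumed reading.

```python
def first_confirmed_entry(z, threshold: float, k: int):
    """Index of the bar completing k consecutive readings <= threshold, or None."""
    run = 0
    for i, v in enumerate(z):
        run = run + 1 if v <= threshold else 0
        if run >= k:
            return i
    return None


# Illustrative paths only, not real market data.
wick = [0.0, -0.5, -2.2, -2.3, -0.2, 0.1] + [0.0] * 10
sustained = [0.0, -0.5] + [-2.1] * 12 + [-0.3, 0.0]

for name, path in [("wick", wick), ("sustained", sustained)]:
    for threshold, k in [(-2.5, 2), (-2.0, 2), (-2.0, 8)]:
        print(name, threshold, k, first_confirmed_entry(path, threshold, k))
```

The old preset (−2.5, k=2) misses both paths. Loosening the threshold alone (−2.0, k=2) fires on the wick at bar 3. Pairing it with k=8 ignores the wick and confirms the sustained move at bar 9.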

Counter-intuitively, a looser threshold plus a longer confirmation window is not noisier. The two effects balance: you let more candidates in, but you filter harder on how long they persist. The Pareto front preferred that trade consistently.
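Picking from the Pareto front means discarding every sweep point that another point beats on both axes. A minimal sketch, assuming the two objectives are macro-F1 (higher is better) and alerts/day (lower is better); the candidate names and numbers are invented for illustration, and how the conservative/balanced/aggressive names get assigned along the front isn't described here.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    f1: float
    alerts_per_day: float


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Non-dominated candidates, sorted from quietest to noisiest.

    A candidate is dominated if another has F1 >= and alerts/day <=,
    with at least one of the two strictly better.
    """
    front = []
    for c in cands:
        dominated = any(
            o.f1 >= c.f1
            and o.alerts_per_day <= c.alerts_per_day
            and (o.f1 > c.f1 or o.alerts_per_day < c.alerts_per_day)
            for o in cands
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.alerts_per_day)
```

This is the quadratic all-pairs check; with a few thousand grid points per symbol that is still cheap enough that a sort-and-scan version isn't needed.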

How this updates on its own

The sweep reruns every night. A GitHub Action backfills the last 180 days, runs calibration for each symbol, diffs the presets against HEAD, and opens a PR if any F1 moved by more than 0.001. We review, merge, Fly redeploys, and the new presets ship. Your regime API gradually follows the market as it drifts.
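The "open a PR if any F1 moved by more than 0.001" rule can be sketched as a pure diff. This assumes presets are stored per symbol as JSON-like dicts with an `f1` field per named preset; that layout and the function name are hypothetical, and actually opening the PR is left to the workflow.

```python
F1_TOLERANCE = 0.001


def changed_symbols(
    head: dict[str, dict[str, dict[str, float]]],
    fresh: dict[str, dict[str, dict[str, float]]],
    tol: float = F1_TOLERANCE,
) -> list[str]:
    """Symbols whose F1 moved by more than `tol` on any preset, or that were added/removed."""
    changed = []
    for sym in sorted(set(head) | set(fresh)):
        old, new = head.get(sym), fresh.get(sym)
        if old is None or new is None:
            changed.append(sym)
            continue
        for name in set(old) | set(new):
            if name not in old or name not in new or abs(new[name]["f1"] - old[name]["f1"]) > tol:
                changed.append(sym)
                break
    return changed
```

An empty result means the nightly run exits without touching the repo; anything else becomes the body of the PR.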

If a per-timeframe calibration turns out to do better (today we apply the 1m-fit thresholds to all six timeframes), the same sweep can pick that up too. The z-score framing makes thresholds at least roughly comparable across timescales, but there's still optimization on the table.

What it doesn't mean

F1 = 0.45 does not mean your PnL will be 45% higher by listening to Amaneki. The regime label is a description of what the recent realized vol is doing relative to its own recent baseline. Whether that description helps your strategy depends on your strategy.

The honest selling point is that we've done the grind (the calibration, the sweeps, the Pareto picks, publishing the numbers as they came out) and you can now build on top of it instead of doing it yourself. If that saves you more hours than the subscription costs, we're doing our job.

Reach me at [email protected].