02 / Calibration Scorecard

Checking our work — misses included

Every quarter, every probability claim this model made is replayed against realized outcomes. Sample sizes appear next to every number; the misses are printed, not buried.

0.158

Brier score

−2.0%

Brier skill vs base rate

80.9% n=47

80% band, observed

95.7% n=47

95% band, observed

Walk-forward residuals

A / Reliability

Claimed probability vs reality

Reliability diagram

Interval coverage

Outcomes are leave-one-out band hits, so no residual is ever judged by a quantile it helped set. Base rate 0.809; reference Brier 0.155.
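A minimal sketch of what a leave-one-out band hit could look like, assuming the nonconformity score is the absolute log residual and the quantile uses the usual finite-sample conformal rank. The function name, the scoring choice, and the omission of any staleness scaling are illustrative assumptions, not the model's published code.

```python
import math

def loo_band_hits(abs_log_residuals, level=0.80):
    """Judge each residual against a quantile set by the OTHER residuals only."""
    values = list(abs_log_residuals)
    hits = []
    for i, r in enumerate(values):
        # Drop residual i so it cannot influence its own threshold.
        others = sorted(values[:i] + values[i + 1:])
        m = len(others)
        # Finite-sample conformal rank, capped at the largest held-in score.
        k = min(m, math.ceil((m + 1) * level))
        q = others[k - 1]
        hits.append(r <= q)
    return hits

# Toy residuals (hypothetical, not the backtest data).
toy = [i / 10 for i in range(1, 11)]
toy_hits = loo_band_hits(toy, level=0.80)
toy_coverage = sum(toy_hits) / len(toy_hits)
```

The observed coverage is then just the share of hits, which is how an empirical figure such as "80% band, observed" can be read.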

Forecast bin	Mean predicted	Observed freq	n
0.6 – 0.7	0.676	1.000	3
0.7 – 0.8	0.764	0.762	21
0.8 – 0.9	0.820	0.826	23
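The base rate and reference Brier can be cross-checked from the bins above. The sketch below assumes the reference forecaster always predicts the base rate p, in which case its Brier score is p(1 − p), and that skill is 1 − Brier / reference. Hit counts are recovered by rounding observed frequency × n, and the headline Brier is used as printed (rounded), so the skill figure is approximate.

```python
# (mean predicted, observed frequency, n) per forecast bin, copied from the table.
bins = [(0.676, 1.000, 3), (0.764, 0.762, 21), (0.820, 0.826, 23)]

n_total = sum(n for _, _, n in bins)
hits = sum(round(obs * n) for _, obs, n in bins)   # band hits across all bins

base_rate = hits / n_total                          # share of outcomes that hit
reference_brier = base_rate * (1 - base_rate)       # always-predict-base-rate forecaster

brier = 0.158                                       # headline figure, as printed
skill = 1 - brier / reference_brier                 # negative means worse than base rate

print(f"n={n_total} hits={hits} base_rate={base_rate:.3f}")
print(f"reference_brier={reference_brier:.3f} skill={skill:+.1%}")
```

With these rounded inputs the skill comes out slightly negative, consistent with the headline: the model's probabilities do not yet beat simply quoting the base rate.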

B / IPO validation

IPO class — 2026-H1

Held-out outcomes: each mark is frozen the day before the debut, then compared against the IPO open, the first public print. Verdicts are interval-honest: a wide band that covers a big move is 'covered', not 'hit'.

Company	IPO	Final mark	80% interval	IPO open	Error vs open	Last-print baseline	Verdict
Cerebras Systems	2026-05-14	$8.1B	[$3.8B – $17.1B]	$70B	−88.4%	−88.4%	MISS

Why we publish the misses

Five of the validation cases missed badly: Circle, Chime, Figma, Klarna, and Cerebras Systems. In each one, public prints alone were stale or wrong, which is precisely the gap market-derived inputs close. Read the case study →

C / Model disclosure

Every fitted constant, in the open

Model

Recency half-life: 90d — selected by walk-forward MAPE: 90d → 39.9% · 180d → 43.0% · 270d → 44.5%
Conformal quantiles: q80 = 0.646, q95 = 0.991
Width rule: band = mark * exp(+/- q * w), w = 1 + 0.25 * staleness_days / 365
Confidence rule: clamp(0.45, 0.95, 0.80 + 0.32*(mean_quality - 0.75) - 0.18*min(staleness/730, 1) + 0.04*min(n_inputs, 5)/5)
Residual skew: 97.9% of next prints landed above the prior mark — disclosed, not hidden.
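The disclosed rules can be transcribed into a short sketch. Function and variable names are hypothetical; the clamp argument order (lower, upper, value) and the unit of staleness in the confidence rule (days) are assumptions read off the formulas as printed.

```python
import math

Q80, Q95 = 0.646, 0.991  # conformal quantiles as disclosed

# Half-life selection: pick the candidate with the lowest walk-forward MAPE.
mape_by_half_life = {90: 0.399, 180: 0.430, 270: 0.445}
best_half_life = min(mape_by_half_life, key=mape_by_half_life.get)

def band(mark, staleness_days, q=Q80):
    """Width rule: log-symmetric band that widens with staleness."""
    w = 1 + 0.25 * staleness_days / 365
    half_width = q * w
    return mark * math.exp(-half_width), mark * math.exp(half_width)

def confidence(mean_quality, staleness_days, n_inputs):
    """Confidence rule, clamped to [0.45, 0.95]."""
    raw = (0.80
           + 0.32 * (mean_quality - 0.75)
           - 0.18 * min(staleness_days / 730, 1)   # assumes staleness in days
           + 0.04 * min(n_inputs, 5) / 5)
    return max(0.45, min(0.95, raw))

# Hypothetical inputs: a $8.1B mark with an assumed 229 days of staleness.
low, high = band(8.1, 229)
print(f"80% band: [{low:.1f}, {high:.1f}] (in $B)")
```

The 229-day staleness is not a published input; it is chosen here only because it yields a band close to the one shown for the $8.1B mark in the IPO table. Because the band is mark × exp(±q·w), the lower and upper bounds are always equidistant from the mark in log terms.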

Caveats

Illustrative backtest on a curated public-events dataset; no live marks are published.
Calibration pool is small (n=47 walk-forward residuals pooled across companies and time); pooling weakens the exchangeability assumption behind the conformal guarantee, so coverage is reported empirically.
The recency half-life is the model's single fitted constant, selected by walk-forward error on the same window and disclosed in full under Model.
IPO validation is a handful of individually narrated cases, not a statistical sample; misses are published alongside hits.
Signed residuals are one-sided in this window (98% printed above the prior mark): private valuations trended up, so the log-symmetric band is conservative on the downside. The skew is published, not hidden.

Scorecard figures are computed from a backtest replay of public events. Indicative valuations, not transactable prices. Underlying assets are illiquid; inputs are limited to publicly reported events with source attribution. Pegasus Three Sixty and SpearHead provide an information-only valuation product, do not hold client balances, and do not provide investment recommendations.