02 / Calibration Scorecard
Checking our work — misses included
Every quarter, every probability claim this model made is replayed against realized outcomes. Sample sizes appear next to every number; the misses are printed, not buried.
A / Reliability
Claimed probability vs reality
Reliability diagram
Interval coverage
Outcomes are leave-one-out band hits, so no residual is ever judged by a quantile it helped set. Base rate 0.821; reference Brier 0.147.
| Forecast bin | Mean predicted | Observed freq | n |
|---|---|---|---|
| 0.6 – 0.7 | 0.676 | 1.000 | 3 |
| 0.7 – 0.8 | 0.763 | 0.778 | 18 |
| 0.8 – 0.9 | 0.818 | 0.833 | 18 |
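The base rate and reference Brier score quoted above can be reproduced from the table itself. The sketch below is a minimal check, assuming the reference forecast is the one that always predicts the base rate, so its Brier score reduces to p(1 − p); the source does not state that definition, but the published figures are consistent with it. Variable names are illustrative.

```python
# Rows copied from the table above: (mean_predicted, observed_freq, n).
bins = [
    (0.676, 1.000, 3),
    (0.763, 0.778, 18),
    (0.818, 0.833, 18),
]

total_n = sum(n for _, _, n in bins)

# Observed frequencies are rounded to three places, so recover the
# integer hit count per bin before pooling.
hits = sum(round(obs * n) for _, obs, n in bins)

base_rate = hits / total_n

# Assumption: the reference forecast always predicts the base rate,
# which gives a Brier score of p * (1 - p).
reference_brier = base_rate * (1 - base_rate)

# Per-bin calibration gap: observed frequency minus mean predicted probability.
gaps = [(round(obs - pred, 3), n) for pred, obs, n in bins]

print(f"n={total_n} base_rate={base_rate:.3f} reference_brier={reference_brier:.3f}")
# prints: n=39 base_rate=0.821 reference_brier=0.147
```

The per-bin gaps show why sample size sits next to every number: the largest gap comes from the bin with only 3 observations.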
B / IPO validation
IPO class — 2025-FY
Held-out outcomes: each mark is frozen the day before the debut, then compared against the first public print (the IPO open). Verdicts are interval-honest: a wide band that covers a big move is 'covered', not 'hit'.
| Company | IPO | Final mark | 80% interval | IPO open | Error vs open | Last-print baseline | Verdict |
|---|---|---|---|---|---|---|---|
| CoreWeave | 2025-03-28 | $22.1B | [$10.9B – $44.5B] | $22.4B | −1.5% | +2.7% | HIT |
| Circle | 2025-06-05 | $9B | [$2.9B – $28.3B] | $15.4B | −41.6% | −41.6% | COVERED |
| Chime | 2025-06-12 | $25B | [$7.1B – $87.8B] | $15.7B | +59.2% | +59.2% | COVERED |
| Figma | 2025-07-31 | $12.5B | [$5.4B – $28.8B] | $50B | −75.0% | −75.0% | MISS |
| Klarna | 2025-09-10 | $8.5B | [$2.7B – $26.7B] | $19.6B | −57.0% | −65.9% | COVERED |
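The verdict logic can be sketched as follows. The error is taken as mark / open − 1, which matches the table to within rounding of the displayed figures. The source does not publish the cutoff that separates HIT from COVERED, so `HIT_TOLERANCE` below is a hypothetical value chosen only for illustration, and the `Case` type and function names are likewise assumptions.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the source does not state where HIT ends and COVERED begins.
HIT_TOLERANCE = 0.25


@dataclass
class Case:
    company: str
    mark: float      # final mark, $B
    low: float       # lower end of the 80% interval, $B
    high: float      # upper end of the 80% interval, $B
    ipo_open: float  # valuation at the IPO open, $B


def error_vs_open(case: Case) -> float:
    """Signed relative error of the frozen mark against the IPO open."""
    return case.mark / case.ipo_open - 1.0


def verdict(case: Case, hit_tolerance: float = HIT_TOLERANCE) -> str:
    """MISS if the open falls outside the band; otherwise HIT or COVERED."""
    if not (case.low <= case.ipo_open <= case.high):
        return "MISS"
    if abs(error_vs_open(case)) <= hit_tolerance:
        return "HIT"
    return "COVERED"


# Figures copied from the table above.
cases = [
    Case("CoreWeave", 22.1, 10.9, 44.5, 22.4),
    Case("Circle", 9.0, 2.9, 28.3, 15.4),
    Case("Chime", 25.0, 7.1, 87.8, 15.7),
    Case("Figma", 12.5, 5.4, 28.8, 50.0),
    Case("Klarna", 8.5, 2.7, 26.7, 19.6),
]

for c in cases:
    print(f"{c.company:10s} {error_vs_open(c):+.1%} {verdict(c)}")
```

Because the inputs here are the rounded table values, recomputed errors can differ from the published column by a few tenths of a point.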
Why we publish the misses
4 of the 5 validation cases missed the IPO open by more than 40%: Circle, Chime, Figma, and Klarna. Only Figma fell outside its 80% interval; the other three were covered by wide bands. Each one is a case where public prints alone were stale or wrong, which is precisely the gap market-derived inputs close.
C / Model disclosure
Every fitted constant, in the open
Model
- Recency half-life: 90d — selected by walk-forward MAPE: 90d → 39.9% · 180d → 43.0% · 270d → 44.5%
- Conformal quantiles: q80 = 0.642, q95 = 0.978
- Width rule: band = mark * exp(+/- q * w), w = 1 + 0.25 * staleness_days / 365
- Confidence rule: clamp(0.45, 0.95, 0.80 + 0.32*(mean_quality - 0.75) - 0.18*min(staleness/730, 1) + 0.04*min(n_inputs, 5)/5)
- Residual skew: 97.4% of next prints landed above the prior mark — disclosed, not hidden.
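The width and confidence rules above translate directly into code. This is a minimal sketch that assumes staleness is measured in days in both rules and that `clamp(lo, hi, x)` bounds `x` to the range [lo, hi]; the function names are illustrative, not taken from the product.

```python
import math

# Conformal quantiles as disclosed above.
Q80 = 0.642
Q95 = 0.978


def width_factor(staleness_days: float) -> float:
    """w = 1 + 0.25 * staleness_days / 365: older marks get wider bands."""
    return 1.0 + 0.25 * staleness_days / 365.0


def band(mark: float, q: float, staleness_days: float) -> tuple[float, float]:
    """Log-symmetric band: mark * exp(-q * w) to mark * exp(+q * w)."""
    half_width = q * width_factor(staleness_days)
    return mark * math.exp(-half_width), mark * math.exp(half_width)


def clamp(lo: float, hi: float, x: float) -> float:
    """Bound x to the closed range [lo, hi]."""
    return max(lo, min(hi, x))


def confidence(mean_quality: float, staleness_days: float, n_inputs: int) -> float:
    """Confidence rule as disclosed, assuming staleness is in days."""
    raw = (
        0.80
        + 0.32 * (mean_quality - 0.75)
        - 0.18 * min(staleness_days / 730.0, 1.0)
        + 0.04 * min(n_inputs, 5) / 5
    )
    return clamp(0.45, 0.95, raw)


fresh_low, fresh_high = band(10.0, Q80, staleness_days=0)
stale_low, stale_high = band(10.0, Q80, staleness_days=365)
print(f"fresh 80% band: [{fresh_low:.2f}, {fresh_high:.2f}]")
print(f"one-year-stale 80% band: [{stale_low:.2f}, {stale_high:.2f}]")
print(f"confidence: {confidence(0.75, 0, 5):.2f}")
```

Because the band is symmetric in log space, the product of its two ends equals the mark squared, which is why a one-sided residual skew makes it conservative on the downside.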
Caveats
- Illustrative backtest on a curated public-events dataset; no live marks are published.
- Calibration pool is small (n=39 walk-forward residuals pooled across companies and time); pooling weakens the exchangeability assumption behind the conformal guarantee, so coverage is reported empirically.
- The recency half-life is the model's single fitted constant, selected by walk-forward error on the same window and disclosed in full above.
- IPO validation is a handful of individually narrated cases, not a statistical sample; misses are published alongside hits.
- Signed residuals are one-sided in this window (97% printed above the prior mark): private valuations trended up, so the log-symmetric band is conservative on the downside. The skew is published, not hidden.
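The empirical coverage referred to in the caveats, combined with the leave-one-out rule from section A, can be sketched as follows. This assumes the conformal score is the absolute log residual already divided by the width factor w, and it uses the standard finite-sample split-conformal quantile; neither detail is confirmed by the source, and the input scores are synthetic placeholders rather than the actual 39 residuals.

```python
import math


def conformal_quantile(scores: list[float], level: float) -> float:
    """Finite-sample conformal quantile: the ceil((n + 1) * level)-th smallest score."""
    n = len(scores)
    # Small epsilon guards against floating-point noise in (n + 1) * level.
    k = math.ceil((n + 1) * level - 1e-9)
    if k > n:
        return math.inf  # too few scores to bound this level
    return sorted(scores)[k - 1]


def loo_coverage(scores: list[float], level: float) -> float:
    """Share of scores covered by a quantile computed without that score."""
    hits = 0
    for i, score in enumerate(scores):
        rest = scores[:i] + scores[i + 1:]
        if score <= conformal_quantile(rest, level):
            hits += 1
    return hits / len(scores)


# Synthetic scores for illustration only.
example_scores = [float(i) for i in range(1, 11)]
print(loo_coverage(example_scores, 0.80))
```

Holding each residual out of its own quantile keeps the reported coverage from being flattered by in-sample fitting, which matters with a pool this small.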
Scorecard figures are computed from a backtest replay of public events. Indicative valuations, not transactable prices. Underlying assets are illiquid; inputs are limited to publicly reported events with source attribution. Pegasus Three Sixty and SpearHead are an information-only valuation product, do not hold client balances, and do not provide investment recommendations.