Spearhead
02 / Calibration Scorecard

Checking our work — misses included

Every quarter, every probability claim this model made is replayed against realized outcomes. Sample sizes appear next to every number; the misses are printed, not buried.

2025-FY | 2026-H1 · Window: 2025-01-01 to 2025-12-31 · Data: FIXTURE | BACKTEST
Brier score: 0.152
Brier skill vs base rate: −3.5%
80% band, observed coverage: 82.0% (n=39)
95% band, observed coverage: 97.4% (n=39)
Walk-forward residuals: 39
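A minimal Python sketch of how the two Brier tiles relate. It assumes each claim is a probability paired with a 0/1 outcome. The helper names are illustrative, not the model's own code. The reference forecast always predicts the base rate, and its Brier score reduces to p × (1 − p). Skill is one minus the ratio of the two scores, so a negative value means the claims scored worse than that constant forecast.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared gap between each claimed probability and its 0/1 outcome."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Skill relative to always forecasting the base rate (negative = worse)."""
    base_rate = sum(outcomes) / len(outcomes)
    # The Brier score of a constant base-rate forecast reduces to p * (1 - p).
    reference = base_rate * (1 - base_rate)
    if reference == 0:
        raise ValueError("outcomes are all identical; skill is undefined")
    return 1 - brier_score(probs, outcomes) / reference
```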
A / Reliability

Claimed probability vs reality

Reliability diagram: predicted probability (x) vs observed frequency (y), n=39. Points: p_hat 0.68 → observed 1.00 (n=3); p_hat 0.76 → observed 0.78 (n=18); p_hat 0.82 → observed 0.83 (n=18).

Interval coverage: 80% nominal → 82.0% observed (n=39); 95% nominal → 97.4% observed (n=39).

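Outcomes are leave-one-out band hits, so no residual is ever judged by a quantile it helped set. Base rate: 0.821. Reference Brier: 0.147, the score of always forecasting the base rate (0.821 × 0.179). A negative Brier skill therefore means the claimed probabilities scored slightly worse than that constant forecast.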
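The leave-one-out rule can be sketched as follows. This sketch assumes, hypothetically, that each residual is scored as an absolute log error such as |log(next_print / prior_mark)|, and that the band quantile is the standard split-conformal order statistic. This page does not state either detail.

```python
import math
from typing import List, Sequence


def conformal_quantile(scores: Sequence[float], level: float) -> float:
    """Split-conformal quantile: the ceil((m + 1) * level)-th smallest score."""
    m = len(scores)
    k = math.ceil((m + 1) * level)
    if k > m:
        return math.inf  # too few scores to certify this level
    return sorted(scores)[k - 1]


def leave_one_out_hits(scores: Sequence[float], level: float) -> List[bool]:
    """Judge each score only against a quantile built from the other scores."""
    hits = []
    for i, s in enumerate(scores):
        others = [x for j, x in enumerate(scores) if j != i]
        hits.append(s <= conformal_quantile(others, level))
    return hits
```

With ten evenly spaced scores and level 0.8, eight of the ten count as hits, because each score is compared only against the quantile of the other nine.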

Forecast bin | Mean predicted | Observed freq | n
0.6–0.7 | 0.676 | 1.000 | 3
0.7–0.8 | 0.763 | 0.778 | 18
0.8–0.9 | 0.818 | 0.833 | 18
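A sketch like this can rebuild the reliability table above from per-claim probabilities and outcomes. It assumes equal-width bins of 0.1, which matches the bins shown, and assumes empty bins are omitted. The function name is illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (bin_low, bin_high, mean_predicted, observed_freq, n)
Row = Tuple[float, float, float, float, int]


def reliability_table(
    probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Row]:
    """Group claims into equal-width probability bins; skip empty bins."""
    groups: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for p, y in zip(probs, outcomes):
        # round() absorbs float noise in p * n_bins before flooring;
        # p == 1.0 is folded into the top bin.
        k = min(math.floor(round(p * n_bins, 9)), n_bins - 1)
        groups[k].append((p, y))
    rows: List[Row] = []
    for k in sorted(groups):
        members = groups[k]
        n = len(members)
        mean_pred = sum(p for p, _ in members) / n
        observed = sum(y for _, y in members) / n
        rows.append((k / n_bins, (k + 1) / n_bins, mean_pred, observed, n))
    return rows
```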
B / IPO validation

IPO class — 2025-FY

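Held-out outcomes: each mark was frozen the day before the company's debut, then compared against the valuation at its first public trade (the IPO-day open). Verdicts are interval-honest: a wide band that contains a big move is 'covered', not 'hit'; a 'miss' means the open fell outside the 80% interval.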

Company | IPO date | Final mark | 80% interval | IPO open | Error vs open | Last-print baseline error | Verdict
CoreWeave | 2025-03-28 | $22.1B | [$10.9B – $44.5B] | $22.4B | −1.5% | +2.7% | HIT
Circle | 2025-06-05 | $9B | [$2.9B – $28.3B] | $15.4B | −41.6% | −41.6% | COVERED
Chime | 2025-06-12 | $25B | [$7.1B – $87.8B] | $15.7B | +59.2% | +59.2% | COVERED
Figma | 2025-07-31 | $12.5B | [$5.4B – $28.8B] | $50B | −75.0% | −75.0% | MISS
Klarna | 2025-09-10 | $8.5B | [$2.7B – $26.7B] | $19.6B | −57.0% | −65.9% | COVERED
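A sketch of the verdict logic as the table implies it: MISS when the open falls outside the 80% interval, otherwise HIT or COVERED depending on point error. The page does not state the HIT cutoff, so `hit_tolerance = 0.10` below is a hypothetical placeholder. `IpoCase` is an illustrative container, not part of the product.

```python
from dataclasses import dataclass


@dataclass
class IpoCase:
    company: str
    final_mark: float  # $B, frozen the day before the debut
    low: float         # $B, lower edge of the 80% interval
    high: float        # $B, upper edge of the 80% interval
    ipo_open: float    # $B, valuation at the IPO-day open


def error_vs_open(case: IpoCase) -> float:
    """Signed error of the mark relative to the open (negative = mark too low)."""
    return case.final_mark / case.ipo_open - 1


def verdict(case: IpoCase, hit_tolerance: float = 0.10) -> str:
    # hit_tolerance is a hypothetical cutoff; the scorecard does not state one.
    if not case.low <= case.ipo_open <= case.high:
        return "MISS"
    if abs(error_vs_open(case)) <= hit_tolerance:
        return "HIT"
    return "COVERED"
```

With the rounded figures from the table and this assumed 10% cutoff, the sketch reproduces all five printed verdicts. Its error values can differ slightly from the printed column: CoreWeave comes out near −1.3% rather than −1.5%, presumably because the table is computed from unrounded marks.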
Why we publish the misses

4 of the 5 validation cases (Circle, Chime, Figma, Klarna) missed the open by more than 40%, although only Figma's open fell outside its 80% interval. Each is a case where public prints alone were stale or wrong, which is precisely the gap market-derived inputs close.

C / Model disclosure

Every fitted constant, in the open

Model

  • Recency half-life: 90d, selected by lowest walk-forward MAPE (90d → 39.9% · 180d → 43.0% · 270d → 44.5%)
  • Conformal quantiles: q80 = 0.642, q95 = 0.978
  • Width rule: band = mark * exp(+/- q * w), w = 1 + 0.25 * staleness_days / 365
  • Confidence rule: 0.80 + 0.32*(mean_quality - 0.75) - 0.18*min(staleness_days/730, 1) + 0.04*min(n_inputs, 5)/5, clamped to [0.45, 0.95]
  • Residual skew: 97.4% of next prints landed above the prior mark — disclosed, not hidden.
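The width and confidence rules above translate directly into code. This is a sketch, not the model's implementation, and it rests on three assumptions. First, the confidence rule's staleness is measured in days, like the width rule's. Second, `mean_quality` and `n_inputs` are taken as given, since the page does not define them. Third, the hypothetical `recency_weight` helper applies the 90-day half-life as a plain exponential decay.

```python
import math
from typing import Tuple


def band(mark: float, q: float, staleness_days: float) -> Tuple[float, float]:
    """Log-symmetric band mark * exp(+/- q * w), widened as inputs go stale."""
    w = 1 + 0.25 * staleness_days / 365
    return mark * math.exp(-q * w), mark * math.exp(q * w)


def confidence(mean_quality: float, staleness_days: float, n_inputs: int) -> float:
    """Linear confidence score, clamped to [0.45, 0.95]."""
    raw = (
        0.80
        + 0.32 * (mean_quality - 0.75)
        - 0.18 * min(staleness_days / 730, 1)  # assumes staleness in days
        + 0.04 * min(n_inputs, 5) / 5
    )
    return max(0.45, min(0.95, raw))


def recency_weight(age_days: float, half_life_days: float = 90) -> float:
    """Assumed exponential decay: an input's weight halves every half-life."""
    return 0.5 ** (age_days / half_life_days)
```

At zero staleness, q80 = 0.642 gives an 80% band from about mark / 1.90 to mark × 1.90, since exp(0.642) ≈ 1.90. A year of staleness raises w to 1.25 and widens both edges multiplicatively.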

Caveats

  • Illustrative backtest on a curated public-events dataset; no live marks are published.
  • Calibration pool is small (n=39 walk-forward residuals pooled across companies and time); pooling weakens the exchangeability assumption behind the conformal guarantee, so coverage is reported empirically.
  • The recency half-life is the model's single fitted constant, selected by walk-forward error on the same window and disclosed in full above.
  • IPO validation is a handful of individually narrated cases, not a statistical sample; misses are published alongside hits.
  • Signed residuals are one-sided in this window (97.4% of next prints landed above the prior mark): private valuations trended up, so the log-symmetric band is conservative on the downside.
Scorecard figures are computed from a backtest replay of public events. Valuations are indicative, not transactable prices. Underlying assets are illiquid; inputs are limited to publicly reported events with source attribution. Pegasus Three Sixty and SpearHead provide an information-only valuation product; they do not hold client balances and do not provide investment recommendations.