How the model’s predictions compare to actual eBird checklist diversity at Wendy Park: 79 days, 2024-04-26 → 2025-06-05.
Pearson r(predicted, actual) = +0.707 (79 days, days with zero checklists excluded)
Each row is one verdict bucket: mean / median / range of actual eBird species counts on the days the model placed there. A useful model walks mean species downward monotonically as the verdict gets gloomier.
| verdict | n | mean spp | median | range |
|---|---|---|---|---|
| DEFINITELY_GO | 15 | 77.6 | 78.0 | 36–115 |
| GO | 33 | 66.1 | 66.0 | 30–106 |
| MARGINAL | 28 | 43.8 | 37.5 | 17–82 |
| SKIP | 3 | 31.7 | 34.0 | 20–41 |

A Pearson r between roughly +0.4 and +0.8 means the model orders days correctly more often than not: useful for ranking days, not for predicting an exact species count.

Verdict bins are honest if mean species walks downward from DEFINITELY_GO to SKIP. A SKIP bucket that outperforms GO is a sign the veto layer is over-eager.

Misses are the right place to look first when something feels off; they are the days a future calibration pass needs to either explain or absorb.

Biggest disagreements between the model and the day’s actual checklist diversity, ranked by standardized residual. Under-predictions first: days where actual diversity landed far above the forecast.

| date | predicted | verdict | actual | amplifiers / vetos |
|---|---|---|---|---|
| 2024-05-18 | 6.74 | GO | 106 | — |
| 2025-04-29 | 5.37 | MARGINAL | 82 | — |
| 2025-05-21 | 4.92 | MARGINAL | 74 | — |
| 2025-05-22 | 4.67 | MARGINAL | 64 | — |
| 2025-05-23 | 4.83 | MARGINAL | 66 | — |
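The bin-honesty check (mean species falling monotonically from DEFINITELY_GO to SKIP) is mechanical enough to script. A minimal sketch, using the means from the calibration table; the verdict ladder ordering is the one the report implies.

```python
# Bin-honesty check: mean actual species should fall monotonically as the
# verdict worsens. Means are taken from the calibration table above.
VERDICT_ORDER = ["DEFINITELY_GO", "GO", "MARGINAL", "SKIP"]
mean_species = {
    "DEFINITELY_GO": 77.6,
    "GO": 66.1,
    "MARGINAL": 43.8,
    "SKIP": 31.7,
}

means = [mean_species[v] for v in VERDICT_ORDER]
monotone = all(a > b for a, b in zip(means, means[1:]))
print("bins walk downward:", monotone)
```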
Over-predictions: days where actual diversity landed far below the forecast.

| date | predicted | verdict | actual | amplifiers / vetos |
|---|---|---|---|---|
| 2024-05-12 | 8.03 | DEFINITELY_GO | 36 | +peak_week_geometry |
| 2025-05-12 | 8.27 | DEFINITELY_GO | 56 | +peak_week_geometry |
| 2025-05-18 | 6.47 | GO | 31 | — |
| 2024-05-16 | 7.84 | DEFINITELY_GO | 52 | +peak_week_geometry |
| 2024-04-26 | 5.36 | MARGINAL | 17 | — |