December 19, 2025
Why Stability Matters When Evaluating Goalkeeper Performance

When we talk about goalkeeper performance, analysts often jump straight to outcomes: goals conceded, save percentage, or - when feeling fancy - post-shot xG minus goals allowed. These metrics are intuitive and familiar, but there's a question we don't ask often enough: do they actually describe repeatable goalkeeper skill?
One useful way to approach that question is season-to-season stability. If a metric is meant to capture something intrinsic about a goalkeeper - technique, reflexes, or positioning - we would expect it to be at least somewhat consistent from one season to the next. Metrics that fluctuate heavily are more likely to reflect context, randomness, or short-term results than underlying skill.
To explore this, I examined the season-to-season stability of several goalkeeper metrics using consistent filters and a 900-minute minimum threshold, focusing on goalkeepers with meaningful playing time in consecutive seasons. The logic is simple: if a metric reflects goalkeeper skill, it should describe roughly the same goalkeeper a season later.
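To make that setup concrete, here is a minimal sketch in Python of how qualifying goalkeeper-season pairs could be built. The table layout and column names ("player", "season", "minutes") are assumptions for illustration, not the actual data schema used for the study.

```python
import pandas as pd

# Hypothetical goalkeeper-season table: one row per goalkeeper per season.
# Column names ("player", "season", "minutes") are illustrative only.
def build_season_pairs(df: pd.DataFrame, metric: str, min_minutes: int = 900) -> pd.DataFrame:
    """Pair each qualifying goalkeeper-season with the same goalkeeper's next season."""
    eligible = df[df["minutes"] >= min_minutes].sort_values(["player", "season"])
    # Shift each goalkeeper's rows up by one so season t lines up with season t+1.
    nxt = eligible.groupby("player")[["season", metric]].shift(-1)
    pairs = eligible.assign(next_season=nxt["season"], next_value=nxt[metric])
    # Keep only genuinely consecutive seasons (seasons encoded as integer start years).
    return pairs[pairs["next_season"] == pairs["season"] + 1][
        ["player", "season", metric, "next_value"]
    ]
```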
All metrics are calculated at the goalkeeper–season level:
- SFONS% - Percentage of shots on target saved.
- PSxG-GAp90 - PSxG minus goals allowed per 90 minutes.
- SFSAE% - Percentage of shots on target where the goalkeeper received a positive shot-stopping grade.
- SFGKM% - Percentage of shots on target where the goalkeeper received a negative shot-stopping grade.
- SFPM% - Percentage of shots on target where the goalkeeper received a negative positioning grade.
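Since three of these metrics are rates of graded shot-stopping actions, a small sketch may help show how shot-level grades roll up to the goalkeeper-season level. The shot table and its boolean grade columns below are hypothetical stand-ins, not the actual grading schema.

```python
import pandas as pd

# Hypothetical shot-level table: one row per shot on target faced, with boolean
# grade flags. Column names are illustrative, not the real grading schema.
def season_grade_rates(shots: pd.DataFrame) -> pd.DataFrame:
    """Aggregate shot-level grades into goalkeeper-season percentages."""
    grouped = shots.groupby(["player", "season"])
    return pd.DataFrame({
        "SFSAE%": grouped["positive_shot_stopping"].mean() * 100,  # share with a positive shot-stopping grade
        "SFGKM%": grouped["negative_shot_stopping"].mean() * 100,  # share with a negative shot-stopping grade
        "SFPM%": grouped["negative_positioning"].mean() * 100,     # share with a negative positioning grade
    }).reset_index()
```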
Our grading framework is execution-based, focusing on the quality of the goalkeeper's actions - their positioning, footwork, and technique - rather than the binary outcome of goal or save, which can be determined by inches and luck.
I used Spearman's ρ to measure the correlation between a goalkeeper's value on each metric in one season and the next. Values closer to 1 indicate a strong positive relationship (the metric is stable), while values near 0 suggest little to no relationship. The 95% confidence intervals show the range in which we can be reasonably certain the true correlation lies: if an interval includes zero, we cannot confidently say the metric is stable.
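The post does not spell out how those intervals were computed; one common choice, sketched below, is a bootstrap over the goalkeeper season-to-season pairs. This is an assumption for illustration, not necessarily the exact method used here.

```python
import numpy as np
from scipy.stats import spearmanr

def spearman_with_ci(x: np.ndarray, y: np.ndarray, n_boot: int = 10_000,
                     seed: int = 0) -> tuple[float, float, float]:
    """Spearman's rho between season-t and season-t+1 values, with a bootstrap 95% CI."""
    rho, _ = spearmanr(x, y)
    rng = np.random.default_rng(seed)
    n = len(x)
    boots = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # resample goalkeeper pairs with replacement
        boots[b], _ = spearmanr(x[idx], y[idx])
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return rho, lo, hi
```

If the resulting interval excludes zero, the metric's year-to-year correlation is unlikely to be pure noise, which is exactly the standard applied to the metrics below.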
Outcome-based metrics such as save percentage and post-shot xG relative to goals conceded show very low season-to-season stability. This is not a flaw in those metrics, but a reflection of how heavily they depend on rare goal events, context, and short-term variance.
In contrast, one of our execution-based process metrics paints a clearer picture: avoiding mistakes emerges as the most repeatable goalkeeper skill. The rate of negative shot-stopping actions (SFGKM%) is the only metric whose season-to-season correlation is statistically distinguishable from zero, even with a modest sample of 58 goalkeeper season-to-season pairs.
In our dataset, which includes four Premier League seasons and two Bundesliga seasons, many goalkeepers produce individual seasons with low error rates. Far fewer, however, demonstrate sustained mistake avoidance across multiple campaigns. By that standard, David Raya, Alisson, and Bernd Leno emerge as the strongest performers, consistently recording low SFGKM% values over time.
The takeaway is not that making good saves or outperforming expected goals doesn’t matter. Rather, when evaluating goalkeeper skill, the ability to consistently avoid errors appears to be more stable - and therefore more informative - than producing standout outcomes in a single season.
Stability is not the only criterion for evaluating performance, but it provides a useful lens for distinguishing which signals are most likely to travel with the goalkeeper and which are more dependent on context.

