The Football Forecast relies on Opta data via fbref as its foundation, focusing on two core metrics: GxG and GAxGA. These figures blend actual performance with underlying expected values, including xG and npxG (non-penalty expected goals). For more information on Expected Goals, check out my blog post: xG Explained: What Are Expected Goals?
This approach means that even when a team loses “undeservedly”, its strength rating can still rise, improving its projected position in the table. After each match, the home and away teams are each assigned an initial GxG and GAxGA value based purely on raw stats, ignoring key factors like opponent strength and home advantage. In the next step, these values are adjusted for the league’s typical home advantage and for the opponent’s strength.
These adjustments can change the reading of a result considerably. A raw GxG value of 1 recorded away at Arsenal’s Emirates Stadium becomes an adjusted value of 1.82, while a GxG of 1 at home against Southampton is reduced to about 0.6. This highlights how much easier it is to score against weaker defenses.
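To make the idea concrete, here is a minimal sketch of one way such an adjustment could work. The exact formula behind the forecast is not spelled out here, so the function name, the multiplicative form and the example factors are all assumptions for illustration only.

```python
def adjust_gxg(raw_gxg, opponent_defence_ratio, venue_factor):
    """Scale a raw attacking value by opponent strength and venue.

    Hypothetical formula, not the forecast's actual one.
    opponent_defence_ratio: opponent's conceding rate relative to the
        league average (below 1 = stronger defence than average).
    venue_factor: above 1 when playing at home, below 1 when away.
    """
    return raw_gxg / (opponent_defence_ratio * venue_factor)


# Illustrative factors only: a strong defence faced away from home
# inflates the raw value, a weak defence faced at home deflates it.
tough_away = adjust_gxg(1.0, opponent_defence_ratio=0.6, venue_factor=0.9)
easy_home = adjust_gxg(1.0, opponent_defence_ratio=1.5, venue_factor=1.1)
```

The direction of the effect is the point here: the harder the circumstances, the more a given raw value is worth after adjustment.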
Refining the Numbers
Once the GxG and GAxGA values have been calculated, an exponential moving average (EMA) is applied to produce trailing values. Every game a team has played is taken into account, with the most recent games carrying the highest weight. This ensures that the model stays up to date with a team’s current form.
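As a rough sketch, a trailing EMA over a team's adjusted match values could look like this. The smoothing factor `alpha` is an assumed parameter; the value the forecast actually uses is not stated here.

```python
def trailing_ema(values, alpha=0.1):
    """Exponential moving average over match values, oldest first.

    alpha is an assumed smoothing factor: the higher it is, the more
    weight the most recent matches receive.
    """
    if not values:
        raise ValueError("need at least one match value")
    ema = values[0]
    for value in values[1:]:
        # Each new match pulls the average towards its own value.
        ema = alpha * value + (1 - alpha) * ema
    return ema


# Hypothetical adjusted GxG values for one team, oldest match first.
form = trailing_ema([0.8, 1.1, 1.4, 1.9], alpha=0.3)
```

Older matches never drop out completely; their weight just shrinks by a factor of `1 - alpha` with every new game.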
The latest trailing values for each team are then normalized around 1. For example, a team with a GxG ratio of 0.9 and a GAxGA ratio of 0.9 scores 10% less than the current league average, but also concedes 10% fewer goals than the current league average. Union Berlin provides an interesting case study: with a GxG ratio of around 0.59, they rank dead last in the Bundesliga in attack, yet they have the fourth-best defense, which balances things out into a stable mid-table projection.
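One simple way to normalize around 1 is to divide every team's value by the league mean. This is an assumption about the method, and the team names and numbers below are made up for illustration.

```python
def normalise(values):
    """Divide each team's value by the league mean so the average is 1."""
    mean = sum(values.values()) / len(values)
    return {team: value / mean for team, value in values.items()}


# Hypothetical trailing GxG values for a three-team league.
ratios = normalise({"Team A": 1.8, "Team B": 1.2, "Team C": 0.6})
```

A ratio above 1 then means a team scores (or, for GAxGA, concedes) more than the league average, and a ratio below 1 means less.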
With the final ratio values in place, simulations for the remaining league games can begin.
Simulation Process
Once the final offensive and defensive values are assigned to each team, they are plugged into a Monte Carlo simulation. Each league is simulated 20,000 times, with goals drawn from a Poisson distribution. The figure of 20,000 strikes a solid balance, generating reliable projections without excessive computing time.
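The sketch below shows the general shape of such a simulation using only the Python standard library. It is not the forecast's actual code: the way scoring rates are built from the ratios, the `avg_goals` value and the tie-break are simplifying assumptions, and home advantage is left out entirely.

```python
import math
import random
from collections import Counter


def poisson_sample(rate, rng):
    """Draw one Poisson-distributed goal count (Knuth's algorithm)."""
    threshold = math.exp(-rate)
    goals, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return goals
        goals += 1


def simulate_title_odds(ratings, fixtures, points, n_sims=20_000,
                        avg_goals=1.4, seed=0):
    """Estimate title probabilities from the remaining fixtures.

    ratings: team -> (attack ratio, defence ratio), both normalised around 1.
    fixtures: list of (home, away) pairs still to be played.
    points: team -> points already collected.
    avg_goals is an assumed league-average goals per team per match.
    """
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        table = dict(points)
        for home, away in fixtures:
            # Assumed rate: league average x own attack x opponent defence.
            home_rate = avg_goals * ratings[home][0] * ratings[away][1]
            away_rate = avg_goals * ratings[away][0] * ratings[home][1]
            home_goals = poisson_sample(home_rate, rng)
            away_goals = poisson_sample(away_rate, rng)
            if home_goals > away_goals:
                table[home] += 3
            elif home_goals < away_goals:
                table[away] += 3
            else:
                table[home] += 1
                table[away] += 1
        # Simplification: ties on points go to the team listed first.
        titles[max(table, key=table.get)] += 1
    return {team: titles[team] / n_sims for team in points}


# Hypothetical two-team league with two fixtures left.
demo_ratings = {"Strong FC": (1.5, 0.7), "Weak FC": (0.7, 1.4)}
demo_fixtures = [("Strong FC", "Weak FC"), ("Weak FC", "Strong FC")]
demo_points = {"Strong FC": 0, "Weak FC": 0}
odds = simulate_title_odds(demo_ratings, demo_fixtures, demo_points,
                           n_sims=2_000)
```

A real league table would also need goal difference and other tie-breakers, but the loop structure stays the same.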
The image below illustrates how a Poisson distribution appears in practice, showing how different probabilities are assigned to potential goal outcomes based on a given scoring rate.
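In numbers, the probability of scoring exactly k goals at a given scoring rate follows the standard Poisson formula. The small sketch below evaluates it; the rate of 1.5 goals per match is just an example value.

```python
import math


def poisson_pmf(goals, rate):
    """Probability of scoring exactly `goals` at an average of `rate`."""
    return math.exp(-rate) * rate ** goals / math.factorial(goals)


# Probabilities of 0 to 5 goals for an example rate of 1.5 goals per match.
goal_probs = {k: poisson_pmf(k, 1.5) for k in range(6)}
```

Low scores dominate at typical football scoring rates, and the probabilities fall off quickly as the goal count rises.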
After running the simulations, results are compiled for every team. This makes it possible to determine probabilities of winning the title, qualifying for the UEFA Champions League, or facing relegation. At the beginning of the season, everything is still possible, with multiple contenders for every scenario. As the season progresses and matchdays unfold, these probabilities become more and more precise.
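Compiling the results comes down to counting how often each team ends up in a given part of the table. Here is a minimal sketch, assuming each simulated season is stored as a list of teams ordered from first to last; the number of Champions League and relegation places varies by league, so both are parameters with assumed defaults.

```python
from collections import Counter


def outcome_probabilities(simulated_rankings, cl_spots=4, relegation_spots=3):
    """Turn simulated final tables into per-team outcome probabilities.

    simulated_rankings: one list per simulated season, teams ordered
    from first place to last. cl_spots and relegation_spots are assumed
    defaults and differ between leagues.
    """
    n_sims = len(simulated_rankings)
    title, champions_league, relegation = Counter(), Counter(), Counter()
    for ranking in simulated_rankings:
        title[ranking[0]] += 1
        champions_league.update(ranking[:cl_spots])
        relegation.update(ranking[-relegation_spots:])
    return {
        team: {
            "title": title[team] / n_sims,
            "champions_league": champions_league[team] / n_sims,
            "relegation": relegation[team] / n_sims,
        }
        for team in simulated_rankings[0]
    }


# Two hypothetical simulated seasons in a four-team league.
summary = outcome_probabilities(
    [["A", "B", "C", "D"], ["B", "A", "C", "D"]],
    cl_spots=2,
    relegation_spots=1,
)
```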
Football League Forecast & Prediction | StatsUltra