What is a Monte Carlo simulation?

A Monte Carlo simulation tests thousands of plausible versions of an uncertain outcome instead of claiming to know the one true result. For the World Cup, the model plays the whole tournament out repeatedly, resolving every match, group-stage permutation and knockout bracket, then reports how often each team wins across all the runs.

How many simulations do you need for a reliable forecast?

It depends on the model. The World Cup forecast settles into a stable range at around 5,000 runs, where the first version's 1,000 runs still swung between batches. A more complex model might need 100,000 runs or a million. The goal is a result stable enough to support a useful decision, not the largest possible number.

Do I need a World Cup dataset to build what-if scenarios?

No. Football just makes the idea easy to follow. The same pattern works on energy use, revenue and margin, or benchmarking data. The underlying data does not need to contain probabilities; the probabilities come from the model placed on top of it.

How long does it take to build something like this?

The move from Luzmo's Euro 2024 predictor to a World Cup version with interactive scenarios took around five days. AI-assisted development sped up the build, while a calculation engine ported to Rust handled the heavy work, running 5,000 full tournaments in roughly two seconds.

Who wins the World Cup? We simulated it over 7 million times

Midway through the last day of the group stage of World Cup 2026, our baseline forecast had France beating Argentina in the final and England losing the third-place match to Spain.

That is the safe prediction.

The more interesting part starts when someone changes an assumption.

What happens if the USA can select any player currently playing in MLS, rather than only US nationals? In a recent Luzmo office-hours session, Haroen tested exactly that. One scenario change later, the USA wins Group D, reaches the semi-finals, loses to France, and moves into third place for overall tournament win probability.

Its chance of winning rises by 11 percentage points. No other team moves as much.

That is what makes the World Cup simulator more than a prediction tool. It gives people something to question, change and explore. Instead of accepting one forecast, they can test the conditions behind it.

The project grew out of a smaller Euro 2024 predictor, complete with an AI octopus calling results. For World Cup 2026, the main feature is not the mascot but the ability to rerun an entire tournament almost instantly after someone asks a what-if question.

And that idea reaches far beyond football.

It starts with one match

Everything begins with a single fixture.

Take USA versus Australia, the USA's second group-stage match. In the baseline model, the USA has a 55% chance of winning. A representative scoreline is 2–0 (just like the actual score!).

That percentage comes from a mix of team-level and player-level data.

For every squad, the model looks at roughly two years of performance data for selected players across national-team and club matches. The exact signals depend on the position. A defender's contribution is not judged in the same way as an attacker's. The model considers things like defensive actions, passing quality, assists, scoring output and expected minutes.

It also adjusts for what happens during the tournament itself. A player who has started every match may carry more fatigue. An injury can reduce availability. Travel, altitude and conditions can affect how a team performs.

Those individual inputs roll up into offensive and defensive ratings for each team.

The scoring model reflects a simple football truth. A single weak defender can cost a side a goal. One exceptional attacker can decide a match with one moment of quality.

So the model gives extra weight to the weakest defensive link and the strongest attacking one.
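One way to picture that weighting is a blend of the squad average with the single best or worst player. The sketch below is only an illustration: the function names, the 0–100 player scores and the 0.5 weights are assumptions, not the formula the simulator actually uses.

```python
def attack_rating(attacker_scores, star_weight=0.5):
    """Blend the squad average with the strongest attacker's score.

    star_weight is a hypothetical tuning knob: 0 ignores the star player,
    1 rates the attack purely on its best player.
    """
    average = sum(attacker_scores) / len(attacker_scores)
    return (1 - star_weight) * average + star_weight * max(attacker_scores)


def defence_rating(defender_scores, weak_link_weight=0.5):
    """Blend the squad average with the weakest defender's score."""
    average = sum(defender_scores) / len(defender_scores)
    return (1 - weak_link_weight) * average + weak_link_weight * min(defender_scores)


# Made-up player scores, for illustration only.
attack = attack_rating([60, 70, 90])    # pulled up towards the 90
defence = defence_rating([80, 80, 50])  # pulled down towards the 50
```

With these made-up numbers, one 90-rated attacker lifts the attack well above the squad average, while one 50-rated defender drags the defence well below it.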

Once both teams have a rating, the system estimates expected goals (xG) for each side. An xG value of two does not mean a team will score exactly twice. It means that across many versions of the same match, two goals is the average outcome.

Some simulations end 0–0. Others end 3–1. A few become ridiculous.

That is football.
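A common way to turn expected goals into actual scorelines is to draw each side's goals from a Poisson distribution. The article does not say which distribution the simulator uses, so treat the sketch below as an assumption; the xG values of 2.0 and 0.9 are made up for the example.

```python
import math
import random


def sample_goals(xg, rng):
    """Draw a goal count from a Poisson distribution with mean xg (Knuth's method)."""
    threshold = math.exp(-xg)
    goals = 0
    product = rng.random()
    while product > threshold:
        goals += 1
        product *= rng.random()
    return goals


def simulate_match(xg_home, xg_away, rng):
    """Return one simulated scoreline as (home_goals, away_goals)."""
    return sample_goals(xg_home, rng), sample_goals(xg_away, rng)


rng = random.Random(42)  # fixed seed so the example is repeatable
scores = [simulate_match(2.0, 0.9, rng) for _ in range(10_000)]

mean_home_goals = sum(home for home, _ in scores) / len(scores)
home_win_share = sum(home > away for home, away in scores) / len(scores)
print(f"average home goals: {mean_home_goals:.2f}")
print(f"home win share: {home_win_share:.1%}")
```

Individual scorelines vary widely, but the average home goal count across the 10,000 simulated matches lands close to the xG of 2.0.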

Why the tournament needs simulation

One match's outcome can be modelled with a relatively simple statistical distribution, but a full World Cup cannot.

There are group-stage permutations, injuries, travel effects, third-place qualification rules and knockout brackets that depend on results elsewhere. A single early upset can change the path for several teams.

Trying to reduce all of that to one tidy calculation would miss too much.

So the model does something more practical: it plays the tournament out repeatedly.

Each run simulates every match, moves teams through the group stage, resolves qualification rules, fills the knockout bracket and produces a winner. Repeat that enough times, and the results start to converge and reveal the pattern.

France might win 22% of simulated tournaments. England might win 17%. Spain might reach the semi-finals more often than Argentina.

That is a Monte Carlo simulation: rather than claiming to know the one true future, it tests thousands of plausible versions of it.
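The loop itself is short. Here is a minimal sketch, assuming a four-team knockout with made-up ratings and an Elo-style win probability; the real simulator also handles groups, third-place rules and 104 matches per run, none of which is modelled here.

```python
import random
from collections import Counter


def win_probability(rating_a, rating_b):
    """Elo-style logistic curve: chance that team A beats team B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def play_knockout(teams, ratings, rng):
    """Play one single-elimination bracket and return the champion."""
    remaining = list(teams)
    while len(remaining) > 1:
        next_round = []
        # Pair neighbours in the bracket: (0, 1), (2, 3), ...
        for a, b in zip(remaining[::2], remaining[1::2]):
            p = win_probability(ratings[a], ratings[b])
            next_round.append(a if rng.random() < p else b)
        remaining = next_round
    return remaining[0]


def simulate(teams, ratings, runs, seed=0):
    """Run the bracket many times and return each team's share of titles."""
    rng = random.Random(seed)
    wins = Counter(play_knockout(teams, ratings, rng) for _ in range(runs))
    return {team: wins[team] / runs for team in teams}


# Hypothetical teams and ratings, for illustration only.
ratings = {"A": 2000, "B": 1900, "C": 1850, "D": 1700}
shares = simulate(list(ratings), ratings, runs=5000)
for team, share in shares.items():
    print(f"{team}: {share:.1%}")
```

Each run produces exactly one champion, so the shares always sum to 100%. The forecast is simply the tally.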

For the Euro 2024 predictor, a full run took five to ten minutes for 5,000 to 10,000 iterations. That was fine for a static dashboard, but it wasn't fast enough for someone waiting for an answer to a new scenario.

For the World Cup version, Haroen ported the core calculation engine to Rust, the same technology behind Luzmo's query engine.

[Video: Luzmo Office Hours: World Cup 2026 AI Dashboard with Haroen]

Now the system can simulate 5,000 full tournaments, covering 104 matches each, in roughly two seconds.

That changes the experience completely.

A user doesn't need to submit a request and come back later. They can ask a question, rerun the tournament and continue the conversation while the idea is still interesting.

Why 5,000 runs is enough

More simulations usually mean more stable results. The first few hundred runs can vary sharply. One batch may make England look stronger than France, but the next might reverse the order.

At a certain point, the results stop moving enough to matter.
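The standard error of a simulated share gives a rough feel for where that point sits. For a team that wins a fraction p of n independent runs, the standard error is sqrt(p(1 - p) / n). The sketch below applies that textbook formula to an illustrative 22% win share; it is a rule of thumb, not a description of how the simulator decides its run count.

```python
import math


def standard_error(p, runs):
    """Standard error of a share p estimated from `runs` independent simulations."""
    return math.sqrt(p * (1 - p) / runs)


for runs in (1_000, 5_000, 100_000):
    margin = 1.96 * standard_error(0.22, runs)  # approximate 95% interval half-width
    print(f"{runs:>7} runs: 22% +/- {margin:.1%}")
```

At 1,000 runs the margin is roughly 2.6 percentage points, enough for two closely matched teams to swap places between batches. At 5,000 runs it drops to about 1.1 points.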

The first version of the predictor used 1,000 simulations. That still produced visible swings between runs. At 5,000, the probabilities settled into a stable range for this model.

That does not make 5,000 a universal rule.

A more complex model may need 100,000 runs. Another may need a million. The right number depends on the question, the number of variables and the level of confidence people need from the result.

In some cases, the engine could even stop dynamically. Once the difference between runs falls below an agreed threshold, there is little reason to keep calculating.
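A dynamic stop could look something like the sketch below. It assumes a hypothetical `simulate_once` callable that plays one tournament and returns the winner, and it checks after every batch whether any team's cumulative share moved by more than a tolerance. The batch size and the 0.5-point tolerance are arbitrary choices for the example.

```python
import random
from collections import Counter


def run_until_stable(simulate_once, batch_size=1000, tolerance=0.005, max_runs=1_000_000):
    """Run batches until no team's cumulative win share moves by more than `tolerance`."""
    wins = Counter()
    runs = 0
    previous = {}
    while runs < max_runs:
        for _ in range(batch_size):
            wins[simulate_once()] += 1
        runs += batch_size
        current = {team: count / runs for team, count in wins.items()}
        teams = set(current) | set(previous)
        shift = max(abs(current.get(t, 0.0) - previous.get(t, 0.0)) for t in teams)
        if previous and shift < tolerance:
            return current, runs
        previous = current
    return previous, runs


# Stand-in for a real tournament: pick a winner with fixed, made-up odds.
rng = random.Random(1)


def toy_tournament():
    return rng.choices(["A", "B", "C"], weights=[5, 3, 2])[0]


shares, runs_used = run_until_stable(toy_tournament)
print(f"stopped after {runs_used} runs: {shares}")
```

The `max_runs` cap matters: without it, a tolerance set too tight for the model would keep the engine running indefinitely.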

The important part is not chasing the largest possible number. It is reaching a result stable enough to support a useful decision.

A forecast that changes when reality does

The World Cup model is not frozen on opening day.

Team strength starts with an Elo rating, similar to the system used in chess. Beat a stronger opponent and your rating rises more. Beat a weaker side and it barely changes. Lose unexpectedly and the drop is sharper.
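The standard Elo update captures all three behaviours in two lines. The sketch below uses the classic formula; the K-factor of 20 and the example ratings are assumptions, and the simulator's exact variant is not described in this article.

```python
def expected_score(rating, opponent):
    """Probability-like expected result for `rating` against `opponent`."""
    return 1 / (1 + 10 ** ((opponent - rating) / 400))


def update(rating, opponent, score, k=20):
    """Return the new rating. score: 1 for a win, 0.5 for a draw, 0 for a loss."""
    return rating + k * (score - expected_score(rating, opponent))


underdog, favourite = 1800, 2000  # made-up ratings

upset_gain = update(underdog, favourite, 1) - underdog      # underdog wins
routine_gain = update(favourite, underdog, 1) - favourite   # favourite wins
upset_loss = update(favourite, underdog, 0) - favourite     # favourite loses

print(f"underdog wins:   {upset_gain:+.1f}")
print(f"favourite wins:  {routine_gain:+.1f}")
print(f"favourite loses: {upset_loss:+.1f}")
```

The adjustment scales with how surprising the result was: the underdog gains about three times as many points for the upset as the favourite gains for the expected win.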

Haroen combines that team rating with player-level performance data at roughly a 40/60 split.

The player data matters because it keeps scenarios grounded.

A team rating alone can tell you that France is strong. It cannot reliably explain what happens when a specific player joins a different squad, misses a match or returns from injury.

The model can.

It also accounts for factors that matter in this tournament. Some matches take place at high altitude. Mexico City sits more than 2,000 metres above sea level, which can affect teams that are not acclimatised. Heat matters. So does travel. Host nations also receive a home-advantage adjustment.

None of those factors need to remain fixed.

They are inputs to the model. Someone can reduce fatigue effects, change player availability, adjust home advantage or test a different squad.

The system reruns the tournament with those changes in place.

The underlying source data also refreshes daily. Elo ratings, injuries, player availability and actual match results all feed back into the baseline forecast. After Spain drew with Cabo Verde, the model reduced Spain's outlook and France moved into first place.

That is what makes the forecast feel alive rather than decorative.

Turning a question into a scenario

The interactive layer uses a small AI agent with a focused set of tools.

A user writes a plain-English question. Something like:

"What would happen if the USA could select any MLS player?"

A language model (we used an OpenAI model for this app) interprets that request and creates a scenario patch: a small set of changes to the main source data. In the MLS example, that included named players moving into the squad, plus an adjustment to morale.

The system applies those changes, reruns the model and compares the outcome with the original forecast.

Then it explains the difference.
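To make the patch idea concrete, here is a minimal sketch of that apply, rerun and compare flow. The data shape, the field names (`add_players`, `remove_players`, `morale_delta`) and the player names are all hypothetical; the article does not show the simulator's real schema.

```python
import copy


def apply_patch(baseline, patch):
    """Return a patched copy of the baseline; the baseline itself is not modified."""
    scenario = copy.deepcopy(baseline)
    for team, changes in patch.items():
        squad = scenario[team]
        removed = changes.get("remove_players", [])
        squad["players"] = [p for p in squad["players"] if p not in removed]
        squad["players"].extend(changes.get("add_players", []))
        squad["morale"] += changes.get("morale_delta", 0.0)
    return scenario


def probability_shift(baseline_probs, scenario_probs):
    """Per-team change in win probability between two forecasts."""
    return {team: scenario_probs[team] - baseline_probs[team] for team in baseline_probs}


# Hypothetical baseline data and patch, for illustration only.
baseline = {"USA": {"players": ["Player A", "Player B"], "morale": 1.0}}
patch = {"USA": {"add_players": ["Player C"], "morale_delta": 0.05}}

scenario = apply_patch(baseline, patch)
print(scenario["USA"])
```

Keeping the baseline untouched is the important design choice here. Every scenario starts from the same source data, so the comparison with the original forecast stays honest and the patch itself can be shown to users who want to inspect it.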

In the MLS scenario, the USA does not simply become "better." The system can show where the change matters most. It highlights matches against Australia and Turkey as the biggest probability shifts. It can trace the USA's projected route through the tournament. It can also expose the underlying scenario data for users who want to inspect it.

That is an important detail.

The AI doesn't invent a narrative around a fixed answer. It changes controlled inputs, triggers a real calculation, then explains the result. That leaves room for curiosity without turning the experience into a black box.

It also means the team did not need to predict every possible question in advance. Give people the right data, a fast model and a way to ask natural questions, and they will find the interesting corners themselves.

As Haroen put it, users are often the ones who make it fun and discover the limits of what is possible.

You don't need a World Cup dataset

Football makes the idea easy to understand because the data is familiar. Players, goals and tournament brackets give people something concrete to follow.

But the same pattern works with much less glamorous datasets.

Imagine a company that tracks energy use across buildings. It already knows how much electricity, heating and occupancy affect emissions. Add a calculation model, and customers can test questions like:

"What happens if we reduce office use to four days a week?"

Or take a finance platform. A customer may already see revenue, operating costs and gross margin. A sensitivity model could show how margin changes when revenue comes in 5% lower than planned.

That is more useful than a static dashboard because the customer can test the pressure points themselves.
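A sensitivity model of that kind can be very small. The sketch below assumes made-up figures and a hypothetical `variable_cost_share` parameter describing how much of the cost base moves with revenue; a real platform would take both from the customer's own data.

```python
def gross_margin(revenue, costs):
    """Margin as a fraction of revenue."""
    return (revenue - costs) / revenue


def margin_after_revenue_change(revenue, costs, change, variable_cost_share=0.5):
    """Margin if revenue moves by `change` (e.g. -0.05 for 5% lower).

    variable_cost_share is the assumed fraction of costs that scales with
    revenue; the rest is treated as fixed.
    """
    new_revenue = revenue * (1 + change)
    fixed = costs * (1 - variable_cost_share)
    variable = costs * variable_cost_share * (1 + change)
    return gross_margin(new_revenue, fixed + variable)


# Made-up plan figures, for illustration only.
revenue, costs = 1_000_000, 600_000
print(f"planned margin: {gross_margin(revenue, costs):.1%}")
for change in (-0.10, -0.05, 0.05):
    margin = margin_after_revenue_change(revenue, costs, change)
    print(f"revenue {change:+.0%}: margin {margin:.1%}")
```

With these example numbers, a 5% revenue shortfall moves the margin from 40% to roughly 38.4%. Letting the customer change `variable_cost_share` themselves is where the exploration starts.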

Benchmarking opens another route. A platform with enough anonymised customer data could show where each client sits against comparable businesses. Not vague claims about "above average," but a view of whether they land in the top 10%, middle range or lower end for a given metric.

That can become part of the product, not just a report.
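The calculation behind such a view is a percentile rank. The sketch below assumes a simple list of anonymised peer values where higher is better, and the band cut-offs (90 and 25) are arbitrary choices for the example.

```python
from bisect import bisect_left


def percentile_rank(value, peer_values):
    """Percentage of peers with a strictly lower value."""
    ordered = sorted(peer_values)
    below = bisect_left(ordered, value)
    return 100 * below / len(ordered)


def band(rank):
    """Map a percentile rank to a label; the thresholds are illustrative."""
    if rank >= 90:
        return "top 10%"
    if rank >= 25:
        return "middle range"
    return "lower end"


# Made-up peer metric values, for illustration only.
peers = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
for client_value in (95, 55, 15):
    rank = percentile_rank(client_value, peers)
    print(f"value {client_value}: percentile {rank:.0f}, {band(rank)}")
```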

The underlying data does not need to contain probabilities. The probabilities come from the model placed on top of it.

The real payoff is trust

Most business forecasts come with a disclaimer.

"Give or take 5%."

"It depends on market conditions."

"Results may vary."

Those statements are often true. They are also not very helpful.

A model can show what the uncertainty actually looks like.

It can show a range of outcomes, reveal which inputs create the biggest swings and make assumptions visible. That gives customers something far more useful than a single number with a vague caveat attached.

They can see what changes the result.

They can challenge it.

They can understand where confidence comes from.

That tends to build more trust, not less. Especially when the model doesn't pretend to predict the future perfectly.

How to start

The World Cup simulator didn't take months of planning.

Haroen said the move from the Euro predictor to a World Cup version with interactive scenarios took around five days. AI-assisted development helped speed up the build, while the calculation engine handled the heavy work.

The dataset may be different for your company, but the starting point is similar.

Look at the data you already collect. Find one question customers ask repeatedly. Then ask whether a small model could turn that answer into something people can explore. If you want to talk it through, book a demo.

It could be a sensitivity analysis. A benchmark. A forecast that reacts to changing inputs. A planning tool that makes trade-offs clearer.

Start small.

The first World Cup version did not include every feature users now see. The bracket view and "most surprising scenarios" panel came later, after people began asking for them.

That is usually the right order.

Build something useful. Put it in front of users. Let their questions show you what to add next.

The World Cup simulator happens to be about football. The larger idea is about giving people a way to play with the data that already matters to them.

And sometimes the best product insight starts with someone asking a slightly ridiculous question.