WillWin · Method
A technical walk-through

Method

How a single language model, reading ten public sources, becomes a number on this page.

The nightly pipeline

Every night at 04:30 UTC a single cron job runs end-to-end. No human touches it. No cloud-hosted model API is called; inference runs locally on a single GPU. The only writes are a commit to GitHub and an rsync to the web root.

[Figure: WillWin nightly pipeline diagram. Stages: 1 · Public sources (FIFA ranking, Elo, fixtures, squads, form / lineups, Opta prior, outright odds, news, squad notes, own history) → 2 · Prompt (48 teams · 48 rows, ~8 KB, structured JSON schema) → 3 · Model · 5 runs (qwen3-32b, temperature 0.3, ctx 8192, JSON mode, schema-validated) → 4 · Averager (pooled mean, renormalized Σ=1, weekly deltas) → 5 · Publish, no human in the loop (commit · push · build · rsync). Cadence: 04:30 UTC nightly cron, ~8 minutes end-to-end, fully idempotent; failing sources degrade, never block.]
Figure 1 · The full nightly pipeline, from public sources to the page you're reading.

Step by step

  1. Fetch the public record

    The job begins by pulling the latest monthly FIFA ranking, the daily eloratings.net table, the openfootball fixtures JSON, football-data.org squad announcements, TheSportsDB recent results, the most recent public Opta Analyst article, a handful of European bookmaker outrights, DuckDuckGo news snippets for each qualified team, and Wikipedia infoboxes for squad confirmations. All sources are fetched in parallel, each with a strict timeout. If a source is down or times out, the pipeline continues without it and flags the degradation in the prompt.

  2. Assemble a structured prompt

    The prompt is not free text. It is a table: one row per qualified team with columns for FIFA rank, Elo rating, manager, last-5 results, squad notes, news headlines, Opta's published probability, and the consensus bookmaker price. A short instruction block explains the tournament format and asks the model to return a strict JSON object matching a fixed schema. The prompt is capped at roughly 8 KB so the context window is never saturated.

  3. Run the model five times

    The prompt is sent five times, independently, to qwen3-32b, a 32-billion-parameter open-weight model from the Qwen team. We use temperature 0.3 and an 8,192-token context window. The model is run in strict JSON mode so every output conforms to the same schema. Each run takes roughly 70 seconds on a single consumer GPU.

    Why five? It is enough to smooth out sampling noise without spending the entire night on one forecast. Three runs feel unstable on mid-ranked teams; by ten, the returns have diminished.

  4. Average and renormalize

    For every team, we take the simple arithmetic mean of the five probabilities emitted by the model. The resulting vector of 48 means is then renormalized (each value divided by the vector's total) so the distribution sums to 1.0. This last step is important: any individual run can emit a distribution whose total is 3% over or under 1, and without renormalization the headline numbers would drift.

    The one-liners, dark-horse theses, and lede are taken from the highest-probability run rather than averaged, because averaging natural-language strings is nonsense.

  5. Append to history · compute deltas

    The averaged distribution is appended to history.json with today's date. The last sixty days are kept; everything older is dropped. Week and month deltas on the homepage come from this file: today's value minus the value from seven days ago (week) or thirty days ago (month), expressed in percentage points.

  6. Publish

    The updated JSON is committed to the public git repository, the static Astro site is rebuilt, and the output is rsynced to the Apache web root. There is no human in the loop and no cloud-hosted API anywhere on the path between the data and the page. If you reload the homepage a minute after the cron runs, you are reading the exact output of that job.
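
Step 1 above ("Fetch the public record") can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `fetch_all`, `degradation_note`, and the 10-second default timeout are hypothetical names and values, and each fetcher is assumed to be a zero-argument callable that returns parsed data or raises.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def fetch_all(
    fetchers: dict[str, Callable[[], Any]],
    timeout_s: float = 10.0,  # assumed value; the page only says "strict"
) -> tuple[dict[str, Any], list[str]]:
    """Run every fetcher in parallel; return (data, names of degraded sources)."""
    data: dict[str, Any] = {}
    degraded: list[str] = []
    pool = ThreadPoolExecutor(max_workers=max(1, len(fetchers)))
    futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
    deadline = time.monotonic() + timeout_s  # one shared deadline for all sources
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            data[name] = future.result(timeout=remaining)
        except Exception:
            # Timeout, network error, or bad payload: degrade, never block.
            degraded.append(name)
    # Do not wait for stragglers. A real fetcher should also set its own
    # socket timeout, because a running thread cannot be cancelled.
    pool.shutdown(wait=False, cancel_futures=True)
    return data, degraded


def degradation_note(degraded: list[str]) -> str:
    """Line to place in the prompt so the model knows which inputs are missing."""
    if not degraded:
        return "All sources available."
    return "Unavailable sources this run: " + ", ".join(sorted(degraded)) + "."
```

The shared deadline keeps the whole fetch stage bounded by one timeout rather than one timeout per source, which suits a job that is meant to finish in minutes.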
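
Step 2 above ("Assemble a structured prompt") describes a table with a size cap. The sketch below shows one way such a cap could be enforced. The instruction text, the `news` column name, and the strategy of trimming headlines first are all assumptions made for illustration; the page does not say how the real builder stays under roughly 8 KB.

```python
import json

# Placeholder instruction block; the real wording is not published on this page.
INSTRUCTIONS = (
    "Return a JSON object with one win probability per team; "
    "the values must sum to 1."
)


def build_prompt(rows: list[dict], degraded: list[str], max_bytes: int = 8192) -> str:
    """Serialize one row per team and trim headlines until the prompt fits."""
    rows = [dict(r) for r in rows]  # shallow copies, so the caller's rows are untouched

    def render() -> str:
        body = json.dumps(
            {"degraded_sources": degraded, "teams": rows},
            ensure_ascii=False,
            separators=(",", ":"),  # compact output saves bytes
        )
        return INSTRUCTIONS + "\n" + body

    prompt = render()
    while len(prompt.encode("utf-8")) > max_bytes:
        # Assumed policy: drop one headline from the row that has the most.
        longest = max(rows, key=lambda r: len(r.get("news", [])))
        if not longest.get("news"):
            raise ValueError("prompt exceeds the cap even without headlines")
        longest["news"] = longest["news"][:-1]
        prompt = render()
    return prompt
```

Measuring the encoded byte length rather than the character count matters here because team and player names often contain non-ASCII characters.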
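
The arithmetic in step 4 above ("Average and renormalize") is small enough to show in full. The function name `average_runs` and the dict-per-run layout are assumptions for this sketch; the logic is the per-team mean followed by division by the total, as the step describes.

```python
def average_runs(runs: list[dict[str, float]]) -> dict[str, float]:
    """Per-team arithmetic mean across runs, renormalized to sum to 1."""
    if not runs:
        raise ValueError("need at least one run")
    teams = list(runs[0])
    for run in runs:
        if set(run) != set(teams):
            raise ValueError("every run must cover the same teams")
    # Simple mean of each team's probability across the runs.
    means = {t: sum(run[t] for run in runs) / len(runs) for t in teams}
    total = sum(means.values())
    if total <= 0:
        raise ValueError("probabilities sum to zero")
    # Dividing by the total removes any drift in the individual runs.
    return {t: p / total for t, p in means.items()}
```

For example, two runs whose totals are 1.03 and 1.01 produce means that total 1.02; dividing each mean by 1.02 restores a distribution that sums to 1, up to floating-point rounding.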
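
Step 5 above ("Append to history · compute deltas") can be sketched the same way. The layout assumed here for history.json, a mapping from ISO date to a team-to-probability mapping, is a guess, as are the function names; the sixty-day window and the percentage-point deltas come from the step itself.

```python
from datetime import date, timedelta
from typing import Optional

History = dict[str, dict[str, float]]  # assumed layout: ISO date -> {team: probability}


def append_snapshot(
    history: History, today: date, dist: dict[str, float], keep_days: int = 60
) -> History:
    """Add today's distribution and drop entries older than the window."""
    updated = dict(history)
    updated[today.isoformat()] = dict(dist)
    cutoff = today - timedelta(days=keep_days)
    # Keep only dates strictly after the cutoff: today plus the 59 days before it.
    return {d: v for d, v in updated.items() if date.fromisoformat(d) > cutoff}


def delta_pp(history: History, today: date, team: str, days_back: int) -> Optional[float]:
    """Today's value minus the value `days_back` days ago, in percentage points."""
    now = history.get(today.isoformat())
    past = history.get((today - timedelta(days=days_back)).isoformat())
    if now is None or past is None or team not in now or team not in past:
        return None  # no snapshot to compare against
    return (now[team] - past[team]) * 100.0
```

Returning `None` when the older snapshot is missing lets the page show no delta instead of a misleading zero, for instance during the first week after launch.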

How the odds are generated

A language model is not a Monte-Carlo simulator. It does not run a million virtual tournaments. So how can it emit a probability at all? The honest answer is that the number represents the model's calibrated confidence, grounded in the public data it just read, that a given team will lift the trophy on July 19, 2026. The model is instructed to output a number between 0 and 1 for every qualifier, such that the 48 numbers sum to 1.
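
The constraint described above (one number between 0 and 1 per qualifier, summing to 1) is easy to check before a run is accepted. The sketch below assumes a payload of the form `{"probabilities": {team: value}}` and a 5% tolerance on the total; both are hypothetical, since the real schema is not shown on this page.

```python
import json


def parse_run(raw: str, teams: list[str], tol: float = 0.05) -> dict[str, float]:
    """Parse one model output and reject it if it breaks the basic constraints."""
    payload = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    if not isinstance(payload, dict) or not isinstance(payload.get("probabilities"), dict):
        raise ValueError("missing 'probabilities' object")
    probs = payload["probabilities"]
    if set(probs) != set(teams):
        raise ValueError("output must cover exactly the qualified teams")
    for team, p in probs.items():
        is_number = isinstance(p, (int, float)) and not isinstance(p, bool)
        # The range check also rejects NaN, because NaN comparisons are False.
        if not is_number or not 0.0 <= p <= 1.0:
            raise ValueError(f"invalid probability for {team!r}")
    total = sum(probs.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"probabilities sum to {total:.3f}, too far from 1")
    return {t: float(probs[t]) for t in teams}
```

A tolerance is used instead of an exact equality because a single run can total slightly above or below 1; the averaging stage is what brings the published numbers back to a proper distribution.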

In practice the model anchors itself on the inputs it was given and nudges the distribution from there.

None of this is a simulation. It is a language model doing its best impression of one, anchored to real data. The output is a useful editorial artefact and a decent sanity check against the market — nothing more. It is not betting advice, and it is not affiliated with FIFA or any of the sources above.

The full source list

  1. openfootball/worldcup.json
  2. football-data.org
  3. TheSportsDB
  4. inside.fifa.com (rankings)
  5. eloratings.net
  6. Opta Analyst (public)
  7. Pinnacle / Oddschecker
  8. DuckDuckGo news
  9. Wikipedia infoboxes
  10. qwen3-32b (open weight)