Question 1

What is TrueSkill and how does it differ from Elo rating?

Accepted Answer

TrueSkill is a Bayesian skill rating system developed by Microsoft Research for Xbox Live matchmaking, designed to address limitations of the traditional Elo system. Where Elo represents each player's skill with a single number, TrueSkill uses two parameters: mu (the estimated mean skill level) and sigma (the uncertainty in that estimate). This dual-parameter approach has several advantages. New players start with a high sigma, so the system adjusts their rating in large steps until it converges on their true skill, while established players with a low sigma move only slightly. This handles the cold-start problem well: newcomers converge quickly without destabilizing the ratings of established players. TrueSkill also natively supports games with more than two players, team-based games, and free-for-all formats, whereas Elo was designed for one-versus-one games such as chess.
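To make the contrast concrete, here is a minimal Python sketch of both update rules for a single decisive game. The Elo part uses the standard logistic expected score with an assumed K-factor of 32. The TrueSkill part is the textbook two-player update for a win with no draw margin, assuming the common defaults beta = 25/6 and tau = 25/300. The function names `elo_update` and `trueskill_update` are hypothetical helpers, not the API of any particular library.

```python
import math

BETA = 25 / 6    # per-game performance variation (assumed common default)
TAU = 25 / 300   # dynamics factor added before each game (assumed common default)

def norm_pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def elo_update(r_winner, r_loser, k=32):
    """Elo: one number per player and the same step size k for everyone."""
    expected = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1 - expected)
    return r_winner + delta, r_loser - delta

def trueskill_update(winner, loser):
    """Two-player TrueSkill update for a decisive win (no draw margin).
    Ratings are (mu, sigma) tuples; returns the updated pair."""
    (mu_w, sigma_w), (mu_l, sigma_l) = winner, loser
    # Inflate each variance by tau^2 so sigma can never collapse to zero.
    var_w = sigma_w**2 + TAU**2
    var_l = sigma_l**2 + TAU**2
    c = math.sqrt(2 * BETA**2 + var_w + var_l)
    t = (mu_w - mu_l) / c
    v = norm_pdf(t) / norm_cdf(t)   # how far the means move
    w = v * (v + t)                 # how much the variances shrink (0 < w < 1)
    # Each player's step is scaled by that player's own variance.
    new_winner = (mu_w + var_w / c * v, math.sqrt(var_w * (1 - var_w / c**2 * w)))
    new_loser = (mu_l - var_l / c * v, math.sqrt(var_l * (1 - var_l / c**2 * w)))
    return new_winner, new_loser

# A brand-new player (25, 25/3) upsets an established player (30, 1.5).
newcomer, veteran = trueskill_update((25.0, 25 / 3), (30.0, 1.5))
print("TrueSkill:", newcomer, veteran)
# Elo moves both players by the same amount, regardless of games played.
print("Elo:", elo_update(1500, 1700))
```

The newcomer's mu jumps by several points while the veteran's barely moves, because each step is proportional to that player's variance; in Elo the two ratings shift by exactly the same amount.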

Question 2

What do the mu and sigma values represent in TrueSkill?

Accepted Answer

In TrueSkill, mu (the Greek letter conventionally used for a mean) represents the system's best estimate of a player's true skill level. The default starting value is 25. In theory mu is unbounded in both directions, but in practice it stays between 0 and 50 for most players. Sigma is the standard deviation of that estimate, that is, the uncertainty in mu. A new player starts with sigma = 25/3 ≈ 8.333, meaning the system is very uncertain about their skill. As a player completes more games, sigma decreases, reflecting growing confidence in the rating. The conservative skill estimate, mu minus three times sigma, is a lower bound that the player's true skill exceeds with about 99.87% probability (a one-sided 3-sigma bound; the familiar 99.7% figure refers to the two-sided interval mu ± 3 sigma). For a brand-new player it is 25 - 3 × (25/3) = 0. This conservative estimate is what leaderboards typically display, so that highly uncertain ratings do not place unproven players at the top of the rankings.
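A small sketch of the default values and the conservative estimate, assuming the common defaults mu = 25 and sigma = 25/3. It ranks a few hypothetical players by mu - 3 sigma and checks the one-sided probability behind the 3-sigma bound. The function names and player data are illustrative only.

```python
import math

MU_DEFAULT = 25.0
SIGMA_DEFAULT = MU_DEFAULT / 3  # about 8.333

def conservative_estimate(mu, sigma, k=3.0):
    """Leaderboard value: mu minus k standard deviations."""
    return mu - k * sigma

def prob_skill_above(mu, sigma, threshold):
    """P(true skill > threshold) under a Normal(mu, sigma**2) belief."""
    z = (threshold - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical players as (mu, sigma) pairs.
players = {
    "newcomer": (MU_DEFAULT, SIGMA_DEFAULT),
    "lucky_streak": (35.0, 6.0),
    "veteran": (30.0, 1.2),
}
# Sort by the conservative estimate, highest first.
leaderboard = sorted(players, key=lambda p: conservative_estimate(*players[p]), reverse=True)

for name in leaderboard:
    mu, sigma = players[name]
    print(f"{name:>12}: mu={mu:.2f} sigma={sigma:.2f} shown={conservative_estimate(mu, sigma):.2f}")

# One-sided probability that skill exceeds mu - 3*sigma (about 0.9987 for any mu, sigma).
print(prob_skill_above(MU_DEFAULT, SIGMA_DEFAULT, conservative_estimate(MU_DEFAULT, SIGMA_DEFAULT)))
```

The player with the highest mu ("lucky_streak") is not ranked first: their larger sigma costs them more than the veteran's small one.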

Question 3

How does TrueSkill calculate win probability between two players?

Accepted Answer

TrueSkill calculates win probability from the Gaussian (normal) distributions of the two skill estimates. For two players, and ignoring draws, the probability that Player 1 beats Player 2 is the normal CDF of the difference in means divided by the combined uncertainty:

P(win) = Φ((mu1 - mu2) / sqrt(2 × beta² + sigma1² + sigma2²))

Here Φ is the standard normal CDF and beta is a parameter representing the performance variation within a single game (25/6 ≈ 4.167 by default). If both players have the same mu, the win probability is exactly 50%, whatever their sigmas. A larger skill gap (higher mu difference) increases the predicted win probability, while higher uncertainty (larger sigmas) pulls the probability toward 50%. This probabilistic framework lets the matchmaking system create balanced matches by pairing players whose predicted outcome is close to even.
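The formula translates directly into a few lines of Python. This is a sketch that ignores draws and assumes the default beta = 25/6; `win_probability` is a hypothetical helper name.

```python
import math

BETA = 25 / 6  # per-game performance variation (assumed default)

def win_probability(mu1, sigma1, mu2, sigma2, beta=BETA):
    """P(player 1 beats player 2), ignoring draws."""
    c = math.sqrt(2 * beta**2 + sigma1**2 + sigma2**2)  # combined uncertainty
    z = (mu1 - mu2) / c
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))       # standard normal CDF

print(win_probability(25, 25 / 3, 25, 1.0))  # equal mu: 0.5
print(win_probability(30, 1.0, 25, 1.0))     # 5-point gap, both ratings certain
print(win_probability(30, 8.0, 25, 8.0))     # same gap, both uncertain: closer to 0.5
```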

Question 4

How many games does it take for a TrueSkill rating to stabilize?

Accepted Answer

The number of games required for a TrueSkill rating to stabilize depends on several factors, but for consistent players the rating typically converges to a reasonably accurate estimate within 20 to 50 games. Sigma falls fastest during the first 5 to 10 games, while the system gathers initial data about the player. By about 20 games, sigma typically drops to around 2 to 3, which puts the conservative skill estimate about 6 to 9 points (three sigmas) below mu. After approximately 46 games against opponents of similar skill, sigma approaches its practical minimum, and further games produce only minor adjustments. Team games generally need more matches than one-versus-one or free-for-all games, because a shared team result says less about any individual player. Because the tau dynamics factor adds a small amount of variance before every game, sigma never reaches zero, so the system can keep adapting if a player genuinely improves or declines over time. Convergence also tends to be faster when matches are close: on average, a game whose outcome is hard to predict tells the system more than an expected win over a much weaker opponent.
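The shrinking of sigma can be simulated with the standard two-player variance update. This sketch rests on strong simplifying assumptions: one-versus-one games only, every opponent has exactly the player's mu (perfect matchmaking) and a fixed sigma of 1.0, and the opponent's own update is ignored. With equal means, t = 0 for both winner and loser, so the sigma path does not depend on who wins. Defaults beta = 25/6 and tau = 25/300 are assumed, and `sigma_after_games` is a hypothetical helper. Real convergence depends on the game mode and on how well matched the games actually are.

```python
import math

BETA = 25 / 6    # per-game performance variation (assumed default)
TAU = 25 / 300   # dynamics factor (assumed default)
W_EVEN = 2 / math.pi  # w(t) at t = 0: v(0) = sqrt(2/pi), so w = v**2

def sigma_after_games(n_games, sigma0=25 / 3, opp_sigma=1.0):
    """Return [sigma after 0 games, after 1 game, ..., after n_games]."""
    sigmas = [sigma0]
    var = sigma0**2
    for _ in range(n_games):
        var += TAU**2  # dynamics: a little extra uncertainty before each game
        c2 = 2 * BETA**2 + var + opp_sigma**2 + TAU**2
        var *= 1 - var / c2 * W_EVEN  # variance shrinks after the result
        sigmas.append(math.sqrt(var))
    return sigmas

sigmas = sigma_after_games(100)
for n in (0, 5, 10, 20, 50, 100):
    print(f"after {n:3d} games: sigma = {sigmas[n]:.2f}")
```

The early games cut sigma sharply and later games only nibble at it, while the tau term keeps it from ever reaching zero.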
