With Game 5 of Team WE vs. Cloud9 complete, the Quarterfinals stage of the 2017 World Championship is over. The tournament clocks in at 107 games so far, and it’s clear which champion is strongest: Kalista is the only champion picked or banned in every single game thus far, a feat achieved by only one champion in each of the World Championships since 2013. But there are twenty champions in each game – ten picked and ten banned – and correctly choosing the other nineteen goes a long way towards winning a game.
While many analysts, broadcasters, and statistics websites use statistics like presence (pick+ban%), games played, and win rate to rank champions, none of these measurements truly capture the power level of a champion. As an alternative, we can use a Binomial Proportion Wilson Score Interval to attempt to evaluate win rates and adjust them, in order to find the “best” champions at Worlds 2017.
Binomial Proportion Wilson Score Interval
For those familiar with statistics, the terms “normal distribution” and “confidence intervals” should ring a bell. These estimators are used to find where the true average might lie. A binomial proportion is similar, but is based not on the normal distribution, but on the binomial distribution, where each trial has only two possible outcomes.
In League of Legends, a champion can only win or lose. There are no ties, no 1.5 wins, and no double losses in a game. Champions can also be banned, or not picked at all, and it is precisely because of this that we should judge the value of a champion by its Wilson score instead of its presence, games played, or win rate.
Wilson score intervals provide a range within which the true value (what we could call “expected win rate”) is likely to fall, because they account for sample size, observed average, and desired confidence level. For our purposes, the Wilson score behaves much like a confidence interval, since it features:
- Bias towards 50%: the Wilson score interval pulls every binomial average towards 50%, sometimes to a fault. This works to our advantage since LoL is a zero-sum game: for every game a team wins, another team must lose.
- Range from 0% to 100%: a normal distribution will return a value above 100% or below 0% given a large enough z-score*, whereas Wilson score intervals always stay between 0% and 100%. Since a champion cannot have a win rate below 0% or above 100%, higher confidence levels can be used without adjustments.
*The z-score expresses the confidence level as a number of standard deviations.
For this article, a confidence level of 95% was chosen, which equates to a z-score of 1.96.
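To make the computation concrete, here is a short Python sketch of the Wilson score interval. This is the standard textbook formula; the variable names and the example numbers are mine, not from the tournament dataset.

```python
import math

def wilson_interval(win_rate, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    win_rate: observed proportion of wins, between 0.0 and 1.0
    n: sample size (games played)
    z: z-score; 1.96 corresponds to a 95% confidence level
    """
    if n == 0:
        return (0.0, 1.0)
    z2 = z * z
    denom = 1 + z2 / n
    center = (win_rate + z2 / (2 * n)) / denom
    margin = z * math.sqrt(win_rate * (1 - win_rate) / n + z2 / (4 * n * n)) / denom
    return (center - margin, center + margin)

# A 50% win rate over 100 games: the true win rate is likely
# between roughly 40.4% and 59.6%.
print(wilson_interval(0.5, 100))
```

Note that even a perfect record over a handful of games stays capped on the upper end: `wilson_interval(1.0, 5)` gives roughly (56.6%, 100%), never exceeding 100%.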
Building a Model
The next thing to do is compare multiple methods of applying Wilson score intervals to the 2017 World Championship pick and ban rates to see which approach produces the most accurate ranking of champions.
Method 1: Picks Only
In this approach we calculate the lower and upper bounds of our Wilson score interval using only actual win rate and number of games played. The table is sorted in descending order by lower bound.
| Champion | Picks | Win rate | Lower Bound | Upper Bound |
| --- | --- | --- | --- | --- |
This approach immediately fails the eye test: Kalista is ranked 12th, tied with Malzahar. Kalista is a must-ban champion on Red Side, and there is no way she should be ranked at the same level as Malzahar.
This happens because, when we only count picks, a lower number of games equates to a wider range for where the true win rate might lie, and therefore a smaller lower bound. Note that every champion rated above Kalista has a win rate lower than hers; their larger sample sizes make their estimated ranges tighter, and thus their lower bounds higher than hers.
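The effect is easy to demonstrate with made-up numbers (illustrative only, not from the Worlds dataset): a champion with a lower win rate but a larger sample can end up with the higher lower bound.

```python
import math

def wilson_lower(win_rate, n, z=1.96):
    # Lower bound of the Wilson score interval (z=1.96 gives 95% confidence).
    z2 = z * z
    denom = 1 + z2 / n
    center = (win_rate + z2 / (2 * n)) / denom
    margin = z * math.sqrt(win_rate * (1 - win_rate) / n + z2 / (4 * n * n)) / denom
    return center - margin

# Hypothetical: 60% win rate over 10 picks vs. 55% win rate over 40 picks.
print(round(wilson_lower(0.60, 10), 3))  # 0.313
print(round(wilson_lower(0.55, 40), 3))  # 0.398
```

Despite the lower raw win rate, the 40-game champion's lower bound comes out well ahead, which is exactly what pushes low-pick Kalista down the Method 1 table.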
To rectify this, we should try including bans in the formula.
Note: From here on out, we will only be interested in the lower bound, since high win rates produce upper bounds that are very close to 100%, which doesn’t tell us much. We will also no longer point out that as sample size increases, the range between the upper and lower bounds becomes tighter.
Method 2: Picks and Bans
| Champion | Picks+Bans | Win rate | Lower Bound |
| --- | --- | --- | --- |
When the champions’ bans are accounted for, Kalista jumps to the top. Perfect.
Singed, however, jumps to 8th just by adding five more bans, and Trundle sits 16th with fifteen bans. At such a low sample size, increasing the sample by a small amount will significantly increase the lower bound. Aurelion Sol’s lower bound leaped from 34.2% to 51.0%. Similarly, Trundle’s lower bound increased from 31.3% to 40.7%.
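The mechanics of this jump can be reproduced with hypothetical numbers (again, not the actual tournament figures): keeping the win rate fixed while folding a handful of bans into the sample size pushes the lower bound up sharply.

```python
import math

def wilson_lower(win_rate, n, z=1.96):
    # Lower bound of the Wilson score interval at 95% confidence.
    z2 = z * z
    denom = 1 + z2 / n
    center = (win_rate + z2 / (2 * n)) / denom
    margin = z * math.sqrt(win_rate * (1 - win_rate) / n + z2 / (4 * n * n)) / denom
    return center - margin

# Hypothetical pocket pick: 75% win rate over 8 picks.
print(round(wilson_lower(0.75, 8), 3))      # 0.409 with picks only
# Method 2 adds 7 bans to the sample while the win rate stays unchanged.
print(round(wilson_lower(0.75, 8 + 7), 3))  # 0.497
```

Seven bans nearly double nothing about the champion's actual results, yet the lower bound climbs by almost nine percentage points, mirroring the Aurelion Sol and Trundle jumps above.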
This highlights a big problem: selection bias.
Only twenty champions, presumably the strongest, are picked or banned in each game. This model implicitly claims that any champion that is not picked or banned is not considered strong enough by either team and should be penalized accordingly. That is not necessarily the case, since picks and bans are highly contextual and depend on what has happened in the draft up to that point.
It should also be noted that the ratio of picks to bans varies wildly from champion to champion. Jarvan IV has been picked 48 times and banned 57 times, while Twitch has been picked 20 times and banned only 9 times. Thus Twitch’s higher win rate does not provide full context. In most scenarios where Twitch is considered, he is more likely to be picked than banned, while Jarvan IV warrants bans at a much higher rate than he is picked. Thus, it makes sense to reward champions for being picked more than their win rate might suggest they should. Bans should carry more weight than picks, as picks are the result of specific champions being banned beforehand.
Method 3: Adjusted Winrates with Picks and Bans
Here we introduce adjusted win rate, a custom formula based on the number of games in which a champion was picked, banned, or neither picked nor banned.
The formula for adjusted win rate makes the following assumptions:
- A champion is banned when it is expected to perform better than its current win rate. It is a fairly safe assumption that champions are banned in a draft when they are expected to outperform their normal expectations. Thus the average of the current win rate and 100% is assigned as a theoretical win rate for games in which a champion is banned.
- A champion is neither picked nor banned when it is expected to perform worse than its current win rate. As discussed under selection bias, a champion’s theoretical win rate should be lower in games where it is not considered. We assume that a champion with a 100% win rate would have roughly a 40% win rate in an ill-fitting scenario, and by simple linear interpolation arrive at dividing the win rate by 2.5.
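The article does not spell out the exact formula, so the sketch below is one plausible reading of it: picked games count at the actual win rate, banned games at the average of the win rate and 100%, ignored games at the win rate divided by 2.5, all averaged over the tournament's games. The champion numbers here are hypothetical.

```python
import math

def adjusted_win_rate(wins, picks, bans, total_games):
    """One reading of the adjusted win rate described above.

    - Picked games contribute the actual win rate.
    - Banned games contribute (win rate + 100%) / 2.
    - Games where the champion was neither picked nor banned
      contribute win rate / 2.5.
    """
    wr = wins / picks
    neither = total_games - picks - bans
    weighted = picks * wr + bans * (wr + 1) / 2 + neither * (wr / 2.5)
    return weighted / total_games

def wilson_lower(win_rate, n, z=1.96):
    # Lower bound of the Wilson score interval at 95% confidence.
    z2 = z * z
    denom = 1 + z2 / n
    center = (win_rate + z2 / (2 * n)) / denom
    margin = z * math.sqrt(win_rate * (1 - win_rate) / n + z2 / (4 * n * n)) / denom
    return center - margin

# Hypothetical champion: 39 wins in 54 picks, 29 bans, 107-game tournament.
adj = adjusted_win_rate(39, 54, 29, 107)
print(round(adj, 3))                         # adjusted win rate
print(round(wilson_lower(adj, 54 + 29), 3))  # lower bound, n = picks + bans
```

Under this reading, bans pull the adjusted win rate up while ignored games pull it down, which is exactly the behavior the method is after.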
With the new adjusted win rates, and using total number of picks and bans for each champion as the sample size, the final result is below.
| Champion | Picks+Bans | True Win rate | Adjusted WR | Lower Bound |
| --- | --- | --- | --- | --- |
To put it simply, if Janna, picked 54 times and banned 29 times, were to be picked every single game, we could expect her win rate to be between 55.6% and 75.5%, 95 out of 100 times, in a tournament similar to the 2017 World Championship.
This list looks more in line with what teams consider to be the strongest champions in the game. Every single meta champion is well represented here and pocket picks with high win rates have dropped considerably in rank. Outliers dropped appropriately, like Aurelion Sol (from 13th to 30th) and Trundle (from 16th to 24th).
Wilson score intervals provide a way to condense win rate, pick rate, and ban rate into one value that can rank champions by power level and be easily interpreted.
Of course, while we can look at Wilson Score intervals to determine individual champions’ success, simply picking the strongest champions in each role will not necessarily result in a better team composition. Individual skill, team coordination, and champion synergy are all important parts of the draft. Furthermore, some teams succeed more than others with similar drafts. And more teams participated in Worlds this year, with the introduction of the play-in stage, so the difference in skill between the best and worst teams is larger than ever. While we could venture an attempt to value some wins more than others based on teams’ performances, each team only played between four and fifteen games, which is not enough to truly rank the teams.
On top of this, selection bias assumes that teams know the power levels of champions beforehand. However, the draft is a learning process and no team truly masters it. Teams’ priorities change as the tournament progresses: some teams adapt to new strategies while those who cannot are eliminated.
Wilson Score intervals help to assess champion strength, but there is a subtle game of rock, paper, scissors in drafting a team composition, and the optimal strategy on paper is not always the winning one. I encourage everyone to simulate drafts with the Wilson Score Intervals in mind and learn for themselves that the whole is greater than the sum of its parts.
Dan is currently an analyst for Team Vitality in the EU LCS. His background is in Applied Mathematics and Economics, as well as software development. Follow him on Twitter.