LoL’s Advanced Stats Problem – Cody Gerard

The following article was contributed by Cody Gerard.

NOTE: For some interesting discussion on the article, check out the Twitter thread!

If you’re a League of Legends stathead like me—and let’s face it, if you’re reading this article then you are!—you’ve probably wondered why there’s such a lack of reliable, useful advanced metrics and statistics in League of Legends. On the surface, everything about League of Legends should lend itself to this. After all, it is a game that, in many ways, is all about maximizing value and efficiency, the things which most advanced stats in traditional sports are designed to measure, and yet by and large League of Legends lacks those advanced stats.

When I first started trying to write this article, that was the problem I was endeavoring to solve. I wanted to create, at the very least, an outline for a stat that gives an overarching valuation of an individual player, similar to baseball’s Wins Above Replacement (WAR) or Weighted Runs Created Plus (wRC+): something that measured how much gold a player created for their team. But the deeper I dug, the more problems I found in trying to make this statistic work. I was convinced my theory was sound, that I was working with good ideas, and that what I was doing should be possible, but I just could not make it work. Ultimately, I hit on what I feel is the reason advanced statistics like WAR and wRC+ don’t exist in LoL: we lack the necessary precision in statistical measurement to make them both practical and effective.

As a result, I’ve decided to take a deep dive into why this is a problem, what some potential solutions are, and what we can do with the data we have access to now in order to further League of Legends statistics.

What You Need For an Advanced Stat

As I mentioned above, most of the better-known and more useful advanced statistics in traditional sports are meant to measure value, efficiency, or both at the same time. Examples include True Shooting Percentage (TS%) in basketball, meant to measure how efficient a shooter a player is in isolation from circumstances, and wRC+ in baseball, which is meant to create an isolated value for a player’s offensive production, again independent of circumstances.

To accurately measure the things TS% and wRC+ do, though, you need equally accurate underlying measurements in order to control for all of the factors that play into every shot or at-bat. You need every factor involved. For wRC+, you need statistics to compensate for how each ballpark affects a player’s production. Through its underlying statistics, wRC+ also credits each hit with the expected run value of that kind of hit rather than with the runs that actually scored, since the actual result may itself depend on circumstances.

With wRC+, we can control for just about every factor in every plate appearance. Such large-scale controls are needed to ensure these statistics are accurate. In LoL, we simply cannot control for that many factors.

As an example of this, let’s try to define the value of a kill. The answer to that seems obvious: the total kill gold + the total assist gold. This is true in a broad sense, but now let’s say three people contributed to the kill. You then want to assign a portion of the value of the kill to each player. Obviously it’s not as simple as assigning the gold value they received from it. You can do precisely 1 point of damage to a player and receive 300 gold, but no one would realistically say you created 300 gold for your team by dealing that 1 point of damage. This is the kind of problem I was trying to solve again and again while trying to make an evaluative statistic, and time and time again I could not find practical, workable solutions.

One idea I had to adjust for the difference between gold received and gold created in this situation was to assign the gold in proportion to the damage done. However, this data is not readily available, especially in the quantities needed. The information exists; you see it on the death recap every time you die in game. But there’s no practical way to record the damage data for each and every kill in a given game. Heck, there is not even a way to mine, in bulk, the number of people who contributed to every kill. The only way to do that is to count manually, or maybe to create a program to do it.
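To make the proportional idea concrete, here is a minimal sketch in Python, assuming per-kill damage totals were somehow available. The function name `split_kill_gold`, the player labels, and the gold and damage figures are all hypothetical and chosen only for illustration; this is not an existing stat or tool.

```python
from typing import Dict


def split_kill_gold(total_gold: float, damage_by_player: Dict[str, float]) -> Dict[str, float]:
    """Credit each contributor with a share of a kill's gold proportional to damage dealt.

    Hypothetical helper: assumes per-kill damage totals are available,
    which the article notes is not currently the case in bulk.
    """
    if not damage_by_player:
        raise ValueError("at least one contributing player is required")

    total_damage = sum(damage_by_player.values())
    if total_damage <= 0:
        # No recorded damage (e.g. a pure crowd-control assist): fall back to an equal split.
        equal_share = total_gold / len(damage_by_player)
        return {player: equal_share for player in damage_by_player}

    return {
        player: total_gold * damage / total_damage
        for player, damage in damage_by_player.items()
    }


# Illustrative numbers only: 300 kill gold plus an assumed 150 assist gold.
credited = split_kill_gold(450.0, {"top": 1.0, "mid": 599.0, "jungle": 400.0})
for player, gold in credited.items():
    print(f"{player}: {gold:.2f} gold created")
```

Under this split, the player who dealt 1 point of damage is credited with well under 1 gold instead of a full kill bounty. Damage is only one kind of contribution, though, so crowd control, healing, and shielding would go uncredited in a sketch like this.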

Creating an accurate player-by-player evaluation for each individual kill would be only one layer to a statistic like this. You would also need to create systems to factor in the different ways certain champions contribute, for which you would need access to in-depth information on every champion in pro play, from expected CS differences to expected damage output. Once again, while the data necessary to collect this information exists in theory, it would take a tremendous amount of effort to compile and tabulate.

This brings me to the central problem: League of Legends lacks the necessary underlying metrics to make true all-encompassing advanced statistics work. Statistics build on each other. Not only do we lack the precise measurement needed for these stats, but we also lack the statistical infrastructure needed to support them. For example, wRC+ factors other advanced statistics into its formula, such as Weighted Runs Above Average (wRAA), which is in turn built on Weighted On-Base Average (wOBA).
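As an illustration of that layering, the commonly published definition of Weighted Runs Above Average takes Weighted On-Base Average as a direct input. Here lgwOBA is the league-average wOBA, wOBAScale is a season-specific scaling constant, and PA is plate appearances:

```latex
\[
\mathrm{wRAA} = \frac{\mathrm{wOBA} - \mathrm{lgwOBA}}{\mathrm{wOBAScale}} \times \mathrm{PA}
\]
```

Every term on the right-hand side is itself a carefully measured statistic, which is exactly the kind of infrastructure League of Legends does not yet have.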

Once we have precise enough measurements, we can begin creating the underlying statistics necessary for something like a Weighted Gold Created statistic. But without precise measurements, and without proper underlying metrics, any attempt at creating something like this turns into a lot of estimation and guesswork, which defeats the very purpose of creating something precise and accurate.

What We Can Do Now

Despite the shortcomings in League of Legends statistics at the moment, we are not entirely helpless to create at least some smaller scale advanced statistics. Specifically, I believe that it is possible to create advanced statistics for the laning phase of the game. The laning phase, for the most part, is isolated in and of itself, so controlling for a smaller number of variables, such as jungle presence, is doable. In addition, many of the most basic statistics in the game could be applied for proper use here when they could not be on a broader scale. Things such as GD@10, CSD@10, and XPD@10 could all be used here with proper control of a few outside factors. By comparison, using any of these stats to create something as well controlled at 20 minutes would be nearly impossible, considering the significantly larger number of outside factors that can alter them once the laning phase ends.

Given its smaller scope, the laning phase is also much more forgiving of approximations, as they will have far fewer cascading effects. For example, four fixed values could be created to categorize champion matchups: hard losing, soft losing, soft winning, and hard winning. These values could be used as a coefficient to compensate for the factors of a given matchup. This sort of ballpark, catch-all estimation would be fatal to any overarching advanced stat, as the sheer breadth of matchups and factors would call for more precision, without which the statistic would become a mere approximation. But in the more controlled and defined environment of the laning phase, we might be able to get away with this.
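One way the four matchup categories could feed into a laning stat is sketched below in Python. It is a simplification under stated assumptions: the matchup value is applied as an additive baseline (an expected GD@10 for that category) rather than a multiplier, and the baseline numbers are placeholders, not measured values. The names `Matchup`, `EXPECTED_GD10`, `adjusted_gd10`, and `average_adjusted_gd10` are hypothetical.

```python
from enum import Enum
from typing import Iterable, Tuple


class Matchup(Enum):
    HARD_LOSING = "hard losing"
    SOFT_LOSING = "soft losing"
    SOFT_WINNING = "soft winning"
    HARD_WINNING = "hard winning"


# Placeholder expectations for gold difference at 10 minutes, per category.
# These numbers are invented for illustration and are not derived from real data.
EXPECTED_GD10 = {
    Matchup.HARD_LOSING: -300.0,
    Matchup.SOFT_LOSING: -100.0,
    Matchup.SOFT_WINNING: 100.0,
    Matchup.HARD_WINNING: 300.0,
}


def adjusted_gd10(raw_gd10: float, matchup: Matchup) -> float:
    """Return GD@10 relative to what the matchup category alone would predict."""
    return raw_gd10 - EXPECTED_GD10[matchup]


def average_adjusted_gd10(games: Iterable[Tuple[float, Matchup]]) -> float:
    """Average the matchup-adjusted GD@10 over a player's games."""
    adjusted = [adjusted_gd10(raw, matchup) for raw, matchup in games]
    if not adjusted:
        raise ValueError("at least one game is required")
    return sum(adjusted) / len(adjusted)


# Down 150 gold in a hard losing lane beats expectation; up 250 in a hard winning lane does not.
games = [(-150.0, Matchup.HARD_LOSING), (250.0, Matchup.HARD_WINNING)]
print(average_adjusted_gd10(games))
```

The same shape would work for CSD@10 or XPD@10, and further controls such as jungle presence could be subtracted in the same way once they can be measured.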

As a result of these possibilities, I plan to continue attempting to create an advanced statistic, just one smaller in scope. It will be meant to give an isolated evaluation of a player’s laning phase, one that could work for either a single game or the player’s overall body of work. There is still a lot of work to do, but I am confident that it is possible, and I hope to see other analysts undertake similar endeavors with regard to the laning phase. The larger the statistical profile we can build where we are able to, the easier it will be to create true, fully developed advanced stats when the time comes.

What Needs to Be Done for Advanced Stats to Thrive

As I’ve said, for advanced stats to exist, we need more precise measurements to be easily accessible. The thing is, almost all of the information we need exists. It just needs to be properly extracted from the game logs and match histories and made readily accessible. I am no expert on Riot’s server infrastructure or their ability to collect data on a game-to-game basis, but considering what we are able to see in game and have access to outside of it now, I do believe it should be possible.

Given this, I am hopeful that the data I need to complete the ambitious project I set out on when I first began thinking about this article will one day be available. When it is, I fully plan to come back and finish the job.

Cody Gerard is a League of Legends Analyst and Coach who has worked with Rogue, ROCCAT, and Baskonia esports.
