Match Prediction in League of Legends Using Vanilla Deep Neural Network

Learn about the League of Legends match prediction model that shows about 70% accuracy with its unique input features.

Seouk Jun Kim
Towards Data Science



Why Study Win Prediction in E-sports Games?

The first answer from the general public is money, and true, money could perhaps be made with a sufficiently successful prediction model, but that is just looking at the small picture.

A bigger picture can be seen by dissecting the model. A successful win prediction model tells us exactly what contributes to the victory of a match.

For example, suppose that one of the data points used in the prediction model is player height. If it appears through the prediction model that the taller the average height of the team, the more likely that team is going to win the match, then we can derive from the prediction model that contrary to our beliefs, player height is actually crucial in winning a League match! This could fundamentally change the scouting system for professional League E-sports teams around the world.

Why care about the E-sports industry in the first place, you might ask. Well, at the 2019 League of Legends World Championship, 44 million people concurrently watched the finals on November 10th, and 100 million unique viewers tuned in for the event online. That exceeds the Super Bowl viewership of that year.¹ And with the growing popularity of the industry, the numbers are expected to keep climbing.

One final reason to study the numbers behind E-sports is accessible data. There is some debate on this point, as RiotApi (an API for acquiring match data, made available by Riot Games, the publisher of League of Legends) offers restricted and sometimes inaccurate data, which could be why most research papers on this game genre are based on DotA2, League’s rival. However, the data is free. This is a huge advantage over traditional sports like soccer, where meaningful data can only be acquired through costly subscriptions from firms like Opta.

Previous Studies on Esports Match Prediction

As mentioned before, precious few prior studies on League of Legends match prediction exist, but the opposite is true for DotA2 match prediction. As the two games are of the same genre with similar gameplay, these studies are worth exploring.

The paper Dota 2 Win Prediction, for example, offers a predictor with 73% accuracy, using features such as offset, matchup, synergy, and countering.²
Another paper, To Win or Not to Win, offers a predictor with 74% accuracy, using a co-occurrence network to uncover hero synergy data and logistic regression to do the final magic.³

As amazing as these results are, there are certain improvements to these prior studies that can be applied to attain better results.

First, both papers treat each hero as an individual vector in the input during data pre-processing. This lets you “consider the individual impact of each Radiant and Dire hero on a match outcome” but also risks training the prediction model too narrowly on each champion.⁴ For example, the prediction model will not be able to look at Ornn as a champion with tankiness and a lot of magic damage, but only as Ornn himself. It will miss the traits that define individual champions and fail to generalize to other, similar champions.

Second, neither paper takes into account the skill of the player controlling the hero. It is quite obvious that a professional player controlling a hero would have a much bigger influence on the game outcome than an average player controlling the exact same hero. So while the influence of the hero, its synergies, and its counter-picks are all important, the skill of the player also matters, and could be an additional feature to improve the prediction power.

Data Set

The problem with RiotApi is that it is difficult to get a set amount of match data in one go. It is necessary to pinpoint one player, go through his match history, then move on to the match histories of the players appearing in it, recursively collecting match data.

Even when the data is acquired, it must be filtered to make sure that the lane positions of the champions are marked correctly. In more than half the cases, Riot incorrectly marks the lane position of a champion, marking three champions in top lane, four in mid lane, zero in bot lane, and so on.

RiotApi also enforces a rate limit of 100 requests every 2 minutes, which, coupled with the high probability of corrupted data, meant that I had to run a script on a custom AWS EC2 server for several days, endlessly digging for match data.
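The crawl described above can be sketched as a breadth-first search over match histories, with a simple sliding-window rate limiter for the 100-requests-per-2-minutes cap. Here `fetch_history` is a hypothetical stand-in for the real RiotApi match-history call, not an actual endpoint:

```python
import time
from collections import deque

def crawl_matches(seed_player, fetch_history, max_matches,
                  requests_per_window=100, window_seconds=120):
    """Breadth-first crawl of match histories starting from one seed player.

    `fetch_history(player)` is a caller-supplied function (a stand-in for a
    real RiotApi call) returning a list of (match_id, participants) pairs.
    """
    seen_players, seen_matches = {seed_player}, set()
    queue = deque([seed_player])
    matches = []
    request_times = deque()  # timestamps of recent API requests

    while queue and len(matches) < max_matches:
        # Respect the rate limit: drop timestamps outside the window,
        # then sleep if the window is already full.
        now = time.monotonic()
        while request_times and now - request_times[0] > window_seconds:
            request_times.popleft()
        if len(request_times) >= requests_per_window:
            time.sleep(window_seconds - (now - request_times[0]))
        request_times.append(time.monotonic())

        player = queue.popleft()
        for match_id, participants in fetch_history(player):
            if match_id not in seen_matches:
                seen_matches.add(match_id)
                matches.append(match_id)
            for p in participants:
                if p not in seen_players:
                    seen_players.add(p)
                    queue.append(p)
    return matches
```

In a real crawl each fetched match would also be filtered for the lane-position errors described above before being kept.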

In my research, I collected data from 1,045 matches, of which 80% were used as the training set, 10% as the dev set, and the remaining 10% as the test set.

Features

What really separates my model from prior studies is my selection of features.

The features I have used for my model are win-rates.

An obvious fact about a League of Legends game is that a team of good players controlling strong champions would win. Here being a “good player” means more specifically that the player is good at controlling that specific champion, as the player has to be able to match the champion’s play-style to maximize the champion’s potential.

So the problem becomes: what data captures the following two features in numbers?

  1. How well a player controls a certain champion
  2. How strong a champion is in the current meta

The first feature is captured by the player’s win-rate with that specific champion. For example, a player who only plays a champion named Nidalee will most likely have a high win-rate when playing that champion.

The second feature is captured by the win-rate of that specific champion in the current meta/patch. A strong champion is known to have a high win-rate (usually 53% or higher when counting all tiers combined).

We collect this data for every player in each team, so that each input vector to the neural net would be 20 data points large, like the following:

[0.5114, 0.52, 0.5275, 0.619, 0.5074, 0.727, 0.4999, 0.517, 0.5187, 0.659, 0.5034, 0.0, 0.5005, 0.5, 0.4448, 0.257, 0.5065, 0.286, 0.5199, 0.544]

The result is an input feature vector that is only 20 elements in size. This is extremely small compared to the input features used in previous studies.

Every element at index 2i (0 ≤ i ≤ 9) is the champion’s win-rate in the current meta/patch, and every element at index 2i+1 is the player’s win-rate while controlling that specific champion.
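The interleaving above can be written out explicitly. The tuple layout is an assumption about how the scraped win-rates are stored:

```python
def build_feature_vector(team):
    """Interleave win-rates for 10 (champion, player) pairs into one vector.

    `team` is a hypothetical list of 10 (champ_winrate, player_winrate)
    tuples, blue side first, then red side.
    """
    features = []
    for champ_wr, player_wr in team:
        features.append(champ_wr)   # index 2i: champion win-rate this patch
        features.append(player_wr)  # index 2i+1: player's win-rate on that champion
    return features
```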

Model and Result

Model

As mentioned in the title, I used a vanilla deep neural network, meaning a fully-connected feed-forward deep neural network with four hidden layers.
I used the tflearn library to build the model, with batch training included. Various parameters and hyper-parameters were tried, including the number of hidden layers. As excessive as four hidden layers may seem, it produced the best results, although performance differed little with two or three hidden layers. I won’t reveal the exact parameters/hyper-parameters, as it should be fairly easy to reproduce the results by tuning your own.
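Since the exact hyper-parameters are not given, here is a NumPy sketch of just the forward pass of such a network: 20 inputs, four fully-connected ReLU hidden layers, and a 2-way softmax output. The hidden-layer width of 32 and the random weights are assumptions for illustration only:

```python
import numpy as np

def forward(x, weights, biases):
    """One forward pass through a fully-connected net: ReLU hidden
    layers, softmax output. `x` has shape (batch, 20)."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(0.0, a @ W + b)           # ReLU hidden layer
    logits = a @ weights[-1] + biases[-1]        # output layer, size 2
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)  # softmax over [loss, win]

# Illustrative layer sizes (assumed; the exact ones are not revealed):
rng = np.random.default_rng(0)
sizes = [20, 32, 32, 32, 32, 2]  # input, 4 hidden layers, 2-way output
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

probs = forward(rng.random((4, 20)), weights, biases)  # 4 example matches
```

Training the weights (back-propagation, batching, etc.) is what tflearn handled in the actual model.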

Results

The performance of the neural network depended on the parameters/hyper-parameters, but the most representative numbers were as follows:

Training accuracy: 0.7255
Validation accuracy: 0.6905
Test accuracy: 0.7033

Also, as softmax activation was used at the output layer, the output vector shows a degree of “confidence” the network has in the match being a victory or a loss as a vector of size two (for example [0.317, 0.683] would show that the model believes the match to be a loss). The numbers in the output vector do not directly translate to the “probability” of a win or a loss, especially because batch training was used, but they do give a general sense of the likelihood of each event.
So I gathered all the matches where the model had 0.8 or more “confidence” that the match would result in a victory, and got 76.76% accuracy: for certain matches, we can be fairly sure of the result.
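This confidence-filtering step can be sketched as follows, assuming the model outputs [p_loss, p_win] pairs as described above:

```python
def high_confidence_accuracy(probs, labels, threshold=0.8):
    """Accuracy restricted to matches where the softmax output for the
    predicted class is at least `threshold`.

    `probs` is a list of [p_loss, p_win] pairs; `labels` is 1 for a win,
    0 for a loss. Returns None if no match clears the threshold.
    """
    kept = correct = 0
    for (p_loss, p_win), label in zip(probs, labels):
        if max(p_loss, p_win) < threshold:
            continue  # model not confident enough; skip this match
        kept += 1
        prediction = 1 if p_win > p_loss else 0
        if prediction == label:
            correct += 1
    return correct / kept if kept else None
```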

Big Room for Improvement

What remains so promising about this model is that there is a lot of room for improvement, and therefore, a possibility for greater match prediction accuracy.

Red side / Blue side

When placed into a League of Legends match, each player is assigned to either the red side or the blue side. Traditionally, there has been a difference in win-rate depending on which side you are on, and while in the current patch the difference is small (a 50.3% win-rate for the blue side), it is a number that changes from patch to patch and can be taken into consideration.⁶

Champion Synergy

Synergy is one of the features included in previous studies, and is solely responsible for the 74% prediction accuracy recorded by To Win or Not to Win.⁵ Coming up with a way to accurately reflect champion synergy and include it as an input feature could enhance the model’s accuracy.

Refinement of Data

In the current data used for the train/dev/test sets, some values were impossible to find and were therefore left to chance. This is because I relied heavily on op.gg and other League of Legends analytics websites to gather champion statistics specific to players.
For example, if the Python crawling script could not find a champion-specific win-rate for a player during pre-processing, that win-rate was automatically assigned a value of 0.5. This was a reasonable choice because in most cases, the champion-specific win-rate is missing due to a lack of usage of that champion, meaning the player is just as likely to be bad with the champion as good.
With more refined data containing fewer holes, the performance of this neural network model could improve.
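A minimal sketch of this imputation step, assuming missing win-rates arrive as None from the crawler:

```python
def impute_features(features, neutral=0.5):
    """Fill holes in a feature vector of win-rates.

    `features` may contain None where the crawler found no
    champion-specific win-rate; each hole is replaced with a neutral 0.5,
    since a player with no recorded games on a champion is as likely to
    be bad with it as good.
    """
    return [neutral if f is None else f for f in features]
```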

Form of a Player

A common phenomenon among players is that sometimes a player goes on a “roll.” When a player has been winning several matches in a row, he feels that he is likely to win the next match as well, and vice versa. In professional tournament settings, it is nearly a given that a team can perform very differently depending on the form of its players. KT Rolster in LCK Spring 2020, for example, performed extremely poorly for the majority of the season, but showed remarkable prowess in the last two weeks of the tournament, owing to the improved “form” of the players and the team. This could be another feature that impacts match prediction.

Champion Composition

Very similar to champion synergy, yet a bit different, champion composition refers to how a team is structured before a match. For example, the blue-side team might have one healer champion, two tank champions, and two damage-dealer champions, which, depending on the patch/meta, could influence that team’s chance of winning.
Another distinction could be between “early-game” champions and “late-game-carry” champions. These labels differentiate champions based on when they have the most influence on the game, and are therefore another way to represent team composition.

What Now?

I am confident that improving the prediction model with additional features and better data will enhance its prediction capabilities, perhaps beyond the 74% baseline set by one of the prior studies.

Hopefully with more advanced prediction models, we could set guidelines on the best team composition, and what specifically to look for when scouting for future professional players. By coming up with a successful model and examining the weights, we could potentially gain insight into how important mastery of a champion is in comparison to team composition, etc. This could impact not only the scouting system of professional teams, but also their strategies in championships.

As I am serving in the military right now, there are limits to how fast and efficiently I can pursue this research. But I’ll do it nonetheless and keep posting what I find.

Next Steps:

  1. Learning and setting up DynamoDB on AWS.
  2. Creating script files to set up my own database on DynamoDB.
  3. Signing up for a Development API Key for RiotApi.
  4. Learning how to represent champion synergies and compositions.
  5. Etc.

[1] Pei, Annie, “This esports giant draws in more viewers than the Super Bowl, and it’s expected to get even bigger”, https://www.cnbc.com/2019/04/14/league-of-legends-gets-more-viewers-than-super-bowlwhats-coming-next.html
[2] Kinkade, Nicholas, “DOTA 2 Win Prediction.” (2015).
[3] Kalyanaraman, Kaushik. “To win or not to win ? A prediction model to determine the outcome of a DotA 2 match.” (2015).
[4] Kinkade.
[5] Kalyanaraman.
[6] Red vs Blue Graph (2020.04.17), https://www.leagueofgraphs.com/rankings/blue-vs-red/na
