Fans learning tree

Help people get to know teams and players quickly and improve their betting success rate

TEAM

Endong Han

Yue Zhang

Zhang Yang

Tao Ji

Yuhan Ren

MY ROLE

UX designer

DATE

06/2018-07/2018

METHODS

Machine learning

Spider

TOOLS

ECharts

ProcessOn

Spark MLlib

Introduction

The World Cup has recently become a hot topic. People who know little about it want to find relevant information, such as details about the participating teams and players. Those who already follow it closely want to learn more, or to obtain predicted results for matches that have not yet been played. We therefore developed this product for people interested in the World Cup, to meet the needs of both kinds of users.

User Needs

See World Cup match information
See the World Cup players' goal rankings
Improve the success rate of match predictions and provide a reference for betting

Solutions

Give the user information about two teams and a prediction for their match: each team's goals, ball possession, attacks, fouls, yellow cards, and red cards, plus the predicted winner and its win probability.
Give the user information about a player, including name, team, goals, assists, penalty kicks, etc.

Competitive Analysis

At present, there is no mature product on the market that can predict match results. We call the interface of FIFA's official app to get detailed data about each game and train a logistic regression model on that data. Training lets the machine learn the weight of each indicator in a football game; the model then predicts the result by comparing the two teams' indicators.

Both our product and the competing product can predict match results. The difference is that ours focuses on the teams' ability indicators, while theirs focuses on betting odds and similar information. The advantage of our product is that the information it displays is more football-specific yet easy to understand and intuitive, and the page is simple and attractive for a general audience interested in football.

Implementation Procedure

The Python Requests library is used to write the spider and fetch data.
Hadoop HDFS is used to store the data collected by the spider, as well as the data after preliminary cleaning.
Spark MLlib is used to train a machine learning model on the processed data, producing a model that predicts the result of a game between two teams.
AJAX is used to transfer data between the server and the browser.
Django and Flask are used to create the web services.
ECharts is used on the front end to visualize the data as charts.
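
The modeling step can be illustrated without a Spark cluster. The sketch below is not the project's Spark MLlib code: it trains a logistic regression with plain batch gradient descent in standard-library Python, on a tiny made-up data set. The feature meanings (goal difference, possession difference), the function names, the learning rate, and the epoch count are all assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.1,      # assumed learning rate
    epochs: int = 500,    # assumed number of passes over the data
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = sigmoid(z) - y  # derivative of the log loss w.r.t. z
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Made-up training rows: [goal difference, possession difference],
# both computed as home team minus visiting team.
toy_rows = [[2, 10], [1, 5], [-1, -8], [-2, -3]]
toy_labels = [1, 1, 0, 0]  # 1 = home team won, 0 = visiting team won

weights, bias = train_logistic_regression(toy_rows, toy_labels)
```

The learned `weights` play the role of the per-indicator weights described above: a larger positive weight means that a home-team advantage in that indicator pushes the prediction toward a home win.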

Model Evaluation

Two models were built for football match prediction: one based on a statistical ranking of points and one based on logistic regression. Both are classification models. The evaluation here focuses on each model's generalization capability.

1. Prediction model based on statistical ranking of points

The model is evaluated with a confusion matrix, as follows.

1. Get the names of the two teams for each match already played in this World Cup, and label the match 1 if the home team won and 0 if the visiting team won.
2. Look up both teams' points in the ranking table built by the model. If the visiting team is not in the table, the home team wins; if the home team is not in the table but the visiting team is, the visiting team wins. In either case, skip the next step.
3. Compare the two teams' points. The team with more points is predicted to win; record the result.
4. Compare the predictions against the actual results to obtain TP, TN, FP, and FN, build the confusion matrix, and calculate the precision P and the recall R.
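
The evaluation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the team names, points, and match results are made up, the function names are hypothetical, and two rules are assumptions (a team missing from the table loses to a team that is present, and a tie on points goes to the home team).

```python
from typing import Dict, Sequence, Tuple


def predict_by_points(home: str, away: str, points: Dict[str, int]) -> int:
    """Return 1 if the home team is predicted to win, 0 for the visiting team."""
    if away not in points:
        return 1  # visiting team missing from the table: home team wins
    if home not in points:
        return 0  # assumption: only the visiting team is ranked, so it wins
    # Assumption: a tie on points is given to the home team.
    return 1 if points[home] >= points[away] else 0


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, TN, FP, FN with 'home team wins' (1) as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """P = TP / (TP + FP), R = TP / (TP + FN); 0.0 when the denominator is 0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up ranking table and played matches: (home, visiting, actual label).
points = {"A": 9, "B": 6, "C": 3}
matches = [("A", "B", 1), ("C", "A", 1), ("B", "C", 0), ("D", "A", 0)]

actual = [label for _, _, label in matches]
predicted = [predict_by_points(h, a, points) for h, a, _ in matches]
tp, tn, fp, fn = confusion_counts(actual, predicted)
P, R = precision_recall(tp, fp, fn)
```

With "home team wins" as the positive class, P measures how often a predicted home win was correct, and R measures how many of the actual home wins the model caught.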

The results show that this model's P is poor, but its R is above average. In other words, the model cannot predict wins very accurately, but it does capture most of the actual winning teams. Since World Cup games are often unpredictable, a prediction model should place more emphasis on producing reasonable results for matches played by stable teams, and R reflects this. Taken together with the actual outcomes, the model is reasonably sound for predicting matches between generally stable teams.

2. Prediction model based on logistic regression

1. Feed the test data into the logistic regression model, which returns a predicted probability.
2. If the predicted probability is greater than 0.5, the home team is predicted to win; otherwise, the visiting team is predicted to win.

Model independent variables: the differences between the home team's and the visiting team's statistics across 16 dimensions

Model dependent variable: whether the home team wins (1 for a win, 0 for a loss)

3. From the predicted results, build the confusion matrix and calculate P and R.
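
The prediction rule above can be sketched as follows. This is an illustration under stated assumptions, not the trained model: only three of the 16 dimensions are shown, and the statistic names, the sample values, the weights, and the bias are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def stat_differences(home: Dict[str, float], away: Dict[str, float], keys: Sequence[str]) -> List[float]:
    """Independent variables: home-team statistic minus visiting-team statistic."""
    return [home[k] - away[k] for k in keys]


def home_win_probability(diffs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic regression output: sigmoid of the weighted sum of differences."""
    z = bias + sum(w * d for w, d in zip(weights, diffs))
    if z >= 0:  # numerically stable form of the sigmoid
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_label(probability: float, threshold: float = 0.5) -> int:
    """1 = home team wins if the probability exceeds the threshold, else 0."""
    return 1 if probability > threshold else 0


# Hypothetical subset of the 16 dimensions and hypothetical model parameters.
keys = ["goals", "possession", "fouls"]
example_weights = [0.8, 0.05, -0.1]
example_bias = 0.0

home_stats = {"goals": 2, "possession": 55, "fouls": 10}
away_stats = {"goals": 1, "possession": 45, "fouls": 14}

diffs = stat_differences(home_stats, away_stats, keys)
prob = home_win_probability(diffs, example_weights, example_bias)
label = predict_label(prob)
```

Because the features are differences, swapping the two teams flips the sign of every input, so with a zero bias the predicted probability for the other side is simply one minus the original.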

As the above data show, both P and R improved, so the model's predictions are more accurate. It also captures home-team wins (and, equivalently, visiting-team losses) well. The improvement in P and R indicates that logistic regression is indeed the better model: because it considers many factors in the game rather than only comparing ranking points, it outperforms the simpler points-based model.