DATASET

En el presente trabajo, se trata de observar los resultados de la Premier League, desde la temporada 1993-94 hasta la 2017-18. En primer lugar, observamos la estructura principal de los datos y los paquetes utilizados. A continuacion veremos los ajustes correspondientes. En segundo lugar, se enseñaran los distintos algoritmos empleados, los cuales nos podran ser utiles para sacar predicciones de dichos datos. El objetivo principal del trabajo, va a tratar de predecir, como hubiese sido la clasificacion de le EPL durante la temporada 2017-18. Centrandose, principalmente en los partidos del Liverpool. Finalmente, sacaremos conclusiones, de los resultados obtenidos.

El conjunto de datos, para el proyecto, se han sacadado de la pagina web de Kaggle. La estructura se compone de: 9665 observaciones por 11 columnas. Las variables incluyen: Division: denominado como E0 HomeTeam: Equipo Local AwayTeam: Equipo visitante FTHG : total de goles en casa FTAG: total de goles fuera de casa FTR: resultado final HTHG: goles en casa, a mitad de partido HTAG: goles fuera de casa, a mitad de partido HTR: resultados a mitad de partido Season:temporada

OBJETIVO

Utilizando los algoritmos, comprobando si funcionan en los datasets, veremos si es capaz de encontrar predicciones razonables para futuros partidos en la Premier League, los cuales podrian utilizarse para casas de apuestas deportivas.

# Comenzamos cargando los datos en R
url= "https://raw.githubusercontent.com/ferandoases/machinelearning1/master/EPL_Set.csv"
df=read.csv(url)

LIBRARY:

#Listado de paquetes empleados:
library(tidyverse)

## -- Attaching packages --------------------------------------------------- tidyverse 1.2.1 --

## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  1.4.2     v dplyr   0.7.8
## v tidyr   0.8.2     v stringr 1.3.1
## v readr   1.2.1     v forcats 0.3.0

## -- Conflicts ------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(rlang)

## 
## Attaching package: 'rlang'

## The following objects are masked from 'package:purrr':
## 
##     %@%, %||%, as_function, flatten, flatten_chr, flatten_dbl,
##     flatten_int, flatten_lgl, invoke, list_along, modify, prepend,
##     rep_along, splice

library(ggpubr)

## Loading required package: magrittr

## 
## Attaching package: 'magrittr'

## The following object is masked from 'package:rlang':
## 
##     set_names

## The following object is masked from 'package:purrr':
## 
##     set_names

## The following object is masked from 'package:tidyr':
## 
##     extract

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

library(partykit)

## Loading required package: grid

## Loading required package: libcoin

## Loading required package: mvtnorm

library(C50)
library(caret)

## Loading required package: lattice

## 
## Attaching package: 'caret'

## The following object is masked from 'package:purrr':
## 
##     lift

library(ranger)
library(e1071)
library(klaR)

## Loading required package: MASS

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

# Transformamos los datos de DATE que eran factores, a un formato de fecha.
# Ademas borramos los NA, que no interesan.
df$Date <- dmy(df$Date)
df <- df[complete.cases(df),]
str(df)

## 'data.frame':    8740 obs. of  11 variables:
##  $ ï..Div  : Factor w/ 1 level "E0": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Date    : Date, format: "1995-08-19" "1995-08-19" ...
##  $ HomeTeam: Factor w/ 50 levels "Arsenal","Aston Villa",..: 2 5 14 25 26 30 39 47 49 1 ...
##  $ AwayTeam: Factor w/ 50 levels "Arsenal","Aston Villa",..: 27 35 18 38 44 15 32 23 7 29 ...
##  $ FTHG    : int  3 1 0 1 1 3 3 1 3 1 ...
##  $ FTAG    : int  1 0 0 0 1 0 4 2 2 1 ...
##  $ FTR     : Factor w/ 3 levels "A","D","H": 3 3 2 3 2 3 1 1 3 2 ...
##  $ HTHG    : int  3 1 0 0 0 1 1 1 2 1 ...
##  $ HTAG    : int  0 0 0 0 1 0 3 0 2 1 ...
##  $ HTR     : Factor w/ 4 levels "","A","D","H": 4 4 3 3 2 4 2 4 3 3 ...
##  $ Season  : Factor w/ 25 levels "1993-94","1994-95",..: 3 3 3 3 3 3 3 3 3 3 ...

# Borramos la Var Div, ya que los cuales son iguales en todos los años.
# Son equipos de primera Division.
df <- df[-1]
str(df)

## 'data.frame':    8740 obs. of  10 variables:
##  $ Date    : Date, format: "1995-08-19" "1995-08-19" ...
##  $ HomeTeam: Factor w/ 50 levels "Arsenal","Aston Villa",..: 2 5 14 25 26 30 39 47 49 1 ...
##  $ AwayTeam: Factor w/ 50 levels "Arsenal","Aston Villa",..: 27 35 18 38 44 15 32 23 7 29 ...
##  $ FTHG    : int  3 1 0 1 1 3 3 1 3 1 ...
##  $ FTAG    : int  1 0 0 0 1 0 4 2 2 1 ...
##  $ FTR     : Factor w/ 3 levels "A","D","H": 3 3 2 3 2 3 1 1 3 2 ...
##  $ HTHG    : int  3 1 0 0 0 1 1 1 2 1 ...
##  $ HTAG    : int  0 0 0 0 1 0 3 0 2 1 ...
##  $ HTR     : Factor w/ 4 levels "","A","D","H": 4 4 3 3 2 4 2 4 3 3 ...
##  $ Season  : Factor w/ 25 levels "1993-94","1994-95",..: 3 3 3 3 3 3 3 3 3 3 ...

# Procedemos a ordenar las filas del dataframe por una columna.
# Vamos a centrarnos en las variables: FTR,HTR.
# en orden creciente.
dfn <- df[, 1:10]
dfn$HTR <- droplevels(df$HTR)
dfn$FTR <- droplevels(df$FTR)
str(dfn)

## 'data.frame':    8740 obs. of  10 variables:
##  $ Date    : Date, format: "1995-08-19" "1995-08-19" ...
##  $ HomeTeam: Factor w/ 50 levels "Arsenal","Aston Villa",..: 2 5 14 25 26 30 39 47 49 1 ...
##  $ AwayTeam: Factor w/ 50 levels "Arsenal","Aston Villa",..: 27 35 18 38 44 15 32 23 7 29 ...
##  $ FTHG    : int  3 1 0 1 1 3 3 1 3 1 ...
##  $ FTAG    : int  1 0 0 0 1 0 4 2 2 1 ...
##  $ FTR     : Factor w/ 3 levels "A","D","H": 3 3 2 3 2 3 1 1 3 2 ...
##  $ HTHG    : int  3 1 0 0 0 1 1 1 2 1 ...
##  $ HTAG    : int  0 0 0 0 1 0 3 0 2 1 ...
##  $ HTR     : Factor w/ 3 levels "A","D","H": 3 3 2 2 1 3 1 3 2 2 ...
##  $ Season  : Factor w/ 25 levels "1993-94","1994-95",..: 3 3 3 3 3 3 3 3 3 3 ...

#Como queremos estudiar la temporada 2017-18.
#y especialmente los partidos del Liverpool.
#realizamos lo siguiente:
temporada <- dfn %>% filter(Season == '2017-18')
sapply(temporada, function(x) sum(is.na(x)))

##     Date HomeTeam AwayTeam     FTHG     FTAG      FTR     HTHG     HTAG 
##        0        0        0        0        0        0        0        0 
##      HTR   Season 
##        0        0

str(dfn)

## 'data.frame':    8740 obs. of  10 variables:
##  $ Date    : Date, format: "1995-08-19" "1995-08-19" ...
##  $ HomeTeam: Factor w/ 50 levels "Arsenal","Aston Villa",..: 2 5 14 25 26 30 39 47 49 1 ...
##  $ AwayTeam: Factor w/ 50 levels "Arsenal","Aston Villa",..: 27 35 18 38 44 15 32 23 7 29 ...
##  $ FTHG    : int  3 1 0 1 1 3 3 1 3 1 ...
##  $ FTAG    : int  1 0 0 0 1 0 4 2 2 1 ...
##  $ FTR     : Factor w/ 3 levels "A","D","H": 3 3 2 3 2 3 1 1 3 2 ...
##  $ HTHG    : int  3 1 0 0 0 1 1 1 2 1 ...
##  $ HTAG    : int  0 0 0 0 1 0 3 0 2 1 ...
##  $ HTR     : Factor w/ 3 levels "A","D","H": 3 3 2 2 1 3 1 3 2 2 ...
##  $ Season  : Factor w/ 25 levels "1993-94","1994-95",..: 3 3 3 3 3 3 3 3 3 3 ...

str(temporada)

## 'data.frame':    380 obs. of  10 variables:
##  $ Date    : Date, format: "2017-08-11" "2017-08-12" ...
##  $ HomeTeam: Factor w/ 50 levels "Arsenal","Aston Villa",..: 1 10 14 16 18 39 45 46 27 30 ...
##  $ AwayTeam: Factor w/ 50 levels "Arsenal","Aston Villa",..: 24 26 11 20 40 42 25 8 47 44 ...
##  $ FTHG    : int  4 0 2 0 1 0 3 1 4 0 ...
##  $ FTAG    : int  3 2 3 3 0 0 3 0 0 2 ...
##  $ FTR     : Factor w/ 3 levels "A","D","H": 3 1 1 1 3 2 2 3 3 1 ...
##  $ HTHG    : int  2 0 0 0 1 0 2 1 1 0 ...
##  $ HTAG    : int  2 0 3 2 0 0 1 0 0 0 ...
##  $ HTR     : Factor w/ 3 levels "A","D","H": 2 2 1 1 3 2 3 3 3 2 ...
##  $ Season  : Factor w/ 25 levels "1993-94","1994-95",..: 25 25 25 25 25 25 25 25 25 25 ...

dfn1 <- temporada [, 4:9]
str(dfn1)

## 'data.frame':    380 obs. of  6 variables:
##  $ FTHG: int  4 0 2 0 1 0 3 1 4 0 ...
##  $ FTAG: int  3 2 3 3 0 0 3 0 0 2 ...
##  $ FTR : Factor w/ 3 levels "A","D","H": 3 1 1 1 3 2 2 3 3 1 ...
##  $ HTHG: int  2 0 0 0 1 0 2 1 1 0 ...
##  $ HTAG: int  2 0 3 2 0 0 1 0 0 0 ...
##  $ HTR : Factor w/ 3 levels "A","D","H": 2 2 1 1 3 2 3 3 3 2 ...

# Podemos hacer predicciones, utilizando árboles, después de haber cargado
# previamente los paquetes. Utilizamos 4 modelos. 
# Despues diremos cual es el mejor:
#MODELO 1
m <- ctree(FTR ~ ., data=dfn1)
m

## 
## Model formula:
## FTR ~ FTHG + FTAG + HTHG + HTAG + HTR
## 
## Fitted party:
## [1] root
## |   [2] FTAG <= 1
## |   |   [3] FTHG <= 0
## |   |   |   [4] FTAG <= 0: D (n = 32, err = 0.0%)
## |   |   |   [5] FTAG > 0: A (n = 23, err = 0.0%)
## |   |   [6] FTHG > 0
## |   |   |   [7] FTAG <= 0: H (n = 104, err = 0.0%)
## |   |   |   [8] FTAG > 0
## |   |   |   |   [9] FTHG <= 1: D (n = 45, err = 0.0%)
## |   |   |   |   [10] FTHG > 1: H (n = 59, err = 0.0%)
## |   [11] FTAG > 1
## |   |   [12] FTHG <= 2
## |   |   |   [13] FTHG <= 1: A (n = 72, err = 0.0%)
## |   |   |   [14] FTHG > 1
## |   |   |   |   [15] FTAG <= 2: D (n = 19, err = 0.0%)
## |   |   |   |   [16] FTAG > 2: A (n = 13, err = 0.0%)
## |   |   [17] FTHG > 2: H (n = 13, err = 23.1%)
## 
## Number of inner nodes:    8
## Number of terminal nodes: 9

plot(m)

#MODELO 2
m2 <- C5.0(FTR ~ ., data=dfn1)
m2

## 
## Call:
## C5.0.formula(formula = FTR ~ ., data = dfn1)
## 
## Classification Tree
## Number of samples: 380 
## Number of predictors: 5 
## 
## Tree size: 11 
## 
## Non-standard options: attempt to group attributes

plot(m2)

#MODELO 3
#m3 <- cforest(FTR ~ FTHG + HTHG + HTAG + HTR + FTAG, data=dfn1)
#m3
#He ejecutado en oculto el codigo del modelo 3, ya que su output es
#demasiado grande.

#MODELO 4
m4 <- ranger(FTR ~ FTHG + HTHG + HTAG + HTR + FTAG, data=dfn1, importance="permutation")
m4

## Ranger result
## 
## Call:
##  ranger(FTR ~ FTHG + HTHG + HTAG + HTR + FTAG, data = dfn1, importance = "permutation") 
## 
## Type:                             Classification 
## Number of trees:                  500 
## Sample size:                      380 
## Number of independent variables:  5 
## Mtry:                             2 
## Target node size:                 1 
## Variable importance mode:         permutation 
## Splitrule:                        gini 
## OOB prediction error:             1.84 %

#Finalmente vemos que el mejor modelo es el 2.
#Cargamos ademas el plot

m2

## 
## Call:
## C5.0.formula(formula = FTR ~ ., data = dfn1)
## 
## Classification Tree
## Number of samples: 380 
## Number of predictors: 5 
## 
## Tree size: 11 
## 
## Non-standard options: attempt to group attributes

plot(m2)

str(dfn1)

## 'data.frame':    380 obs. of  6 variables:
##  $ FTHG: int  4 0 2 0 1 0 3 1 4 0 ...
##  $ FTAG: int  3 2 3 3 0 0 3 0 0 2 ...
##  $ FTR : Factor w/ 3 levels "A","D","H": 3 1 1 1 3 2 2 3 3 1 ...
##  $ HTHG: int  2 0 0 0 1 0 2 1 1 0 ...
##  $ HTAG: int  2 0 3 2 0 0 1 0 0 0 ...
##  $ HTR : Factor w/ 3 levels "A","D","H": 2 2 1 1 3 2 3 3 3 2 ...

The dataset

# El numero de la columna con el target:
names(dfn1)

## [1] "FTHG" "FTAG" "FTR"  "HTHG" "HTAG" "HTR"

cl <- 3
# El nombre del target variable en formula:
f <- as.formula(paste(colnames(dfn1)[cl], "~", "."))

#Asignamos un Id, en mi caso el codigo de alumno:
id <- 128502

# y separamos en train y test poniendo un seed para que sea reproducible
set.seed(id)

spl <- createDataPartition(dfn1[,3], p=0.7, list=FALSE)
tr <- dfn1[ spl,]
te <- dfn1[-spl,]

# Entrenamos los tres algoritmos knn, nb y rf1: 
m.knn <- train(f, data=tr, method="knn")
m.nb  <- train(f, data=tr, method="nb")
m.rf1 <- train(f, data=tr, method="ranger")

# juntamos los resultados y lo llamamos res
res <- resamples(list(KNN=m.knn, NB=m.nb, RF=m.rf1))

# y vemos los resultados en numeros y graficamente
summary(res)

## 
## Call:
## summary.resamples(object = res)
## 
## Models: KNN, NB, RF 
## Number of resamples: 25 
## 
## Accuracy 
##          Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## KNN 0.7659574 0.8163265 0.8617021 0.8475269 0.8686869 0.9215686    0
## NB  0.5940594 0.6451613 0.6880734 0.6886992 0.7142857 0.8235294    0
## RF  0.9369369 0.9887640 0.9898990 0.9844629 0.9906542 1.0000000    0
## 
## Kappa 
##          Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
## KNN 0.6380233 0.7147477 0.7838316 0.7623911 0.7989063 0.8671875    0
## NB  0.3969719 0.4680187 0.5301128 0.5290736 0.5688623 0.7260519    0
## RF  0.9021657 0.9829730 0.9839169 0.9757651 0.9858353 1.0000000    0

dotplot(res)

# Observammos que el algoritmo,con mas precision es el random forest,
# despues vemos que tanto en el Kappa como en el Accuracy, el Knn esta 
# por el estilo. Vemos que el algoritmo nb es descartado.
# Pasamos con las predicciones.
#PREDICCION 1
prediccion1 <- predict(m.rf1, te)
confusionMatrix(prediccion1,te[,cl])

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  A  D  H
##          A 32  0  0
##          D  0 29  0
##          H  0  0 51
## 
## Overall Statistics
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9676, 1)
##     No Information Rate : 0.4554     
##     P-Value [Acc > NIR] : < 2.2e-16  
##                                      
##                   Kappa : 1          
##  Mcnemar's Test P-Value : NA         
## 
## Statistics by Class:
## 
##                      Class: A Class: D Class: H
## Sensitivity            1.0000   1.0000   1.0000
## Specificity            1.0000   1.0000   1.0000
## Pos Pred Value         1.0000   1.0000   1.0000
## Neg Pred Value         1.0000   1.0000   1.0000
## Prevalence             0.2857   0.2589   0.4554
## Detection Rate         0.2857   0.2589   0.4554
## Detection Prevalence   0.2857   0.2589   0.4554
## Balanced Accuracy      1.0000   1.0000   1.0000

str(prediccion1)

##  Factor w/ 3 levels "A","D","H": 1 1 2 3 1 3 1 1 3 3 ...

summary(prediccion1)

##  A  D  H 
## 32 29 51

#PREDICCION 2
prediccion2 <- predict(m.knn, te)
confusionMatrix(prediccion2,te[,cl])

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  A  D  H
##          A 29  2  0
##          D  3 22  0
##          H  0  5 51
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9107          
##                  95% CI : (0.8419, 0.9564)
##     No Information Rate : 0.4554          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.8595          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: A Class: D Class: H
## Sensitivity            0.9062   0.7586   1.0000
## Specificity            0.9750   0.9639   0.9180
## Pos Pred Value         0.9355   0.8800   0.9107
## Neg Pred Value         0.9630   0.9195   1.0000
## Prevalence             0.2857   0.2589   0.4554
## Detection Rate         0.2589   0.1964   0.4554
## Detection Prevalence   0.2768   0.2232   0.5000
## Balanced Accuracy      0.9406   0.8612   0.9590

str(prediccion2)

##  Factor w/ 3 levels "A","D","H": 1 1 3 3 1 3 1 1 3 3 ...

summary(prediccion2)

##  A  D  H 
## 31 25 56

#PREDICCION 3
prediccion3<- predict(m.nb, te)
confusionMatrix(prediccion3,te[,cl])

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  A  D  H
##          A 23  7  0
##          D  5 17 14
##          H  4  5 37
## 
## Overall Statistics
##                                          
##                Accuracy : 0.6875         
##                  95% CI : (0.593, 0.7717)
##     No Information Rate : 0.4554         
##     P-Value [Acc > NIR] : 5.953e-07      
##                                          
##                   Kappa : 0.5216         
##  Mcnemar's Test P-Value : 0.03517        
## 
## Statistics by Class:
## 
##                      Class: A Class: D Class: H
## Sensitivity            0.7188   0.5862   0.7255
## Specificity            0.9125   0.7711   0.8525
## Pos Pred Value         0.7667   0.4722   0.8043
## Neg Pred Value         0.8902   0.8421   0.7879
## Prevalence             0.2857   0.2589   0.4554
## Detection Rate         0.2054   0.1518   0.3304
## Detection Prevalence   0.2679   0.3214   0.4107
## Balanced Accuracy      0.8156   0.6786   0.7890

str(prediccion3)

##  Factor w/ 3 levels "A","D","H": 1 1 3 3 2 3 1 1 2 3 ...

summary(prediccion3)

##  A  D  H 
## 30 36 46

#Procedemos a estudiar como se ha desenvuelto el Liverpool en la temporada:
liverpool_temporada <- temporada %>% filter(HomeTeam == 'Liverpool' | AwayTeam == 'Liverpool')

# Separamos los partidos en casa y visitante
liverpool_casa <- liverpool_temporada %>% filter(HomeTeam == 'Liverpool')
liverpool_visitante <- liverpool_temporada %>% filter(AwayTeam == 'Liverpool')

# Estudiaremos mediante la función count, para poder observar los partidos
# totales fuera de casa y en casa.
# Seguidamente, cargaremos un gráfico de barras para reflejar los resultados,
# de forma sencilla.
full_time_home <- liverpool_casa %>% count(FTR) 
half_time_home <- liverpool_casa %>% count(HTR)
full_time_away <- liverpool_visitante %>% count(FTR)
half_time_away <- liverpool_visitante %>% count(HTR)

home_results_breakdown <- left_join(full_time_home, half_time_home, by = c('FTR' = 'HTR')) %>% 
  rename(result = FTR, FTR = n.x, HTR = n.y) %>% 
  gather(time, count, -result) %>% 
  mutate(location = "Home", 
         result = ordered(ifelse(result == 'A', 'Lose', ifelse(result == 'H', 'Win', 'Draw')), levels = c('Win', 'Draw', 'Lose')), 
         time = ifelse(time == 'HTR', 'Half Time Result', 'Full Time Result'),
         time = ordered(time, levels = c('Half Time Result', 'Full Time Result')))

away_results_breakdown <- left_join(full_time_away, half_time_away, by = c('FTR' = 'HTR')) %>% 
  rename(result = FTR, FTR = n.x, HTR = n.y) %>% 
  gather(time, count, -result) %>% 
  mutate(location = "Away", 
         result = ordered(ifelse(result == 'A', 'Win', ifelse(result == 'H', 'Lose', 'Draw')), levels = c('Win', 'Draw', 'Lose')), 
         time = ifelse(time == 'HTR', 'Half Time Result', 'Full Time Result'),
         time = ordered(time, levels = c('Half Time Result', 'Full Time Result')))
results_breakdown <- bind_rows(home_results_breakdown, away_results_breakdown) %>% mutate(location = ordered(location, c('Home', 'Away')))

home_points = c(1,3)
total_points_full_time <- liverpool_casa %>% count(FTR) %>% mutate(point_value = home_points, total_points = n * point_value)
total_points_full_time

## # A tibble: 2 x 4
##   FTR       n point_value total_points
##   <fct> <int>       <dbl>        <dbl>
## 1 D         7           1            7
## 2 H        12           3           36

home_points = c(0,1,3)
total_points_full_time_half <- liverpool_casa %>% count(HTR) %>% mutate(point_value = home_points, total_points = n * point_value)
total_points_full_time_half

## # A tibble: 3 x 4
##   HTR       n point_value total_points
##   <fct> <int>       <dbl>        <dbl>
## 1 A         1           0            0
## 2 D         8           1            8
## 3 H        10           3           30

#si observamos, solo los partidos jugados en casa el Liverpool, a tiempo completo, 
#el equipo obtendria 43puntos y si los partidos solo durasen los primeros 45 min, 
# hubiese obtenido 38 puntos.

CLASIFICACION:

## # A tibble: 1 x 2
##   team      final_points
##   <chr>            <dbl>
## 1 Liverpool           43

## # A tibble: 20 x 2
##    team           final_points
##    <chr>                 <dbl>
##  1 Man City                 50
##  2 Arsenal                  47
##  3 Man United               47
##  4 Liverpool                43
##  5 Tottenham                43
##  6 Chelsea                  37
##  7 Everton                  34
##  8 Brighton                 29
##  9 Newcastle                28
## 10 Watford                  27
## 11 Leicester                27
## 12 West Ham                 27
## 13 Crystal Palace           26
## 14 Bournemouth              26
## 15 Burnley                  26
## 16 Huddersfield             23
## 17 Swansea                  21
## 18 Stoke                    20
## 19 Southampton              19
## 20 West Brom                18

## # A tibble: 1 x 2
##   team      final_points
##   <chr>            <dbl>
## 1 Liverpool           38

## # A tibble: 20 x 2
##    team           final_points
##    <chr>                 <dbl>
##  1 Man City                 43
##  2 Man United               38
##  3 Liverpool                38
##  4 Arsenal                  37
##  5 Chelsea                  34
##  6 Tottenham                31
##  7 Leicester                25
##  8 Huddersfield             25
##  9 Crystal Palace           24
## 10 Burnley                  24
## 11 Southampton              23
## 12 Watford                  23
## 13 Newcastle                23
## 14 Stoke                    23
## 15 Bournemouth              22
## 16 West Ham                 22
## 17 Brighton                 21
## 18 Everton                  21
## 19 West Brom                21
## 20 Swansea                  16

Machine Learning: EPL Results

Fernando Estrada Fernandez

1 de diciembre de 2018

DATASET

OBJETIVO

LIBRARY:

The dataset

CLASIFICACION: