R and RStudio

Wikipedia

R is a software environment for statistical computing and graphics, and also a programming language. It is free software, supported by the R Foundation and part of the GNU Project. Created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, R is currently developed by the R Development Core Team. R resembles the S programming language and can be regarded as an implementation of S.

R has become the de facto standard among statisticians for developing statistical software and is widely used for data analysis.

R's source code is available under the GNU General Public License and runs on a variety of operating systems. Although R uses a command-line interface, several graphical user interfaces are also available.

R

https://www.r-project.org/

R for Windows

https://cran.r-project.org/bin/windows/base/

R for Mac

https://cran.r-project.org/bin/macosx/

RStudio

https://rstudio.com/products/rstudio/download/

How Do I Learn the R Programming Language?

Datacamp

https://www.datacamp.com/

Udemy

https://www.udemy.com/course/r-programlama/

https://www.udemy.com/course/veri-bilimi-ve-makine-ogrenmesi-egitimi/

The ggplot2 Library and the Kıvanç Tatlıtuğ of Data Science: Hadley Wickham

knitr::include_graphics("www/gorsel3.jpg") 

The Grammar of Graphics Paper

https://vita.had.co.nz/papers/layered-grammar.pdf

The R for Data Science Book

https://r4ds.had.co.nz

Veri Bilimi Okulu: Data Visualization Article

https://www.veribilimiokulu.com/r-ile-veri-gorsellestirme/

ggplot2 Cheat Sheet

https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

The Data Science Process

knitr::include_graphics("www/datascience.png") 

Introduction to Data Visualization

knitr::include_graphics("www/gorsel1.png") 

What do we need to visualize data?

knitr::include_graphics("www/gorsel5.jpg") 

• The dataset – data

• The aesthetic properties of the data – aes

• The chart type – geom

Data Types

Visualization Types

What Is Each Chart Type Good For? https://datavizcatalogue.com/TR/

Aesthetics

Geometries

Sample datasets included in the ggplot2 library

knitr::include_graphics("www/gorsel2.jpg") 

ggplot2 Cheat Sheet https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

The ggplot structure

library(tidyverse)
# Template only: supply a data frame, map columns to x and y, then add a geom_*() layer
ggplot(data, mapping = aes(x, y)) +
  geom_*()
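As a concrete instance of this template, the sketch below fills in each slot using the `mpg` dataset that ships with ggplot2. The choice of `displ` and `hwy` as variables is only illustrative.

```r
library(ggplot2)

# data: the mpg data frame bundled with ggplot2
# aes:  engine displacement on x, highway mileage on y (illustrative choice)
# geom: points, i.e. a scatter plot
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# Printing the object draws the plot
p
```

Because the plot is an ordinary R object, further layers, scales, and themes can be added to `p` with `+` later on.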

One Variable

Continuous

ggplot(mpg, aes(hwy))+
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
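The message above appears because no bin width was chosen. A minimal sketch of setting one explicitly follows; the value `binwidth = 2` is an arbitrary illustration, not a recommendation for this variable.

```r
library(ggplot2)

# binwidth = 2 is an illustrative value only
p <- ggplot(mpg, aes(hwy)) +
  geom_histogram(binwidth = 2)

# Inspect the bins ggplot2 computed: each one spans 2 units of hwy
bins <- layer_data(p)
unique(round(bins$xmax - bins$xmin, 8))
```

With `binwidth` supplied, the `stat_bin()` message is no longer printed.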

Discrete (Categorical)

ggplot(data = mtcars, aes(gear))+
  geom_bar()

Two Variables

Continuous-Continuous

ggplot(mtcars, aes(mpg, disp))+
  geom_point()

Continuous-Discrete

ggplot(mtcars, aes(x = gear, y = disp))+
  geom_col()
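The two bar geoms differ in where the bar heights come from: `geom_bar()` counts the rows in each category, while `geom_col()` draws the y values it is given. The sketch below shows the same chart built both ways; the intermediate `gear_counts` table is introduced here purely for illustration.

```r
library(ggplot2)
library(dplyr)

# geom_bar() counts rows per category itself (stat = "count")
p_bar <- ggplot(mtcars, aes(gear)) +
  geom_bar()

# geom_col() plots the heights it receives, so the counting is done first
gear_counts <- count(mtcars, gear)   # columns: gear, n
p_col <- ggplot(gear_counts, aes(gear, n)) +
  geom_col()

gear_counts
```

When several rows share the same x value, as in the `disp` example above, `geom_col()` stacks them, so each bar shows the total per category.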

Aesthetic Properties

Fill

ggplot(mtcars, aes(disp))+
  geom_histogram(fill = "blue")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(mtcars, aes(disp, fill = as.factor(gear)))+
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Color

ggplot(mtcars, aes(disp, mpg))+
  geom_point(color = "red")

ggplot(mtcars, aes(disp, mpg, color = as.factor(gear)))+
  geom_point()
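The two plots above illustrate the difference between setting and mapping an aesthetic. A constant given outside `aes()` applies to every point, whereas a variable given inside `aes()` is translated into one colour per level and gets a legend. A small sketch that inspects the result:

```r
library(ggplot2)

# Setting: a constant outside aes() colours every point the same
p_set <- ggplot(mtcars, aes(disp, mpg)) +
  geom_point(color = "red")

# Mapping: a variable inside aes() gets one colour per level, plus a legend
p_map <- ggplot(mtcars, aes(disp, mpg, color = as.factor(gear))) +
  geom_point()

unique(layer_data(p_set)$colour)          # a single colour
length(unique(layer_data(p_map)$colour))  # one colour per gear level
```

The same distinction holds for `fill`, `size`, and `shape`.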

Size

ggplot(mtcars, aes(disp, mpg, size = drat))+
  geom_point()

Shape

ggplot(mtcars, aes(disp, mpg, shape = factor(gear), color = factor(gear)))+
  geom_point()

Theme

ggplot(mtcars, aes(disp, mpg, shape = factor(gear), color = factor(gear)))+
  geom_point()+
  theme(axis.text = element_text(colour = "red"),
        axis.title.x = element_text(colour = "blue", size = 50),
        panel.background = element_rect(fill = "green") 
        )

Legend

ggplot(mtcars, aes(disp, mpg, shape = factor(gear), color = factor(gear)))+
  geom_point()+
  theme(legend.position = "bottom")

Coordinates

ggplot(mtcars, aes(gear))+
  geom_bar()+
  coord_flip()

Facet

ggplot(mtcars, aes(mpg, disp))+
  geom_point()+
  facet_wrap(gear~.)

Labeling (title, x/y axis labels, and so on)

ggplot(mtcars, aes(mpg, disp))+
  geom_point()+
  facet_wrap(gear~.)+
  labs(title = "Data Visualization Training", 
       x = "Horizontal Axis", y = "Vertical Axis",
       caption = "source:",
       subtitle = "Facet Function")

Data Science Platform: Kaggle

https://www.kaggle.com/

Dataset used in this study: FIFA 19

https://www.kaggle.com/karangadiya/fifa19

Application Code

https://www.kaggle.com/ekrembayar/fifa-data-analysis-visualization

Application

knitr::include_graphics("www/gorsel4.png") 

1. Packages

library(tidyverse) 
library(magrittr)
library(DataExplorer)
library(maps)
library(plotly)
library(DT)
library(tidytext)
library(gridExtra)


options(scipen = 999)

2. Data

# Data Import
df <- read.csv("data.csv", encoding = "UTF-8")[-1]
head(df)
##       ID              Name Age                                          Photo
## 1 158023          L. Messi  31 https://cdn.sofifa.org/players/4/19/158023.png
## 2  20801 Cristiano Ronaldo  33  https://cdn.sofifa.org/players/4/19/20801.png
## 3 190871         Neymar Jr  26 https://cdn.sofifa.org/players/4/19/190871.png
## 4 193080            De Gea  27 https://cdn.sofifa.org/players/4/19/193080.png
## 5 192985      K. De Bruyne  27 https://cdn.sofifa.org/players/4/19/192985.png
## 6 183277         E. Hazard  27 https://cdn.sofifa.org/players/4/19/183277.png
##   Nationality                                Flag Overall Potential
## 1   Argentina https://cdn.sofifa.org/flags/52.png      94        94
## 2    Portugal https://cdn.sofifa.org/flags/38.png      94        94
## 3      Brazil https://cdn.sofifa.org/flags/54.png      92        93
## 4       Spain https://cdn.sofifa.org/flags/45.png      91        93
## 5     Belgium  https://cdn.sofifa.org/flags/7.png      91        92
## 6     Belgium  https://cdn.sofifa.org/flags/7.png      91        91
##                  Club                                    Club.Logo   Value
## 1        FC Barcelona https://cdn.sofifa.org/teams/2/light/241.png €110.5M
## 2            Juventus  https://cdn.sofifa.org/teams/2/light/45.png    €77M
## 3 Paris Saint-Germain  https://cdn.sofifa.org/teams/2/light/73.png €118.5M
## 4   Manchester United  https://cdn.sofifa.org/teams/2/light/11.png    €72M
## 5     Manchester City  https://cdn.sofifa.org/teams/2/light/10.png   €102M
## 6             Chelsea   https://cdn.sofifa.org/teams/2/light/5.png    €93M
##    Wage Special Preferred.Foot International.Reputation Weak.Foot Skill.Moves
## 1 €565K    2202           Left                        5         4           4
## 2 €405K    2228          Right                        5         4           5
## 3 €290K    2143          Right                        5         5           5
## 4 €260K    1471          Right                        4         3           1
## 5 €355K    2281          Right                        4         5           4
## 6 €340K    2142          Right                        4         4           4
##        Work.Rate  Body.Type Real.Face Position Jersey.Number       Joined
## 1 Medium/ Medium      Messi       Yes       RF            10  Jul 1, 2004
## 2      High/ Low C. Ronaldo       Yes       ST             7 Jul 10, 2018
## 3   High/ Medium     Neymar       Yes       LW            10  Aug 3, 2017
## 4 Medium/ Medium       Lean       Yes       GK             1  Jul 1, 2011
## 5     High/ High     Normal       Yes      RCM             7 Aug 30, 2015
## 6   High/ Medium     Normal       Yes       LF            10  Jul 1, 2012
##   Loaned.From Contract.Valid.Until Height Weight   LS   ST   RS   LW   LF   CF
## 1                             2021    5'7 159lbs 88+2 88+2 88+2 92+2 93+2 93+2
## 2                             2022    6'2 183lbs 91+3 91+3 91+3 89+3 90+3 90+3
## 3                             2022    5'9 150lbs 84+3 84+3 84+3 89+3 89+3 89+3
## 4                             2020    6'4 168lbs                              
## 5                             2023   5'11 154lbs 82+3 82+3 82+3 87+3 87+3 87+3
## 6                             2020    5'8 163lbs 83+3 83+3 83+3 89+3 88+3 88+3
##     RF   RW  LAM  CAM  RAM   LM  LCM   CM  RCM   RM  LWB  LDM  CDM  RDM  RWB
## 1 93+2 92+2 93+2 93+2 93+2 91+2 84+2 84+2 84+2 91+2 64+2 61+2 61+2 61+2 64+2
## 2 90+3 89+3 88+3 88+3 88+3 88+3 81+3 81+3 81+3 88+3 65+3 61+3 61+3 61+3 65+3
## 3 89+3 89+3 89+3 89+3 89+3 88+3 81+3 81+3 81+3 88+3 65+3 60+3 60+3 60+3 65+3
## 4                                                                           
## 5 87+3 87+3 88+3 88+3 88+3 88+3 87+3 87+3 87+3 88+3 77+3 77+3 77+3 77+3 77+3
## 6 88+3 89+3 89+3 89+3 89+3 89+3 82+3 82+3 82+3 89+3 66+3 63+3 63+3 63+3 66+3
##     LB  LCB   CB  RCB   RB Crossing Finishing HeadingAccuracy ShortPassing
## 1 59+2 47+2 47+2 47+2 59+2       84        95              70           90
## 2 61+3 53+3 53+3 53+3 61+3       84        94              89           81
## 3 60+3 47+3 47+3 47+3 60+3       79        87              62           84
## 4                                17        13              21           50
## 5 73+3 66+3 66+3 66+3 73+3       93        82              55           92
## 6 60+3 49+3 49+3 49+3 60+3       81        84              61           89
##   Volleys Dribbling Curve FKAccuracy LongPassing BallControl Acceleration
## 1      86        97    93         94          87          96           91
## 2      87        88    81         76          77          94           89
## 3      84        96    88         87          78          95           94
## 4      13        18    21         19          51          42           57
## 5      82        86    85         83          91          91           78
## 6      80        95    83         79          83          94           94
##   SprintSpeed Agility Reactions Balance ShotPower Jumping Stamina Strength
## 1          86      91        95      95        85      68      72       59
## 2          91      87        96      70        95      95      88       79
## 3          90      96        94      84        80      61      81       49
## 4          58      60        90      43        31      67      43       64
## 5          76      79        91      77        91      63      90       75
## 6          88      95        90      94        82      56      83       66
##   LongShots Aggression Interceptions Positioning Vision Penalties Composure
## 1        94         48            22          94     94        75        96
## 2        93         63            29          95     82        85        95
## 3        82         56            36          89     87        81        94
## 4        12         38            30          12     68        40        68
## 5        91         76            61          87     94        79        88
## 6        80         54            41          87     89        86        91
##   Marking StandingTackle SlidingTackle GKDiving GKHandling GKKicking
## 1      33             28            26        6         11        15
## 2      28             31            23        7         11        15
## 3      27             24            33        9          9        15
## 4      15             21            13       90         85        87
## 5      68             58            51       15         13         5
## 6      34             27            22       11         12         6
##   GKPositioning GKReflexes Release.Clause
## 1            14          8        €226.5M
## 2            14         11        €127.1M
## 3            15         11        €228.1M
## 4            88         94        €138.6M
## 5            10         13        €196.4M
## 6             8          8        €172.1M

Data Structure

dim(df)
## [1] 18207    88
str(df)
## 'data.frame':    18207 obs. of  88 variables:
##  $ ID                      : int  158023 20801 190871 193080 192985 183277 177003 176580 155862 200389 ...
##  $ Name                    : Factor w/ 17194 levels "A. Ábalos","A. Abang",..: 9676 3192 12552 4169 8661 4458 9684 9892 15466 7823 ...
##  $ Age                     : int  31 33 26 27 27 27 32 31 32 25 ...
##  $ Photo                   : Factor w/ 18207 levels "https://cdn.sofifa.org/players/4/19/100803.png",..: 567 6032 3132 3468 3453 1986 1447 1391 484 4443 ...
##  $ Nationality             : Factor w/ 164 levels "Afghanistan",..: 7 124 21 141 14 14 36 159 141 138 ...
##  $ Flag                    : Factor w/ 164 levels "https://cdn.sofifa.org/flags/1.png",..: 123 108 125 115 138 138 2 132 115 114 ...
##  $ Overall                 : int  94 94 92 91 91 91 91 91 91 90 ...
##  $ Potential               : int  94 94 93 93 92 91 91 91 91 93 ...
##  $ Club                    : Factor w/ 652 levels ""," SSV Jahn Regensburg",..: 215 330 437 377 376 137 474 215 474 62 ...
##  $ Club.Logo               : Factor w/ 679 levels "https://cdn.sofifa.org/flags/103.png",..: 491 553 638 90 30 577 493 491 493 490 ...
##  $ Value                   : Factor w/ 217 levels "€0","€1.1M","€1.2M",..: 17 196 19 191 13 214 183 202 155 184 ...
##  $ Wage                    : Factor w/ 144 levels "€0","€100K","€105K",..: 95 75 56 50 67 65 78 82 71 138 ...
##  $ Special                 : int  2202 2228 2143 1471 2281 2142 2280 2346 2201 1331 ...
##  $ Preferred.Foot          : Factor w/ 3 levels "","Left","Right": 2 3 3 3 3 3 3 3 3 3 ...
##  $ International.Reputation: int  5 5 5 4 4 4 4 5 4 3 ...
##  $ Weak.Foot               : int  4 4 5 3 5 4 4 4 3 3 ...
##  $ Skill.Moves             : int  4 5 5 1 4 4 4 3 3 1 ...
##  $ Work.Rate               : Factor w/ 10 levels "","High/ High",..: 10 3 4 10 2 4 2 4 4 10 ...
##  $ Body.Type               : Factor w/ 11 levels "","Akinfenwa",..: 6 3 7 5 8 8 5 8 8 8 ...
##  $ Real.Face               : Factor w/ 3 levels "","No","Yes": 3 3 3 3 3 3 3 3 3 3 ...
##  $ Position                : Factor w/ 28 levels "","CAM","CB",..: 23 28 16 7 21 13 21 25 20 7 ...
##  $ Jersey.Number           : int  10 7 10 1 7 10 10 9 15 1 ...
##  $ Joined                  : Factor w/ 1737 levels "","Apr 1, 2008",..: 776 796 249 783 255 784 105 801 98 836 ...
##  $ Loaned.From             : Factor w/ 342 levels "","1. FC Köln",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Contract.Valid.Until    : Factor w/ 37 levels "","2018","2019",..: 5 6 6 4 7 4 4 5 4 5 ...
##  $ Height                  : Factor w/ 22 levels "","5'1","5'10",..: 10 15 12 17 4 11 11 13 13 15 ...
##  $ Weight                  : Factor w/ 58 levels "","110lbs","115lbs",..: 23 34 19 27 21 25 17 37 33 38 ...
##  $ LS                      : Factor w/ 94 levels "","31+2","32+2",..: 93 94 88 1 85 87 75 92 67 1 ...
##  $ ST                      : Factor w/ 94 levels "","31+2","32+2",..: 93 94 88 1 85 87 75 92 67 1 ...
##  $ RS                      : Factor w/ 94 levels "","31+2","32+2",..: 93 94 88 1 85 87 75 92 67 1 ...
##  $ LW                      : Factor w/ 106 levels "","25+2","27+2",..: 106 105 105 1 104 105 101 103 71 1 ...
##  $ LF                      : Factor w/ 103 levels "","27+2","29+2",..: 103 102 101 1 98 100 95 99 68 1 ...
##  $ CF                      : Factor w/ 103 levels "","27+2","29+2",..: 103 102 101 1 98 100 95 99 68 1 ...
##  $ RF                      : Factor w/ 103 levels "","27+2","29+2",..: 103 102 101 1 98 100 95 99 68 1 ...
##  $ RW                      : Factor w/ 106 levels "","25+2","27+2",..: 106 105 105 1 104 105 101 103 71 1 ...
##  $ LAM                     : Factor w/ 102 levels "","27+2","28+2",..: 102 100 101 1 100 101 99 97 69 1 ...
##  $ CAM                     : Factor w/ 102 levels "","27+2","28+2",..: 102 100 101 1 100 101 99 97 69 1 ...
##  $ RAM                     : Factor w/ 102 levels "","27+2","28+2",..: 102 100 101 1 100 101 99 97 69 1 ...
##  $ LM                      : Factor w/ 101 levels "","27+2","28+2",..: 101 99 99 1 99 100 98 96 70 1 ...
##  $ LCM                     : Factor w/ 93 levels "","30+2","31+2",..: 88 83 83 1 92 85 93 79 70 1 ...
##  $ CM                      : Factor w/ 93 levels "","30+2","31+2",..: 88 83 83 1 92 85 93 79 70 1 ...
##  $ RCM                     : Factor w/ 93 levels "","30+2","31+2",..: 88 83 83 1 92 85 93 79 70 1 ...
##  $ RM                      : Factor w/ 101 levels "","27+2","28+2",..: 101 99 99 1 99 100 98 96 70 1 ...
##  $ LWB                     : Factor w/ 96 levels "","30+2","31+2",..: 55 58 58 1 83 60 93 67 91 1 ...
##  $ LDM                     : Factor w/ 100 levels "","28+2","29+2",..: 50 51 49 1 84 55 92 66 97 1 ...
##  $ CDM                     : Factor w/ 100 levels "","28+2","29+2",..: 50 51 49 1 84 55 92 66 97 1 ...
##  $ RDM                     : Factor w/ 100 levels "","28+2","29+2",..: 50 51 49 1 84 55 92 66 97 1 ...
##  $ RWB                     : Factor w/ 96 levels "","30+2","31+2",..: 55 58 58 1 83 60 93 67 91 1 ...
##  $ LB                      : Factor w/ 99 levels "","29+2","30+2",..: 49 54 52 1 79 52 91 65 99 1 ...
##  $ LCB                     : Factor w/ 109 levels "","25+2","27+2",..: 30 44 31 1 71 35 81 65 109 1 ...
##  $ CB                      : Factor w/ 109 levels "","25+2","27+2",..: 30 44 31 1 71 35 81 65 109 1 ...
##  $ RCB                     : Factor w/ 109 levels "","25+2","27+2",..: 30 44 31 1 71 35 81 65 109 1 ...
##  $ RB                      : Factor w/ 99 levels "","29+2","30+2",..: 49 54 52 1 79 52 91 65 99 1 ...
##  $ Crossing                : int  84 84 79 17 93 81 86 77 66 13 ...
##  $ Finishing               : int  95 94 87 13 82 84 72 93 60 11 ...
##  $ HeadingAccuracy         : int  70 89 62 21 55 61 55 77 91 15 ...
##  $ ShortPassing            : int  90 81 84 50 92 89 93 82 78 29 ...
##  $ Volleys                 : int  86 87 84 13 82 80 76 88 66 13 ...
##  $ Dribbling               : int  97 88 96 18 86 95 90 87 63 12 ...
##  $ Curve                   : int  93 81 88 21 85 83 85 86 74 13 ...
##  $ FKAccuracy              : int  94 76 87 19 83 79 78 84 72 14 ...
##  $ LongPassing             : int  87 77 78 51 91 83 88 64 77 26 ...
##  $ BallControl             : int  96 94 95 42 91 94 93 90 84 16 ...
##  $ Acceleration            : int  91 89 94 57 78 94 80 86 76 43 ...
##  $ SprintSpeed             : int  86 91 90 58 76 88 72 75 75 60 ...
##  $ Agility                 : int  91 87 96 60 79 95 93 82 78 67 ...
##  $ Reactions               : int  95 96 94 90 91 90 90 92 85 86 ...
##  $ Balance                 : int  95 70 84 43 77 94 94 83 66 49 ...
##  $ ShotPower               : int  85 95 80 31 91 82 79 86 79 22 ...
##  $ Jumping                 : int  68 95 61 67 63 56 68 69 93 76 ...
##  $ Stamina                 : int  72 88 81 43 90 83 89 90 84 41 ...
##  $ Strength                : int  59 79 49 64 75 66 58 83 83 78 ...
##  $ LongShots               : int  94 93 82 12 91 80 82 85 59 12 ...
##  $ Aggression              : int  48 63 56 38 76 54 62 87 88 34 ...
##  $ Interceptions           : int  22 29 36 30 61 41 83 41 90 19 ...
##  $ Positioning             : int  94 95 89 12 87 87 79 92 60 11 ...
##  $ Vision                  : int  94 82 87 68 94 89 92 84 63 70 ...
##  $ Penalties               : int  75 85 81 40 79 86 82 85 75 11 ...
##  $ Composure               : int  96 95 94 68 88 91 84 85 82 70 ...
##  $ Marking                 : int  33 28 27 15 68 34 60 62 87 27 ...
##  $ StandingTackle          : int  28 31 24 21 58 27 76 45 92 12 ...
##  $ SlidingTackle           : int  26 23 33 13 51 22 73 38 91 18 ...
##  $ GKDiving                : int  6 7 9 90 15 11 13 27 11 86 ...
##  $ GKHandling              : int  11 11 9 85 13 12 9 25 8 92 ...
##  $ GKKicking               : int  15 15 15 87 5 6 7 31 9 78 ...
##  $ GKPositioning           : int  14 14 15 88 10 8 14 33 7 88 ...
##  $ GKReflexes              : int  8 11 11 94 13 8 9 37 11 89 ...
##  $ Release.Clause          : Factor w/ 1245 levels "","€1.1M","€1.2M",..: 295 84 296 106 234 189 104 170 24 123 ...

DataExplorer

introduce(df)
##    rows columns discrete_columns continuous_columns all_missing_columns
## 1 18207      88               45                 43                   0
##   total_missing_values complete_rows total_observations memory_usage
## 1                 1836         18147            1602216     10154392
plot_intro(df)

Missing Values

plot_missing(df)
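The same missing-value summary can be approximated in base R when DataExplorer is not available. The sketch below uses a tiny made-up data frame (`toy`, hypothetical values) rather than the FIFA data.

```r
# Toy data frame standing in for df (hypothetical values)
toy <- data.frame(
  Age     = c(31, NA, 26),
  Club    = c("A", "B", NA),
  Overall = c(94, 94, 92)
)

# Number of NA values in each column
missing_per_column <- colSums(is.na(toy))

# Number of rows with no NA in any column
complete_rows <- sum(complete.cases(toy))

missing_per_column
complete_rows
```

Note that `is.na()` only detects `NA`; empty strings such as the blank `Club` or `Loaned.From` entries are not counted as missing by this check.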

3. Data Manipulation

Create Leagues & Sampling

bundesliga <- c(
  "1. FC Nürnberg", "1. FSV Mainz 05", "Bayer 04 Leverkusen", "FC Bayern München",
  "Borussia Dortmund", "Borussia Mönchengladbach", "Eintracht Frankfurt",
  "FC Augsburg", "FC Schalke 04", "Fortuna Düsseldorf", "Hannover 96",
  "Hertha BSC", "RB Leipzig", "SC Freiburg", "TSG 1899 Hoffenheim",
  "VfB Stuttgart", "VfL Wolfsburg", "SV Werder Bremen"
)

premierLeague <- c(
  "Arsenal", "Bournemouth", "Brighton & Hove Albion", "Burnley",
  "Cardiff City", "Chelsea", "Crystal Palace", "Everton", "Fulham",
  "Huddersfield Town", "Leicester City", "Liverpool", "Manchester City",
  "Manchester United", "Newcastle United", "Southampton", 
  "Tottenham Hotspur", "Watford", "West Ham United", "Wolverhampton Wanderers"
  
)

laliga <- c(
  "Athletic Club de Bilbao", "Atlético Madrid", "CD Leganés",
  "Deportivo Alavés", "FC Barcelona", "Getafe CF", "Girona FC", 
  "Levante UD", "Rayo Vallecano", "RC Celta", "RCD Espanyol", 
  "Real Betis", "Real Madrid", "Real Sociedad", "Real Valladolid CF",
  "SD Eibar", "SD Huesca", "Sevilla FC", "Valencia CF", "Villarreal CF"
)

seriea <- c(
  "Atalanta","Bologna","Cagliari","Chievo Verona","Empoli", "Fiorentina","Frosinone","Genoa",
  "Inter","Juventus","Lazio","Milan","Napoli","Parma","Roma","Sampdoria","Sassuolo","SPAL",
  "Torino","Udinese"
  
)

superlig <- c(
  "Akhisar Belediyespor","Alanyaspor", "Antalyaspor","Medipol Başakşehir FK","BB Erzurumspor","Beşiktaş JK",
  "Bursaspor","Çaykur Rizespor","Fenerbahçe SK", "Galatasaray SK","Göztepe SK","Kasimpaşa SK",
  "Kayserispor","Atiker Konyaspor","MKE Ankaragücü", "Sivasspor","Trabzonspor","Yeni Malatyaspor"
)

ligue1 <- c(
  "Amiens SC", "Angers SCO", "AS Monaco", "AS Saint-Étienne", "Dijon FCO", "En Avant de Guingamp",
  "FC Nantes", "FC Girondins de Bordeaux", "LOSC Lille", "Montpellier HSC", "Nîmes Olympique", 
  "OGC Nice", "Olympique Lyonnais","Olympique de Marseille", "Paris Saint-Germain", 
  "RC Strasbourg Alsace", "Stade Malherbe Caen", "Stade de Reims", "Stade Rennais FC", "Toulouse Football Club"
)

eredivisie <- c(
  "ADO Den Haag","Ajax", "AZ Alkmaar", "De Graafschap","Excelsior","FC Emmen","FC Groningen",
  "FC Utrecht", "Feyenoord","Fortuna Sittard", "Heracles Almelo","NAC Breda",
  "PEC Zwolle", "PSV","SC Heerenveen","Vitesse","VVV-Venlo","Willem II"
)

liganos <- c(
  "Os Belenenses", "Boavista FC", "CD Feirense", "CD Tondela", "CD Aves", "FC Porto",
  "CD Nacional", "GD Chaves", "Clube Sport Marítimo", "Moreirense FC", "Portimonense SC", "Rio Ave FC",
  "Santa Clara", "SC Braga", "SL Benfica", "Sporting CP", "Vitória Guimarães", "Vitória de Setúbal"
)
# Leagues
df %<>% mutate(League = if_else(Club %in% bundesliga, "Bundesliga",
                                  if_else(Club %in% premierLeague, "Premier League", 
                                          if_else(Club %in% laliga, "La Liga", 
                                                  if_else(Club %in% seriea, "Serie A", 
                                                          if_else(Club %in% superlig, "Süper Lig", 
                                                                  if_else(Club %in% ligue1, "Ligue 1", 
                                                                          if_else(Club %in% eredivisie, "Eredivisie",
                                                                                  if_else(Club %in% liganos, "Liga Nos", NA_character_)))))))),
                 
                 Country = if_else(League == "Bundesliga", "Germany",
                                   if_else(League == "Premier League", "UK",
                                           if_else(League == "La Liga", "Spain", 
                                                   if_else(League == "Serie A", "Italy", 
                                                           if_else(League == "Süper Lig", "Turkey", 
                                                                   if_else(League == "Ligue 1", "France", 
                                                                           if_else(League == "Liga Nos", "Portugal", 
                                                                                   if_else(League == "Eredivisie", "Netherlands", NA_character_))))))))) %>% 
  filter(!is.na(League)) %>% mutate_if(is.factor, as.character)


rm(bundesliga, premierLeague, laliga, seriea, superlig, ligue1, eredivisie, liganos)
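The deeply nested `if_else()` calls can also be written as a flat list of conditions with `dplyr::case_when()`. The sketch below is an alternative formulation, not the code used in this analysis, and it uses shortened, hypothetical club lists (`bundesliga_demo`, `premier_demo`) on a toy table.

```r
library(dplyr)

# Hypothetical, shortened club lists for illustration only
bundesliga_demo <- c("Borussia Dortmund", "RB Leipzig")
premier_demo    <- c("Arsenal", "Chelsea")

clubs <- tibble(Club = c("Arsenal", "RB Leipzig", "Unlisted Club"))

# Conditions are checked top to bottom; the first match wins
clubs <- clubs %>%
  mutate(League = case_when(
    Club %in% bundesliga_demo ~ "Bundesliga",
    Club %in% premier_demo    ~ "Premier League",
    TRUE                      ~ NA_character_   # no league matched
  ))

clubs
```

Clubs that match none of the lists receive `NA`, which is what the later `filter(!is.na(League))` step relies on.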

String Manipulation

# String Manipulation #

# Player Value
df$Values <- str_remove_all(df$Value,"€")
df$Values <- str_replace_all(df$Values,"K", "000")
df$Values <- str_remove_all(df$Values,"M")

df$Values <- as.numeric(df$Values)

# Player Wage
df$Wages <- str_remove_all(df$Wage,"€")
df$Wages <- str_replace_all(df$Wages,"K", "000")

df$Wages <- as.numeric(df$Wages)

# Values that carried an "M" suffix are still expressed in millions; scale them to euros
df <- df %>% mutate(Values = if_else(Values < 1000, Values * 1000000, Values))
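The steps above infer the unit from the size of the number (anything below 1000 is treated as millions). An alternative sketch reads the suffix directly; `parse_euro()` is a hypothetical helper written for this illustration and is not part of the original analysis.

```r
library(stringr)

# parse_euro() is a hypothetical helper, not part of the original analysis
parse_euro <- function(x) {
  # Strip the currency sign and the unit suffix, keep the number
  number <- as.numeric(str_remove_all(x, "[€KM]"))
  # Pick the multiplier from the suffix: M = millions, K = thousands
  multiplier <- ifelse(str_detect(x, "M$"), 1e6,
                ifelse(str_detect(x, "K$"), 1e3, 1))
  number * multiplier
}

parse_euro(c("€110.5M", "€565K", "€0"))
```

Reading the suffix avoids having to pick a threshold and treats `Value` and `Wage` strings with the same function.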

Create Position Class

defence <- c("CB", "RB", "LB", "LWB", "RWB", "LCB", "RCB")
midfielder <- c("CM", "CDM","CAM","LM","RM", "LAM", "RAM", "LCM", "RCM", "LDM", "RDM")

df %<>% mutate(Class = if_else(Position %in% "GK", "Goal Keeper",
                                 if_else(Position %in% defence, "Defender",
                                         if_else(Position %in% midfielder, "Midfielder", "Forward"))))

rm(defence, midfielder)

Data Transformation

Height & Weight

# From categorical to numeric
df %<>%
  mutate(Height = round((as.numeric(str_sub(Height, start=1,end = 1))*30.48) + (as.numeric(str_sub(Height, start = 3, end = 5))* 2.54)),
         Weight = round(as.numeric(str_sub(Weight, start = 1, end = 3)) / 2.204623))
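To make the conversion concrete: a height of 5'7 is 5 × 30.48 + 7 × 2.54 = 170.18 cm, which rounds to 170, and 159 lbs is 159 / 2.204623 ≈ 72.12 kg, which rounds to 72. The sketch below wraps the same arithmetic in two hypothetical base-R helpers (`height_to_cm()`, `weight_to_kg()`) that split on the apostrophe instead of relying on character positions.

```r
# Hypothetical helpers; the conversion factors are the ones used above
height_to_cm <- function(h) {
  parts  <- strsplit(h, "'", fixed = TRUE)
  feet   <- as.numeric(sapply(parts, `[`, 1))
  inches <- as.numeric(sapply(parts, `[`, 2))
  round(feet * 30.48 + inches * 2.54)
}

weight_to_kg <- function(w) {
  round(as.numeric(sub("lbs", "", w, fixed = TRUE)) / 2.204623)
}

height_to_cm(c("5'7", "5'11", "6'2"))
weight_to_kg(c("159lbs", "183lbs"))
```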

Preferred Foot

df %<>% filter(Preferred.Foot %in% c("Left", "Right")) 
df$Preferred.Foot <- as.factor(as.character(df$Preferred.Foot))

Rename Variables

df %<>% 
  rename(
    "Heading.Accuracy"= HeadingAccuracy,
    "Short.Passing"= ShortPassing,
    "FK.Accuracy" = FKAccuracy,
    "Long.Passing"= LongPassing,
    "Ball.Control"= BallControl,
    "Sprint.Speed"= SprintSpeed,
    "Shot.Power"= ShotPower,
    "Long.Shots"= LongShots,
    "Standing.Tackle"= StandingTackle,
    "Sliding.Tackle"= SlidingTackle,
    "GK.Diving"= GKDiving,
    "GK.Handling"= GKHandling,
    "GK.Kicking"= GKKicking,
    "GK.Positioning"= GKPositioning,
    "GK.Reflexes"= GKReflexes
  )

Remove Unnecessary Variables

df %<>% select(-ID, -Body.Type, -Real.Face, -Joined, -Loaned.From, -Release.Clause, -Photo, -Flag, -Special, -Work.Rate) 

4. Tidy Data

introduce(df)
##   rows columns discrete_columns continuous_columns all_missing_columns
## 1 4414      83               38                 45                   0
##   total_missing_values complete_rows total_observations memory_usage
## 1                    0          4414             366362      3178944
plot_intro(df)

plot_missing(df)

5. Data Analysis & Visualization

Finding the Average Age of the Leagues

ggplot(df, aes(Age))+
  geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

df %>% 
  group_by(League) %>% 
  summarise(Avg.Age = mean(Age)) %>% 
  arrange(-Avg.Age)
## # A tibble: 8 x 2
##   League         Avg.Age
##   <chr>            <dbl>
## 1 Süper Lig         26.6
## 2 Serie A           25.7
## 3 Liga Nos          25.5
## 4 La Liga           24.7
## 5 Premier League    24.6
## 6 Ligue 1           24.4
## 7 Bundesliga        24.2
## 8 Eredivisie        23.3
summ <- df %>% 
  group_by(League) %>% 
  summarise(age = mean(Age), median = median(Age))

ggplot()+
  geom_histogram(df, mapping = aes(Age, fill = League), show.legend = FALSE)+
  facet_wrap(League~.)+
  geom_vline(summ, mapping = aes(xintercept = age), color = "red")+
  geom_text(summ, mapping = aes(x = age+3, y = 65, label = round(age, 2)))+
  theme_minimal()+
  #theme(legend.position = "bottom")+
  labs(y = "Frequency", x = "Age", fill = "League", title = "Average Age by League", caption = "@EA Sports - FIFA 19")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

summ <- df %>% 
  group_by(League) %>% 
  summarise(age = mean(Age), median = median(Age))

ggplot()+
  geom_histogram(df, mapping = aes(Age, fill = League))+
  geom_vline(summ, mapping = aes(xintercept = age), color = "red", size = 1.5)+
  geom_text(summ, mapping = aes(x = age+3, y = 65, label = round(age,digits = 2)))+
  facet_wrap(League~.)+
  theme_minimal()+
  theme(legend.position = "bottom")+
  labs(y = "Frequency", x = "Age", fill = "League", title = "Average Age by League", caption = "@EA Sports - FIFA 19")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

df %>% 
  filter(Age > 25) %>% 
  group_by(League) %>% 
  count(sort = TRUE)
## # A tibble: 8 x 2
## # Groups:   League [8]
##   League             n
##   <chr>          <int>
## 1 Süper Lig        316
## 2 Premier League   285
## 3 La Liga          262
## 4 Serie A          256
## 5 Ligue 1          217
## 6 Liga Nos         214
## 7 Bundesliga       199
## 8 Eredivisie       142

Comparing the Market Values of the Leagues

df %>% 
  group_by(League) %>% 
  # Sum as doubles: league totals in the billions exceed R's integer range
  summarise(Total.Value = sum(Values, na.rm = TRUE)) %>% 
  ggplot(aes(reorder(League, Total.Value), Total.Value, fill = Total.Value))+
  geom_col(show.legend = FALSE)+
  coord_flip()+
  theme_minimal()+
  labs(x = NULL, y = "League Market Value")+
  scale_fill_gradient(low = "khaki", high = "seagreen")
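League totals here are on the order of billions of euros, while R's 32-bit integers top out at roughly 2.1 billion. A minimal sketch of why such totals are safer when summed as doubles rather than integers:

```r
# Integer sums beyond .Machine$integer.max come back as NA (with a warning)
int_total <- suppressWarnings(sum(c(.Machine$integer.max, 1L)))
is.na(int_total)

# The same values summed as doubles are fine
dbl_total <- sum(as.numeric(c(.Machine$integer.max, 1L)))
dbl_total
```

Since `Values` is already numeric after the string manipulation step, it can be passed to `sum()` without converting it to integer.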

df %>% 
  group_by(League) %>% 
  # Sum as doubles: league totals in the billions exceed R's integer range
  summarise(Total.Value = sum(Values, na.rm = TRUE)) %>% 
  ggplot(aes(reorder(League, Total.Value), Total.Value, fill = Total.Value))+
  geom_col(show.legend = FALSE)+
  coord_flip()+
  theme_minimal()+
  labs(x = NULL, y = "League Market Value")+
  scale_fill_gradient(low = "khaki", high = "seagreen")+
  theme(axis.line.y = element_line(colour = "darkslategray"),
        axis.ticks.x = element_line(colour = "darkslategray"))+
  # Give breaks explicitly so each label is tied to a known value
  scale_y_continuous(breaks = c(0, 2e9, 4e9, 6e9),
                     labels = c("0 €", "2 Billion €", "4 Billion €", "6 Billion €")) 
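Hand-written axis labels only stay correct as long as the breaks happen to match them. One alternative is to pass a labelling function to `scale_y_continuous()`, so that whatever breaks ggplot2 chooses are formatted consistently. The sketch below uses a hypothetical formatter `billion_eur()` and made-up totals in a toy table.

```r
library(ggplot2)

# Hypothetical formatter: converts euros to "N Billion €"
billion_eur <- function(v) paste0(v / 1e9, " Billion €")

# Toy totals standing in for the per-league sums (made-up numbers)
toy <- data.frame(League = c("A", "B"), Total.Value = c(2e9, 6e9))

p <- ggplot(toy, aes(League, Total.Value)) +
  geom_col() +
  scale_y_continuous(labels = billion_eur)

billion_eur(c(0, 2e9, 4e9))
```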

Interactive World Map and Number of Players

world_map <- map_data("world")

numofplayers <- world_map %>% 
  mutate(region = as.character(region)) %>% 
  left_join((df %>% mutate(Nationality = as.character(Nationality),
                           Nationality = if_else(Nationality %in% "England", 
                                                 "UK", Nationality)) %>%
               #filter(League == "Bundesliga") %>%
               count(Nationality, name = "Number of Player") %>%
               rename(region = Nationality) %>%
               mutate(region = as.character(region))), by = "region")


ggplot(numofplayers, aes(long, lat, group = group))+
  geom_polygon(aes(fill = `Number of Player` ), color = "white", show.legend = FALSE)
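The join above only works for countries whose name is spelled the same way in both tables, which is why "England" is recoded to "UK". A quick way to look for other mismatches is `anti_join()`, sketched here on small hypothetical tables (`regions`, `counts`) rather than the real map and player data.

```r
library(dplyr)

# Toy stand-ins (hypothetical): map regions and player counts by nationality
regions <- tibble(region = c("UK", "Turkey", "Spain"))
counts  <- tibble(region = c("England", "Turkey", "Spain"),
                  n      = c(10L, 5L, 7L))

# Names present in the player data but absent from the map
anti_join(counts, regions, by = "region")

# After recoding, every nationality finds a matching region
counts_fixed <- counts %>%
  mutate(region = if_else(region == "England", "UK", region))
unmatched <- anti_join(counts_fixed, regions, by = "region")
nrow(unmatched)
```

Any rows returned by `anti_join()` are nationalities that would silently drop out of the map.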

ggplotly(
  ggplot(numofplayers, aes(long, lat, group = group))+
    geom_polygon(aes(fill = `Number of Player` ), color = "white", show.legend = FALSE)+
    scale_fill_viridis_c(option = "C")+
    theme_void()+
    labs(fill = "Number of Players",
         title = "How many players come from each country?"))

Comparing Two Players

# Select the players
players <- df %>% 
  filter(Name %in% c("Cristiano Ronaldo", "L. Messi")) %>% 
# Combine the player and club names
  mutate(Name = paste0(Name, ", ", Club)) %>%
# Select the variables that represent the players' skills
  select(Name,Crossing:Sliding.Tackle) %>% 
# Replace punctuation in the variable names with spaces
  rename_all(~ gsub("[[:punct:]]", " ", .)) %>% 
# Reshape from wide to long (one row per player and skill)
  gather(Skill, Exp, Crossing:`Sliding Tackle`, -Name)
players  
##                           Name            Skill Exp
## 1       L. Messi, FC Barcelona         Crossing  84
## 2  Cristiano Ronaldo, Juventus         Crossing  84
## 3       L. Messi, FC Barcelona        Finishing  95
## 4  Cristiano Ronaldo, Juventus        Finishing  94
## 5       L. Messi, FC Barcelona Heading Accuracy  70
## 6  Cristiano Ronaldo, Juventus Heading Accuracy  89
## 7       L. Messi, FC Barcelona    Short Passing  90
## 8  Cristiano Ronaldo, Juventus    Short Passing  81
## 9       L. Messi, FC Barcelona          Volleys  86
## 10 Cristiano Ronaldo, Juventus          Volleys  87
## 11      L. Messi, FC Barcelona        Dribbling  97
## 12 Cristiano Ronaldo, Juventus        Dribbling  88
## 13      L. Messi, FC Barcelona            Curve  93
## 14 Cristiano Ronaldo, Juventus            Curve  81
## 15      L. Messi, FC Barcelona      FK Accuracy  94
## 16 Cristiano Ronaldo, Juventus      FK Accuracy  76
## 17      L. Messi, FC Barcelona     Long Passing  87
## 18 Cristiano Ronaldo, Juventus     Long Passing  77
## 19      L. Messi, FC Barcelona     Ball Control  96
## 20 Cristiano Ronaldo, Juventus     Ball Control  94
## 21      L. Messi, FC Barcelona     Acceleration  91
## 22 Cristiano Ronaldo, Juventus     Acceleration  89
## 23      L. Messi, FC Barcelona     Sprint Speed  86
## 24 Cristiano Ronaldo, Juventus     Sprint Speed  91
## 25      L. Messi, FC Barcelona          Agility  91
## 26 Cristiano Ronaldo, Juventus          Agility  87
## 27      L. Messi, FC Barcelona        Reactions  95
## 28 Cristiano Ronaldo, Juventus        Reactions  96
## 29      L. Messi, FC Barcelona          Balance  95
## 30 Cristiano Ronaldo, Juventus          Balance  70
## 31      L. Messi, FC Barcelona       Shot Power  85
## 32 Cristiano Ronaldo, Juventus       Shot Power  95
## 33      L. Messi, FC Barcelona          Jumping  68
## 34 Cristiano Ronaldo, Juventus          Jumping  95
## 35      L. Messi, FC Barcelona          Stamina  72
## 36 Cristiano Ronaldo, Juventus          Stamina  88
## 37      L. Messi, FC Barcelona         Strength  59
## 38 Cristiano Ronaldo, Juventus         Strength  79
## 39      L. Messi, FC Barcelona       Long Shots  94
## 40 Cristiano Ronaldo, Juventus       Long Shots  93
## 41      L. Messi, FC Barcelona       Aggression  48
## 42 Cristiano Ronaldo, Juventus       Aggression  63
## 43      L. Messi, FC Barcelona    Interceptions  22
## 44 Cristiano Ronaldo, Juventus    Interceptions  29
## 45      L. Messi, FC Barcelona      Positioning  94
## 46 Cristiano Ronaldo, Juventus      Positioning  95
## 47      L. Messi, FC Barcelona           Vision  94
## 48 Cristiano Ronaldo, Juventus           Vision  82
## 49      L. Messi, FC Barcelona        Penalties  75
## 50 Cristiano Ronaldo, Juventus        Penalties  85
## 51      L. Messi, FC Barcelona        Composure  96
## 52 Cristiano Ronaldo, Juventus        Composure  95
## 53      L. Messi, FC Barcelona          Marking  33
## 54 Cristiano Ronaldo, Juventus          Marking  28
## 55      L. Messi, FC Barcelona  Standing Tackle  28
## 56 Cristiano Ronaldo, Juventus  Standing Tackle  31
## 57      L. Messi, FC Barcelona   Sliding Tackle  26
## 58 Cristiano Ronaldo, Juventus   Sliding Tackle  23
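
The `gather()` step turns one column per skill into one row per player and skill. In tidyr 1.0.0 and later, `pivot_longer()` is the recommended replacement. The sketch below applies it to a hypothetical two-skill table (`toy` and `toy_long` are names introduced here; only the four ratings are copied from the output above):

```r
library(tidyr)

# Hypothetical wide table: one row per player, one column per skill
toy <- data.frame(
  Name      = c("L. Messi", "Cristiano Ronaldo"),
  Crossing  = c(84, 84),
  Finishing = c(95, 94),
  stringsAsFactors = FALSE
)

# Long format: one row per (player, skill) pair
toy_long <- pivot_longer(toy, cols = -Name,
                         names_to = "Skill", values_to = "Exp")
toy_long
```

`pivot_longer()` returns the rows grouped by player rather than by skill, which should not matter for the plots that follow because ggplot2 maps `Skill` and `Name` by value.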
# Plot each player's skills in a separate panel
ggplot(players, aes(Skill, Exp, fill = Name))+
  geom_col(show.legend = FALSE)+
  coord_flip()+
  facet_wrap(Name~.)+
  scale_fill_manual(values = c("black", "navy"))+
  theme_minimal()

# Plot both players' skills together
ggplot(players, aes(Skill, Exp, fill = Name))+
  geom_col(position = "fill")+
  coord_flip()+
  scale_fill_manual(values = c("black", "navy"))+
  theme_minimal()+
  geom_hline(yintercept = 0.5, color = "white", size = 1, linetype = 2)+
  theme(legend.position = "top", axis.text.x=element_blank())+
  labs(title = "Comparison of the Players' Skills", 
       caption = "@EA Sports - FIFA 19",
       fill = NULL,x = NULL, y = NULL)
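
With `position = "fill"`, each bar is rescaled so that the two players' ratings sum to 1 for every skill, and the dashed line at 0.5 marks an even split. A minimal sketch of that arithmetic in base R, using two rows from the `players` table printed above (`total` and `shares` are names introduced here for illustration):

```r
# Ratings taken from the printed `players` table
exp_messi   <- c(Strength = 59, Jumping = 68)
exp_ronaldo <- c(Strength = 79, Jumping = 95)

# Share of each player within a skill, as position = "fill" draws it
total  <- exp_messi + exp_ronaldo
shares <- rbind(Messi = exp_messi / total,
                Ronaldo = exp_ronaldo / total)
round(shares, 2)

# Every column sums to 1, so a share above 0.5 means the higher rating
colSums(shares)
```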

Visualizing Positions

# Select the player
player <- df %>% filter(Name == "Neymar Jr") %>% select(Position, LS:RB)
# Transpose so that each position becomes a row
player <- as.data.frame(t(player)) %>% 
  rownames_to_column("Pos") %>% 
  mutate(V1 = as.numeric(str_sub(V1, end = 2)),
         Pos = as.factor(Pos))
## Warning: NAs introduced by coercion
# Build the position coordinates
pos <- data.frame(
  Pos = as.character(c("LB","LCB","CB", "RCB","RB",
                      "LWB", "LDM", "CDM", "RDM", "RWB",
                      "LM", "LCM", "CM", "RCM", "RM",
                      "LAM", "CAM", "RAM",
                      "LW","LF","CF","RF","RW",
                      "LS","ST","RS")), 
  x = c(1:5, 1:5,1:5, 2:4, 1:5,2:4),
  y = c(rep(1,5), rep(1.5,5), rep(2,5), rep(2.5,3), rep(3,5), rep(3.5,3)))

# Join the position coordinates to the player data
player <- left_join(player, pos, by = 'Pos')
## Warning: Column `Pos` joining factors with different levels, coercing to
## character vector
# Drop the row that has no coordinates
player <- na.omit(player)


ggplot(player, aes(x,y))+
  geom_point(shape = 22, size = 20, color = "white")+
  geom_text(aes(label = Pos), vjust= -0.5, color = "white", size = 4.5, fontface = "bold")+
  geom_text(aes(label = V1), vjust = 1.5, fontface = "bold", color = "white")+
  theme(panel.background = element_rect(fill = "#224C56"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        title = element_text(color = "#DA291C",face="bold.italic",size = 23) 
        )

# Pitch layout
ggplot(player, aes(x, y, fill = case_when(V1 < 50 ~ "orangered",
                                          V1 < 60 ~ "orange",
                                          V1 < 70 ~ "goldenrod1",
                                          V1 < 80 ~ "palegreen4",
                                          V1 < 90 ~ "forestgreen",
                                          TRUE ~ "darkgreen")
                   ))+
  geom_point(shape = 22, size = 20, color = "white", show.legend = FALSE,position = "identity")+
  theme(panel.background = element_rect(fill = "#224C56"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        axis.title = element_blank(),
        title = element_text(color = "#DA291C",face="bold.italic",size = 23) 
        )+
  geom_text(aes(label = Pos), vjust= -0.5, color = "white", size = 4.5, fontface = "bold")+
  geom_text(aes(label = V1), vjust = 1.5, fontface = "bold", color = "white")+
  scale_fill_identity()+
  ylim(0.8, 4)+
  labs(title = "Neymar Jr Position Ratings")

Body Mass Index

  • Below ideal weight: < 18.5
  • Ideal weight: 18.5 - 24.99
  • Above ideal weight: 25 - 29.99
  • Far above ideal weight: ≥ 30
# Compute the body mass index
bmi <- df %>% 
  filter(Club == "Liverpool") %>%
  mutate(BMI = round(Weight/(Height/100)^2, digits = 4))%>%
  arrange(-BMI)%>%
  select(Name, Age, Position, Class, Height, Weight, BMI)

# Interactive result table
datatable(bmi)
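
The `BMI` column follows the standard formula, weight in kilograms divided by the square of height in metres. The code divides `Height` by 100, so the column presumably stores centimetres. A minimal sketch with made-up measurements (the values and the names `height_cm`, `weight_kg`, `bmi_value` and `category` are hypothetical) that also applies the four cut-offs listed above:

```r
# Hypothetical players: height in centimetres, weight in kilograms
height_cm <- c(170, 185, 180, 175)
weight_kg <- c(52, 80, 90, 95)

# BMI = weight (kg) / height (m)^2
bmi_value <- weight_kg / (height_cm / 100)^2

# Map each value onto the four categories; intervals are closed on the left
category <- cut(bmi_value,
                breaks = c(-Inf, 18.5, 25, 30, Inf),
                labels = c("Below ideal weight", "Ideal weight",
                           "Above ideal weight", "Far above ideal weight"),
                right = FALSE)

data.frame(height_cm, weight_kg, BMI = round(bmi_value, 2), category)
```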
# First and last 5 observations
bmi2  <- rbind(
  bmi %>% head(5) %>% mutate(BMI = BMI * -1),
  bmi %>% tail(5)
  ) %>% mutate(Type = if_else(BMI < 0, "Head", "Tail"))

# Plot the players' body mass indices
bmi2 %>% 
  ggplot(aes(fct_reorder(paste(Name,",", Position), desc(BMI)), BMI))+
  geom_col(aes(fill = Type))+
  geom_text(aes(y = c(rep(-2,5), rep(2,5)),label = round(abs(BMI),digits = 2)), 
            color = "white", fontface = "bold", size = 4)+
  coord_flip()+
  theme_minimal()+
  theme(axis.text.x = element_blank(),
        legend.position = "top",
        panel.background = element_rect(fill = "lightgray"),
        panel.grid.minor = element_blank(),
        axis.text = element_text(color = "slategray", face = "bold.italic",size = 12),
        title = element_text(color = "slategray", face = "bold.italic",size = 20),
        legend.box.background = element_rect(linetype = 2))+
  labs(x = NULL, y = NULL, fill = NULL, title = "BMI Index")+
  scale_fill_manual(values = c("steelblue", "khaki"))

Correlation

kor <- df %>% 
  filter(League == "La Liga", Class == "Forward") %>% 
  select(Name, Preferred.Foot, Finishing, Shot.Power)
shapiro.test(kor$Finishing); shapiro.test(kor$Shot.Power)
## 
##  Shapiro-Wilk normality test
## 
## data:  kor$Finishing
## W = 0.9751, p-value = 0.03053
## 
##  Shapiro-Wilk normality test
## 
## data:  kor$Shot.Power
## W = 0.9425, p-value = 0.00009062
cor.test(kor$Shot.Power, kor$Finishing, method = "pearson")
## 
##  Pearson's product-moment correlation
## 
## data:  kor$Shot.Power and kor$Finishing
## t = 12.023, df = 113, p-value < 0.00000000000000022
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6560646 0.8198210
## sample estimates:
##      cor 
## 0.749175
cor.test(kor$Shot.Power, kor$Finishing, method = "kendall")
## 
##  Kendall's rank correlation tau
## 
## data:  kor$Shot.Power and kor$Finishing
## z = 8.7156, p-value < 0.00000000000000022
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##       tau 
## 0.5674854
hypo <- cor.test(kor$Shot.Power, kor$Finishing, method = "spearman")
## Warning in cor.test.default(kor$Shot.Power, kor$Finishing, method = "spearman"):
## Cannot compute exact p-value with ties
hypo
## 
##  Spearman's rank correlation rho
## 
## data:  kor$Shot.Power and kor$Finishing
## S = 64431, p-value < 0.00000000000000022
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.7457925
ggplot(kor, aes(Shot.Power, Finishing, label = Name, color = Preferred.Foot))+
  geom_text()+
  theme_minimal()+
  theme(legend.position = "bottom")+
  geom_jitter(alpha = 0.3, size = 2.5, width = 0.3, height = 0.3)+
  geom_smooth(method = "lm", color = "gray40", lty = 2, se = FALSE, size = 0.6)+
  scale_color_manual(values = c("orangered","steelblue"))+
  labs(title = paste("Spearman Correlation Coefficient:", round(hypo$estimate, digits = 2)),
       subtitle = "p-value < 0.05")
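
Both Shapiro-Wilk p-values above are below 0.05, so normality is rejected for `Finishing` and `Shot.Power`. That is presumably why the plot title reports the rank-based Spearman coefficient rather than Pearson's. The decision can be sketched as follows on simulated data (the helper `pick_method`, its `alpha` parameter and the simulated vectors are assumptions for illustration, not part of the original analysis):

```r
# Hypothetical helper: use Pearson only if both samples look normal
pick_method <- function(x, y, alpha = 0.05) {
  normal_x <- shapiro.test(x)$p.value >= alpha
  normal_y <- shapiro.test(y)$p.value >= alpha
  if (normal_x && normal_y) "pearson" else "spearman"
}

set.seed(1)
x <- rexp(100)                  # exponential draws are strongly skewed
y <- x + rnorm(100, sd = 0.5)   # noisy copy of x

method <- pick_method(x, y)
res <- suppressWarnings(cor.test(x, y, method = method))
method
unname(res$estimate)
```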

Is There a Significant Difference in Finishing or Shot Power by Preferred Foot?

xt1 <- kor %>% filter(Preferred.Foot == "Left") %>% pull(Shot.Power)
xt2 <- kor %>% filter(Preferred.Foot == "Right") %>% pull(Shot.Power)
yt1 <- kor %>% filter(Preferred.Foot == "Left") %>% pull(Finishing)
yt2 <- kor %>% filter(Preferred.Foot == "Right") %>% pull(Finishing)
xht <- wilcox.test(xt1, xt2, alternative = "two.sided")
yht <- wilcox.test(yt1, yt2, alternative = "two.sided")

xht; yht
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  xt1 and xt2
## W = 1160, p-value = 0.8149
## alternative hypothesis: true location shift is not equal to 0
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  yt1 and yt2
## W = 974.5, p-value = 0.3086
## alternative hypothesis: true location shift is not equal to 0
grid.arrange(ncol = 2,
  ggplot(kor, aes(x = Preferred.Foot, y = Shot.Power, fill = Preferred.Foot))+
    geom_boxplot(show.legend = FALSE)+
    theme_minimal()+
    scale_fill_manual(values = c("orangered", "steelblue"))+
    ylim(30,100)+
    labs(title = "Shot Power"),
  
  ggplot(kor, aes(x = Preferred.Foot, y = Finishing, fill = Preferred.Foot))+
    geom_boxplot(show.legend = FALSE)+
    theme_minimal()+
    scale_fill_manual(values = c("orangered", "steelblue"))+
    ylim(30,100)+
    labs(title = "Finishing")
)
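
Both p-values above (0.8149 for shot power, 0.3086 for finishing) exceed 0.05, so neither test rejects the hypothesis that left- and right-footed forwards share the same distribution. A minimal sketch of reading that decision from a `wilcox.test` result, on simulated ratings (the vectors `left` and `right` and the 0.05 level are assumptions for illustration):

```r
set.seed(42)
# Simulated ratings for two groups drawn from the same distribution
left  <- rnorm(30, mean = 70, sd = 8)
right <- rnorm(80, mean = 70, sd = 8)

test <- wilcox.test(left, right, alternative = "two.sided")

# wilcox.test returns an "htest" list; the p-value is stored in $p.value
significant <- test$p.value < 0.05
test$p.value
significant
```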

Interactive 3D Scatter Plot

p3d <- df %>% filter(League == "Süper Lig")  
  
plot_ly(p3d, x = ~Finishing, y = ~Age, z = ~Shot.Power, color = ~Class, text = ~Name) %>% 
  add_markers() %>%
  layout(
    scene = list(xaxis = list(title = 'Finishing'),
                 yaxis = list(title = 'Age'),
                 zaxis = list(title = 'Shot Power'))
    )

Comparing Potential and Overall Ratings

df %>% 
  filter(Club == "Paris Saint-Germain") %>% 
  select(Name, Overall, Potential) %>% 
  arrange(-Overall) %>% 
  head(10) %>% 
  gather(variable, Exp, -Name) %>% 
  ggplot(aes(Name, Exp, fill = variable))+
  geom_col(position = "dodge")+
  geom_text(aes(label = Exp),position = position_dodge(width = 0.9), vjust = -0.5)+
  scale_fill_manual(values = c("#DA291C", "#004170"))+
  theme_minimal()+
  theme(legend.position = "bottom")+
  labs(fill = NULL, x = NULL, title = "Paris Saint-Germain")

When Do Contracts Expire?

df <- df %>% 
  mutate(Contract.Valid.Until = as.numeric(
    str_sub(
      Contract.Valid.Until, str_length(Contract.Valid.Until)-3, str_length(Contract.Valid.Until))
    )
  ) 
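
The conversion above keeps only the last four characters of `Contract.Valid.Until` and turns them into a number, which works as long as every value ends in a four-digit year. A minimal sketch on made-up strings (the two formats shown are assumptions about how the column may look, and `last_four` is a hypothetical helper):

```r
library(stringr)

# Hypothetical raw values: a bare year and full dates ending in a year
raw <- c("2021", "Jun 30, 2019", "Dec 31, 2018")

# Negative positions in str_sub() count from the end of the string
last_four <- function(x) str_sub(x, start = -4)

years <- as.numeric(last_four(raw))
years

# Same result as the explicit start/end arithmetic with str_length()
identical(last_four(raw),
          str_sub(raw, str_length(raw) - 3, str_length(raw)))
```

A value that does not end in a year would become `NA` with a coercion warning, so checking `sum(is.na(...))` after the conversion is a cheap safeguard.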

df %>% 
  group_by(Contract.Valid.Until, League) %>% 
  count() %>%
  ungroup() %>% 
  ggplot(aes(Contract.Valid.Until, n, color = League))+
  geom_line(size = 1.2)+
  theme_light()+
  scale_color_manual(values = c("seagreen", "royalblue", "orchid", "orange", "gray", "tomato", "navy", "red"))

df %>% 
  group_by(Contract.Valid.Until, League) %>% 
  count() %>%
  ungroup() %>% 
  ggplot(aes(Contract.Valid.Until, n, color = League))+
  geom_line(size = 1.2)+
  theme_light()+
  scale_color_manual(values = c("seagreen", "royalblue", "orchid", "orange", "gray", "tomato", "navy", "red"))+
  facet_wrap(League~.)

Strongest Teams

en.guclu <- df %>% 
  group_by(Club) %>% 
  summarise(mean = mean(Overall)) %>% 
  arrange(-mean) %>% 
  head(20)


df %>% 
  group_by(Club, Class) %>% 
  summarise(mean = mean(Overall)) %>% 
  ungroup() %>% 
  filter(Club %in% en.guclu$Club) %>% 
  ggplot(aes(reorder(Club, mean), mean, fill = Class))+
  geom_col(position = "fill")+
  geom_text(aes(label = round(mean,digits = 2)), position = position_fill(0.5))+
  coord_flip()+
  theme_minimal()+
  theme(legend.position = "top")+
  labs(x = NULL, y = NULL, title = "Team Strength by Position Class")

Top Attributes of the Best Forwards

df %>% 
  arrange(-Overall) %>% 
  filter(Class == "Forward") %>% 
  head(12) %>% 
  select(Name, Crossing:Sliding.Tackle) %>% 
  gather(variables, Exp, -Name) %>% 
  group_by(Name) %>%
  arrange(-Exp) %>% 
  do(head(., 5)) %>% 
  ungroup() %>% 
  mutate(variables = reorder_within(variables, Exp, Name)) %>% 
  ggplot(aes(variables, Exp, fill = Name))+
  geom_col(show.legend = FALSE)+
  geom_text(aes(label = Exp), position = position_stack(vjust = 0.5), color = "gold")+
  facet_wrap(Name~., scales = "free_y")+
  scale_x_reordered()+
  coord_flip()+
  theme_dark()+
  scale_fill_manual(values = c("black", "#004170","royalblue", "white", "white","#12A0D7","#800000", "#800000",
                               "#004170", "black","red", "#6CADDF"))+
  labs(x = NULL, y = NULL)

Distribution of Position Classes Within Leagues

df %>% group_by(League) %>% count(Class) %>% 
  ggplot(aes(League, n, fill = Class)) +
  geom_col()+
  coord_polar()+
  scale_fill_ordinal()+
  theme_minimal()+
  labs(x = NULL, y = NULL)

Average Attributes of Premier League Players by Position Class

df %>% 
  filter(League == "Premier League") %>% 
  select(Class, Sprint.Speed, Dribbling, Shot.Power, Finishing, Balance, Short.Passing) %>% 
  group_by(Class) %>% 
  summarise_at(vars(Sprint.Speed:Short.Passing), mean) %>% 
  gather(variables, values, -Class) %>% 
  ggplot(aes(variables, values, fill = Class))+
  geom_col(position = "dodge")+
  coord_polar()+
  scale_fill_ordinal()+
  theme_minimal()+
  labs(x = NULL, y = NULL)

Comparing Average Potential and Overall Ratings by Age Across Leagues

df %>% 
  group_by(League, Age) %>% 
  summarise(Overall = mean(Overall),
            Potential = mean(Potential)) %>% 
  ggplot()+
  geom_line(aes(Age, Potential, color = "Potential")) +
  geom_line(aes(Age, Overall, color = "Overall"), alpha = 0.5) +
  facet_wrap(League~.)+
  scale_color_manual(values = c("blue", "red"))+
  theme(legend.position = "bottom")+
  labs(color = NULL)

FIFA 19 Dashboard with the Shiny Library

https://ekrem-bayar.shinyapps.io/FifaDash/

knitr::include_graphics("www/gorsel3.png")