Shana Green

DATA 607 - Homework 1

Due Date: 8/29/2020

Introduction

Tennis is an individual sport in which the player uses a racquet to hit a ball over a net into the opponent’s court. As a great admirer of tennis, I find it a time-consuming sport to watch. Carl Bialik’s article "Why Some Tennis Matches Take Forever" analyzes several variables to explain why this is the case.

# Loading data sets for the events, players, and servers
 
event <- read.csv("https://raw.githubusercontent.com/sagreen131/DATA-607-HW-1/master/events_time.csv")

player <- read.csv("https://raw.githubusercontent.com/sagreen131/DATA-607-HW-1/master/players_time.csv")

server <- read.csv("https://raw.githubusercontent.com/sagreen131/DATA-607-HW-1/master/serve_times.csv")

Summary stats for all variables in events, players, and servers

View(event)
View(player)
View(server)

summary(event)
##   tournament          surface          seconds_added_per_point
##  Length:205         Length:205         Min.   :-2.980         
##  Class :character   Class :character   1st Qu.: 0.730         
##  Mode  :character   Mode  :character   Median : 1.630         
##                                        Mean   : 1.762         
##                                        3rd Qu.: 2.920         
##                                        Max.   : 5.380         
##     years          
##  Length:205        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
summary(player)
##     player          seconds_added_per_point
##  Length:218         Min.   :-6.3700        
##  Class :character   1st Qu.:-1.9475        
##  Mode  :character   Median : 1.9650        
##                     Mean   : 0.9417        
##                     3rd Qu.: 3.0400        
##                     Max.   : 6.3500
summary(server)
##     server          seconds_before_next_point     day           
##  Length:120         Min.   : 9.00             Length:120        
##  Class :character   1st Qu.:15.00             Class :character  
##  Mode  :character   Median :19.50             Mode  :character  
##                     Mean   :20.36                               
##                     3rd Qu.:25.25                               
##                     Max.   :38.00                               
##    opponent          game_score             set            game          
##  Length:120         Length:120         Min.   :1.000   Length:120        
##  Class :character   Class :character   1st Qu.:2.000   Class :character  
##  Mode  :character   Mode  :character   Median :3.000   Mode  :character  
##                                        Mean   :2.567                     
##                                        3rd Qu.:3.000                     
##                                        Max.   :5.000

Mean, median, standard deviation, and variance of the time variable in the event, player, and server data

mean(event$seconds_added_per_point, na.rm = TRUE)
## [1] 1.761805
mean(player$seconds_added_per_point, na.rm = TRUE)
## [1] 0.9417431
mean(server$seconds_before_next_point, na.rm = TRUE)
## [1] 20.35833
median(event$seconds_added_per_point, na.rm = TRUE)
## [1] 1.63
median(player$seconds_added_per_point, na.rm = TRUE)
## [1] 1.965
median(server$seconds_before_next_point, na.rm = TRUE)
## [1] 19.5
sd(event$seconds_added_per_point)
## [1] 1.600261
sd(player$seconds_added_per_point)
## [1] 2.860936
sd(server$seconds_before_next_point)
## [1] 6.889031
var(event$seconds_added_per_point)
## [1] 2.560836
var(player$seconds_added_per_point)
## [1] 8.184955
var(server$seconds_before_next_point)
## [1] 47.45875
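
The twelve calls above repeat the same four statistics for three columns. As a minimal sketch, they could be wrapped in one helper. The name describe_time is hypothetical and not part of the original homework, and the toy vector holds invented values rather than tennis data.

```r
# Hypothetical helper (not from the original homework): returns the mean,
# median, standard deviation, and variance of one numeric vector,
# dropping missing values.
describe_time <- function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    var    = var(x, na.rm = TRUE))
}

# Toy values for illustration only; these are not taken from the tennis data.
toy <- c(1, 2, 3, 4, NA)
stats <- describe_time(toy)
print(stats)

# With the data frames loaded earlier in this document, the helper could be
# applied to all three columns at once (left commented out so this block
# runs without a network connection):
# sapply(list(event  = event$seconds_added_per_point,
#             player = player$seconds_added_per_point,
#             server = server$seconds_before_next_point),
#        describe_time)
```

The variance is the square of the standard deviation, so the var row of the result should always equal the sd row squared, which gives a quick sanity check on the output.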

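Is there a relationship between surface and the seconds added per point in a given tournament? I created two subsets, grass events with negative seconds added per point (faster play) and clay events with positive seconds added per point (slower play), and compared them with respect to time.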

numgrass<-subset(event,seconds_added_per_point<0 & surface == "Grass")
View(numgrass)
summary(numgrass)
##   tournament          surface          seconds_added_per_point
##  Length:9           Length:9           Min.   :-2.98          
##  Class :character   Class :character   1st Qu.:-2.33          
##  Mode  :character   Mode  :character   Median :-1.50          
##                                        Mean   :-1.50          
##                                        3rd Qu.:-0.58          
##                                        Max.   :-0.35          
##     years          
##  Length:9          
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
numclay<-subset(event,seconds_added_per_point>0 & surface == "Clay")
View(numclay)
summary(numclay)
##   tournament          surface          seconds_added_per_point
##  Length:70          Length:70          Min.   :0.440          
##  Class :character   Class :character   1st Qu.:2.560          
##  Mode  :character   Mode  :character   Median :3.115          
##                                        Mean   :3.188          
##                                        3rd Qu.:3.658          
##                                        Max.   :5.380          
##     years          
##  Length:70         
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

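According to the analysis, players generally play faster when tournaments are on grass and slower when they are on clay. Nine grass events had negative seconds added per point (mean -1.50), compared with seventy clay events that had positive seconds added per point (mean 3.19). Each row is a tournament event rather than a single game, and because both subsets were filtered on the sign of seconds_added_per_point, these counts are suggestive rather than conclusive.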
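
One way to compare surfaces without filtering on the sign first is to summarize every row by surface. The sketch below assumes the column names shown in summary(event). The toy_event data frame is hypothetical, with invented values, so its numbers say nothing about the real tournaments.

```r
# Toy data for illustration only; the values are invented and are not
# taken from events_time.csv. Column names follow summary(event).
toy_event <- data.frame(
  surface = c("Grass", "Grass", "Clay", "Clay"),
  seconds_added_per_point = c(-1.0, 0.5, 3.0, 2.0)
)

# Mean seconds added per point for each surface, using every row.
by_surface <- aggregate(seconds_added_per_point ~ surface,
                        data = toy_event, FUN = mean)
print(by_surface)

# Share of events on each surface with negative seconds added (faster play).
share_faster <- tapply(toy_event$seconds_added_per_point < 0,
                       toy_event$surface, mean)
print(share_faster)
```

Replacing toy_event with the event data frame loaded earlier would give the per-surface mean and the share of faster events for the real data, which puts the counts of nine and seventy in context.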

Does the number of deuces affect the speed of a player?

numdeuce<-subset(server,seconds_before_next_point>20 & game_score == "Deuce")
View(numdeuce)
summary(numdeuce)
##     server          seconds_before_next_point     day           
##  Length:8           Min.   :21.00             Length:8          
##  Class :character   1st Qu.:22.75             Class :character  
##  Mode  :character   Median :24.00             Mode  :character  
##                     Mean   :25.12                               
##                     3rd Qu.:24.25                               
##                     Max.   :38.00                               
##    opponent          game_score             set          game          
##  Length:8           Length:8           Min.   :1.0   Length:8          
##  Class :character   Class :character   1st Qu.:1.0   Class :character  
##  Mode  :character   Mode  :character   Median :3.0   Mode  :character  
##                                        Mean   :2.5                     
##                                        3rd Qu.:3.0                     
##                                        Max.   :5.0
numdeuce2<-subset(server,seconds_before_next_point<20 & game_score == "Deuce")
View(numdeuce2)
summary(numdeuce2)
##     server          seconds_before_next_point     day           
##  Length:9           Min.   : 9.00             Length:9          
##  Class :character   1st Qu.:12.00             Class :character  
##  Mode  :character   Median :13.00             Mode  :character  
##                     Mean   :13.89                               
##                     3rd Qu.:17.00                               
##                     Max.   :18.00                               
##    opponent          game_score             set            game          
##  Length:9           Length:9           Min.   :1.000   Length:9          
##  Class :character   Class :character   1st Qu.:2.000   Class :character  
##  Mode  :character   Mode  :character   Median :3.000   Mode  :character  
##                                        Mean   :2.556                     
##                                        3rd Qu.:3.000                     
##                                        Max.   :3.000
numdeuce$server
## [1] "Nicolas Almagro" "Rafael Nadal"    "Pablo Andujar"   "Borna Coric"    
## [5] "Andy Murray"     "Andy Murray"     "Andy Murray"     "Nick Kyrgios"
numdeuce2$server
## [1] "Bernard Tomic" "Bernard Tomic" "Lukas Rosol"   "Roger Federer"
## [5] "Roger Federer" "Benoit Paire"  "Nick Kyrgios"  "Nick Kyrgios" 
## [9] "Nick Kyrgios"

Nick Kyrgios stands out among servers at deuce: he accounts for three of the nine deuce points that started in under 20 seconds, more than any other server. He is also the only server who appears in both subsets, and he ranks as the 25th-fastest player.
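
To compare servers on their average time at deuce directly, the deuce rows can be averaged per server and sorted. This is a sketch under assumptions: toy_server is a hypothetical data frame with invented players and times, and only its column names follow summary(server). Unlike the two subsets above, it also keeps points that took exactly 20 seconds.

```r
# Toy data for illustration only; players and times are invented and are
# not taken from serve_times.csv. Column names follow summary(server).
toy_server <- data.frame(
  server = c("Player A", "Player A", "Player B", "Player B", "Player C"),
  game_score = c("Deuce", "Deuce", "Deuce", "30-0", "Deuce"),
  seconds_before_next_point = c(12, 16, 25, 18, 22)
)

# Keep every deuce point, with no threshold on the time.
deuce <- subset(toy_server, game_score == "Deuce")

# Average seconds before the next point for each server, fastest first.
avg_by_server <- aggregate(seconds_before_next_point ~ server,
                           data = deuce, FUN = mean)
avg_by_server <- avg_by_server[order(avg_by_server$seconds_before_next_point), ]
print(avg_by_server)
```

Running the same steps on the server data frame loaded earlier would show each server's average at deuce, although with only a handful of deuce points per server the ranking should be read cautiously.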

Findings and Recommendations

In this data, play on clay took more time between points than play on grass. I enjoyed Carl Bialik’s analysis of players’ speed over the years. I recommend considering an additional variable. Since tennis is often played outdoors, weather plays a major role as well. Rain, for example, can affect the surface during a match. If this information is added, it may change the data analysis.