Tennis is an individual sport in which the player uses a racquet to hit a ball over a net into the opponent’s court. As a great admirer of tennis, I find it a time-consuming sport to watch. Carl Bialik’s article "Why Some Tennis Matches Take Forever" analyzes several variables to explain why this is the case.
# Loading data sets for the events, players, and servers
event <- read.csv("https://raw.githubusercontent.com/sagreen131/DATA-607-HW-1/master/events_time.csv")
player <- read.csv("https://raw.githubusercontent.com/sagreen131/DATA-607-HW-1/master/players_time.csv")
server <- read.csv("https://raw.githubusercontent.com/sagreen131/DATA-607-HW-1/master/serve_times.csv")
View(event)
View(player)
View(server)
summary(event)
## tournament surface seconds_added_per_point
## Length:205 Length:205 Min. :-2.980
## Class :character Class :character 1st Qu.: 0.730
## Mode :character Mode :character Median : 1.630
## Mean : 1.762
## 3rd Qu.: 2.920
## Max. : 5.380
## years
## Length:205
## Class :character
## Mode :character
##
##
##
summary(player)
## player seconds_added_per_point
## Length:218 Min. :-6.3700
## Class :character 1st Qu.:-1.9475
## Mode :character Median : 1.9650
## Mean : 0.9417
## 3rd Qu.: 3.0400
## Max. : 6.3500
summary(server)
## server seconds_before_next_point day
## Length:120 Min. : 9.00 Length:120
## Class :character 1st Qu.:15.00 Class :character
## Mode :character Median :19.50 Mode :character
## Mean :20.36
## 3rd Qu.:25.25
## Max. :38.00
## opponent game_score set game
## Length:120 Length:120 Min. :1.000 Length:120
## Class :character Class :character 1st Qu.:2.000 Class :character
## Mode :character Mode :character Median :3.000 Mode :character
## Mean :2.567
## 3rd Qu.:3.000
## Max. :5.000
mean(event$seconds_added_per_point, na.rm = TRUE)
## [1] 1.761805
mean(player$seconds_added_per_point, na.rm = TRUE)
## [1] 0.9417431
mean(server$seconds_before_next_point, na.rm = TRUE)
## [1] 20.35833
median(event$seconds_added_per_point, na.rm = TRUE)
## [1] 1.63
median(player$seconds_added_per_point, na.rm = TRUE)
## [1] 1.965
median(server$seconds_before_next_point, na.rm = TRUE)
## [1] 19.5
sd(event$seconds_added_per_point)
## [1] 1.600261
sd(player$seconds_added_per_point)
## [1] 2.860936
sd(server$seconds_before_next_point)
## [1] 6.889031
var(event$seconds_added_per_point)
## [1] 2.560836
var(player$seconds_added_per_point)
## [1] 8.184955
var(server$seconds_before_next_point)
## [1] 47.45875
numgrass<-subset(event,seconds_added_per_point<0 & surface == "Grass")
View(numgrass)
summary(numgrass)
## tournament surface seconds_added_per_point
## Length:9 Length:9 Min. :-2.98
## Class :character Class :character 1st Qu.:-2.33
## Mode :character Mode :character Median :-1.50
## Mean :-1.50
## 3rd Qu.:-0.58
## Max. :-0.35
## years
## Length:9
## Class :character
## Mode :character
##
##
##
numclay<-subset(event,seconds_added_per_point>0 & surface == "Clay")
View(numclay)
summary(numclay)
## tournament surface seconds_added_per_point
## Length:70 Length:70 Min. :0.440
## Class :character Class :character 1st Qu.:2.560
## Mode :character Mode :character Median :3.115
## Mean :3.188
## 3rd Qu.:3.658
## Max. :5.380
## years
## Length:70
## Class :character
## Mode :character
##
##
##
According to the analysis, players generally play faster at tournaments held on grass and slower at tournaments held on clay. Nine grass-court entries in the event data have a negative seconds_added_per_point (less time per point), compared with seventy clay-court entries that have a positive value (more time per point).
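Counting rows on one side of zero does not show how many entries each surface has in total, so a per-surface summary may give a fuller picture. The following is a minimal sketch, not part of the original analysis: `surface_summary` is a hypothetical helper, and the `toy` data frame contains made-up values used only to show the shape of the output. It assumes the same column names as the `event` data set (`surface`, `seconds_added_per_point`).

```r
# Hypothetical helper: per-surface count, mean seconds added per point,
# and share of entries that are slower than baseline (value > 0).
surface_summary <- function(df) {
  n <- tapply(df$seconds_added_per_point, df$surface, length)
  mean_added <- tapply(df$seconds_added_per_point, df$surface, mean, na.rm = TRUE)
  share_slower <- tapply(df$seconds_added_per_point > 0, df$surface, mean, na.rm = TRUE)
  data.frame(
    surface = names(n),
    n = as.vector(n),
    mean_added = as.vector(mean_added),
    share_slower = as.vector(share_slower),
    row.names = NULL
  )
}

# Toy rows, made up for illustration only (NOT taken from the real data).
toy <- data.frame(
  surface = c("Grass", "Grass", "Clay", "Clay", "Hard"),
  seconds_added_per_point = c(-1.0, 0.5, 2.0, 3.0, 1.0)
)
print(surface_summary(toy))
```

Calling `surface_summary(event)` on the data loaded earlier should return one row per surface, which makes it possible to compare grass and clay as proportions rather than raw counts.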
#### Does the number of deuces affect the speed of a player?
numdeuce<-subset(server,seconds_before_next_point>20 & game_score == "Deuce")
View(numdeuce)
summary(numdeuce)
## server seconds_before_next_point day
## Length:8 Min. :21.00 Length:8
## Class :character 1st Qu.:22.75 Class :character
## Mode :character Median :24.00 Mode :character
## Mean :25.12
## 3rd Qu.:24.25
## Max. :38.00
## opponent game_score set game
## Length:8 Length:8 Min. :1.0 Length:8
## Class :character Class :character 1st Qu.:1.0 Class :character
## Mode :character Mode :character Median :3.0 Mode :character
## Mean :2.5
## 3rd Qu.:3.0
## Max. :5.0
numdeuce2<-subset(server,seconds_before_next_point<20 & game_score == "Deuce")
View(numdeuce2)
summary(numdeuce2)
## server seconds_before_next_point day
## Length:9 Min. : 9.00 Length:9
## Class :character 1st Qu.:12.00 Class :character
## Mode :character Median :13.00 Mode :character
## Mean :13.89
## 3rd Qu.:17.00
## Max. :18.00
## opponent game_score set game
## Length:9 Length:9 Min. :1.000 Length:9
## Class :character Class :character 1st Qu.:2.000 Class :character
## Mode :character Mode :character Median :3.000 Mode :character
## Mean :2.556
## 3rd Qu.:3.000
## Max. :3.000
numdeuce$server
## [1] "Nicolas Almagro" "Rafael Nadal" "Pablo Andujar" "Borna Coric"
## [5] "Andy Murray" "Andy Murray" "Andy Murray" "Nick Kyrgios"
numdeuce2$server
## [1] "Bernard Tomic" "Bernard Tomic" "Lukas Rosol" "Roger Federer"
## [5] "Roger Federer" "Benoit Paire" "Nick Kyrgios" "Nick Kyrgios"
## [9] "Nick Kyrgios"
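Nick Kyrgios appears to have the best average speed before the next point at deuce. He is the only server who appears in both subsets (once among the deuce points slower than 20 seconds and three times among those faster than 20 seconds), and he is the 25th-fastest player.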
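One way to check a per-player claim like this is to average the time before the next point for each server at deuce and sort the result. The sketch below is an illustration under assumptions rather than the original method: `deuce_means` is a hypothetical helper, the `toy` rows and player names are made up, and the column names are assumed to match the `server` data set. Note also that the two subsets above use `> 20` and `< 20`, so any deuce point at exactly 20 seconds falls into neither.

```r
# Hypothetical helper: mean seconds before the next point per server,
# restricted to points played at deuce, sorted from fastest to slowest.
deuce_means <- function(df) {
  # which() drops rows where game_score is NA instead of keeping NA rows
  at_deuce <- df[which(df$game_score == "Deuce"), ]
  out <- aggregate(seconds_before_next_point ~ server, data = at_deuce, FUN = mean)
  out <- out[order(out$seconds_before_next_point), ]
  rownames(out) <- NULL
  out
}

# Toy rows, made up for illustration only (NOT taken from the real data).
toy <- data.frame(
  server = c("Player A", "Player A", "Player B", "Player B", "Player C"),
  game_score = c("Deuce", "Deuce", "Deuce", "30-15", "Deuce"),
  seconds_before_next_point = c(12, 20, 25, 18, 14),
  stringsAsFactors = FALSE
)
print(deuce_means(toy))
```

Running `deuce_means(server)` on the data loaded earlier would list each server's average at deuce, with the fastest server in the first row.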
The surface of a tennis match affects how much time players take between points: in this data, clay is associated with slower play and grass with faster play. I enjoyed Carl Bialik’s analysis of players’ speed over the years. I recommend considering one additional variable. Since tennis is largely an outdoor sport, weather plays a major role as well. Rain, for example, can affect the playing surface. If this information were added, it might change the results of the analysis.