RutlandSwim 2019 - Exploratory analysis

This is a basic exploration of the results of the 2019 Rutland Swim competition. See https://my6.raceresult.com/109569/results?lang=en.

The main parameter analysed is the pace, i.e. the time it takes a swimmer to cover 1 km. Pace is analysed against the distance each swimmer signed up for, their age group and their gender.
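As an illustration of the pace definition, the sketch below derives a pace from a finish time and a distance. The helper names `to_seconds` and `pace_per_km` are hypothetical and are not used elsewhere in this analysis, where the pace is read directly from the results file.

```r
# Convert "HH:MM:SS" or "MM:SS" strings to a number of seconds.
to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(parts) {
    parts <- as.numeric(parts)
    # Weight the fields from right to left: seconds, minutes, hours.
    sum(rev(parts) * 60^(seq_along(parts) - 1))
  }, numeric(1))
}

# Pace = total time / distance, formatted as "mm:ss" per km.
pace_per_km <- function(time, distance_km) {
  secs <- round(to_seconds(time) / distance_km)
  sprintf("%02d:%02d", as.integer(secs %/% 60), as.integer(secs %% 60))
}

# Example: 1 h 10 min over 4 km is 17 min 30 s per km.
pace_per_km("01:10:00", 4)
```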

Each section first shows a plot of the number of swimmers in each category and then a plot of the pace per category.

setwd("/home/pedro/Documents/R analysis")
D<- read.csv("RutlandSwim.csv", stringsAsFactors = FALSE)

library(ggplot2)
library(gridExtra)

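# Data wrangling: convert the grouping columns to factors and the times to date-times
D$Distance<-as.factor(D$Distance)
# as.POSIXct fills in the current date, so only the time-of-day part is meaningful
D$GenTime<-as.POSIXct(D$GenTime, format="%H:%M:%S")
D$Pace<-as.POSIXct(D$Pace, format="%M:%S")
D$AgeGroup<-as.factor(D$AgeGroup)
D$Gender<-as.factor(D$Gender)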
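
A quick check that may be worth running after the conversions: `as.POSIXct` returns `NA` for strings that do not match the given format, so counting the `NA` values shows how many rows failed to parse. The sketch below uses made-up values (including a hypothetical "DNF" entry), not the real results file.

```r
# Toy values: the last two are not valid "mm:ss" paces.
pace_raw <- c("17:30", "21:05", "DNF", "")
pace <- as.POSIXct(pace_raw, format = "%M:%S", tz = "UTC")

# Number of entries that could not be parsed.
n_failed <- sum(is.na(pace))
n_failed
```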

Data sorted by distance

# General by Distance
b<-ggplot(D, aes(Distance)) + geom_bar(position="dodge", colour="black") + labs(y="Swimmers", x=element_blank())
g<-ggplot(D, aes(Distance, Pace)) + geom_boxplot() + labs(x="Distance (km)",y="Pace (hh:mm/km)")
grid.arrange(b,g)
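
The boxplot can be complemented with a numerical summary. The sketch below computes the median pace per distance with `aggregate`. It runs on a toy data frame, and the column `PaceSec` (pace in seconds per km) is an assumption made for illustration, since the analysis above stores the pace as a date-time.

```r
# Toy data: pace in seconds per km for two distances.
toy <- data.frame(
  Distance = factor(c("2", "2", "2", "4", "4", "4")),
  PaceSec  = c(1000, 1100, 1200, 1150, 1250, 1500)
)

# Median pace for each distance.
med <- aggregate(PaceSec ~ Distance, data = toy, FUN = median)
med
```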

Data sorted by distance and gender

# General by Gender

# a<- aggregate(D$Gender,list(D$Distance), FUN=summary)
b<-ggplot(D, aes(Distance, fill=Gender)) + geom_bar(position="dodge", colour="black") + labs(y="Swimmers", x=element_blank())
g<-ggplot(D, aes(Distance, Pace, fill=Gender)) + geom_boxplot() + labs(x="Distance (km)",y="Pace (hh:mm/km)")
grid.arrange(b,g)
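
The counts behind the bar chart can also be printed as a table. The commented-out `aggregate` call appears to aim at something similar. A minimal sketch with `table`, using toy data and assumed gender codes "F" and "M":

```r
# Toy data: one row per swimmer.
toy <- data.frame(
  Distance = factor(c("2", "2", "4", "4", "4")),
  Gender   = factor(c("F", "M", "F", "F", "M"))
)

# Cross-tabulate swimmers by distance (rows) and gender (columns).
counts <- table(toy$Distance, toy$Gender)
counts
```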

Data sorted by Age

# General by Age
b<-ggplot(D, aes(AgeGroup)) + geom_bar(position="dodge", colour="black") + labs(y="Swimmers", x=element_blank())
g<-ggplot(D, aes(AgeGroup, Pace)) + geom_boxplot() + labs(x="Age group",y="Pace (hh:mm/km)")
grid.arrange(b,g)

Conclusion

The main preliminary observations (i.e. the null hypotheses) are: