First we need to read in the data!
Activity <- read.table('/Users/Leland/Desktop/Fitbit Data/Stats Final Project.csv', header = TRUE, sep = ",")
head(Activity)
## X Date Activity X.1 Distance..Mi. Duration..Min. Rate
## 1 NA Nov 30 8:07AM Run NA 3.34 22.13 6.625749
## 2 NA Nov 28 11:01 AM Run NA 3.39 22.65 6.681416
## 3 NA Nov 24, 8:16AM Run NA 3.97 22.35 5.629723
## 4 NA Nov 22, 8:57AM Run NA 3.48 21.90 6.293103
## 5 NA Nov 20, 4:49PM Run NA 4.44 28.73 6.470721
## 6 NA Nov 18, 8:18AM Run NA 4.18 23.42 5.602871
## Time.of.Day Minutes.Asleep Minutes.Awake Ratio.of.MinAwake.MinAsleep
## 1 AM 440 28 0.9401709
## 2 AM 518 34 0.9384058
## 3 AM 460 24 0.9504132
## 4 AM 452 26 0.9456067
## 5 PM 395 26 0.9382423
## 6 AM 432 34 0.9270386
## Weekday.
## 1 Y
## 2 N
## 3 Y
## 4 N
## 5 Y
## 6 Y
The data set contains 46 runs.
Now that we’ve read in the data, let’s look at the variables that will inform our analyses. We want the distance I ran and the duration of each run, because we will combine the two into a variable equal to the average pace of each run.
distance<-Activity$Distance..Mi.
hist(distance)
Notice that more than half of the runs were around the 5K distance (about 3.1 miles).
duration<-Activity$Duration..Min.
hist(duration, xlab = "Time spent running")
Again, we see that more than half of the runs lasted between 20 and 30 minutes.
Cool, let’s investigate the pace of each run, measured in minutes per mile.
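Both "more than half" observations can be checked numerically instead of being read off the histograms. The following is a minimal sketch that uses only the six runs printed by head(Activity) as a stand-in for the full data. The 3 to 4 mile band used for "around 5K" is my assumption, not something the analysis defines.

```r
# Stand-in data: the six runs shown in the head(Activity) output
distance <- c(3.34, 3.39, 3.97, 3.48, 4.44, 4.18)        # miles
duration <- c(22.13, 22.65, 22.35, 21.90, 28.73, 23.42)  # minutes

# mean() of a logical vector gives the share of TRUE values
share_near_5k  <- mean(distance >= 3 & distance < 4)     # assumed band
share_20_to_30 <- mean(duration >= 20 & duration <= 30)

share_near_5k
share_20_to_30
```

With the real data, the two stand-in vectors would be replaced by Activity$Distance..Mi. and Activity$Duration..Min.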
RunPace<- duration/distance
mean(RunPace)
## [1] 6.865468
sd(RunPace)
## [1] 0.6754506
hist(RunPace, main = 'Pace of run (min/mile) between Aug-Nov 2015', ylab = 'Frequency', xlab = 'Pace of run (min/mile)')
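A pace in decimal minutes is easier to read as minutes and seconds. The sketch below converts the mean pace reported above; format_pace is a hypothetical helper introduced here for illustration, not part of the original analysis.

```r
# Convert a pace in decimal minutes per mile to an "m:ss" string.
# format_pace is a hypothetical helper, not part of the original analysis.
format_pace <- function(pace_min) {
  total_sec <- as.integer(round(pace_min * 60))   # whole seconds per mile
  sprintf("%d:%02d", total_sec %/% 60L, total_sec %% 60L)
}

format_pace(6.865468)  # mean pace reported above
```

The mean of 6.865468 min/mile works out to about 6:52 per mile, since 0.865468 of a minute is roughly 52 seconds.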
Great, let’s look at my sleep statistics now.
hours_asleep<-Activity$Minutes.Asleep/60
hist(hours_asleep)
Dividing by 60 stores the results in hours rather than minutes.
hours_awake<-Activity$Minutes.Awake/60
hist(hours_awake)
QualityofSleep <-hours_asleep/(hours_asleep+hours_awake)
hist(QualityofSleep, main = 'Ratio of time asleep to total time in bed between Aug-Nov 2015', ylab = 'Frequency', xlab = 'Proportion of time asleep')
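The ratio does not depend on the units, so it can be computed from minutes directly. As a sanity check, the sketch below recomputes it for the six nights shown by head(Activity) and compares the result with the spreadsheet column Ratio.of.MinAwake.MinAsleep. For these six rows at least, that column appears to hold the same asleep / (asleep + awake) quantity despite its name; whether that holds for every row is an assumption.

```r
# Stand-in data: the six nights shown in the head(Activity) output
minutes_asleep <- c(440, 518, 460, 452, 395, 432)
minutes_awake  <- c(28, 34, 24, 26, 26, 34)
recorded_ratio <- c(0.9401709, 0.9384058, 0.9504132,
                    0.9456067, 0.9382423, 0.9270386)

# Share of time in bed spent asleep (minutes / minutes, so the units cancel)
quality <- minutes_asleep / (minutes_asleep + minutes_awake)

# TRUE if the recomputed values agree with the recorded column
matches <- isTRUE(all.equal(quality, recorded_ratio, tolerance = 1e-6))
matches
```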
Let’s plot the two against each other.
plot(QualityofSleep, RunPace, main = 'Comparison between quality of sleep and run pace', ylab = 'Run pace (min/mile)', xlab = 'Proportion of time asleep')
abline(lm(RunPace~QualityofSleep), col="red")
cor(QualityofSleep,RunPace)
## [1] -0.2613655
Pace and sleep quality are negatively correlated (r ≈ -0.26), which I expected: because pace is measured in minutes per mile, a negative correlation means that better sleep goes with faster runs. The relationship isn’t very strong, though.
Many things could have influenced these data: caffeine or alcohol intake, the time of day that I ran, the amount of stress I was feeling at the time, and whether the run was on a weekend or a weekday.
I investigated a couple of these potential confounders by subdividing the data.
Let’s divide the runs by time of day and by weekend versus weekday.
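One way to judge how strong the relationship is: test whether the correlation differs from zero. The sketch below does this from the two numbers reported above, r and the number of runs. It assumes all 46 runs contributed a pair of values to the correlation.

```r
r <- -0.2613655   # correlation reported above
n <- 46           # number of runs; assumes all 46 have both measurements

# t statistic for H0: the true correlation is zero, with n - 2 degrees of freedom
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

t_stat
p_value
```

On the full data, cor.test(QualityofSleep, RunPace) performs the same test directly and also reports a confidence interval.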
# Names of the columns to keep in each subset
ColumnsYouNeed <- c("Rate", "Ratio.of.MinAwake.MinAsleep", "Distance..Mi.")
Morning <- Activity[Activity$Time.of.Day == "AM", ColumnsYouNeed]
Afternoon <- Activity[Activity$Time.of.Day == "PM", ColumnsYouNeed]
head(Morning)
## Rate Ratio.of.MinAwake.MinAsleep Distance..Mi.
## 1 6.625749 0.9401709 3.34
## 2 6.681416 0.9384058 3.39
## 3 5.629723 0.9504132 3.97
## 4 6.293103 0.9456067 3.48
## 6 5.602871 0.9270386 4.18
## 7 6.933535 0.9256018 3.31
head(Afternoon)
## Rate Ratio.of.MinAwake.MinAsleep Distance..Mi.
## 5 6.470721 0.9382423 4.44
## 14 7.549148 0.8788396 7.63
## 20 5.496350 0.8661258 1.37
## 22 6.017370 0.8760000 4.03
## 29 5.638086 0.9490835 4.91
## 30 6.729798 0.9321267 3.96
boxplot(Morning$Rate,Afternoon$Rate, main = "Comparing Pace of Running (AM vs PM)", names = c("Morning","Afternoon"), ylab = "Run Pace (min/mile)",col=c("thistle","wheat"))
Weekday <- Activity[Activity$Weekday. == "Y", c("Rate","Ratio.of.MinAwake.MinAsleep","Distance..Mi.")]
Weekend <- Activity[Activity$Weekday. == "N", c("Rate","Ratio.of.MinAwake.MinAsleep","Distance..Mi.")]
boxplot(Weekday$Rate,Weekend$Rate, main = "Comparing Pace of Running (Weekday vs Weekend)", names = c("Weekday","Weekend"),ylab = "Run Pace (min/mile)",col=c("thistle","wheat"))
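The boxplots can be backed up with per-group numbers. A minimal sketch, again using only the six runs printed by head(Activity) as a stand-in for the full data; the data frame name runs is hypothetical.

```r
# Stand-in data: the six runs shown in the head(Activity) output
runs <- data.frame(
  Rate        = c(6.625749, 6.681416, 5.629723, 6.293103, 6.470721, 5.602871),
  Time.of.Day = c("AM", "AM", "AM", "AM", "PM", "AM"),
  Weekday.    = c("Y", "N", "Y", "N", "Y", "Y")
)

# Median pace per group: the line drawn inside each box of a boxplot
median_by_time <- tapply(runs$Rate, runs$Time.of.Day, median)
median_by_day  <- tapply(runs$Rate, runs$Weekday., median)

median_by_time
median_by_day
```

With the real data, the same calls work on Activity. The formula interface, for example boxplot(Rate ~ Time.of.Day, data = Activity), draws the grouped comparison without building the subsets by hand.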