Problem I

(a)

The mean of the column named Steps is

## [1] 9622.533

The median of the column named Steps is

## [1] 9385.5

And the standard deviation of Steps is

## [1] 3736.228

(b)

The mean of the column named Asleep is

## [1] 7.402049

The median of the column named Asleep is

## [1] 8.02

And the standard deviation of Asleep is

## [1] 2.097509

(c)

The quartiles from the column Total Miles Traveled Per Day are

##     0%    25%    50%    75%   100% 
## 0.1000 2.9925 4.1050 5.2700 8.1900

(d)

The quartiles from the column Total Number of Steps Per Day are

##       0%      25%      50%      75%     100% 
##   233.00  6809.25  9385.50 12065.25 18754.00

Problem II

(a)

The five-number summary (plus the mean) for Total Hours Asleep is

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   7.222   8.020   7.402   8.528   9.780

(b)

The cutoffs for outliers for Hours of Sleep per day are

## Lower-Cutoff Upper-Cutoff 
##        5.265       10.485

(c)

The standard deviation of Total Miles Traveled Per Day is

## [1] 1.631983

This means that the distance traveled on a given day typically differs from the mean by about 1.6 miles.

(d)

The coefficient of variation for the Total Miles Traveled Per Day is

## [1] 38.76822

This means the standard deviation is 38.77 percent of the mean.

Problem III

(a)

It does not appear that any one day is less active than the others, because the interquartile ranges (the boxes) overlap substantially across all days.

(b)

It does appear that Sunday allows for more sleep than other days: 75% of its values fall above the median of a majority of the other days, and its upper limit exceeds all the others.

(c)

This data appears to be symmetric, as the distribution is fairly even on either side of the mean.

(d)

This data appears to be skewed to the left, as its left tail is longer. This agrees with the mean (7.40) falling below the median (8.02).

Code

ErinsFitbit <- read.csv("C:/Users/Lisa/Downloads/ErinsFitbit.txt", sep="")

Problem I

(a)

The mean of the column named Steps is

mean(ErinsFitbit$Steps)
## [1] 9622.533

The median of the column named Steps is

median(ErinsFitbit$Steps)
## [1] 9385.5

And the standard deviation of Steps is

sd(ErinsFitbit$Steps)
## [1] 3736.228

(b)

The mean of the column named Asleep is

mean(ErinsFitbit$Asleep)
## [1] 7.402049

The median of the column named Asleep is

median(ErinsFitbit$Asleep)
## [1] 8.02

And the standard deviation of Asleep is

sd(ErinsFitbit$Asleep)
## [1] 2.097509

(c)

The quartiles from the column Total Miles Traveled Per Day are

quantile(ErinsFitbit$Distance)
##     0%    25%    50%    75%   100% 
## 0.1000 2.9925 4.1050 5.2700 8.1900

(d)

The quartiles from the column Total Number of Steps Per Day are

quantile(ErinsFitbit$Steps)
##       0%      25%      50%      75%     100% 
##   233.00  6809.25  9385.50 12065.25 18754.00

Problem II

(a)

The five-number summary (plus the mean, which summary() also reports) for Total Hours Asleep is

summary(ErinsFitbit$Asleep)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   7.222   8.020   7.402   8.528   9.780
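
As a side note, summary() prints six values because it includes the mean. The sketch below, on made-up toy values rather than the Fitbit data, compares it with fivenum() and quantile(), which each return exactly five. The vector x is an assumption for illustration only.

```r
# Toy values (made up for illustration, not the Fitbit data).
x <- c(0, 6.5, 7, 7.5, 8, 8, 8.5, 9)

s6 <- summary(x)   # six values: the five-number summary plus the mean
f5 <- fivenum(x)   # five values: min, lower hinge, median, upper hinge, max
q5 <- quantile(x)  # five values: 0%, 25%, 50%, 75%, 100% (type 7 by default)

s6
f5
q5
```

fivenum() uses Tukey's hinges, so its middle values can differ slightly from the quartiles that quantile() and summary() report.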

(b)

The cutoffs for outliers for Hours of Sleep per day are

# First and third quartiles of hours asleep
Q1 <- quantile(ErinsFitbit$Asleep, 0.25)
Q3 <- quantile(ErinsFitbit$Asleep, 0.75)
# 1.5 * IQR rule: values beyond these cutoffs are flagged as outliers
upper.cutoff <- Q3 + 1.5*(Q3-Q1)
lower.cutoff <- Q1 - 1.5*(Q3-Q1)
cutoffs <- c(lower.cutoff, upper.cutoff)
names(cutoffs) <- c("Lower-Cutoff","Upper-Cutoff")
cutoffs
## Lower-Cutoff Upper-Cutoff 
##        5.265       10.485
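
With these cutoffs, the reported minimum of 0.000 hours falls below the lower cutoff, while the maximum of 9.780 stays under the upper one. The same rule could be wrapped in a small function, sketched here on made-up toy values. The names iqr_cutoffs and sleep_toy are hypothetical and not part of the assignment.

```r
# Hypothetical helper: 1.5 * IQR outlier cutoffs for a numeric vector.
iqr_cutoffs <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3 (default type 7)
  iqr <- q[2] - q[1]
  c(lower = q[1] - 1.5 * iqr, upper = q[2] + 1.5 * iqr)
}

# Toy sleep values (made up for illustration, not the Fitbit data).
sleep_toy <- c(0, 6.5, 7, 7.5, 8, 8, 8.5, 9)
fences <- iqr_cutoffs(sleep_toy)
fences

# Values outside the cutoffs are flagged as potential outliers.
flagged <- sleep_toy[sleep_toy < fences["lower"] | sleep_toy > fences["upper"]]
flagged
```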

(c)

The standard deviation of Total Miles Traveled Per Day is

sd(ErinsFitbit$Distance)
## [1] 1.631983

This means that the distance traveled on a given day typically differs from the mean by about 1.6 miles.
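
For reference, sd() in R computes the sample standard deviation, which divides by n - 1 rather than n. Writing the daily distances as x_1, ..., x_n with mean x-bar (symbols introduced here for illustration):

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

Strictly, this is a root-mean-square deviation rather than an average absolute deviation, so "typical distance from the mean" is an informal reading.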

(d)

The coefficient of variation for the Total Miles Traveled Per Day is

y <- mean(ErinsFitbit$Distance)
s <- sd(ErinsFitbit$Distance)
cv <- (s/y)*100
cv
## [1] 38.76822

This means the standard deviation is 38.77 percent of the mean.
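
As a check on the arithmetic, the coefficient of variation is the standard deviation expressed as a percentage of the mean. The mean distance is not printed in this report, so the value below is only what the reported standard deviation and CV imply, rounded:

```latex
\mathrm{CV} = \frac{s}{\bar{y}} \times 100\%
\quad\Longrightarrow\quad
\bar{y} = \frac{s}{\mathrm{CV}} \times 100
        = \frac{1.631983}{38.76822} \times 100 \approx 4.21 \text{ miles}
```

That implied mean sits close to the reported median of 4.105 miles.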

Problem III

(a)

boxplot(Steps ~ Day, data = ErinsFitbit, main = "Steps Per Day of the Week", horizontal = TRUE)

It does not appear that any one day is less active than the others, because the interquartile ranges (the boxes) overlap substantially across all days.
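
The overlap read off the boxplot could also be checked numerically by computing the quartiles within each day. The following is a minimal sketch on a made-up toy data frame. Its column names mirror Steps and Day from the data set, but the values and the name toy are assumptions for illustration.

```r
# Toy data frame (made-up values, not the Fitbit data).
toy <- data.frame(
  Day   = rep(c("Mon", "Tue"), each = 4),
  Steps = c(6000, 8000, 10000, 12000, 7000, 9000, 11000, 13000)
)

# Q1, median, and Q3 of Steps within each day; one row per day.
by_day <- t(sapply(split(toy$Steps, toy$Day), quantile,
                   probs = c(0.25, 0.5, 0.75)))
by_day
```

If the interval from 25% to 75% for one day lies largely inside the intervals for the others, that supports the reading of the boxplot.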

(b)

boxplot(Asleep ~ Day, data = ErinsFitbit, main = "Hours of Sleep Per Day of the Week", horizontal = TRUE)

It does appear that Sunday allows for more sleep than other days: 75% of its values fall above the median of a majority of the other days, and its upper limit exceeds all the others.

(c)

hist(ErinsFitbit$Distance, main = "Distribution", xlab = "Miles Traveled Per Day", freq = TRUE)

This data appears to be symmetric, as the distribution is fairly even on either side of the mean.

(d)

hist(ErinsFitbit$Asleep, main = "Distribution", xlab = "Hours of Sleep Per Day", freq = TRUE)

This data appears to be skewed to the left, as its left tail is longer. This agrees with the mean (7.40) falling below the median (8.02).