Problem I

  1. List the names of the columns
## [1] "Steps"  "Miles"  "Floors" "Sleep"  "Day"    "Month"
  1. Find the number of rows in the dataset.
## [1] 88
  1. Use the function summary on the dataset and display the results. Describe how this function treats categorical columns, and how it treats numeric columns.
##      Steps           Miles           Floors           Sleep        Day    
##  Min.   :  114   Min.   :0.050   Min.   :  1.00   Min.   :0.000   F  :13  
##  1st Qu.: 7722   1st Qu.:3.390   1st Qu.: 11.00   1st Qu.:7.383   M  :13  
##  Median :10920   Median :4.930   Median : 16.00   Median :7.617   R  :12  
##  Mean   :10749   Mean   :4.759   Mean   : 20.78   Mean   :7.407   Sat:13  
##  3rd Qu.:13780   3rd Qu.:6.093   3rd Qu.: 27.00   3rd Qu.:8.104   Sun:13  
##  Max.   :20122   Max.   :8.790   Max.   :140.00   Max.   :9.333   T  :12  
##                                                                   W  :12  
##    Month   
##  Feb  :28  
##  Jan  :31  
##  March:29  
##            
##            
##            
## 

For numeric columns, summary reports six values: the minimum, first quartile, median, mean, third quartile, and maximum. For categorical columns, it reports the number of observations in each category.
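
As a small illustration of this behavior, the sketch below calls summary on one numeric column and one factor column. The data frame toy and its values are made up for the example and are not taken from Fitbit.csv.

```r
# Hypothetical toy data, not from Fitbit.csv
toy <- data.frame(
  Steps = c(5000, 7000, 9000, 11000),
  Day   = factor(c("M", "T", "M", "W"))
)

# Numeric column: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
num_summary <- summary(toy$Steps)
print(num_summary)

# Factor column: number of rows at each level
fac_summary <- summary(toy$Day)
print(fac_summary)
```

Counts are only shown for columns stored as factors. In R 4.0.0 and later, read.table keeps text columns as character unless stringsAsFactors = TRUE is passed, so a character column would be summarized by its length and class instead.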

  1. Find the mean of the column Steps.
## [1] 10749.34

Problem II. Continue using the data from Problem I.

  1. Find and display the average steps taken for every day of the week.
##   Day     Steps
## 1   F 13068.615
## 2   M 14500.846
## 3   R 10843.667
## 4 Sat  8222.538
## 5 Sun  6318.538
## 6   T 10501.583
## 7   W 11863.500
  1. Find and display the average hours of sleep for every day of the week.
##   Day    Sleep
## 1   F 7.591026
## 2   M 6.341026
## 3   R 7.412500
## 4 Sat 7.850000
## 5 Sun 7.238462
## 6   T 8.019444
## 7   W 7.444444
  1. Find and display the standard deviation of steps taken for every day of the week.
##   Day    Steps
## 1   F 3365.953
## 2   M 5362.416
## 3   R 2105.690
## 4 Sat 3270.769
## 5 Sun 3424.365
## 6   T 2631.131
## 7   W 3441.038
  1. Find and display the standard deviation of hours of sleep for every day of the week.
##   Day     Sleep
## 1   F 0.9029431
## 2   M 2.1613890
## 3   R 1.4411717
## 4 Sat 0.7228096
## 5 Sun 2.2390721
## 6   T 0.7544090
## 7   W 0.4347490
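
The four tables above all follow the same pattern: a formula of the form value ~ group passed to aggregate with a summary function. A minimal sketch of that pattern on made-up data (the data frame toy is hypothetical and not taken from Fitbit.csv):

```r
# Hypothetical toy data, not from Fitbit.csv
toy <- data.frame(
  Day   = factor(c("M", "M", "T", "T")),
  Steps = c(8000, 12000, 5000, 7000)
)

# Formula interface: apply FUN to Steps within each level of Day
day_means <- aggregate(Steps ~ Day, data = toy, FUN = mean)
day_sds   <- aggregate(Steps ~ Day, data = toy, FUN = sd)

print(day_means)
print(day_sds)
```

Each call returns a data frame with one row per level of Day, which is why the tables above have seven rows.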

Problem III. Continue using the data from Problem I.

  1. Use the function fivenum on the column Miles and display the results. Describe what this function returns. You may also use ?fivenum to answer the second part of this question.
## [1] 0.050 3.390 4.930 6.105 8.790

It returns Tukey's five-number summary of the data: the minimum, lower hinge, median, upper hinge, and maximum.
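
Hinges are not always equal to the quartiles that summary reports, which is why the upper hinge of Miles is 6.105 here while summary gave a third quartile of 6.093. A short sketch of the difference, using a made-up vector x (an assumption for illustration, not the Miles column):

```r
x <- 1:8  # hypothetical values, not the Miles column

hinges    <- fivenum(x)    # min, lower hinge, median, upper hinge, max
quartiles <- quantile(x)   # default (type 7) quantiles at 0, 25, 50, 75, 100%

print(hinges)              # 1.0 2.5 4.5 6.5 8.0
print(unname(quartiles))   # 1.00 2.75 4.50 6.25 8.00
```

The hinges are medians of the lower and upper halves of the sorted data, while quantile interpolates between order statistics, so the two can differ slightly in the second and fourth values.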

  1. Use the function table on the column Day and display the results. Describe what this function returns. You may also use ?table to answer the second part of this question.
## 
##   F   M   R Sat Sun   T   W 
##  13  13  12  13  13  12  12

It returns a table of counts: the number of rows in the data for each value of Day.

  1. Find and display the median number of hours of sleep for every month.
##   Day    Sleep
## 1   F 7.666667
## 2   M 7.133333
## 3   R 7.816667
## 4 Sat 8.000000
## 5 Sun 7.700000
## 6   T 7.725000
## 7   W 7.525000
  1. Find and display the median distance traveled for every month.
##   Day Miles
## 1   F 6.070
## 2   M 7.550
## 3   R 4.755
## 4 Sat 3.670
## 5 Sun 2.290
## 6   T 4.710
## 7   W 5.645
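
The two tables above are grouped by Day, while the questions ask for one value per month. A minimal sketch of the per-month grouping on made-up data follows; the data frame toy is hypothetical, and with the real data the same calls would use d and its Month column.

```r
# Hypothetical toy data, not from Fitbit.csv
toy <- data.frame(
  Month = factor(c("Jan", "Jan", "Jan", "Feb", "Feb")),
  Sleep = c(7.0, 8.0, 6.5, 7.5, 8.5),
  Miles = c(4.0, 5.0, 6.0, 3.0, 5.0)
)

# Group by Month instead of Day to get one median per month
sleep_by_month <- aggregate(Sleep ~ Month, data = toy, FUN = median)
miles_by_month <- aggregate(Miles ~ Month, data = toy, FUN = median)

print(sleep_by_month)
print(miles_by_month)
```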

Problem IV. Continue using the data from Problem I.

  1. Create a boxplot of the total number of steps for every day of the week (there should be 7 sub-plots). Does it appear one day is less active than the rest? Explain.

Sunday appears less active than the other days: its box sits lowest on the plot, which agrees with Sunday having the lowest average step count (about 6,319) in Problem II.

  1. Create a boxplot of the total hours of sleep for every day of the week (there should be 7 sub-plots). Does it appear one day is less restful than the rest? Explain.

The plot shows that the amount of sleep is about the same for every day of the week, so no single day stands out as less restful.

  1. Does there appear to be any outliers in the data based on your plot from (a)? Explain, and if so identify what day of the week they are in.

Yes. There is one outlier on Saturday, one on Wednesday, and three on Sunday. I think Sunday has the most outliers because it is a weekend day: some people choose to stay home and others choose to go out, which makes a large difference in the step counts.

  1. Does there appear to be any outliers in the data based on your plot from (b)? Explain, and if so identify what day of the week they are in.

Yes. There is one outlier on each of Friday, Monday, Thursday, Saturday, and Sunday.
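
Outliers can also be listed numerically instead of being read off the plot. By default, boxplot marks a point as an outlier when it lies more than 1.5 times the box length beyond either hinge, and boxplot.stats applies the same rule. The sketch below uses made-up step counts (the data frame toy is hypothetical and not taken from Fitbit.csv):

```r
# Hypothetical toy data, not from Fitbit.csv
toy <- data.frame(
  Day   = factor(rep(c("Sat", "Sun"), each = 7)),
  Steps = c(6000, 6500, 7000, 7200, 7500, 8000, 20000,
            5000, 5200, 5400, 5600, 5800, 6000, 6200)
)

# For each day, keep only the values boxplot would draw as outliers
out_by_day <- lapply(
  split(toy$Steps, toy$Day),
  function(x) boxplot.stats(x)$out   # coef = 1.5 by default
)

print(out_by_day)
```

In this toy example only the 20000 value on Saturday is flagged. With the real data, splitting d$Steps or d$Sleep by d$Day in the same way would give the outliers for each day of the week.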

Code Appendix

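# Problem I: load the data and inspect it
# stringsAsFactors = TRUE keeps Day and Month as factors in R >= 4.0
d <- read.table("Fitbit.csv", sep = ",", header = TRUE, stringsAsFactors = TRUE)
colnames(d)
nrow(d)
summary(d)
mean(d$Steps)

# Problem II: mean and standard deviation by day of the week
aggregate(Steps ~ Day, d, mean)
aggregate(Sleep ~ Day, d, mean)
aggregate(Steps ~ Day, d, sd)
aggregate(Sleep ~ Day, d, sd)

# Problem III: five-number summary, counts, and medians
fivenum(d$Miles)
table(d$Day)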
# Group by Month, as the questions ask for medians per month
aggregate(Sleep ~ Month, d, median)
aggregate(Miles ~ Month, d, median)

# Problem IV: one box per day of the week
boxplot(Steps ~ Day, data = d)
boxplot(Sleep ~ Day, data = d)