Problem I

  1. The names of the columns:
## [1] "Steps"  "Miles"  "Floors" "Sleep"  "Day"    "Month"
  1. The number of rows in the dataset:
## [1] 88
  1. Summary:
##      Steps           Miles           Floors           Sleep        Day    
##  Min.   :  114   Min.   :0.050   Min.   :  1.00   Min.   :0.000   F  :13  
##  1st Qu.: 7722   1st Qu.:3.390   1st Qu.: 11.00   1st Qu.:7.383   M  :13  
##  Median :10920   Median :4.930   Median : 16.00   Median :7.617   R  :12  
##  Mean   :10749   Mean   :4.759   Mean   : 20.78   Mean   :7.407   Sat:13  
##  3rd Qu.:13780   3rd Qu.:6.093   3rd Qu.: 27.00   3rd Qu.:8.104   Sun:13  
##  Max.   :20122   Max.   :8.790   Max.   :140.00   Max.   :9.333   T  :12  
##                                                                   W  :12  
##    Month   
##  Feb  :28  
##  Jan  :31  
##  March:29  
##            
##            
##            
## 

The summary shows Steps, Miles, Floors and Sleep as numerical columns and Day and Month as categorical columns.

  1. The mean of the column Steps:
## [1] 10749.34
  1. The average steps taken for every day of the week:
##   Day     Steps
## 1   F 13068.615
## 2   M 14500.846
## 3   R 10843.667
## 4 Sat  8222.538
## 5 Sun  6318.538
## 6   T 10501.583
## 7   W 11863.500

Standard deviation for steps taken for every day of the week:

##   Day    Steps
## 1   F 3365.953
## 2   M 5362.416
## 3   R 2105.690
## 4 Sat 3270.769
## 5 Sun 3424.365
## 6   T 2631.131
## 7   W 3441.038
  1. The average hours of sleep for every day of the week:
##   Day    Sleep
## 1   F 7.591026
## 2   M 6.341026
## 3   R 7.412500
## 4 Sat 7.850000
## 5 Sun 7.238462
## 6   T 8.019444
## 7   W 7.444444

Standard deviation of sleep for every day of the week:

##   Day     Sleep
## 1   F 0.9029431
## 2   M 2.1613890
## 3   R 1.4411717
## 4 Sat 0.7228096
## 5 Sun 2.2390721
## 6   T 0.7544090
## 7   W 0.4347490
  1. Boxplot of the total number of steps for every day of the week:

According to the boxplot, the fewest steps are taken on Sunday, which agrees with Sunday having the lowest average (about 6319 steps).

  1. Boxplot of the total hours of sleep for every day of the week:

The distribution of total hours of sleep is roughly the same for every day of the week, although the averages above show Monday as the lowest (about 6.34 hours).

  1. The number of days where the total steps were above 10000:
## [1] 51
  1. The number of days when total sleep was below 7 hours (the value below is a row count, not the average number of steps on those days):
## [1] 13
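
The difference between counting the days with little sleep and averaging the steps on those days can be sketched as follows. This is only an illustration: the data frame toy and its values are hypothetical and are not taken from Fitbit.csv; only the column names Day, Steps and Sleep follow the dataset.

```r
# Hypothetical toy data; the real Fitbit.csv is not reproduced here.
toy <- data.frame(
  Day   = c("M", "M", "T", "T", "Sun", "Sun"),
  Steps = c(12000, 15000, 9000, 11000, 5000, 7000),
  Sleep = c(6.5, 6.0, 8.0, 7.5, 7.2, 6.8)
)

# Per-day mean of Steps, in the same style as the aggregate() calls in the report.
day_means <- aggregate(Steps ~ Day, data = toy, FUN = mean)

# Number of days with fewer than 7 hours of sleep (a row count).
n_short <- sum(toy$Sleep < 7)

# Average steps on those same days (a mean, not a count).
mean_steps_short <- mean(toy$Steps[toy$Sleep < 7])

print(day_means)
print(n_short)
print(mean_steps_short)
```

On this toy data the count is 3, while the average is about 11333 steps, so the two quantities answer different questions.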

Problem II

  1. The function standardizes a vector (subtracts the mean and divides by the standard deviation) and returns the standard deviation of the result. For the vector X = 1:100:
## [1] 1
  1. The function returns the bounds of the interval (mean − 2s, mean + 2s), where s is the standard deviation. For the vector X = 1:100:
##      upper      lower 
## 108.522984  -7.522984
  1. The function prints the mean after removing any observations that are more than 3 standard deviations from the mean, and returns the remaining observations. For the vector X = c(1:100, 200, 300):
## [1] 50.5
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
##  [18]  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
##  [35]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51
##  [52]  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68
##  [69]  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85
##  [86]  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
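
The standardization and the 3 standard deviation rule described above could also be written as in the sketch below. The function names standardize and mean_within_k_sd and the parameter k are hypothetical and do not appear in the appendix. The sketch assumes "more than 3 sd from the mean" is meant in both directions, so it filters on the absolute deviation.

```r
# Standardize a vector: subtract the mean, divide by the standard deviation.
standardize <- function(x) (x - mean(x)) / sd(x)

# Mean after dropping observations more than k standard deviations
# from the mean in either direction (two-sided rule, assumed here).
mean_within_k_sd <- function(x, k = 3) {
  m <- mean(x)
  s <- sd(x)
  keep <- abs(x - m) <= k * s
  mean(x[keep])
}

x <- c(1:100, 200, 300)
sd_std <- sd(standardize(1:100))   # standardized values have sd 1
tm <- mean_within_k_sd(x)          # 200 and 300 are dropped

print(sd_std)
print(tm)
```

For this vector the two-sided rule removes only 200 and 300, so the result agrees with the 50.5 reported above.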

Problem III

The purpose of this problem is to simulate flips of a fair coin and to see how many flips it takes for the observed proportion of heads to be approximately 0.50.

  1. The estimated probability of getting a “head” based on 20 flips:
## [1] 0.4
  1. Using the “sapply” function to estimate the probability for 10, 100, 1000, 10000 and 100000 tosses:
## [1] 0.40000 0.56000 0.48900 0.50430 0.50077
  1. The absolute error of each estimate relative to 0.5:
## [1] 0.10000 0.06000 0.01100 0.00430 0.00077
  1. We can see that the absolute error decreases as the sample size increases.
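
One way to put these errors in context is to compare them with the standard error of a sample proportion, sqrt(p(1 − p)/n), which for a fair coin is 0.5/sqrt(n). The sketch below does this for the absolute errors copied from the output above. The variable names are illustrative, and the comparison is an added check rather than part of the original assignment.

```r
# Sample sizes used in the report.
n <- c(10, 100, 1000, 10000, 100000)

# Absolute errors copied from the output of the previous step.
abs_err <- c(0.10000, 0.06000, 0.01100, 0.00430, 0.00077)

# Standard error of the sample proportion for a fair coin: sqrt(p * (1 - p) / n).
p <- 0.5
se <- sqrt(p * (1 - p) / n)

# Ratio of each observed error to its standard error.
ratio <- abs_err / se

print(data.frame(n = n, abs_err = abs_err, se = se, ratio = ratio))
```

Every observed error is at most about 1.2 standard errors, which is consistent with the error shrinking roughly like 1/sqrt(n).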

Appendix Code

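# stringsAsFactors=TRUE keeps Day and Month as factors (the default before R 4.0.0),
# so summary() reports the level counts shown in Problem I
Fitbit=read.table("~/Desktop/Fitbit.csv", header=TRUE, sep=",", stringsAsFactors=TRUE)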
#1.a
colnames(Fitbit)
#1.b
nrow(Fitbit)
#1.c
summary(Fitbit)
#1.d
mean(Fitbit$Steps)
#1.e
aggregate(Steps~Day,Fitbit,mean)
#1.e
aggregate(Steps~Day,Fitbit,sd)
#1.f
aggregate(Sleep~Day,Fitbit,mean)
#1.f
aggregate(Sleep~Day,Fitbit,sd)
#1.g
boxplot(Fitbit$Steps ~ Fitbit$Day)
#1.h
boxplot(Fitbit$Sleep ~ Fitbit$Day)
library(dplyr, warn.conflicts = FALSE, quietly=TRUE)
#1.i
Fitbit %>% filter(Steps > 10000) %>% nrow()
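#1.j
# nrow() counts the days with Sleep < 7; it does not average their steps
Fitbit %>% filter(Sleep < 7) %>% nrow()
# average number of steps on those days
Fitbit %>% filter(Sleep < 7) %>% summarise(mean(Steps))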
#2.a
# standardize x, then return the standard deviation of the standardized values
f=function(x)
{
y=(x-mean(x))/sd(x)
return(sd(y))
}
x=c(1:100)
f(x)

#2.b
# return the bounds mean(y) +/- 2 standard deviations
g=function(y)
{
upper=mean(y)+2*sd(y)
lower=mean(y)-2*sd(y)
return(c("upper"=upper,"lower"=lower))
}
y=c(1:100)
g(y)

#2.c
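z=function(x)
{
m = mean(x)
s = sd(x)
# keep observations within 3 standard deviations of the mean, on both sides
x = x[abs(x-m)<=3*s]
# print the mean of the remaining observations and return them
print(mean(x))
return(x)
}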
x =c(1:100,200,300)
z(x)

#3.a
# estimate P(head) from n simulated flips of a fair coin
# (this reuses the name f from 2.a)
f=function(n)
  {
s=sample(c("H","T"),n,replace = TRUE)
p=mean(s=="H")
return(p)
  }
f(20)
#3.b
p=sapply(c(10,100,1000,10000,100000),f)
p
#3.c
abs(0.5-p)