Let’s continue our exploration of sampling.
# Install packages if necessary. For this assignment, you'll need the moments package which contains the skewness function. Remember, do not write install code in the markdown document, install the packages in the console.
# When you need to use a function that is not a base R function, you need to load the package in the markdown document. So, to access the skewness function, you should load the moments package. Just remove the # in the line below to load the moments package.
library(moments)
# 1. The function, call it ‘printVecInfo’, should take a vector as input
# 2. The function should print the following information:
# a. Mean b. Median c. Min & max d. Standard deviation e. Quantiles (at 0.05 and 0.95) f. Skewness
# 3. Test the function with a vector that has (1,2,3,4,5,6,7,8,9,10,50).
# You should see something such as:
# [1] "mean: 9.54545454545454"
# [1] "median: 6"
# [1] "min: 1 max: 50"
# [1] "sd: 13.7212509368762"
# [1] "quantile (0.05 - 0.95): 1.5 -- 30"
# [1] "skewness: 2.62039633563579"
printVecInfo <- function(HWfourTest)
{
  # Print summary statistics for a numeric vector, one line per statistic.
  cat("mean: ", mean(HWfourTest), "\n")
  cat("median: ", median(HWfourTest), "\n")
  cat("min: ", min(HWfourTest), "max: ", max(HWfourTest), "\n")
  cat("sd: ", sd(HWfourTest), "\n")
  # 5th and 95th percentiles.
  cat("quantile (0.05 - 0.95): ", quantile(HWfourTest, probs = 0.05),
      " -- ", quantile(HWfourTest, probs = 0.95), "\n")
  # skewness() comes from the moments package loaded above.
  cat("skewness: ", skewness(HWfourTest, na.rm = FALSE), "\n")
}
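Step 3 asks for a test with the vector (1,2,3,4,5,6,7,8,9,10,50). The sketch below builds that vector and recomputes two of the expected values in base R so they can be compared with the sample output above. `testVec` and `skewByHand` are hypothetical names introduced here, and the hand-written skewness formula is an assumption about what `moments::skewness()` computes (third central moment over the second central moment raised to 3/2).

```r
# Test vector from step 3.
testVec <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50)

# Hypothetical helper: moment-based skewness written out in base R.
# Assumption: this is the same estimator that moments::skewness() uses.
skewByHand <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3/2)
}

# 5th and 95th percentiles with R's default quantile type.
testQuantiles <- quantile(testVec, probs = c(0.05, 0.95))
testSkew <- skewByHand(testVec)

# Call the function from above when it has been defined in this session.
if (exists("printVecInfo")) printVecInfo(testVec)
```

The single large value (50) pulls the mean well above the median, which is why the skewness is strongly positive.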
# 4. Create a variable ‘jar’ that has 50 red and 50 blue marbles (hint: the jar can have strings as objects, with some of the strings being ‘red’ and some of the strings being ‘blue’)
RM <- "red marble"
BM <-"blue marble"
fiftyRM <-replicate(50,RM)
fiftyBM <-replicate(50,BM)
HWfourJar <- c(fiftyBM,fiftyRM)
# 5. Confirm there are 50 reds by summing the samples that are red
# Count the red marbles in the jar itself by summing a logical vector.
sum(HWfourJar == "red marble")
## [1] 50
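As a compact alternative for steps 4 and 5, the jar can be built with `rep()` and checked with a logical sum or `table()`. This is only a sketch; `jar`, `redCount` and `colourCounts` are hypothetical names, not the variables used elsewhere in this assignment.

```r
# rep() repeats a value a given number of times.
jar <- c(rep("red marble", 50), rep("blue marble", 50))

# Comparing against "red marble" gives TRUE/FALSE; sum() counts the TRUEs.
redCount <- sum(jar == "red marble")

# table() tallies every distinct value in one call.
colourCounts <- table(jar)
print(colourCounts)
```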
# 6. Sample 10 ‘marbles’ (really strings) from the jar. How many are red? What was the percentage of red marbles?
HWfourQsixJar <-sample(HWfourJar, size=10, replace = TRUE)
HWfourQsixRed <-grep("red marble",HWfourQsixJar)
# Divide by the size of the sample (10), not the size of the jar (100).
cat("%: ",length(HWfourQsixRed)/length(HWfourQsixJar)*100,"\n")
## %: 60
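Step 6 asks for both the number of red marbles and the percentage. A minimal sketch that reports both is shown below; the seed value and the names `draw`, `redInDraw` and `pctRed` are assumptions made for illustration, and the seed only serves to make the draw repeatable.

```r
set.seed(42)  # assumed seed, only so the draw can be repeated

jar <- c(rep("red marble", 50), rep("blue marble", 50))

# Draw 10 marbles with replacement, as in the code above.
draw <- sample(jar, size = 10, replace = TRUE)

# Number of red marbles, and the share of the 10 drawn marbles that are red.
redInDraw <- sum(draw == "red marble")
pctRed <- redInDraw / length(draw) * 100

cat("red: ", redInDraw, " %: ", pctRed, "\n")
```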
# 7. Repeat #6, but this time, using the replicate command draw 10 sample sets (i.e., 10 sets of drawing 10 marbles). Compute the mean number of red marbles in these 10 sample sets. Lastly, repeat the above process 20 times to get a list of 20 mean numbers. Use your printVecInfo to see information of the samples. Also generate a histogram of the samples.
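HWfourRed <- function(HWfourJar,num)
{
  # Draw 10 marbles with replacement and return how many are red.
  # The num argument is accepted but not used; the draw size is fixed at 10.
  HWfourQsevenJar <- sample(HWfourJar,10, replace = TRUE)
  HWfourQsevenRed <-grep("red marble",HWfourQsevenJar)
  return(length(HWfourQsevenRed))
}
# The jar passed here is HWfourQsixJar, the 10-marble sample from #6, not the full 100-marble jar.
# mean() receives a single count, so each replicate is the red count from one draw of 10.
HWfourQsevenRepten <-replicate(10,mean(HWfourRed(HWfourQsixJar,10)),simplify = TRUE)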
hist(HWfourQsevenRepten)
printVecInfo(HWfourQsevenRepten)
## mean: 6.6
## median: 7
## min: 3 max: 9
## sd: 1.712698
## quantile (0.05 - 0.95): 3.9 -- 8.55
## skewness: -0.7385489
HWfourQsevenReptwenty <-replicate(20,mean(HWfourRed(HWfourQsixJar,10)),simplify = TRUE)
hist(HWfourQsevenReptwenty)
printVecInfo(HWfourQsevenReptwenty)
## mean: 6.05
## median: 6
## min: 2 max: 8
## sd: 1.431782
## quantile (0.05 - 0.95): 3.9 -- 8
## skewness: -1.082588
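Read literally, step 7 asks for 20 numbers, each the mean red count over 10 sets of 10 marbles drawn from the full jar. One way this could be sketched is shown below. `countRed` and `meanRed` are hypothetical helpers, the seed is an assumption, and the numbers this produces will differ from the output listed above.

```r
set.seed(1)  # assumed seed, only for repeatability

jar <- c(rep("red marble", 50), rep("blue marble", 50))

# Hypothetical helper: number of red marbles in one draw of drawSize marbles.
countRed <- function(jar, drawSize) {
  sum(sample(jar, size = drawSize, replace = TRUE) == "red marble")
}

# Hypothetical helper: mean red count across nSets independent draws.
meanRed <- function(jar, drawSize, nSets) {
  mean(replicate(nSets, countRed(jar, drawSize)))
}

# 20 means, each computed from 10 sets of 10 marbles.
twentyMeans <- replicate(20, meanRed(jar, drawSize = 10, nSets = 10))
hist(twentyMeans)
```

Because each value is an average of 10 counts, the values can be fractional, and they cluster around 5 for a jar that is half red.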
# 8. Repeat #7, but this time, sample the jar 100 times. You should get 20 numbers, this time each number represents the mean of how many reds there were in the 100 samples. Use your printVecInfo to see information of the samples. Also generate a histogram of the samples.
# HWfourRed does not use its second argument, so each draw still contains 10 marbles.
HWfourQeightsamphundred <-replicate(20,mean(HWfourRed(HWfourQsixJar,100)),simplify = TRUE)
hist(HWfourQeightsamphundred)
printVecInfo(HWfourQeightsamphundred)
## mean: 6.4
## median: 7
## min: 3 max: 9
## sd: 1.569445
## quantile (0.05 - 0.95): 3.95 -- 8.05
## skewness: -0.5229764
# 9. Repeat #8, but this time, replicate the sampling 100 times. You should get 100 numbers, this time each number represents the mean of how many reds there were in the 100 samples. Use your printVecInfo to see information of the samples. Also generate a histogram of the samples.
HWfourQnineRephundred <-replicate(100,mean(HWfourRed(HWfourQsixJar,100)),simplify = TRUE)
hist(HWfourQnineRephundred)
printVecInfo(HWfourQnineRephundred)
## mean: 6.22
## median: 6
## min: 3 max: 9
## sd: 1.382283
## quantile (0.05 - 0.95): 4 -- 8
## skewness: -0.1918435
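For a jar that is half red and sampled with replacement, the number of reds in a draw of n marbles follows a Binomial(n, 0.5) distribution, with mean 0.5n and standard deviation sqrt(0.25n). The sketch below compares those values with a simulation for draws of 100 marbles from the full jar. The seed, the 2000 repetitions and all variable names are assumptions chosen for illustration.

```r
set.seed(2)  # assumed seed, only for repeatability

jar <- c(rep("red marble", 50), rep("blue marble", 50))
drawSize <- 100

# Simulate many draws and record the red count in each.
redCounts <- replicate(
  2000,
  sum(sample(jar, size = drawSize, replace = TRUE) == "red marble")
)

# Binomial(n, p) has mean n * p and standard deviation sqrt(n * p * (1 - p)).
expectedMean <- drawSize * 0.5
expectedSd <- sqrt(drawSize * 0.5 * 0.5)

cat("simulated mean: ", mean(redCounts), " expected: ", expectedMean, "\n")
cat("simulated sd: ", sd(redCounts), " expected: ", expectedSd, "\n")
```

With more repetitions the histogram of the counts looks increasingly symmetric, so the skewness moves toward 0.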
# 10. Store the ‘airquality’ dataset into a temporary variable
HWfourStepthree<-airquality
# 11. Clean the dataset (i.e. remove the NAs)
HWfourStepthreeQeleven <- na.omit(HWfourStepthree)
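`na.omit()` drops every row that has an NA in any column, so a missing Solar.R value also removes that row's Ozone, Wind and Temp readings. A short sketch for checking how much data is lost is shown below; `aq`, `aqClean`, `naPerColumn` and `rowsDropped` are hypothetical names.

```r
aq <- airquality

# Number of missing values in each column before cleaning.
naPerColumn <- colSums(is.na(aq))

# Keep only the rows with no missing value in any column.
aqClean <- na.omit(aq)
rowsDropped <- nrow(aq) - nrow(aqClean)

print(naPerColumn)
cat("rows kept: ", nrow(aqClean), " rows dropped: ", rowsDropped, "\n")
```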
# 12. Explore Ozone, Wind and Temp by doing a ‘printVecInfo’ on each as well as generating a histogram for each
printVecInfo(HWfourStepthreeQeleven$Ozone)
## mean: 42.0991
## median: 31
## min: 1 max: 168
## sd: 33.27597
## quantile (0.05 - 0.95): 8.5 -- 109
## skewness: 1.248104
hist(HWfourStepthreeQeleven$Ozone)
printVecInfo(HWfourStepthreeQeleven$Wind)
## mean: 9.93964
## median: 9.7
## min: 2.3 max: 20.7
## sd: 3.557713
## quantile (0.05 - 0.95): 4.6 -- 15.5
## skewness: 0.4556414
hist(HWfourStepthreeQeleven$Wind)
printVecInfo(HWfourStepthreeQeleven$Temp)
## mean: 77.79279
## median: 79
## min: 57 max: 97
## sd: 9.529969
## quantile (0.05 - 0.95): 61 -- 92.5
## skewness: -0.2250959
hist(HWfourStepthreeQeleven$Temp)
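The three columns can also be explored in one loop, which keeps the histograms side by side for comparison. This is a minimal sketch; `aqClean`, `cols` and `colMeans3` are hypothetical names, and the layout settings are just one reasonable choice.

```r
aqClean <- na.omit(airquality)
cols <- c("Ozone", "Wind", "Temp")

# Put the three histograms in one row, then restore the old settings.
op <- par(mfrow = c(1, 3))
for (col in cols) {
  hist(aqClean[[col]], main = col, xlab = col)
}
par(op)

# Column means, for comparison with the printVecInfo output above.
colMeans3 <- sapply(aqClean[cols], mean)
print(colMeans3)
```

Ozone's mean sits well above its median and its skewness is positive, so its histogram has a long right tail, while Wind and Temp are closer to symmetric.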