This is Quiz 1 from Coursera’s R Programming class within the Data Science Specialization. This publication is intended as a learning resource; all answers are documented and explained. Datasets are available in R packages.
1. The R language is a dialect of which of the following programming languages?
S
R is an open-source implementation of S with a revised syntax and an awesome community.
2. The definition of free software consists of four freedoms (freedoms 0 through 3). Which of the following is NOT one of the freedoms that are part of the definition? Select all that apply.
The freedom to sell the software for any price.
The freedom to restrict access to the source code for the software.
The freedom to prevent users from using the software for undesirable purposes.
Yay free software!
3. In R the following are all atomic data types EXCEPT: (Select all that apply)
list
array
matrix
data frame
table
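The course teaches five atomic classes: character, numeric, integer, complex, and logical. The sketch below uses class() to show them, and shows that a list or data frame is a different kind of object. Do not use is.atomic() to answer this question, because it returns TRUE for a matrix or array; those are atomic vectors with a dim attribute.

```r
# One example value for each of the five atomic classes taught in the course
atomic_examples <- list("a", 1, 1L, 1 + 2i, TRUE)
sapply(atomic_examples, class)
# returns c("character", "numeric", "integer", "complex", "logical")

# The quiz answers are data structures built on top of those classes
class(list(1, "a"))        # "list"
class(data.frame(a = 1))   # "data.frame"
```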
4. If I execute the expression x <- 4 in R, what is the class of the object `x` as determined by the `class()` function?
numeric
R automatically interprets 4 as a numeric class object.
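The check below compares class() with typeof(), which reports the underlying storage type. It also shows that an integer has to be requested explicitly with the L suffix.

```r
x <- 4
class(x)    # "numeric"
typeof(x)   # "double": plain numbers are stored as double precision
y <- 4L     # the L suffix asks for an integer
class(y)    # "integer"
```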
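5. What is the class of the object defined by the expression x <- c(4, “a”, TRUE)?
character
All elements of a vector share one class, so c() coerces every element to the most general class present: 4 and TRUE become the strings "4" and "TRUE".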
x <- c(4, "a", TRUE)
class(x)
## [1] "character"
cbind() combines the two vectors as columns, producing a matrix.
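The question's vectors are not reproduced here, so this sketch uses two hypothetical three-element vectors. It shows the shape cbind() produces and contrasts it with rbind().

```r
# Hypothetical example vectors (not taken from the quiz text)
x <- c(1, 3, 5)
y <- c(3, 2, 10)
m <- cbind(x, y)   # each vector becomes one column
dim(m)             # 3 2
colnames(m)        # "x" "y", taken from the argument names
dim(rbind(x, y))   # 2 3: rbind() stacks the vectors as rows instead
```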
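7. A key property of vectors in R is that
elements of a vector all must be of the same class
This guarantees that arithmetic and summary functions can treat every element the same way, which is convenient for statistical work.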
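The single-class rule is enforced by coercion. The sketch below walks up the usual hierarchy (logical < integer < numeric < character) one step at a time.

```r
v1 <- c(TRUE, 2L)   # logical promoted to integer
v2 <- c(2L, 3.5)    # integer promoted to numeric
v3 <- c(3.5, "a")   # numeric promoted to character
sapply(list(v1, v2, v3), class)   # "integer" "numeric" "character"
v1                                 # 1 2 (TRUE became 1)
v3                                 # "3.5" "a"
```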
8. Suppose I have a list defined as x <- list(2, “a”, “b”, TRUE). What does x[[2]] give me? Select all that apply.
a character vector containing the letter “a”.
a character vector of length 1.
Double brackets return the element itself; single brackets return a list containing that element.
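Using the list from the question, the two indexing forms return objects of different classes:

```r
x <- list(2, "a", "b", TRUE)
x[[2]]           # "a": the element itself
class(x[[2]])    # "character"
length(x[[2]])   # 1
x[2]             # a list of length 1 that holds "a"
class(x[2])      # "list"
```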
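9. Suppose I have a vector x <- 1:4 and a vector y <- 2. What is produced by the expression x + y?
a numeric vector with elements 3, 4, 5, 6
The shorter vector y is recycled to match the length of x.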
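The sketch below reproduces the question. It also shows that the result is numeric rather than integer, because y <- 2 is a double. It then shows recycling with a longer short vector whose length divides the longer one.

```r
x <- 1:4
y <- 2
x + y            # 3 4 5 6: y is reused for every element of x
class(x + y)     # "numeric": integer plus double gives double
1:4 + c(10, 20)  # 11 22 13 24: c(10, 20) is recycled twice
```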
10. Suppose I have a vector x <- c(17, 14, 4, 5, 13, 12, 10) and I want to set all elements of this vector that are greater than 10 to be equal to 4. What R code achieves this? Select all that apply.
x[x >= 11] <- 4
x[x > 10] <- 4
Indexing with a logical vector: the comparison marks which elements to replace.
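The sketch below prints the logical mask and the result. It also shows why x >= 10 would be wrong, since it replaces the final 10. x >= 11 matches x > 10 here only because every element is a whole number.

```r
x <- c(17, 14, 4, 5, 13, 12, 10)
mask <- x > 10   # TRUE TRUE FALSE FALSE TRUE TRUE FALSE
x[mask] <- 4     # assign only where the mask is TRUE
x                # 4 4 4 5 4 4 10

# Wrong cutoff: >= 10 also overwrites the 10, which is not greater than 10
z <- c(17, 14, 4, 5, 13, 12, 10)
z[z >= 10] <- 4
z                # 4 4 4 5 4 4 4
```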
11. Use the Week 1 Quiz Data Set to answer questions 11-20.
In the dataset provided for this Quiz, what are the column names of the dataset? Download the zip file, unzip it, read the CSV, and print the column names.
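# mode = "wb" keeps the zip file intact on Windows
download.file('https://d396qusza40orc.cloudfront.net/rprog/data/quiz1_data.zip', destfile = "quizdat.zip", mode = "wb")
unzip("quizdat.zip")             # extracts hw1_data.csv into the working directory
dat <- read.csv("hw1_data.csv")
names(dat)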
## [1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
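The quiz file appears to contain the same data as R's built-in airquality data set: same 153 rows and column names, and the values printed in this post match. Assuming that holds, you can follow along offline. The remaining sketches use airquality as a stand-in for hw1_data.csv.

```r
# Assumption: hw1_data.csv has the same contents as datasets::airquality
dat <- datasets::airquality
names(dat)   # "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
dim(dat)     # 153 6
```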
12. Extract the first 2 rows of the data frame and print them to the console. What does the output look like?
Index rows 1 and 2, leaving the column position empty to keep every column.
dat[1:2,]
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
13. How many observations (i.e. rows) are in this data frame?
nrow() returns the number of rows.
nrow(dat)
## [1] 153
14. Extract the last 2 rows of the data frame and print them to the console. What does the output look like?
Index the last two rows; question 13 showed that row 153 is the last one.
dat[152:153,]
## Ozone Solar.R Wind Temp Month Day
## 152 18 131 8.0 76 9 29
## 153 20 223 11.5 68 9 30
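head() and tail() give the first and last rows without indexing by hand. Computing the range from nrow() avoids hard-coding 152:153. This sketch uses airquality as a stand-in for the quiz file, which is an assumption.

```r
dat <- datasets::airquality   # stand-in for hw1_data.csv (assumption)
head(dat, 2)                   # same rows as dat[1:2, ]
tail(dat, 2)                   # same rows as dat[152:153, ]

# Index form that works for any number of rows
n <- nrow(dat)
last_two <- dat[(n - 1):n, ]
last_two
```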
15. What is the value of Ozone in the 47th row?
The $ operator selects the Ozone column by name; [47] then picks its 47th element.
dat$Ozone[47]
## [1] 21
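The same cell can be reached in several equivalent ways. The sketch below compares three of them, again assuming airquality matches the quiz file.

```r
dat <- datasets::airquality   # stand-in for hw1_data.csv (assumption)
a <- dat$Ozone[47]        # extract the column, then take element 47
b <- dat[47, "Ozone"]     # row 47, column "Ozone"
d <- dat[["Ozone"]][47]   # [[ ]] by name, equivalent to $
c(a, b, d)                # 21 21 21
```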
16. How many missing values are in the Ozone column of this data frame?
is.na() returns TRUE/FALSE values, which can be summed to count the NAs.
sum(is.na(dat$Ozone))
## [1] 37
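Summing works because TRUE counts as 1 and FALSE as 0 in arithmetic. Applied to a whole data frame, is.na() returns a logical matrix, so colSums() counts the NAs in every column at once. This sketch uses airquality as a stand-in, which is an assumption.

```r
sum(c(TRUE, FALSE, TRUE))   # 2

dat <- datasets::airquality        # stand-in for hw1_data.csv (assumption)
na_counts <- colSums(is.na(dat))   # NA count per column
na_counts[["Ozone"]]               # 37
```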
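17. What is the mean of the Ozone column in this dataset? Exclude missing values (coded as NA) from this calculation.
The na.rm = TRUE argument drops NAs before the calculation; without it, mean() returns NA.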
mean(dat$Ozone, na.rm=TRUE)
## [1] 42.12931
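The toy vector below shows why na.rm matters: a single NA makes the whole result NA.

```r
v <- c(1, NA, 3)
mean(v)                 # NA: the missing value propagates
mean(v, na.rm = TRUE)   # 2: the NA is dropped before averaging
```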
18. Extract the subset of rows of the data frame where Ozone values are above 31 and Temp values are above 90. What is the mean of Solar.R in this subset?
which() returns the positions of the TRUE elements (skipping comparisons that evaluate to NA), and $ then selects the Solar.R column.
mean(dat[which(dat$Ozone >31 & dat$Temp > 90),]$Solar.R)
## [1] 212.8
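which() matters because Ozone contains NAs. The sketch below uses a small hypothetical data frame. Indexing directly with a mask that contains NA adds an all-NA row, while which() and subset() keep only the rows where the condition is TRUE.

```r
# Hypothetical toy data frame with one missing value in column a
d <- data.frame(a = c(1, NA, 5), b = c(10, 20, 30))
d$a > 2                     # FALSE NA TRUE
nrow(d[d$a > 2, ])          # 2: the NA position yields an all-NA row
nrow(d[which(d$a > 2), ])   # 1: only the TRUE position is kept
nrow(subset(d, a > 2))      # 1: subset() also drops NA conditions
```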
19. What is the mean of “Temp” when “Month” is equal to 6?
Same approach as question 18, with Month == 6 as the condition and Temp as the column.
mean(dat[which(dat$Month == 6),]$Temp)
## [1] 79.1
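To get the mean for every month at once, tapply() groups Temp by Month. This sketch assumes airquality matches the quiz file; its June entry should equal the answer above.

```r
dat <- datasets::airquality   # stand-in for hw1_data.csv (assumption)
monthly_temp <- tapply(dat$Temp, dat$Month, mean)   # mean Temp per Month
monthly_temp          # one value per month, named "5" through "9"
monthly_temp[["6"]]   # the June value
```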
20. What is the maximum Ozone value in the month of May (i.e. Month is equal to 5)?
The Ozone column contains NAs, so na.rm = TRUE is needed; otherwise max() returns NA.
max(dat[which(dat$Month == 5),]$Ozone, na.rm = TRUE)
## [1] 115
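Another way to handle the NAs is aggregate() with a formula. Its formula method defaults to na.action = na.omit, so rows with a missing Ozone are dropped before the per-month maximum is computed. This sketch uses airquality as a stand-in, which is an assumption.

```r
dat <- datasets::airquality   # stand-in for hw1_data.csv (assumption)
# Rows with NA in Ozone are dropped by the default na.action
monthly_max <- aggregate(Ozone ~ Month, data = dat, FUN = max)
monthly_max                                     # one row per month
monthly_max[monthly_max$Month == 5, "Ozone"]    # the May maximum
```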