# built-in dataset in R
head(airquality)
## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
Convert the data above from wide to narrow (long) format using the gather function. Group Ozone, Solar.R, Wind, and Temp into one variable called type, and create another column called value to store their values. Your output should look like this:
## Month Day type value
## 1 5 1 Ozone 41
## 2 5 2 Ozone 36
## 3 5 3 Ozone 12
## 4 5 4 Ozone 18
## 5 5 5 Ozone NA
## 6 5 6 Ozone 28
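One possible answer is sketched below, assuming the tidyr package is installed; the name `long` is introduced here only for illustration. Listing the four measurement columns after `key` and `value` tells gather which columns to stack, so Month and Day stay as identifier columns.

```r
library(tidyr)

# Stack the four measurement columns into key/value pairs.
# Month and Day are not listed, so they are kept as identifier columns.
long <- gather(airquality, key = type, value = value,
               Ozone, Solar.R, Wind, Temp)
head(long)
```

The non-gathered columns come first in the result, followed by the key and value columns, which matches the column order in the expected output.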
Suppose you have a data frame, data, as given below:
## V1 V2 V3 V4
## 1 a 1 alpha 10
## 2 a 2 beta 20
## 3 b 1 gamma 30
## 4 b 2 alpha 40
## 5 c 1 beta 50
## 6 c 2 gamma 60
Assuming that the tidyr and dplyr libraries are already loaded, write down the output of the following code. The final result is enough for full credit, but partial credit will be given for writing out and labelling intermediate steps.
data %>%
filter(V1 == "a") %>% # Step 1
select(V2, V4) %>% # Step 2
gather(key = Apple, value = Banana, V2, V4) %>% # Step 3
mutate(Apple = Banana) # Step 4
Step 1: The V1 column of data contains two each of “a”, “b”, and “c”, so the filter keeps only the first two rows, where V1 is “a”. We are left with a 2 by 4 data frame.
Step 2: select keeps only the second and fourth columns, “V2” and “V4”. After the filter in Step 1, “V2” holds the values 1 and 2, and “V4” holds 10 and 20, so the result is a 2 by 2 numeric data frame.
Step 3: gather stacks the remaining columns. The column names “V2” and “V4” are collected in a new column named “Apple” (each name appearing twice), and the numeric values from both “V2” and “V4” are collected in another column named “Banana”, giving a 4 by 2 data frame.
Step 4: The mutate function overwrites the “Apple” column with the numeric values in “Banana”, replacing the column names “V2” and “V4”.
The resulting data frame will be as follows (built here with alternative code that produces the same result, to avoid the risk of copy/paste):
Apple <- c(1,2,10,20)
Banana <- c(1,2,10,20)
AB <- data.frame(Apple, Banana)
AB
## Apple Banana
## 1 1 1
## 2 2 2
## 3 10 10
## 4 20 20
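The answer above can be checked by running the original pipeline directly. The sketch below rebuilds `data` from the printed table, assuming V2 and V4 are numeric and V1 and V3 are character (the original column types are not shown); `result` is a name introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data frame from the printed table
# (column types are an assumption).
data <- data.frame(
  V1 = c("a", "a", "b", "b", "c", "c"),
  V2 = c(1, 2, 1, 2, 1, 2),
  V3 = c("alpha", "beta", "gamma", "alpha", "beta", "gamma"),
  V4 = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

result <- data %>%
  filter(V1 == "a") %>%                            # Step 1: keep rows where V1 is "a"
  select(V2, V4) %>%                               # Step 2: keep columns V2 and V4
  gather(key = Apple, value = Banana, V2, V4) %>%  # Step 3: stack V2 and V4
  mutate(Apple = Banana)                           # Step 4: overwrite Apple with Banana
result
```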
Suppose you have a data frame, data, as given below.
## a b c d e f
## 1 1 6 1 5 -99 1
## 2 10 4 4 -99 9 3
## 3 7 9 5 4 1 4
## 4 2 9 3 8 6 8
## 5 1 10 5 9 8 6
## 6 6 2 1 3 8 5
(a) Write a function fix_missing_99 that takes one argument: x, a numeric vector. The function should replace every component of x equal to -99 with NA.
fix_missing_99 <- function(x) {
if (!is.vector(x)) {
stop("x must be a vector")
}
x[x==-99] <- NA
return(x)
}
fix_missing_99(c(2,3,-99, 3,4,5, -99, -99))
## [1] 2 3 NA 3 4 5 NA NA
sapply(x, function(x) {ifelse(x == -99, NA, x) })
(b) Replace every -99 in data with NA. For full credit, your code must use the function in part (a) and it should continue to work without modification if additional columns are added to the data frame.
for (j in names(data)) {
data[,j] <- fix_missing_99(data[,j])
}
data
## a b c d e f
## 1 1 6 1 5 NA 1
## 2 10 4 4 NA 9 3
## 3 7 9 5 4 1 4
## 4 2 9 3 8 6 8
## 5 1 10 5 9 8 6
## 6 6 2 1 3 8 5
(c) Use the apply family of functionals to perform the same task as in part (b).
sapply(data, function(x) {ifelse(x == -99, NA, x) })
## a b c d e f
## [1,] 1 6 1 5 NA 1
## [2,] 10 4 4 NA 9 3
## [3,] 7 9 5 4 1 4
## [4,] 2 9 3 8 6 8
## [5,] 1 10 5 9 8 6
## [6,] 6 2 1 3 8 5
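Note that sapply simplifies its result to a matrix, as the `[1,]` row labels above show. If a data frame is wanted instead, one common alternative is lapply combined with assignment into `data[]`. The sketch below is self-contained: it redefines a minimal version of fix_missing_99 (without the vector check) and rebuilds `data` from the printed table, assuming all columns are numeric.

```r
# Minimal version of the function from part (a).
fix_missing_99 <- function(x) {
  x[x == -99] <- NA
  x
}

# Rebuild the example data frame from the printed table.
data <- data.frame(
  a = c(1, 10, 7, 2, 1, 6),
  b = c(6, 4, 9, 9, 10, 2),
  c = c(1, 4, 5, 3, 5, 1),
  d = c(5, -99, 4, 8, 9, 3),
  e = c(-99, 9, 1, 6, 8, 8),
  f = c(1, 3, 4, 8, 6, 5)
)

# lapply returns a list with one cleaned column per input column;
# assigning into data[] keeps the data frame structure and column names.
data[] <- lapply(data, fix_missing_99)
data
```

Like the loop in part (b), this continues to work without modification if more columns are added.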
Assume that the ‘ggplot2’ package is already loaded. The first 6 rows of the ‘diamonds’ dataset are:
## Source: local data frame [6 x 10]
##
## carat cut color clarity depth table price x y z
## (dbl) (fctr) (fctr) (fctr) (dbl) (dbl) (int) (dbl) (dbl) (dbl)
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
Which ‘ggplot’ command would you use to generate the graph given below?