Source file ⇒ Lab_8.Rmd

1. The following loop can be done as a vectorized calculation. What is the equivalent vectorized calculation?

vec1 = 1:50
i = 1

while(i < 50) {
  vec1[i] <- i^2
  i <- i + 1
}
vec1
##  [1]    1    4    9   16   25   36   49   64   81  100  121  144  169  196
## [15]  225  256  289  324  361  400  441  484  529  576  625  676  729  784
## [29]  841  900  961 1024 1089 1156 1225 1296 1369 1444 1521 1600 1681 1764
## [43] 1849 1936 2025 2116 2209 2304 2401   50
vec2 <- 1:50
for (i in 1:(length(vec2) - 1)) {  # parentheses matter: 1:length(vec2)-1 parses as (1:50) - 1
  vec2[i] <- i^2
}
vec2
##  [1]    1    4    9   16   25   36   49   64   81  100  121  144  169  196
## [15]  225  256  289  324  361  400  441  484  529  576  625  676  729  784
## [29]  841  900  961 1024 1089 1156 1225 1296 1369 1444 1521 1600 1681 1764
## [43] 1849 1936 2025 2116 2209 2304 2401   50
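
Both loops above still work element by element. A vectorized version, sketched below, squares the whole index vector in one step and leaves element 50 untouched, as the loops do. The names vec3, vec4, and idx are illustrative and not part of the original lab.

```r
# Vectorized equivalent of the loops above (vec3, vec4, idx are illustrative names).
vec3 <- 1:50
idx <- 1:49              # the loops stop before element 50
vec3[idx] <- idx^2       # square all 49 indices in one step
vec3

# The same result built as a single expression:
vec4 <- c((1:49)^2, 50)
```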

2. Write down what the vector x will contain after each line of R code, if they are executed sequentially.

  1. 0 2 4 6 8

  2. NA NA 4 6 8

  3. 0 0 0 0 0

  4. 0

3. Fill in the appropriate types of R objects in this sentence: The lapply function in R operates on _______ and returns ________. There may be more than one correct answer.

  1. a list or vector
  2. a list
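
A small illustration of the answer above: lapply() is given first an atomic vector and then a list, and returns a list both times. The object names squares and totals are made up for this sketch.

```r
# lapply() accepts a vector or a list and always returns a list.
squares <- lapply(1:3, function(k) k^2)           # input: atomic vector
totals  <- lapply(list(a = 1:3, b = 4:6), sum)    # input: named list

is.list(squares)   # TRUE
names(totals)      # "a" "b" (names of the input list are kept)
```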

4. Suppose we have a 5000 by 3 matrix m

set.seed(1337)
m <- matrix(runif(15000, -3, 3), ncol = 3)
m.ssq.loop <- rep(0, length.out = 5000)  # preallocate the result vector for the loop
  1. We need to create a vector containing the sum of the squared entries in each row of m. Write R code to do this in two different ways:

Loop Method

for (i in 1:5000){
  m.ssq.loop[i] <- sum(m[i,]^2)
}
head(m.ssq.loop)
## [1] 10.219360  4.811994  9.842371  7.161204  6.058554 14.288626

apply Method

m.ssq.apply <- apply(m, 1, function(x) {sum(x^2)}) 
head(m.ssq.apply)
## [1] 10.219360  4.811994  9.842371  7.161204  6.058554 14.288626
  2. Write a script that checks whether your answers in part 1 are the same. It should return TRUE if the two vectors are exactly the same and FALSE otherwise. This is called a sanity check, and you should do them often when coding.
identical(m.ssq.loop,m.ssq.apply)
## [1] TRUE
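
identical() is a strict, bit-for-bit comparison. When two results are computed by different routes, tiny floating-point differences can make it return FALSE even though the values agree for practical purposes. The sketch below adds a third, fully vectorized method with rowSums() and compares it to the apply result with all.equal(), which allows a small numeric tolerance. The names ssq.apply and ssq.rowsums are illustrative.

```r
set.seed(1337)
m <- matrix(runif(15000, -3, 3), ncol = 3)

ssq.apply   <- apply(m, 1, function(x) sum(x^2))
ssq.rowsums <- rowSums(m^2)   # fully vectorized alternative

# all.equal() compares within a small numeric tolerance; wrap it in isTRUE()
# because it returns a character description, not FALSE, on a mismatch.
isTRUE(all.equal(ssq.apply, ssq.rowsums))
```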

5. table1 is given below:

Year <- c(2000, 2001)
Algeria <- c(7 ,9)
Brazil <- c(12, NA)
Columbia <- c(16, 18)

table1 <- data.frame(Year, Algeria, Brazil, Columbia)
table1
##   Year Algeria Brazil Columbia
## 1 2000       7     12       16
## 2 2001       9     NA       18
  1. Wrangle table1 into the object table2 as shown below:
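library(dplyr)  # provides %>%, group_by(), mutate()
library(tidyr)  # provides gather() and spread()

table2 <- table1 %>%
  gather(key = Country, value = Value, -Year)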

table2
##   Year  Country Value
## 1 2000  Algeria     7
## 2 2001  Algeria     9
## 3 2000   Brazil    12
## 4 2001   Brazil    NA
## 5 2000 Columbia    16
## 6 2001 Columbia    18
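
In current versions of tidyr (1.0.0 and later), gather() is superseded by pivot_longer(). A sketch of the same reshaping is shown below, assuming such a version is installed. pivot_longer() emits rows in a different order than gather(), so arrange() is added to match the printed table. The name table2_alt is illustrative, and the result is a tibble instead of a plain data frame.

```r
library(dplyr)
library(tidyr)

table1 <- data.frame(
  Year     = c(2000, 2001),
  Algeria  = c(7, 9),
  Brazil   = c(12, NA),
  Columbia = c(16, 18)
)

# Same long format as gather(key = Country, value = Value, -Year)
table2_alt <- table1 %>%
  pivot_longer(cols = -Year, names_to = "Country", values_to = "Value") %>%
  arrange(Country, Year)   # reorder rows to match the gather() output
```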
  2. Use table2 to produce table3 as printed below:
table3 <- table2 %>%
  group_by(Year) %>%
  mutate(Average = mean(as.numeric(Value), na.rm = TRUE)) %>%
  spread(key = Country, value = Value) 

table3
## Source: local data frame [2 x 5]
## Groups: Year [2]
## 
##    Year  Average Algeria Brazil Columbia
##   (dbl)    (dbl)   (dbl)  (dbl)    (dbl)
## 1  2000 11.66667       7     12       16
## 2  2001 13.50000       9     NA       18
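
For newer tidyr versions, spread() has a counterpart in pivot_wider(). The sketch below rebuilds table2 by hand so that it runs on its own, then repeats the per-year average and the widening step. The name table3_alt is illustrative, and ungroup() is an addition here so the result is no longer grouped by Year.

```r
library(dplyr)
library(tidyr)

table2 <- data.frame(
  Year    = rep(c(2000, 2001), times = 3),
  Country = rep(c("Algeria", "Brazil", "Columbia"), each = 2),
  Value   = c(7, 9, 12, NA, 16, 18)
)

table3_alt <- table2 %>%
  group_by(Year) %>%
  mutate(Average = mean(Value, na.rm = TRUE)) %>%   # NA for Brazil 2001 is skipped
  ungroup() %>%
  pivot_wider(names_from = Country, values_from = Value)
```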

6. Consider the following graphic:

  1. The glyphs are lines and points on the graph

  2. x = education, y = wage, color = sex

  3. red - Female, blue - Male

  4. Education and Wage = quantitative, sex = qualitative

  5. The smooth line (a linear fit for each sex) helps readers see the overall trend in wage as education increases, which is hard to judge from the scattered points alone.

plot <- CPS85 %>%
  ggplot(aes(x = educ, y = wage, col = sex)) +
  geom_point(size = 3) +
  ylim(0, 15)
plot2 <- plot +
  geom_smooth(method = lm) +
  ggtitle("Wage vs. Education in CPS85")
plot2 + theme(plot.title = element_text(size = rel(3)))
## Warning: Removed 55 rows containing non-finite values (stat_smooth).
## Warning: Removed 55 rows containing missing values (geom_point).
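
The two warnings arise because ylim(0, 15) sets the scale limits: observations outside that range are turned into missing values and dropped before both the points and the fitted lines are computed. If the goal is only to zoom in, coord_cartesian() keeps every row in the fit. The sketch below uses a small made-up data frame (toy, with hypothetical values) in place of CPS85 so that it runs without the original data.

```r
library(ggplot2)

# Toy stand-in for CPS85 (hypothetical values, for illustration only).
toy <- data.frame(
  educ = c(8, 10, 12, 12, 14, 16, 16, 18),
  wage = c(4, 6, 7, 9, 11, 14, 22, 25),
  sex  = rep(c("F", "M"), times = 4)
)

p_zoom <- ggplot(toy, aes(x = educ, y = wage, col = sex)) +
  geom_point(size = 3) +
  geom_smooth(method = lm) +
  coord_cartesian(ylim = c(0, 15))   # zoom only; no rows are dropped from the fit
```

Printing p_zoom draws the plot. Wages above 15 fall outside the visible window but still influence the fitted lines.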

7. Consider the string my.string given below:

my.string <- "ggplot2 is a data visualization package for the statistical programming language R"
  1. Write a function SplitChars that takes a character string as input and splits it into a vector of single character elements. Hint: the function strsplit() may be useful. The result of calling the function on my.string is shown.
SplitChars <- function(x) {
  return(strsplit(x, "")[[1]])
} 
string.split <- SplitChars(my.string)
string.split
##  [1] "g" "g" "p" "l" "o" "t" "2" " " "i" "s" " " "a" " " "d" "a" "t" "a"
## [18] " " "v" "i" "s" "u" "a" "l" "i" "z" "a" "t" "i" "o" "n" " " "p" "a"
## [35] "c" "k" "a" "g" "e" " " "f" "o" "r" " " "t" "h" "e" " " "s" "t" "a"
## [52] "t" "i" "s" "t" "i" "c" "a" "l" " " "p" "r" "o" "g" "r" "a" "m" "m"
## [69] "i" "n" "g" " " "l" "a" "n" "g" "u" "a" "g" "e" " " "R"
  2. Write a for loop that counts the number of times that a, s, R, or r appear in my.string and saves the result in an object called count.
count <- 0
for (ch in string.split) {
  # %in% tests the current character against all four targets at once
  if (ch %in% c("a", "s", "R", "r")) {
    count <- count + 1
  }
}
print(count)
## [1] 20
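
As a sanity check on the loop, the same count can be obtained without a loop: %in% returns a logical vector, and sum() counts its TRUE values. This is a sketch; the names chars and count.vec are illustrative.

```r
my.string <- "ggplot2 is a data visualization package for the statistical programming language R"
chars <- strsplit(my.string, "")[[1]]

# TRUE for every character that is one of the four targets; sum() counts them
count.vec <- sum(chars %in% c("a", "s", "R", "r"))
count.vec
## [1] 20
```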
  3. Write a function Reverse that reverses the order of a vector of single characters. The result of using it on SplitChars(my.string) is shown below.
Reverse <- function(x) {
  rev(x)
}

my.string %>%
  SplitChars %>%
  Reverse
##  [1] "R" " " "e" "g" "a" "u" "g" "n" "a" "l" " " "g" "n" "i" "m" "m" "a"
## [18] "r" "g" "o" "r" "p" " " "l" "a" "c" "i" "t" "s" "i" "t" "a" "t" "s"
## [35] " " "e" "h" "t" " " "r" "o" "f" " " "e" "g" "a" "k" "c" "a" "p" " "
## [52] "n" "o" "i" "t" "a" "z" "i" "l" "a" "u" "s" "i" "v" " " "a" "t" "a"
## [69] "d" " " "a" " " "s" "i" " " "2" "t" "o" "l" "p" "g" "g"