Question 1

Using a loop, print the integers from 1 to 50. (Hint: use the print() function.)

for (i in 1:50) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 11
## [1] 12
## [1] 13
## [1] 14
## [1] 15
## [1] 16
## [1] 17
## [1] 18
## [1] 19
## [1] 20
## [1] 21
## [1] 22
## [1] 23
## [1] 24
## [1] 25
## [1] 26
## [1] 27
## [1] 28
## [1] 29
## [1] 30
## [1] 31
## [1] 32
## [1] 33
## [1] 34
## [1] 35
## [1] 36
## [1] 37
## [1] 38
## [1] 39
## [1] 40
## [1] 41
## [1] 42
## [1] 43
## [1] 44
## [1] 45
## [1] 46
## [1] 47
## [1] 48
## [1] 49
## [1] 50

Question 2

Using a loop, add all the integers between 0 and 1000.

current.sum <- 0
for (i in 0:1000) {
  current.sum <- current.sum + i
}

current.sum
## [1] 500500

Now, add all the EVEN integers between 0 and 1000 (hint: use seq())

s <- seq(2,1000, 2)
cur.sum <- 0
for (i in s){
  cur.sum <- cur.sum + i
}
cur.sum
## [1] 250500

Now, repeat both sums (all integers, then only the even integers) WITHOUT using a loop.

1000*(1000+1)/2
## [1] 500500
1000*(1000+2)/4
## [1] 250500
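
The closed-form answers above follow from the identity 1 + 2 + ... + n = n(n + 1)/2. The even integers up to 1000 are 2 * (1 + 2 + ... + 500) = 500 * 501, which is the same as 1000 * 1002 / 4. As a cross-check, here is a minimal sketch using R's vectorized sum(); the names total.all and total.even are illustrative and not part of the original answer.

```r
# Vectorized equivalents of the two loops: sum() adds up a whole vector at once
total.all  <- sum(0:1000)                # all integers from 0 to 1000
total.even <- sum(seq(0, 1000, by = 2))  # even integers from 0 to 1000

# Compare against the closed-form expressions
total.all  == 1000 * (1000 + 1) / 2
total.even == 1000 * (1000 + 2) / 4
```

Both comparisons should return TRUE, since the loop, sum(), and the formula all compute the same quantity.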

Question 3

Here is a dataframe of survey data containing 5 questions I collected from 6 participants:

survey <- data.frame(
                     "participant" = c(1, 2, 3, 4, 5, 6),
                     "q1" = c(5, 3, 2, 7, 11, 0),
                     "q2" = c(4, 2, 2, 5, -10, 99),
                     "q3" = c(-4, -3, 4, 2, 9, 10),
                     "q4" = c(-30, 5, 2, 23, 4, 2),
                     "q5" = c(88, 4, -20, 2, 4, 2)
                     )

The response to each question should be an integer between 1 and 5. Obviously, we have some bad values in the dataframe. Let’s fix them.

Using a loop, create a new dataframe called survey.clean where all the invalid values (those that are not integers between 1 and 5) are set to NA.

Create a new object called survey.clean by assigning the original dataset to survey.clean.

Set the loop index to i.

Set the loop index.values to the vector of data columns.

In the loop code, assign the ith column of data to a new vector called data.temp.

Convert all invalid values in data.temp to NA (hint: use )

Assign data.temp back to the ith column of survey.clean.

Close the loop and let it run!

survey.clean <- survey
# Loop over the question columns only (column 1 is the participant ID)
for (i in 2:ncol(survey.clean)) {
  data.temp <- survey.clean[, i]
  data.temp[!(data.temp %in% 1:5)] <- NA
  survey.clean[, i] <- data.temp
}
survey.clean
##   participant q1 q2 q3 q4 q5
## 1           1  5  4 NA NA NA
## 2           2  3  2 NA  5  4
## 3           3  2  2  4  2 NA
## 4           4 NA  5  2 NA  2
## 5           5 NA NA NA  4  4
## 6           6 NA NA NA  2  2

Now, again using a loop, add a new column to the dataframe called “invalid.answers” that indicates, for each participant, how many bad answers they gave. Hint: Use the following steps:

Add a new column called invalid.answers to the dataframe, filled with NA values.

Create a loop over the rows of the dataframe.

Assign the data for the ith row to a new vector called part.i.

Calculate how many of the values in part.i are NA (use is.na()).

Assign the result to the ith row in invalid.answers.

survey.clean$invalid.answers <- rep(NA, nrow(survey.clean))
for (i in 1:nrow(survey.clean)) {
  # Question columns only: skip participant (first) and invalid.answers (last)
  part.i <- survey.clean[i, 2:(ncol(survey.clean) - 1)]
  survey.clean$invalid.answers[i] <- sum(is.na(part.i))
}
survey.clean
##   participant q1 q2 q3 q4 q5 invalid.answers
## 1           1  5  4 NA NA NA               3
## 2           2  3  2 NA  5  4               1
## 3           3  2  2  4  2 NA               1
## 4           4 NA  5  2 NA  2               2
## 5           5 NA NA NA  4  4               3
## 6           6 NA NA NA  2  2               3
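
For comparison, the same cleaning and counting can be done without explicit loops. The sketch below is one possible approach, not the method the exercise asks for; survey.alt and question.cols are hypothetical names introduced here, and the survey dataframe is recreated so the block runs on its own.

```r
# Recreate the survey data from the exercise
survey <- data.frame(
  participant = c(1, 2, 3, 4, 5, 6),
  q1 = c(5, 3, 2, 7, 11, 0),
  q2 = c(4, 2, 2, 5, -10, 99),
  q3 = c(-4, -3, 4, 2, 9, 10),
  q4 = c(-30, 5, 2, 23, 4, 2),
  q5 = c(88, 4, -20, 2, 4, 2)
)

# Hypothetical helper names: only the question columns are cleaned,
# so the participant ID column is left untouched
question.cols <- c("q1", "q2", "q3", "q4", "q5")
survey.alt <- survey

# Replace every value that is not one of 1, 2, 3, 4, 5 with NA, column by column
survey.alt[question.cols] <- lapply(
  survey.alt[question.cols],
  function(x) ifelse(x %in% 1:5, x, NA)
)

# Count the NA values in each row of the question columns
survey.alt$invalid.answers <- rowSums(is.na(survey.alt[question.cols]))
survey.alt$invalid.answers
```

Restricting both steps to question.cols keeps the participant IDs from being treated as answers, which matters here because participant 6 has an ID outside the 1 to 5 range.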

Question 4

Standardizing a variable means subtracting the mean, and then dividing by the standard deviation. Let’s use a loop to standardize the numeric columns in the pirates dataset.

Create a function called standardize.me() that takes a numeric vector as an argument, and returns the standardized version of the vector (hint: Look at the answers to WPA8!)

Assign all the numeric columns of the original pirates dataset to a new dataset called pirates.z

Using a loop and your new function, standardize all the variables in the pirates.z dataset.

pirates <- read.delim("~/Dropbox/RSeminar/pirates.txt")
standardize.me <- function(vec){
  result <- ( vec - mean(vec) ) / sd(vec)
  return (result)
}

pirates.z <- pirates
for (i in c(1,4,6,7,8,11,12,13)) {
  y <- pirates.z[,i]
  x <- standardize.me(y)
  pirates.z[,i] <- x
}
head(pirates.z)
##          id    sex headband        age college   tattoos    tchests
## 1 -1.729454 female      yes  0.4705692   JSSFP 0.4837266  1.8316209
## 2 -1.725992   male      yes -0.4326347    CCCC 1.6649880 -0.1670301
## 3 -1.722530   male      yes -0.4326347    CCCC 0.7790419 -0.3097909
## 4 -1.719067   male      yes  0.2899284   JSSFP 0.7790419 -1.0235949
## 5 -1.715605 female      yes  0.6512100    CCCC 2.2556187  0.5467738
## 6 -1.712142   male      yes  0.4705692   JSSFP 0.7790419 -0.7380733
##      parrots favorite.pirate sword.type  sword.time   eyepatch
## 1  1.6147531      Blackbeard    cutlass -0.16245384  0.6951229
## 2  0.1411021        Anicetus    cutlass -0.25404809  0.6951229
## 3 -0.2273107    Jack Sparrow    cutlass -0.09261323 -1.4371559
## 4 -0.5957234      Edward Low    cutlass -0.23687417  0.6951229
## 5  2.7199914        Anicetus    cutlass -0.18420748  0.6951229
## 6 -0.2273107    Jack Sparrow      sabre  1.33396715  0.6951229
##   beard.length       fav.pixar
## 1   -0.9591265              Up
## 2    0.6028098     Toy Story 2
## 3    1.0909148            Cars
## 4    1.0909148 The Incredibles
## 5   -0.9591265      Inside Out
## 6    1.1885359      Inside Out
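
The loop above picks the numeric columns by hard-coded position, which breaks if the column order changes. A possible alternative is to detect numeric columns with is.numeric(). The sketch below uses a small made-up dataframe called toy (an assumption for illustration, since pirates.txt is not available here) and a compact version of standardize.me().

```r
# Same standardization as in the exercise: subtract the mean, divide by the sd
standardize.me <- function(vec) {
  (vec - mean(vec)) / sd(vec)
}

# Hypothetical stand-in for the pirates data
toy <- data.frame(
  name    = c("a", "b", "c", "d"),
  age     = c(20, 30, 40, 50),
  parrots = c(0, 2, 1, 5)
)

# Find the numeric columns by type instead of by position
numeric.cols <- names(toy)[sapply(toy, is.numeric)]

# Keep only those columns, then standardize each one in a loop
toy.z <- toy[, numeric.cols, drop = FALSE]
for (col in numeric.cols) {
  toy.z[[col]] <- standardize.me(toy.z[[col]])
}
toy.z
```

Note that this approach would also pick up numeric ID columns or 0/1 indicators, so it is still worth checking which columns are meaningful to standardize.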

What should the mean and standard deviation of all your new standardized variables be? Test your prediction by running a loop.

for (i in c(1,4,6,7,8,11,12,13)) {
  y <- pirates.z[,i]
  m <- mean(y)
  s <- sd(y)
  print(i)
  print(m)
  print(s)
}
## [1] 1
## [1] 0
## [1] 1
## [1] 4
## [1] 9.581778e-17
## [1] 1
## [1] 6
## [1] -3.52138e-17
## [1] 1
## [1] 7
## [1] -1.107621e-18
## [1] 1
## [1] 8
## [1] -1.285316e-17
## [1] 1
## [1] 11
## [1] 7.051105e-18
## [1] 1
## [1] 12
## [1] -1.247169e-16
## [1] 1
## [1] 13
## [1] 7.438765e-17
## [1] 1
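
The means printed above are values such as 9.581778e-17 rather than exactly 0. This is floating-point rounding error, so they should be read as zero. One way to test the prediction without eyeballing the output is all.equal(), which compares numbers up to a small tolerance. The vector x below is made-up example data, not taken from the pirates dataset.

```r
# Made-up example data
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Standardize: subtract the mean, divide by the standard deviation
z <- (x - mean(x)) / sd(x)

# all.equal() tolerates tiny rounding errors; isTRUE() turns the result
# into a plain TRUE/FALSE
mean.is.zero <- isTRUE(all.equal(0, mean(z)))
sd.is.one    <- isTRUE(all.equal(1, sd(z)))
mean.is.zero
sd.is.one
```

A direct comparison such as mean(z) == 0 can return FALSE for the same data, which is why a tolerance-based check is the safer choice.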

Question 5

Using a loop, calculate the mean selling prices of the ships separated by the number of cannons they have.

library("yarrr")
## 
## Attaching package: 'yarrr'
## The following object is masked _by_ '.GlobalEnv':
## 
##     pirates
group <- unique(auction$cannons)
# Name the columns explicitly so the result is readable
mean.df <- data.frame(cannons = group, mean.price = NA)
for (i in seq_along(group)) {
  mean.df$mean.price[i] <- mean(auction$price[auction$cannons == group[i]])
}
mean.df
##    cannons mean.price
## 1       16  1254.7119
## 2       10   739.4052
## 3       12   832.2190
## 4        6   450.7007
## 5       14  1022.2750
## 6        4   226.6176
## 7        8   566.2791
## 8        2   273.0686
## 9       20  1423.1250
## 10      18  1426.8649
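
Outside of a loop exercise, grouped means like these are usually computed with tapply() or aggregate(). The sketch below shows tapply() on a small made-up dataframe called toy.auction (a hypothetical stand-in, so the block runs without the yarrr package). With the real data the call would presumably be tapply(auction$price, auction$cannons, mean).

```r
# Hypothetical stand-in for the auction data
toy.auction <- data.frame(
  cannons = c(2, 2, 4, 4, 4),
  price   = c(100, 200, 300, 400, 500)
)

# tapply() splits price by the values of cannons and applies mean() to each group
group.means <- tapply(toy.auction$price, toy.auction$cannons, mean)
group.means
```

The result is a named vector, sorted by group value, with one mean per number of cannons (here 150 for 2 cannons and 400 for 4 cannons).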