Question 1
Using a loop, print the integers from 1 to 50. (Hint: use the print() function.)
for (i in 1:50) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 11
## [1] 12
## [1] 13
## [1] 14
## [1] 15
## [1] 16
## [1] 17
## [1] 18
## [1] 19
## [1] 20
## [1] 21
## [1] 22
## [1] 23
## [1] 24
## [1] 25
## [1] 26
## [1] 27
## [1] 28
## [1] 29
## [1] 30
## [1] 31
## [1] 32
## [1] 33
## [1] 34
## [1] 35
## [1] 36
## [1] 37
## [1] 38
## [1] 39
## [1] 40
## [1] 41
## [1] 42
## [1] 43
## [1] 44
## [1] 45
## [1] 46
## [1] 47
## [1] 48
## [1] 49
## [1] 50
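Printing is fine for a quick look, but a loop can also store each value so the result can be reused later. A minimal sketch, assuming we want the integers collected in a vector (the name `values` is illustrative, not part of the assignment), that preallocates the vector and fills it by index:

```r
# Preallocate an integer vector with one slot per value
values <- integer(50)

# Fill slot i with the loop index i
for (i in 1:50) {
  values[i] <- i
}

values
```

Preallocating with `integer(50)` avoids growing the vector on every iteration, which is the usual R idiom when a loop builds up a result.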
Question 2
Using a loop, add all the integers between 0 and 1000.
current.sum <- 0
for (i in 0:1000) {
  current.sum <- current.sum + i
}
current.sum
## [1] 500500
Now, add all the EVEN integers between 0 and 1000 (hint: use seq()).
s <- seq(2, 1000, by = 2)   # 2, 4, ..., 1000; including 0 would not change the sum
cur.sum <- 0
for (i in s) {
  cur.sum <- cur.sum + i
}
cur.sum
## [1] 250500
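Instead of generating only the even numbers with `seq()`, the loop can walk over every integer from 0 to 1000 and use the modulo operator `%%` to keep the even ones. A sketch of that alternative (the name `even.sum` is illustrative):

```r
even.sum <- 0

for (i in 0:1000) {
  # i %% 2 is the remainder after dividing by 2; it is 0 for even numbers
  if (i %% 2 == 0) {
    even.sum <- even.sum + i
  }
}

even.sum   # [1] 250500
```

This does twice as many iterations as looping over `seq(2, 1000, by = 2)`, so the `seq()` version is the more direct choice here; the `%%` test is useful when the filtering condition cannot be expressed as a regular sequence.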
Now, repeat both sums (all integers and even integers from 0 to 1000) WITHOUT using a loop.
1000 * (1000 + 1) / 2   # 0 + 1 + ... + 1000
## [1] 500500
1000 * (1000 + 2) / 4   # 0 + 2 + ... + 1000
## [1] 250500
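Both closed-form expressions come from the arithmetic-series formula: 0 + 1 + ... + n = n(n + 1)/2, and for even n the even numbers give 0 + 2 + ... + n = 2 × (1 + 2 + ... + n/2) = 2 × (n/2)(n/2 + 1)/2 = n(n + 2)/4. With n = 1000 these are 500500 and 250500. The built-in `sum()` gives the same results directly from the vectors, as this short check sketches:

```r
n <- 1000

# sum() adds up every element of a vector, no loop needed
total.sum <- sum(0:n)
even.sum  <- sum(seq(0, n, by = 2))

# Closed-form versions of the same sums
total.formula <- n * (n + 1) / 2   # 0 + 1 + ... + n
even.formula  <- n * (n + 2) / 4   # 0 + 2 + ... + n, valid when n is even

c(total.sum, even.sum)
c(total.formula, even.formula)
```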
Question 3
Here is a dataframe of survey data I collected from 6 participants, each of whom answered 5 questions:
survey <- data.frame(
  "participant" = c(1, 2, 3, 4, 5, 6),
  "q1" = c(5, 3, 2, 7, 11, 0),
  "q2" = c(4, 2, 2, 5, -10, 99),
  "q3" = c(-4, -3, 4, 2, 9, 10),
  "q4" = c(-30, 5, 2, 23, 4, 2),
  "q5" = c(88, 4, -20, 2, 4, 2)
)
The response to each question should be an integer between 1 and 5. Obviously, we have some bad values in the dataframe. Let’s fix them.
Using a loop, create a new dataframe called survey.clean where all the invalid values (those that are not integers between 1 and 5) are set to NA.
Create a new object called survey.clean by assigning the original dataset to survey.clean.
Set the loop index to i.
Set the loop index.values to the vector of data columns (the question columns q1 to q5, not the participant column).
In the loop code, assign the ith column of data to a new vector called data.temp.
Convert all invalid values in data.temp to NA (hint: use )
Assign data.temp back to the ith column of survey.clean.
Close the loop and let it run!
survey.clean <- survey
# Loop over the question columns only (q1 to q5 are columns 2 to 6),
# so the participant IDs are left untouched
for (i in 2:ncol(survey.clean)) {
  data.temp <- survey.clean[, i]
  # Any value that is not one of the integers 1 to 5 becomes NA
  data.temp[!(data.temp %in% 1:5)] <- NA
  survey.clean[, i] <- data.temp
}
survey.clean
##   participant q1 q2 q3 q4 q5
## 1           1  5  4 NA NA NA
## 2           2  3  2 NA  5  4
## 3           3  2  2  4  2 NA
## 4           4 NA  5  2 NA  2
## 5           5 NA NA NA  4  4
## 6           6 NA NA NA  2  2
Now, again using a loop, add a new column to the dataframe called “invalid.answers” that indicates, for each participant, how many bad answers they gave. Hint: use the following steps:
Add a new column invalid.answers to the dataframe, filled with NA values.
Create a loop over the rows of the dataframe.
Assign the data for the ith row to a new vector called part.i.
Calculate how many of the values in part.i are NA (use is.na()).
Assign the result to the ith element of invalid.answers.
survey.clean$invalid.answers <- rep(NA, nrow(survey.clean))
for (i in 1:nrow(survey.clean)) {
  # Answers of participant i to q1 to q5 (columns 2 to 6)
  part.i <- survey.clean[i, 2:6]
  # is.na() returns TRUE/FALSE values; sum() counts the TRUEs
  survey.clean$invalid.answers[i] <- sum(is.na(part.i))
}
survey.clean
##   participant q1 q2 q3 q4 q5 invalid.answers
## 1           1  5  4 NA NA NA               3
## 2           2  3  2 NA  5  4               1
## 3           3  2  2  4  2 NA               1
## 4           4 NA  5  2 NA  2               2
## 5           5 NA NA NA  4  4               3
## 6           6 NA NA NA  2  2               3
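Both steps of this question can also be done without an explicit loop: `lapply()` applies the cleaning rule to each question column, and `rowSums()` counts the NAs per participant, since `is.na()` on a data frame returns a logical matrix and `TRUE` counts as 1. A sketch, assuming the question columns are q1 to q5; `survey.alt` and `q.cols` are illustrative names, and the survey data is re-created so the block runs on its own:

```r
# Survey data from above, re-created so this block runs on its own
survey <- data.frame(
  participant = c(1, 2, 3, 4, 5, 6),
  q1 = c(5, 3, 2, 7, 11, 0),
  q2 = c(4, 2, 2, 5, -10, 99),
  q3 = c(-4, -3, 4, 2, 9, 10),
  q4 = c(-30, 5, 2, 23, 4, 2),
  q5 = c(88, 4, -20, 2, 4, 2)
)
q.cols <- c("q1", "q2", "q3", "q4", "q5")
survey.alt <- survey

# Clean every question column at once: values outside 1:5 become NA
survey.alt[q.cols] <- lapply(survey.alt[q.cols], function(x) {
  x[!(x %in% 1:5)] <- NA
  x
})

# is.na() gives a logical matrix; rowSums() counts the TRUEs in each row
survey.alt$invalid.answers <- rowSums(is.na(survey.alt[q.cols]))

survey.alt
```

Selecting the question columns by name rather than by position keeps the participant column out of both steps even if the column order changes.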
Question 4
Standardizing a variable means subtracting the mean, and then dividing by the standard deviation. Let’s use a loop to standardize the numeric columns in the pirates dataset.
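Written out for a vector x of length n, where R's sd() uses the n − 1 denominator:

```latex
z_i = \frac{x_i - \bar{x}}{s}, \qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```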
Create a function called standardize.me() that takes a numeric vector as an argument and returns the standardized version of the vector (hint: look at the answers to WPA8!).
Assign all the numeric columns of the original pirates dataset to a new dataset called pirates.z.
Using a loop and your new function, standardize all the variables in the pirates.z dataset.
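pirates <- read.delim("~/Dropbox/RSeminar/pirates.txt")
# Subtract the mean, then divide by the standard deviation
standardize.me <- function(vec) {
  result <- (vec - mean(vec)) / sd(vec)
  return(result)
}
# pirates.z starts as a copy of all columns; only the numeric ones are standardized
pirates.z <- pirates
# Numeric columns: id, age, tattoos, tchests, parrots, sword.time, eyepatch, beard.length
for (i in c(1, 4, 6, 7, 8, 11, 12, 13)) {
  y <- pirates.z[, i]
  x <- standardize.me(y)
  pirates.z[, i] <- x
}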
head(pirates.z)
## id sex headband age college tattoos tchests
## 1 -1.729454 female yes 0.4705692 JSSFP 0.4837266 1.8316209
## 2 -1.725992 male yes -0.4326347 CCCC 1.6649880 -0.1670301
## 3 -1.722530 male yes -0.4326347 CCCC 0.7790419 -0.3097909
## 4 -1.719067 male yes 0.2899284 JSSFP 0.7790419 -1.0235949
## 5 -1.715605 female yes 0.6512100 CCCC 2.2556187 0.5467738
## 6 -1.712142 male yes 0.4705692 JSSFP 0.7790419 -0.7380733
## parrots favorite.pirate sword.type sword.time eyepatch
## 1 1.6147531 Blackbeard cutlass -0.16245384 0.6951229
## 2 0.1411021 Anicetus cutlass -0.25404809 0.6951229
## 3 -0.2273107 Jack Sparrow cutlass -0.09261323 -1.4371559
## 4 -0.5957234 Edward Low cutlass -0.23687417 0.6951229
## 5 2.7199914 Anicetus cutlass -0.18420748 0.6951229
## 6 -0.2273107 Jack Sparrow sabre 1.33396715 0.6951229
## beard.length fav.pixar
## 1 -0.9591265 Up
## 2 0.6028098 Toy Story 2
## 3 1.0909148 Cars
## 4 1.0909148 The Incredibles
## 5 -0.9591265 Inside Out
## 6 1.1885359 Inside Out
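Base R's `scale()` performs the same centering and scaling: with its default arguments it subtracts the mean and divides by the standard deviation, returning a one-column matrix for a vector. A quick sketch comparing it with standardize.me() on a made-up vector (`x` is illustrative data, not taken from pirates):

```r
standardize.me <- function(vec) {
  (vec - mean(vec)) / sd(vec)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)

z.manual <- standardize.me(x)
z.scale  <- as.vector(scale(x))   # drop the matrix dimensions and attributes

all.equal(z.manual, z.scale)   # [1] TRUE
```

Note that standardize.me() as written returns all NA if the vector contains any NA, because mean() and sd() use na.rm = FALSE by default.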
What should the mean and standard deviation of each new standardized variable be? Test your prediction by running a loop.
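for (i in c(1, 4, 6, 7, 8, 11, 12, 13)) {
  y <- pirates.z[, i]
  m <- mean(y)
  s <- sd(y)
  print(i)   # column number
  print(m)   # mean of the standardized column
  print(s)   # standard deviation of the standardized column
}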
## [1] 1
## [1] 0
## [1] 1
## [1] 4
## [1] 9.581778e-17
## [1] 1
## [1] 6
## [1] -3.52138e-17
## [1] 1
## [1] 7
## [1] -1.107621e-18
## [1] 1
## [1] 8
## [1] -1.285316e-17
## [1] 1
## [1] 11
## [1] 7.051105e-18
## [1] 1
## [1] 12
## [1] -1.247169e-16
## [1] 1
## [1] 13
## [1] 7.438765e-17
## [1] 1
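The prediction holds: every standardized column has mean 0 and standard deviation 1. The means print as values like 9.581778e-17 rather than exactly 0 because of floating-point rounding, so a check should compare within a tolerance. A sketch of that check without a loop, using R's built-in mtcars data as a stand-in for pirates (all of its columns are numeric; `z`, `means` and `sds` are illustrative names):

```r
standardize.me <- function(vec) {
  (vec - mean(vec)) / sd(vec)
}

# Standardize every column of a numeric data frame
z <- as.data.frame(lapply(mtcars, standardize.me))

means <- sapply(z, mean)
sds   <- sapply(z, sd)

# Means should be 0 and SDs 1, up to floating-point error
all(abs(means) < 1e-10)
all.equal(unname(sds), rep(1, ncol(z)))
```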
Question 5
Using a loop, calculate the mean selling prices of the ships separated by the number of cannons they have.
library("yarrr")
##
## Attaching package: 'yarrr'
## The following object is masked _by_ '.GlobalEnv':
##
## pirates
# Each distinct number of cannons defines one group
group <- unique(auction$cannons)
mean.df <- data.frame(cannons = group, mean.price = NA)
for (i in 1:length(group)) {
  # Mean price of the ships with exactly group[i] cannons
  m <- mean(auction$price[auction$cannons == group[i]])
  mean.df$mean.price[i] <- m
}
mean.df
##    cannons mean.price
## 1       16  1254.7119
## 2       10   739.4052
## 3       12   832.2190
## 4        6   450.7007
## 5       14  1022.2750
## 6        4   226.6176
## 7        8   566.2791
## 8        2   273.0686
## 9       20  1423.1250
## 10      18  1426.8649
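R can also compute group means without a loop. A sketch using `tapply()` and `aggregate()` on a small made-up data frame with the same column names as auction (`auction.demo` and its values are hypothetical, not the real auction data):

```r
auction.demo <- data.frame(
  cannons = c(2, 4, 2, 6, 4, 6),
  price   = c(100, 200, 300, 400, 600, 800)
)

# tapply(): split price by cannons, then apply mean() to each group
means.by.tapply <- tapply(auction.demo$price, auction.demo$cannons, mean)
means.by.tapply

# aggregate(): same idea, returned as a data frame sorted by cannons
means.by.aggregate <- aggregate(price ~ cannons, data = auction.demo, FUN = mean)
means.by.aggregate
```

Unlike the loop over unique(auction$cannons), both functions return the groups in sorted order.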