WPA #9: Loops

Question 1

Using a loop, print the integers from 1 to 50. (Hint: use the print() function.)
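One possible sketch in R, the language this WPA uses. The loop index i takes each value of the vector 1:50 in turn, and print() writes it to the console.

```r
# Print each integer from 1 to 50 on its own line
for (i in 1:50) {
  print(i)
}
```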

Question 2

Using a loop, add all the integers between 0 and 1000.
Now, add all the EVEN integers between 0 and 1000 (hint: use seq()).
Now, repeat the two previous tasks WITHOUT using a loop.
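A minimal sketch of all three tasks, assuming "between 0 and 1000" includes both endpoints. The loop versions keep a running total that grows by one value per iteration. The loop-free versions hand the whole vector to sum().

```r
# Sum the integers 0 to 1000 with a loop
total <- 0
for (i in 0:1000) {
  total <- total + i
}
total        # 500500

# Sum only the even integers: seq() steps from 0 to 1000 by 2
total.even <- 0
for (i in seq(from = 0, to = 1000, by = 2)) {
  total.even <- total.even + i
}
total.even   # 250500

# The same two sums without a loop
sum(0:1000)
sum(seq(from = 0, to = 1000, by = 2))
```

Excluding 0 would not change either result. Excluding 1000 would lower both sums by 1000.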

Question 3

Here is a dataframe of survey data containing responses to 5 questions that I collected from 6 participants:

survey <- data.frame(
                     "participant" = c(1, 2, 3, 4, 5, 6),
                     "q1" = c(5, 3, 2, 7, 11, 0),
                     "q2" = c(4, 2, 2, 5, -10, 99),
                     "q3" = c(-4, -3, 4, 2, 9, 10),
                     "q4" = c(-30, 5, 2, 23, 4, 2),
                     "q5" = c(88, 4, -20, 2, 4, 2)
                     )

The response to each question should be an integer between 1 and 5. Obviously, we have some bad values in the dataframe. Let’s fix them.

Using a loop, create a new dataframe called survey.clean where all the invalid values (those that are not integers between 1 and 5) are set to NA.

Create a new object called survey.clean by assigning the original survey dataframe to it.
Set the loop index to i.
Set the loop's index values to the vector of data column numbers (2 to 6, since column 1 holds the participant IDs).
In the loop code, assign the ith column of data to a new vector called data.temp.
Convert all invalid values in data.temp to NA (hint: use )
Assign data.temp back to the ith column of survey.clean.
Close the loop and let it run!
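One way the steps above could look in R. This sketch counts a value as valid only when it is one of the whole numbers 1 to 5 and tests that with the %in% operator. That is one choice among several, and the hint may have a different function in mind.

```r
# The survey data from the question
survey <- data.frame(
  "participant" = c(1, 2, 3, 4, 5, 6),
  "q1" = c(5, 3, 2, 7, 11, 0),
  "q2" = c(4, 2, 2, 5, -10, 99),
  "q3" = c(-4, -3, 4, 2, 9, 10),
  "q4" = c(-30, 5, 2, 23, 4, 2),
  "q5" = c(88, 4, -20, 2, 4, 2)
)

# Start from a copy of the original data
survey.clean <- survey

# Loop over the data columns; column 1 holds participant IDs
for (i in 2:ncol(survey.clean)) {
  # Pull out the ith column
  data.temp <- survey.clean[, i]
  # Anything that is not one of the whole numbers 1-5 becomes NA
  data.temp[!(data.temp %in% 1:5)] <- NA
  # Write the cleaned column back
  survey.clean[, i] <- data.temp
}

survey.clean
```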

Now, again using a loop, add a new column to survey.clean called “invalid.answers” that indicates, for each participant, how many bad answers they gave.

Hint: Use the following steps:

Add a new column called invalid.answers to the dataframe, with every value set to NA.
Create a loop over the rows of the dataframe.
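Assign the question responses (q1 to q5) in the ith row to a new vector called part.i. Leave out the other columns: invalid.answers is still NA and would be counted as a bad answer.
Calculate how many of the values in part.i are NA (hint: use sum() with is.na()).
Assign the result to the ith element of invalid.answers.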
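Here is a sketch of the counting loop. It assumes survey.clean already holds the cleaned values. So the block can run on its own, the literal below reproduces what the cleaning loop from the first part of this question should produce. Only the five question columns go into part.i. If you used the whole row, the NA placeholder in invalid.answers would also be counted.

```r
# Cleaned survey data, as the cleaning loop should produce it
survey.clean <- data.frame(
  participant = 1:6,
  q1 = c(5, 3, 2, NA, NA, NA),
  q2 = c(4, 2, 2, 5, NA, NA),
  q3 = c(NA, NA, 4, 2, NA, NA),
  q4 = c(NA, 5, 2, NA, 4, 2),
  q5 = c(NA, 4, NA, 2, 4, 2)
)

# Start the new column as all NA
survey.clean$invalid.answers <- NA

for (i in 1:nrow(survey.clean)) {
  # Responses to q1-q5 for participant i, as a plain vector
  part.i <- unlist(survey.clean[i, c("q1", "q2", "q3", "q4", "q5")])
  # Number of invalid (NA) responses
  survey.clean$invalid.answers[i] <- sum(is.na(part.i))
}

survey.clean
```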

Question 4

Standardizing a variable means subtracting the mean, and then dividing by the standard deviation. Let’s use a loop to standardize the numeric columns in the pirates dataset. You can access this dataset in the yarrr package, or by downloading it from http://nathanieldphillips.com/wp-content/uploads/2016/01/pirates.txt

Create a function called standardize.me() that takes a numeric vector as an argument and returns the standardized version of the vector (hint: look at the answers to WPA #8!)
Assign all the numeric columns of the original pirates dataset to a new dataset called pirates.z.
Using a loop and your new function, standardize all the variables in the pirates.z dataset.
What should the mean and standard deviation of each new standardized variable be? Test your prediction by running a loop.
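Here is a sketch of the three steps. The pirates data may not be loaded, so the sketch runs on a small made-up stand-in called pirates.demo. That name and its values are hypothetical; use pirates from yarrr in their place. sapply(..., is.numeric) is one way to keep only the numeric columns. The na.rm = TRUE arguments are an added precaution in case a column contains missing values.

```r
# Standardize: subtract the mean, then divide by the standard deviation
standardize.me <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Hypothetical stand-in for pirates (made-up values)
pirates.demo <- data.frame(
  name = c("Anne", "Jack", "Mary", "Will"),
  age = c(20, 30, 40, 50),
  height = c(160, 170, 175, 185)
)

# Keep only the numeric columns (list-style indexing keeps a dataframe)
pirates.z <- pirates.demo[sapply(pirates.demo, is.numeric)]

# Standardize each column in place
for (i in 1:ncol(pirates.z)) {
  pirates.z[, i] <- standardize.me(pirates.z[, i])
}

# Check the prediction for each standardized column
for (i in 1:ncol(pirates.z)) {
  cat(names(pirates.z)[i],
      "mean =", round(mean(pirates.z[, i], na.rm = TRUE), 3),
      "sd =", round(sd(pirates.z[, i], na.rm = TRUE), 3), "\n")
}
```

The check loop should report a mean of 0 and a standard deviation of 1 for every column, up to floating-point rounding.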

For Questions 5 and 6, we’ll use the auction dataset in the yarrr package. This dataset shows the selling prices of 1,000 pirate ships sold at an auction. If you can’t access the yarrr package, you can download the dataset from this link: http://nathanieldphillips.com/wp-content/uploads/2016/01/auction.txt

Question 5

Using a loop, calculate the mean selling price of ships for each number of cannons.
Now do the same thing without a loop (for example, with aggregate() or dplyr).
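Here is a sketch using a tiny made-up stand-in called auction.demo (hypothetical values). It assumes the real auction data stores the number of cannons in a column called cannons and the selling price in a column called price. Check names(auction) before reusing it. The loop computes one mean per distinct cannon count. aggregate() with a formula does the same grouping in a single call.

```r
# Hypothetical stand-in for the auction data (made-up values)
auction.demo <- data.frame(
  cannons = c(2, 4, 2, 6, 4, 6),
  price = c(1000, 1500, 1200, 2100, 1700, 1900)
)

# With a loop: one mean per distinct number of cannons
cannon.values <- sort(unique(auction.demo$cannons))
mean.price <- rep(NA, length(cannon.values))
for (i in seq_along(cannon.values)) {
  prices.i <- auction.demo$price[auction.demo$cannons == cannon.values[i]]
  mean.price[i] <- mean(prices.i)
}
data.frame(cannons = cannon.values, mean.price = mean.price)

# Without a loop: aggregate() splits price by cannons and applies mean()
aggregate(price ~ cannons, data = auction.demo, FUN = mean)
```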

Question 6

Using a loop, create 10 histograms showing the selling prices of ships with condition values of 1, 2, 3, …, 10. Put them all in one plot using par(mfrow = c(X, Y)). In the main title of each plot, indicate which condition value is being plotted. Also, include a vertical line showing the mean selling price of the ships in that plot.
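Here is a sketch of the histogram loop on simulated stand-in data. auction.demo is hypothetical, and its prices are random numbers, not real auction results. The sketch assumes a condition column with values 1 to 10 and a price column. It draws each vertical line at the mean price of that panel's ships. par(mfrow = c(2, 5)) arranges the 10 panels in 2 rows of 5.

```r
# Simulated stand-in for the auction data: random prices, not real results
set.seed(1)
condition <- rep(1:10, each = 20)
auction.demo <- data.frame(
  condition = condition,
  price = rnorm(200, mean = 1000 + 100 * condition, sd = 200)
)

# 10 panels: 2 rows of 5
par(mfrow = c(2, 5))
for (i in 1:10) {
  # Selling prices of the ships with condition i
  prices.i <- auction.demo$price[auction.demo$condition == i]
  hist(prices.i,
       main = paste("Condition =", i),
       xlab = "Selling price")
  # Vertical line at the mean price of the ships in this panel
  abline(v = mean(prices.i), col = "red", lwd = 2)
}
par(mfrow = c(1, 1))  # reset the layout
```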

Question 7

Have you heard of the term “p-hacking”? Unfortunately it has nothing to do with pirates. It describes how some researchers will conduct as many tests as they can in order to get a test with a p-value less than .05 (which they can then say they predicted all along!). Here’s how easy it is to p-hack using a loop.

In this example, we’ll see which variables are correlated with people’s heights. Let’s start by creating a vector of heights using the rnorm() function. The code below generates 100 heights from a normal distribution with mean 170 and standard deviation 10, rounded to whole numbers.

height <- round(rnorm(100, mean = 170, sd = 10), 0)

Create a histogram of the height data.
Now, let’s create some made-up survey data. This data will represent the result of a survey with 100 different questions for each of the 100 participants. Create a matrix called survey with 100 rows and 100 columns using the following code. As you can see, these data have nothing to do with our height data. Therefore, any significant p-values we get by correlating these values with our height data will be false alarms!

survey <- matrix(rnorm(n = 100 * 100, mean = 0, sd = 1),
                    nrow = 100, ncol = 100
                    )

Now, using a loop, conduct a correlation test between height and each column of survey. Store the p-value of each test in a new vector called p.values.
Create a histogram showing the distribution of all the p-values you found.
How many of the p-values are less than .05?
Which variables produced a p-value less than .05? (Hint: use the which() function.) Store the results in a vector called sig.variables.
For each of the significant variables, create a scatterplot showing the relationship between height and that variable. Include a regression line. Again, use par(mfrow = c(X, Y)) to put them all in one plot.
For each of the significant p-values, write an APA-style conclusion of your p-hacked results! (Hint: use the apa() function from the previous WPA or from the yarrr package.)
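Here is a sketch of the p-hacking loop, from the simulation through the scatterplots. The set.seed() call is an arbitrary addition that makes the run reproducible. cor.test() returns an object whose p.value element holds the test's p-value. The scatterplot grid uses 3 columns and as many rows as needed; that layout is just one choice.

```r
# Simulated heights and unrelated survey answers, as in the question
set.seed(100)  # arbitrary seed (an addition) so results are reproducible
height <- round(rnorm(100, mean = 170, sd = 10), 0)
survey <- matrix(rnorm(n = 100 * 100, mean = 0, sd = 1),
                 nrow = 100, ncol = 100)

# Correlate height with each survey column and keep the p-value
p.values <- rep(NA, ncol(survey))
for (i in 1:ncol(survey)) {
  test.i <- cor.test(height, survey[, i])
  p.values[i] <- test.i$p.value
}

hist(p.values, main = "p-values from 100 unrelated variables", xlab = "p-value")

# How many tests are "significant", and which columns produced them?
sum(p.values < .05)
sig.variables <- which(p.values < .05)
sig.variables

# Scatterplot with a regression line for each significant variable
if (length(sig.variables) > 0) {
  par(mfrow = c(ceiling(length(sig.variables) / 3), 3))
  for (v in sig.variables) {
    plot(survey[, v], height,
         xlab = paste("Survey question", v), ylab = "Height")
    abline(lm(height ~ survey[, v]), col = "red")
  }
  par(mfrow = c(1, 1))
}
```

Height and survey are independent, so the p-values should be roughly uniform between 0 and 1. On average, about 5 of the 100 tests will fall below .05 by chance alone.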