In this WPA, you will analyze data from a (again... fake) study on attraction. In the study, 1000 heterosexual university students viewed the Facebook profile of another student (the “target”) of the opposite sex. Based on the target’s profile, each participant made three judgments about the target: intelligence, attractiveness, and dateability. The primary judgment was a dateability rating indicating how dateable the participant found the target, on a scale of 0 to 100.
The data are located in a tab-delimited text file at http://nathanieldphillips.com/wp-content/uploads/2016/04/facebook.txt. Here is how the first few rows of the data should look:
Datafile description
The data file has 1000 rows and 10 columns. Here are the columns:
- session: The experiment session in which the study was run. There were 50 sessions in total.
- sex: The sex of the target.
- age: The age of the target.
- haircolor: The hair color of the target.
- university: The university that the target attended.
- education: The highest level of education obtained by the target.
- shirtless: Did the target have a shirtless profile picture? 1.No vs. 2.Yes
- intelligence: How intelligent do you find this target? 1.Low, 2.Medium, 3.High
- attractiveness: How physically attractive do you find this target? 1.Low, 2.Medium, 3.High
- dateability: How dateable is this target? 0 to 100.
Data loading and preparation
Open your class R project. Open a new script and enter your name, date, and the WPA number at the top. Save the script in the R folder in your project working directory as wpa_9_LASTFIRST.R, where LAST and FIRST are your last and first names.

The data are stored in a tab-delimited text file located at http://nathanieldphillips.com/wp-content/uploads/2016/04/facebook.txt. Using read.table(), load this data into R as a new object called facebook.
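By default, read.table() expects whitespace-separated columns and no header row. A tab-delimited file whose first row holds column names therefore usually needs header = TRUE and sep = "\t". The sketch below uses a small made-up snippet of tab-delimited text (hypothetical values, not rows from facebook.txt). The snippet is passed through the text argument, so the example runs without a download:

```r
# Made-up tab-delimited text standing in for the real file (hypothetical values).
# The "\t" escapes become real tab characters inside the string.
toy.text <- "session\tsex\tage\tdateability
1\tm\t23\t55
1\tf\t21\t70
2\tm\t25\t40"

# header = TRUE: the first row holds the column names
# sep = "\t": columns are separated by tabs
toy <- read.table(text = toy.text, header = TRUE, sep = "\t")

dim(toy)    # 3 rows, 4 columns
names(toy)  # "session" "sex" "age" "dateability"
```

Without header = TRUE, the columns would be named V1 to V4 and the column names would be read in as an ordinary first data row. For the real data, the same two arguments would go alongside the file URL.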
Understand the data
Look at the first few rows of the dataframe with the head() function to make sure it loaded correctly.

Using the str() function, look at the structure of the dataframe to make sure everything looks OK.
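head() shows the first six rows by default, and its n argument changes that number. str() lists every column with its type and its first few values. Here is a quick sketch on a small hypothetical data frame (made-up values, not the facebook data):

```r
# Small hypothetical data frame (made-up values)
toy <- data.frame(
  age         = c(23, 21, 25, 30, 19, 22, 27),
  university  = c("1.Basel", "2.Zurich", "3.Geneva", "1.Basel",
                  "2.Zurich", "3.Geneva", "1.Basel"),
  dateability = c(55, 70, 40, 65, 80, 35, 50)
)

head(toy)         # first 6 of the 7 rows
head(toy, n = 3)  # first 3 rows only
str(toy)          # 7 obs. of 3 variables: names, types, first values
```

Suppose str() reports a column you expect to be numeric as chr or Factor. A likely cause is a non-numeric entry somewhere in that column, such as a typo or a text code for missing values.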
Custom Functions
- Write a function called feed.me() that takes a string food as an argument and returns the sentence “I love to eat ___”. Try your function by running feed.me("apples").
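Before you fill in the template that follows, it may help to see the same pattern with a different sentence: paste0() inside a function, followed by return(). greet() is a hypothetical example function and is not part of the assignment:

```r
# Hypothetical example: same structure as feed.me(), different sentence
greet <- function(name) {
  # paste0() glues its arguments together with no separator
  output <- paste0("Hello, ", name, "!")
  return(output)
}

greet("Basel")   # "Hello, Basel!"
```

paste() does the same job but puts a space between its arguments by default (sep = " "). paste0() is the more convenient choice when the fixed text already contains the spaces you need.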
feed.me <- function(___) {
  output <- paste0("I love to eat ", ___)
  return(___)
}

- Write a function called my.mean() that takes a vector x as an argument and returns the mean of the vector x. Don’t use the mean() function! Use sum() and length()!
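The mean is the sum of the values divided by the number of values. The sketch below checks that arithmetic on a small made-up vector against mean() and shows one caveat about missing values:

```r
x <- c(2, 4, 9)

manual  <- sum(x) / length(x)   # (2 + 4 + 9) / 3 = 15 / 3 = 5
builtin <- mean(x)              # 5

all.equal(manual, builtin)      # TRUE

# Caveat: a single NA makes sum() return NA, so the ratio is NA too,
# just as mean() returns NA unless na.rm = TRUE is set
y <- c(2, 4, NA)
sum(y) / length(y)              # NA
mean(y)                         # NA
```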
my.mean <- function(___) {
  result <- sum(___) / length(___)
  return(result)
}

Try your my.mean() function to calculate the mean dateability rating of participants in the facebook dataset, and compare the result to the built-in mean() function to make sure you get the same result!

- Write a function called how.many.na() that takes a vector x as an argument and returns the number of NA values found in the vector.
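This counting trick works because is.na() returns a logical vector and sum() treats TRUE as 1 and FALSE as 0. Here it is on a made-up vector, which differs from the test vector in the task:

```r
v <- c(10, NA, 30, NA, NA)

is.na(v)        # FALSE TRUE FALSE TRUE TRUE
sum(is.na(v))   # 3: each TRUE counts as 1
mean(is.na(v))  # 0.6: the proportion of values that are NA
```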
how.many.na <- function(x) {
  output <- sum(is.na(___))
  return(___)
}

Test your how.many.na() function on the age column of the facebook dataframe, and then on the vector x = c(4, 7, 3, NA, NA, 1).

- Create a function called my.plot() that takes arguments x and y and draws a customised scatterplot with gridlines and a regression line:
my.plot <- function(x, y) {
  plot(x = ___,
       y = ___,
       pch = ___,   # look at ?points to see the values of pch!
       col = ___)
  grid()            # Add gridlines
  # Add a regression line
  abline(lm(___ ~ ___),
         col = ___)
}

- Now test your my.plot() function on the age and dateability columns of the facebook dataset.
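The plotting steps in my.plot() can be sketched with R's built-in cars dataset (speed vs. stopping distance) in place of the facebook columns. scatter.with.fit() is a hypothetical helper. Unlike the template, it also returns the fitted model invisibly, so you can inspect the regression coefficients:

```r
# Hypothetical helper: scatterplot + gridlines + least-squares regression line
scatter.with.fit <- function(x, y) {
  plot(x = x,
       y = y,
       pch = 16,            # 16 = solid filled circle (see ?points)
       col = "steelblue")
  grid()                    # dotted gridlines aligned with the axis tick marks
  fit <- lm(y ~ x)          # regress y on x
  abline(fit, col = "red", lwd = 2)  # abline() accepts an lm object directly
  invisible(fit)            # return the model without printing it
}

# cars is a built-in data frame with columns speed and dist
fit <- scatter.with.fit(x = cars$speed, y = cars$dist)
coef(fit)   # intercept about -17.58, slope about 3.93
```

The formula lm(y ~ x) inside the function refers to the function's own arguments, so the helper works for any pair of numeric vectors of equal length.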
Loops
- Create a loop that prints the squares of the integers from 1 to 10, using the template below.
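A loop that only prints shows each value once and keeps nothing. A common variant stores each result in a pre-allocated vector so you can use the results later. Here is a sketch with the cubes of 1 to 5 (a different sequence from the task):

```r
cubes <- rep(NA, 5)          # pre-allocate one slot per iteration

for(i in 1:5) {
  cubes[i] <- i^3            # store the i-th result in the i-th slot
  print(cubes[i])
}

cubes                        # 1 8 27 64 125

# The same values without a loop, since ^ is vectorised in R
(1:5)^3
```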
for(i in ___) {
  square.i <- ___
  print(square.i)
}

- Using the following template, create a loop that calculates (and prints) the mean dateability rating of targets from each university in the facebook dataset.
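The next template selects the values of one column using a logical condition on another column. The sketch below applies that idea to a small hypothetical data frame (made-up values). It then calls tapply(), a built-in function that computes the same group means in one call and gives you an independent check on the loop:

```r
# Hypothetical ratings (made-up values)
toy <- data.frame(
  university  = c("1.Basel", "2.Zurich", "1.Basel", "3.Geneva", "2.Zurich", "3.Geneva"),
  dateability = c(60, 40, 80, 50, 50, 70)
)

for(university.i in c("1.Basel", "2.Zurich", "3.Geneva")) {
  # Keep only the ratings whose university matches the current level
  data.i <- toy$dateability[toy$university == university.i]
  print(paste0(university.i, ": ", mean(data.i)))
}

# Same group means without a loop
group.means <- tapply(toy$dateability, toy$university, mean)
group.means   # 1.Basel 70, 2.Zurich 45, 3.Geneva 60
```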
for(university.i in c("1.Basel", "2.Zurich", "3.Geneva")) {
  data.i <- facebook$dateability[facebook$university == ___]
  output <- paste0("The mean dateability of university ", ___, " is ", ___)
  print(___)
}

- Now create a histogram of dateability ratings for each level of intelligence, using the following structure.
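You can try the par(mfrow) plus loop structure on R's built-in iris data, with one histogram per species in place of one histogram per intelligence level. This sketch also shows an alternative to resetting mfrow by hand: par() returns the previous settings, and passing them back to par() restores them:

```r
old.par <- par(mfrow = c(1, 3))  # 1 x 3 grid; par() returns the old settings

for(species.i in c("setosa", "versicolor", "virginica")) {
  hist(iris$Sepal.Length[iris$Species == species.i],
       main = paste0("Species: ", species.i),  # panel title built from the loop variable
       xlab = "Sepal length (cm)",
       col = "lightgray")
}

par(old.par)                      # restore the previous layout
```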
par(mfrow = c(1, 3))  # Set up 1 x 3 plotting grid
for(intelligence.i in c("1.Low", "2.Medium", "3.High")) {
  hist(facebook$dateability[facebook$intelligence == ___],
       main = ___,
       xlab = ___,
       col = ___
  )
}
par(mfrow = c(1, 1))  # Reset plotting grid

CHECKPOINT!
More functions!
- Write a function called ttest.apa() that takes a vector x and a null-hypothesis mean mu as arguments and returns an APA-style conclusion from a one-sample t-test of x.
ttest.apa <- function(x, mu) {
  # Store the one-sample t-test in object a
  a <- t.test(x = ___,
              mu = ___)
  df <- a$parameter   # Get the degrees of freedom
  test.stat <- ___    # Get the test statistic
  p.value <- ___      # Get the p-value
  # If the test is significant...
  if(p.value <= ___) {
    output <- paste0("The test is significant! t(",
                     df, ") = ", test.stat,
                     ", p = ", p.value,
                     " (H0 = ", mu, ")")
  }
  # If the test is not significant...
  if(p.value > ___) {
    output <- ___
  }
  print(___)
}

- Test your ttest.apa() function on the dateability ratings of participants in the facebook study. Specifically, test whether their mean dateability rating is different from 50.
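t.test() returns a list-like object of class htest, so you can pull out its pieces with $: statistic (the t value), parameter (the degrees of freedom) and p.value. The sketch below uses a small made-up vector. It adds round() because the raw values carry many decimal places; rounding is a formatting choice and is not part of the template above:

```r
x <- c(52, 61, 47, 58, 66, 55, 49, 63)   # made-up ratings

a <- t.test(x = x, mu = 50)              # one-sample t-test against H0: mean = 50

df        <- a$parameter                 # degrees of freedom (named "df")
test.stat <- a$statistic                 # t value (named "t")
p.value   <- a$p.value

paste0("t(", df, ") = ", round(test.stat, 2),
       ", p = ", round(p.value, 3))
```

For a one-sample test, the statistic is (mean(x) - mu) / (sd(x) / sqrt(length(x))) with length(x) - 1 degrees of freedom, which gives you an easy way to check what you extracted.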
More loops!
- The following dataframe survey contains results from a survey of 5 participants. Each participant was asked 5 questions on a 1-10 Likert scale. As you can see, many of the responses are not valid integers from 1 to 10. Using a loop, create a new dataframe called survey.corrected in which all invalid values are converted to NA.
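The replacement line in the template relies on two things. %in% checks each value for membership in a set of allowed values, and logical indexing then overwrites the values that fail the check. Here is the same step on a single made-up vector outside a loop, with 1:10 as the set of valid responses:

```r
x <- c(3, 0, 10, 15, 7, -2)

valid <- x %in% 1:10     # TRUE FALSE TRUE FALSE TRUE FALSE
x[valid == FALSE] <- NA  # equivalently: x[!valid] <- NA

x                        # 3 NA 10 NA 7 NA
```

Because %in% tests exact membership, a non-integer such as 2.5 would also be flagged as invalid.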
survey <- data.frame("p" = c(1, 2, 3, 4, 5),
                     "q1" = c(5, 3, 6, 3, 5),
                     "q2" = c(-1, 4, 3, 6, 11),
                     "q3" = c(6, 22, 4, 6, -5),
                     "q4" = c(6, 3, 4, -2, 4),
                     "q5" = c(1, 1, 900, 1, 2))

survey.corrected <- survey  # COPY SURVEY

for(column.i in ___) {                # LOOP OVER COLUMNS
  x <- ___                            # COPY THE ORIGINAL COLUMN as x
  x[(x %in% ___) == FALSE] <- ___     # REPLACE invalid values
  survey.corrected[, ___] <- ___      # ASSIGN x back to survey.corrected
}

Simulation!
- What is the probability of getting a significant p-value if the null hypothesis is true? Test this by conducting the following simulation:
  - Create a vector called p.values with 100 NA values.
  - Draw a sample of size 10 from a normal distribution with mean = 0 and standard deviation = 1.
  - Do a one-sample t-test testing whether the mean of the distribution is different from 0. Save the p-value from this test in the 1st position of p.values.
  - Repeat the sampling and testing steps with a loop to fill p.values with 100 p-values.
  - Create a histogram of p.values and calculate the proportion of p-values that are significant at the .05 level.
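As an alternative to the for-loop template that follows, replicate() evaluates an expression a fixed number of times and collects the results into a vector. The sketch below runs the same null-hypothesis simulation in that style. set.seed() is an addition that is not part of the task; it only makes the random draws reproducible:

```r
set.seed(1)   # fix the random number stream so results are reproducible

# Each replicate: draw 10 values from N(0, 1) and test H0: mean = 0
p.values <- replicate(100, t.test(rnorm(n = 10, mean = 0, sd = 1), mu = 0)$p.value)

hist(p.values)          # under H0, p-values are roughly uniform on [0, 1]
mean(p.values < .05)    # proportion of significant results
```

With only 100 simulations, the proportion will move around .05 from one seed to the next. More simulations give a more stable estimate.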
p.values <- rep(NA, ___)
for(i in ___) {
  x <- rnorm(n = ___, mean = ___, sd = ___)
  result <- t.test(___)$___
  p.values[___] <- ___
}

- Create a function called psimulation() with 4 arguments: sim, the number of simulations; samplesize, the sample size; mu.true, the true mean; and sd.true, the true standard deviation. Your function should repeat the simulation from the previous question with the given arguments. That is, it should calculate sim p-values, each from a one-sample t-test of whether the mean of a sample of size samplesize, drawn from a normal distribution with mean = mu.true and standard deviation = sd.true, is significantly different from 0. The function should return a vector of p-values.
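Wrapping a simulation loop in a function with several named arguments follows a general pattern. The hypothetical function sim.means() below sketches that pattern; it is not psimulation() itself, because it returns sim sample means instead of p-values. Giving the arguments default values is an optional design choice:

```r
# Hypothetical example: simulate the sampling distribution of the mean
sim.means <- function(sim = 1000, samplesize = 10, mu.true = 0, sd.true = 1) {
  means <- rep(NA, sim)                    # one slot per simulation
  for(i in seq_len(sim)) {                 # seq_len() gives no iterations when sim = 0
    x <- rnorm(n = samplesize, mean = mu.true, sd = sd.true)
    means[i] <- mean(x)                    # store this simulation's result
  }
  return(means)
}

set.seed(2)
m <- sim.means(sim = 2000, samplesize = 25, mu.true = 10, sd.true = 2)
mean(m)   # close to mu.true = 10
sd(m)     # close to sd.true / sqrt(samplesize) = 2 / 5 = 0.4
```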
Submit!
Save and email your wpa_9_LASTFIRST.R file to me at nathaniel.phillips@unibas.ch. Then, go to https://goo.gl/3VxYkN to complete the WPA submission form.