In this WPA, you will analyze data from a (again…fake) study on attraction. In the study, 1000 heterosexual University students viewed the Facebook profile of another student (the “target”) of the opposite sex. Based on a target’s profile, each participant made three judgments about the target - intelligence, attractiveness, and dateability. The primary judgement was a dateability rating indicating how dateable the person was on a scale of 0 to 100.
facebook <- read.table("http://nathanieldphillips.com/wp-content/uploads/2016/04/facebook.txt", sep = "\t", header = TRUE)
The data are located in a tab-delimited text file at http://nathanieldphillips.com/wp-content/uploads/2016/04/facebook.txt. Here is how the first few rows of the data should look:
head(facebook)
## session sex age haircolor university education shirtless intelligence
## 1 1 m 23 brown 3.Geneva 3.Masters 2.Yes 1.low
## 2 1 m 19 blonde 2.Zurich 1.HighSchool 1.No 2.medium
## 3 1 f 22 brown 2.Zurich 2.Bachelors 2.Yes 1.low
## 4 1 f 22 red 2.Zurich 2.Bachelors 1.No 2.medium
## 5 1 m 23 brown 3.Geneva 2.Bachelors 1.No 2.medium
## 6 1 m 26 blonde 2.Zurich 3.Masters 2.Yes 3.high
## attractiveness dateability
## 1 3.high 15
## 2 2.medium 44
## 3 2.medium 100
## 4 3.high 100
## 5 2.medium 63
## 6 3.high 76
Datafile description
The data file has 1000 rows and 10 columns. Here are the columns
session
: The experiment session in which the study was run. There were 50 total sessions.sex
: The sex of the targetage
: The age of the targethaircolor
: The haircolor of the targetuniversity
: The university that the target attended.education
: The highest level of education obtained by the target.shirtless
: Did the target have a shirtless profile picture? 1.No v 2.Yesintelligence
: How intelligent do you find this target? 1.Low, 2.Medium, 3.Highattractiveness
: How physically attractive do you find this target? 1.Low, 2.Medium, 3.Highdateability
: How dateable is this target? 0 to 100.
Data loading and preparation
Open your class R project. Open a new script and enter your name, date, and the wpa number at the top. Save the script in the
R
folder in your project working directory aswpa_9_LASTFIRST.R
, where LAST and FIRST are your last and first names.The data are stored in a tab–delimited text file located at http://nathanieldphillips.com/wp-content/uploads/2016/04/facebook.txt. Using
read.table()
load this data into R as a new object calledfacebook
Understand the data
- Look at the first few rows of the dataframe with the
head()
function to make sure it loaded correctly.
head(facebook)
## session sex age haircolor university education shirtless intelligence
## 1 1 m 23 brown 3.Geneva 3.Masters 2.Yes 1.low
## 2 1 m 19 blonde 2.Zurich 1.HighSchool 1.No 2.medium
## 3 1 f 22 brown 2.Zurich 2.Bachelors 2.Yes 1.low
## 4 1 f 22 red 2.Zurich 2.Bachelors 1.No 2.medium
## 5 1 m 23 brown 3.Geneva 2.Bachelors 1.No 2.medium
## 6 1 m 26 blonde 2.Zurich 3.Masters 2.Yes 3.high
## attractiveness dateability
## 1 3.high 15
## 2 2.medium 44
## 3 2.medium 100
## 4 3.high 100
## 5 2.medium 63
## 6 3.high 76
- Using the
str()
function, look at the structure of the dataframe to make sure everything looks ok
str(facebook)
## 'data.frame': 1000 obs. of 10 variables:
## $ session : int 1 1 1 1 1 1 1 1 1 1 ...
## $ sex : Factor w/ 2 levels "f","m": 2 2 1 1 2 2 1 2 1 1 ...
## $ age : int 23 19 22 22 23 26 19 25 22 19 ...
## $ haircolor : Factor w/ 3 levels "blonde","brown",..: 2 1 2 3 2 1 2 3 2 1 ...
## $ university : Factor w/ 3 levels "1.Basel","2.Zurich",..: 3 2 2 2 3 2 3 2 2 3 ...
## $ education : Factor w/ 4 levels "1.HighSchool",..: 3 1 2 2 2 3 1 3 2 1 ...
## $ shirtless : Factor w/ 2 levels "1.No","2.Yes": 2 1 2 1 1 2 1 2 2 1 ...
## $ intelligence : Factor w/ 3 levels "1.low","2.medium",..: 1 2 1 2 2 3 1 3 2 2 ...
## $ attractiveness: Factor w/ 3 levels "1.low","2.medium",..: 3 2 2 3 2 3 2 1 2 2 ...
## $ dateability : int 15 44 100 100 63 76 61 26 76 40 ...
Custom Functions
- Write a function called
feed.me()
that takes a stringfood
as an argument, and returns the sentence “I love to eat ___“. Try your function by runningfeed.me("apples")
.
feed.me <- function(___) {
output <- paste0("I love to eat ", ___)
print(___)
}
feed.me <- function(food) {
output <- paste0("I love to eat ", food)
print(output)
}
feed.me("apples")
## [1] "I love to eat apples"
- Write a function called
my.mean()
that takes a vectorx
as an argument, and returns the mean of the vectorx
. Don’t use themean()
function! Usesum()
andlength()
!
my.mean <- function(___) {
result <- sum(___) / length(___)
return(result)
}
my.mean <- function(x) {
result <- sum(x) / length(x)
return(result)
}
- Try your
my.mean()
function to calculate the mean dateability rating of participants in thefacebook
dataset and compare the result to the built-inmean()
function to make sure you get the same result!
my.mean(facebook$dateability)
## [1] 54.199
mean(facebook$dateability)
## [1] 54.199
- Write a function called
how.many.na()
that takes a vectorx
as an argument, and returns the number of NA values found in the vector:
how.many.na <- function(x) {
output <- sum(is.na(___))
return(___)
}
how.many.na <- function(x) {
output <- sum(is.na(x))
return(output)
}
- Test your
how.many.na()
function on the age of participants in thefacebook
dataframe and then on the vectorx = c(4, 7, 3, NA, NA, 1)
how.many.na(facebook$age)
## [1] 0
how.many.na(c(4, 7, 3, NA, NA, 1))
## [1] 2
- Create a function called
my.plot()
that takes argumentsx
andy
and returns a customised scatterplot with gridlines and a regression line:
my.plot <- function(x, y) {
plot(x = ___,
y = ___,
pch = ___, # look at ?points to see the values of pch!
col = ___)
grid() # Add gridlines
# Add a regression line
abline(lm(___ ~ ___),
col = ___)
}
my.plot <- function(x, y) {
plot(x = x,
y = y,
pch = 16, # look at ?points to see the values of pch!
col = "green")
grid() # Add gridlines
# Add a regression line
abline(lm(y ~ x),
col = "blue")
}
- Now test your
my.plot()
function on the age and dateability of participants in thefacebook
dataset
my.plot(facebook$age,
facebook$dateability)
Loops
- Create a loop that prints the squares of integers from 1 to 10:
for(i in ___) {
square.i <- ___
print(square.i)
}
for(i in 1:10) {
square.i <- i ^ 2
print(square.i)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
## [1] 36
## [1] 49
## [1] 64
## [1] 81
## [1] 100
- Using the following template, create a loop that calculates (and prints) the mean dateability rating of students from each university in the
facebook
dataset
for(university.i in c("1.Basel", "2.Zurich", "3.Geneva")) {
data.i <- facebook$dateability[facebook$university == ___]
output <- paste0("The mean dateability of university ", ___, " is ", ___)
print(___)
}
for(university.i in c("1.Basel", "2.Zurich", "3.Geneva")) {
data.i <- facebook$dateability[facebook$university == university.i]
output <- paste0("The mean dateability of university ", university.i, " is ", mean(data.i))
print(output)
}
## [1] "The mean dateability of university 1.Basel is 60.7548387096774"
## [1] "The mean dateability of university 2.Zurich is 50.4793388429752"
## [1] "The mean dateability of university 3.Geneva is 52.1131498470948"
- Now create a histogram of dateability ratings from each level of intelligence using the following structure
par(mfrow = c(1, 3)) # Set up 1 x 3 plotting grid
for(intelligence.i in c("1.low", "2.medium", "3.high")) {
hist(facebook$dateability[facebook$intelligence == ___],
main = ___,
xlab = ___,
col = ___
)
}
par(mfrow = c(1, 1)) # Reset plotting grid
par(mfrow = c(1, 3)) # Set up 1 x 3 plotting grid
for(intelligence.i in c("1.low", "2.medium", "3.high")) {
hist(facebook$dateability[facebook$intelligence == intelligence.i],
main = intelligence.i,
xlab = "Dateability",
col = "skyblue"
)
}
par(mfrow = c(1, 1)) # Reset plotting grid
CHECKPOINT!
More functions!
- Write a function called
ttest.apa()
that takes a vectorx
as an argument and returns an apa style conclusion from a one-sample test ofx
.
ttest.apa <- function(x, mu) {
# Store the one-sample ttest in object a
a <- t.test(x = ___,
mu = ___)
df <- a$parameter # Get the degrees of freedom
test.stat <- ___ # Get the test statistic
p.value <- ___ # Get the p-value
# If the test is significant...
if(p.value <= ___) {
output <- paste0("The test is significant! t(",
df, ") = ", test.stat,
", p = ", p.value,
" (H0 = ", mu, ")")
}
# If the test is not significant...
if(p.value > ___) {
output <- ___
}
print(___)
}
ttest.apa <- function(x, mu) {
# Store the one-sample ttest in object a
a <- t.test(x = x,
mu = mu)
df <- a$parameter # Get the degrees of freedom
test.stat <- a$statistic # Get the test statistic
p.value <- a$p.value # Get the p-value
# If the test is significant...
if(p.value <= .05) {
output <- paste0("The test is significant! t(",
df, ") = ", test.stat,
", p = ", p.value,
" (H0 = ", mu, ")")
}
# If the test is not significant...
if(p.value > .05) {
output <- paste0("The test is not significant! t(",
df, ") = ", test.stat,
", p = ", p.value,
" (H0 = ", mu, ")")
}
print(output)
}
- Test your
ttest.apa()
function on the the dateability of participants in the facebook study. Specifically, test if their mean dateability rating is different from 50
ttest.apa(facebook$dateability, 50)
## [1] "The test is significant! t(999) = 4.93261151453317, p = 9.49897937574552e-07 (H0 = 50)"
More loops!
- The following dataframe
survey
contains results from a survey of 5 participants. Each participant was asked 5 questions on a 1-10 likert scale. As you can see, many of the responses are not valid integers from 1-10. Using a loop, create a new dataframe calledsurvey.corrected
that converts all invalid values to NA:
survey <- data.frame("p" = c(1, 2, 3, 4, 5),
"q1" = c(5, 3, 6, 3, 5),
"q2" = c(-1, 4, 3, 6, 11),
"q3" = c(6, 22, 4, 6, -5),
"q4" = c(6, 3, 4, -2, 4),
"q5" = c(1, 1, 900, 1, 2))
survey.corrected <- survey # COPY SURVEY
for(column.i in ____ ) { # LOOP OVER COLUMNS
x <- ____ # COPY THE ORIGINAL COLUMN as x
x[x %in% ___ == FALSE] <- ___ # REPLACE
survey.corrected[,___] <- ___ # ASSIGN x back to survey.correced
}
survey.corrected <- survey # COPY SURVEY
for(column.i in 2:6 ) { # LOOP OVER COLUMNS
x <- survey[,column.i] # COPY THE ORIGINAL COLUMN as x
x[(x %in% 1:10) == FALSE] <- NA # REPLACE
survey.corrected[,column.i] <- x # ASSIGN x back to survey.correced
}
survey.corrected
## p q1 q2 q3 q4 q5
## 1 1 5 NA 6 6 1
## 2 2 3 4 NA 3 1
## 3 3 6 3 4 4 NA
## 4 4 3 6 6 NA 1
## 5 5 5 NA NA 4 2
Simulation!
- What is the probability of getting a significant p-value if the null hypothesis is true? Test this by conducting the following simulation:
- Create a vector called
p.values
with 100 NA values. - Draw a sample of size 10 from a normal distribution with mean = 0 and standard deviation = 1.
- Do a one-sample t-test testing if the mean of the distribution is different from 0. Save the p-value from this test in the 1st position of
p.values
. - Repeat these steps with a loop to fill
p.values
with 100 p-values. - Create a histogram of
p.values
and calculate the proportion ofp-values
that are significant at the .05 level.
p.values <- rep(NA, ___)
for(i in ___) {
x <- rnorm(n = ___, mean = ___, sd = ___)
result <- t.test(___)$___
p.values[___] <- ___
}
p.values <- rep(NA, 100)
for(i in 1:100) {
x <- rnorm(n = 10, mean = 0, sd = 1)
result <- t.test(x)$p.value
p.values[i] <- result
}
hist(p.values)
mean(p.values < .05)
## [1] 0.05
- Create a function called
psimulation
with 4 arguments:sim
: the number of simulations,samplesize
: the sample size,mu.true
: the true mean, andsd.true
: the true standard deviation. Your function should repeat the simulation from the previous question with the given arguments. That is, it should calculatesim
p-values testing whethersamplesize
samples from a normal distribution with mean =mu.true
and standard deviation =sd.true
is significantly different from 0. The function should return a vector of p-values.
psimulation <- function(sim,
samplesize,
mu.true,
sd.true) {
p.values <- rep(NA, sim)
for(i in 1:sim) {
x <- rnorm(n = samplesize, mean = mu.true, sd = sd.true)
result <- t.test(x)$p.value
p.values[i] <- result
}
return(p.values)
}
# testing the function
psimulation(sim = 20, samplesize = 20, mu.true = 1, sd.true = 1)
## [1] 1.055971e-03 4.378150e-09 6.837770e-08 1.195854e-03 5.838443e-02
## [6] 1.993403e-04 1.804775e-04 4.266576e-04 5.785016e-04 5.137253e-04
## [11] 7.624965e-06 9.271828e-04 2.558004e-03 1.408167e-02 2.325602e-05
## [16] 9.951451e-05 8.273548e-07 9.853791e-04 1.132161e-04 1.104221e-05
Submit!
Save and email your wpa_9_LastFirst.R
file to me at nathaniel.phillips@unibas.ch. Then, go to https://goo.gl/forms/UblvQ6dvA76veEWu1 to complete the WPA submission form.