Hypothesis tests from a student survey
In this WPA, you’ll analyze data from a fictional survey of 100 students. In fact, you can even see the code I used to generate the data here (code to generate wpa6 data).
The data are located in a tab-delimited text file at http://nathanieldphillips.com/wp-content/uploads/2016/11/wpa6.txt
Datafile description
The data file has 100 rows and 12 columns. Here are the columns:
sex: string. A string indicating the sex of the participant. “m” = male, “f” = female.
age: integer. An integer indicating the age of the participant.
major: string. A string indicating the participant’s major.
haircolor: string. Hair color.
iq: integer. Participant’s score on an IQ test.
country: string. Participant’s country of origin.
logic: numeric. Amount of time it took for a participant to complete a logic problem. Smaller is better.
siblings: integer. How many siblings does the participant have?
multitasking: integer. Participant’s score on a multitasking task. Higher is better.
partners: integer. How many sexual partners has the participant had?
marijuana: binary. Has the participant ever tried marijuana? 0 = “no”, 1 = “yes”.
risk: binary. Would the person play a gamble with a 50% chance of losing 20CHF and a 50% chance of earning 20CHF? 0 means the participant would not play the gamble, 1 means they would.
Data loading and preparation
- Open your class R project. Open a new script and enter your name, date, and the WPA number at the top. Save the script in the R folder in your project working directory as wpa_6_LASTFIRST.R, where LAST and FIRST are your last and first names.
- The data are stored in a tab-delimited text file located at http://nathanieldphillips.com/wp-content/uploads/2016/11/wpa6.txt. Using read.table(), load this data into R as a new object called wpa6.df as follows.
# Read data into a new object called wpa6.df
wpa6.df <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/11/wpa6.txt",
                      header = TRUE,   # There is a header row
                      sep = "\t")      # Data are tab-delimited
- Using write.table(), write the data as a text file titled wpa6.txt into the data folder in your project working directory as follows.
# Write wpa6.df to a tab-delimited text file in my data folder.
write.table(wpa6.df,                  # Object to be written
            file = "data/wpa6.txt",   # Put file wpa6.txt in the data folder of my working directory
            sep = "\t")               # Make data tab-delimited
- Using head(), str(), and View(), look at the dataset and make sure that it was loaded correctly. If the data don’t look correct (i.e., if there isn’t a header row followed by 100 rows and 12 columns), you didn’t load it correctly!
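A quick programmatic check can complement looking at the data by eye. The sketch below does not use the real survey data: it builds a small made-up dataframe (toy.df, a hypothetical name), writes it to a temporary tab-delimited file, and reads it back, so it runs without an internet connection. With the real data, the same dim() and names() checks would be applied to wpa6.df.

```r
# Toy stand-in for the survey data (made-up values, for illustration only)
toy.df <- data.frame(sex = c("m", "f", "f"),
                     age = c(21, 24, 19),
                     iq  = c(101, 98, 110))

# Write to a temporary tab-delimited file, then read it back
toy.file <- tempfile(fileext = ".txt")
write.table(toy.df, file = toy.file, sep = "\t")
toy.loaded <- read.table(file = toy.file, header = TRUE, sep = "\t")

# Check the dimensions and column names
dim(toy.loaded)     # Number of rows, then number of columns
names(toy.loaded)   # Column names taken from the header row
```

For wpa6.df, dim() should report 100 rows and 12 columns if the file was read correctly.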
Please write your answers to all hypothesis test questions in proper American Pirate Association (APA) style! If your p-value is less than .01, just write p < .01.
Chi-square: X(df) = XXX, p = YYY
t-test: t(df) = XXX, p = YYY
correlation test: r = XXX, t(df) = YYY, p = ZZZZ
For example, here is some output with the appropriate APA conclusion:
library(yarrr)
# Do pirates with headbands have different numbers of tattoos than those
# who do not wear headbands?
t.test(tattoos ~ headband,
data = pirates)
##
## Welch Two Sample t-test
##
## data: tattoos by headband
## t = -19.313, df = 146.73, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.878101 -4.786803
## sample estimates:
## mean in group no mean in group yes
## 4.699115 10.031567
Answer: Pirates with headbands have significantly more tattoos on average than those who do not wear headbands: t(146.73) = -19.31, p < .01
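If you would rather not copy the numbers by hand, the pieces of the APA string can be pulled out of the test object. The helper below, apa_t(), is a hypothetical function written for this illustration (it is not part of base R or yarrr). It relies on the statistic, parameter, and p.value elements that t.test() returns, and the numbers in the example call are made up.

```r
# Hypothetical helper: format a t-test result in the
# "t(df) = XXX, p = YYY" style described above
apa_t <- function(test) {
  t.value <- sprintf("%.2f", test$statistic)   # t statistic, 2 decimals
  df      <- round(test$parameter, 2)          # Degrees of freedom
  p       <- test$p.value

  # Report small p-values as "p < .01", otherwise drop the leading zero
  p.text <- if (p < .01) {
    "p < .01"
  } else {
    paste0("p = ", sub("^0", "", sprintf("%.2f", p)))
  }

  paste0("t(", df, ") = ", t.value, ", ", p.text)
}

# Example with made-up numbers
apa_t(t.test(x = c(1, 2, 3, 4, 5), mu = 0))
```

You still need to write the conclusion sentence yourself; the helper only assembles the statistics.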
t-test(s)
Average IQ in the general population is 100. Do the participants have an IQ different from the general population? Answer this with a one-sample t-test.
A friend of yours claims that students have 2.5 siblings on average. Test this claim with a one-sample t-test.
Do students who have smoked marijuana have different IQ levels than those who have never smoked marijuana? Test this claim with a two-sample t-test (you can use either the vector or the formula notation for a t-test).
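As a reminder of the syntax, here is a minimal sketch of the one-sample test and both two-sample notations. It uses simulated data (toy.df, with made-up columns score and group), not the survey data, so it shows the form of the calls rather than the answer to any question above.

```r
# Simulated data (made up for illustration; not the survey data)
set.seed(100)
toy.df <- data.frame(score = rnorm(n = 40, mean = 50, sd = 10),
                     group = rep(c("a", "b"), each = 20))

# One-sample t-test: is the mean score different from 45?
one.test <- t.test(x = toy.df$score, mu = 45)

# Two-sample t-test, vector notation
vec.test <- t.test(x = toy.df$score[toy.df$group == "a"],
                   y = toy.df$score[toy.df$group == "b"])

# Two-sample t-test, formula notation (same test as above)
form.test <- t.test(score ~ group, data = toy.df)

# Both notations give the same t statistic
vec.test$statistic
form.test$statistic
```

The formula notation is usually shorter when the grouping variable is already a column in the dataframe.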
Correlation test(s)
Do students with higher multitasking skills tend to have more romantic partners than those with lower multitasking skills? Test this with a correlation test.
Do people with higher IQs perform faster on the logic test? Answer this question with a correlation test.
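The sketch below shows the two ways of calling cor.test() on simulated data (toy.df with made-up columns x and y, not the survey data), and where to find the values needed for the APA string.

```r
# Simulated data (made up for illustration; not the survey data)
set.seed(100)
x <- rnorm(n = 50, mean = 0, sd = 1)
y <- 0.5 * x + rnorm(n = 50, mean = 0, sd = 1)
toy.df <- data.frame(x = x, y = y)

# Vector notation
cor.vec <- cor.test(x = toy.df$x, y = toy.df$y)

# Formula notation: both variables go to the right of the ~
cor.form <- cor.test(~ x + y, data = toy.df)

cor.form$estimate    # Correlation coefficient r
cor.form$statistic   # t statistic
cor.form$parameter   # Degrees of freedom (n - 2)
cor.form$p.value     # p-value
```

Note that the formula for a correlation test has nothing on the left side of the ~, unlike the formula for a t-test.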
Chi-square test(s)
Are some majors more popular than others? Answer this question with a one-sample chi-square test.
In general, were students more likely to take a risk than not? Answer this question with a one-sample chi-square test.
Is there a relationship between hair color and students’ academic major? Answer this with a two-sample chi-square test.
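Both kinds of chi-square test take a table of counts as input. Here is a minimal sketch on simulated data (toy.df with made-up columns color and answer, not the survey data): a one-sample test uses a table of one variable, and a two-sample test uses a table of two variables.

```r
# Simulated data (made up for illustration; not the survey data)
set.seed(100)
toy.df <- data.frame(color  = sample(c("red", "green", "blue"),
                                     size = 90, replace = TRUE),
                     answer = sample(c("yes", "no"),
                                     size = 90, replace = TRUE))

# One-sample: are all colors equally frequent?
one.chi <- chisq.test(x = table(toy.df$color))

# Two-sample: is color related to answer?
two.chi <- chisq.test(x = table(toy.df$color, toy.df$answer))

one.chi$statistic   # Chi-square statistic
one.chi$parameter   # Degrees of freedom
one.chi$p.value     # p-value
```

It is worth printing the table() itself before running the test, so you can see the counts the test is based on.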
CHECKPOINT!
Anscombe’s Famous data quartet
In the next few questions, we’ll explore Anscombe’s famous data quartet. This dataset will show you the dangers of interpreting statistical tests (like a correlation test) without first plotting the data!
- Run the following code to create the anscombe.df dataframe. This dataframe contains 4 datasets: x1 and y1, x2 and y2, x3 and y3, and x4 and y4:
# JUST COPY, PASTE, AND RUN!
anscombe.df <- data.frame(x1 = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
y1 = c(8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 4.68),
x2 = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
y2 = c(9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74),
x3 = c(10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5),
y3 = c(7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73),
x4 = c(8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8),
y4 = c(6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89))
Calculate the correlation between x1 and y1, x2 and y2, x3 and y3, and x4 and y4 separately (that is, what is the correlation between x1 and y1? Now, what is the correlation between x2 and y2? …). What do you notice about the correlation values for each test?
Now run the following code to generate a scatterplot of each data pair. What do you find?
# JUST COPY, PASTE, AND RUN!
# Plot the famous Anscombe quartet
par(mfrow = c(2, 2)) # Create 2 x 2 plotting grid
for (i in 1:4) { # Loop over datasets
  # Assign x and y for current value of i
  if (i == 1) {x <- anscombe.df$x1
               y <- anscombe.df$y1}
  if (i == 2) {x <- anscombe.df$x2
               y <- anscombe.df$y2}
  if (i == 3) {x <- anscombe.df$x3
               y <- anscombe.df$y3}
  if (i == 4) {x <- anscombe.df$x4
               y <- anscombe.df$y4}
  # Create plot, titled with the number of the current dataset
  plot(x = x, y = y, pch = 21, main = paste("Anscombe", i),
       bg = "orange", col = "red",
       xlim = c(0, 20), ylim = c(0, 15))
  # Add regression line
  abline(lm(y ~ x,
            data = data.frame(y, x)),
         col = "blue", lty = 2)
  # Add correlation text
  text(x = 3, y = 12,
       labels = paste0("cor = ", round(cor(x, y), 2)))
}
par(mfrow = c(1, 1)) # Reset plotting grid
What you have just seen is the famous Anscombe’s quartet, a dataset designed to show you how important it is to always plot your data before running a statistical test! You can see more at the Wikipedia page here: https://en.wikipedia.org/wiki/Anscombe%27s_quartet
You pick the test!
Is there a relationship between whether a student has ever smoked marijuana and his/her decision to accept or reject the risky gamble?
Do males and females have different numbers of sexual partners on average?
Do males and females differ in how likely they are to have smoked marijuana?
Do people who have smoked marijuana have different logic scores on average than those who never have smoked marijuana?
Do people with higher IQ scores tend to perform better on the logic test than those with lower IQ scores?
More complicated tests
Are Germans more likely than not to have tried marijuana? (Hint: this is a one-sample chi-square test on a subset of the data, which you can create with subset())
Does the relationship between IQ and multitasking differ between males and females? Test this by conducting two separate tests, one for males and one for females. Do your conclusions differ?
Does the IQ of people with brown hair differ from blondes? (Hint: this is a two-sample t-test that requires you to use the subset argument to tell R which two groups you want to compare)
Only for men from Switzerland, is there a relationship between age and IQ?
Only for people who chose the risky gamble, do people who have smoked marijuana have more sexual partners than those who have never smoked marijuana? Is there a relationship between smoking marijuana and number of sexual partners?
Only for people who chose the risky gamble and have never tried marijuana, is there a relationship between iq and performance on the logic test?
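The sketch below shows two ways of restricting a test to part of the data, using a made-up dataframe (toy.df with hypothetical columns score, group, and answer) rather than the survey data. The formula version of t.test() accepts a subset argument directly. chisq.test() does not have one, so one option is to create the subset first with subset() and then build the table from it.

```r
# Made-up data (for illustration only; not the survey data)
set.seed(100)
toy.df <- data.frame(score  = rnorm(n = 60, mean = 50, sd = 10),
                     group  = rep(c("a", "b", "c"), each = 20),
                     answer = rep(c("yes", "no", "yes"), times = 20))

# t-test comparing only groups "a" and "b", via the subset argument
ab.test <- t.test(score ~ group,
                  data = toy.df,
                  subset = group %in% c("a", "b"))

# chisq.test() has no subset argument, so subset the dataframe first
c.only <- subset(toy.df, subset = group == "c")
c.test <- chisq.test(x = table(c.only$answer))
```

Several conditions can be combined with & inside subset, for example group == "c" & score > 50.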
Submit!
Save and email your wpa_6_LastFirst.R file to me at nathaniel.phillips@unibas.ch. Then, go to https://goo.gl/forms/UblvQ6dvA76veEWu1 to complete the WPA submission form.