Open a new R script and save it as wpa_X_LastFirst.R (where Last and First are your last and first names). At the top of your script, write the assignment number, your name and the date in comments. When you answer a task, indicate which task you are answering with appropriate comments.
Analyzing Bar survey data
The following vectors contain (fictional!) data from a survey of 200 people at one of two bars in Basel (Grenzwert and Paddy’s) last Friday night at 3:00am. Each person was asked which kind of cologne they were wearing. After answering this question, a (very busy) researcher recorded how long each person spent talking to people at the bar. The data are stored in the following 5 vector objects:
- id: An id indicating the participant in the form x.n, where x is the name of the bar the participant was at, and n is a random indexing number
- sex: The person’s sex: male or female
- cologne: Which cologne did the person wear? gio or calvinklein
- bar: Where the person went out: grenzwert or paddys
- time: The amount of time the person spent talking to people in minutes
Thankfully, you don’t need to type in the data yourself! The objects are stored in an RData file online.
A. Load the vectors into your R session by running the following code.
load(file = url("https://dl.dropboxusercontent.com/u/7618380/wpa2.RData"))
B. The str() function will give you basic information about objects. Get to know the objects (id, sex, cologne, bar, time) by running the str() function on each of the 5 vectors.
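As a rough illustration of what str() reports, here is a minimal sketch on a small made-up vector. The name toy.time and its values are hypothetical and are not part of the survey data.

```r
# Hypothetical toy vector, NOT the survey data
toy.time <- c(12, 45, 30, 75)

# str() prints the storage type, the length and the first few values
str(toy.time)
```

Running str() on each survey vector in the same way should show whether it holds numbers or character strings and how many elements it has.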
Review
How many people were there of each sex? (Hint: use table())
What was the mean time?
What was the standard deviation of times?
Create time.z, a z-score transformation of time. (Hint: a z-score is defined as (x - mean(x)) / sd(x))
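The review functions can be tried out on made-up data first. The sketch below assumes two hypothetical toy vectors (toy.sex and toy.time) and shows table(), mean(), sd() and the z-score formula from the hint.

```r
# Hypothetical toy vectors, NOT the survey data
toy.sex  <- c("male", "female", "female", "male", "female")
toy.time <- c(10, 20, 30, 40, 50)

# Count how often each value occurs
table(toy.sex)

# Mean and standard deviation of a numeric vector
mean(toy.time)  # 30
sd(toy.time)    # about 15.81

# z-score: subtract the mean, then divide by the standard deviation
toy.z <- (toy.time - mean(toy.time)) / sd(toy.time)
toy.z
```

A quick sanity check on any z-score vector is that its mean is (numerically) 0 and its standard deviation is 1.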
Numerical Indexing
What was the value of the first time?
What were the sexes of the first five participants?
What were the colognes of the 10th through the 20th participants?
Which bar did the last participant go to? (Hint: don’t write the indexing number directly; instead, index the vector using the length() function with the appropriate argument)
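Numerical indexing can be sketched on a short made-up vector. The name toy.bar and its values are hypothetical; the same bracket syntax should apply to the survey vectors.

```r
# Hypothetical toy vector, NOT the survey data
toy.bar <- c("a", "b", "a", "b", "b", "a")

# A single position
toy.bar[1]

# A range of positions, built with the colon operator
toy.bar[2:4]

# The last element, without hard-coding its position
toy.bar[length(toy.bar)]
```

Using length() instead of a literal number keeps the code correct even if the vector later gets longer or shorter.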
Logical Indexing on one variable
- a) How many people wore gio? b) How many wore calvinklein?
- a) How many people went to Grenzwert? b) How many went to Paddys?
What percent of people went to Grenzwert? (Hint: use mean() combined with a logical vector)
How many talking times were longer than 30 minutes?
What percent of talking times were longer than an hour?
What percent of talking times were longer than 20 minutes but less than 40 minutes?
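The hint about mean() works because R treats TRUE as 1 and FALSE as 0. Here is a minimal sketch on a hypothetical toy vector (toy.time is made up for illustration):

```r
# Hypothetical toy vector, NOT the survey data
toy.time <- c(10, 25, 35, 70, 90)

# A comparison returns a logical vector
long <- toy.time > 30
long             # FALSE FALSE TRUE TRUE TRUE

# sum() counts the TRUE values, mean() gives their proportion
sum(long)        # 3
mean(long)       # 0.6

# Two conditions on the same variable are combined with &
mean(toy.time > 20 & toy.time < 40)  # 0.4
```

Multiplying a proportion by 100 turns it into a percentage.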
Logical indexing and two variables
What were the ids of people who went to grenzwert?
What was the mean time of people who went to Grenzwert?
What was the mean time of people who went to Paddys?
What was the mean time of people who wore gio?
What was the mean time of people who wore calvinklein?
Based on what you’ve learned, if someone wants to talk as long as possible, what cologne should they wear?
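Indexing one vector by a condition on another relies on the vectors being the same length and in the same order. The following sketch uses hypothetical toy vectors (toy.id, toy.group and toy.time) to show the pattern.

```r
# Hypothetical toy vectors, NOT the survey data
toy.id    <- c("x.1", "y.2", "x.3", "y.4")
toy.group <- c("x", "y", "x", "y")
toy.time  <- c(10, 40, 30, 60)

# Values of one vector for the cases where another vector meets a condition
toy.id[toy.group == "x"]

# Mean of a numeric vector within each group
mean(toy.time[toy.group == "x"])  # 20
mean(toy.time[toy.group == "y"])  # 50
```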
Changing data in a vector with indexing
In the next questions, we’ll use indexing and assignment to change the values within a vector. Because we don’t want to change the original data, we’ll make all of our adjustments on new vectors.
Create new objects bar.r, cologne.r and time.r that are copies of the original bar, cologne and time objects (Hint: Just assign the existing vectors to new objects)
- a) In the bar.r vector, change the "grenzwert" values to "g". b) Now change the "paddys" values to "p"
- a) In the cologne.r vector, change the "gio" values to "G". b) Now change the "calvinklein" values to "C"
In the time.r vector, change all time values greater than 280 to 280. Confirm that you did it correctly by calculating the maximum time in time.r
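Combining indexing with assignment can be sketched on made-up data as follows. The vector names, labels and the cutoff of 100 are hypothetical and were chosen only for illustration.

```r
# Hypothetical toy vectors, NOT the survey data
toy.bar  <- c("north", "south", "north")
toy.time <- c(50, 300, 120)

# Work on copies so the originals stay untouched
toy.bar.r  <- toy.bar
toy.time.r <- toy.time

# Replace every element that meets a condition
toy.bar.r[toy.bar.r == "north"] <- "n"
toy.bar.r[toy.bar.r == "south"] <- "s"

# Cap all values above a cutoff at that cutoff
toy.time.r[toy.time.r > 100] <- 100

# Confirm the change, and that the original is unchanged
max(toy.time.r)  # 100
max(toy.time)    # 300
```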
Checkpoint! If you got this far you’re doing great!
Solving a paradox…
- Based on what you’ve learned so far, if someone wanted to talk to people as long as possible, what cologne should they wear?
Let’s see if your prediction holds up!
What was the mean time of people who went to Grenzwert and wore gio?
What was the mean time of people who went to Grenzwert and wore calvinklein?
What was the mean time of people who went to Paddys who wore gio?
What was the mean time of people who went to Paddys who wore calvinklein?
Based on what you’ve learned now, if someone’s goal is to talk to people as long as possible, what cologne should they wear?
You can visualize the data using the following code:
# Combine vectors in a dataframe
survey.df <- data.frame(bar, cologne, time)
# Create a pirateplot of the data (requires the yarrr package)
yarrr::pirateplot(time ~ cologne + bar, data = survey.df)
What you’ve just seen is an example of Simpson’s Paradox. If you want to learn more, check out the Wikipedia page.
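To see how such a reversal can arise, here is a small sketch with made-up numbers. The vectors toy.bar, toy.cologne and toy.time are hypothetical and deliberately tiny; they are not the survey data. Cologne "c" has the higher mean within each bar, yet cologne "g" has the higher mean overall, because most "g" wearers are in the bar where everyone talks longer.

```r
# Hypothetical toy data, constructed to show the reversal
toy.bar     <- c("p", "p", "p", "p", "q", "q", "q", "q")
toy.cologne <- c("g", "g", "g", "c", "g", "c", "c", "c")
toy.time    <- c(100, 100, 100, 110, 10, 20, 20, 20)

# Overall means: "g" looks better
mean(toy.time[toy.cologne == "g"])                   # 77.5
mean(toy.time[toy.cologne == "c"])                   # 42.5

# Means within each bar: "c" is better in both
mean(toy.time[toy.bar == "p" & toy.cologne == "g"])  # 100
mean(toy.time[toy.bar == "p" & toy.cologne == "c"])  # 110
mean(toy.time[toy.bar == "q" & toy.cologne == "g"])  # 10
mean(toy.time[toy.bar == "q" & toy.cologne == "c"])  # 20
```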
Some bigger challenges…
What percent of women wore calvinklein?
What was the median time of people who went to grenzwert and wore gio but who talked more than 100 minutes?
What percent of participants either went to grenzwert and talked for less than 220 minutes or went to paddys and talked for more than 150 minutes but no longer than 250 minutes?
Let’s make the calvinklein wearers look better. For all of the calvinklein wearers, add a random sample from a normal distribution with mean 30 and standard deviation 5 to their original talking times.
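Two techniques help with these challenges: grouping conditions with parentheses when mixing & and |, and adding random values to only part of a vector. The sketch below uses hypothetical toy vectors and cutoffs, and the seed value is arbitrary; it is one possible approach, not the only one.

```r
# Hypothetical toy vectors, NOT the survey data
toy.bar     <- c("p", "q", "q", "p", "q")
toy.cologne <- c("g", "c", "c", "g", "c")
toy.time    <- c(20, 40, 60, 80, 100)

# Parentheses make the grouping of & and | explicit
either <- (toy.bar == "p" & toy.time < 50) |
          (toy.bar == "q" & toy.time > 50 & toy.time <= 90)
mean(either)  # 0.4

# Add normally distributed noise to a subset only
set.seed(1)                  # arbitrary seed, for reproducibility
is.c <- toy.cologne == "c"   # logical vector marking the subset
toy.time.new <- toy.time     # work on a copy
toy.time.new[is.c] <- toy.time[is.c] +
  rnorm(n = sum(is.c), mean = 30, sd = 5)
```

Passing n = sum(is.c) to rnorm() draws exactly one random value per selected element, so each person gets their own increment.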
Submit!
Save and email your wpa_X_LastFirst.R file to me at nathaniel.phillips@unibas.ch. Then, go to https://goo.gl/forms/UblvQ6dvA76veEWu1 to complete the WPA submission form.