Just like in our other projects, you’ll want to start by using .libPaths() to point R to the folder where all of the packages are installed and then load the relevant packages. You should also upload coffeedata.RData (if you create a new project for this analysis, you’ll want to upload it into your project folder) and then load the data.

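# Point R to the shared folder where the packages are installed
.libPaths("/home/rstudioshared")

# Load the packages used in this project
library(dplyr)
library(ggplot2)
library(tidyr)
library(knitr)

# Load the data; this creates the data frames described below
load("coffeedata.RData")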
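If you want to confirm exactly what load() added, it invisibly returns the names of the objects it restored. The sketch below shows this on a temporary file holding two small hypothetical data frames. With the real file you would simply run `loaded <- load("coffeedata.RData")` and then print `loaded`.

```r
# Hypothetical toy stand-ins for two of the tables, saved to a temporary file
cup_contents <- data.frame(TestId = 1:2, Cup.A = c("SA", "Starbucks"),
                           Cup.B = c("Starbucks", "Starbucks"))
students <- data.frame(StudentId = 1:2, Grade = c(9, 9))
path <- tempfile(fileext = ".RData")
save(cup_contents, students, file = path)

# Remove the objects, then reload them; load() returns the names it restored
rm(cup_contents, students)
loaded <- load(path)
print(loaded)  # "cup_contents" "students"
```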

Loading “coffeedata.RData” adds three data frames to your environment: cup_contents, students and test_results. These tables correspond to the data sheets that you completed while performing the experiment. The first five rows of each table are shown below. You may want to take a closer look using the View() function.

cup_contents

TestId  Cup.A      Cup.B
1       SA         Starbucks
2       Starbucks  Starbucks
3       SA         SA
4       Starbucks  SA
5       Starbucks  SA

students

StudentId  Grade  coffee_per_week  coffee_ability  drink_stanns_coffee
1          9      NA
2          9      NA
3          9      NA
4          9      NA
5          9      2                strong          never

test_results

TestId  StudentId  difference  preference  which_sa
1       208        yes         B           A
2       221        no
3       71         yes         A           A
4       180        yes         B           A
5       111        yes         A           B
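If you want to experiment with the functions in this project without the full file, you can type the five rows shown above into small toy data frames. This is only a sketch: it reproduces the displayed rows, not the complete tables, and it assumes the blank cells are stored as empty strings ("").

```r
# Toy copies of the first five rows shown above (not the full data set).
# Assumption: blank cells in the printed tables are empty strings ("").
cup_contents <- data.frame(
  TestId = 1:5,
  Cup.A = c("SA", "Starbucks", "SA", "Starbucks", "Starbucks"),
  Cup.B = c("Starbucks", "Starbucks", "SA", "SA", "SA")
)
students <- data.frame(
  StudentId = 1:5,
  Grade = 9,  # recycled to all five rows
  coffee_per_week = c(NA, NA, NA, NA, 2),
  coffee_ability = c("", "", "", "", "strong"),
  drink_stanns_coffee = c("", "", "", "", "never")
)
test_results <- data.frame(
  TestId = 1:5,
  StudentId = c(208, 221, 71, 180, 111),
  difference = c("yes", "no", "yes", "yes", "yes"),
  preference = c("B", "", "A", "B", "A"),
  which_sa = c("A", "", "A", "A", "B")
)
str(test_results)
```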

To understand our results, we will need to join these tables. Notice that test_results and cup_contents each have a column named TestId that lets us match each student's answers with the actual contents of their cups. We can do that as follows:

join1 <- left_join(test_results, cup_contents, by="TestId")

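Before continuing, take a look at join1. A left join keeps every row of test_results and adds the matching Cup.A and Cup.B values from cup_contents; if a TestId had no match, those columns would be NA. Left joins are only one of several types of joins. You can see the other options on the data wrangling cheat sheet or see an image that summarizes the different types here.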
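The difference between a left join and an inner join shows up when a key has no match. The sketch below uses two hypothetical toy tables in which TestId 3 is missing from the cup table: left_join() keeps that test and fills its cup columns with NA, while inner_join() drops it.

```r
library(dplyr)

# Hypothetical toy tables: TestId 3 has no matching row in `cups`
tests <- data.frame(TestId = c(1, 2, 3), difference = c("yes", "no", "yes"))
cups  <- data.frame(TestId = c(1, 2), Cup.A = c("SA", "Starbucks"),
                    Cup.B = c("Starbucks", "Starbucks"))

# left_join keeps every row of `tests`; unmatched cup columns become NA
left  <- left_join(tests, cups, by = "TestId")
# inner_join keeps only the TestIds found in both tables
inner <- inner_join(tests, cups, by = "TestId")

nrow(left)   # 3
nrow(inner)  # 2
left$Cup.A   # "SA" "Starbucks" NA
```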

You may also want to join the survey results with your coffee tasting results. Notice that both the test_results and students tables have a column entitled StudentId. We can use that column to add the survey results to the join we just made as follows:

full_results <- left_join(join1, students, by="StudentId")
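A left join adds one output row for every match, so if a StudentId appeared more than once in students, the matching test results would be silently duplicated. A quick check on the real data is `nrow(full_results) == nrow(test_results)`. The sketch below shows the failure mode on hypothetical toy tables and how to find the repeated keys.

```r
library(dplyr)

# Hypothetical toy tables; StudentId 7 appears twice in `survey` by mistake
results <- data.frame(TestId = 1:3, StudentId = c(7, 8, 9))
survey  <- data.frame(StudentId = c(7, 7, 8), Grade = c(9, 9, 10))

joined <- left_join(results, survey, by = "StudentId")
nrow(results)  # 3
nrow(joined)   # 4: test 1 was duplicated because StudentId 7 matched twice

# Find the keys that appear more than once in the lookup table
dups <- survey %>% count(StudentId) %>% filter(n > 1)
print(dups)
```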

Analysis

First, let’s add a column named real.difference that keeps track of whether the two cups really contained different types of coffee.

full_results <- full_results %>% mutate(real.difference = Cup.A!=Cup.B)
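To see what this mutate() step computes, here it is applied to just the five cup_contents rows shown earlier (a toy subset, not the full data). The != comparison works row by row, so each test gets its own TRUE or FALSE; if either cup were recorded as NA, the result for that row would be NA.

```r
library(dplyr)

# The five cup_contents rows shown earlier (toy subset, not the full data)
cups <- data.frame(
  TestId = 1:5,
  Cup.A = c("SA", "Starbucks", "SA", "Starbucks", "Starbucks"),
  Cup.B = c("Starbucks", "Starbucks", "SA", "SA", "SA")
)

# != compares the two columns element by element, one TRUE/FALSE per test
cups <- cups %>% mutate(real.difference = Cup.A != Cup.B)
cups$real.difference  # TRUE FALSE FALSE TRUE TRUE
```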

Now, let’s group the data by whether students perceived a difference as well as whether there was truly a difference.

full_results %>% group_by(difference, real.difference) %>% summarize(n=length(TestId))
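The same table can be produced with count(), which is dplyr shorthand for group_by() followed by summarize(n = n()). It can also help to reduce the table to a single agreement rate: the share of tests where the student's answer matched whether the cups really differed. The sketch below uses hypothetical toy rows; on the real data, add `na.rm = TRUE` to mean() if some answers are missing.

```r
library(dplyr)

# Hypothetical toy results: did the student report a difference,
# and was there really one?
toy <- data.frame(
  TestId = 1:6,
  difference = c("yes", "no", "yes", "yes", "no", "yes"),
  real.difference = c(TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
)

# count() gives the same grouped counts as group_by() + summarize()
counts <- toy %>% count(difference, real.difference)
print(counts)

# Share of tests where the perceived answer matched reality
accuracy <- toy %>%
  summarize(agree = mean((difference == "yes") == real.difference))
accuracy$agree  # 4 of 6 tests agree
```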

1. What, if anything, do these results tell you about students’ abilities to identify a difference between two types of coffee?

Next, let’s limit our data to tests where there really was a difference and the student also reported noticing one. Try using the code below.

guess_subset <- filter(full_results, difference=="yes" & real.difference==TRUE)
guess_subset %>% group_by(Cup.A, which_sa) %>% summarize(n=length(TestId))
guess_subset %>% group_by(Cup.A, preference) %>% summarize(n=length(TestId))
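The grouped counts above can also be turned into yes/no flags per test. The sketch below works out which cup label actually held Saint Ann's coffee, then checks whether the student's guess and preference point to that cup. It assumes the cup values are "SA"/"Starbucks" and the answers are "A"/"B", as in the tables above, and it is only valid when exactly one cup held Saint Ann's coffee (true for guess_subset, since real.difference is TRUE). The column names sa_cup, correct_sa and prefers_sa are hypothetical, and the rows are toy data.

```r
library(dplyr)

# Hypothetical toy rows where the cups really differed
toy <- data.frame(
  Cup.A = c("SA", "Starbucks", "SA", "Starbucks"),
  Cup.B = c("Starbucks", "SA", "Starbucks", "SA"),
  which_sa = c("A", "A", "B", "B"),
  preference = c("A", "B", "B", "A")
)

toy <- toy %>% mutate(
  # The cup label ("A" or "B") that actually held Saint Ann's coffee
  sa_cup = ifelse(Cup.A == "SA", "A", "B"),
  # Did the student point to the right cup?
  correct_sa = which_sa == sa_cup,
  # Did the student prefer the cup holding Saint Ann's coffee?
  prefers_sa = preference == sa_cup
)

mean(toy$correct_sa)  # 0.5
mean(toy$prefers_sa)  # 0.5
```

On the real data you could apply the same mutate() to guess_subset and summarize with, for example, `mean(correct_sa, na.rm = TRUE)`.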

2. How often did students correctly identify Saint Ann’s coffee?

3. How often did students prefer Saint Ann’s coffee?

Finally, how did student preferences depend on which drink they thought was in each cup? We can do this analysis either for tests that truly involved two different coffees (guess_subset) or for all tests in which the student reported a difference.

guess_subset %>% group_by(which_sa, preference) %>% summarize(n=length(TestId))
full_results %>% filter(difference=="yes") %>% group_by(which_sa, preference) %>% summarize(n=length(TestId))
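A compact way to read the which_sa by preference table is to flag each test where the student preferred the cup they believed was Saint Ann's. The sketch below does this on hypothetical toy rows; the column name prefers_believed_sa is illustrative, not part of the data.

```r
library(dplyr)

# Hypothetical toy rows: which cup the student thought was Saint Ann's,
# and which cup they preferred
toy <- data.frame(
  which_sa = c("A", "B", "A", "B", "A"),
  preference = c("A", "A", "B", "B", "A")
)

# TRUE when the preferred cup is the one the student believed was Saint Ann's
summary_tbl <- toy %>%
  mutate(prefers_believed_sa = preference == which_sa) %>%
  count(prefers_believed_sa)
print(summary_tbl)
```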

4. How did student preferences depend on what they perceived to be in each cup?

5. What do the survey results tell us about Saint Ann’s students? How often do they drink coffee, what coffee do they drink and how do they rate their own ability to distinguish between coffee types?
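One way to approach question 5 is to summarize each survey column separately, skipping students who left a question blank. Working from students rather than full_results matters if a student completed more than one test, because full_results has one row per test and would count that student several times. The sketch below uses a small hypothetical survey table with the same column names as students; the answer labels other than "strong" and "never" are invented for illustration, so check the real categories with count() first.

```r
library(dplyr)

# Hypothetical survey rows (same columns as `students`; values are made up)
survey <- data.frame(
  coffee_per_week = c(NA, 0, 2, 5, 2),
  coffee_ability = c("", "weak", "strong", "strong", "average"),
  drink_stanns_coffee = c("", "never", "never", "often", "sometimes")
)

# Numeric question: drop missing answers, then summarize
intake <- survey %>%
  filter(!is.na(coffee_per_week)) %>%
  summarize(n = n(), mean_cups = mean(coffee_per_week),
            median_cups = median(coffee_per_week))

# Categorical questions: drop blank answers, then tabulate
ability <- survey %>% filter(coffee_ability != "") %>% count(coffee_ability)
stanns  <- survey %>% filter(drink_stanns_coffee != "") %>% count(drink_stanns_coffee)

print(intake)
print(ability)
print(stanns)
```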

6. Is there any relationship between how often students drink coffee or their self-rated ability to distinguish between coffee types and their actual success in identifying Saint Ann’s coffee?
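A sketch for question 6, assuming you have already added a logical column (hypothetically called correct_sa) that is TRUE when the student picked the cup that actually held Saint Ann's coffee. The idea is to compare success rates across self-rated ability groups, and to compare coffee intake between students who succeeded and those who did not. The rows below are toy data; with only a few students per group, any difference you see may just be noise.

```r
library(dplyr)

# Hypothetical toy rows: survey answers plus a correct_sa flag
toy <- data.frame(
  coffee_ability = c("strong", "strong", "weak", "weak", "weak", ""),
  coffee_per_week = c(5, 3, 0, 1, NA, 2),
  correct_sa = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Success rate within each self-rated ability group (blank ratings dropped)
by_ability <- toy %>%
  filter(coffee_ability != "") %>%
  group_by(coffee_ability) %>%
  summarize(n = n(), pct_correct = mean(correct_sa))
print(by_ability)

# Average weekly coffee for students who succeeded versus those who did not
by_success <- toy %>%
  filter(!is.na(coffee_per_week)) %>%
  group_by(correct_sa) %>%
  summarize(n = n(), mean_cups = mean(coffee_per_week))
print(by_success)
```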