Coffee Analysis

Just like in our other projects, you’ll want to start by using .libPaths() to point R to the folder where all of the packages are installed and then load the relevant packages. You should also upload coffeedata.RData (if you create a new project for this analysis, you’ll want to upload it into your project folder) and then load the data.

.libPaths("/home/rstudioshared")
library(dplyr)
library(ggplot2)
library(tidyr)
library(knitr)

load("coffeedata.RData")

Loading “coffeedata.RData” adds three data frames to your environment: cup_contents, students and test_results. These tables correspond to the data sheets that you completed while performing the experiment. The first five rows of each table are shown below. You may want to take a closer look using the View() function.

cup_contents
TestId	Cup.A	Cup.B
1	SA	Starbucks
2	Starbucks	Starbucks
3	SA	SA
4	Starbucks	SA
5	Starbucks	SA

students
StudentId	Grade	coffee_per_week	coffee_ability	drink_stanns_coffee
1	9	NA
2	9	NA
3	9	NA
4	9	NA
5	9	2	strong	never

test_results
TestId	StudentId	difference	preference	which_sa
1	208	yes	B	A
2	221	no
3	71	yes	A	A
4	180	yes	B	A
5	111	yes	A	B
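
If you want to confirm exactly which objects a call to load() added, one option is to capture its return value. The sketch below uses two made-up data frames (toy_a and toy_b) saved to a temporary file as a stand-in for coffeedata.RData, so none of these names come from the class data.

```r
# Toy objects saved to a temporary file, standing in for coffeedata.RData
toy_a <- data.frame(x = 1:2)
toy_b <- data.frame(y = c("a", "b"))
tmp <- tempfile(fileext = ".RData")
save(toy_a, toy_b, file = tmp)
rm(toy_a, toy_b)

# load() restores the saved objects and invisibly returns their names
loaded <- load(tmp)
print(loaded)

# head() shows the first rows of a table without opening the viewer
head(toy_a, 5)
```

With the real file, the same pattern would be loaded <- load("coffeedata.RData"), followed by head() on each of the three tables.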

To understand our results, we will need to join these tables. Notice that test_results and cup_contents each have a column entitled TestId that allows us to match up student answers with the actual contents of their cups. We can do that as follows:

join1 <- left_join(test_results, cup_contents, by="TestId")

Before continuing, take a look at join1. Left joins are only one of several types of joins. You can see the other options on the data wrangling cheat sheet or see an image that summarizes the different types here.
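
To see how the join types differ, it can help to try them on a tiny example. The sketch below uses two made-up tables (results and cups are hypothetical names, not the class data) in which one TestId is missing from each side.

```r
library(dplyr)

# Made-up toy tables, not the class data
results <- data.frame(TestId = c(1, 2, 4), difference = c("yes", "no", "yes"))
cups <- data.frame(TestId = c(1, 2, 3), Cup.A = c("SA", "Starbucks", "SA"))

# left_join keeps every row of results; TestId 4 has no match, so Cup.A is NA
left <- left_join(results, cups, by = "TestId")

# inner_join keeps only the TestIds found in both tables
inner <- inner_join(results, cups, by = "TestId")

# full_join keeps every TestId found in either table
full <- full_join(results, cups, by = "TestId")

# anti_join lists the rows of results that have no match in cups
unmatched <- anti_join(results, cups, by = "TestId")

print(left)
print(inner)
print(full)
print(unmatched)
```

Running anti_join on the real tables is a quick way to check whether any test is missing from cup_contents.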

You may also want to join the survey results with your coffee tasting results. Notice that both the test_results and students tables have a column entitled StudentId. We can use that column to add the survey results to the join we just made as follows:

full_results <- left_join(join1, students, by="StudentId")
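
A left join can add rows if the key column in the second table contains repeats, so it is worth checking the key before trusting the result. The sketch below uses made-up tables (tests and survey are hypothetical names) in which one StudentId deliberately appears twice; the real students table may not have this problem at all.

```r
library(dplyr)

# Made-up toy tables; StudentId 11 is deliberately repeated in survey
tests <- data.frame(TestId = 1:3, StudentId = c(10, 11, 99))
survey <- data.frame(StudentId = c(10, 11, 11), Grade = c(9, 10, 10))

# anyDuplicated() returns 0 when every key is unique,
# otherwise the position of the first repeated value
dup_index <- anyDuplicated(survey$StudentId)

joined <- left_join(tests, survey, by = "StudentId")

# The repeated key turns 3 tests into 4 rows
print(nrow(tests))
print(nrow(joined))

# StudentId 99 has no survey row, so its Grade is NA
print(sum(is.na(joined$Grade)))
```

For the class data, the analogous checks would be anyDuplicated(students$StudentId) and comparing nrow(full_results) with nrow(test_results).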

Analysis

First, let’s add a column named real.difference that keeps track of whether the two cups really contained different types of coffee.

full_results <- full_results %>% mutate(real.difference = Cup.A!=Cup.B)

Now, let’s group the data by whether students perceived a difference as well as whether there was truly a difference.

full_results %>% group_by(difference, real.difference) %>% summarize(n=length(TestId))
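
Raw counts can be hard to compare when the groups differ in size, so you may also want proportions. The sketch below uses a made-up table (toy) that borrows the column names from full_results; it first counts with count(), which is dplyr shorthand for group_by() followed by summarize(n = n()), and then computes the share of each answer within each value of real.difference. The column name share is an assumption chosen for illustration.

```r
library(dplyr)

# Made-up toy data with the same column names as full_results
toy <- data.frame(
  TestId = 1:6,
  difference = c("yes", "yes", "no", "yes", "no", "no"),
  real.difference = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
)

# One row per combination of perceived and real difference
counts <- toy %>% count(difference, real.difference)

# Share of "yes" and "no" answers within each value of real.difference
shares <- counts %>%
  group_by(real.difference) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

print(shares)
```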

1. What, if anything, do these results tell you about students’ abilities to identify a difference between two types of coffee?

Next, let’s limit our data to tests where there really was a difference. Try using the code below.

guess_subset <- filter(full_results, difference=="yes" & real.difference==TRUE)
guess_subset %>% group_by(Cup.A, which_sa) %>% summarize(n=length(TestId))
guess_subset %>% group_by(Cup.A, preference) %>% summarize(n=length(TestId))
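
Another way to read these tables is to add a column that records whether each guess was right. The sketch below is one possible approach on made-up rows (toy); the column names actual_sa and correct are assumptions introduced here, and the logic only makes sense for tests where the two cups really differ, as in guess_subset.

```r
library(dplyr)

# Made-up rows in which the two cups always differ
toy <- data.frame(
  TestId = 1:4,
  Cup.A = c("SA", "Starbucks", "Starbucks", "SA"),
  Cup.B = c("Starbucks", "SA", "SA", "Starbucks"),
  which_sa = c("A", "A", "B", "B")
)

toy <- toy %>%
  mutate(
    # Which cup actually held the SA coffee (valid only when the cups differ)
    actual_sa = ifelse(Cup.A == "SA", "A", "B"),
    # TRUE when the student's guess matches the actual cup
    correct = which_sa == actual_sa
  )

# Number of tests, number of correct guesses, and share correct
overall <- toy %>%
  summarize(n = n(), n_correct = sum(correct), share_correct = mean(correct))

print(overall)
```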

2. How often did students correctly identify Saint Ann’s coffee?

3. How often did students prefer Saint Ann’s coffee?

Finally, how did student preferences depend on which drink they thought was in each cup? We can do this analysis using only the tests that truly involved two different coffees or using all tests.

guess_subset %>% group_by(which_sa, preference) %>% summarize(n=length(TestId))
full_results %>% filter(difference=="yes") %>% group_by(which_sa, preference) %>% summarize(n=length(TestId))

4. How did student preferences depend on what they perceived to be in each cup?

5. What do the survey results tell us about Saint Ann’s students? How often do they drink coffee, what coffee do they drink, and how do they rate their own ability to distinguish between coffee types?

6. Is there any relationship between how often students drink coffee or their self-rated ability to distinguish between coffee types and their actual success in identifying Saint Ann’s coffee?

Probability and Statistics

November 23, 2015