We are not collecting data because we were given a data set, so this step is minimal. We only need to decide how we will analyze the data: which test we will use and which hypotheses we will test. We address these questions in later steps.
preference <- read.csv("preference.csv")
preference
## preference primed
## 1 1.8 0
## 2 0.1 0
## 3 4.0 0
## 4 2.1 0
## 5 2.4 0
## 6 3.4 0
## 7 1.7 0
## 8 2.2 0
## 9 1.9 0
## 10 1.9 0
## 11 0.1 0
## 12 3.3 0
## 13 2.1 0
## 14 2.0 0
## 15 1.4 0
## 16 1.6 0
## 17 2.3 0
## 18 1.8 0
## 19 3.2 0
## 20 0.8 0
## 21 1.7 1
## 22 1.7 1
## 23 4.2 1
## 24 3.0 1
## 25 2.9 1
## 26 3.0 1
## 27 4.0 1
## 28 4.1 1
## 29 2.9 1
## 30 2.9 1
## 31 1.2 1
## 32 4.0 1
## 33 3.0 1
## 34 3.9 1
## 35 3.1 1
## 36 2.5 1
## 37 3.2 1
## 38 4.1 1
## 39 3.9 1
## 40 1.1 1
## 41 1.9 1
## 42 3.1 1
There are 42 rows and 2 columns. Each row represents one surveyed subject. The first column, preference, records the subject’s preference rating for the logo, and the second column, primed, indicates whether the subject was primed about the logo (1) or not primed (0).
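The description of the data can be checked directly. The sketch below rebuilds the data frame from the values printed above, as a stand-in for reading preference.csv, and then counts rows, columns, and subjects per group. The name group_sizes is introduced here for illustration only.

```r
# Rebuild the data frame from the printed values (stand-in for preference.csv)
preference <- data.frame(
  preference = c(
    1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
    0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
    1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
    4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1
  ),
  primed = rep(c(0, 1), times = c(20, 22))
)

dim(preference)                          # number of rows, number of columns
group_sizes <- table(preference$primed)  # subjects per group (0 = not primed, 1 = primed)
group_sizes
```

With the printed data, this should show 20 subjects in the unprimed group and 22 in the primed group.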
The purpose of the study is to determine whether being primed before viewing a company’s logo has an impact on a consumer’s opinion of the logo.
library(ggplot2)
ggplot(data=preference, mapping=aes(x=as.factor(primed), y=preference )) + geom_point()
The plot suggests that the primed group has a greater mean preference than the unprimed group.
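As a numerical companion to the plot, the group means can be computed directly. This is a minimal sketch that rebuilds the data from the printed values rather than reading the file; group_means is a name chosen here for illustration.

```r
# Rebuild the data frame from the printed values (stand-in for preference.csv)
preference <- data.frame(
  preference = c(
    1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
    0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
    1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
    4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1
  ),
  primed = rep(c(0, 1), times = c(20, 22))
)

# Mean preference within each group (0 = not primed, 1 = primed)
group_means <- tapply(preference$preference, preference$primed, mean)
group_means
```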
Null hypothesis: the mean preference of the primed group is the same as that of the unprimed group.
Alternative hypothesis: the mean preference of the primed group is not the same as that of the unprimed group.
A t-test is the best choice because it tests hypotheses about the population means of a quantitative variable. A proportion test is not appropriate because it tests hypotheses about population proportions of categorical variables, which are not of interest in this project.
This is a two-sample test because we have two independent groups: primed and not primed.
We will use a Q-Q plot to check the test’s assumption that preference is approximately normally distributed within each group.
ggplot(data=preference) + geom_qq(mapping=aes(sample=preference, color=as.factor(primed)))
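A Q-Q plot is read by eye, so a numerical check can be a useful complement. The sketch below runs a Shapiro-Wilk normality test on each group; this test is not part of the original analysis and is offered only as an optional supplement. The data are rebuilt from the printed values, and shapiro_p is an illustrative name.

```r
# Rebuild the data frame from the printed values (stand-in for preference.csv)
preference <- data.frame(
  preference = c(
    1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
    0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
    1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
    4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1
  ),
  primed = rep(c(0, 1), times = c(20, 22))
)

# Shapiro-Wilk p-value for each group; a small p-value would indicate
# a departure from normality in that group
shapiro_p <- tapply(
  preference$preference,
  preference$primed,
  function(x) shapiro.test(x)$p.value
)
shapiro_p
```

With about 20 subjects per group, this test has limited power, so it is best read alongside the Q-Q plot rather than in place of it.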
The level of significance will be 0.05.
t.test(formula=preference~as.factor(primed), data=preference)
##
## Welch Two Sample t-test
##
## data: preference by as.factor(primed)
## t = -3.2072, df = 39.282, p-value = 0.002666
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.577912 -0.357543
## sample estimates:
## mean in group 0 mean in group 1
## 2.005000 2.972727
Since the p-value (0.002666) is less than the level of significance (0.05), we reject the null hypothesis that the means are equal.
The confidence interval is the range of plausible values for the difference in means (unprimed minus primed). Zero is not included in this interval, so 0 is not a plausible value for the difference in means. It is not plausible that the means are the same.
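To show where the reported numbers come from, the Welch statistic, its degrees of freedom, the p-value, and the confidence interval can be recomputed by hand from the group summaries. This is a sketch under the assumption that the printed values are the full data set; all variable names below are illustrative.

```r
# Rebuild the data frame from the printed values (stand-in for preference.csv)
preference <- data.frame(
  preference = c(
    1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
    0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
    1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
    4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1
  ),
  primed = rep(c(0, 1), times = c(20, 22))
)

x0 <- preference$preference[preference$primed == 0]  # not primed
x1 <- preference$preference[preference$primed == 1]  # primed

# Variance of each group mean (sample variance divided by group size)
v0 <- var(x0) / length(x0)
v1 <- var(x1) / length(x1)

diff_means <- mean(x0) - mean(x1)   # group 0 minus group 1, as in t.test
se <- sqrt(v0 + v1)                 # standard error of the difference
t_stat <- diff_means / se           # Welch t statistic

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (v0 + v1)^2 /
  (v0^2 / (length(x0) - 1) + v1^2 / (length(x1) - 1))

p_value <- 2 * pt(-abs(t_stat), welch_df)                # two-sided p-value
ci <- diff_means + c(-1, 1) * qt(0.975, welch_df) * se   # 95% confidence interval

c(t = t_stat, df = welch_df, p = p_value)
ci
```

These values should agree with the t.test output above, up to rounding.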
We conclude that the means are not equal: the mean preference is higher in the primed group (2.97) than in the unprimed group (2.01).
The results suggest that priming improved people’s preference for the logo.