Because the data were handed to us in advance, we do not need to formulate a hypothesis before seeing them. This is not standard practice in statistics, but in this case we will look at the data first and skip the design stage.
preference <- read.csv("preference.csv")
preference
## preference primed
## 1 1.8 0
## 2 0.1 0
## 3 4.0 0
## 4 2.1 0
## 5 2.4 0
## 6 3.4 0
## 7 1.7 0
## 8 2.2 0
## 9 1.9 0
## 10 1.9 0
## 11 0.1 0
## 12 3.3 0
## 13 2.1 0
## 14 2.0 0
## 15 1.4 0
## 16 1.6 0
## 17 2.3 0
## 18 1.8 0
## 19 3.2 0
## 20 0.8 0
## 21 1.7 1
## 22 1.7 1
## 23 4.2 1
## 24 3.0 1
## 25 2.9 1
## 26 3.0 1
## 27 4.0 1
## 28 4.1 1
## 29 2.9 1
## 30 2.9 1
## 31 1.2 1
## 32 4.0 1
## 33 3.0 1
## 34 3.9 1
## 35 3.1 1
## 36 2.5 1
## 37 3.2 1
## 38 4.1 1
## 39 3.9 1
## 40 1.1 1
## 41 1.9 1
## 42 3.1 1
We have two columns, preference and primed, and 42 rows of data. Preference varies from 0 to 6 based on how likely the subject is to buy the product. In the primed column, 0 stands for unprimed, which means the subject has no preconceived thoughts or notions about the animal on the product. 1 stands for primed, which means that the subject has thought about the animal before and therefore has preconceived thoughts about it that may influence their decision on whether or not to buy the product.
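The row and group counts can be checked directly. The sketch below re-types the values from the printed output above rather than reading preference.csv, so it assumes the file holds exactly these one-decimal values; the vector names unprimed and primed are introduced here only for illustration.

```r
# Values re-typed from the printed data frame (assumed to match preference.csv).
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
            4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)

preference <- data.frame(
  preference = c(unprimed, primed),
  primed     = rep(c(0, 1), times = c(length(unprimed), length(primed)))
)

n_rows      <- nrow(preference)          # total number of subjects
group_sizes <- table(preference$primed)  # subjects per group (0 and 1)

print(n_rows)
print(group_sizes)
print(range(preference$preference))      # observed range of the ratings
```

With these values there are 20 unprimed and 22 primed subjects.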
The purpose of the study is to evaluate whether a person’s favorability toward a product with an animal on it differs depending on whether they are primed or unprimed. Is there a potential association between being primed or unprimed and favoring the product with the animal on it?
It is always a good idea to visualize your data as soon as you load it. Technically this should be done after you design the study, but as mentioned above, we are going to switch the order around. Load the graphics library, then plot:
library(ggplot2)
ggplot(data=preference, mapping=aes(x=as.factor(primed), y=preference)) + geom_point()
The primed group (1) has more points at high preference values. Its points are concentrated near the top with a tail toward lower values, so it looks left skewed. The unprimed group (0) has no strong concentration of high preferences; its points cluster at lower values, particularly around 1.5 to 2.5, and it looks more right skewed.
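A rough numeric companion to the plot is to compare each group's mean with its median: a mean below the median hints at a tail toward low values, and a mean above it hints at a tail toward high values. This is only a heuristic, and the sketch again assumes the data equal the printed values.

```r
# Values re-typed from the printed data frame (assumed to match preference.csv).
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
            4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(
  preference = c(unprimed, primed),
  primed     = rep(c(0, 1), times = c(length(unprimed), length(primed)))
)

# Mean and median of the ratings within each group.
group_means   <- tapply(preference$preference, preference$primed, mean)
group_medians <- tapply(preference$preference, preference$primed, median)

print(rbind(mean = group_means, median = group_medians))
```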
The null hypothesis is that the mean of the unprimed preferences equals the mean of the primed preferences. It does not matter whether you have been primed or unprimed; the difference of the means is 0.
The alternative hypothesis is that the mean of the unprimed preferences is less than the mean of the primed preferences: mean of unprimed preferences < mean of primed preferences.
We use a t-test, because we are testing hypotheses about population means of a quantitative variable.
The correct choice is a two-sample test: we have one sample for primed and one sample for unprimed.
For the t-test, the main assumption is that the data lie close enough to a Normal (bell-shaped) distribution. How close do they have to be? It depends on the sample size: the greater the sample size, the more robust the t-test is to non-Normality. In fact, even for small sample sizes (10 or 11) it is fairly robust, so unless there is strong skewness or substantial outliers we will be OK.
The best way of judging this is with a qq-plot.
gg <- ggplot(data=preference)
gg + geom_qq(mapping=aes(sample=preference, color=as.factor(primed)))
We deem this normal enough for the test.
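A reference line can make departures from Normality easier to judge. The following is a minimal sketch using base R graphics instead of ggplot2, with one panel per group. It assumes the data equal the printed values, and the object names qq_unprimed and qq_primed are illustrative only.

```r
# Values re-typed from the printed data frame (assumed to match preference.csv).
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
            4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)

old_par <- par(mfrow = c(1, 2))  # two panels side by side

# qqnorm() draws the plot and invisibly returns the plotted coordinates;
# qqline() adds a line through the first and third quartiles.
qq_unprimed <- qqnorm(unprimed, main = "Unprimed (0)")
qqline(unprimed)
qq_primed <- qqnorm(primed, main = "Primed (1)")
qqline(primed)

par(old_par)  # restore the previous graphics settings
```

Points that stay close to the line in each panel support treating the group as approximately Normal.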
It is always a safe bet to use the traditional level of significance 0.05.
t.test(formula=preference~primed, data=preference)
##
## Welch Two Sample t-test
##
## data: preference by primed
## t = -3.2072, df = 39.282, p-value = 0.002666
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.577912 -0.357543
## sample estimates:
## mean in group 0 mean in group 1
## 2.005000 2.972727
Our p-value is 0.002666, which is less than 0.05, our level of significance. Since the p-value is less than the level of significance, we reject the null hypothesis that the mean preference of primed subjects is the same as the mean preference of unprimed subjects. The difference in means does not equal 0.
We have a 95% confidence interval of (-1.577912, -0.357543).
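The call above tests a two-sided alternative, while the alternative stated earlier (unprimed mean < primed mean) is one-sided. If the one-sided version is wanted, t.test() accepts alternative = "less". Since group 0 comes first, "less" means mean of group 0 minus mean of group 1 is below 0. The sketch assumes the data equal the printed values.

```r
# Values re-typed from the printed data frame (assumed to match preference.csv).
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
            4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(
  preference = c(unprimed, primed),
  primed     = rep(c(0, 1), times = c(length(unprimed), length(primed)))
)

# Two-sided Welch test (the default) and its one-sided counterpart.
two_sided <- t.test(preference ~ primed, data = preference)
one_sided <- t.test(preference ~ primed, data = preference,
                    alternative = "less")

print(two_sided$p.value)
print(one_sided$p.value)
```

Because the t statistic is negative, the one-sided p-value is half the two-sided one, so the decision to reject the null hypothesis does not change.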
The confidence interval is the range of plausible values for the difference in means. Zero is not in this interval. Therefore 0 is not a plausible value for the difference in means, so it is not plausible that the means are the same. The result of STEP 15 is consistent with the result of STEP 14.
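To see where the interval comes from, it can be rebuilt by hand as (difference in sample means) ± (t critical value) × (standard error), using the Welch-Satterthwaite degrees of freedom that t.test() uses by default. This is a sketch under the assumption that the data equal the printed values; the variable names are illustrative.

```r
# Values re-typed from the printed data frame (assumed to match preference.csv).
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
            4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)

# Per-group variance of the sample mean (s^2 / n).
v0 <- var(unprimed) / length(unprimed)
v1 <- var(primed) / length(primed)

diff_means <- mean(unprimed) - mean(primed)  # group 0 minus group 1
std_error  <- sqrt(v0 + v1)                  # standard error of the difference

# Welch-Satterthwaite approximation to the degrees of freedom.
welch_df <- (v0 + v1)^2 /
  (v0^2 / (length(unprimed) - 1) + v1^2 / (length(primed) - 1))

# 95% interval: 2.5% in each tail of the t distribution.
manual_ci <- diff_means + c(-1, 1) * qt(0.975, df = welch_df) * std_error

print(welch_df)
print(manual_ci)
```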
We have concluded that the means are not equal, but we really want to know: are the preference ratings higher when the subjects are primed or unprimed? Knowing that the means are unequal, we can answer this question by looking at the sample estimates: primed subjects had higher, more favorable preference ratings (mean 2.973) than unprimed subjects (mean 2.005).
We have evidence that primed subjects have a higher, more favorable preference for items with an animal on them than unprimed subjects. This suggests that businesses might want to use more widely recognizable animals on their products, animals that most consumers are already primed to. Doing so may increase consumers’ preference for the product and therefore make them more likely to purchase it.