library(ggplot2)
preference <- read.csv("preference.csv")
preference
##    preference primed
## 1         1.8      0
## 2         0.1      0
## 3         4.0      0
## 4         2.1      0
## 5         2.4      0
## 6         3.4      0
## 7         1.7      0
## 8         2.2      0
## 9         1.9      0
## 10        1.9      0
## 11        0.1      0
## 12        3.3      0
## 13        2.1      0
## 14        2.0      0
## 15        1.4      0
## 16        1.6      0
## 17        2.3      0
## 18        1.8      0
## 19        3.2      0
## 20        0.8      0
## 21        1.7      1
## 22        1.7      1
## 23        4.2      1
## 24        3.0      1
## 25        2.9      1
## 26        3.0      1
## 27        4.0      1
## 28        4.1      1
## 29        2.9      1
## 30        2.9      1
## 31        1.2      1
## 32        4.0      1
## 33        3.0      1
## 34        3.9      1
## 35        3.1      1
## 36        2.5      1
## 37        3.2      1
## 38        4.1      1
## 39        3.9      1
## 40        1.1      1
## 41        1.9      1
## 42        3.1      1
The data set has 42 rows and two columns. Each row is one subject. The preference column records the subject's feelings towards the pet on the label as a numeric score, and the primed column records the group the subject belongs to: 0 for un-primed (20 subjects) and 1 for primed (22 subjects).
The purpose of this study is to determine whether the animal on the label of a bottle makes a person more likely to purchase that drink, and whether priming people toward a certain image affects the choices they make among labels.
For this plot, the explanatory variable (primed) is categorical and the response variable (preference) is quantitative.
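The layout of the data can be checked directly in R. The sketch below rebuilds the printed table inline (an assumption made only so that it runs without preference.csv; the vector names `unprimed` and `primed_scores` are introduced here for illustration) and then reports the dimensions and the number of subjects in each group.

```r
# Scores copied from the printed table; rows 1-20 are un-primed, rows 21-42 are primed.
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed_scores <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
                   4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)

preference <- data.frame(
  preference = c(unprimed, primed_scores),
  primed = rep(c(0, 1), times = c(length(unprimed), length(primed_scores)))
)

data_dim <- dim(preference)              # number of rows, number of columns
group_sizes <- table(preference$primed)  # subjects per group

print(data_dim)
print(group_sizes)
str(preference)                          # variable types
```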
library(ggplot2)
ggplot(data = preference, mapping = aes(x = as.factor(primed), y = preference)) + geom_point()
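As a numeric companion to the plot, the following sketch computes the size, mean, standard deviation, minimum and maximum of each group with base R. It is not part of the original analysis; the data are rebuilt inline from the printed table so that the block runs on its own, and the helper names are assumptions made for illustration.

```r
# Scores copied from the printed table; rows 1-20 are un-primed, rows 21-42 are primed.
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed_scores <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
                   4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(
  preference = c(unprimed, primed_scores),
  primed = rep(c(0, 1), times = c(length(unprimed), length(primed_scores)))
)

# Split the scores by group and summarise each group in one row.
by_group <- split(preference$preference, preference$primed)
group_summary <- t(sapply(by_group, function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}))

print(round(group_summary, 3))
```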
Preference scores tend to be higher for subjects who were primed than for those who were not. Both groups cover a similar range (0.1 to 4.0 for the un-primed group, 1.1 to 4.2 for the primed group). The un-primed group is concentrated around 1.5-2.5, while the primed group is concentrated around 3-4.
## STEP SEVEN: FORMULATE THE NULL HYPOTHESIS
The null hypothesis is that the two population means are the same, so that it does not matter whether the person has been primed or not.
The alternative hypothesis is that the two population means differ. Our expectation is that the mean of the primed population will be larger than that of the un-primed population.
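In symbols, writing $\mu_0$ for the mean preference of the un-primed population and $\mu_1$ for the mean preference of the primed population (notation introduced here, not in the original write-up), the null hypothesis and the two-sided alternative that `t.test` uses by default are:

$$
H_0:\ \mu_0 - \mu_1 = 0 \qquad \text{versus} \qquad H_A:\ \mu_0 - \mu_1 \neq 0
$$

The difference is written as group 0 minus group 1 because that is the order in which R reports it.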
For these data, we will use a t-test because the response variable (preference) is quantitative and we are comparing means.
This is a two-sample test: the two samples are the primed population and the un-primed population.
For the t-test, the main assumption is that the data lie close enough to a Normal (bell shaped) distribution. How close does it have to be? It depends on the sample size: the greater the sample size, the more robust the t-test is to non-Normality. In fact, even for small sample sizes (10 or 11) it is fairly robust, so unless there is strong skewness or substantial outliers we will be OK.
The best way of judging this is with a qq-plot.
ggplot(data=preference) + geom_qq(mapping=aes(sample=preference, color=as.factor(primed)))
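A qq-plot is judged by eye, so a rough numeric companion can help. The sketch below is an optional addition and not part of the original analysis: for each group it computes the correlation between the sample quantiles and the theoretical Normal quantiles, using the base R function `qqnorm()` with plotting switched off. Values close to 1 mean the points lie close to a straight line. The data are rebuilt inline from the printed table so that the block runs on its own.

```r
# Scores copied from the printed table; rows 1-20 are un-primed, rows 21-42 are primed.
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed_scores <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
                   4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(
  preference = c(unprimed, primed_scores),
  primed = rep(c(0, 1), times = c(length(unprimed), length(primed_scores)))
)

# For each group: correlation between theoretical and sample quantiles.
qq_cor <- sapply(split(preference$preference, preference$primed), function(x) {
  qq <- qqnorm(x, plot.it = FALSE)  # qq$x = theoretical, qq$y = sample quantiles
  cor(qq$x, qq$y)
})

print(round(qq_cor, 3))
```

This number does not replace the plot: it summarises straightness but can hide a single outlier that the plot would show.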
We use the conventional level of significance, 0.05.
t.test(formula=preference~primed, data=preference)
##
## Welch Two Sample t-test
##
## data: preference by primed
## t = -3.2072, df = 39.282, p-value = 0.002666
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.577912 -0.357543
## sample estimates:
## mean in group 0 mean in group 1
## 2.005000 2.972727
Since the p-value (0.002666) is less than the level of significance (0.05), we reject the null hypothesis that the means are equal.
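To see where the numbers in the output come from, the Welch statistic can be recomputed by hand. The sketch below uses the standard Welch formulas (standard error from the two sample variances, degrees of freedom from the Welch-Satterthwaite approximation); the data are rebuilt inline from the printed table and the variable names are assumptions made for illustration.

```r
# Scores copied from the printed table; rows 1-20 are un-primed, rows 21-42 are primed.
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed_scores <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
                   4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)

n0 <- length(unprimed)
n1 <- length(primed_scores)

# Variance of each sample mean.
v0 <- var(unprimed) / n0
v1 <- var(primed_scores) / n1

# Welch t statistic: difference in sample means over its standard error.
se <- sqrt(v0 + v1)
t_stat <- (mean(unprimed) - mean(primed_scores)) / se

# Welch-Satterthwaite degrees of freedom.
welch_df <- (v0 + v1)^2 / (v0^2 / (n0 - 1) + v1^2 / (n1 - 1))

# Two-sided p-value.
p_value <- 2 * pt(-abs(t_stat), welch_df)

print(c(t = t_stat, df = welch_df, p = p_value))
```

These values should agree with the `t`, `df` and `p-value` lines printed by `t.test()`.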
The 95% confidence interval for the difference in means runs from -1.577912 to -0.357543. The confidence interval is the range of plausible values for the difference in means. Zero is not in this interval, so 0 is not a plausible value for the difference and it is not plausible that the means are the same. The result of STEP 15 is consistent with the result of STEP 14.
We have concluded that the means are not equal, but we really want to know: is it better to be primed or un-primed? Knowing that the means are unequal, we can answer this question by looking at the sample estimates: primed subjects had a higher mean preference (2.97) than un-primed subjects (2.01).
We have concluded from this evidence that primed subjects have a higher preference for products with an animal on the label than un-primed subjects. Businesses could use this result to argue that recognizable labels make consumers more likely to buy their product.
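The interval can be reconstructed from the other numbers in the `t.test()` output. This sketch uses only the reported values (the two sample means, the t statistic and the degrees of freedom): it recovers the standard error as the difference in means divided by t, then applies the usual formula of estimate plus or minus critical value times standard error. The variable names are assumptions made for illustration, and small rounding differences are expected because the inputs are rounded.

```r
# Values copied from the t.test() output.
mean_unprimed <- 2.005000
mean_primed   <- 2.972727
t_stat        <- -3.2072
welch_df      <- 39.282

diff_means <- mean_unprimed - mean_primed  # group 0 minus group 1
se <- diff_means / t_stat                  # standard error implied by t

# 95% interval: estimate +/- critical t value times standard error.
margin <- qt(0.975, welch_df) * se
ci <- c(lower = diff_means - margin, upper = diff_means + margin)

print(round(ci, 4))
```

Both endpoints are negative, which is another way of saying that zero lies outside the interval.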
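The direction of the difference can also be tested directly with a one-sided test. This is an optional sketch and not part of the original analysis: it passes `alternative = "less"` to `t.test()`, which tests whether the mean of group 0 (un-primed) is less than the mean of group 1 (primed). The data are rebuilt inline from the printed table so that the block runs on its own.

```r
# Scores copied from the printed table; rows 1-20 are un-primed, rows 21-42 are primed.
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed_scores <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
                   4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(
  preference = c(unprimed, primed_scores),
  primed = rep(c(0, 1), times = c(length(unprimed), length(primed_scores)))
)

# One-sided Welch test: is the un-primed mean less than the primed mean?
one_sided <- t.test(preference ~ primed, data = preference, alternative = "less")

print(one_sided$p.value)
```

Because the observed difference is in the tested direction, the one-sided p-value is half of the two-sided p-value reported earlier.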