Preferences

STEP 1: Design an Experiment

Because the data were given to us in advance, we postpone formulating a hypothesis until after we have looked at the data. This is not standard practice in statistics, where the hypotheses and the design normally come first, but here we skip the design step and work with the data as provided.

STEP 2: Collect or Load Data

preference <- read.csv("preference.csv")
preference

##    preference primed
## 1         1.8      0
## 2         0.1      0
## 3         4.0      0
## 4         2.1      0
## 5         2.4      0
## 6         3.4      0
## 7         1.7      0
## 8         2.2      0
## 9         1.9      0
## 10        1.9      0
## 11        0.1      0
## 12        3.3      0
## 13        2.1      0
## 14        2.0      0
## 15        1.4      0
## 16        1.6      0
## 17        2.3      0
## 18        1.8      0
## 19        3.2      0
## 20        0.8      0
## 21        1.7      1
## 22        1.7      1
## 23        4.2      1
## 24        3.0      1
## 25        2.9      1
## 26        3.0      1
## 27        4.0      1
## 28        4.1      1
## 29        2.9      1
## 30        2.9      1
## 31        1.2      1
## 32        4.0      1
## 33        3.0      1
## 34        3.9      1
## 35        3.1      1
## 36        2.5      1
## 37        3.2      1
## 38        4.1      1
## 39        3.9      1
## 40        1.1      1
## 41        1.9      1
## 42        3.1      1

STEP 3: Describe the Data

We have two columns, preference and primed, and 42 rows of data (20 unprimed and 22 primed). Preference varies from 0 to 6 based on how likely the subject is to buy the product. In the primed column, 0 stands for unprimed, which means the subject has no preconceived thoughts or notions about the animal on the product. 1 stands for primed, which means that the subject has thought about the animal before and therefore has preconceived thoughts about it that may influence their decision on whether or not to buy the product.
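
A quick way to confirm the shape of the data is to count rows and group sizes directly. The sketch below assumes nothing beyond base R; the data frame is rebuilt by hand from the STEP 2 printout so that the block runs without preference.csv.

```r
# Values copied from the STEP 2 printout: rows 1-20 are unprimed, rows 21-42 are primed.
preference <- data.frame(
  preference = c(
    1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
    0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
    1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9,
    1.2, 4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1,
    1.9, 3.1
  ),
  primed = rep(c(0, 1), times = c(20, 22))
)

n_rows      <- nrow(preference)              # total number of subjects
group_sizes <- table(preference$primed)      # subjects per group (0 = unprimed, 1 = primed)
obs_range   <- range(preference$preference)  # smallest and largest observed preference

n_rows
group_sizes
obs_range
```

The observed values only span part of the 0 to 6 scale, which is worth keeping in mind when reading the plots.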

STEP 4: Identify the purpose of the study

The purpose of the study is to evaluate whether a person’s favorability toward a product with an animal on it differs depending on whether they are primed or unprimed. Is there an association between being primed and favoring the product with the animal on it?

STEP 5: Visualize data

It is always a good idea to visualize your data as soon as you load it. Technically this should be done after you design the study, but as mentioned above, we are going to switch the order around. Load the graphics library, then plot:

library(ggplot2)
ggplot(data = preference, mapping = aes(x = as.factor(primed), y = preference)) + geom_point()

STEP 6: Interpret the plot

The primed group (1) has more points at high preference values, and its distribution is left skewed. The unprimed group (0) shows no strong concentration of high preferences. Its points cluster at lower values instead, particularly around 1.5 to 2.5, and its distribution is more right skewed.
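
The visual impression of skew can be cross-checked numerically by comparing each group's mean with its median. This is only a rough sketch in base R, not part of the original analysis; the data frame is rebuilt from the STEP 2 printout so the block is self-contained.

```r
# Values copied from the STEP 2 printout: rows 1-20 are unprimed, rows 21-42 are primed.
preference <- data.frame(
  preference = c(
    1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
    0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
    1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9,
    1.2, 4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1,
    1.9, 3.1
  ),
  primed = rep(c(0, 1), times = c(20, 22))
)

# Mean and median of preference within each group.
group_mean   <- tapply(preference$preference, preference$primed, mean)
group_median <- tapply(preference$preference, preference$primed, median)

# A mean above the median hints at right skew; a mean below it hints at left skew.
rbind(mean = group_mean, median = group_median)
```

For the unprimed group the mean (2.005) sits slightly above the median (1.95), and for the primed group the mean (about 2.97) sits slightly below the median (3.0). Both gaps are small, so the skew in either direction is mild.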

STEP 7: Formulate the null hypothesis.

The mean of the unprimed preferences equals the mean of the primed preferences. In other words, it does not matter whether a subject has been primed or not: the difference of the means is 0.

STEP 8: Formulate the alternative hypothesis.

The mean of the unprimed preferences is less than the mean of the primed preferences. Mean of unprimed preferences < mean of primed preferences.

STEP 9: Decide on type of test.

T-test, because we are testing hypotheses about population means of a quantitative variable.

STEP 10: Choose one sample or two.

The correct choice is two sample: we have one sample for primed subjects and one sample for unprimed subjects.

STEP 11: Check assumptions of the test

For the t-test, the main assumption is that the data lie close enough to a Normal (bell shaped) distribution. How close does it have to be? It depends on the sample size: the greater the sample size, the more robust the t-test is to non-Normality. In fact, it is fairly robust even for small sample sizes (10 or 11), so unless there is strong skewness or there are substantial outliers we will be OK.

The best way of judging this is with a qq-plot.

gg <- ggplot(data=preference)
gg + geom_qq(mapping=aes(sample=preference, color=as.factor(primed)))

We deem this normal enough for the test.
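
As a numerical complement to the qq-plot, one option is a Shapiro-Wilk test on each group. This is an addition for illustration, not a step the original analysis performed, and with samples of about 20 the test has limited power, so the plot remains the main guide. The sketch uses base R's shapiro.test and rebuilds the data from the STEP 2 printout.

```r
# Values copied from the STEP 2 printout: rows 1-20 are unprimed, rows 21-42 are primed.
preference <- data.frame(
  preference = c(
    1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
    0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
    1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9,
    1.2, 4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1,
    1.9, 3.1
  ),
  primed = rep(c(0, 1), times = c(20, 22))
)

# Split the preference scores by group and run the Shapiro-Wilk test on each.
by_group   <- split(preference$preference, preference$primed)
sw_results <- lapply(by_group, shapiro.test)

# One p-value per group; a small p-value would signal departure from Normality.
sw_pvalues <- sapply(sw_results, function(res) res$p.value)
sw_pvalues
```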

STEP 12: Decide on a level of significance of the test

It is always a safe bet to use the traditional level of significance 0.05.

STEP 13: Perform the test

t.test(formula=preference~primed, data=preference)

## 
##  Welch Two Sample t-test
## 
## data:  preference by primed
## t = -3.2072, df = 39.282, p-value = 0.002666
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.577912 -0.357543
## sample estimates:
## mean in group 0 mean in group 1 
##        2.005000        2.972727
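
Note that t.test uses a two-sided alternative by default, while the alternative hypothesis in STEP 8 is one-sided (unprimed mean less than primed mean). A minimal sketch of the one-sided version is shown below. It relies on the alternative argument of t.test; with the formula preference ~ primed the difference is taken as group 0 minus group 1, so "less" matches STEP 8. The data frame is rebuilt from the STEP 2 printout so the block runs on its own.

```r
# Values copied from the STEP 2 printout: rows 1-20 are unprimed, rows 21-42 are primed.
preference <- data.frame(
  preference = c(
    1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
    0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
    1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9,
    1.2, 4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1,
    1.9, 3.1
  ),
  primed = rep(c(0, 1), times = c(20, 22))
)

# Two-sided Welch test, as in the output above.
two_sided <- t.test(preference ~ primed, data = preference)

# One-sided Welch test: alternative is mean(group 0) - mean(group 1) < 0.
one_sided <- t.test(preference ~ primed, data = preference, alternative = "less")

two_sided$p.value
one_sided$p.value
```

Because the observed t statistic is negative, the one-sided p-value is half the two-sided one, so the decision at the 0.05 level is the same either way.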

STEP 14: Interpret the p-value

Our p-value is 0.002666, which is less than 0.05, our level of significance. Since the p-value is less than the level of significance, we reject the null hypothesis that the mean preference of primed subjects is the same as the mean preference of unprimed subjects. The difference in means does not equal 0.

STEP 15: Interpret the confidence interval

We have a 95% confidence interval of (-1.577912, -0.357543).

The confidence interval is the range of plausible values for the difference in means. Zero is not in this interval. Therefore 0 is not a plausible value for the difference in means, so it is not plausible that the means are the same. The result of STEP 15 is consistent with the result of STEP 14.
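
To see where this interval comes from, it can be rebuilt by hand using the Welch standard error and the Welch-Satterthwaite degrees of freedom. The sketch below is illustrative only; the variable names are made up for this example, and the data frame is rebuilt from the STEP 2 printout.

```r
# Values copied from the STEP 2 printout: rows 1-20 are unprimed, rows 21-42 are primed.
preference <- data.frame(
  preference = c(
    1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
    0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
    1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9,
    1.2, 4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1,
    1.9, 3.1
  ),
  primed = rep(c(0, 1), times = c(20, 22))
)

x0 <- preference$preference[preference$primed == 0]  # unprimed scores
x1 <- preference$preference[preference$primed == 1]  # primed scores

# Each group's contribution to the variance of the difference in means.
v0 <- var(x0) / length(x0)
v1 <- var(x1) / length(x1)

diff_means <- mean(x0) - mean(x1)  # group 0 minus group 1, as t.test reports it
se         <- sqrt(v0 + v1)        # Welch standard error

# Welch-Satterthwaite approximation to the degrees of freedom.
welch_df <- (v0 + v1)^2 / (v0^2 / (length(x0) - 1) + v1^2 / (length(x1) - 1))

# 95% confidence interval for the difference in means.
ci <- diff_means + c(-1, 1) * qt(0.975, welch_df) * se
ci
```

The result should agree with the interval and the df reported by t.test in STEP 13.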

STEP 16: Interpret the sample estimates

We have concluded that the means are not equal, but what we really want to know is whether preference ratings are higher when subjects are primed or unprimed. Knowing that the means are unequal, we can answer this question by looking at the sample estimates: primed subjects (mean 2.973) had higher, more favorable preference ratings than unprimed subjects (mean 2.005).

STEP 17: State your conclusion

We have evidence that primed subjects have a higher (more favorable) preference for items with an animal on them than unprimed subjects do. This suggests that businesses might want to put widely recognizable animals, ones that most consumers are already primed to, on their products. Doing so may increase consumers' preference for the product and therefore make them more likely to purchase it.