Design the experiment

One of the first things to decide is how you are going to collect the data. How many subjects will you have? What groups will there be, and what will you do differently for each group (these differences are called treatments)?
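The source does not say how subjects were allocated to groups. As a minimal sketch, assuming subjects are assigned to the two treatments at random and using the group sizes of the data set analysed in this report (20 and 22), the allocation could look like this. The names `design`, `n_unprimed` and `n_primed` are illustrative only.

```r
set.seed(1)  # arbitrary seed, used only to make the sketch reproducible

n_unprimed <- 20  # group sizes taken from the data set in this report
n_primed   <- 22

# Build the treatment labels (0 = unprimed, 1 = primed) and shuffle them
assignment <- sample(rep(c(0, 1), times = c(n_unprimed, n_primed)))

# One row per subject, with the treatment that subject receives
design <- data.frame(subject = seq_along(assignment), primed = assignment)
table(design$primed)
```

Shuffling a fixed vector of labels keeps the group sizes exactly as planned, which independent coin flips for each subject would not guarantee.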

Load data

First, we load and print the data set.

preference <- read.csv("preference.csv")
preference
##    preference primed
## 1         1.8      0
## 2         0.1      0
## 3         4.0      0
## 4         2.1      0
## 5         2.4      0
## 6         3.4      0
## 7         1.7      0
## 8         2.2      0
## 9         1.9      0
## 10        1.9      0
## 11        0.1      0
## 12        3.3      0
## 13        2.1      0
## 14        2.0      0
## 15        1.4      0
## 16        1.6      0
## 17        2.3      0
## 18        1.8      0
## 19        3.2      0
## 20        0.8      0
## 21        1.7      1
## 22        1.7      1
## 23        4.2      1
## 24        3.0      1
## 25        2.9      1
## 26        3.0      1
## 27        4.0      1
## 28        4.1      1
## 29        2.9      1
## 30        2.9      1
## 31        1.2      1
## 32        4.0      1
## 33        3.0      1
## 34        3.9      1
## 35        3.1      1
## 36        2.5      1
## 37        3.2      1
## 38        4.1      1
## 39        3.9      1
## 40        1.1      1
## 41        1.9      1
## 42        3.1      1

Describe the data

There are 42 rows and 2 columns. The rows represent the subjects of the study (the tested individuals) and the columns represent the variables. The first variable, “preference”, is each individual's preference rating on a scale ranging from 0 (dislike very much) to 6 (like very much). The second variable, “primed”, is a binary categorical variable that places each subject in one of two groups: “primed” (coded 1) and “unprimed” (coded 0). Subjects in the “primed” group rate products that feature an animal on the label. Subjects in the “unprimed” group rate products that do not have an animal on the label. There are 20 subjects in the “unprimed” group and 22 subjects in the “primed” group.
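The counts in this description can be checked directly. The sketch below rebuilds the data frame from the values printed above, so that it runs without `preference.csv`. The helper vectors `unprimed` and `primed` are illustrative names, not part of the original analysis.

```r
# Ratings copied from the printout above (rows 1-20 and rows 21-42)
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed   <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
              4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(preference = c(unprimed, primed),
                         primed = rep(c(0, 1), times = c(20, 22)))

dim(preference)               # number of rows and columns
table(preference$primed)      # subjects per group (0 = unprimed, 1 = primed)
range(preference$preference)  # should fall inside the 0 to 6 rating scale
```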

Identify the purpose

The purpose of this study is to assess whether primed subjects, who have thought about the image earlier in an unrelated context, process the visual information more easily and so may have different preferences from non-primed consumers. In other words, it tests whether animals on the label attract customers.

Visualize the data

library(ggplot2)
ggplot(data = preference, mapping = aes(x = as.factor(primed), y = preference)) + geom_point()

Interpret the plot

As we can see from the graph, most of the ratings that the “primed” group gives to the products with animals on the label are higher than most of the ratings given by the “unprimed” group.
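A numerical summary can back up what the plot suggests. The sketch below is one possible way to do it, using base R's `tapply()` on a data frame rebuilt from the printed values. The names `group_means` and `group_medians` are illustrative.

```r
# Ratings copied from the printout above, so the block runs without preference.csv
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed   <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
              4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(preference = c(unprimed, primed),
                         primed = rep(c(0, 1), times = c(20, 22)))

# Centre of each group (names "0" = unprimed, "1" = primed)
group_means   <- tapply(preference$preference, preference$primed, mean)
group_medians <- tapply(preference$preference, preference$primed, median)
group_means
group_medians
```

The group means computed this way should agree with the sample estimates reported by the t-test output in this report.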

Formulate the hypothesis

Null hypothesis: The population mean of the primed group is the same as the population mean of the unprimed group. Usually this is stated as a difference in means (equal to zero).

Each group is a sample from a larger population, specifically the population of all consumers who might conceivably take this test.

Identify the alternative hypothesis

Alternative hypothesis: There are two possible one-sided alternative hypotheses: (1) the population mean of the primed group is larger than the population mean of the unprimed group (the mean of primed minus the mean of unprimed is greater than 0); (2) the population mean of the primed group is smaller than the population mean of the unprimed group (the mean of primed minus the mean of unprimed is smaller than 0). Taken together, they form the two-sided alternative that the difference in means is not equal to 0.

Decide one sample or two sample

Two sample.

Decide on type of test

The choices here are the t-test and the proportion test. The t-test is for testing hypotheses about the population means of a quantitative variable. Proportion tests are for testing hypotheses about population proportions of a categorical variable. Preference is a quantitative rating, so the correct choice is the t-test.

Check assumptions of the test

For the t-test, the main assumption is that the data lie close enough to a Normal (bell-shaped) distribution. How close does it have to be? It depends on the sample size: the greater the sample size, the more robust the t-test is to non-Normality. In fact, even for small sample sizes (10 or 11) it is fairly robust, so unless there is strong skewness or there are substantial outliers we will be OK.

The best way of judging this is to use a Normal quantile-quantile (QQ) plot.

ggplot(data = preference) + geom_qq(mapping = aes(sample = preference, color = as.factor(primed)))

If the data are Normal, the points will lie close to a straight line. This graph shows that the sample distributions of the two groups are approximately Normal, which means that we can carry out the t-test on the two groups of data.
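The plot above has no reference line to compare the points against. A minimal sketch, assuming a reasonably recent ggplot2 in which `geom_qq_line()` is available, adds one line per group. The object name `qq_plot` is illustrative, and the data frame is rebuilt from the printed values so the block runs on its own.

```r
library(ggplot2)

# Ratings copied from the printout above, so the block runs without preference.csv
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed   <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
              4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(preference = c(unprimed, primed),
                         primed = rep(c(0, 1), times = c(20, 22)))

# Same QQ points as before, plus a reference line for each group
qq_plot <- ggplot(data = preference,
                  mapping = aes(sample = preference, color = as.factor(primed))) +
  geom_qq() +
  geom_qq_line() +
  labs(color = "primed")
print(qq_plot)
```

Points that bend steadily away from their line suggest skewness, and isolated points far from the line suggest outliers.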

Decide on a level of significance of the test

It is usually a safe bet to use the traditional level of significance, 0.05.

Perform the test

t.test(formula = preference ~ primed, data = preference)
## 
##  Welch Two Sample t-test
## 
## data:  preference by primed
## t = -3.2072, df = 39.282, p-value = 0.002666
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.577912 -0.357543
## sample estimates:
## mean in group 0 mean in group 1 
##        2.005000        2.972727

Interpret the p-value

Since the p-value (0.002666) is less than the level of significance (0.05), we REJECT the null hypothesis that the means are equal.

Confidence interval

The confidence interval is the range of plausible values for the difference in means (here, the mean of group 0, unprimed, minus the mean of group 1, primed). Zero is not in this interval. Therefore 0 is not a plausible value for the difference in means, so it is not plausible that the means are the same.
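Both decisions can also be read from the fitted test object rather than from the printed output. The sketch below assumes the 0.05 level chosen earlier. The names `alpha`, `result` and `ci` are illustrative, and the data frame is rebuilt from the printed values.

```r
# Ratings copied from the printout above, so the block runs without preference.csv
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed   <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
              4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(preference = c(unprimed, primed),
                         primed = rep(c(0, 1), times = c(20, 22)))

alpha  <- 0.05
result <- t.test(preference ~ primed, data = preference)  # Welch test by default

result$p.value < alpha    # TRUE means we reject the null hypothesis

ci <- result$conf.int     # interval for mean(group 0) - mean(group 1)
ci[1] <= 0 && 0 <= ci[2]  # FALSE means zero lies outside the interval

-rev(as.numeric(ci))      # the same interval expressed as primed minus unprimed
```

The last line only flips the sign and order of the endpoints, which can be easier to read when the question is how much higher the primed ratings are.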

Sample Estimates

We have concluded that the means are not equal, but what we really want to know is whether the mean preference of the primed group is higher than the mean preference of the unprimed group. Knowing that the means are unequal, we can answer this question by checking whether the sample estimate for the primed group (group 1, 2.973) is larger than that for the unprimed group (group 0, 2.005), that is, by looking at the difference between the two sample estimates.
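The directional question can also be posed as a one-sided test. This is a sketch of an alternative approach, not the analysis performed above, and a one-sided alternative should ideally be chosen before looking at the data. With the formula interface, `t.test()` compares group 0 minus group 1, so "primed higher than unprimed" corresponds to `alternative = "less"`. The name `one_sided` is illustrative.

```r
# Ratings copied from the printout above, so the block runs without preference.csv
unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
              0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
primed   <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
              4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(preference = c(unprimed, primed),
                         primed = rep(c(0, 1), times = c(20, 22)))

# H1: mean(unprimed) - mean(primed) < 0, i.e. primed ratings are higher
one_sided <- t.test(preference ~ primed, data = preference,
                    alternative = "less")
one_sided$p.value   # one-sided p-value
one_sided$estimate  # sample means of group 0 and group 1
```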

Conclusion

We know from the sample estimates, and can see in the graph, that the sample mean preference of the primed group is higher than that of the unprimed group, so it is plausible to conclude that the primed group, which has thought about the animal pictures earlier, gives higher preference ratings. Therefore, we can say that products with animals on the label are more attractive to customers than products without them.