Preference

STEP 1: Design the experiment

The data for this project were provided to us. In the following steps, the data will be described and the hypotheses will be formed. Graphs will supplement the p-value as evidence for whether to reject or fail to reject the null hypothesis.

STEP 2: Collect (or load) data

preference <- read.csv("preference.csv")
preference

##    preference primed
## 1         1.8      0
## 2         0.1      0
## 3         4.0      0
## 4         2.1      0
## 5         2.4      0
## 6         3.4      0
## 7         1.7      0
## 8         2.2      0
## 9         1.9      0
## 10        1.9      0
## 11        0.1      0
## 12        3.3      0
## 13        2.1      0
## 14        2.0      0
## 15        1.4      0
## 16        1.6      0
## 17        2.3      0
## 18        1.8      0
## 19        3.2      0
## 20        0.8      0
## 21        1.7      1
## 22        1.7      1
## 23        4.2      1
## 24        3.0      1
## 25        2.9      1
## 26        3.0      1
## 27        4.0      1
## 28        4.1      1
## 29        2.9      1
## 30        2.9      1
## 31        1.2      1
## 32        4.0      1
## 33        3.0      1
## 34        3.9      1
## 35        3.1      1
## 36        2.5      1
## 37        3.2      1
## 38        4.1      1
## 39        3.9      1
## 40        1.1      1
## 41        1.9      1
## 42        3.1      1
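
Before describing the data, it can help to confirm its shape. The following is a minimal sketch, assuming the 42 rows printed above are the complete file; the data frame is rebuilt from that listing so the check runs without preference.csv. The names n_rows, group_sizes and rating_range are illustrative and do not come from the original analysis.

```r
# Rebuild the data frame from the printed listing (assumed to be the full file).
preference <- data.frame(
  preference = c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
                 0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
                 1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9,
                 1.2, 4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1,
                 1.9, 3.1),
  primed = rep(c(0, 1), times = c(20, 22))
)

n_rows <- nrow(preference)                     # number of subjects
group_sizes <- table(preference$primed)        # subjects per value of primed
rating_range <- range(preference$preference)   # smallest and largest rating

str(preference)
print(group_sizes)
print(rating_range)
```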

STEP 3: Describe data

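This is a data set with 42 rows and 2 columns. Each row represents one subject (an individual consumer) in the study. Participants were asked to rate their attitude toward the product on a numeric scale running from dislike (low values) to like (high values); these ratings are stored in the preference column, and the observed values range from 0.1 to 4.2. The primed column is an indicator coded 0 (not primed) or 1 (primed): 20 participants were not primed and 22 were.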

STEP 4: Identify the purpose of the study

The purpose of the study is to determine whether participants who were primed, that is, exposed to the product beforehand, rate the product more favorably than participants who were not. There are 2 columns in this data set: preference and primed. The preference column holds each participant's rating of the product. The primed column records the participant's group: 1 for participants who had been exposed to the product before the study, and 0 for participants who had not encountered any element of the product beforehand.

STEP 5: Visualize data

Here is a boxplot that displays the preference ratings separately for the non-primed (primed = 0) and primed (primed = 1) groups. In the code chunk, I had to add “x=as.factor(primed)” because primed is stored as a number (0 or 1) but represents a categorical variable. Converting it to a factor gives each group its own boxplot.

Load graphics library, then plot:

library(ggplot2)
ggplot(data=preference, mapping=aes(x=as.factor(primed),y=preference)) + geom_boxplot()

STEP 6: Interpret the plot

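The boxplot suggests that people rate the product higher when they are primed (the plot on x= “1”). The median of the primed group, shown by the line inside the box, appears to be clearly greater than the median of the non-primed group (x= “0”). In the primed group few points lie far above Q3 and the longer tail runs toward low ratings, so that distribution appears to be skewed to the left. A plot alone is not a formal test, so this impression still has to be checked with a hypothesis test.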
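
The numbers behind a boxplot can also be inspected directly. The sketch below assumes the printed listing in Step 2 is the full data set and rebuilds it inline; it uses base R's boxplot.stats(), whose hinges and outlier rule (points more than 1.5 times the box length beyond a hinge) may differ slightly from what ggplot2 draws. The names medians and outliers are illustrative.

```r
# Rebuild the data frame from the printed listing (assumed to be the full file).
preference <- data.frame(
  preference = c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
                 0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
                 1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9,
                 1.2, 4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1,
                 1.9, 3.1),
  primed = rep(c(0, 1), times = c(20, 22))
)

# Median rating in each group (the line inside each box).
medians <- tapply(preference$preference, preference$primed, median)

# Points flagged as outliers in each group by the 1.5 * IQR rule.
outliers <- tapply(preference$preference, preference$primed,
                   function(x) boxplot.stats(x)$out)

print(medians)
print(outliers)
```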

STEP 7: Formulate the null hypothesis.

Null Hypothesis: The mean preference rating is the same in the primed and non-primed populations.

STEP 8: Identify the alternative hypothesis.

Alternative Hypothesis: The mean preference rating of the primed population is greater than that of the non-primed population.

STEP 9: Decide on type of test.

A t-test will be used to test the hypotheses, because they concern the population mean rating in each of the two groups defined by the categorical variable primed.

STEP 10: Choose one sample or two.

A two-sample test, because the primed and non-primed participants form two separate, independent samples.

STEP 11: Check assumptions of the test

The t-test assumes that the ratings in each group are approximately normally distributed. A good way of judging this is with a Q-Q plot.

library(ggplot2)
ggplot(data=preference, mapping=aes(sample=preference, color=as.factor(primed))) + geom_qq()

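The points for each group fall roughly along a straight line, which indicates that the ratings in both the primed and non-primed groups are fairly close to normally distributed. The upward trend itself is not a correlation between variables; every Q-Q plot rises from left to right, and what matters is how straight the line of points is.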
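
A numerical check can complement the visual judgment. The sketch below is an optional addition, not part of the original analysis: it runs a Shapiro-Wilk test on each group with base R's shapiro.test(), again rebuilding the data from the Step 2 listing (assumed complete). A large p-value means the test found no evidence against normality; with samples of about 20 it should be read alongside the Q-Q plot rather than instead of it.

```r
# Rebuild the data frame from the printed listing (assumed to be the full file).
preference <- data.frame(
  preference = c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
                 0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
                 1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9,
                 1.2, 4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1,
                 1.9, 3.1),
  primed = rep(c(0, 1), times = c(20, 22))
)

# Shapiro-Wilk p-value for the ratings in each group (0 = non-primed, 1 = primed).
shapiro_p <- tapply(preference$preference, preference$primed,
                    function(x) shapiro.test(x)$p.value)

print(shapiro_p)
```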

STEP 12: Decide on a level of significance of the test

The level of significance will be 0.05.

STEP 13: Perform the test

t.test(preference$preference, preference$primed)

## 
##  Welch Two Sample t-test
## 
## data:  preference$preference and preference$primed
## t = 10.805, df = 58.14, p-value = 1.58e-15
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.619797 2.356393
## sample estimates:
## mean of x mean of y 
## 2.5119048 0.5238095

STEP 14: Interpret the p-value

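The p-value (1.58e-15) is less than the level of significance, therefore the null hypothesis of the test as run is REJECTED: the difference between the two means it compares is not zero. Note, however, what those two means are. The call passes the two columns as the two samples, so it compares the mean of all 42 preference ratings (2.51) with the mean of the 0/1 primed indicator (0.52). It does not compare primed with non-primed participants, so this result cannot be used to accept the alternative hypothesis from Step 8. To test that hypothesis, the ratings must be split by group, for example with t.test(preference ~ primed, data = preference).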
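
A comparison between the two groups can be sketched with the formula interface of t.test(), which splits preference by the levels of primed rather than treating the two columns as two samples. This is a minimal sketch, assuming the Step 2 listing is the full data set; group_test is an illustrative name. Because the difference is taken as group 0 minus group 1, the one-sided alternative "primed mean is greater" corresponds to alternative = "less".

```r
# Rebuild the data frame from the printed listing (assumed to be the full file).
preference <- data.frame(
  preference = c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
                 0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
                 1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9,
                 1.2, 4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1,
                 1.9, 3.1),
  primed = rep(c(0, 1), times = c(20, 22))
)

# Make the grouping explicit: level 0 = non-primed, level 1 = primed.
preference$primed <- factor(preference$primed, levels = c(0, 1))

# Welch two-sample t-test of H0: mean(0) - mean(1) = 0
# against H1: mean(0) - mean(1) < 0, i.e. the primed mean is greater.
group_test <- t.test(preference ~ primed, data = preference,
                     alternative = "less")

print(group_test)
```

The group means are available afterwards as group_test$estimate and the p-value as group_test$p.value.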

STEP 15: Interpret the confidence interval

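The 95 percent confidence interval, [1.619797, 2.356393], is a range of plausible values for the difference between the two means compared in the test (mean of x minus mean of y). Zero is not included in this interval, so zero is not a plausible value for that difference, which agrees with the small p-value in the previous step. Because y in this call is the 0/1 primed indicator, the interval describes the gap between the overall mean rating and the proportion of primed participants, not the difference between primed and non-primed ratings.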
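
An interval for the difference between the two group means can be read from the conf.int element of a two-sided grouped test. The sketch below assumes the Step 2 listing is the full data set; two_sided and ci are illustrative names. The interval is for the non-primed mean minus the primed mean, so an interval lying entirely below zero would point to higher ratings in the primed group.

```r
# Rebuild the data frame from the printed listing (assumed to be the full file).
preference <- data.frame(
  preference = c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
                 0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8,
                 1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9,
                 1.2, 4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1,
                 1.9, 3.1),
  primed = factor(rep(c(0, 1), times = c(20, 22)), levels = c(0, 1))
)

# Two-sided Welch test; the default confidence level is 0.95.
two_sided <- t.test(preference ~ primed, data = preference)

# Interval for mean(non-primed) - mean(primed).
ci <- two_sided$conf.int

print(ci)
```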

STEP 16: Interpret the sample estimates

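The sample estimates are 2.5119048 for x, the mean preference rating across all 42 participants, and 0.5238095 for y, the mean of the primed indicator, which is simply the proportion of participants who were primed (22 of 42). These are sample values: they estimate the corresponding population means but do not prove anything about the entire population.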

STEP 17: State your conclusion

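The boxplot and the group means suggest that primed consumers rate the product more highly than un-primed consumers: computed from the listing in Step 2, the mean rating is about 2.97 for primed participants and about 2.01 for non-primed participants. The study measures ratings, not purchases, so it does not show directly that primed consumers are more likely to buy the product. The summary statistics below describe each column across all 42 participants; they are not split by group and cannot show outliers, so they support the conclusion only indirectly.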

summary(preference)

##    preference        primed      
##  Min.   :0.100   Min.   :0.0000  
##  1st Qu.:1.800   1st Qu.:0.0000  
##  Median :2.450   Median :1.0000  
##  Mean   :2.512   Mean   :0.5238  
##  3rd Qu.:3.200   3rd Qu.:1.0000  
##  Max.   :4.200   Max.   :1.0000