The data for this project was given to us. In the following steps, the data will be described and the hypotheses will be formed. Graphs will support the evidence from the p-values in deciding whether to reject or fail to reject the null hypothesis.
preference <- read.csv("preference.csv")
preference
## preference primed
## 1 1.8 0
## 2 0.1 0
## 3 4.0 0
## 4 2.1 0
## 5 2.4 0
## 6 3.4 0
## 7 1.7 0
## 8 2.2 0
## 9 1.9 0
## 10 1.9 0
## 11 0.1 0
## 12 3.3 0
## 13 2.1 0
## 14 2.0 0
## 15 1.4 0
## 16 1.6 0
## 17 2.3 0
## 18 1.8 0
## 19 3.2 0
## 20 0.8 0
## 21 1.7 1
## 22 1.7 1
## 23 4.2 1
## 24 3.0 1
## 25 2.9 1
## 26 3.0 1
## 27 4.0 1
## 28 4.1 1
## 29 2.9 1
## 30 2.9 1
## 31 1.2 1
## 32 4.0 1
## 33 3.0 1
## 34 3.9 1
## 35 3.1 1
## 36 2.5 1
## 37 3.2 1
## 38 4.1 1
## 39 3.9 1
## 40 1.1 1
## 41 1.9 1
## 42 3.1 1
This is a data set with 42 rows. Each row represents one subject (an individual person) in the study. The subjects were consumers, and the variables describe the type of consumer. Participants were asked to rate their attitudes on a scale running from dislike (low values) to like (high values); the recorded ratings range from 0.1 to 4.2.
The purpose of the study is to determine whether participants are more likely to prefer a product when they have been primed before being exposed to it than when they have not. There are 2 columns in this data set: preference and primed. The preference column holds each participant's rating. The primed column is an indicator: 0 marks participants who did not encounter any element of the product beforehand, and 1 marks participants who had been exposed to the product before the study.
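The split between the two groups can be checked directly by counting the values of the primed column. The sketch below rebuilds the data frame from the listing printed above so that it runs without the CSV file; the names ratings_unprimed, ratings_primed, and group_sizes are illustrative and not part of the original analysis.

```r
# Ratings copied from the printed listing, in row order (rows 1-20, then 21-42)
ratings_unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
                      0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
ratings_primed <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
                    4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(
  preference = c(ratings_unprimed, ratings_primed),
  primed = rep(c(0, 1), times = c(length(ratings_unprimed), length(ratings_primed)))
)

# Number of participants in each group (0 = unprimed, 1 = primed)
group_sizes <- table(preference$primed)
print(group_sizes)
```

The counts should show 20 unprimed and 22 primed participants, matching the listing.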
Here is a boxplot that displays the preference ratings for the primed and unprimed groups. In the code chunk, I had to add “x=as.factor(primed)” because primed is stored as a number but is a categorical variable. The primed and unprimed groups are separate entities, so each should have a separate boxplot.
Load graphics library, then plot:
library(ggplot2)
ggplot(data=preference, mapping=aes(x=as.factor(primed),y=preference)) + geom_boxplot()
The boxplot suggests that people rate the product higher when they are primed (the box at x = “1”). The median of the primed group appears noticeably greater than that of the unprimed group (x = “0”), and there are few outliers above Q3, which suggests that the primed ratings are skewed to the left.
Null Hypothesis: The mean preference ratings of the primed and unprimed populations are equal.
Alternative Hypothesis: The mean preference rating of the primed population is greater than that of the unprimed population.
A t-test will be used to test the hypotheses about the population means of the two groups.
## STEP 10: Choose one sample or two.
Two-sample, because there is a separate sample for each group (primed and unprimed).
The best way of judging whether the ratings are approximately normal is with a Q-Q plot.
library(ggplot2)
ggplot(data=preference, mapping=aes(sample=preference, color=as.factor(primed))) + geom_qq()
The graph suggests fairly normal distributions: within each group (primed and unprimed), the points fall roughly along a straight, upward-sloping line.
The level of significance will be 0.05.
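Reading a Q-Q plot is easier with a reference line and one panel per group. The following is a minimal sketch of that variant, assuming ggplot2 3.0.0 or later (for geom_qq_line); the data frame is rebuilt from the printed listing so the block runs on its own, and qq_by_group is an illustrative name.

```r
library(ggplot2)

# Ratings copied from the printed listing, in row order (rows 1-20, then 21-42)
ratings_unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
                      0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
ratings_primed <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
                    4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(
  preference = c(ratings_unprimed, ratings_primed),
  primed = rep(c(0, 1), times = c(length(ratings_unprimed), length(ratings_primed)))
)

# One Q-Q panel per group, each with a reference line through the quartiles
qq_by_group <- ggplot(preference, aes(sample = preference)) +
  geom_qq() +
  geom_qq_line() +
  facet_wrap(~ primed)
print(qq_by_group)
```

Points that stay close to the line in a panel are consistent with approximate normality for that group.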
# Welch two-sample t-test; x = preference ratings, y = the 0/1 primed indicator column
t.test(preference$preference, preference$primed)
##
## Welch Two Sample t-test
##
## data: preference$preference and preference$primed
## t = 10.805, df = 58.14, p-value = 1.58e-15
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.619797 2.356393
## sample estimates:
## mean of x mean of y
## 2.5119048 0.5238095
The p-value (1.58e-15) is less than the 0.05 level of significance, so the null hypothesis is rejected. The two means compared by this call (2.512 and 0.524) are not equal; in other words, their difference is not zero. Note that these are the means of the preference and primed columns, so the second value is the share of participants who were primed rather than a mean rating. The result is taken as support for the alternative hypothesis.
The confidence interval gives the range of plausible values for the difference between the two means. The interval is [1.619797, 2.356393], and zero is not included in it. Zero is therefore not a plausible value for the difference in means, which is consistent with rejecting the null hypothesis in the previous step.
There is evidence that the means are not equal in the entire population.
There is evidence that primed consumers are more likely to prefer the product than unprimed consumers. The summary statistics describe the data set as a whole: the preference ratings range from 0.1 to 4.2 with a mean of 2.512, and the mean of the primed column, 0.5238, is the share of participants who were primed (22 of 42). Outliers within each group are better compared on the boxplot than in this summary.
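The hypotheses compare mean ratings between the unprimed (0) and primed (1) groups, while the call above passes the primed column itself as the second sample. A minimal sketch of the group comparison follows, assuming the formula interface of t.test and a one-sided alternative chosen to match the alternative hypothesis. The data frame is rebuilt from the printed listing so the block runs without the CSV file, and the names ratings_unprimed, ratings_primed, and group_test are illustrative.

```r
# Ratings copied from the printed listing, in row order (rows 1-20, then 21-42)
ratings_unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
                      0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
ratings_primed <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
                    4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(
  preference = c(ratings_unprimed, ratings_primed),
  primed = rep(c(0, 1), times = c(length(ratings_unprimed), length(ratings_primed)))
)

# Welch two-sample t-test of rating by group.
# Groups are ordered 0 then 1, so "less" tests mean(unprimed) < mean(primed).
group_test <- t.test(preference ~ primed, data = preference, alternative = "less")
print(group_test)
```

In this form the two sample estimates in the output are the mean ratings of the unprimed and primed groups, so they can be compared against the boxplot.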
summary(preference)
## preference primed
## Min. :0.100 Min. :0.0000
## 1st Qu.:1.800 1st Qu.:0.0000
## Median :2.450 Median :1.0000
## Mean :2.512 Mean :0.5238
## 3rd Qu.:3.200 3rd Qu.:1.0000
## Max. :4.200 Max. :1.0000
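Because summary(preference) describes each column as a whole, it does not separate the primed and unprimed ratings. A sketch of per-group statistics using tapply is shown here; the data frame is rebuilt from the printed listing so the block is self-contained, and group_means and group_medians are illustrative names.

```r
# Ratings copied from the printed listing, in row order (rows 1-20, then 21-42)
ratings_unprimed <- c(1.8, 0.1, 4.0, 2.1, 2.4, 3.4, 1.7, 2.2, 1.9, 1.9,
                      0.1, 3.3, 2.1, 2.0, 1.4, 1.6, 2.3, 1.8, 3.2, 0.8)
ratings_primed <- c(1.7, 1.7, 4.2, 3.0, 2.9, 3.0, 4.0, 4.1, 2.9, 2.9, 1.2,
                    4.0, 3.0, 3.9, 3.1, 2.5, 3.2, 4.1, 3.9, 1.1, 1.9, 3.1)
preference <- data.frame(
  preference = c(ratings_unprimed, ratings_primed),
  primed = rep(c(0, 1), times = c(length(ratings_unprimed), length(ratings_primed)))
)

# Mean and median rating within each group (names "0" = unprimed, "1" = primed)
group_means <- tapply(preference$preference, preference$primed, mean)
group_medians <- tapply(preference$preference, preference$primed, median)
print(group_means)
print(group_medians)
```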