The population parameter I’m interested in is the difference in the average number of servings of vegetables eaten per day between vegetarian adults and non-vegetarian adults in the village of Arcadia. My research question is: do vegetarians eat more servings of vegetables per day than non-vegetarians? I start from the initial assumption that there is no difference, but I hypothesize that vegetarians eat more servings of vegetables per day.
The observational units were 30 vegetarians and 147 non-vegetarians, all adults living in the village of Arcadia (located on the island of Providence). I recorded the categorical variable, whether or not the participant was vegetarian, with the survey question “Are you a vegetarian?” The quantitative variable, measured for every participant in both groups, was the usual number of servings of vegetables eaten per day, which I recorded with my second survey question: “How many serves of vegetables do you usually eat each day?”
To gather my sample, I used a random number generator to select a number between 1 and 1246, the number of households in Arcadia. If the number corresponded to an empty household, I generated a new random number. Otherwise, I counted the adults in the household and used a random number generator again to select a number between 1 and that count; the adult in that position became my participant (for example, if the generator produced the number 2, I selected the second adult listed in the household). If there was only one adult in the household, I selected that person. If a selected participant declined to participate, I moved on to a new household by generating a new random number between 1 and 1246. Several people declined during the sampling process, and towards the end I several times selected someone who was already a participant in the study and had to move on to the next randomly chosen household. Because I continued sampling until I had at least 30 observations in each group (vegetarians and non-vegetarians), and vegetarians are much less common than non-vegetarians, the process took longer than expected.
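The two-stage selection (a random household, then a random adult within it) can be sketched in R as follows. This is a minimal sketch under stated assumptions: the household register `adults_per_household` is simulated and hypothetical (0 marks an empty household), the helper `draw_participant()` is a hypothetical name, and refusals are not modelled. Only the household count of 1246 comes from the study.

```r
# Minimal sketch of the two-stage random selection described above.
# ASSUMPTION: `adults_per_household` is a simulated, hypothetical register
# (0 = empty household); participants who decline are not modelled.
set.seed(247)
n_households <- 1246
adults_per_household <- sample(0:4, n_households, replace = TRUE)

# Hypothetical helper: returns an ID "household-adult" for one new participant.
draw_participant <- function(adults_per_household, already_sampled) {
  repeat {
    # Stage 1: a random household number between 1 and 1246
    h <- sample.int(length(adults_per_household), 1)
    n_adults <- adults_per_household[h]
    if (n_adults == 0) next             # empty household: draw again
    # Stage 2: a random adult between 1 and the number of adults
    # (sample.int(1, 1) always returns 1, so a lone adult is always chosen)
    a <- sample.int(n_adults, 1)
    id <- paste(h, a, sep = "-")
    if (id %in% already_sampled) next   # already a participant: draw again
    return(id)
  }
}

participants <- character(0)
for (i in 1:50) {
  participants <- c(participants,
                    draw_participant(adults_per_household, participants))
}
head(participants)
```

Using `sample.int()` rather than `sample()` avoids R's surprising behaviour when `sample()` is given a single number, which matters here because many households have exactly one adult.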
- Define the population(s) and parameter(s) (again) in words.
The populations are all vegetarian adults in Arcadia and all non-vegetarian adults in Arcadia. The parameter is the difference in the mean number of servings of vegetables eaten per day between the two populations (the mean for non-vegetarians minus the mean for vegetarians).
- State the null and alternative hypotheses in symbols and in words.
The null hypothesis is that there is no difference in the average servings of vegetables eaten per day between vegetarians and non-vegetarians, while the alternative hypothesis is that vegetarians eat a higher average number of servings of vegetables per day than non-vegetarians. In symbols, where \(\mu\) denotes a population mean number of servings per day, the hypotheses are as follows.
\(H_{0}: \mu_{\text{non-vegetarian}} - \mu_{\text{vegetarian}} = 0\)
\(H_{A}: \mu_{\text{non-vegetarian}} - \mu_{\text{vegetarian}} < 0\)
- State what a type I and a type II error would represent in this setting.
In this context, a type I error would be concluding that vegetarians eat more servings of vegetables per day on average (rejecting the null hypothesis) when in reality there is no difference between the population means for vegetarians and non-vegetarians. A type II error would be failing to find convincing evidence that vegetarians eat more (failing to reject the null hypothesis) when in reality vegetarians do eat more servings of vegetables per day, on average, than non-vegetarians.
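To make the meaning of a type I error concrete, the simulation below repeatedly draws samples in which the null hypothesis is true and records how often a one-sided test rejects it anyway. It is a sketch under assumptions that are not part of this study: servings are normally distributed with a hypothetical mean of 3 and standard deviation of 1.5 in both groups, and the significance level is 0.05. Only the group sizes (147 and 30) come from the study.

```r
# Simulation sketch: how often a one-sided Welch t-test at alpha = 0.05
# rejects H0 when H0 is true (i.e. how often a type I error occurs).
# ASSUMPTIONS: normal data, mean 3 and SD 1.5 in both groups (hypothetical),
# alpha = 0.05; group sizes 147 and 30 match the study.
set.seed(2022)
alpha <- 0.05
reps <- 2000
rejections <- replicate(reps, {
  non_veg <- rnorm(147, mean = 3, sd = 1.5)
  veg     <- rnorm(30,  mean = 3, sd = 1.5)   # same mean, so H0 is true
  t.test(non_veg, veg, alternative = "less")$p.value < alpha
})
type1_rate <- mean(rejections)   # proportion of (wrong) rejections
type1_rate
```

The rejection rate should land close to 0.05: the significance level is the long-run type I error rate the test is designed to allow.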
- Discuss/justify whether or not your measurements can reasonably be considered a representative sample from the population(s) of interest.
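My measurements can reasonably be considered a representative sample of the populations of interest because I selected households at random from all 1246 households in Arcadia, selected one adult at random within each household, asked each participant whether or not they were vegetarian, and continued this random selection until I had at least 30 observations in each group (vegetarians and non-vegetarians). One caveat is that several selected adults declined to participate; if the people who declined differ systematically in how many vegetables they eat, this nonresponse could introduce some bias.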
- Use a theory-based approach and appropriate R code to…
- Find an appropriate test statistic and comment on appropriate validity conditions.
The test statistic I will use is the mean number of servings of vegetables eaten per day by non-vegetarians minus the mean for vegetarians. Its observed value is -2.02, meaning the vegetarians in this sample eat, on average, 2.02 more servings of vegetables per day than the non-vegetarians. The corresponding standardized statistic is t = -6.95.
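For reference, the standardized statistic reported by R's default (Welch, unequal-variance) two-sample `t.test()` divides the observed difference in sample means by its estimated standard error, where \(\bar{x}\), \(s\) and \(n\) are each group's sample mean, standard deviation and size:

\[
t = \frac{\bar{x}_{\text{non-vegetarian}} - \bar{x}_{\text{vegetarian}}}{\sqrt{\dfrac{s_{\text{non-vegetarian}}^{2}}{n_{\text{non-vegetarian}}} + \dfrac{s_{\text{vegetarian}}^{2}}{n_{\text{vegetarian}}}}},
\qquad n_{\text{non-vegetarian}} = 147,\quad n_{\text{vegetarian}} = 30.
\]

With the values above, the implied standard error is about \(-2.02 / -6.95 \approx 0.29\) servings per day.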
The validity conditions for a theory-based test of a difference in two means require that both samples are large (at least 30 observations in each group) and that neither distribution is strongly skewed. Both groups meet the sample-size condition, with 30 vegetarians and 147 non-vegetarians. The histograms below show that both distributions are somewhat skewed, but not so strongly skewed that the theory-based approach is inappropriate.
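library(readr)
library(mosaic)  # favstats(), formula-based mean(), stat(), pval(); histogram() comes from lattice, loaded with mosaic
# Read the survey responses (one row per participant)
Vegetables <-
  read_csv("~/Documents/math 247 2022/Vegetables - Sheet1.csv")
# Summary statistics (quartiles, mean, sd, n) of servings for each group
favstats(servings ~ `vegetarian?`, data = Vegetables)
# Observed statistic: rev() puts vegetarian first, so diff() gives non-vegetarian minus vegetarian
diff(rev(mean(servings ~ `vegetarian?`, data = Vegetables)))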
## non-vegetarian
## -2.021088
# Standardized statistic (t) from a Welch two-sample t-test
stat(t.test(servings ~ `vegetarian?`, data = Vegetables))
## t
## -6.952295
# Histograms of servings per day for each group, stacked in one column
histogram(~servings | `vegetarian?`, data = Vegetables, width = 1, layout = c(1, 2))
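As a check on what `t.test()` computes, the sketch below reproduces the Welch standardized statistic and its approximate (Welch-Satterthwaite) degrees of freedom by hand. The data are simulated counts drawn from Poisson distributions (hypothetical, not the Arcadia responses), so the numbers will not match the output above; only the group sizes 147 and 30 come from the study.

```r
# Sketch: Welch two-sample t statistic and degrees of freedom by hand,
# checked against t.test().
# ASSUMPTION: simulated, hypothetical data stand in for the survey responses.
set.seed(1)
non_veg <- rpois(147, lambda = 3)
veg     <- rpois(30,  lambda = 5)

# Sample-size validity condition used in the text: at least 30 per group
c(non_veg = length(non_veg), veg = length(veg)) >= 30

v1 <- var(non_veg) / length(non_veg)   # squared SE of the non-vegetarian mean
v2 <- var(veg) / length(veg)           # squared SE of the vegetarian mean
se <- sqrt(v1 + v2)
t_by_hand <- (mean(non_veg) - mean(veg)) / se
# Welch-Satterthwaite approximate degrees of freedom
df_by_hand <- (v1 + v2)^2 /
  (v1^2 / (length(non_veg) - 1) + v2^2 / (length(veg) - 1))

fit <- t.test(non_veg, veg)   # var.equal = FALSE (Welch) is the default
c(t_by_hand = t_by_hand, t_test = unname(fit$statistic),
  df_by_hand = df_by_hand, df_t_test = unname(fit$parameter))
```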
- Find the p-value corresponding to your alternative hypothesis and provide a one-sentence interpretation of the p-value in context.
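The one-sided t-test gives a p-value of about \(1.47 \times 10^{-8}\), which is essentially 0. If there were truly no difference in the population mean servings of vegetables per day between vegetarians and non-vegetarians, there would be only about a 1.5 in 100 million chance of observing a difference in sample means (non-vegetarians minus vegetarians) of -2.02 or lower.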
# One-sided Welch t-test, H_A: mean(non-vegetarian) - mean(vegetarian) < 0
pval(t.test(servings ~ `vegetarian?`, data = Vegetables, alternative = "less"))
## p.value
## 1.474148e-08
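Because the alternative hypothesis is that the non-vegetarian mean minus the vegetarian mean is negative, the one-sided p-value is the area of the t distribution below the observed statistic. When the statistic is negative, as here, that area is exactly half of the two-sided p-value, so halving a two-sided p-value and requesting `alternative = "less"` give the same answer. The sketch below checks this on simulated, hypothetical data; only the group sizes come from the study.

```r
# Sketch: one-sided ("less") p-value = lower-tail t probability,
# which equals half the two-sided p-value when t < 0.
# ASSUMPTION: simulated, hypothetical data stand in for the survey responses.
set.seed(3)
non_veg <- rnorm(147, mean = 3, sd = 1.5)
veg     <- rnorm(30,  mean = 5, sd = 1.5)

two_sided <- t.test(non_veg, veg)
one_sided <- t.test(non_veg, veg, alternative = "less")  # H_A: mu1 - mu2 < 0

# Lower-tail area below the observed statistic, using the Welch df
p_lower_tail <- pt(unname(one_sided$statistic),
                   df = unname(one_sided$parameter))
c(one_sided = one_sided$p.value,
  halved_two_sided = two_sided$p.value / 2,
  lower_tail = p_lower_tail)
```

If the observed statistic had pointed the other way (positive t), halving the two-sided p-value would be wrong; the one-sided p-value would then be one minus that half.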
- Indicate what statistical decision this p-value leads you to draw about the null hypothesis.
This p-value is far below any conventional significance level (such as 0.05), so we reject the null hypothesis that there is no difference in the average servings of vegetables eaten per day between vegetarians and non-vegetarians.
- State your conclusion in the context of the problem.
Therefore, in the context of the problem, we have very strong evidence that vegetarians eat a higher average number of servings of vegetables per day than non-vegetarians.
- Use R to find an appropriate confidence interval to describe the plausible values of your population parameter.
Based on a 95% confidence interval, the plausible values for the difference in mean servings of vegetables per day (non-vegetarians minus vegetarians) are between -2.61 and -1.43.
# Two-sided 95% confidence interval for mean(non-vegetarian) - mean(vegetarian)
confint(t.test(servings ~ `vegetarian?`, data = Vegetables))
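The 95% interval is the observed difference plus or minus a t multiplier times the standard error. The sketch below computes it by hand and compares it with the interval that `t.test()` stores in its `conf.int` component (the same interval the `confint()` call above reports). The data are simulated and hypothetical, so the endpoints will differ from -2.61 and -1.43.

```r
# Sketch: 95% Welch confidence interval for mu1 - mu2 as
# estimate +/- t* x SE, checked against t.test()$conf.int.
# ASSUMPTION: simulated, hypothetical data stand in for the survey responses.
set.seed(4)
non_veg <- rnorm(147, mean = 3, sd = 1.5)
veg     <- rnorm(30,  mean = 5, sd = 1.5)

fit <- t.test(non_veg, veg, conf.level = 0.95)      # two-sided Welch interval
estimate <- mean(non_veg) - mean(veg)
se <- sqrt(var(non_veg) / length(non_veg) + var(veg) / length(veg))
t_star <- qt(0.975, df = unname(fit$parameter))     # multiplier for 95% confidence
ci_by_hand <- estimate + c(-1, 1) * t_star * se
rbind(by_hand = ci_by_hand, t_test = as.numeric(fit$conf.int))
```

Zero falls outside a two-sided 95% interval exactly when the two-sided test at the 0.05 level rejects the null hypothesis of no difference.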
- Interpret the confidence interval in the context of the problem. Make sure to also comment on whether zero is included in the confidence interval. Compare your conclusion to the conclusion in 5d.
In the context of the problem, we are 95% confident that, on average, vegetarian adults in Arcadia eat between 1.43 and 2.61 more servings of vegetables per day than non-vegetarian adults. This interval does not contain zero, so it is not plausible that there is no difference in the average servings of vegetables eaten per day between vegetarians and non-vegetarians. In 5d, based on the p-value, we found very strong evidence that vegetarians eat a higher average number of servings of vegetables per day than non-vegetarians and rejected the null hypothesis of no difference. The confidence interval agrees with this conclusion, since both indicate a significant difference in means, with vegetarians eating more.
This study provides very strong evidence that vegetarian adults in the village of Arcadia eat more servings of vegetables per day, on average, than non-vegetarian adults. This evidence comes from a one-sided t-test with a p-value of approximately zero: if there were actually no difference in the population means, a difference in sample means as extreme as the one we observed (-2.02) would almost never occur. Additionally, a 95% confidence interval showed that the plausible values for the difference in mean servings of vegetables per day (non-vegetarians minus vegetarians) are between -2.61 and -1.43. This interval does not contain zero, again indicating that no difference in means is not a plausible value. The data behaved as I expected, since I hypothesized that vegetarians would eat more servings of vegetables per day, on average, than non-vegetarians.
Because this sample consisted only of adults in Arcadia, we must be cautious about generalizing our results to people outside this village. However, since this was a random sample, it is reasonable to generalize these results to the population of adults in Arcadia. It is also important to note that, because this was an observational study and not an experiment, we cannot draw conclusions about cause and effect, only association. In the future, someone might investigate this question in other villages or on other islands, or investigate how being vegetarian versus non-vegetarian is related to various measures of health.