I have always heard that people who exercise tend to be happier, and I wanted to check this. The parameter of interest is the difference in mean weekly exercise minutes between people who feel happy and people who do not. The question I am trying to answer is: is there a difference in exercise minutes between islanders who feel happy and islanders who do not? I think there is a difference, most likely with happy people having the greater mean exercise minutes.
The observational units were the islanders from Blonduos. The categorical variable was collected by sending a questionnaire to the islanders asking whether they were happy. The quantitative variable was measured in minutes using the survey that asks how many minutes of moderate exercise the islander did in one week. Collecting the exercise data was difficult because, when I asked with my own questionnaire, the islanders reported the time they spent exercising in one day, which was not what I was looking for. I was not able to write a question that collected the data I wanted, so I used the preset survey question to get minutes of exercise in one week.
In carrying out a test of significance and a confidence interval about your population parameter(s), make sure you:
- Define the population(s) and parameter(s) (again) in words
The population is the islanders from Blonduos, and the parameter is the difference in mean exercise minutes between people who feel happy and people who do not.
- State the null and alternative hypotheses in symbols and in words
The null hypothesis is that there is no difference in mean exercise minutes between people who are happy and people who are not, \(H_0: \mu_{happy} = \mu_{nothappy}\). The alternative hypothesis is that there is a difference in mean exercise minutes between people who are happy and people who are not, \(H_a: \mu_{happy} \neq \mu_{nothappy}\).
- State what a type I and a type II error would represent in this setting
A type I error would mean that we conclude there is a difference in mean exercise minutes when in reality there is no difference (a false positive). A type II error would mean that we do not find evidence of a difference when in reality there is a difference (a false negative).
- Discuss/justify whether or not your measurements can reasonably be considered a representative sample from the population(s) of interest
My measurements can reasonably be considered a representative sample from the population of interest. The reason is that I used a random number generator to choose a random sample. Random sampling avoids selection bias, so the sample should represent the population well.
- Use a theory-based approach and appropriate R code to
library(mosaic)  # provides stat(), pval(), and confint() for t.test() results
hdata <- read.csv("~/Documents/Math-247 Spring 2022/proyect2.csv")
head(hdata, n=2)
- Find an appropriate test statistic and comment on appropriate validity conditions
The test statistic is t = -0.4005, and the data meet the validity conditions because both groups have 30 observations.
stat(t.test(minutes ~ happy, data = hdata))
## t
## -0.4005423
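As a rough check on where a two-sample t-statistic comes from, here is a minimal sketch that computes it by hand and compares it to `t.test()`. The exercise minutes below are made-up numbers, not my project data, and the names `happy_min`, `nothappy_min`, and `welch_t` are only for illustration. The sign of the statistic depends on which group is listed first; with the formula interface the first level of the grouping variable comes first.

```r
# Hypothetical weekly exercise minutes (NOT the project data).
happy_min    <- c(120, 90, 150, 60, 200, 30, 180, 75)
nothappy_min <- c(100, 110, 45, 160, 20, 210, 95, 130)

# Two-sample (Welch) t-statistic: the difference in sample means
# divided by the estimated standard error of that difference.
welch_t <- function(x, y) {
  se <- sqrt(var(x) / length(x) + var(y) / length(y))
  (mean(x) - mean(y)) / se
}

t_by_hand <- welch_t(happy_min, nothappy_min)

# t.test() uses the unequal-variance (Welch) version by default.
t_builtin <- unname(t.test(happy_min, nothappy_min)$statistic)

# The two values should agree up to floating-point error.
print(all.equal(t_by_hand, t_builtin))
```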
- Find the p-value corresponding to your alternative hypothesis and provide a one-sentence interpretation of the p-value in context (use the definition of the p-value: i.e. probability of observing … assuming … is true)
The p-value is 0.6901. This is the probability of observing a t-statistic at least as extreme as -0.4005 (in either direction), assuming the null hypothesis of no difference in mean exercise minutes is true.
pval(t.test(minutes ~ happy, data = hdata))
## p.value
## 0.6901957
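To show what "at least as extreme in either direction" means, this sketch computes a two-sided p-value from a t-statistic and its degrees of freedom and checks it against `t.test()`. The degrees of freedom for my test are not shown in the output above, so the sketch uses made-up data instead of my numbers. The function name `two_sided_p` is hypothetical.

```r
# Two-sided p-value: the area in both tails of the t distribution
# beyond |t_stat|.
two_sided_p <- function(t_stat, df) {
  2 * pt(-abs(t_stat), df)
}

# Hypothetical weekly exercise minutes (NOT the project data).
happy_min    <- c(120, 90, 150, 60, 200, 30, 180, 75)
nothappy_min <- c(100, 110, 45, 160, 20, 210, 95, 130)

res <- t.test(happy_min, nothappy_min)

# res$statistic is the t-statistic; res$parameter is the Welch df.
p_by_hand <- two_sided_p(unname(res$statistic), unname(res$parameter))

# Should match the p-value reported by t.test().
print(all.equal(p_by_hand, res$p.value))
```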
- Indicate what statistical decision this p-value leads you to draw about the null hypothesis
Based on the p-value, there is very weak evidence of a difference in exercise minutes between people who are happy and people who are not. We do not reject the null hypothesis.
- State your conclusion in the context of the problem
The data do not provide evidence that the mean exercise minutes of happy people differ from the mean exercise minutes of people who are not happy. The p-value is 0.6901, so we cannot reject the null hypothesis.
- Use R to find an appropriate confidence interval to describe the plausible values of your population parameter
confint(t.test(minutes ~ happy, data = hdata))
- Interpret the confidence interval in the context of the problem. Make sure to also comment on whether zero is included in the confidence interval. Compare your conclusion to the conclusion in 5d
We are 95% confident that the difference in mean exercise minutes between people who are happy and people who are not is between -35.2654 and 75.8823 minutes. Zero is included in the confidence interval, so zero is a plausible value for the difference and we do not reject the null hypothesis. This is the same conclusion as in 5d.
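The link between the confidence interval and the test can be sketched as follows: a 95% interval for the difference in means contains zero when the two-sided p-value is above 0.05. The data are made up for illustration (not my project data), and the variable names are hypothetical.

```r
# Hypothetical weekly exercise minutes (NOT the project data).
happy_min    <- c(120, 90, 150, 60, 200, 30, 180, 75)
nothappy_min <- c(100, 110, 45, 160, 20, 210, 95, 130)

res <- t.test(happy_min, nothappy_min, conf.level = 0.95)

# Build the interval by hand: estimate +/- critical value * standard error.
se <- sqrt(var(happy_min) / length(happy_min) +
           var(nothappy_min) / length(nothappy_min))
diff_means <- mean(happy_min) - mean(nothappy_min)
margin     <- qt(0.975, unname(res$parameter)) * se
ci_by_hand <- c(diff_means - margin, diff_means + margin)

# Should match the interval reported by t.test().
print(all.equal(ci_by_hand, as.numeric(res$conf.int)))

# Zero inside the 95% interval goes together with p-value > 0.05.
zero_in_ci    <- ci_by_hand[1] <= 0 && 0 <= ci_by_hand[2]
p_above_alpha <- res$p.value > 0.05
print(zero_in_ci == p_above_alpha)
```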
From this study, what we learned is that the data do not show a difference in exercise minutes between islanders who feel happy and islanders who do not. The t-statistic and p-value showed very weak evidence against the null hypothesis. I was expecting the data to show some evidence against the null hypothesis, with a positive difference, but based on the t-test and confidence interval there was no evidence of a difference. I believe it is reasonable to generalize from the sample to the larger population because the sample was chosen by random sampling, which makes it unbiased, and there were enough observations in each group to meet the validity conditions. What I would do differently is pick islanders from different towns and write a different question so that the islanders give the data I originally wanted, such as hours or average hours of exercise per week. A similar question someone else could study is whether exercise can help treat people with depression.