Math 247 Final Project Report

Introduction

I have always heard that people who exercise tend to be happier, and I wanted to see whether that holds up. The parameter of interest is the difference in mean weekly exercise time between people who feel happy and people who do not. The question I am trying to answer is: is there a difference in exercise minutes between islanders who feel happy and islanders who do not? I expect that there is a difference, and specifically that happy islanders have the greater mean exercise minutes.

Data Collection Methods

The observational units were islanders from Blonduos. The categorical variable (happy or not happy) was collected by sending each islander a questionnaire asking whether they were happy. The quantitative variable was measured in minutes, using the preset survey question that asks how many minutes of moderate exercise the islander did in one week. Collecting the exercise data was difficult: when I asked through the questionnaire, the islanders reported the time they spent exercising in one day, which was not what I was looking for. I was not able to write a question that collected the data I wanted, so I used the preset survey question to get exercise minutes for one week.

Analysis of Results

In carrying out a test of significance and a confidence interval about your population parameter(s), make sure you:

Define the population(s) and parameter(s) (again) in words

The population is the islanders from Blonduos, and the parameter is the difference in mean weekly exercise time between islanders who feel happy and islanders who do not.

State the null and alternative hypotheses in symbols and in words

The null hypothesis is that there is no difference in mean exercise minutes between people who are happy and people who are not, \(H_0: \mu_{happy} = \mu_{nothappy}\). The alternative hypothesis is that there is a difference in mean exercise minutes between people who are happy and people who are not, \(H_a: \mu_{happy} \neq \mu_{nothappy}\).

State what a type I and a type II error would represent in this setting

A type I error would mean that we conclude there is a difference in mean exercise minutes when in reality there is none (a false positive). A type II error would mean that we find no evidence of a difference when in reality there is one (a false negative).
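As a rough illustration of what a type I error rate looks like, the sketch below simulates many studies in which the null hypothesis is true and counts how often a two-sample t-test rejects it anyway. The group means, standard deviation, number of repetitions, and the 0.05 significance level are all assumptions made up for this example; they are not values from the islander data.

```r
# Hypothetical setup: both groups share the same true mean,
# so the null hypothesis is true by construction.
set.seed(1)
n_per_group <- 30
alpha <- 0.05   # assumed significance level
reps <- 2000    # number of simulated studies

p_values <- replicate(reps, {
  happy     <- rnorm(n_per_group, mean = 150, sd = 100)  # made-up mean and sd
  not_happy <- rnorm(n_per_group, mean = 150, sd = 100)
  t.test(happy, not_happy)$p.value
})

# Proportion of simulated studies that (wrongly) reject the null
type1_rate <- mean(p_values < alpha)
type1_rate
```

The proportion should land near the chosen significance level, which is what it means for a test to have a type I error rate of about 5%.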

Discuss/justify whether or not your measurements can reasonably be considered a representative sample from the population(s) of interest

My sample can reasonably be considered representative of the population of interest because I used a random number generator to select the islanders at random. Random sampling avoids selection bias, so the sample should resemble the population more closely.
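A minimal sketch of how a random number generator can pick a simple random sample is shown here. The population size of 800, the sample size of 60, and the ID numbering are hypothetical; they only stand in for however the islanders were actually listed.

```r
# Hypothetical: islanders are numbered 1..800 and we want 60 of them.
set.seed(247)
population_ids <- 1:800
sample_ids <- sample(population_ids, size = 60)  # sampling without replacement

# Every islander has the same chance of being chosen, and nobody is picked twice
head(sort(sample_ids))
```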

Use a theory-based approach and appropriate R code to

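library(mosaic)  # provides stat(), pval(), and confint() for t.test() results
hdata <- read.csv("~/Documents/Math-247 Spring 2022/proyect2.csv")
head(hdata, n=2)  # preview the first two rows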

Find an appropriate test statistic and comment on appropriate validity conditions

The test statistic is t = -0.4005. The data meet the validity conditions for the theory-based test because there are 30 observations in each of the two groups.

stat(t.test(minutes ~ happy, data = hdata))

##          t 
## -0.4005423
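To show where a two-sample t-statistic comes from, the sketch below computes it by hand as the difference in group means divided by its standard error, and checks the result against `t.test()`. It uses simulated data, not the islander data: the data frame `sim`, its means, and its standard deviation are made up, and only the column names `minutes` and `happy` mirror the ones used above.

```r
# Simulated stand-in data (NOT the islander data)
set.seed(247)
sim <- data.frame(
  minutes = c(rnorm(30, mean = 140, sd = 100), rnorm(30, mean = 150, sd = 100)),
  happy   = rep(c("no", "yes"), each = 30)
)

group_mean <- tapply(sim$minutes, sim$happy, mean)
group_var  <- tapply(sim$minutes, sim$happy, var)
group_n    <- tapply(sim$minutes, sim$happy, length)

# Unpooled (Welch) standard error of the difference in means
se <- sqrt(sum(group_var / group_n))

# t.test() subtracts in factor-level order, so here it is "no" minus "yes"
t_manual  <- unname((group_mean["no"] - group_mean["yes"]) / se)
t_builtin <- unname(t.test(minutes ~ happy, data = sim)$statistic)

all.equal(t_manual, t_builtin)
```

The sign of the statistic depends only on which group is subtracted from which, so a negative t by itself says nothing about the strength of the evidence.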

Find the p-value corresponding to your alternative hypothesis and provide a one-sentence interpretation of the p-value in context (use the definition of the p-value: i.e. probability of observing … assuming … is true)

The p-value is 0.6901, which is the probability of observing a t-statistic at least as extreme as -0.4005 (in either direction), assuming the null hypothesis of no difference in mean exercise minutes is true.

pval(t.test(minutes ~ happy, data = hdata))

##   p.value 
## 0.6901957
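The two-sided p-value can be read as the area in both tails of the t-distribution beyond the observed statistic. The sketch below shows that calculation on simulated data and compares it with the value `t.test()` reports; the data frame `sim` and its parameters are made up for illustration and are not the islander data.

```r
# Simulated stand-in data (NOT the islander data)
set.seed(247)
sim <- data.frame(
  minutes = c(rnorm(30, mean = 140, sd = 100), rnorm(30, mean = 150, sd = 100)),
  happy   = rep(c("no", "yes"), each = 30)
)

res   <- t.test(minutes ~ happy, data = sim)
t_obs <- unname(res$statistic)   # observed t-statistic
df    <- unname(res$parameter)   # Welch degrees of freedom

# Two-sided p-value: area in both tails beyond |t_obs|
p_manual <- 2 * pt(-abs(t_obs), df = df)

all.equal(p_manual, res$p.value)
```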

Indicate what statistical decision this p-value leads you to draw about the null hypothesis

Based on the p-value, there is little to no evidence of a difference in mean exercise minutes between people who are happy and people who are not. We do not reject the null hypothesis.

State your conclusion in the context of the problem

The data do not provide evidence that the mean number of exercise minutes for happy people is different from the mean number of exercise minutes for people who are not happy. The p-value is 0.6901, so we cannot reject the null hypothesis.

Use R to find an appropriate confidence interval to describe the plausible values of your population parameter

confint(t.test(minutes ~ happy, data = hdata))

Interpret the confidence interval in the context of the problem. Make sure to also comment on whether zero is included in the confidence interval. Compare your conclusion to the conclusion in 5d

We are 95% confident that the difference in mean exercise minutes between people who are happy and people who are not is between -35.2654 and 75.8823 minutes. Zero is included in the confidence interval, so zero is a plausible value for the difference and we do not reject the null hypothesis. This is the same conclusion as in 5d.
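The link between the confidence interval and the test can be sketched as follows: a 95% interval is the estimate plus or minus a t-multiplier times the standard error, and it contains zero exactly when the two-sided p-value is above 0.05. The example uses simulated data, not the islander data; the data frame `sim` and its parameters are made up, and `res$stderr` assumes R 3.6.0 or later.

```r
# Simulated stand-in data (NOT the islander data)
set.seed(247)
sim <- data.frame(
  minutes = c(rnorm(30, mean = 140, sd = 100), rnorm(30, mean = 150, sd = 100)),
  happy   = rep(c("no", "yes"), each = 30)
)

res <- t.test(minutes ~ happy, data = sim)

diff_means <- unname(res$estimate[1] - res$estimate[2])  # first group minus second
se         <- res$stderr                                 # standard error of the difference
t_star     <- qt(0.975, df = unname(res$parameter))      # multiplier for 95% confidence

ci_manual <- c(diff_means - t_star * se, diff_means + t_star * se)
ci_manual

# The interval contains zero exactly when the two-sided p-value exceeds 0.05
zero_inside <- ci_manual[1] < 0 && ci_manual[2] > 0
zero_inside == (res$p.value > 0.05)
```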

Conclusion

From this study we learned that there is no evidence of a difference in mean exercise minutes between islanders who feel happy and islanders who do not. The t-statistic and p-value showed very weak evidence against the null hypothesis. I was expecting the data to show some evidence against the null hypothesis, with happy islanders exercising more, but the t-test and confidence interval did not support a difference. I believe that it is reasonable to generalize from the sample to the larger population of Blonduos because the sample was chosen by random sampling, which avoids bias, and there were enough observations in each group to meet the validity conditions. What I would do differently is pick islanders from different towns and write a different question so that the islanders report the data I originally wanted, such as total or average hours of exercise per week. A similar question someone else could study is whether exercise can help treat people with depression.


Steve Lapa
