Introduction

The goal of this statistical report is to provide an answer to the following research questions:

  1. Is there a linear relationship between an individual’s height and weight?
    • To test this, we will fit a simple linear regression of the two variables and carry out a t-test on beta (the slope), with the null hypothesis (\(H_0\)) being \(\beta=0\) and the alternative hypothesis (\(H_1\)) being \(\beta\neq 0\).
  2. Is the mean height of males (m) and females (f) the same?
    • A two-sample t-test is appropriate in this scenario to determine whether males and females have the same mean height. The null hypothesis (\(H_0\)) will be \(\mu_{m}-\mu_{f}=0\) and the alternative hypothesis (\(H_1\)) will be \(\mu_{m}-\mu_{f}\neq 0\).
  3. Is there any association between gender and the amount of physical activity (PA)?
    • A chi-squared test of independence will show whether there is evidence of an association between gender and the amount of physical activity. Writing \(PA\) for the proportion of each gender at a given activity level, the test has \(H_0\): \(PA_{none,males}=PA_{none,females}, PA_{moderate,males}=PA_{moderate,females}, PA_{intense,males}=PA_{intense,females}\) against \(H_1\): not \(H_0\).

Data

The data for this report come from the A4.csv file saved in the folder labelled Jeremy_Jenkins_46346570. The file records the gender, height, weight and level of physical activity of 1000 different people.

Methods

  1. As mentioned in the introduction, a linear regression model will be used to answer the first research question. In this model height is the independent variable (x-axis) and weight is the dependent variable (y-axis).
    • From this, a t-test can be conducted on beta (the slope) to determine whether the data provide evidence of a linear relationship between these two variables. We will conduct this test at the 5% significance level, meaning we will reject \(H_0\) if we obtain a p-value < 0.05.
    • To conduct this test we must also make two key assumptions:
      1. that a linear regression model is appropriate for relating the weights and heights of individuals;
      2. and that the residuals have constant variance across the range of heights (the equal variance assumption).
  2. To answer the second question a two-sample t-test will be used, with the null hypothesis being \(\mu_{m}=\mu_{f}\). This test will also be undertaken at the 5% significance level.
    • To use this test, we must assume that male and female heights have the same variance (the equal variance assumption) and that the heights of each gender are approximately normally distributed. We must also assume that the two samples are independent of each other; this does not follow from the first two assumptions and is assumed separately.
  3. The final research question can be answered with a chi-squared test of independence. For this test the null hypothesis states that the proportions of females completing no, moderate and intense physical activity are the same as the corresponding proportions of males. We reject this null hypothesis if the data give sufficient evidence that at least one of these equalities does not hold. The test will be undertaken at the 5% significance level, meaning we will reject \(H_0\) if a p-value less than 0.05 is obtained.
    • For this statistical test we must make one key assumption, which is that the expected frequencies (\(E_{ij}=\frac{(\text{row total}_i)(\text{column total}_j)}{n}\)) are all greater than 5.
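
The three tests described above can be sketched in R as follows. This is only an illustration under stated assumptions: the data are simulated stand-ins for A4.csv, the simulation parameters are made up, and the column names `gender` and `physical_activity` are hypothetical (only `height` and `weight` are known from the model formula).

```r
# Simulated stand-in for A4.csv. The real file is not reproduced here, and the
# column names `gender` and `physical_activity` are assumptions.
set.seed(1)
n <- 1000
gender <- sample(c("Female", "Male"), n, replace = TRUE)
height <- rnorm(n, mean = ifelse(gender == "Male", 178, 168), sd = 4)
weight <- -82 + 0.87 * height + rnorm(n, sd = 9)
physical_activity <- sample(c("none", "moderate", "intense"), n, replace = TRUE)
A4 <- data.frame(gender, height, weight, physical_activity)

# 1. Slope t-test: H0 is beta = 0 in weight = alpha + beta * height + error.
fit <- lm(weight ~ height, data = A4)
slope_row <- summary(fit)$coefficients["height", ]

# 2. Two-sample t-test using the equal variance assumption (pooled variance).
height_test <- t.test(height ~ gender, data = A4, var.equal = TRUE)

# 3. Chi-squared test on the gender by physical activity contingency table.
activity_table <- table(A4$gender, A4$physical_activity)
activity_test <- chisq.test(activity_table)

print(slope_row)
print(height_test$p.value)
print(activity_test$p.value)
```

Each test is compared against the same 5% significance level, so the decision for each question reduces to checking whether the corresponding p-value is below 0.05.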

Results

Q1.

##    p_value t_value  df  beta
## 1 4.48e-72    19.5 998 0.873
##                   2.5 %      97.5 %
## (Intercept) -97.4121009 -67.0288498
## height        0.7855245   0.9611988

The above table highlights the key outcomes of the statistical test, showing the p-value, t-value, degrees of freedom and beta estimate, followed by the 95% confidence intervals for alpha (the intercept) and beta. The graph below plots all 1000 observations along with the line of best fit to highlight the linear relationship between weight and height. The line has the beta estimate as its slope; however, because of the scaling of the axes, it does not cross the vertical axis at the alpha estimate.
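
As a check on how these quantities relate, the t-value, p-value and confidence interval for beta can be rebuilt from the slope estimate and its standard error. The sketch below copies the rounded values 0.87336 and 0.04476 from the regression summary in the Appendix, so small rounding differences from the table are to be expected; the variable names are illustrative.

```r
# Values copied from the regression summary in the Appendix (rounded).
beta_hat <- 0.87336   # slope estimate
se_beta  <- 0.04476   # standard error of the slope
resid_df <- 998       # residual degrees of freedom, n - 2

# Test statistic for H0: beta = 0, and its two-sided p-value.
t_value <- beta_hat / se_beta
p_value <- 2 * pt(-abs(t_value), resid_df)

# 95% confidence interval: estimate +/- critical value * standard error.
margin <- qt(0.975, resid_df) * se_beta
ci_beta <- c(lower = beta_hat - margin, upper = beta_hat + margin)

print(round(t_value, 2))
print(p_value)
print(ci_beta)
```

The interval for beta excludes zero, which agrees with rejecting \(H_0\) at the 5% level.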

Q2.

Before conducting the test we must check that our assumptions hold. Below are two QQ plots, one for the heights of males and one for the heights of females, to assess the normality of the data.

Below is the ratio of the two sample variances. If it is less than 2, it is appropriate to use the equal variance assumption.

## [1] 1.047626
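
A ratio of this kind could be computed as in the minimal sketch below. The helper `variance_ratio` is hypothetical, it assumes the ratio is taken as the larger variance over the smaller one, and the heights are simulated for illustration rather than read from A4.csv.

```r
# Hypothetical helper: ratio of the larger sample variance to the smaller one,
# so the result is always at least 1.
variance_ratio <- function(x, y) {
  v <- c(var(x), var(y))
  max(v) / min(v)
}

# Simulated heights for illustration only; these are not the A4.csv values.
set.seed(2)
height_male   <- rnorm(500, mean = 178, sd = 4)
height_female <- rnorm(500, mean = 168, sd = 4)

ratio <- variance_ratio(height_male, height_female)
print(ratio)
print(ratio < 2)  # TRUE would support the equal variance assumption
```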

Now that both assumptions appear to be satisfied, we can run the t-test.

## 
##  Two Sample t-test
## 
## data:  Height_Male and Height_Female
## t = 38.758, df = 998, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   9.396445 10.398675
## sample estimates:
## mean of x mean of y 
##  177.8524  167.9548
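
The confidence interval in this output can be recovered from the other printed values, which shows how the pieces fit together. The sketch below uses only the rounded numbers from the output above, so agreement is to about three decimal places; the variable names are illustrative.

```r
# Values copied from the t-test output above (rounded as printed).
mean_male   <- 177.8524
mean_female <- 167.9548
t_value     <- 38.758
resid_df    <- 998      # n_male + n_female - 2 for the pooled test

# Difference in sample means, and its standard error from t = diff / SE.
diff_means <- mean_male - mean_female
se_diff <- diff_means / t_value

# 95% confidence interval for mu_m - mu_f.
margin <- qt(0.975, resid_df) * se_diff
ci_diff <- c(lower = diff_means - margin, upper = diff_means + margin)

print(diff_means)
print(ci_diff)
```

Because the whole interval lies above zero, it is consistent with rejecting \(H_0\): \(\mu_{m}-\mu_{f}=0\).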

Q3.

Below is a table showing all of the expected frequencies of each category to test the assumption.

##   Female_none Female_moderate Female_intense Male_none Male_moderate
## 1    36.33333        89.33333       43.33333  44.66667      75.33333
##   Male_intense
## 1     44.33333

Now that our assumption is satisfied we can run the test. The output below shows the X-squared value, the degrees of freedom and the p-value.

## 
##  Pearson's Chi-squared test
## 
## data:  Gender and Physical_activity
## X-squared = 5.9823, df = 2, p-value = 0.05023
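
The sketch below shows how a test of this form runs on a 2 by 3 table of counts. The counts are an assumption, not values read from A4.csv: they are the tabulated values above multiplied by 3, which gives whole numbers that sum to 1000. The object names are illustrative.

```r
# Assumed counts: the tabulated values above multiplied by 3 (an inference,
# not taken from A4.csv). Rows are genders, columns are activity levels.
activity_counts <- matrix(
  c(109, 268, 130,
    134, 226, 133),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Gender = c("Female", "Male"),
    Physical_activity = c("none", "moderate", "intense")
  )
)

# Pearson's chi-squared test; df = (rows - 1) * (columns - 1) = 2.
activity_test <- chisq.test(activity_counts)

# Expected counts under H0 are (row total * column total) / n.
print(round(activity_test$expected, 1))
print(activity_test)
```

Under this assumption the call gives an X-squared of about 5.98 on 2 degrees of freedom with a p-value of about 0.0502, in line with the output above, and every expected count under \(H_0\) is well above 5.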

Conclusion

  1. At the 5% significance level we reject the null hypothesis if the p-value is less than 0.05. Given the extremely small p-value of \(4.48\times10^{-72}\), the null hypothesis of the first research question is rejected. As such, there is strong evidence of a linear relationship between the height and weight of individuals.

  2. Since the p-value is extremely low, at less than \(2.2\times10^{-16}\), we can safely reject the null hypothesis that males and females have the same mean height. Furthermore, the large sample size gives a narrow 95% confidence interval, which indicates that males are on average between 9.396 and 10.399 cm taller than females.

  3. For this statistical test we obtained a p-value of 0.05023. Despite being very close to our significance level, it is still > 0.05, meaning that we fail to reject the null hypothesis. As such, there is insufficient evidence at the 5% level of an association between gender and the amount of physical activity undertaken. This does not prove that males and females partake in physical activity at the same rates, only that the data do not show a significant difference.

Appendix

  1. Below is the full summary of the linear regression used for the t-test on beta, which determined whether weight and height have a linear relationship.
## 
## Call:
## lm(formula = weight ~ height, data = A4)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.025  -6.208  -0.043   5.817  30.515 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -82.22048    7.74157  -10.62   <2e-16 ***
## height        0.87336    0.04476   19.51   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.036 on 998 degrees of freedom
## Multiple R-squared:  0.2761, Adjusted R-squared:  0.2754 
## F-statistic: 380.7 on 1 and 998 DF,  p-value: < 2.2e-16
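
For a regression with a single predictor, the F-statistic and R-squared in this summary are functions of the slope t-value and the residual degrees of freedom. The sketch below rebuilds them from the rounded coefficient values printed above, so the results match the summary only up to rounding; the variable names are illustrative.

```r
# Rounded values copied from the summary above.
t_slope  <- 0.87336 / 0.04476   # slope estimate / standard error
resid_df <- 998                 # residual degrees of freedom
n        <- resid_df + 2        # number of observations

# With one predictor, F equals the squared slope t-value.
f_stat <- t_slope^2

# R-squared = F / (F + residual df), then adjust for the one predictor.
r_squared     <- f_stat / (f_stat + resid_df)
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / resid_df

print(round(f_stat, 1))
print(round(r_squared, 4))
print(round(adj_r_squared, 4))
```

An R-squared of roughly 0.28 means height explains a little over a quarter of the variation in weight, so the relationship is statistically significant without being a tight fit.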