The goal of this statistical report is to provide an answer to the following research questions:
The data for this report come from the A4.csv file, saved in the folder labelled Jeremy_Jenkins_46346570, which records the gender, height, weight and level of physical activity of 1000 different people.
Q1.
## p_value t_value df beta
## 1 4.48e-72 19.5 998 0.873
## 2.5 % 97.5 %
## (Intercept) -97.4121009 -67.0288498
## height 0.7855245 0.9611988
The above table highlights the key outcomes of the statistical test: the p-value, t-value, degrees of freedom and beta (slope) estimate, followed by the 95% confidence intervals for the alpha (intercept) and beta estimates. The graph below plots all 1000 observations along with the line of best fit to highlight the linear relationship between weight and height. The slope of the line is the beta estimate; however, due to the scaling of the axes, the line does not visibly cross the vertical axis at the alpha estimate.
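As a rough illustration of how a summary table like the one above could be assembled, the sketch below fits the same model form, `weight ~ height`, and extracts the slope's p-value, t-value, degrees of freedom and estimate. The data frame `demo` is synthetic stand-in data, not the A4.csv values, and the `read.csv("A4.csv")` path in the comment is an assumption about where the file sits.

```r
# Synthetic stand-in data (NOT the A4.csv observations), so the sketch runs on its own.
set.seed(1)
height <- rnorm(200, mean = 173, sd = 7)
weight <- -82 + 0.87 * height + rnorm(200, sd = 9)
demo <- data.frame(height = height, weight = weight)

# With the real data this would presumably start with: A4 <- read.csv("A4.csv")
fit <- lm(weight ~ height, data = demo)

# Coefficient matrix: estimate, standard error, t value and p-value per term
coefs <- summary(fit)$coefficients

result <- data.frame(
  p_value = coefs["height", "Pr(>|t|)"],
  t_value = coefs["height", "t value"],
  df      = df.residual(fit),              # n - 2 for simple linear regression
  beta    = coefs["height", "Estimate"]
)
print(signif(result, 3))

# 95% confidence intervals for the intercept (alpha) and slope (beta)
print(confint(fit, level = 0.95))
```

Calling `plot(weight ~ height, data = demo)` followed by `abline(fit)` would draw the scatter plot with the fitted line in base R.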
Q2.
Before conducting the test we must check that its assumptions hold. Below are two QQ plots, one for the heights of males and one for the heights of females, to assess the normality of the data.
Below is the ratio of the two variances. If it is less than 2, it is appropriate to use the equal variance assumption.
## [1] 1.047626
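The variance check above can be sketched as follows. The vectors reuse the names `Height_Male` and `Height_Female` from the report's output but hold synthetic values, and dividing the larger variance by the smaller is an assumption about how the ratio was formed (it guarantees a value of at least 1).

```r
# Synthetic stand-in heights (NOT the A4.csv observations)
set.seed(2)
Height_Male   <- rnorm(100, mean = 178, sd = 4)
Height_Female <- rnorm(100, mean = 168, sd = 4)

# Ratio of the larger sample variance to the smaller one
v <- c(var(Height_Male), var(Height_Female))
variance_ratio <- max(v) / min(v)
print(variance_ratio)

# Rule of thumb used in the report: a ratio below 2 supports pooling the variances
equal_variance_ok <- variance_ratio < 2
print(equal_variance_ok)
```

For the normality check, `qqnorm(Height_Male)` followed by `qqline(Height_Male)` draws a QQ plot with a reference line in base R, and likewise for `Height_Female`.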
Now that we know both assumptions are satisfied, we can run the two-sample t-test.
##
## Two Sample t-test
##
## data: Height_Male and Height_Female
## t = 38.758, df = 998, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 9.396445 10.398675
## sample estimates:
## mean of x mean of y
## 177.8524 167.9548
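Output of this form comes from R's `t.test()` with `var.equal = TRUE`, which produces the pooled "Two Sample t-test". A minimal sketch, using synthetic vectors in place of the real heights:

```r
# Synthetic stand-in heights (NOT the A4.csv observations)
set.seed(3)
Height_Male   <- rnorm(100, mean = 178, sd = 4)
Height_Female <- rnorm(100, mean = 168, sd = 4)

# var.equal = TRUE pools the two variances (Student's t-test);
# the default, var.equal = FALSE, would run Welch's test instead.
res <- t.test(Height_Male, Height_Female, var.equal = TRUE)
print(res)

# Individual pieces of the result
res$parameter   # degrees of freedom: n1 + n2 - 2
res$conf.int    # 95% confidence interval for the difference in means
res$estimate    # the two sample means
```

With 1000 observations split across the two groups, the pooled test has 1000 - 2 = 998 degrees of freedom, matching the output above.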
Q3.
Below is a table showing the expected frequencies of each category, used to test the assumption of the chi-squared test (commonly, that every expected frequency is at least 5).
##   Female_none Female_moderate Female_intense Male_none Male_moderate Male_intense
## 1    36.33333        89.33333       43.33333  44.66667      75.33333     44.33333
Now that our assumption is satisfied we can run the test. The output below shows the X-squared value, the degrees of freedom and the p-value.
##
## Pearson's Chi-squared test
##
## data: Gender and Physical_activity
## X-squared = 5.9823, df = 2, p-value = 0.05023
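A test of this kind can be sketched with `chisq.test()` on a two-way table of counts. The vectors below reuse the names `Gender` and `Physical_activity` from the output but are filled with synthetic labels, and the category labels themselves are assumptions. The `expected` component of the result holds the expected frequencies under independence, which is one way to check the assumption before reading the p-value.

```r
# Synthetic stand-in categories (NOT the A4.csv observations)
set.seed(4)
Gender            <- sample(c("Female", "Male"), 300, replace = TRUE)
Physical_activity <- sample(c("none", "moderate", "intense"), 300, replace = TRUE)

# 2 x 3 contingency table of observed counts
counts <- table(Gender, Physical_activity)

test <- chisq.test(counts)

# Expected counts under independence: (row total * column total) / grand total
print(test$expected)
print(all(test$expected >= 5))   # common rule of thumb for the approximation

print(test)                      # X-squared, df = (2 - 1) * (3 - 1) = 2, p-value
```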
At the 5% significance level we reject the null hypothesis if the p-value is less than 0.05. Since the p-value for the first research question is extremely small, at only \(4.48 \times 10^{-72}\), its null hypothesis can be rejected. As such, there is strong evidence of a linear relationship between the height and weight of individuals.
Since the p-value is extremely low, at less than \(2.2 \times 10^{-16}\), we can safely reject the null hypothesis that males and females have the same mean height. Furthermore, the large number of degrees of freedom (998) reflects the large sample, which keeps the 95% confidence interval narrow: we are 95% confident that males are on average between 9.396 and 10.399 cm taller than females.
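As a rough consistency check, the interval can be recovered from the reported sample means and t-value, taking the critical value \(t_{0.975,\,998} \approx 1.962\):

\[
\bar{x} - \bar{y} = 177.8524 - 167.9548 = 9.8976, \qquad
\mathrm{SE} = \frac{\bar{x} - \bar{y}}{t} = \frac{9.8976}{38.758} \approx 0.2554
\]

\[
(\bar{x} - \bar{y}) \pm t_{0.975,\,998} \cdot \mathrm{SE} \approx 9.8976 \pm 1.962 \times 0.2554 \approx (9.40,\ 10.40)
\]

This agrees with the interval reported by the test, up to rounding.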
For this statistical test we obtained a p-value of 0.05023. Although this is very close to our significance level, it is still greater than 0.05, meaning that we fail to reject the null hypothesis. As such, there is insufficient evidence at the 5% level of an association between gender and the amount of physical activity undertaken. This does not prove that males and females partake in physical activity at the same rate, only that the data do not show a statistically significant difference.
##
## Call:
## lm(formula = weight ~ height, data = A4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.025 -6.208 -0.043 5.817 30.515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -82.22048 7.74157 -10.62 <2e-16 ***
## height 0.87336 0.04476 19.51 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.036 on 998 degrees of freedom
## Multiple R-squared: 0.2761, Adjusted R-squared: 0.2754
## F-statistic: 380.7 on 1 and 998 DF, p-value: < 2.2e-16
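The figures in this output are tied together by a few standard identities for simple linear regression, which can be checked by hand. Here \(\hat{\beta}\) is the slope estimate for height, \(\mathrm{SE}(\hat{\beta})\) its standard error, and \(t_{0.975,\,998} \approx 1.962\) the critical value:

\[
t = \frac{\hat{\beta}}{\mathrm{SE}(\hat{\beta})} = \frac{0.87336}{0.04476} \approx 19.51
\]

\[
\hat{\beta} \pm t_{0.975,\,998} \cdot \mathrm{SE}(\hat{\beta}) \approx 0.87336 \pm 1.962 \times 0.04476 \approx (0.7855,\ 0.9612)
\]

\[
R^2 = \frac{F}{F + \mathrm{df}} = \frac{380.7}{380.7 + 998} \approx 0.2761, \qquad F = t^2
\]

These match the t-value, the 95% confidence interval for height and the multiple R-squared reported for Q1, up to rounding.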