Roy Wong Kher Yung (S3835352)
24/5/2020
This study seeks to understand whether there is a statistically significant relationship between body measurements. For instance, the diameter of the chest may depend on how tall the person is. However, other factors can also affect this measurement. Knowing these factors gives us a better understanding of how chest diameter depends on height.
In this study, we seek to understand whether there is a statistically significant relationship between a respondent’s chest diameter and height. We have gathered a dataset of body girth measurements and skeletal diameter measurements, as well as age, weight, height, and gender. We analyse this dataset and apply statistical tests and techniques to determine the factors on which the chest measurements depend.
The question we ask is “Can we predict the chest diameter of a person from their height measurement?”. This investigation seeks to determine whether there is a statistically significant relationship between a person’s chest diameter (che.di) and height (hgt).
\(\text{Chest Diameter} = \beta (\text{Height}) + \alpha\)
We will use linear regression to find out whether such a relationship exists and whether chest diameter can be predicted as governed by the equation \(y=\beta x + \alpha\), where \(\alpha\) is the intercept, \(\beta\) is the slope, \(y\) is the dependent variable and \(x\) is the independent variable.
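As a minimal sketch of how \(\beta\) and \(\alpha\) are estimated, the least-squares formulas can be applied by hand and compared with lm(). The six rows used here are only the rows shown in this report's data preview, so the estimates will not match the full-sample results; the object names (sample_rows, beta_hat, alpha_hat) are illustrative and not part of the original analysis.

```r
# Six rows from the data preview in this report (illustration only,
# not the full sample of 507 respondents).
sample_rows <- data.frame(
  che.di = c(28.0, 30.8, 31.7, 28.2, 29.4, 31.3),
  hgt    = c(174.0, 175.3, 193.5, 186.5, 187.2, 181.5)
)

x <- sample_rows$hgt
y <- sample_rows$che.di

# Least-squares estimates: slope = cov(x, y) / var(x),
# intercept = mean(y) - slope * mean(x).
beta_hat  <- cov(x, y) / var(x)
alpha_hat <- mean(y) - beta_hat * mean(x)

# The same estimates from lm(); coef() returns (intercept, slope).
fit <- lm(che.di ~ hgt, data = sample_rows)
print(c(alpha = alpha_hat, beta = beta_hat))
print(coef(fit))
```

Both print statements should show the same pair of numbers, which is the point of the comparison.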
Chest Diameter and height
We subset the data to obtain only the variables relevant to our study, which are:
che.di - Respondent’s chest diameter in centimeters, measured at nipple level, mid-expiration.
hgt - Respondent’s height in centimeters.
  che.di   hgt
1   28.0 174.0
2   30.8 175.3
3   31.7 193.5
4   28.2 186.5
5   29.4 187.2
6   31.3 181.5
Summary Statistics for chest diameter of respondents
# Columns are referenced by name inside summarise(); body_inves$ is not needed.
body_inves %>% summarise(Min = min(che.di, na.rm = TRUE),
                         Q1 = quantile(che.di, probs = .25, na.rm = TRUE),
                         Median = median(che.di, na.rm = TRUE),
                         Q3 = quantile(che.di, probs = .75, na.rm = TRUE),
                         Max = max(che.di, na.rm = TRUE),
                         Mean = mean(che.di, na.rm = TRUE) %>% round(3),
                         SD = sd(che.di, na.rm = TRUE) %>% round(3),
                         n = n(),
                         Missing = sum(is.na(che.di)))
   Min    Q1 Median    Q3  Max   Mean    SD   n Missing
1 22.2 25.65   27.8 29.95 35.6 27.974 2.742 507       0
Summary Statistics for height of respondents
# Columns are referenced by name inside summarise(); body_inves$ is not needed.
body_inves %>% summarise(Min = min(hgt, na.rm = TRUE),
                         Q1 = quantile(hgt, probs = .25, na.rm = TRUE),
                         Median = median(hgt, na.rm = TRUE),
                         Q3 = quantile(hgt, probs = .75, na.rm = TRUE),
                         Max = max(hgt, na.rm = TRUE),
                         Mean = mean(hgt, na.rm = TRUE) %>% round(3),
                         SD = sd(hgt, na.rm = TRUE) %>% round(3),
                         n = n(),
                         Missing = sum(is.na(hgt)))
    Min    Q1 Median    Q3   Max    Mean    SD   n Missing
1 147.2 163.8  170.3 177.8 198.1 171.144 9.407 507       0
In this study, we will be using the F-test for Linear Regression.
The hypothesis for the overall Linear Regression Model.
Assumptions:
Linear regression models are fitted using the lm() function.
Call:
lm(formula = che.di ~ hgt, data = body_inves)
Residuals:
    Min      1Q  Median      3Q     Max
-6.3102 -1.4326 -0.0696  1.4168  6.8929
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.2947     1.7319  -1.902   0.0577 .
hgt           0.1827     0.0101  18.082   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.138 on 505 degrees of freedom
Multiple R-squared:  0.393,  Adjusted R-squared:  0.3918
F-statistic:   327 on 1 and 505 DF,  p-value: < 2.2e-16
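The quantities in this summary are tied together by simple identities, which can be checked as a sanity test on the output. The sketch below uses only the rounded values printed above, so small rounding differences are expected; the variable names are illustrative.

```r
# Values read from the summary output above (rounded as printed).
t_slope  <- 18.082        # t value for hgt
f_stat   <- 327           # F-statistic
df_resid <- 505           # residual degrees of freedom
n        <- df_resid + 2  # two estimated coefficients (intercept and slope)

# With a single predictor, the F-statistic equals the squared slope t value.
f_from_t <- t_slope^2

# R-squared and adjusted R-squared recovered from F and the degrees of freedom.
r_squared     <- f_stat / (f_stat + df_resid)
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df_resid

print(round(c(F = f_from_t, R2 = r_squared, adjR2 = adj_r_squared), 4))
```

An R-squared of about 0.393 means that height accounts for roughly 39% of the variability in chest diameter in this sample.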
Further confirming the p-value:
[1] 1.00343e-56
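The command that produced this value is not shown in the report. One plausible way to obtain it, assuming the rounded F-statistic of 327 with 1 and 505 degrees of freedom, is the upper-tail probability of the F distribution:

```r
# Upper-tail probability of the F distribution for the reported statistic.
# The rounded value 327 is an assumption; the unrounded statistic would give
# a slightly different p-value.
f_stat  <- 327
p_value <- pf(f_stat, df1 = 1, df2 = 505, lower.tail = FALSE)
print(p_value)
```

Using lower.tail = FALSE rather than 1 - pf(...) avoids the result being rounded to zero for such an extreme statistic.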
The p-value is below 0.001, so the F-test is statistically significant and we reject \(H_0\).
The intercept is estimated at \(\alpha = -3.2947\). The intercept is the average value of Chest Diameter when height = 0. We test the statistical significance of the intercept.
Assumptions:
                 2.5 %    97.5 %
(Intercept) -6.6972252 0.1079121
hgt          0.1628512 0.2025541
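Since the 95% confidence interval for the intercept, [-6.697, 0.108], captures 0 (the value under \(H_0\)) and p = 0.058 > 0.05, we fail to reject \(H_0\): the intercept is not statistically significantly different from 0.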
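The interval bounds printed above can be reproduced from the coefficient estimates and standard errors reported in this document, using estimate ± t critical value × standard error with 505 residual degrees of freedom. This is a sketch of the arithmetic rather than the command used in the original analysis; coef_table is an illustrative name.

```r
# Estimates and standard errors as reported in the coefficient table.
coef_table <- data.frame(
  estimate  = c(-3.2946566, 0.1827027),
  std_error = c(1.7318756, 0.0101042),
  row.names = c("(Intercept)", "hgt")
)
df_resid <- 505

# Two-sided 95% critical value of the t distribution.
t_crit <- qt(0.975, df = df_resid)

# Confidence limits: estimate -/+ t_crit * standard error.
ci <- cbind(
  lower = coef_table$estimate - t_crit * coef_table$std_error,
  upper = coef_table$estimate + t_crit * coef_table$std_error
)
rownames(ci) <- rownames(coef_table)
print(round(ci, 4))
```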
The slope of the regression line was reported as \(\beta=0.1827\). The slope represents the average increase in Chest Diameter (in centimeters) following a one-unit (1 cm) increase in height. The hypothesis test of the slope, \(\beta\), was as follows:
Assumptions:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.2946566 1.7318756 -1.902363 5.769227e-02
## hgt 0.1827027 0.0101042 18.081859 1.017694e-56
## [1] 1.038739e-56
Calculating the two-tailed p-value for the slope, we confirm that p < 0.001, so we reject \(H_0\). The 95% CI for \(\beta\) is [0.163, 0.203]. This 95% CI does not capture \(H_0\), which supports rejecting it. Hence, there was a statistically significant positive relationship between the chest diameter and the height of the respondents.
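The two-tailed p-value for the slope can be sketched from the reported estimate and standard error. The exact command used in the original analysis is not shown, so the result may differ slightly from the printed value depending on how the t statistic was rounded.

```r
# Slope estimate and standard error as reported in the coefficient table.
estimate  <- 0.1827027
std_error <- 0.0101042
df_resid  <- 505

# t statistic for H0: slope = 0.
t_value <- estimate / std_error

# Two-tailed p-value: twice the upper-tail area beyond |t|.
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

print(c(t = t_value, p = p_value))
```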
We now plot the line of best fit from the linear regression model.
plot(che.di ~ hgt, data = body_inves, ylab = "chest diameter", xlab = "height")
# lwd (not lw) is the graphical parameter for line width.
abline(lin_model, col = 2, lwd = 2)
A Pearson’s correlation was calculated to measure the strength of the linear relationship between Chest Diameter and Height of the respondents.
## [1] 0.6268931
## [1] 0.5709813 0.6770164
Therefore, r = 0.627, 95% CI [0.571, 0.677]. This confidence interval does not capture \(H_0\), so \(H_0\) was rejected. There was a statistically significant positive correlation between Chest Diameter and Height of the respondents.
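The report does not show which function produced this interval. A common approach, assumed here, is the Fisher z-transformation, which reproduces the reported bounds to three decimal places from r and the sample size n = 507:

```r
# Reported correlation and sample size.
r <- 0.6268931
n <- 507

# Fisher z-transformation and its approximate standard error.
z    <- atanh(r)
se_z <- 1 / sqrt(n - 3)

# 95% limits on the z scale, transformed back to the correlation scale.
z_crit <- qnorm(0.975)
ci_r   <- tanh(z + c(-1, 1) * z_crit * se_z)

print(round(c(r = r, lower = ci_r[1], upper = ci_r[2]), 3))
```

Squaring r gives about 0.393, which agrees with the R-squared reported for the regression model.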
Results:
Decisions:
Hence, we conclude that there was a statistically significant positive linear relationship between the Chest Diameter and Height of the respondents.
Interpretations are as follows:
Chest Diameter = 0.1827*Height - 3.2947 following the equation \(y=\beta x+\alpha\) where \(\alpha\) is the intercept and \(\beta\) is the slope.
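As a short illustration of how the fitted equation can be used, the sketch below wraps it in a function and evaluates it at a few heights. The function name predict_chest_diameter and the example heights are hypothetical; predictions are only meaningful within the observed height range of the sample (147.2 cm to 198.1 cm).

```r
# Fitted line from this report: chest diameter = 0.1827 * height - 3.2947.
# Both measurements are in centimeters.
predict_chest_diameter <- function(height_cm) {
  0.1827 * height_cm - 3.2947
}

# Example heights chosen for illustration only.
heights <- c(160, 170, 180)
print(predict_chest_diameter(heights))
```

For example, a height of 170 cm gives a predicted chest diameter of 27.7643 cm, and each additional 10 cm of height adds 1.827 cm to the prediction.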