MATH1324 Assignment 3

Statistically Significant Relationship between Chest Diameter and Height of the Respondents

Roy Wong Kher Yung (S3835352)

24/5/2020

Introduction

The study below seeks to determine whether there is a statistically significant relationship between body measurements. For instance, chest diameter may depend on how tall a person is, although other factors can also affect this measurement. Understanding these factors gives a better picture of how chest diameter depends on height.

In this study, we investigate whether there is a statistically significant relationship between a respondent’s chest diameter and height. We have gathered a dataset of body girth and skeletal diameter measurements, as well as age, weight, height and gender. We analyse these data and apply statistical tests and techniques to determine which factors chest diameter depends on.

Problem Statement

The question we ask is: “Can we predict a person’s chest diameter from their height?” This investigation seeks to determine whether there is a statistically significant relationship between a person’s chest diameter (che.di) and height (hgt).

\(\text{Chest Diameter} = \beta \times \text{Height} + \alpha\)

We use linear regression to test whether such a relationship exists and can be described by the equation \(y = \beta x + \alpha\), where \(\alpha\) is the intercept, \(\beta\) is the slope, \(y\) is the dependent variable (chest diameter, in cm) and \(x\) is the independent variable (height, in cm).
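For reference, lm() fits this equation by ordinary least squares, choosing \(\alpha\) and \(\beta\) to minimise the sum of squared residuals. The standard closed-form estimates, written with the same symbols (\(x\) = height, \(y\) = chest diameter, \(r\) the Pearson correlation, \(s_x, s_y\) the sample standard deviations), are sketched below; one consequence is that the fitted line always passes through \((\bar{x}, \bar{y})\).

```latex
\hat{\beta} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}
            = r\,\frac{s_y}{s_x},
\qquad
\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}
```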

Data

Data Preprocessing

Data cont.

Import Data

library(dplyr)  # provides %>% and summarise(), used below
body <- read.csv("bdims.csv")

Subsetting the Data

We subset the data to keep only the variables relevant to our study, chest diameter (che.di) and height (hgt):

body_inves <- body[,c("che.di","hgt")]
head(body_inves)
  che.di   hgt
1   28.0 174.0
2   30.8 175.3
3   31.7 193.5
4   28.2 186.5
5   29.4 187.2
6   31.3 181.5

Descriptive Statistics and Visualisation

Summary Statistics

Summary Statistics for chest diameter of respondents

body_inves %>% summarise(Min = min(che.di, na.rm = TRUE),
                  Q1 = quantile(che.di, probs = .25, na.rm = TRUE),
                  Median = median(che.di, na.rm = TRUE),
                  Q3 = quantile(che.di, probs = .75, na.rm = TRUE),
                  Max = max(che.di, na.rm = TRUE),
                  Mean = mean(che.di, na.rm = TRUE) %>% round(3),
                  SD = sd(che.di, na.rm = TRUE) %>% round(3),
                  n = n(),
                  Missing = sum(is.na(che.di)))
   Min    Q1 Median    Q3  Max   Mean    SD   n Missing
1 22.2 25.65   27.8 29.95 35.6 27.974 2.742 507       0

Summary Statistics for height of respondents

body_inves %>% summarise(Min = min(hgt, na.rm = TRUE),
                  Q1 = quantile(hgt, probs = .25, na.rm = TRUE),
                  Median = median(hgt, na.rm = TRUE),
                  Q3 = quantile(hgt, probs = .75, na.rm = TRUE),
                  Max = max(hgt, na.rm = TRUE),
                  Mean = mean(hgt, na.rm = TRUE) %>% round(3),
                  SD = sd(hgt, na.rm = TRUE) %>% round(3),
                  n = n(),
                  Missing = sum(is.na(hgt)))
    Min    Q1 Median    Q3   Max    Mean    SD   n Missing
1 147.2 163.8  170.3 177.8 198.1 171.144 9.407 507       0
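The two summarise() calls above repeat the same nine statistics. As a sketch, a hypothetical base-R helper, describe(), shows how that repetition could be factored out; it mirrors the columns of the tables above, and n counts all rows including missing values, as n() does.

```r
# Hypothetical helper (not part of the original analysis): returns the same
# summary columns as the summarise() calls above for any numeric vector.
describe <- function(x) {
  data.frame(
    Min     = min(x, na.rm = TRUE),
    Q1      = unname(quantile(x, probs = 0.25, na.rm = TRUE)),
    Median  = median(x, na.rm = TRUE),
    Q3      = unname(quantile(x, probs = 0.75, na.rm = TRUE)),
    Max     = max(x, na.rm = TRUE),
    Mean    = round(mean(x, na.rm = TRUE), 3),
    SD      = round(sd(x, na.rm = TRUE), 3),
    n       = length(x),      # all rows, including missing values
    Missing = sum(is.na(x))   # number of missing values
  )
}

# Small check on a toy vector with one missing value
describe(c(1, 2, 3, 4, NA))
```

With the study data it would be called as describe(body_inves$che.di) or describe(body_inves$hgt).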

Descriptive Statistics and Visualisation cont.

Scatter plot

plot(che.di ~ hgt, data = body_inves, ylab = "Chest Diameter (cm)", xlab = "Height (cm)")

Hypothesis Testing

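In this study, we use the overall F-test for linear regression.

The hypotheses for the overall linear regression model are \(H_0\): the data do not fit the linear regression model (\(\beta = 0\)), and \(H_A\): the data fit the linear regression model (\(\beta \neq 0\)).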

Assumptions:

Hypothesis Testing cont.

Linear regression models are fitted using the lm() function.

lin_model <- lm(che.di ~ hgt, data = body_inves)
lin_model %>% summary()

Call:
lm(formula = che.di ~ hgt, data = body_inves)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.3102 -1.4326 -0.0696  1.4168  6.8929 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -3.2947     1.7319  -1.902   0.0577 .  
hgt           0.1827     0.0101  18.082   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.138 on 505 degrees of freedom
Multiple R-squared:  0.393, Adjusted R-squared:  0.3918 
F-statistic:   327 on 1 and 505 DF,  p-value: < 2.2e-16

Further confirming the p-value:

pf(q = 327, 1, 505, lower.tail = FALSE)
[1] 1.00343e-56

Since p < 0.001, the overall model is statistically significant and we reject \(H_0\). Height explains about 39.3% of the variance in chest diameter (\(R^2 = 0.393\)).
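As a hedged cross-check, the overall F statistic can be rebuilt from the \(R^2\) printed by summary(lin_model), using \(F = \frac{R^2/k}{(1-R^2)/(n-k-1)}\) with \(k = 1\) predictor. With a single predictor, F also equals the square of the slope’s t value. The inputs are copied from the output above; because \(R^2\) is printed rounded, the result matches only approximately.

```r
# Values copied from the summary(lin_model) output
r_squared <- 0.393    # Multiple R-squared (rounded in the printed output)
n         <- 507      # number of respondents
k         <- 1        # number of predictors (height only)
t_slope   <- 18.082   # t value reported for hgt

# Overall F statistic expressed through R-squared
f_stat  <- (r_squared / k) / ((1 - r_squared) / (n - k - 1))
p_value <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

# In simple linear regression, F equals the squared t statistic of the slope
c(F_from_R2 = f_stat, t_squared = t_slope^2, p = p_value)
```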

Hypothesis Testing cont. - Interpreting the Intercept

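The intercept is \(\alpha = -3.2947\), the average chest diameter predicted for a height of 0 cm. This value has no practical meaning, because a height of 0 cm lies far outside the observed range (147.2 to 198.1 cm). We nevertheless test the statistical significance of the intercept.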

Assumptions:

lin_model %>% confint()
                 2.5 %    97.5 %
(Intercept) -6.6972252 0.1079121
hgt          0.1628512 0.2025541

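The 95% confidence interval for \(\alpha\), [-6.697, 0.108], captures the null value of 0, and the p-value (0.0577) is greater than 0.05. Therefore, we fail to reject \(H_0\): the intercept is not statistically significant.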
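As a sketch, the intercept’s t statistic, two-tailed p-value and 95% confidence interval can be recomputed from the estimate and standard error printed by summary(lin_model). The numbers are copied from that output; the variable names are illustrative.

```r
# Values copied from the summary(lin_model) coefficient table
alpha_hat <- -3.2946566   # intercept estimate
se_alpha  <- 1.7318756    # standard error of the intercept
df_resid  <- 507 - 2      # residual degrees of freedom (n - 2)

# Two-tailed t-test of H0: alpha = 0
t_alpha <- alpha_hat / se_alpha
p_alpha <- 2 * pt(abs(t_alpha), df = df_resid, lower.tail = FALSE)

# 95% confidence interval: estimate +/- t critical value * standard error
ci_alpha <- alpha_hat + c(-1, 1) * qt(0.975, df = df_resid) * se_alpha

round(c(t = t_alpha, p = p_alpha, lower = ci_alpha[1], upper = ci_alpha[2]), 4)
```

The interval straddles 0, which is why the intercept test does not reject \(H_0\) at the 5% level.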

Hypothesis Testing cont. - Interpreting the Slope

The slope of the regression line was reported as \(\beta = 0.1827\). The slope is the average increase in chest diameter (cm) for a one-unit (1 cm) increase in height. The hypotheses for the slope test were \(H_0: \beta = 0\) versus \(H_A: \beta \neq 0\).

Assumptions:

lin_model %>% summary() %>% coef()
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) -3.2946566  1.7318756 -1.902363 5.769227e-02
## hgt          0.1827027  0.0101042 18.081859 1.017694e-56
2*pt(q = 18.08, df = 507-2, lower.tail = FALSE)
## [1] 1.038739e-56

Hypothesis Testing cont. - Interpreting the Slope

The two-tailed p-value for the slope confirms that p < 0.001, so we reject \(H_0\). The 95% CI for \(\beta\) is [0.163, 0.203]. This interval does not capture the null value of 0, which also supports rejecting \(H_0\). Hence, there was a statistically significant positive relationship between the chest diameter and height of the respondents.
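The slope estimate and its interval can be reproduced, as a sketch, from quantities already reported in this analysis: the closed-form least-squares slope \(\hat{\beta} = r\,s_y/s_x\) from the correlation and the summary-table standard deviations, and the t-based 95% CI from the standard error in summary(lin_model). Small differences from the lm() output come from rounding in the summary tables.

```r
# Quantities copied from the outputs in this analysis
r        <- 0.6268931   # Pearson correlation between che.di and hgt
sd_hgt   <- 9.407       # SD of height (cm)
sd_che   <- 2.742       # SD of chest diameter (cm)
beta_reported <- 0.1827027   # slope estimate from summary(lin_model)
se_beta       <- 0.0101042   # its standard error
df_resid      <- 507 - 2

# Closed-form least-squares slope from summary statistics
beta_from_r <- r * sd_che / sd_hgt

# 95% CI for the slope: estimate +/- t critical value * standard error
ci_beta <- beta_reported + c(-1, 1) * qt(0.975, df = df_resid) * se_beta

round(c(beta_from_r = beta_from_r, lower = ci_beta[1], upper = ci_beta[2]), 4)
```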

Hypothesis Testing cont. - Model Fitting

We now overlay the fitted regression line on the scatter plot.

plot(che.di ~ hgt, data = body_inves, ylab = "Chest Diameter (cm)", xlab = "Height (cm)")
abline(lin_model, col = 2, lwd = 2)  # fitted regression line, drawn in red
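To answer the problem statement directly, the fitted line can be used for prediction. The sketch below plugs the coefficients from summary(lin_model) into \(\hat{y} = \hat{\alpha} + \hat{\beta}x\) through a hypothetical helper, predict_chest(). At the mean height (171.144 cm) it returns roughly the mean chest diameter (27.974 cm), as expected for a least-squares line.

```r
# Coefficients copied from summary(lin_model)
alpha_hat <- -3.2946566   # intercept
beta_hat  <- 0.1827027    # slope (cm of chest diameter per cm of height)

# Hypothetical helper: predicted mean chest diameter (cm) for heights in cm
predict_chest <- function(height_cm) alpha_hat + beta_hat * height_cm

heights <- c(160, 171.144, 180)   # all within the observed 147.2 to 198.1 cm range
round(predict_chest(heights), 2)
```

With the fitted object itself, predict(lin_model, newdata = data.frame(hgt = heights), interval = "prediction") gives the same point predictions together with prediction-interval bounds. Predictions far outside the observed height range are extrapolations and should be avoided.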

Hypothesis Testing cont. - Testing the Assumptions

par(mfrow = c(2,2))
plot(lin_model)
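The diagnostic plots can be complemented with numeric checks. As an assumption-labelled sketch, the hypothetical helper check_residuals() reports the mean residual and a Shapiro-Wilk normality test (stats::shapiro.test) on the residuals; Shapiro-Wilk is a swapped-in technique, not part of the original analysis. It is demonstrated on simulated data because bdims.csv is not bundled here; with the real model it would be called as check_residuals(lin_model).

```r
# Hypothetical helper: numeric summaries to accompany the diagnostic plots
check_residuals <- function(model) {
  res <- residuals(model)
  list(
    mean_residual = mean(res),                 # ~0 for OLS with an intercept
    shapiro_p     = shapiro.test(res)$p.value  # H0: residuals are normal
  )
}

# Demonstration on simulated data standing in for body_inves (assumption)
set.seed(1)
sim <- data.frame(hgt = rnorm(200, mean = 171, sd = 9))
sim$che.di <- -3.3 + 0.18 * sim$hgt + rnorm(200, sd = 2.1)
diag_out <- check_residuals(lm(che.di ~ hgt, data = sim))
diag_out
```

With n = 507, the Shapiro-Wilk test can flag small departures from normality that matter little in practice, so it is best read alongside the Normal Q-Q plot rather than instead of it.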


Hypothesis Testing cont. - Correlation Test

A Pearson’s correlation coefficient was calculated to measure the strength of the linear relationship between chest diameter and height of the respondents.

library(psychometric)
r=cor(body_inves$che.di, body_inves$hgt)
r
## [1] 0.6268931
CIr(r = r, n = 507, level = .95)
## [1] 0.5709813 0.6770164

Therefore, \(r = 0.627\), 95% CI [0.571, 0.677]. This confidence interval does not capture the null value \(\rho = 0\), so \(H_0\) was rejected. There was a statistically significant positive correlation between chest diameter and height of the respondents.
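The interval returned by CIr() matches the Fisher z-transformation interval, sketched below from the reported r and n. The same block computes the t statistic for \(H_0: \rho = 0\), \(t = r\sqrt{n-2}/\sqrt{1-r^2}\), which in simple regression equals the slope’s t value (about 18.08).

```r
r <- 0.6268931   # Pearson correlation from cor()
n <- 507         # number of respondents

# Fisher z transformation: atanh(r) is approximately normal with SE 1/sqrt(n - 3)
z_r  <- atanh(r)
se_z <- 1 / sqrt(n - 3)
ci_r <- tanh(z_r + c(-1, 1) * qnorm(0.975) * se_z)  # back-transform to r scale

# t-test of H0: rho = 0 with n - 2 degrees of freedom
t_r <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_r <- 2 * pt(abs(t_r), df = n - 2, lower.tail = FALSE)

round(c(lower = ci_r[1], upper = ci_r[2], t = t_r), 4)
```

Base R’s cor.test(body_inves$che.di, body_inves$hgt) reports the same t statistic, degrees of freedom and Fisher-z 95% interval in one call, without needing the psychometric package.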

Discussion

Results:

Decisions:

Hence, we conclude that there is a statistically significant positive linear relationship between the chest diameter and height of the respondents.

Discussion cont.

Interpretations are as follows:
