MATH1324 Applied Analytics Assignment 2

Analysis on Heart Failure Clinical Records

Bohan Hao s3567918

Last updated: 25 October, 2020

Introduction

Problem Statement

Data

Descriptive Statistics and Visualisation

sex        Male        Female
Min        113         116
Q1         134         135
Median     137         137
Q3         139         140
Max        148         146
Mean       136.5361    136.7905
SD         4.132675    4.904267
n          194         105
Missing    0           0
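
A grouped summary like the one above can be produced in a few lines of base R. The sketch below is only an illustration: `toy.df` and its simulated values are hypothetical stand-ins for the real `heart.df`, and `describe()` is a helper name introduced here. Only the column names `serum_sodium` and `sex` come from the analysis itself.

```r
# Hypothetical stand-in data; swap toy.df for the real heart.df.
set.seed(1)
toy.df <- data.frame(
  sex = factor(rep(c("Male", "Female"), times = c(12, 8)),
               levels = c("Male", "Female")),
  serum_sodium = round(rnorm(20, mean = 137, sd = 4))
)

# Summary statistics for one numeric vector (helper name is an assumption).
describe <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(Min = min(x, na.rm = TRUE), Q1 = q[1], Median = q[2], Q3 = q[3],
    Max = max(x, na.rm = TRUE), Mean = mean(x, na.rm = TRUE),
    SD = sd(x, na.rm = TRUE), n = sum(!is.na(x)), Missing = sum(is.na(x)))
}

# One column per sex, one row per statistic.
summary_tbl <- sapply(split(toy.df$serum_sodium, toy.df$sex), describe)
print(summary_tbl)
```

Here `n` counts non-missing observations, so `n + Missing` gives the number of rows in each group.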

Descriptive Statistics Cont.

Descriptive Statistics Cont.

The distribution of serum sodium is approximately normal.

Descriptive Statistics Cont.

Outliers are present.

Descriptive Statistics Cont.

\[ \textrm{If } x_i \gt \textrm{Upper fence}, \textrm{ then } x_i = \textrm{95\% quantile} \\ \textrm{If } x_i \lt \textrm{Lower fence}, \textrm{ then } x_i = \textrm{5\% quantile} \]

where

\[ \textrm{Upper fence} = Q_3 + 1.5 \times IQR \\ \textrm{Lower fence} = Q_1 - 1.5 \times IQR \]
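
The capping rule above can be sketched as a small R function. This is a minimal illustration rather than the code used in the analysis: the function name `cap_outliers` and the example values are hypothetical, and the source does not say whether the quantiles were computed on the whole sample or separately for each sex, so the sketch simply works on a single vector.

```r
# Replace values beyond the fences with the 5% / 95% quantiles.
# cap_outliers is a hypothetical helper name, not from the original analysis.
cap_outliers <- function(x) {
  q1 <- quantile(x, 0.25, na.rm = TRUE, names = FALSE)
  q3 <- quantile(x, 0.75, na.rm = TRUE, names = FALSE)
  iqr <- q3 - q1
  lower_fence <- q1 - 1.5 * iqr
  upper_fence <- q3 + 1.5 * iqr
  caps <- quantile(x, c(0.05, 0.95), na.rm = TRUE, names = FALSE)
  x[!is.na(x) & x < lower_fence] <- caps[1]  # low outliers -> 5% quantile
  x[!is.na(x) & x > upper_fence] <- caps[2]  # high outliers -> 95% quantile
  x
}

# Hypothetical values with one low and one high outlier.
x <- c(113, 134, 135, 136, 137, 137, 138, 139, 140, 160)
capped <- cap_outliers(x)
print(rbind(original = x, capped = capped))
```

Values inside the fences are left untouched. With very small samples the 5% and 95% quantiles can themselves lie outside the fences, so it is worth checking the capped values afterwards.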

Hypothesis Testing - Levene’s test

The null hypothesis is homogeneity of variance; the alternative hypothesis is heterogeneity of variance.

\[ H_0: \sigma^2_1 = \sigma^2_2 \\ H_A: \sigma^2_1 \neq \sigma^2_2 \] where \(\sigma_1^2\) and \(\sigma_2^2\) are the population variances of serum sodium for males and females, respectively.

To test this, we perform Levene’s test of equal variance for serum sodium between males and females.

Hypothesis Testing - Levene’s test

leveneTest(serum_sodium ~ sex, data=heart.df) %>% knitr::kable()
         Df    F value     Pr(>F)
group     1    1.299041    0.2553068
        297    NA          NA

As \(p \gt 0.05\), Levene’s test is not statistically significant, so we fail to reject \(H_0\). It is reasonable to assume equal variances.
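
To show what the test computes, here is a minimal base R sketch. By default `car::leveneTest` centres each group on its median, which makes the test a one-way ANOVA on the absolute deviations from the group medians. The data frame `toy.df` and the column `abs_dev` are hypothetical and simulated for illustration only, so the numbers will not match the table above.

```r
# Hypothetical stand-in data; swap toy.df for the real heart.df.
set.seed(42)
toy.df <- data.frame(
  sex = factor(rep(c("Male", "Female"), times = c(30, 20))),
  serum_sodium = c(rnorm(30, mean = 137, sd = 4),
                   rnorm(20, mean = 137, sd = 5))
)

# Absolute deviation of each observation from its own group median.
group_median <- ave(toy.df$serum_sodium, toy.df$sex, FUN = median)
toy.df$abs_dev <- abs(toy.df$serum_sodium - group_median)

# One-way ANOVA on the deviations: the F value and Pr(>F) in the first row
# are the Levene statistic and its p-value.
levene_tbl <- anova(lm(abs_dev ~ sex, data = toy.df))
print(levene_tbl)
```

The degrees of freedom follow the same pattern as in the output above: one for the grouping factor and the number of observations minus two for the residuals.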

Hypothesis Testing - Two-sample t-test

By the Central Limit Theorem, since the sample size of each group is greater than 30 (194 males and 105 females), the sampling distribution of each group mean is approximately normal, so we can proceed with a two-sample t-test.

The null hypothesis is that mean serum sodium is the same for males and females. The alternative hypothesis is that the means are different.

\[H_0: \mu_1 - \mu_2 = 0\\ H_A: \mu_1 - \mu_2 \neq 0\]

where \(\mu_1\) and \(\mu_2\) are the population mean serum sodium for males and females, respectively.

To test this, we perform a two-sided two-sample t-test assuming equal variances.
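
For reference, the equal-variance t statistic divides the difference in sample means by a standard error built from the pooled variance, with \(n_1 + n_2 - 2\) degrees of freedom. The sketch below computes it by hand and compares it with `t.test`. The helper `pooled_t` and the simulated vectors `male` and `female` are hypothetical and are not the study data.

```r
# Pooled-variance two-sample t-test by hand (helper name is an assumption).
pooled_t <- function(x1, x2) {
  n1 <- length(x1)
  n2 <- length(x2)
  # Pooled variance: weighted average of the two sample variances.
  sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
  se <- sqrt(sp2 * (1 / n1 + 1 / n2))
  t_stat <- (mean(x1) - mean(x2)) / se
  df <- n1 + n2 - 2
  c(t = t_stat, df = df, p = 2 * pt(-abs(t_stat), df))  # two-sided p-value
}

# Hypothetical simulated samples, for illustration only.
set.seed(7)
male <- rnorm(30, mean = 137, sd = 4)
female <- rnorm(20, mean = 137, sd = 4)

by_hand <- pooled_t(male, female)
built_in <- t.test(male, female, var.equal = TRUE)
print(by_hand)
print(c(t = unname(built_in$statistic), df = unname(built_in$parameter),
        p = built_in$p.value))
```

The two printed vectors should agree, which confirms that `var.equal = TRUE` selects the pooled-variance form of the test.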

Hypothesis Testing - Two-sample t-test

t.test(serum_sodium ~ sex, data=heart.df, var.equal=TRUE, alternative="two.sided")
## 
##  Two Sample t-test
## 
## data:  serum_sodium by sex
## t = -1.0605, df = 297, p-value = 0.2898
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.4009807  0.4198319
## sample estimates:
##   mean in group Male mean in group Female 
##             136.6237             137.1143

We get the test statistic \(t=-1.0605\).

Assuming \(\alpha = 0.05\) and a two-tailed test, the critical value \(t^*\) is calculated as

qt(p = 0.025, df = 297)
## [1] -1.967984

Hypothesis Testing - Two-sample t-test

The test statistic \(t = -1.0605\) is less extreme than the critical value \(t^* = -1.967984\) (that is, \(|t| \lt |t^*|\)), so we fail to reject \(H_0\).

Using the p-value method, \(p = 0.2898 \gt 0.05\), so we again fail to reject \(H_0\).

From the R output, the 95% CI for the difference in means (estimate \(-0.4906\)) is [-1.4009807, 0.4198319], which contains 0, the value specified by \(H_0\). We therefore fail to reject \(H_0\).
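
As a consistency check, the p-value and confidence interval can be recovered from the rounded numbers printed in the R output. The sketch below uses only the reported group means, t statistic and degrees of freedom. The variable names are introduced here for illustration, and small discrepancies from the printed values are due to rounding.

```r
# Values copied from the t.test output (rounded as printed).
mean_male   <- 136.6237
mean_female <- 137.1143
t_stat      <- -1.0605
df          <- 297

diff_means <- mean_male - mean_female   # estimated difference in means
se_diff    <- diff_means / t_stat       # standard error implied by t
t_crit     <- qt(0.975, df = df)        # two-sided critical value, alpha = 0.05

p_value <- 2 * pt(-abs(t_stat), df = df)
ci      <- diff_means + c(-1, 1) * t_crit * se_diff

print(round(c(diff = diff_means, se = se_diff, p = p_value), 4))
print(round(ci, 4))
```

The three decision rules (critical value, p-value and confidence interval) are equivalent for a two-sided test at the same \(\alpha\), so they must always lead to the same conclusion.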

Discussion

References