Albumin Level T-test

For Liver patients and non-patients

Jayprakash Nugasur S3214435

Last updated: 22 October, 2017

Introduction

Problem Statement

Description of dataset

The dataset indian_liver_patient.csv was acquired from https://www.kaggle.com/jeevannagaraj/indian-liver-patient-dataset. The column fields and descriptions are listed below:

Descriptive statistics and boxplot

The following changes were made to the dataset before it was further processed:

library(readr)  # read_csv()
library(dplyr)  # %>%, group_by(), summarise(), filter()

indian_liver_patient <- read_csv("C:/Users/Jay Nugasur/Desktop/Datasets/indian-liver-patient-records/indian_liver_patient.csv")

names(indian_liver_patient)[names(indian_liver_patient) == "Dataset"] <- "Liver_patient"  # rename the outcome column

indian_liver_patient$Liver_patient <- indian_liver_patient$Liver_patient %>% factor(levels = c(1, 2), labels = c("Yes", "No"), ordered = TRUE)  # recode 1/2 as Yes/No
indian_liver_patient %>% group_by(Liver_patient) %>% summarise(Min = min(Albumin,na.rm = TRUE),
                                                    Q1 = quantile(Albumin,probs = .25,na.rm = TRUE),
                                                    Median = median(Albumin, na.rm = TRUE),
                                                    Q3 = quantile(Albumin,probs = .75,na.rm = TRUE),
                                                    Max = max(Albumin,na.rm = TRUE),
                                                    Mean = mean(Albumin, na.rm = TRUE),
                                                    SD = sd(Albumin, na.rm = TRUE),
                                                    n = n(),
                                                    Missing = sum(is.na(Albumin))) ->table1
knitr::kable(table1)
Liver_patient  Min   Q1  Median     Q3  Max      Mean         SD    n  Missing
Yes            0.9  2.5     3.0  3.625  5.5  3.060577  0.7865947  416        0
No             1.4  2.9     3.4  4.000  5.0  3.344311  0.7836895  167        0

Descriptive statistics and boxplot Cont.

boxplot(Albumin~Liver_patient, data = indian_liver_patient, ylab = "Albumin Level", xlab= "Liver patient")

Hypothesis Testing: Normality and variance homogeneity

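library(car)  # qqPlot(), leveneTest()

# Normal Q-Q plot of albumin for liver patients
Patient <- indian_liver_patient %>% filter(Liver_patient == "Yes")
Patient$Albumin %>% qqPlot(distribution = "norm")

# Normal Q-Q plot of albumin for non-patients
Non_patient <- indian_liver_patient %>% filter(Liver_patient == "No")
Non_patient$Albumin %>% qqPlot(distribution = "norm")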

Hypothesis Testing: Normality and variance homogeneity Cont.

leveneTest(Albumin ~ Liver_patient, data = indian_liver_patient)->table2

knitr::kable(table2)
        Df    F value     Pr(>F)
group    1   0.235723  0.6274954
       581         NA         NA
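
With Pr(>F) of about 0.63, Levene's test gives no evidence against equal variances in the two groups. As a rough illustration of what the test computes, the sketch below runs a one-way ANOVA on the absolute deviations from each group's median, which is the median-centred form that `leveneTest` uses by default. It runs on simulated stand-in data, not the liver dataset, and the names `sim`, `abs_dev` and `levene_by_hand` are hypothetical.

```r
set.seed(1)

# Simulated stand-in data (NOT the liver dataset): two groups with the same
# sizes as in the summary table and roughly similar means and spreads.
sim <- data.frame(
  group = factor(rep(c("Yes", "No"), times = c(416, 167))),
  value = c(rnorm(416, mean = 3.06, sd = 0.79),
            rnorm(167, mean = 3.34, sd = 0.78))
)

# Absolute deviation of each observation from its own group median
sim$abs_dev <- abs(sim$value - ave(sim$value, sim$group, FUN = median))

# One-way ANOVA on those deviations; the F test on "group" is the
# median-centred Levene statistic
levene_by_hand <- anova(lm(abs_dev ~ group, data = sim))
levene_by_hand
```

The degrees of freedom (1 and 581) depend only on the group sizes, so they match the table above; the F value and p-value will differ because the data are simulated.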

Hypothesis Testing

\[H_0: \mu_1 = \mu_2 \]

\[H_A: \mu_1 \ne \mu_2\]
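
Here \(\mu_1\) and \(\mu_2\) presumably denote the mean albumin level of liver patients and of non-patients, following the order of the factor levels. Assuming the equal-variance form of the two-sample t-test, which is what `var.equal = TRUE` requests in the `t.test` call, the test statistic can be written as follows. The symbols \(\bar{x}_i\), \(s_i\) and \(n_i\) for the group means, standard deviations and sizes are notation introduced here for illustration.

\[t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\]

Under \(H_0\), \(t\) follows a t distribution with \(n_1 + n_2 - 2\) degrees of freedom.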

Hypothesis Testing Cont.

result<- t.test(Albumin ~ Liver_patient, data= indian_liver_patient, var.equal = TRUE,alternative = "two.sided")
result
## 
##  Two Sample t-test
## 
## data:  Albumin by Liver_patient
## t = -3.9418, df = 581, p-value = 9.074e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.4251106 -0.1423583
## sample estimates:
## mean in group Yes  mean in group No 
##          3.060577          3.344311
result$p.value->table3

knitr::kable(table3)
9.07e-05
result$conf.int->table4

knitr::kable(table4)
-0.4251106
-0.1423583
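
As a cross-check, the pooled-variance t-test can be recomputed from the group means, standard deviations and sizes in the descriptive statistics table alone. The sketch below uses only base R; the variable names are hypothetical, and the inputs are the rounded values printed in that table, so the results should agree with the `t.test` output only up to rounding.

```r
# Summary statistics copied from the descriptive statistics table
m1 <- 3.060577; s1 <- 0.7865947; n1 <- 416  # liver patients ("Yes")
m2 <- 3.344311; s2 <- 0.7836895; n2 <- 167  # non-patients ("No")

# Pooled standard deviation and standard error of the difference in means
df <- n1 + n2 - 2
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)
se <- sp * sqrt(1 / n1 + 1 / n2)

# Test statistic, two-sided p-value and 95% confidence interval
t_stat  <- (m1 - m2) / se
p_value <- 2 * pt(-abs(t_stat), df)
ci      <- (m1 - m2) + c(-1, 1) * qt(0.975, df) * se

t_stat   # compare with t = -3.9418
p_value  # compare with p-value = 9.074e-05
ci       # compare with -0.4251106 and -0.1423583
```

The whole interval lies below zero, which matches the rejection of \(H_0\) at the 5% level: mean albumin is lower in the liver patient group.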

Result and discussion

T-test result:

Major findings:

Albumin levels differ clearly between people with a liver condition and those without: the mean is 3.06 for liver patients and 3.34 for non-patients, and the two-sample t-test finds this difference statistically significant (t = -3.94, df = 581, p = 9.07e-05, 95% CI for the difference in means -0.43 to -0.14). The albumin level in the blood can therefore be used as an indicator when assessing the health of a person's liver. The dataset was large enough for this investigation, but as an improvement, further work could test albumin data from other countries. The current dataset covers patients from India, and findings may differ for people with a different diet.

References

https://www.kaggle.com/jeevannagaraj/indian-liver-patient-dataset

https://labtestsonline.org/understanding/analytes/albumin/tab/test/