Jayprakash Nugasur S3214435
Last updated: 22 October, 2017
The dataset indian_liver_patient.csv was acquired from https://www.kaggle.com/jeevannagaraj/indian-liver-patient-dataset. The column fields and descriptions are listed below:
- Age: age of patients and non-patients
- Gender: male, female
- Bilirubin and enzyme levels: Total_Bilirubin, Direct_Bilirubin, Alkaline_Phosphotase, Alamine_Aminotransferase, Aspartate_Aminotransferase
- Protein levels: Total_Protiens, Albumin, Albumin_and_Globulin_Ratio
- Unit of measure: Total_Protiens and Albumin are in grams per deciliter (g/dL); Albumin_and_Globulin_Ratio is a unitless ratio
- Dataset: 1 = patient, 2 = non-patient
The following changes were made to the dataset before it was further processed:
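library(readr)  # read_csv()
library(dplyr)  # %>%, group_by(), summarise(), filter()
library(car)    # qqPlot(), leveneTest()
indian_liver_patient <- read_csv("C:/Users/Jay Nugasur/Desktop/Datasets/indian-liver-patient-records/indian_liver_patient.csv")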
names(indian_liver_patient)[names(indian_liver_patient) == "Dataset"] <- "Liver_patient"
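# Recode the class column: 1 = liver patient ("Yes"), 2 = non-patient ("No")
indian_liver_patient$Liver_patient <- indian_liver_patient$Liver_patient %>% factor(levels = c(1, 2), labels = c("Yes", "No"), ordered = TRUE)
# Descriptive statistics of Albumin for each group
indian_liver_patient %>% group_by(Liver_patient) %>% summarise(Min = min(Albumin, na.rm = TRUE),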
Q1 = quantile(Albumin,probs = .25,na.rm = TRUE),
Median = median(Albumin, na.rm = TRUE),
Q3 = quantile(Albumin,probs = .75,na.rm = TRUE),
Max = max(Albumin,na.rm = TRUE),
Mean = mean(Albumin, na.rm = TRUE),
SD = sd(Albumin, na.rm = TRUE),
n = n(),
Missing = sum(is.na(Albumin))) ->table1
knitr::kable(table1)

| Liver_patient | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| Yes | 0.9 | 2.5 | 3.0 | 3.625 | 5.5 | 3.060577 | 0.7865947 | 416 | 0 |
| No | 1.4 | 2.9 | 3.4 | 4.000 | 5.0 | 3.344311 | 0.7836895 | 167 | 0 |
boxplot(Albumin ~ Liver_patient, data = indian_liver_patient, ylab = "Albumin Level (g/dL)", xlab = "Liver patient")
Patient <- indian_liver_patient %>% filter(Liver_patient == "Yes")
Patient$Albumin %>% qqPlot(dist = "norm")
Non_patient <- indian_liver_patient %>% filter(Liver_patient == "No")
Non_patient$Albumin %>% qqPlot(dist = "norm")
leveneTest(Albumin ~ Liver_patient, data = indian_liver_patient) -> table2
knitr::kable(table2)

|       | Df  | F value  | Pr(>F)    |
|---|---|---|---|
| group | 1   | 0.235723 | 0.6274954 |
|       | 581 | NA       | NA        |
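Levene's test amounts to a one-way ANOVA on each observation's absolute deviation from its group centre. The base-R sketch below assumes the median-centred (Brown-Forsythe) variant, which is the default `center = median` of `car::leveneTest`. The function name `levene_median` and the simulated data are hypothetical and are not the liver dataset.

```r
# Median-centred Levene (Brown-Forsythe) test, base R only.
# Hypothetical helper; assumes the same centring as car::leveneTest's default.
levene_median <- function(y, group) {
  group <- factor(group)
  # absolute deviation of each observation from its own group median
  z <- abs(y - ave(y, group, FUN = median))
  # one-way ANOVA F-test on those absolute deviations
  fit <- anova(lm(z ~ group))
  list(F  = fit[["F value"]][1],
       df = c(fit[["Df"]][1], fit[["Df"]][2]),  # (groups - 1, N - groups)
       p  = fit[["Pr(>F)"]][1])
}

# Simulated example data (NOT the liver dataset)
set.seed(1)
y <- c(rnorm(40, mean = 3.0, sd = 0.8), rnorm(20, mean = 3.4, sd = 0.8))
g <- rep(c("Yes", "No"), times = c(40, 20))
res <- levene_median(y, g)
res
```

A large p-value, like the 0.627 reported above, gives no evidence against equal group variances. That outcome is what supports using `var.equal = TRUE` in the t-test.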
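Levene's test was not significant (p = 0.627 > 0.05), so the assumption of equal variances was retained. A pooled-variance two-sample t-test (`var.equal = TRUE`) was therefore used to compare the mean albumin levels of the two independent groups.
Let μ1 and μ2 denote the population mean albumin levels of liver patients and non-patients, respectively. The null hypothesis states that the two means are equal.
The alternative hypothesis states that the two means differ.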
\[H_0: \mu_1 = \mu_2 \]
\[H_A: \mu_1 \ne \mu_2\]
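For reference, the pooled-variance test statistic (the form `t.test` uses with `var.equal = TRUE`) is given below. Subscript 1 denotes liver patients (Yes) and subscript 2 denotes non-patients (No), matching the level order of Liver_patient. The symbols \(\bar{x}_i\), \(s_i\) and \(n_i\) are the group means, standard deviations and sizes from the summary table.

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad \text{df} = n_1 + n_2 - 2 \]

With \(n_1 = 416\) and \(n_2 = 167\), this gives df = 581. Under \(H_0\), \(t\) follows a t distribution with 581 degrees of freedom.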
result <- t.test(Albumin ~ Liver_patient, data = indian_liver_patient, var.equal = TRUE, alternative = "two.sided")
result
##
## Two Sample t-test
##
## data: Albumin by Liver_patient
## t = -3.9418, df = 581, p-value = 9.074e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.4251106 -0.1423583
## sample estimates:
## mean in group Yes mean in group No
## 3.060577 3.344311
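As a cross-check, the t statistic, p-value and confidence interval can be recomputed by hand from the rounded means, standard deviations and group sizes in table1. This sketch applies the pooled-variance formulas directly; the variable names are illustrative only.

```r
# Pooled two-sample t-test recomputed from the summary statistics in table1.
# Group 1 = liver patients ("Yes"), group 2 = non-patients ("No").
m <- c(Yes = 3.060577, No = 3.344311)   # group means of Albumin (g/dL)
s <- c(Yes = 0.7865947, No = 0.7836895) # group standard deviations
n <- c(Yes = 416, No = 167)             # group sizes

dof <- sum(n) - 2                            # n1 + n2 - 2
sp <- sqrt(sum((n - 1) * s^2) / dof)         # pooled standard deviation
se <- sp * sqrt(sum(1 / n))                  # standard error of the mean difference
mean_diff <- m[["Yes"]] - m[["No"]]
t_stat <- mean_diff / se
p_value <- 2 * pt(-abs(t_stat), df = dof)    # two-sided p-value
ci <- mean_diff + c(-1, 1) * qt(0.975, df = dof) * se  # 95% confidence interval

c(t = t_stat, df = dof, p = p_value, lower = ci[1], upper = ci[2])
```

Because the inputs are rounded, the results agree with the `t.test` output to about three decimal places rather than exactly.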
result$p.value -> table3
knitr::kable(table3)
| 9.07e-05 |
result$conf.int -> table4
knitr::kable(table4)
| -0.4251106 |
| -0.1423583 |
T-test result:
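Based on the t-test result, the null hypothesis H0 should be rejected, since p = 9.07e-05 < 0.05.
The 95% confidence interval for the difference in means (patients minus non-patients), [-0.4251106, -0.1423583], does not contain 0, the value specified by H0.
In conclusion, the result of the two-sample t-test was statistically significant: mean albumin was lower in liver patients (3.06 g/dL) than in non-patients (3.34 g/dL).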
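Statistical significance alone does not show how large the difference is. One common way to gauge this is Cohen's d, the difference in means divided by the pooled standard deviation. Cohen's d is an addition here and was not part of the original analysis. This sketch computes it from the table1 summary statistics. The helper name `cohens_d` is hypothetical.

```r
# Cohen's d for two independent groups, using the pooled standard deviation.
cohens_d <- function(m1, m2, s1, s2, n1, n2) {
  sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
  (m1 - m2) / sp
}

# Albumin summary statistics from table1: patients ("Yes") vs non-patients ("No")
d <- cohens_d(m1 = 3.060577, m2 = 3.344311,
              s1 = 0.7865947, s2 = 0.7836895,
              n1 = 416, n2 = 167)
d
```

The value is roughly -0.36. That places it between Cohen's conventional "small" (0.2) and "medium" (0.5) benchmarks, so the two groups' albumin distributions overlap substantially.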
Major findings:
The difference in albumin level between patients with a liver condition and those without is statistically significant, with mean albumin about 0.28 g/dL lower in liver patients. Blood albumin level may therefore be a useful indicator when assessing the health of a person's liver. However, the two groups overlap considerably (both standard deviations are about 0.8 g/dL), so albumin on its own would not reliably separate patients from non-patients. The dataset was large enough for this investigation. As an improvement, further work could test albumin data from other countries: the current dataset covers patients from India only, and the findings may differ for populations with different diets.