ASSIGNMENT 3 MATH1324 INTRODUCTION TO STATISTICS

ANALYSIS OF INDIAN LIVER PATIENTS WITH RESPECT TO THEIR AGE AND GENDER (NORTH EAST ANDHRA PRADESH, INDIA)

ANNALISHIA GEORGE CHETTIAR (S3794870)
TAMIL MUHIL Karuppiah (S3775152)
Niranjan Kumar Ramachandran (S3711568)

Last updated: 27 October, 2019

Introduction

Introduction Cont.

Problem Statement

-To test whether the mean age of male and female liver patients differs, we use a two-sample hypothesis test.

Data

The dataset for this investigation is the open Indian liver patient records dataset from Kaggle.com. It contains 583 patient records with 11 variables, collected from the North East of Andhra Pradesh, India.

Data Cont.

Descriptive Statistics and Visualisation

library(readr)
library(dplyr)  # provides %>%, filter() and summarise() used in the later slides
indian_liver_patient <- read_csv("indian_liver_patient.csv")
View(indian_liver_patient)

indian_liver_patient$Gender <- factor(indian_liver_patient$Gender, ordered = TRUE)
str(indian_liver_patient)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 583 obs. of  11 variables:
##  $ Age                       : num  65 62 62 58 72 46 26 29 17 55 ...
##  $ Gender                    : Ord.factor w/ 2 levels "Female"<"Male": 1 2 2 2 2 2 1 1 2 2 ...
##  $ Total_Bilirubin           : num  0.7 10.9 7.3 1 3.9 1.8 0.9 0.9 0.9 0.7 ...
##  $ Direct_Bilirubin          : num  0.1 5.5 4.1 0.4 2 0.7 0.2 0.3 0.3 0.2 ...
##  $ Alkaline_Phosphotase      : num  187 699 490 182 195 208 154 202 202 290 ...
##  $ Alamine_Aminotransferase  : num  16 64 60 14 27 19 16 14 22 53 ...
##  $ Aspartate_Aminotransferase: num  18 100 68 20 59 14 12 11 19 58 ...
##  $ Total_Protiens            : num  6.8 7.5 7 6.8 7.3 7.6 7 6.7 7.4 6.8 ...
##  $ Albumin                   : num  3.3 3.2 3.3 3.4 2.4 4.4 3.5 3.6 4.1 3.4 ...
##  $ Albumin_and_Globulin_Ratio: num  0.9 0.74 0.89 1 0.4 1.3 1 1.1 1.2 1 ...
##  $ Dataset                   : num  1 1 1 1 1 1 1 1 2 1 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Age = col_double(),
##   ..   Gender = col_character(),
##   ..   Total_Bilirubin = col_double(),
##   ..   Direct_Bilirubin = col_double(),
##   ..   Alkaline_Phosphotase = col_double(),
##   ..   Alamine_Aminotransferase = col_double(),
##   ..   Aspartate_Aminotransferase = col_double(),
##   ..   Total_Protiens = col_double(),
##   ..   Albumin = col_double(),
##   ..   Albumin_and_Globulin_Ratio = col_double(),
##   ..   Dataset = col_double()
##   .. )
colSums(is.na(indian_liver_patient))
##                        Age                     Gender 
##                          0                          0 
##            Total_Bilirubin           Direct_Bilirubin 
##                          0                          0 
##       Alkaline_Phosphotase   Alamine_Aminotransferase 
##                          0                          0 
## Aspartate_Aminotransferase             Total_Protiens 
##                          0                          0 
##                    Albumin Albumin_and_Globulin_Ratio 
##                          0                          4 
##                    Dataset 
##                          0
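
The missing-value counts above show that only Albumin_and_Globulin_Ratio has missing entries (4), while Age and Gender are complete, so the age comparison can use all 583 records. A minimal sketch of the same check on a small made-up data frame (the `toy` object and its values are hypothetical, not taken from the dataset):

```r
# Hypothetical toy data standing in for indian_liver_patient
toy <- data.frame(
  Age = c(65, 62, 58, 72, 46),
  Gender = c("Female", "Male", "Male", "Male", "Female"),
  Albumin_and_Globulin_Ratio = c(0.9, NA, 1.0, 0.4, NA)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Rows usable for the Age-by-Gender comparison:
# only Age and Gender need to be present
usable <- complete.cases(toy[, c("Age", "Gender")])
n_usable <- sum(usable)
print(n_usable)
```

Restricting `complete.cases()` to the two columns used in the test avoids discarding rows because of a missing value in an unrelated variable.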

Descriptive Statistics Cont.

indian_liver_patient_Male <- indian_liver_patient %>% filter(Gender == "Male")

# Summary statistics of Age for male patients
indian_liver_patient_Male %>% summarise(
  Min = min(Age, na.rm = TRUE),
  Q1 = quantile(Age, probs = .25, na.rm = TRUE),
  Median = median(Age, na.rm = TRUE),
  Q3 = quantile(Age, probs = .75, na.rm = TRUE),
  Max = max(Age, na.rm = TRUE),
  Mean = mean(Age, na.rm = TRUE),
  SD = sd(Age, na.rm = TRUE),
  n = n(),
  Missing = sum(is.na(Age))
)
indian_liver_patient_Female <- indian_liver_patient %>% filter(Gender == "Female")

# Summary statistics of Age for female patients
indian_liver_patient_Female %>% summarise(
  Min = min(Age, na.rm = TRUE),
  Q1 = quantile(Age, probs = .25, na.rm = TRUE),
  Median = median(Age, na.rm = TRUE),
  Q3 = quantile(Age, probs = .75, na.rm = TRUE),
  Max = max(Age, na.rm = TRUE),
  Mean = mean(Age, na.rm = TRUE),
  SD = sd(Age, na.rm = TRUE),
  n = n(),
  Missing = sum(is.na(Age))
)
boxplot(Age ~ Gender, data = indian_liver_patient, ylab = "AGE", main = "BOX PLOT OF MALE AND FEMALE", ylim = c(5, 100), col = "pink")
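
The two per-gender summaries above could also be produced in one step with `group_by()`, which returns one row per gender and makes the groups easier to compare side by side. A minimal sketch, assuming dplyr is installed and using a small made-up tibble (`toy` and its values are hypothetical, not the liver-patient data):

```r
library(dplyr)

# Hypothetical toy data standing in for indian_liver_patient
toy <- tibble(
  Age = c(65, 62, 58, 72, 46, 26),
  Gender = c("Female", "Male", "Male", "Male", "Female", "Female")
)

# One summary row per gender instead of two separate filter() calls
age_summary <- toy %>%
  group_by(Gender) %>%
  summarise(
    Min = min(Age, na.rm = TRUE),
    Q1 = quantile(Age, probs = 0.25, na.rm = TRUE),
    Median = median(Age, na.rm = TRUE),
    Q3 = quantile(Age, probs = 0.75, na.rm = TRUE),
    Max = max(Age, na.rm = TRUE),
    Mean = mean(Age, na.rm = TRUE),
    SD = sd(Age, na.rm = TRUE),
    n = n(),
    Missing = sum(is.na(Age))
  )
print(age_summary)
```

Applied to the real data, the same pipeline would start from `indian_liver_patient` instead of `toy`.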

Hypothesis Testing

library(car)
leveneTest(Age ~ Gender , data= indian_liver_patient)
t.test(Age ~ Gender , data = indian_liver_patient , var.equal =  TRUE, alternative = "two.sided")
## 
##  Two Sample t-test
## 
## data:  Age by Gender
## t = -1.3655, df = 581, p-value = 0.1726
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.197309  0.934302
## sample estimates:
## mean in group Female   mean in group Male 
##             43.13380             45.26531
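
The pooled test above sets `var.equal = TRUE`, a choice normally justified by the Levene test, whose output is not shown here. As a sensitivity check, the same comparison can be run with Welch's test, which does not assume equal variances. A minimal sketch on simulated ages (the `sim` data frame, group sizes, means and standard deviations are all hypothetical, chosen only for illustration):

```r
set.seed(1)

# Simulated ages (hypothetical; not the liver-patient data)
sim <- data.frame(
  Age = c(rnorm(40, mean = 43, sd = 16), rnorm(60, mean = 45, sd = 16)),
  Gender = factor(rep(c("Female", "Male"), times = c(40, 60)))
)

# Pooled-variance test (equal variances assumed) and Welch's test
pooled <- t.test(Age ~ Gender, data = sim, var.equal = TRUE)
welch <- t.test(Age ~ Gender, data = sim, var.equal = FALSE)

# The pooled test uses n1 + n2 - 2 degrees of freedom;
# Welch's degrees of freedom are estimated and cannot exceed that value
print(c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter)))
print(c(pooled_p = pooled$p.value, welch_p = welch$p.value))
```

If both tests lead to the same decision, the conclusion does not hinge on the equal-variance assumption.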

Hypothesis Testing Cont.

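Let \(\mu_1\) and \(\mu_2\) denote the population mean ages of female and male liver patients, respectively.

\[H_0: \mu_1 = \mu_2 \]

\[H_A: \mu_1 \ne \mu_2\]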
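
For reference, the pooled-variance statistic computed by a two-sample t-test with `var.equal = TRUE` can be written as follows, where \(\bar{x}_i\), \(s_i\) and \(n_i\) (notation introduced here for illustration) are the sample mean, standard deviation and size of group \(i\):

\[s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\]

\[t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]

Under \(H_0\), \(t\) follows a t-distribution with \(n_1 + n_2 - 2\) degrees of freedom, which for the 583 patients in this dataset gives the 581 reported in the test output.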


Discussion

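-In the two-sample t-test assuming equal variances, t = -1.3655 with 581 degrees of freedom and the p-value is 0.1726, which is greater than the 0.05 significance level, so we fail to reject the null hypothesis. The 95% confidence interval for the difference in mean age (female minus male), from -5.20 to 0.93 years, contains 0. The data therefore provide no statistically significant evidence of a difference in mean age between male and female Indian liver patients. Failing to reject H0 does not prove that the mean ages are equal.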
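
The reported p-value and confidence interval can be cross-checked from the other figures printed in the t-test output. The sketch below is an illustrative consistency check, not part of the original analysis; the variable names are introduced here, and small discrepancies are expected because the printed t value is rounded:

```r
# Values copied from the t-test output
mean_female <- 43.13380
mean_male <- 45.26531
t_stat <- -1.3655
df <- 581

diff_means <- mean_female - mean_male  # estimated difference, Female minus Male
se <- diff_means / t_stat              # standard error implied by t = diff / se
p_value <- 2 * pt(-abs(t_stat), df)    # two-sided p-value

# 95% confidence interval for the difference in means
half_width <- qt(0.975, df) * se
ci <- c(diff_means - half_width, diff_means + half_width)

print(round(p_value, 4))
print(round(ci, 2))
```

The recomputed values should agree closely with the reported p-value of 0.1726 and the interval from -5.197309 to 0.934302.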

References