STATISTICAL COMPUTING
~ Midterm Exam ~
Contact:
Email | dsciencelabs@outlook.com
Instagram | https://www.instagram.com/dsciencelabs/
RPubs | https://rpubs.com/dsciencelabs/
Data Set
Task 1
Prepare the data with R and Python through the following steps:
Import Data
library(dplyr)  # provides %>%, glimpse(), select() and mutate() used below
loan_train <- read.csv("loan-train.csv", stringsAsFactors = TRUE, na.strings = c("", "NA"))
Handling Missing Data
We check the data types and the NA values in the data:
summary(loan_train)
##  ï..Loan_ID     Gender       Married    Dependents Education
##  LP001002:  1   Female:112   No  :213   0   :345   Graduate    :480
##  LP001003:  1   Male  :489   Yes :398   1   :102   Not Graduate:134
##  LP001005:  1   NA's  : 13   NA's:  3   2   :101
##  LP001006:  1                           3+  : 51
##  LP001008:  1                           NA's: 15
##  LP001011:  1
##  (Other) :608
##  Self_Employed   ApplicantIncome   CoapplicantIncome   LoanAmount
##  No  :500        Min.   :  150     Min.   :    0       Min.   :  9.0
##  Yes : 82        1st Qu.: 2878     1st Qu.:    0       1st Qu.:100.0
##  NA's: 32        Median : 3812     Median : 1188       Median :128.0
##                  Mean   : 5403     Mean   : 1621       Mean   :146.4
##                  3rd Qu.: 5795     3rd Qu.: 2297       3rd Qu.:168.0
##                  Max.   :81000     Max.   :41667       Max.   :700.0
##                                                        NA's   :22
##  Loan_Amount_Term   Credit_History   Property_Area   Loan_Status
##  Min.   : 12        Min.   :0.0000   Rural    :179   N:192
##  1st Qu.:360        1st Qu.:1.0000   Semiurban:233   Y:422
##  Median :360        Median :1.0000   Urban    :202
##  Mean   :342        Mean   :0.8422
##  3rd Qu.:360        3rd Qu.:1.0000
##  Max.   :480        Max.   :1.0000
##  NA's   :14         NA's   :50
glimpse(loan_train)
## Rows: 614
## Columns: 13
## $ ï..Loan_ID <fct> LP001002, LP001003, LP001005, LP001006, LP001008, LP~
## $ Gender <fct> Male, Male, Male, Male, Male, Male, Male, Male, Male~
## $ Married <fct> No, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, Yes,~
## $ Dependents <fct> 0, 1, 0, 0, 0, 2, 0, 3+, 2, 1, 2, 2, 2, 0, 2, 0, 1, ~
## $ Education <fct> Graduate, Graduate, Graduate, Not Graduate, Graduate~
## $ Self_Employed <fct> No, No, Yes, No, No, Yes, No, No, No, No, No, NA, No~
## $ ApplicantIncome <int> 5849, 4583, 3000, 2583, 6000, 5417, 2333, 3036, 4006~
## $ CoapplicantIncome <dbl> 0, 1508, 0, 2358, 0, 4196, 1516, 2504, 1526, 10968, ~
## $ LoanAmount <int> NA, 128, 66, 120, 141, 267, 95, 158, 168, 349, 70, 1~
## $ Loan_Amount_Term <int> 360, 360, 360, 360, 360, 360, 360, 360, 360, 360, 36~
## $ Credit_History <int> 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, NA, ~
## $ Property_Area <fct> Urban, Rural, Urban, Urban, Urban, Urban, Urban, Sem~
## $ Loan_Status <fct> Y, N, Y, Y, Y, Y, Y, N, Y, N, Y, Y, Y, N, Y, Y, Y, N~
anyNA(loan_train)
## [1] TRUE
colSums(is.na(loan_train))
## ï..Loan_ID Gender Married Dependents
## 0 13 3 15
## Education Self_Employed ApplicantIncome CoapplicantIncome
## 0 32 0 0
## LoanAmount Loan_Amount_Term Credit_History Property_Area
## 22 14 50 0
## Loan_Status
## 0
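Since the task also asks for Python, the NA count above can be sketched without third-party packages. This is a minimal illustration rather than the author's solution: `SAMPLE_CSV`, `NA_STRINGS` and `count_missing` are hypothetical names, and the four rows are toy data, not rows from loan-train.csv.

```python
import csv
import io

# Toy rows (illustrative only, not taken from loan-train.csv).
SAMPLE_CSV = """Gender,Married,LoanAmount
Male,No,
Male,Yes,128
,Yes,66
Female,NA,120
"""

# Cell contents treated as missing, the same idea as na.strings in read.csv.
NA_STRINGS = {"", "NA"}


def count_missing(rows):
    """Return {column: number of missing cells}, like colSums(is.na(df))."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (value in NA_STRINGS)
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing = count_missing(rows)
print(missing)
print(any(missing.values()))  # counterpart of anyNA()
```

Reading the real file would only require replacing `io.StringIO(SAMPLE_CSV)` with an opened file handle.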
Two columns need their data type changed:
- Loan_Amount_Term: convert to a factor
- Credit_History: convert to a factor
names(loan_train)[1] <- "Loan_ID"  # replace the mangled first column name
loan.train <- loan_train %>%
  dplyr::select(-Loan_ID) %>%
  dplyr::mutate(Loan_Amount_Term = as.factor(Loan_Amount_Term),
                Credit_History = as.factor(Credit_History))
head(loan.train)
The initial check also shows NA values in:
- LoanAmount
- Loan_Amount_Term
- Credit_History
According to the colSums output, Gender, Married, Dependents and Self_Employed contain NA values as well.
Function for data cleansing:
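Mode = function(x){
  a = table(x)   # frequency of each distinct value (NA excluded)
  b = max(a)     # highest frequency
  if(all(a == b))
    mod = NA     # every value occurs equally often: no mode
  else if(is.numeric(x))
    mod = as.numeric(names(a))[a==b]
  else
    mod = names(a)[a==b]
  return(mod)
}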
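For the Python side of the task, a rough equivalent of the `Mode` helper above might look like the sketch below. The name `mode_values` is an assumption. Like the R version, it yields no result (`None`) when every value occurs equally often, and it can return several values when there is a tie for first place.

```python
from collections import Counter


def mode_values(values):
    """Return the most frequent value(s), or None if all values are equally frequent."""
    # Missing entries (None) are ignored, as table() ignores NA by default.
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None
    top = max(counts.values())
    if all(count == top for count in counts.values()):
        return None  # mirrors the NA branch of the R function
    return [value for value, count in counts.items() if count == top]


print(mode_values(["Male", "Male", "Female", None]))
print(mode_values(["a", "b"]))
print(mode_values([1, 1, 2, 2, 3]))
```

Returning a list keeps ties visible; a caller that needs a single fill value has to decide how to break them.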
To improve the overall result, we replace missing (NA) values according to their type:
- Missing values in numeric columns are replaced with the column mean (using mean()).
- Missing values in factor columns are replaced with the most frequent level in the column (using the Mode() function defined above; base R's mode() returns the storage mode, not the most frequent value).
# Factor columns: fill NA with the most frequent level
loan.train$Gender[is.na(loan.train$Gender)] <- Mode(loan.train$Gender)
loan.train$Married[is.na(loan.train$Married)] <- Mode(loan.train$Married)
loan.train$Dependents[is.na(loan.train$Dependents)] <- Mode(loan.train$Dependents)
loan.train$Credit_History[is.na(loan.train$Credit_History)] <- Mode(loan.train$Credit_History)
# Numeric column: fill NA with the mean
loan.train$LoanAmount[is.na(loan.train$LoanAmount)] <- mean(loan.train$LoanAmount, na.rm = TRUE)
# Loan_Amount_Term is already a factor here, so this assignment does not fill its NAs
# (the summary below still reports 14); Self_Employed is not imputed at all
loan.train$Loan_Amount_Term[is.na(loan.train$Loan_Amount_Term)] <- mean(loan.train$Loan_Amount_Term, na.rm = TRUE)
summary(loan.train)
##  Gender       Married   Dependents   Education          Self_Employed
##  Female:112   No :213   0 :360       Graduate    :480   No  :500
##  Male  :502   Yes:401   1 :102       Not Graduate:134   Yes : 82
##                         2 :101                          NA's: 32
##                         3+: 51
##
##
##
##  ApplicantIncome   CoapplicantIncome   LoanAmount      Loan_Amount_Term
##  Min.   :  150     Min.   :    0       Min.   :  9.0   360    :512
##  1st Qu.: 2878     1st Qu.:    0       1st Qu.:100.2   180    : 44
##  Median : 3812     Median : 1188       Median :129.0   480    : 15
##  Mean   : 5403     Mean   : 1621       Mean   :146.4   300    : 13
##  3rd Qu.: 5795     3rd Qu.: 2297       3rd Qu.:164.8   84     :  4
##  Max.   :81000     Max.   :41667       Max.   :700.0   (Other): 12
##                                                        NA's   : 14
##  Credit_History   Property_Area   Loan_Status
##  0: 89            Rural    :179   N:192
##  1:525            Semiurban:233   Y:422
##                   Urban    :202
##
##
##
##
# Note: the result is not assigned, so loan.train itself keeps its remaining NA rows
na.omit(loan.train)
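The replacement rule described above (mean for numeric columns, most frequent level for factors) can be sketched in plain Python as follows. `impute_mean` and `impute_mode` are hypothetical helpers written for illustration. The `loan_amount` values are the first five LoanAmount entries shown by glimpse(), while `self_employed` is toy data.

```python
from collections import Counter
from statistics import mean


def impute_mean(values):
    """Replace None with the mean of the observed numeric values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace None with the most frequent observed level.

    On a tie, Counter.most_common keeps the level that was seen first.
    """
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


loan_amount = [None, 128, 66, 120, 141]          # from the glimpse() output
self_employed = ["No", "No", "Yes", "No", None]  # toy data

loan_amount_filled = impute_mean(loan_amount)
self_employed_filled = impute_mode(self_employed)
print(loan_amount_filled)
print(self_employed_filled)
```

Both helpers return new lists and leave their inputs unchanged, which makes it easy to compare a column before and after imputation.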
Check for Duplicate Data
sum(duplicated(loan.train))
## [1] 0
There are no duplicate rows.
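A hedged Python counterpart of `sum(duplicated(loan.train))` is sketched below. `count_duplicates` is a hypothetical helper, and the three rows are toy data in which the last row deliberately repeats the first.

```python
def count_duplicates(rows):
    """Count rows identical to an earlier row, like sum(duplicated(df)) in R."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


# Toy rows (Gender, Married, ApplicantIncome); the third repeats the first.
rows = [
    ("Male", "No", 5849),
    ("Male", "Yes", 4583),
    ("Male", "No", 5849),
]
n_duplicates = count_duplicates(rows)
print(n_duplicates)
```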
Separating Categorical and Numeric Data
Categorical
Cat_data <- loan.train %>% dplyr::select_if(is.factor)
names(Cat_data)
## [1] "Gender" "Married" "Dependents" "Education"
## [5] "Self_Employed" "Loan_Amount_Term" "Credit_History" "Property_Area"
## [9] "Loan_Status"
Numeric
Num_data <- loan.train %>% dplyr::select_if(is.numeric)
names(Num_data)
## [1] "ApplicantIncome" "CoapplicantIncome" "LoanAmount"
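The same split can be sketched in plain Python by checking the type of each column's observed values. `split_columns` and `is_number` are hypothetical helpers; the sample values are the first three entries of each column in the glimpse() output, with Credit_History stored as labels to mimic the factor conversion.

```python
def is_number(value):
    """True for int or float values, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def split_columns(table):
    """Split {column: values} into (categorical, numeric) dictionaries."""
    categorical, numeric = {}, {}
    for name, values in table.items():
        observed = [v for v in values if v is not None]
        if observed and all(is_number(v) for v in observed):
            numeric[name] = values
        else:
            categorical[name] = values
    return categorical, numeric


table = {
    "Gender": ["Male", "Male", "Male"],
    "ApplicantIncome": [5849, 4583, 3000],
    "CoapplicantIncome": [0.0, 1508.0, 0.0],
    "Credit_History": ["1", "1", "1"],  # labels, like a factor
}
cat_data, num_data = split_columns(table)
print(list(cat_data))
print(list(num_data))
```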
Handling Numeric Data
Handling Outliers
Handling Categorical Data
Task 2
Visualize the data with R and Python through the following steps:
Univariate Visualization
Categorical
Gender
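library(ggplot2)  # plotting functions used throughout Task 2
library(scales)   # provides percent for the axis labels
plotdata <- loan.train %>%
count(Gender) %>%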
mutate(pct = n / sum(n),
pctlabel = paste0(round(pct*100), "%"))
ggplot(plotdata,
aes(x = reorder(Gender, -pct),
y = pct)) +
geom_bar(stat = "identity",
color = "azure4") +
geom_text(aes(label = pctlabel),
vjust = -0.25) +
theme_minimal() + # use a minimal theme
scale_y_continuous(labels = percent) +
labs(x = "Gender",
y = "Percent",
title = "Loan by gender")
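The `plotdata` table that feeds the bar chart is just counts turned into shares and labels. A minimal Python sketch of that computation follows, using the Gender counts reported by summary(loan.train) after imputation (Female 112, Male 502). The record layout is an assumption chosen to mirror the `pct` and `pctlabel` columns.

```python
# Gender counts after imputation, as reported by summary(loan.train).
counts = {"Female": 112, "Male": 502}

total = sum(counts.values())

# One record per level, mirroring the pct and pctlabel columns of plotdata.
plotdata = [
    {
        "level": level,
        "n": n,
        "pct": n / total,
        "pctlabel": f"{round(n / total * 100)}%",
    }
    for level, n in counts.items()
]

# Largest share first, like reorder(Gender, -pct) on the x axis.
plotdata.sort(key=lambda row: row["pct"], reverse=True)

for row in plotdata:
    print(row["level"], row["n"], row["pctlabel"])
```

The same pattern applies to the other categorical columns by swapping in their counts.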
Married
plotdata <- loan.train %>%
count(Married) %>%
mutate(pct = n / sum(n),
pctlabel = paste0(round(pct*100), "%"))
ggplot(plotdata,
aes(x = reorder(Married, -pct),
y = pct)) +
geom_bar(stat = "identity",
color = "azure4") +
geom_text(aes(label = pctlabel),
vjust = -0.25) +
theme_minimal() + # use a minimal theme
scale_y_continuous(labels = percent) +
labs(x = "Married",
y = "Percent",
title = "Loan by married")
Dependents
plotdata <- loan.train %>%
count(Dependents) %>%
mutate(pct = n / sum(n),
pctlabel = paste0(round(pct*100), "%"))
ggplot(plotdata,
aes(x = reorder(Dependents, -pct),
y = pct)) +
geom_bar(stat = "identity",
color = "azure4") +
geom_text(aes(label = pctlabel),
vjust = -0.25) +
theme_minimal() + # use a minimal theme
scale_y_continuous(labels = percent) +
labs(x = "Dependents",
y = "Percent",
title = "Loan by Dependents")
Education
plotdata <- loan.train %>%
count(Education) %>%
mutate(pct = n / sum(n),
pctlabel = paste0(round(pct*100), "%"))
ggplot(plotdata,
aes(x = reorder(Education, -pct),
y = pct)) +
geom_bar(stat = "identity",
color = "azure4") +
geom_text(aes(label = pctlabel),
vjust = -0.25) +
theme_minimal() + # use a minimal theme
scale_y_continuous(labels = percent) +
labs(x = "Education",
y = "Percent",
title = "Loan by Education")
Self_Employed
plotdata <- loan.train %>%
count(Self_Employed) %>%
mutate(pct = n / sum(n),
pctlabel = paste0(round(pct*100), "%"))
ggplot(plotdata,
aes(x = reorder(Self_Employed, -pct),
y = pct)) +
geom_bar(stat = "identity",
color = "azure4") +
geom_text(aes(label = pctlabel),
vjust = -0.25) +
theme_minimal() + # use a minimal theme
scale_y_continuous(labels = percent) +
labs(x = "Self_Employed",
y = "Percent",
title = "Loan by Self_Employed")
Loan_Amount_Term
plotdata <- loan.train %>%
count(Loan_Amount_Term) %>%
mutate(pct = n / sum(n),
pctlabel = paste0(round(pct*100), "%"))
ggplot(plotdata,
aes(x = reorder(Loan_Amount_Term, -pct),
y = pct)) +
geom_bar(stat = "identity",
color = "azure4") +
geom_text(aes(label = pctlabel),
vjust = -0.25) +
theme_minimal() +
scale_y_continuous(labels = percent) +
labs(x = "Loan_Amount_Term",
y = "Percent",
title = "Loan by Loan_Amount_Term")
Credit_History
plotdata <- loan.train %>%
count(Credit_History) %>%
mutate(pct = n / sum(n),
pctlabel = paste0(round(pct*100), "%"))
ggplot(plotdata,
aes(x = reorder(Credit_History, -pct),
y = pct)) +
geom_bar(stat = "identity",
color = "azure4") +
geom_text(aes(label = pctlabel),
vjust = -0.25) +
theme_minimal() +
scale_y_continuous(labels = percent) +
labs(x = "Credit_History",
y = "Percent",
title = "Loan by Credit_History")
Property_Area
plotdata <- loan.train %>%
count(Property_Area) %>%
mutate(pct = n / sum(n),
pctlabel = paste0(round(pct*100), "%"))
ggplot(plotdata,
aes(x = reorder(Property_Area, -pct),
y = pct)) +
geom_bar(stat = "identity",
color = "azure4") +
geom_text(aes(label = pctlabel),
vjust = -0.25) +
theme_minimal() +
scale_y_continuous(labels = percent) +
labs(x = "Property_Area",
y = "Percent",
title = "Loan by Property_Area")
Numerical
ApplicantIncome
ggplot(loan.train, aes(x = ApplicantIncome)) +
geom_histogram(fill = "cornflowerblue",
color = "white",bins = 20) +
theme_minimal() + # use a minimal theme
labs(title="Loan by ApplicantIncome",
x = "ApplicantIncome")
CoapplicantIncome
ggplot(loan.train, aes(x = CoapplicantIncome)) +
geom_histogram(fill = "cornflowerblue",
color = "white",bins = 20) +
theme_minimal() + # use a minimal theme
labs(title="Loan by CoapplicantIncome",
x = "CoapplicantIncome")
LoanAmount
ggplot(loan.train, aes(x = LoanAmount)) +
geom_histogram(fill = "cornflowerblue",
color = "white",bins = 20) +
theme_minimal() + # use a minimal theme
labs(title="Loan by LoanAmount",
x = "LoanAmount")
Bivariate Visualization
Categorical vs Categorical
Gender vs Married
ggplot(loan.train,
aes(x = Gender,
fill = Married)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
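With `position = "fill"`, each bar is rescaled so that its segments sum to 1, which turns counts into within-group proportions. The sketch below reproduces that rescaling in plain Python using the Gender by Married counts from the contingency table reported later in this document. `fill_proportions` is a hypothetical helper name.

```python
# Gender by Married counts (from the contingency table in this document).
counts = {
    "Female": {"No": 80, "Yes": 32},
    "Male": {"No": 133, "Yes": 369},
}


def fill_proportions(table):
    """Rescale each group so its cells sum to 1, as position = "fill" does per bar."""
    result = {}
    for group, cells in table.items():
        total = sum(cells.values())
        result[group] = {level: n / total for level, n in cells.items()}
    return result


proportions = fill_proportions(counts)
for group, cells in proportions.items():
    print(group, {level: round(p, 3) for level, p in cells.items()})
```

Because every bar has the same height, this view compares composition between groups but hides how large each group is.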
Gender vs Education
ggplot(loan.train,
aes(x = Gender,
fill = Education)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Gender vs Dependents
ggplot(loan.train,
aes(x = Gender,
fill = Dependents)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Gender vs Self_Employed
ggplot(loan.train,
aes(x = Gender,
fill = Self_Employed)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Gender vs loanAmountTerm
ggplot(loan.train,
aes(x = Gender,
fill = Loan_Amount_Term)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Gender vs credit_history
ggplot(loan.train,
aes(x = Gender,
fill = Credit_History)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Gender vs PropertyArea
ggplot(loan.train,
aes(x = Gender,
fill = Property_Area)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Gender vs LoanStatus
ggplot(loan.train,
aes(x = Gender,
fill = Loan_Status )) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Married vs Dependents
ggplot(loan.train,
aes(x = Married,
fill = Dependents)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Married vs Education
ggplot(loan.train,
aes(x = Married,
fill = Education)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Married vs Self_Employed
ggplot(loan.train,
aes(x = Married,
fill = Self_Employed)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Married vs loanAmountTerm
ggplot(loan.train,
aes(x = Married,
fill = Loan_Amount_Term)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Married vs credit_history
ggplot(loan.train,
aes(x = Married,
fill = Credit_History)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Married vs PropertyArea
ggplot(loan.train,
aes(x = Married,
fill = Property_Area)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Married vs LoanStatus
ggplot(loan.train,
aes(x = Married,
fill = Loan_Status )) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Dependents vs Education
ggplot(loan.train,
aes(x = Dependents,
fill = Education)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Dependents vs Self_Employed
ggplot(loan.train,
aes(x = Dependents,
fill = Self_Employed)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Dependents vs loanAmountTerm
ggplot(loan.train,
aes(x = Dependents,
fill = Loan_Amount_Term)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Dependents vs credit_history
ggplot(loan.train,
aes(x = Dependents,
fill = Credit_History)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Dependents vs PropertyArea
ggplot(loan.train,
aes(x = Dependents,
fill = Property_Area)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Dependents vs LoanStatus
ggplot(loan.train,
aes(x = Dependents,
fill = Loan_Status )) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Education vs Self_Employed
ggplot(loan.train,
aes(x = Education,
fill = Self_Employed)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Education vs loanAmountTerm
ggplot(loan.train,
aes(x = Education,
fill = Loan_Amount_Term)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Education vs credit_history
ggplot(loan.train,
aes(x = Education,
fill = Credit_History)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Education vs PropertyArea
ggplot(loan.train,
aes(x = Education,
fill = Property_Area)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Education vs LoanStatus
ggplot(loan.train,
aes(x = Education,
fill = Loan_Status )) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Self_Employed vs loanAmountTerm
ggplot(loan.train,
aes(x = Self_Employed,
fill = Loan_Amount_Term)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Self_Employed vs credit_history
ggplot(loan.train,
aes(x = Self_Employed,
fill = Credit_History)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Self_Employed vs PropertyArea
ggplot(loan.train,
aes(x = Self_Employed,
fill = Property_Area)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Self_Employed vs LoanStatus
ggplot(loan.train,
aes(x = Self_Employed,
fill = Loan_Status )) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Loan_Amount_Term vs credit_history
ggplot(loan.train,
aes(x = Loan_Amount_Term,
fill = Credit_History)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Loan_Amount_Term vs PropertyArea
ggplot(loan.train,
aes(x = Loan_Amount_Term,
fill = Property_Area)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Loan_Amount_Term vs LoanStatus
ggplot(loan.train,
aes(x = Loan_Amount_Term,
fill = Loan_Status )) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Credit_History vs PropertyArea
ggplot(loan.train,
aes(x = Credit_History,
fill = Property_Area)) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Credit_History vs LoanStatus
ggplot(loan.train,
aes(x = Credit_History,
fill = Loan_Status )) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Property_Area vs LoanStatus
ggplot(loan.train,
aes(x = Property_Area,
fill = Loan_Status )) +
geom_bar(position = "fill") +
theme_minimal() +
labs(y = "Proportion")
Numerical vs Numerical
ApplicantIncome vs CoapplicantIncome
ggplot(loan.train,
aes(x = ApplicantIncome,
y = CoapplicantIncome)) +
geom_point(color="cornflowerblue",
size = 1.5,
alpha=.8) +
scale_y_continuous(label = scales::dollar,
limits = c(0, 50000)) +
scale_x_continuous(breaks = seq(0, 40000, 5000),
limits=c(0, 50000)) +
theme_minimal() + # use a minimal theme
labs(x = "ApplicantIncome",
y = "",
title = "ApplicantIncome vs CoapplicantIncome")
CoapplicantIncome vs LoanAmount
ggplot(loan.train,
aes(x = CoapplicantIncome,
y = LoanAmount)) +
geom_point(color="cornflowerblue",
size = 1.5,
alpha=.8) +
scale_y_continuous(label = scales::dollar,
limits = c(0, 800)) +
scale_x_continuous(breaks = seq(0, 40000, 5000),
limits=c(0, 50000)) +
theme_minimal() + # use a minimal theme
labs(x = "CoapplicantIncome",
y = "",
title = "CoapplicantIncome vs LoanAmount")
ApplicantIncome vs LoanAmount
ggplot(loan.train,
aes(x = ApplicantIncome,
y = LoanAmount)) +
geom_point(color="cornflowerblue",
size = 1,
alpha=.8) +
scale_y_continuous(label = scales::dollar,
limits = c(0, 800)) +
scale_x_continuous(breaks = seq(0, 40000, 5000),
limits=c(0, 50000)) +
theme_minimal() + # use a minimal theme
labs(x = "ApplicantIncome",
y = "",
title = "ApplicantIncome vs LoanAmount")
Categorical vs Numerical
ApplicantIncome vs Gender
ggplot(loan.train,
aes(x = ApplicantIncome,
fill = Gender)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "Applicant Income by Gender")
ApplicantIncome vs Married
ggplot(loan.train,
aes(x = ApplicantIncome,
fill = Married)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "Applicant Income by Married")
ApplicantIncome vs Dependents
ggplot(loan.train,
aes(x = ApplicantIncome,
fill = Dependents)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "Applicant Income by Dependents")
ApplicantIncome vs Education
ggplot(loan.train,
aes(x = ApplicantIncome,
fill = Education)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "Applicant Income by Education")
ApplicantIncome vs Self_Employed
ggplot(loan.train,
aes(x = ApplicantIncome,
fill = Self_Employed)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "Applicant Income by Self_Employed")
ApplicantIncome vs Loan_Amount_Term
library(ggridges) # to handle overlapping visualization
ggplot(loan.train,
aes(x = ApplicantIncome,
y = Loan_Amount_Term,
fill = Loan_Amount_Term)) +
geom_density_ridges(alpha = 0.7) +
theme_ridges() +
theme(legend.position = "none")
ApplicantIncome vs Credit_History
library(ggridges) # to handle overlapping visualization
ggplot(loan.train,
aes(x = ApplicantIncome,
y = Credit_History,
fill = Credit_History)) +
geom_density_ridges(alpha = 0.7) +
theme_ridges() +
theme(legend.position = "none")
ApplicantIncome vs Property_Area
library(ggridges) # to handle overlapping visualization
ggplot(loan.train,
aes(x = ApplicantIncome,
y = Property_Area,
fill = Property_Area)) +
geom_density_ridges(alpha = 0.7) +
theme_ridges() +
theme(legend.position = "none")
ApplicantIncome vs Loan_Status
library(ggridges) # to handle overlapping visualization
ggplot(loan.train,
aes(x = ApplicantIncome,
y = Loan_Status,
fill = Loan_Status)) +
geom_density_ridges(alpha = 0.7) +
theme_ridges() +
theme(legend.position = "none")
CoapplicantIncome vs Gender
ggplot(loan.train,
aes(x = CoapplicantIncome,
fill = Gender)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "CoapplicantIncome by Gender")
CoapplicantIncome vs Married
ggplot(loan.train,
aes(x = CoapplicantIncome,
fill = Married)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "CoapplicantIncome by Married")
CoapplicantIncome vs Dependents
ggplot(loan.train,
aes(x = CoapplicantIncome,
fill = Dependents)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "CoapplicantIncome by Dependents")
CoapplicantIncome vs Education
ggplot(loan.train,
aes(x = CoapplicantIncome,
fill = Education)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "CoapplicantIncome by Education")
CoapplicantIncome vs Self_Employed
ggplot(loan.train,
aes(x = CoapplicantIncome,
fill = Self_Employed)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "CoapplicantIncome by Self_Employed")
CoapplicantIncome vs Loan_Amount_Term
library(ggridges) # to handle overlapping visualization
ggplot(loan.train,
aes(x = CoapplicantIncome,
y = Loan_Amount_Term,
fill = Loan_Amount_Term)) +
geom_density_ridges(alpha = 0.7) +
theme_ridges() +
theme(legend.position = "none")
CoapplicantIncome vs Credit_History
library(ggridges) # to handle overlapping visualization
ggplot(loan.train,
aes(x = CoapplicantIncome,
y = Credit_History,
fill = Credit_History)) +
geom_density_ridges(alpha = 0.7) +
theme_ridges() +
theme(legend.position = "none")
CoapplicantIncome vs Property_Area
library(ggridges) # to handle overlapping visualization
ggplot(loan.train,
aes(x = CoapplicantIncome,
y = Property_Area,
fill = Property_Area)) +
geom_density_ridges(alpha = 0.7) +
theme_ridges() +
theme(legend.position = "none")
CoapplicantIncome vs Loan_Status
library(ggridges) # to handle overlapping visualization
ggplot(loan.train,
aes(x = CoapplicantIncome,
y = Loan_Status,
fill = Loan_Status)) +
geom_density_ridges(alpha = 0.7) +
theme_ridges() +
theme(legend.position = "none")
LoanAmount vs Gender
ggplot(loan.train,
aes(x = LoanAmount,
fill = Gender)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "LoanAmount by Gender")
LoanAmount vs Married
ggplot(loan.train,
aes(x = LoanAmount,
fill = Married)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "LoanAmount by Married")
LoanAmount vs Dependents
ggplot(loan.train,
aes(x = LoanAmount,
fill = Dependents)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "LoanAmount by Dependents")
LoanAmount vs Education
ggplot(loan.train,
aes(x = LoanAmount,
fill = Education)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "LoanAmount by Education")
LoanAmount vs Self_Employed
ggplot(loan.train,
aes(x = LoanAmount,
fill = Self_Employed)) +
geom_density(alpha = 0.4) +
theme_minimal() +
labs(title = "LoanAmount by Self_Employed")
LoanAmount vs Loan_Amount_Term
library(ggridges) # to handle overlapping visualization
ggplot(loan.train,
aes(x = LoanAmount,
y = Loan_Amount_Term,
fill = Loan_Amount_Term)) +
geom_density_ridges(alpha = 0.7) +
theme_ridges() +
theme(legend.position = "none")
LoanAmount vs Credit_History
library(ggridges) # to handle overlapping visualization
ggplot(loan.train,
aes(x = LoanAmount,
y = Credit_History,
fill = Credit_History)) +
geom_density_ridges(alpha = 0.7) +
theme_ridges() +
theme(legend.position = "none")
LoanAmount vs Property_Area
library(ggridges) # to handle overlapping visualization
ggplot(loan.train,
aes(x = LoanAmount,
y = Property_Area,
fill = Property_Area)) +
geom_density_ridges(alpha = 0.7) +
theme_ridges() +
theme(legend.position = "none")
LoanAmount vs Loan_Status
library(ggridges) # to handle overlapping visualization
ggplot(loan.train,
aes(x = LoanAmount,
y = Loan_Status,
fill = Loan_Status)) +
geom_density_ridges(alpha = 0.7) +
theme_ridges() +
theme(legend.position = "none")
Multivariate Visualization
ApplicantIncome by Married, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = ApplicantIncome,
color = Married,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "ApplicantIncome by Married, Gender, and Loan Term")
ApplicantIncome by Education, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = ApplicantIncome,
color = Education,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "ApplicantIncome by Education, Gender, and Loan Term")
ApplicantIncome by Dependents, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = ApplicantIncome,
color = Dependents,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "ApplicantIncome by Dependents, Gender, and Loan Term")
ApplicantIncome by Self_Employed, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = ApplicantIncome,
color = Self_Employed,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "ApplicantIncome by Self_Employed, Gender, and Loan Term")
ApplicantIncome by Credit_History, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = ApplicantIncome,
color = Credit_History,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "ApplicantIncome by Credit_History, Gender, and Loan Term")
ApplicantIncome by Property_Area, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = ApplicantIncome,
color = Property_Area,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "ApplicantIncome by Property_Area, Gender, and Loan Term")
ApplicantIncome by Loan_Status, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = ApplicantIncome,
color = Loan_Status,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "ApplicantIncome by Loan_Status, Gender, and Loan Term")
CoapplicantIncome by Married, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = CoapplicantIncome,
color = Married,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "CoapplicantIncome by Married, Gender, and Loan Term")
CoapplicantIncome by Education, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = CoapplicantIncome,
color = Education,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "CoapplicantIncome by Education, Gender, and Loan Term")
CoapplicantIncome by Dependents, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = CoapplicantIncome,
color = Dependents,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "CoapplicantIncome by Dependents, Gender, and Loan Term")
CoapplicantIncome by Self_Employed, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = CoapplicantIncome,
color = Self_Employed,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "CoapplicantIncome by Self_Employed, Gender, and Loan Term")
CoapplicantIncome by Credit_History, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = CoapplicantIncome,
color = Credit_History,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "CoapplicantIncome by Credit_History, Gender, and Loan Term")
CoapplicantIncome by Property_Area, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = CoapplicantIncome,
color = Property_Area,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "CoapplicantIncome by Property_Area, Gender, and Loan Term")
CoapplicantIncome by Loan_Status, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = CoapplicantIncome,
color = Loan_Status,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "CoapplicantIncome by Loan_Status, Gender, and Loan Term")
LoanAmount by Married, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = LoanAmount,
color = Married,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "LoanAmount by Married, Gender, and Loan Term")
LoanAmount by Education, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = LoanAmount,
color = Education,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "LoanAmount by Education, Gender, and Loan Term")
LoanAmount by Dependents, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = LoanAmount,
color = Dependents,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "LoanAmount by Dependents, Gender, and Loan Term")
LoanAmount by Self_Employed, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = LoanAmount,
color = Self_Employed,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "LoanAmount by Self_Employed, Gender, and Loan Term")
LoanAmount by Credit_History, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = LoanAmount,
color = Credit_History,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "LoanAmount by Credit_History, Gender, and Loan Term")
LoanAmount by Property_Area, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = LoanAmount,
color = Property_Area,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "LoanAmount by Property_Area, Gender, and Loan Term")
LoanAmount by Loan_Status, Gender, and Loan Term
ggplot(loan.train,
aes(x = Loan_Amount_Term,
y = LoanAmount,
color = Loan_Status,
shape = Gender)) +
geom_point(size = 3, alpha = .6) +
theme_minimal() +
labs(title = "LoanAmount by Loan_Status, Gender, and Loan Term")
Task 3
Perform a descriptive analysis of the data using R and Python through the following steps:
Qualitative
Univariate Categorical
Loan_ID
Cat1 <- table(loan.train$Gender)
Cat1
##
## Female Male
## 112 502
Cat2 <- table(loan.train$Married)
Cat2
##
## No Yes
## 213 401
Cat3 <- table(loan.train$Dependents)
Cat3
##
## 0 1 2 3+
## 360 102 101 51
Cat4 <- table(loan.train$Education)
Cat4
##
## Graduate Not Graduate
## 480 134
Cat5 <- table(loan.train$Self_Employed)
Cat5
##
## No Yes
## 500 82
Cat6 <- table(loan.train$Loan_Amount_Term)
Cat6
##
## 12 36 60 84 120 180 240 300 360 480
## 1 2 2 4 3 44 4 13 512 15
Cat7 <- table(loan.train$Credit_History)
Cat7
##
## 0 1
## 89 525
Cat8 <- table(loan.train$Property_Area)
Cat8
##
## Rural Semiurban Urban
## 179 233 202
Cat9 <- table(loan.train$Loan_Status)
Cat9
##
## N Y
## 192 422
Bivariate Categorical
bicat1 <- loan.train %>% dplyr::select(Gender, Married) %>% table()
bicat1
## Married
## Gender No Yes
## Female 80 32
## Male 133 369
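A two-way table like the one above is a count of value pairs. The plain-Python sketch below shows the idea. `crosstab` is a hypothetical helper, and the six observations are toy data, not rows from loan-train.csv. Pairs with a missing value are skipped, which matches the default behaviour of `table()`.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Count co-occurrences of two categorical variables, skipping missing pairs."""
    pairs = [
        (r, c)
        for r, c in zip(row_values, col_values)
        if r is not None and c is not None
    ]
    counts = Counter(pairs)
    row_levels = sorted({r for r, _ in pairs})
    col_levels = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


# Toy observations (illustrative only).
gender = ["Male", "Male", "Female", "Male", "Female", None]
married = ["No", "Yes", "No", "Yes", "Yes", "Yes"]

table = crosstab(gender, married)
for level, cells in table.items():
    print(level, cells)
```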
bicat2 <- loan.train %>% dplyr::select(Gender, Dependents) %>% table()
bicat2
## Dependents
## Gender 0 1 2 3+
## Female 83 19 7 3
## Male 277 83 94 48
bicat3 <- loan.train %>% dplyr::select(Gender, Education) %>% table()
bicat3
## Education
## Gender Graduate Not Graduate
## Female 92 20
## Male 388 114
bicat4 <- loan.train %>% dplyr::select(Gender, Self_Employed) %>% table()
bicat4
## Self_Employed
## Gender No Yes
## Female 89 15
## Male 411 67
bicat5 <- loan.train %>% dplyr::select(Gender, Loan_Amount_Term) %>% table()
bicat5
## Loan_Amount_Term
## Gender 12 36 60 84 120 180 240 300 360 480
## Female 0 1 0 1 0 3 1 1 98 4
## Male 1 1 2 3 3 41 3 12 414 11
bicat6 <- loan.train %>% dplyr::select(Gender, Credit_History) %>% table()
bicat6
## Credit_History
## Gender 0 1
## Female 17 95
## Male 72 430
bicat7 <- loan.train %>% dplyr::select(Gender, Property_Area) %>% table()
bicat7
## Property_Area
## Gender Rural Semiurban Urban
## Female 24 55 33
## Male 155 178 169
bicat8 <- loan.train %>% dplyr::select(Gender, Loan_Status) %>% table()
bicat8
## Loan_Status
## Gender N Y
## Female 37 75
## Male 155 347
bicat9 <- loan.train %>% dplyr::select(Married, Dependents) %>% table()
bicat9
## Dependents
## Married 0 1 2 3+
## No 175 23 8 7
## Yes 185 79 93 44
bicat10 <- loan.train %>% dplyr::select(Married, Education) %>% table()
bicat10
## Education
## Married Graduate Not Graduate
## No 168 45
## Yes 312 89
bicat11 <- loan.train %>% dplyr::select(Married, Self_Employed) %>% table()
bicat11
## Self_Employed
## Married No Yes
## No 171 28
## Yes 329 54
bicat12 <- loan.train %>% dplyr::select(Married, Loan_Amount_Term) %>% table()
bicat12
## Loan_Amount_Term
## Married 12 36 60 84 120 180 240 300 360 480
## No 0 2 1 0 1 8 1 3 183 9
## Yes 1 0 1 4 2 36 3 10 329 6
bicat13 <- loan.train %>% dplyr::select(Married, Credit_History) %>% table()
bicat13
## Credit_History
## Married 0 1
## No 32 181
## Yes 57 344
bicat14 <- loan.train %>% dplyr::select(Married, Property_Area) %>% table()
bicat14
## Property_Area
## Married Rural Semiurban Urban
## No 63 80 70
## Yes 116 153 132
bicat15 <- loan.train %>% dplyr::select(Married, Loan_Status) %>% table()
bicat15
## Loan_Status
## Married N Y
## No 79 134
## Yes 113 288
bicat16 <- loan.train %>% dplyr::select(Dependents, Education) %>% table()
bicat16
## Education
## Dependents Graduate Not Graduate
## 0 286 74
## 1 81 21
## 2 77 24
## 3+ 36 15
bicat17 <- loan.train %>% dplyr::select(Dependents, Self_Employed) %>% table()
bicat17
## Self_Employed
## Dependents No Yes
## 0 302 39
## 1 76 20
## 2 80 16
## 3+ 42 7
bicat18 <- loan.train %>% dplyr::select(Dependents, Loan_Amount_Term) %>% table()
bicat18
## Loan_Amount_Term
## Dependents 12 36 60 84 120 180 240 300 360 480
## 0 1 1 1 0 2 19 1 6 306 11
## 1 0 1 0 2 0 11 2 2 82 1
## 2 0 0 0 2 1 6 1 3 86 2
## 3+ 0 0 1 0 0 8 0 2 38 1
bicat19 <- loan.train %>% dplyr::select(Dependents, Credit_History) %>% table()
bicat19
## Credit_History
## Dependents 0 1
## 0 50 310
## 1 14 88
## 2 14 87
## 3+ 11 40
bicat20 <- loan.train %>% dplyr::select(Dependents, Property_Area) %>% table()
bicat20
## Property_Area
## Dependents Rural Semiurban Urban
## 0 111 136 113
## 1 21 40 41
## 2 29 37 35
## 3+ 18 20 13
bicat21 <- loan.train %>% dplyr::select(Dependents, Loan_Status) %>% table()
bicat21
## Loan_Status
## Dependents N Y
## 0 113 247
## 1 36 66
## 2 25 76
## 3+ 18 33
bicat22 <- loan.train %>% dplyr::select(Education, Self_Employed) %>% table()
bicat22
## Self_Employed
## Education No Yes
## Graduate 389 65
## Not Graduate 111 17
bicat23 <- loan.train %>% dplyr::select(Education, Loan_Amount_Term) %>% table()
bicat23
## Loan_Amount_Term
## Education 12 36 60 84 120 180 240 300 360 480
## Graduate 1 1 1 4 2 28 3 10 411 11
## Not Graduate 0 1 1 0 1 16 1 3 101 4
bicat25 <- loan.train %>% dplyr::select(Education, Credit_History) %>% table()
bicat25
## Credit_History
## Education 0 1
## Graduate 63 417
## Not Graduate 26 108
bicat26 <- loan.train %>% dplyr::select(Education, Property_Area) %>% table()
bicat26
## Property_Area
## Education Rural Semiurban Urban
## Graduate 131 187 162
## Not Graduate 48 46 40
bicat27 <- loan.train %>% dplyr::select(Education, Loan_Status) %>% table()
bicat27
## Loan_Status
## Education N Y
## Graduate 140 340
## Not Graduate 52 82
bicat28 <- loan.train %>% dplyr::select(Self_Employed, Loan_Amount_Term) %>% table()
bicat28
## Loan_Amount_Term
## Self_Employed 12 36 60 84 120 180 240 300 360 480
## No 1 2 1 3 2 35 3 10 418 14
## Yes 0 0 1 1 1 5 1 3 67 1
bicat29 <- loan.train %>% dplyr::select(Self_Employed, Credit_History) %>% table()
bicat29
## Credit_History
## Self_Employed 0 1
## No 76 424
## Yes 12 70
bicat30 <- loan.train %>% dplyr::select(Self_Employed, Property_Area) %>% table()
bicat30
## Property_Area
## Self_Employed Rural Semiurban Urban
## No 143 191 166
## Yes 26 32 24
bicat31 <- loan.train %>% dplyr::select(Self_Employed, Loan_Status) %>% table()
bicat31
## Loan_Status
## Self_Employed N Y
## No 157 343
## Yes 26 56
bicat32 <- loan.train %>% dplyr::select(Loan_Amount_Term, Credit_History) %>% table()
bicat32
## Credit_History
## Loan_Amount_Term 0 1
## 12 0 1
## 36 0 2
## 60 0 2
## 84 0 4
## 120 0 3
## 180 10 34
## 240 0 4
## 300 3 10
## 360 66 446
## 480 4 11
bicat33 <- loan.train %>% dplyr::select(Loan_Amount_Term, Property_Area) %>% table()
bicat33
## Property_Area
## Loan_Amount_Term Rural Semiurban Urban
## 12 0 0 1
## 36 0 2 0
## 60 0 0 2
## 84 2 1 1
## 120 0 2 1
## 180 11 10 23
## 240 0 2 2
## 300 4 6 3
## 360 156 200 156
## 480 2 7 6
bicat34 <- loan.train %>% dplyr::select(Loan_Amount_Term, Loan_Status) %>% table()
bicat34
## Loan_Status
## Loan_Amount_Term N Y
## 12 0 1
## 36 2 0
## 60 0 2
## 84 1 3
## 120 0 3
## 180 15 29
## 240 1 3
## 300 5 8
## 360 153 359
## 480 9 6
bicat35 <- loan.train %>% dplyr::select(Credit_History, Property_Area) %>% table()
bicat35
## Property_Area
## Credit_History Rural Semiurban Urban
## 0 28 30 31
## 1 151 203 171
bicat36 <- loan.train %>% dplyr::select(Credit_History, Loan_Status) %>% table()
bicat36
## Loan_Status
## Credit_History N Y
## 0 82 7
## 1 110 415
bicat37 <- loan.train %>% dplyr::select(Property_Area, Loan_Status) %>% table()
bicat37
## Loan_Status
## Property_Area N Y
## Rural 69 110
## Semiurban 54 179
## Urban 69 133
Multivariate Categorical
mulcat1 <- loan.train %>% dplyr::select(Gender, Married, Dependents) %>% ftable()
mulcat1
## Dependents 0 1 2 3+
## Gender Married
## Female No 62 13 2 3
## Yes 21 6 5 0
## Male No 113 10 6 4
## Yes 164 73 88 44
mulcat2 <- loan.train %>% dplyr::select(Gender, Married, Education) %>% ftable()
mulcat2
## Education Graduate Not Graduate
## Gender Married
## Female No 66 14
## Yes 26 6
## Male No 102 31
## Yes 286 83
mulcat3 <- loan.train %>% dplyr::select(Gender, Married, Self_Employed) %>% ftable()
mulcat3
## Self_Employed No Yes
## Gender Married
## Female No 63 11
## Yes 26 4
## Male No 108 17
## Yes 303 50
mulcat4 <- loan.train %>% dplyr::select(Gender, Married, Loan_Amount_Term) %>% ftable()
mulcat4
## Loan_Amount_Term 12 36 60 84 120 180 240 300 360 480
## Gender Married
## Female No 0 1 0 0 0 2 0 1 70 3
## Yes 0 0 0 1 0 1 1 0 28 1
## Male No 0 1 1 0 1 6 1 2 113 6
## Yes 1 0 1 3 2 35 2 10 301 5
mulcat5 <- loan.train %>% dplyr::select(Gender, Married, Credit_History) %>% ftable()
mulcat5
## Credit_History 0 1
## Gender Married
## Female No 13 67
## Yes 4 28
## Male No 19 114
## Yes 53 316
mulcat6 <- loan.train %>% dplyr::select(Gender, Married, Property_Area) %>% ftable()
mulcat6
## Property_Area Rural Semiurban Urban
## Gender Married
## Female No 19 34 27
## Yes 5 21 6
## Male No 44 46 43
## Yes 111 132 126
mulcat7 <- loan.train %>% dplyr::select(Gender, Married, Loan_Status) %>% ftable()
mulcat7
## Loan_Status N Y
## Gender Married
## Female No 29 51
## Yes 8 24
## Male No 50 83
## Yes 105 264
mulcat8 <- loan.train %>% dplyr::select(Married, Dependents, Education) %>% ftable()
mulcat8
## Education Graduate Not Graduate
## Married Dependents
## No 0 139 36
## 1 16 7
## 2 8 0
## 3+ 5 2
## Yes 0 147 38
## 1 65 14
## 2 69 24
## 3+ 31 13
mulcat9 <- loan.train %>% dplyr::select(Married, Dependents, Self_Employed) %>% ftable()
mulcat9
## Self_Employed No Yes
## Married Dependents
## No 0 145 21
## 1 14 6
## 2 7 0
## 3+ 5 1
## Yes 0 157 18
## 1 62 14
## 2 73 16
## 3+ 37 6
mulcat11 <- loan.train %>% dplyr::select(Married, Dependents, Loan_Amount_Term) %>% ftable()
mulcat11
## Loan_Amount_Term 12 36 60 84 120 180 240 300 360 480
## Married Dependents
## No 0 0 1 1 0 1 5 0 3 150 9
## 1 0 1 0 0 0 2 1 0 19 0
## 2 0 0 0 0 0 0 0 0 8 0
## 3+ 0 0 0 0 0 1 0 0 6 0
## Yes 0 1 0 0 0 1 14 1 3 156 2
## 1 0 0 0 2 0 9 1 2 63 1
## 2 0 0 0 2 1 6 1 3 78 2
## 3+ 0 0 1 0 0 7 0 2 32 1
mulcat12 <- loan.train %>% dplyr::select(Married, Dependents, Credit_History) %>% ftable()
mulcat12
## Credit_History 0 1
## Married Dependents
## No 0 26 149
## 1 2 21
## 2 3 5
## 3+ 1 6
## Yes 0 24 161
## 1 12 67
## 2 11 82
## 3+ 10 34
mulcat13 <- loan.train %>% dplyr::select(Married, Dependents, Property_Area) %>% ftable()
mulcat13
## Property_Area Rural Semiurban Urban
## Married Dependents
## No 0 53 63 59
## 1 3 12 8
## 2 4 3 1
## 3+ 3 2 2
## Yes 0 58 73 54
## 1 18 28 33
## 2 25 34 34
## 3+ 15 18 11
mulcat14 <- loan.train %>% dplyr::select(Married, Dependents, Loan_Status) %>% ftable()
mulcat14
## Loan_Status N Y
## Married Dependents
## No 0 63 112
## 1 10 13
## 2 3 5
## 3+ 3 4
## Yes 0 50 135
## 1 26 53
## 2 22 71
## 3+ 15 29
mulcat15 <- loan.train %>% dplyr::select(Dependents, Self_Employed, Loan_Amount_Term) %>% ftable()
mulcat15
## Loan_Amount_Term 12 36 60 84 120 180 240 300 360 480
## Dependents Self_Employed
## 0 No 1 1 1 0 1 14 1 5 259 10
## Yes 0 0 0 0 1 3 0 1 31 1
## 1 No 0 1 0 2 0 8 2 1 60 1
## Yes 0 0 0 0 0 2 0 1 17 0
## 2 No 0 0 0 1 1 6 0 2 68 2
## Yes 0 0 0 1 0 0 1 1 13 0
## 3+ No 0 0 0 0 0 7 0 2 31 1
## Yes 0 0 1 0 0 0 0 0 6 0
mulcat16 <- loan.train %>% dplyr::select(Dependents, Self_Employed, Credit_History) %>% ftable()
mulcat16
## Credit_History 0 1
## Dependents Self_Employed
## 0 No 45 257
## Yes 5 34
## 1 No 9 67
## Yes 5 15
## 2 No 11 69
## Yes 2 14
## 3+ No 11 31
## Yes 0 7
mulcat17 <- loan.train %>% dplyr::select(Dependents, Self_Employed, Property_Area) %>% ftable()
mulcat17
## Property_Area Rural Semiurban Urban
## Dependents Self_Employed
## 0 No 93 117 92
## Yes 10 15 14
## 1 No 15 28 33
## Yes 5 9 6
## 2 No 19 30 31
## Yes 9 4 3
## 3+ No 16 16 10
## Yes 2 4 1
mulcat18 <- loan.train %>% dplyr::select(Dependents, Self_Employed, Loan_Status) %>% ftable()
mulcat18
## Loan_Status N Y
## Dependents Self_Employed
## 0 No 97 205
## Yes 11 28
## 1 No 24 52
## Yes 10 10
## 2 No 19 61
## Yes 5 11
## 3+ No 17 25
## Yes 0 7
mulcat25 <- loan.train %>% dplyr::select(Dependents, Self_Employed, Married) %>% ftable()
mulcat25
## Married No Yes
## Dependents Self_Employed
## 0 No 145 157
## Yes 21 18
## 1 No 14 62
## Yes 6 14
## 2 No 7 73
## Yes 0 16
## 3+ No 5 37
## Yes 1 6
mulcat26 <- loan.train %>% dplyr::select(Dependents, Self_Employed, Gender) %>% ftable()
mulcat26
## Gender Female Male
## Dependents Self_Employed
## 0 No 70 232
## Yes 10 29
## 1 No 12 64
## Yes 5 15
## 2 No 5 75
## Yes 0 16
## 3+ No 2 40
## Yes 0 7
mulcat27 <- loan.train %>% dplyr::select(Dependents, Self_Employed, Education) %>% ftable()
mulcat27
## Education Graduate Not Graduate
## Dependents Self_Employed
## 0 No 241 61
## Yes 30 9
## 1 No 59 17
## Yes 17 3
## 2 No 59 21
## Yes 14 2
## 3+ No 30 12
## Yes 4 3
mulcat19 <- loan.train %>% dplyr::select(Self_Employed, Credit_History, Education) %>% ftable()
mulcat19
## Education Graduate Not Graduate
## Self_Employed Credit_History
## No 0 52 24
## 1 337 87
## Yes 0 10 2
## 1 55 15
mulcat20 <- loan.train %>% dplyr::select(Self_Employed, Credit_History, Property_Area) %>% ftable()
mulcat20
## Property_Area Rural Semiurban Urban
## Self_Employed Credit_History
## No 0 23 27 26
## 1 120 164 140
## Yes 0 5 2 5
## 1 21 30 19
mulcat21 <- loan.train %>% dplyr::select(Self_Employed, Credit_History, Loan_Status) %>% ftable()
mulcat21
## Loan_Status N Y
## Self_Employed Credit_History
## No 0 69 7
## 1 88 336
## Yes 0 12 0
## 1 14 56
mulcat22 <- loan.train %>% dplyr::select(Self_Employed, Credit_History, Married) %>% ftable()
mulcat22
## Married No Yes
## Self_Employed Credit_History
## No 0 28 48
## 1 143 281
## Yes 0 4 8
## 1 24 46
mulcat23 <- loan.train %>% dplyr::select(Self_Employed, Credit_History, Gender) %>% ftable()
mulcat23
## Gender Female Male
## Self_Employed Credit_History
## No 0 14 62
## 1 75 349
## Yes 0 3 9
## 1 12 58
mulcat24 <- loan.train %>% dplyr::select(Self_Employed, Credit_History, Loan_Amount_Term) %>% ftable()
mulcat24
## Loan_Amount_Term 12 36 60 84 120 180 240 300 360 480
## Self_Employed Credit_History
## No 0 0 0 0 0 0 8 0 2 56 4
## 1 1 2 1 3 2 27 3 8 362 10
## Yes 0 0 0 0 0 0 2 0 1 9 0
## 1 0 0 1 1 1 3 1 2 58 1
mulcat28 <- loan.train %>% dplyr::select(Credit_History, Property_Area, Education) %>% ftable()
mulcat28
## Education Graduate Not Graduate
## Credit_History Property_Area
## 0 Rural 20 8
## Semiurban 24 6
## Urban 19 12
## 1 Rural 111 40
## Semiurban 163 40
## Urban 143 28
mulcat29 <- loan.train %>% dplyr::select(Credit_History, Property_Area, Dependents) %>% ftable()
mulcat29
## Dependents 0 1 2 3+
## Credit_History Property_Area
## 0 Rural 15 5 6 2
## Semiurban 16 3 5 6
## Urban 19 6 3 3
## 1 Rural 96 16 23 16
## Semiurban 120 37 32 14
## Urban 94 35 32 10
mulcat30 <- loan.train %>% dplyr::select(Credit_History, Property_Area, Loan_Status) %>% ftable()
mulcat30
## Loan_Status N Y
## Credit_History Property_Area
## 0 Rural 26 2
## Semiurban 26 4
## Urban 30 1
## 1 Rural 43 108
## Semiurban 28 175
## Urban 39 132
mulcat31 <- loan.train %>% dplyr::select(Credit_History, Property_Area, Married) %>% ftable()
mulcat31
## Married No Yes
## Credit_History Property_Area
## 0 Rural 8 20
## Semiurban 12 18
## Urban 12 19
## 1 Rural 55 96
## Semiurban 68 135
## Urban 58 113
mulcat32 <- loan.train %>% dplyr::select(Credit_History, Property_Area, Gender) %>% ftable()
mulcat32
## Gender Female Male
## Credit_History Property_Area
## 0 Rural 1 27
## Semiurban 8 22
## Urban 8 23
## 1 Rural 23 128
## Semiurban 47 156
## Urban 25 146
mulcat33 <- loan.train %>% dplyr::select(Credit_History, Property_Area, Loan_Amount_Term) %>% ftable()
mulcat3  # prints mulcat3 (Gender, Married, Self_Employed), not mulcat33, hence the output below
## Self_Employed No Yes
## Gender Married
## Female No 63 11
## Yes 26 4
## Male No 108 17
## Yes 303 50
To understand how the data are distributed with respect to loan status (the target variable), we visualize the categorical variables against loan status.
train.new <- loan.train %>%
  filter(Loan_Status != "test") %>%
  mutate(Loan_Status = ifelse(Loan_Status == "Y", 1, 0))

train.new %>%
  group_by(Loan_Status) %>%
  summarise(n.count = n()) %>%
  mutate(percent = round(n.count/nrow(train.new)*100, 1),
         Loan_Status = as.factor(Loan_Status)) %>%
  ungroup() %>%
  ggplot(aes(x = Loan_Status, y = percent, fill = Loan_Status)) +
  geom_bar(stat = "identity") +
  theme_economist_white()
# function to plot several categorical variables and how they interact with the target variable
PlotSimple <- function(dataframe, x, y){
  aaa <- enquo(x)
  bbb <- enquo(y)
  dataframe %>%
    filter(!is.na(!! aaa), !is.na(!! bbb)) %>%
    group_by(!! aaa, !! bbb) %>%
    summarise(n = n()) %>%
    mutate(percent = n/nrow(dataframe)) %>%   # share of all rows, not of each group
    ggplot(aes_(fill = aaa, y = ~percent, x = bbb)) +
    geom_bar(position = "dodge", stat = "identity") +
    theme_economist_white()
}
xvars <- list(as.name("Married"),
              as.name("Credit_History"),
              as.name("Gender"),
              as.name("Education"),
              as.name("Self_Employed"),
              as.name("Property_Area"))
cat.data <- loan.train %>% dplyr::select_if(is.factor)

all_plots <- lapply(xvars, PlotSimple, dataframe = cat.data, y = Loan_Status)
cowplot::plot_grid(plotlist = all_plots)
Roughly 69% of clients (422 of 614) had their loans approved. Clients with a credit history of 1 whose loans were approved make up about 68% of all clients (415 of 614), whereas only 7 of the 89 clients with a credit history of 0 were approved. This indicates that credit history and loan approval are associated. Approved applicants living in semi-urban areas account for 29.2% of all clients (179 of 614). Married applicants are approved more often than unmarried ones (288 of 401 versus 134 of 213).
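The shares quoted above can be re-derived from the two-way tables. The following Python sketch is only an illustration: the counts are copied by hand from the `Credit_History`, `Married`, and `Property_Area` tables against `Loan_Status`, and the helper name `approval_rates` is an assumption made for this example.

```python
# Counts copied from the two-way tables above, as (rejected "N", approved "Y").
credit_history = {"0": (82, 7), "1": (110, 415)}
married = {"No": (79, 134), "Yes": (113, 288)}
property_area = {"Rural": (69, 110), "Semiurban": (54, 179), "Urban": (69, 133)}


def approval_rates(table):
    """For each level, return the approval rate within that level and the
    share of all clients who are approved and belong to that level."""
    total = sum(n + y for n, y in table.values())
    return {
        level: {"within_level": y / (n + y), "share_of_all": y / total}
        for level, (n, y) in table.items()
    }


for name, table in [("Credit_History", credit_history),
                    ("Married", married),
                    ("Property_Area", property_area)]:
    print(name)
    for level, rates in approval_rates(table).items():
        print(f"  {level:<10} within level: {rates['within_level']:.1%}"
              f"   share of all clients: {rates['share_of_all']:.1%}")
```

The "share of all clients" column corresponds to what `PlotSimple` draws, because it divides by `nrow(dataframe)`. The "within level" column is usually the more informative figure when comparing groups of different sizes.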
Quantitative
Univariate numeric
Mean
mean(loan.train$ApplicantIncome)
## [1] 5403.459
mean(loan.train$CoapplicantIncome)
## [1] 1621.246
mean(loan.train$LoanAmount)
## [1] 146.4122
Quantile
quantile(loan.train$ApplicantIncome)
## 0% 25% 50% 75% 100%
## 150.0 2877.5 3812.5 5795.0 81000.0
quantile(loan.train$CoapplicantIncome)
## 0% 25% 50% 75% 100%
## 0.00 0.00 1188.50 2297.25 41667.00
quantile(loan.train$LoanAmount)
## 0% 25% 50% 75% 100%
## 9.00 100.25 129.00 164.75 700.00
Median
median(loan.train$ApplicantIncome)
## [1] 3812.5
median(loan.train$CoapplicantIncome)
## [1] 1188.5
median(loan.train$LoanAmount)
## [1] 129
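Mode
Note: R's mode() reports the storage mode of an object (here "numeric"), not its most frequent value, so the output below is not the statistical mode.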
mode(loan.train$ApplicantIncome)
## [1] "numeric"
mode(loan.train$CoapplicantIncome)
## [1] "numeric"
mode(loan.train$LoanAmount)
## [1] "numeric"
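Since the calls above return only the storage type, the most frequent value has to be computed another way. The following is a minimal Python sketch using `statistics.multimode` from the standard library. The income values are hypothetical and serve only as an illustration.

```python
from statistics import multimode

# Hypothetical income values, used only for illustration.
incomes = [2500, 3000, 2500, 4583, 6000, 2500, 3000]

# multimode returns every value tied for the highest frequency,
# so it also works when the data have more than one mode.
modes = multimode(incomes)
print(modes)
```

For continuous variables such as `ApplicantIncome`, most values occur only once or twice, so the mode is often less informative than the median.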
loantrain <- loan.train %>% dplyr::select_if(is.numeric)
summary(loantrain)
## ApplicantIncome CoapplicantIncome LoanAmount
## Min. : 150 Min. : 0 Min. : 9.0
## 1st Qu.: 2878 1st Qu.: 0 1st Qu.:100.2
## Median : 3812 Median : 1188 Median :129.0
## Mean : 5403 Mean : 1621 Mean :146.4
## 3rd Qu.: 5795 3rd Qu.: 2297 3rd Qu.:164.8
## Max. :81000 Max. :41667 Max. :700.0
Variance
var(loan.train$ApplicantIncome)
## [1] 37320390
var(loan.train$CoapplicantIncome)
## [1] 8562930
var(loan.train$LoanAmount)
## [1] 7062.296
Standard deviation
sd(loan.train$ApplicantIncome)
## [1] 6109.042
sd(loan.train$CoapplicantIncome)
## [1] 2926.248
sd(loan.train$LoanAmount)
## [1] 84.03747
Median Absolute Deviation
mad(loan.train$ApplicantIncome)
## [1] 1822.857
mad(loan.train$CoapplicantIncome)
## [1] 1762.07
mad(loan.train$LoanAmount)
## [1] 45.2193
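R's `mad()` does not return the raw median of absolute deviations. By default it multiplies that value by the constant 1.4826, so the result is comparable to a standard deviation when the data are normal. The Python sketch below reproduces that definition. The function name `mad` and the toy data are chosen only for this example.

```python
from statistics import median


def mad(values, constant=1.4826):
    """Median absolute deviation, scaled like R's mad() by default."""
    center = median(values)
    return constant * median(abs(v - center) for v in values)


toy = [1, 2, 3, 4, 100]          # one extreme value
print(mad(toy))                  # scaled, as in R
print(mad(toy, constant=1))      # raw median absolute deviation
```

The extreme value 100 barely affects the result, which is why the MAD values above are much smaller than the standard deviations of the same heavily skewed variables.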
IQR
IQR(loan.train$ApplicantIncome)
## [1] 2917.5
IQR(loan.train$CoapplicantIncome)
## [1] 2297.25
IQR(loan.train$LoanAmount)
## [1] 64.5
Skewness
library(e1071)
skewness(loan.train$ApplicantIncome)
## [1] 6.507596
skewness(loan.train$CoapplicantIncome)
## [1] 7.454967
skewness(loan.train$LoanAmount)
## [1] 2.713293
Kurtosis
kurtosis(loan.train$ApplicantIncome)
## [1] 59.83387
kurtosis(loan.train$CoapplicantIncome)
## [1] 83.97239
kurtosis(loan.train$LoanAmount)
## [1] 10.75326
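To make the skewness and kurtosis values above easier to interpret, the sketch below writes out the moment formulas in Python. It assumes the default `type = 3` of `e1071::skewness()` and `e1071::kurtosis()`, which, as far as the package documentation describes it, divides the third and fourth central moments by powers of the sample standard deviation and reports excess kurtosis. The function names are made up for this example.

```python
from statistics import fmean, stdev


def skewness_type3(values):
    """Third central moment divided by the cubed sample standard deviation."""
    m = fmean(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / stdev(values) ** 3


def kurtosis_type3(values):
    """Fourth central moment over s^4, minus 3 (excess kurtosis)."""
    m = fmean(values)
    m4 = sum((v - m) ** 4 for v in values) / len(values)
    return m4 / stdev(values) ** 4 - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 10]
print(skewness_type3(symmetric), skewness_type3(right_skewed))
print(kurtosis_type3(symmetric))
```

Positive skewness, as found for all three loan variables, means a long right tail. Large positive excess kurtosis means heavy tails, which is consistent with the few very high incomes in the data.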
Bivariate numeric
Covariance, correlation, and Z-score
cov(loan.train$ApplicantIncome,loan.train$CoapplicantIncome)
## [1] -2084490
cov(loan.train$ApplicantIncome,loan.train$LoanAmount)
## [1] 290383
cov(loan.train$CoapplicantIncome,loan.train$LoanAmount)
## [1] 46189.73
cor(loan.train$ApplicantIncome,loan.train$CoapplicantIncome)
## [1] -0.1166046
cor(loan.train$ApplicantIncome,loan.train$LoanAmount)
## [1] 0.5656205
cor(loan.train$CoapplicantIncome,loan.train$LoanAmount)
## [1] 0.1878284
zscore_applicantincome=(loan.train$ApplicantIncome-mean(loan.train$ApplicantIncome))/sd(loan.train$ApplicantIncome)
zscore_coapplicantincome=(loan.train$CoapplicantIncome-mean(loan.train$CoapplicantIncome))/sd(loan.train$CoapplicantIncome)
zscore_LoanAmount=(loan.train$LoanAmount-mean(loan.train$LoanAmount))/sd(loan.train$LoanAmount)
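The same standardization can be sketched in Python. This is an illustration only: the helper `zscores` is a made-up name, and the five loan amounts are a small sample of values rather than the full column.

```python
from statistics import fmean, stdev


def zscores(values):
    """Standardize values: subtract the mean, divide by the sample sd."""
    m = fmean(values)
    s = stdev(values)          # n - 1 denominator, like R's sd()
    return [(v - m) / s for v in values]


loan_amounts = [128.0, 66.0, 120.0, 141.0, 267.0]   # illustrative values only
z = zscores(loan_amounts)
print([round(v, 3) for v in z])
```

By construction the standardized values have mean 0 and standard deviation 1, so they show how many standard deviations each observation lies from the mean.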
Multivariate numeric
cov(loantrain)
## ApplicantIncome CoapplicantIncome LoanAmount
## ApplicantIncome 37320390 -2084490.34 290382.977
## CoapplicantIncome -2084490 8562929.52 46189.726
## LoanAmount 290383 46189.73 7062.296
cor(loantrain)
## ApplicantIncome CoapplicantIncome LoanAmount
## ApplicantIncome 1.0000000 -0.1166046 0.5656205
## CoapplicantIncome -0.1166046 1.0000000 0.1878284
## LoanAmount 0.5656205 0.1878284 1.0000000
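The two matrices are linked by the relation \(r_{xy} = \mathrm{cov}(x, y) / (s_x s_y)\). As a quick check, the Python sketch below recomputes two correlations from the covariance matrix entries printed above. The numbers are copied from that output and the variable names are chosen for this example.

```python
from math import sqrt

# Entries copied from the covariance matrix above.
var_applicant = 37320390
var_coapplicant = 8562929.52
var_loan = 7062.296
cov_applicant_loan = 290382.977
cov_applicant_coapplicant = -2084490.34

# Correlation is the covariance rescaled by both standard deviations.
r_applicant_loan = cov_applicant_loan / sqrt(var_applicant * var_loan)
r_applicant_coapplicant = cov_applicant_coapplicant / sqrt(var_applicant * var_coapplicant)

print(round(r_applicant_loan, 4), round(r_applicant_coapplicant, 4))
```

Both values agree with the correlation matrix. This confirms that the large covariances mainly reflect the scale of the income variables and not a strong relationship.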
train.new <- train.new %>%
  mutate(Loan_Status = as.factor(Loan_Status))

varlist <- c("ApplicantIncome", "CoapplicantIncome", "LoanAmount")

# group_by_() and select_() are deprecated in current dplyr; kept as in the original
PlotFast <- function(varName) {

  train.new %>%
    group_by_("Loan_Status") %>%
    dplyr::select_("Loan_Status", varName) %>%
    ggplot(aes_string("Loan_Status", varName, fill = "Loan_Status")) +
    geom_boxplot() +
    theme_economist_white()
}
all_plot_cont <- lapply(varlist, PlotFast)
cowplot::plot_grid(plotlist = all_plot_cont, ncol = 3)
rm(train.new)
It is hard to see any particular pattern among the continuous variables at this point. This may mean that approved and rejected cases have similar loan amounts and similar applicant and co-applicant incomes.
EDA the Lazy Way
library(funModeling)
library(tidyverse)
library(Hmisc)
library(skimr)
basic_eda <- function(loan.train)
{
  glimpse(loan.train)
  skim(loan.train)
  df_status(loan.train)
  freq(loan.train)
  profiling_num(loan.train)
  plot_num(loan.train)
  describe(loan.train)
}
basic_eda(loan.train)
## Rows: 614
## Columns: 12
## $ Gender <fct> Male, Male, Male, Male, Male, Male, Male, Male, Male~
## $ Married <fct> No, Yes, Yes, Yes, No, Yes, Yes, Yes, Yes, Yes, Yes,~
## $ Dependents <fct> 0, 1, 0, 0, 0, 2, 0, 3+, 2, 1, 2, 2, 2, 0, 2, 0, 1, ~
## $ Education <fct> Graduate, Graduate, Graduate, Not Graduate, Graduate~
## $ Self_Employed <fct> No, No, Yes, No, No, Yes, No, No, No, No, No, NA, No~
## $ ApplicantIncome <int> 5849, 4583, 3000, 2583, 6000, 5417, 2333, 3036, 4006~
## $ CoapplicantIncome <dbl> 0, 1508, 0, 2358, 0, 4196, 1516, 2504, 1526, 10968, ~
## $ LoanAmount <dbl> 146.4122, 128.0000, 66.0000, 120.0000, 141.0000, 267~
## $ Loan_Amount_Term <fct> 360, 360, 360, 360, 360, 360, 360, 360, 360, 360, 36~
## $ Credit_History <fct> 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0~
## $ Property_Area <fct> Urban, Rural, Urban, Urban, Urban, Urban, Urban, Sem~
## $ Loan_Status <fct> Y, N, Y, Y, Y, Y, Y, N, Y, N, Y, Y, Y, N, Y, Y, Y, N~
## variable q_zeros p_zeros q_na p_na q_inf p_inf type unique
## 1 Gender 0 0.00 0 0.00 0 0 factor 2
## 2 Married 0 0.00 0 0.00 0 0 factor 2
## 3 Dependents 360 58.63 0 0.00 0 0 factor 4
## 4 Education 0 0.00 0 0.00 0 0 factor 2
## 5 Self_Employed 0 0.00 32 5.21 0 0 factor 2
## 6 ApplicantIncome 0 0.00 0 0.00 0 0 integer 505
## 7 CoapplicantIncome 273 44.46 0 0.00 0 0 numeric 287
## 8 LoanAmount 0 0.00 0 0.00 0 0 numeric 204
## 9 Loan_Amount_Term 0 0.00 14 2.28 0 0 factor 10
## 10 Credit_History 89 14.50 0 0.00 0 0 factor 2
## 11 Property_Area 0 0.00 0 0.00 0 0 factor 3
## 12 Loan_Status 0 0.00 0 0.00 0 0 factor 2
## Gender frequency percentage cumulative_perc
## 1 Male 502 81.76 81.76
## 2 Female 112 18.24 100.00
## Married frequency percentage cumulative_perc
## 1 Yes 401 65.31 65.31
## 2 No 213 34.69 100.00
## Dependents frequency percentage cumulative_perc
## 1 0 360 58.63 58.63
## 2 1 102 16.61 75.24
## 3 2 101 16.45 91.69
## 4 3+ 51 8.31 100.00
## Education frequency percentage cumulative_perc
## 1 Graduate 480 78.18 78.18
## 2 Not Graduate 134 21.82 100.00
## Self_Employed frequency percentage cumulative_perc
## 1 No 500 81.43 81.43
## 2 Yes 82 13.36 94.79
## 3 <NA> 32 5.21 100.00
## Loan_Amount_Term frequency percentage cumulative_perc
## 1 360 512 83.39 83.39
## 2 180 44 7.17 90.56
## 3 480 15 2.44 93.00
## 4 <NA> 14 2.28 95.28
## 5 300 13 2.12 97.40
## 6 84 4 0.65 98.05
## 7 240 4 0.65 98.70
## 8 120 3 0.49 99.19
## 9 36 2 0.33 99.52
## 10 60 2 0.33 99.85
## 11 12 1 0.16 100.00
## Credit_History frequency percentage cumulative_perc
## 1 1 525 85.5 85.5
## 2 0 89 14.5 100.0
## Property_Area frequency percentage cumulative_perc
## 1 Semiurban 233 37.95 37.95
## 2 Urban 202 32.90 70.85
## 3 Rural 179 29.15 100.00
## Loan_Status frequency percentage cumulative_perc
## 1 Y 422 68.73 68.73
## 2 N 192 31.27 100.00
## loan.train
##
## 12 Variables 614 Observations
## --------------------------------------------------------------------------------
## Gender
## n missing distinct
## 614 0 2
##
## Value Female Male
## Frequency 112 502
## Proportion 0.182 0.818
## --------------------------------------------------------------------------------
## Married
## n missing distinct
## 614 0 2
##
## Value No Yes
## Frequency 213 401
## Proportion 0.347 0.653
## --------------------------------------------------------------------------------
## Dependents
## n missing distinct
## 614 0 4
##
## Value 0 1 2 3+
## Frequency 360 102 101 51
## Proportion 0.586 0.166 0.164 0.083
## --------------------------------------------------------------------------------
## Education
## n missing distinct
## 614 0 2
##
## Value Graduate Not Graduate
## Frequency 480 134
## Proportion 0.782 0.218
## --------------------------------------------------------------------------------
## Self_Employed
## n missing distinct
## 582 32 2
##
## Value No Yes
## Frequency 500 82
## Proportion 0.859 0.141
## --------------------------------------------------------------------------------
## ApplicantIncome
## n missing distinct Info Mean Gmd .05 .10
## 614 0 505 1 5403 4183 1898 2216
## .25 .50 .75 .90 .95
## 2878 3812 5795 9460 14583
##
## lowest : 150 210 416 645 674, highest: 39147 39999 51763 63337 81000
## --------------------------------------------------------------------------------
## CoapplicantIncome
## n missing distinct Info Mean Gmd .05 .10
## 614 0 287 0.912 1621 2118 0 0
## .25 .50 .75 .90 .95
## 0 1188 2297 3782 4997
##
## lowest : 0.00 16.12 189.00 240.00 242.00
## highest: 10968.00 11300.00 20000.00 33837.00 41667.00
## --------------------------------------------------------------------------------
## LoanAmount
## n missing distinct Info Mean Gmd .05 .10
## 614 0 204 1 146.4 77.79 57.3 72.3
## .25 .50 .75 .90 .95
## 100.2 129.0 164.8 229.4 293.4
##
## lowest : 9 17 25 26 30, highest: 500 570 600 650 700
## --------------------------------------------------------------------------------
## Loan_Amount_Term
## n missing distinct
## 600 14 10
##
## lowest : 12 36 60 84 120, highest: 180 240 300 360 480
##
## Value 12 36 60 84 120 180 240 300 360 480
## Frequency 1 2 2 4 3 44 4 13 512 15
## Proportion 0.002 0.003 0.003 0.007 0.005 0.073 0.007 0.022 0.853 0.025
## --------------------------------------------------------------------------------
## Credit_History
## n missing distinct
## 614 0 2
##
## Value 0 1
## Frequency 89 525
## Proportion 0.145 0.855
## --------------------------------------------------------------------------------
## Property_Area
## n missing distinct
## 614 0 3
##
## Value Rural Semiurban Urban
## Frequency 179 233 202
## Proportion 0.292 0.379 0.329
## --------------------------------------------------------------------------------
## Loan_Status
## n missing distinct
## 614 0 2
##
## Value N Y
## Frequency 192 422
## Proportion 0.313 0.687
## --------------------------------------------------------------------------------
Task 4
Examine the density distribution of each quantitative variable using R and Python, in the following parts:
Univariate numeric
Applicant Income
ggplot(loan.train, aes(x = ApplicantIncome)) +
geom_density()
Coapplicant Income
ggplot(loan.train, aes(x = CoapplicantIncome)) +
geom_density()
Loan Amount
ggplot(loan.train, aes(x = LoanAmount)) +
geom_density()
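`geom_density()` draws a kernel density estimate. As a rough illustration of what happens behind the plot, the Python sketch below builds a Gaussian kernel density estimate with a rule-of-thumb bandwidth. It assumes the default bandwidth rule is the one implemented by R's `bw.nrd0()`, namely 0.9 times the smaller of the standard deviation and IQR/1.34, times \(n^{-1/5}\). The function names and the five loan amounts are chosen only for this example.

```python
from math import exp, pi, sqrt
from statistics import quantiles, stdev


def nrd0_bandwidth(values):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = min(stdev(values), (q3 - q1) / 1.34)
    return 0.9 * spread * len(values) ** -0.2


def gaussian_kde(values, bandwidth):
    """Return a function x -> estimated density at x."""
    norm = len(values) * bandwidth * sqrt(2 * pi)

    def density(x):
        return sum(exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm

    return density


loan_amounts = [66.0, 120.0, 128.0, 141.0, 267.0]   # illustrative values only
bw = nrd0_bandwidth(loan_amounts)
density = gaussian_kde(loan_amounts, bw)
print(round(bw, 2), density(128.0))
```

Each observation contributes one small bell curve, and the estimate is their average. With strongly right-skewed variables such as the incomes, a log scale on the x axis often makes the density easier to read.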
Bivariate numeric
Applicant Income vs Coapplicant Income
p1 <- ggplot(loan.train, aes(x = ApplicantIncome, y = CoapplicantIncome)) +
  geom_point(alpha = .5) +
  geom_density_2d()
ggplotly(p1)
Coapplicant Income vs LoanAmount
p2 <- ggplot(loan.train, aes(x = CoapplicantIncome, y = LoanAmount)) +
  geom_point(alpha = .5) +
  geom_density_2d()
ggplotly(p2)
ApplicantIncome vs LoanAmount
p3 <- ggplot(loan.train, aes(x = ApplicantIncome, y = LoanAmount)) +
  geom_point(alpha = .5) +
  geom_density_2d()
ggplotly(p3)
Multivariate numeric
library(GGally)
ggpairs(loantrain)
Task 5
Carry out hypothesis testing on the quantitative variables using R and Python, in the following parts:
Compute the margin of error and the interval estimate for the proportion of female borrowers at the 95% confidence level.
library(MASS)
k = sum(loan.train$Gender == "Female")
n = length(loan.train$Gender)
pbar = k/n
SE = sqrt(pbar*(1-pbar)/n); SE
## [1] 0.01558505
E = qnorm(0.975)*SE; E
## [1] 0.03054614
pbar + c(-E, E)
## [1] 0.1518643 0.2129566
At the 95% confidence level, between 15.2% and 21.3% of borrowers are female, and the margin of error is 3.05%.
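The same interval can be sketched in Python with the standard library. The counts `k = 112` and `n = 614` are taken from the Gender frequency table, and `NormalDist().inv_cdf` plays the role of `qnorm`. The variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

k, n = 112, 614            # female borrowers and all borrowers, from the Gender table
confidence = 0.95

pbar = k / n                                          # sample proportion
se = sqrt(pbar * (1 - pbar) / n)                      # standard error
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # same role as qnorm(0.975)
margin = z * se
interval = (pbar - margin, pbar + margin)

print(f"margin of error: {margin:.4f}")
print(f"interval: {interval[0]:.4f} to {interval[1]:.4f}")
```

This is the normal-approximation (Wald) interval, which is reasonable here because both the number of female and of male borrowers is large.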
If you plan to use a planning estimate of 50% for the proportion of female customers, find the sample size needed to achieve a 5% margin of error at the 95% confidence level.
zstar = qnorm(.975)
p = 0.5
E = 0.05
zstar^2*p*(1-p)/E^2
## [1] 384.1459
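Because a sample size must be a whole number, the result is rounded up, which gives 385 observations. A minimal Python sketch of the same formula, \(n = z^2 p (1 - p) / E^2\), follows. The function name is made up for this example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p, margin, confidence=0.95):
    """Sample size from n = z^2 * p * (1 - p) / E^2, before rounding."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p * (1 - p) / margin ** 2


raw = sample_size_for_proportion(p=0.5, margin=0.05)
needed = ceil(raw)          # round up so the margin of error is not exceeded
print(round(raw, 4), needed)
```

Using p = 0.5 is the conservative choice, since p(1 - p) is largest there and any other planning value would give a smaller required sample.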
Test the claim at the 0.05 significance level, if the bank claims that the mean consumer loan is:
Greater than $150.
mu0 = 150
xbar = mean(loan.train$LoanAmount)
s = sd(loan.train$LoanAmount)
n = length(loan.train$LoanAmount)
t = (xbar-mu0)/(s/sqrt(n)); t
## [1] -1.057899
alpha = 0.05
t.alpha = qt(1-alpha, df=n-1)
-t.alpha
## [1] -1.647343
Because the null hypothesis is \(\mu \ge \mu_0\), we focus on the critical value of the left tail. The test statistic -1.057899 is greater than the critical value -1.647343, so it does not fall in the rejection region. Consequently, at the 0.05 significance level we do not reject the null hypothesis that the mean consumer loan is at least 150 dollars, and the bank's claim is not refuted.
Less than $150.
mu0 = 150
xbar = mean(loan.train$LoanAmount)
s = sd(loan.train$LoanAmount)
n = length(loan.train$LoanAmount)
t = (xbar-mu0)/(s/sqrt(n)); t
## [1] -1.057899
alpha = 0.05
t.alpha = qt(1-alpha, df=n-1)
t.alpha
## [1] 1.647343
The test statistic -1.058 is smaller than the critical value 1.647, so it does not fall in the upper-tail rejection region. At the 0.05 significance level we therefore do not reject the null hypothesis that the mean consumer loan is at most 150 dollars.
Equal to $150.
mu0 = 150
xbar = mean(loan.train$LoanAmount)
s = sd(loan.train$LoanAmount)
n = length(loan.train$LoanAmount)
t = (xbar-mu0)/(s/sqrt(n)); t
## [1] -1.057899
alpha = .05
t.half.alpha = qt(1-alpha/2, df=n-1)
c(-t.half.alpha, t.half.alpha)
## [1] -1.963841 1.963841
The test statistic -1.057899 lies between the critical values -1.964 and 1.964. Therefore, at the 0.05 significance level, we do not reject the null hypothesis that the mean consumer loan does not differ from 150 dollars.
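The three t-tests above share one test statistic and differ only in the rejection region. The Python sketch below applies the three decision rules side by side. It uses the summary statistics reported earlier (mean 146.4122, standard deviation 84.03747, n = 614). The standard library has no t quantile function, so the critical values are copied from the R output above instead of being recomputed.

```python
from math import sqrt

# Summary statistics of LoanAmount, copied from the R output above.
xbar, s, n = 146.4122, 84.03747, 614
mu0 = 150

t_stat = (xbar - mu0) / (s / sqrt(n))

# Critical values copied from qt(0.95, df = 613) and qt(0.975, df = 613) above.
t_crit_one_sided = 1.647343
t_crit_two_sided = 1.963841

# True means "reject the null hypothesis at the 0.05 level".
decisions = {
    "H0: mu >= 150 (lower-tail test)": t_stat <= -t_crit_one_sided,
    "H0: mu <= 150 (upper-tail test)": t_stat >= t_crit_one_sided,
    "H0: mu == 150 (two-tailed test)": abs(t_stat) >= t_crit_two_sided,
}

print(f"t = {t_stat:.4f}")
for hypothesis, reject in decisions.items():
    print(f"{hypothesis}: {'reject' if reject else 'do not reject'}")
```

None of the three null hypotheses is rejected, which simply reflects that the sample mean of about 146.4 is close to 150 relative to its standard error.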
Test the same claims at the 0.05 significance level, given that the standard deviation of the loan amount is known to be $85.
Greater than $150.
mu0 = 150
xbar = mean(loan.train$LoanAmount)
sigma = 85
n = length(loan.train$LoanAmount)
z = (xbar-mu0)/(sigma/sqrt(n)); z
## [1] -1.045919
alpha = 0.05
z.alpha = qnorm(1-alpha)
-z.alpha
## [1] -1.644854
Because the null hypothesis is \(\mu \ge \mu_0\), we focus on the critical value of the left tail. The test statistic -1.045919 is greater than the critical value -1.644854, so it does not fall in the rejection region. Consequently, at the 0.05 significance level we do not reject the null hypothesis that the mean consumer loan is at least 150 dollars, and the bank's claim is not refuted.
Less than $150.
mu0 = 150
xbar = mean(loan.train$LoanAmount)
sigma = 85
n = length(loan.train$LoanAmount)
z = (xbar-mu0)/(sigma/sqrt(n)); z
## [1] -1.045919
alpha = 0.05
z.alpha = qnorm(1-alpha)
z.alpha
## [1] 1.644854
The test statistic -1.046 is smaller than the critical value 1.645, so it does not fall in the upper-tail rejection region. At the 0.05 significance level we therefore do not reject the null hypothesis that the mean consumer loan is at most 150 dollars.
Equal to $150.
mu0 = 150
xbar = mean(loan.train$LoanAmount)
sigma = 85
n = length(loan.train$LoanAmount)
z = (xbar-mu0)/(sigma/sqrt(n)); z
## [1] -1.045919
alpha = .05
z.half.alpha = qnorm(1-alpha/2)
c(-z.half.alpha, z.half.alpha)
## [1] -1.959964 1.959964
The test statistic -1.046 lies between the critical values -1.96 and 1.96. Therefore, at the 0.05 significance level, we do not reject the null hypothesis that the mean consumer loan does not differ from 150 dollars.
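When the standard deviation is known, the z-tests can be reproduced in Python with `statistics.NormalDist`. The sketch below is an illustration that reaches the same decisions through p-values instead of critical values; the two rules are equivalent. The sample mean and n are copied from the R output above, and the variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

# Sample mean and size from the R output above; sigma is given in the task.
xbar, sigma, n = 146.4122, 85, 614
mu0, alpha = 150, 0.05

z = (xbar - mu0) / (sigma / sqrt(n))
std_normal = NormalDist()

# p-value for each null hypothesis; reject when the p-value is below alpha.
p_values = {
    "H0: mu >= 150 (lower-tail test)": std_normal.cdf(z),
    "H0: mu <= 150 (upper-tail test)": 1 - std_normal.cdf(z),
    "H0: mu == 150 (two-tailed test)": 2 * std_normal.cdf(-abs(z)),
}

print(f"z = {z:.4f}")
for hypothesis, p in p_values.items():
    decision = "reject" if p < alpha else "do not reject"
    print(f"{hypothesis}: p = {p:.4f}, {decision}")
```

All three p-values are above 0.05, so none of the null hypotheses is rejected, in agreement with the comparisons against the critical values.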