STATISTICAL COMPUTING
~ Midterm Exam ~
Contact: yosia.yosia@student.matanauniversity.ac.id (yyosia)
RPubs: https://rpubs.com/yosia/
Data Set
The dataset you will use in this midterm exam contains customers who took out a loan at a bank. It has 614 observations and the following 13 attributes: Loan_ID, Gender, Married, Dependents, Education, Self_Employed, ApplicantIncome, CoapplicantIncome, LoanAmount, Loan_Amount_Term, Credit_History, Property_Area, and Loan_Status.
Task 1
Carry out the data preparation process in R and Python with the following steps:
Import Data
R
df <- read.csv("loan-train.csv", header=T, na.strings=c("","NA"))
df
Python
import pandas as pd
df = pd.read_csv("loan-train.csv")
df
##      Loan_ID  Gender Married ... Credit_History Property_Area Loan_Status
## 0 LP001002 Male No ... 1.0 Urban Y
## 1 LP001003 Male Yes ... 1.0 Rural N
## 2 LP001005 Male Yes ... 1.0 Urban Y
## 3 LP001006 Male Yes ... 1.0 Urban Y
## 4 LP001008 Male No ... 1.0 Urban Y
## .. ... ... ... ... ... ... ...
## 609 LP002978 Female No ... 1.0 Rural Y
## 610 LP002979 Male Yes ... 1.0 Rural Y
## 611 LP002983 Male Yes ... 1.0 Urban Y
## 612 LP002984 Male Yes ... 1.0 Urban Y
## 613 LP002990 Female No ... 0.0 Semiurban N
##
## [614 rows x 13 columns]
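On the R side, read.csv is told to treat empty strings and "NA" as missing via na.strings. pandas already treats empty fields and the string "NA" as missing by default, but the same intent can be stated explicitly (a small optional sketch):
import pandas as pd
# Mirror the R call's na.strings=c("", "NA"); pandas' defaults already cover both.
df = pd.read_csv("loan-train.csv", na_values=["", "NA"])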
Handling Missing Data
R
First, let's look at the data type of each column.
str(df)
## 'data.frame':    614 obs. of  13 variables:
## $ Loan_ID : chr "LP001002" "LP001003" "LP001005" "LP001006" ...
## $ Gender : chr "Male" "Male" "Male" "Male" ...
## $ Married : chr "No" "Yes" "Yes" "Yes" ...
## $ Dependents : chr "0" "1" "0" "0" ...
## $ Education : chr "Graduate" "Graduate" "Graduate" "Not Graduate" ...
## $ Self_Employed : chr "No" "No" "Yes" "No" ...
## $ ApplicantIncome : int 5849 4583 3000 2583 6000 5417 2333 3036 4006 12841 ...
## $ CoapplicantIncome: num 0 1508 0 2358 0 ...
## $ LoanAmount : int NA 128 66 120 141 267 95 158 168 349 ...
## $ Loan_Amount_Term : int 360 360 360 360 360 360 360 360 360 360 ...
## $ Credit_History : int 1 1 1 1 1 1 1 0 1 1 ...
## $ Property_Area : chr "Urban" "Rural" "Urban" "Urban" ...
## $ Loan_Status : chr "Y" "N" "Y" "Y" ...
colSums(is.na(df))
##           Loan_ID            Gender           Married        Dependents
## 0 13 3 15
## Education Self_Employed ApplicantIncome CoapplicantIncome
## 0 32 0 0
## LoanAmount Loan_Amount_Term Credit_History Property_Area
## 22 14 50 0
## Loan_Status
## 0
After that, we can replace the missing values with 0 (a blunt but simple choice; the Python code below imputes each column individually instead).
df[is.na(df)] = 0
df
Python
df.isna().sum()
## Loan_ID              0
## Gender 13
## Married 3
## Dependents 15
## Education 0
## Self_Employed 32
## ApplicantIncome 0
## CoapplicantIncome 0
## LoanAmount 22
## Loan_Amount_Term 14
## Credit_History 50
## Property_Area 0
## Loan_Status 0
## dtype: int64
df['Gender'] = df['Gender'].fillna(df['Gender'].value_counts().index[0])
df['Married'] = df['Married'].fillna(method='ffill')
df['Dependents'] = df['Dependents'].fillna((df['Dependents'].mode()).iloc[0])
df['Self_Employed'] = df['Self_Employed'].fillna(method='bfill')
df['LoanAmount'] = df['LoanAmount'].fillna((df['LoanAmount'].median()))
df['Loan_Amount_Term'] = df['Loan_Amount_Term'].fillna((df['Loan_Amount_Term'].mean()))
df['Credit_History'] = df['Credit_History'].interpolate(limit_direction="both")
df.isna().sum()
## Loan_ID              0
## Gender 0
## Married 0
## Dependents 0
## Education 0
## Self_Employed 0
## ApplicantIncome 0
## CoapplicantIncome 0
## LoanAmount 0
## Loan_Amount_Term 0
## Credit_History 0
## Property_Area 0
## Loan_Status 0
## dtype: int64
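The imputation above mixes several strategies (mode, forward/backward fill, median, mean, interpolation), one column at a time. If a more uniform scheme is acceptable, scikit-learn's SimpleImputer can do it in two calls; this is only a sketch of an alternative, not the approach used above, and it assumes scikit-learn is installed:
from sklearn.impute import SimpleImputer

num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns.drop("Loan_ID")

# Median for numeric columns, most frequent value for categorical ones.
df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])
df[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])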
Check for Duplicate Data
R
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
sum(duplicated(df))
## [1] 0
Python
df[df.duplicated(keep=False)]
## Empty DataFrame
## Columns: [Loan_ID, Gender, Married, Dependents, Education, Self_Employed, ApplicantIncome, CoapplicantIncome, LoanAmount, Loan_Amount_Term, Credit_History, Property_Area, Loan_Status]
## Index: []
There are no duplicate rows, so we can continue with the analysis.
Separating Categorical and Numeric Data
R
df_kat = select_if(df, is.character)
df_kat
df_num = select_if(df, is.numeric)
df_num
Python
import numpy as np
df_num = df.select_dtypes(include=[np.number]) # Number
df_kat = df.select_dtypes(exclude=[np.number]) # Categorical
Handling Numeric Data
R
Standardization
standarisasi_datar = as.data.frame(lapply(df_num, scale))
standarisasi_datar
Normalization
normalisasi = function(x){return((x-min(x))/(max(x)-min(x)))}
normalisasi_datar = as.data.frame(lapply(df_num,normalisasi))
normalisasi_datar
Robust Scaling
robust = function(x){return((x-median(x))/(quantile(x,probs=.75)-quantile(x,probs=.25)))}
robust_r = as.data.frame(lapply(df_num, robust))
robust_r
Python
Standardization
import seaborn as sns
import matplotlib.pyplot as plt
x = sns.pairplot(df)
plt.show(x)
We standardize because the data are not normally distributed.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler = scaler.fit(df_num)
normalized = scaler.transform(df_num)
for i in range(5):
    print(normalized[i])
## [ 0.07299082 -0.55448733 -0.21124125 0.27985054 0.44783036]
## [-0.13441195 -0.03873155 -0.21124125 0.27985054 0.44783036]
## [-0.39374734 -0.55448733 -0.94899647 0.27985054 0.44783036]
## [-0.46206247 0.2519796 -0.30643547 0.27985054 0.44783036]
## [ 0.09772844 -0.55448733 -0.05655064 0.27985054 0.44783036]
inversed = scaler.inverse_transform(normalized)
for i in range(5):
    print(inversed[i])
## [5.849e+03 0.000e+00 1.280e+02 3.600e+02 1.000e+00]
## [4.583e+03 1.508e+03 1.280e+02 3.600e+02 1.000e+00]
## [3.0e+03 0.0e+00 6.6e+01 3.6e+02 1.0e+00]
## [2.583e+03 2.358e+03 1.200e+02 3.600e+02 1.000e+00]
## [6.00e+03 0.00e+00 1.41e+02 3.60e+02 1.00e+00]
Normalization
from sklearn.preprocessing import normalize
normal = normalize(df_num)
normal
## array([[9.97873194e-01, 0.00000000e+00, 2.18375396e-02, 6.14180800e-02,
## 1.70605778e-04],
## [9.46934419e-01, 3.11581301e-01, 2.64472192e-02, 7.43828040e-02,
## 2.06618900e-04],
## [9.92640004e-01, 0.00000000e+00, 2.18380801e-02, 1.19116800e-01,
## 3.30880001e-04],
## ...,
## [9.98077833e-01, 2.96752577e-02, 3.12826675e-02, 4.45128865e-02,
## 1.23646907e-04],
## [9.98572068e-01, 0.00000000e+00, 2.46252112e-02, 4.74068237e-02,
## 1.31685622e-04],
## [9.96512102e-01, 0.00000000e+00, 2.89190726e-02, 7.82771889e-02,
## 0.00000000e+00]])
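Note that sklearn.preprocessing.normalize rescales each row to unit norm, which is not the same operation as the column-wise min-max normalization used on the R side. A column-wise equivalent would be MinMaxScaler (a sketch, assuming scikit-learn):
from sklearn.preprocessing import MinMaxScaler

# Each column is rescaled to [0, 1], matching the R normalisasi() function.
minmax = MinMaxScaler().fit_transform(df_num)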
Robust Scaling
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
scaler.fit(df_num)
## RobustScaler()
scaler.transform(df_num)
## array([[ 0.69802913, -0.51735771, 0. , 0. , 0. ],
## [ 0.26409597, 0.13907933, 0. , 0. , 0. ],
## [-0.27849186, -0.51735771, -0.96124031, 0. , 0. ],
## ...,
## [ 1.45998286, -0.41288497, 1.9379845 , 0. , 0. ],
## [ 1.29237361, -0.51735771, 0.91472868, 0. , 0. ],
## [ 0.26409597, -0.51735771, 0.07751938, 0. , -1. ]])
Handling Outliers
R
dpen <- df
outliers<-function(x){
Q1 <- quantile (x, probs=.25)
Q3 <- quantile(x, probs=.75)
iqr = Q3-Q1
upper_limit =Q3 + (iqr*1.5)
lower_limit =Q1 -(iqr*1.5)
x > upper_limit | x < lower_limit}
outlier1 <- subset(dpen, outliers (dpen$LoanAmount))
outlier2 <- subset(dpen, outliers (dpen$ApplicantIncome))
outlier3 <- subset(dpen, outliers (dpen$CoapplicantIncome))
dpenlier <- rbind(outlier1, outlier2, outlier3) %>% distinct()
dpenlier
data_aa = dpen %>% anti_join (dpenlier)
## Joining, by = c("Loan_ID", "Gender", "Married", "Dependents", "Education",
## "Self_Employed", "ApplicantIncome", "CoapplicantIncome", "LoanAmount",
## "Loan_Amount_Term", "Credit_History", "Property_Area", "Loan_Status")
data_aa
Python
import numpy as np
q25, q75 = np.percentile(df['ApplicantIncome'], 25), np.percentile(df['ApplicantIncome'], 75)
iqr = q75-q25
print(f'Percentile ApplicantIncome: 25th={np.round(q25)}, 75th={np.round(q75)}, IQR={np.round(iqr)}')
## Percentile ApplicantIncome: 25th=2878.0, 75th=5795.0, IQR=2918.0
cut_off = iqr*1.5
lower, upper = q25 - cut_off, q75 + cut_off
outliers = [x for x in df['ApplicantIncome'] if x < lower or x > upper]
print('Data Pencilan ApplicantIncome: %d' % len(outliers))
## Data Pencilan ApplicantIncome: 50
df.drop(df[ (df['ApplicantIncome'] > upper) | (df['ApplicantIncome'] < lower) ].index , inplace=True)
import numpy as np
q25, q75 = np.percentile(df['CoapplicantIncome'], 25), np.percentile(df['CoapplicantIncome'], 75)
iqr = q75-q25
print(f'Percentile CoapplicantIncome: 25th={np.round(q25)}, 75th={np.round(q75)}, IQR={np.round(iqr)}')
## Percentile CoapplicantIncome: 25th=0.0, 75th=2337.0, IQR=2337.0
cut_off = iqr*1.5
lower, upper = q25 - cut_off, q75 + cut_off
outliers = [x for x in df['CoapplicantIncome'] if x < lower or x > upper]
print('Data Pencilan CoapplicantIncome: %d' % len(outliers))
## Data Pencilan CoapplicantIncome: 16
df.drop(df[ (df['CoapplicantIncome'] > upper) | (df['CoapplicantIncome'] < lower) ].index , inplace=True)
import numpy as np
q25, q75 = np.percentile(df['LoanAmount'], 25), np.percentile(df['LoanAmount'], 75)
iqr = q75-q25
print(f'Percentile LoanAmount: 25th={np.round(q25)}, 75th={np.round(q75)}, IQR={np.round(iqr)}')
## Percentile LoanAmount: 25th=100.0, 75th=155.0, IQR=55.0
cut_off = iqr*1.5
lower, upper = q25 - cut_off, q75 + cut_off
outliers = [x for x in df['LoanAmount'] if x < lower or x > upper]
print('Data Pencilan LoanAmount: %d' % len(outliers))
## Data Pencilan LoanAmount: 28
df.drop(df[ (df['LoanAmount'] > upper) | (df['LoanAmount'] < lower) ].index , inplace=True)
import numpy as np
q25, q75 = np.percentile(df['Loan_Amount_Term'], 25), np.percentile(df['Loan_Amount_Term'], 75)
iqr = q75-q25
print(f'Percentile Loan_Amount_Term: 25th={np.round(q25)}, 75th={np.round(q75)}, IQR={np.round(iqr)}')
## Percentile Loan_Amount_Term: 25th=360.0, 75th=360.0, IQR=0.0
cut_off = iqr*1.5
lower, upper = q25 - cut_off, q75 + cut_off
outliers = [x for x in df['Loan_Amount_Term'] if x < lower or x > upper]
print('Data Pencilan Loan_Amount_Term: %d' % len(outliers))
## Data Pencilan Loan_Amount_Term: 89
df.drop(df[ (df['Loan_Amount_Term'] > upper) | (df['Loan_Amount_Term'] < lower) ].index , inplace=True)
import numpy as np
q25, q75 = np.percentile(df['Credit_History'], 25), np.percentile(df['Credit_History'], 75)
iqr = q75-q25
print(f'Percentile Credit_History: 25th={np.round(q25)}, 75th={np.round(q75)}, IQR={np.round(iqr)}')
## Percentile Credit_History: 25th=1.0, 75th=1.0, IQR=0.0
cut_off = iqr*1.5
lower, upper = q25 - cut_off, q75 + cut_off
outliers = [x for x in df['Credit_History'] if x < lower or x > upper]
print('Data Pencilan Credit_History: %d' % len(outliers))
## Data Pencilan Credit_History: 69
df.drop(df[ (df['Credit_History'] > upper) | (df['Credit_History'] < lower) ].index , inplace=True)
The data now contains no more outliers:
df.shape
## (362, 13)
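The five blocks above repeat the same 1.5 × IQR rule. A small helper keeps that logic in one place (a sketch under the same cut-off assumption; drop_iqr_outliers is a hypothetical name, not a pandas function):
import numpy as np

def drop_iqr_outliers(frame, column, k=1.5):
    # Keep only rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR].
    q25, q75 = np.percentile(frame[column], [25, 75])
    cut_off = k * (q75 - q25)
    return frame[frame[column].between(q25 - cut_off, q75 + cut_off)]

for col in ["ApplicantIncome", "CoapplicantIncome", "LoanAmount",
            "Loan_Amount_Term", "Credit_History"]:
    df = drop_iqr_outliers(df, col)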
Handling Categorical Data
R
df_kat %>% summarise_all(n_distinct)
GenderLabel <-factor(df_kat$Gender, labels=c(0, 1, 2))
MarriedLabel <-factor(df_kat$Married, labels=c(0, 1, 2))
DependentsLabel <-factor (df_kat$Dependents, labels=c(0, 1, 2, 3))
EducationLabel <-factor(df_kat$Education, labels=c(0, 1))
Self_EmployedLabel <-factor(df_kat$Self_Employed, labels=c(0, 1, 2))
Property_AreaLabel <- factor (df_kat$Property_Area, labels=c(0, 1, 2))
Loan_StatusLabel <-factor (df_kat$Loan_Status, labels=c(0, 1))
df_kat_label <- data.frame("ID" = df_kat$Loan_ID, GenderLabel, MarriedLabel, DependentsLabel, EducationLabel, Self_EmployedLabel, Property_AreaLabel, Loan_StatusLabel)
df_kat_label
Python
df_kat.info()
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 614 entries, 0 to 613
## Data columns (total 8 columns):
## # Column Non-Null Count Dtype
## --- ------ -------------- -----
## 0 Loan_ID 614 non-null object
## 1 Gender 614 non-null object
## 2 Married 614 non-null object
## 3 Dependents 614 non-null object
## 4 Education 614 non-null object
## 5 Self_Employed 614 non-null object
## 6 Property_Area 614 non-null object
## 7 Loan_Status 614 non-null object
## dtypes: object(8)
## memory usage: 38.5+ KB
from sklearn.preprocessing import LabelEncoder
label = df_kat.apply(LabelEncoder().fit_transform)
label.head()
##    Loan_ID  Gender  Married ... Self_Employed  Property_Area  Loan_Status
## 0 0 1 0 ... 0 2 1
## 1 1 1 1 ... 0 0 0
## 2 2 1 1 ... 1 2 1
## 3 3 1 1 ... 0 2 1
## 4 4 1 0 ... 0 2 1
##
## [5 rows x 8 columns]
df_kat1 = df_kat[['Gender', 'Married','Dependents','Education','Self_Employed','Property_Area','Loan_Status']]
df_kat1.head()
##   Gender Married Dependents ... Self_Employed Property_Area Loan_Status
## 0 Male No 0 ... No Urban Y
## 1 Male Yes 1 ... No Rural N
## 2 Male Yes 0 ... Yes Urban Y
## 3 Male Yes 0 ... No Urban Y
## 4 Male No 0 ... No Urban Y
##
## [5 rows x 7 columns]
import pandas as pd
cat_df = df_kat1.copy()
cat_df = pd.get_dummies(cat_df, columns=['Gender'], prefix = ['Gender'])
print(cat_df)
##     Married Dependents     Education ... Loan_Status Gender_Female Gender_Male
## 0 No 0 Graduate ... Y 0 1
## 1 Yes 1 Graduate ... N 0 1
## 2 Yes 0 Graduate ... Y 0 1
## 3 Yes 0 Not Graduate ... Y 0 1
## 4 No 0 Graduate ... Y 0 1
## .. ... ... ... ... ... ... ...
## 609 No 0 Graduate ... Y 1 0
## 610 Yes 3+ Graduate ... Y 0 1
## 611 Yes 1 Graduate ... Y 0 1
## 612 Yes 2 Graduate ... Y 0 1
## 613 No 0 Graduate ... N 1 0
##
## [614 rows x 8 columns]
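The same pd.get_dummies call can also encode every categorical predictor at once; the sketch below leaves Loan_Status out as the target, which is an assumption, not something required by the task:
dummies = pd.get_dummies(
    df_kat1,
    columns=['Gender', 'Married', 'Dependents', 'Education',
             'Self_Employed', 'Property_Area'],
)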
Task 2
Carry out the data visualization process in R and Python with the following steps:
Univariate Visualization
R
Categorical Data
library(dplyr)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(ggplot2)
plot_gender <- ggplot(data_aa, aes(x = Gender))+geom_bar()
plot_married <- ggplot(data_aa, aes(x = Married))+geom_bar()
plot_dependents <- ggplot(data_aa, aes(x = Dependents))+geom_bar()
plot_education <- ggplot(data_aa, aes(x = Education))+geom_bar()
plot_selfemployed <- ggplot(data_aa, aes(x = Self_Employed))+geom_bar()
plot_loanamountterm <- ggplot(data_aa, aes(x = Loan_Amount_Term))+geom_bar() + theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=0.5))
plot_credithist<- ggplot(data_aa, aes(x = Credit_History))+geom_bar()
plot_proparea<- ggplot(data_aa, aes(x = Property_Area))+geom_bar()
plot_loanstat<- ggplot(data_aa, aes(x = Loan_Status))+geom_bar()
grid.arrange(plot_gender,plot_married,plot_dependents,plot_education,plot_selfemployed,plot_loanamountterm,plot_credithist,plot_proparea,plot_loanstat)
Numeric Data
data_aa$APPCO_TotalIncome <- data_aa$ApplicantIncome + data_aa$CoapplicantIncome
plot_ApIn <- ggplot(data_aa, aes(x = ApplicantIncome))+geom_histogram(bins = 12, colour = "white")
plot_COAp <- ggplot(data_aa, aes(x = APPCO_TotalIncome))+geom_histogram(bins = 12, colour = "white")
plot_LoAm <- ggplot(data_aa, aes(x = LoanAmount))+geom_histogram(bins = 12, colour = "white")
grid.arrange(plot_ApIn,plot_COAp, plot_LoAm)
Python
Categorical Data
df_kat['Gender'].value_counts().plot(kind='bar', figsize=(12, 6), rot=0, color="red")
df_kat['Education'].value_counts().plot(kind='bar', figsize=(7, 6), rot=0)
df_kat['Dependents'].value_counts().plot(kind='bar', figsize=(7, 6), rot=0, color="gold")
df_kat['Married'].value_counts().plot(kind='pie', figsize=(7, 6), rot=0)
df_kat['Property_Area'].value_counts().plot(kind='pie', figsize=(7, 6), rot=0)
Numeric Data
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(8,6))
sns.set_theme(style="darkgrid")
sns.histplot(data=df, x="LoanAmount", bins=20, kde=True)
plt.show()
plt.figure(figsize=(8,6))
sns.set_theme(style="darkgrid")
sns.histplot(data=df, x="LoanAmount", bins=20, kde=True, color="tomato")
plt.show()
plt.figure(figsize=(8,6))
sns.set_theme(style="darkgrid")
sns.histplot(data=df, x="CoapplicantIncome", bins=20, kde=True, color="crimson")
plt.show()
Bivariate Visualization
R
Category vs Category
plot_Gen_Mar <- ggplot(data_aa, aes(x = Gender, fill = Married)) +
theme_minimal() + # use a minimal theme
geom_bar(position = position_dodge(preserve = "single"))
plot_Gen_Edu <- ggplot(data_aa, aes(x = Gender, fill = Education)) +
theme_minimal() + # use a minimal theme
geom_bar(position = position_dodge(preserve = "single"))
plot_Mar_Edu <- ggplot(data_aa, aes(x = Married, fill = Education)) +
theme_minimal() + # use a minimal theme
geom_bar(position = position_dodge(preserve = "single"))
Plot_Edu_ProAre <- ggplot(data_aa, aes(x = Education, fill = Property_Area)) +
theme_minimal() + # use a minimal theme
geom_bar(position = position_dodge(preserve = "single"))
grid.arrange(plot_Gen_Mar,plot_Gen_Edu,plot_Mar_Edu,Plot_Edu_ProAre)
Numeric vs Numeric
plot_LoAm_ACT <- ggplot(data_aa, aes(x = LoanAmount, y = APPCO_TotalIncome )) +
theme_minimal() + # use a minimal theme
geom_line()
plot_ApIn_CoIn <- ggplot(data_aa, aes(x = ApplicantIncome, y = CoapplicantIncome )) +
theme_minimal() + # use a minimal theme
geom_line()
plot_ApIn_LoAm <- ggplot(data_aa, aes(x = ApplicantIncome, y = LoanAmount )) +
theme_minimal() + # use a minimal theme
geom_line()
grid.arrange(plot_LoAm_ACT,plot_ApIn_CoIn,plot_ApIn_LoAm)
Numeric vs Categorical
plot_LoAm_LoStat <- ggplot(data_aa,
aes(x = LoanAmount,
fill = Loan_Status)) +
geom_density(alpha = 0.3) +
theme_minimal() +
labs(title = "Loan Amount distribution by Loan Amount Term")
plot_CoIn_Mar <- ggplot(data_aa,
aes(x = CoapplicantIncome,
fill = Married)) +
geom_density(alpha = 0.3) +
theme_minimal() +
labs(title = "Coapplicant Income distribution by Married")
plot_ApIn_Edu <- ggplot(data_aa,
aes(x = ApplicantIncome,
fill = Education)) +
geom_density(alpha = 0.3) +
theme_minimal() +
labs(title = "Applicant Income distribution by Education")
plot_LoAm_ProAre <- ggplot(data_aa,
aes(x = ApplicantIncome,
fill = Property_Area)) +
geom_density(alpha = 0.3) +
theme_minimal() +
labs(title = "Applicant Income distribution by Property Area")
grid.arrange(plot_LoAm_LoStat, plot_CoIn_Mar, plot_ApIn_Edu, plot_LoAm_ProAre)
Python
df_line = df[['ApplicantIncome','CoapplicantIncome']]
df_line.plot.line()
Numeric vs Numeric
plt.figure(figsize=(8,6))
sns.set_theme(style="darkgrid")
sns.scatterplot(data=df, y="CoapplicantIncome",x="ApplicantIncome")
plt.show()
Category vs Category
sns.set(rc={'figure.figsize':(11.7,8.27)})
plt.subplot(231)
sns.countplot(x="Gender", hue='Loan_Status', data=df)
plt.subplot(232)
sns.countplot(x="Married", hue='Loan_Status', data=df)
plt.subplot(233)
sns.countplot(x="Education", hue='Loan_Status', data=df)
plt.subplot(234)
sns.countplot(x="Self_Employed", hue='Loan_Status', data=df)
plt.subplot(235)
sns.countplot(x="Dependents", hue='Loan_Status', data=df)
plt.subplot(236)
sns.countplot(x="Property_Area", hue='Loan_Status', data=df)plt.figure(figsize=(8,6))
sns.set_theme(style = "darkgrid")
sns.violinplot(x = "Property_Area",
y = "LoanAmount",
hue = "Gender",
data=df,
palette="Pastel1")
plt.show()
plt.figure(figsize=(8,6))
sns.set_theme(style = "darkgrid")
sns.boxplot(x = "Dependents",
y = "LoanAmount",
hue = "Married",
data=df)
plt.show()
Multivariate Visualization
R
plot_ApIn_LoAm_ProAre <-ggplot(data_aa, aes(x=ApplicantIncome, y=LoanAmount, shape=Property_Area, colour=Property_Area))+geom_point()
plot_ApIn_LoAm_Edu <-ggplot(data_aa, aes(x=ApplicantIncome, y=LoanAmount, shape=Education, colour=Property_Area))+geom_point()
plot_LoAm_CoIn_SeEm <-ggplot(data_aa, aes(x=LoanAmount, y=APPCO_TotalIncome, shape=Self_Employed, colour=Property_Area))+geom_point()
grid.arrange(plot_ApIn_LoAm_ProAre,plot_ApIn_LoAm_Edu, plot_LoAm_CoIn_SeEm)
Python
bins = np.linspace(df.CoapplicantIncome.min(), df.CoapplicantIncome.max(),12)
graph = sns.FacetGrid(df, col="Gender", hue="Loan_Status", palette="Set2", col_wrap=2)
graph.map(plt.hist, 'CoapplicantIncome', bins=bins, ec="k")
graph.axes[-1].legend()
plt.show()
sns.set_theme(style="darkgrid")
plt.figure(figsize=(8,6))
n = sns.lmplot(x = "ApplicantIncome",
y = "LoanAmount",
data = df,
fit_reg = False,
hue = "Property_Area",
legend = False,
markers=["o","+","x"])
plt.legend(loc="lower right")
plt.show(n)
n = sns.pairplot(df,
kind = "scatter",
hue = "Gender",
markers = ["o","s"],
palette = "Set2")
plt.show(n)
Task 3
Carry out a descriptive analysis of the data in R and Python with the following steps:
Qualitative: Univariate Categorical
R
prop.table(table(data_aa$Gender))
##
## 0 Female Male
## 0.01682243 0.18691589 0.79626168
prop.table(table(data_aa$Married))
##
## 0 No Yes
## 0.005607477 0.349532710 0.644859813
prop.table(table(data_aa$Dependents))
##
## 0 1 2 3+
## 0.60000000 0.15700935 0.16822430 0.07476636
prop.table(table(data_aa$Education))
##
## Graduate Not Graduate
## 0.7551402 0.2448598
prop.table(table(data_aa$Self_Employed))
##
## 0 No Yes
## 0.05420561 0.83177570 0.11401869
prop.table(table(data_aa$Loan_Amount_Term))
##
## 0 12 36 60 84 120
## 0.026168224 0.001869159 0.003738318 0.003738318 0.007476636 0.005607477
## 180 240 300 360 480
## 0.067289720 0.007476636 0.018691589 0.831775701 0.026168224
prop.table(table(data_aa$Credit_History))
##
## 0 1
## 0.2261682 0.7738318
prop.table(table(data_aa$Property_Area))
##
## Rural Semiurban Urban
## 0.2990654 0.3831776 0.3177570
prop.table(table(data_aa$Loan_Status))
##
## N Y
## 0.3046729 0.6953271
Python
from collections import Counter
Counter(df_kat['Gender'])
## Counter({'Male': 502, 'Female': 112})
c = Counter(df_kat['Gender'])
[(i, c[i] / len(df_kat['Gender']) * 100.0) for i in c]
## [('Male', 81.75895765472313), ('Female', 18.241042345276874)]
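The same proportions can be obtained with pandas alone (a sketch):
# Relative frequency (in percent) of each Gender category.
print(df_kat['Gender'].value_counts(normalize=True) * 100)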
Qualitative: Bivariate Categorical
R
data_aa %>% select(Gender, Married) %>% table()
##         Married
## Gender 0 No Yes
## 0 0 2 7
## Female 1 72 27
## Male 2 113 311
data_aa %>% select(Gender, Education) %>% table()
##         Education
## Gender Graduate Not Graduate
## 0 8 1
## Female 82 18
## Male 314 112
data_aa %>% select(Gender, Property_Area) %>% table()
##         Property_Area
## Gender Rural Semiurban Urban
## 0 3 4 2
## Female 24 49 27
## Male 133 152 141
data_aa %>% select(Education, Self_Employed) %>% table()
##               Self_Employed
## Education 0 No Yes
## Graduate 23 335 46
## Not Graduate 6 110 15
data_aa %>% select(Gender, Loan_Amount_Term) %>% table()
##         Loan_Amount_Term
## Gender 0 12 36 60 84 120 180 240 300 360 480
## 0 0 0 0 0 0 0 1 0 0 8 0
## Female 3 0 1 0 1 0 2 1 1 87 4
## Male 11 1 1 2 3 3 33 3 9 350 10
data_aa %>% select(Married, Loan_Amount_Term) %>% table()
##          Loan_Amount_Term
## Married 0 12 36 60 84 120 180 240 300 360 480
## 0 0 0 0 0 0 0 0 1 0 1 1
## No 5 0 2 1 0 1 7 1 3 159 8
## Yes 9 1 0 1 4 2 29 2 7 285 5
Python
cat2 = df_kat[['Gender','Property_Area']]
pd.crosstab(cat2.Gender,cat2.Property_Area)
## Property_Area  Rural  Semiurban  Urban
## Gender
## Female 24 55 33
## Male 155 178 169
cat2 = df_kat[['Dependents','Education']]
pd.crosstab(cat2.Dependents,cat2.Education)
## Education   Graduate  Not Graduate
## Dependents
## 0 286 74
## 1 81 21
## 2 77 24
## 3+ 36 15
cat2 = df_kat[['Property_Area','Loan_Status']]
pd.crosstab(cat2.Property_Area,cat2.Loan_Status)
## Loan_Status    N    Y
## Property_Area
## Rural 69 110
## Semiurban 54 179
## Urban 69 133
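Row-wise proportions make the comparison across property areas easier to read (a sketch):
# Share of rejected (N) and approved (Y) loans within each Property_Area.
pd.crosstab(cat2.Property_Area, cat2.Loan_Status, normalize='index')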
Qualitative: Multivariate Categorical
R
data_aa %>% select(Gender, Married, Education) %>% ftable()
##                 Education Graduate Not Graduate
## Gender Married
## 0 0 0 0
## No 2 0
## Yes 6 1
## Female 0 1 0
## No 60 12
## Yes 21 6
## Male 0 2 0
## No 82 31
## Yes 230 81
data_aa %>% select(Gender, Married, Education,Property_Area,Loan_Amount_Term) %>% ftable()
##                                          Loan_Amount_Term 0 12 36 60 84 120 180 240 300 360 480
## Gender Married Education Property_Area
## 0 0 Graduate Rural 0 0 0 0 0 0 0 0 0 0 0
## Semiurban 0 0 0 0 0 0 0 0 0 0 0
## Urban 0 0 0 0 0 0 0 0 0 0 0
## Not Graduate Rural 0 0 0 0 0 0 0 0 0 0 0
## Semiurban 0 0 0 0 0 0 0 0 0 0 0
## Urban 0 0 0 0 0 0 0 0 0 0 0
## No Graduate Rural 0 0 0 0 0 0 0 0 0 0 0
## Semiurban 0 0 0 0 0 0 0 0 0 1 0
## Urban 0 0 0 0 0 0 0 0 0 1 0
## Not Graduate Rural 0 0 0 0 0 0 0 0 0 0 0
## Semiurban 0 0 0 0 0 0 0 0 0 0 0
## Urban 0 0 0 0 0 0 0 0 0 0 0
## Yes Graduate Rural 0 0 0 0 0 0 0 0 0 2 0
## Semiurban 0 0 0 0 0 0 0 0 0 3 0
## Urban 0 0 0 0 0 0 1 0 0 0 0
## Not Graduate Rural 0 0 0 0 0 0 0 0 0 1 0
## Semiurban 0 0 0 0 0 0 0 0 0 0 0
## Urban 0 0 0 0 0 0 0 0 0 0 0
## Female 0 Graduate Rural 0 0 0 0 0 0 0 0 0 0 0
## Semiurban 0 0 0 0 0 0 0 1 0 0 0
## Urban 0 0 0 0 0 0 0 0 0 0 0
## Not Graduate Rural 0 0 0 0 0 0 0 0 0 0 0
## Semiurban 0 0 0 0 0 0 0 0 0 0 0
## Urban 0 0 0 0 0 0 0 0 0 0 0
## No Graduate Rural 0 0 0 0 0 0 0 0 0 13 2
## Semiurban 0 0 1 0 0 0 0 0 1 23 0
## Urban 1 0 0 0 0 0 1 0 0 17 1
## Not Graduate Rural 0 0 0 0 0 0 0 0 0 4 0
## Semiurban 0 0 0 0 0 0 0 0 0 5 0
## Urban 2 0 0 0 0 0 0 0 0 1 0
## Yes Graduate Rural 0 0 0 0 0 0 0 0 0 3 0
## Semiurban 0 0 0 0 1 0 1 0 0 11 1
## Urban 0 0 0 0 0 0 0 0 0 4 0
## Not Graduate Rural 0 0 0 0 0 0 0 0 0 2 0
## Semiurban 0 0 0 0 0 0 0 0 0 4 0
## Urban 0 0 0 0 0 0 0 0 0 0 0
## Male 0 Graduate Rural 0 0 0 0 0 0 0 0 0 0 0
## Semiurban 0 0 0 0 0 0 0 0 0 0 1
## Urban 0 0 0 0 0 0 0 0 0 1 0
## Not Graduate Rural 0 0 0 0 0 0 0 0 0 0 0
## Semiurban 0 0 0 0 0 0 0 0 0 0 0
## Urban 0 0 0 0 0 0 0 0 0 0 0
## No Graduate Rural 1 0 0 0 0 0 0 0 0 26 0
## Semiurban 0 0 0 0 0 0 1 0 1 22 2
## Urban 0 0 0 1 0 0 3 0 0 23 2
## Not Graduate Rural 1 0 0 0 0 0 0 0 1 11 0
## Semiurban 0 0 1 0 0 1 0 0 0 9 0
## Urban 0 0 0 0 0 0 2 1 0 3 1
## Yes Graduate Rural 1 0 0 0 2 0 7 0 1 54 0
## Semiurban 2 0 0 0 0 1 3 1 3 79 1
## Urban 3 1 0 0 1 1 4 1 1 63 0
## Not Graduate Rural 1 0 0 0 0 0 3 0 0 24 0
## Semiurban 1 0 0 0 0 0 2 0 1 18 2
## Urban 1 0 0 1 0 0 8 0 1 17 1
Python
cat3 = df_kat[['Gender','Property_Area','Loan_Status']]
pd.crosstab([cat3.Gender,cat3.Property_Area], cat3.Loan_Status, rownames=['Gender', 'Property_Area'], colnames=['Loan_Status'])
## Loan_Status            N    Y
## Gender Property_Area
## Female Rural 10 14
## Semiurban 13 42
## Urban 14 19
## Male Rural 59 96
## Semiurban 41 137
## Urban 55 114
cat3 = df_kat[['Married','Education','Loan_Status']]
pd.crosstab([cat3.Married,cat3.Education], cat3.Loan_Status, rownames=['Married', 'Education'], colnames=['Loan_Status'])
## Loan_Status            N    Y
## Married Education
## No Graduate 62 107
## Not Graduate 17 28
## Yes Graduate 78 233
## Not Graduate 35 54
Quantitative: Univariate Numeric
R
da_numerik_1 <- select_if(data_aa, is.numeric)
summary(da_numerik_1)
##  ApplicantIncome CoapplicantIncome   LoanAmount    Loan_Amount_Term
## Min. : 150 Min. : 0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 2752 1st Qu.: 0 1st Qu.: 96.0 1st Qu.:360.0
## Median : 3598 Median :1260 Median :120.0 Median :360.0
## Mean : 4054 Mean :1323 Mean :121.6 Mean :333.2
## 3rd Qu.: 4891 3rd Qu.:2194 3rd Qu.:151.5 3rd Qu.:360.0
## Max. :10139 Max. :5701 Max. :260.0 Max. :480.0
## Credit_History APPCO_TotalIncome
## Min. :0.0000 Min. : 1442
## 1st Qu.:1.0000 1st Qu.: 3900
## Median :1.0000 Median : 5000
## Mean :0.7738 Mean : 5377
## 3rd Qu.:1.0000 3rd Qu.: 6411
## Max. :1.0000 Max. :13746
library(moments)
fungsi <- function(x){
data.frame(var(x),
sd(x),
mad(x),
IQR(x),
skewness(x),
kurtosis(x))
}
sapply(da_numerik_1,fungsi)
##             ApplicantIncome CoapplicantIncome LoanAmount  Loan_Amount_Term
## var.x. 3435005 2019827 2551.245 7266.282
## sd.x. 1853.377 1421.206 50.50985 85.24249
## mad.x. 1504.839 1868.076 38.5476 0
## IQR.x. 2138.5 2194 55.5 0
## skewness.x. 1.139029 0.8435314 0.003766545 -2.467683
## kurtosis.x. 4.208463 3.030757 3.554249 9.183479
## Credit_History APPCO_TotalIncome
## var.x. 0.1753439 4049507
## sd.x. 0.4187409 2012.339
## mad.x. 0 1813.22
## IQR.x. 0 2511
## skewness.x. -1.309106 0.9294353
## kurtosis.x. 2.713758 3.963494
Python
Mean
import statistics as st
print("Mean:",st.mean(df.ApplicantIncome) )## Mean: 3987.6546961325967
Quartile
df.ApplicantIncome.quantile([0, .25, .5, .75, 1])
## 0.00     150.00
## 0.25 2698.25
## 0.50 3600.50
## 0.75 4746.25
## 1.00 10000.00
## Name: ApplicantIncome, dtype: float64
Median
print("Median:",st.median(df.ApplicantIncome))## Median: 3600.5
Mode
print("Mode:",st.mode(df.ApplicantIncome))## Mode: 2500
Alternative Summary Statistics
df.describe()
##        ApplicantIncome  CoapplicantIncome ... Loan_Amount_Term  Credit_History
## count 362.000000 362.000000 ... 362.0 362.0
## mean 3987.654696 1319.892597 ... 360.0 1.0
## std 1817.816886 1371.817473 ... 0.0 0.0
## min 150.000000 0.000000 ... 360.0 1.0
## 25% 2698.250000 0.000000 ... 360.0 1.0
## 50% 3600.500000 1339.000000 ... 360.0 1.0
## 75% 4746.250000 2167.750000 ... 360.0 1.0
## max 10000.000000 5625.000000 ... 360.0 1.0
##
## [8 rows x 5 columns]
Scale
median=st.median(df.ApplicantIncome)
mad = st.median([abs(number-median) for number in df.ApplicantIncome])
print("Median Absolute Deviation:" ,mad)## Median Absolute Deviation: 1017.5
#calculate interquartile range
q3, q1 = np.percentile((df.ApplicantIncome), [75 ,25])
iqr = q3 - q1
#display interquartile range
print("IQR:",iqr)## IQR: 2048.0
Skewness
print("Skewness:",df.ApplicantIncome.skew(axis=0))## Skewness: 1.1768488832164874
Kurtosis
print("Kurtosis:",df.ApplicantIncome.kurt(axis=0))## Kurtosis: 1.4935966674040806
Quantitative: Bivariate Numeric
R
cov(da_numerik_1$ApplicantIncome,da_numerik_1$CoapplicantIncome)
## [1] -702662.6
cov(da_numerik_1$CoapplicantIncome,da_numerik_1$LoanAmount)
## [1] 19171.36
cov(da_numerik_1$LoanAmount,da_numerik_1$ApplicantIncome)
## [1] 38506.02
cov(da_numerik_1$LoanAmount, da_numerik_1$APPCO_TotalIncome)
## [1] 57677.38
cor(da_numerik_1$ApplicantIncome,da_numerik_1$CoapplicantIncome)
## [1] -0.2667633
cor(da_numerik_1$CoapplicantIncome,da_numerik_1$LoanAmount)
## [1] 0.2670667
cor(da_numerik_1$LoanAmount,da_numerik_1$ApplicantIncome)
## [1] 0.4113285
cor(da_numerik_1$LoanAmount, da_numerik_1$APPCO_TotalIncome)
## [1] 0.567451
Python
Covariance
print("Covariance:",df.ApplicantIncome.cov(df.LoanAmount))## Covariance: 33566.25470990649
Correlation
print("Correlation:",df.ApplicantIncome.corr(df.LoanAmount))## Correlation: 0.46607451449801046
import matplotlib.pyplot as plt
correlation_mat = df.corr()
## <string>:1: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
mask = np.zeros_like(correlation_mat)
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("white"):
f, ax = plt.subplots(figsize=(7,6))
ax = sns.heatmap(correlation_mat,
mask=mask,annot=True,cmap="YlGnBu")
plt.show()
Z-Score
import scipy.stats as stats
zscores = stats.zscore(df.ApplicantIncome)
print(zscores)
## 0      1.025363
## 1 0.327959
## 2 -0.544071
## 3 -0.773785
## 4 1.108544
## ...
## 606 -0.323722
## 607 -0.000361
## 608 -0.416269
## 609 -0.599158
## 612 1.980574
## Name: ApplicantIncome, Length: 362, dtype: float64
Quantitative: Multivariate Numeric
R
cov(da_numerik_1)
##                   ApplicantIncome CoapplicantIncome   LoanAmount
## ApplicantIncome 3435005.10002 -702662.55117 3.850602e+04
## CoapplicantIncome -702662.55117 2019826.68432 1.917136e+04
## LoanAmount 38506.01963 19171.35832 2.551245e+03
## Loan_Amount_Term -8272.62165 -4656.60781 3.423033e+02
## Credit_History 55.92708 -18.61241 1.349715e-02
## APPCO_TotalIncome 2732342.54885 1317164.13315 5.767738e+04
## Loan_Amount_Term Credit_History APPCO_TotalIncome
## ApplicantIncome -8272.621653 55.92708180 2732342.54885
## CoapplicantIncome -4656.607813 -18.61241094 1317164.13315
## LoanAmount 342.303266 0.01349715 57677.37794
## Loan_Amount_Term 7266.281634 1.72424656 -12929.22947
## Credit_History 1.724247 0.17534390 37.31467
## APPCO_TotalIncome -12929.229466 37.31467086 4049506.68200
cor(da_numerik_1)
##                   ApplicantIncome CoapplicantIncome    LoanAmount
## ApplicantIncome 1.00000000 -0.26676329 0.4113285245
## CoapplicantIncome -0.26676329 1.00000000 0.2670666904
## LoanAmount 0.41132852 0.26706669 1.0000000000
## Loan_Amount_Term -0.05236286 -0.03843762 0.0795021406
## Credit_History 0.07206313 -0.03127521 0.0006381467
## APPCO_TotalIncome 0.73260587 0.46055530 0.5674509687
## Loan_Amount_Term Credit_History APPCO_TotalIncome
## ApplicantIncome -0.05236286 0.0720631314 0.73260587
## CoapplicantIncome -0.03843762 -0.0312752107 0.46055530
## LoanAmount 0.07950214 0.0006381467 0.56745097
## Loan_Amount_Term 1.00000000 0.0483056487 -0.07537294
## Credit_History 0.04830565 1.0000000000 0.04428261
## APPCO_TotalIncome -0.07537294 0.0442826109 1.00000000
Python
Covariance
cof = df.select_dtypes(include=[np.number]) # Number
pd.set_option('display.max_columns', None)
cof.cov()
##                    ApplicantIncome  CoapplicantIncome    LoanAmount  \
## ApplicantIncome 3.304458e+06 -7.126391e+05 33566.254710
## CoapplicantIncome -7.126391e+05 1.881883e+06 16731.448762
## LoanAmount 3.356625e+04 1.673145e+04 1569.620973
## Loan_Amount_Term 0.000000e+00 0.000000e+00 0.000000
## Credit_History 0.000000e+00 0.000000e+00 0.000000
##
## Loan_Amount_Term Credit_History
## ApplicantIncome 0.0 0.0
## CoapplicantIncome 0.0 0.0
## LoanAmount 0.0 0.0
## Loan_Amount_Term 0.0 0.0
## Credit_History 0.0 0.0
Correlation
pd.set_option('display.max_columns', None)
cof.corr()
##                    ApplicantIncome  CoapplicantIncome  LoanAmount  \
## ApplicantIncome 1.000000 -0.285774 0.466075
## CoapplicantIncome -0.285774 1.000000 0.307850
## LoanAmount 0.466075 0.307850 1.000000
## Loan_Amount_Term NaN NaN NaN
## Credit_History NaN NaN NaN
##
## Loan_Amount_Term Credit_History
## ApplicantIncome NaN NaN
## CoapplicantIncome NaN NaN
## LoanAmount NaN NaN
## Loan_Amount_Term NaN NaN
## Credit_History NaN NaN
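The NaN entries appear because Loan_Amount_Term and Credit_History are constant after outlier removal (their standard deviation is 0 in df.describe() above), so their correlations are undefined. Dropping constant columns first avoids this (a sketch):
# Keep only columns that still vary, then recompute the correlation matrix.
nonconstant = cof.loc[:, cof.std() > 0]
nonconstant.corr()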
EDA the Lazy Way
R
library(funModeling)
## Loading required package: Hmisc
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
## funModeling v.1.9.4 :)
## Examples and tutorials at livebook.datascienceheroes.com
## / Now in Spanish: librovivodecienciadedatos.ai
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ tibble 3.1.8 ✔ purrr 0.3.4
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ gridExtra::combine() masks dplyr::combine()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ Hmisc::src() masks dplyr::src()
## ✖ Hmisc::summarize() masks dplyr::summarize()
library(Hmisc)
library(skimr)
eda <- function(data)
{
glimpse(data)
skim(data)
df_status(data)
freq(data)
profiling_num(data)
plot_num(data)
describe(data)
}
eda(df)
## Rows: 614
## Columns: 13
## $ Loan_ID <chr> "LP001002", "LP001003", "LP001005", "LP001006", "LP0…
## $ Gender <chr> "Male", "Male", "Male", "Male", "Male", "Male", "Mal…
## $ Married <chr> "No", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes"…
## $ Dependents <chr> "0", "1", "0", "0", "0", "2", "0", "3+", "2", "1", "…
## $ Education <chr> "Graduate", "Graduate", "Graduate", "Not Graduate", …
## $ Self_Employed <chr> "No", "No", "Yes", "No", "No", "Yes", "No", "No", "N…
## $ ApplicantIncome <int> 5849, 4583, 3000, 2583, 6000, 5417, 2333, 3036, 4006…
## $ CoapplicantIncome <dbl> 0, 1508, 0, 2358, 0, 4196, 1516, 2504, 1526, 10968, …
## $ LoanAmount <dbl> 0, 128, 66, 120, 141, 267, 95, 158, 168, 349, 70, 10…
## $ Loan_Amount_Term <dbl> 360, 360, 360, 360, 360, 360, 360, 360, 360, 360, 36…
## $ Credit_History <dbl> 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0…
## $ Property_Area <chr> "Urban", "Rural", "Urban", "Urban", "Urban", "Urban"…
## $ Loan_Status <chr> "Y", "N", "Y", "Y", "Y", "Y", "Y", "N", "Y", "N", "Y…
## variable q_zeros p_zeros q_na p_na q_inf p_inf type unique
## 1 Loan_ID 0 0.00 0 0 0 0 character 614
## 2 Gender 13 2.12 0 0 0 0 character 3
## 3 Married 3 0.49 0 0 0 0 character 3
## 4 Dependents 360 58.63 0 0 0 0 character 4
## 5 Education 0 0.00 0 0 0 0 character 2
## 6 Self_Employed 32 5.21 0 0 0 0 character 3
## 7 ApplicantIncome 0 0.00 0 0 0 0 integer 505
## 8 CoapplicantIncome 273 44.46 0 0 0 0 numeric 287
## 9 LoanAmount 22 3.58 0 0 0 0 numeric 204
## 10 Loan_Amount_Term 14 2.28 0 0 0 0 numeric 11
## 11 Credit_History 139 22.64 0 0 0 0 numeric 2
## 12 Property_Area 0 0.00 0 0 0 0 character 3
## 13 Loan_Status 0 0.00 0 0 0 0 character 2
## Warning in freq_logic(data = data, input = input[i], plot, na.rm, path_out =
## path_out): Skipping plot for variable 'Loan_ID' (more than 100 categories)
##       Loan_ID frequency percentage cumulative_perc
## 1    LP001002         1       0.16            0.16
## 2    LP001003         1       0.16            0.32
## 3    LP001005         1       0.16            0.48
## ...
## 614  LP002990         1       0.16          100.00
## (614 rows in total — every Loan_ID is unique, so each appears once with percentage 0.16; output truncated)
## Gender frequency percentage cumulative_perc
## 1 Male 489 79.64 79.64
## 2 Female 112 18.24 97.88
## 3 0 13 2.12 100.00
## Married frequency percentage cumulative_perc
## 1 Yes 398 64.82 64.82
## 2 No 213 34.69 99.51
## 3 0 3 0.49 100.00
## Dependents frequency percentage cumulative_perc
## 1 0 360 58.63 58.63
## 2 1 102 16.61 75.24
## 3 2 101 16.45 91.69
## 4 3+ 51 8.31 100.00
## Education frequency percentage cumulative_perc
## 1 Graduate 480 78.18 78.18
## 2 Not Graduate 134 21.82 100.00
## Self_Employed frequency percentage cumulative_perc
## 1 No 500 81.43 81.43
## 2 Yes 82 13.36 94.79
## 3 0 32 5.21 100.00
## Property_Area frequency percentage cumulative_perc
## 1 Semiurban 233 37.95 37.95
## 2 Urban 202 32.90 70.85
## 3 Rural 179 29.15 100.00
## Loan_Status frequency percentage cumulative_perc
## 1 Y 422 68.73 68.73
## 2 N 192 31.27 100.00
## data
##
## 13 Variables 614 Observations
## --------------------------------------------------------------------------------
## Loan_ID
## n missing distinct
## 614 0 614
##
## lowest : LP001002 LP001003 LP001005 LP001006 LP001008
## highest: LP002978 LP002979 LP002983 LP002984 LP002990
## --------------------------------------------------------------------------------
## Gender
## n missing distinct
## 614 0 3
##
## Value 0 Female Male
## Frequency 13 112 489
## Proportion 0.021 0.182 0.796
## --------------------------------------------------------------------------------
## Married
## n missing distinct
## 614 0 3
##
## Value 0 No Yes
## Frequency 3 213 398
## Proportion 0.005 0.347 0.648
## --------------------------------------------------------------------------------
## Dependents
## n missing distinct
## 614 0 4
##
## Value 0 1 2 3+
## Frequency 360 102 101 51
## Proportion 0.586 0.166 0.164 0.083
## --------------------------------------------------------------------------------
## Education
## n missing distinct
## 614 0 2
##
## Value Graduate Not Graduate
## Frequency 480 134
## Proportion 0.782 0.218
## --------------------------------------------------------------------------------
## Self_Employed
## n missing distinct
## 614 0 3
##
## Value 0 No Yes
## Frequency 32 500 82
## Proportion 0.052 0.814 0.134
## --------------------------------------------------------------------------------
## ApplicantIncome
## n missing distinct Info Mean Gmd .05 .10
## 614 0 505 1 5403 4183 1898 2216
## .25 .50 .75 .90 .95
## 2878 3812 5795 9460 14583
##
## lowest : 150 210 416 645 674, highest: 39147 39999 51763 63337 81000
## --------------------------------------------------------------------------------
## CoapplicantIncome
## n missing distinct Info Mean Gmd .05 .10
## 614 0 287 0.912 1621 2118 0 0
## .25 .50 .75 .90 .95
## 0 1188 2297 3782 4997
##
## lowest : 0.00 16.12 189.00 240.00 242.00
## highest: 10968.00 11300.00 20000.00 33837.00 41667.00
## --------------------------------------------------------------------------------
## LoanAmount
## n missing distinct Info Mean Gmd .05 .10
## 614 0 204 1 141.2 84.09 38.6 63.6
## .25 .50 .75 .90 .95
## 98.0 125.0 164.8 229.4 293.4
##
## lowest : 0 9 17 25 26, highest: 500 570 600 650 700
## --------------------------------------------------------------------------------
## Loan_Amount_Term
## n missing distinct Info Mean Gmd .05 .10
## 614 0 11 0.42 334.2 57.12 180 180
## .25 .50 .75 .90 .95
## 360 360 360 360 360
##
## lowest : 0 12 36 60 84, highest: 180 240 300 360 480
##
## Value 0 12 36 60 84 120 180 240 300 360 480
## Frequency 14 1 2 2 4 3 44 4 13 512 15
## Proportion 0.023 0.002 0.003 0.003 0.007 0.005 0.072 0.007 0.021 0.834 0.024
## --------------------------------------------------------------------------------
## Credit_History
## n missing distinct Info Sum Mean Gmd
## 614 0 2 0.525 475 0.7736 0.3508
##
## --------------------------------------------------------------------------------
## Property_Area
## n missing distinct
## 614 0 3
##
## Value Rural Semiurban Urban
## Frequency 179 233 202
## Proportion 0.292 0.379 0.329
## --------------------------------------------------------------------------------
## Loan_Status
## n missing distinct
## 614 0 2
##
## Value N Y
## Frequency 192 422
## Proportion 0.313 0.687
## --------------------------------------------------------------------------------
Python
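The Python counterpart of these summaries was not included in the rendered output. A minimal sketch, assuming the cleaned pandas frame df from the import step, could report the same kind of information:
# Hypothetical sketch: per-column summaries analogous to Hmisc::describe
print(df.describe())                       # numeric columns: count, mean, std, quartiles
print(df.describe(include="object"))       # categorical columns: count, unique, top, freq
for col in df.select_dtypes(include="object").columns:
    print(col, df[col].value_counts().to_dict())   # frequency of each category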
Task 4
Examine the density distribution of each quantitative variable using R and Python, covering the following parts:
Univariate (numeric)
R
library(ks)
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
# Column indices assume da_numerik_1 holds ApplicantIncome, CoapplicantIncome,
# LoanAmount, Loan_Amount_Term in that order; each plot now uses its own kde fit.
fhat <- kde(x = da_numerik_1[, 3])
plot_1 <- plot(fhat, cont = 50, col.cont = 4, cont.lwd = 2, xlab = "LoanAmount", drawpoints = TRUE)
fhat1 <- kde(x = da_numerik_1[, 2])
plot_2 <- plot(fhat1, cont = 50, col.cont = 4, cont.lwd = 2, xlab = "CoapplicantIncome", drawpoints = TRUE)
fhat2 <- kde(x = da_numerik_1[, 1])
plot_3 <- plot(fhat2, cont = 50, col.cont = 4, cont.lwd = 2, xlab = "ApplicantIncome", drawpoints = TRUE)
Python
# seaborn and matplotlib are assumed to be available; imported here so the chunk runs standalone
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="darkgrid")
fig, axs = plt.subplots(2, 2, figsize=(7, 7))
sns.kdeplot(data=df, x="ApplicantIncome", color="skyblue", shade=True, ax=axs[0, 0])
sns.kdeplot(data=df, x="CoapplicantIncome", color="teal", shade=True, ax=axs[0, 1])
sns.kdeplot(data=df, x="LoanAmount", color="gold", shade=True, ax=axs[1, 0])
plt.show()

# sns.distplot is deprecated in newer seaborn releases; see the histplot sketch after this chunk
sns.distplot(df['CoapplicantIncome'], hist=True, kde=True,
             bins=int(180/5), color='darkblue',
             hist_kws={'edgecolor': 'black'},
             kde_kws={'linewidth': 2})
sns.distplot(df['ApplicantIncome'], hist=True, kde=True,
             bins=int(180/5), color='darkblue',
             hist_kws={'edgecolor': 'black'},
             kde_kws={'linewidth': 2})
sns.distplot(df['LoanAmount'], hist=True, kde=True,
             bins=int(180/5), color='darkblue',
             hist_kws={'edgecolor': 'black'},
             kde_kws={'linewidth': 2})
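Since sns.distplot has been removed from recent seaborn versions, a minimal sketch of the same histogram-plus-KDE view with the non-deprecated sns.histplot (same df, LoanAmount chosen only as an example) could be:
sns.histplot(df['LoanAmount'], bins=int(180/5), kde=True,
             color='darkblue', edgecolor='black')   # histogram with a KDE overlay
plt.show()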
Bivariate (numeric)
R
fhat4<-kde(x=da_numerik_1[,3:4])
plot(fhat4, display = "filled.contour", cont = seq(10, 90, by = 10), lwd = 1)
plot(fhat4, display = "persp", border = 1)
Python
plt.rcParams["figure.figsize"]=(8,6)
sns.set_theme(style="darkgrid")
sns.kdeplot(data = df,
x = "ApplicantIncome",
hue = "Property_Area",
cut = 0,
fill = True,
alpha = 0.4)
plt.show()

plt.rcParams["figure.figsize"] = (8, 6)
sns.set_theme(style="darkgrid")
sns.kdeplot(data = df,
x = "LoanAmount",
hue = "Loan_Status",
cut = 0,
fill = True,
alpha = 0.4)
plt.show()
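The R chunk above estimates a genuinely two-dimensional density, while the seaborn calls condition a univariate KDE on a category. A hedged sketch of a true bivariate KDE in seaborn (ApplicantIncome against LoanAmount, chosen only for illustration) passes both a numeric x and y:
sns.kdeplot(data=df, x="ApplicantIncome", y="LoanAmount", fill=True)   # 2-D density contours
plt.show()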
Multivariate (numeric)
R
fhat5<-kde(x=da_numerik_1[,1:3])
plot(fhat5)
Python
data = df[["ApplicantIncome","CoapplicantIncome","LoanAmount","Loan_Status"]]
n = sns.pairplot(data,
kind = "scatter",
hue = "Loan_Status",
markers = ["o","s"],
palette = "flare")
plt.legend(loc="center right")
plt.show()

data = df[["ApplicantIncome","CoapplicantIncome","LoanAmount","Self_Employed"]]
n = sns.pairplot(data,
kind = "scatter",
hue = "Self_Employed",
markers = ["D","s"],
palette = "bright")
plt.legend(loc="center right")
plt.show()
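For a multivariate density estimate analogous to the three-column ks::kde call above (not just a pairplot), one hedged sketch uses scipy.stats.gaussian_kde on the three numeric columns of df:
import numpy as np
from scipy import stats

# gaussian_kde expects an array of shape (n_dims, n_obs)
xyz = df[["ApplicantIncome", "CoapplicantIncome", "LoanAmount"]].to_numpy().T
kde3 = stats.gaussian_kde(xyz)     # trivariate kernel density estimate
dens = kde3(xyz)                   # density evaluated at each observation
print(dens[:5])                    # density values for the first five consumers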
Task 5
Perform hypothesis testing with R and Python on the quantitative variables, covering the following parts:
Compute the margin of error and the interval estimate for the proportion of female borrowers at the 95% confidence level.
R
k = sum(data_aa$Gender == "Female")
n = sum(count(data_aa))
pbar = k/n
SE = sqrt(pbar*(1-pbar)/n); SE
## [1] 0.01685443
E = qnorm(.975)*SE; E
## [1] 0.03303407
pbar + c(-E, E)
## [1] 0.1538818 0.2199500
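The interval above is the usual normal-approximation interval for a proportion:
\[ E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \qquad \hat{p} - E \le p \le \hat{p} + E \]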
Python
pd.crosstab(df.Loan_Status, df.Gender)
## Gender Female Male
## Loan_Status
## N 16 49
## Y 54 243
import numpy as np   # needed for np.sqrt below

# Note: the counts here (75 successes out of 112) are hard-coded and do not come from
# the crosstab above; the block further down recomputes the proportion directly from df.
p_fm = 75/(37+75)
n = 37+75
se_female = np.sqrt(p_fm * (1 - p_fm) / n)
print("Standard Error:", se_female)
## Standard Error: 0.044443111813668223
Confidence Interval
z_score = 1.96
lcb = p_fm - z_score* se_female #lower limit of the CI
ucb = p_fm + z_score* se_female #upper limit of the CI
print("Lower:",lcb)## Lower: 0.5825343579880674
print("Upper:",ucb)## Upper: 0.7567513562976468
Margin of Error
ME = z_score*se_female
print("Margin of Error:",ME)## Margin of Error: 0.08710849915478971
import math
import scipy.stats as ss
P_Gender = df.Gender.value_counts() / len(df)
P_Female = P_Gender.Female
z_critical = ss.norm.ppf(q = 0.975)
margin_error = z_critical * math.sqrt(P_Female *(1 - P_Female)/len(df))
interval = (P_Female - margin_error, P_Female + margin_error)
print("Margin of Error :",margin_error)## Margin of Error : 0.040684190665113
print('Interval of Estimation :',interval)## Interval of Estimation : (0.15268597508074336, 0.23405435641096936)
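As a cross-check, the same normal-approximation interval can be obtained from statsmodels (a hedged sketch; the female count of 112 out of 614 is taken from the Gender frequency table above):
from statsmodels.stats.proportion import proportion_confint

# 95% normal-approximation (Wald) interval for the proportion of female borrowers
low, upp = proportion_confint(count=112, nobs=614, alpha=0.05, method='normal')
print(low, upp)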
If you plan to use an estimated proportion of 50% female consumers, find the sample size required to achieve a 5% margin of error for the observed data at the 95% confidence level.
R
zstar = qnorm(.975)
p = 0.5
E = 0.05
zstar^2*p*(1-p)/E^2
## [1] 384.1459
Python
p = 0.5
E = 0.05
z_score**2*p*(1-p)/E**2
## 384.1599999999999
z_critical = ss.norm.ppf(q = 0.975)
p = 0.5
E = 0.05
sample_needed = math.ceil(z_critical**2*p*(1-p)/E**2)   # round up so the target margin of error is not exceeded
print('required sample size :', sample_needed)
## required sample size : 385
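Written out, this is the standard sample-size formula for estimating a proportion, rounded up to the next whole observation:
\[ n = \left\lceil \frac{z_{\alpha/2}^{2}\, p(1-p)}{E^{2}} \right\rceil = \left\lceil \frac{1.96^{2} \times 0.5 \times 0.5}{0.05^{2}} \right\rceil = \lceil 384.16 \rceil = 385 \]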
Test the claim at the 0.05 significance level, given that the Bank claims the average consumer loan is (the one-sample t statistic used in every case is shown after this list):
- Greater than $150.
- Less than $150.
- Equal to $150.
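Each case below compares the same test statistic against a different critical region:
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad df = n - 1, \qquad \mu_0 = 150 \]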
R
Greater than $150
set.seed(100)
Data1 <- sample_n(data_aa,30)
Data1
\[ \begin{align*} H_0 &: \mu \le \$150 \\ H_1 &: \mu > \$150 \end{align*} \]
mu0 = 150
xbar = mean(Data1$LoanAmount)
s = sd(Data1$LoanAmount)
n = sum(count(Data1))
t = (xbar-mu0)/(s/sqrt(n)); t
## [1] -2.520249
alpha = .05
t.alpha = qt(1-alpha, df=n-1)
t.alpha
## [1] 1.699127
Since t = -2.520 does not exceed the critical value 1.699, H0 is not rejected: the sample does not support the claim that the mean loan is greater than $150.
Less than $150
mu0 = 150
xbar = mean(Data1$LoanAmount)
s = sd(Data1$LoanAmount)
n = sum(count(Data1))
t = (xbar-mu0)/(s/sqrt(n)); t
## [1] -2.520249
alpha = .05
t.alpha = qt(1-alpha, df=n-1)
-t.alpha
## [1] -1.699127
Since t = -2.520 is below the critical value -1.699, H0 is rejected: the sample supports the claim that the mean loan is less than $150.
Equal to $150
mu0 = 150
xbar = mean(Data1$LoanAmount)
s = sd(Data1$LoanAmount)
n = sum(count(Data1))
t = (xbar-mu0)/(s/sqrt(n)); t
## [1] -2.520249
alpha = .05
t.alpha = qt(1-alpha/2, df=n-1)   # two-tailed test, so alpha is split between the tails
t.alpha
## [1] 2.04523
-t.alpha
## [1] -2.04523
Since t = -2.520 falls outside (-2.045, 2.045), H0 is rejected: the mean loan differs significantly from $150.
Python
mu0 = 150
xbar = df['LoanAmount'].mean()
s = df['LoanAmount'].std()
n = len(df)
t = (xbar-mu0)/(s/(n**(1/2)))
print("t hitung",t)## t hitung -12.048412160809425
- Greater than $150.
alpha = 0.05
t_alpha = ss.t.isf(alpha, n - 1)
print("t hitung:",t)## t hitung: -12.048412160809425
print("t table 0.5 one tail positive:",t_alpha)## t table 0.5 one tail positive: 1.6490855127686623
- Less than $150.
alpha = 0.05
t_alpha = ss.t.isf(alpha, n - 1)
print("t hitung",t)## t hitung -12.048412160809425
print("t table 0.5 one tail negative:", -t_alpha)## t table 0.5 one tail negative: -1.6490855127686623
- Equal to $150.
alpha = 0.05
t_alpha = ss.t.isf(alpha/2, n-1)
print("t hitung:",t)## t hitung: -12.048412160809425
print("t table 0.5 two tail:",-t_alpha, t_alpha)## t table 0.5 two tail: -1.9665570854590666 1.9665570854590666
Test the same claims at the 0.05 significance level as above, given that the standard deviation of the loans is known to be $85. (Strictly speaking, a known population standard deviation calls for a z-test; with these sample sizes the t and z critical values are close enough that the conclusions below are unchanged, and the z critical values are sketched at the end of this section.)
R
mu0 = 150
xbar = mean(Data1$LoanAmount)
s = 85
n = sum(count(Data1))
t = (xbar-mu0)/(s/sqrt(n)); t
## [1] -1.546511
alpha = .05
t.alpha = qt(1-alpha, df=n-1)
t.alpha
## [1] 1.699127
Here t = -1.547 lies inside (-1.699, 1.699), so with sigma = 85 H0 is not rejected for any of the three alternatives on this sample of 30.
Python
mu0 = 150
xbar = df['LoanAmount'].mean()
s_2 = 85
n = len(df)
t_2 = (xbar-mu0)/(s_2/(n**(1/2)))
print("t hitung:",t_2)## t hitung: -5.6157567343639965
- Greater than $150.
alpha = 0.05
t_alpha = ss.t.isf(alpha, n - 1)
print("t hitung:",t_2)## t hitung: -5.6157567343639965
print("t table 0.5 one tail positive:",t_alpha)## t table 0.5 one tail positive: 1.6490855127686623
- Less than $150.
alpha = 0.05
t_alpha = ss.t.isf(alpha, n - 1)
print("t hitung:",t_2)## t hitung: -5.6157567343639965
print("t table 0.5 one tail negative:", -t_alpha)## t table 0.5 one tail negative: -1.6490855127686623
- Equal to $150.
alpha = 0.05
t_alpha = ss.t.isf(alpha/2, n - 1)
print("t hitung:",t_2)## t hitung: -5.6157567343639965
print("t table 0.5 two tail:",-t_alpha, t_alpha)## t table 0.5 two tail: -1.9665570854590666 1.9665570854590666