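The dataset includes information about customer demographics, the services each customer signed up for, account details such as contract, payment method, and charges, and whether the customer churned.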
==================================================================================================================
# Get the working directory
getwd()
## [1] "D:/BINUS UNIVERSITY/SEMESTER 5/Bayessian"
Import dataset
# Load knitr for kable(), then load the dataset
library(knitr)
file_path <- "WA_Fn-UseC_-Telco-Customer-Churn.csv"
data <- read.csv(file_path, stringsAsFactors = FALSE)
# Display the first few rows
kable(head(data))
| customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.50 | No |
| 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
| 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
| 9305-CDSKC | Female | 0 | No | No | 8 | Yes | Yes | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Electronic check | 99.65 | 820.50 | Yes |
Checking Dataset
str(data)
## 'data.frame': 7043 obs. of 21 variables:
## $ customerID : chr "7590-VHVEG" "5575-GNVDE" "3668-QPYBK" "7795-CFOCW" ...
## $ gender : chr "Female" "Male" "Male" "Male" ...
## $ SeniorCitizen : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Partner : chr "Yes" "No" "No" "No" ...
## $ Dependents : chr "No" "No" "No" "No" ...
## $ tenure : int 1 34 2 45 2 8 22 10 28 62 ...
## $ PhoneService : chr "No" "Yes" "Yes" "No" ...
## $ MultipleLines : chr "No phone service" "No" "No" "No phone service" ...
## $ InternetService : chr "DSL" "DSL" "DSL" "DSL" ...
## $ OnlineSecurity : chr "No" "Yes" "Yes" "Yes" ...
## $ OnlineBackup : chr "Yes" "No" "Yes" "No" ...
## $ DeviceProtection: chr "No" "Yes" "No" "Yes" ...
## $ TechSupport : chr "No" "No" "No" "Yes" ...
## $ StreamingTV : chr "No" "No" "No" "No" ...
## $ StreamingMovies : chr "No" "No" "No" "No" ...
## $ Contract : chr "Month-to-month" "One year" "Month-to-month" "One year" ...
## $ PaperlessBilling: chr "Yes" "No" "Yes" "No" ...
## $ PaymentMethod : chr "Electronic check" "Mailed check" "Mailed check" "Bank transfer (automatic)" ...
## $ MonthlyCharges : num 29.9 57 53.9 42.3 70.7 ...
## $ TotalCharges : num 29.9 1889.5 108.2 1840.8 151.7 ...
## $ Churn : chr "No" "No" "Yes" "No" ...
Checking for Missing Values
# Count the missing values in each column
kable(colSums(is.na(data)))
| | x |
|---|---|
| customerID | 0 |
| gender | 0 |
| SeniorCitizen | 0 |
| Partner | 0 |
| Dependents | 0 |
| tenure | 0 |
| PhoneService | 0 |
| MultipleLines | 0 |
| InternetService | 0 |
| OnlineSecurity | 0 |
| OnlineBackup | 0 |
| DeviceProtection | 0 |
| TechSupport | 0 |
| StreamingTV | 0 |
| StreamingMovies | 0 |
| Contract | 0 |
| PaperlessBilling | 0 |
| PaymentMethod | 0 |
| MonthlyCharges | 0 |
| TotalCharges | 11 |
| Churn | 0 |
The “TotalCharges” feature has 11 missing values; every other column is complete.
==================================================================================================================
Cleaning data
# Drop rows containing NA (only 'TotalCharges' has any)
cleaned_data <- na.omit(data)
# Re-check the dataset after cleaning
str(cleaned_data)
## 'data.frame': 7032 obs. of 21 variables:
## $ customerID : chr "7590-VHVEG" "5575-GNVDE" "3668-QPYBK" "7795-CFOCW" ...
## $ gender : chr "Female" "Male" "Male" "Male" ...
## $ SeniorCitizen : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Partner : chr "Yes" "No" "No" "No" ...
## $ Dependents : chr "No" "No" "No" "No" ...
## $ tenure : int 1 34 2 45 2 8 22 10 28 62 ...
## $ PhoneService : chr "No" "Yes" "Yes" "No" ...
## $ MultipleLines : chr "No phone service" "No" "No" "No phone service" ...
## $ InternetService : chr "DSL" "DSL" "DSL" "DSL" ...
## $ OnlineSecurity : chr "No" "Yes" "Yes" "Yes" ...
## $ OnlineBackup : chr "Yes" "No" "Yes" "No" ...
## $ DeviceProtection: chr "No" "Yes" "No" "Yes" ...
## $ TechSupport : chr "No" "No" "No" "Yes" ...
## $ StreamingTV : chr "No" "No" "No" "No" ...
## $ StreamingMovies : chr "No" "No" "No" "No" ...
## $ Contract : chr "Month-to-month" "One year" "Month-to-month" "One year" ...
## $ PaperlessBilling: chr "Yes" "No" "Yes" "No" ...
## $ PaymentMethod : chr "Electronic check" "Mailed check" "Mailed check" "Bank transfer (automatic)" ...
## $ MonthlyCharges : num 29.9 57 53.9 42.3 70.7 ...
## $ TotalCharges : num 29.9 1889.5 108.2 1840.8 151.7 ...
## $ Churn : chr "No" "No" "Yes" "No" ...
## - attr(*, "na.action")= 'omit' Named int [1:11] 489 754 937 1083 1341 3332 3827 4381 5219 6671 ...
## ..- attr(*, "names")= chr [1:11] "489" "754" "937" "1083" ...
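To make the effect of na.omit() concrete, here is a minimal sketch on a small made-up data frame. The object name toy and its values are assumptions for illustration, not rows from the Telco file. It shows that na.omit() drops every row containing an NA in any column and records the dropped row indices in the "na.action" attribute, which is what appears at the end of the str() output above.

```r
# Toy data frame (hypothetical values) with one missing TotalCharges entry
toy <- data.frame(
  customerID   = c("A", "B", "C"),
  tenure       = c(1L, 0L, 34L),
  TotalCharges = c(29.85, NA, 1889.50),
  stringsAsFactors = FALSE
)

# Count missing values per column, as done for the full dataset
na_counts <- colSums(is.na(toy))

# na.omit() removes any row that has an NA in at least one column
toy_clean <- na.omit(toy)

# Indices of the dropped rows are stored in the "na.action" attribute
dropped <- as.integer(attr(toy_clean, "na.action"))

print(na_counts)
print(nrow(toy_clean))
print(dropped)
```

Because na.omit() looks at all columns, it is only equivalent to filtering on “TotalCharges” when no other column contains NA, which is the case here.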
Save cleaned dataset
# Save the cleaned dataset
write.csv(cleaned_data, "cleaned_telco_churn.csv", row.names = FALSE)
# Report the name of the saved file
cat("Dataset saved as 'cleaned_telco_churn.csv'")
## Dataset saved as 'cleaned_telco_churn.csv'
==================================================================================================================
#install.packages("brms")
#install.packages("loo") # For model comparison
# Load the cleaned dataset
file_path <- "cleaned_telco_churn.csv"
data <- read.csv(file_path, stringsAsFactors = FALSE)
A. Model 1: Uninformative Prior
library(brms)
# Model 1: logistic regression with the default (flat) priors
model1 <- brm(
formula = Churn ~ tenure + MonthlyCharges + TotalCharges + SeniorCitizen,
family = bernoulli(link = "logit"),
data = data,
prior = set_prior("", class = "b"), # Uninformative prior
chains = 4, iter = 200, warmup = 10, seed = 123
)
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000177 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.77 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: WARNING: No variance estimation is
## Chain 1: performed for num_warmup < 20
## Chain 1:
## Chain 1: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 1: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 1: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 1: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 1: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 1: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 1: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 1: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 1: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 1: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 1: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 1: Iteration: 200 / 200 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.006 seconds (Warm-up)
## Chain 1: 0.114 seconds (Sampling)
## Chain 1: 0.12 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 0.000194 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.94 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: WARNING: No variance estimation is
## Chain 2: performed for num_warmup < 20
## Chain 2:
## Chain 2: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 2: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 2: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 2: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 2: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 2: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 2: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 2: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 2: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 2: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 2: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 2: Iteration: 200 / 200 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.004 seconds (Warm-up)
## Chain 2: 0.077 seconds (Sampling)
## Chain 2: 0.081 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 0.000207 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 2.07 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: WARNING: No variance estimation is
## Chain 3: performed for num_warmup < 20
## Chain 3:
## Chain 3: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 3: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 3: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 3: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 3: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 3: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 3: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 3: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 3: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 3: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 3: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 3: Iteration: 200 / 200 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.005 seconds (Warm-up)
## Chain 3: 0.071 seconds (Sampling)
## Chain 3: 0.076 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 0.000203 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 2.03 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: WARNING: No variance estimation is
## Chain 4: performed for num_warmup < 20
## Chain 4:
## Chain 4: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 4: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 4: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 4: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 4: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 4: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 4: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 4: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 4: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 4: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 4: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 4: Iteration: 200 / 200 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.005 seconds (Warm-up)
## Chain 4: 0.071 seconds (Sampling)
## Chain 4: 0.076 seconds (Total)
## Chain 4:
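This model serves as a baseline that incorporates no domain knowledge or prior assumptions. Note that the sampler warns that no variance estimation is performed for num_warmup < 20, so a run with warmup = 10 and iter = 200 is too short for reliable adaptation.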
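As a rough illustration of what the bernoulli(link = "logit") family implies, the sketch below maps a linear predictor to a churn probability with the inverse logit. The coefficient values and the example customer are made up for this sketch and are not estimates from model1.

```r
# Hypothetical coefficients on the log-odds scale (NOT fitted values)
beta <- c(Intercept = -1.0, tenure = -0.05, MonthlyCharges = 0.02)

# One hypothetical customer: 12 months of tenure, monthly charge of 70
x <- c(Intercept = 1, tenure = 12, MonthlyCharges = 70)

# Linear predictor: the log-odds of churn
eta <- sum(beta * x)

# The inverse logit turns log-odds into a probability in (0, 1)
p_churn <- plogis(eta)

print(eta)
print(p_churn)
```

Each coefficient is therefore a change in log-odds per one-unit change in its predictor, which is the scale on which the priors are specified.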
B. Model 2: Informative Prior
# Informative priors
prior2 <- c(
set_prior("normal(0, 5)", class = "b", coef = "MonthlyCharges"),
set_prior("normal(0, 10)", class = "b") # Weak prior for the other coefficients
)
# Model 2: logistic regression with informative priors
model2 <- brm(
formula = Churn ~ tenure + MonthlyCharges + TotalCharges + SeniorCitizen,
family = bernoulli(link = "logit"),
data = data,
prior = prior2,
chains = 4, iter = 200, warmup = 10, seed = 123
)
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000186 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.86 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: WARNING: No variance estimation is
## Chain 1: performed for num_warmup < 20
## Chain 1:
## Chain 1: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 1: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 1: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 1: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 1: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 1: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 1: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 1: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 1: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 1: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 1: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 1: Iteration: 200 / 200 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.006 seconds (Warm-up)
## Chain 1: 0.114 seconds (Sampling)
## Chain 1: 0.12 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 0.000202 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 2.02 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: WARNING: No variance estimation is
## Chain 2: performed for num_warmup < 20
## Chain 2:
## Chain 2: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 2: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 2: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 2: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 2: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 2: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 2: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 2: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 2: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 2: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 2: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 2: Iteration: 200 / 200 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.004 seconds (Warm-up)
## Chain 2: 0.078 seconds (Sampling)
## Chain 2: 0.082 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 0.000202 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 2.02 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: WARNING: No variance estimation is
## Chain 3: performed for num_warmup < 20
## Chain 3:
## Chain 3: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 3: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 3: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 3: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 3: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 3: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 3: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 3: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 3: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 3: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 3: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 3: Iteration: 200 / 200 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.005 seconds (Warm-up)
## Chain 3: 0.072 seconds (Sampling)
## Chain 3: 0.077 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 0.000205 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 2.05 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: WARNING: No variance estimation is
## Chain 4: performed for num_warmup < 20
## Chain 4:
## Chain 4: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 4: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 4: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 4: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 4: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 4: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 4: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 4: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 4: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 4: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 4: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 4: Iteration: 200 / 200 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.005 seconds (Warm-up)
## Chain 4: 0.071 seconds (Sampling)
## Chain 4: 0.076 seconds (Total)
## Chain 4:
This model places a normal(0, 5) prior on the MonthlyCharges coefficient and a weaker normal(0, 10) prior on the remaining coefficients, so that its estimates can be compared with those of the uninformative model.
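One way to judge how informative these priors are is to look at the range of effects they allow. The sketch below uses only base R to compute the central 95% interval of each normal prior on the log-odds scale, plus the implied odds-ratio range per one-unit change in a predictor. The helper name prior_interval is hypothetical and not part of brms.

```r
# Central interval of a normal(0, sd) prior on a logit-scale coefficient
prior_interval <- function(sd, level = 0.95) {
  alpha <- 1 - level
  qnorm(c(alpha / 2, 1 - alpha / 2), mean = 0, sd = sd)
}

int_monthly <- prior_interval(5)   # normal(0, 5) on MonthlyCharges
int_other   <- prior_interval(10)  # normal(0, 10) on the other coefficients

# Exponentiating gives the implied odds ratio per one-unit change
or_monthly <- exp(int_monthly)

print(int_monthly)
print(int_other)
print(or_monthly)
```

The normal(0, 5) interval spans roughly -9.8 to 9.8 on the log-odds scale, which is still very wide for a predictor measured in currency units. A smaller standard deviation or standardized predictors may be worth considering if a genuinely informative prior is intended.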
==================================================================================================================
summary(model1) # R-hat and convergence for Model 1
## Family: bernoulli
## Links: mu = logit
## Formula: Churn ~ tenure + MonthlyCharges + TotalCharges + SeniorCitizen
## Data: data (Number of observations: 7032)
## Draws: 4 chains, each with iter = 200; warmup = 10; thin = 1;
## total post-warmup draws = 760
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 9.86 39.95 -38.22 56.21 Inf 4 NA
## tenure 0.56 0.58 -0.42 1.05 Inf 4 NA
## MonthlyCharges -0.03 0.95 -1.14 1.07 Inf 4 NA
## TotalCharges -0.01 0.02 -0.04 0.01 Inf 4 NA
## SeniorCitizen 0.54 1.07 -1.03 1.81 Inf 4 NA
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
summary(model2) # R-hat and convergence for Model 2
## Family: bernoulli
## Links: mu = logit
## Formula: Churn ~ tenure + MonthlyCharges + TotalCharges + SeniorCitizen
## Data: data (Number of observations: 7032)
## Draws: 4 chains, each with iter = 200; warmup = 10; thin = 1;
## total post-warmup draws = 760
##
## Regression Coefficients:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 9.86 39.95 -38.22 56.21 Inf 4 NA
## tenure 0.56 0.58 -0.42 1.05 Inf 4 NA
## MonthlyCharges -0.03 0.95 -1.14 1.07 Inf 4 NA
## TotalCharges -0.01 0.02 -0.04 0.01 Inf 4 NA
## SeniorCitizen 0.54 1.07 -1.03 1.81 Inf 4 NA
##
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
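Both summaries report Rhat = Inf and Bulk_ESS = 4, far from the Rhat = 1 expected at convergence. To show what the diagnostic measures, the sketch below implements the basic split-Rhat formula in base R and applies it to simulated chains. It is an illustration only: the function name split_rhat and the simulated draws are assumptions, and brms reports a rank-normalized variant, so the numbers would not match its output exactly.

```r
# Basic split-Rhat for a matrix of draws (rows = iterations, cols = chains)
split_rhat <- function(draws) {
  n_half <- floor(nrow(draws) / 2)
  # Split every chain into a first and a second half
  halves <- cbind(
    draws[seq_len(n_half), , drop = FALSE],
    draws[n_half + seq_len(n_half), , drop = FALSE]
  )
  chain_means <- colMeans(halves)
  chain_vars  <- apply(halves, 2, var)
  W <- mean(chain_vars)            # within-chain variance
  B <- n_half * var(chain_means)   # between-chain variance
  var_plus <- (n_half - 1) / n_half * W + B / n_half
  sqrt(var_plus / W)
}

set.seed(123)
# Four simulated chains drawing from the same distribution (well mixed)
mixed <- matrix(rnorm(190 * 4), nrow = 190, ncol = 4)
# Four simulated chains stuck near different locations (not mixed)
stuck <- sweep(matrix(rnorm(190 * 4, sd = 0.01), nrow = 190, ncol = 4),
               2, c(-30, 0, 10, 50), "+")

rhat_mixed <- split_rhat(mixed)
rhat_stuck <- split_rhat(stuck)
print(rhat_mixed)
print(rhat_stuck)
```

The well-mixed chains give a value close to 1, while the stuck chains give a very large value because the between-chain variance dwarfs the within-chain variance. Longer runs (more warmup and more iterations) are the usual first remedy when Rhat is far above 1.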
# Trace plots for each model
plot(model1)
plot(model2)
Comparing the Two Models
==================================================================================================================