A. Data Description

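The dataset (WA_Fn-UseC_-Telco-Customer-Churn.csv) holds 7043 customer records with 21 variables. It includes information about:

  • Customer demographics: gender, SeniorCitizen, Partner, Dependents.
  • Account details: tenure, Contract, PaperlessBilling, PaymentMethod, MonthlyCharges, TotalCharges.
  • Subscribed services: PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies.
  • Target variable: Churn (Yes/No), with customerID as the row identifier.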

==================================================================================================================

B. Data Analysis:

Show working directory
# get the current working directory
getwd()
## [1] "D:/BINUS UNIVERSITY/SEMESTER 5/Bayessian"

Import dataset

# loading dataset
file_path <- "WA_Fn-UseC_-Telco-Customer-Churn.csv"
data <- read.csv(file_path, stringsAsFactors = FALSE)

# Display the first few rows
library(knitr)  # provides kable()
kable(head(data))
| customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.50 | No |
| 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
| 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
| 9305-CDSKC | Female | 0 | No | No | 8 | Yes | Yes | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Electronic check | 99.65 | 820.50 | Yes |

Checking Dataset

str(data)
## 'data.frame':    7043 obs. of  21 variables:
##  $ customerID      : chr  "7590-VHVEG" "5575-GNVDE" "3668-QPYBK" "7795-CFOCW" ...
##  $ gender          : chr  "Female" "Male" "Male" "Male" ...
##  $ SeniorCitizen   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Partner         : chr  "Yes" "No" "No" "No" ...
##  $ Dependents      : chr  "No" "No" "No" "No" ...
##  $ tenure          : int  1 34 2 45 2 8 22 10 28 62 ...
##  $ PhoneService    : chr  "No" "Yes" "Yes" "No" ...
##  $ MultipleLines   : chr  "No phone service" "No" "No" "No phone service" ...
##  $ InternetService : chr  "DSL" "DSL" "DSL" "DSL" ...
##  $ OnlineSecurity  : chr  "No" "Yes" "Yes" "Yes" ...
##  $ OnlineBackup    : chr  "Yes" "No" "Yes" "No" ...
##  $ DeviceProtection: chr  "No" "Yes" "No" "Yes" ...
##  $ TechSupport     : chr  "No" "No" "No" "Yes" ...
##  $ StreamingTV     : chr  "No" "No" "No" "No" ...
##  $ StreamingMovies : chr  "No" "No" "No" "No" ...
##  $ Contract        : chr  "Month-to-month" "One year" "Month-to-month" "One year" ...
##  $ PaperlessBilling: chr  "Yes" "No" "Yes" "No" ...
##  $ PaymentMethod   : chr  "Electronic check" "Mailed check" "Mailed check" "Bank transfer (automatic)" ...
##  $ MonthlyCharges  : num  29.9 57 53.9 42.3 70.7 ...
##  $ TotalCharges    : num  29.9 1889.5 108.2 1840.8 151.7 ...
##  $ Churn           : chr  "No" "No" "Yes" "No" ...

Checking Missing Values

# Count the missing values in each column
kable(colSums(is.na(data)))
x
customerID 0
gender 0
SeniorCitizen 0
Partner 0
Dependents 0
tenure 0
PhoneService 0
MultipleLines 0
InternetService 0
OnlineSecurity 0
OnlineBackup 0
DeviceProtection 0
TechSupport 0
StreamingTV 0
StreamingMovies 0
Contract 0
PaperlessBilling 0
PaymentMethod 0
MonthlyCharges 0
TotalCharges 11
Churn 0

The feature “TotalCharges” contains 11 missing values; every other column is complete.
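
Before removing anything, it can help to look at which rows carry the missing values. The sketch below applies the same is.na() check to a single column. The `toy` data frame and its values are invented for illustration and are not rows from the Telco file.

```r
# Toy stand-in for the Telco data; all values are invented for illustration.
toy <- data.frame(
  customerID   = c("A-1", "B-2", "C-3", "D-4"),
  tenure       = c(1L, 0L, 34L, 0L),
  TotalCharges = c(29.85, NA, 1889.50, NA),
  stringsAsFactors = FALSE
)

# Logical index of the rows whose TotalCharges is missing
missing_rows <- is.na(toy$TotalCharges)

# Number of missing values and the customers they belong to
n_missing   <- sum(missing_rows)
missing_ids <- toy$customerID[missing_rows]

print(n_missing)
print(toy[missing_rows, ])
```

On the real data, replacing `toy` with `data` would list the 11 affected customers so that the reason for the gaps can be checked before the rows are dropped.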

==================================================================================================================

C. Data Processing:

Cleaning data

# Remove rows containing NA (only 'TotalCharges' has any)
cleaned_data <- na.omit(data)

# Re-check the dataset after cleaning
str(cleaned_data)
## 'data.frame':    7032 obs. of  21 variables:
##  $ customerID      : chr  "7590-VHVEG" "5575-GNVDE" "3668-QPYBK" "7795-CFOCW" ...
##  $ gender          : chr  "Female" "Male" "Male" "Male" ...
##  $ SeniorCitizen   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Partner         : chr  "Yes" "No" "No" "No" ...
##  $ Dependents      : chr  "No" "No" "No" "No" ...
##  $ tenure          : int  1 34 2 45 2 8 22 10 28 62 ...
##  $ PhoneService    : chr  "No" "Yes" "Yes" "No" ...
##  $ MultipleLines   : chr  "No phone service" "No" "No" "No phone service" ...
##  $ InternetService : chr  "DSL" "DSL" "DSL" "DSL" ...
##  $ OnlineSecurity  : chr  "No" "Yes" "Yes" "Yes" ...
##  $ OnlineBackup    : chr  "Yes" "No" "Yes" "No" ...
##  $ DeviceProtection: chr  "No" "Yes" "No" "Yes" ...
##  $ TechSupport     : chr  "No" "No" "No" "Yes" ...
##  $ StreamingTV     : chr  "No" "No" "No" "No" ...
##  $ StreamingMovies : chr  "No" "No" "No" "No" ...
##  $ Contract        : chr  "Month-to-month" "One year" "Month-to-month" "One year" ...
##  $ PaperlessBilling: chr  "Yes" "No" "Yes" "No" ...
##  $ PaymentMethod   : chr  "Electronic check" "Mailed check" "Mailed check" "Bank transfer (automatic)" ...
##  $ MonthlyCharges  : num  29.9 57 53.9 42.3 70.7 ...
##  $ TotalCharges    : num  29.9 1889.5 108.2 1840.8 151.7 ...
##  $ Churn           : chr  "No" "No" "Yes" "No" ...
##  - attr(*, "na.action")= 'omit' Named int [1:11] 489 754 937 1083 1341 3332 3827 4381 5219 6671 ...
##   ..- attr(*, "names")= chr [1:11] "489" "754" "937" "1083" ...
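
na.omit() drops every row that has an NA in any column. Since only TotalCharges has missing values here, the result matches a filter on that column alone, and the row count follows directly: 7043 - 11 = 7032. A minimal sketch of both variants on an invented data frame (`toy` is a hypothetical stand-in, not the Telco data):

```r
# Invented example data; two rows have a missing TotalCharges.
toy <- data.frame(
  tenure         = c(1L, 0L, 34L, 0L, 2L),
  MonthlyCharges = c(29.85, 52.55, 56.95, 20.25, 53.85),
  TotalCharges   = c(29.85, NA, 1889.50, NA, 108.15)
)

# Variant 1: drop rows with an NA in any column
cleaned_any <- na.omit(toy)

# Variant 2: drop rows only when TotalCharges is NA
cleaned_col <- toy[!is.na(toy$TotalCharges), ]

# Both variants keep the same rows when a single column has NAs
same_rows <- identical(rownames(cleaned_any), rownames(cleaned_col))

# Row count expected for the real data after cleaning
expected_rows <- 7043 - 11

print(same_rows)
print(expected_rows)
```

The column-specific form makes the intent explicit and would keep rows that later acquire NAs in columns the analysis does not use.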

Save cleaned dataset

# Save the cleaned dataset
write.csv(cleaned_data, "cleaned_telco_churn.csv", row.names = FALSE)

# Report the name of the saved file
cat("Dataset saved as 'cleaned_telco_churn.csv'")
## Dataset saved as 'cleaned_telco_churn.csv'
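
A quick way to confirm that the file was written as intended is to read it back and compare its shape with the object in memory. The sketch below does this with an invented data frame and a temporary file. Both `toy` and `out_path` are assumptions of the example.

```r
# Invented example data standing in for cleaned_data
toy <- data.frame(
  customerID   = c("A-1", "B-2"),
  TotalCharges = c(29.85, 1889.50),
  Churn        = c("No", "Yes"),
  stringsAsFactors = FALSE
)

# Write to a temporary file so the sketch leaves nothing behind
out_path <- tempfile(fileext = ".csv")
write.csv(toy, out_path, row.names = FALSE)

# Read the file back and compare dimensions and column names
reloaded   <- read.csv(out_path, stringsAsFactors = FALSE)
same_shape <- identical(dim(reloaded), dim(toy))
same_names <- identical(names(reloaded), names(toy))

unlink(out_path)  # remove the temporary file

print(c(same_shape = same_shape, same_names = same_names))
```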

==================================================================================================================

D. Modeling:

Packages
#install.packages("brms")
#install.packages("loo") # For model comparison
Import cleaned dataset
# loading dataset
file_path <- "cleaned_telco_churn.csv"
data <- read.csv(file_path, stringsAsFactors = FALSE)
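
Churn is read back as a character column ("Yes"/"No"). One optional step, not part of the original code, is to encode it explicitly as 0/1 so that it is unambiguous which level the model treats as the event. The column name `ChurnBinary` and the `toy` data are hypothetical.

```r
# Invented example data; the real column has the same two labels.
toy <- data.frame(
  Churn = c("No", "No", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# 1 = customer churned, 0 = customer stayed
toy$ChurnBinary <- as.integer(toy$Churn == "Yes")

# Share of churners in the example data
churn_rate <- mean(toy$ChurnBinary)

# Cross-tabulate the labels against the codes to verify the mapping
print(table(toy$Churn, toy$ChurnBinary))
print(churn_rate)
```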

A. Model 1 : Uninformative Prior

library(brms)

# Model 1: logistic regression with the default priors
model1 <- brm(
  formula = Churn ~ tenure + MonthlyCharges + TotalCharges + SeniorCitizen,
  family = bernoulli(link = "logit"),
  data = data,
  prior = set_prior("", class = "b"), # Uninformative prior
  chains = 4, iter = 200, warmup = 10, seed = 123
)
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000182 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.82 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: WARNING: No variance estimation is
## Chain 1:          performed for num_warmup < 20
## Chain 1: 
## Chain 1: Iteration:   1 / 200 [  0%]  (Warmup)
## Chain 1: Iteration:  11 / 200 [  5%]  (Sampling)
## Chain 1: Iteration:  30 / 200 [ 15%]  (Sampling)
## Chain 1: Iteration:  50 / 200 [ 25%]  (Sampling)
## Chain 1: Iteration:  70 / 200 [ 35%]  (Sampling)
## Chain 1: Iteration:  90 / 200 [ 45%]  (Sampling)
## Chain 1: Iteration: 110 / 200 [ 55%]  (Sampling)
## Chain 1: Iteration: 130 / 200 [ 65%]  (Sampling)
## Chain 1: Iteration: 150 / 200 [ 75%]  (Sampling)
## Chain 1: Iteration: 170 / 200 [ 85%]  (Sampling)
## Chain 1: Iteration: 190 / 200 [ 95%]  (Sampling)
## Chain 1: Iteration: 200 / 200 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.006 seconds (Warm-up)
## Chain 1:                0.117 seconds (Sampling)
## Chain 1:                0.123 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.000201 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 2.01 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: WARNING: No variance estimation is
## Chain 2:          performed for num_warmup < 20
## Chain 2: 
## Chain 2: Iteration:   1 / 200 [  0%]  (Warmup)
## Chain 2: Iteration:  11 / 200 [  5%]  (Sampling)
## Chain 2: Iteration:  30 / 200 [ 15%]  (Sampling)
## Chain 2: Iteration:  50 / 200 [ 25%]  (Sampling)
## Chain 2: Iteration:  70 / 200 [ 35%]  (Sampling)
## Chain 2: Iteration:  90 / 200 [ 45%]  (Sampling)
## Chain 2: Iteration: 110 / 200 [ 55%]  (Sampling)
## Chain 2: Iteration: 130 / 200 [ 65%]  (Sampling)
## Chain 2: Iteration: 150 / 200 [ 75%]  (Sampling)
## Chain 2: Iteration: 170 / 200 [ 85%]  (Sampling)
## Chain 2: Iteration: 190 / 200 [ 95%]  (Sampling)
## Chain 2: Iteration: 200 / 200 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.004 seconds (Warm-up)
## Chain 2:                0.077 seconds (Sampling)
## Chain 2:                0.081 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0.000207 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 2.07 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: WARNING: No variance estimation is
## Chain 3:          performed for num_warmup < 20
## Chain 3: 
## Chain 3: Iteration:   1 / 200 [  0%]  (Warmup)
## Chain 3: Iteration:  11 / 200 [  5%]  (Sampling)
## Chain 3: Iteration:  30 / 200 [ 15%]  (Sampling)
## Chain 3: Iteration:  50 / 200 [ 25%]  (Sampling)
## Chain 3: Iteration:  70 / 200 [ 35%]  (Sampling)
## Chain 3: Iteration:  90 / 200 [ 45%]  (Sampling)
## Chain 3: Iteration: 110 / 200 [ 55%]  (Sampling)
## Chain 3: Iteration: 130 / 200 [ 65%]  (Sampling)
## Chain 3: Iteration: 150 / 200 [ 75%]  (Sampling)
## Chain 3: Iteration: 170 / 200 [ 85%]  (Sampling)
## Chain 3: Iteration: 190 / 200 [ 95%]  (Sampling)
## Chain 3: Iteration: 200 / 200 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.005 seconds (Warm-up)
## Chain 3:                0.075 seconds (Sampling)
## Chain 3:                0.08 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 0.000223 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 2.23 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: WARNING: No variance estimation is
## Chain 4:          performed for num_warmup < 20
## Chain 4: 
## Chain 4: Iteration:   1 / 200 [  0%]  (Warmup)
## Chain 4: Iteration:  11 / 200 [  5%]  (Sampling)
## Chain 4: Iteration:  30 / 200 [ 15%]  (Sampling)
## Chain 4: Iteration:  50 / 200 [ 25%]  (Sampling)
## Chain 4: Iteration:  70 / 200 [ 35%]  (Sampling)
## Chain 4: Iteration:  90 / 200 [ 45%]  (Sampling)
## Chain 4: Iteration: 110 / 200 [ 55%]  (Sampling)
## Chain 4: Iteration: 130 / 200 [ 65%]  (Sampling)
## Chain 4: Iteration: 150 / 200 [ 75%]  (Sampling)
## Chain 4: Iteration: 170 / 200 [ 85%]  (Sampling)
## Chain 4: Iteration: 190 / 200 [ 95%]  (Sampling)
## Chain 4: Iteration: 200 / 200 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.005 seconds (Warm-up)
## Chain 4:                0.071 seconds (Sampling)
## Chain 4:                0.076 seconds (Total)
## Chain 4:
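
The log above shows each chain running 200 iterations, the first 10 of which are warm-up. Because `iter` counts warm-up and sampling together, the number of draws kept for inference can be worked out as below. The helper name `post_warmup_draws` is hypothetical, and the second call uses the brms defaults (iter = 2000, warmup = 1000) for comparison.

```r
# Number of posterior draws retained across all chains.
# `iter` is the total per chain, including warm-up.
post_warmup_draws <- function(chains, iter, warmup) {
  chains * (iter - warmup)
}

# Settings used in the run above
draws_as_run <- post_warmup_draws(chains = 4, iter = 200, warmup = 10)

# brms default settings, shown for comparison
draws_default <- post_warmup_draws(chains = 4, iter = 2000, warmup = 1000)

print(draws_as_run)
print(draws_default)
```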

Explanation :

  • Description: This model uses logistic regression to predict customer churn (Churn). It applies uninformative priors: passing an empty prior string for class "b" leaves the regression coefficients on the brms default, a flat (improper) prior, so no assumption about their size or sign is made beforehand.
  • Predictors:
    • tenure: The number of months a customer has been with the company.
    • MonthlyCharges: The monthly charges for the customer.
    • TotalCharges: The total charges incurred by the customer.
    • SeniorCitizen: Indicates whether the customer is a senior citizen (1 = Yes, 0 = No).
  • Settings:
    • Chains: 4 (so that convergence can be checked across independent chains).
    • Iterations: 200 per chain, of which 10 are warm-up, as shown in the sampling log. Stan warns that no variance estimation is performed for num_warmup < 20, so a longer run (for example iter = 2000 with warmup = 1000) is advisable before the results are interpreted.
    • Seed: 123 (to ensure reproducibility).

This model is intended to serve as a baseline without incorporating domain knowledge or prior assumptions.
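
With the logit link, the model turns a weighted sum of the predictors into a churn probability through the inverse logit function. The sketch below shows that mapping for one customer. The coefficient values are invented placeholders, not estimates from model1.

```r
# Hypothetical coefficients on the logit scale (NOT fitted values)
coefs <- c(
  Intercept      = -1.5,
  tenure         = -0.06,
  MonthlyCharges =  0.03,
  TotalCharges   =  0.0001,
  SeniorCitizen  =  0.5
)

# One customer's predictor values; Intercept = 1 multiplies the intercept
customer <- c(
  Intercept      = 1,
  tenure         = 2,
  MonthlyCharges = 70.70,
  TotalCharges   = 151.65,
  SeniorCitizen  = 0
)

# Linear predictor: sum of coefficient * predictor value
eta <- sum(coefs * customer[names(coefs)])

# Inverse logit: built-in plogis() and the explicit formula agree
p_churn  <- plogis(eta)
p_manual <- 1 / (1 + exp(-eta))

print(c(eta = eta, p_churn = p_churn))
```

For a fitted brms model the same calculation is applied to every posterior draw of the coefficients, which yields a distribution of churn probabilities rather than a single number.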

B. Model 2 : Informative Prior

# Informative priors
prior2 <- c(
  set_prior("normal(0, 5)", class = "b", coef = "MonthlyCharges"),
  set_prior("normal(0, 10)", class = "b") # Weak prior for the other coefficients
)

# Model 2: logistic regression with informative priors
model2 <- brm(
  formula = Churn ~ tenure + MonthlyCharges + TotalCharges + SeniorCitizen,
  family = bernoulli(link = "logit"),
  data = data,
  prior = prior2,
  chains = 4, iter = 200, warmup = 10, seed = 123
)
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000183 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.83 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: WARNING: No variance estimation is
## Chain 1:          performed for num_warmup < 20
## Chain 1: 
## Chain 1: Iteration:   1 / 200 [  0%]  (Warmup)
## Chain 1: Iteration:  11 / 200 [  5%]  (Sampling)
## Chain 1: Iteration:  30 / 200 [ 15%]  (Sampling)
## Chain 1: Iteration:  50 / 200 [ 25%]  (Sampling)
## Chain 1: Iteration:  70 / 200 [ 35%]  (Sampling)
## Chain 1: Iteration:  90 / 200 [ 45%]  (Sampling)
## Chain 1: Iteration: 110 / 200 [ 55%]  (Sampling)
## Chain 1: Iteration: 130 / 200 [ 65%]  (Sampling)
## Chain 1: Iteration: 150 / 200 [ 75%]  (Sampling)
## Chain 1: Iteration: 170 / 200 [ 85%]  (Sampling)
## Chain 1: Iteration: 190 / 200 [ 95%]  (Sampling)
## Chain 1: Iteration: 200 / 200 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.006 seconds (Warm-up)
## Chain 1:                0.115 seconds (Sampling)
## Chain 1:                0.121 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 0.000201 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 2.01 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: WARNING: No variance estimation is
## Chain 2:          performed for num_warmup < 20
## Chain 2: 
## Chain 2: Iteration:   1 / 200 [  0%]  (Warmup)
## Chain 2: Iteration:  11 / 200 [  5%]  (Sampling)
## Chain 2: Iteration:  30 / 200 [ 15%]  (Sampling)
## Chain 2: Iteration:  50 / 200 [ 25%]  (Sampling)
## Chain 2: Iteration:  70 / 200 [ 35%]  (Sampling)
## Chain 2: Iteration:  90 / 200 [ 45%]  (Sampling)
## Chain 2: Iteration: 110 / 200 [ 55%]  (Sampling)
## Chain 2: Iteration: 130 / 200 [ 65%]  (Sampling)
## Chain 2: Iteration: 150 / 200 [ 75%]  (Sampling)
## Chain 2: Iteration: 170 / 200 [ 85%]  (Sampling)
## Chain 2: Iteration: 190 / 200 [ 95%]  (Sampling)
## Chain 2: Iteration: 200 / 200 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.004 seconds (Warm-up)
## Chain 2:                0.077 seconds (Sampling)
## Chain 2:                0.081 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 0.000208 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 2.08 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: WARNING: No variance estimation is
## Chain 3:          performed for num_warmup < 20
## Chain 3: 
## Chain 3: Iteration:   1 / 200 [  0%]  (Warmup)
## Chain 3: Iteration:  11 / 200 [  5%]  (Sampling)
## Chain 3: Iteration:  30 / 200 [ 15%]  (Sampling)
## Chain 3: Iteration:  50 / 200 [ 25%]  (Sampling)
## Chain 3: Iteration:  70 / 200 [ 35%]  (Sampling)
## Chain 3: Iteration:  90 / 200 [ 45%]  (Sampling)
## Chain 3: Iteration: 110 / 200 [ 55%]  (Sampling)
## Chain 3: Iteration: 130 / 200 [ 65%]  (Sampling)
## Chain 3: Iteration: 150 / 200 [ 75%]  (Sampling)
## Chain 3: Iteration: 170 / 200 [ 85%]  (Sampling)
## Chain 3: Iteration: 190 / 200 [ 95%]  (Sampling)
## Chain 3: Iteration: 200 / 200 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.006 seconds (Warm-up)
## Chain 3:                0.075 seconds (Sampling)
## Chain 3:                0.081 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 0.00021 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 2.1 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: WARNING: No variance estimation is
## Chain 4:          performed for num_warmup < 20
## Chain 4: 
## Chain 4: Iteration:   1 / 200 [  0%]  (Warmup)
## Chain 4: Iteration:  11 / 200 [  5%]  (Sampling)
## Chain 4: Iteration:  30 / 200 [ 15%]  (Sampling)
## Chain 4: Iteration:  50 / 200 [ 25%]  (Sampling)
## Chain 4: Iteration:  70 / 200 [ 35%]  (Sampling)
## Chain 4: Iteration:  90 / 200 [ 45%]  (Sampling)
## Chain 4: Iteration: 110 / 200 [ 55%]  (Sampling)
## Chain 4: Iteration: 130 / 200 [ 65%]  (Sampling)
## Chain 4: Iteration: 150 / 200 [ 75%]  (Sampling)
## Chain 4: Iteration: 170 / 200 [ 85%]  (Sampling)
## Chain 4: Iteration: 190 / 200 [ 95%]  (Sampling)
## Chain 4: Iteration: 200 / 200 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.005 seconds (Warm-up)
## Chain 4:                0.071 seconds (Sampling)
## Chain 4:                0.076 seconds (Total)
## Chain 4:

Explanation :

  • Description: This model has the same structure as Model 1 but replaces the default priors with explicit normal priors, which are meant to reflect domain knowledge or previous studies. The coefficient of MonthlyCharges receives a Normal(0, 5) prior, which is tighter than the Normal(0, 10) prior on the remaining coefficients. Both priors are centered on zero, so they limit the plausible size of each effect and do not assume a direction for it.
  • Parameters:
    • The predictors (tenure, MonthlyCharges, TotalCharges, SeniorCitizen) are the same as in Model 1.
    • Informative prior for MonthlyCharges: Normal(0, 5).
    • Weak priors for other predictors: Normal(0, 10).
  • Settings:
    • Chains: 4.
    • Iterations: 200 per chain, of which 10 are warm-up (the same short run as Model 1, with the same Stan warning about num_warmup < 20).
    • Seed: 123.

This model brings prior information into the estimates, which allows a direct comparison with the uninformative Model 1.
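
Since both priors are centered on zero, they differ only in spread. One way to see what each allows is to compute its central 95% interval on the logit scale. The helper `prior_interval` is a hypothetical name introduced for this sketch.

```r
# Central interval of a zero-mean normal prior with the given standard deviation
prior_interval <- function(sd, level = 0.95) {
  tail_prob <- (1 - level) / 2
  qnorm(c(tail_prob, 1 - tail_prob), mean = 0, sd = sd)
}

# Prior on the MonthlyCharges coefficient
interval_sd5 <- prior_interval(5)

# Prior on the remaining coefficients
interval_sd10 <- prior_interval(10)

# The wider prior covers exactly twice the range of the narrower one
width_ratio <- diff(interval_sd10) / diff(interval_sd5)

print(round(interval_sd5, 2))
print(round(interval_sd10, 2))
print(width_ratio)
```

A coefficient near the edge of either interval would be a very large change in log-odds per unit of an unscaled predictor such as MonthlyCharges, so both priors are still fairly permissive unless the predictors are rescaled first.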

==================================================================================================================

F. Discussion:

The results of this statistical analysis indicate that social assistance programs in Bandung City can be considered effective in targeting districts that are most in need. The strong correlation between the number of poor families and the amount of assistance suggests that aid is allocated proportionally based on poverty levels in each district. Additionally, the strong correlations between the number of poor families and each type of assistance indicate that all types of assistance (PKH, BPNT, PBI-JK) contribute effectively to targeting poor families.

These findings align with the positive linear relationship observed in the scatter plot we created earlier, showing that there is a direct relationship between the number of poor families per district and the amount of assistance provided.

From our analysis results, it is evident that the Bandung government is implementing data-driven policies. They utilize poverty distribution maps at the district level to tailor the allocation of assistance to each district accordingly. This approach ensures that resources are directed where they are most needed, reflecting a strategic and informed approach to social assistance distribution.

However, this analysis only measures the linear relationship between the variables of social assistance and the distribution of poor families across districts. Other factors outside the scope of the available dataset, such as community participation rates, service quality, and the impact of assistance on family well-being, were not considered in this analysis.

==================================================================================================================

G. Conclusion:

In conclusion, the analysis of social assistance distribution in Bandung City reveals that these programs are not only well-targeted but also effective in reaching the most impoverished areas. This alignment with the Sustainable Development Goal of No Poverty underscores the city government’s commitment to addressing socio-economic disparities. Moving forward, continuous monitoring and evaluation will be crucial to ensure these efforts remain responsive to evolving community needs and contribute effectively to long-term poverty alleviation strategies.

==================================================================================================================

H. References:

==================================================================================================================