The dataset includes information about:
==================================================================================================================
# Get the current working directory
getwd()
## [1] "D:/BINUS UNIVERSITY/SEMESTER 5/Bayessian"
Import dataset
# loading dataset
file_path <- "WA_Fn-UseC_-Telco-Customer-Churn.csv"
data <- read.csv(file_path, stringsAsFactors = FALSE)
# Display the first few rows (kable() comes from the knitr package)
library(knitr)
kable(head(data))
| customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | No | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.50 | No |
| 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.30 | 1840.75 | No |
| 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | No | No | No | No | No | Month-to-month | Yes | Electronic check | 70.70 | 151.65 | Yes |
| 9305-CDSKC | Female | 0 | No | No | 8 | Yes | Yes | Fiber optic | No | No | Yes | No | Yes | Yes | Month-to-month | Yes | Electronic check | 99.65 | 820.50 | Yes |
Checking Dataset
str(data)
## 'data.frame': 7043 obs. of 21 variables:
## $ customerID : chr "7590-VHVEG" "5575-GNVDE" "3668-QPYBK" "7795-CFOCW" ...
## $ gender : chr "Female" "Male" "Male" "Male" ...
## $ SeniorCitizen : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Partner : chr "Yes" "No" "No" "No" ...
## $ Dependents : chr "No" "No" "No" "No" ...
## $ tenure : int 1 34 2 45 2 8 22 10 28 62 ...
## $ PhoneService : chr "No" "Yes" "Yes" "No" ...
## $ MultipleLines : chr "No phone service" "No" "No" "No phone service" ...
## $ InternetService : chr "DSL" "DSL" "DSL" "DSL" ...
## $ OnlineSecurity : chr "No" "Yes" "Yes" "Yes" ...
## $ OnlineBackup : chr "Yes" "No" "Yes" "No" ...
## $ DeviceProtection: chr "No" "Yes" "No" "Yes" ...
## $ TechSupport : chr "No" "No" "No" "Yes" ...
## $ StreamingTV : chr "No" "No" "No" "No" ...
## $ StreamingMovies : chr "No" "No" "No" "No" ...
## $ Contract : chr "Month-to-month" "One year" "Month-to-month" "One year" ...
## $ PaperlessBilling: chr "Yes" "No" "Yes" "No" ...
## $ PaymentMethod : chr "Electronic check" "Mailed check" "Mailed check" "Bank transfer (automatic)" ...
## $ MonthlyCharges : num 29.9 57 53.9 42.3 70.7 ...
## $ TotalCharges : num 29.9 1889.5 108.2 1840.8 151.7 ...
## $ Churn : chr "No" "No" "Yes" "No" ...
Checking Missing Values
# Count missing values in each column
kable(colSums(is.na(data)))
| Variable | Missing values |
|---|---|
| customerID | 0 |
| gender | 0 |
| SeniorCitizen | 0 |
| Partner | 0 |
| Dependents | 0 |
| tenure | 0 |
| PhoneService | 0 |
| MultipleLines | 0 |
| InternetService | 0 |
| OnlineSecurity | 0 |
| OnlineBackup | 0 |
| DeviceProtection | 0 |
| TechSupport | 0 |
| StreamingTV | 0 |
| StreamingMovies | 0 |
| Contract | 0 |
| PaperlessBilling | 0 |
| PaymentMethod | 0 |
| MonthlyCharges | 0 |
| TotalCharges | 11 |
| Churn | 0 |
The feature “TotalCharges” contains 11 missing values; every other column is complete.
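Before deciding whether to drop or impute the affected rows, it can help to pull them out and look at them. The sketch below runs on a small invented data frame (`toy`) that reuses some of the dataset's column names, so it works without the CSV; on the real data, `toy` would be replaced by `data`.

```r
# Invented toy data with some of the Telco dataset's column names
toy <- data.frame(
  customerID     = c("A", "B", "C", "D"),
  tenure         = c(1L, 5L, 34L, 12L),
  MonthlyCharges = c(29.85, 52.55, 56.95, 20.25),
  TotalCharges   = c(29.85, NA, 1889.50, NA),
  stringsAsFactors = FALSE
)

# Missing values per column (same idea as colSums(is.na(data)))
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Rows where TotalCharges is missing, with columns useful for diagnosing why
na_rows <- toy[is.na(toy$TotalCharges), c("customerID", "tenure", "MonthlyCharges")]
print(na_rows)
```

Looking at `tenure` and `MonthlyCharges` for these rows is one way to judge whether dropping them or filling in a value is the more sensible choice.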
==================================================================================================================
Cleaning data
# Drop rows containing NA values (only 'TotalCharges' has any)
cleaned_data <- na.omit(data)
# Re-check the dataset after cleaning
str(cleaned_data)
## 'data.frame': 7032 obs. of 21 variables:
## $ customerID : chr "7590-VHVEG" "5575-GNVDE" "3668-QPYBK" "7795-CFOCW" ...
## $ gender : chr "Female" "Male" "Male" "Male" ...
## $ SeniorCitizen : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Partner : chr "Yes" "No" "No" "No" ...
## $ Dependents : chr "No" "No" "No" "No" ...
## $ tenure : int 1 34 2 45 2 8 22 10 28 62 ...
## $ PhoneService : chr "No" "Yes" "Yes" "No" ...
## $ MultipleLines : chr "No phone service" "No" "No" "No phone service" ...
## $ InternetService : chr "DSL" "DSL" "DSL" "DSL" ...
## $ OnlineSecurity : chr "No" "Yes" "Yes" "Yes" ...
## $ OnlineBackup : chr "Yes" "No" "Yes" "No" ...
## $ DeviceProtection: chr "No" "Yes" "No" "Yes" ...
## $ TechSupport : chr "No" "No" "No" "Yes" ...
## $ StreamingTV : chr "No" "No" "No" "No" ...
## $ StreamingMovies : chr "No" "No" "No" "No" ...
## $ Contract : chr "Month-to-month" "One year" "Month-to-month" "One year" ...
## $ PaperlessBilling: chr "Yes" "No" "Yes" "No" ...
## $ PaymentMethod : chr "Electronic check" "Mailed check" "Mailed check" "Bank transfer (automatic)" ...
## $ MonthlyCharges : num 29.9 57 53.9 42.3 70.7 ...
## $ TotalCharges : num 29.9 1889.5 108.2 1840.8 151.7 ...
## $ Churn : chr "No" "No" "Yes" "No" ...
## - attr(*, "na.action")= 'omit' Named int [1:11] 489 754 937 1083 1341 3332 3827 4381 5219 6671 ...
## ..- attr(*, "names")= chr [1:11] "489" "754" "937" "1083" ...
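The `na.action` attribute at the end of the `str()` output is added by `na.omit()`: it stores the original indices of the dropped rows. Because `na.omit()` removes a row when *any* column is missing, it gives the same result as filtering on `TotalCharges` alone only when that is the sole column with NAs, which the missing-value table above confirms. A minimal sketch on invented data:

```r
# Invented toy data: only TotalCharges has missing values
toy <- data.frame(
  tenure       = c(1L, 0L, 34L, 0L, 2L),
  TotalCharges = c(29.85, NA, 1889.50, NA, 108.15)
)

# Option 1: drop rows with NA in any column
dropped_any <- na.omit(toy)
# Indices of the removed rows, kept in the "na.action" attribute
print(attr(dropped_any, "na.action"))

# Option 2: drop rows with NA in TotalCharges only
dropped_target <- toy[!is.na(toy$TotalCharges), ]

# Same rows either way, because no other column has NAs
same_rows <- identical(rownames(dropped_any), rownames(dropped_target))
print(same_rows)
```

The attribute lives only in the R object; it is not written by `write.csv()`, so the reloaded `cleaned_telco_churn.csv` will not carry it.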
Save cleaned dataset
# Save the cleaned dataset
write.csv(cleaned_data, "cleaned_telco_churn.csv", row.names = FALSE)
# Print a confirmation message
cat("Dataset saved as 'cleaned_telco_churn.csv'\n")
## Dataset saved as 'cleaned_telco_churn.csv'
==================================================================================================================
#install.packages("brms")
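#install.packages("loo") # for model comparison
# Reload the cleaned dataset
file_path <- "cleaned_telco_churn.csv"
data <- read.csv(file_path, stringsAsFactors = FALSE)
# Encode the response explicitly as 0/1 (1 = churned) for the bernoulli family
data$Churn <- as.integer(data$Churn == "Yes")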
A. Model 1: Uninformative Prior
library(brms)
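# Model 1: logistic regression with a flat prior on the slopes
model1 <- brm(
formula = Churn ~ tenure + MonthlyCharges + TotalCharges + SeniorCitizen,
family = bernoulli(link = "logit"),
data = data,
prior = set_prior("", class = "b"), # flat (improper) prior, the brms default for class "b"
# NOTE: warmup = 10 is too short for Stan's adaptation (see the warning in the log below);
# brm()'s defaults, iter = 2000 and warmup = 1000, are a safer starting point.
chains = 4, iter = 200, warmup = 10, seed = 123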
)
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000182 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.82 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: WARNING: No variance estimation is
## Chain 1: performed for num_warmup < 20
## Chain 1:
## Chain 1: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 1: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 1: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 1: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 1: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 1: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 1: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 1: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 1: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 1: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 1: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 1: Iteration: 200 / 200 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.006 seconds (Warm-up)
## Chain 1: 0.117 seconds (Sampling)
## Chain 1: 0.123 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 0.000201 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 2.01 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: WARNING: No variance estimation is
## Chain 2: performed for num_warmup < 20
## Chain 2:
## Chain 2: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 2: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 2: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 2: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 2: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 2: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 2: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 2: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 2: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 2: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 2: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 2: Iteration: 200 / 200 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.004 seconds (Warm-up)
## Chain 2: 0.077 seconds (Sampling)
## Chain 2: 0.081 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 0.000207 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 2.07 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: WARNING: No variance estimation is
## Chain 3: performed for num_warmup < 20
## Chain 3:
## Chain 3: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 3: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 3: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 3: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 3: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 3: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 3: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 3: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 3: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 3: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 3: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 3: Iteration: 200 / 200 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.005 seconds (Warm-up)
## Chain 3: 0.075 seconds (Sampling)
## Chain 3: 0.08 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 0.000223 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 2.23 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: WARNING: No variance estimation is
## Chain 4: performed for num_warmup < 20
## Chain 4:
## Chain 4: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 4: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 4: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 4: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 4: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 4: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 4: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 4: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 4: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 4: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 4: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 4: Iteration: 200 / 200 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.005 seconds (Warm-up)
## Chain 4: 0.071 seconds (Sampling)
## Chain 4: 0.076 seconds (Total)
## Chain 4:
This model serves as a baseline: the flat prior on the slope coefficients adds no domain knowledge, so their posterior is driven by the data alone.
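Written out, Model 1 is a Bayesian logistic regression over the n = 7032 customers left after cleaning. With `set_prior("", class = "b")` the four slopes get a flat (improper) prior, so, for a fixed intercept, their posterior is proportional to the likelihood; the intercept keeps its own brms default prior, which `get_prior()` lists. A sketch of the specification:

```latex
\begin{aligned}
y_i &\sim \operatorname{Bernoulli}(p_i), \qquad y_i = 1 \text{ if customer } i \text{ churned},\; i = 1,\dots,n,\\
\operatorname{logit}(p_i) &= \beta_0 + \beta_1\,\mathrm{tenure}_i + \beta_2\,\mathrm{MonthlyCharges}_i + \beta_3\,\mathrm{TotalCharges}_i + \beta_4\,\mathrm{SeniorCitizen}_i,\\
p(\beta_k) &\propto 1, \qquad k = 1,\dots,4,\\
p(\beta_1,\dots,\beta_4 \mid y, \beta_0) &\propto \prod_{i=1}^{n} p_i^{\,y_i}\,(1 - p_i)^{1 - y_i}.
\end{aligned}
```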
B. Model 2: Informative Prior
# Informative priors
prior2 <- c(
set_prior("normal(0, 5)", class = "b", coef = "MonthlyCharges"),
set_prior("normal(0, 10)", class = "b") # weaker prior for the remaining slopes
)
# Model 2: logistic regression with informative priors
model2 <- brm(
formula = Churn ~ tenure + MonthlyCharges + TotalCharges + SeniorCitizen,
family = bernoulli(link = "logit"),
data = data,
prior = prior2,
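# NOTE: warmup = 10 is too short for Stan's adaptation (see the warning in the log below);
# brm()'s defaults, iter = 2000 and warmup = 1000, are a safer starting point.
chains = 4, iter = 200, warmup = 10, seed = 123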
)
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000183 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.83 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: WARNING: No variance estimation is
## Chain 1: performed for num_warmup < 20
## Chain 1:
## Chain 1: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 1: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 1: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 1: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 1: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 1: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 1: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 1: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 1: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 1: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 1: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 1: Iteration: 200 / 200 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.006 seconds (Warm-up)
## Chain 1: 0.115 seconds (Sampling)
## Chain 1: 0.121 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 0.000201 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 2.01 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: WARNING: No variance estimation is
## Chain 2: performed for num_warmup < 20
## Chain 2:
## Chain 2: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 2: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 2: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 2: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 2: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 2: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 2: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 2: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 2: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 2: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 2: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 2: Iteration: 200 / 200 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.004 seconds (Warm-up)
## Chain 2: 0.077 seconds (Sampling)
## Chain 2: 0.081 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 0.000208 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 2.08 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: WARNING: No variance estimation is
## Chain 3: performed for num_warmup < 20
## Chain 3:
## Chain 3: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 3: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 3: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 3: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 3: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 3: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 3: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 3: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 3: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 3: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 3: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 3: Iteration: 200 / 200 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.006 seconds (Warm-up)
## Chain 3: 0.075 seconds (Sampling)
## Chain 3: 0.081 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 0.00021 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 2.1 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: WARNING: No variance estimation is
## Chain 4: performed for num_warmup < 20
## Chain 4:
## Chain 4: Iteration: 1 / 200 [ 0%] (Warmup)
## Chain 4: Iteration: 11 / 200 [ 5%] (Sampling)
## Chain 4: Iteration: 30 / 200 [ 15%] (Sampling)
## Chain 4: Iteration: 50 / 200 [ 25%] (Sampling)
## Chain 4: Iteration: 70 / 200 [ 35%] (Sampling)
## Chain 4: Iteration: 90 / 200 [ 45%] (Sampling)
## Chain 4: Iteration: 110 / 200 [ 55%] (Sampling)
## Chain 4: Iteration: 130 / 200 [ 65%] (Sampling)
## Chain 4: Iteration: 150 / 200 [ 75%] (Sampling)
## Chain 4: Iteration: 170 / 200 [ 85%] (Sampling)
## Chain 4: Iteration: 190 / 200 [ 95%] (Sampling)
## Chain 4: Iteration: 200 / 200 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.005 seconds (Warm-up)
## Chain 4: 0.071 seconds (Sampling)
## Chain 4: 0.076 seconds (Total)
## Chain 4:
This model places normal priors on the slopes, a tighter normal(0, 5) on MonthlyCharges and normal(0, 10) on the others, so its estimates can be compared directly against the uninformative baseline.
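One way to see what the Model 2 priors do: with normal(0, σ) priors on the slopes, maximizing the log-posterior equals maximum-likelihood logistic regression plus an L2 (ridge) penalty of β²/(2σ²) per slope, so the posterior mode is pulled toward zero, and the pull fades as σ grows; under a flat prior the mode is the maximum-likelihood estimate. The sketch below checks this on simulated data with base R's `optim()`. The data, coefficient values, the deliberately strong σ = 0.1, the flat intercept, and the helper names `neg_log_post`, `grad_neg_log_post` and `map_fit` are all illustrative assumptions; brms samples the full posterior with Stan rather than optimizing.

```r
set.seed(123)

# Simulated data (illustrative only): two standardized predictors
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
X  <- cbind(1, x1, x2)  # first column = intercept
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.0 * x1 - 0.8 * x2))

# Negative log-posterior: Bernoulli-logit likelihood (numerically stable form)
# plus independent normal(0, prior_sd) priors on the slopes.
# prior_sd = NULL means a flat prior; the intercept is always left flat here.
neg_log_post <- function(beta, prior_sd) {
  eta <- drop(X %*% beta)
  loglik <- sum(y * eta + plogis(-eta, log.p = TRUE))
  logprior <- 0
  if (!is.null(prior_sd)) {
    logprior <- sum(dnorm(beta[-1], mean = 0, sd = prior_sd, log = TRUE))
  }
  -(loglik + logprior)
}

# Analytic gradient of neg_log_post
grad_neg_log_post <- function(beta, prior_sd) {
  p <- plogis(drop(X %*% beta))
  g <- -drop(crossprod(X, y - p))
  if (!is.null(prior_sd)) g[-1] <- g[-1] + beta[-1] / prior_sd^2
  g
}

# Posterior mode (MAP) found with BFGS
map_fit <- function(prior_sd) {
  fit <- optim(c(0, 0, 0), neg_log_post, grad_neg_log_post,
               prior_sd = prior_sd, method = "BFGS",
               control = list(reltol = 1e-12, maxit = 1000))
  stopifnot(fit$convergence == 0)
  fit$par
}

mle       <- unname(coef(glm(y ~ x1 + x2, family = binomial)))
map_flat  <- map_fit(prior_sd = NULL)         # flat prior, as in Model 1
map_m2    <- map_fit(prior_sd = c(5, 10))     # normal(0, 5) and normal(0, 10), as in prior2
map_tight <- map_fit(prior_sd = c(0.1, 0.1))  # deliberately strong priors

print(round(rbind(mle, map_flat, map_m2, map_tight), 3))
```

With standardized predictors, priors as wide as normal(0, 5) barely move the mode away from the maximum-likelihood estimate; only much tighter priors shrink it visibly. Because the prior acts on the per-unit logit scale, how strongly it pulls in the real model also depends on the units of each predictor (for example, dollars for MonthlyCharges).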
==================================================================================================================
The results of this statistical analysis indicate that social assistance programs in Bandung City can be considered effective in targeting districts that are most in need. The strong correlation between the number of poor families and the amount of assistance suggests that aid is allocated proportionally based on poverty levels in each district. Additionally, the strong correlations between the number of poor families and each type of assistance indicate that all types of assistance (PKH, BPNT, PBI-JK) contribute effectively to targeting poor families.
These findings align with the positive linear relationship observed in the scatter plot we created earlier, showing that there is a direct relationship between the number of poor families per district and the amount of assistance provided.
From our analysis results, it is evident that the Bandung government is implementing data-driven policies. They utilize poverty distribution maps at the district level to tailor the allocation of assistance to each district accordingly. This approach ensures that resources are directed where they are most needed, reflecting a strategic and informed approach to social assistance distribution.
However, this analysis only measures the linear relationship between the variables of social assistance and the distribution of poor families across districts. Other factors outside the scope of the available dataset, such as community participation rates, service quality, and the impact of assistance on family well-being, were not considered in this analysis.
==================================================================================================================
In conclusion, the analysis of social assistance distribution in Bandung City reveals that these programs are not only well-targeted but also effective in reaching the most impoverished areas. This alignment with the Sustainable Development Goal of No Poverty underscores the city government’s commitment to addressing socio-economic disparities. Moving forward, continuous monitoring and evaluation will be crucial to ensure these efforts remain responsive to evolving community needs and contribute effectively to long-term poverty alleviation strategies.
==================================================================================================================