Members

Objectives

A. Dataset Understanding & Exploratory Data Analysis (EDA)

(Weight: ±25%)

Students are required to:

  • Describe the dataset context and analytical objectives.
  • Explain the data structure and variable types.
  • Present key descriptive statistics.
  • Identify and discuss:
    • missing values,
    • outliers,
    • data distributions.
  • Provide at least five (5) relevant data visualizations.

B. Relationship and Pattern Analysis

(Weight: ±20%)

Students are required to:

  • Analyze relationships among key variables.
  • Apply appropriate analytical techniques (e.g., correlation, regression, cross-tabulation).
  • Identify potential data issues (e.g., multicollinearity, heterogeneity).
  • Interpret analytical results clearly and logically.

C. Advanced Analysis (Context-Dependent)

(Weight: ±20%)

Students are required to apply an advanced analytical approach that is appropriate to the dataset, such as:

  • Time series analysis (if time-related variables exist),
  • Clustering or segmentation,
  • Risk or anomaly detection,
  • Classification or forecasting.

D. Analytical / Predictive Modeling

(Weight: ±25%)

Students are required to:

  • Develop at least one analytical or predictive model.
  • Explain model selection and underlying assumptions.
  • Evaluate model performance using appropriate metrics.
  • Discuss model limitations and potential improvements.

E. Insights, Conclusions, and Recommendations

(Weight: ±10%)

Students are required to:

  • Summarize key findings from the analysis.
  • Present data-driven insights.
  • Provide logical and actionable recommendations aligned with the dataset context.

Dataset

Table

Number of observations after cleaning: 4636 
Number of columns: 16 
tibble [4,636 × 16] (S3: tbl_df/tbl/data.frame)
 $ date                : Date[1:4636], format: "2010-01-01" "2010-01-02" ...
 $ patient_visits      : num [1:4636] 3.11e+14 3.34e+13 4.59e+14 3.55e+13 3.59e+14 ...
 $ staff_workload      : num [1:4636] 2.70e+14 3.23e+14 3.90e+14 3.49e+14 3.41e+14 ...
 $ avg_treatment_cost  : num [1:4636] 1.49e+14 NA 8.21e+12 4.93e+13 1.29e+14 ...
 $ bed_occupancy_rate  : num [1:4636] 6.51e+14 8.06e+14 6.91e+14 5.31e+14 7.21e+14 ...
 $ treatment_intensity : num [1:4636] 1.02e+14 4.71e+14 6.23e+14 4.16e+13 8.91e+14 ...
 $ operational_cost    : num [1:4636] 2.11e+14 2.19e+14 2.12e+14 2.34e+14 2.61e+14 ...
 $ num_procedures      : num [1:4636] 3.83e+14 4.98e+14 3.65e+14 2.46e+14 5.71e+13 ...
 $ patient_satisfaction: num [1:4636] 7.31e+14 7.76e+14 6.96e+14 6.32e+12 8.40e+14 ...
 $ efficiency_index    : num [1:4636] 1.15e+14 1.03e+14 1.18e+14 1.02e+14 1.05e+14 ...
 $ clinical_noise      : num [1:4636] -8.88e+14 3.60e+14 4.22e+14 -7.43e+14 1.80e+14 ...
 $ revenue             : num [1:4636] 3.14e+14 2.18e+14 2.51e+14 1.93e+14 2.96e+14 ...
 $ profit              : num [1:4636] 1.14e+14 8.74e+14 9.17e+14 8.09e+14 1.04e+14 ...
 $ patient_category    : chr [1:4636] "High Risk" "High Risk" "High Risk" "Low Risk" ...
 $ hospital_region     : chr [1:4636] "Central" "East" "West" "East" ...
 $ churn               : chr [1:4636] "No" "Yes" "No" "No" ...
      date            patient_visits      staff_workload     
 Min.   :2010-01-01   Min.   :3.148e+11   Min.   :2.423e+11  
 1st Qu.:2013-05-13   1st Qu.:2.805e+14   1st Qu.:2.512e+14  
 Median :2016-10-16   Median :3.402e+14   Median :3.068e+14  
 Mean   :2016-10-25   Mean   :3.162e+14   Mean   :2.875e+14  
 3rd Qu.:2020-04-06   3rd Qu.:3.913e+14   3rd Qu.:3.540e+14  
 Max.   :2023-09-09   Max.   :5.912e+14   Max.   :5.441e+14  
                                                             
 avg_treatment_cost  bed_occupancy_rate  treatment_intensity
 Min.   :5.999e+10   Min.   :5.618e+10   Min.   :1.001e+11  
 1st Qu.:1.188e+14   1st Qu.:5.483e+14   1st Qu.:2.904e+14  
 Median :3.047e+14   Median :6.329e+14   Median :5.324e+14  
 Mean   :3.755e+14   Mean   :5.850e+14   Mean   :5.195e+14  
 3rd Qu.:6.459e+14   3rd Qu.:7.050e+14   3rd Qu.:7.793e+14  
 Max.   :9.994e+14   Max.   :9.814e+14   Max.   :9.999e+14  
 NA's   :281                                                
 operational_cost    num_procedures      patient_satisfaction
 Min.   :8.000e+04   Min.   :2.202e+11   Min.   :1.000e+02   
 1st Qu.:1.606e+14   1st Qu.:1.990e+14   1st Qu.:6.461e+14   
 Median :1.933e+14   Median :3.326e+14   Median :7.335e+14   
 Mean   :1.835e+14   Mean   :3.322e+14   Mean   :6.681e+14   
 3rd Qu.:2.233e+14   3rd Qu.:4.759e+14   3rd Qu.:8.081e+14   
 Max.   :9.896e+14   Max.   :9.981e+14   Max.   :9.990e+14   
 NA's   :327                                                 
 efficiency_index    clinical_noise          revenue         
 Min.   :1.011e+11   Min.   :-9.996e+14   Min.   :3.129e+10  
 1st Qu.:1.056e+14   1st Qu.:-2.436e+14   1st Qu.:2.181e+14  
 Median :1.111e+14   Median :-8.603e+12   Median :2.581e+14  
 Mean   :1.400e+14   Mean   : 1.522e+12   Mean   :2.409e+14  
 3rd Qu.:1.168e+14   3rd Qu.: 2.563e+14   3rd Qu.:2.967e+14  
 Max.   :9.998e+14   Max.   : 9.998e+14   Max.   :9.423e+14  
                                                             
     profit          patient_category   hospital_region       churn          
 Min.   :1.137e+11   Length:4636        Length:4636        Length:4636       
 1st Qu.:1.087e+14   Class :character   Class :character   Class :character  
 Median :6.674e+14   Mode  :character   Mode  :character   Mode  :character  
 Mean   :5.086e+14                                                           
 3rd Qu.:8.574e+14                                                           
 Max.   :1.000e+15                                                           
                                                                             
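The summary above reports 281 missing values for avg_treatment_cost and 327 for operational_cost out of 4,636 rows. As a minimal sketch (in Python rather than the R that produced the output, with hypothetical helper names `missing_share` and `tukey_fences`), the missing-value shares and outlier fences can be computed directly from the printed quartiles. The 1.5 × IQR multiplier is the usual Tukey convention and is an assumption here; the report does not state which outlier rule was used.

```python
def missing_share(n_missing: int, n_rows: int) -> float:
    """Fraction of rows with a missing value."""
    return n_missing / n_rows


def tukey_fences(q1: float, q3: float, k: float = 1.5):
    """Lower and upper outlier fences from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


N_ROWS = 4636  # observations after cleaning, from the output above

# NA counts printed by summary()
print(f"avg_treatment_cost missing: {missing_share(281, N_ROWS):.2%}")
print(f"operational_cost missing:   {missing_share(327, N_ROWS):.2%}")

# Quartiles, minimum and maximum of operational_cost as printed by summary()
low, high = tukey_fences(q1=1.606e14, q3=2.233e14)
has_low_outliers = 8.000e04 < low    # printed minimum falls below the lower fence
has_high_outliers = 9.896e14 > high  # printed maximum falls above the upper fence
print(has_low_outliers, has_high_outliers)
```

Under this rule both the printed minimum and maximum of operational_cost lie outside the fences, so the column would need an outlier check before modeling.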

EDA

Figures (not reproduced): line + marker chart, histogram, scatter plot, bubble chart, bar chart.

Regression

Valid data for regression. Rows available: 4043 
# A tibble: 6 × 6
  term                estimate std.error statistic  p.value significance
  <chr>                  <dbl>     <dbl>     <dbl>    <dbl> <chr>       
1 (Intercept)         3.50e+14  3.07e+13    11.4   1.26e-29 "***"       
2 patient_visits     -1.12e- 1  5.24e- 2    -2.14  3.26e- 2 "*"         
3 avg_treatment_cost  4.16e- 1  2.06e- 2    20.2   1.43e-85 "***"       
4 operational_cost    9.94e- 2  8.13e- 2     1.22  2.22e- 1 ""          
5 bed_occupancy_rate  2.13e- 2  2.98e- 2     0.713 4.76e- 1 ""          
6 efficiency_index    3.91e- 2  3.43e- 2     1.14  2.55e- 1 ""          
  Dataset         RMSE         MAE
1   Train 3.510826e+14 3.15377e+14
2    Test 3.537675e+14 3.16967e+14
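In the coefficient table above, the statistic column is the estimate divided by its standard error, and the significance column appears to follow R's conventional star codes (`***` below 0.001, `**` below 0.01, `*` below 0.05, `.` below 0.1). The Python sketch below reproduces both for three rows. The cut-offs are assumed from that convention, since the report does not print the legend, and the function names are hypothetical.

```python
def t_statistic(estimate: float, std_error: float) -> float:
    """Test statistic for a single coefficient."""
    return estimate / std_error


def significance_code(p_value: float) -> str:
    """Star code for a p-value, using the conventional R cut-offs (assumed)."""
    for cutoff, code in ((0.001, "***"), (0.01, "**"), (0.05, "*"), (0.1, ".")):
        if p_value < cutoff:
            return code
    return ""


# (estimate, std.error, p.value) copied from the regression table above
rows = {
    "patient_visits": (-1.12e-1, 5.24e-2, 3.26e-2),
    "avg_treatment_cost": (4.16e-1, 2.06e-2, 1.43e-85),
    "operational_cost": (9.94e-2, 8.13e-2, 2.22e-1),
}

for term, (est, se, p) in rows.items():
    print(term, round(t_statistic(est, se), 2), repr(significance_code(p)))
```

Because the table prints rounded estimates and standard errors, the recomputed statistics can differ from the printed ones in the last digit.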

[Charts 1 to 5: figures not reproduced]

Classification

Valid data. Rows available: 4262 

  No  Yes 
3927  335 
Train set: 2984 rows
Test set: 1278 rows
# A tibble: 7 × 6
  term                  estimate std.error statistic  p.value significance
  <chr>                    <dbl>     <dbl>     <dbl>    <dbl> <chr>       
1 (Intercept)          -2.58e+ 0  3.81e- 1    -6.76  1.38e-11 "***"       
2 patient_visits       -1.58e-16  5.69e-16    -0.278 7.81e- 1 ""          
3 avg_treatment_cost    1.40e-15  2.17e-16     6.48  9.29e-11 "***"       
4 operational_cost     -4.01e-16  8.76e-16    -0.457 6.47e- 1 ""          
5 bed_occupancy_rate   -4.45e-17  3.32e-16    -0.134 8.94e- 1 ""          
6 patient_satisfaction -4.90e-16  2.59e-16    -1.89  5.84e- 2 "."         
7 efficiency_index     -1.97e-16  3.98e-16    -0.495 6.21e- 1 ""          

Confusion Matrix:
  Prediction Reference Freq
1         No        No 1178
2        Yes        No    0
3         No       Yes  100
4        Yes       Yes    0

Evaluation Metrics:
                 Metric     Value
Accuracy       Accuracy 0.9217527
Kappa             Kappa 0.0000000
Sensitivity Sensitivity 0.0000000
Specificity Specificity 1.0000000
Precision     Precision        NA
Recall           Recall 0.0000000
F1                   F1        NA
AUC                 AUC 0.6508913
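The confusion matrix above shows that the model predicts "No" for every test case, which explains the zero sensitivity and kappa and the undefined precision and F1. The Python sketch below recomputes the metrics from the four counts. It treats "Yes" (churn) as the positive class, an assumption that is consistent with the reported sensitivity of 0 and specificity of 1; the helper `safe_div` is hypothetical.

```python
import math

# Counts from the confusion matrix above ("Yes" assumed to be the positive class)
tn, fp, fn, tp = 1178, 0, 100, 0
n = tn + fp + fn + tp


def safe_div(num: float, den: float) -> float:
    """Division that returns NaN instead of raising when the denominator is 0."""
    return num / den if den else math.nan


accuracy = (tp + tn) / n
sensitivity = safe_div(tp, tp + fn)  # recall for the "Yes" class
specificity = safe_div(tn, tn + fp)
precision = safe_div(tp, tp + fp)    # 0/0 because no case is predicted "Yes"

# Accuracy of always predicting the majority class on the test set
baseline_accuracy = max(tn + fp, tp + fn) / n

# Cohen's kappa: observed agreement versus agreement expected by chance
expected = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
kappa = safe_div(accuracy - expected, 1 - expected)

print(f"accuracy={accuracy:.7f} baseline={baseline_accuracy:.7f}")
print(f"sensitivity={sensitivity} specificity={specificity}")
print(f"precision={precision} kappa={kappa}")
```

The accuracy equals the majority-class baseline, so on this test set the 92% figure reflects the class imbalance (100 "Yes" against 1178 "No") rather than any ability to identify churn.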

[Charts 1 to 5: figures not reproduced]

Clustering

Clean clustering data. Observations: 4262 Variables: 5 
# A tibble: 3 × 7
  cluster patient_visits_mean avg_treatment_cost_mean operational_cost_mean
  <fct>                 <dbl>                   <dbl>                 <dbl>
1 1                   3.16e14                 3.97e14               1.81e14
2 2                   3.22e14                 6.97e14               1.81e14
3 3                   3.10e14                 1.45e14               1.85e14
# ℹ 3 more variables: bed_occupancy_rate_mean <dbl>,
#   patient_satisfaction_mean <dbl>, n_obs <int>

[Chart 1: figure not reproduced]

Time Series

Valid data. Rows available: 4636 
Dataset runs from: 14610 to 19609 (days since 1970-01-01, i.e. 2010-01-01 to 2023-09-09)
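The start and end values 14610 and 19609 in the output above are R Date values printed as plain numbers, that is, days counted from 1970-01-01. A short Python sketch of the conversion (the function name `from_day_count` is hypothetical) recovers the calendar dates, which match the minimum and maximum of the date column in the dataset summary.

```python
from datetime import date, timedelta

R_DATE_ORIGIN = date(1970, 1, 1)  # R stores Date values as days since this origin


def from_day_count(days: int) -> date:
    """Convert a day count relative to 1970-01-01 into a calendar date."""
    return R_DATE_ORIGIN + timedelta(days=days)


start = from_day_count(14610)
end = from_day_count(19609)
span_days = (end - start).days + 1  # calendar days in the span, inclusive

print(start.isoformat(), end.isoformat(), span_days)
```

The span covers 5,000 calendar days, of which 4,636 remain as rows after cleaning.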

    Augmented Dickey-Fuller Test

data:  ts_profit
Dickey-Fuller = -15.974, Lag order = 16, p-value = 0.01
alternative hypothesis: stationary
The data are stationary.
Series: ts_train 
ARIMA(5,1,0) 

Coefficients:
          ar1      ar2      ar3      ar4      ar5
      -0.8504  -0.6615  -0.5032  -0.3423  -0.1622
s.e.   0.0162   0.0207   0.0218   0.0207   0.0162

sigma^2 = 153129:  log likelihood = -27387.03
AIC=54786.06   AICc=54786.08   BIC=54823.37

Training set error measures:
                     ME     RMSE      MAE      MPE    MAPE      MASE
Training set 0.09331779 391.0008 345.7127 -506.767 547.346 0.8714264
                    ACF1
Training set -0.02368917
                ME     RMSE      MAE       MPE     MAPE       ACF1 Theil's U
Test set -78.23293 391.1101 385.1575 -552.4264 587.2661 0.01419953  1.771845
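The reported AIC can be checked against the log likelihood with AIC = -2 × logLik + 2k. The Python sketch below assumes k = 6 estimated parameters (the five AR coefficients plus the innovation variance), which is the usual count for a fit of this kind without a drift term; the function name `aic` is hypothetical.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion from a log likelihood and parameter count."""
    return -2.0 * log_likelihood + 2.0 * n_params


LOG_LIK = -27387.03  # log likelihood printed for ARIMA(5,1,0)
N_PARAMS = 5 + 1     # ar1..ar5 plus sigma^2 (assumed parameter count)

print(round(aic(LOG_LIK, N_PARAMS), 2))
```

With these inputs the result agrees with the printed AIC of 54786.06.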

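INTERPRETATION & NOTES:
1. The series is ordered by date (the dataset table lists consecutive daily dates starting 2010-01-01, so the data are daily rather than monthly), and profit was rescaled for numerical stability.
2. The ADF test rejects the unit-root null (Dickey-Fuller = -15.974, p-value = 0.01), so the series is treated as stationary and suitable for ARIMA modeling; auto.arima nevertheless applied one difference (d = 1).
3. Time series components (trend, seasonal, noise) can be examined with decompose.
4. The ARIMA model was selected automatically by auto.arima, which chose ARIMA(5,1,0).
5. Forecast accuracy is evaluated with MAE, RMSE, and MAPE: on the test set, MAE = 385.16, RMSE = 391.11, and MAPE = 587.27%. Theil's U is 1.77, and a value above 1 indicates forecasts less accurate than a naive forecast.
6. The ARIMA prediction intervals indicate the uncertainty of the forecasts.
7. Potential sources of uncertainty: outliers, external factors, and structural changes.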
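Item 5 names MAE, RMSE, and MAPE. MAPE divides each error by the actual value, so actual values close to zero are one plausible reason for MAPE figures in the hundreds of percent, such as those in the output above, while MAE and RMSE stay moderate. The Python sketch below illustrates the three metrics and that effect with made-up numbers, not the report's data; the function names are hypothetical.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error; undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical values: every forecast misses by 10, but one actual is near zero
actual = [100.0, 200.0, 2.0]
forecast = [110.0, 190.0, 12.0]

print(mae(actual, forecast), rmse(actual, forecast), round(mape(actual, forecast), 2))
```

In this toy case MAE and RMSE are both 10, while the single near-zero actual value pushes MAPE above 170%.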

[Charts 1 to 4: figures not reproduced]


Insights

Students are required to: