1 Overview (Read This First)

Goal: Understand the dataset well enough to justify later modeling choices.
This lab is not about building predictive models.

Dataset: week2_churn_data.csv

1.1 What counts as a “substantive insight” (required)

An insight must include:

  1. Evidence (a plot or statistic)
  2. A clear pattern (direction and/or magnitude)
  3. Business meaning (why it matters)

Use this sentence frame:

  “Customers with ___ show ___ compared to ___, which suggests ___.”
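The evidence part of an insight is often a simple grouped statistic. Below is a minimal base-R sketch. It assumes a data frame shaped like `churn`, with a Yes/No `churn` column and a categorical predictor such as `contract_type`. The helper name `churn_rate_by()` is hypothetical and not part of the lab.

```r
# Hypothetical helper: churn rate (share of "Yes") within each level of a grouping column.
churn_rate_by <- function(df, group, outcome = "churn") {
  is_churn <- df[[outcome]] == "Yes"            # logical: TRUE where the customer churned
  rates <- tapply(is_churn, df[[group]], mean)  # mean of a logical = proportion TRUE
  counts <- table(df[[group]])                  # group sizes, to judge how stable each rate is
  data.frame(
    level = names(rates),
    n = as.integer(counts[names(rates)]),
    churn_rate = round(as.numeric(rates), 3),
    stringsAsFactors = FALSE
  )
}
```

Calling `churn_rate_by(churn, "contract_type")` returns one row per contract type. The `n` and `churn_rate` columns supply the evidence and the pattern for the first two blanks of the sentence frame. The business meaning still has to be argued in words.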


2 Setup
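The chunks that follow assume a data frame named `churn` already exists. Below is a minimal loading sketch. It assumes the CSV sits in the working directory and is read with base R's `read.csv()`. The `str()` output further down shows character (not factor) text columns, which is `read.csv()`'s default from R 4.0 onward. The wrapper `load_churn()` is hypothetical.

```r
# Hypothetical loader for the lab dataset; assumes the CSV is in the working directory.
load_churn <- function(path = "week2_churn_data.csv") {
  if (!file.exists(path)) {
    stop("Cannot find ", path, "; check getwd() or pass the full path.")
  }
  # stringsAsFactors = FALSE keeps text columns as character (the default since R 4.0)
  read.csv(path, stringsAsFactors = FALSE)
}
```

Typical use is `churn <- load_churn()` followed by `str(churn)` to confirm the 400 rows and 8 columns.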


3 Part 1: Data Familiarization

3.1 Task 1.1: Identify the outcome variable

In this dataset, the outcome variable is churn (values: Yes/No).

# Confirm churn values
table(churn$churn)
## 
##  No Yes 
## 290 110
# Create a 0/1 version for calculations (Yes=1, No=0)
churn01 <- ifelse(churn$churn == "Yes", 1, 0)

table(churn01)
## churn01
##   0   1 
## 290 110
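Coding churn as 0/1 turns the overall churn rate into a plain mean. The sketch below is a minimal check that uses only the counts tabulated above (290 No, 110 Yes). The vector is rebuilt from those counts rather than read from the file.

```r
# Rebuild a vector with the same counts as table(churn$churn): 290 "No", 110 "Yes"
churn_values <- c(rep("No", 290), rep("Yes", 110))

# Same 0/1 encoding as above (Yes = 1, No = 0)
churn01 <- ifelse(churn_values == "Yes", 1, 0)

# The mean of a 0/1 variable is the proportion of 1s, i.e. the churn rate
churn_rate <- mean(churn01)
churn_rate  # 110 / 400 = 0.275
```

Because only the counts matter, `mean(churn01)` on the real column gives the same 0.275. That base rate is worth recording for later modeling choices. A model that always predicts "No" would already be 72.5% accurate (290 / 400), so accuracy alone is a weak yardstick here.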
str(churn)
## 'data.frame':    400 obs. of  8 variables:
##  $ customer_id    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ tenure_months  : int  49 29 35 1 23 31 24 9 3 11 ...
##  $ monthly_charges: num  68.6 77.4 80.2 58.4 50.8 ...
##  $ contract_type  : chr  "One year" "One year" "Two year" "One year" ...
##  $ online_security: chr  "Yes" "Yes" "Yes" "Yes" ...
##  $ tech_support   : chr  "Yes" "Yes" "No" "Yes" ...
##  $ churn          : chr  "Yes" "Yes" "No" "No" ...
##  $ total_charges  : num  3691 2114.3 2915.4 57.1 1259.1 ...

3.2 Task 1.2: Identify predictors and their types

Task: Create a short table (or bullet list) showing each variable’s:

  • role (Outcome / Predictor)
  • type (numeric / categorical / binary)

Hint: sapply(churn, class) helps.

var_types <- sapply(churn, class)
var_types
##     customer_id   tenure_months monthly_charges   contract_type online_security 
##       "integer"       "integer"       "numeric"     "character"     "character" 
##    tech_support           churn   total_charges 
##     "character"     "character"       "numeric"

Your table/list (fill in):

  • Outcome:
    • churn: binary categorical (Yes/No)
  • Predictors:
    • customer_id: integer; an identifier, not a true predictor (exclude from modeling)
    • tenure_months: numeric (integer months)
    • monthly_charges: numeric
    • total_charges: numeric
    • contract_type: categorical (Month-to-month / One year / Two year)
    • online_security: binary categorical (Yes/No)
    • tech_support: binary categorical (Yes/No)
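The role/type list above can also be generated from the data, which helps avoid typos when a dataset has many columns. Below is a minimal sketch that assumes three broad types (numeric, binary, categorical). It treats any non-numeric column with exactly two distinct non-missing values as binary. The function `describe_vars()` and its `id_cols` argument are hypothetical.

```r
# Hypothetical helper: tabulate each column's modeling role and broad type.
describe_vars <- function(df, outcome = "churn", id_cols = "customer_id") {
  vars <- names(df)
  # Broad type: numeric columns stay numeric; other columns are binary if they
  # have exactly two distinct non-missing values, otherwise categorical.
  type <- vapply(df, function(col) {
    if (is.numeric(col)) {
      "numeric"
    } else if (length(unique(na.omit(col))) == 2) {
      "binary"
    } else {
      "categorical"
    }
  }, character(1))
  # Role: the outcome, identifier columns, and everything else as predictors
  role <- ifelse(vars == outcome, "Outcome",
                 ifelse(vars %in% id_cols, "ID", "Predictor"))
  data.frame(variable = vars, role = role, type = unname(type),
             stringsAsFactors = FALSE)
}
```

Running `describe_vars(churn)` should give one row per column. Compare its output against the hand-written list, especially for `customer_id`, which R stores as an integer but which carries no predictive meaning.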

4 Part 2: Data Quality Assessment

4.1 Task 2.1: Missing values

The original dataset does not contain missing values. To practice realistic data preparation, you will intentionally introduce a small amount of missingness into one numeric variable and then assess and address it.
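The introduce, assess, and address loop can be packaged as three small functions. Below is a minimal sketch that assumes values are removed completely at random. It uses median imputation plus a missingness flag as the fix. That is one common choice, not necessarily the method this lab expects. All three helper names are hypothetical.

```r
# Hypothetical helper: set a random share of a vector to NA (missing completely at random).
# Note: calling set.seed() here also resets the session's random number stream.
inject_missing <- function(x, prop = 0.05, seed = 123) {
  set.seed(seed)
  idx <- sample(seq_along(x), size = round(prop * length(x)))
  x[idx] <- NA
  x
}

# Hypothetical helper: report how much of a vector is missing.
missing_summary <- function(x) {
  c(n_missing = sum(is.na(x)), pct_missing = round(100 * mean(is.na(x)), 1))
}

# Hypothetical helper: fill NAs with the median of the observed values and
# return a flag so the imputed rows remain identifiable.
impute_median <- function(x) {
  was_missing <- is.na(x)
  x[was_missing] <- median(x, na.rm = TRUE)
  list(value = x, was_missing = was_missing)
}
```

`inject_missing(churn$monthly_charges)` uses the same seed and 5% share as the chunk that follows, so on 400 rows it blanks 20 values. `missing_summary()` then reports them, and `impute_median()` fills them. The median is used instead of the mean because it is less affected by a few unusually high charges. Keeping `was_missing` lets a later model test whether missingness itself carries signal.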

# Create a working copy (do NOT overwrite original)
churn_work <- churn

# Introduce ~5% missingness in monthly_charges
set.seed(123)
n <- nrow(churn_work)
n
## [1] 400
missing_index <- sample(1:n, size = round(0.05 * n))

churn_work$monthly_charges[missing_index] <- NA
churn_work
##     customer_id tenure_months monthly_charges  contract_type online_security
## 1             1            49           68.63       One year             Yes
## 2             2            29           77.40       One year             Yes
## 3             3            35           80.22       Two year             Yes
## 4             4             1           58.44       One year             Yes
## 5             5            23           50.76 Month-to-month             Yes
## 6             6            31           61.05 Month-to-month              No
## 7             7            24           43.53       Two year              No
## 8             8             9           59.97       One year              No
## 9             9             3           89.90 Month-to-month              No
## 10           10            11           53.75       One year             Yes
## 11           11            19          115.79       One year             Yes
## 12           12            34           69.14       Two year              No
## 13           13            15           37.72       Two year             Yes
## 14           14            32              NA       Two year             Yes
## 15           15            22           83.67       Two year             Yes
## 16           16            36           64.54 Month-to-month              No
## 17           17            13           35.08       Two year             Yes
## 18           18            57           81.94       One year              No
## 19           19            14           98.28 Month-to-month              No
## 20           20            59           41.90       Two year              No
## 21           21            42           56.64       One year              No
## 22           22            20           78.47       Two year             Yes
## 23           23            32           62.67       Two year              No
## 24           24            12           54.29       Two year             Yes
## 25           25            40           63.34       Two year              No
## 26           26            18           76.28       Two year             Yes
## 27           27             9           78.40       One year             Yes
## 28           28            44           38.92       Two year              No
## 29           29             1           56.97 Month-to-month              No
## 30           30            37           78.12 Month-to-month              No
## 31           31            31           29.17       Two year              No
## 32           32             6           61.54       Two year              No
## 33           33            40           76.39       One year             Yes
## 34           34            46           67.00 Month-to-month              No
## 35           35            15          134.90       One year              No
## 36           36            55           75.86       Two year              No
## 37           37            13           51.15       Two year              No
## 38           38            10           94.28       Two year              No
## 39           39            56           40.47       One year              No
## 40           40            13           94.64       One year             Yes
## 41           41            27           66.64       Two year              No
## 42           42             7           56.31 Month-to-month             Yes
## 43           43            13           71.02 Month-to-month              No
## 44           44            12           32.93       One year             Yes
## 45           45            24           77.50 Month-to-month              No
## 46           46            25           54.65 Month-to-month             Yes
## 47           47            40           88.87       Two year              No
## 48           48            15           40.06       Two year              No
## 49           49            30           67.43 Month-to-month             Yes
## 50           50            18           21.47       Two year              No
## 51           51            49           81.36       One year             Yes
## 52           52            21           92.31 Month-to-month             Yes
## 53           53            53           52.79       Two year              No
## 54           54            50           48.51       One year              No
## 55           55            51           94.60 Month-to-month              No
## 56           56            39           86.98 Month-to-month              No
## 57           57            50           59.32       One year              No
## 58           58            32           55.75 Month-to-month             Yes
## 59           59            20           59.79       Two year              No
## 60           60             7           28.41 Month-to-month              No
## 61           61            34          103.55       One year             Yes
## 62           62            17           27.16 Month-to-month              No
## 63           63            15           89.82       Two year             Yes
## 64           64             8           75.86       One year              No
## 65           65            40           62.77 Month-to-month             Yes
## 66           66            19           91.32       Two year             Yes
## 67           67            32           87.64 Month-to-month              No
## 68           68            51           78.05       One year              No
## 69           69            33           91.46       Two year              No
## 70           70            21          102.56       Two year              No
## 71           71            24           49.73       Two year             Yes
## 72           72            16           51.57 Month-to-month              No
## 73           73            49           75.46       One year             Yes
## 74           74            23           89.42       Two year             Yes
## 75           75            22           86.92       One year              No
## 76           76            40           94.11       Two year             Yes
## 77           77            33           73.91 Month-to-month             Yes
## 78           78            56           54.97       One year             Yes
## 79           79             5           76.33       Two year             Yes
## 80           80            25           90.99       Two year             Yes
## 81           81            51           64.07 Month-to-month              No
## 82           82            48           51.68       Two year             Yes
## 83           83            14           27.45       Two year              No
## 84           84            46           45.40 Month-to-month              No
## 85           85            54           90.77       Two year              No
## 86           86            57           65.79 Month-to-month             Yes
## 87           87            22           61.03 Month-to-month             Yes
## 88           88            28           53.73       One year             Yes
## 89           89            28           60.05 Month-to-month             Yes
## 90           90            19              NA       Two year              No
## 91           91            47              NA       One year             Yes
## 92           92            38           29.03       One year              No
## 93           93             1           66.34       Two year             Yes
## 94           94            33           81.98       Two year              No
## 95           95            29           48.52       Two year             Yes
## 96           96             8           52.56       One year             Yes
## 97           97            45          111.56       Two year             Yes
## 98           98            53           59.50 Month-to-month              No
## 99           99            29           60.34       Two year              No
## 100         100            31           60.17 Month-to-month              No
## 101         101            51           62.68       Two year              No
## 102         102            45           37.16       Two year             Yes
## 103         103             4           52.19       One year              No
## 104         104            56           68.54       One year              No
## 105         105            51           70.21       One year             Yes
## 106         106            19           78.47       Two year             Yes
## 107         107            21           69.02 Month-to-month             Yes
## 108         108             8           58.59       Two year              No
## 109         109            10           76.01       One year              No
## 110         110            18           47.36 Month-to-month             Yes
## 111         111            17          109.45       Two year             Yes
## 112         112            57           64.15       Two year              No
## 113         113            35           57.52 Month-to-month             Yes
## 114         114            52           33.38 Month-to-month              No
## 115         115            44          101.08       One year              No
## 116         116            22           65.55       One year              No
## 117         117            35           66.91       Two year             Yes
## 118         118            30              NA       One year              No
## 119         119            37           73.39       Two year             Yes
## 120         120            10           81.95 Month-to-month             Yes
## 121         121            12           85.88 Month-to-month              No
## 122         122            27           63.99       Two year             Yes
## 123         123            50           48.87 Month-to-month             Yes
## 124         124            31           91.33       Two year             Yes
## 125         125            13           43.47 Month-to-month              No
## 126         126            43           40.06 Month-to-month              No
## 127         127            49           62.28 Month-to-month             Yes
## 128         128            54           58.20       Two year              No
## 129         129            30           70.17       One year             Yes
## 130         130            35           81.87 Month-to-month             Yes
## 131         131            43           46.93 Month-to-month              No
## 132         132             3           72.48       One year             Yes
## 133         133            56           59.20       One year              No
## 134         134             3           90.66       One year             Yes
## 135         135            21           74.80       Two year             Yes
## 136         136             2           58.08 Month-to-month              No
## 137         137            27              NA Month-to-month              No
## 138         138            44           57.12       Two year              No
## 139         139            39           55.78       Two year              No
## 140         140             8           66.48       One year              No
## 141         141            22           74.01 Month-to-month              No
## 142         142            16           26.20       Two year             Yes
## 143         143            34           53.24       One year              No
## 144         144             4           81.79       Two year              No
## 145         145            44           82.38       One year              No
## 146         146            15           52.60       One year              No
## 147         147            13           90.15       Two year              No
## 148         148            21           77.27       Two year              No
## 149         149             3           63.18       Two year              No
## 150         150            31           85.65 Month-to-month             Yes
## 151         151            31           79.17 Month-to-month              No
## 152         152            42           13.08 Month-to-month              No
## 153         153             1              NA       One year             Yes
## 154         154            53           76.38 Month-to-month              No
## 155         155            27           51.27 Month-to-month             Yes
## 156         156            51           83.46       One year              No
## 157         157            50           78.84       One year              No
## 158         158             3           76.97 Month-to-month              No
## 159         159            49           49.48       One year              No
## 160         160            54           93.60       Two year              No
## 161         161            37           65.55       Two year             Yes
## 162         162             7           31.73       Two year              No
## 163         163            10           96.88       One year              No
## 164         164            51           82.24 Month-to-month             Yes
## 165         165            43           74.98       Two year              No
## 166         166            58           56.00       One year             Yes
## 167         167            17           72.09       Two year              No
## 168         168            41           65.44 Month-to-month             Yes
## 169         169             6          104.19 Month-to-month              No
## 170         170            21           50.49       Two year              No
## 171         171            11           94.68       One year             Yes
## 172         172            53           62.76 Month-to-month             Yes
## 173         173            42           86.30 Month-to-month             Yes
## 174         174            15           41.25 Month-to-month              No
## 175         175            43           85.74       Two year              No
## 176         176             9           78.35       Two year              No
## 177         177            57           83.32 Month-to-month              No
## 178         178            48           67.86 Month-to-month             Yes
## 179         179            39              NA       Two year             Yes
## 180         180            18           75.40       One year              No
## 181         181            20           81.21       One year             Yes
## 182         182             3           66.92 Month-to-month              No
## 183         183            58           63.79       Two year              No
## 184         184             5           90.14       One year             Yes
## 185         185            59           77.81       Two year              No
## 186         186             6           35.44 Month-to-month              No
## 187         187            54          119.57 Month-to-month              No
## 188         188            43          100.70       One year              No
## 189         189            23           81.21       One year             Yes
## 190         190            16           57.52       Two year              No
## 191         191            54           40.96       One year             Yes
## 192         192            34           73.97 Month-to-month              No
## 193         193            34          121.40       Two year              No
## 194         194             3           29.87       Two year              No
## 195         195            24              NA       One year             Yes
## 196         196            27           62.30       Two year             Yes
## 197         197            27              NA Month-to-month             Yes
## 198         198            27          104.31       One year             Yes
## 199         199            50           61.03 Month-to-month             Yes
## 200         200             9           63.47 Month-to-month             Yes
## 201         201            42           91.15 Month-to-month             Yes
## 202         202            32           35.81       Two year             Yes
## 203         203             8           62.90 Month-to-month              No
## 204         204            40           78.80       One year             Yes
## 205         205            47           57.06 Month-to-month             Yes
## 206         206            28           85.18       Two year             Yes
## 207         207            39           77.17 Month-to-month              No
## 208         208            16           71.90       Two year             Yes
## 209         209            55           86.52       Two year              No
## 210         210            24           91.93       One year              No
## 211         211             1           71.79       One year              No
## 212         212            22           52.00       One year              No
## 213         213            35           63.46       One year             Yes
## 214         214            44          114.32       One year              No
## 215         215             9           69.37       One year             Yes
## 216         216            35           84.74       One year             Yes
## 217         217            50           83.47 Month-to-month              No
## 218         218            16          103.75       Two year             Yes
## 219         219            33           48.24 Month-to-month              No
## 220         220            45           57.02 Month-to-month             Yes
## 221         221             9           60.29       Two year             Yes
## 222         222            57           49.18 Month-to-month             Yes
## 223         223            32           60.30       One year              No
## 224         224            55           78.67 Month-to-month             Yes
## 225         225            12           64.80       One year              No
## 226         226            53           64.08 Month-to-month             Yes
## 227         227            22           57.56       Two year              No
## 228         228            25           57.48       One year             Yes
## 229         229            53              NA       One year              No
## 230         230            46           72.99 Month-to-month             Yes
## 231         231            27           63.68       One year              No
## 232         232            41           54.32 Month-to-month              No
## 233         233            36           68.72       Two year             Yes
## 234         234            27           75.37       One year             Yes
## 235         235            29           71.38       Two year              No
## 236         236            45           40.60       One year              No
## 237         237            20           46.90       One year              No
## 238         238            45           61.20 Month-to-month              No
## 239         239            13           76.04       Two year              No
## 240         240            17           23.92 Month-to-month             Yes
## 241         241            18           93.38       Two year             Yes
## 242         242            15           43.47       Two year              No
## 243         243            48           38.55 Month-to-month              No
## 244         244            56              NA Month-to-month              No
## 245         245            22           90.97       Two year              No
## 246         246             6           37.38 Month-to-month              No
## 247         247             4           65.63 Month-to-month              No
## 248         248            31           53.86 Month-to-month             Yes
## 249         249            15           78.90       Two year             Yes
## 250         250            40           65.69       One year              No
## 251         251            16           95.25       Two year              No
## 252         252             3          116.10 Month-to-month              No
## 253         253            53           49.82 Month-to-month             Yes
## 254         254            49           49.62       Two year             Yes
## 255         255            43           65.24 Month-to-month              No
## 256         256             9              NA       Two year             Yes
## 257         257            54           75.03       Two year             Yes
## 258         258            31           98.45       Two year             Yes
## 259         259            48           55.99       Two year             Yes
## 260         260            39           86.11       One year              No
## 261         261            29           81.87       One year             Yes
## 262         262            16           49.73 Month-to-month              No
## 263         263            47          103.23 Month-to-month             Yes
## 264         264            57           99.59       Two year             Yes
## 265         265            41           75.08       One year             Yes
## 266         266            17           62.79       One year              No
## 267         267            56           99.22 Month-to-month             Yes
## 268         268            40          100.42       One year             Yes
## 269         269            43           94.87       One year              No
## 270         270             2           65.74 Month-to-month             Yes
## 271         271            25           68.85 Month-to-month              No
## 272         272            26           56.34       Two year             Yes
## 273         273            44           54.05       Two year             Yes
## 274         274            50           55.32 Month-to-month             Yes
## 275         275             5          115.91 Month-to-month             Yes
## 276         276            54          129.33       One year              No
## 277         277            15           82.67       One year             Yes
## 278         278             3           77.84 Month-to-month              No
## 279         279            58           47.98       Two year             Yes
## 280         280            11           83.75 Month-to-month              No
## 281         281            55           48.17       One year             Yes
## 282         282            32           83.25 Month-to-month              No
## 283         283            48          104.98 Month-to-month             Yes
## 284         284            41           41.13 Month-to-month             Yes
## 285         285            10           43.67 Month-to-month             Yes
## 286         286            15           60.07 Month-to-month             Yes
## 287         287            37           67.56       One year             Yes
## 288         288             5           26.30 Month-to-month              No
## 289         289            43           90.30 Month-to-month             Yes
## 290         290            44           51.83 Month-to-month              No
## 291         291            48           76.11 Month-to-month             Yes
## 292         292            20           54.11       Two year             Yes
## 293         293             4           62.29 Month-to-month             Yes
## 294         294            57           51.76 Month-to-month              No
## 295         295            24           15.96 Month-to-month             Yes
## 296         296            42           53.56       Two year              No
## 297         297            49           88.72       Two year              No
## 298         298            36           45.36       One year              No
## 299         299            49              NA       One year              No
## 300         300            44           72.57       One year              No
## 301         301            30           56.41       Two year              No
## 302         302            37           53.34       Two year             Yes
## 303         303            29           72.44       Two year             Yes
## 304         304            18           36.17       One year              No
## 305         305            26           58.92       One year              No
## 306         306            30              NA       One year             Yes
## 307         307            12           80.11       Two year              No
## 308         308            29           68.26       One year              No
## 309         309            11           51.63 Month-to-month             Yes
## 310         310            39           54.70       One year              No
## 311         311            35           90.32       Two year              No
## 312         312             8           52.56       Two year             Yes
## 313         313            37           57.81 Month-to-month              No
## 314         314            19           91.94       Two year              No
## 315         315            30           24.53 Month-to-month             Yes
## 316         316            41           38.28 Month-to-month             Yes
## 317         317            57           21.10       Two year             Yes
## 318         318            43           32.27       One year              No
## 319         319            51           60.55       One year             Yes
## 320         320            59           67.02 Month-to-month              No
## 321         321            19           62.32       Two year             Yes
## 322         322            45           62.10       One year             Yes
## 323         323            55           62.48       Two year             Yes
## 324         324            25           39.37 Month-to-month              No
## 325         325            27           83.79       One year             Yes
## 326         326            15           37.03       One year              No
## 327         327            39           76.27 Month-to-month              No
## 328         328            29              NA       Two year             Yes
## 329         329            54           52.07 Month-to-month             Yes
## 330         330            21           77.36       Two year              No
## 331         331            14           71.69       Two year             Yes
## 332         332            23           81.98       Two year             Yes
## 333         333            10           69.96       Two year             Yes
## 334         334            34           76.99       Two year             Yes
## 335         335            35           22.65       One year              No
## 336         336            11           60.06       Two year              No
## 337         337            16           63.46 Month-to-month              No
## 338         338             2           57.97       Two year              No
## 339         339            50          137.16 Month-to-month              No
## 340         340            17           95.76       One year              No
## 341         341            15           63.09 Month-to-month              No
## 342         342            36           46.82       One year             Yes
## 343         343            50           61.78       One year              No
## 344         344            32           81.49 Month-to-month             Yes
## 345         345            44           33.70       One year             Yes
## 346         346            26           72.79       Two year             Yes
## 347         347            29           70.10       One year              No
## 348         348            14              NA       One year              No
## 349         349            15           50.04       Two year              No
## 350         350            34           66.90       Two year              No
## 351         351            26           67.22       One year             Yes
## 352         352            10           64.83       One year             Yes
## 353         353             1           67.50       One year              No
## 354         354            55          101.56       One year              No
## 355         355             8              NA       Two year             Yes
## 356         356            42           68.06 Month-to-month              No
## 357         357            39           31.13       One year              No
## 358         358            43           47.31       Two year              No
## 359         359            31           84.13       One year              No
## 360         360            30           99.49       One year              No
## 361         361            41          106.14 Month-to-month              No
## 362         362            25           69.96       Two year              No
## 363         363            18           76.44       Two year             Yes
## 364         364             7           69.11       One year              No
## 365         365            11           71.57       Two year              No
## 366         366            12           93.67       Two year             Yes
## 367         367             5           70.94 Month-to-month             Yes
## 368         368             6           49.97       One year             Yes
## 369         369            22           34.93       One year             Yes
## 370         370            55           82.82       One year             Yes
## 371         371            52           41.63 Month-to-month             Yes
## 372         372            18           77.20       Two year              No
## 373         373            25           78.37       Two year              No
## 374         374             1              NA       One year              No
## 375         375            51           48.97       One year              No
## 376         376             9           60.69       One year             Yes
## 377         377            35           74.45       Two year              No
## 378         378            17           74.78       One year              No
## 379         379            15           89.34       One year              No
## 380         380            37           65.66       One year              No
## 381         381            25           85.38       Two year             Yes
## 382         382            43           53.61 Month-to-month              No
## 383         383             3          106.75       Two year              No
## 384         384            21           80.23       Two year              No
## 385         385            22           25.70       Two year             Yes
## 386         386            17           62.82 Month-to-month              No
## 387         387            35           89.73 Month-to-month              No
## 388         388            53              NA       One year             Yes
## 389         389             3          103.85       Two year              No
## 390         390            53           55.24       Two year             Yes
## 391         391            29           77.06 Month-to-month              No
## 392         392            14           67.17       One year              No
## 393         393            42           70.35 Month-to-month             Yes
## 394         394            18           56.68       Two year             Yes
## 395         395            16           30.97       Two year             Yes
## 396         396            28           88.76       One year              No
## 397         397            53           56.88 Month-to-month              No
## 398         398            27           84.70 Month-to-month             Yes
## 399         399            41              NA Month-to-month              No
## 400         400            59          104.29 Month-to-month             Yes
##     tech_support churn total_charges
## 1            Yes   Yes       3691.04
## 2            Yes   Yes       2114.31
## 3             No    No       2915.41
## 4            Yes    No         57.15
## 5             No    No       1259.07
## 6            Yes    No       1846.72
## 7             No    No       1092.90
## 8            Yes    No        542.48
## 9            Yes    No        296.32
## 10           Yes    No        551.88
## 11           Yes   Yes       2106.81
## 12            No    No       2135.23
## 13           Yes    No        621.59
## 14            No    No       3447.74
## 15            No    No       1999.76
## 16           Yes    No       2310.44
## 17            No    No        440.77
## 18           Yes   Yes       4488.92
## 19           Yes    No       1295.28
## 20           Yes   Yes       2499.29
## 21           Yes    No       2533.76
## 22            No    No       1698.45
## 23           Yes    No       1842.28
## 24           Yes   Yes        647.74
## 25            No   Yes       2355.42
## 26           Yes   Yes       1313.89
## 27            No   Yes        707.04
## 28           Yes    No       1712.88
## 29            No    No         58.64
## 30           Yes   Yes       2695.03
## 31           Yes   Yes        854.04
## 32           Yes    No        399.58
## 33            No    No       3274.77
## 34            No    No       3323.81
## 35           Yes    No       2016.07
## 36            No   Yes       4452.85
## 37           Yes    No        667.95
## 38            No   Yes        892.55
## 39            No    No       2381.27
## 40           Yes    No       1151.78
## 41            No    No       1853.56
## 42            No    No        374.74
## 43            No    No        890.24
## 44           Yes    No        372.93
## 45           Yes    No       1692.48
## 46           Yes    No       1305.07
## 47            No   Yes       3568.65
## 48            No    No        545.21
## 49            No    No       2174.75
## 50            No   Yes        364.29
## 51            No   Yes       4123.53
## 52           Yes    No       1939.21
## 53           Yes   Yes       2884.25
## 54            No   Yes       2384.10
## 55            No    No       4910.25
## 56           Yes    No       3272.61
## 57           Yes    No       3185.87
## 58           Yes   Yes       1695.50
## 59           Yes    No       1224.72
## 60            No    No        211.36
## 61            No    No       3750.18
## 62           Yes   Yes        464.30
## 63            No    No       1279.29
## 64           Yes    No        552.53
## 65           Yes    No       2305.49
## 66           Yes    No       1837.04
## 67            No    No       2612.22
## 68            No    No       3622.91
## 69           Yes    No       2724.58
## 70           Yes    No       2157.15
## 71           Yes    No       1232.41
## 72           Yes    No        803.06
## 73           Yes   Yes       3527.17
## 74           Yes    No       1967.53
## 75           Yes    No       2098.09
## 76           Yes   Yes       3497.02
## 77            No    No       2535.89
## 78            No   Yes       3020.74
## 79            No   Yes        381.50
## 80            No    No       2158.19
## 81           Yes   Yes       3514.90
## 82            No    No       2339.99
## 83           Yes   Yes        387.62
## 84           Yes   Yes       2138.94
## 85            No    No       4937.43
## 86           Yes    No       3396.48
## 87           Yes   Yes       1266.50
## 88           Yes    No       1531.76
## 89           Yes    No       1599.59
## 90           Yes    No       1599.48
## 91            No    No       3290.23
## 92           Yes   Yes       1133.89
## 93           Yes   Yes         65.59
## 94           Yes    No       2660.40
## 95            No    No       1301.97
## 96           Yes    No        392.93
## 97           Yes    No       4530.82
## 98            No    No       2982.99
## 99            No    No       1781.79
## 100           No    No       1802.34
## 101           No    No       3514.94
## 102           No    No       1572.10
## 103          Yes   Yes        214.71
## 104          Yes    No       3657.09
## 105          Yes    No       3514.09
## 106           No    No       1623.77
## 107          Yes   Yes       1559.61
## 108           No    No        478.94
## 109          Yes    No        742.49
## 110           No    No        817.13
## 111          Yes    No       1801.68
## 112           No   Yes       3350.63
## 113          Yes    No       1963.81
## 114           No    No       1613.89
## 115          Yes    No       4546.62
## 116          Yes    No       1374.66
## 117          Yes    No       2145.34
## 118           No    No       2092.52
## 119          Yes    No       2579.30
## 120          Yes    No        881.02
## 121           No   Yes       1040.46
## 122           No    No       1774.49
## 123           No    No       2493.35
## 124           No    No       2735.23
## 125           No   Yes        583.99
## 126           No    No       1642.37
## 127          Yes    No       2880.42
## 128           No    No       3417.13
## 129           No    No       2148.03
## 130          Yes    No       3006.07
## 131          Yes   Yes       2044.35
## 132           No    No        213.92
## 133          Yes    No       3026.48
## 134           No   Yes        254.11
## 135          Yes    No       1489.28
## 136           No    No        119.73
## 137           No    No       1557.65
## 138          Yes    No       2638.46
## 139           No    No       2130.23
## 140          Yes    No        576.72
## 141           No    No       1503.04
## 142           No    No        406.11
## 143          Yes   Yes       1981.67
## 144           No    No        356.94
## 145           No    No       3421.98
## 146          Yes    No        775.22
## 147          Yes    No       1199.48
## 148          Yes    No       1564.62
## 149           No    No        177.72
## 150           No   Yes       2397.76
## 151           No   Yes       2548.55
## 152           No    No        517.78
## 153           No    No         81.35
## 154           No   Yes       4216.19
## 155          Yes   Yes       1449.90
## 156          Yes   Yes       3857.58
## 157           No    No       3901.66
## 158          Yes    No        232.23
## 159           No   Yes       2600.89
## 160           No    No       4889.94
## 161          Yes    No       2433.55
## 162          Yes    No        236.63
## 163          Yes    No        922.25
## 164           No   Yes       4301.23
## 165           No    No       3333.42
## 166           No    No       3220.07
## 167          Yes    No       1168.31
## 168           No   Yes       2568.99
## 169          Yes    No        664.70
## 170          Yes    No       1131.19
## 171          Yes   Yes       1025.40
## 172           No   Yes       3480.77
## 173           No    No       3559.65
## 174           No    No        559.05
## 175          Yes    No       3341.23
## 176           No   Yes        643.46
## 177          Yes   Yes       4608.38
## 178           No    No       3201.20
## 179          Yes    No       2938.08
## 180           No    No       1477.88
## 181          Yes    No       1656.81
## 182          Yes    No        187.89
## 183           No    No       3696.04
## 184           No    No        482.20
## 185           No    No       4836.18
## 186           No    No        212.53
## 187          Yes    No       5955.77
## 188           No    No       4031.48
## 189          Yes    No       1736.52
## 190          Yes   Yes        968.85
## 191          Yes   Yes       2426.85
## 192           No    No       2703.32
## 193           No    No       3815.08
## 194           No    No         85.18
## 195          Yes    No       1047.77
## 196          Yes   Yes       1649.39
## 197          Yes    No       1047.64
## 198           No    No       3090.90
## 199           No   Yes       2915.58
## 200           No    No        522.71
## 201           No    No       4145.29
## 202           No   Yes       1037.66
## 203           No    No        461.82
## 204           No    No       3399.51
## 205           No    No       2893.77
## 206           No   Yes       2617.07
## 207          Yes    No       3281.83
## 208           No   Yes       1166.69
## 209           No    No       4665.48
## 210          Yes    No       2066.10
## 211          Yes    No         68.18
## 212          Yes    No       1144.96
## 213           No    No       2191.97
## 214          Yes    No       4591.68
## 215           No    No        605.85
## 216           No    No       2873.86
## 217          Yes   Yes       3783.36
## 218          Yes   Yes       1642.96
## 219           No    No       1447.36
## 220          Yes    No       2540.47
## 221           No    No        588.37
## 222           No    No       2738.20
## 223           No    No       1751.16
## 224           No    No       3918.10
## 225          Yes    No        801.02
## 226          Yes    No       3646.96
## 227          Yes   Yes       1235.34
## 228           No   Yes       1431.47
## 229          Yes   Yes       4364.65
## 230          Yes    No       3366.68
## 231          Yes    No       1835.25
## 232           No   Yes       2159.21
## 233          Yes   Yes       2522.45
## 234           No    No       1900.61
## 235          Yes    No       2122.74
## 236           No    No       1775.37
## 237          Yes    No       1031.59
## 238          Yes    No       2911.20
## 239          Yes    No        955.96
## 240           No   Yes        433.79
## 241          Yes    No       1642.13
## 242          Yes    No        686.81
## 243           No    No       1751.71
## 244           No   Yes       3540.94
## 245          Yes    No       2017.06
## 246           No    No        219.44
## 247          Yes    No        280.28
## 248          Yes   Yes       1609.13
## 249          Yes    No       1192.68
## 250           No    No       2731.86
## 251          Yes    No       1419.29
## 252          Yes    No        347.52
## 253           No    No       2701.38
## 254           No    No       2639.72
## 255          Yes    No       2553.15
## 256          Yes    No        582.43
## 257          Yes   Yes       3869.08
## 258          Yes   Yes       3151.67
## 259           No    No       2798.73
## 260           No    No       3286.17
## 261          Yes    No       2408.16
## 262          Yes    No        844.13
## 263           No   Yes       5263.02
## 264          Yes    No       6107.17
## 265           No   Yes       2813.32
## 266           No    No       1158.82
## 267          Yes    No       5039.21
## 268          Yes    No       4222.66
## 269          Yes    No       4146.27
## 270          Yes    No        140.74
## 271           No    No       1697.42
## 272           No    No       1363.39
## 273           No    No       2322.93
## 274          Yes    No       2621.37
## 275           No    No        564.59
## 276           No    No       7023.40
## 277          Yes    No       1271.58
## 278          Yes    No        251.73
## 279           No    No       3025.16
## 280           No   Yes        980.39
## 281          Yes   Yes       2449.76
## 282          Yes    No       2638.06
## 283           No    No       5330.28
## 284          Yes    No       1792.01
## 285          Yes    No        437.20
## 286           No   Yes        835.65
## 287          Yes    No       2739.73
## 288           No    No        141.04
## 289           No    No       3934.86
## 290          Yes    No       2237.58
## 291           No    No       3558.06
## 292          Yes    No       1084.22
## 293          Yes   Yes        255.42
## 294           No    No       2794.20
## 295          Yes    No        401.31
## 296           No    No       2047.57
## 297          Yes    No       4218.24
## 298           No    No       1549.11
## 299           No    No       3401.60
## 300           No    No       3179.91
## 301           No    No       1566.94
## 302          Yes   Yes       2142.95
## 303           No   Yes       2166.68
## 304           No    No        618.37
## 305          Yes    No       1655.20
## 306          Yes   Yes       2212.42
## 307          Yes   Yes        949.21
## 308          Yes    No       2047.16
## 309          Yes   Yes        516.85
## 310           No    No       2220.67
## 311           No    No       3023.89
## 312          Yes    No        400.07
## 313          Yes    No       2246.43
## 314          Yes    No       1859.82
## 315          Yes    No        773.72
## 316          Yes   Yes       1654.47
## 317           No   Yes       1270.06
## 318           No   Yes       1296.10
## 319          Yes    No       3254.47
## 320           No   Yes       4137.79
## 321          Yes    No       1108.25
## 322          Yes   Yes       2574.16
## 323           No    No       3557.54
## 324           No    No       1062.64
## 325          Yes    No       2247.29
## 326          Yes   Yes        557.25
## 327           No   Yes       2690.08
## 328           No    No       1177.60
## 329          Yes    No       2968.28
## 330           No    No       1483.23
## 331          Yes    No       1099.89
## 332          Yes    No       1816.26
## 333          Yes    No        733.20
## 334           No   Yes       2412.09
## 335          Yes    No        823.17
## 336           No    No        638.02
## 337           No    No       1086.30
## 338           No    No        109.11
## 339           No   Yes       7310.18
## 340           No   Yes       1719.93
## 341          Yes   Yes        883.63
## 342           No    No       1841.36
## 343           No   Yes       2971.16
## 344          Yes    No       2442.89
## 345          Yes    No       1399.91
## 346           No   Yes       1884.15
## 347          Yes    No       1924.85
## 348           No    No       1195.35
## 349           No    No        725.16
## 350          Yes   Yes       2151.67
## 351           No   Yes       1691.54
## 352           No    No        616.87
## 353          Yes    No         73.29
## 354           No   Yes       5997.33
## 355          Yes    No        480.24
## 356           No    No       2954.26
## 357          Yes    No       1247.87
## 358          Yes    No       1992.68
## 359          Yes    No       2375.45
## 360           No    No       2704.00
## 361          Yes   Yes       4620.40
## 362           No    No       1633.63
## 363          Yes    No       1289.60
## 364           No   Yes        464.31
## 365           No    No        848.31
## 366          Yes    No       1189.77
## 367          Yes    No        375.56
## 368           No   Yes        283.57
## 369          Yes   Yes        820.78
## 370           No   Yes       4921.27
## 371           No    No       2049.38
## 372           No    No       1450.35
## 373           No    No       1827.32
## 374           No    No         68.32
## 375           No    No       2654.76
## 376          Yes   Yes        494.11
## 377           No    No       2674.51
## 378           No   Yes       1286.92
## 379          Yes    No       1313.62
## 380          Yes    No       2629.13
## 381           No    No       2123.84
## 382           No   Yes       2327.87
## 383          Yes    No        300.50
## 384          Yes   Yes       1758.85
## 385          Yes   Yes        583.67
## 386          Yes    No       1073.04
## 387          Yes    No       2994.91
## 388           No    No       3473.55
## 389           No   Yes        310.55
## 390          Yes    No       2934.33
## 391           No    No       2314.64
## 392          Yes    No       1030.17
## 393          Yes    No       2742.07
## 394          Yes   Yes        964.01
## 395          Yes    No        487.61
## 396           No   Yes       2496.74
## 397           No    No       2839.64
## 398          Yes    No       2170.52
## 399          Yes    No       3679.29
## 400           No   Yes       6427.26
missing_index <- sample(1:n, size = round(0.05 * n))

churn_work$tenure_months[missing_index] <- NA
missing_counts <- colSums(is.na(churn_work))

missing_counts
##     customer_id   tenure_months monthly_charges   contract_type online_security 
##               0              20              20               0               0 
##    tech_support           churn   total_charges 
##               0               0               0
missing_counts[missing_counts > 0]
##   tenure_months monthly_charges 
##              20              20
summary(churn_work)
##   customer_id    tenure_months   monthly_charges  contract_type     
##  Min.   :  1.0   Min.   : 1.00   Min.   : 13.08   Length:400        
##  1st Qu.:100.8   1st Qu.:16.00   1st Qu.: 54.09   Class :character  
##  Median :200.5   Median :30.00   Median : 66.92   Mode  :character  
##  Mean   :200.5   Mean   :30.03   Mean   : 67.97                     
##  3rd Qu.:300.2   3rd Qu.:44.00   3rd Qu.: 81.89                     
##  Max.   :400.0   Max.   :59.00   Max.   :137.16                     
##                  NA's   :20      NA's   :20                         
##  online_security    tech_support          churn           total_charges    
##  Length:400         Length:400         Length:400         Min.   :  57.15  
##  Class :character   Class :character   Class :character   1st Qu.: 962.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :1841.82  
##                                                           Mean   :2021.82  
##                                                           3rd Qu.:2848.20  
##                                                           Max.   :7310.18  
## 
# Create an example vector with NAs
x <- churn_work$tenure_months

# Attempt to calculate the mean without removing NAs (result is NA)
mean(x)
## [1] NA
# Calculate the mean after removing NAs (result is about 30.03)
mean(x, na.rm = TRUE)
## [1] 30.02895

Interpretation (fill in):
- Which variables have missing values? tenure_months and monthly_charges each have 20 missing values.
- Do the missing values appear minor or substantial? Minor: 20 of 400 observations (5%) for each variable.
- What would you do about them (omit / impute / investigate)? Impute, since only a small share of each variable is missing.
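A minimal sketch of the missing-value check and the median imputation mentioned above. `missing_summary()` and `impute_median()` are hypothetical helpers (not base R), and the small `toy` data frame is made up to stand in for `churn_work`; median imputation is one reasonable choice, not the only one.

```r
# Hypothetical helper: count and percentage of NAs per column
missing_summary <- function(df) {
  n_missing <- colSums(is.na(df))
  data.frame(
    n_missing   = n_missing,
    pct_missing = round(100 * n_missing / nrow(df), 1)
  )
}

# Hypothetical helper: replace NAs in the given numeric columns with the column median
impute_median <- function(df, cols) {
  for (col in cols) {
    med <- median(df[[col]], na.rm = TRUE)  # median of the observed values only
    df[[col]][is.na(df[[col]])] <- med
  }
  df
}

# Toy data (made-up values) standing in for churn_work
toy <- data.frame(
  tenure_months   = c(12, NA, 30, 45, NA),
  monthly_charges = c(50.5, 70.2, NA, 90.1, 65.0)
)

missing_summary(toy)
toy_imputed <- impute_median(toy, c("tenure_months", "monthly_charges"))
toy_imputed
```

On the lab data the calls would be `missing_summary(churn_work)` and `impute_median(churn_work, c("tenure_months", "monthly_charges"))`. The median is less sensitive than the mean to the right skew seen in the charge variables.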

4.2 2.2 Outliers / extreme values (numeric variables)

An outlier should be:
- Defined by its distance from the central mass of the data (using the standard deviation, quartiles, or mean)
- Supported by a plot or statistic: in a boxplot, observations above the upper whisker or below the lower whisker are potential outliers
- Interpreted in business context
- Left in place unless there is a reason to act (no automatic action)
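The boxplot whisker rule can be made explicit. The sketch below assumes Tukey's 1.5 * IQR fences, which is the default used by `boxplot()` (argument `range`) and `boxplot.stats()` (argument `coef`). `iqr_fences()` and `flag_outliers()` are hypothetical helper names and `x` is made-up data. R's boxplots measure the spread between Tukey hinges (`fivenum()`), which can differ slightly from the quartiles returned by `quantile()`.

```r
# Hypothetical helper: lower/upper fences using Tukey hinges, as boxplot() does
iqr_fences <- function(x, coef = 1.5) {
  h <- fivenum(x)              # min, lower hinge, median, upper hinge, max (NAs dropped)
  spread <- h[4] - h[2]        # hinge spread (approximately the IQR)
  c(lower = h[2] - coef * spread, upper = h[4] + coef * spread)
}

# Hypothetical helper: TRUE where a non-missing value lies outside the fences
flag_outliers <- function(x, coef = 1.5) {
  f <- iqr_fences(x, coef)
  !is.na(x) & (x < f[["lower"]] | x > f[["upper"]])
}

x <- c(10, 12, 11, 13, 12, 11, 50)   # made-up values with one high point
iqr_fences(x)                        # lower = 8.75, upper = 14.75
x[flag_outliers(x)]                  # 50
```

With the lab data, `churn$total_charges[flag_outliers(churn$total_charges)]` would list the high-end points visible in the total_charges boxplot; following the rule above, flagged values are reviewed rather than dropped automatically.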

# Choose numeric variables (excluding customer_id, which is an identifier rather than a measurement)
numeric_vars <- c("tenure_months", "monthly_charges", "total_charges")

# Summary stats
summary(churn[numeric_vars])
##  tenure_months   monthly_charges  total_charges    
##  Min.   : 1.00   Min.   : 13.08   Min.   :  57.15  
##  1st Qu.:16.00   1st Qu.: 54.31   1st Qu.: 962.00  
##  Median :29.00   Median : 66.91   Median :1841.82  
##  Mean   :29.79   Mean   : 68.01   Mean   :2021.82  
##  3rd Qu.:44.00   3rd Qu.: 81.89   3rd Qu.:2848.20  
##  Max.   :59.00   Max.   :137.16   Max.   :7310.18
# Boxplots
par(mfrow=c(1,3))
boxplot(churn$tenure_months, main="tenure_months", ylab="months")
boxplot(churn$monthly_charges, main="monthly_charges", ylab="charges")
boxplot(churn$total_charges, main="total_charges", ylab="charges")

par(mfrow=c(1,1))


summary(churn$monthly_charges)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   13.08   54.31   66.91   68.01   81.89  137.16
sd(churn$monthly_charges, na.rm = TRUE)
## [1] 21.01485
summary(churn$total_charges)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   57.15  962.00 1841.82 2021.82 2848.20 7310.18
sd(churn$total_charges, na.rm = TRUE)
## [1] 1343.435

Interpretation (fill in):
- Any extreme values or unusual distributions? Monthly charges show mild right skew (mean 68.01 vs. median 66.91) with a few high values up to 137.16, while total charges show substantial right skew (mean 2021.82 vs. median 1841.82) with extreme high values up to 7310.18.
- Would you transform or cap anything before modeling? Total charges should be log-transformed (or capped) to reduce skewness, while monthly charges can be left untransformed or lightly scaled.
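The two options named above can be sketched as follows. The toy vector reuses the min, quartiles, and max printed by `summary(churn$total_charges)`; `cap_at_quantile()` is a hypothetical helper, and the cap level is an illustrative assumption (a high percentile such as 0.99 is a common choice on real data, but the toy example uses 0.75 so the effect is visible with only five values).

```r
# Hypothetical helper: cap the upper tail at a chosen quantile
cap_at_quantile <- function(x, p = 0.99) {
  cap <- quantile(x, probs = p, na.rm = TRUE, names = FALSE)
  pmin(x, cap)               # values above the cap are set to the cap
}

# Min, Q1, median, Q3, and max of total_charges from the summary above
toy_total <- c(57.15, 962.00, 1841.82, 2848.20, 7310.18)

log_total    <- log(toy_total)                    # defined because every value is > 0
capped_total <- cap_at_quantile(toy_total, p = 0.75)

round(log_total, 2)
capped_total                                       # the 7310.18 value becomes 2848.20
```

A log transform keeps the ordering of every observation but compresses the high end; capping changes only the values above the cap.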

If retaining them, you can state: “Potential outliers were identified using boxplots and summary statistics; while a small number of extreme values appear, they are plausible given the business context and were therefore retained.”

4.3 2.3 Basic correlations (numeric variables)

cor_mat <- cor(churn[numeric_vars], use="pairwise.complete.obs")
round(cor_mat, 3)
##                 tenure_months monthly_charges total_charges
## tenure_months           1.000           0.011         0.842
## monthly_charges         0.011           1.000         0.469
## total_charges           0.842           0.469         1.000

Interpretation (fill in):
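- Are any numeric predictors strongly correlated? Yes: tenure_months and total_charges are strongly correlated (r = 0.842); monthly_charges is moderately correlated with total_charges (r = 0.469) and essentially uncorrelated with tenure_months (r = 0.011).
- What might that imply for modeling later? Including both tenure_months and total_charges could introduce redundancy or multicollinearity, so one of them may need to be dropped or transformed.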
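One way to quantify redundancy between correlated predictors is the variance inflation factor, VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j on the other predictors. The sketch below computes it with base R `lm()`; `vif_by_hand()` is a hypothetical helper, and the simulated data are made up (total charges are generated as roughly tenure times monthly charges, an assumption for illustration only).

```r
# Hypothetical helper: VIF for each column of a numeric data frame
vif_by_hand <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    fit <- lm(reformulate(others, response = v), data = df)  # predictor v ~ all others
    1 / (1 - summary(fit)$r.squared)
  })
}

# Simulated data (made up): total charges built from tenure * monthly charges plus noise
set.seed(42)
toy_num <- data.frame(
  tenure_months   = sample(1:60, 200, replace = TRUE),
  monthly_charges = runif(200, 20, 120)
)
toy_num$total_charges <- toy_num$tenure_months * toy_num$monthly_charges + rnorm(200, 0, 100)

round(cor(toy_num), 3)
round(vif_by_hand(toy_num), 2)
```

On the lab data the call would be `vif_by_hand(churn[numeric_vars])`. Values well above 1 signal that a predictor is largely explained by the others; cutoffs of 5 or 10 are common rules of thumb rather than hard limits.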

4.4 2.4 Required visualizations (at least 2)

Below are sample plots. Add/replace with your own if you prefer.

# Histogram
hist(churn$monthly_charges, breaks=20,
     main="Histogram: monthly_charges",
     xlab="monthly_charges")

# Churn rate by contract type
churn_rate_by_contract <- tapply(churn01, churn$contract_type, mean, na.rm=TRUE)
barplot(churn_rate_by_contract,
        main="Churn Rate by Contract Type",
        ylab="Churn Rate", las=2)

Interpretation (fill in):
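- Plot 1 (churn rate by contract type) shows that churn rates are similar across contract types (27.6% month-to-month, 29.0% one year, 26.1% two year) and suggests that longer contracts are not clearly associated with lower churn in this sample.
- Plot 2 (tenure by churn status) shows differences in customer tenure by churn status and suggests that customers who churned tend to have somewhat longer tenure (mean 32.37 vs. 28.80 months), not shorter.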


5 3) Exploratory Data Analysis (EDA)

You must provide at least three distinct insights, each supported by a plot or statistic.

5.1 Insight 1 (required)

Insight statement (fill in):
> Customers with two-year contracts show a slightly lower churn rate (26.1%) compared to customers on month-to-month contracts (27.6%), while one-year customers churn the most (29.0%), which suggests that contract length by itself is at most a weak driver of retention in this sample.

# Example idea: churn rate by contract_type (edit or replace)
ins1_tbl <- tapply(churn01, churn$contract_type, mean, na.rm=TRUE)
ins1_tbl
## Month-to-month       One year       Two year 
##      0.2761194      0.2903226      0.2605634
barplot(ins1_tbl, main="Churn Rate by Contract Type", ylab="Churn Rate", las=2)
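The three churn rates above differ by only a few percentage points, so it can help to check whether differences of that size exceed what chance alone would produce. A minimal sketch using base R `chisq.test()` on the contract_type by churn01 counts printed later in this lab (97/37, 88/36, 105/37); the test of independence is a swapped-in technique, not something the lab requires.

```r
# Counts copied from table(churn$contract_type, churn01) printed in this lab
contract_tab <- matrix(
  c(97, 37,
    88, 36,
    105, 37),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    contract_type = c("Month-to-month", "One year", "Two year"),
    churn01 = c("0", "1")
  )
)

# Churn rate per contract type (matches ins1_tbl)
round(contract_tab[, "1"] / rowSums(contract_tab), 4)

# Chi-squared test of independence between contract type and churn
contract_test <- chisq.test(contract_tab)
contract_test
```

With these counts the test statistic is small (roughly 0.3 on 2 degrees of freedom), so the p-value is far above 0.05: the contract-type differences in this sample are consistent with chance.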

5.2 Insight 2 (required)

Insight statement (fill in):
> Customers who churned show a higher average tenure (32.37 months) compared to customers who stayed (28.80 months), which suggests that churn in this sample is not concentrated early in the customer lifecycle.

# Example idea: compare tenure_months by churn status
mean_no <- mean(churn$tenure_months[churn01==0], na.rm=TRUE)
mean_yes <- mean(churn$tenure_months[churn01==1], na.rm=TRUE)
mean_no; mean_yes
## [1] 28.80345
## [1] 32.37273
boxplot(churn$tenure_months ~ churn01,
        main="tenure_months by churn (0=No, 1=Yes)",
        xlab="churn01", ylab="tenure_months")
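The two group means above can be paired with a formal comparison. A minimal sketch, assuming a Welch two-sample t-test (`t.test()`, whose default is `var.equal = FALSE`) plus the rank-based Wilcoxon test (`wilcox.test()`) as a check that does not rely on normality. `compare_by_churn()` is a hypothetical helper and the toy vectors are made up; on the lab data the call would be `compare_by_churn(churn$tenure_months, churn01)`.

```r
# Hypothetical helper: group means plus Welch t-test and Wilcoxon p-values
compare_by_churn <- function(x, churn01) {
  means <- tapply(x, churn01, mean, na.rm = TRUE)
  welch <- t.test(x ~ churn01)                       # Welch test (unequal variances)
  ranks <- wilcox.test(x ~ churn01, exact = FALSE)   # normal approximation, tolerates ties
  c(mean_no  = unname(means["0"]),
    mean_yes = unname(means["1"]),
    welch_p  = welch$p.value,
    wilcox_p = ranks$p.value)
}

# Toy data (made up): tenure in months and a 0/1 churn flag
tenure  <- c(5, 10, 20, 30, 40, 8, 35, 50, 45, 55)
churned <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
round(compare_by_churn(tenure, churned), 3)
```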

5.3 Insight 3 (required)

Insight statement (fill in):
> Customers with online security show a higher churn rate (31.0%) compared to customers without online security (24.5%), which suggests that online security alone may not be sufficient to reduce churn and may be correlated with higher-risk customer segments.

# Example idea: churn rate by online_security (Yes/No)
ins3_tbl <- tapply(churn01, churn$online_security, mean, na.rm=TRUE)
ins3_tbl
##        No       Yes 
## 0.2453704 0.3097826
barplot(ins3_tbl, main="Churn Rate by Online Security", ylab="Churn Rate", las=2)
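Insight 3 rests on churn rates alone; showing group sizes alongside them makes the evidence easier to judge, since a rate based on a small group is less stable. A minimal sketch with a hypothetical helper `churn_rate_table()` and made-up toy vectors; on the lab data it would be called as `churn_rate_table(churn$online_security, churn01)`.

```r
# Hypothetical helper: group size, number of churners, and churn rate per group
churn_rate_table <- function(group, churn01) {
  keep <- !is.na(group) & !is.na(churn01)       # drop rows missing either value
  n        <- tapply(churn01[keep], group[keep], length)
  churners <- tapply(churn01[keep], group[keep], sum)
  data.frame(
    group      = names(n),
    n          = as.vector(n),
    churners   = as.vector(churners),
    churn_rate = round(as.vector(churners / n), 4),
    stringsAsFactors = FALSE
  )
}

# Toy data (made up)
security   <- c("No", "No", "Yes", "Yes", "Yes", NA)
churn_flag <- c(0, 1, 1, 1, 0, 1)
churn_rate_table(security, churn_flag)
```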

## 8) Three “substantive insights” scaffold (simple and repeatable) ----
# Insight Example A: churn rate by a categorical variable
# Replace contract_type with another category if needed.
if ("contract_type" %in% names(churn)) {
  tbl <- table(churn$contract_type, churn01)
  print(tbl)
  # Churn rate by category (again, but now you can show counts too)
  churn_rate <- tapply(churn01, churn$contract_type, mean, na.rm = TRUE)
  print(churn_rate)

  # You can verbalize: "Category X is about ___ compared to category Y"
}
##                 churn01
##                    0   1
##   Month-to-month  97  37
##   One year        88  36
##   Two year       105  37
## Month-to-month       One year       Two year 
##      0.2761194      0.2903226      0.2605634
# Insight Example B: numeric variable difference by churn status
# Use tenure_months if it exists.
if ("tenure_months" %in% names(churn)) {
  # Compare group means
  mean_churn0 <- mean(churn$tenure_months[churn01 == 0], na.rm = TRUE)
  mean_churn1 <- mean(churn$tenure_months[churn01 == 1], na.rm = TRUE)
  cat("\nMean tenure_months (no churn):", round(mean_churn0, 2), "\n")
  cat("Mean tenure_months (churn):   ", round(mean_churn1, 2), "\n")

  # Simple boxplot by churn status
  boxplot(churn$tenure_months ~ churn01,
          main = "tenure_months by Churn (0=no, 1=yes)",
          xlab = "churn01", ylab = "tenure_months")
}
## 
## Mean tenure_months (no churn): 28.8 
## Mean tenure_months (churn):    32.37

# Insight Example C: Create simple tenure groups WITHOUT cut() (very explicit)
if ("tenure_months" %in% names(churn)) {
  tenure_group <- rep(NA, nrow(churn))
  tenure_group[churn$tenure_months <= 12] <- "0-12"
  tenure_group[churn$tenure_months > 12 & churn$tenure_months <= 36] <- "13-36"
  tenure_group[churn$tenure_months > 36 & churn$tenure_months <= 72] <- "37-72"
  tenure_group[churn$tenure_months > 72] <- "73+"

  # Churn rate by tenure group
  churn_rate_tenure <- tapply(churn01, tenure_group, mean, na.rm = TRUE)
  print(churn_rate_tenure)
  barplot(churn_rate_tenure, main = "Churn Rate by Tenure Group", ylab = "Churn Rate", las = 2)
}
##      0-12     13-36     37-72 
## 0.2465753 0.2429379 0.3266667
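The same tenure groups can be built with `cut()`. A minimal sketch on made-up tenure values: with `right = TRUE` (the default) the intervals are (0,12], (12,36], (36,72], and (72,Inf], and `include.lowest = TRUE` closes the first interval so that a tenure of 0 would also land in "0-12".

```r
tenure <- c(1, 12, 13, 36, 37, 59)   # made-up tenure values in months

tenure_group <- cut(
  tenure,
  breaks = c(0, 12, 36, 72, Inf),
  labels = c("0-12", "13-36", "37-72", "73+"),
  include.lowest = TRUE              # first interval becomes [0, 12]
)

# Explicit version from the lab, for comparison
explicit <- rep(NA, length(tenure))
explicit[tenure <= 12] <- "0-12"
explicit[tenure > 12 & tenure <= 36] <- "13-36"
explicit[tenure > 36 & tenure <= 72] <- "37-72"
explicit[tenure > 72] <- "73+"

identical(as.character(tenure_group), explicit)   # TRUE
table(tenure_group)                               # "73+" kept as an empty level
```

One difference from the explicit version: `cut()` returns a factor that keeps the empty "73+" level (no customer in this dataset exceeds 59 months), so `tapply()` reports NA for that group unless the factor is passed through `droplevels()` first.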


6 4) Business Framing (1–2 paragraphs)

Write 1–2 paragraphs answering:
- How would your findings influence feature selection?
- How would your findings influence modeling choices?
- How would your findings influence preprocessing decisions?

Your response (fill in):

The exploratory findings would guide feature selection by emphasizing variables related to customer commitment and service experience. Tenure shows a relationship with churn (customers who churned have a higher average tenure, and the 37-72 month group has the highest churn rate), while contract type shows only small differences in churn rate; both should be retained as candidate predictors. Service features such as online security and tech support may capture additional variation in churn risk. Identifier variables such as customer ID would be excluded, and highly correlated features like tenure and total charges (r = 0.842) would be evaluated carefully to avoid redundancy or multicollinearity.

These findings also inform modeling and preprocessing decisions. Because churn is a binary outcome, logistic regression is an appropriate modeling choice that allows for interpretable relationships between predictors and churn probability. Preprocessing steps would include encoding categorical variables as dummy variables, imputing the small proportion of missing values in tenure and monthly charges, and addressing skewed distributions through transformation or scaling. In particular, total charges may require log transformation or exclusion due to its strong right skew and close relationship with tenure, which would support model stability and interpretability.
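A minimal sketch of the preprocessing steps described above (median imputation, a log transform of total charges, and dummy encoding), using base R on a small made-up data frame whose column names mirror the lab's. `model.matrix()` shows the dummy columns that `glm(..., family = binomial)` would build internally from the same formula; no model is fit here, consistent with this lab's focus.

```r
# Toy data (made up) with the lab's column names
toy <- data.frame(
  contract_type   = c("Month-to-month", "One year", "Two year", "One year"),
  online_security = c("Yes", "No", "Yes", "No"),
  tenure_months   = c(3, NA, 40, 20),
  total_charges   = c(150.20, 900.50, 3500.00, 1200.00),
  stringsAsFactors = FALSE
)

prep <- toy
# 1) Impute the missing tenure with the median of the observed values
prep$tenure_months[is.na(prep$tenure_months)] <- median(prep$tenure_months, na.rm = TRUE)
# 2) Log-transform the right-skewed total charges (all values are > 0)
prep$log_total_charges <- log(prep$total_charges)
prep$total_charges <- NULL
# 3) Dummy-encode categorical predictors; the first level of each is the baseline
X <- model.matrix(~ contract_type + online_security + tenure_months + log_total_charges,
                  data = prep)
colnames(X)
```

Here "Month-to-month" and "No" are the baseline levels, so each dummy coefficient in a later logistic regression would be read relative to those groups.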

8 Checklist Before Submitting