Goal: Understand the dataset well enough to justify
later modeling choices.
This lab is not about building predictive
models.
Dataset: week2_churn_data.csv
An insight must include:
1. Evidence (a plot or statistic)
2. A clear pattern (direction and/or magnitude)
3. Business meaning (why it matters)
Use this sentence frame:
> “Customers with ___ show ___ compared to ___, which suggests ___.”
In this dataset, the outcome variable is: churn (values: Yes/No).
# Confirm churn values
table(churn$churn)
##
## No Yes
## 290 110
# Create a 0/1 version for calculations (Yes=1, No=0)
churn01 <- ifelse(churn$churn == "Yes", 1, 0)
table(churn01)
## churn01
## 0 1
## 290 110
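The counts above also give the overall churn rate, which is a useful baseline for every later group comparison. A minimal sketch, assuming a stand-in vector (`churn_flag`, a hypothetical name) rebuilt from the printed counts rather than read from the CSV:

```r
# Stand-in outcome vector with the same Yes/No counts as the table above
churn_flag <- c(rep("No", 290), rep("Yes", 110))

# Share of each class
class_share <- prop.table(table(churn_flag))
round(class_share, 3)

# The mean of a logical (or 0/1) indicator is the churn rate
churn_rate <- mean(churn_flag == "Yes")
churn_rate
```

With 110 of 400 customers churning, the baseline rate is 27.5%, so group rates later in the lab can be read as above or below that figure.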
str(churn)
## 'data.frame': 400 obs. of 8 variables:
## $ customer_id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ tenure_months : int 49 29 35 1 23 31 24 9 3 11 ...
## $ monthly_charges: num 68.6 77.4 80.2 58.4 50.8 ...
## $ contract_type : chr "One year" "One year" "Two year" "One year" ...
## $ online_security: chr "Yes" "Yes" "Yes" "Yes" ...
## $ tech_support : chr "Yes" "Yes" "No" "Yes" ...
## $ churn : chr "Yes" "Yes" "No" "No" ...
## $ total_charges : num 3691 2114.3 2915.4 57.1 1259.1 ...
Task: Create a short table (or bullet list) showing each variable’s:
- role (Outcome / Predictor)
- type (numeric / categorical / binary)
Hint: sapply(churn, class) helps.
var_types <- sapply(churn, class)
var_types
## customer_id tenure_months monthly_charges contract_type online_security
## "integer" "integer" "numeric" "character" "character"
## tech_support churn total_charges
## "character" "character" "numeric"
Your table/list (fill in):
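One way to draft this table in code rather than by hand is sketched below. It assumes a three-row stand-in data frame (`toy`) with the same column types as the lab data; the helper `classify_var` and the extra "Identifier" role for `customer_id` are illustrative choices, not part of the lab template.

```r
# Hypothetical stand-in with the same column types as the lab data
toy <- data.frame(
  customer_id     = c(1L, 3L, 5L),
  tenure_months   = c(49L, 35L, 23L),
  monthly_charges = c(68.63, 80.22, 50.76),
  contract_type   = c("One year", "Two year", "Month-to-month"),
  churn           = c("Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Numeric columns stay numeric; non-numeric columns with exactly two
# distinct values are labelled binary, everything else categorical
classify_var <- function(x) {
  if (is.numeric(x)) return("numeric")
  if (length(unique(x[!is.na(x)])) == 2) return("binary")
  "categorical"
}

var_table <- data.frame(
  variable = names(toy),
  type     = vapply(toy, classify_var, character(1)),
  row.names = NULL
)

# Assumed roles: churn is the outcome, customer_id is an identifier
var_table$role <- ifelse(
  var_table$variable == "churn", "Outcome",
  ifelse(var_table$variable == "customer_id", "Identifier", "Predictor")
)
var_table
```

On a three-row sample the two-distinct-values rule can mislabel a column, so it is worth running the same helper on the full data frame before trusting the result.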
“The original dataset does not contain missing values. To practice realistic data preparation, you will intentionally introduce a small amount of missingness into one numeric variable and then assess and address it.”
# Create a working copy (do NOT overwrite original)
churn_work <- churn
# Introduce ~5% missingness in monthly_charges
set.seed(123)
n <- nrow(churn_work)
n
## [1] 400
missing_index <- sample(1:n, size = round(0.05 * n))
churn_work$monthly_charges[missing_index] <- NA
churn_work
## customer_id tenure_months monthly_charges contract_type online_security
## 1 1 49 68.63 One year Yes
## 2 2 29 77.40 One year Yes
## 3 3 35 80.22 Two year Yes
## 4 4 1 58.44 One year Yes
## 5 5 23 50.76 Month-to-month Yes
## 6 6 31 61.05 Month-to-month No
## 7 7 24 43.53 Two year No
## 8 8 9 59.97 One year No
## 9 9 3 89.90 Month-to-month No
## 10 10 11 53.75 One year Yes
## 11 11 19 115.79 One year Yes
## 12 12 34 69.14 Two year No
## 13 13 15 37.72 Two year Yes
## 14 14 32 NA Two year Yes
## 15 15 22 83.67 Two year Yes
## 16 16 36 64.54 Month-to-month No
## 17 17 13 35.08 Two year Yes
## 18 18 57 81.94 One year No
## 19 19 14 98.28 Month-to-month No
## 20 20 59 41.90 Two year No
## 21 21 42 56.64 One year No
## 22 22 20 78.47 Two year Yes
## 23 23 32 62.67 Two year No
## 24 24 12 54.29 Two year Yes
## 25 25 40 63.34 Two year No
## 26 26 18 76.28 Two year Yes
## 27 27 9 78.40 One year Yes
## 28 28 44 38.92 Two year No
## 29 29 1 56.97 Month-to-month No
## 30 30 37 78.12 Month-to-month No
## 31 31 31 29.17 Two year No
## 32 32 6 61.54 Two year No
## 33 33 40 76.39 One year Yes
## 34 34 46 67.00 Month-to-month No
## 35 35 15 134.90 One year No
## 36 36 55 75.86 Two year No
## 37 37 13 51.15 Two year No
## 38 38 10 94.28 Two year No
## 39 39 56 40.47 One year No
## 40 40 13 94.64 One year Yes
## 41 41 27 66.64 Two year No
## 42 42 7 56.31 Month-to-month Yes
## 43 43 13 71.02 Month-to-month No
## 44 44 12 32.93 One year Yes
## 45 45 24 77.50 Month-to-month No
## 46 46 25 54.65 Month-to-month Yes
## 47 47 40 88.87 Two year No
## 48 48 15 40.06 Two year No
## 49 49 30 67.43 Month-to-month Yes
## 50 50 18 21.47 Two year No
## 51 51 49 81.36 One year Yes
## 52 52 21 92.31 Month-to-month Yes
## 53 53 53 52.79 Two year No
## 54 54 50 48.51 One year No
## 55 55 51 94.60 Month-to-month No
## 56 56 39 86.98 Month-to-month No
## 57 57 50 59.32 One year No
## 58 58 32 55.75 Month-to-month Yes
## 59 59 20 59.79 Two year No
## 60 60 7 28.41 Month-to-month No
## 61 61 34 103.55 One year Yes
## 62 62 17 27.16 Month-to-month No
## 63 63 15 89.82 Two year Yes
## 64 64 8 75.86 One year No
## 65 65 40 62.77 Month-to-month Yes
## 66 66 19 91.32 Two year Yes
## 67 67 32 87.64 Month-to-month No
## 68 68 51 78.05 One year No
## 69 69 33 91.46 Two year No
## 70 70 21 102.56 Two year No
## 71 71 24 49.73 Two year Yes
## 72 72 16 51.57 Month-to-month No
## 73 73 49 75.46 One year Yes
## 74 74 23 89.42 Two year Yes
## 75 75 22 86.92 One year No
## 76 76 40 94.11 Two year Yes
## 77 77 33 73.91 Month-to-month Yes
## 78 78 56 54.97 One year Yes
## 79 79 5 76.33 Two year Yes
## 80 80 25 90.99 Two year Yes
## 81 81 51 64.07 Month-to-month No
## 82 82 48 51.68 Two year Yes
## 83 83 14 27.45 Two year No
## 84 84 46 45.40 Month-to-month No
## 85 85 54 90.77 Two year No
## 86 86 57 65.79 Month-to-month Yes
## 87 87 22 61.03 Month-to-month Yes
## 88 88 28 53.73 One year Yes
## 89 89 28 60.05 Month-to-month Yes
## 90 90 19 NA Two year No
## 91 91 47 NA One year Yes
## 92 92 38 29.03 One year No
## 93 93 1 66.34 Two year Yes
## 94 94 33 81.98 Two year No
## 95 95 29 48.52 Two year Yes
## 96 96 8 52.56 One year Yes
## 97 97 45 111.56 Two year Yes
## 98 98 53 59.50 Month-to-month No
## 99 99 29 60.34 Two year No
## 100 100 31 60.17 Month-to-month No
## 101 101 51 62.68 Two year No
## 102 102 45 37.16 Two year Yes
## 103 103 4 52.19 One year No
## 104 104 56 68.54 One year No
## 105 105 51 70.21 One year Yes
## 106 106 19 78.47 Two year Yes
## 107 107 21 69.02 Month-to-month Yes
## 108 108 8 58.59 Two year No
## 109 109 10 76.01 One year No
## 110 110 18 47.36 Month-to-month Yes
## 111 111 17 109.45 Two year Yes
## 112 112 57 64.15 Two year No
## 113 113 35 57.52 Month-to-month Yes
## 114 114 52 33.38 Month-to-month No
## 115 115 44 101.08 One year No
## 116 116 22 65.55 One year No
## 117 117 35 66.91 Two year Yes
## 118 118 30 NA One year No
## 119 119 37 73.39 Two year Yes
## 120 120 10 81.95 Month-to-month Yes
## 121 121 12 85.88 Month-to-month No
## 122 122 27 63.99 Two year Yes
## 123 123 50 48.87 Month-to-month Yes
## 124 124 31 91.33 Two year Yes
## 125 125 13 43.47 Month-to-month No
## 126 126 43 40.06 Month-to-month No
## 127 127 49 62.28 Month-to-month Yes
## 128 128 54 58.20 Two year No
## 129 129 30 70.17 One year Yes
## 130 130 35 81.87 Month-to-month Yes
## 131 131 43 46.93 Month-to-month No
## 132 132 3 72.48 One year Yes
## 133 133 56 59.20 One year No
## 134 134 3 90.66 One year Yes
## 135 135 21 74.80 Two year Yes
## 136 136 2 58.08 Month-to-month No
## 137 137 27 NA Month-to-month No
## 138 138 44 57.12 Two year No
## 139 139 39 55.78 Two year No
## 140 140 8 66.48 One year No
## 141 141 22 74.01 Month-to-month No
## 142 142 16 26.20 Two year Yes
## 143 143 34 53.24 One year No
## 144 144 4 81.79 Two year No
## 145 145 44 82.38 One year No
## 146 146 15 52.60 One year No
## 147 147 13 90.15 Two year No
## 148 148 21 77.27 Two year No
## 149 149 3 63.18 Two year No
## 150 150 31 85.65 Month-to-month Yes
## 151 151 31 79.17 Month-to-month No
## 152 152 42 13.08 Month-to-month No
## 153 153 1 NA One year Yes
## 154 154 53 76.38 Month-to-month No
## 155 155 27 51.27 Month-to-month Yes
## 156 156 51 83.46 One year No
## 157 157 50 78.84 One year No
## 158 158 3 76.97 Month-to-month No
## 159 159 49 49.48 One year No
## 160 160 54 93.60 Two year No
## 161 161 37 65.55 Two year Yes
## 162 162 7 31.73 Two year No
## 163 163 10 96.88 One year No
## 164 164 51 82.24 Month-to-month Yes
## 165 165 43 74.98 Two year No
## 166 166 58 56.00 One year Yes
## 167 167 17 72.09 Two year No
## 168 168 41 65.44 Month-to-month Yes
## 169 169 6 104.19 Month-to-month No
## 170 170 21 50.49 Two year No
## 171 171 11 94.68 One year Yes
## 172 172 53 62.76 Month-to-month Yes
## 173 173 42 86.30 Month-to-month Yes
## 174 174 15 41.25 Month-to-month No
## 175 175 43 85.74 Two year No
## 176 176 9 78.35 Two year No
## 177 177 57 83.32 Month-to-month No
## 178 178 48 67.86 Month-to-month Yes
## 179 179 39 NA Two year Yes
## 180 180 18 75.40 One year No
## 181 181 20 81.21 One year Yes
## 182 182 3 66.92 Month-to-month No
## 183 183 58 63.79 Two year No
## 184 184 5 90.14 One year Yes
## 185 185 59 77.81 Two year No
## 186 186 6 35.44 Month-to-month No
## 187 187 54 119.57 Month-to-month No
## 188 188 43 100.70 One year No
## 189 189 23 81.21 One year Yes
## 190 190 16 57.52 Two year No
## 191 191 54 40.96 One year Yes
## 192 192 34 73.97 Month-to-month No
## 193 193 34 121.40 Two year No
## 194 194 3 29.87 Two year No
## 195 195 24 NA One year Yes
## 196 196 27 62.30 Two year Yes
## 197 197 27 NA Month-to-month Yes
## 198 198 27 104.31 One year Yes
## 199 199 50 61.03 Month-to-month Yes
## 200 200 9 63.47 Month-to-month Yes
## 201 201 42 91.15 Month-to-month Yes
## 202 202 32 35.81 Two year Yes
## 203 203 8 62.90 Month-to-month No
## 204 204 40 78.80 One year Yes
## 205 205 47 57.06 Month-to-month Yes
## 206 206 28 85.18 Two year Yes
## 207 207 39 77.17 Month-to-month No
## 208 208 16 71.90 Two year Yes
## 209 209 55 86.52 Two year No
## 210 210 24 91.93 One year No
## 211 211 1 71.79 One year No
## 212 212 22 52.00 One year No
## 213 213 35 63.46 One year Yes
## 214 214 44 114.32 One year No
## 215 215 9 69.37 One year Yes
## 216 216 35 84.74 One year Yes
## 217 217 50 83.47 Month-to-month No
## 218 218 16 103.75 Two year Yes
## 219 219 33 48.24 Month-to-month No
## 220 220 45 57.02 Month-to-month Yes
## 221 221 9 60.29 Two year Yes
## 222 222 57 49.18 Month-to-month Yes
## 223 223 32 60.30 One year No
## 224 224 55 78.67 Month-to-month Yes
## 225 225 12 64.80 One year No
## 226 226 53 64.08 Month-to-month Yes
## 227 227 22 57.56 Two year No
## 228 228 25 57.48 One year Yes
## 229 229 53 NA One year No
## 230 230 46 72.99 Month-to-month Yes
## 231 231 27 63.68 One year No
## 232 232 41 54.32 Month-to-month No
## 233 233 36 68.72 Two year Yes
## 234 234 27 75.37 One year Yes
## 235 235 29 71.38 Two year No
## 236 236 45 40.60 One year No
## 237 237 20 46.90 One year No
## 238 238 45 61.20 Month-to-month No
## 239 239 13 76.04 Two year No
## 240 240 17 23.92 Month-to-month Yes
## 241 241 18 93.38 Two year Yes
## 242 242 15 43.47 Two year No
## 243 243 48 38.55 Month-to-month No
## 244 244 56 NA Month-to-month No
## 245 245 22 90.97 Two year No
## 246 246 6 37.38 Month-to-month No
## 247 247 4 65.63 Month-to-month No
## 248 248 31 53.86 Month-to-month Yes
## 249 249 15 78.90 Two year Yes
## 250 250 40 65.69 One year No
## 251 251 16 95.25 Two year No
## 252 252 3 116.10 Month-to-month No
## 253 253 53 49.82 Month-to-month Yes
## 254 254 49 49.62 Two year Yes
## 255 255 43 65.24 Month-to-month No
## 256 256 9 NA Two year Yes
## 257 257 54 75.03 Two year Yes
## 258 258 31 98.45 Two year Yes
## 259 259 48 55.99 Two year Yes
## 260 260 39 86.11 One year No
## 261 261 29 81.87 One year Yes
## 262 262 16 49.73 Month-to-month No
## 263 263 47 103.23 Month-to-month Yes
## 264 264 57 99.59 Two year Yes
## 265 265 41 75.08 One year Yes
## 266 266 17 62.79 One year No
## 267 267 56 99.22 Month-to-month Yes
## 268 268 40 100.42 One year Yes
## 269 269 43 94.87 One year No
## 270 270 2 65.74 Month-to-month Yes
## 271 271 25 68.85 Month-to-month No
## 272 272 26 56.34 Two year Yes
## 273 273 44 54.05 Two year Yes
## 274 274 50 55.32 Month-to-month Yes
## 275 275 5 115.91 Month-to-month Yes
## 276 276 54 129.33 One year No
## 277 277 15 82.67 One year Yes
## 278 278 3 77.84 Month-to-month No
## 279 279 58 47.98 Two year Yes
## 280 280 11 83.75 Month-to-month No
## 281 281 55 48.17 One year Yes
## 282 282 32 83.25 Month-to-month No
## 283 283 48 104.98 Month-to-month Yes
## 284 284 41 41.13 Month-to-month Yes
## 285 285 10 43.67 Month-to-month Yes
## 286 286 15 60.07 Month-to-month Yes
## 287 287 37 67.56 One year Yes
## 288 288 5 26.30 Month-to-month No
## 289 289 43 90.30 Month-to-month Yes
## 290 290 44 51.83 Month-to-month No
## 291 291 48 76.11 Month-to-month Yes
## 292 292 20 54.11 Two year Yes
## 293 293 4 62.29 Month-to-month Yes
## 294 294 57 51.76 Month-to-month No
## 295 295 24 15.96 Month-to-month Yes
## 296 296 42 53.56 Two year No
## 297 297 49 88.72 Two year No
## 298 298 36 45.36 One year No
## 299 299 49 NA One year No
## 300 300 44 72.57 One year No
## 301 301 30 56.41 Two year No
## 302 302 37 53.34 Two year Yes
## 303 303 29 72.44 Two year Yes
## 304 304 18 36.17 One year No
## 305 305 26 58.92 One year No
## 306 306 30 NA One year Yes
## 307 307 12 80.11 Two year No
## 308 308 29 68.26 One year No
## 309 309 11 51.63 Month-to-month Yes
## 310 310 39 54.70 One year No
## 311 311 35 90.32 Two year No
## 312 312 8 52.56 Two year Yes
## 313 313 37 57.81 Month-to-month No
## 314 314 19 91.94 Two year No
## 315 315 30 24.53 Month-to-month Yes
## 316 316 41 38.28 Month-to-month Yes
## 317 317 57 21.10 Two year Yes
## 318 318 43 32.27 One year No
## 319 319 51 60.55 One year Yes
## 320 320 59 67.02 Month-to-month No
## 321 321 19 62.32 Two year Yes
## 322 322 45 62.10 One year Yes
## 323 323 55 62.48 Two year Yes
## 324 324 25 39.37 Month-to-month No
## 325 325 27 83.79 One year Yes
## 326 326 15 37.03 One year No
## 327 327 39 76.27 Month-to-month No
## 328 328 29 NA Two year Yes
## 329 329 54 52.07 Month-to-month Yes
## 330 330 21 77.36 Two year No
## 331 331 14 71.69 Two year Yes
## 332 332 23 81.98 Two year Yes
## 333 333 10 69.96 Two year Yes
## 334 334 34 76.99 Two year Yes
## 335 335 35 22.65 One year No
## 336 336 11 60.06 Two year No
## 337 337 16 63.46 Month-to-month No
## 338 338 2 57.97 Two year No
## 339 339 50 137.16 Month-to-month No
## 340 340 17 95.76 One year No
## 341 341 15 63.09 Month-to-month No
## 342 342 36 46.82 One year Yes
## 343 343 50 61.78 One year No
## 344 344 32 81.49 Month-to-month Yes
## 345 345 44 33.70 One year Yes
## 346 346 26 72.79 Two year Yes
## 347 347 29 70.10 One year No
## 348 348 14 NA One year No
## 349 349 15 50.04 Two year No
## 350 350 34 66.90 Two year No
## 351 351 26 67.22 One year Yes
## 352 352 10 64.83 One year Yes
## 353 353 1 67.50 One year No
## 354 354 55 101.56 One year No
## 355 355 8 NA Two year Yes
## 356 356 42 68.06 Month-to-month No
## 357 357 39 31.13 One year No
## 358 358 43 47.31 Two year No
## 359 359 31 84.13 One year No
## 360 360 30 99.49 One year No
## 361 361 41 106.14 Month-to-month No
## 362 362 25 69.96 Two year No
## 363 363 18 76.44 Two year Yes
## 364 364 7 69.11 One year No
## 365 365 11 71.57 Two year No
## 366 366 12 93.67 Two year Yes
## 367 367 5 70.94 Month-to-month Yes
## 368 368 6 49.97 One year Yes
## 369 369 22 34.93 One year Yes
## 370 370 55 82.82 One year Yes
## 371 371 52 41.63 Month-to-month Yes
## 372 372 18 77.20 Two year No
## 373 373 25 78.37 Two year No
## 374 374 1 NA One year No
## 375 375 51 48.97 One year No
## 376 376 9 60.69 One year Yes
## 377 377 35 74.45 Two year No
## 378 378 17 74.78 One year No
## 379 379 15 89.34 One year No
## 380 380 37 65.66 One year No
## 381 381 25 85.38 Two year Yes
## 382 382 43 53.61 Month-to-month No
## 383 383 3 106.75 Two year No
## 384 384 21 80.23 Two year No
## 385 385 22 25.70 Two year Yes
## 386 386 17 62.82 Month-to-month No
## 387 387 35 89.73 Month-to-month No
## 388 388 53 NA One year Yes
## 389 389 3 103.85 Two year No
## 390 390 53 55.24 Two year Yes
## 391 391 29 77.06 Month-to-month No
## 392 392 14 67.17 One year No
## 393 393 42 70.35 Month-to-month Yes
## 394 394 18 56.68 Two year Yes
## 395 395 16 30.97 Two year Yes
## 396 396 28 88.76 One year No
## 397 397 53 56.88 Month-to-month No
## 398 398 27 84.70 Month-to-month Yes
## 399 399 41 NA Month-to-month No
## 400 400 59 104.29 Month-to-month Yes
## tech_support churn total_charges
## 1 Yes Yes 3691.04
## 2 Yes Yes 2114.31
## 3 No No 2915.41
## 4 Yes No 57.15
## 5 No No 1259.07
## 6 Yes No 1846.72
## 7 No No 1092.90
## 8 Yes No 542.48
## 9 Yes No 296.32
## 10 Yes No 551.88
## 11 Yes Yes 2106.81
## 12 No No 2135.23
## 13 Yes No 621.59
## 14 No No 3447.74
## 15 No No 1999.76
## 16 Yes No 2310.44
## 17 No No 440.77
## 18 Yes Yes 4488.92
## 19 Yes No 1295.28
## 20 Yes Yes 2499.29
## 21 Yes No 2533.76
## 22 No No 1698.45
## 23 Yes No 1842.28
## 24 Yes Yes 647.74
## 25 No Yes 2355.42
## 26 Yes Yes 1313.89
## 27 No Yes 707.04
## 28 Yes No 1712.88
## 29 No No 58.64
## 30 Yes Yes 2695.03
## 31 Yes Yes 854.04
## 32 Yes No 399.58
## 33 No No 3274.77
## 34 No No 3323.81
## 35 Yes No 2016.07
## 36 No Yes 4452.85
## 37 Yes No 667.95
## 38 No Yes 892.55
## 39 No No 2381.27
## 40 Yes No 1151.78
## 41 No No 1853.56
## 42 No No 374.74
## 43 No No 890.24
## 44 Yes No 372.93
## 45 Yes No 1692.48
## 46 Yes No 1305.07
## 47 No Yes 3568.65
## 48 No No 545.21
## 49 No No 2174.75
## 50 No Yes 364.29
## 51 No Yes 4123.53
## 52 Yes No 1939.21
## 53 Yes Yes 2884.25
## 54 No Yes 2384.10
## 55 No No 4910.25
## 56 Yes No 3272.61
## 57 Yes No 3185.87
## 58 Yes Yes 1695.50
## 59 Yes No 1224.72
## 60 No No 211.36
## 61 No No 3750.18
## 62 Yes Yes 464.30
## 63 No No 1279.29
## 64 Yes No 552.53
## 65 Yes No 2305.49
## 66 Yes No 1837.04
## 67 No No 2612.22
## 68 No No 3622.91
## 69 Yes No 2724.58
## 70 Yes No 2157.15
## 71 Yes No 1232.41
## 72 Yes No 803.06
## 73 Yes Yes 3527.17
## 74 Yes No 1967.53
## 75 Yes No 2098.09
## 76 Yes Yes 3497.02
## 77 No No 2535.89
## 78 No Yes 3020.74
## 79 No Yes 381.50
## 80 No No 2158.19
## 81 Yes Yes 3514.90
## 82 No No 2339.99
## 83 Yes Yes 387.62
## 84 Yes Yes 2138.94
## 85 No No 4937.43
## 86 Yes No 3396.48
## 87 Yes Yes 1266.50
## 88 Yes No 1531.76
## 89 Yes No 1599.59
## 90 Yes No 1599.48
## 91 No No 3290.23
## 92 Yes Yes 1133.89
## 93 Yes Yes 65.59
## 94 Yes No 2660.40
## 95 No No 1301.97
## 96 Yes No 392.93
## 97 Yes No 4530.82
## 98 No No 2982.99
## 99 No No 1781.79
## 100 No No 1802.34
## 101 No No 3514.94
## 102 No No 1572.10
## 103 Yes Yes 214.71
## 104 Yes No 3657.09
## 105 Yes No 3514.09
## 106 No No 1623.77
## 107 Yes Yes 1559.61
## 108 No No 478.94
## 109 Yes No 742.49
## 110 No No 817.13
## 111 Yes No 1801.68
## 112 No Yes 3350.63
## 113 Yes No 1963.81
## 114 No No 1613.89
## 115 Yes No 4546.62
## 116 Yes No 1374.66
## 117 Yes No 2145.34
## 118 No No 2092.52
## 119 Yes No 2579.30
## 120 Yes No 881.02
## 121 No Yes 1040.46
## 122 No No 1774.49
## 123 No No 2493.35
## 124 No No 2735.23
## 125 No Yes 583.99
## 126 No No 1642.37
## 127 Yes No 2880.42
## 128 No No 3417.13
## 129 No No 2148.03
## 130 Yes No 3006.07
## 131 Yes Yes 2044.35
## 132 No No 213.92
## 133 Yes No 3026.48
## 134 No Yes 254.11
## 135 Yes No 1489.28
## 136 No No 119.73
## 137 No No 1557.65
## 138 Yes No 2638.46
## 139 No No 2130.23
## 140 Yes No 576.72
## 141 No No 1503.04
## 142 No No 406.11
## 143 Yes Yes 1981.67
## 144 No No 356.94
## 145 No No 3421.98
## 146 Yes No 775.22
## 147 Yes No 1199.48
## 148 Yes No 1564.62
## 149 No No 177.72
## 150 No Yes 2397.76
## 151 No Yes 2548.55
## 152 No No 517.78
## 153 No No 81.35
## 154 No Yes 4216.19
## 155 Yes Yes 1449.90
## 156 Yes Yes 3857.58
## 157 No No 3901.66
## 158 Yes No 232.23
## 159 No Yes 2600.89
## 160 No No 4889.94
## 161 Yes No 2433.55
## 162 Yes No 236.63
## 163 Yes No 922.25
## 164 No Yes 4301.23
## 165 No No 3333.42
## 166 No No 3220.07
## 167 Yes No 1168.31
## 168 No Yes 2568.99
## 169 Yes No 664.70
## 170 Yes No 1131.19
## 171 Yes Yes 1025.40
## 172 No Yes 3480.77
## 173 No No 3559.65
## 174 No No 559.05
## 175 Yes No 3341.23
## 176 No Yes 643.46
## 177 Yes Yes 4608.38
## 178 No No 3201.20
## 179 Yes No 2938.08
## 180 No No 1477.88
## 181 Yes No 1656.81
## 182 Yes No 187.89
## 183 No No 3696.04
## 184 No No 482.20
## 185 No No 4836.18
## 186 No No 212.53
## 187 Yes No 5955.77
## 188 No No 4031.48
## 189 Yes No 1736.52
## 190 Yes Yes 968.85
## 191 Yes Yes 2426.85
## 192 No No 2703.32
## 193 No No 3815.08
## 194 No No 85.18
## 195 Yes No 1047.77
## 196 Yes Yes 1649.39
## 197 Yes No 1047.64
## 198 No No 3090.90
## 199 No Yes 2915.58
## 200 No No 522.71
## 201 No No 4145.29
## 202 No Yes 1037.66
## 203 No No 461.82
## 204 No No 3399.51
## 205 No No 2893.77
## 206 No Yes 2617.07
## 207 Yes No 3281.83
## 208 No Yes 1166.69
## 209 No No 4665.48
## 210 Yes No 2066.10
## 211 Yes No 68.18
## 212 Yes No 1144.96
## 213 No No 2191.97
## 214 Yes No 4591.68
## 215 No No 605.85
## 216 No No 2873.86
## 217 Yes Yes 3783.36
## 218 Yes Yes 1642.96
## 219 No No 1447.36
## 220 Yes No 2540.47
## 221 No No 588.37
## 222 No No 2738.20
## 223 No No 1751.16
## 224 No No 3918.10
## 225 Yes No 801.02
## 226 Yes No 3646.96
## 227 Yes Yes 1235.34
## 228 No Yes 1431.47
## 229 Yes Yes 4364.65
## 230 Yes No 3366.68
## 231 Yes No 1835.25
## 232 No Yes 2159.21
## 233 Yes Yes 2522.45
## 234 No No 1900.61
## 235 Yes No 2122.74
## 236 No No 1775.37
## 237 Yes No 1031.59
## 238 Yes No 2911.20
## 239 Yes No 955.96
## 240 No Yes 433.79
## 241 Yes No 1642.13
## 242 Yes No 686.81
## 243 No No 1751.71
## 244 No Yes 3540.94
## 245 Yes No 2017.06
## 246 No No 219.44
## 247 Yes No 280.28
## 248 Yes Yes 1609.13
## 249 Yes No 1192.68
## 250 No No 2731.86
## 251 Yes No 1419.29
## 252 Yes No 347.52
## 253 No No 2701.38
## 254 No No 2639.72
## 255 Yes No 2553.15
## 256 Yes No 582.43
## 257 Yes Yes 3869.08
## 258 Yes Yes 3151.67
## 259 No No 2798.73
## 260 No No 3286.17
## 261 Yes No 2408.16
## 262 Yes No 844.13
## 263 No Yes 5263.02
## 264 Yes No 6107.17
## 265 No Yes 2813.32
## 266 No No 1158.82
## 267 Yes No 5039.21
## 268 Yes No 4222.66
## 269 Yes No 4146.27
## 270 Yes No 140.74
## 271 No No 1697.42
## 272 No No 1363.39
## 273 No No 2322.93
## 274 Yes No 2621.37
## 275 No No 564.59
## 276 No No 7023.40
## 277 Yes No 1271.58
## 278 Yes No 251.73
## 279 No No 3025.16
## 280 No Yes 980.39
## 281 Yes Yes 2449.76
## 282 Yes No 2638.06
## 283 No No 5330.28
## 284 Yes No 1792.01
## 285 Yes No 437.20
## 286 No Yes 835.65
## 287 Yes No 2739.73
## 288 No No 141.04
## 289 No No 3934.86
## 290 Yes No 2237.58
## 291 No No 3558.06
## 292 Yes No 1084.22
## 293 Yes Yes 255.42
## 294 No No 2794.20
## 295 Yes No 401.31
## 296 No No 2047.57
## 297 Yes No 4218.24
## 298 No No 1549.11
## 299 No No 3401.60
## 300 No No 3179.91
## 301 No No 1566.94
## 302 Yes Yes 2142.95
## 303 No Yes 2166.68
## 304 No No 618.37
## 305 Yes No 1655.20
## 306 Yes Yes 2212.42
## 307 Yes Yes 949.21
## 308 Yes No 2047.16
## 309 Yes Yes 516.85
## 310 No No 2220.67
## 311 No No 3023.89
## 312 Yes No 400.07
## 313 Yes No 2246.43
## 314 Yes No 1859.82
## 315 Yes No 773.72
## 316 Yes Yes 1654.47
## 317 No Yes 1270.06
## 318 No Yes 1296.10
## 319 Yes No 3254.47
## 320 No Yes 4137.79
## 321 Yes No 1108.25
## 322 Yes Yes 2574.16
## 323 No No 3557.54
## 324 No No 1062.64
## 325 Yes No 2247.29
## 326 Yes Yes 557.25
## 327 No Yes 2690.08
## 328 No No 1177.60
## 329 Yes No 2968.28
## 330 No No 1483.23
## 331 Yes No 1099.89
## 332 Yes No 1816.26
## 333 Yes No 733.20
## 334 No Yes 2412.09
## 335 Yes No 823.17
## 336 No No 638.02
## 337 No No 1086.30
## 338 No No 109.11
## 339 No Yes 7310.18
## 340 No Yes 1719.93
## 341 Yes Yes 883.63
## 342 No No 1841.36
## 343 No Yes 2971.16
## 344 Yes No 2442.89
## 345 Yes No 1399.91
## 346 No Yes 1884.15
## 347 Yes No 1924.85
## 348 No No 1195.35
## 349 No No 725.16
## 350 Yes Yes 2151.67
## 351 No Yes 1691.54
## 352 No No 616.87
## 353 Yes No 73.29
## 354 No Yes 5997.33
## 355 Yes No 480.24
## 356 No No 2954.26
## 357 Yes No 1247.87
## 358 Yes No 1992.68
## 359 Yes No 2375.45
## 360 No No 2704.00
## 361 Yes Yes 4620.40
## 362 No No 1633.63
## 363 Yes No 1289.60
## 364 No Yes 464.31
## 365 No No 848.31
## 366 Yes No 1189.77
## 367 Yes No 375.56
## 368 No Yes 283.57
## 369 Yes Yes 820.78
## 370 No Yes 4921.27
## 371 No No 2049.38
## 372 No No 1450.35
## 373 No No 1827.32
## 374 No No 68.32
## 375 No No 2654.76
## 376 Yes Yes 494.11
## 377 No No 2674.51
## 378 No Yes 1286.92
## 379 Yes No 1313.62
## 380 Yes No 2629.13
## 381 No No 2123.84
## 382 No Yes 2327.87
## 383 Yes No 300.50
## 384 Yes Yes 1758.85
## 385 Yes Yes 583.67
## 386 Yes No 1073.04
## 387 Yes No 2994.91
## 388 No No 3473.55
## 389 No Yes 310.55
## 390 Yes No 2934.33
## 391 No No 2314.64
## 392 Yes No 1030.17
## 393 Yes No 2742.07
## 394 Yes Yes 964.01
## 395 Yes No 487.61
## 396 No Yes 2496.74
## 397 No No 2839.64
## 398 Yes No 2170.52
## 399 Yes No 3679.29
## 400 No Yes 6427.26
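# Repeat for tenure_months, drawing a fresh ~5% sample of rows
# (no new set.seed() call, so this continues the seed-123 random stream)
missing_index <- sample(1:n, size = round(0.05 * n))
churn_work$tenure_months[missing_index] <- NA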
missing_counts <- colSums(is.na(churn_work))
missing_counts
## customer_id tenure_months monthly_charges contract_type online_security
## 0 20 20 0 0
## tech_support churn total_charges
## 0 0 0
missing_counts[missing_counts > 0]
## tenure_months monthly_charges
## 20 20
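Counts are easier to judge as percentages, and it also helps to know how many rows are complete across all columns. Because the two columns were sampled separately, the incomplete rows may number anywhere from 20 to 40. A small sketch on a hypothetical four-row data frame (`toy` is not the lab data):

```r
# Hypothetical data frame with NAs in two numeric columns
toy <- data.frame(
  tenure_months   = c(49, NA, 35, 1),
  monthly_charges = c(68.63, 77.40, NA, NA),
  churn           = c("Yes", "Yes", "No", "No")
)

# Percentage of missing values per column
missing_pct <- round(100 * colMeans(is.na(toy)), 1)
missing_pct

# Number of rows with no missing value in any column
complete_rows <- sum(complete.cases(toy))
complete_rows
```

Applied to `churn_work`, the same two calls would report the 5% per-column missingness and the number of rows that listwise deletion would keep.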
summary(churn_work)
## customer_id tenure_months monthly_charges contract_type
## Min. : 1.0 Min. : 1.00 Min. : 13.08 Length:400
## 1st Qu.:100.8 1st Qu.:16.00 1st Qu.: 54.09 Class :character
## Median :200.5 Median :30.00 Median : 66.92 Mode :character
## Mean :200.5 Mean :30.03 Mean : 67.97
## 3rd Qu.:300.2 3rd Qu.:44.00 3rd Qu.: 81.89
## Max. :400.0 Max. :59.00 Max. :137.16
## NA's :20 NA's :20
## online_security tech_support churn total_charges
## Length:400 Length:400 Length:400 Min. : 57.15
## Class :character Class :character Class :character 1st Qu.: 962.00
## Mode :character Mode :character Mode :character Median :1841.82
## Mean :2021.82
## 3rd Qu.:2848.20
## Max. :7310.18
##
# Use tenure_months, which now contains NAs, as the example vector
x <- churn_work$tenure_months
# Mean without removing NAs (result is NA)
mean(x)
## [1] NA
# Mean after removing NAs (result is 30.02895)
mean(x, na.rm = TRUE)
## [1] 30.02895
Interpretation (fill in):
- Which variables have missing values? tenure_months and monthly_charges
each have 20 missing values.
- Do the missing values appear minor or substantial? Minor: 20 of 400
observations (5%) in each variable.
- What would you do about them (omit / impute / investigate)?
Impute.
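If imputation is the chosen route, median imputation is one simple option for a skewed numeric column. The sketch below is only an illustration on a made-up vector; `impute_median` is a hypothetical helper, and the lab does not prescribe a particular imputation method.

```r
# Replace NAs with the median of the observed values
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Hypothetical charges vector with two missing entries
toy_charges <- c(68.63, NA, 80.22, 58.44, NA, 61.05)

# Keep a record of which entries were imputed before overwriting them
was_missing <- is.na(toy_charges)
filled <- impute_median(toy_charges)
filled
```

Keeping `was_missing` alongside the filled column makes it possible to check later whether imputed rows behave differently from observed ones.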
An outlier is defined by:
- Distance from the central mass of data (standard deviation, quartiles, mean)
- Support from a plot or statistic. A boxplot with observations above the upper whisker or below the lower whisker suggests potential high-end or low-end outliers.
- Interpretation in business context
- No automatic action
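The whisker rule behind a boxplot can be written out explicitly, which makes it easy to list the flagged values instead of only seeing them as points. This is a sketch on made-up numbers; `iqr_fences` is a hypothetical helper, and because it uses `quantile()` its fences can differ slightly from the hinges that `boxplot()` draws.

```r
# Fences at k * IQR beyond the first and third quartiles
iqr_fences <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  c(lower = q[1] - k * spread, upper = q[2] + k * spread)
}

# Hypothetical charges with one very large value
toy_charges <- c(50, 55, 60, 65, 70, 75, 80, 200)

fences <- iqr_fences(toy_charges)
fences

# Values outside the fences are candidates for review, not automatic removal
flagged <- toy_charges[toy_charges < fences["lower"] |
                       toy_charges > fences["upper"]]
flagged
```

In line with "no automatic action", the flagged values are only a shortlist to interpret in business context.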
# Choose numeric variables (excluding customer_id, which is likely an ID)
numeric_vars <- c("tenure_months", "monthly_charges", "total_charges")
# Summary stats
summary(churn[numeric_vars])
## tenure_months monthly_charges total_charges
## Min. : 1.00 Min. : 13.08 Min. : 57.15
## 1st Qu.:16.00 1st Qu.: 54.31 1st Qu.: 962.00
## Median :29.00 Median : 66.91 Median :1841.82
## Mean :29.79 Mean : 68.01 Mean :2021.82
## 3rd Qu.:44.00 3rd Qu.: 81.89 3rd Qu.:2848.20
## Max. :59.00 Max. :137.16 Max. :7310.18
# Same summary, selecting the columns by position instead of by name
summary(churn[c(2,3,8)])
## tenure_months monthly_charges total_charges
## Min. : 1.00 Min. : 13.08 Min. : 57.15
## 1st Qu.:16.00 1st Qu.: 54.31 1st Qu.: 962.00
## Median :29.00 Median : 66.91 Median :1841.82
## Mean :29.79 Mean : 68.01 Mean :2021.82
## 3rd Qu.:44.00 3rd Qu.: 81.89 3rd Qu.:2848.20
## Max. :59.00 Max. :137.16 Max. :7310.18
# Boxplots
par(mfrow=c(1,3))
boxplot(churn$tenure_months, main="tenure_months", ylab="months")
boxplot(churn$monthly_charges, main="monthly_charges", ylab="charges")
boxplot(churn$total_charges, main="total_charges", ylab="charges")
par(mfrow=c(1,1))
summary(churn$monthly_charges)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13.08 54.31 66.91 68.01 81.89 137.16
sd(churn$monthly_charges, na.rm = TRUE)
## [1] 21.01485
summary(churn$total_charges)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 57.15 962.00 1841.82 2021.82 2848.20 7310.18
sd(churn$total_charges, na.rm = TRUE)
## [1] 1343.435
Interpretation (fill in):
- Any extreme values or unusual distributions? Monthly charges show
moderate right skew, while total charges exhibit substantial right skew
with extreme high values.
- Would you transform or cap anything before modeling? Total charges
should be log-transformed (or capped) to reduce skewness, while monthly
charges can be left untransformed or lightly scaled.
If keeping, you can state: “Potential outliers were identified using boxplots and summary statistics; while a small number of extreme values appear, they are plausible given the business context and were therefore retained.”
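If the decision goes the other way, the two options mentioned for total_charges (log transform or capping) might look like the sketch below. The vector reuses the five summary values of total_charges printed earlier as a tiny stand-in; `cap_at` is a hypothetical helper, and the 75th-percentile cap is chosen only so the effect is visible on five points (a cap near the 99th percentile is more typical).

```r
# Stand-in vector: min, quartiles, and max of total_charges from the summary
toy_total <- c(57.15, 962.00, 1841.82, 2848.20, 7310.18)

# Option 1: log transform (log1p is safe if a value were exactly 0)
log_total <- log1p(toy_total)
round(log_total, 2)

# Option 2: cap values above a chosen upper quantile
cap_at <- function(x, p = 0.99) {
  upper <- quantile(x, probs = p, na.rm = TRUE, names = FALSE)
  pmin(x, upper)
}
capped_total <- cap_at(toy_total, p = 0.75)
capped_total
```

Both options preserve the ordering of customers up to the cap; the log keeps every distinct value, while capping collapses the extreme tail to a single number.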
cor_mat <- cor(churn[numeric_vars], use="pairwise.complete.obs")
round(cor_mat, 3)
## tenure_months monthly_charges total_charges
## tenure_months 1.000 0.011 0.842
## monthly_charges 0.011 1.000 0.469
## total_charges 0.842 0.469 1.000
Interpretation (fill in):
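- Are any numeric predictors strongly correlated? Yes: tenure_months and total_charges (r = 0.842). monthly_charges and total_charges are moderately correlated (r = 0.469); tenure_months and monthly_charges are essentially uncorrelated (r = 0.011).
- What might that imply for modeling later? tenure_months and total_charges carry overlapping information, so including both may introduce multicollinearity; consider keeping one of them or checking the effect of dropping the other.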
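With only three numeric variables the matrix can be read by eye, but the same check can be automated for wider datasets. The sketch below re-enters the printed correlation matrix by hand and lists pairs above a cutoff; the 0.8 threshold is an assumed rule of thumb, not something the lab specifies.

```r
# Correlation matrix copied from the output above
vars <- c("tenure_months", "monthly_charges", "total_charges")
cor_mat <- matrix(
  c(1.000, 0.011, 0.842,
    0.011, 1.000, 0.469,
    0.842, 0.469, 1.000),
  nrow = 3, byrow = TRUE, dimnames = list(vars, vars)
)

threshold <- 0.8  # assumed cutoff for "strongly correlated"

# Look only at the upper triangle so each pair is reported once
idx <- which(abs(cor_mat) > threshold & upper.tri(cor_mat), arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx]
)
high_pairs
```

Only the tenure_months and total_charges pair clears the cutoff here, which matches what the matrix shows directly.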
Below are sample plots. Add/replace with your own if you prefer.
# Histogram
hist(churn$monthly_charges, breaks=20,
main="Histogram: monthly_charges",
xlab="monthly_charges")
# Churn rate by contract type
churn_rate_by_contract <- tapply(churn01, churn$contract_type, mean, na.rm=TRUE)
barplot(churn_rate_by_contract,
main="Churn Rate by Contract Type",
ylab="Churn Rate", las=2)
Interpretation (fill in):
- Plot 1 shows that churn rate varies only slightly by contract type
(about 26% to 29%) and suggests that two-year contracts have marginally
lower churn.
- Plot 2 shows differences in customer tenure by churn
status and suggests that customers who churned have slightly
longer average tenure (32.4 vs. 28.8 months).
You must provide at least three distinct insights, each supported by a plot or statistic.
Insight statement (fill in):
> Customers with two-year contracts show a slightly lower churn
rate (26.1%) compared to customers on month-to-month
contracts (27.6%), which suggests that contract length has only a weak
association with retention in this dataset.
# Example idea: churn rate by contract_type (edit or replace)
ins1_tbl <- tapply(churn01, churn$contract_type, mean, na.rm=TRUE)
ins1_tbl
## Month-to-month One year Two year
## 0.2761194 0.2903226 0.2605634
barplot(ins1_tbl, main="Churn Rate by Contract Type", ylab="Churn Rate", las=2)
Insight statement (fill in):
> Customers with longer tenure show somewhat higher churn compared to
customers with shorter tenure (mean tenure of 32.4 months for churned
vs. 28.8 months for retained customers), which suggests that churn is
not concentrated early in the customer lifecycle.
# Example idea: compare tenure_months by churn status
mean_no <- mean(churn$tenure_months[churn01==0], na.rm=TRUE)
mean_yes <- mean(churn$tenure_months[churn01==1], na.rm=TRUE)
mean_no; mean_yes
## [1] 28.80345
## [1] 32.37273
boxplot(churn$tenure_months ~ churn01,
main="tenure_months by churn (0=No, 1=Yes)",
xlab="churn01", ylab="tenure_months")
Insight statement (fill in):
> Customers with online security show higher churn
rates (31.0% vs. 24.5%) compared to customers without online security, which suggests that
online security alone may not be sufficient to reduce churn and may be
correlated with higher-risk customer segments.
# Example idea: churn rate by online_security (Yes/No)
ins3_tbl <- tapply(churn01, churn$online_security, mean, na.rm=TRUE)
ins3_tbl
## No Yes
## 0.2453704 0.3097826
barplot(ins3_tbl, main="Churn Rate by Online Security", ylab="Churn Rate", las=2)
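To state the magnitude of this pattern, the two rates can be turned into an absolute gap and a ratio. The sketch below simply re-enters the two printed rates; the variable names are illustrative.

```r
# Churn rates copied from the ins3_tbl output above
rate_no  <- 0.2453704  # customers without online security
rate_yes <- 0.3097826  # customers with online security

# Absolute gap in percentage points
abs_diff_pts <- 100 * (rate_yes - rate_no)
round(abs_diff_pts, 1)

# Relative difference as a ratio of the two rates
rel_ratio <- rate_yes / rate_no
round(rel_ratio, 2)
```

That works out to a gap of roughly 6.4 percentage points, or a churn rate about 1.26 times as high, which is the kind of magnitude the sentence frame asks for.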
## 8) Three “substantive insights” scaffold (simple and repeatable) ----
# Insight Example A: churn rate by a categorical variable
# Replace contract_type with another category if needed.
if ("contract_type" %in% names(churn)) {
tbl <- table(churn$contract_type, churn01)
print(tbl)
# Churn rate by category (again, but now you can show counts too)
churn_rate <- tapply(churn01, churn$contract_type, mean, na.rm = TRUE)
print(churn_rate)
# You can verbalize: "Category X is about ___ compared to category Y"
}
## churn01
## 0 1
## Month-to-month 97 37
## One year 88 36
## Two year 105 37
## Month-to-month One year Two year
## 0.2761194 0.2903226 0.2605634
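Because the three rates are close, a quick check of whether the differences exceed sampling noise can help before building a narrative on them. The sketch below re-enters the printed counts and runs a chi-squared test of independence; the test is offered as one reasonable option, not as a step the lab requires.

```r
# Counts copied from the contract_type by churn01 table above
counts <- matrix(
  c(97, 37,
    88, 36,
    105, 37),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    contract_type = c("Month-to-month", "One year", "Two year"),
    churn01 = c("0", "1")
  )
)

# Row-wise churn rates (should reproduce the printed rates)
row_rates <- prop.table(counts, margin = 1)[, "1"]
round(row_rates, 3)

# Chi-squared test of independence between contract type and churn
contract_test <- chisq.test(counts)
contract_test$statistic
contract_test$p.value
```

With these counts the statistic is small and the p-value is far from conventional significance levels, so the contract-type differences should be described cautiously.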
# Insight Example B: numeric variable difference by churn status
# Use tenure_months (your note) if it exists.
if ("tenure_months" %in% names(churn)) {
# Compare group means
mean_churn0 <- mean(churn$tenure_months[churn01 == 0], na.rm = TRUE)
mean_churn1 <- mean(churn$tenure_months[churn01 == 1], na.rm = TRUE)
cat("\nMean tenure_months (no churn):", round(mean_churn0, 2), "\n")
cat("Mean tenure_months (churn): ", round(mean_churn1, 2), "\n")
# Simple boxplot by churn status
boxplot(churn$tenure_months ~ churn01,
main = "tenure_months by Churn (0=no, 1=yes)",
xlab = "churn01", ylab = "tenure_months")
}
##
## Mean tenure_months (no churn): 28.8
## Mean tenure_months (churn): 32.37
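A difference in group means is easier to interpret with an interval around it. The sketch below runs a Welch two-sample t-test on a small made-up data frame (`toy` is hypothetical and its values are not from the lab data); swapping in `churn$tenure_months` and `churn01` would apply the same idea to the real comparison.

```r
# Hypothetical tenure values for retained (0) and churned (1) customers
toy <- data.frame(
  tenure_months = c(12, 30, 45, 8, 27, 50, 33, 41, 19, 55),
  churn01       = c(0, 0, 1, 0, 0, 1, 1, 1, 0, 1)
)

# Group means, as in the example above
group_means <- tapply(toy$tenure_months, toy$churn01, mean)
group_means

# Welch t-test (unequal variances is the default in t.test)
welch <- t.test(tenure_months ~ churn01, data = toy)
welch$conf.int
welch$p.value
```

The confidence interval is for the difference in means (group 0 minus group 1); an interval that includes zero would suggest the gap may be noise.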
# Insight Example C: Create simple tenure groups WITHOUT cut() (very explicit)
if ("tenure_months" %in% names(churn)) {
tenure_group <- rep(NA, nrow(churn))
tenure_group[churn$tenure_months <= 12] <- "0-12"
tenure_group[churn$tenure_months > 12 & churn$tenure_months <= 36] <- "13-36"
tenure_group[churn$tenure_months > 36 & churn$tenure_months <= 72] <- "37-72"
tenure_group[churn$tenure_months > 72] <- "73+"
# Churn rate by tenure group
churn_rate_tenure <- tapply(churn01, tenure_group, mean, na.rm = TRUE)
print(churn_rate_tenure)
barplot(churn_rate_tenure, main = "Churn Rate by Tenure Group", ylab = "Churn Rate", las = 2)
}
## 0-12 13-36 37-72
## 0.2465753 0.2429379 0.3266667
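For comparison, the same grouping can be written with `cut()`, which the explicit version above deliberately avoids. This is a sketch on a made-up tenure vector; the breaks mirror the `<=` boundaries used above.

```r
# Hypothetical tenure values sitting on and around the group boundaries
toy_tenure <- c(1, 12, 13, 36, 37, 59)

# Intervals are right-closed by default: (0,12], (12,36], (36,72], (72,Inf]
tenure_group <- cut(
  toy_tenure,
  breaks = c(0, 12, 36, 72, Inf),
  labels = c("0-12", "13-36", "37-72", "73+")
)

# table() keeps the empty "73+" level; droplevels() would remove it
table(tenure_group)
```

One visible difference from the explicit version is that `cut()` returns a factor, so an unused group such as "73+" still appears with a count of zero unless it is dropped.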
Write 1–2 paragraphs answering:
- How would your findings influence feature selection?
- How would your findings influence modeling choices?
- How would your findings influence preprocessing decisions?
Your response (fill in):
The exploratory findings would guide feature selection by emphasizing variables related to customer commitment and service experience. Contract type and tenure show only modest differences in churn (roughly 26% to 29% across contract types; mean tenure of 32.4 vs. 28.8 months for churned vs. retained customers) but remain natural candidate predictors, while service features such as online security and tech support may capture additional variation in churn risk. Identifier variables such as customer ID would be excluded, and highly correlated features like tenure and total charges (r = 0.842) would be evaluated carefully to avoid redundancy or multicollinearity.