I am doing this analysis to see if there is a correlation between In State Tuition and the Average ACT Score of accepted students.
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
educ <- read.csv("/Users/kimberlyhatlestad/Data Mining/Educ4HW.csv")
head(educ)
## College State Public MathSAT VerbSAT ACT Received
## 1 Christendom College VA 0 568 568 23 81
## 2 Mount Vernon College DC 0 440 402 20 149
## 3 King's College NY 0 442 465 25 356
## 4 Alaska Pacific University AK 0 490 482 20 193
## 5 Sierra Nevada College NV 0 400 400 21 200
## 6 Trinity College DC 0 470 480 22 247
## Accepted Enrolled Pct10 Pct25 FTUG PTUG ISTuit OSTuit Books PhDs SFRatio
## 1 72 51 33 71 139 3 8730 8730 400 92 9.3
## 2 70 61 15 35 203 138 13780 13780 500 84 5.7
## 3 233 53 16 24 246 18 8190 8190 500 100 7.0
## 4 146 55 16 44 249 869 7560 7560 800 76 11.9
## 5 160 100 5 25 300 50 7500 7500 500 45 7.5
## 6 189 100 19 49 309 639 11412 11412 500 89 8.3
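Getting rid of the alphanumeric State variable. The College name stays in the data frame so it can be used for labels, and it is left out of the clustering step with educ[,-1].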
educ <- educ[,-2]
head(educ)
## College Public MathSAT VerbSAT ACT Received Accepted
## 1 Christendom College 0 568 568 23 81 72
## 2 Mount Vernon College 0 440 402 20 149 70
## 3 King's College 0 442 465 25 356 233
## 4 Alaska Pacific University 0 490 482 20 193 146
## 5 Sierra Nevada College 0 400 400 21 200 160
## 6 Trinity College 0 470 480 22 247 189
## Enrolled Pct10 Pct25 FTUG PTUG ISTuit OSTuit Books PhDs SFRatio
## 1 51 33 71 139 3 8730 8730 400 92 9.3
## 2 61 15 35 203 138 13780 13780 500 84 5.7
## 3 53 16 24 246 18 8190 8190 500 100 7.0
## 4 55 16 44 249 869 7560 7560 800 76 11.9
## 5 100 5 25 300 50 7500 7500 500 45 7.5
## 6 100 19 49 309 639 11412 11412 500 89 8.3
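Dropping a column by position, as in educ[,-2], depends on the column order of the CSV. A minimal sketch of a name-based alternative follows. It uses a few columns from the six rows printed above as a stand-in for the full file; the names `toy` and `numeric_cols` are illustrative and not part of the original analysis.

```r
# Stand-in for educ: a few columns from the six rows shown by head(educ)
toy <- data.frame(
  College = c("Christendom College", "Mount Vernon College", "King's College",
              "Alaska Pacific University", "Sierra Nevada College",
              "Trinity College"),
  State   = c("VA", "DC", "NY", "AK", "NV", "DC"),
  ACT     = c(23, 20, 25, 20, 21, 22),
  ISTuit  = c(8730, 13780, 8190, 7560, 7500, 11412),
  stringsAsFactors = FALSE
)

# Drop State by name rather than by position
toy <- toy[, names(toy) != "State"]

# List the numeric columns, i.e. the ones that can be passed to kmeans()
numeric_cols <- names(toy)[sapply(toy, is.numeric)]
print(numeric_cols)
```

On the full data the same idea would read educ[, names(educ) != "State"], which keeps working if the column order of the file changes.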
set.seed(1)
grpeduc<-kmeans(educ[,-1],centers=8,nstart=10)
Cluseduc<-cbind(educ,cluster=grpeduc$cluster)
head(Cluseduc)
## College Public MathSAT VerbSAT ACT Received Accepted
## 1 Christendom College 0 568 568 23 81 72
## 2 Mount Vernon College 0 440 402 20 149 70
## 3 King's College 0 442 465 25 356 233
## 4 Alaska Pacific University 0 490 482 20 193 146
## 5 Sierra Nevada College 0 400 400 21 200 160
## 6 Trinity College 0 470 480 22 247 189
## Enrolled Pct10 Pct25 FTUG PTUG ISTuit OSTuit Books PhDs SFRatio cluster
## 1 51 33 71 139 3 8730 8730 400 92 9.3 8
## 2 61 15 35 203 138 13780 13780 500 84 5.7 3
## 3 53 16 24 246 18 8190 8190 500 100 7.0 8
## 4 55 16 44 249 869 7560 7560 800 76 11.9 8
## 5 100 5 25 300 50 7500 7500 500 45 7.5 8
## 6 100 19 49 309 639 11412 11412 500 89 8.3 3
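kmeans() minimizes within-cluster sums of squared Euclidean distances, so on raw data the variables measured in thousands (such as tuition) carry far more weight than ACT or SFRatio. One common option, not used in the analysis above, is to standardize the columns with scale() first. The sketch below does this on four columns from the six rows shown by head(educ). It uses two centers only because six rows cannot support eight, and the names `toy`, `toy_scaled`, and `fit` are hypothetical.

```r
# Four numeric columns from the six rows printed by head(educ).
# Public is left out here because it is constant (0) in these six rows,
# and scale() would turn a constant column into NaN.
toy <- data.frame(
  ACT     = c(23, 20, 25, 20, 21, 22),
  ISTuit  = c(8730, 13780, 8190, 7560, 7500, 11412),
  PhDs    = c(92, 84, 100, 76, 45, 89),
  SFRatio = c(9.3, 5.7, 7.0, 11.9, 7.5, 8.3)
)

# Standardize each column to mean 0 and standard deviation 1
toy_scaled <- scale(toy)

# Cluster the standardized data; centers = 2 is an assumption for this toy set
set.seed(1)
fit <- kmeans(toy_scaled, centers = 2, nstart = 10)
print(fit$cluster)
```

Whether standardizing is appropriate depends on the question being asked. If tuition is meant to drive the grouping, clustering on the raw scale may be what is wanted.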
o=order(grpeduc$cluster)
data.frame(educ$ISTuit[o],grpeduc$cluster[o])
## educ.ISTuit.o. grpeduc.cluster.o.
## 1 18800 1
## 2 17900 1
## 3 17865 1
## 4 17600 1
## 5 19528 1
## 6 18590 1
## 7 16404 1
## 8 17020 1
## 9 13380 1
## 10 17230 1
## 11 18420 1
## 12 1828 2
## 13 5040 2
## 14 1828 2
## 15 2984 2
## 16 2760 2
## 17 4103 2
## 18 840 2
## 19 840 2
## 20 13780 3
## 21 11412 3
## 22 11400 3
## 23 13290 3
## 24 13500 3
## 25 11550 3
## 26 11230 3
## 27 14500 3
## 28 13900 3
## 29 11200 3
## 30 11020 3
## 31 10800 3
## 32 10850 3
## 33 13970 3
## 34 10720 3
## 35 13000 3
## 36 13500 3
## 37 11790 3
## 38 13960 3
## 39 12900 3
## 40 11172 3
## 41 14340 3
## 42 13470 3
## 43 11200 3
## 44 10900 3
## 45 11138 3
## 46 12580 3
## 47 11850 3
## 48 13780 3
## 49 14067 3
## 50 12474 3
## 51 14210 3
## 52 11190 3
## 53 10760 3
## 54 12680 3
## 55 11320 3
## 56 13240 3
## 57 11280 3
## 58 13925 3
## 59 11859 3
## 60 11090 3
## 61 11850 3
## 62 11328 3
## 63 12660 3
## 64 10850 3
## 65 14320 3
## 66 11600 3
## 67 11718 3
## 68 11844 3
## 69 13252 3
## 70 13900 3
## 71 11660 3
## 72 13125 3
## 73 13404 3
## 74 12065 3
## 75 11120 3
## 76 12950 3
## 77 13540 3
## 78 12730 3
## 79 14200 3
## 80 13380 3
## 81 10570 3
## 82 10456 3
## 83 12247 3
## 84 12224 3
## 85 12247 3
## 86 10965 3
## 87 11910 3
## 88 11290 3
## 89 14360 3
## 90 11431 3
## 91 14550 3
## 92 13306 3
## 93 12140 3
## 94 11902 3
## 95 11130 3
## 96 12350 3
## 97 10300 3
## 98 13240 3
## 99 14125 3
## 100 13000 3
## 101 13312 3
## 102 13130 3
## 103 10800 3
## 104 10970 3
## 105 10995 3
## 106 11100 3
## 107 11520 3
## 108 14350 3
## 109 11750 3
## 110 11985 3
## 111 12825 3
## 112 11700 3
## 113 13420 3
## 114 10230 3
## 115 10700 3
## 116 10690 3
## 117 11712 3
## 118 13226 3
## 119 10870 3
## 120 11500 3
## 121 11380 3
## 122 11214 3
## 123 11600 3
## 124 11610 3
## 125 2650 4
## 126 2650 4
## 127 2650 4
## 128 2650 4
## 129 780 4
## 130 2650 4
## 131 3338 4
## 132 1492 4
## 133 1800 4
## 134 3060 4
## 135 2650 4
## 136 1548 4
## 137 1660 4
## 138 3231 4
## 139 2924 4
## 140 2110 4
## 141 1829 4
## 142 2208 4
## 143 2650 4
## 144 2802 4
## 145 1576 4
## 146 2406 4
## 147 2392 4
## 148 672 4
## 149 1494 4
## 150 780 4
## 151 740 4
## 152 764 4
## 153 1650 4
## 154 3234 4
## 155 1828 4
## 156 1889 4
## 157 2766 5
## 158 1695 5
## 159 1206 5
## 160 1716 5
## 161 3196 5
## 162 2033 5
## 163 764 5
## 164 3070 5
## 165 840 5
## 166 1954 5
## 167 672 5
## 168 1856 5
## 169 3030 5
## 170 3552 5
## 171 3927 5
## 172 2124 5
## 173 1830 5
## 174 2864 5
## 175 2100 5
## 176 2625 5
## 177 1742 5
## 178 2055 5
## 179 3171 5
## 180 14900 6
## 181 17238 6
## 182 15700 6
## 183 15476 6
## 184 15200 6
## 185 17688 6
## 186 17000 6
## 187 15747 6
## 188 15595 6
## 189 15036 6
## 190 16304 6
## 191 14990 6
## 192 15350 6
## 193 15248 6
## 194 16160 6
## 195 19960 6
## 196 16670 6
## 197 15688 6
## 198 18460 6
## 199 15360 6
## 200 19240 6
## 201 17480 6
## 202 19760 6
## 203 14800 6
## 204 17295 6
## 205 16975 6
## 206 16624 6
## 207 16425 6
## 208 16240 6
## 209 18930 6
## 210 24940 6
## 211 16732 6
## 212 19300 6
## 213 15948 6
## 214 15500 6
## 215 18920 6
## 216 18710 6
## 217 18200 6
## 218 15990 6
## 219 19670 6
## 220 19510 6
## 221 15192 6
## 222 19130 6
## 223 16230 6
## 224 15884 6
## 225 15000 6
## 226 14400 6
## 227 20100 6
## 228 5590 7
## 229 3120 7
## 230 5180 7
## 231 3605 7
## 232 3750 7
## 233 5600 7
## 234 4450 7
## 235 840 7
## 236 5400 7
## 237 5500 7
## 238 5190 7
## 239 5120 7
## 240 1856 7
## 241 2013 7
## 242 3660 7
## 243 3330 7
## 244 2320 7
## 245 608 7
## 246 3340 7
## 247 784 7
## 248 5188 7
## 249 2295 7
## 250 5224 7
## 251 1819 7
## 252 2064 7
## 253 2606 7
## 254 5504 7
## 255 3620 7
## 256 1434 7
## 257 2650 7
## 258 1644 7
## 259 1597 7
## 260 1828 7
## 261 1732 7
## 262 3030 7
## 263 672 7
## 264 2650 7
## 265 840 7
## 266 2040 7
## 267 2328 7
## 268 1697 7
## 269 2200 7
## 270 840 7
## 271 864 7
## 272 3510 7
## 273 2200 7
## 274 8730 8
## 275 8190 8
## 276 7560 8
## 277 7500 8
## 278 8438 8
## 279 6900 8
## 280 10300 8
## 281 5950 8
## 282 8064 8
## 283 7070 8
## 284 6500 8
## 285 9700 8
## 286 9950 8
## 287 8170 8
## 288 7950 8
## 289 8670 8
## 290 8955 8
## 291 6900 8
## 292 10475 8
## 293 8080 8
## 294 10430 8
## 295 10500 8
## 296 10194 8
## 297 7536 8
## 298 8280 8
## 299 7800 8
## 300 8242 8
## 301 8950 8
## 302 7150 8
## 303 7600 8
## 304 10430 8
## 305 8000 8
## 306 9600 8
## 307 9900 8
## 308 9570 8
## 309 9300 8
## 310 7600 8
## 311 9600 8
## 312 10260 8
## 313 8734 8
## 314 9400 8
## 315 9900 8
## 316 9500 8
## 317 8950 8
## 318 9520 8
## 319 10468 8
## 320 8578 8
## 321 9600 8
## 322 10390 8
## 323 10535 8
## 324 6930 8
## 325 7500 8
## 326 7850 8
## 327 10440 8
## 328 8050 8
## 329 8950 8
## 330 8800 8
## 331 6398 8
## 332 10500 8
## 333 9456 8
## 334 7700 8
## 335 10200 8
## 336 8550 8
## 337 9100 8
## 338 9384 8
## 339 8200 8
## 340 8300 8
## 341 8550 8
## 342 9000 8
## 343 9300 8
## 344 10100 8
## 345 8920 8
## 346 8650 8
## 347 8988 8
## 348 6950 8
## 349 9800 8
## 350 7470 8
## 351 9700 8
## 352 10100 8
## 353 8840 8
## 354 6400 8
## 355 7820 8
## 356 9990 8
## 357 10060 8
## 358 9650 8
## 359 9476 8
## 360 6230 8
## 361 9250 8
## 362 6300 8
## 363 6400 8
## 364 8025 8
## 365 8840 8
## 366 8180 8
## 367 9858 8
## 368 9450 8
## 369 7350 8
## 370 8840 8
## 371 9428 8
## 372 9160 8
## 373 10100 8
## 374 10176 8
## 375 8325 8
## 376 7800 8
## 377 7000 8
## 378 8678 8
## 379 9200 8
## 380 7344 8
## 381 9210 8
## 382 7950 8
## 383 6900 8
## 384 7050 8
## 385 7800 8
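The listing above may be easier to read as a per-cluster summary. The sketch below builds one with tapply(), using the first three tuition values listed for clusters 1, 2, and 3. The vectors `tuit` and `cl` are stand-ins for educ$ISTuit and grpeduc$cluster, and `cluster_summary` is an illustrative name.

```r
# First three tuition values listed above for clusters 1, 2 and 3
tuit <- c(18800, 17900, 17865, 1828, 5040, 1828, 13780, 11412, 11400)
cl   <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)

# One row per cluster: size plus minimum, mean and maximum tuition
cluster_summary <- data.frame(
  cluster = sort(unique(cl)),
  n       = as.vector(table(cl)),
  min     = as.vector(tapply(tuit, cl, min)),
  mean    = as.vector(tapply(tuit, cl, mean)),
  max     = as.vector(tapply(tuit, cl, max))
)
print(cluster_summary)
```

On the full data, tapply(educ$ISTuit, grpeduc$cluster, mean) and table(grpeduc$cluster) would give the same kind of overview without printing all 385 rows.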
Creating the plot to show the different clusters.
summary(educ$ISTuit)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 608 4103 9600 9187 12820 24940
plot(educ$ISTuit,educ$ACT,type="n",xlim=c(3000,18000),xlab="In State Tuition",ylab="Average ACT Score")
text(x=educ$ISTuit,y=educ$ACT,labels=educ$College,col=rainbow(8)[grpeduc$cluster])
Here you can see a slight positive correlation between In State Tuition and Average ACT Score for accepted students. I would like a more spread-out plot so that the separation between the clusters is easier to see, since this plot shows a lot of overlap between the groups.
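The strength of the relationship can be quantified with cor() and cor.test(). The overlap can also be reduced somewhat by drawing points instead of college names and letting the x axis cover the whole tuition range: the summary above shows 608 to 24940, while the plot is limited to 3000 to 18000. The sketch below is a rough illustration on the six rows from head(educ), with their cluster labels taken from head(Cluseduc). The `toy` data frame is a stand-in for the full data, so its numbers will not match the full-data result.

```r
# Six rows from head(Cluseduc): tuition, ACT and assigned cluster
toy <- data.frame(
  ISTuit  = c(8730, 13780, 8190, 7560, 7500, 11412),
  ACT     = c(23, 20, 25, 20, 21, 22),
  cluster = c(8, 3, 8, 8, 8, 3)
)

# Pearson correlation and its significance test
r <- cor(toy$ISTuit, toy$ACT)
print(r)
print(cor.test(toy$ISTuit, toy$ACT))

# Points colored by cluster, with the x axis spanning the full tuition range
shown <- sort(unique(toy$cluster))
plot(toy$ISTuit, toy$ACT,
     col = rainbow(8)[toy$cluster], pch = 19,
     xlim = range(toy$ISTuit),
     xlab = "In State Tuition", ylab = "Average ACT Score")
legend("topright", legend = shown, col = rainbow(8)[shown],
       pch = 19, title = "Cluster")
```

On the full data the equivalent calls would be cor(educ$ISTuit, educ$ACT) and xlim = range(educ$ISTuit).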