The data set contains 385 colleges, each listed with a set of characteristics. The file was heavily abbreviated. The following exercise uses cluster analysis to group the colleges by how similar they are.
Educ<-read.csv("c:/users/abbey/Desktop/Data Mining/Educ4HW.csv")
head(Educ)
## College State Public MathSAT VerbSAT ACT Received
## 1 Christendom College VA 0 568 568 23 81
## 2 Mount Vernon College DC 0 440 402 20 149
## 3 King's College NY 0 442 465 25 356
## 4 Alaska Pacific University AK 0 490 482 20 193
## 5 Sierra Nevada College NV 0 400 400 21 200
## 6 Trinity College DC 0 470 480 22 247
## Accepted Enrolled Pct10 Pct25 FTUG PTUG ISTuit OSTuit Books PhDs SFRatio
## 1 72 51 33 71 139 3 8730 8730 400 92 9.3
## 2 70 61 15 35 203 138 13780 13780 500 84 5.7
## 3 233 53 16 24 246 18 8190 8190 500 100 7.0
## 4 146 55 16 44 249 869 7560 7560 800 76 11.9
## 5 160 100 5 25 300 50 7500 7500 500 45 7.5
## 6 189 100 19 49 309 639 11412 11412 500 89 8.3
K-means clustering with 6 groups is used to differentiate the colleges. The code below assigns each college to a cluster and shows the result. The college name and state columns are dropped before clustering so that the algorithm uses only the numeric characteristics.
set.seed(1)
grpEduc <- kmeans(Educ[,c(-1,-2)], centers=6, nstart=10) ## removes college name and state
clusEduc<-cbind(Educ, cluster = grpEduc$cluster) ## assigns each college a cluster
head(clusEduc)
## College State Public MathSAT VerbSAT ACT Received
## 1 Christendom College VA 0 568 568 23 81
## 2 Mount Vernon College DC 0 440 402 20 149
## 3 King's College NY 0 442 465 25 356
## 4 Alaska Pacific University AK 0 490 482 20 193
## 5 Sierra Nevada College NV 0 400 400 21 200
## 6 Trinity College DC 0 470 480 22 247
## Accepted Enrolled Pct10 Pct25 FTUG PTUG ISTuit OSTuit Books PhDs SFRatio
## 1 72 51 33 71 139 3 8730 8730 400 92 9.3
## 2 70 61 15 35 203 138 13780 13780 500 84 5.7
## 3 233 53 16 24 246 18 8190 8190 500 100 7.0
## 4 146 55 16 44 249 869 7560 7560 800 76 11.9
## 5 160 100 5 25 300 50 7500 7500 500 45 7.5
## 6 189 100 19 49 309 639 11412 11412 500 89 8.3
## cluster
## 1 6
## 2 2
## 3 6
## 4 6
## 5 6
## 6 6
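A caveat on the k-means call above: kmeans() works on raw Euclidean distances, so columns measured in large units (tuition in dollars, application counts) can outweigh columns such as ACT or SFRatio. The sketch below takes the first two rows printed by head(Educ) and measures how much of their squared distance comes from the two tuition columns alone. The names `christendom`, `mount_vernon`, `sq_diff`, and `share_tuition` are illustrative and not part of the original analysis.

```r
# First two colleges from head(Educ), numeric columns only.
vars <- c("Public", "MathSAT", "VerbSAT", "ACT", "Received", "Accepted",
          "Enrolled", "Pct10", "Pct25", "FTUG", "PTUG", "ISTuit", "OSTuit",
          "Books", "PhDs", "SFRatio")
christendom  <- c(0, 568, 568, 23,  81, 72, 51, 33, 71, 139,   3,  8730,  8730, 400, 92, 9.3)
mount_vernon <- c(0, 440, 402, 20, 149, 70, 61, 15, 35, 203, 138, 13780, 13780, 500, 84, 5.7)
names(christendom) <- names(mount_vernon) <- vars

# Squared difference per column; their sum is the squared Euclidean distance.
sq_diff <- (christendom - mount_vernon)^2

# Share of the squared distance contributed by the two tuition columns.
share_tuition <- sum(sq_diff[c("ISTuit", "OSTuit")]) / sum(sq_diff)
round(share_tuition, 3)   # about 0.998

# Euclidean distance by hand; dist() on the two rows gives the same number.
sqrt(sum(sq_diff))
dist(rbind(christendom, mount_vernon))
```

For this pair, tuition accounts for nearly all of the distance. One common remedy, offered here only as a suggestion, is to standardize the columns with scale() before clustering. That was not done in this report, and doing so would change the cluster assignments shown.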
The next code lists the college names sorted in ascending order of cluster number.
o=order(grpEduc$cluster)
data.frame(Educ$College[o],grpEduc$cluster[o])
## Educ.College.o. grpEduc.cluster.o.
## 1 Webber College 1
## 2 Kendall College 1
## 3 Bartlesville Wesleyan College 1
## 4 College of the Southwest 1
## 5 Tennessee Wesleyan College 1
## 6 Pacific Christian College 1
## 7 Huston-Tillotson College 1
## 8 Texas College 1
## 9 Concordia Lutheran College 1
## 10 Jarvis Christian College 1
## 11 Bennett College 1
## 12 Bluefield College 1
## 13 Voorhees College 1
## 14 Univ. of Texas of the Permian Basin 1
## 15 Dallas Baptist University 1
## 16 McMurry University 1
## 17 John Brown University 1
## 18 Mars Hill College 1
## 19 Freed-Hardeman University 1
## 20 Howard Payne University 1
## 21 Texas Wesleyan University 1
## 22 Cumberland College 1
## 23 Flagler College 1
## 24 Hardin-Simmons University 1
## 25 Oklahoma Christian University 1
## 26 Bluefield State College 1
## 27 Montana College of Mineral Sci. & Tech. MT 1
## 28 Castleton State College 1
## 29 University of Minnesota at Morris 1
## 30 University of South Carolina at Aiken 1
## 31 Spelman College 1
## 32 University of North Carolina at Asheville 1
## 33 Lander University 1
## 34 University of Texas at Dallas 1
## 35 Bethune Cookman College 1
## 36 Cedarville College 1
## 37 Univ. of South Carolina at Spartanburg 1
## 38 Grove City College 1
## 39 Western State College of Colorado 1
## 40 Shepherd College 1
## 41 Xavier University of Louisiana 1
## 42 Morehouse College 1
## 43 Longwood College 1
## 44 Harding University 1
## 45 Winthrop University 1
## 46 Mesa State College 1
## 47 SUNY College at Potsdam 1
## 48 University of Southern Colorado 1
## 49 University of West Florida 1
## 50 University of Colorado at Denver 1
## 51 University of North Florida 1
## 52 University of Missouri at Rolla 1
## 53 West Texas A&M University 1
## 54 SUNY College at Fredonia 1
## 55 Angelo State University 1
## 56 University of Southern Indiana 1
## 57 Salisbury State University 1
## 58 Troy State University at Troy 1
## 59 South Carolina State University 1
## 60 Prairie View A. and M. University 1
## 61 SUNY College at Plattsburgh 1
## 62 Lamar University 1
## 63 Michigan Technological University 1
## 64 Winona State University 1
## 65 Mount Vernon College 2
## 66 Albertus Magnus College 2
## 67 Wells College 2
## 68 Chatham College 2
## 69 Sweet Briar College 2
## 70 Scripps College 2
## 71 Stephens College 2
## 72 Randolph-Macon Woman's College 2
## 73 Monmouth College 2
## 74 Albertson College 2
## 75 Hood College 2
## 76 Bethany College 2
## 77 Antioch University 2
## 78 Ripon College 2
## 79 Pitzer College 2
## 80 Oglethorpe University 2
## 81 Cedar Crest College 2
## 82 Hollins College 2
## 83 Claremont McKenna College 2
## 84 Heidelberg College 2
## 85 Knox College 2
## 86 Albright College 2
## 87 Hiram College 2
## 88 Earlham College 2
## 89 Pacific University 2
## 90 Beloit College 2
## 91 Elmira College 2
## 92 Muskingum College 2
## 93 Coe College 2
## 94 University of the South 2
## 95 Cornell College 2
## 96 Simmons College 2
## 97 Reed College 2
## 98 Whitman College 2
## 99 Westmont College 2
## 100 Grinnell College 2
## 101 Wheaton College 2
## 102 Hamline University 2
## 103 Lycoming College 2
## 104 Eckerd College 2
## 105 Carthage College 2
## 106 Guilford College 2
## 107 California Lutheran University 2
## 108 University of La Verne 2
## 109 Kenyon College 2
## 110 Hartwick College 2
## 111 West Virginia Wesleyan College 2
## 112 Linfield College 2
## 113 Amherst College 2
## 114 Willamette University 2
## 115 Davidson College 2
## 116 Muhlenberg College 2
## 117 Chapman University 2
## 118 Rollins College 2
## 119 College of Wooster 2
## 120 Colby College 2
## 121 Franklin and Marshall College 2
## 122 Ohio Wesleyan University 2
## 123 Illinois Wesleyan University 2
## 124 Mount Holyoke College 2
## 125 Illinois Institute of Technology 2
## 126 Buena Vista College 2
## 127 Wittenberg University 2
## 128 Azusa Pacific University 2
## 129 Vassar College 2
## 130 Luther College 2
## 131 Gustavus Adolphus College 2
## 132 Skidmore College 2
## 133 Pepperdine University 2
## 134 Ohio Northern University 2
## 135 Gonzaga University 2
## 136 Pacific Lutheran University 2
## 137 Oberlin College 2
## 138 Butler University 2
## 139 Colgate University 2
## 140 University of Denver 2
## 141 Wesleyan University 2
## 142 University of Puget Sound 2
## 143 Worcester Polytechnic Institute 2
## 144 Saint Olaf College 2
## 145 Fairfield University 2
## 146 Seattle University 2
## 147 Drake University 2
## 148 Providence College 2
## 149 University of San Francisco 2
## 150 SUNY College at New Paltz 3
## 151 SUNY College at Cortland 3
## 152 SUNY College at Geneseo 3
## 153 Texas Southern University 3
## 154 SUNY College at Brockport 3
## 155 University of Maryland at Baltimore County 3
## 156 University of North Carolina at Wilmington 3
## 157 East Tennessee State University 3
## 158 College of Charleston 3
## 159 SUNY College at Oswego 3
## 160 University of Idaho 3
## 161 Marshall University 3
## 162 Cleveland State University 3
## 163 Radford University 3
## 164 University of North Dakota 3
## 165 University of Northern Colorado 3
## 166 Univ. of Wisconsin at Eau Claire 3
## 167 SUNY at Binghamton 3
## 168 Indiana State University 3
## 169 Montana State University 3
## 170 Western Washington University 3
## 171 Indiana Univ.-Purdue Univ. at Indianapolis 3
## 172 University of Southern Mississippi 3
## 173 University of Texas at San Antonio 3
## 174 Metropolitan State College 3
## 175 Stephen F. Austin State University 3
## 176 Appalachian State University 3
## 177 University of North Carolina at Charlotte 3
## 178 Florida International University 3
## 179 Utah State University 3
## 180 University of Texas at Arlington 3
## 181 University of Cincinnati 3
## 182 Northern Arizona University 3
## 183 University of Central Florida 3
## 184 University of South Carolina at Columbia 3
## 185 Oklahoma State University 3
## 186 California Polytechnic-San Luis 3
## 187 East Carolina University 3
## 188 Bowling Green State University 3
## 189 Southwest Texas State University 3
## 190 University of Utah 3
## 191 University of North Texas 3
## 192 University of South Florida 3
## 193 University of Missouri at Columbia 3
## 194 Ohio University 3
## 195 Kent State University 3
## 196 Colorado State University 3
## 197 University of Tennessee at Knoxville 3
## 198 Ball State University 3
## 199 Auburn University-Main Campus 3
## 200 Louisiana State University at Baton Rouge 3
## 201 San Diego State University 3
## 202 University of Nebraska at Lincoln 3
## 203 University of Minnesota Twin Cities 3
## 204 Johns Hopkins University 4
## 205 Carnegie Mellon University 4
## 206 Massachusetts Institute of Technology 4
## 207 Vanderbilt University 4
## 208 Emory University 4
## 209 Brown University 4
## 210 Duke University 4
## 211 Hofstra University 4
## 212 Northwestern University 4
## 213 University of Pennsylvania 4
## 214 Northeastern University 4
## 215 University of Southern California 4
## 216 Boston University 4
## 217 University of Arizona 5
## 218 University of Michigan at Ann Arbor 5
## 219 Arizona State University Main campus 5
## 220 Indiana University at Bloomington 5
## 221 University of Illinois - Urbana 5
## 222 Michigan State University 5
## 223 University of Texas at Austin 5
## 224 Texas A&M Univ. at College Station 5
## 225 Christendom College 6
## 226 King's College 6
## 227 Alaska Pacific University 6
## 228 Sierra Nevada College 6
## 229 Trinity College 6
## 230 Montreat-Anderson College 6
## 231 Saint Mary-of-the-Woods College 6
## 232 Notre Dame College of Ohio 6
## 233 Centenary College 6
## 234 Covenant College 6
## 235 Huntington College 6
## 236 St. Martin's College 6
## 237 Virginia Intermont College 6
## 238 Nyack College 6
## 239 Saint Francis College 6
## 240 Schreiner College 6
## 241 Center for Creative Studies 6
## 242 Salem College 6
## 243 Huntingdon College 6
## 244 Concordia College 6
## 245 Phillips University 6
## 246 Newberry College 6
## 247 Southwestern Adventist College 6
## 248 Maryville College 6
## 249 Queens College 6
## 250 Sterling College 6
## 251 Union College 6
## 252 Concordia University 6
## 253 Defiance College 6
## 254 North Carolina Wesleyan College 6
## 255 Avila College 6
## 256 Tiffin University 6
## 257 University of Dubuque 6
## 258 Kentucky Wesleyan College 6
## 259 Westminster College 6
## 260 Ursuline College 6
## 261 Fresno Pacific College 6
## 262 College of Mount St. Vincent 6
## 263 Northland College 6
## 264 Bethel College 6
## 265 Belhaven College 6
## 266 Keuka College 6
## 267 Briar Cliff College 6
## 268 Rocky Mountain College 6
## 269 Davis & Elkins College 6
## 270 Bluffton College 6
## 271 University of Charleston 6
## 272 Centenary College of Louisiana 6
## 273 Southern California College 6
## 274 Thiel College 6
## 275 Alderson-Broaddus College 6
## 276 Emory & Henry College 6
## 277 Concordia College 6
## 278 Saint Joseph's College 6
## 279 Franklin College 6
## 280 William Woods University 6
## 281 Transylvania University 6
## 282 College of Santa Fe 6
## 283 North Park College 6
## 284 Hilbert College 6
## 285 Texas Lutheran College 6
## 286 Manchester College 6
## 287 Illinois College 6
## 288 Mount Mary College 6
## 289 Centre College 6
## 290 Hendrix College 6
## 291 Wheeling Jesuit College 6
## 292 Tri-State University 6
## 293 Tusculum College 6
## 294 Marymount Manhattan College 6
## 295 University of St. Thomas 6
## 296 Graceland College 6
## 297 Cazenovia College 6
## 298 Hanover College 6
## 299 Mount St. Mary's College 6
## 300 Central Wesleyan College 6
## 301 Missouri Valley College 6
## 302 Eastern College 6
## 303 University of Dallas 6
## 304 Dowling College 6
## 305 Wofford College 6
## 306 Christian Brothers University 6
## 307 Quincy University 6
## 308 Millsaps College 6
## 309 D'Youville College 6
## 310 Carroll College 6
## 311 Asbury College 6
## 312 Austin College 6
## 313 Presbyterian College 6
## 314 Hillsdale College 6
## 315 College of Mount St. Joseph 6
## 316 Mount Saint Mary College 6
## 317 Point Park College 6
## 318 Southwestern University 6
## 319 Lenoir-Rhyne College 6
## 320 Bellarmine College 6
## 321 Wingate College 6
## 322 Northwestern College 6
## 323 Houghton College 6
## 324 Whitworth College 6
## 325 Nazareth College of Rochester 6
## 326 William Jewell College 6
## 327 Wartburg College 6
## 328 Franciscan University of Steubenville 6
## 329 Geneva College 6
## 330 North Central College 6
## 331 Maryville University 6
## 332 College of St. Scholastica 6
## 333 Birmingham-Southern College 6
## 334 Westminster College 6
## 335 University of Indianapolis 6
## 336 Saint Mary's College 6
## 337 Florida Southern College 6
## 338 LeTourneau University 6
## 339 Gardner Webb University 6
## 340 Malone College 6
## 341 Northwood University 6
## 342 St. Joseph's College 6
## 343 Incarnate Word College 6
## 344 St. John Fisher College 6
## 345 St. Bonaventure University 6
## 346 Saint John's University 6
## 347 College of Saint Catherine 6
## 348 College of Saint Benedict 6
## 349 St. Edward's University 6
## 350 Webster University 6
## 351 Taylor University 6
## 352 Millikin University 6
## 353 Barry University 6
## 354 Anderson University 6
## 355 Ashland University 6
## 356 Point Loma Nazarene College 6
## 357 Savannah Coll. of Art and Design 6
## 358 St. Norbert College 6
## 359 Belmont University 6
## 360 Augsburg College 6
## 361 University of Detroit Mercy 6
## 362 St. Mary's University of San Antonio 6
## 363 Lindenwood College 6
## 364 Widener University 6
## 365 Messiah College 6
## 366 Indiana Wesleyan University 6
## 367 Oral Roberts University 6
## 368 Siena College 6
## 369 Gannon University 6
## 370 Baldwin-Wallace College 6
## 371 Loyola University 6
## 372 Xavier University 6
## 373 University of Tulsa 6
## 374 Mercer University 6
## 375 John Carroll University 6
## 376 Calvin College 6
## 377 Marist College 6
## 378 Iona College 6
## 379 University of St. Thomas 6
## 380 Bradley University 6
## 381 Embry Riddle Aeronautical University 6
## 382 Loyola University Chicago 6
## 383 University of Dayton 6
## 384 DePaul University 6
## 385 Marquette University 6
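The long listing is easier to digest as cluster sizes. With the fitted object one would normally call table(grpEduc$cluster) or read grpEduc$size. The sketch below recovers the same counts from the row numbers at which each cluster ends in the listing above, so it runs without the CSV file. The names `last_row` and `sizes` are illustrative.

```r
# Last row number of each cluster in the sorted listing above.
last_row <- c(64, 149, 203, 216, 224, 385)

# Differences between consecutive boundaries give the cluster sizes.
sizes <- diff(c(0, last_row))
names(sizes) <- paste("cluster", 1:6)
sizes

# Percentage of the 385 colleges in each cluster.
round(100 * sizes / sum(sizes), 1)
```

The sizes are quite uneven: cluster 6 holds 161 of the 385 colleges, while clusters 4 and 5 hold 13 and 8.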
Next, a summary of the Math SAT scores.
summary(Educ$MathSAT)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 330.0 474.0 516.0 515.6 554.0 742.0
This tells us that the Math SAT scores range from 330 to 742, with a median of 516 and a mean of 515.6.
Next is a visualization. We plot in-state tuition against Math SAT score to see whether there is a visible relationship. Coloring each point by its cluster also shows how the groups separate on these two variables. Each institution is drawn as its state abbreviation to keep the graph legible.
plot(Educ$ISTuit,Educ$MathSAT,type="n",xlim=c(600,25000),xlab="In-State Tuition", ylab="Math SAT")
text(x=Educ$ISTuit, y=Educ$MathSAT, labels=Educ$State, col=rainbow(6)[grpEduc$cluster])
There is visible separation between green, purple, and yellow. The light blue, red, and dark blue mildly overlap. This suggests we might reduce the number of clusters to three. It also seems that colleges with higher Math SAT scores charge higher tuition.
Do Math SAT scores differ across these groups in an analysis of variance (ANOVA) model?
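One common way to check a guess about the number of clusters is an elbow plot: fit k-means for several values of k and look for the point where the total within-cluster sum of squares stops dropping sharply. The sketch below uses simulated data (three well-separated blobs) as a stand-in for Educ[, c(-1, -2)], so the plot it draws says nothing about the colleges themselves. The names `toy` and `wss` are illustrative.

```r
set.seed(1)
# Simulated stand-in for the numeric college columns: three blobs in 2-D.
toy <- rbind(
  matrix(rnorm(100, mean = 0,  sd = 1), ncol = 2),
  matrix(rnorm(100, mean = 6,  sd = 1), ncol = 2),
  matrix(rnorm(100, mean = 12, sd = 1), ncol = 2)
)

# Total within-cluster sum of squares for k = 1..8.
wss <- sapply(1:8, function(k) kmeans(toy, centers = k, nstart = 10)$tot.withinss)

# The "elbow" is where adding another cluster stops helping much.
plot(1:8, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

To apply the same idea to the college data, `toy` would be replaced by the numeric columns of Educ.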
m1 <- aov(MathSAT ~ as.factor(cluster), data=clusEduc)
summary(m1)
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(cluster) 5 489632 97926 32.69 <2e-16 ***
## Residuals 379 1135359 2996
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is very small, which means the mean Math SAT score differs significantly across the six groups. The F value (32.69), however, is fairly low.
Now we can compare the cluster means for each characteristic.
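The F value in the table can be reproduced by hand, which makes clear what it measures: the between-cluster mean square divided by the residual mean square. The sketch below uses only the numbers printed by summary(m1); the variable names are illustrative.

```r
# Values copied from the summary(m1) table above.
ss_between <- 489632;  df_between <- 5
ss_within  <- 1135359; df_within  <- 379

ms_between <- ss_between / df_between   # mean square for cluster
ms_within  <- ss_within  / df_within    # residual mean square
f_value    <- ms_between / ms_within

round(f_value, 2)                       # 32.69, matching the table

# Upper-tail probability of an F distribution with (5, 379) degrees of freedom.
pf(f_value, df_between, df_within, lower.tail = FALSE)
```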
grpEduc$centers
## Public MathSAT VerbSAT ACT Received Accepted Enrolled
## 1 0.50000000 479.6719 436.9531 20.93750 1548.281 1067.7031 503.2812
## 2 0.01176471 553.3765 507.3647 24.61176 1810.671 1247.8706 404.0353
## 3 0.98148148 511.9074 456.6481 22.27778 6370.667 4363.1667 2128.8519
## 4 0.00000000 644.3846 564.8462 27.61538 10978.308 5620.4615 1806.0769
## 5 1.00000000 566.6250 490.1250 24.62500 15613.250 11946.0000 5332.7500
## 6 0.00000000 498.2360 458.9441 22.32919 1063.410 850.6584 322.8820
## Pct10 Pct25 FTUG PTUG ISTuit OSTuit Books
## 1 20.56250 46.82812 2297.000 680.0625 3911.781 5922.531 535.2188
## 2 36.47059 66.09412 1580.976 227.1294 15471.082 15471.082 531.2588
## 3 22.24074 53.79630 10753.741 3422.4815 2127.685 6677.074 569.8704
## 4 65.84615 86.00000 7520.462 1481.6154 17264.385 17264.385 629.5385
## 5 39.50000 73.12500 25598.375 3694.1250 2527.875 8605.500 581.2500
## 6 22.66460 50.02484 1400.565 490.5590 10011.745 10011.758 558.0745
## PhDs SFRatio
## 1 62.53125 16.29687
## 2 81.31765 12.10118
## 3 80.48148 17.90370
## 4 90.76923 8.40000
## 5 88.25000 18.18750
## 6 63.96894 14.09689
The Math SAT means are spread fairly widely (about 480 to 644) compared with the ACT means, which differ by only a few points (about 21 to 28). Even so, some clusters sit close together on Math SAT: clusters 1 and 6 differ by about 19 points, and clusters 3 and 6 and clusters 2 and 5 by about 13 to 14 points each. Reducing to 3 clusters may fix this problem.
We won't stop there; let's try hierarchical clustering as well. To do this, we again remove the state and college name columns.
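To check how close the cluster means really are, the pairwise gaps between the Math SAT centers can be computed directly. The sketch below copies the MathSAT column from the grpEduc$centers output above; `math_sat` and `gaps` are illustrative names.

```r
# Math SAT cluster means copied from grpEduc$centers above.
math_sat <- c(`1` = 479.6719, `2` = 553.3765, `3` = 511.9074,
              `4` = 644.3846, `5` = 566.6250, `6` = 498.2360)

# Pairwise absolute differences between cluster means.
gaps <- as.matrix(dist(math_sat))
round(gaps, 1)

# Keep each pair once (lower triangle), then find the closest pair.
gaps[upper.tri(gaps, diag = TRUE)] <- NA
closest <- which(gaps == min(gaps, na.rm = TRUE), arr.ind = TRUE)
closest
```

With the full data, the same comparison could be made for every column at once by calling dist(grpEduc$centers), ideally after putting the columns on a common scale.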
educClus<-Educ[,c(-1,-2)]
head(educClus)
## Public MathSAT VerbSAT ACT Received Accepted Enrolled Pct10 Pct25 FTUG
## 1 0 568 568 23 81 72 51 33 71 139
## 2 0 440 402 20 149 70 61 15 35 203
## 3 0 442 465 25 356 233 53 16 24 246
## 4 0 490 482 20 193 146 55 16 44 249
## 5 0 400 400 21 200 160 100 5 25 300
## 6 0 470 480 22 247 189 100 19 49 309
## PTUG ISTuit OSTuit Books PhDs SFRatio
## 1 3 8730 8730 400 92 9.3
## 2 138 13780 13780 500 84 5.7
## 3 18 8190 8190 500 100 7.0
## 4 869 7560 7560 800 76 11.9
## 5 50 7500 7500 500 45 7.5
## 6 639 11412 11412 500 89 8.3
Euclidean distance is used to form groups of similar colleges.
EducDis<-dist(educClus,method="euclidean",diag=FALSE,upper=FALSE)
uc1<-hclust(EducDis,method="complete")
plot(uc1,hang=0.1,main="Cluster Dendrogram (complete linkage)")
uc2<-hclust(EducDis,method="centroid")
plot(uc2,hang=0.1,main="Cluster Dendrogram (centroid linkage)")
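A tree built with hclust() can also be cut into a fixed number of groups with cutree(), which gives labels that can be compared with a k-means fit. The sketch below uses simulated data as a stand-in for educClus, so the counts it prints are not results for the colleges. The names `toy`, `hc`, `hc_groups`, and `km` are illustrative.

```r
set.seed(1)
# Simulated stand-in for educClus: three blobs in 2-D.
toy <- rbind(
  matrix(rnorm(60, mean = 0,  sd = 1), ncol = 2),
  matrix(rnorm(60, mean = 6,  sd = 1), ncol = 2),
  matrix(rnorm(60, mean = 12, sd = 1), ncol = 2)
)

hc <- hclust(dist(toy, method = "euclidean"), method = "complete")

# Cut the tree into 3 groups; returns one integer label per row.
hc_groups <- cutree(hc, k = 3)
table(hc_groups)

# Cross-tabulate against a k-means fit with the same number of groups.
km <- kmeans(toy, centers = 3, nstart = 10)
table(hclust = hc_groups, kmeans = km$cluster)
```

On the college data, the equivalent call would be cutree(uc1, k = 6) or cutree(uc1, k = 3).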
These dendrograms are extremely difficult to read near the bottom, where the 385 leaves crowd together. Returning to k-means, we can try 3 clusters instead of 6.
set.seed(1)
grpEduc <- kmeans(Educ[,c(-1,-2)], centers=3, nstart=10) ## removes college name and state when clustering
clusEduc<-cbind(Educ,cluster=grpEduc$cluster)
plot(Educ$ISTuit, Educ$MathSAT, type="n", xlim=c(600,25000),xlab="In-State Tuition", ylab="Math SAT")
text(x=Educ$ISTuit, y=Educ$MathSAT, labels=Educ$State, col=rainbow(6)[grpEduc$cluster])
Although some yellow markers are mixed in with green and red, this graph separates the three clusters more clearly than the six-cluster version did.
m3 <- aov(MathSAT ~ as.factor(cluster), data=clusEduc)
summary(m3)
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(cluster) 2 431959 215979 69.16 <2e-16 ***
## Residuals 382 1193032 3123
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The new ANOVA table for Math SAT scores has a higher F value (69.16 versus 32.69), and the p-value remains statistically significant. Further clustering runs could try to reduce the overlap between clusters. A characteristic other than Math SAT score might also differentiate the clusters better, which could be taken into account in a further analysis.
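A caution when comparing the two F values: F divides the between-cluster sum of squares by its degrees of freedom (5 for six clusters, 2 for three), so a solution with fewer clusters can earn a larger F while explaining less of the spread in Math SAT. The share of variance explained (eta squared) is a complementary check. The sketch below computes it from the sums of squares printed in the two ANOVA tables; the helper `eta_sq` is an illustrative name, not a base R function.

```r
# Share of total variation that lies between clusters.
eta_sq <- function(ss_between, ss_within) ss_between / (ss_between + ss_within)

# Sums of squares copied from summary(m1) and summary(m3).
eta6 <- eta_sq(489632, 1135359)   # six clusters
eta3 <- eta_sq(431959, 1193032)   # three clusters

round(c(six = eta6, three = eta3), 3)
```

By this measure the six-cluster solution accounts for about 30% of the variation in Math SAT and the three-cluster solution for about 27%, so the higher F value reflects the smaller number of groups more than a sharper separation.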