Exercise 1
Read in data
boston = read.csv("/Users/jessieeastburn/Documents/Fall 2021/GEOG 6000/datafiles/boston6k.csv")
Load in packages
Subset to variables needed and scale subset
boston2 = boston[c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT")]
boston2= scale(boston2)
boston2.s = scale(boston2)
Run K-means cluster analysis for 2 to 20 clusters
sil.out <- rep(NA,20)
for (i in 2:20) {
boston2.kmeans = kmeans(boston2, centers = i, nstart = 50)
tmp = silhouette(boston2.kmeans$cluster, dist(boston2))
sil.out[i] = mean(tmp[,3])
}
Plot silhouette index
plot(1:20, sil.out, type = 'b', lwd = 2, xlab = "N Groups", ylab = "Index Values", main = "Silhouette Index Values")
2 clusters gives the best solution but two clusters is not enough data for a place like Boston, so more clusters are preferable in this case. 6 is the next best cluster according to the plot so I will use 6 clusters instead of 2 for the next step.
Re-run K means using 6 clusters
ngrp = 6
boston2.kmeans <- kmeans(boston2, ngrp, nstart = 50, iter.max = 20)
table(boston2.kmeans$cluster)
##
## 1 2 3 4 5 6
## 35 34 215 83 50 89
Using the aggregate() function, provide a table showing the median the variables used in clustering
boston2.aggregate = aggregate(boston2.s, list(boston2.kmeans$cluster), median)
boston2.aggregate
## Group.1 CRIM ZN INDUS CHAS NOX RM
## 1 1 0.9531748 -0.4872402 1.0149946 -0.2723291 1.07272553 -0.11760941
## 2 2 -0.3638042 -0.4872402 0.4013236 3.6647712 -0.04051738 -0.07633515
## 3 3 -0.4032966 -0.4872402 -0.5927944 -0.2723291 -0.41159834 -0.20015792
## 4 4 -0.4151014 1.8710023 -1.1219217 -0.2723291 -1.09335175 0.61678770
## 5 5 -0.2940133 -0.4872402 1.5674443 -0.2723291 0.59808708 -0.57660760
## 6 6 0.5388063 -0.4872402 1.0149946 -0.2723291 1.19354259 -0.24285543
## AGE DIS RAD TAX PTRATIO B
## 1 0.87836941 -0.86756608 1.6596029 1.52941294 0.80577843 -3.5229148
## 2 0.70962369 -0.39245320 -0.5224844 -0.60068166 -0.39517558 0.3713897
## 3 -0.02752869 0.06608569 -0.6373311 -0.66594918 -0.02565127 0.4032644
## 4 -1.34196906 1.53889050 -0.6373311 -0.61848189 -0.85708096 0.3968018
## 5 0.99560328 -0.91291895 -0.6373311 -0.03107419 0.29768250 0.2322800
## 6 0.95297278 -0.87563937 1.6596029 1.52941294 0.80577843 0.3731422
## LSTAT
## 1 0.9980220
## 2 -0.2328874
## 3 -0.4163335
## 4 -0.9918782
## 5 0.4637877
## 6 0.7109499
Clusters with higher values are indicating that the variable in the cluster is above the median. In cluster 1 there is a high crime rate, proportion of non-retail business acres per town, Nitric oxide concentration, and Index of accessibility to radial highways per town. For ZN (Proportion of residential land zoned for lots over 25000 sq. ft), the clusters are all negative except for cluster 6 which shows a value of ZN above the median.
Mean corrected house value per cluster:
bos_kmeans <- kmeans(boston2, 6)
tapply(boston$CMEDV, boston2.kmeans$cluster, mean)
## 1 2 3 4 5 6
## 12.87143 27.80588 23.98465 29.03735 19.55200 16.39663
Use anova() to test whether the values are significantly different between clusters. First create vector of house prices/values and the vector of clusters from kmeans.
boston.lm = lm(boston$CMEDV ~ boston2.kmeans$cluster)
boston.lm
##
## Call:
## lm(formula = boston$CMEDV ~ boston2.kmeans$cluster)
##
## Coefficients:
## (Intercept) boston2.kmeans$cluster
## 25.7682 -0.8794
Run anova
anova(boston.lm)
## Analysis of Variance Table
##
## Response: boston$CMEDV
## Df Sum Sq Mean Sq F value Pr(>F)
## boston2.kmeans$cluster 1 790 789.83 9.526 0.002137 **
## Residuals 504 41788 82.91
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F statistic is 0.8427 and the p value is 0.3591.
Exercise 2
Read in data
wnaclim2 = read.csv("/Users/jessieeastburn/Documents/Fall 2021/GEOG 6000/datafiles/wnaclim2.csv")
Extract temperature and precipitation
wnaclim = wnaclim2[, seq(3,26)]
Use the procomp function and scale the data
wnaclim.pca <- prcomp(wnaclim, scale = TRUE)
Make a biplot
biplot(wnaclim.pca)
Make a screeplot
screeplot(wnaclim.pca)
Summary
summary(wnaclim.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 3.4612 2.8065 1.5846 0.84733 0.63986 0.51316 0.29261
## Proportion of Variance 0.4992 0.3282 0.1046 0.02992 0.01706 0.01097 0.00357
## Cumulative Proportion 0.4992 0.8274 0.9320 0.96190 0.97896 0.98993 0.99350
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.20707 0.17437 0.15014 0.13534 0.10798 0.09353 0.08322
## Proportion of Variance 0.00179 0.00127 0.00094 0.00076 0.00049 0.00036 0.00029
## Cumulative Proportion 0.99528 0.99655 0.99749 0.99825 0.99874 0.99910 0.99939
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.06116 0.05676 0.04584 0.04161 0.03439 0.02946 0.02562
## Proportion of Variance 0.00016 0.00013 0.00009 0.00007 0.00005 0.00004 0.00003
## Cumulative Proportion 0.99955 0.99968 0.99977 0.99984 0.99989 0.99993 0.99995
## PC22 PC23 PC24
## Standard deviation 0.02411 0.01802 0.01392
## Proportion of Variance 0.00002 0.00001 0.00001
## Cumulative Proportion 0.99998 0.99999 1.00000
Give the total variance from the second PCA. Calculate the total percentage of variance explained by axes 1 and 2: The total variance from the second PCA is 32.82%. The Total percentage of variance explained by axes 1 and 2 is 82.74%
Examine the ‘loadings’ of the variables on the first two axes
wnaclim.pca$rotation
## PC1 PC2 PC3 PC4 PC5
## tjan 0.27154635 -0.07007925 0.080386417 -0.1634473814 0.29901716
## tfeb 0.27524689 -0.07830351 0.051384208 -0.0792929685 0.26587174
## tmar 0.27484658 -0.09367701 -0.001067899 0.0006366441 0.19086484
## tapr 0.26473010 -0.12257641 -0.076298000 0.1052955242 0.03981628
## tmay 0.23877755 -0.16889072 -0.127271902 0.1251598792 -0.21848109
## tjun 0.20220435 -0.21701738 -0.139841033 0.0811805985 -0.44268300
## tjul 0.20390908 -0.22851811 -0.090041235 0.0981226254 -0.35337585
## taug 0.22863878 -0.20271341 -0.057726655 0.1081761983 -0.22777782
## tsep 0.25186014 -0.16881584 -0.031188367 -0.0252319485 -0.11100418
## toct 0.26714245 -0.12423680 -0.014213457 0.0039586317 0.12864795
## tnov 0.27214789 -0.09552584 0.029032040 -0.0776651476 0.26276447
## tdec 0.27301687 -0.07380880 0.068758839 -0.1472904833 0.28523902
## pjan 0.16803736 0.25995212 0.198293078 -0.0491302786 -0.14878411
## pfeb 0.17102216 0.26393484 0.174815508 -0.0779200398 -0.13546314
## pmar 0.17796187 0.25650454 0.169544577 -0.0684787805 -0.08941670
## papr 0.17074339 0.27824680 0.066073520 0.0873621609 -0.01709438
## pmay 0.14748106 0.25905950 -0.096201569 0.4454588155 0.21475276
## pjun 0.05730680 0.23790590 -0.271290553 0.6551603362 0.14238968
## pjul 0.01369406 0.10097126 -0.573535723 -0.1391139500 0.07054361
## paug 0.03206302 0.14782502 -0.519764989 -0.4064516753 0.01206271
## psep 0.10354548 0.26528778 -0.300542026 -0.1481576621 -0.04780745
## poct 0.13939969 0.28759576 -0.081905529 -0.1479311802 -0.16441200
## pnov 0.16303542 0.27829605 0.141217106 -0.0566592085 -0.16857230
## pdec 0.16860662 0.26639834 0.170927455 -0.0622112513 -0.15532094
## PC6 PC7 PC8 PC9 PC10
## tjan -0.0284108206 0.006685677 0.150748941 -0.03370819 0.295282973
## tfeb 0.0003503783 0.136618117 0.096325727 0.05374427 0.146770345
## tmar 0.0472553831 0.299836795 0.107879864 0.12796171 -0.002868388
## tapr 0.1142053327 0.361840941 0.005051047 0.18027931 -0.245606706
## tmay 0.1162681105 0.391610249 0.152998679 0.14131428 -0.153895952
## tjun 0.0336846593 0.059413350 0.153361302 0.08729649 0.185251391
## tjul -0.0968873526 -0.282122219 0.078633470 -0.02281332 0.111072241
## taug -0.1100855056 -0.269202717 -0.078353138 -0.16060487 0.002348289
## tsep -0.0494334800 -0.197844681 -0.135371606 -0.17154105 0.019210005
## toct -0.0108001461 -0.228312237 -0.393469988 -0.13609013 -0.315583081
## tnov -0.0341735253 -0.123895015 -0.174524477 -0.10198795 -0.168381612
## tdec -0.0376637342 -0.070659399 0.019616400 -0.04699441 0.196876851
## pjan 0.2458067285 0.032089376 -0.048645619 -0.21338581 0.001100698
## pfeb 0.1820140769 -0.040185881 -0.011301000 0.07554311 -0.126403221
## pmar 0.2288319443 -0.177170757 0.160725487 0.13544683 -0.309717304
## papr -0.0925172437 -0.305189314 0.271689204 0.39781282 -0.204745692
## pmay -0.1948655497 -0.258064679 0.241986147 0.13372542 0.183453709
## pjun 0.1213474134 0.141181422 -0.086059500 -0.36227611 0.043038183
## pjul 0.4811346946 -0.176731790 -0.383344278 0.38567024 0.178533100
## paug 0.0604580646 -0.072069255 0.509760728 -0.38851386 -0.032443473
## psep -0.4843787725 0.196824537 -0.149665107 -0.09624515 -0.407497122
## poct -0.4759093700 0.159625862 -0.266370371 0.26037169 0.346421923
## pnov 0.0644439203 0.128788161 -0.117454901 -0.17604085 0.237115136
## pdec 0.1871590345 0.094644125 -0.105030745 -0.21895144 0.170211499
## PC11 PC12 PC13 PC14 PC15
## tjan -0.3177818979 0.084717452 -0.198073796 0.0306128664 -0.12524773
## tfeb -0.0850920746 0.207170982 0.215049963 -0.0248132032 0.07177408
## tmar 0.0837891432 0.151198192 0.338194830 -0.1213396869 -0.04343164
## tapr 0.2429592526 -0.018873516 0.244346289 0.0007484506 0.04381860
## tmay 0.0431667637 -0.232783794 -0.291244671 0.0480458427 -0.18616504
## tjun -0.0987876247 -0.033840012 -0.422444222 -0.0105642695 0.31526141
## tjul -0.0486904501 0.283065952 0.298939526 -0.0044454340 0.14450174
## taug -0.0152404625 0.136111286 0.291708371 -0.0496293283 -0.21486141
## tsep -0.0444107836 -0.120942065 -0.094460420 0.0209592856 -0.41993874
## toct 0.2093561076 -0.231921667 -0.011130377 0.0997939913 0.27607606
## tnov 0.1006998972 -0.233238733 -0.174605090 -0.0378639293 0.02149880
## tdec -0.1909224250 0.002929791 -0.196357064 0.0633928569 0.15143733
## pjan -0.0005762763 0.039714713 0.006401917 -0.3431345819 0.49814932
## pfeb -0.1595286785 0.149113176 0.045518203 0.2980140260 0.07726715
## pmar -0.0295711250 0.194223506 -0.064458463 0.5326818242 -0.13153712
## papr -0.3048488853 -0.403344025 0.133363364 -0.4467304855 -0.08102153
## pmay 0.5520813847 0.203821338 -0.246609418 0.0048849326 0.02882245
## pjun -0.4115111702 -0.123357939 0.112057802 0.1724675213 0.02767334
## pjul -0.0309151580 0.137303662 -0.039627958 -0.1174288399 -0.07339510
## paug 0.1609697226 -0.224188960 0.158706743 0.0989438683 0.06477934
## psep -0.1703085895 0.441341096 -0.234474281 -0.2164132305 -0.05426607
## poct 0.0248821147 -0.304950065 0.205141127 0.3425036906 0.16345751
## pnov 0.1821268769 -0.095799139 0.015392954 -0.1184016248 -0.41165840
## pdec 0.1898709898 0.053178843 -0.055907092 -0.2039300913 -0.10002376
## PC16 PC17 PC18 PC19 PC20
## tjan 0.148674822 -0.107041343 0.160823567 -0.0269825468 0.4272858093
## tfeb -0.017002651 0.234367308 -0.030332072 0.0627361960 0.0203446585
## tmar -0.035352024 -0.000212998 -0.023742455 -0.0452842790 -0.2687836322
## tapr 0.022030975 0.145866305 0.006613803 -0.0184863673 -0.0467386911
## tmay 0.142255714 -0.298105316 0.126282045 0.0006562435 0.3387549738
## tjun -0.160935797 0.249990410 -0.191135211 0.0289864553 -0.2723565962
## tjul -0.238251861 -0.162051895 -0.033972474 0.1319965479 0.2737881475
## taug 0.106065552 -0.250950490 0.035001101 -0.1434759326 -0.0549829727
## tsep 0.390707549 0.327566676 0.244781447 -0.0916673160 -0.3538791803
## toct 0.018144251 0.349455508 -0.075056635 0.0788151267 0.4142142645
## tnov -0.264971834 -0.588989556 -0.143864858 0.0332551632 -0.3245451022
## tdec -0.157870134 0.092966911 -0.097722453 0.0092514596 -0.1416159968
## pjan 0.206879187 -0.141990574 0.530872235 0.0890666735 -0.0947536147
## pfeb 0.530999363 -0.151466383 -0.533289719 0.2207678709 -0.0675137256
## pmar -0.380592388 0.074655676 0.335006369 -0.1362476881 -0.0616168429
## papr -0.056901612 0.106755938 -0.084786785 -0.0375141498 0.0485745440
## pmay 0.177758906 -0.022014450 0.025024070 0.0376327473 -0.0093294644
## pjun -0.094904165 0.013377094 -0.016298342 -0.0261197497 -0.0346808089
## pjul -0.005543386 -0.049203713 0.044870954 0.0112968029 0.0096717843
## paug 0.047348123 0.015708070 -0.052098593 0.0046396787 0.0009636633
## psep -0.040657537 0.041970363 -0.011347466 0.0246969815 0.0132694536
## poct 0.027120678 -0.090538295 0.148875670 -0.1375823255 -0.0299283689
## pnov -0.283906262 0.105283956 -0.052183791 0.6219718414 0.0532889852
## pdec -0.122022282 0.048156966 -0.325075890 -0.6730608587 0.1687923951
## PC21 PC22 PC23 PC24
## tjan 0.110569448 -0.174362955 0.466461484 0.1410538810
## tfeb 0.411083337 0.195963582 -0.553989382 0.3303973610
## tmar 0.190517645 -0.117562888 0.219088103 -0.6511559490
## tapr -0.404433770 -0.108135234 0.270516403 0.5155541679
## tmay -0.037106821 0.143632598 -0.348991274 -0.1986853412
## tjun 0.254278010 0.063546212 0.239743583 0.0692545700
## tjul -0.179763796 -0.459394939 -0.200353045 -0.0523957985
## taug 0.026308260 0.644090923 0.223105697 0.0795298706
## tsep -0.041652632 -0.338752088 -0.179241161 -0.0007117746
## toct 0.159245830 0.068434255 0.090617300 -0.2078886850
## tnov 0.183602516 -0.198760485 -0.051768256 0.1951652783
## tdec -0.671182075 0.274168418 -0.167825018 -0.2131251230
## pjan -0.025665936 0.023245016 -0.023533194 -0.0018978335
## pfeb -0.017169890 -0.043847725 0.001437087 -0.0140109546
## pmar 0.073977731 0.042827913 0.030519916 -0.0138651710
## papr -0.008348023 0.008701196 -0.001837501 0.0129567032
## pmay -0.007890496 0.009175983 -0.014398707 -0.0095682867
## pjun 0.019589810 -0.050004246 -0.001155038 -0.0001735104
## pjul -0.003853114 0.023997896 -0.014147625 -0.0044508995
## paug 0.004197034 0.012388955 0.011866533 0.0056221737
## psep -0.017989679 -0.007676584 -0.001546521 -0.0008137329
## poct 0.026887105 -0.009846198 -0.006560772 0.0004372792
## pnov -0.015092274 0.074152159 0.081791929 0.0084156928
## pdec -0.018484850 -0.087548286 -0.071983057 0.0171497426
Two variables that are highly associated with axis 1 are tfeb = 0.27524689 and tmar = 0.27484658. Two variables that are highly associated with axis 2 are poct = 0.28759576 and pnov = 0.27829605.
Produce a map of the sites scores on axis 1
wnaclim.pca.score = wnaclim.pca$x[, 1]
quilt.plot(wnaclim2$Longitude, wnaclim2$Latitude, wnaclim.pca.score)
world(add = TRUE)
This map shows temperature in western North America. Higher temperature values are found on the coast and decrease as the longitude increases. Lower temperature values are found in northern Canada and Alaska. These values make sense in terms of the geography and climate in North America as it tends to be hot on the coast and interior of North America and very cold in Alaska and Northern Canada.
Produce a map of the sites scores on axis 2
wnaclim.pca.score = wnaclim.pca$x[, 2]
quilt.plot(wnaclim2$Longitude, wnaclim2$Latitude, wnaclim.pca.score)
world(add = TRUE)
This map shows precipitation across western North America. The coasts have higher precipitation values and there are low precipitation values across the interior of the US, Canada, and Alaska. The interior of western North America does not get as much precipitation as the West Coast of North America so this map does make sense.