Exercise 1

Read in data

boston = read.csv("/Users/jessieeastburn/Documents/Fall 2021/GEOG 6000/datafiles/boston6k.csv")

Load in packages

Subset to variables needed and scale subset

boston2 = boston[c("CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT")]

boston2= scale(boston2)
boston2.s = scale(boston2)

Run K-means cluster analysis for 2 to 20 clusters

sil.out <- rep(NA,20) 
for (i in 2:20) {
  boston2.kmeans = kmeans(boston2, centers = i, nstart = 50)
  tmp = silhouette(boston2.kmeans$cluster, dist(boston2))
  sil.out[i] = mean(tmp[,3])
}

Plot silhouette index

plot(1:20, sil.out, type = 'b', lwd = 2, xlab = "N Groups", ylab = "Index Values", main = "Silhouette Index Values")

2 clusters gives the best solution but two clusters is not enough data for a place like Boston, so more clusters are preferable in this case. 6 is the next best cluster according to the plot so I will use 6 clusters instead of 2 for the next step.

Re-run K means using 6 clusters

ngrp = 6
boston2.kmeans <- kmeans(boston2, ngrp, nstart = 50, iter.max = 20)
table(boston2.kmeans$cluster) 
## 
##   1   2   3   4   5   6 
##  35  34 215  83  50  89

Using the aggregate() function, provide a table showing the median the variables used in clustering

boston2.aggregate = aggregate(boston2.s, list(boston2.kmeans$cluster), median)
boston2.aggregate
##   Group.1       CRIM         ZN      INDUS       CHAS         NOX          RM
## 1       1  0.9531748 -0.4872402  1.0149946 -0.2723291  1.07272553 -0.11760941
## 2       2 -0.3638042 -0.4872402  0.4013236  3.6647712 -0.04051738 -0.07633515
## 3       3 -0.4032966 -0.4872402 -0.5927944 -0.2723291 -0.41159834 -0.20015792
## 4       4 -0.4151014  1.8710023 -1.1219217 -0.2723291 -1.09335175  0.61678770
## 5       5 -0.2940133 -0.4872402  1.5674443 -0.2723291  0.59808708 -0.57660760
## 6       6  0.5388063 -0.4872402  1.0149946 -0.2723291  1.19354259 -0.24285543
##           AGE         DIS        RAD         TAX     PTRATIO          B
## 1  0.87836941 -0.86756608  1.6596029  1.52941294  0.80577843 -3.5229148
## 2  0.70962369 -0.39245320 -0.5224844 -0.60068166 -0.39517558  0.3713897
## 3 -0.02752869  0.06608569 -0.6373311 -0.66594918 -0.02565127  0.4032644
## 4 -1.34196906  1.53889050 -0.6373311 -0.61848189 -0.85708096  0.3968018
## 5  0.99560328 -0.91291895 -0.6373311 -0.03107419  0.29768250  0.2322800
## 6  0.95297278 -0.87563937  1.6596029  1.52941294  0.80577843  0.3731422
##        LSTAT
## 1  0.9980220
## 2 -0.2328874
## 3 -0.4163335
## 4 -0.9918782
## 5  0.4637877
## 6  0.7109499

Clusters with higher values are indicating that the variable in the cluster is above the median. In cluster 1 there is a high crime rate, proportion of non-retail business acres per town, Nitric oxide concentration, and Index of accessibility to radial highways per town. For ZN (Proportion of residential land zoned for lots over 25000 sq. ft), the clusters are all negative except for cluster 6 which shows a value of ZN above the median.

Mean corrected house value per cluster:

bos_kmeans <- kmeans(boston2, 6)
tapply(boston$CMEDV, boston2.kmeans$cluster, mean)
##        1        2        3        4        5        6 
## 12.87143 27.80588 23.98465 29.03735 19.55200 16.39663

Use anova() to test whether the values are significantly different between clusters. First create vector of house prices/values and the vector of clusters from kmeans.

boston.lm = lm(boston$CMEDV ~ boston2.kmeans$cluster)
boston.lm
## 
## Call:
## lm(formula = boston$CMEDV ~ boston2.kmeans$cluster)
## 
## Coefficients:
##            (Intercept)  boston2.kmeans$cluster  
##                25.7682                 -0.8794

Run anova

anova(boston.lm)
## Analysis of Variance Table
## 
## Response: boston$CMEDV
##                         Df Sum Sq Mean Sq F value   Pr(>F)   
## boston2.kmeans$cluster   1    790  789.83   9.526 0.002137 **
## Residuals              504  41788   82.91                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F statistic is 0.8427 and the p value is 0.3591.

Exercise 2

Read in data

wnaclim2 = read.csv("/Users/jessieeastburn/Documents/Fall 2021/GEOG 6000/datafiles/wnaclim2.csv")

Extract temperature and precipitation

wnaclim = wnaclim2[, seq(3,26)]

Use the procomp function and scale the data

wnaclim.pca <- prcomp(wnaclim, scale = TRUE)

Make a biplot

biplot(wnaclim.pca)

Make a screeplot

screeplot(wnaclim.pca)

Summary

summary(wnaclim.pca)
## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     3.4612 2.8065 1.5846 0.84733 0.63986 0.51316 0.29261
## Proportion of Variance 0.4992 0.3282 0.1046 0.02992 0.01706 0.01097 0.00357
## Cumulative Proportion  0.4992 0.8274 0.9320 0.96190 0.97896 0.98993 0.99350
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.20707 0.17437 0.15014 0.13534 0.10798 0.09353 0.08322
## Proportion of Variance 0.00179 0.00127 0.00094 0.00076 0.00049 0.00036 0.00029
## Cumulative Proportion  0.99528 0.99655 0.99749 0.99825 0.99874 0.99910 0.99939
##                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
## Standard deviation     0.06116 0.05676 0.04584 0.04161 0.03439 0.02946 0.02562
## Proportion of Variance 0.00016 0.00013 0.00009 0.00007 0.00005 0.00004 0.00003
## Cumulative Proportion  0.99955 0.99968 0.99977 0.99984 0.99989 0.99993 0.99995
##                           PC22    PC23    PC24
## Standard deviation     0.02411 0.01802 0.01392
## Proportion of Variance 0.00002 0.00001 0.00001
## Cumulative Proportion  0.99998 0.99999 1.00000

Give the total variance from the second PCA. Calculate the total percentage of variance explained by axes 1 and 2: The total variance from the second PCA is 32.82%. The Total percentage of variance explained by axes 1 and 2 is 82.74%

Examine the ‘loadings’ of the variables on the first two axes

wnaclim.pca$rotation
##             PC1         PC2          PC3           PC4         PC5
## tjan 0.27154635 -0.07007925  0.080386417 -0.1634473814  0.29901716
## tfeb 0.27524689 -0.07830351  0.051384208 -0.0792929685  0.26587174
## tmar 0.27484658 -0.09367701 -0.001067899  0.0006366441  0.19086484
## tapr 0.26473010 -0.12257641 -0.076298000  0.1052955242  0.03981628
## tmay 0.23877755 -0.16889072 -0.127271902  0.1251598792 -0.21848109
## tjun 0.20220435 -0.21701738 -0.139841033  0.0811805985 -0.44268300
## tjul 0.20390908 -0.22851811 -0.090041235  0.0981226254 -0.35337585
## taug 0.22863878 -0.20271341 -0.057726655  0.1081761983 -0.22777782
## tsep 0.25186014 -0.16881584 -0.031188367 -0.0252319485 -0.11100418
## toct 0.26714245 -0.12423680 -0.014213457  0.0039586317  0.12864795
## tnov 0.27214789 -0.09552584  0.029032040 -0.0776651476  0.26276447
## tdec 0.27301687 -0.07380880  0.068758839 -0.1472904833  0.28523902
## pjan 0.16803736  0.25995212  0.198293078 -0.0491302786 -0.14878411
## pfeb 0.17102216  0.26393484  0.174815508 -0.0779200398 -0.13546314
## pmar 0.17796187  0.25650454  0.169544577 -0.0684787805 -0.08941670
## papr 0.17074339  0.27824680  0.066073520  0.0873621609 -0.01709438
## pmay 0.14748106  0.25905950 -0.096201569  0.4454588155  0.21475276
## pjun 0.05730680  0.23790590 -0.271290553  0.6551603362  0.14238968
## pjul 0.01369406  0.10097126 -0.573535723 -0.1391139500  0.07054361
## paug 0.03206302  0.14782502 -0.519764989 -0.4064516753  0.01206271
## psep 0.10354548  0.26528778 -0.300542026 -0.1481576621 -0.04780745
## poct 0.13939969  0.28759576 -0.081905529 -0.1479311802 -0.16441200
## pnov 0.16303542  0.27829605  0.141217106 -0.0566592085 -0.16857230
## pdec 0.16860662  0.26639834  0.170927455 -0.0622112513 -0.15532094
##                PC6          PC7          PC8         PC9         PC10
## tjan -0.0284108206  0.006685677  0.150748941 -0.03370819  0.295282973
## tfeb  0.0003503783  0.136618117  0.096325727  0.05374427  0.146770345
## tmar  0.0472553831  0.299836795  0.107879864  0.12796171 -0.002868388
## tapr  0.1142053327  0.361840941  0.005051047  0.18027931 -0.245606706
## tmay  0.1162681105  0.391610249  0.152998679  0.14131428 -0.153895952
## tjun  0.0336846593  0.059413350  0.153361302  0.08729649  0.185251391
## tjul -0.0968873526 -0.282122219  0.078633470 -0.02281332  0.111072241
## taug -0.1100855056 -0.269202717 -0.078353138 -0.16060487  0.002348289
## tsep -0.0494334800 -0.197844681 -0.135371606 -0.17154105  0.019210005
## toct -0.0108001461 -0.228312237 -0.393469988 -0.13609013 -0.315583081
## tnov -0.0341735253 -0.123895015 -0.174524477 -0.10198795 -0.168381612
## tdec -0.0376637342 -0.070659399  0.019616400 -0.04699441  0.196876851
## pjan  0.2458067285  0.032089376 -0.048645619 -0.21338581  0.001100698
## pfeb  0.1820140769 -0.040185881 -0.011301000  0.07554311 -0.126403221
## pmar  0.2288319443 -0.177170757  0.160725487  0.13544683 -0.309717304
## papr -0.0925172437 -0.305189314  0.271689204  0.39781282 -0.204745692
## pmay -0.1948655497 -0.258064679  0.241986147  0.13372542  0.183453709
## pjun  0.1213474134  0.141181422 -0.086059500 -0.36227611  0.043038183
## pjul  0.4811346946 -0.176731790 -0.383344278  0.38567024  0.178533100
## paug  0.0604580646 -0.072069255  0.509760728 -0.38851386 -0.032443473
## psep -0.4843787725  0.196824537 -0.149665107 -0.09624515 -0.407497122
## poct -0.4759093700  0.159625862 -0.266370371  0.26037169  0.346421923
## pnov  0.0644439203  0.128788161 -0.117454901 -0.17604085  0.237115136
## pdec  0.1871590345  0.094644125 -0.105030745 -0.21895144  0.170211499
##               PC11         PC12         PC13          PC14        PC15
## tjan -0.3177818979  0.084717452 -0.198073796  0.0306128664 -0.12524773
## tfeb -0.0850920746  0.207170982  0.215049963 -0.0248132032  0.07177408
## tmar  0.0837891432  0.151198192  0.338194830 -0.1213396869 -0.04343164
## tapr  0.2429592526 -0.018873516  0.244346289  0.0007484506  0.04381860
## tmay  0.0431667637 -0.232783794 -0.291244671  0.0480458427 -0.18616504
## tjun -0.0987876247 -0.033840012 -0.422444222 -0.0105642695  0.31526141
## tjul -0.0486904501  0.283065952  0.298939526 -0.0044454340  0.14450174
## taug -0.0152404625  0.136111286  0.291708371 -0.0496293283 -0.21486141
## tsep -0.0444107836 -0.120942065 -0.094460420  0.0209592856 -0.41993874
## toct  0.2093561076 -0.231921667 -0.011130377  0.0997939913  0.27607606
## tnov  0.1006998972 -0.233238733 -0.174605090 -0.0378639293  0.02149880
## tdec -0.1909224250  0.002929791 -0.196357064  0.0633928569  0.15143733
## pjan -0.0005762763  0.039714713  0.006401917 -0.3431345819  0.49814932
## pfeb -0.1595286785  0.149113176  0.045518203  0.2980140260  0.07726715
## pmar -0.0295711250  0.194223506 -0.064458463  0.5326818242 -0.13153712
## papr -0.3048488853 -0.403344025  0.133363364 -0.4467304855 -0.08102153
## pmay  0.5520813847  0.203821338 -0.246609418  0.0048849326  0.02882245
## pjun -0.4115111702 -0.123357939  0.112057802  0.1724675213  0.02767334
## pjul -0.0309151580  0.137303662 -0.039627958 -0.1174288399 -0.07339510
## paug  0.1609697226 -0.224188960  0.158706743  0.0989438683  0.06477934
## psep -0.1703085895  0.441341096 -0.234474281 -0.2164132305 -0.05426607
## poct  0.0248821147 -0.304950065  0.205141127  0.3425036906  0.16345751
## pnov  0.1821268769 -0.095799139  0.015392954 -0.1184016248 -0.41165840
## pdec  0.1898709898  0.053178843 -0.055907092 -0.2039300913 -0.10002376
##              PC16         PC17         PC18          PC19          PC20
## tjan  0.148674822 -0.107041343  0.160823567 -0.0269825468  0.4272858093
## tfeb -0.017002651  0.234367308 -0.030332072  0.0627361960  0.0203446585
## tmar -0.035352024 -0.000212998 -0.023742455 -0.0452842790 -0.2687836322
## tapr  0.022030975  0.145866305  0.006613803 -0.0184863673 -0.0467386911
## tmay  0.142255714 -0.298105316  0.126282045  0.0006562435  0.3387549738
## tjun -0.160935797  0.249990410 -0.191135211  0.0289864553 -0.2723565962
## tjul -0.238251861 -0.162051895 -0.033972474  0.1319965479  0.2737881475
## taug  0.106065552 -0.250950490  0.035001101 -0.1434759326 -0.0549829727
## tsep  0.390707549  0.327566676  0.244781447 -0.0916673160 -0.3538791803
## toct  0.018144251  0.349455508 -0.075056635  0.0788151267  0.4142142645
## tnov -0.264971834 -0.588989556 -0.143864858  0.0332551632 -0.3245451022
## tdec -0.157870134  0.092966911 -0.097722453  0.0092514596 -0.1416159968
## pjan  0.206879187 -0.141990574  0.530872235  0.0890666735 -0.0947536147
## pfeb  0.530999363 -0.151466383 -0.533289719  0.2207678709 -0.0675137256
## pmar -0.380592388  0.074655676  0.335006369 -0.1362476881 -0.0616168429
## papr -0.056901612  0.106755938 -0.084786785 -0.0375141498  0.0485745440
## pmay  0.177758906 -0.022014450  0.025024070  0.0376327473 -0.0093294644
## pjun -0.094904165  0.013377094 -0.016298342 -0.0261197497 -0.0346808089
## pjul -0.005543386 -0.049203713  0.044870954  0.0112968029  0.0096717843
## paug  0.047348123  0.015708070 -0.052098593  0.0046396787  0.0009636633
## psep -0.040657537  0.041970363 -0.011347466  0.0246969815  0.0132694536
## poct  0.027120678 -0.090538295  0.148875670 -0.1375823255 -0.0299283689
## pnov -0.283906262  0.105283956 -0.052183791  0.6219718414  0.0532889852
## pdec -0.122022282  0.048156966 -0.325075890 -0.6730608587  0.1687923951
##              PC21         PC22         PC23          PC24
## tjan  0.110569448 -0.174362955  0.466461484  0.1410538810
## tfeb  0.411083337  0.195963582 -0.553989382  0.3303973610
## tmar  0.190517645 -0.117562888  0.219088103 -0.6511559490
## tapr -0.404433770 -0.108135234  0.270516403  0.5155541679
## tmay -0.037106821  0.143632598 -0.348991274 -0.1986853412
## tjun  0.254278010  0.063546212  0.239743583  0.0692545700
## tjul -0.179763796 -0.459394939 -0.200353045 -0.0523957985
## taug  0.026308260  0.644090923  0.223105697  0.0795298706
## tsep -0.041652632 -0.338752088 -0.179241161 -0.0007117746
## toct  0.159245830  0.068434255  0.090617300 -0.2078886850
## tnov  0.183602516 -0.198760485 -0.051768256  0.1951652783
## tdec -0.671182075  0.274168418 -0.167825018 -0.2131251230
## pjan -0.025665936  0.023245016 -0.023533194 -0.0018978335
## pfeb -0.017169890 -0.043847725  0.001437087 -0.0140109546
## pmar  0.073977731  0.042827913  0.030519916 -0.0138651710
## papr -0.008348023  0.008701196 -0.001837501  0.0129567032
## pmay -0.007890496  0.009175983 -0.014398707 -0.0095682867
## pjun  0.019589810 -0.050004246 -0.001155038 -0.0001735104
## pjul -0.003853114  0.023997896 -0.014147625 -0.0044508995
## paug  0.004197034  0.012388955  0.011866533  0.0056221737
## psep -0.017989679 -0.007676584 -0.001546521 -0.0008137329
## poct  0.026887105 -0.009846198 -0.006560772  0.0004372792
## pnov -0.015092274  0.074152159  0.081791929  0.0084156928
## pdec -0.018484850 -0.087548286 -0.071983057  0.0171497426

Two variables that are highly associated with axis 1 are tfeb = 0.27524689 and tmar = 0.27484658. Two variables that are highly associated with axis 2 are poct = 0.28759576 and pnov = 0.27829605.

Produce a map of the sites scores on axis 1

wnaclim.pca.score = wnaclim.pca$x[, 1]
quilt.plot(wnaclim2$Longitude, wnaclim2$Latitude, wnaclim.pca.score)
world(add = TRUE)

This map shows temperature in western North America. Higher temperature values are found on the coast and decrease as the longitude increases. Lower temperature values are found in northern Canada and Alaska. These values make sense in terms of the geography and climate in North America as it tends to be hot on the coast and interior of North America and very cold in Alaska and Northern Canada.

Produce a map of the sites scores on axis 2

wnaclim.pca.score = wnaclim.pca$x[, 2]
quilt.plot(wnaclim2$Longitude, wnaclim2$Latitude, wnaclim.pca.score)
world(add = TRUE)

This map shows precipitation across western North America. The coasts have higher precipitation values and there are low precipitation values across the interior of the US, Canada, and Alaska. The interior of western North America does not get as much precipitation as the West Coast of North America so this map does make sense.