Practice - Distance Measures and Cluster Analysis

##R setup with packages for this lab


library(ggplot2)
library(ggpubr)
library(plotly)
library(maps) #Displaying geographic data
library(cluster) #Cluster analysis package
library(fpc) #Flexible procedures for clustering
library(fields) #For working with spatial data
library(RColorBrewer) #Adding additional color palettes 

setwd("~/Desktop/University of Utah PhD /Course Work/Fall 2022 Semester/GEOG6000_Data Analysis/lab07")

Distance Measures

cars = read.csv("../datafiles/cars.tab.csv")

cars.use = cars[ , -c(1,2)]
cars.use = scale(cars.use)  ##standardizes to zero mean and unit variance and converts each observation to a number of standard deviations away from the mean.

#What is an atomic vector?? The scaled data??

par(mfrow = c(2,1))
hist(cars[, "Horsepower"], main = "Horsepower: Original Scale")
hist(cars.use[, "Horsepower"], main = "Horsepower: Standardized Scale")

par(mfrow = c(2, 1))

##dist() function to explore the multivariate distances between different cars. 

dist(cars.use[1:4, ], method = 'euclidean')

##           1         2         3
## 2 1.1398481                    
## 3 1.8791302 1.5379233          
## 4 0.8681787 0.6873475 1.5065957

dist(cars.use[1:4, ], method = 'manhattan')

##          1        2        3
## 2 2.057498                  
## 3 3.816091 3.367820         
## 4 1.681029 1.390295 2.785084

cars.dist = dist(cars.use)
cars.dist

##            1         2         3         4         5         6         7
## 2  1.1398481                                                            
## 3  1.8791302 1.5379233                                                  
## 4  0.8681787 0.6873475 1.5065957                                        
## 5  6.5064614 6.4822961 5.1250923 6.3106342                              
## 6  5.1738042 5.0499221 3.8001685 4.9053397 1.8047804                    
## 7  5.5864282 5.6036243 4.3134390 5.4016818 1.2565441 1.0332439          
## 8  6.1771844 6.0905876 4.7485982 5.9244671 0.7161097 1.2442626 1.0742030
## 9  4.8480549 5.0375972 3.8035553 4.8669738 2.3525992 2.1291365 1.6171913
## 10 3.5301708 3.7291650 2.6017142 3.5792512 3.5736168 2.6364429 2.6015184
## 11 4.9604936 5.1492658 3.9973838 4.9557693 2.3895043 1.8575477 1.3722800
## 12 3.3450719 3.6729516 2.6806678 3.5119282 3.9725777 3.0332883 2.9915492
## 13 3.0274764 2.7708193 1.5960990 2.7535396 3.7854036 2.3935780 3.0058421
## 14 3.9438666 3.7671366 2.4605709 3.7360814 2.8933537 1.9550300 2.3505016
## 15 2.7559271 2.5273572 1.4813731 2.5708289 4.1358672 2.7303071 3.3172245
## 16 2.4939094 2.2706162 1.3471376 2.2403426 4.3163472 2.8785776 3.4066196
## 17 1.4399982 0.8377919 0.7252240 1.0165631 5.7430227 4.3656972 4.9093636
## 18 1.7032564 0.9301185 0.7761175 1.1413136 5.7509851 4.3340633 4.9252674
## 19 1.2528877 0.2565576 1.3733210 0.6668156 6.3095525 4.8865115 5.4507553
## 20 1.2632414 0.7830240 0.8012940 0.7558885 5.7850435 4.4033377 4.9308012
## 21 5.2096828 5.0678428 3.8157918 4.9496450 1.6987044 0.3202520 1.0654982
## 22 3.7264974 3.6468502 2.2880581 3.5151842 2.9564835 1.7292372 2.0723052
## 23 6.9749041 6.9555364 5.5791028 6.7513604 0.7010289 2.2372431 1.7489593
## 24 6.4973291 6.3107820 4.9978582 6.1292098 1.7078072 1.6400328 1.8531241
## 25 5.4455210 5.2977556 4.0184339 5.1982029 1.5478613 0.6090464 1.2158603
## 26 6.7281549 6.7264044 5.3582423 6.5269933 0.3924042 2.0229042 1.3853449
## 27 6.2516001 6.0358394 4.7099391 5.9344674 1.2582953 1.2889114 1.5271840
## 28 5.1216822 4.8108545 3.6618306 4.7373244 2.5974309 1.0604331 2.0770315
## 29 4.1658959 3.9606554 2.6594600 3.7477335 3.1072917 1.6871993 2.3063279
## 30 3.9568990 3.8096329 2.4692785 3.6037691 3.0034600 1.5925941 2.1310964
## 31 5.4980541 5.2816705 4.0886676 5.1187130 2.3371058 1.1814421 1.9882024
## 32 6.4810802 6.3980437 5.0519523 6.2111056 0.9131638 1.6397765 1.5291032
## 33 6.8079280 6.7733290 5.3989540 6.5929239 0.3825189 2.0451145 1.5351385
## 34 6.7428379 6.5979100 5.2646847 6.4146012 1.6110434 1.9546073 2.0929158
## 35 6.3291275 6.3396799 4.9879301 6.1412506 0.3891058 1.6270124 0.9753294
## 36 4.5310683 4.6441960 3.2882995 4.4744300 2.3271473 1.9982563 1.7047030
## 37 5.0541206 5.1541528 3.9678998 4.9833487 2.1643824 1.5764399 1.1034142
## 38 6.7948022 6.7891309 5.4196563 6.5851179 0.4869201 2.0802109 1.4464682
##            8         9        10        11        12        13        14
## 2                                                                       
## 3                                                                       
## 4                                                                       
## 5                                                                       
## 6                                                                       
## 7                                                                       
## 8                                                                       
## 9  2.4465576                                                            
## 10 3.4329055 1.5014408                                                  
## 11 2.3609068 0.8441935 1.7107844                                        
## 12 3.8451138 1.8036762 0.5255723 1.9532352                              
## 13 3.3916875 2.7160211 1.9461582 2.8241709 2.2036527                    
## 14 2.6358656 2.0212696 1.8617590 2.3614848 2.2640202 1.1593258          
## 15 3.7620304 2.8598682 1.9126425 2.9541083 2.0712991 0.5003866 1.4839209
## 16 3.9437224 2.9561575 1.8881155 3.0078993 2.0407394 0.7494156 1.7424394
## 17 5.3618900 4.3540948 3.0918327 4.5183655 3.1008994 2.0619163 3.0077541
## 18 5.3391575 4.4700081 3.2336889 4.6116307 3.2820741 2.0714550 3.0411856
## 19 5.9121360 4.9282952 3.6531977 5.0486036 3.6284414 2.6105509 3.6010959
## 20 5.4002110 4.3981637 3.1371642 4.5417264 3.1329324 2.1526641 3.1127122
## 21 1.1892967 2.0608821 2.6384138 1.8743475 3.0478077 2.3563277 1.8038583
## 22 2.6248563 1.7958162 1.3032347 1.9312784 1.7386606 1.1905000 1.0047733
## 23 1.0204637 2.9535998 4.1460710 2.9540731 4.5385930 4.2941996 3.4462460
## 24 1.1176406 3.3610043 4.0842400 3.1635894 4.5058524 3.7556124 3.2240898
## 25 1.0272409 2.2123917 2.8808062 2.0843180 3.2789801 2.5547655 1.9284228
## 26 0.8979276 2.5433909 3.7767096 2.5459045 4.1703806 4.0638586 3.1984995
## 27 0.7226258 2.8135472 3.6275852 2.7364046 4.0695291 3.3295052 2.6087595
## 28 1.9380432 3.0442380 3.2111523 2.8027787 3.5918097 2.2947691 2.1662867
## 29 2.5543986 2.8421867 2.5390312 2.7460931 2.9094252 1.8375723 1.9726533
## 30 2.5124425 2.4557730 2.0997795 2.4032013 2.4798109 1.5608200 1.6455885
## 31 1.6387728 3.2186993 3.5788929 2.9627948 3.9517206 2.8169783 2.5990762
## 32 0.5399900 2.8951122 3.8782418 2.8172944 4.2791128 3.7347060 3.0063712
## 33 0.8409380 2.6961640 3.9043616 2.7177677 4.3053914 4.0843817 3.2041305
## 34 1.1505897 3.5150291 4.3530385 3.3837812 4.7533608 4.0043668 3.3981805
## 35 0.6590117 2.1966443 3.3872917 2.1515380 3.7731194 3.6747621 2.8546416
## 36 2.3239170 0.8284025 1.4496585 1.4566540 1.8538570 2.2807905 1.4742083
## 37 2.0955544 0.9293048 1.7974353 0.4185086 2.1255921 2.7296457 2.2054957
## 38 0.9445111 2.6284194 3.8529893 2.6245026 4.2496167 4.1342707 3.2746391
##           15        16        17        18        19        20        21
## 2                                                                       
## 3                                                                       
## 4                                                                       
## 5                                                                       
## 6                                                                       
## 7                                                                       
## 8                                                                       
## 9                                                                       
## 10                                                                      
## 11                                                                      
## 12                                                                      
## 13                                                                      
## 14                                                                      
## 15                                                                      
## 16 0.6135479                                                            
## 17 1.8523603 1.6612209                                                  
## 18 1.9038060 1.7164735 0.3484249                                        
## 19 2.4022245 2.1432108 0.6896808 0.7442321                              
## 20 1.9721830 1.7262461 0.3112976 0.4990653 0.6197894                    
## 21 2.6970359 2.8720900 4.3735675 4.3486218 4.9026316 4.4256685          
## 22 1.4643705 1.5574226 2.9028656 2.9276306 3.4979870 2.9585980 1.7360496
## 23 4.6691439 4.8427560 6.2148382 6.2075757 6.7729493 6.2406187 2.1896065
## 24 4.1601229 4.3029176 5.6084492 5.5235895 6.1172285 5.6294160 1.7273423
## 25 2.8872833 3.1291034 4.5853313 4.5589804 5.1302201 4.6490493 0.4130040
## 26 4.4209623 4.5764749 5.9890670 5.9944815 6.5523856 6.0187706 1.9654050
## 27 3.7063800 3.9155152 5.3046264 5.2447759 5.8512434 5.3707284 1.1890624
## 28 2.6437355 2.8307240 4.1643391 4.0630542 4.6330340 4.2186003 1.1173955
## 29 2.2156802 2.2263245 3.2752130 3.1804593 3.7742653 3.2678478 1.8515108
## 30 1.9244810 1.9416802 3.1008883 3.0456623 3.6340760 3.1046275 1.7263843
## 31 3.2043755 3.3602005 4.6324672 4.5398596 5.0925051 4.6450790 1.3165811
## 32 4.1237846 4.3128218 5.6710966 5.6389351 6.2104759 5.6973056 1.6161605
## 33 4.4473685 4.6289818 6.0286118 6.0253378 6.5951862 6.0669668 1.9667330
## 34 4.4047432 4.6039058 5.8810526 5.8127673 6.4018476 5.9033347 1.9892055
## 35 4.0185214 4.1758140 5.6107231 5.6215377 6.1714477 5.6390014 1.5802116
## 36 2.5011695 2.6304055 3.9000247 3.9936022 4.5083666 3.9492086 1.9208685
## 37 2.9037674 2.9578242 4.5033203 4.5699679 5.0407289 4.5386834 1.5785176
## 38 4.4968791 4.6449792 6.0525231 6.0544862 6.6134046 6.0798614 2.0305467
##           22        23        24        25        26        27        28
## 2                                                                       
## 3                                                                       
## 4                                                                       
## 5                                                                       
## 6                                                                       
## 7                                                                       
## 8                                                                       
## 9                                                                       
## 10                                                                      
## 11                                                                      
## 12                                                                      
## 13                                                                      
## 14                                                                      
## 15                                                                      
## 16                                                                      
## 17                                                                      
## 18                                                                      
## 19                                                                      
## 20                                                                      
## 21                                                                      
## 22                                                                      
## 23 3.4695843                                                            
## 24 3.0862555 1.5893300                                                  
## 25 1.9739830 2.0167725 1.6235892                                        
## 26 3.1851298 0.4691402 1.6994356 1.8416265                              
## 27 2.6810030 1.5170768 1.0286972 0.9756777 1.4653244                    
## 28 2.0864283 2.9200943 1.8463675 1.1870636 2.8254805 1.6302850          
## 29 1.3888145 3.3986058 2.4827899 2.0676480 3.2487747 2.5191228 1.6192914
## 30 0.9567097 3.3675915 2.6313055 1.9686886 3.1685641 2.5300071 1.7122281
## 31 2.4659351 2.4796851 1.2935247 1.3109424 2.4841532 1.5030967 0.8534042
## 32 3.0224100 0.8164795 0.9631897 1.4177507 0.9588893 0.9580309 2.1679101
## 33 3.2507493 0.3618590 1.6148982 1.7937362 0.2851090 1.3269554 2.7692642
## 34 3.3773124 1.3394565 0.6691116 1.7771885 1.5986781 1.1984472 2.1748994
## 35 2.7966626 0.8094290 1.6246464 1.4827076 0.4532477 1.3225642 2.4994953
## 36 1.3182650 2.8822778 3.1423812 2.0508011 2.5349410 2.6201402 2.7813383
## 37 1.8002414 2.7506750 2.8780007 1.8108202 2.3356329 2.4112481 2.5306662
## 38 3.2520390 0.4240946 1.6773707 1.9123832 0.1103997 1.4947954 2.8688498
##           29        30        31        32        33        34        35
## 2                                                                       
## 3                                                                       
## 4                                                                       
## 5                                                                       
## 6                                                                       
## 7                                                                       
## 8                                                                       
## 9                                                                       
## 10                                                                      
## 11                                                                      
## 12                                                                      
## 13                                                                      
## 14                                                                      
## 15                                                                      
## 16                                                                      
## 17                                                                      
## 18                                                                      
## 19                                                                      
## 20                                                                      
## 21                                                                      
## 22                                                                      
## 23                                                                      
## 24                                                                      
## 25                                                                      
## 26                                                                      
## 27                                                                      
## 28                                                                      
## 29                                                                      
## 30 0.4464760                                                            
## 31 1.7406712 1.9194350                                                  
## 32 2.7848997 2.8042141 1.6824079                                        
## 33 3.2848474 3.2218676 2.4278263 0.8318586                              
## 34 2.8599053 2.9846217 1.5536115 0.7295724 1.4493989                    
## 35 2.9065222 2.8009512 2.2045161 0.9086064 0.5976211 1.5967788          
## 36 2.3532768 1.9696566 2.9421027 2.7289716 2.6452250 3.2869352 2.2009025
## 37 2.5584726 2.2356866 2.7234010 2.5836214 2.4942355 3.1418917 1.9433425
## 38 3.2824562 3.2131639 2.5086130 0.9674219 0.3093247 1.5799187 0.5369531
##           36        37
## 2                     
## 3                     
## 4                     
## 5                     
## 6                     
## 7                     
## 8                     
## 9                     
## 10                    
## 11                    
## 12                    
## 13                    
## 14                    
## 15                    
## 16                    
## 17                    
## 18                    
## 19                    
## 20                    
## 21                    
## 22                    
## 23                    
## 24                    
## 25                    
## 26                    
## 27                    
## 28                    
## 29                    
## 30                    
## 31                    
## 32                    
## 33                    
## 34                    
## 35                    
## 36                    
## 37 1.4103762          
## 38 2.6134362 2.4109537

Hierarchical Clustering

##Use the distance matrix to perform hierarchical clustering for the cars dataset. Ward's method uses the distances between centroids of two clusters to assess similarity.

cars.hclust = hclust(cars.dist, method = 'ward.D2')

plot(cars.hclust,
     labels = cars$Car,
     main = 'Default from hclust')

plot(cars.hclust, labels = cars$Car, main = 'Default from hclust')
rect.hclust(cars.hclust, k = 3)

#Cutree function allows us to cut the tree to provide a set number of groups

groups.3 = cutree(cars.hclust, k = 3) #cutting the tree into 3 groups

table(groups.3) ##number of cars by group

## groups.3
##  1  2  3 
##  8 17 13

cars$Car[groups.3 == 1] ##Which cars are in cluster 1

## [1] "Buick Estate Wagon"        "Ford Country Squire Wagon"
## [3] "Chevy Malibu Wagon"        "Chrysler LeBaron Wagon"   
## [5] "Chevy Caprice Classic"     "Ford LTD"                 
## [7] "Mercury Grand Marquis"     "Dodge St Regis"

table(groups.3, cars$Country) ##checking to see if there is an association between the

##         
## groups.3 France Germany Italy Japan Sweden U.S.
##        1      0       0     0     0      0    8
##        2      0       3     1     6      0    7
##        3      1       2     0     1      2    7

##Characterize the clusters based on the original information on the cars.

aggregate(cars[, -c(1,2)], list(groups.3), median)

##   Group.1  MPG Weight Drive_Ratio Horsepower Displacement Cylinders
## 1       1 17.3   3.89        2.43      136.5          334         8
## 2       2 30.9   2.19        3.37       75.0           98         4
## 3       3 20.8   2.91        3.08      110.0          171         6

##use the cars data set (minus the first two rows), using the groups, calculate the median by group

k-means Clustering

##Utilizing the Western North America Climate Data set
##Ensure the maps package is loaded (initial start script for this lab)
##Read in the data.

wnaclim = read.csv("../datafiles/wnaclim2.csv")

wnaloc = cbind(wnaclim$Longitude, wnaclim$Latitude) #Combine the longitude and latitude into a new variable

ngrp = 6 #number of groups for k-means clustering

##Simple location map
plot(wnaloc,
     xlab = "Longitude",
     ylab = "Latitude") ##basic scatterplot
map(add = TRUE) ##adds the map to the current plot

##Extract the variables we need (temp and precip for all twelve months)

wnaclim = wnaclim[ , seq(3,26)] ##new dataframe with the temp and precip variables
wnaclim.s = scale(wnaclim) ##scale the data for clustering

wna.kmeans = kmeans(wnaclim.s, ngrp, nstart = 50, iter.max = 20) #kmeans set up. nstart and iter.max?

table(wna.kmeans$cluster)

## 
##   1   2   3   4   5   6 
##  59 251 556 563 265 723

##Plotting the distribution of k-means clusters

mycol = rainbow(ngrp)
plot(wnaloc,
     xlab = "Longitude",
     ylab = "Latitude",
     main = "Western North America Climate Clusters",
     pch = 16,
     col = mycol[wna.kmeans$cluster])
map(add =TRUE)

k-means Prototypes

#Prototypes are the centers of the clusters

wna.kmeans$centers #scaled values

##         tjan       tfeb       tmar        tapr        tmay       tjun
## 1  0.8696892  0.7934518  0.6850066  0.44533319  0.07647746 -0.3963639
## 2  1.2825180  1.2543921  1.2806068  1.37581299  1.59705382  1.8318690
## 3 -0.3761156 -0.2909757 -0.1935225 -0.07483776 -0.12525130 -0.2838725
## 4 -1.4292438 -1.5014047 -1.5383907 -1.51416330 -1.29643277 -1.0042640
## 5  0.8399177  0.7783219  0.6436532  0.39417236  0.05729332 -0.2802638
## 6  0.5781227  0.6074024  0.6103689  0.57818028  0.52417146  0.4994329
##         tjul        taug       tsep       toct       tnov       tdec       pjan
## 1 -0.5721759 -0.29236349  0.1102208  0.4310103  0.6821368  0.8699948  2.8770982
## 2  1.7727118  1.68622243  1.7257717  1.4714932  1.3441639  1.3053881 -0.3630834
## 3 -0.3507435 -0.33572744 -0.3507177 -0.2101420 -0.2604109 -0.3602986 -0.2677048
## 4 -1.0448910 -1.20204332 -1.3024300 -1.4841307 -1.4998126 -1.4459498 -0.5811195
## 5 -0.2849497 -0.05094281  0.2264464  0.4708470  0.6530901  0.8092819  2.1507312
## 6  0.6190960  0.65134482  0.5927899  0.5986939  0.6064769  0.5822323 -0.2386507
##         pfeb       pmar       papr        pmay       pjun       pjul       paug
## 1  3.2293089  3.0878762  3.8301542  3.62177688  2.7448484  2.1779382  3.2109996
## 2 -0.3446015 -0.3063620 -0.5330200 -0.78085302 -0.9738620  0.4792803  0.5274905
## 3 -0.2546389 -0.2304234 -0.1051083  0.29642889  0.7587406  0.8371566  0.5602187
## 4 -0.5970656 -0.6599811 -0.6771999 -0.80484085 -0.4849090 -0.2404945 -0.1520241
## 5  2.0448033  2.0401464  1.6629589  1.16116541  0.5528940 -0.6659339 -0.4584676
## 6 -0.2326141 -0.2022710 -0.1288679 -0.05129823 -0.2944393 -0.5565503 -0.5895535
##         psep       poct       pnov       pdec
## 1  4.9542629  5.0214147  3.5076945  3.1658742
## 2 -0.1840932 -0.3027824 -0.4102346 -0.3587566
## 3  0.2953857 -0.1190760 -0.2465633 -0.2561016
## 4 -0.4098152 -0.4293762 -0.5400784 -0.5803758
## 5  0.4577024  1.1450541  2.0002633  2.0964498
## 6 -0.4161739 -0.2984224 -0.2668071 -0.2533248

#Revert back to the original values using the aggregate function

wna.centers <- aggregate(wnaclim, list(wna.kmeans$cluster), mean)
wna.centers

##   Group.1        tjan       tfeb       tmar       tapr       tmay      tjun
## 1       1  -0.1474576   1.201695   2.572881   4.752542  7.9559322 10.884746
## 2       2   4.6573705   6.748606   9.155378  12.996813 17.5936255 22.765339
## 3       3 -14.6471223 -11.848201  -7.136511   0.143705  6.6773381 11.484532
## 4       4 -26.9042629 -26.414387 -21.999822 -12.609059 -0.7458259  7.643517
## 5       5  -0.4939623   1.019623   2.115849   4.299245  7.8343396 11.503774
## 6       6  -3.5409405  -1.037206   1.747994   5.929599 10.7934993 15.660996
##       tjul      taug      tsep      toct       tnov        tdec      pjan
## 1 13.11017 13.352542 10.971186  7.033898   2.528814   0.3355932 297.11864
## 2 25.49482 24.309163 21.023108 15.462550   9.292032   5.0641434  30.27092
## 3 14.27968 13.112410  8.103237  1.840108  -7.100180 -13.0258993  38.12590
## 4 10.61350  8.315098  2.181705 -8.480107 -19.761812 -24.8165187  12.31439
## 5 14.62717 14.689434 11.694340  7.356604   2.232075  -0.3237736 237.29811
## 6 19.40194 18.578423 13.973721  8.392254   1.755878  -2.7896266  40.51867
##         pfeb      pmar       papr      pmay      pjun      pjul      paug
## 1 252.677966 221.64407 184.084746 140.50847 112.83051 102.72881 136.13559
## 2  25.912351  30.29084  14.996016  11.39442  11.39044  56.69323  61.03187
## 3  31.620504  34.57194  31.579137  42.98741  58.65288  66.39209  61.94784
## 4   9.893428  10.35524   9.408526  10.69094  24.72824  37.18650  42.01421
## 5 177.520755 162.57736 100.098113  68.34717  53.03774  25.65660  33.43774
## 6  33.017981  36.15906  30.658368  32.78976  29.92393  28.62102  29.76902
##        psep      poct      pnov      pdec
## 1 223.18644 371.35593 334.55932 334.00000
## 2  32.84462  25.55777  24.11952  31.40637
## 3  50.60612  37.48921  37.08813  40.21942
## 4  24.48313  17.33570  13.83126  12.38011
## 5  56.61887 119.59245 215.11698 242.18868
## 6  24.24758  25.84094  35.48409  40.45781

temp.centers <- wna.centers[ , 2:13]
ppt.centers <- wna.centers[ , 14:25]

##Climatology plot for the clusters using matplot()

matplot(t(temp.centers), type = 'l', lwd = 2, lty = 2, col = mycol,
        xlab = "Month", ylab = "Temp C")

matplot(t(ppt.centers), type = 'l', lwd = 2, lty = 2, col = mycol,
        xlab = "Month", ylab = "PPT mm")

#why transpose? If you dont, the figure will use the cluster row # as the x - axis**

Testing clustering solutions

#cluster package calculates the silhouette index
#fpc pacakge calculates the Calinski-Harabasz index 
#pacakges loaded at the beginning of this script

calinhara(wnaclim.s, wna.kmeans$cluster)

## [1] 1731.109

sil.out = silhouette(wna.kmeans$cluster, dist(wnaclim.s))

sil.out[1:4 , ] #rows 1 -4

##      cluster neighbor    sil_width
## [1,]       6        3 -0.002213331
## [2,]       4        3  0.652592256
## [3,]       3        6  0.473669262
## [4,]       4        3  0.646008575

#this index is between 0-1, but my values are higher?***

mean(sil.out[ , 3])

## [1] 0.3629048

tapply(sil.out[ , 3], sil.out[ , 1], mean)

##         1         2         3         4         5         6 
## 0.3362736 0.2945774 0.2747485 0.5613447 0.4060656 0.2862479

##Saving to pdf - couldn't scale up in the original document
pdf()
plot(sil.out, col = mycol, main = "WNA Climate Silhoutte Plot")
invisible(dev.off()) #suppresses the device off message in the html output

Silhoutte Plot

#Trying the grid pacakage to embed the image

library(grid)

Testing Different Cluster Numbers

source("cluster_num_script.r")

##Calinski-Harabasz Index Plot

plot(1:20,ch.out, type = 'b', lwd = 2,
     xlab = "N Groups", ylab = "C", main = "Calinski-Harabasz index")

#Silhoutte Plot

plot(1:20, sil.out, type = 'b', lwd = 2,
     xlab = "N Groups",
     ylab = "C",
     main = " Average Silhoutte Index")

Practice - Principle Component Analysis

state = read.csv("../datafiles/statedata.csv")

state2 = state[ , -1] #remove the state names 

rownames(state2) = state[ , 1] #sets the rownames by state instead of number. Essentially removes it as a variable

##the scale = true argument scales the data to account for a variety of units
state.pca = prcomp(state2, scale = TRUE)

summary(state.pca)

## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     1.8971 1.2775 1.0545 0.84113 0.62019 0.55449 0.38006
## Proportion of Variance 0.4499 0.2040 0.1390 0.08844 0.04808 0.03843 0.01806
## Cumulative Proportion  0.4499 0.6539 0.7928 0.88128 0.92936 0.96780 0.98585
##                            PC8
## Standard deviation     0.33643
## Proportion of Variance 0.01415
## Cumulative Proportion  1.00000

##Scaling the standard deviatioback up to the variance (which is the square of the SD)


state.pca$sdev^2

## [1] 3.5988956 1.6319192 1.1119412 0.7075042 0.3846417 0.3074617 0.1444488
## [8] 0.1131877

##Visualization tool using a screeplot

screeplot(state.pca)

##Rotation is the loadings between the individual variables and the new components

## High values indicate greater association. Direction can be positive or negative

state.pca$rotation

##                    PC1         PC2         PC3         PC4          PC5
## Population  0.12642809  0.41087417 -0.65632546 -0.40938555  0.405946365
## Income     -0.29882991  0.51897884 -0.10035919 -0.08844658 -0.637586953
## Illiteracy  0.46766917  0.05296872  0.07089849  0.35282802  0.003525994
## Life.Exp   -0.41161037 -0.08165611 -0.35993297  0.44256334  0.326599685
## Murder      0.44425672  0.30694934  0.10846751 -0.16560017 -0.128068739
## HS.Grad    -0.42468442  0.29876662  0.04970850  0.23157412 -0.099264551
## Frost      -0.35741244 -0.15358409  0.38711447 -0.61865119  0.217363791
## Area       -0.03338461  0.58762446  0.51038499  0.20112550  0.498506338
##                    PC6          PC7         PC8
## Population -0.01065617 -0.062158658 -0.21924645
## Income      0.46177023  0.009104712  0.06029200
## Illiteracy  0.38741578 -0.619800310 -0.33868838
## Life.Exp    0.21908161 -0.256213054  0.52743331
## Murder     -0.32519611 -0.295043151  0.67825134
## HS.Grad    -0.64464647 -0.393019181 -0.30724183
## Frost       0.21268413 -0.472013140  0.02834442
## Area        0.14836054  0.286260213  0.01320320

##The loadings reflect both the strength and the direction of association (the direction of the eigenvectors), so life expectancy increases toward negative values of component 1, and area increases with positive values of component 2. We can use this to try and assign some meaning to the axes: e.g. axis 1 represents a social-economic gradient, and axis 2 is related the physical geography of the state.

##Site scores

state.pca$x [ , 1:4]

##            PC1         PC2          PC3          PC4
## AL  3.78988728 -0.23477897  0.229317426  0.383268981
## AK -1.05313550  5.45617512  4.240590401  0.575673699
## AZ  0.86742876  0.74506148  0.077268654  1.718843204
## AR  2.38177761 -1.28834366  0.222792819  0.623207350
## CA  0.24138147  3.50952277 -2.806440773 -0.070375517
## CO -2.06218136  0.50566387  0.511384160 -0.109922581
## CT -1.89943583 -0.24300645 -0.662670280  0.169730671
## DE -0.42478394 -0.50791950  0.218723788 -0.192652755
## FL  1.17212341  1.13474136 -1.281840070  0.488495802
## GA  3.29417162  0.10995684  0.387068686 -0.455587595
## HI -0.48704129  0.12526216 -1.377335153  2.950230930
## ID -1.42342916 -0.61114319  0.434488061  0.405610358
## IL  0.11896424  1.28238783 -0.803855520 -1.582992442
## IN -0.47120189 -0.24520088 -0.301483896 -0.656489461
## IA -2.32181208 -0.53685609 -0.293246733  0.204744921
## KS -1.90151483 -0.07719072 -0.177111056  0.614073058
## KY  2.12935981 -1.06425233  0.251716929 -0.348839068
## LA  4.24100842 -0.34630079  0.228892174  0.879342117
## ME -0.96019374 -1.70241922  0.721478601 -0.545261667
## MD -0.20342599  0.38881112 -0.339225021 -0.661994566
## MA -1.19589376 -0.21865625 -1.019629690  0.290071067
## MI  0.18186944  0.84711636 -0.554194008 -1.182753855
## MN -2.43361605 -0.36533543 -0.272388339  0.067401738
## MS  4.03208863 -1.05124066  0.919582795  0.175909614
## MO  0.31125449 -0.14830589  0.006605131 -0.553888853
## MT -1.37887297 -0.03353877  1.339951830 -0.244949880
## NE -2.18101665 -0.54774825  0.055771873  0.455863194
## NV -1.12708455  1.13291366  1.890646998 -1.504838621
## NH -1.67128925 -1.31239813  0.437881704 -0.480357556
## NJ -0.64958222  0.28146986 -0.973902564 -0.616106237
## NM  1.32244692 -0.29357041  1.361964694  0.707016778
## NY  1.05034998  1.89371072 -2.197830728 -1.266966983
## NC  2.69433377 -0.51713890  0.142734146 -0.560640547
## ND -2.41766786 -0.78192203  0.276975457 -0.136309449
## OH -0.26795708  0.41685336 -1.032942807 -1.147560168
## OK  0.07391320 -0.64658337 -0.071187462  0.606914285
## OR -1.32472856  0.22767511 -0.498963071  1.350011371
## PA  0.07738173  0.26940938 -1.070097480 -1.289496583
## RI -0.74084731 -1.46130325 -0.227824040  0.296882798
## SC  3.71100631 -0.90984427  0.748732017 -0.315820402
## SD -2.01253414 -1.31509491  0.536546594 -0.157426050
## TN  2.21813394 -0.65102504 -0.016767256  0.002741771
## TX  2.41364282  2.32744119 -0.286040511  0.804077700
## UT -2.26283736 -0.53433138  0.219767287  0.850616612
## VT -1.36926611 -1.50938322  0.445702144 -0.359113201
## VA  0.99354796  0.18457034 -0.210828404 -0.324420224
## WA -1.34001299  0.51154448 -0.851811855  1.237439671
## WV  1.50662213 -1.60198375  0.492191330 -0.342121310
## WI -1.75754046 -0.63572738 -0.425370544 -0.112299153
## WY -1.48379101  0.04225606  1.354211561 -0.638982968

#Sorting for gradients

sort(state.pca$x[, 1])

##          MN          ND          IA          UT          NE          CO 
## -2.43361605 -2.41766786 -2.32181208 -2.26283736 -2.18101665 -2.06218136 
##          SD          KS          CT          WI          NH          WY 
## -2.01253414 -1.90151483 -1.89943583 -1.75754046 -1.67128925 -1.48379101 
##          ID          MT          VT          WA          OR          MA 
## -1.42342916 -1.37887297 -1.36926611 -1.34001299 -1.32472856 -1.19589376 
##          NV          AK          ME          RI          NJ          HI 
## -1.12708455 -1.05313550 -0.96019374 -0.74084731 -0.64958222 -0.48704129 
##          IN          DE          OH          MD          OK          PA 
## -0.47120189 -0.42478394 -0.26795708 -0.20342599  0.07391320  0.07738173 
##          IL          MI          CA          MO          AZ          VA 
##  0.11896424  0.18186944  0.24138147  0.31125449  0.86742876  0.99354796 
##          NY          FL          NM          WV          KY          TN 
##  1.05034998  1.17212341  1.32244692  1.50662213  2.12935981  2.21813394 
##          AR          TX          NC          GA          SC          AL 
##  2.38177761  2.41364282  2.69433377  3.29417162  3.71100631  3.78988728 
##          MS          LA 
##  4.03208863  4.24100842

#sorting by PC1?

##Biplot for first two components

biplot(state.pca, xlim = c(-0.4, 0.6))

biplot(state.pca, choices = c(1,3))

EXERCISES

1. Boston House Prices, Socio-Economic and Environmental Factors

1.1 k-means Cluster Analysis

##Boston Cluster Analysis

boston = read.csv("../datafiles/boston6k.csv")
boston.loc = cbind(boston$LON, boston$LAT)
boston1 = boston[ , 9:21]

##Scale the data
boston1.s = scale(boston1)

##Run the loop

source("cluster_boston_script.r")

##Results

plot(1:20, sil.out2, type = "b",
     lwd = 2,
     xlab = "n Groups",
     ylab = "Index Score",
     main = "Average Silhoutte Score")

Based on this plot, two (2) clusters give the optimal solution.

1.2 Interpretation

Two clusters seem to be the optimal number of clusters given this output, however, I recommend using six (6) clusters to divide our data. Given the number of variables in this data set, we will likely miss spatial patterns if we limit our cluster analysis to only two groups. In the visualization below it appears that using only two groups gives us a spatial distribution between an urban center and the outlying urban area. Six clusters reveal a much more interesting pattern related to the underlying variables.

##Plotting 2 versus 6 clusters

ngrp2 = 2
ngrp6 = 6
mycol2 = brewer.pal(ngrp6, name = "Accent") #Experimenting with different color palette

boston.kmeans2 = kmeans(boston1.s, ngrp2, nstart = 50, iter.max = 20) ##Two groups
boston.kmeans6 = kmeans(boston1.s, ngrp6, nstart = 50, iter.max = 20) ##Six groups

par(mfrow = c(1,2))
plot(boston.loc,
     xlab = "Longitude",
     ylab = "Latitude",
     main = "Boston Housing Clusters (2)",
     pch = 16,
     col = alpha(mycol2[boston.kmeans2$cluster], 0.5))
map(add =TRUE)

plot(boston.loc,
     xlab = "Longitude",
     ylab = "Latitude",
     main = "Boston Housing Clusters (6)",
     pch = 16,
     col = alpha(mycol2[boston.kmeans6$cluster], 0.5))
map(add =TRUE)

1.3 k-means Re-run

##Output from the previous set of code used to help visualize the difference between two and 6 clusters

boston.kmeans6

## K-means clustering with 6 clusters of sizes 34, 50, 35, 83, 89, 215
## 
## Cluster means:
##         CRIM         ZN      INDUS       CHAS        NOX          RM        AGE
## 1 -0.1985497 -0.2602436  0.2799956  3.6647712  0.3830784  0.27566811  0.3721322
## 2 -0.2807410 -0.4872402  1.5745577 -0.2723291  1.0086925 -0.48725596  0.9430257
## 3  1.4802645 -0.4872402  1.0149946 -0.2723291  0.9676887 -0.29693894  0.7656016
## 4 -0.4124374  1.9050974 -1.1029195 -0.2248941 -1.1420321  0.63738196 -1.3699186
## 5  0.9722616 -0.4872402  1.0149946 -0.2723291  1.0010692 -0.47909897  0.7460871
## 6 -0.3875372 -0.2999768 -0.5700686 -0.2723291 -0.4262088  0.07032582 -0.1827830
##          DIS          RAD         TAX     PTRATIO           B      LSTAT
## 1 -0.4033382  0.001081444 -0.09756330 -0.39245849  0.17154271 -0.1643525
## 2 -0.8946961 -0.616658684  0.03359999 -0.30002307 -0.07381173  0.5971853
## 3 -0.8580043  1.659602895  1.52941294  0.80577843 -3.29705640  1.1699052
## 4  1.5217901 -0.615191967 -0.60132504 -0.72908309  0.35337310 -0.9051745
## 5 -0.8009959  1.659602895  1.52941294  0.80577843  0.16590165  0.8119640
## 6  0.1556210 -0.576435631 -0.64232592 -0.05143203  0.32167393 -0.2900152
## 
## Clustering vector:
##   [1] 1 1 1 5 5 5 5 1 1 5 5 3 5 1 1 5 1 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
##  [38] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 3 3 3 3 3 3 3 3 3 3 3 5 5 5 3 3 3 3 3 3 3
##  [75] 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 3 5 5 5 5 3 5 5 5 3 3 3 3 5 5 5 5 5 5 5 5 3
## [112] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 2 2 2 2 2 6 6 6 6 6 6 6 6 6 6 6
## [149] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## [186] 6 6 6 6 4 4 6 6 6 6 6 6 6 6 6 6 6 4 4 4 4 4 4 4 6 6 6 6 4 4 4 4 6 6 6 6 6
## [223] 6 6 6 6 6 6 6 6 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## [260] 6 6 6 6 6 6 6 6 6 6 6 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2
## [297] 2 2 2 2 2 2 1 2 1 1 2 2 2 2 1 2 1 1 2 2 2 2 2 2 2 2 6 6 6 6 6 6 6 6 6 6 6
## [334] 6 6 6 6 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 6 6 6 1 1 1 1 1 6 6 6 1 6 1 1
## [371] 1 1 1 6 6 6 6 6 6 6 6 6 6 6 1 6 1 6 4 4 4 6 4 4 6 6 4 6 4 4 4 4 4 4 4 4 4
## [408] 6 6 6 6 6 6 6 6 6 6 6 6 1 6 6 6 1 1 4 1 1 4 4 4 4 1 4 4 4 4 4 4 4 4 4 4 6
## [445] 6 6 6 6 4 4 4 4 4 4 4 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
## [482] 4 4 6 6 6 6 6 6 6 6 4 6 4 4 6 6 4 4 4 4 4 4 4 4 4
## 
## Within cluster sum of squares by cluster:
## [1] 290.2630 290.5607 183.3608 338.7875 419.3128 909.5003
##  (between_SS / total_SS =  63.0 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"

1.4 Median Table

##Convert the back to the unscaled data by cluster, then calculate the mean value by cluster. Included the original first few columns as well (primarily for median price columns)
boston.kmeans6.centers = aggregate(boston, list(boston.kmeans6$cluster), mean)

library(kableExtra)
boston.kmeans6.centers %>%
  kbl() %>%
  kable_classic() %>%
   scroll_box(width = "500px", height = "200px")

Group.1	ID	TOWN	TOWNNO	TRACT	LON	LAT	MEDV	CMEDV	CRIM	ZN	INDUS	CHAS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT
1	281.47059	NA	46.52941	2875.9412	-71.17256	42.34603	27.80588	27.80588	1.9056897	5.294118	13.057647	1.0000000	0.5990853	6.478324	79.05000	2.945729	9.558824	391.7941	17.60588	372.33500	11.479412
2	278.96000	NA	33.52000	3318.6000	-71.09546	42.38783	19.55200	19.55200	1.1987174	0.000000	21.938800	0.0000000	0.6715800	5.942280	95.12000	1.911072	4.180000	413.9000	17.80600	349.93540	16.917600
3	73.34286	NA	82.11429	821.4571	-71.08327	42.31907	12.90000	12.87143	16.3460857	0.000000	18.100000	0.0000000	0.6668286	6.076000	90.12571	1.988334	24.000000	666.0000	20.20000	55.66971	21.007429
4	383.07229	NA	42.21687	3789.3012	-71.16298	42.31999	29.02530	29.03735	0.0659249	55.795181	3.570361	0.0120482	0.4223590	6.732470	30.01325	6.999492	4.192771	306.8916	16.87711	388.93518	6.189157
5	69.01124	NA	81.56180	761.6404	-71.07796	42.32645	16.43708	16.39663	11.9764754	0.000000	18.100000	0.0000000	0.6706966	5.948011	89.57640	2.108378	24.000000	666.0000	20.20000	371.82000	18.451348
6	298.83256	NA	33.28372	3216.8326	-71.06380	42.39237	23.97721	23.98465	0.2801047	4.367442	7.225907	0.0000000	0.5053070	6.334047	63.42977	4.122735	4.530233	299.9814	18.34419	386.04121	10.582047

The clusters seem to be divided primarily across crime rates, property tax rate, zoning, distance to employment centers, and socioeconomic factors. Unsurprisingly, cluster number 4 is home to the lower end of the socioeconomic bracket with low value homes, high-crime rates, and the lowest distance to employment centers. This is likely a lower-class urban center in Boston.

1.5 Mean Corrected House Value

value = cbind(boston.kmeans6.centers$Group.1, boston.kmeans6.centers$CMEDV)
value2 = as.data.frame(value)
colnames(value2) = c("Cluster", "Mean Corrected House Value")

library(kableExtra)
value2 %>%
    kbl(caption = "House Value by Cluster") %>%
    kable_classic( full_width = F)

House Value by Cluster
Cluster	Mean Corrected House Value
1	27.80588
2	19.55200
3	12.87143
4	29.03735
5	16.39663
6	23.98465

1.6 Anova Test between Clusters

##Setting up a new data frame for the data the includes the cluster number

boston.clus.df = data.frame(cluster = boston.kmeans6$cluster, boston)

##Change the cluster ID to a character
boston.clus.df$cluster = as.character(boston.clus.df$cluster) 

clust.aov = aov(CMEDV ~ cluster, data = boston.clus.df)
clust.aov.sum = summary(clust.aov)
print.sum = print(clust.aov.sum)

##              Df Sum Sq Mean Sq F value Pr(>F)    
## cluster       5  11973  2394.5   39.12 <2e-16 ***
## Residuals   500  30605    61.2                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

print.sum

##              Df Sum Sq Mean Sq F value Pr(>F)    
## cluster       5  11973  2394.5   39.12 <2e-16 ***
## Residuals   500  30605    61.2                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA produces a large F-statistic and a extremely small p-value which leads us to reject the null hypothesis that the house values are the same. In this case, we accept the alternative hypothesis that the house values are different among the clusters.

2. Principle Component Analysis of Climate in WNA

2.1 PCA Analysis, Biplot, Screeplot

##Read in a new set of data to work with
wna.pca.data = read.csv("../datafiles/wnaclim2.csv")

##Use only the monthly temp and precip data

wnaclim.temp.precip = wna.pca.data[ , 3:26]

##PCA - scale = TRUE syntax scales the data
wnaclim.pca = prcomp(wnaclim.temp.precip, scale = TRUE)

##Biplot

par(mfrow = c(1,2))
wna.biplot = biplot(wnaclim.pca)

##Screeplot

wna.scree = screeplot(wnaclim.pca)

2.2 Second PCA Variance

summary(wnaclim.pca)

## Importance of components:
##                           PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     3.4612 2.8065 1.5846 0.84733 0.63986 0.51316 0.29261
## Proportion of Variance 0.4992 0.3282 0.1046 0.02992 0.01706 0.01097 0.00357
## Cumulative Proportion  0.4992 0.8274 0.9320 0.96190 0.97896 0.98993 0.99350
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.20707 0.17437 0.15014 0.13534 0.10798 0.09353 0.08322
## Proportion of Variance 0.00179 0.00127 0.00094 0.00076 0.00049 0.00036 0.00029
## Cumulative Proportion  0.99528 0.99655 0.99749 0.99825 0.99874 0.99910 0.99939
##                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
## Standard deviation     0.06116 0.05676 0.04584 0.04161 0.03439 0.02946 0.02562
## Proportion of Variance 0.00016 0.00013 0.00009 0.00007 0.00005 0.00004 0.00003
## Cumulative Proportion  0.99955 0.99968 0.99977 0.99984 0.99989 0.99993 0.99995
##                           PC22    PC23    PC24
## Standard deviation     0.02411 0.01802 0.01392
## Proportion of Variance 0.00002 0.00001 0.00001
## Cumulative Proportion  0.99998 0.99999 1.00000

PC1 and PC2 account for approximately 83% of the variance.

2.3 Variable Loadings

## Look at the loadings for PC1 and PC2 (columns 1 and 2)

wnaclim.pca$rotation[ , 1:2]

##             PC1         PC2
## tjan 0.27154635 -0.07007925
## tfeb 0.27524689 -0.07830351
## tmar 0.27484658 -0.09367701
## tapr 0.26473010 -0.12257641
## tmay 0.23877755 -0.16889072
## tjun 0.20220435 -0.21701738
## tjul 0.20390908 -0.22851811
## taug 0.22863878 -0.20271341
## tsep 0.25186014 -0.16881584
## toct 0.26714245 -0.12423680
## tnov 0.27214789 -0.09552584
## tdec 0.27301687 -0.07380880
## pjan 0.16803736  0.25995212
## pfeb 0.17102216  0.26393484
## pmar 0.17796187  0.25650454
## papr 0.17074339  0.27824680
## pmay 0.14748106  0.25905950
## pjun 0.05730680  0.23790590
## pjul 0.01369406  0.10097126
## paug 0.03206302  0.14782502
## psep 0.10354548  0.26528778
## poct 0.13939969  0.28759576
## pnov 0.16303542  0.27829605
## pdec 0.16860662  0.26639834

The February and March temperature have high loadings (~0.28 & 0.27 respectively) on principle component 1.
The October and November precipitation have high loadings (~0.29 & 0.28 respectively) on principle component 2.
Based on these loadings, we see that principle component 1 is largely a temperature gradient whereas principle component 2 is largely a precipitation gradient. This matches what we see in our ordination plot as well.

2.4 Mapping and Analysis of Axis 1

##PC1
##Site scores plot

##Set the score as a variable
wnaclim.pca.score = wnaclim.pca$x[ , 1]

##Map the score to location (long/lat)

quilt.plot(wna.pca.data$Longitude, wna.pca.data$Latitude, wnaclim.pca.score)
world(add = TRUE)

PC1 is largely influenced by spring temperatures which is reflected in the spatial pattern in the map above. This map shows a latitudinal gradient which is consistent with the insolation and temperature patterns in North America. Additional, the coastal zone along the Pacific Northwest and west coast of Canada is representative of the moderating influence of the ocean. This spatial pattern of temperature agrees with our understanding of the temperature regime of North America.

2.5 Mapping and Analysis of Axis 2

##PC2
##Set the score as a variable
wnaclim.pca.score2 = wnaclim.pca$x[ , 2]

##Map the score to location (long/lat)

quilt.plot(wna.pca.data$Longitude, wna.pca.data$Latitude, wnaclim.pca.score2)
world(add = TRUE)

PC2 is largely influenced by fall precipitation which is reflected by the spatial pattern in the map above. The Pacific Northwest and west coast of Canada are dominated by a long period of precipitation beginning in the fall and lasting through the spring. This is a result of the oceanic influence on the precipitation regime in the area. The interior of North America has less precipitation due its more continental climate, however, this map likely does not take into account large amounts of orographic precipitation that occurs in more mountainous regimes. This needs to be confirmed by looking at the elevation values from where these observations were recorded from.

Leydet_Lab07_Multivariate Analysis

David Leydet

2022-10-06

Practice - Distance Measures and Cluster Analysis

Distance Measures

Hierarchical Clustering

k-means Clustering

k-means Prototypes

Testing clustering solutions

Testing Different Cluster Numbers

Practice - Principle Component Analysis

EXERCISES

1. Boston House Prices, Socio-Economic and Environmental Factors

1.1 k-means Cluster Analysis

1.2 Interpretation

1.3 k-means Re-run

1.4 Median Table

1.5 Mean Corrected House Value

1.6 Anova Test between Clusters

2. Principle Component Analysis of Climate in WNA

2.1 PCA Analysis, Biplot, Screeplot

2.2 Second PCA Variance

2.3 Variable Loadings

2.4 Mapping and Analysis of Axis 1

2.5 Mapping and Analysis of Axis 2