Let us first import the data file of cereals in our R environment. The following achieves this objective and prints the first five observations.
cereals <- read.table("http://www.public.iastate.edu/~maitra/stat501/datasets/cereals.dat", header = F)
names(cereals) <- c("Brand", "Manufacturer", "Calories", "Protein", "Fat", "Sodium", "Fiber", "Carbohydrates","Sugar", "Potassium", "Class")
head(cereals, 5)
## Brand Manufacturer Calories Protein Fat Sodium Fiber
## 1 ACCheerios G 110 2 2 180 1.5
## 2 Cheerios G 110 6 2 290 2.0
## 3 CocoaPuffs G 110 1 1 180 0.0
## 4 CountChocula G 110 1 1 180 0.0
## 5 GoldenGrahams G 110 1 1 280 0.0
## Carbohydrates Sugar Potassium Class
## 1 10.5 10 70 1
## 2 17.0 1 105 1
## 3 12.0 13 55 1
## 4 12.0 13 65 1
## 5 15.0 9 45 1
Now, we will first standardized the variable under consideration according to our problem statement. The results of this operation are printed using the following code.
cereals_standardized <- scale(cereals[,3:10], center = TRUE, scale = TRUE)
head(cereals_standardized)
## Calories Protein Fat Sodium Fiber Carbohydrates
## [1,] 0.1103426 -0.3806804 1.2767742 -0.005871679 -0.1189104 -0.8822330
## [2,] 0.1103426 2.8931707 1.2767742 1.382780515 0.1589780 0.6446037
## [3,] 0.1103426 -1.1991431 0.0290176 -0.005871679 -0.9525758 -0.5298861
## [4,] 0.1103426 -1.1991431 0.0290176 -0.005871679 -0.9525758 -0.5298861
## [5,] 0.1103426 -1.1991431 0.0290176 1.256539406 -0.9525758 0.1748078
## [6,] 0.1103426 0.4377824 0.0290176 0.877816080 -0.1189104 -0.6473351
## Sugar Potassium
## [1,] 0.5280395 -0.21810133
## [2,] -1.4559536 0.31132205
## [3,] 1.1893705 -0.44499706
## [4,] 1.1893705 -0.29373324
## [5,] 0.3075958 -0.59626088
## [6,] 0.5280395 0.08442632
Now, we will compute the distance matrix of our standardized variables and paste the specimen of first 5 observations from our distance matrix named distance in this report.
distance <- dist(cereals_standardized, method = "euclidean", upper = TRUE, diag = TRUE)
distance <- as.matrix(distance)
head(distance,5)
## 1 2 3 4 5 6 7
## 1 0.000000 4.389923 1.8800970 1.8678873 2.413378 1.776058 3.455752
## 2 4.389923 0.000000 5.5151628 5.4964619 4.869285 3.684407 3.962058
## 3 1.880097 5.515163 0.0000000 0.1512638 1.700201 2.210626 3.327191
## 4 1.867887 5.496462 0.1512638 0.0000000 1.720269 2.179354 3.340917
## 5 2.413378 4.869285 1.7002007 1.7202688 0.000000 2.169286 2.115457
## 8 9 10 11 12 13 14
## 1 1.6192301 2.037995 1.692973 1.681436 3.325714 3.798287 2.771250
## 2 4.8239397 3.849793 3.765158 3.868147 4.123082 4.992484 3.117500
## 3 0.8476301 2.365674 2.754506 3.113207 3.186540 4.206123 3.497958
## 4 0.8610212 2.336478 2.704207 3.053845 3.204440 4.112606 3.465098
## 5 1.7945462 1.912533 3.071040 3.577630 2.334208 4.523253 3.086864
## 15 16 17 18 19 20 21 22
## 1 2.0443430 2.888244 1.893207 6.577956 2.818565 4.240844 3.062161 2.591681
## 2 5.5993313 3.108638 4.095663 6.390159 5.751471 4.425505 6.170433 4.189560
## 3 0.7514574 3.568236 1.773684 7.554684 1.806802 3.896519 1.881682 4.312430
## 4 0.8513833 3.536029 1.773684 7.475524 1.844401 3.911172 1.929708 4.259042
## 5 1.9680463 3.113561 1.467644 7.561212 2.917822 2.686999 2.901732 4.644644
## 23 24 25 26 27 28 29 30
## 1 3.923052 1.712732 2.868150 3.788553 3.962087 2.186708 3.616425 1.979298
## 2 4.516607 5.214771 5.603114 5.338347 4.872641 4.069030 4.909135 4.436180
## 3 3.501791 1.288037 1.597044 3.858931 4.153263 2.181978 4.243838 1.513060
## 4 3.521339 1.340270 1.646424 3.835141 4.081017 2.181978 4.189576 1.543008
## 5 2.626153 2.553318 1.796336 4.553077 4.218383 1.901755 4.547773 1.498137
## 31 32 33 34 35 36 37 38
## 1 3.386087 3.876270 4.228492 3.730262 4.337000 2.259963 4.639299 1.580069
## 2 3.520707 4.015880 3.808077 4.712787 4.569048 5.858332 2.816341 5.236607
## 3 4.156754 4.036184 4.044825 4.350095 3.824121 1.910388 4.975300 1.493297
## 4 4.118043 4.019142 4.053302 4.254365 3.839050 1.934193 4.977599 1.531124
## 5 3.731677 3.660273 2.864802 4.544612 2.592047 3.427599 4.564946 1.834499
## 39 40 41 42 43
## 1 1.263239 2.054943 5.353324 5.141861 4.604046
## 2 5.017712 2.949282 7.099531 6.641340 5.350988
## 3 1.617395 3.462659 5.035857 5.113100 5.859570
## 4 1.638478 3.439454 5.056262 5.117573 5.840013
## 5 1.846190 3.646725 5.331019 5.509723 6.495437
Now we will perform the multidimensional scaling of our distance matrix using the cmdscale command and store the results in our variable named as MDS.
n = nrow(distance)
MDS <- cmdscale(distance, k= 2, eig = TRUE, x.ret = TRUE)
X <- MDS$points[,1:2]
Now, in order to make prettier plots, we have used the ggplot2. In the follwoing plot of two dimensional solution of multidimensional scaling, we have used numbers to specify the observation and colour to specify the manufacturer. The tomato colour corresponds to G, green to K and skybblue to Q where G,K, Q are respective manufacturers.
library(ggplot2)
g <- ggplot(data.frame(X, cereals),
aes(X[,2], X[,1], label = rownames(cereals), group=Manufacturer))
g + geom_text(aes(color = Manufacturer) ,
angle=0, show.legend = T, nudge_x = 0.05, nudge_y = 0.15) +
ggtitle("Multi-dimensional Scaling on Cereals Data") +
geom_point() + theme_classic() +
theme(legend.position = "bottom", legend.title =
element_text(size=10, face="bold"),
legend.text = element_text(size=10, face="bold"))
# 4 According to the two dimensional solution of the multidimensional scaling, the cereals 14 and 16 seems to be most similar followed by the cereals 3 and 4.
According to the two dimensional solution of the multidimensional scaling problem, the cereals 41 and 18 are most distinct as confirmed by distance matrix.
It is possible to obtain the exact representation of our original distance matrix. In such case, we have to set the number of dimension equal to the rank of our matrix.
The follwoing code reports the eigen values and computes the goodness of fit of the solution
ev <- MDS$eig
gof <- MDS$GOF
print(round(ev,digits=4))
## [1] 106.9979 77.9000 74.2909 36.4679 20.8866 15.0120 2.5411
## [8] 1.9036 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## [15] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## [22] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## [29] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## [36] 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
## [43] 0.0000
print(gof)
## [1] 0.5502914 0.5502914
We can also report the goodness of fit directly using the output of cmdscale as follows-
print(cmdscale(distance, 2, eig=TRUE)$GOF)
## [1] 0.5502914 0.5502914
or we can compute the goodness of fit manually using the two different criterion as follows-
print( (ev[1]+ev[2])/sum(abs(ev)) )
## [1] 0.5502914
print( (ev[1]+ev[2])/sum(ev[ev>0]))
## [1] 0.5502914
Yes. There are almost always few number of nonzero eigenvalues accompanied by a number of zero eigenvlaues (There will always be one eigenvalue equal to zero in practice). The number of non-zero eigenvalues is equal to p.
If the coordinates of n points in p dimensions be denoted by \(X_i\), i = 1, … ,n. These can be collected together in a n x p matrix X . When we perform the eigendecomposition of the matrix B, if p < n then there are n-p zero eigenvalues. In fact if the points are not in “general position” the number of zero eigenvalues will be greater than n - p. For a detailed reference, we have used this source
The following code computes the fitted distances.
Also, In the following plot, we have plotted the fitted distances against the observed distances. The red line correpsonds to the the regression line. We can see that the regression line goes almost linearly with respect to the fitted and observed distances. This proves that there is a strong correlation between the observed and fitted distances which was our initial objective. Since the line is almost at 45 degree, we can say that correlation is strong and hence, visuall, it gives a good goodness of fit.
fitted <- as.matrix(dist(X, method = "euclidean"))
fitted <- as.vector(fitted)
observed <- as.vector(distance)
reg <- lm(fitted~observed)
plot(observed, fitted,pch=19, cex=0.4)
abline(lm(fitted~observed), col="red")
print(paste("Coefficient of determination:", summary(reg)$r.squared))
## [1] "Coefficient of determination: 0.770427902994939"
The coefficient of determination has been estimated as 0.7704
(1p) Try now non-metric MDS with the isoMDS program. Plot the two-dimensional solution, labelling the points again with the name or number of the brand, and using different symbols for different manufacturers.
library(MASS)
n <- nrow(distance)
init <- scale(matrix(runif(n*2),ncol=2),scale=FALSE)
nmmds.out <- isoMDS(distance, init, k=2, maxit = 100)
## initial value 43.229760
## iter 5 value 35.927515
## iter 10 value 28.171844
## iter 15 value 24.298783
## iter 20 value 19.978812
## iter 25 value 17.719199
## iter 30 value 17.113076
## iter 35 value 16.933540
## iter 35 value 16.920982
## iter 35 value 16.910725
## final value 16.910725
## converged
Y <- nmmds.out$points
g <- ggplot(data.frame(Y, cereals), aes(Y[,2], Y[,1], label = rownames(cereals)))
g + geom_text(aes(color = Manufacturer) , angle=0, show.legend = F, nudge_x = -0.40,
nudge_y = -0.90) +
ggtitle("Non-Mteric Multi-dimensional Scaling on Cereals Data") +
geom_point(aes(shape = Manufacturer, color = Manufacturer)) + theme_classic() + ggtitle("Non-Metric Multi-dimensional Scaling on Cereals Data")
We cn observe that the pair of cereals 3 and 4 are most similar according to the two dimensional solution of non-metric MDS.
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-6
nmmds <- metaMDS(comm = distance, distance = "euclidean", k=2)
## Run 0 stress 0.1702258
## Run 1 stress 0.1595462
## ... New best solution
## ... Procrustes: rmse 0.1142226 max resid 0.4359294
## Run 2 stress 0.1676239
## Run 3 stress 0.1676171
## Run 4 stress 0.1733476
## Run 5 stress 0.1595408
## ... New best solution
## ... Procrustes: rmse 0.009497985 max resid 0.0508101
## Run 6 stress 0.1743246
## Run 7 stress 0.1946937
## Run 8 stress 0.1702232
## Run 9 stress 0.159544
## ... Procrustes: rmse 0.004115569 max resid 0.0152421
## Run 10 stress 0.1935051
## Run 11 stress 0.1702223
## Run 12 stress 0.172109
## Run 13 stress 0.1861297
## Run 14 stress 0.1702252
## Run 15 stress 0.2065886
## Run 16 stress 0.186326
## Run 17 stress 0.1676229
## Run 18 stress 0.1702275
## Run 19 stress 0.1927414
## Run 20 stress 0.1725534
## *** No convergence -- monoMDS stopping criteria:
## 20: stress ratio > sratmax
gof <- goodness(nmmds)
plot(nmmds, display = "sites", type = "n", ylab=c(-4,4))
points(nmmds, display = "sites", cex = 2*gof/mean(gof))
Now, in order to graphically examine the stress in case of non-metric multidimensional scaling, let us consider the following stressplot-
stressplot(nmmds, pch = 19, cex=0.75, l.col = "tomato", p.col = "skyblue")
fitted <- as.vector(as.matrix(dist(Y, method = "euclidean")))
observed <- as.vector(as.matrix(distance))
reg <- lm(fitted~observed)
plot(observed, fitted,pch=19, cex=0.4)
abline(lm(fitted~observed), col="red")
print(paste("Coefficient of determination:", summary(reg)$r.squared))
## [1] "Coefficient of determination: 0.862919076728646"
We can observe from the plot that the fitted distances are in almost perfectly linearly related with observed distances.
stress_vec <- numeric(10)
for(i in seq(10)){
stress_vec[i] <- metaMDS(distance, distance = "euclidean", k=i)$stress
}
## Run 0 stress 0.3060661
## Run 1 stress 0.3737939
## Run 2 stress 0.383095
## Run 3 stress 0.3789417
## Run 4 stress 0.3849806
## Run 5 stress 0.3732161
## Run 6 stress 0.4158972
## Run 7 stress 0.3021579
## ... New best solution
## ... Procrustes: rmse 0.05777802 max resid 0.2750343
## Run 8 stress 0.4050292
## Run 9 stress 0.5636962
## Run 10 stress 0.4148599
## Run 11 stress 0.3252705
## Run 12 stress 0.4131135
## Run 13 stress 0.5618635
## Run 14 stress 0.3550077
## Run 15 stress 0.4120756
## Run 16 stress 0.3274244
## Run 17 stress 0.3968979
## Run 18 stress 0.3820266
## Run 19 stress 0.3359674
## Run 20 stress 0.3994779
## *** No convergence -- monoMDS stopping criteria:
## 2: stress ratio > sratmax
## 18: scale factor of the gradient < sfgrmin
## Run 0 stress 0.1702258
## Run 1 stress 0.1725568
## Run 2 stress 0.1595438
## ... New best solution
## ... Procrustes: rmse 0.1163146 max resid 0.4246312
## Run 3 stress 0.1595493
## ... Procrustes: rmse 0.0007063866 max resid 0.003491665
## ... Similar to previous best
## Run 4 stress 0.1733467
## Run 5 stress 0.1870897
## Run 6 stress 0.1702242
## Run 7 stress 0.1743268
## Run 8 stress 0.1702223
## Run 9 stress 0.1595416
## ... New best solution
## ... Procrustes: rmse 0.0004225535 max resid 0.001814049
## ... Similar to previous best
## Run 10 stress 0.1595369
## ... New best solution
## ... Procrustes: rmse 0.002250948 max resid 0.009903102
## ... Similar to previous best
## Run 11 stress 0.1595787
## ... Procrustes: rmse 0.008070538 max resid 0.04337308
## Run 12 stress 0.1721106
## Run 13 stress 0.1595422
## ... Procrustes: rmse 0.002162376 max resid 0.009690388
## ... Similar to previous best
## Run 14 stress 0.1595387
## ... Procrustes: rmse 0.001806793 max resid 0.006983696
## ... Similar to previous best
## Run 15 stress 0.1702259
## Run 16 stress 0.1595418
## ... Procrustes: rmse 0.002256913 max resid 0.009928014
## ... Similar to previous best
## Run 17 stress 0.1733518
## Run 18 stress 0.1733467
## Run 19 stress 0.1595454
## ... Procrustes: rmse 0.002668942 max resid 0.01006312
## Run 20 stress 0.1702242
## *** Solution reached
## Run 0 stress 0.07574604
## Run 1 stress 0.07574723
## ... Procrustes: rmse 0.0007795211 max resid 0.003419078
## ... Similar to previous best
## Run 2 stress 0.07574644
## ... Procrustes: rmse 0.0008205733 max resid 0.003585969
## ... Similar to previous best
## Run 3 stress 0.07574557
## ... New best solution
## ... Procrustes: rmse 0.0006125671 max resid 0.002712198
## ... Similar to previous best
## Run 4 stress 0.07574625
## ... Procrustes: rmse 0.0001345882 max resid 0.0005504402
## ... Similar to previous best
## Run 5 stress 0.07574793
## ... Procrustes: rmse 0.0009084477 max resid 0.003670822
## ... Similar to previous best
## Run 6 stress 0.07574623
## ... Procrustes: rmse 0.0001612335 max resid 0.0006803311
## ... Similar to previous best
## Run 7 stress 0.0757476
## ... Procrustes: rmse 0.0003785336 max resid 0.001604955
## ... Similar to previous best
## Run 8 stress 0.07574875
## ... Procrustes: rmse 0.0009466463 max resid 0.003899034
## ... Similar to previous best
## Run 9 stress 0.07574522
## ... New best solution
## ... Procrustes: rmse 0.000169709 max resid 0.0007192319
## ... Similar to previous best
## Run 10 stress 0.07574701
## ... Procrustes: rmse 0.0004608046 max resid 0.001970016
## ... Similar to previous best
## Run 11 stress 0.07574589
## ... Procrustes: rmse 0.0002617693 max resid 0.001049401
## ... Similar to previous best
## Run 12 stress 0.07574686
## ... Procrustes: rmse 0.0004425562 max resid 0.001872713
## ... Similar to previous best
## Run 13 stress 0.07574686
## ... Procrustes: rmse 0.0005961313 max resid 0.002427331
## ... Similar to previous best
## Run 14 stress 0.1072316
## Run 15 stress 0.07574723
## ... Procrustes: rmse 0.0004931585 max resid 0.002097565
## ... Similar to previous best
## Run 16 stress 0.07574528
## ... Procrustes: rmse 3.432592e-05 max resid 0.0001441084
## ... Similar to previous best
## Run 17 stress 0.07574784
## ... Procrustes: rmse 0.0005613181 max resid 0.002278743
## ... Similar to previous best
## Run 18 stress 0.0757461
## ... Procrustes: rmse 0.0002956139 max resid 0.001239801
## ... Similar to previous best
## Run 19 stress 0.07574704
## ... Procrustes: rmse 0.00047404 max resid 0.002060818
## ... Similar to previous best
## Run 20 stress 0.07574756
## ... Procrustes: rmse 0.0005407509 max resid 0.002300831
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.03787597
## Run 1 stress 0.03787538
## ... New best solution
## ... Procrustes: rmse 0.0006673933 max resid 0.002754092
## ... Similar to previous best
## Run 2 stress 0.03787556
## ... Procrustes: rmse 0.0006667357 max resid 0.002723906
## ... Similar to previous best
## Run 3 stress 0.03787558
## ... Procrustes: rmse 0.0005957249 max resid 0.0025295
## ... Similar to previous best
## Run 4 stress 0.03787458
## ... New best solution
## ... Procrustes: rmse 0.000535526 max resid 0.002437723
## ... Similar to previous best
## Run 5 stress 0.03787545
## ... Procrustes: rmse 0.0005810287 max resid 0.002548039
## ... Similar to previous best
## Run 6 stress 0.03787411
## ... New best solution
## ... Procrustes: rmse 0.0002102796 max resid 0.0008483166
## ... Similar to previous best
## Run 7 stress 0.03787476
## ... Procrustes: rmse 0.0002562865 max resid 0.001175876
## ... Similar to previous best
## Run 8 stress 0.03787539
## ... Procrustes: rmse 0.0003583391 max resid 0.001585512
## ... Similar to previous best
## Run 9 stress 0.03787614
## ... Procrustes: rmse 0.0004306208 max resid 0.002101444
## ... Similar to previous best
## Run 10 stress 0.03787423
## ... Procrustes: rmse 0.0001240673 max resid 0.0006123107
## ... Similar to previous best
## Run 11 stress 0.03787528
## ... Procrustes: rmse 0.0003501988 max resid 0.001263885
## ... Similar to previous best
## Run 12 stress 0.03787512
## ... Procrustes: rmse 0.0003078171 max resid 0.001562269
## ... Similar to previous best
## Run 13 stress 0.0378758
## ... Procrustes: rmse 0.0003763664 max resid 0.001815497
## ... Similar to previous best
## Run 14 stress 0.03787588
## ... Procrustes: rmse 0.0004361378 max resid 0.002041628
## ... Similar to previous best
## Run 15 stress 0.03787508
## ... Procrustes: rmse 0.0003120289 max resid 0.001500885
## ... Similar to previous best
## Run 16 stress 0.03787457
## ... Procrustes: rmse 0.0002072484 max resid 0.0007516937
## ... Similar to previous best
## Run 17 stress 0.03787483
## ... Procrustes: rmse 0.0002721146 max resid 0.001298597
## ... Similar to previous best
## Run 18 stress 0.03787593
## ... Procrustes: rmse 0.0004352882 max resid 0.001809008
## ... Similar to previous best
## Run 19 stress 0.03787516
## ... Procrustes: rmse 0.0003035088 max resid 0.001458314
## ... Similar to previous best
## Run 20 stress 0.03787535
## ... Procrustes: rmse 0.0003517603 max resid 0.001634192
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.02300081
## Run 1 stress 0.02282012
## ... New best solution
## ... Procrustes: rmse 0.01929814 max resid 0.08539132
## Run 2 stress 0.02538112
## Run 3 stress 0.02281352
## ... New best solution
## ... Procrustes: rmse 0.001053425 max resid 0.00351825
## ... Similar to previous best
## Run 4 stress 0.02300151
## ... Procrustes: rmse 0.01912496 max resid 0.08542431
## Run 5 stress 0.02300174
## ... Procrustes: rmse 0.01928165 max resid 0.08570723
## Run 6 stress 0.02281602
## ... Procrustes: rmse 0.0007199103 max resid 0.002010636
## ... Similar to previous best
## Run 7 stress 0.02281507
## ... Procrustes: rmse 0.001153041 max resid 0.005608316
## ... Similar to previous best
## Run 8 stress 0.02300208
## ... Procrustes: rmse 0.01931861 max resid 0.08582599
## Run 9 stress 0.02281549
## ... Procrustes: rmse 0.0006884681 max resid 0.001684089
## ... Similar to previous best
## Run 10 stress 0.02300178
## ... Procrustes: rmse 0.01928995 max resid 0.08581327
## Run 11 stress 0.02300311
## ... Procrustes: rmse 0.01938989 max resid 0.08580304
## Run 12 stress 0.0230006
## ... Procrustes: rmse 0.01849809 max resid 0.08487098
## Run 13 stress 0.02300217
## ... Procrustes: rmse 0.01932276 max resid 0.08574722
## Run 14 stress 0.0254839
## Run 15 stress 0.0228148
## ... Procrustes: rmse 0.0006209562 max resid 0.00148241
## ... Similar to previous best
## Run 16 stress 0.02281281
## ... New best solution
## ... Procrustes: rmse 0.0001881466 max resid 0.0006541141
## ... Similar to previous best
## Run 17 stress 0.02300218
## ... Procrustes: rmse 0.01944234 max resid 0.08584056
## Run 18 stress 0.02300078
## ... Procrustes: rmse 0.01928634 max resid 0.08590994
## Run 19 stress 0.02300176
## ... Procrustes: rmse 0.0184428 max resid 0.08516195
## Run 20 stress 0.02281543
## ... Procrustes: rmse 0.0005239057 max resid 0.001837024
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.003674216
## Run 1 stress 0.0037131
## ... Procrustes: rmse 0.001578516 max resid 0.007033342
## ... Similar to previous best
## Run 2 stress 0.003692511
## ... Procrustes: rmse 0.001022557 max resid 0.003887192
## ... Similar to previous best
## Run 3 stress 0.003676006
## ... Procrustes: rmse 0.0002836986 max resid 0.001262374
## ... Similar to previous best
## Run 4 stress 0.003737111
## ... Procrustes: rmse 0.002187927 max resid 0.008358599
## ... Similar to previous best
## Run 5 stress 0.003676399
## ... Procrustes: rmse 0.0004905773 max resid 0.002152741
## ... Similar to previous best
## Run 6 stress 0.003716788
## ... Procrustes: rmse 0.001669811 max resid 0.007566777
## ... Similar to previous best
## Run 7 stress 0.003698366
## ... Procrustes: rmse 0.001269048 max resid 0.005758743
## ... Similar to previous best
## Run 8 stress 0.003702936
## ... Procrustes: rmse 0.001349968 max resid 0.006413426
## ... Similar to previous best
## Run 9 stress 0.003710465
## ... Procrustes: rmse 0.001558951 max resid 0.006909082
## ... Similar to previous best
## Run 10 stress 0.003704548
## ... Procrustes: rmse 0.001412867 max resid 0.006480718
## ... Similar to previous best
## Run 11 stress 0.003688376
## ... Procrustes: rmse 0.0008559671 max resid 0.004146068
## ... Similar to previous best
## Run 12 stress 0.003676135
## ... Procrustes: rmse 0.0003934134 max resid 0.001625992
## ... Similar to previous best
## Run 13 stress 0.003685614
## ... Procrustes: rmse 0.0006934641 max resid 0.003332509
## ... Similar to previous best
## Run 14 stress 0.003717566
## ... Procrustes: rmse 0.001653991 max resid 0.007297071
## ... Similar to previous best
## Run 15 stress 0.003679928
## ... Procrustes: rmse 0.0006422716 max resid 0.003103896
## ... Similar to previous best
## Run 16 stress 0.003680689
## ... Procrustes: rmse 0.0006702671 max resid 0.003221675
## ... Similar to previous best
## Run 17 stress 0.003691664
## ... Procrustes: rmse 0.001028387 max resid 0.005027969
## ... Similar to previous best
## Run 18 stress 0.003707817
## ... Procrustes: rmse 0.001492838 max resid 0.006639841
## ... Similar to previous best
## Run 19 stress 0.003700584
## ... Procrustes: rmse 0.000989846 max resid 0.004098477
## ... Similar to previous best
## Run 20 stress 0.003702211
## ... Procrustes: rmse 0.001326036 max resid 0.005962515
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.001410885
## Run 1 stress 0.002245285
## Run 2 stress 0.002952704
## Run 3 stress 0.002328138
## Run 4 stress 0.002162399
## Run 5 stress 0.002860444
## Run 6 stress 0.00221348
## Run 7 stress 0.001848448
## ... Procrustes: rmse 0.005975175 max resid 0.019969
## Run 8 stress 0.002705209
## Run 9 stress 0.002997412
## Run 10 stress 0.002207861
## Run 11 stress 0.002449932
## Run 12 stress 0.002924535
## Run 13 stress 0.002690009
## Run 14 stress 0.001884083
## ... Procrustes: rmse 0.005925612 max resid 0.01329047
## Run 15 stress 0.002630658
## Run 16 stress 0.004486147
## Run 17 stress 0.002355202
## Run 18 stress 0.002497401
## Run 19 stress 0.001733067
## ... Procrustes: rmse 0.004733554 max resid 0.01132779
## Run 20 stress 0.002218073
## *** No convergence -- monoMDS stopping criteria:
## 20: no. of iterations >= maxit
## Run 0 stress 0
## Run 1 stress 0.002276995
## Run 2 stress 0.002316248
## Run 3 stress 0.002300638
## Run 4 stress 0.002321039
## Run 5 stress 0.001064433
## Run 6 stress 0.001394388
## Run 7 stress 0.001968515
## Run 8 stress 0.002061072
## Run 9 stress 0.001890866
## Run 10 stress 0.0019193
## Run 11 stress 0.002118596
## Run 12 stress 0.00164058
## Run 13 stress 0.001827669
## Run 14 stress 0.00226947
## Run 15 stress 0.002115454
## Run 16 stress 0.002540222
## Run 17 stress 0.002061425
## Run 18 stress 0.002290973
## Run 19 stress 0.002738252
## Run 20 stress 0.002135311
## *** No convergence -- monoMDS stopping criteria:
## 20: no. of iterations >= maxit
## Warning in metaMDS(distance, distance = "euclidean", k = i): Stress is
## (nearly) zero - you may have insufficient data
## Run 0 stress 0
## Run 1 stress 0.00156074
## Run 2 stress 0.002350812
## Run 3 stress 0.001691967
## Run 4 stress 0.001094814
## Run 5 stress 0.002982556
## Run 6 stress 0.001926688
## Run 7 stress 0.001641592
## Run 8 stress 0.001485491
## Run 9 stress 0.002368502
## Run 10 stress 0.002499582
## Run 11 stress 0.001338726
## Run 12 stress 0.002003303
## Run 13 stress 0.001942324
## Run 14 stress 0.001536311
## Run 15 stress 0.001126751
## Run 16 stress 0.003216684
## Run 17 stress 0.003147314
## Run 18 stress 0.001451909
## Run 19 stress 0.004276786
## Run 20 stress 0.001910724
## *** No convergence -- monoMDS stopping criteria:
## 20: no. of iterations >= maxit
## Warning in metaMDS(distance, distance = "euclidean", k = i): Stress is
## (nearly) zero - you may have insufficient data
## Run 0 stress 0
## Run 1 stress 0.001342667
## Run 2 stress 0.001402353
## Run 3 stress 0.00161456
## Run 4 stress 0.001455335
## Run 5 stress 0.002098512
## Run 6 stress 0.002058185
## Run 7 stress 0.003136625
## Run 8 stress 0.001966418
## Run 9 stress 0.001857467
## Run 10 stress 0.002292139
## Run 11 stress 0.001518411
## Run 12 stress 0.00243853
## Run 13 stress 0.004224885
## Run 14 stress 0.002221585
## Run 15 stress 0.00220415
## Run 16 stress 0.001816446
## Run 17 stress 0.001508697
## Run 18 stress 0.001168783
## Run 19 stress 0.001909096
## Run 20 stress 0.001593926
## *** No convergence -- monoMDS stopping criteria:
## 20: no. of iterations >= maxit
## Warning in metaMDS(distance, distance = "euclidean", k = i): Stress is
## (nearly) zero - you may have insufficient data
plot(seq(10),stress_vec, type = 'o', ylab = "Stress", xlab = "Number of dimensions",
col="tomato", pch=19)
abline(h=0.2, lty=2)
abline(h=0.05, lty=3)
Now, in order to obtain a good fit, we must have a stress of approximately 5% or equivalently, 0.05. However, so far upto three dimensions, the stress exisitng is equal to 7% approximately. Hence, It would be in our favor to obtain a good fit if we chose four dimensions where our stress reduces to 3% approximately.
print(paste("Stress Values", stress_vec, sep = ": "))
## [1] "Stress Values: 0.302157886672921"
## [2] "Stress Values: 0.159536905752132"
## [3] "Stress Values: 0.0757452233172523"
## [4] "Stress Values: 0.0378741100008223"
## [5] "Stress Values: 0.0228128131336221"
## [6] "Stress Values: 0.00367421620284672"
## [7] "Stress Values: 0.00141088470041126"
## [8] "Stress Values: 0"
## [9] "Stress Values: 0"
## [10] "Stress Values: 0"
pairs(cbind(X,Y), labels = c("MDS 1", "MDS 2", "NM MDS 1", "NM MDS 2"), col="skyblue",
pch=19)
The correlation matrix of the four variables are as follows-
cor(cbind(X,Y))
## [,1] [,2] [,3] [,4]
## [1,] 1.000000e+00 -4.389205e-18 0.92277753 0.28824057
## [2,] -4.389205e-18 1.000000e+00 -0.21554445 -0.06756668
## [3,] 9.227775e-01 -2.155445e-01 1.00000000 0.08206196
## [4,] 2.882406e-01 -6.756668e-02 0.08206196 1.00000000
If we will be using the non-standardized variables, then the distance matrix will not be accurate as different variables will be on different scales and it will result in an absurd distance matrix. Therefore, it is important to use the standarized variables for computing the distance matrix in our case because different variables are based on different scales and have different units.