1

Let us first import the data file of cereals in our R environment. The following achieves this objective and prints the first five observations.

cereals <- read.table("http://www.public.iastate.edu/~maitra/stat501/datasets/cereals.dat", header = F)
names(cereals) <- c("Brand", "Manufacturer", "Calories", "Protein", "Fat", "Sodium", "Fiber", "Carbohydrates","Sugar", "Potassium", "Class")
head(cereals, 5)
##           Brand Manufacturer Calories Protein Fat Sodium Fiber
## 1    ACCheerios            G      110       2   2    180   1.5
## 2      Cheerios            G      110       6   2    290   2.0
## 3    CocoaPuffs            G      110       1   1    180   0.0
## 4  CountChocula            G      110       1   1    180   0.0
## 5 GoldenGrahams            G      110       1   1    280   0.0
##   Carbohydrates Sugar Potassium Class
## 1          10.5    10        70     1
## 2          17.0     1       105     1
## 3          12.0    13        55     1
## 4          12.0    13        65     1
## 5          15.0     9        45     1

2

Now, we will first standardized the variable under consideration according to our problem statement. The results of this operation are printed using the following code.

cereals_standardized <- scale(cereals[,3:10], center = TRUE, scale = TRUE)
head(cereals_standardized)
##       Calories    Protein       Fat       Sodium      Fiber Carbohydrates
## [1,] 0.1103426 -0.3806804 1.2767742 -0.005871679 -0.1189104    -0.8822330
## [2,] 0.1103426  2.8931707 1.2767742  1.382780515  0.1589780     0.6446037
## [3,] 0.1103426 -1.1991431 0.0290176 -0.005871679 -0.9525758    -0.5298861
## [4,] 0.1103426 -1.1991431 0.0290176 -0.005871679 -0.9525758    -0.5298861
## [5,] 0.1103426 -1.1991431 0.0290176  1.256539406 -0.9525758     0.1748078
## [6,] 0.1103426  0.4377824 0.0290176  0.877816080 -0.1189104    -0.6473351
##           Sugar   Potassium
## [1,]  0.5280395 -0.21810133
## [2,] -1.4559536  0.31132205
## [3,]  1.1893705 -0.44499706
## [4,]  1.1893705 -0.29373324
## [5,]  0.3075958 -0.59626088
## [6,]  0.5280395  0.08442632

Now, we will compute the distance matrix of our standardized variables and paste the specimen of first 5 observations from our distance matrix named distance in this report.

distance <- dist(cereals_standardized, method = "euclidean", upper = TRUE, diag = TRUE)
distance <- as.matrix(distance)
head(distance,5)
##          1        2         3         4        5        6        7
## 1 0.000000 4.389923 1.8800970 1.8678873 2.413378 1.776058 3.455752
## 2 4.389923 0.000000 5.5151628 5.4964619 4.869285 3.684407 3.962058
## 3 1.880097 5.515163 0.0000000 0.1512638 1.700201 2.210626 3.327191
## 4 1.867887 5.496462 0.1512638 0.0000000 1.720269 2.179354 3.340917
## 5 2.413378 4.869285 1.7002007 1.7202688 0.000000 2.169286 2.115457
##           8        9       10       11       12       13       14
## 1 1.6192301 2.037995 1.692973 1.681436 3.325714 3.798287 2.771250
## 2 4.8239397 3.849793 3.765158 3.868147 4.123082 4.992484 3.117500
## 3 0.8476301 2.365674 2.754506 3.113207 3.186540 4.206123 3.497958
## 4 0.8610212 2.336478 2.704207 3.053845 3.204440 4.112606 3.465098
## 5 1.7945462 1.912533 3.071040 3.577630 2.334208 4.523253 3.086864
##          15       16       17       18       19       20       21       22
## 1 2.0443430 2.888244 1.893207 6.577956 2.818565 4.240844 3.062161 2.591681
## 2 5.5993313 3.108638 4.095663 6.390159 5.751471 4.425505 6.170433 4.189560
## 3 0.7514574 3.568236 1.773684 7.554684 1.806802 3.896519 1.881682 4.312430
## 4 0.8513833 3.536029 1.773684 7.475524 1.844401 3.911172 1.929708 4.259042
## 5 1.9680463 3.113561 1.467644 7.561212 2.917822 2.686999 2.901732 4.644644
##         23       24       25       26       27       28       29       30
## 1 3.923052 1.712732 2.868150 3.788553 3.962087 2.186708 3.616425 1.979298
## 2 4.516607 5.214771 5.603114 5.338347 4.872641 4.069030 4.909135 4.436180
## 3 3.501791 1.288037 1.597044 3.858931 4.153263 2.181978 4.243838 1.513060
## 4 3.521339 1.340270 1.646424 3.835141 4.081017 2.181978 4.189576 1.543008
## 5 2.626153 2.553318 1.796336 4.553077 4.218383 1.901755 4.547773 1.498137
##         31       32       33       34       35       36       37       38
## 1 3.386087 3.876270 4.228492 3.730262 4.337000 2.259963 4.639299 1.580069
## 2 3.520707 4.015880 3.808077 4.712787 4.569048 5.858332 2.816341 5.236607
## 3 4.156754 4.036184 4.044825 4.350095 3.824121 1.910388 4.975300 1.493297
## 4 4.118043 4.019142 4.053302 4.254365 3.839050 1.934193 4.977599 1.531124
## 5 3.731677 3.660273 2.864802 4.544612 2.592047 3.427599 4.564946 1.834499
##         39       40       41       42       43
## 1 1.263239 2.054943 5.353324 5.141861 4.604046
## 2 5.017712 2.949282 7.099531 6.641340 5.350988
## 3 1.617395 3.462659 5.035857 5.113100 5.859570
## 4 1.638478 3.439454 5.056262 5.117573 5.840013
## 5 1.846190 3.646725 5.331019 5.509723 6.495437

3

Now we will perform the multidimensional scaling of our distance matrix using the cmdscale command and store the results in our variable named as MDS.

n = nrow(distance)
MDS <- cmdscale(distance, k= 2, eig = TRUE, x.ret = TRUE)
X <- MDS$points[,1:2]

Now, in order to make prettier plots, we have used the ggplot2. In the follwoing plot of two dimensional solution of multidimensional scaling, we have used numbers to specify the observation and colour to specify the manufacturer. The tomato colour corresponds to G, green to K and skybblue to Q where G,K, Q are respective manufacturers.

library(ggplot2)
g <- ggplot(data.frame(X, cereals), 
            aes(X[,2], X[,1], label = rownames(cereals), group=Manufacturer))

g + geom_text(aes(color = Manufacturer) , 
              angle=0, show.legend = T, nudge_x = 0.05, nudge_y = 0.15) + 
  ggtitle("Multi-dimensional Scaling on Cereals Data") +
  geom_point() + theme_classic() + 
  theme(legend.position = "bottom", legend.title = 
          element_text(size=10, face="bold"), 
        legend.text = element_text(size=10, face="bold"))

# 4 According to the two dimensional solution of the multidimensional scaling, the cereals 14 and 16 seems to be most similar followed by the cereals 3 and 4.

5

According to the two dimensional solution of the multidimensional scaling problem, the cereals 41 and 18 are most distinct as confirmed by distance matrix.

6

It is possible to obtain the exact representation of our original distance matrix. In such case, we have to set the number of dimension equal to the rank of our matrix.

7

The follwoing code reports the eigen values and computes the goodness of fit of the solution

ev <- MDS$eig
gof <- MDS$GOF
print(round(ev,digits=4))
##  [1] 106.9979  77.9000  74.2909  36.4679  20.8866  15.0120   2.5411
##  [8]   1.9036   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
## [15]   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
## [22]   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
## [29]   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
## [36]   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
## [43]   0.0000
print(gof)
## [1] 0.5502914 0.5502914

We can also report the goodness of fit directly using the output of cmdscale as follows-

print(cmdscale(distance, 2, eig=TRUE)$GOF)
## [1] 0.5502914 0.5502914

or we can compute the goodness of fit manually using the two different criterion as follows-

print( (ev[1]+ev[2])/sum(abs(ev)) )
## [1] 0.5502914
print( (ev[1]+ev[2])/sum(ev[ev>0]))
## [1] 0.5502914

8. (1p) Are there any zero eigenvalues? Can you explain these?

Yes. There are almost always few number of nonzero eigenvalues accompanied by a number of zero eigenvlaues (There will always be one eigenvalue equal to zero in practice). The number of non-zero eigenvalues is equal to p.

If the coordinates of n points in p dimensions be denoted by \(X_i\), i = 1, … ,n. These can be collected together in a n x p matrix X . When we perform the eigendecomposition of the matrix B, if p < n then there are n-p zero eigenvalues. In fact if the points are not in “general position” the number of zero eigenvalues will be greater than n - p. For a detailed reference, we have used this source

9.

The following code computes the fitted distances.

Also, In the following plot, we have plotted the fitted distances against the observed distances. The red line correpsonds to the the regression line. We can see that the regression line goes almost linearly with respect to the fitted and observed distances. This proves that there is a strong correlation between the observed and fitted distances which was our initial objective. Since the line is almost at 45 degree, we can say that correlation is strong and hence, visuall, it gives a good goodness of fit.

fitted <- as.matrix(dist(X, method = "euclidean"))
fitted <- as.vector(fitted)
observed <- as.vector(distance)
reg <- lm(fitted~observed)
plot(observed, fitted,pch=19, cex=0.4)
abline(lm(fitted~observed), col="red")

print(paste("Coefficient of determination:", summary(reg)$r.squared))
## [1] "Coefficient of determination: 0.770427902994939"

The coefficient of determination has been estimated as 0.7704

10.

(1p) Try now non-metric MDS with the isoMDS program. Plot the two-dimensional solution, labelling the points again with the name or number of the brand, and using different symbols for different manufacturers.

library(MASS)
n <- nrow(distance)
init <- scale(matrix(runif(n*2),ncol=2),scale=FALSE)
nmmds.out <- isoMDS(distance, init, k=2, maxit = 100)
## initial  value 43.229760 
## iter   5 value 35.927515
## iter  10 value 28.171844
## iter  15 value 24.298783
## iter  20 value 19.978812
## iter  25 value 17.719199
## iter  30 value 17.113076
## iter  35 value 16.933540
## iter  35 value 16.920982
## iter  35 value 16.910725
## final  value 16.910725 
## converged
Y <- nmmds.out$points
g <- ggplot(data.frame(Y, cereals), aes(Y[,2], Y[,1], label = rownames(cereals)))
g + geom_text(aes(color = Manufacturer) , angle=0, show.legend = F, nudge_x = -0.40,
              nudge_y = -0.90) + 
  ggtitle("Non-Mteric Multi-dimensional Scaling on Cereals Data") + 
  geom_point(aes(shape = Manufacturer, color = Manufacturer)) + theme_classic() + ggtitle("Non-Metric Multi-dimensional Scaling on Cereals Data")

11

We cn observe that the pair of cereals 3 and 4 are most similar according to the two dimensional solution of non-metric MDS.

library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-6
nmmds <- metaMDS(comm = distance, distance = "euclidean", k=2)
## Run 0 stress 0.1702258 
## Run 1 stress 0.1595462 
## ... New best solution
## ... Procrustes: rmse 0.1142226  max resid 0.4359294 
## Run 2 stress 0.1676239 
## Run 3 stress 0.1676171 
## Run 4 stress 0.1733476 
## Run 5 stress 0.1595408 
## ... New best solution
## ... Procrustes: rmse 0.009497985  max resid 0.0508101 
## Run 6 stress 0.1743246 
## Run 7 stress 0.1946937 
## Run 8 stress 0.1702232 
## Run 9 stress 0.159544 
## ... Procrustes: rmse 0.004115569  max resid 0.0152421 
## Run 10 stress 0.1935051 
## Run 11 stress 0.1702223 
## Run 12 stress 0.172109 
## Run 13 stress 0.1861297 
## Run 14 stress 0.1702252 
## Run 15 stress 0.2065886 
## Run 16 stress 0.186326 
## Run 17 stress 0.1676229 
## Run 18 stress 0.1702275 
## Run 19 stress 0.1927414 
## Run 20 stress 0.1725534 
## *** No convergence -- monoMDS stopping criteria:
##     20: stress ratio > sratmax
gof <- goodness(nmmds)

plot(nmmds, display = "sites", type = "n", ylab=c(-4,4))
points(nmmds, display = "sites", cex = 2*gof/mean(gof))

Extra (Ignore)

Now, in order to graphically examine the stress in case of non-metric multidimensional scaling, let us consider the following stressplot-

stressplot(nmmds, pch = 19, cex=0.75, l.col = "tomato", p.col = "skyblue")

12

fitted <- as.vector(as.matrix(dist(Y, method = "euclidean")))
observed <- as.vector(as.matrix(distance))
reg <- lm(fitted~observed)
plot(observed, fitted,pch=19, cex=0.4)
abline(lm(fitted~observed), col="red")

print(paste("Coefficient of determination:", summary(reg)$r.squared))
## [1] "Coefficient of determination: 0.862919076728646"

We can observe from the plot that the fitted distances are in almost perfectly linearly related with observed distances.

13. Now, we will be computing the stress for the dimensions 1, 2, 3, 4, 5, …. and explain how many dimensions are required to obtain a good fit.

stress_vec <- numeric(10)
for(i in seq(10)){
  stress_vec[i] <- metaMDS(distance, distance = "euclidean", k=i)$stress
}
## Run 0 stress 0.3060661 
## Run 1 stress 0.3737939 
## Run 2 stress 0.383095 
## Run 3 stress 0.3789417 
## Run 4 stress 0.3849806 
## Run 5 stress 0.3732161 
## Run 6 stress 0.4158972 
## Run 7 stress 0.3021579 
## ... New best solution
## ... Procrustes: rmse 0.05777802  max resid 0.2750343 
## Run 8 stress 0.4050292 
## Run 9 stress 0.5636962 
## Run 10 stress 0.4148599 
## Run 11 stress 0.3252705 
## Run 12 stress 0.4131135 
## Run 13 stress 0.5618635 
## Run 14 stress 0.3550077 
## Run 15 stress 0.4120756 
## Run 16 stress 0.3274244 
## Run 17 stress 0.3968979 
## Run 18 stress 0.3820266 
## Run 19 stress 0.3359674 
## Run 20 stress 0.3994779 
## *** No convergence -- monoMDS stopping criteria:
##      2: stress ratio > sratmax
##     18: scale factor of the gradient < sfgrmin
## Run 0 stress 0.1702258 
## Run 1 stress 0.1725568 
## Run 2 stress 0.1595438 
## ... New best solution
## ... Procrustes: rmse 0.1163146  max resid 0.4246312 
## Run 3 stress 0.1595493 
## ... Procrustes: rmse 0.0007063866  max resid 0.003491665 
## ... Similar to previous best
## Run 4 stress 0.1733467 
## Run 5 stress 0.1870897 
## Run 6 stress 0.1702242 
## Run 7 stress 0.1743268 
## Run 8 stress 0.1702223 
## Run 9 stress 0.1595416 
## ... New best solution
## ... Procrustes: rmse 0.0004225535  max resid 0.001814049 
## ... Similar to previous best
## Run 10 stress 0.1595369 
## ... New best solution
## ... Procrustes: rmse 0.002250948  max resid 0.009903102 
## ... Similar to previous best
## Run 11 stress 0.1595787 
## ... Procrustes: rmse 0.008070538  max resid 0.04337308 
## Run 12 stress 0.1721106 
## Run 13 stress 0.1595422 
## ... Procrustes: rmse 0.002162376  max resid 0.009690388 
## ... Similar to previous best
## Run 14 stress 0.1595387 
## ... Procrustes: rmse 0.001806793  max resid 0.006983696 
## ... Similar to previous best
## Run 15 stress 0.1702259 
## Run 16 stress 0.1595418 
## ... Procrustes: rmse 0.002256913  max resid 0.009928014 
## ... Similar to previous best
## Run 17 stress 0.1733518 
## Run 18 stress 0.1733467 
## Run 19 stress 0.1595454 
## ... Procrustes: rmse 0.002668942  max resid 0.01006312 
## Run 20 stress 0.1702242 
## *** Solution reached
## Run 0 stress 0.07574604 
## Run 1 stress 0.07574723 
## ... Procrustes: rmse 0.0007795211  max resid 0.003419078 
## ... Similar to previous best
## Run 2 stress 0.07574644 
## ... Procrustes: rmse 0.0008205733  max resid 0.003585969 
## ... Similar to previous best
## Run 3 stress 0.07574557 
## ... New best solution
## ... Procrustes: rmse 0.0006125671  max resid 0.002712198 
## ... Similar to previous best
## Run 4 stress 0.07574625 
## ... Procrustes: rmse 0.0001345882  max resid 0.0005504402 
## ... Similar to previous best
## Run 5 stress 0.07574793 
## ... Procrustes: rmse 0.0009084477  max resid 0.003670822 
## ... Similar to previous best
## Run 6 stress 0.07574623 
## ... Procrustes: rmse 0.0001612335  max resid 0.0006803311 
## ... Similar to previous best
## Run 7 stress 0.0757476 
## ... Procrustes: rmse 0.0003785336  max resid 0.001604955 
## ... Similar to previous best
## Run 8 stress 0.07574875 
## ... Procrustes: rmse 0.0009466463  max resid 0.003899034 
## ... Similar to previous best
## Run 9 stress 0.07574522 
## ... New best solution
## ... Procrustes: rmse 0.000169709  max resid 0.0007192319 
## ... Similar to previous best
## Run 10 stress 0.07574701 
## ... Procrustes: rmse 0.0004608046  max resid 0.001970016 
## ... Similar to previous best
## Run 11 stress 0.07574589 
## ... Procrustes: rmse 0.0002617693  max resid 0.001049401 
## ... Similar to previous best
## Run 12 stress 0.07574686 
## ... Procrustes: rmse 0.0004425562  max resid 0.001872713 
## ... Similar to previous best
## Run 13 stress 0.07574686 
## ... Procrustes: rmse 0.0005961313  max resid 0.002427331 
## ... Similar to previous best
## Run 14 stress 0.1072316 
## Run 15 stress 0.07574723 
## ... Procrustes: rmse 0.0004931585  max resid 0.002097565 
## ... Similar to previous best
## Run 16 stress 0.07574528 
## ... Procrustes: rmse 3.432592e-05  max resid 0.0001441084 
## ... Similar to previous best
## Run 17 stress 0.07574784 
## ... Procrustes: rmse 0.0005613181  max resid 0.002278743 
## ... Similar to previous best
## Run 18 stress 0.0757461 
## ... Procrustes: rmse 0.0002956139  max resid 0.001239801 
## ... Similar to previous best
## Run 19 stress 0.07574704 
## ... Procrustes: rmse 0.00047404  max resid 0.002060818 
## ... Similar to previous best
## Run 20 stress 0.07574756 
## ... Procrustes: rmse 0.0005407509  max resid 0.002300831 
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.03787597 
## Run 1 stress 0.03787538 
## ... New best solution
## ... Procrustes: rmse 0.0006673933  max resid 0.002754092 
## ... Similar to previous best
## Run 2 stress 0.03787556 
## ... Procrustes: rmse 0.0006667357  max resid 0.002723906 
## ... Similar to previous best
## Run 3 stress 0.03787558 
## ... Procrustes: rmse 0.0005957249  max resid 0.0025295 
## ... Similar to previous best
## Run 4 stress 0.03787458 
## ... New best solution
## ... Procrustes: rmse 0.000535526  max resid 0.002437723 
## ... Similar to previous best
## Run 5 stress 0.03787545 
## ... Procrustes: rmse 0.0005810287  max resid 0.002548039 
## ... Similar to previous best
## Run 6 stress 0.03787411 
## ... New best solution
## ... Procrustes: rmse 0.0002102796  max resid 0.0008483166 
## ... Similar to previous best
## Run 7 stress 0.03787476 
## ... Procrustes: rmse 0.0002562865  max resid 0.001175876 
## ... Similar to previous best
## Run 8 stress 0.03787539 
## ... Procrustes: rmse 0.0003583391  max resid 0.001585512 
## ... Similar to previous best
## Run 9 stress 0.03787614 
## ... Procrustes: rmse 0.0004306208  max resid 0.002101444 
## ... Similar to previous best
## Run 10 stress 0.03787423 
## ... Procrustes: rmse 0.0001240673  max resid 0.0006123107 
## ... Similar to previous best
## Run 11 stress 0.03787528 
## ... Procrustes: rmse 0.0003501988  max resid 0.001263885 
## ... Similar to previous best
## Run 12 stress 0.03787512 
## ... Procrustes: rmse 0.0003078171  max resid 0.001562269 
## ... Similar to previous best
## Run 13 stress 0.0378758 
## ... Procrustes: rmse 0.0003763664  max resid 0.001815497 
## ... Similar to previous best
## Run 14 stress 0.03787588 
## ... Procrustes: rmse 0.0004361378  max resid 0.002041628 
## ... Similar to previous best
## Run 15 stress 0.03787508 
## ... Procrustes: rmse 0.0003120289  max resid 0.001500885 
## ... Similar to previous best
## Run 16 stress 0.03787457 
## ... Procrustes: rmse 0.0002072484  max resid 0.0007516937 
## ... Similar to previous best
## Run 17 stress 0.03787483 
## ... Procrustes: rmse 0.0002721146  max resid 0.001298597 
## ... Similar to previous best
## Run 18 stress 0.03787593 
## ... Procrustes: rmse 0.0004352882  max resid 0.001809008 
## ... Similar to previous best
## Run 19 stress 0.03787516 
## ... Procrustes: rmse 0.0003035088  max resid 0.001458314 
## ... Similar to previous best
## Run 20 stress 0.03787535 
## ... Procrustes: rmse 0.0003517603  max resid 0.001634192 
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.02300081 
## Run 1 stress 0.02282012 
## ... New best solution
## ... Procrustes: rmse 0.01929814  max resid 0.08539132 
## Run 2 stress 0.02538112 
## Run 3 stress 0.02281352 
## ... New best solution
## ... Procrustes: rmse 0.001053425  max resid 0.00351825 
## ... Similar to previous best
## Run 4 stress 0.02300151 
## ... Procrustes: rmse 0.01912496  max resid 0.08542431 
## Run 5 stress 0.02300174 
## ... Procrustes: rmse 0.01928165  max resid 0.08570723 
## Run 6 stress 0.02281602 
## ... Procrustes: rmse 0.0007199103  max resid 0.002010636 
## ... Similar to previous best
## Run 7 stress 0.02281507 
## ... Procrustes: rmse 0.001153041  max resid 0.005608316 
## ... Similar to previous best
## Run 8 stress 0.02300208 
## ... Procrustes: rmse 0.01931861  max resid 0.08582599 
## Run 9 stress 0.02281549 
## ... Procrustes: rmse 0.0006884681  max resid 0.001684089 
## ... Similar to previous best
## Run 10 stress 0.02300178 
## ... Procrustes: rmse 0.01928995  max resid 0.08581327 
## Run 11 stress 0.02300311 
## ... Procrustes: rmse 0.01938989  max resid 0.08580304 
## Run 12 stress 0.0230006 
## ... Procrustes: rmse 0.01849809  max resid 0.08487098 
## Run 13 stress 0.02300217 
## ... Procrustes: rmse 0.01932276  max resid 0.08574722 
## Run 14 stress 0.0254839 
## Run 15 stress 0.0228148 
## ... Procrustes: rmse 0.0006209562  max resid 0.00148241 
## ... Similar to previous best
## Run 16 stress 0.02281281 
## ... New best solution
## ... Procrustes: rmse 0.0001881466  max resid 0.0006541141 
## ... Similar to previous best
## Run 17 stress 0.02300218 
## ... Procrustes: rmse 0.01944234  max resid 0.08584056 
## Run 18 stress 0.02300078 
## ... Procrustes: rmse 0.01928634  max resid 0.08590994 
## Run 19 stress 0.02300176 
## ... Procrustes: rmse 0.0184428  max resid 0.08516195 
## Run 20 stress 0.02281543 
## ... Procrustes: rmse 0.0005239057  max resid 0.001837024 
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.003674216 
## Run 1 stress 0.0037131 
## ... Procrustes: rmse 0.001578516  max resid 0.007033342 
## ... Similar to previous best
## Run 2 stress 0.003692511 
## ... Procrustes: rmse 0.001022557  max resid 0.003887192 
## ... Similar to previous best
## Run 3 stress 0.003676006 
## ... Procrustes: rmse 0.0002836986  max resid 0.001262374 
## ... Similar to previous best
## Run 4 stress 0.003737111 
## ... Procrustes: rmse 0.002187927  max resid 0.008358599 
## ... Similar to previous best
## Run 5 stress 0.003676399 
## ... Procrustes: rmse 0.0004905773  max resid 0.002152741 
## ... Similar to previous best
## Run 6 stress 0.003716788 
## ... Procrustes: rmse 0.001669811  max resid 0.007566777 
## ... Similar to previous best
## Run 7 stress 0.003698366 
## ... Procrustes: rmse 0.001269048  max resid 0.005758743 
## ... Similar to previous best
## Run 8 stress 0.003702936 
## ... Procrustes: rmse 0.001349968  max resid 0.006413426 
## ... Similar to previous best
## Run 9 stress 0.003710465 
## ... Procrustes: rmse 0.001558951  max resid 0.006909082 
## ... Similar to previous best
## Run 10 stress 0.003704548 
## ... Procrustes: rmse 0.001412867  max resid 0.006480718 
## ... Similar to previous best
## Run 11 stress 0.003688376 
## ... Procrustes: rmse 0.0008559671  max resid 0.004146068 
## ... Similar to previous best
## Run 12 stress 0.003676135 
## ... Procrustes: rmse 0.0003934134  max resid 0.001625992 
## ... Similar to previous best
## Run 13 stress 0.003685614 
## ... Procrustes: rmse 0.0006934641  max resid 0.003332509 
## ... Similar to previous best
## Run 14 stress 0.003717566 
## ... Procrustes: rmse 0.001653991  max resid 0.007297071 
## ... Similar to previous best
## Run 15 stress 0.003679928 
## ... Procrustes: rmse 0.0006422716  max resid 0.003103896 
## ... Similar to previous best
## Run 16 stress 0.003680689 
## ... Procrustes: rmse 0.0006702671  max resid 0.003221675 
## ... Similar to previous best
## Run 17 stress 0.003691664 
## ... Procrustes: rmse 0.001028387  max resid 0.005027969 
## ... Similar to previous best
## Run 18 stress 0.003707817 
## ... Procrustes: rmse 0.001492838  max resid 0.006639841 
## ... Similar to previous best
## Run 19 stress 0.003700584 
## ... Procrustes: rmse 0.000989846  max resid 0.004098477 
## ... Similar to previous best
## Run 20 stress 0.003702211 
## ... Procrustes: rmse 0.001326036  max resid 0.005962515 
## ... Similar to previous best
## *** Solution reached
## Run 0 stress 0.001410885 
## Run 1 stress 0.002245285 
## Run 2 stress 0.002952704 
## Run 3 stress 0.002328138 
## Run 4 stress 0.002162399 
## Run 5 stress 0.002860444 
## Run 6 stress 0.00221348 
## Run 7 stress 0.001848448 
## ... Procrustes: rmse 0.005975175  max resid 0.019969 
## Run 8 stress 0.002705209 
## Run 9 stress 0.002997412 
## Run 10 stress 0.002207861 
## Run 11 stress 0.002449932 
## Run 12 stress 0.002924535 
## Run 13 stress 0.002690009 
## Run 14 stress 0.001884083 
## ... Procrustes: rmse 0.005925612  max resid 0.01329047 
## Run 15 stress 0.002630658 
## Run 16 stress 0.004486147 
## Run 17 stress 0.002355202 
## Run 18 stress 0.002497401 
## Run 19 stress 0.001733067 
## ... Procrustes: rmse 0.004733554  max resid 0.01132779 
## Run 20 stress 0.002218073 
## *** No convergence -- monoMDS stopping criteria:
##     20: no. of iterations >= maxit
## Run 0 stress 0 
## Run 1 stress 0.002276995 
## Run 2 stress 0.002316248 
## Run 3 stress 0.002300638 
## Run 4 stress 0.002321039 
## Run 5 stress 0.001064433 
## Run 6 stress 0.001394388 
## Run 7 stress 0.001968515 
## Run 8 stress 0.002061072 
## Run 9 stress 0.001890866 
## Run 10 stress 0.0019193 
## Run 11 stress 0.002118596 
## Run 12 stress 0.00164058 
## Run 13 stress 0.001827669 
## Run 14 stress 0.00226947 
## Run 15 stress 0.002115454 
## Run 16 stress 0.002540222 
## Run 17 stress 0.002061425 
## Run 18 stress 0.002290973 
## Run 19 stress 0.002738252 
## Run 20 stress 0.002135311 
## *** No convergence -- monoMDS stopping criteria:
##     20: no. of iterations >= maxit
## Warning in metaMDS(distance, distance = "euclidean", k = i): Stress is
## (nearly) zero - you may have insufficient data
## Run 0 stress 0 
## Run 1 stress 0.00156074 
## Run 2 stress 0.002350812 
## Run 3 stress 0.001691967 
## Run 4 stress 0.001094814 
## Run 5 stress 0.002982556 
## Run 6 stress 0.001926688 
## Run 7 stress 0.001641592 
## Run 8 stress 0.001485491 
## Run 9 stress 0.002368502 
## Run 10 stress 0.002499582 
## Run 11 stress 0.001338726 
## Run 12 stress 0.002003303 
## Run 13 stress 0.001942324 
## Run 14 stress 0.001536311 
## Run 15 stress 0.001126751 
## Run 16 stress 0.003216684 
## Run 17 stress 0.003147314 
## Run 18 stress 0.001451909 
## Run 19 stress 0.004276786 
## Run 20 stress 0.001910724 
## *** No convergence -- monoMDS stopping criteria:
##     20: no. of iterations >= maxit
## Warning in metaMDS(distance, distance = "euclidean", k = i): Stress is
## (nearly) zero - you may have insufficient data
## Run 0 stress 0 
## Run 1 stress 0.001342667 
## Run 2 stress 0.001402353 
## Run 3 stress 0.00161456 
## Run 4 stress 0.001455335 
## Run 5 stress 0.002098512 
## Run 6 stress 0.002058185 
## Run 7 stress 0.003136625 
## Run 8 stress 0.001966418 
## Run 9 stress 0.001857467 
## Run 10 stress 0.002292139 
## Run 11 stress 0.001518411 
## Run 12 stress 0.00243853 
## Run 13 stress 0.004224885 
## Run 14 stress 0.002221585 
## Run 15 stress 0.00220415 
## Run 16 stress 0.001816446 
## Run 17 stress 0.001508697 
## Run 18 stress 0.001168783 
## Run 19 stress 0.001909096 
## Run 20 stress 0.001593926 
## *** No convergence -- monoMDS stopping criteria:
##     20: no. of iterations >= maxit
## Warning in metaMDS(distance, distance = "euclidean", k = i): Stress is
## (nearly) zero - you may have insufficient data
plot(seq(10),stress_vec, type = 'o', ylab = "Stress", xlab = "Number of dimensions",
     col="tomato", pch=19)
abline(h=0.2, lty=2)
abline(h=0.05, lty=3)

Now, in order to obtain a good fit, we must have a stress of approximately 5% or equivalently, 0.05. However, so far upto three dimensions, the stress exisitng is equal to 7% approximately. Hence, It would be in our favor to obtain a good fit if we chose four dimensions where our stress reduces to 3% approximately.

print(paste("Stress Values", stress_vec, sep = ": "))
##  [1] "Stress Values: 0.302157886672921"  
##  [2] "Stress Values: 0.159536905752132"  
##  [3] "Stress Values: 0.0757452233172523" 
##  [4] "Stress Values: 0.0378741100008223" 
##  [5] "Stress Values: 0.0228128131336221" 
##  [6] "Stress Values: 0.00367421620284672"
##  [7] "Stress Values: 0.00141088470041126"
##  [8] "Stress Values: 0"                  
##  [9] "Stress Values: 0"                  
## [10] "Stress Values: 0"

14.

pairs(cbind(X,Y), labels = c("MDS 1", "MDS 2", "NM MDS 1", "NM MDS 2"), col="skyblue",
      pch=19)

The correlation matrix of the four variables are as follows-

cor(cbind(X,Y))
##               [,1]          [,2]        [,3]        [,4]
## [1,]  1.000000e+00 -4.389205e-18  0.92277753  0.28824057
## [2,] -4.389205e-18  1.000000e+00 -0.21554445 -0.06756668
## [3,]  9.227775e-01 -2.155445e-01  1.00000000  0.08206196
## [4,]  2.882406e-01 -6.756668e-02  0.08206196  1.00000000

15

If we will be using the non-standardized variables, then the distance matrix will not be accurate as different variables will be on different scales and it will result in an absurd distance matrix. Therefore, it is important to use the standarized variables for computing the distance matrix in our case because different variables are based on different scales and have different units.