Assignment-05

Author

Hasantha

Multidimensional scaling (MDS)

Multidimensional scaling (MDS) is a technique for visually representing the distances or dissimilarities between a set of objects. The “objects” can be colors, faces, map coordinates, political persuasions, or the levels of any categorical variable.

Two general methods exist for solving the MDS problem. The first is called `Metric (Classical) Multidimensional Scaling (MMDS)`, since it tries to reproduce the original metric, that is, the distances themselves.

The proximity $\delta_{ij}$ between observation vectors $\mathbf{y}_i$ and $\mathbf{y}_j$ is given by

$$\delta_{ij} = \left[(\mathbf{y}_i - \mathbf{y}_j)'(\mathbf{y}_i - \mathbf{y}_j)\right]^{1/2}$$

which indicates the distance between the vectors $\mathbf{y}_i$ and $\mathbf{y}_j$. If the observation vectors are available, we can calculate these distances. The process of reduction to a lower-dimensional geometric representation is called metric multidimensional scaling (MMDS).
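
As a small numerical check of the distance formula, the sketch below computes the proximity between two vectors directly and compares it with R's `dist()`. The vectors `y_i` and `y_j` hold made-up illustrative values and are not part of the assignment data.

```r
# Two hypothetical observation vectors (illustrative values only)
y_i <- c(1, 2, 3)
y_j <- c(4, 6, 3)

# delta_ij = [(y_i - y_j)'(y_i - y_j)]^(1/2)
diff_ij  <- y_i - y_j
delta_ij <- sqrt(drop(t(diff_ij) %*% diff_ij))

# dist() computes the same Euclidean distance between the rows of a matrix
delta_dist <- as.numeric(dist(rbind(y_i, y_j)))

delta_ij    # 5
delta_dist  # 5
```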

The second method, called `Non-Metric Multidimensional Scaling (NMMDS)`, assumes that only the ranks of the distances are known, so it produces a configuration that tries to reproduce those ranks. This method is also known as `ordinal MDS` and is suitable for qualitative data: when the original dissimilarities are only similarities based on judgment, the final spatial representation preserves only the rank order among them.
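
To illustrate what "only the ranks are known" means, the sketch below shows that a monotone transformation changes the dissimilarity values but not their rank order, which is the only information a non-metric method tries to reproduce. The vector `d` contains hypothetical dissimilarities chosen for illustration.

```r
# Hypothetical dissimilarities for the six pairs among four objects
d <- c(3, 7, 12, 5, 9, 4)

# A monotone (order-preserving) transformation of the same values
d_transformed <- sqrt(d)

rank(d)
rank(d_transformed)

# The values differ, but the rank order is unchanged
identical(rank(d), rank(d_transformed))
```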

Part(a)

Description of the data

The data set shows the airline distances between ten U.S. cities.

City #   City Name
1        Atlanta
2        Chicago
3        Denver
4        Houston
5        Los Angeles
6        Miami
7        New York
8        San Francisco
9        Seattle
10       Washington D.C.

# City names, in the order listed in the table above
cities <- c("Atlanta","Chicago","Denver","Houston","Los_Angeles","Miami","New_York","San_Francisco","Seattle","Washington_D.C")

# Empty 10 x 10 matrix with the city names as row and column labels
Air_line_data_2 <- matrix(ncol = 10, nrow = 10, dimnames = list(cities, cities))

# Fill the lower triangle; R fills it column by column
Air_line_data_2[lower.tri(Air_line_data_2)] <- c(587,1212,701,1936,604,748,2139,2182,543,920,940,1745,1188,713,1858,1737,597,879,831,1726,1631,949,1021,1494,1374,968,1420,1645,1891,1220,2339,2451,347,959,2300,1092,2594,2734,923,2571,2408,205,678,2442,2329)

# Every city is at distance 0 from itself
diag(Air_line_data_2) <- 0
Air_line_data_2 <- as.dist(Air_line_data_2, diag = TRUE)
class(Air_line_data_2)
[1] "dist"
Air_line_data_2
               Atlanta Chicago Denver Houston Los_Angeles Miami New_York
Atlanta              0                                                  
Chicago            587       0                                          
Denver            1212     920      0                                   
Houston            701     940    879       0                           
Los_Angeles       1936    1745    831    1374           0               
Miami              604    1188   1726     968        2339     0         
New_York           748     713   1631    1420        2451  1092        0
San_Francisco     2139    1858    949    1645         347  2594     2571
Seattle           2182    1737   1021    1891         959  2734     2408
Washington_D.C     543     597   1494    1220        2300   923      205
               San_Francisco Seattle Washington_D.C
Atlanta                                            
Chicago                                            
Denver                                             
Houston                                            
Los_Angeles                                        
Miami                                              
New_York                                           
San_Francisco              0                       
Seattle                  678       0               
Washington_D.C          2442    2329              0
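
The fill order matters when a distance matrix is entered by hand: assigning a vector to `m[lower.tri(m)]` fills the lower triangle column by column. A minimal sketch with a hypothetical 4 x 4 matrix (the objects `A` to `D` and the values 1 to 6 are made up for illustration):

```r
# Hypothetical 4 x 4 matrix with labelled rows and columns
m <- matrix(NA_real_, nrow = 4, ncol = 4,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Column 1 receives 1, 2, 3 (rows B, C, D); column 2 receives 4, 5; column 3 receives 6
m[lower.tri(m)] <- c(1, 2, 3, 4, 5, 6)
diag(m) <- 0

# as.dist() reads the lower triangle; as.matrix() mirrors it into a full matrix
d_small <- as.matrix(as.dist(m))

d_small["D", "A"]  # 3
d_small["C", "B"]  # 4
d_small["A", "D"]  # 3, by symmetry
```
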
# as.matrix() expands the dist object into a full symmetric matrix and keeps the city labels
Air_line_data_2_mat <- as.matrix(Air_line_data_2)

Air_line_data_2_mat  
               Atlanta Chicago Denver Houston Los_Angeles Miami New_York
Atlanta              0     587   1212     701        1936   604      748
Chicago            587       0    920     940        1745  1188      713
Denver            1212     920      0     879         831  1726     1631
Houston            701     940    879       0        1374   968     1420
Los_Angeles       1936    1745    831    1374           0  2339     2451
Miami              604    1188   1726     968        2339     0     1092
New_York           748     713   1631    1420        2451  1092        0
San_Francisco     2139    1858    949    1645         347  2594     2571
Seattle           2182    1737   1021    1891         959  2734     2408
Washington_D.C     543     597   1494    1220        2300   923      205
               San_Francisco Seattle Washington_D.C
Atlanta                 2139    2182            543
Chicago                 1858    1737            597
Denver                   949    1021           1494
Houston                 1645    1891           1220
Los_Angeles              347     959           2300
Miami                   2594    2734            923
New_York                2571    2408            205
San_Francisco              0     678           2442
Seattle                  678       0           2329
Washington_D.C          2442    2329              0
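# Classical (metric) MDS: embed the ten cities in k = 2 dimensions
MMDS_1 <- cmdscale(Air_line_data_2, k = 2)
# Empty plot frame; asp = 1 puts both axes on the same scale so plotted distances are comparable
plot(MMDS_1[,1], MMDS_1[,2], type = "n", xlab = "", ylab = "", axes = FALSE,
     asp = 1, main = "cmdscale (stats)")
# Label each point with its city name
text(MMDS_1[,1], MMDS_1[,2], labels(Air_line_data_2), cex = 0.9, xpd = TRUE)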

The resulting plot indicates that Washington and New York are the closest pair of cities. Los Angeles and San Francisco are also close to each other. Miami and Seattle, on the other hand, are the farthest apart, and San Francisco lies far away from Miami, New York, and Washington.
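
One way to gauge how well a two-dimensional configuration reproduces the airline distances is to request the eigenvalue-based goodness of fit from `cmdscale()` and to compare the fitted inter-point distances with the originals. The sketch below rebuilds the distance matrix so that it runs on its own; the names `d_mat`, `air_dist`, `fit`, and `fit_cor` are illustrative and do not appear in the original code.

```r
cities <- c("Atlanta", "Chicago", "Denver", "Houston", "Los_Angeles", "Miami",
            "New_York", "San_Francisco", "Seattle", "Washington_D.C")

# Rebuild the distance matrix from the same lower-triangle values used above
d_mat <- matrix(0, nrow = 10, ncol = 10, dimnames = list(cities, cities))
d_mat[lower.tri(d_mat)] <- c(
  587, 1212, 701, 1936, 604, 748, 2139, 2182, 543,
  920, 940, 1745, 1188, 713, 1858, 1737, 597,
  879, 831, 1726, 1631, 949, 1021, 1494,
  1374, 968, 1420, 1645, 1891, 1220,
  2339, 2451, 347, 959, 2300,
  1092, 2594, 2734, 923,
  2571, 2408, 205,
  678, 2442,
  2329)
air_dist <- as.dist(d_mat)

# eig = TRUE returns a list that includes the eigenvalues and two goodness-of-fit measures
fit <- cmdscale(air_dist, k = 2, eig = TRUE)
fit$GOF

# Distances between the fitted 2-D points, in the same pair order as air_dist
fitted_dist <- dist(fit$points)
fit_cor <- cor(as.vector(air_dist), as.vector(fitted_dist))
fit_cor
```

`GOF` values and a correlation close to 1 would suggest that two dimensions capture the distances well. An MDS configuration is determined only up to rotation and reflection, so the plot may appear flipped relative to a geographic map.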

Part(b)

Description of the data

The data matrix used in this analysis gives the summed dissimilarities between twelve World War II politicians:

(“Hitler, Mussolini, Churchill, Eisenhower, Stalin, Attlee, Franco, De_Gaulle, Mao_Tse, Truman, Chamberlain, Tito”)

library(MASS)
World_war_Politicians_data_1 <- matrix(ncol=12,nrow=12)

colnames(World_war_Politicians_data_1) <- c("Hitler","Mussolini","Churchill","Eisenhower","Stalin","Attlee","Franco","De_Gaulle","Mao_Tse","Truman","Chamberlain","Tito")

rownames(World_war_Politicians_data_1) <- c("Hitler","Mussolini","Churchill","Eisenhower","Stalin","Attlee","Franco","De_Gaulle","Mao_Tse","Truman","Chamberlain","Tito")

class(World_war_Politicians_data_1)
[1] "matrix" "array" 
World_war_Politicians_data_1[lower.tri(World_war_Politicians_data_1)] <- c(5,11,15,8,17,5,10,16,17,12,16,14,16,13,18,3,11,18,18,14,17,7,11,11,12,5,16,8,10,8,16,16,14,8,17,6,7,12,15,13,11,12,14,16,12,16,12,16,12,9,13,9,17,16,10,12,13,9,11,7,12,17,10,9,11,15)
diag(World_war_Politicians_data_1) <- 0
World_war_Politicians_data_1 <- as.dist(World_war_Politicians_data_1, diag = TRUE)


NMMDS_1 = isoMDS(World_war_Politicians_data_1, k = 2)
initial  value 18.887607 
iter   5 value 14.915153
iter  10 value 12.972441
final  value 12.927660 
converged
plot(NMMDS_1$points, type = "n", xlab = "", ylab = "", axes = FALSE)
text(NMMDS_1$points, labels(World_war_Politicians_data_1), cex = 0.9, xpd = TRUE)
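
The `final value` in the trace above is the stress that `isoMDS()` reports, expressed in percent. A rough way to judge the two-dimensional solution is to compare its stress with that of a three-dimensional fit and to draw a Shepard diagram. The sketch below rebuilds the dissimilarity matrix so that it runs on its own; the names `pol_dist`, `fit2`, `fit3`, and `shep` are illustrative and not part of the original code.

```r
library(MASS)

politicians <- c("Hitler", "Mussolini", "Churchill", "Eisenhower", "Stalin", "Attlee",
                 "Franco", "De_Gaulle", "Mao_Tse", "Truman", "Chamberlain", "Tito")

# Rebuild the dissimilarity matrix from the same lower-triangle values used above
m <- matrix(0, nrow = 12, ncol = 12, dimnames = list(politicians, politicians))
m[lower.tri(m)] <- c(
  5, 11, 15, 8, 17, 5, 10, 16, 17, 12, 16,
  14, 16, 13, 18, 3, 11, 18, 18, 14, 17,
  7, 11, 11, 12, 5, 16, 8, 10, 8,
  16, 16, 14, 8, 17, 6, 7, 12,
  15, 13, 11, 12, 14, 16, 12,
  16, 12, 16, 12, 9, 13,
  9, 17, 16, 10, 12,
  13, 9, 11, 7,
  12, 17, 10,
  9, 11,
  15)
pol_dist <- as.dist(m)

# Non-metric MDS in two and three dimensions; trace = FALSE suppresses the iteration log
fit2 <- isoMDS(pol_dist, k = 2, trace = FALSE)
fit3 <- isoMDS(pol_dist, k = 3, trace = FALSE)
c(k2 = fit2$stress, k3 = fit3$stress)

# Shepard diagram: observed dissimilarities against configuration distances,
# with the fitted monotone regression drawn as a step function
shep <- Shepard(pol_dist, fit2$points)
plot(shep, pch = 20, xlab = "Observed dissimilarity", ylab = "Configuration distance")
lines(shep$x, shep$yf, type = "S")
```

With the default settings, the two-dimensional stress should agree with the final value shown in the trace above. Points that lie close to the step function in the Shepard diagram indicate that the rank order of the dissimilarities is well preserved.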

According to the plot above, “Franco” and “Mussolini” are the most similar pair. “Hitler” and “Mussolini” show the second-highest similarity, followed by “De Gaulle” and “Churchill”. On the other hand, “Mussolini” and “Attlee” are the most dissimilar pair in the plot.
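
Because a non-metric plot preserves only rank order, it can help to check the pairs named above against the raw dissimilarities. The sketch below rebuilds the matrix so that it runs on its own and looks up those pairs; the names `pol_mat` and `pairs` are illustrative and not part of the original code.

```r
politicians <- c("Hitler", "Mussolini", "Churchill", "Eisenhower", "Stalin", "Attlee",
                 "Franco", "De_Gaulle", "Mao_Tse", "Truman", "Chamberlain", "Tito")

# Rebuild the dissimilarity matrix from the same lower-triangle values used above
m <- matrix(0, nrow = 12, ncol = 12, dimnames = list(politicians, politicians))
m[lower.tri(m)] <- c(
  5, 11, 15, 8, 17, 5, 10, 16, 17, 12, 16,
  14, 16, 13, 18, 3, 11, 18, 18, 14, 17,
  7, 11, 11, 12, 5, 16, 8, 10, 8,
  16, 16, 14, 8, 17, 6, 7, 12,
  15, 13, 11, 12, 14, 16, 12,
  16, 12, 16, 12, 9, 13,
  9, 17, 16, 10, 12,
  13, 9, 11, 7,
  12, 17, 10,
  9, 11,
  15)
pol_mat <- as.matrix(as.dist(m))

# Raw dissimilarities for the pairs discussed in the interpretation
pol_mat["Franco", "Mussolini"]     # 3
pol_mat["Hitler", "Mussolini"]     # 5
pol_mat["De_Gaulle", "Churchill"]  # 5
pol_mat["Mussolini", "Attlee"]     # 18

# All 66 pairs, sorted from most similar to most dissimilar
idx <- which(lower.tri(pol_mat), arr.ind = TRUE)
pairs <- data.frame(
  a = rownames(pol_mat)[idx[, "row"]],
  b = colnames(pol_mat)[idx[, "col"]],
  dissimilarity = pol_mat[idx]
)
pairs <- pairs[order(pairs$dissimilarity), ]
head(pairs, 5)
tail(pairs, 5)
```

In the raw matrix the largest value, 18, is shared by “Mussolini” with “Attlee”, “Mao_Tse”, and “Truman”, so small differences between these pairs in the plot should be read with caution.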