Abstract

This article applies multidimensional scaling (MDS) techniques to parameters obtained by applying optimization techniques and differential-equation solving to data on the epidemiological evolution of the COVID-19 pandemic in Brazil. MDS provides visual insight into which Brazilian states behave most similarly with respect to the epidemiological parameters obtained.

Introduction

The Pandemic

Please refer to http://seir.candonga.org.br/ for details on how the data were obtained, trimmed, processed and tabulated.

Multidimensional Scaling: a general measure of similarity or preference

MDS is a quantitative interpretive method and a potential alternative for researchers seeking to broaden their understanding of the variables involved in an investigation, since it provides a broader view of the raw data (Babones, 2016). Also known as perceptual mapping, it is a procedure that allows the researcher to determine the perceived relative image of a set of objects (companies, products, ideas or other items associated with commonly held perceptions) (Hair, Black, Babin, Anderson, & Tatham, 2006). It is one of the simplest tools for multidimensional analysis, initially developed to map distances between objects (Johnson & Wichern, 2007), and it aims to transform consumer judgments of similarity or preference (e.g., preference for stores or brands) into distances represented in multidimensional space. It works by comparing objects (e.g., products, services, people, aromas), and it differs from other multivariate methods in that it uses a general measure of similarity or preference. According to Hair et al. (2006), three main steps need to be followed: gathering measures of similarity or preference, employing MDS techniques to estimate the relative position of each object, and interpreting the axes of the dimensional space in terms of perceptual and/or objective attributes. MDS techniques position the objects that respondents judge to be most similar so that the distance between them in multidimensional space is smaller than the distance between any other pair of objects. The resulting perceptual map shows the relative position of all objects. Creating the perceptual map requires techniques that take into account the nature of the responses given by the survey participants about the objects.

Developing a perceptual map follows the steps below (Hair et al., 2006):

Step 1

Research problem. At this stage, the researcher must choose the objectives: identify unrecognized evaluative dimensions, or make a comparative assessment of objects. Research specification. Identify all relevant objects, choose between similarity and preference data, and select a disaggregated or aggregated analysis.

Step 2

Choose a perceptual mapping approach. Are the evaluation attributes specified by the researcher (compositional), or are only general preference measures used (decompositional)? The techniques employed fall into two types: decompositional methods, which measure only the general impression of an object, and compositional methods, which evaluate objects based on a combination of specific attributes. It is important to emphasize that the objects under study must share a set of characteristic attributes that respondents can use as a basis for comparison. The researcher should choose a number of objects that is manageable for the respondent and sufficient to achieve a stable multidimensional solution. The number of objects also affects the determination of an acceptable level of fit. Often, having fewer objects than suggested for a given dimensionality leads to an inflated estimate of fit.

Step 3

Type of evaluation. Is the respondent describing similarities between objects, preferences between them, or a combination of both? Multidimensional scaling imposes no restrictive assumptions on methodology, data type or the form of the relationships between the variables, but it does require the researcher to accept three fundamental principles about perception: variation in dimensionality, variation in importance and variation over time. MDS techniques help the researcher to understand each individual separately and to identify shared perceptions and evaluative dimensions within the sample of respondents.

Step 4

Selection of the basis for the perceptual map. Does the map represent perceptions of similarity or of preference? If it is based on perceptions of similarity, the researcher should consider that the relative positions of the objects reflect similarity on the perceived dimensions. When it is based on preference, however, the researcher should consider the positions of the objects relative to ideal points.

Step 5

Identification of dimensions. Once the perceptual map is obtained, the two approaches - compositional and decompositional - again diverge in their interpretation of the results. The differences in interpretation are supported by the amount of information directly provided in the analysis (e.g., the attributes incorporated in the compositional analysis versus their absence in the decompositional analysis) and the generality of the results for the real decision-making process.

Step 6

Validation of MDS results. Any validation approach tries to assess generality (e.g., similarity across different samples) while maintaining comparability. The researcher should consider validation techniques that meet each requirement to some degree, such as analysis of partitioned samples and comparison of decompositional versus compositional solutions.

Multidimensional scaling gathers only global or holistic measures of similarity or preference and then empirically infers the dimensions (their character and number) that best explain an individual's responses, either separately or collectively. In this technique, the variate used in many other methods becomes the perceptual dimensions inferred from the analysis. As such, the researcher does not have to worry about issues such as specification error, multicollinearity or the statistical characteristics of the variables. The challenge for the researcher, however, is interpreting the variate; without a valid interpretation, the main objectives of MDS are compromised.

Multidimensional Scaling: In depth approach

Multidimensional scaling (MDS) is a technique for visualizing the degree of similarity between individual cases of a dataset. In other words, it translates information about the pairwise distances among a set of n objects into a configuration of n points mapped into an abstract Cartesian space. It is a form of non-linear dimensionality reduction. It originated in psychometrics (Torgerson, 1958) and is now used in diverse fields such as geography, the study of attitudes in psychology, sociology, and market research.

More formally:

“Multidimensional scaling (MDS) is a technique for the analysis of similarity or dissimilarity data on a set of objects. MDS attempts to model such data as distances among points in a geometric space. The main reason for doing this is that one wants a graphical display of the structure of the data, one that is much easier to understand than an array of numbers and, moreover, one that displays the essential information in the data, smoothing out noise.” (Borg & Groenen, 2005)

MDS serves four purposes:

  1. MDS as a method that represents (dis)similarity data as distances in a low-dimensional space in order to make these data accessible to visual inspection and exploration;
  2. MDS as a technique that allows one to test if and how certain criteria by which one can distinguish among different objects of interest are mirrored in corresponding empirical differences of these objects;
  3. MDS as a data-analytic approach that allows one to discover the dimensions that underlie judgments of (dis)similarity;
  4. MDS as a psychological model that explains judgments of dissimilarity in terms of a rule that mimics a particular type of distance function. (Borg & Groenen, 2005)

(Ashkiani, S. 2017)

A good illustration from Borg & Groenen (2005):

Consider an example. The U.S. Statistical Abstract 1970 issued by the Bureau of the Census provides statistics on the rate of different crimes in the 50 U.S. states (Wilkinson, 1990). One question that can be asked about these data is to what extent can one predict a high crime rate of murder, say, by knowing that the crime rate of burglary is high.

MDS Model

MDS transforms the proximity values into disparities by means of a transformation function, such that:

\[\text{disparity}_{ij} = f(\text{proximity}_{ij})\]

The values of \(f(\delta_{ij})\) (\(\delta_{ij}\) is equivalent to \(p_{ij}\) in this report) are often called “fitted distances”, and denoted by \(\hat{d}_{ij}\). They are also sometimes called “disparities”. However, it is important to remember that the fitted distances are not distances, but simply numbers which are fitted to the distances. (Kruskal and Wish, 1978, p29)

Once a model is established, the next step is for an MDS algorithm to find a configuration of the objects in the low-dimensional space (in our case, a 2D space) such that the distances on the “map” are as close as possible to the corresponding disparities. This closeness serves as the index of goodness of fit, i.e. how well the map represents the disparities. To quantify this measure of fit, the Stress-1 index (Kruskal, 1964) is usually used as the loss function. The index is as follows:

\[\text{Stress-1} = \sigma_{1} = \sqrt{ \frac{\sum_{i<j}{[\hat{d}_{ij}-d_{ij}]^2} }{\sum_{i<j}{d_{ij}^2} } } =\sqrt{ \frac{\sum_{i<j}{[f(p_{ij})-d_{ij}]^2} }{\sum_{i<j}{d_{ij}^2} } }\]

As one can see, the Stress-1 formula measures the deviation of the low-dimensional representation from the ideal representation and then normalizes it by dividing by the sum of the squared distances in the denominator.
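The formula can be evaluated by hand in a few lines. The sketch below is only an illustration, not the article's own procedure: it uses the built-in eurodist data and assumes the simplest transformation, \(f(p_{ij}) = p_{ij}\) (ratio MDS with \(b = 1\)), so the disparities equal the proximities. The object names (p, d, d_hat, stress1) are chosen here for illustration.

```r
# Stress-1 computed by hand for a 2-D classical MDS map of eurodist.
# Assumption: f(p_ij) = p_ij, i.e. the disparities equal the proximities.
p     <- as.vector(eurodist)             # proximities p_ij (road distances, km)
fit   <- cmdscale(eurodist, k = 2)       # 2-D configuration of the 21 cities
d     <- as.vector(dist(fit))            # distances d_ij measured on the map
d_hat <- p                               # disparities f(p_ij)

# Numerator: squared misfit; denominator: squared map distances
stress1 <- sqrt(sum((d_hat - d)^2) / sum(d^2))
stress1
```

A value of 0 would mean that the map reproduces every disparity exactly; larger values indicate a poorer fit.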

In practice, the computing procedure iteratively changes the positions of the observations in the MDS configuration in order to minimize Stress-1 or a similar index. While the ideal would be an exact representation of the disparities as the between-object distances on the map, this usually cannot be achieved, and what we get is an approximation of the disparities by distances:

\[f(p_{ij}) \approx d_{ij}\]

This approximation, although intrinsically imprecise, is preferable to an exact representation:

Given that the proximities contain some error, such approximate representations make even better representations— more robust, reliable, replicable, and substantively meaningful ones—than those that are formally perfect, because they may smooth out noise. (Borg and Groenen, 2005, p41)

As stated before, the choice of transformation function defines the MDS model. In practice one usually chooses a class of functions rather than an exact function with explicit parameters. Depending on this choice, ratio MDS, interval MDS, or ordinal MDS can be applied to a suitable dataset. While more models are available, here we briefly review these three function types, since they are the most prevalent in practice.

In interval MDS, a linear function is used as the transformation function:

\[ p_{ij} \longrightarrow a + b \cdot p_{ij} \]

The parameters a and b are determined by the computational procedure, so there is no need to worry about them. If \(a=0\), we have ratio MDS:

\[p_{ij} \longrightarrow b \cdot p_{ij}\]

In ratio MDS, however, the proximities are assumed to have a fixed origin and no such arbitrary additive constants are admissible.(Borg and Groenen, 2005, p34)

These two types of transformation are called metric scaling.

where metric refers to the type of transformation of the dissimilarities and not the space in which a configuration of points is sought. (Cox and Cox, 2001, p6)
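To make the two metric transformations concrete, the sketch below estimates them by least squares for a fixed configuration. This is a simplification: MDS software alternates this step with updating the configuration, whereas here the map (a classical solution for the built-in eurodist data) is held fixed. The object names are illustrative only.

```r
# Interval (a + b * p) and ratio (b * p) transformations fitted by least
# squares to the distances of a fixed 2-D configuration.
p <- as.vector(eurodist)                           # proximities p_ij
d <- as.vector(dist(cmdscale(eurodist, k = 2)))    # map distances d_ij

interval_fit <- lm(d ~ p)        # interval MDS: disparities a + b * p_ij
ratio_fit    <- lm(d ~ 0 + p)    # ratio MDS: disparities b * p_ij (a = 0)

coef(interval_fit)               # estimated a (intercept) and b (slope)
coef(ratio_fit)                  # estimated b only

d_hat_interval <- fitted(interval_fit)   # disparities under the interval model
d_hat_ratio    <- fitted(ratio_fit)      # disparities under the ratio model
```

Because the ratio model is the interval model with a fixed at 0, its residual sum of squares can never be smaller than that of the interval model.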

In contrast, ordinal MDS is a form of non-metric scaling. In non-metric scaling, the transformation function preserves only the rank order of the proximities. Thus the non-metric function must satisfy the monotonicity constraint (Cox and Cox, 2001, p7):

\[p_{ij} < p_{kl} \Rightarrow f(p_{ij}) \leq f(p_{kl})\]

So in a non-metric MDS configuration, the absolute value of distances is not meaningful, since the algorithm tries to represent the rank-order, i.e. relative distances.
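One way to obtain a transformation that respects the rank order is isotonic (monotone) regression of the map distances on the proximities. The sketch below uses isoreg() from base R on a fixed classical configuration of the built-in eurodist data; it illustrates the monotone fitting step only and is not a full non-metric MDS algorithm.

```r
# Monotone regression: disparities that never decrease as proximities grow.
p <- as.vector(eurodist)                           # proximities p_ij
d <- as.vector(dist(cmdscale(eurodist, k = 2)))    # map distances d_ij

iso      <- isoreg(p, d)     # least-squares monotone fit of d on p
d_hat    <- iso$yf           # disparities, ordered by increasing proximity
p_sorted <- sort(p)          # proximities in the same order as d_hat

# The disparities are non-decreasing in the proximities
all(diff(d_hat) >= -1e-8)
```

Only the ordering of the proximities enters the fit, so any strictly increasing rescaling of p would produce the same disparities.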

It is important to emphasize that these ratio, interval and ordinal types of MDS should not be confused with the levels of data measurement that bear the same names. The MDS types describe how proximity scores are transformed into disparities. However, when the proximity data are ordinal, e.g. a customer's rank-order preference for a commodity, the rational choice is to apply an ordinal transformation to the data.

MDS in R

There are several functions and packages for multidimensional scaling. The ones we will use in this exercise are:

  • cmdscale() [stats package]: Compute classical (metric) multidimensional scaling.
  • isoMDS() [MASS package]: Compute Kruskal’s non-metric multidimensional scaling (one form of non-metric MDS).
  • sammon() [MASS package]: Compute Sammon’s non-linear mapping (one form of non-metric MDS).

All these functions take a distance object as their main argument. The parameter k sets the desired number of dimensions in the scaled output; it defaults to 2, so the functions return two-dimensional solutions unless we change it.
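As a brief illustration of the k parameter, the sketch below requests three-dimensional solutions from each function, using the built-in swiss data as an assumed example input. The argument trace = FALSE only suppresses the iteration log.

```r
library(MASS)   # isoMDS() and sammon()

d <- dist(swiss)                          # Euclidean distances between the 47 provinces

fit2 <- cmdscale(d)                       # k defaults to 2
fit3 <- cmdscale(d, k = 3)                # classical MDS in 3 dimensions
iso3 <- isoMDS(d, k = 3, trace = FALSE)   # Kruskal's non-metric MDS in 3 dimensions
sam3 <- sammon(d, k = 3, trace = FALSE)   # Sammon mapping in 3 dimensions

dim(fit2)          # 47 rows, 2 columns
dim(fit3)          # 47 rows, 3 columns
dim(iso3$points)   # coordinates are stored in the $points component
dim(sam3$points)
```

Note that cmdscale() returns the coordinate matrix directly (unless eig = TRUE is set), while isoMDS() and sammon() return a list whose points component holds the coordinates.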

The eurodist dataset: Distances Between European Cities

The eurodist dataset gives the road distances (in km) between 21 cities in Europe. The data are taken from a table in The Cambridge Encyclopaedia.

It is stored as a dist object, which holds only the lower triangle of the distances; as.matrix() expands it into the full symmetric matrix of dissimilarities.
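The structure of the dataset can be inspected directly. The short sketch below only uses base R functions; the name m is chosen here for illustration.

```r
class(eurodist)            # "dist"
attr(eurodist, "Size")     # 21 cities
length(eurodist)           # 210 = 21 * 20 / 2 pairwise distances

m <- as.matrix(eurodist)   # expand to the full 21 x 21 matrix
dim(m)
isSymmetric(m)             # TRUE: the distance from A to B equals that from B to A
all(diag(m) == 0)          # TRUE: every city is at distance 0 from itself
```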

We use the classic MDS implementation by cmdscale() function.

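# Packages used throughout this exercise
library(MASS)    # isoMDS(), sammon(), Shepard()
library(tibble)  # as_tibble()
library(dplyr)   # %>% and mutate()
library(ggpubr)  # ggscatter()

# Loading the dataset: Distance between European cities
data("eurodist")

# Inspecting the data: first 5 rows and columns
euromat <- as.matrix(eurodist)
euromat[1:5, 1:5]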
          Athens Barcelona Brussels Calais Cherbourg
Athens         0      3313     2963   3175      3339
Barcelona   3313         0     1318   1326      1294
Brussels    2963      1318        0    204       583
Calais      3175      1326      204      0       460
Cherbourg   3339      1294      583    460         0
fit_e <- cmdscale(euromat, eig = TRUE, k = 2)        # The MDS being performed

mds_e <- as_tibble(fit_e$points)                     # The coordinates from the MDS object
colnames(mds_e) <- c("Dim.1", "Dim.2")
ggscatter(mds_e, x = "Dim.1", y = "Dim.2",           # The plot
          label = colnames(euromat),
          size = 1,
          title = "MDS: Distance between European cities",
          repel = TRUE)
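Because eig = TRUE was requested, the object returned by cmdscale() also carries the eigenvalues and a goodness-of-fit summary. The sketch below shows how these could be inspected; it recomputes the fit on eurodist so that it runs on its own, and the name fit is illustrative.

```r
fit <- cmdscale(eurodist, eig = TRUE, k = 2)

fit$GOF            # two goodness-of-fit values between 0 and 1 for k = 2
head(fit$eig)      # largest eigenvalues; the first k define the map
any(fit$eig < 0)   # negative eigenvalues signal non-Euclidean dissimilarities
```

The two GOF values differ only in how negative eigenvalues enter the denominator (absolute values versus truncation at zero); values close to 1 suggest that two dimensions capture most of the structure in the distances.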

Distances between Brazilian capitals

Again we work with road distances, so the map tends to differ somewhat from the geographic one.

# Road distances between Brazilian capitals

library(readxl)  # read_excel()

# Loading from a spreadsheet with the distances

D <- read_excel("D:/Google Drive/COPPEAD/R/WD/Distancias.xlsx", na = "0", skip = 1)

# Cleaning and trimming the data

D[is.na(D)] <- 0        # zeros were read as NA; restore them
D <- as.data.frame(D)

rownames(D) <- D[, 1]   # the first column holds the city names
D <- D[, -1]

fit_D <- cmdscale(D, eig = TRUE, k = 2)

# Flipping both axes so the map resembles the usual geographic orientation
# (an MDS solution is only determined up to rotation and reflection)

fit_D$points[, 1] <- -fit_D$points[, 1]
fit_D$points[, 2] <- -fit_D$points[, 2]

mds_D <- as_tibble(fit_D$points)
colnames(mds_D) <- c("Dim.1", "Dim.2")
ggscatter(mds_D, x = "Dim.1", y = "Dim.2", label = colnames(D), size = 1, title = "MDS: Road distances between Brazilian capitals", 
    repel = TRUE)

Dataset swiss: Swiss Fertility and Socioeconomic Indicators (1888) Data

A data frame with 47 observations on 6 variables, each of which is in percent, i.e., in [0, 100].

  • [,1] Fertility Ig, ‘common standardized fertility measure’
  • [,2] Agriculture % of males involved in agriculture as occupation
  • [,3] Examination % draftees receiving highest mark on army examination
  • [,4] Education % education beyond primary school for draftees.
  • [,5] Catholic % ‘catholic’ (as opposed to ‘protestant’).
  • [,6] Infant.Mortality live births who live less than 1 year.

All variables but ‘Fertility’ give proportions of the population.

Switzerland, in 1888, was entering a period known as the demographic transition; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries.

The data collected are for 47 French-speaking “provinces” at about 1888.

Here, all variables are scaled to [0, 100], where in the original, all but “Catholic” were scaled to [0, 1].

Province Fertility Agriculture Examination Education Catholic Infant.Mortality
Courtelary 80.2 17.0 15 12 9.96 22.2
Delemont 83.1 45.1 6 9 84.84 22.2
Franches-Mnt 92.5 39.7 5 5 93.40 20.2
Moutier 85.8 36.5 12 7 33.77 20.3
Neuveville 76.9 43.5 17 15 5.16 20.6
Porrentruy 76.1 35.3 9 7 90.57 26.6
Broye 83.8 70.2 16 7 92.85 23.6
Glane 92.4 67.8 14 8 97.16 24.9
Gruyere 82.4 53.3 12 7 97.67 21.0
Sarine 82.9 45.2 16 13 91.38 24.4
Veveyse 87.1 64.5 14 6 98.61 24.5
Aigle 64.1 62.0 21 12 8.52 16.5
Aubonne 66.9 67.5 14 7 2.27 19.1
Avenches 68.9 60.7 19 12 4.43 22.7
Cossonay 61.7 69.3 22 5 2.82 18.7
Echallens 68.3 72.6 18 2 24.20 21.2
Grandson 71.7 34.0 17 8 3.30 20.0
Lausanne 55.7 19.4 26 28 12.11 20.2
La Vallee 54.3 15.2 31 20 2.15 10.8
Lavaux 65.1 73.0 19 9 2.84 20.0
Morges 65.5 59.8 22 10 5.23 18.0
Moudon 65.0 55.1 14 3 4.52 22.4
Nyone 56.6 50.9 22 12 15.14 16.7
Orbe 57.4 54.1 20 6 4.20 15.3
Oron 72.5 71.2 12 1 2.40 21.0
Payerne 74.2 58.1 14 8 5.23 23.8
Paysd'enhaut 72.0 63.5 6 3 2.56 18.0
Rolle 60.5 60.8 16 10 7.72 16.3
Vevey 58.3 26.8 25 19 18.46 20.9
Yverdon 65.4 49.5 15 8 6.10 22.5
Conthey 75.5 85.9 3 2 99.71 15.1
Entremont 69.3 84.9 7 6 99.68 19.8
Herens 77.3 89.7 5 2 100.00 18.3
Martigwy 70.5 78.2 12 6 98.96 19.4
Monthey 79.4 64.9 7 3 98.22 20.2
St Maurice 65.0 75.9 9 9 99.06 17.8
Sierre 92.2 84.6 3 3 99.46 16.3
Sion 79.3 63.1 13 13 96.83 18.1
Boudry 70.4 38.4 26 12 5.62 20.3
La Chauxdfnd 65.7 7.7 29 11 13.79 20.5
Le Locle 72.7 16.7 22 13 11.22 18.9
Neuchatel 64.4 17.6 35 32 16.92 23.0
Val de Ruz 77.6 37.6 15 7 4.97 20.0
ValdeTravers 67.6 18.7 25 7 8.65 19.5
V. De Geneve 35.0 1.2 37 53 42.34 18.0
Rive Droite 44.7 46.6 16 29 50.43 18.2
Rive Gauche 42.8 27.7 22 29 58.33 19.3

Along with cmdscale() we will use the function dist() to calculate the distance between the observations.

Available distance measures are (written for two vectors x and y):

euclidean: Usual distance between the two vectors (2 norm aka L_2), sqrt(sum((x_i - y_i)^2)).

maximum: Maximum distance between two components of x and y (supremum norm)

manhattan: Absolute distance between the two vectors (1 norm aka L_1).

canberra: sum(|x_i - y_i| / (|x_i| + |y_i|)). Terms with zero numerator and denominator are omitted from the sum and treated as if the values were missing.

This is intended for non-negative values (e.g., counts), in which case the denominator can be written in various equivalent ways. Originally, R used x_i + y_i, then from 1998 to 2017, |x_i + y_i|, and then the correct |x_i| + |y_i|.

binary: (aka asymmetric binary): The vectors are regarded as binary bits, so non-zero elements are ‘on’ and zero elements are ‘off’. The distance is the proportion of bits in which only one is on amongst those in which at least one is on.

minkowski: The p norm, the pth root of the sum of the pth powers of the differences of the components.
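The measures listed above can be compared on a small hand-checkable case. The two vectors below are made-up values chosen only so that the arithmetic is easy to verify.

```r
# Two illustrative vectors (hypothetical values); differences are 3, 4 and 5
x <- c(1, 2, 3)
y <- c(4, 6, 8)
m <- rbind(x, y)   # dist() computes distances between the rows of a matrix

dist(m, method = "euclidean")          # sqrt(3^2 + 4^2 + 5^2) = sqrt(50)
dist(m, method = "maximum")            # max(3, 4, 5) = 5
dist(m, method = "manhattan")          # 3 + 4 + 5 = 12
dist(m, method = "canberra")           # 3/5 + 4/8 + 5/11
dist(m, method = "minkowski", p = 3)   # (27 + 64 + 125)^(1/3) = 6
```

With p = 1 the Minkowski distance reduces to the Manhattan distance, and with p = 2 to the Euclidean distance.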

Applying dist() with the default parameters on the dataset

# Compute MDS



mds <- swiss %>% dist() %>% cmdscale() %>% as_tibble()
colnames(mds) <- c("Dim.1", "Dim.2")
# Plot MDS
ggscatter(mds, x = "Dim.1", y = "Dim.2", label = rownames(swiss), size = 1, title = "MDS: Classical", 
    repel = TRUE)

Using Sammon’s Non-Linear Mapping

# Compute MDS

mds <- swiss %>% dist() %>% sammon() %>% .$points %>% as_tibble()
Initial stress        : 0.01959
stress after   0 iters: 0.01959
colnames(mds) <- c("Dim.1", "Dim.2")
# Plot MDS
ggscatter(mds, x = "Dim.1", y = "Dim.2", label = rownames(swiss), size = 1, title = "MDS: Sammon", 
    repel = TRUE)

Using Kruskal’s Non-metric Multidimensional Scaling

# Compute MDS

mds <- swiss %>% dist() %>% isoMDS() %>% .$points %>% as_tibble()
initial  value 5.463800 
iter   5 value 4.499103
iter   5 value 4.495335
iter   5 value 4.492669
final  value 4.492669 
converged
colnames(mds) <- c("Dim.1", "Dim.2")
# Plot MDS
ggscatter(mds, x = "Dim.1", y = "Dim.2", label = rownames(swiss), size = 1, title = "MDS: Kruskal's Non-metric Multidimensional Scaling", 
    repel = TRUE)

The Shepard diagram

A Shepard diagram is a scatter plot that compares how far apart the data points are before and after the transformation, i.e. it shows the goodness of fit.

A really accurate dimension reduction produces points that lie along a straight line. However, since information is almost always lost during dimension reduction, at least on real, high-dimensional data, Shepard diagrams rarely look that straight.

swiss_scale <- scale(swiss)

swiss_dist <- dist(swiss_scale)

swiss_mds <- isoMDS(swiss_dist)
initial  value 12.906919 
iter   5 value 9.546640
final  value 9.414896 
converged
mds_ISO__sh <- Shepard(swiss_dist, swiss_mds$points)

plot(mds_ISO__sh, pch = ".", xlab = "Dissimilarity", ylab = "Distance", xlim = range(mds_ISO__sh$x), 
    ylim = range(mds_ISO__sh$x))
lines(mds_ISO__sh$x, mds_ISO__sh$yf, type = "S")
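The quantities plotted in the Shepard diagram are the same ones that enter the stress formula, so the stress can be recomputed from them as a cross-check. The sketch below repeats the steps above under new, illustrative object names and assumes that isoMDS() reports Stress-1 as a percentage, as described in its documentation; the two numbers should agree closely.

```r
library(MASS)   # isoMDS() and Shepard()

sw_dist <- dist(scale(swiss))               # standardized data, Euclidean distances
sw_mds  <- isoMDS(sw_dist, trace = FALSE)   # non-metric MDS, 2 dimensions
sh      <- Shepard(sw_dist, sw_mds$points)

# sh$x:  original dissimilarities, sorted in increasing order
# sh$y:  distances on the map, in the same order
# sh$yf: monotone fit to sh$y (the disparities)
stress_pct <- 100 * sqrt(sum((sh$y - sh$yf)^2) / sum(sh$y^2))

c(manual = stress_pct, isoMDS = sw_mds$stress)
```

The step function drawn by lines() in the diagram is exactly sh$yf, so the vertical gaps between the points and that line are the terms summed in the numerator.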

Visualizing a correlation matrix using Multidimensional Scaling

The dataset: mtcars

The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

Format

A data frame with 32 observations on 11 (numeric) variables.

  • [, 1] mpg Miles/(US) gallon
  • [, 2] cyl Number of cylinders
  • [, 3] disp Displacement (cu.in.)
  • [, 4] hp Gross horsepower
  • [, 5] drat Rear axle ratio
  • [, 6] wt Weight (1000 lbs)
  • [, 7] qsec 1/4 mile time
  • [, 8] vs Engine (0 = V-shaped, 1 = straight)
  • [, 9] am Transmission (0 = automatic, 1 = manual)
  • [,10] gear Number of forward gears
  • [,11] carb Number of carburetors
mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2

The correlation

MDS can also be used to reveal hidden patterns in a correlation matrix.

Correlation actually measures similarity, but it is easy to transform it into a measure of dissimilarity. The distance between objects can be calculated as 1 - res.cor.

Positively correlated objects appear close together on the same side of the plot.

library(ggfortify)  # autoplot() method for matrices

res.cor <- cor(mtcars, method = "spearman")
autoplot(1 - res.cor)

mds.cor <- (1 - res.cor) %>% cmdscale() %>% as_tibble()
colnames(mds.cor) <- c("Dim.1", "Dim.2")
ggscatter(mds.cor, x = "Dim.1", y = "Dim.2", size = 1, label = colnames(res.cor), 
    title = "MDS: Correlation Matrix", repel = TRUE)
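Before passing 1 - res.cor to cmdscale(), it can help to confirm that it behaves like a dissimilarity matrix. The sketch below recomputes the matrix so that it runs on its own; the names diss and mds_xy are illustrative.

```r
res.cor <- cor(mtcars, method = "spearman")
diss    <- 1 - res.cor           # 0 for perfect positive, 2 for perfect negative correlation

isSymmetric(diss)                # dissimilarities must be symmetric
max(abs(diag(diss)))             # each variable is at distance 0 from itself
range(diss)                      # all values lie between 0 and 2

diss["cyl", "disp"]              # positively correlated pair: small dissimilarity
diss["mpg", "wt"]                # negatively correlated pair: dissimilarity above 1

mds_xy <- cmdscale(as.dist(diss), k = 2)   # one point per variable
dim(mds_xy)
```

With this transformation, negatively correlated variables end up far apart, which is why they land on opposite sides of the plot.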

MDS in R: COVID-19 parameter data

Overall Status: Brazilian Landscape

The parameters shown in the table below answer the following question: when a SEIR model is fitted computationally to the known progression of the COVID-19 pandemic, using official data and optimizing against the evolution of the number of deaths up to the last date with actual data, which parameters best fit the observed reality?

The graphs shown at http://seir.candonga.org.br/ were generated using these parameters.

library(gt)  # gt(), tab_header(), tab_source_note(), tab_footnote()

IST <- readRDS("D:/Google Drive/COPPEAD/R/WD/IST.rds")

IST %>% gt() %>% tab_header(title = md("**Brazilian Parameters by State/Region**"), 
    subtitle = "The simulation's parameters fitted to the accumulated COVID-19 deaths") %>% 
    tab_source_note(source_note = "Source: Differential Evolution Algorithm Simulation with the data provided by https://covid.saude.gov.br/.") %>% 
    tab_footnote(footnote = "The time reference is in days", locations = cells_column_labels(columns = vars("Time from infected to symptomatic")))
Brazilian Parameters by State/Region
The simulation's parameters fitted to the accumulated COVID-19 deaths

State/Region | Initial number of infected individuals | Estimated rate of infection | Time from infected to symptomatic [1] | Time, infected recovering or progressing towards hospitalization | Time, ICU individuals returning to the hospital or dying | Time, hospitalized recovering or going critical | % of infectious that are asymptomatic or mild | % of fraction of severe cases that turn critical | % of fraction of critical cases that are fatal
Brazil | 9 | 2.2 | 9.7 | 3.5 | 2.1 | 1.3 | 51% | 87% | 90%
Acre | 9 | 1.33 | 9.9 | 3.1 | 3.6 | 3 | 50% | 89% | 89%
Alagoas | 9 | 1.67 | 10 | 4.5 | 2.1 | 1.7 | 54% | 87% | 79%
Amapá | 9 | 1.49 | 8.3 | 4.3 | 3.7 | 1.1 | 53% | 89% | 70%
Amazonas | 9 | 1.51 | 8.1 | 3.2 | 2.2 | 1.7 | 52% | 89% | 89%
Bahia | 9 | 1.62 | 8.7 | 4.3 | 2.1 | 1.1 | 51% | 90% | 89%
Ceará | 9 | 1.68 | 8.6 | 3.8 | 2 | 1 | 51% | 90% | 89%
Distrito Federal | 9 | 1.7 | 8.1 | 4.4 | 3.5 | 3.7 | 63% | 82% | 72%
Espírito Santo | 8 | 1.58 | 8.8 | 3.9 | 2.2 | 1.5 | 53% | 89% | 83%
Goiás | 6 | 2.67 | 10.4 | 7.5 | 5.5 | 6.1 | 80% | 42% | 57%
Maranhão | 9 | 1.61 | 8.8 | 3.3 | 3.2 | 1.5 | 53% | 89% | 89%
Mato Grosso | 5 | 3.32 | 15.6 | 9.2 | 12 | 4.7 | 85% | 76% | 70%
Mato Grosso do Sul | 7 | 3.1 | 16.6 | 8.8 | 2.2 | 2.5 | 88% | 42% | 66%
Minas Gerais | 9 | 1.52 | 8.3 | 3.6 | 2.3 | 2.5 | 51% | 85% | 59%
Pará | 9 | 1.81 | 10.9 | 3.4 | 2.5 | 1.2 | 50% | 90% | 85%
Paraíba | 9 | 1.62 | 8.1 | 5.2 | 3.4 | 1.4 | 50% | 89% | 90%
Paraná | 3 | 2.7 | 14.3 | 6.8 | 2.5 | 2.3 | 72% | 66% | 73%
Pernambuco | 9 | 1.49 | 8.5 | 3.3 | 2.2 | 1.1 | 51% | 89% | 88%
Piauí | 8 | 2.03 | 12.9 | 6.1 | 10.3 | 7.3 | 51% | 73% | 90%
Rio de Janeiro | 9 | 1.66 | 8.1 | 4.2 | 2.2 | 1.1 | 51% | 90% | 88%
Rio Grande do Norte | 9 | 1.89 | 9.3 | 7.4 | 8.2 | 8.8 | 50% | 88% | 88%
Rio Grande do Sul | 9 | 1.4 | 8.1 | 3.8 | 2.2 | 2.1 | 52% | 87% | 88%
Rondônia | 9 | 1.46 | 8.5 | 3.1 | 2.1 | 9 | 64% | 84% | 84%
Roraima | 9 | 1.33 | 9.1 | 4.1 | 8 | 5.8 | 51% | 85% | 82%
Santa Catarina | 8 | 1.43 | 9.7 | 3.3 | 2.6 | 1.5 | 50% | 89% | 78%
São Paulo | 9 | 1.8 | 8.8 | 3.3 | 3 | 1.1 | 50% | 88% | 83%
Sergipe | 9 | 1.61 | 8.6 | 4.8 | 3.6 | 8 | 52% | 79% | 70%
Tocantins | 9 | 1.3 | 8 | 3.5 | 9.7 | 6.1 | 50% | 85% | 63%

Source: Differential Evolution Algorithm Simulation with the data provided by https://covid.saude.gov.br/.

[1] The time reference is in days.

Given the aforementioned data, we can apply classical MDS with cmdscale().

# Trimming and adjusting the data

# Strip the "%" sign from the three percentage columns
IST[, 8] <- IST[, 8] %>% gsub(pattern = "%", replacement = "")
IST[, 9] <- IST[, 9] %>% gsub(pattern = "%", replacement = "")
IST[, 10] <- IST[, 10] %>% gsub(pattern = "%", replacement = "")

# Use the State/Region column as row names
IST_NUM <- IST
rownames(IST_NUM) <- IST_NUM[, 1]
IST_NUM <- IST_NUM[, -1]

IST_NUM <- as.matrix(IST_NUM)

IST_NUM_1 <- mapply(IST_NUM, FUN = as.numeric)  # Character to numeric

IST_NUM_1 <- matrix(data = IST_NUM_1, ncol = ncol(IST_NUM), nrow = nrow(IST_NUM))

rownames(IST_NUM_1) <- rownames(IST_NUM)  # Restoring the names of the dimensions
colnames(IST_NUM_1) <- colnames(IST_NUM)

IST_SCALE <- scale(IST_NUM_1)  # Standardizing: center each column and scale it to unit variance

IST_DIST <- dist(x = IST_SCALE)  # Euclidean distances between states (the dist() default)

fit_IST <- cmdscale(IST_DIST, eig = TRUE, k = 2)  # Classical MDS


mds_IST <- as_tibble(fit_IST$points)
colnames(mds_IST) <- c("Dim.1", "Dim.2")
ggscatter(mds_IST, x = "Dim.1", y = "Dim.2", label = rownames(IST_NUM_1), size = 1, 
    repel = TRUE)
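The scale() call matters because the parameters are measured in different units (counts, rates, days and percentages), and without standardization the columns with the largest spread would dominate the Euclidean distances. Since the IST.rds file is not available here, the sketch below illustrates the effect on the built-in swiss data as a stand-in; the object names are illustrative.

```r
raw_sd <- apply(swiss, 2, sd)             # columns have very different spreads
round(raw_sd, 1)

swiss_scaled <- scale(swiss)              # center to mean 0, scale to sd 1
round(colMeans(swiss_scaled), 10)         # all (numerically) zero
apply(swiss_scaled, 2, sd)                # all equal to 1

d_raw    <- dist(swiss)                   # dominated by the high-variance columns
d_scaled <- dist(swiss_scaled)            # every column contributes on the same footing

# How strongly the two sets of pairwise distances agree
cor(as.vector(d_raw), as.vector(d_scaled))
```

A correlation well below 1 would indicate that standardizing changes which observations count as similar, and therefore changes the resulting MDS map.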

Next, we experiment with k-means clustering on the MDS coordinates.

# Experiment in clustering: K-means

library(factoextra)  # fviz_nbclust()

set.seed(123)  # kmeans() starts from random centers; fix the seed for reproducibility
clust <- kmeans(mds_IST, 2)$cluster %>% as.factor()

fviz_nbclust(mds_IST, FUNcluster = kmeans)  # Suggested number of clusters

mds_IST <- mds_IST %>% mutate(groups = clust)
# Plot and color by groups
ggscatter(mds_IST, x = "Dim.1", y = "Dim.2", label = rownames(IST_NUM_1), color = "groups", 
    palette = "jco", size = 1, ellipse = TRUE, ellipse.type = "convex", repel = TRUE)
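The choice of the number of clusters can also be examined with base R alone. The sketch below is a swapped-in alternative to fviz_nbclust(), not the method used above: it applies the elbow heuristic (total within-cluster sum of squares for several values of k) to MDS coordinates of the built-in swiss data, used here as a stand-in for the COVID-19 parameters. The seed and the range 1 to 6 are arbitrary choices.

```r
set.seed(123)   # arbitrary seed so that the random starts are reproducible

coords <- cmdscale(dist(scale(swiss)), k = 2)   # 2-D classical MDS coordinates

# Total within-cluster sum of squares for k = 1, ..., 6
wss <- sapply(1:6, function(k) {
  kmeans(coords, centers = k, nstart = 25)$tot.withinss
})
names(wss) <- 1:6
wss

# Cluster membership for a chosen k (here 2, as in the experiment above)
clusters <- kmeans(coords, centers = 2, nstart = 25)$cluster
table(clusters)
```

The value of k after which wss stops dropping sharply is a common, if informal, candidate for the number of clusters.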

The Shepard Diagram

IST_ISO_MDS <- isoMDS(IST_DIST)
initial  value 11.747231 
iter   5 value 7.391638
iter  10 value 5.564691
final  value 5.428759 
converged
mds_ISO__sh <- Shepard(IST_DIST, IST_ISO_MDS$points)

plot(mds_ISO__sh, pch = ".", xlab = "Dissimilarity", ylab = "Distance", xlim = range(mds_ISO__sh$x), 
    ylim = range(mds_ISO__sh$x))
lines(mds_ISO__sh$x, mds_ISO__sh$yf, type = "S")

Shepard diagram for the MDS solution. The vertical distance between a point and the regression line gives the error of the corresponding distance in the MDS representation.

Bibliography

Gower, J. C. (1966). Some Distance Properties of Latent Root and Vector Methods Used in Multivariate Analysis. Biometrika, 53(3/4), 325. https://doi.org/10.2307/2333639

Hair, J. F., Jr., Black, W. C., Babin, B. J., & Anderson, R. E. (2018). Multivariate Data Analysis. https://doi.org/10.1002/9781119409137.ch4

Kruskal, J., & Wish, M. (1978). Multidimensional Scaling. SAGE Publications, Inc. https://doi.org/10.4135/9781412985130

Mair, P., Groenen, P. J. F., & Leeuw, J. de. (2019). More on Multidimensional Scaling and Unfolding in R: smacof Version 2. https://cran.r-project.org/package=smacof

Mardia, K. V. (1978). Some properties of clasical multi-dimesional scaling. Communications in Statistics - Theory and Methods, 7(13), 1233–1241. https://doi.org/10.1080/03610927808827707

Aoki, S., Toyozumi, K., & Tsuji, H. (2007). Visualizing method for data envelopment analysis. 2007 IEEE International Conference on Systems, Man and Cybernetics, 474–479. https://doi.org/10.1109/ICSMC.2007.4413784

Torgerson, W. S. (1958). Theory and Methods of Scaling.

Vehkalahti, K., & Everitt, B. S. (2018). Multivariate Analysis for the Behavioral Sciences. CRC Press. https://doi.org/10.1201/9781351202275