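This report contains two multivariate statistical analyses: principal component analysis (PCA) of socioeconomic indicators for 290 Swedish municipalities, and non-metric multidimensional scaling (NMDS) of gut microbiome composition data.
Both methods reduce many variables to a few dimensions that can be visualized and interpreted.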
## tibble [290 × 19] (S3: tbl_df/tbl/data.frame)
## $ municipality : chr [1:290] "Ale" "Alingsås" "Alvesta" "Aneby" ...
## $ region : chr [1:290] "Götaland" "Götaland" "Götaland" "Götaland" ...
## $ county : chr [1:290] "Västra Götalands län" "Västra Götalands län" "Kronobergs län" "Jönköpings län" ...
## $ income : chr [1:290] "Middle" "Middle" "Low" "Low" ...
## $ pop.size : num [1:290] 30223 40390 20026 6776 13934 ...
## $ area : num [1:290] 317 472 974 518 325 ...
## $ mean.age : num [1:290] 39.5 42.1 41.8 42.4 44.2 47 45.3 44.6 46.2 43.9 ...
## $ mortality : num [1:290] 0.84 1.005 0.939 0.945 1.184 ...
## $ natality : num [1:290] 1.18 1.05 1.31 1.45 1.08 ...
## $ pop.change : num [1:290] 2.23 0.854 0.879 2.553 0.222 ...
## $ immigration : num [1:290] 7.41 4.88 6.82 8.09 5.96 ...
## $ emigration : num [1:290] 5.52 4.12 6.31 6.02 5.76 ...
## $ tax.capacity : num [1:290] 197627 199056 174595 181317 177804 ...
## $ tax.equal : num [1:290] -462 -1025 2357 -837 -885 ...
## $ unemployment : num [1:290] 3.9 5.1 8.5 5.5 9.4 4.8 7.6 6.6 5.2 11.8 ...
## $ foreign.origin : num [1:290] 21.8 14.6 24.3 14.8 18.2 ...
## $ higher.edu : num [1:290] 21 24.2 17.2 17.7 18.9 ...
## $ greenhouse.gases: num [1:290] 103773 105709 119691 50356 62066 ...
## $ dioxin.mg : num [1:290] 29.5 69.2 69.7 18.5 35 ...
## municipality region county income
## Length:290 Length:290 Length:290 Length:290
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## pop.size area mean.age mortality
## Min. : 2451 Min. : 8.69 Min. :36.30 Min. :0.5156
## 1st Qu.: 9970 1st Qu.: 344.96 1st Qu.:41.40 1st Qu.:0.9211
## Median : 15937 Median : 670.47 Median :43.50 Median :1.0888
## Mean : 34897 Mean : 1404.52 Mean :43.25 Mean :1.0932
## 3rd Qu.: 34905 3rd Qu.: 1291.24 3rd Qu.:45.08 3rd Qu.:1.2521
## Max. :949761 Max. :19155.37 Max. :49.60 Max. :2.1542
## natality pop.change immigration emigration
## Min. :0.4913 Min. :-2.7589 Min. : 3.490 Min. : 3.038
## 1st Qu.:0.9368 1st Qu.: 0.1163 1st Qu.: 5.446 1st Qu.: 4.766
## Median :1.0321 Median : 0.7146 Median : 6.369 Median : 5.605
## Mean :1.0317 Mean : 0.7555 Mean : 6.645 Mean : 5.868
## 3rd Qu.:1.1337 3rd Qu.: 1.4155 3rd Qu.: 7.358 3rd Qu.: 6.567
## Max. :1.6692 Max. : 4.1021 Max. :15.571 Max. :14.183
## tax.capacity tax.equal unemployment foreign.origin
## Min. :146813 Min. :-5214.0 Min. : 2.100 Min. : 7.168
## 1st Qu.:174204 1st Qu.: -631.5 1st Qu.: 5.300 1st Qu.:12.904
## Median :182235 Median : 341.0 Median : 7.200 Median :16.428
## Mean :188085 Mean : 825.3 Mean : 7.504 Mean :18.554
## 3rd Qu.:196206 3rd Qu.: 1787.8 3rd Qu.: 9.200 3rd Qu.:21.953
## Max. :358342 Max. :11864.0 Max. :15.100 Max. :58.555
## higher.edu greenhouse.gases dioxin.mg
## Min. :11.94 Min. : 10597 Min. : 5.135
## 1st Qu.:15.38 1st Qu.: 52604 1st Qu.: 24.855
## Median :18.07 Median : 87251 Median : 41.683
## Mean :19.61 Mean : 181646 Mean : 85.663
## 3rd Qu.:22.27 3rd Qu.: 152722 3rd Qu.: 82.013
## Max. :42.39 Max. :4052201 Max. :1282.273
With the four categorical variables (municipality, region, county and income) removed, 15 numeric variables remain:
## # A tibble: 6 × 15
## pop.size area mean.age mortality natality pop.change immigration emigration
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 30223 317. 39.5 0.840 1.18 2.23 7.41 5.52
## 2 40390 472 42.1 1.01 1.05 0.854 4.88 4.12
## 3 20026 974. 41.8 0.939 1.31 0.879 6.82 6.31
## 4 6776 518. 42.4 0.945 1.45 2.55 8.09 6.02
## 5 13934 325. 44.2 1.18 1.08 0.222 5.96 5.76
## 6 2821 12558. 47 1.42 0.922 -1.95 4.04 5.49
## # ℹ 7 more variables: tax.capacity <dbl>, tax.equal <dbl>, unemployment <dbl>,
## # foreign.origin <dbl>, higher.edu <dbl>, greenhouse.gases <dbl>,
## # dioxin.mg <dbl>
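The code that dropped the categorical columns is not shown in the report. Below is a minimal base R sketch. It uses a hypothetical two-row toy data frame (`toy`) built from values in the `str()` output above. The real data object is presumably `scb`, which the PCA plot code references.

```r
# Hypothetical toy data frame mirroring the structure of the municipality data
toy <- data.frame(
  municipality = c("Ale", "Alvesta"),
  income       = c("Middle", "Low"),
  pop.size     = c(30223, 20026),
  area         = c(317, 974)
)

# TRUE for numeric columns, FALSE for character columns
is_num <- vapply(toy, is.numeric, logical(1))

# Keep only the numeric columns; these are the inputs to PCA
toy_num <- toy[, is_num, drop = FALSE]
names(toy_num)
```

On the real data, the same pattern would be `scb[, vapply(scb, is.numeric, logical(1))]`. That assumes the tibble is named `scb`.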
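Standardization is necessary because the variables are measured on very different scales. For example, population size ranges from 2,451 to 949,761, whereas mortality ranges from about 0.52 to 2.15. Without standardization, the variables with the largest variances would dominate the principal components.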
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.2271 1.6315 1.4544 1.15955 0.94201 0.85625 0.71273
## Proportion of Variance 0.3307 0.1774 0.1410 0.08964 0.05916 0.04888 0.03387
## Cumulative Proportion 0.3307 0.5081 0.6491 0.73878 0.79794 0.84682 0.88068
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.68860 0.61825 0.57896 0.4915 0.41455 0.3263 0.27902
## Proportion of Variance 0.03161 0.02548 0.02235 0.0161 0.01146 0.0071 0.00519
## Cumulative Proportion 0.91229 0.93778 0.96012 0.9762 0.98768 0.9948 0.99997
## PC15
## Standard deviation 0.02074
## Proportion of Variance 0.00003
## Cumulative Proportion 1.00000
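The "Importance of components" table above is the summary of a `prcomp()` fit. This suggests, but the report does not show, that `pca_result` was created with `prcomp(..., scale. = TRUE)`. The sketch below illustrates why scaling matters. It uses R's built-in `USArrests` data as a stand-in for the municipality data.

```r
# Built-in dataset: 50 US states x 4 variables measured on different scales
data(USArrests)
round(apply(USArrests, 2, var), 1)   # Assault has by far the largest variance

pca_raw    <- prcomp(USArrests)                 # centred only (covariance PCA)
pca_scaled <- prcomp(USArrests, scale. = TRUE)  # standardized (correlation PCA)

# Raw PC1 is essentially Assault alone; scaled PC1 mixes all four variables
round(pca_raw$rotation[, 1], 3)
round(pca_scaled$rotation[, 1], 3)

# scale. = TRUE gives the same loadings (up to sign) as PCA on z-scores
pca_z <- prcomp(scale(USArrests))
all.equal(abs(pca_z$rotation), abs(pca_scaled$rotation))
```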
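We retain principal components until together they explain more than 60–70% of the total variance. The first three components explain 64.9% and the first four 73.9%, so three or four components meet this criterion.
The leading components are retained because they summarize the largest share of variation in the dataset.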
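The selection rule can be applied directly to the standard deviations in the summary table. A component's variance (eigenvalue) is its standard deviation squared. Its share of the total is that variance divided by the sum over all components. The helper `n_components()` is a hypothetical name used for illustration. For a fitted object, `pca_result$sdev` holds the same values.

```r
# Standard deviations of the 15 components, copied from the summary above
sdev <- c(2.2271, 1.6315, 1.4544, 1.15955, 0.94201, 0.85625, 0.71273,
          0.68860, 0.61825, 0.57896, 0.4915, 0.41455, 0.3263, 0.27902, 0.02074)

# Proportion of variance per component, and its running total
prop_var <- sdev^2 / sum(sdev^2)
cum_var  <- cumsum(prop_var)

# Smallest number of components whose cumulative share reaches the threshold
n_components <- function(cum_var, threshold) which(cum_var >= threshold)[1]

n_components(cum_var, 0.60)  # 3 components (about 64.9%)
n_components(cum_var, 0.70)  # 4 components (about 73.9%)
```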
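```r
library(factoextra)  # provides fviz_pca_ind()

# Municipalities plotted on PC1 and PC2, coloured by region
fviz_pca_ind(
  pca_result,
  geom = "point",
  habillage = scb$region,  # grouping variable used for colours
  addEllipses = TRUE,      # draw an ellipse around each region's points
  label = "none"
)
```

Municipalities that lie close together in this plot have similar socioeconomic characteristics.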
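`fviz_pca_ind()` plots each observation's principal component scores, by default on PC1 and PC2. The base R sketch below again uses the built-in `USArrests` data as a stand-in for the municipality data. It shows two things:

- The scores are the standardized data multiplied by the loadings.
- With all components kept, distances between observations equal their distances in the standardized data.

This is why nearby points are similar. A 2-D plot can only shrink those distances, never stretch them.

```r
data(USArrests)
pca <- prcomp(USArrests, scale. = TRUE)

# Scores = standardized data %*% rotation (loadings) matrix
z <- scale(USArrests)
scores <- z %*% pca$rotation
all.equal(unname(scores), unname(pca$x))

# Keeping every component preserves Euclidean distances between observations
d_std <- as.numeric(dist(z))
d_pcs <- as.numeric(dist(pca$x))
all.equal(d_std, d_pcs)

# A PC1-PC2 plot shows only part of each distance
d_2d <- as.numeric(dist(pca$x[, 1:2]))
all(d_2d <= d_pcs + 1e-12)
```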
The variable loadings show which variables drive the differences between municipalities. Loadings on PC1:
## pop.size area mean.age mortality
## 0.27320477 -0.15113010 -0.39419653 -0.39182306
## natality pop.change immigration emigration
## 0.25639831 0.34404101 0.16733833 0.05176083
## tax.capacity tax.equal unemployment foreign.origin
## 0.27750312 -0.19284174 -0.12458883 0.21110679
## higher.edu greenhouse.gases dioxin.mg
## 0.36357993 0.17163745 0.20708180
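The loadings above can be ranked by magnitude to see which variables dominate PC1. The vector below is copied from the output. For a fitted `prcomp` object, `pca_result$rotation[, "PC1"]` would return the same values.

```r
# PC1 loadings copied from the output above
pc1 <- c(pop.size = 0.27320477, area = -0.15113010, mean.age = -0.39419653,
         mortality = -0.39182306, natality = 0.25639831, pop.change = 0.34404101,
         immigration = 0.16733833, emigration = 0.05176083,
         tax.capacity = 0.27750312, tax.equal = -0.19284174,
         unemployment = -0.12458883, foreign.origin = 0.21110679,
         higher.edu = 0.36357993, greenhouse.gases = 0.17163745,
         dioxin.mg = 0.20708180)

# The loadings of one component form a unit vector: squares sum to about 1
sum(pc1^2)

# Rank by absolute loading; the sign gives the direction of association
ranked <- pc1[order(abs(pc1), decreasing = TRUE)]
round(head(ranked, 6), 2)
# mean.age, mortality, higher.edu, pop.change, tax.capacity, pop.size
```

The overall sign of a principal component is arbitrary. Only the relative signs of the loadings within a component are interpretable.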
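PC1 was positively associated with higher education (0.36), population change (0.34), tax capacity (0.28), population size (0.27) and natality (0.26). It was negatively associated with mean age (-0.39) and mortality (-0.39).
This suggests that municipalities with high PC1 scores are large, growing, urban municipalities with younger and more highly educated populations. Low PC1 scores correspond to smaller municipalities with older populations and higher mortality.
Loadings on PC2: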
## pop.size area mean.age mortality
## -0.18834479 -0.19818064 -0.07166123 -0.02038489
## natality pop.change immigration emigration
## 0.14751696 0.06707752 0.49389563 0.50642689
## tax.capacity tax.equal unemployment foreign.origin
## -0.14539976 0.12551530 0.24996009 0.33048716
## higher.edu greenhouse.gases dioxin.mg
## -0.13473684 -0.27359438 -0.30237918
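PC2 was positively associated with emigration (0.51), immigration (0.49), foreign origin (0.33) and unemployment (0.25). It was negatively associated with dioxin emissions (-0.30) and greenhouse-gas emissions (-0.27).
Municipalities with high PC2 scores therefore tend to have high population turnover through migration, a larger share of residents of foreign origin and higher unemployment, combined with lower emissions. Mean age contributes little to PC2 (loading -0.07).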
Population size, demographic structure, and economic capacity explain major differences among Swedish municipalities.
NMDS was performed on Bray–Curtis dissimilarities in two dimensions (k = 2), with up to 100 random starts (trymax = 100).
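For two samples with abundance vectors x and y, the Bray–Curtis dissimilarity is sum(|x - y|) / sum(x + y). It is 0 for identical composition and 1 when no taxa are shared. Below is a base R sketch with a hypothetical 3-sample toy table. `vegan::vegdist(..., method = "bray")`, used later in the report on `entero`, computes the same quantity row by row.

```r
# Bray-Curtis dissimilarity between two abundance vectors
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Pairwise dissimilarities for a samples-by-taxa table (rows = samples)
bray_matrix <- function(comm) {
  n <- nrow(comm)
  out <- matrix(0, n, n, dimnames = list(rownames(comm), rownames(comm)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      out[i, j] <- out[j, i] <- bray_curtis(comm[i, ], comm[j, ])
    }
  }
  as.dist(out)
}

# Hypothetical toy abundances: 3 samples x 3 taxa
toy <- rbind(s1 = c(1, 2, 3), s2 = c(3, 2, 1), s3 = c(0, 0, 5))
bray_matrix(toy)  # s1-s2 = 1/3, s1-s3 = 5/11, s2-s3 = 9/11
```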
## Run 0 stress 0.1508602
## Run 1 stress 0.1501291
## ... New best solution
## ... Procrustes: rmse 0.1315384 max resid 0.5536647
## Run 2 stress 0.1739011
## Run 3 stress 0.1606822
## Run 4 stress 0.1499106
## ... New best solution
## ... Procrustes: rmse 0.0219446 max resid 0.1068088
## Run 5 stress 0.1501291
## ... Procrustes: rmse 0.02195007 max resid 0.1068811
## Run 6 stress 0.1579322
## Run 7 stress 0.1609625
## Run 8 stress 0.1510025
## Run 9 stress 0.1530473
## Run 10 stress 0.1763764
## Run 11 stress 0.1575809
## Run 12 stress 0.1567907
## Run 13 stress 0.1609626
## Run 14 stress 0.1530473
## Run 15 stress 0.1514533
## Run 16 stress 0.1715047
## Run 17 stress 0.1554413
## Run 18 stress 0.1634385
## Run 19 stress 0.1579323
## Run 20 stress 0.1716996
## Run 21 stress 0.1751966
## Run 22 stress 0.1502523
## ... Procrustes: rmse 0.02329153 max resid 0.1034964
## Run 23 stress 0.1575807
## Run 24 stress 0.1634385
## Run 25 stress 0.1579322
## Run 26 stress 0.1508602
## Run 27 stress 0.1555186
## Run 28 stress 0.1495229
## ... New best solution
## ... Procrustes: rmse 0.01184774 max resid 0.06061541
## Run 29 stress 0.1553503
## Run 30 stress 0.1499105
## ... Procrustes: rmse 0.01178412 max resid 0.06065181
## Run 31 stress 0.1564722
## Run 32 stress 0.1512937
## Run 33 stress 0.1512932
## Run 34 stress 0.1508601
## Run 35 stress 0.1530475
## Run 36 stress 0.1827846
## Run 37 stress 0.1721044
## Run 38 stress 0.1530473
## Run 39 stress 0.1508602
## Run 40 stress 0.1541209
## Run 41 stress 0.1549409
## Run 42 stress 0.1744663
## Run 43 stress 0.1506509
## Run 44 stress 0.1549411
## Run 45 stress 0.1504013
## Run 46 stress 0.1508601
## Run 47 stress 0.182646
## Run 48 stress 0.1506509
## Run 49 stress 0.1631079
## Run 50 stress 0.158413
## Run 51 stress 0.1633539
## Run 52 stress 0.3941036
## Run 53 stress 0.1836335
## Run 54 stress 0.1508603
## Run 55 stress 0.1504013
## Run 56 stress 0.1554107
## Run 57 stress 0.1530473
## Run 58 stress 0.1681186
## Run 59 stress 0.1634386
## Run 60 stress 0.1988293
## Run 61 stress 0.1606821
## Run 62 stress 0.163108
## Run 63 stress 0.1506508
## Run 64 stress 0.1530473
## Run 65 stress 0.1506508
## Run 66 stress 0.1530472
## Run 67 stress 0.1960253
## Run 68 stress 0.1530472
## Run 69 stress 0.1606821
## Run 70 stress 0.1530472
## Run 71 stress 0.149028
## ... New best solution
## ... Procrustes: rmse 0.02408859 max resid 0.09703141
## Run 72 stress 0.1541217
## Run 73 stress 0.1630893
## Run 74 stress 0.1512932
## Run 75 stress 0.1504013
## Run 76 stress 0.1506513
## Run 77 stress 0.1553503
## Run 78 stress 0.1752673
## Run 79 stress 0.1512932
## Run 80 stress 0.1688955
## Run 81 stress 0.3898181
## Run 82 stress 0.1688955
## Run 83 stress 0.1530473
## Run 84 stress 0.1809899
## Run 85 stress 0.1634386
## Run 86 stress 0.154121
## Run 87 stress 0.1503611
## Run 88 stress 0.149028
## ... New best solution
## ... Procrustes: rmse 7.815705e-05 max resid 0.0003023142
## ... Similar to previous best
## *** Best solution repeated 1 times
##
## Call:
## metaMDS(comm = entero, distance = "bray", k = 2, trymax = 100)
##
## global Multidimensional Scaling using monoMDS
##
## Data: entero
## Distance: bray
##
## Dimensions: 2
## Stress: 0.149028
## Stress type 1, weak ties
## Best solution was repeated 1 time in 88 tries
## The best solution was from try 88 (random start)
## Scaling: centring, PC rotation, halfchange scaling
## Species: expanded scores based on 'entero'
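The reported stress (0.149, "Stress type 1") measures how well distances in the 2-D configuration reproduce the rank order of the Bray–Curtis dissimilarities; lower values mean a more faithful ordination. The sketch below computes Kruskal's stress formula 1 in base R. It uses `stats::isoreg()` for the monotone regression and a hypothetical four-pair example without tied dissimilarities. The "weak ties" handling used by `monoMDS` is not reproduced here.

```r
# Kruskal's stress formula 1: S = sqrt(sum((d - dhat)^2) / sum(d^2))
#   d    = distances between points in the ordination
#   dhat = monotone (isotonic) regression of d on the original dissimilarities
stress1 <- function(dissim, d) {
  fit <- isoreg(dissim, d)
  dhat <- numeric(length(d))
  # isoreg() returns fitted values in increasing order of dissim
  idx <- if (fit$isOrd) seq_along(d) else fit$ord
  dhat[idx] <- fit$yf
  sqrt(sum((d - dhat)^2) / sum(d^2))
}

# Hypothetical example: the 2nd and 3rd distances violate the rank order
dissim <- c(0.1, 0.2, 0.3, 0.4)
d      <- c(1, 3, 2, 4)
stress1(dissim, d)  # sqrt(0.5 / 30), about 0.129
```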
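```r
library(ggplot2)

# `nmds` is the fit shown above:
# metaMDS(entero, distance = "bray", k = 2, trymax = 100)
nmds_points <- as.data.frame(nmds$points)        # sample coordinates (MDS1, MDS2)
nmds_points$Nationality <- sampledf$Nationality  # add the grouping variable

ggplot(nmds_points,
       aes(x = MDS1,
           y = MDS2,
           color = Nationality)) +
  geom_point(size = 3) +
  theme_minimal()
```

The samples cluster partially by nationality, suggesting differences in gut microbiome composition among nationalities.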
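```r
library(vegan)  # vegdist(), betadisper(), adonis2()

# Bray-Curtis dissimilarities between all pairs of samples
dist_matrix <- vegdist(
  entero,
  method = "bray"
)

# Distance of each sample to its nationality's group centre
# (betadisper() uses the spatial median by default)
bd <- betadisper(
  dist_matrix,
  sampledf$Nationality
)

# ANOVA on those distances: does dispersion differ among nationalities?
anova(bd)
```

## Analysis of Variance Table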
##
## Response: Distances
## Df Sum Sq Mean Sq F value Pr(>F)
## Groups 5 0.028893 0.0057786 0.7874 0.5679
## Residuals 27 0.198156 0.0073391
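Because p = 0.568 > 0.05, there is no evidence that multivariate dispersion differs among nationalities. The homogeneity-of-dispersion assumption of PERMANOVA is therefore reasonable. A significant PERMANOVA result can then be interpreted as a difference in composition rather than in spread.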
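`betadisper()` measures each sample's distance to its group centre and then compares those distances across groups with an ANOVA. The base R sketch below is simplified in two ways. First, it assumes the samples are already points in Euclidean space, using hypothetical toy coordinates; `betadisper()` first embeds the Bray–Curtis dissimilarities with principal coordinates analysis. Second, it uses the group centroid, which corresponds to `type = "centroid"`; the report's call uses the default spatial median.

```r
# Hypothetical toy coordinates: two groups of three samples in 2-D
pts <- matrix(c(0, 0,  2, 0,  1, 0,     # group A
                0, 0,  0, 4,  0, 2),    # group B
              ncol = 2, byrow = TRUE)
grp <- factor(rep(c("A", "B"), each = 3))

# Euclidean distance from each sample to the centroid of its own group
dist_to_centroid <- function(x, g) {
  centroids <- apply(x, 2, function(col) ave(col, g))  # group mean for each row
  sqrt(rowSums((x - centroids)^2))
}

disp <- dist_to_centroid(pts, grp)
disp  # 1 1 0 2 2 0

# One-way ANOVA on the distances (the idea behind anova(bd))
anova(lm(disp ~ grp))
```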
```r
# PERMANOVA: does microbiome composition differ among nationalities?
adonis_result <- adonis2(
  dist_matrix ~ Nationality,
  data = sampledf,
  permutations = 999
)
adonis_result
```

## Permutation test for adonis under reduced model
## Permutation: free
## Number of permutations: 999
##
## adonis2(formula = dist_matrix ~ Nationality, data = sampledf, permutations = 999)
## Df SumOfSqs R2 F Pr(>F)
## Model 5 0.91239 0.3803 3.3139 0.001 ***
## Residual 27 1.48675 0.6197
## Total 32 2.39914 1.0000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
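Because p = 0.001 < 0.05, gut microbiome composition differs significantly among nationalities. Nationality explains about 38% of the variation in Bray–Curtis dissimilarities (R² = 0.38, F = 3.31).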
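The adonis2 table can be checked by hand:

- F = (0.91239/5) / (1.48675/27) ≈ 3.31
- R² = 0.91239/2.39914 ≈ 0.38
- With 999 permutations, the smallest attainable p-value is 1/(999 + 1) = 0.001, which is the value reported.

Below is a minimal base R sketch of a one-way PERMANOVA computed directly from a distance matrix, using the sums-of-squares formulas of Anderson's method. It is an illustration, not vegan's implementation. The toy data and the function names `permanova_stats()` and `permanova_test()` are hypothetical. With Euclidean distances on a single variable, the pseudo-F equals the ordinary one-way ANOVA F.

```r
# Pseudo-F and R^2 of a one-way PERMANOVA from a distance matrix
permanova_stats <- function(d, g) {
  D2 <- as.matrix(d)^2
  g  <- factor(g)
  N  <- length(g)
  a  <- nlevels(g)
  # Total SS: squared distances over all pairs, divided by N
  ss_total <- sum(D2[upper.tri(D2)]) / N
  # Within-group SS: the same sum within each group, divided by group size
  ss_within <- 0
  for (lev in levels(g)) {
    idx <- which(g == lev)
    sub <- D2[idx, idx, drop = FALSE]
    ss_within <- ss_within + sum(sub[upper.tri(sub)]) / length(idx)
  }
  ss_among <- ss_total - ss_within
  c(F  = (ss_among / (a - 1)) / (ss_within / (N - a)),
    R2 = ss_among / ss_total)
}

# Permutation p-value: shuffle labels, count F values at least as large
permanova_test <- function(d, g, permutations = 999) {
  obs <- permanova_stats(d, g)
  perm_F <- replicate(permutations, permanova_stats(d, sample(g))[["F"]])
  p <- (sum(perm_F >= obs[["F"]]) + 1) / (permutations + 1)
  c(obs, p = p)
}

# Hypothetical toy data: one variable, three groups, Euclidean distances
set.seed(1)
y <- c(1, 2, 3, 6, 7, 8, 4, 5, 9)
g <- rep(c("a", "b", "c"), each = 3)
permanova_test(dist(y), g)  # F = 9, R2 = 0.75; p depends on the permutations
```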
The NMDS ordination showed partial clustering of samples by nationality, and PERMANOVA confirmed significant differences in gut microbiome composition among nationalities.
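The multivariate analyses revealed clear structure in both datasets.
PCA Results
The first four principal components explain 73.9% of the total variance. PC1 separates large, growing municipalities with highly educated populations from smaller municipalities with older populations and higher mortality.
NMDS Results
The two-dimensional NMDS (stress 0.149) shows partial clustering by nationality. PERMANOVA indicates that nationality explains about 38% of the variation in microbiome composition (p = 0.001), with no evidence of unequal dispersion among groups.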
These results highlight the usefulness of multivariate statistical methods for analyzing complex datasets.