This project aims to explore patterns linking temperature change and gas emissions.
Setting the working directory and loading libraries:
setwd("C:/Users/adilk/OneDrive/Desktop/Uni Work/USL")
getwd()
## [1] "C:/Users/adilk/OneDrive/Desktop/Uni Work/USL"
library("sf")
library("maps")
library("dplyr")
library("ggplot2")
library("ggrepel")
library("countrycode")
library("corrplot")
Load the temperature change dataset and replace NA values with mean, as well as make sure the data is numeric as needed
dataset <- read.csv("climate_data.csv")
climate_data <- dataset %>%
mutate(across(everything(), ~ ifelse(is.na(.), mean(., na.rm = TRUE), .)))
head(climate_data)
## ObjectId Country ISO2 ISO3
## 1 1 Afghanistan, Islamic Rep. of AF AFG
## 2 2 Africa AFRTMP
## 3 3 Albania AL ALB
## 4 4 Algeria DZ DZA
## 5 5 American Samoa AS ASM
## 6 6 Americas AMETMP
## Indicator
## 1 Temperature change with respect to a baseline climatology, corresponding to the period 1951-1980
## 2 Temperature change with respect to a baseline climatology, corresponding to the period 1951-1980
## 3 Temperature change with respect to a baseline climatology, corresponding to the period 1951-1980
## 4 Temperature change with respect to a baseline climatology, corresponding to the period 1951-1980
## 5 Temperature change with respect to a baseline climatology, corresponding to the period 1951-1980
## 6 Temperature change with respect to a baseline climatology, corresponding to the period 1951-1980
## Unit
## 1 Degree Celsius
## 2 Degree Celsius
## 3 Degree Celsius
## 4 Degree Celsius
## 5 Degree Celsius
## 6 Degree Celsius
## Source
## 1 Food and Agriculture Organization of the United Nations (FAO). 2022. FAOSTAT Climate Change, Climate Indicators, Temperature change. License: CC BY-NC-SA 3.0 IGO. Extracted from: https://www.fao.org/faostat/en/#data/ET. Accessed on 2023-03-28.
## 2 Food and Agriculture Organization of the United Nations (FAO). 2022. FAOSTAT Climate Change, Climate Indicators, Temperature change. License: CC BY-NC-SA 3.0 IGO. Extracted from: https://www.fao.org/faostat/en/#data/ET. Accessed on 2023-03-28.
## 3 Food and Agriculture Organization of the United Nations (FAO). 2022. FAOSTAT Climate Change, Climate Indicators, Temperature change. License: CC BY-NC-SA 3.0 IGO. Extracted from: https://www.fao.org/faostat/en/#data/ET. Accessed on 2023-03-28.
## 4 Food and Agriculture Organization of the United Nations (FAO). 2022. FAOSTAT Climate Change, Climate Indicators, Temperature change. License: CC BY-NC-SA 3.0 IGO. Extracted from: https://www.fao.org/faostat/en/#data/ET. Accessed on 2023-03-28.
## 5 Food and Agriculture Organization of the United Nations (FAO). 2022. FAOSTAT Climate Change, Climate Indicators, Temperature change. License: CC BY-NC-SA 3.0 IGO. Extracted from: https://www.fao.org/faostat/en/#data/ET. Accessed on 2023-03-28.
## 6 Food and Agriculture Organization of the United Nations (FAO). 2022. FAOSTAT Climate Change, Climate Indicators, Temperature change. License: CC BY-NC-SA 3.0 IGO. Extracted from: https://www.fao.org/faostat/en/#data/ET. Accessed on 2023-03-28.
## CTS.Code CTS.Name
## 1 ECCS Surface Temperature Change
## 2 ECCS Surface Temperature Change
## 3 ECCS Surface Temperature Change
## 4 ECCS Surface Temperature Change
## 5 ECCS Surface Temperature Change
## 6 ECCS Surface Temperature Change
## CTS.Full.Descriptor
## 1 Environment, Climate Change, Climate and Weather, Surface Temperature Change
## 2 Environment, Climate Change, Climate and Weather, Surface Temperature Change
## 3 Environment, Climate Change, Climate and Weather, Surface Temperature Change
## 4 Environment, Climate Change, Climate and Weather, Surface Temperature Change
## 5 Environment, Climate Change, Climate and Weather, Surface Temperature Change
## 6 Environment, Climate Change, Climate and Weather, Surface Temperature Change
## X1961 X1962 X1963 X1964 X1965 X1966 X1967 X1968 X1969 X1970 X1971
## 1 -0.126 -0.173 0.844 -0.751 -0.220 0.239 -0.348 -0.398 -0.513 0.843 0.642
## 2 -0.017 -0.036 0.063 -0.156 -0.201 0.138 -0.215 -0.221 0.355 0.220 -0.198
## 3 0.635 0.342 0.086 -0.169 -0.390 0.553 -0.082 0.062 -0.036 -0.137 -0.205
## 4 0.155 0.120 0.050 0.254 -0.111 0.405 -0.024 -0.065 0.247 0.080 -0.412
## 5 0.121 0.000 0.211 -0.089 -0.595 0.129 -0.387 -0.185 0.145 -0.034 -0.420
## 6 0.053 -0.084 0.264 -0.244 -0.300 -0.104 -0.113 -0.003 0.098 0.156 -0.239
## X1972 X1973 X1974 X1975 X1976 X1977 X1978 X1979 X1980 X1981 X1982
## 1 -1.095 0.264 -0.466 -0.442 -0.291 0.544 0.120 0.380 0.655 0.558 -0.286
## 2 0.001 0.377 -0.232 -0.231 -0.252 0.240 0.090 0.445 0.376 0.109 0.222
## 3 -0.081 -0.294 -0.128 -0.203 -0.677 0.530 -0.825 0.191 -0.422 -0.346 0.207
## 4 -0.360 -0.023 -0.506 -0.569 -0.799 0.550 0.060 0.706 0.266 0.249 0.421
## 5 -0.056 0.330 -0.300 -0.110 -0.169 0.157 0.093 0.342 0.344 0.165 0.288
## 6 -0.454 0.164 -0.283 -0.123 -0.136 0.505 -0.072 -0.072 0.506 0.697 -0.093
## X1983 X1984 X1985 X1986 X1987 X1988 X1989 X1990 X1991 X1992 X1993
## 1 0.240 0.252 0.397 -0.015 0.493 1.035 0.015 0.925 -0.051 -0.212 0.254
## 2 0.418 0.348 0.307 0.295 0.685 0.618 0.000 0.568 0.375 0.092 0.373
## 3 -0.088 -0.238 -0.072 0.600 -0.078 0.397 -0.041 0.809 -0.280 0.106 0.113
## 4 0.617 0.054 0.552 0.347 1.050 1.402 0.453 1.332 0.102 -0.232 0.645
## 5 0.309 0.257 0.235 0.375 0.333 0.510 0.167 0.520 0.677 0.335 -0.092
## 6 0.451 0.072 -0.009 0.287 0.836 0.560 0.174 0.367 0.419 0.252 0.268
## X1994 X1995 X1996 X1997 X1998 X1999 X2000 X2001 X2002 X2003 X2004
## 1 0.546 0.457 -0.093 0.4600000 0.6640000 1.271 1.064 1.377 1.457 0.710 1.482
## 2 0.393 0.549 0.515 0.5810000 0.9490000 0.622 0.458 0.660 0.886 1.002 0.814
## 3 1.360 -0.137 0.006 0.1110000 0.8210000 0.724 1.084 1.551 0.519 1.026 0.472
## 4 0.830 0.687 0.923 1.1130000 1.1560000 1.502 0.862 1.895 1.291 1.619 1.016
## 5 0.141 0.732 0.787 0.5419583 0.9748904 0.240 0.604 0.882 1.130 0.712 0.244
## 6 0.503 0.711 0.186 0.5670000 1.4170000 0.797 0.725 0.851 0.777 0.981 0.679
## X2005 X2006 X2007 X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015 X2016
## 1 0.513 1.838 0.794 0.808 0.9480000 1.664 1.455 0.271 1.345 0.521 1.204 1.612
## 2 1.081 0.925 0.793 0.724 0.9600000 1.501 0.913 0.760 1.021 1.013 1.190 1.392
## 3 0.217 0.390 1.389 1.043 0.9770000 1.261 1.125 1.546 1.394 1.285 1.667 1.558
## 4 1.289 1.422 1.249 1.205 0.9580000 2.275 1.403 1.166 1.221 1.713 1.153 1.787
## 5 0.878 0.480 0.928 0.498 0.9177623 1.131 0.681 0.720 1.012 0.840 0.679 1.209
## 6 1.039 1.223 0.923 0.571 0.6810000 1.313 0.865 1.348 0.846 0.724 1.352 1.766
## X2017 X2018 X2019 X2020 X2021 X2022 X2023
## 1 1.642 1.624 0.991 0.587 1.475 2.154 1.956
## 2 1.180 1.178 1.297 1.200 1.396 0.996 1.485
## 3 1.196 2.103 1.751 1.567 1.589 1.585 2.122
## 4 1.533 1.231 1.107 1.905 2.360 1.776 2.274
## 5 1.105 0.859 1.209 1.101 0.938 0.925 0.951
## 6 1.353 0.949 1.229 1.303 1.558 1.075 1.772
summary(climate_data)
## ObjectId Country ISO2 ISO3
## Min. : 1.00 Length:236 Length:236 Length:236
## 1st Qu.: 59.75 Class :character Class :character Class :character
## Median :118.50 Mode :character Mode :character Mode :character
## Mean :118.73
## 3rd Qu.:177.25
## Max. :241.00
## Indicator Unit Source CTS.Code
## Length:236 Length:236 Length:236 Length:236
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## CTS.Name CTS.Full.Descriptor X1961 X1962
## Length:236 Length:236 Min. :-0.69700 Min. :-1.36400
## Class :character Class :character 1st Qu.:-0.07725 1st Qu.:-0.17150
## Mode :character Mode :character Median : 0.13950 Median :-0.03500
## Mean : 0.16655 Mean :-0.02863
## 3rd Qu.: 0.28100 3rd Qu.: 0.05850
## Max. : 1.89400 Max. : 1.03300
## X1963 X1964 X1965 X1966
## Min. :-2.24400 Min. :-0.87600 Min. :-1.0510 Min. :-2.3570
## 1st Qu.:-0.16725 1st Qu.:-0.23150 1st Qu.:-0.3645 1st Qu.:-0.0065
## Median :-0.01623 Median :-0.09661 Median :-0.2457 Median : 0.1073
## Mean :-0.01623 Mean :-0.09661 Mean :-0.2457 Mean : 0.1073
## 3rd Qu.: 0.17875 3rd Qu.: 0.05700 3rd Qu.:-0.1268 3rd Qu.: 0.2442
## Max. : 1.18700 Max. : 1.08300 Max. : 0.8690 Max. : 1.4240
## X1967 X1968 X1969 X1970
## Min. :-1.0230 Min. :-3.25100 Min. :-1.74000 Min. :-1.28100
## 1st Qu.:-0.2440 1st Qu.:-0.30750 1st Qu.:-0.00025 1st Qu.:-0.00875
## Median :-0.1110 Median :-0.21314 Median : 0.13660 Median : 0.10057
## Mean :-0.1077 Mean :-0.21314 Mean : 0.13660 Mean : 0.10015
## 3rd Qu.:-0.0270 3rd Qu.:-0.07275 3rd Qu.: 0.30900 3rd Qu.: 0.23600
## Max. : 1.1430 Max. : 0.47900 Max. : 0.93500 Max. : 1.00000
## X1971 X1972 X1973 X1974
## Min. :-1.3090 Min. :-1.79200 Min. :-0.89200 Min. :-0.9280
## 1st Qu.:-0.2692 1st Qu.:-0.16425 1st Qu.: 0.09675 1st Qu.:-0.3025
## Median :-0.1958 Median :-0.06672 Median : 0.23750 Median :-0.1548
## Mean :-0.1958 Mean :-0.06672 Mean : 0.23750 Mean :-0.1548
## 3rd Qu.:-0.0905 3rd Qu.: 0.09000 3rd Qu.: 0.41600 3rd Qu.:-0.0710
## Max. : 0.6650 Max. : 2.38600 Max. : 1.14700 Max. : 1.5960
## X1975 X1976 X1977 X1978
## Min. :-1.09200 Min. :-1.1540 Min. :-0.7030 Min. :-0.85500
## 1st Qu.:-0.24025 1st Qu.:-0.3757 1st Qu.: 0.0855 1st Qu.:-0.01000
## Median :-0.04500 Median :-0.2422 Median : 0.1810 Median : 0.07109
## Mean :-0.01387 Mean :-0.2422 Mean : 0.1810 Mean : 0.07109
## 3rd Qu.: 0.06450 3rd Qu.:-0.1133 3rd Qu.: 0.2933 3rd Qu.: 0.20825
## Max. : 1.89300 Max. : 0.9240 Max. : 1.0830 Max. : 0.93200
## X1979 X1980 X1981 X1982
## Min. :-1.3220 Min. :-0.7590 Min. :-1.06600 Min. :-0.68500
## 1st Qu.: 0.1520 1st Qu.: 0.1373 1st Qu.: 0.07175 1st Qu.: 0.01525
## Median : 0.2366 Median : 0.2519 Median : 0.18124 Median : 0.16815
## Mean : 0.2366 Mean : 0.2519 Mean : 0.18124 Mean : 0.16815
## 3rd Qu.: 0.3970 3rd Qu.: 0.4522 3rd Qu.: 0.32125 3rd Qu.: 0.37075
## Max. : 1.2740 Max. : 1.0620 Max. : 1.56300 Max. : 1.18200
## X1983 X1984 X1985 X1986
## Min. :-2.4210 Min. :-1.9230 Min. :-1.21000 Min. :-0.76600
## 1st Qu.: 0.1817 1st Qu.:-0.1168 1st Qu.:-0.01600 1st Qu.: 0.02575
## Median : 0.3335 Median : 0.0803 Median : 0.07129 Median : 0.14353
## Mean : 0.3335 Mean : 0.0803 Mean : 0.07129 Mean : 0.14353
## 3rd Qu.: 0.6550 3rd Qu.: 0.2482 3rd Qu.: 0.25900 3rd Qu.: 0.30000
## Max. : 1.4460 Max. : 1.8960 Max. : 1.78900 Max. : 0.90700
## X1987 X1988 X1989 X1990
## Min. :-1.6490 Min. :-2.1080 Min. :-1.6760 Min. :-0.7180
## 1st Qu.: 0.2355 1st Qu.: 0.3307 1st Qu.: 0.0135 1st Qu.: 0.2820
## Median : 0.3927 Median : 0.4760 Median : 0.2180 Median : 0.5465
## Mean : 0.3914 Mean : 0.4760 Mean : 0.2474 Mean : 0.5500
## 3rd Qu.: 0.6873 3rd Qu.: 0.6310 3rd Qu.: 0.3257 3rd Qu.: 0.6743
## Max. : 1.4290 Max. : 1.4020 Max. : 2.1840 Max. : 1.8390
## X1991 X1992 X1993 X1994
## Min. :-0.6440 Min. :-1.35600 Min. :-1.5210 Min. :-0.5750
## 1st Qu.: 0.2248 1st Qu.: 0.00875 1st Qu.: 0.0185 1st Qu.: 0.2855
## Median : 0.3757 Median : 0.23500 Median : 0.2260 Median : 0.5200
## Mean : 0.3757 Mean : 0.22454 Mean : 0.2024 Mean : 0.5922
## 3rd Qu.: 0.5380 3rd Qu.: 0.51200 3rd Qu.: 0.4868 3rd Qu.: 0.7778
## Max. : 1.3160 Max. : 1.60200 Max. : 1.1140 Max. : 1.9670
## X1995 X1996 X1997 X1998
## Min. :-0.4310 Min. :-0.7930 Min. :-0.5180 Min. :-0.5010
## 1st Qu.: 0.3685 1st Qu.: 0.0525 1st Qu.: 0.3000 1st Qu.: 0.7883
## Median : 0.6224 Median : 0.2915 Median : 0.5420 Median : 0.9990
## Mean : 0.6224 Mean : 0.2915 Mean : 0.5420 Mean : 0.9749
## 3rd Qu.: 0.7925 3rd Qu.: 0.5158 3rd Qu.: 0.7732 3rd Qu.: 1.1965
## Max. : 2.1010 Max. : 1.7100 Max. : 1.9660 Max. : 2.4840
## X1999 X2000 X2001 X2002
## Min. :-0.5510 Min. :-0.7930 Min. :-0.1510 Min. :-0.0730
## 1st Qu.: 0.4105 1st Qu.: 0.3550 1st Qu.: 0.5192 1st Qu.: 0.6913
## Median : 0.7015 Median : 0.5780 Median : 0.7915 Median : 0.8790
## Mean : 0.7413 Mean : 0.6749 Mean : 0.8503 Mean : 0.9295
## 3rd Qu.: 1.0020 3rd Qu.: 0.9882 3rd Qu.: 1.2277 3rd Qu.: 1.1327
## Max. : 2.0670 Max. : 2.0460 Max. : 2.0170 Max. : 2.2550
## X2003 X2004 X2005 X2006
## Min. :-0.4550 Min. :-0.5160 Min. :-0.2870 Min. :-0.1660
## 1st Qu.: 0.6442 1st Qu.: 0.5730 1st Qu.: 0.6647 1st Qu.: 0.6310
## Median : 0.8440 Median : 0.7794 Median : 0.8793 Median : 0.8545
## Mean : 0.8413 Mean : 0.7794 Mean : 0.8793 Mean : 0.8903
## 3rd Qu.: 1.0277 3rd Qu.: 0.9567 3rd Qu.: 1.1000 3rd Qu.: 1.1253
## Max. : 2.3630 Max. : 2.1610 Max. : 2.4460 Max. : 3.8860
## X2007 X2008 X2009 X2010
## Min. :-0.4570 Min. :-0.0330 Min. :-0.1950 Min. :-0.3420
## 1st Qu.: 0.6915 1st Qu.: 0.4510 1st Qu.: 0.6847 1st Qu.: 0.7845
## Median : 0.9030 Median : 0.7475 Median : 0.9178 Median : 1.1178
## Mean : 1.0257 Mean : 0.8090 Mean : 0.9178 Mean : 1.1196
## 3rd Qu.: 1.1458 3rd Qu.: 1.0430 3rd Qu.: 1.1798 3rd Qu.: 1.3300
## Max. : 3.0500 Max. : 2.5870 Max. : 1.9060 Max. : 3.2310
## X2011 X2012 X2013 X2014
## Min. :-0.3560 Min. :-0.2730 Min. :0.1730 Min. :0.1300
## 1st Qu.: 0.5797 1st Qu.: 0.5890 1st Qu.:0.7225 1st Qu.:0.7097
## Median : 0.8035 Median : 0.8615 Median :0.9420 Median :1.0060
## Mean : 0.8273 Mean : 0.9087 Mean :0.9420 Mean :1.1141
## 3rd Qu.: 1.0580 3rd Qu.: 1.1578 3rd Qu.:1.1590 3rd Qu.:1.2810
## Max. : 2.1050 Max. : 3.8250 Max. :2.2710 Max. :3.3820
## X2015 X2016 X2017 X2018
## Min. :-0.1320 Min. :0.234 Min. :0.243 Min. :0.3020
## 1st Qu.: 0.9945 1st Qu.:1.155 1st Qu.:1.023 1st Qu.:0.8575
## Median : 1.2613 Median :1.440 Median :1.277 Median :1.1690
## Mean : 1.2613 Mean :1.440 Mean :1.281 Mean :1.3047
## 3rd Qu.: 1.5063 3rd Qu.:1.692 3rd Qu.:1.501 3rd Qu.:1.6442
## Max. : 3.3130 Max. :5.327 Max. :3.335 Max. :4.0640
## X2019 X2020 X2021 X2022
## Min. :0.121 Min. :0.059 Min. :-0.5690 Min. :-0.6980
## 1st Qu.:1.133 1st Qu.:1.157 1st Qu.: 0.9465 1st Qu.: 0.8598
## Median :1.401 Median :1.476 Median : 1.3098 Median : 1.3010
## Mean :1.429 Mean :1.523 Mean : 1.3098 Mean : 1.3490
## 3rd Qu.:1.658 3rd Qu.:1.797 3rd Qu.: 1.6035 3rd Qu.: 1.8030
## Max. :2.871 Max. :3.669 Max. : 3.0120 Max. : 3.5120
## X2023
## Min. :0.215
## 1st Qu.:1.198
## Median :1.647
## Mean :1.647
## 3rd Qu.:2.066
## Max. :3.578
Only the numeric columns are needed, and they need to be scaled for PCA.
scaled <- scale(climate_data[, -c(1:10)])
scaled <- as.data.frame(scaled)
Plot the data correlation plot
data_corr <- cor(scaled, method = "pearson")
corrplot(data_corr, order = "alphabet", tl.cex = 0.6)
Strong correlation past the year 1999 can be clearly seen. Let’s perform MDS
data_dist <- dist(t(scaled))
years = colnames(scaled)
years_numeric <- as.numeric(substr(years, 2, 5))
mds <- cmdscale(data_dist, k = 2)
colors <- ifelse(years_numeric > 1999, "red", "blue")
plot(mds, type = 'n')
text(mds, labels = years, cex=0.6, adj=0.5, col=colors)
MDS also shows that years past 1999 are clustered relatively tightly together.
Let’s perform PCA:
pca_result <- prcomp(scaled, scale. = FALSE)
summary(pca_result)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 4.0030 2.9361 2.5436 2.1030 1.61980 1.56622 1.4527
## Proportion of Variance 0.2544 0.1368 0.1027 0.0702 0.04165 0.03894 0.0335
## Cumulative Proportion 0.2544 0.3912 0.4939 0.5641 0.60573 0.64466 0.6782
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 1.29055 1.24609 1.16145 1.12926 1.09298 1.00824 0.96600
## Proportion of Variance 0.02644 0.02465 0.02141 0.02024 0.01896 0.01614 0.01481
## Cumulative Proportion 0.70460 0.72924 0.75065 0.77090 0.78986 0.80599 0.82081
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.91572 0.89490 0.86529 0.83392 0.82058 0.71412 0.69462
## Proportion of Variance 0.01331 0.01271 0.01188 0.01104 0.01069 0.00809 0.00766
## Cumulative Proportion 0.83412 0.84683 0.85871 0.86975 0.88044 0.88853 0.89619
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 0.67690 0.66746 0.64193 0.61953 0.58257 0.56321 0.54381
## Proportion of Variance 0.00727 0.00707 0.00654 0.00609 0.00539 0.00504 0.00469
## Cumulative Proportion 0.90346 0.91054 0.91708 0.92317 0.92856 0.93359 0.93829
## PC29 PC30 PC31 PC32 PC33 PC34 PC35
## Standard deviation 0.51528 0.50544 0.48645 0.47296 0.47121 0.44437 0.42447
## Proportion of Variance 0.00421 0.00406 0.00376 0.00355 0.00352 0.00313 0.00286
## Cumulative Proportion 0.94250 0.94656 0.95031 0.95386 0.95739 0.96052 0.96338
## PC36 PC37 PC38 PC39 PC40 PC41 PC42
## Standard deviation 0.41881 0.39643 0.39063 0.38511 0.37094 0.35810 0.34525
## Proportion of Variance 0.00278 0.00249 0.00242 0.00235 0.00218 0.00204 0.00189
## Cumulative Proportion 0.96616 0.96866 0.97108 0.97344 0.97562 0.97766 0.97955
## PC43 PC44 PC45 PC46 PC47 PC48 PC49
## Standard deviation 0.32539 0.3176 0.31167 0.3073 0.30176 0.28222 0.27077
## Proportion of Variance 0.00168 0.0016 0.00154 0.0015 0.00145 0.00126 0.00116
## Cumulative Proportion 0.98123 0.9828 0.98437 0.9859 0.98732 0.98858 0.98974
## PC50 PC51 PC52 PC53 PC54 PC55 PC56
## Standard deviation 0.26537 0.25778 0.24867 0.24794 0.24151 0.22611 0.21434
## Proportion of Variance 0.00112 0.00105 0.00098 0.00098 0.00093 0.00081 0.00073
## Cumulative Proportion 0.99086 0.99192 0.99290 0.99387 0.99480 0.99561 0.99634
## PC57 PC58 PC59 PC60 PC61 PC62 PC63
## Standard deviation 0.20212 0.19784 0.18754 0.1777 0.1775 0.16909 0.15409
## Proportion of Variance 0.00065 0.00062 0.00056 0.0005 0.0005 0.00045 0.00038
## Cumulative Proportion 0.99699 0.99761 0.99817 0.9987 0.9992 0.99962 1.00000
As we can see, 63 variables can be condenced into 22 components and keep 90% of variance. Therefore, we use 22 components for further analysis.
pca_scores <- as.data.frame(pca_result$x)
pca_subset <- pca_scores[, 1:22]
Now using the extracted components we perform K-means clustering to cluster countries into 3
set.seed(123)
kmeans_result <- kmeans(pca_subset, centers = 3)
clustered_data <- cbind(climate_data, cluster = kmeans_result$cluster)
Visualisation of clusters.
ggplot(pca_subset, aes(x = PC1, y = PC2, color = factor(kmeans_result$cluster))) +
geom_point(size = 3) +
labs(title = "Clusters of Observations in PCA Space", color = "Cluster") +
theme_minimal()
Now we’d like to plot the clustered data with some labels to have a better idea of what the clusters represent:
countries <- climate_data[, 2]
clustered_data <- cbind(Country = countries,
KMeans_Cluster = kmeans_result$cluster)
clustered_data <- as.data.frame(clustered_data)
ggplot(data = cbind(pca_subset, Country = countries, Cluster = factor(kmeans_result$cluster)),
aes(x = PC1, y = PC2, color = Cluster, label = Country)) +
geom_point(size = 3) + # Plot the points
geom_text_repel(size = 3, max.overlaps = 10) +
labs(title = "Clusters of Observations with Country Names in PCA Space",
x = "Principal Component 1",
y = "Principal Component 2",
color = "Cluster") +
theme_minimal()
Now let’s colour the world map based on the clustering results. First we need to handle the name mismatch between the dataset’s and map’s country names
world_map <- map_data("world")
world_map <- as.data.frame(world_map)
world_map$region <- as.character(world_map$region)
dataset_countries <- unique(clustered_data$Country)
map_countries <- unique(world_map$region)
mismatched <- setdiff(dataset_countries, map_countries)
print(mismatched)
## [1] "Afghanistan, Islamic Rep. of" "Africa"
## [3] "Americas" "Andorra, Principality of"
## [5] "Antigua and Barbuda" "Armenia, Rep. of"
## [7] "Aruba, Kingdom of the Netherlands" "Asia"
## [9] "Azerbaijan, Rep. of" "Bahamas, The"
## [11] "Bahrain, Kingdom of" "Belarus, Rep. of"
## [13] "British Virgin Islands" "Brunei Darussalam"
## [15] "Cabo Verde" "Central African Rep."
## [17] "China, P.R.: Hong Kong" "China, P.R.: Macao"
## [19] "China, P.R.: Mainland" "Comoros, Union of the"
## [21] "Congo, Dem. Rep. of the" "Congo, Rep. of"
## [23] "Côte d'Ivoire" "Croatia, Rep. of"
## [25] "Czech Rep." "Dominican Rep."
## [27] "Egypt, Arab Rep. of" "Equatorial Guinea, Rep. of"
## [29] "Eritrea, The State of" "Estonia, Rep. of"
## [31] "Eswatini, Kingdom of" "Ethiopia, The Federal Dem. Rep. of"
## [33] "Europe" "Falkland Islands (Malvinas)"
## [35] "Fiji, Rep. of" "Gambia, The"
## [37] "Gibraltar" "Guiana, French"
## [39] "Holy See" "Iran, Islamic Rep. of"
## [41] "Kazakhstan, Rep. of" "Korea, Dem. People's Rep. of"
## [43] "Korea, Rep. of" "Kyrgyz Rep."
## [45] "Lao People's Dem. Rep." "Lesotho, Kingdom of"
## [47] "Madagascar, Rep. of" "Marshall Islands, Rep. of the"
## [49] "Mauritania, Islamic Rep. of" "Micronesia, Federated States of"
## [51] "Moldova, Rep. of" "Mozambique, Rep. of"
## [53] "Nauru, Rep. of" "Netherlands, The"
## [55] "North Macedonia, Republic of " "Oceania"
## [57] "Palau, Rep. of" "Poland, Rep. of"
## [59] "Réunion" "Russian Federation"
## [61] "San Marino, Rep. of" "São Tomé and Príncipe, Dem. Rep. of"
## [63] "Serbia, Rep. of" "Slovak Rep."
## [65] "Slovenia, Rep. of" "South Sudan, Rep. of"
## [67] "St. Kitts and Nevis" "St. Lucia"
## [69] "St. Vincent and the Grenadines" "Svalbard Jan Mayen Islands"
## [71] "Syrian Arab Rep." "Taiwan Province of China"
## [73] "Tajikistan, Rep. of" "Tanzania, United Rep. of"
## [75] "Timor-Leste, Dem. Rep. of" "Tokelau"
## [77] "Trinidad and Tobago" "Türkiye, Rep. of"
## [79] "Tuvalu" "United Kingdom"
## [81] "United States" "United States Virgin Islands"
## [83] "Uzbekistan, Rep. of" "Venezuela, Rep. Bolivariana de"
## [85] "Wallis and Futuna Islands" "West Bank and Gaza"
## [87] "World" "Yemen, Rep. of"
clustered_data$Country <- countrycode(
clustered_data$Country,
origin = "country.name",
destination = "country.name"
)
clustered_data$Country <- ifelse(
clustered_data$Country == "Côte d'Ivoire",
"Ivory Coast",
clustered_data$Country
)
clustered_data <- clustered_data %>%
filter(!Country %in% c("Africa", "Asia", "World", "Antarctica", "Americas"))
colnames(clustered_data) <- c("Country", "KMeans_Cluster")
clustered_data$Country <- as.character(clustered_data$Country)
clustered_data$Country <- ifelse(clustered_data$Country == "United States", "USA", clustered_data$Country)
Now that the mismatch is handled, we can plot the world map with clustering.
map_data_with_clusters <- left_join(world_map, clustered_data, by = c("region" = "Country"))
# Plot the map
ggplot(map_data_with_clusters, aes(x = long, y = lat, group = group, fill = factor(KMeans_Cluster))) +
geom_polygon(color = "black") +
scale_fill_manual(values = c("green", "blue", "red")) +
labs(title = "Map Colored by KMeans Clusters", fill = "Cluster") +
theme_minimal() +
coord_fixed(ratio = 1.1)
Now we need to perform the same steps, except for the Gas emissions dataset
perform PCA on the gas emissions data
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 5.4645 1.42641 0.25046 0.15161 0.08615 0.07190 0.05440
## Proportion of Variance 0.9331 0.06358 0.00196 0.00072 0.00023 0.00016 0.00009
## Cumulative Proportion 0.9331 0.99672 0.99868 0.99939 0.99963 0.99979 0.99988
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.03792 0.02699 0.02088 0.01791 0.01403 0.01206 0.01043
## Proportion of Variance 0.00004 0.00002 0.00001 0.00001 0.00001 0.00000 0.00000
## Cumulative Proportion 0.99993 0.99995 0.99996 0.99997 0.99998 0.99998 0.99999
## PC15 PC16 PC17 PC18 PC19 PC20
## Standard deviation 0.01001 0.007621 0.006874 0.006775 0.0059 0.005428
## Proportion of Variance 0.00000 0.000000 0.000000 0.000000 0.0000 0.000000
## Cumulative Proportion 0.99999 0.999990 0.999990 0.999990 1.0000 1.000000
## PC21 PC22 PC23 PC24 PC25 PC26
## Standard deviation 0.005289 0.004593 0.004286 0.004059 0.003563 0.003351
## Proportion of Variance 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## Cumulative Proportion 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
## PC27 PC28 PC29 PC30 PC31 PC32
## Standard deviation 0.003022 0.002799 0.002521 0.002247 0.001679 0.001589
## Proportion of Variance 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
## Cumulative Proportion 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
We use only 2 components as they capture 99% of variance.
Clusters with labels
## Warning: ggrepel: 185 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
Country name mismatch handling and map plotting
## [1] "United States" "European Union (27)"
## [3] "Türkiye" "United Kingdom"
## [5] "Côte d'Ivoire" "Trinidad and Tobago"
## [7] "Macedonia" "Eswatini"
## [9] "Antigua and Barbuda" "Saint Kitts and Nevis"
## [11] "Saint Vincent and the Grenadines" "Tuvalu"
## Warning: Some values were not matched unambiguously: European Union (27), Micronesia
We can see that there are 2 extreme outliers: China and USA. The rest of the countries are fairly similar, but countries like Russia, India or Brazil have a little more emissions than others.
However, the patterns seen on the temperature change map do not exactly correlate to the patterns seen on the gas emissions map.
While more precise analysis is needed, this visual representation might provide insight in how gas emissions affect the temperature change for each country.