I am using the .csv file of Annual Surface Temperature Change from climatedata.imf.org. My goal is to tidy the dataset for analysis so that I can see which regions are most affected by climate change.
Let's read the .csv file into R from my GitHub repository.
# tidyverse provides tibble, dplyr, tidyr, and ggplot2, all used below
library(tidyverse)
climate <- read.csv("https://raw.githubusercontent.com/evelynbartley/Data-607/main/Indicator_3_1_Climate_Indicators_Annual_Mean_Global_Surface_Temperature_577579683071085080.csv")
as_tibble(climate)
## # A tibble: 225 × 72
## ObjectId Country ISO2 ISO3 Indicator Unit Source CTS.Code CTS.Name
## <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 Afghanistan, I… AF AFG Temperat… Degr… Food … ECCS Surface…
## 2 2 Albania AL ALB Temperat… Degr… Food … ECCS Surface…
## 3 3 Algeria DZ DZA Temperat… Degr… Food … ECCS Surface…
## 4 4 American Samoa AS ASM Temperat… Degr… Food … ECCS Surface…
## 5 5 Andorra, Princ… AD AND Temperat… Degr… Food … ECCS Surface…
## 6 6 Angola AO AGO Temperat… Degr… Food … ECCS Surface…
## 7 7 Anguilla AI AIA Temperat… Degr… Food … ECCS Surface…
## 8 8 Antigua and Ba… AG ATG Temperat… Degr… Food … ECCS Surface…
## 9 9 Argentina AR ARG Temperat… Degr… Food … ECCS Surface…
## 10 10 Armenia, Rep. … AM ARM Temperat… Degr… Food … ECCS Surface…
## # ℹ 215 more rows
## # ℹ 63 more variables: CTS.Full.Descriptor <chr>, X1961 <dbl>, X1962 <dbl>,
## # X1963 <dbl>, X1964 <dbl>, X1965 <dbl>, X1966 <dbl>, X1967 <dbl>,
## # X1968 <dbl>, X1969 <dbl>, X1970 <dbl>, X1971 <dbl>, X1972 <dbl>,
## # X1973 <dbl>, X1974 <dbl>, X1975 <dbl>, X1976 <dbl>, X1977 <dbl>,
## # X1978 <dbl>, X1979 <dbl>, X1980 <dbl>, X1981 <dbl>, X1982 <dbl>,
## # X1983 <dbl>, X1984 <dbl>, X1985 <dbl>, X1986 <dbl>, X1987 <dbl>, …
Let’s clean up the dataset so that it includes only the variables we need for analysis. For tidiness, I want to identify each country by its ISO3 code instead of its name, and keep only the years 1961 to 2000.
# Keep the ISO3 code (renamed to Country) and the 1961-2000 columns
climate1 <- climate |>
  select(Country = ISO3, X1961:X2000)
as_tibble(climate1)
## # A tibble: 225 × 41
## Country X1961 X1962 X1963 X1964 X1965 X1966 X1967 X1968 X1969 X1970
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AFG -0.113 -0.164 0.847 -0.764 -0.244 0.226 -0.371 -0.423 -0.539 0.813
## 2 ALB 0.627 0.326 0.075 -0.166 -0.388 0.559 -0.074 0.081 -0.013 -0.106
## 3 DZA 0.164 0.114 0.077 0.25 -0.1 0.433 -0.026 -0.067 0.291 0.116
## 4 ASM 0.079 -0.042 0.169 -0.14 -0.562 0.181 -0.368 -0.187 0.132 -0.047
## 5 AND 0.736 0.112 -0.752 0.308 -0.49 0.415 0.637 0.018 -0.137 0.121
## 6 AGO 0.041 -0.152 -0.19 -0.229 -0.196 0.175 -0.081 -0.193 0.188 0.248
## 7 AIA 0.086 -0.024 0.234 0.189 -0.365 -0.001 -0.257 -0.2 0.317 0.082
## 8 ATG 0.09 0.031 0.288 0.214 -0.385 0.097 -0.192 -0.225 0.271 0.109
## 9 ARG 0.122 -0.046 0.162 -0.343 0.09 -0.163 0 0.472 0.292 0.438
## 10 ARM NA NA NA NA NA NA NA NA NA NA
## # ℹ 215 more rows
## # ℹ 30 more variables: X1971 <dbl>, X1972 <dbl>, X1973 <dbl>, X1974 <dbl>,
## # X1975 <dbl>, X1976 <dbl>, X1977 <dbl>, X1978 <dbl>, X1979 <dbl>,
## # X1980 <dbl>, X1981 <dbl>, X1982 <dbl>, X1983 <dbl>, X1984 <dbl>,
## # X1985 <dbl>, X1986 <dbl>, X1987 <dbl>, X1988 <dbl>, X1989 <dbl>,
## # X1990 <dbl>, X1991 <dbl>, X1992 <dbl>, X1993 <dbl>, X1994 <dbl>,
## # X1995 <dbl>, X1996 <dbl>, X1997 <dbl>, X1998 <dbl>, X1999 <dbl>, …
Instead of having a column for every year, I want one column for the year and one column for the surface temperature change in degrees Celsius.
# Reshape from wide (one column per year) to long (one row per country-year)
climate2 <- climate1 %>%
  pivot_longer(
    cols = starts_with("X"),
    names_to = "Year",
    values_to = "TempChange"
  )
as_tibble(climate2)
## # A tibble: 9,000 × 3
## Country Year TempChange
## <chr> <chr> <dbl>
## 1 AFG X1961 -0.113
## 2 AFG X1962 -0.164
## 3 AFG X1963 0.847
## 4 AFG X1964 -0.764
## 5 AFG X1965 -0.244
## 6 AFG X1966 0.226
## 7 AFG X1967 -0.371
## 8 AFG X1968 -0.423
## 9 AFG X1969 -0.539
## 10 AFG X1970 0.813
## # ℹ 8,990 more rows
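The Year column above is still text with a leading "X" (for example "X1961"). If numeric years are needed later, for sorting or plotting over time, the reshape can strip the prefix and convert in the same step. This is only a sketch on a small toy table (`toy` and `toy_long` are hypothetical names, with values copied from the first rows shown earlier), not the code used to produce the output in this document.

```r
library(tidyr)
library(tibble)

# Toy stand-in for climate1: two countries, two years (values from the output above)
toy <- tibble(
  Country = c("AFG", "ALB"),
  X1961 = c(-0.113, 0.627),
  X1962 = c(-0.164, 0.326)
)

toy_long <- toy |>
  pivot_longer(
    cols = starts_with("X"),
    names_to = "Year",
    names_prefix = "X",                         # drop the leading "X"
    names_transform = list(Year = as.integer),  # store years as integers
    values_to = "TempChange"
  )

toy_long
```

With Year stored as an integer, filters such as `Year >= 1990` work without any string handling.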
To create one value that we can reference for the change in surface temperature from 1961 to 2000, I want to calculate the average change in surface temperature for each country.
# Average temperature change per country, ignoring missing years
climate3 <- climate2 %>%
  group_by(Country) %>%
  summarise(avg = mean(TempChange, na.rm = TRUE))
head(climate3)
## # A tibble: 6 × 2
## Country avg
## <chr> <dbl>
## 1 ABW 0.147
## 2 AFG 0.139
## 3 AGO 0.212
## 4 AIA 0.189
## 5 ALB 0.0844
## 6 AND 0.380
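Because some countries have missing years (Armenia, for example, is NA throughout the 1960s), each average may rest on a different number of observations. A country with no data at all in the window gets NaN from `mean(..., na.rm = TRUE)`. The sketch below, on a hypothetical toy table restricted to 1961 and 1962, shows one way to keep a count of non-missing years next to each average; `toy_long`, `toy_summary`, and `n_years` are illustrative names.

```r
library(dplyr)
library(tibble)

# Toy stand-in for climate2, limited to 1961-1962 (values from the output above)
toy_long <- tibble(
  Country = c("AFG", "AFG", "ARM", "ARM"),
  Year = c("X1961", "X1962", "X1961", "X1962"),
  TempChange = c(-0.113, -0.164, NA, NA)
)

toy_summary <- toy_long |>
  group_by(Country) |>
  summarise(
    avg = mean(TempChange, na.rm = TRUE),  # NaN when every value is missing
    n_years = sum(!is.na(TempChange))      # how many years the average uses
  )

toy_summary
```

In this toy window ARM has `n_years` equal to 0 and an average of NaN, which makes it easy to spot and filter out countries with too little data.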
Let's see the distribution of the average change in temperature. The vertical line marks the mean of the country averages.
avgofavgs <- mean(climate3$avg, na.rm = TRUE)
ggplot(climate3, aes(x = avg)) +
  geom_histogram() +
  geom_vline(aes(xintercept = avgofavgs), color = "tomato", linewidth = 1)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).
Our distribution is looking roughly normal! There do seem to be two outliers, though.
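Both messages printed with the plot can be avoided by dropping non-finite averages before plotting and choosing a bin width explicitly. The following is a sketch on toy data: the first six averages are copied from the output above, the "ZZZ" row is a made-up country with no data, and the bin width of 0.05 is an arbitrary choice rather than a recommendation.

```r
library(ggplot2)
library(dplyr)
library(tibble)

# Toy stand-in for climate3, plus one hypothetical country with no data
toy_avg <- tibble(
  Country = c("ABW", "AFG", "AGO", "AIA", "ALB", "AND", "ZZZ"),
  avg = c(0.147, 0.139, 0.212, 0.189, 0.0844, 0.380, NaN)
)

# Keep only finite averages so stat_bin() has nothing to remove
finite_avg <- toy_avg |> filter(is.finite(avg))
overall_mean <- mean(finite_avg$avg)

p <- ggplot(finite_avg, aes(x = avg)) +
  geom_histogram(binwidth = 0.05) +  # explicit bin width (assumed value)
  geom_vline(xintercept = overall_mean, color = "tomato", linewidth = 1)

p
```

Filtering first also makes it explicit how many countries were left out of the histogram, instead of relying on the warning.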
I want to see which country had the highest average change in temperature and which had the lowest.
climate3[which.min(climate3$avg), ]
## # A tibble: 1 × 2
## Country avg
## <chr> <dbl>
## 1 GRL -0.156
climate3[which.max(climate3$avg), ]
## # A tibble: 1 × 2
## Country avg
## <chr> <dbl>
## 1 LUX 1.65
The country with the lowest average change in temperature was Greenland at -0.156 degrees Celsius, and the country with the highest was Luxembourg at 1.651 degrees Celsius. Sorting the averages in descending order with the small arrows in the climate3 viewer shows that Luxembourg, Belgium, Estonia, Latvia, and Slovenia had the five highest average changes in temperature. All five are in Europe, which suggests that, in this dataset, Europe saw the largest average surface temperature change between 1961 and 2000.
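The ranking described above can also be done in code rather than by clicking in the viewer, which keeps the step reproducible. Here is a minimal sketch on a toy table holding only the first six averages printed earlier, so the result is not the real top five; `toy_avg` and `top3` are illustrative names.

```r
library(dplyr)
library(tibble)

# Toy stand-in for climate3 (first six rows of the output above)
toy_avg <- tibble(
  Country = c("ABW", "AFG", "AGO", "AIA", "ALB", "AND"),
  avg = c(0.147, 0.139, 0.212, 0.189, 0.0844, 0.380)
)

# Sort from largest to smallest average and keep the top three
top3 <- toy_avg |>
  arrange(desc(avg)) |>
  slice_head(n = 3)

top3
```

Applied to the full climate3 table with `n = 5`, the same two steps should list the five countries with the highest averages.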