Project 2 - part 2

Overview

I am using the Annual Surface Temperature Change dataset from climatedata.imf.org, downloaded as a .csv file. My goal is to tidy the dataset for analysis and see which regions are most affected by climate change.

Let's read the .csv file into R from my GitHub repository.

library(tidyverse)  # provides select(), pivot_longer(), summarise(), ggplot(), tibble()

climate <- read.csv("https://raw.githubusercontent.com/evelynbartley/Data-607/main/Indicator_3_1_Climate_Indicators_Annual_Mean_Global_Surface_Temperature_577579683071085080.csv")
tibble(climate)

## # A tibble: 225 × 72
##    ObjectId Country         ISO2  ISO3  Indicator Unit  Source CTS.Code CTS.Name
##       <int> <chr>           <chr> <chr> <chr>     <chr> <chr>  <chr>    <chr>   
##  1        1 Afghanistan, I… AF    AFG   Temperat… Degr… Food … ECCS     Surface…
##  2        2 Albania         AL    ALB   Temperat… Degr… Food … ECCS     Surface…
##  3        3 Algeria         DZ    DZA   Temperat… Degr… Food … ECCS     Surface…
##  4        4 American Samoa  AS    ASM   Temperat… Degr… Food … ECCS     Surface…
##  5        5 Andorra, Princ… AD    AND   Temperat… Degr… Food … ECCS     Surface…
##  6        6 Angola          AO    AGO   Temperat… Degr… Food … ECCS     Surface…
##  7        7 Anguilla        AI    AIA   Temperat… Degr… Food … ECCS     Surface…
##  8        8 Antigua and Ba… AG    ATG   Temperat… Degr… Food … ECCS     Surface…
##  9        9 Argentina       AR    ARG   Temperat… Degr… Food … ECCS     Surface…
## 10       10 Armenia, Rep. … AM    ARM   Temperat… Degr… Food … ECCS     Surface…
## # ℹ 215 more rows
## # ℹ 63 more variables: CTS.Full.Descriptor <chr>, X1961 <dbl>, X1962 <dbl>,
## #   X1963 <dbl>, X1964 <dbl>, X1965 <dbl>, X1966 <dbl>, X1967 <dbl>,
## #   X1968 <dbl>, X1969 <dbl>, X1970 <dbl>, X1971 <dbl>, X1972 <dbl>,
## #   X1973 <dbl>, X1974 <dbl>, X1975 <dbl>, X1976 <dbl>, X1977 <dbl>,
## #   X1978 <dbl>, X1979 <dbl>, X1980 <dbl>, X1981 <dbl>, X1982 <dbl>,
## #   X1983 <dbl>, X1984 <dbl>, X1985 <dbl>, X1986 <dbl>, X1987 <dbl>, …
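The year columns show up as X1961, X1962, and so on. The file's header most likely has bare years instead. read.csv() defaults to check.names = TRUE, which passes the header through make.names(), and because a syntactic R name cannot start with a digit, it prepends an "X". The minimal sketch below uses a small hypothetical inline CSV rather than the real file to show both behaviors:

```r
# Hypothetical inline CSV standing in for the real file
csv_text <- "ISO3,1961,1962\nAAA,0.1,-0.2\nBBB,0.3,0.4"

# Default check.names = TRUE: make.names() turns "1961" into "X1961"
default_names <- names(read.csv(text = csv_text))

# check.names = FALSE keeps the header exactly as it appears in the file
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))

print(default_names)
print(raw_names)
```

The rest of this analysis keeps the default X names, and the code that follows relies on them.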

Next, let's trim the dataset down to the variables we need for analysis. To keep the table tidy, I use each country's ISO3 code instead of its name, and I keep only the yearly columns from 1961 through 2000.

climate1 <- climate |> 
  # keep the ISO3 code (renamed to Country) and the yearly columns 1961 through 2000
  select(Country = ISO3, X1961:X2000)
tibble(climate1)

## # A tibble: 225 × 41
##    Country  X1961  X1962  X1963  X1964  X1965  X1966  X1967  X1968  X1969  X1970
##    <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
##  1 AFG     -0.113 -0.164  0.847 -0.764 -0.244  0.226 -0.371 -0.423 -0.539  0.813
##  2 ALB      0.627  0.326  0.075 -0.166 -0.388  0.559 -0.074  0.081 -0.013 -0.106
##  3 DZA      0.164  0.114  0.077  0.25  -0.1    0.433 -0.026 -0.067  0.291  0.116
##  4 ASM      0.079 -0.042  0.169 -0.14  -0.562  0.181 -0.368 -0.187  0.132 -0.047
##  5 AND      0.736  0.112 -0.752  0.308 -0.49   0.415  0.637  0.018 -0.137  0.121
##  6 AGO      0.041 -0.152 -0.19  -0.229 -0.196  0.175 -0.081 -0.193  0.188  0.248
##  7 AIA      0.086 -0.024  0.234  0.189 -0.365 -0.001 -0.257 -0.2    0.317  0.082
##  8 ATG      0.09   0.031  0.288  0.214 -0.385  0.097 -0.192 -0.225  0.271  0.109
##  9 ARG      0.122 -0.046  0.162 -0.343  0.09  -0.163  0      0.472  0.292  0.438
## 10 ARM     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA    
## # ℹ 215 more rows
## # ℹ 30 more variables: X1971 <dbl>, X1972 <dbl>, X1973 <dbl>, X1974 <dbl>,
## #   X1975 <dbl>, X1976 <dbl>, X1977 <dbl>, X1978 <dbl>, X1979 <dbl>,
## #   X1980 <dbl>, X1981 <dbl>, X1982 <dbl>, X1983 <dbl>, X1984 <dbl>,
## #   X1985 <dbl>, X1986 <dbl>, X1987 <dbl>, X1988 <dbl>, X1989 <dbl>,
## #   X1990 <dbl>, X1991 <dbl>, X1992 <dbl>, X1993 <dbl>, X1994 <dbl>,
## #   X1995 <dbl>, X1996 <dbl>, X1997 <dbl>, X1998 <dbl>, X1999 <dbl>, …
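X1961:X2000 selects columns by position. It therefore assumes the year columns sit next to each other and in order. A name-based alternative is num_range() from tidyselect (re-exported by dplyr), which builds column names from a prefix and a numeric range. The sketch below runs on a hypothetical toy data frame, not the real dataset:

```r
library(dplyr)

# Hypothetical toy data mimicking the IMF column layout
toy <- data.frame(
  ISO3  = c("AAA", "BBB"),
  X1960 = c(0.1, 0.2),
  X1961 = c(0.3, 0.4),
  X1962 = c(0.5, 0.6)
)

# num_range("X", 1961:1962) refers to X1961 and X1962 by name,
# so the selection does not depend on column order
toy_sel <- toy |>
  select(Country = ISO3, num_range("X", 1961:1962))

print(names(toy_sel))
```

On the real data, select(Country = ISO3, num_range("X", 1961:2000)) would give the same selection as the X1961:X2000 range.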

Instead of having a column for every year, I want one column for the year and one column for the surface temperature change in degrees Celsius.

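climate2 <- climate1 |>
  pivot_longer(
    cols = starts_with("X"),   # every year column (X1961 through X2000)
    names_to = "Year",         # former column names go into Year
    values_to = "TempChange"   # cell values go into TempChange
  )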
tibble(climate2)

## # A tibble: 9,000 × 3
##    Country Year  TempChange
##    <chr>   <chr>      <dbl>
##  1 AFG     X1961     -0.113
##  2 AFG     X1962     -0.164
##  3 AFG     X1963      0.847
##  4 AFG     X1964     -0.764
##  5 AFG     X1965     -0.244
##  6 AFG     X1966      0.226
##  7 AFG     X1967     -0.371
##  8 AFG     X1968     -0.423
##  9 AFG     X1969     -0.539
## 10 AFG     X1970      0.813
## # ℹ 8,990 more rows
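The Year column now holds strings such as "X1961". If numeric years are needed later (for example, to plot change over time), pivot_longer() can strip the prefix and convert the type in the same step through its names_prefix and names_transform arguments. The sketch below uses a hypothetical two-country toy table shaped like climate1:

```r
library(tidyr)

# Hypothetical toy data in the same wide layout as climate1
toy_wide <- data.frame(
  Country = c("AAA", "BBB"),
  X1961   = c(0.1, NA),
  X1962   = c(-0.2, 0.3)
)

toy_long <- toy_wide |>
  pivot_longer(
    cols = starts_with("X"),
    names_to = "Year",
    names_prefix = "X",                         # drop the leading "X" from each column name
    names_transform = list(Year = as.integer),  # store Year as an integer, not a string
    values_to = "TempChange"
  )

print(toy_long)
```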

To get a single value that summarizes each country's surface temperature change from 1961 to 2000, I calculate the average change for each country.

climate3 <- climate2 |>
  group_by(Country) |>
  # average over the available years; NA years are skipped
  summarise(avg = mean(TempChange, na.rm = TRUE))
head(climate3)

## # A tibble: 6 × 2
##   Country    avg
##   <chr>    <dbl>
## 1 ABW     0.147 
## 2 AFG     0.139 
## 3 AGO     0.212 
## 4 AIA     0.189 
## 5 ALB     0.0844
## 6 AND     0.380
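Because na.rm = TRUE averages whatever years happen to be present, a country with only a few recorded years is summarized just as confidently as one with all 40. Counting the non-missing years next to the average is one way to gauge how far each average can be trusted. The sketch below runs on hypothetical toy data shaped like climate2:

```r
library(dplyr)

# Hypothetical toy long data in the same shape as climate2
toy_long <- data.frame(
  Country    = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  Year       = c(1961L, 1962L, 1963L, 1961L, 1962L, 1963L),
  TempChange = c(0.2, 0.4, NA, NA, NA, 1.5)
)

toy_summary <- toy_long |>
  group_by(Country) |>
  summarise(
    avg     = mean(TempChange, na.rm = TRUE),  # same average as in climate3
    n_years = sum(!is.na(TempChange))          # number of years behind that average
  )

print(toy_summary)
```

Applying the same summarise() call to climate2 would add an n_years column to climate3.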

Let's look at the distribution of the average change in temperature. The red vertical line marks the mean of the country averages.

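# mean of the per-country averages (NaN averages are excluded by na.rm = TRUE)
avgofavgs <- mean(climate3$avg, na.rm = TRUE)
ggplot(climate3, aes(x = avg)) +
  geom_histogram() +
  # xintercept is one constant, so it goes outside aes() and is drawn once
  geom_vline(xintercept = avgofavgs, color = "tomato", linewidth = 1)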

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 4 rows containing non-finite values (`stat_bin()`).

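The distribution looks roughly normal, although there seem to be two outliers. The warning above means that 4 countries have no temperature values between 1961 and 2000: mean() with na.rm = TRUE returns NaN for them, so ggplot drops those rows.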
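The rows behind that warning can be listed by filtering for NaN averages, and removing them before plotting avoids the warning. The sketch below uses a hypothetical toy version of climate3:

```r
library(dplyr)

# Hypothetical toy summary in the same shape as climate3
toy_avg <- data.frame(
  Country = c("AAA", "BBB", "CCC"),
  avg     = c(0.15, NaN, 0.38)
)

# mean(x, na.rm = TRUE) returns NaN when every value of x is NA,
# so these are the countries with no data in the selected years
no_data <- toy_avg |> filter(is.nan(avg))

# keep only countries with a usable average, e.g. before plotting
toy_avg_clean <- toy_avg |> filter(!is.nan(avg))

print(no_data)
print(toy_avg_clean)
```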

Next, I want to see which country had the highest average change in temperature and which had the lowest. which.min() and which.max() skip NaN values, so the countries without data do not interfere.

climate3[which.min(climate3$avg), ]

## # A tibble: 1 × 2
##   Country    avg
##   <chr>    <dbl>
## 1 GRL     -0.156

climate3[which.max(climate3$avg), ]

## # A tibble: 1 × 2
##   Country   avg
##   <chr>   <dbl>
## 1 LUX      1.65
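which.min() and which.max() return only one country each. To compare several countries at each extreme, dplyr's slice_max() and slice_min() return the n rows with the largest or smallest values, sorted. The sketch below runs on hypothetical toy data and drops NaN averages first so they cannot be selected:

```r
library(dplyr)

# Hypothetical toy summary in the same shape as climate3
toy_avg <- data.frame(
  Country = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  avg     = c(0.15, -0.10, 1.20, NaN, 0.60)
)

# is.na() is TRUE for NaN, so this removes countries without data
toy_valid <- toy_avg |> filter(!is.na(avg))

# the two largest and the two smallest averages, each sorted
warmest <- toy_valid |> slice_max(avg, n = 2)
coolest <- toy_valid |> slice_min(avg, n = 2)

print(warmest)
print(coolest)
```

On the real data, climate3 |> filter(!is.na(avg)) |> slice_max(avg, n = 5) would list the five countries with the highest average change.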

Evelyn Bartley

2024-03-02

Analysis and Conclusion