Nations HW Assignment

Author

Kittim

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RColorBrewer)
# load data
setwd("C:/Users/mutho/Desktop/Fall 2023/Data 110/DATASETS")
nations <- read_csv("nations.csv")
Rows: 5275 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): iso2c, iso3c, country, region, income
dbl (5): year, gdp_percap, population, birth_rate, neonat_mortal_rate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(nations)
# A tibble: 6 × 10
  iso2c iso3c country  year gdp_percap population birth_rate neonat_mortal_rate
  <chr> <chr> <chr>   <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
1 AD    AND   Andorra  1996         NA      64291       10.9                2.8
2 AD    AND   Andorra  1994         NA      62707       10.9                3.2
3 AD    AND   Andorra  2003         NA      74783       10.3                2  
4 AD    AND   Andorra  1990         NA      54511       11.9                4.3
5 AD    AND   Andorra  2009         NA      85474        9.9                1.7
6 AD    AND   Andorra  2011         NA      82326       NA                  1.6
# ℹ 2 more variables: region <chr>, income <chr>

Practice: removing rows where gdp_percap is NA

nations_updated <- nations |>
  filter(!is.na(gdp_percap))
head(nations_updated)
# A tibble: 6 × 10
  iso2c iso3c country   year gdp_percap population birth_rate neonat_mortal_rate
  <chr> <chr> <chr>    <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
1 AE    ARE   United …  1991     73037.    1913190       24.6                7.9
2 AE    ARE   United …  1993     71960.    2127863       22.4                7.3
3 AE    ARE   United …  2001     83534.    3217865       15.8                5.5
4 AE    ARE   United …  1992     73154.    2019014       23.5                7.6
5 AE    ARE   United …  1994     74684.    2238281       21.3                6.9
6 AE    ARE   United …  2007     75427.    6010100       12.8                4.7
# ℹ 2 more variables: region <chr>, income <chr>
# add total GDP in trillions of dollars (per-capita GDP times population)
nations <- read_csv("nations.csv") |>
  mutate(gdp_tod = gdp_percap * population / 10^12)
Rows: 5275 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): iso2c, iso3c, country, region, income
dbl (5): year, gdp_percap, population, birth_rate, neonat_mortal_rate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(nations)
# A tibble: 6 × 11
  iso2c iso3c country  year gdp_percap population birth_rate neonat_mortal_rate
  <chr> <chr> <chr>   <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
1 AD    AND   Andorra  1996         NA      64291       10.9                2.8
2 AD    AND   Andorra  1994         NA      62707       10.9                3.2
3 AD    AND   Andorra  2003         NA      74783       10.3                2  
4 AD    AND   Andorra  1990         NA      54511       11.9                4.3
5 AD    AND   Andorra  2009         NA      85474        9.9                1.7
6 AD    AND   Andorra  2011         NA      82326       NA                  1.6
# ℹ 3 more variables: region <chr>, income <chr>, gdp_tod <dbl>
summary(nations)
    iso2c              iso3c             country               year     
 Length:5275        Length:5275        Length:5275        Min.   :1990  
 Class :character   Class :character   Class :character   1st Qu.:1996  
 Mode  :character   Mode  :character   Mode  :character   Median :2002  
                                                          Mean   :2002  
                                                          3rd Qu.:2008  
                                                          Max.   :2014  
                                                                        
   gdp_percap         population          birth_rate    neonat_mortal_rate
 Min.   :   239.7   Min.   :9.004e+03   Min.   : 6.90   Min.   : 0.70     
 1st Qu.:  2263.6   1st Qu.:7.175e+05   1st Qu.:13.40   1st Qu.: 6.70     
 Median :  6563.2   Median :5.303e+06   Median :21.60   Median :15.00     
 Mean   : 12788.8   Mean   :2.958e+07   Mean   :24.16   Mean   :19.40     
 3rd Qu.: 17195.0   3rd Qu.:1.757e+07   3rd Qu.:33.88   3rd Qu.:29.48     
 Max.   :141968.1   Max.   :1.364e+09   Max.   :55.12   Max.   :73.10     
 NA's   :766        NA's   :14          NA's   :295     NA's   :525       
    region             income             gdp_tod       
 Length:5275        Length:5275        Min.   : 0.0000  
 Class :character   Class :character   1st Qu.: 0.0077  
 Mode  :character   Mode  :character   Median : 0.0324  
                                       Mean   : 0.3259  
                                       3rd Qu.: 0.1849  
                                       Max.   :18.0829  
                                       NA's   :766      
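
As a quick sanity check on the gdp_tod formula, the sketch below applies it by hand to the rounded 1990 figures for Kenya and South Africa that appear in the printed output further down. The data frame `check` is a hypothetical stand-in built for illustration, and because the inputs are the rounded values shown on screen, the results only approximate what the full dataset gives.

```r
library(dplyr)

# Hypothetical two-row stand-in using rounded values copied from the printed output
check <- data.frame(
  country    = c("Kenya", "South Africa"),
  year       = c(1990, 1990),
  gdp_percap = c(1536, 6698),
  population = c(23446229, 35200000)
)

# Same formula as in the notebook: total GDP expressed in trillions of dollars
check <- check |>
  mutate(gdp_tod = gdp_percap * population / 10^12)

print(check)
```

Dividing by 10^12 keeps the y-axis of the later charts in readable units ($ trillion) instead of raw dollar amounts.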

Chart 1: three East African countries and South Africa

east_africa_sa <- nations |>
  filter(iso2c == "KE" | iso2c == "TZ" | iso2c == "UG" | iso2c == "ZA") |>
  arrange(year)
head(east_africa_sa)
# A tibble: 6 × 11
  iso2c iso3c country   year gdp_percap population birth_rate neonat_mortal_rate
  <chr> <chr> <chr>    <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
1 KE    KEN   Kenya     1990      1536.   23446229       42.2               27.4
2 TZ    TZA   Tanzania  1990       952.   25458208       44.1               39.5
3 UG    UGA   Uganda    1990       500.   17384369       49.8               38.5
4 ZA    ZAF   South A…  1990      6698.   35200000       29.3               20.4
5 KE    KEN   Kenya     1991      1557.   24234087       41.1               27.2
6 TZ    TZA   Tanzania  1991       972.   26307482       43.7               38.8
# ℹ 3 more variables: region <chr>, income <chr>, gdp_tod <dbl>
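
The chained `==` comparisons in the filter can also be written with `%in%`, which may be easier to extend if more countries are added later. The sketch below is one possible alternative, not a required change; `toy` and `codes` are hypothetical names, and the toy rows only mimic the iso2c column.

```r
library(dplyr)

# Hypothetical toy data with the same iso2c codes used in the notebook
toy <- data.frame(
  iso2c = c("KE", "TZ", "UG", "ZA", "AD", "AE"),
  year  = c(1990, 1990, 1990, 1990, 1996, 1991)
)

# Country codes to keep: Kenya, Tanzania, Uganda, South Africa
codes <- c("KE", "TZ", "UG", "ZA")

# Original style: one comparison per country, joined with |
with_or <- toy |>
  filter(iso2c == "KE" | iso2c == "TZ" | iso2c == "UG" | iso2c == "ZA")

# Alternative style: a single membership test against the vector of codes
with_in <- toy |>
  filter(iso2c %in% codes)
```

Both versions keep the same four rows; the `%in%` form only needs the vector `codes` edited to change the selection.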
ggplot(east_africa_sa, aes(x = year, y = gdp_tod, color = country)) +
  geom_line() +
  geom_point() +
  ggtitle("Kenya's Position in East Africa and Its Comparison with South Africa") +
  xlab("Year") +
  ylab("GDP ($ trillion)") +
  scale_color_brewer(palette = "Set1") +
  theme(legend.title = element_blank(), legend.key = element_rect()) +
  theme(panel.background = element_rect(fill = "white", colour = "white")) +
  # linewidth replaces the size argument, deprecated in ggplot2 3.4.0
  theme(panel.grid.major = element_line(linewidth = 0.6, linetype = 'solid', colour = "#f0f0f0")) +
  theme(panel.grid.minor = element_line(linewidth = 0.6, linetype = 'solid', colour = "#f0f0f0"))

I’ve chosen to analyze my country’s situation by examining East African countries. Kenya recently engaged in discussions with South Africa, resulting in an agreement that facilitates trade between the two nations. This development has also simplified travel for Kenyan citizens to South Africa. Consequently, I decided to investigate how Kenya is performing in comparison to South Africa.

Chart 2: GDP by region

by_region <- nations |> 
  mutate(gdp_tod = gdp_percap*population/10^12) |>
  group_by(year, region) |> 
  summarise(GDP = sum(gdp_tod, na.rm = TRUE)) |> 
  arrange(year, region)
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
head(by_region)
# A tibble: 6 × 3
# Groups:   year [1]
   year region                       GDP
  <dbl> <chr>                      <dbl>
1  1990 East Asia & Pacific         5.52
2  1990 Europe & Central Asia       9.36
3  1990 Latin America & Caribbean   2.40
4  1990 Middle East & North Africa  1.66
5  1990 North America               6.54
6  1990 South Asia                  1.35
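
The `summarise()` message above says the result is still grouped by year. If an ungrouped table is preferred, one option is to pass `.groups = "drop"`. The sketch below shows this on made-up numbers; `toy` and `by_region_toy` are hypothetical names and the values are not taken from the nations data.

```r
library(dplyr)

# Made-up values, shaped like the year / region / gdp_tod columns
toy <- data.frame(
  year    = c(1990, 1990, 1990, 1991, 1991),
  region  = c("A", "A", "B", "A", "B"),
  gdp_tod = c(1.0, 2.0, NA, 4.0, 5.0)
)

# Sum GDP per year and region, then drop all grouping from the result
by_region_toy <- toy |>
  group_by(year, region) |>
  summarise(GDP = sum(gdp_tod, na.rm = TRUE), .groups = "drop")

print(by_region_toy)
```

Note that with `na.rm = TRUE`, a year and region whose values are all NA sums to 0 rather than NA (region "B" in 1990 in this toy data), which is worth keeping in mind when reading the area chart.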
chat2 <- ggplot(by_region, aes(x = year, y = GDP, fill = region)) +
  geom_area(color = "black") + 
  ggtitle("GDP by World Bank Region") +
  ylab("GDP ($ trillion)") + 
  xlab("Year") + 
  scale_fill_brewer(palette = "Set2") +
  theme(panel.background = element_rect(fill = "white",colour = "white") ) +
  theme(panel.grid.major = element_line(linewidth = 0.6, linetype = 'solid', colour = "#f0f0f0") ) +
  theme(panel.grid.minor = element_line(linewidth = 0.6, linetype = 'solid', colour = "#f0f0f0") )
chat2