Nations Data Project

Author

Brian Caceres

message = FALSE
library(tidyr)
library(ggplot2)
library(RColorBrewer)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ lubridate 1.9.2     ✔ tibble    3.2.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
getwd()
[1] "/Users/briancaceres/Desktop/Data_110"
setwd("/Users/briancaceres/Desktop/Data_110")
nations_data <- read_csv("nations.csv")
Rows: 5275 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): iso2c, iso3c, country, region, income
dbl (5): year, gdp_percap, population, birth_rate, neonat_mortal_rate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
nations_data
# A tibble: 5,275 × 10
   iso2c iso3c country  year gdp_percap population birth_rate neonat_mortal_rate
   <chr> <chr> <chr>   <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
 1 AD    AND   Andorra  1996         NA      64291       10.9                2.8
 2 AD    AND   Andorra  1994         NA      62707       10.9                3.2
 3 AD    AND   Andorra  2003         NA      74783       10.3                2  
 4 AD    AND   Andorra  1990         NA      54511       11.9                4.3
 5 AD    AND   Andorra  2009         NA      85474        9.9                1.7
 6 AD    AND   Andorra  2011         NA      82326       NA                  1.6
 7 AD    AND   Andorra  2004         NA      78337       10.9                2  
 8 AD    AND   Andorra  2010         NA      84419        9.8                1.7
 9 AD    AND   Andorra  2001         NA      67770       11.8                2.1
10 AD    AND   Andorra  2002         NA      71046       11.2                2.1
# ℹ 5,265 more rows
# ℹ 2 more variables: region <chr>, income <chr>

We create a new data set to pass to geom_point. First we add a new column ‘gdp’ that gives each country’s total GDP for each year, in trillions of dollars (GDP per capita times population, divided by 10^12). We then use the select function to keep only the variables we need for the plot (country, year, and our new column ‘gdp’) and filter down to the four countries we want to compare.

# Compute total GDP in trillions of dollars, then keep the four countries to compare
chart1 <- nations_data |>
  drop_na(gdp_percap) |>
  group_by(country) |> 
  mutate(gdp = (gdp_percap * population) / 10^12) |>
  select(country, year, gdp) |>
  filter(country %in% c("China", "Germany", "Japan", "United States"))
chart1
# A tibble: 100 × 3
# Groups:   country [4]
   country  year   gdp
   <chr>   <dbl> <dbl>
 1 China    1992  1.47
 2 China    2005  6.59
 3 China    2000  3.68
 4 China    1991  1.26
 5 China    2013 16.6 
 6 China    1999  3.32
 7 China    2014 18.1 
 8 China    2003  5.07
 9 China    2004  5.73
10 China    1993  1.71
# ℹ 90 more rows
ggplot(chart1, aes(x = year, y = gdp, color = country)) +
  geom_point() +
  labs(x = "Year",
       y = "GDP ($ trillion)",
       title = "China's Rise to Become the Largest Economy") +
  scale_color_brewer(palette = "Set1") +
  geom_line()
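
As a quick sanity check of the GDP calculation used for chart1, the sketch below applies the same mutate step to a tiny made-up table. The tibble `toy` and all of its values are hypothetical and are not taken from nations.csv; only the formula mirrors the code above.

```r
library(dplyr)
library(tibble)

# Hypothetical values for illustration only (not from nations.csv)
toy <- tibble(
  country    = c("A", "B"),
  year       = c(2000, 2000),
  gdp_percap = c(50000, 5000),
  population = c(2e8, 1e9)
)

# Same formula as chart1: GDP per capita times population, in trillions of dollars
toy_gdp <- toy |>
  mutate(gdp = (gdp_percap * population) / 10^12) |>
  select(country, year, gdp)

toy_gdp
```

With these made-up numbers, country A works out to 10 and country B to 5 (trillion dollars), which makes it easy to confirm that the unit conversion behaves as intended.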

This is similar to the chart above, but now we use the group_by function together with the summarise function to sum the gdp by region and year. The result has one row per region and year, with the columns region, year, and GDP.

# Compute each country's GDP in trillions, then total it by region and year
chart2 <- nations_data |> 
  mutate(gdp = (gdp_percap * population) / 10^12) |>
  select(region, country, year, gdp) |>
  group_by(region, year) |>
  summarise(GDP = sum(gdp, na.rm = TRUE))
`summarise()` has grouped output by 'region'. You can override using the
`.groups` argument.
chart2
# A tibble: 175 × 3
# Groups:   region [7]
   region               year   GDP
   <chr>               <dbl> <dbl>
 1 East Asia & Pacific  1990  5.52
 2 East Asia & Pacific  1991  6.03
 3 East Asia & Pacific  1992  6.50
 4 East Asia & Pacific  1993  7.04
 5 East Asia & Pacific  1994  7.64
 6 East Asia & Pacific  1995  8.29
 7 East Asia & Pacific  1996  8.96
 8 East Asia & Pacific  1997  9.55
 9 East Asia & Pacific  1998  9.60
10 East Asia & Pacific  1999 10.1 
# ℹ 165 more rows
ggplot(chart2, aes(x = year, 
                   y = GDP, 
                   fill = region)) +
  geom_area(color = "white") +
  scale_fill_brewer(palette = "Set2") +
  labs(x = "Year",
       y = "GDP ($ trillion)", 
       title = "GDP by World Bank Region")
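
To see what the group_by and summarise step does, here is a minimal sketch on a made-up table. The tibble `toy` and its values are hypothetical. The `.groups = "drop"` argument is an optional addition that is not used in the chart2 code above; it returns an ungrouped result and silences the `summarise()` grouping message.

```r
library(dplyr)
library(tibble)

# Hypothetical values for illustration only (not from nations.csv)
toy <- tibble(
  region = c("North", "North", "North", "South"),
  year   = c(2000, 2000, 2001, 2000),
  gdp    = c(1.5, NA, 2.0, 0.5)
)

# Sum gdp within each region and year; na.rm = TRUE skips missing values
toy_region <- toy |>
  group_by(region, year) |>
  summarise(GDP = sum(gdp, na.rm = TRUE), .groups = "drop")

toy_region
```

The missing value in the North 2000 group is ignored rather than turning the whole sum into NA, which is why chart2 can total GDP for regions where some countries have no gdp_percap recorded.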