DATA 110 - Assignment 6 Pt_2

Author

Kalina Peterson

Loading the Dataset & Libraries

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.5.2
Warning: package 'ggplot2' was built under R version 4.5.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ readr     2.1.5
✔ ggplot2   4.0.2     ✔ stringr   1.5.1
✔ lubridate 1.9.4     ✔ tibble    3.3.0
✔ purrr     1.1.0     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(alluvial)
Warning: package 'alluvial' was built under R version 4.5.2
library(ggalluvial)
Warning: package 'ggalluvial' was built under R version 4.5.2
setwd("C:/Users/kpeter81/OneDrive - montgomerycollege.edu/Datasets")
nations <- read_csv("nations.csv")
Rows: 5275 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): iso2c, iso3c, country, region, income
dbl (5): year, gdp_percap, population, birth_rate, neonat_mortal_rate

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Creating a New Variable: GDP (in trillions of dollars)

  • For both charts, you will first need to create a new variable in the data, using mutate from dplyr, giving the GDP of each country in trillions of dollars, by multiplying gdp_percap by population and dividing by a trillion.
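nations1 <- nations |>
  # Drop rows with no per-capita GDP, since GDP cannot be computed for them
  filter(!is.na(gdp_percap)) |>
  # Total GDP in trillions of dollars (1e12 = one trillion)
  mutate(gdp = (gdp_percap * population) / 1e12)
head(nations1)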
# A tibble: 6 × 11
  iso2c iso3c country   year gdp_percap population birth_rate neonat_mortal_rate
  <chr> <chr> <chr>    <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
1 AE    ARE   United …  1991     73037.    1913190       24.6                7.9
2 AE    ARE   United …  1993     71960.    2127863       22.4                7.3
3 AE    ARE   United …  2001     83534.    3217865       15.8                5.5
4 AE    ARE   United …  1992     73154.    2019014       23.5                7.6
5 AE    ARE   United …  1994     74684.    2238281       21.3                6.9
6 AE    ARE   United …  2007     75427.    6010100       12.8                4.7
# ℹ 3 more variables: region <chr>, income <chr>, gdp <dbl>
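
As a quick sanity check on the unit conversion, the sketch below applies the same filter and mutate steps to a small made-up tibble. The countries and figures are hypothetical and are not taken from nations.csv; they are chosen so the arithmetic is easy to verify by hand (50,000 dollars per person times 20 million people is one trillion dollars).

```r
library(dplyr)

# Hypothetical toy data: NOT from nations.csv, chosen so the arithmetic is easy to check
toy <- tibble::tibble(
  country    = c("A", "B", "C"),
  gdp_percap = c(50000, 25000, NA),
  population = c(20e6, 20e6, 5e6)
)

toy_gdp <- toy |>
  filter(!is.na(gdp_percap)) |>                  # drop rows with no per-capita GDP
  mutate(gdp = gdp_percap * population / 1e12)   # dollars -> trillions of dollars

toy_gdp$gdp
```

Country C is dropped by the filter, and the two remaining rows come out as 1 and 0.5 trillion dollars.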

  • Draw both charts with ggplot2.

  • For the first chart, you will need to filter the data with dplyr for the four desired countries. When making the chart with ggplot2 you will need to add both geom_point and geom_line layers, and use the Set1 ColorBrewer palette using: scale_color_brewer(palette = "Set1").

nations2 <- nations1 |>
  filter(country %in% c("Japan", "Germany", "United States", "China"))
head(nations2)
# A tibble: 6 × 11
  iso2c iso3c country  year gdp_percap population birth_rate neonat_mortal_rate
  <chr> <chr> <chr>   <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
1 CN    CHN   China    1992      1260. 1164970000       18.3               29.4
2 CN    CHN   China    2005      5053. 1303720000       12.4               14  
3 CN    CHN   China    2000      2915. 1262645000       14.0               21.2
4 CN    CHN   China    1991      1091. 1150780000       19.7               29.7
5 CN    CHN   China    2013     12219. 1357380000       12.1                6.3
6 CN    CHN   China    1999      2650. 1252735000       14.6               22.2
# ℹ 3 more variables: region <chr>, income <chr>, gdp <dbl>
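
Because %in% quietly returns no rows for a name that does not match exactly (for example "USA" instead of "United States"), it can be worth confirming that every requested country was found. The following is a minimal sketch on made-up data; `wanted`, `toy`, and `missing_countries` are hypothetical names used only for illustration.

```r
library(dplyr)

wanted <- c("Japan", "Germany", "United States", "China")

# Hypothetical stand-in for the nations data
toy <- tibble::tibble(
  country = c("China", "Japan", "Germany", "United States", "France"),
  year    = 2014
)

toy_four <- toy |>
  filter(country %in% wanted)

# Names that were requested but are absent from the filtered result;
# an empty character vector means every country matched
missing_countries <- setdiff(wanted, unique(toy_four$country))
missing_countries
```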
p1 <- nations2 |>
  ggplot(aes(x = year, y = gdp, color = country)) +
  theme_minimal() +
  geom_point() +
  geom_line() +
  # Points and lines are drawn with the color aesthetic, so Set1 goes on the color scale
  scale_color_brewer(palette = "Set1") +
  labs(title = "China's Rise to Become the Largest Economy",
       y = "GDP ($ trillion)",
       x = "Year")
p1

  • For the second chart, using dplyr you will need to group_by region and year, and then summarize on your mutated value for gdp using summarise(GDP = sum(gdp, na.rm = TRUE)). (There will be null values, or NAs, in this data, so you will need to use na.rm = TRUE.)

  • Each region's area will be generated by the command geom_area().

  • When drawing the chart with ggplot2, you will need to use the Set2 ColorBrewer palette using scale_fill_brewer(palette = "Set2").

  • Think about the difference between fill and color when making the chart, and where the above fill command needs to go in order for the regions to fill with the different colors when making the chart, and put a very thin white line around each area.

nations3 <- nations1 |>
  group_by(region, year) |> 
  summarise(GDP = sum(gdp, na.rm = TRUE))
`summarise()` has grouped output by 'region'. You can override using the
`.groups` argument.
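
The message above is informational: the result is still grouped by region. One way to silence it and get an ungrouped tibble back is the `.groups` argument it mentions. A minimal sketch on made-up data (the `toy` values are hypothetical, not from nations.csv):

```r
library(dplyr)

# Hypothetical toy data, not from nations.csv
toy <- tibble::tibble(
  region = c("X", "X", "X", "Y", "Y"),
  year   = c(2000, 2000, 2001, 2000, 2001),
  gdp    = c(1, 2, NA, 4, 5)
)

toy_regions <- toy |>
  group_by(region, year) |>
  # .groups = "drop" returns an ungrouped tibble and suppresses the message
  summarise(GDP = sum(gdp, na.rm = TRUE), .groups = "drop")

toy_regions
is_grouped_df(toy_regions)
```

Note that a group whose values are all NA sums to 0 under na.rm = TRUE, as region X does for 2001 in this toy data.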
p2 <- nations3 |>
  ggplot(aes(x = year, y = GDP, fill = region)) +
  # fill is mapped to region inside aes(); the outline is a constant thin white line
  geom_area(color = "white", linewidth = 0.1) +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "GDP by World Bank Region",
       y = "GDP ($ Trillions)",
       fill = "World Bank Region")
p2
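
The fill versus color distinction raised in the bullets can be checked without looking at the rendered chart. The sketch below is an illustration on made-up data (`toy` and `p_toy` are hypothetical names): it builds a small area chart and inspects the computed layer data with layer_data(), assuming a ggplot2 version that supports the linewidth argument (3.4.0 or later).

```r
library(ggplot2)

# Hypothetical toy data, not from nations.csv
toy <- data.frame(
  year   = rep(c(2000, 2001, 2002), times = 2),
  region = rep(c("X", "Y"), each = 3),
  GDP    = c(1, 2, 3, 2, 2, 4)
)

p_toy <- ggplot(toy, aes(x = year, y = GDP, fill = region)) +
  geom_area(color = "white", linewidth = 0.1) +  # constant outline, set outside aes()
  scale_fill_brewer(palette = "Set2")            # palette applies to the mapped fill

built    <- layer_data(p_toy)
fills    <- unique(built$fill)    # one fill per region, drawn from the palette
outlines <- unique(built$colour)  # the single constant outline color

fills
outlines
```

The fill column varies by region because fill is mapped inside aes() and controlled by scale_fill_brewer, while the outline is the same for every area because color is set as a constant in geom_area.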