Nations homework

Author

E Choi

## Load the dataset

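library(tidyverse)   # also attaches ggplot2 and dplyr
library(RColorBrewer)
# Machine-specific working directory; adjust the path to wherever nations.csv lives
setwd("C:/Users/enomc/OneDrive - montgomerycollege.edu/Documents/Data Science")
nations <- read_csv("nations.csv")
# Total GDP in trillions: per-capita GDP times population, divided by 10^12
national <- nations |>
  mutate(gdp = (gdp_percap * population) / 10^12)
# Subset of four countries for the line chart
nationals3 <- national |>
  filter(country %in% c("Angola", "Armenia", "Argentina", "Austria"))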
national
# A tibble: 5,275 × 11
   iso2c iso3c country  year gdp_percap population birth_rate neonat_mortal_rate
   <chr> <chr> <chr>   <dbl>      <dbl>      <dbl>      <dbl>              <dbl>
 1 AD    AND   Andorra  1996         NA      64291       10.9                2.8
 2 AD    AND   Andorra  1994         NA      62707       10.9                3.2
 3 AD    AND   Andorra  2003         NA      74783       10.3                2  
 4 AD    AND   Andorra  1990         NA      54511       11.9                4.3
 5 AD    AND   Andorra  2009         NA      85474        9.9                1.7
 6 AD    AND   Andorra  2011         NA      82326       NA                  1.6
 7 AD    AND   Andorra  2004         NA      78337       10.9                2  
 8 AD    AND   Andorra  2010         NA      84419        9.8                1.7
 9 AD    AND   Andorra  2001         NA      67770       11.8                2.1
10 AD    AND   Andorra  2002         NA      71046       11.2                2.1
# ℹ 5,265 more rows
# ℹ 3 more variables: region <chr>, income <chr>, gdp <dbl>
aplot <- nationals3 |>
  ggplot(aes(x = year, y = gdp_percap, color = country)) +
  geom_line() +
  geom_point() +
  scale_color_brewer(palette = "Set1") +
  labs(color = "Country",
       y = "GDP per Capita",
       title = "Austria's road to the largest economy",
       caption = "Source: Nations dataset")
aplot
Warning: Removed 25 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning: Removed 25 rows containing missing values or values outside the scale range
(`geom_point()`).
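
The "Removed 25 rows" warnings most likely come from rows where the plotted variable is NA. As a minimal sketch, using made-up toy data rather than the nations dataset, such rows could be dropped before plotting so that the removal is explicit. The names `toy` and `toy_complete` are hypothetical.

```r
library(dplyr)

# Toy data invented for illustration only (not from nations.csv)
toy <- tibble(
  country = c("A", "A", "B", "B"),
  year = c(2000, 2001, 2000, 2001),
  gdp_percap = c(100, NA, 200, 250)
)

# Keep only the rows where the plotted variable is present
toy_complete <- toy |>
  filter(!is.na(gdp_percap))

nrow(toy_complete)
```

Filtering first does not change what is drawn; it only makes the dropped rows a deliberate step instead of a warning.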

cplot <- national |>
  group_by(region, year) |>
  summarise(sum_GDP = sum(gdp, na.rm = TRUE))
`summarise()` has grouped output by 'region'. You can override using the
`.groups` argument.
cplot
# A tibble: 175 × 3
# Groups:   region [7]
   region               year sum_GDP
   <chr>               <dbl>   <dbl>
 1 East Asia & Pacific  1990    5.52
 2 East Asia & Pacific  1991    6.03
 3 East Asia & Pacific  1992    6.50
 4 East Asia & Pacific  1993    7.04
 5 East Asia & Pacific  1994    7.64
 6 East Asia & Pacific  1995    8.29
 7 East Asia & Pacific  1996    8.96
 8 East Asia & Pacific  1997    9.55
 9 East Asia & Pacific  1998    9.60
10 East Asia & Pacific  1999   10.1 
# ℹ 165 more rows
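
The `summarise()` message above mentions the `.groups` argument. A small sketch of how it could be used, on a hypothetical toy table (`toy` and `toy_sums` are illustrative names, and the values are invented):

```r
library(dplyr)

# Toy data invented for illustration only
toy <- tibble(
  region = c("North", "North", "South", "South"),
  year = c(1990, 1990, 1990, 1991),
  gdp = c(1.5, 0.5, 2.0, 3.0)
)

# .groups = "drop" returns an ungrouped tibble and silences the message
toy_sums <- toy |>
  group_by(region, year) |>
  summarise(sum_GDP = sum(gdp, na.rm = TRUE), .groups = "drop")

is_grouped_df(toy_sums)
```

Dropping the grouping is optional here. It mainly helps if later steps should not inherit the `region` grouping.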
bplot <- cplot |>
  ggplot(aes(x = year, y = sum_GDP, fill = region)) +
  # One stacked area layer; white outlines separate the regions
  geom_area(position = "stack", stat = "identity", color = "white") +
  scale_fill_brewer(palette = "Set2") +
  labs(fill = "Region",
       y = "Gross Domestic Product in Trillions",
       title = "GDP by Region",
       caption = "Source: Nations dataset")
bplot

First, I loaded the required libraries and the nations dataset in my first chunk. Then I used mutate() from dplyr to create a new data frame called national with an added gdp column: GDP per capita multiplied by population, divided by 10^12 so that the values are in trillions. I filtered the data to four countries for my first chart and used ggplot to draw both charts. For the first chart, I added geom_line and geom_point layers and applied the Set1 ColorBrewer palette. For the second chart, I used dplyr again to group by region and year, then used summarise() with na.rm = TRUE so that missing (NA) values would be skipped and the sums could still be computed.
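
To make the arithmetic concrete, here is a minimal sketch on hypothetical values chosen so the result is easy to check by hand. The table `toy` and its numbers are invented for illustration and are not taken from the nations dataset.

```r
library(dplyr)

# Hypothetical values chosen for easy arithmetic
toy <- tibble(
  country = c("X", "Y", "Z"),
  gdp_percap = c(50000, 10000, NA),
  population = c(2e7, 1e8, 5e6)
)

# Same formula as in the homework: per-capita GDP times population, in trillions
toy_gdp <- toy |>
  mutate(gdp = (gdp_percap * population) / 10^12)

toy_gdp$gdp                      # 1, 1, NA
sum(toy_gdp$gdp)                 # NA, because one value is missing
sum(toy_gdp$gdp, na.rm = TRUE)   # 2, the missing value is skipped
```

Without na.rm = TRUE, a single NA makes the whole sum NA, which is why the argument matters when totals are computed per region and year.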