nations <- read.csv("~/Downloads/Data 101 and Data 110 class/Data 110/Data Sets/nations.csv")
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
p1) For both charts, you will first need to create a new variable in the data, using mutate from dplyr, giving the GDP of each country in trillions of dollars, by multiplying gdp_percap by population and dividing by a trillion.
p1 <- nations |>
  mutate(gdp = gdp_percap * population / 10^12)
head(p1)
iso2c iso3c country year gdp_percap population birth_rate neonat_mortal_rate
1 AD AND Andorra 1996 NA 64291 10.9 2.8
2 AD AND Andorra 1994 NA 62707 10.9 3.2
3 AD AND Andorra 2003 NA 74783 10.3 2.0
4 AD AND Andorra 1990 NA 54511 11.9 4.3
5 AD AND Andorra 2009 NA 85474 9.9 1.7
6 AD AND Andorra 2011 NA 82326 NA 1.6
region income gdp
1 Europe & Central Asia High income NA
2 Europe & Central Asia High income NA
3 Europe & Central Asia High income NA
4 Europe & Central Asia High income NA
5 Europe & Central Asia High income NA
6 Europe & Central Asia High income NA
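The gdp column is NA for Andorra because gdp_percap is missing in those rows, and arithmetic on NA returns NA in R. As a minimal sketch, the same mutate step can be run on a two-row toy data frame. The name toy is hypothetical; the values are copied from the 1996 rows printed for Andorra and Bolivia in this document.

```r
library(dplyr)

# Toy data frame (hypothetical name); values copied from the printed 1996 rows
toy <- data.frame(
  country    = c("Andorra", "Bolivia"),
  gdp_percap = c(NA, 3119.282),
  population = c(64291, 7717445)
)

# Same calculation as p1: GDP in trillions of dollars
toy <- toy |>
  mutate(gdp = gdp_percap * population / 10^12)

# First value is NA (missing gdp_percap), second is roughly 0.024
toy$gdp
```

The Bolivia value should agree with the gdp printed for Bolivia in 1996, up to rounding of the displayed gdp_percap.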
p2) For the first chart, you will need to filter the data with dplyr for the four desired countries. When making the chart with ggplot2 you will need to add both geom_point and geom_line layers, and use the Set1 ColorBrewer palette using: scale_color_brewer(palette = "Set1").
# I chose these countries because they are all Spanish-speaking countries. I'm Chilean and Bolivian, and I know so many people from the Dominican Republic and El Salvador, haha!
p2 <- p1 |>
  filter(country %in% c("Chile", "Bolivia", "Dominican Republic", "El Salvador"))
head(p2)
iso2c iso3c country year gdp_percap population birth_rate neonat_mortal_rate
1 BO BOL Bolivia 1996 3119.282 7717445 32.873 35.6
2 BO BOL Bolivia 1993 2726.411 7273824 34.126 39.0
3 BO BOL Bolivia 2002 3647.572 8653343 29.796 28.4
4 BO BOL Bolivia 2008 4986.849 9599916 26.510 24.4
5 BO BOL Bolivia 2009 5108.847 9758799 25.991 23.4
6 BO BOL Bolivia 2001 3570.055 8496378 30.356 29.0
region income gdp
1 Latin America & Caribbean Lower middle income 0.02407288
2 Latin America & Caribbean Lower middle income 0.01983143
3 Latin America & Caribbean Lower middle income 0.03156369
4 Latin America & Caribbean Lower middle income 0.04787333
5 Latin America & Caribbean Lower middle income 0.04985621
6 Latin America & Caribbean Lower middle income 0.03033254
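The filter uses %in% rather than ==. As a small base R sketch (the vectors countries and wanted are made up for illustration), == compares element by element and recycles the shorter vector, so it can silently miss matches, while %in% tests membership in the set:

```r
# Made-up vectors for illustration only
countries <- c("Bolivia", "Chile", "Chile", "Peru")
wanted    <- c("Chile", "Bolivia")

# Membership test: is each country one of the wanted ones?
in_set <- countries %in% wanted
in_set
# [1]  TRUE  TRUE  TRUE FALSE

# Elementwise comparison with recycling: "Bolivia" is compared with "Chile",
# "Chile" with "Bolivia", and so on, so two real matches are missed
recycled <- countries == wanted
recycled
# [1] FALSE FALSE  TRUE FALSE
```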
p3) For the second chart, using dplyr you will need to group_by region and year, and then summarize on your mutated value for gdp using summarise(GDP = sum(gdp, na.rm = TRUE)). (There will be null values, or NAs, in this data, so you will need to use na.rm = TRUE).
`summarise()` has grouped output by 'region'. You can override using the
`.groups` argument.
head(p3)
# A tibble: 6 × 3
# Groups: region [1]
region year GDP
<chr> <int> <dbl>
1 East Asia & Pacific 1990 5.52
2 East Asia & Pacific 1991 6.03
3 East Asia & Pacific 1992 6.50
4 East Asia & Pacific 1993 7.04
5 East Asia & Pacific 1994 7.64
6 East Asia & Pacific 1995 8.29
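The chunk that built p3 does not appear in the rendered output above. A minimal sketch of the step described in the instructions, run on a small made-up data frame (toy and toy_sum are hypothetical names, and the numbers are invented), could look like this. The .groups = "drop" argument is an addition in this sketch that silences the grouping message and returns an ungrouped tibble; the printed p3 is still grouped by region, so the original chunk presumably did not use it.

```r
library(dplyr)

# Made-up rows standing in for p1 (hypothetical values)
toy <- data.frame(
  region = c("A", "A", "A", "B", "B"),
  year   = c(1990L, 1990L, 1991L, 1990L, 1990L),
  gdp    = c(1.5, 2.0, NA, 0.5, NA)
)

# Sum GDP within each region-year pair, ignoring NAs.
# On the real data this would presumably be:
#   p3 <- p1 |> group_by(region, year) |> summarise(GDP = sum(gdp, na.rm = TRUE))
toy_sum <- toy |>
  group_by(region, year) |>
  summarise(GDP = sum(gdp, na.rm = TRUE), .groups = "drop")

toy_sum
```

One thing to keep in mind: a group whose gdp values are all NA (region A in 1991 here) sums to 0 with na.rm = TRUE, not NA.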
Draw both charts with ggplot2. Each region’s area will be generated by the command geom_area() in ggplot2, and you will need to use the Set2 ColorBrewer palette using scale_fill_brewer(palette = "Set2").
p2 |>
  ggplot(aes(x = year, y = gdp, color = country)) +
  geom_point() +
  geom_line() +
  labs(
    title = "Chile's Rise to Become the Largest Economy",
    subtitle = "By: Emilio S",
    x = "Year",               # x is mapped to year
    y = "GDP ($ Trillions)"   # y is mapped to gdp
  ) +
  scale_color_brewer(palette = "Set1") +
  theme_minimal()
p3 |>
  ggplot(aes(x = year, y = GDP, fill = region)) +
  geom_area(color = "white", linewidth = 0.4) +  # linewidth replaces size for lines since ggplot2 3.4.0
  labs(
    title = "GDP by World Bank Region",
    subtitle = "By: Emilio S",
    x = "Year",               # x is mapped to year
    y = "GDP ($ Trillions)"   # y is mapped to GDP
  ) +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()