library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Load the data
setwd("C:/Users/User/Downloads/Data 110 Projects and Assignments")
nations <- read_csv("nations.csv")
Rows: 5275 Columns: 10
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): iso2c, iso3c, country, region, income
dbl (5): year, gdp_percap, population, birth_rate, neonat_mortal_rate
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
spec(nations)
cols(
iso2c = col_character(),
iso3c = col_character(),
country = col_character(),
year = col_double(),
gdp_percap = col_double(),
population = col_double(),
birth_rate = col_double(),
neonat_mortal_rate = col_double(),
region = col_character(),
income = col_character()
)
Check out the first few lines
head(nations)
# A tibble: 6 × 10
iso2c iso3c country year gdp_percap population birth_rate neonat_mortal_rate
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AD AND Andorra 1996 NA 64291 10.9 2.8
2 AD AND Andorra 1994 NA 62707 10.9 3.2
3 AD AND Andorra 2003 NA 74783 10.3 2
4 AD AND Andorra 1990 NA 54511 11.9 4.3
5 AD AND Andorra 2009 NA 85474 9.9 1.7
6 AD AND Andorra 2011 NA 82326 NA 1.6
# ℹ 2 more variables: region <chr>, income <chr>
Clean the dataset
# check missing values
ifelse(mean(complete.cases(nations)) == 1, "No NA Found", "Found NA")
[1] "Found NA"
# remove the NA values
nations2 <- nations |>
  filter(!is.na(country) & !is.na(year) & !is.na(gdp_percap) &
           !is.na(population) & !is.na(birth_rate) &
           !is.na(neonat_mortal_rate) & !is.na(region) & !is.na(income))
nations2
# A tibble: 4,328 × 10
iso2c iso3c country year gdp_percap population birth_rate neonat_mortal_rate
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AE ARE United… 1991 73037. 1913190 24.6 7.9
2 AE ARE United… 1993 71960. 2127863 22.4 7.3
3 AE ARE United… 2001 83534. 3217865 15.8 5.5
4 AE ARE United… 1992 73154. 2019014 23.5 7.6
5 AE ARE United… 1994 74684. 2238281 21.3 6.9
6 AE ARE United… 2007 75427. 6010100 12.8 4.7
7 AE ARE United… 2004 87844. 3975945 14.2 5.1
8 AE ARE United… 1996 79480. 2467726 19.3 6.4
9 AE ARE United… 2006 82754. 5171255 13.3 4.9
10 AE ARE United… 2000 84975. 3050128 16.4 5.6
# ℹ 4,318 more rows
# ℹ 2 more variables: region <chr>, income <chr>
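The missing-value check and the long filter above can also be written more compactly. The sketch below is only an illustration on a made-up three-row tibble (the name toy and its values are hypothetical, not taken from nations.csv): it counts the NA values in each column and then uses tidyr's drop_na() to keep the rows that are complete in the listed columns.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data standing in for the nations table
toy <- tibble(
  country    = c("A", "B", "C"),
  gdp_percap = c(1000, NA, 3000),
  birth_rate = c(20, 15, NA)
)

# Count the missing values in each column
na_counts <- toy |>
  summarise(across(everything(), ~ sum(is.na(.x))))

# Keep only the rows with no NA in the listed columns
toy_clean <- toy |>
  drop_na(gdp_percap, birth_rate)

na_counts
toy_clean
```

Listing the columns explicitly, as the filter above does, leaves columns that are not named (here, the ISO codes) out of the check.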
Create a new variable in the data
The mutate function creates a new column, gdp_in_trillions, which computes each country's total GDP by multiplying gdp_percap (GDP per capita) by population and dividing by 10^12 to express the result in trillions of dollars.
The .after = gdp_percap argument specifies that the new column should be placed immediately after the gdp_percap column.
# create a new variable
nations3 <- nations2 |>
  mutate(gdp_in_trillions = gdp_percap * population / 10^12, .after = gdp_percap)
nations3
# A tibble: 4,328 × 11
iso2c iso3c country year gdp_percap gdp_in_trillions population birth_rate
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AE ARE United A… 1991 73037. 0.140 1913190 24.6
2 AE ARE United A… 1993 71960. 0.153 2127863 22.4
3 AE ARE United A… 2001 83534. 0.269 3217865 15.8
4 AE ARE United A… 1992 73154. 0.148 2019014 23.5
5 AE ARE United A… 1994 74684. 0.167 2238281 21.3
6 AE ARE United A… 2007 75427. 0.453 6010100 12.8
7 AE ARE United A… 2004 87844. 0.349 3975945 14.2
8 AE ARE United A… 1996 79480. 0.196 2467726 19.3
9 AE ARE United A… 2006 82754. 0.428 5171255 13.3
10 AE ARE United A… 2000 84975. 0.259 3050128 16.4
# ℹ 4,318 more rows
# ℹ 3 more variables: neonat_mortal_rate <dbl>, region <chr>, income <chr>
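To make the arithmetic and the column placement easy to verify by hand, here is a small sketch on a made-up two-row tibble (toy and its values are hypothetical). It applies the same mutate() call and then inspects the column order.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: per-capita GDP in dollars and population counts
toy <- tibble(
  country    = c("A", "B"),
  gdp_percap = c(50000, 2000),
  population = c(2e6, 1e8)
)

# Total GDP in trillions, placed right after gdp_percap
toy_gdp <- toy |>
  mutate(gdp_in_trillions = gdp_percap * population / 1e12, .after = gdp_percap)

names(toy_gdp)
toy_gdp$gdp_in_trillions
```

For country A, 50000 × 2,000,000 = 10^11 dollars, which is 0.1 trillion. Country B, with a much lower per-capita figure but a larger population, comes out at 0.2 trillion.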
Create a dot-and-line chart
The filter function from the dplyr package selects only the rows where the country column matches one of the specified countries: United States, China, Switzerland, or Nigeria.
geom_line(linewidth = 1.2) adds lines to the plot, with a line thickness of 1.2. geom_point(size = 3, alpha = 0.3) adds points to the plot with a size of 3 and an alpha (transparency) of 0.3, making the points semi-transparent.
theme_minimal() applies a minimal theme for a clean and simple look.
scale_y_continuous(limits = c(0, 20)) sets the y-axis limits to 0 and 20 trillion dollars. scale_color_brewer(palette = "Set1") uses the "Set1" palette from the RColorBrewer package for the line colors, giving each country a distinct color.
ggtitle, xlab, and ylab functions add a title and axis labels to the plot.
nations_chart1 <- nations3 |>
  # Filter Data for 4 Specific Countries
  filter(country %in% c("United States", "China", "Switzerland", "Nigeria")) |>
  ggplot(aes(x = year, y = gdp_in_trillions, color = country)) +
  # Add Line and Point Geometries
  geom_line(linewidth = 1.2) +
  geom_point(size = 3, alpha = 0.3) +
  # Apply Minimal Theme
  theme_minimal(base_size = 12) +
  # Customize Y-Axis and Color Scale
  scale_y_continuous(limits = c(0, 20)) +
  scale_color_brewer(palette = "Set1") +
  # Add Titles and Labels
  ggtitle("China's Economy is Rising") +
  xlab("Year") +
  ylab("GDP in $ trillions") +
  labs(caption = "Source: World Bank") +
  # Center Title and Caption: hjust = 0.5 on plot.title and plot.caption centers both
  theme(plot.title = element_text(hjust = 0.5),
        plot.caption = element_text(hjust = 0.5))
nations_chart1
I wanted to compare the GDP of some developed countries with the GDP of a large African country, Nigeria, on this line chart. To keep the lines, I had to avoid adding interactivity to the plot with plotly.
I was surprised to see that Switzerland’s GDP is lower than that of China and the United States, even though it is considered one of the richest countries in Europe. Its GDP is also not very different from Nigeria’s. Part of the explanation is that the chart shows total GDP (gdp_percap multiplied by population), and Switzerland’s population is far smaller than that of the other three countries. In 2013, China was on the same level as the United States, and in 2014 its GDP was higher than that of the United States. China’s rapid economic growth might be due to its technological development. Analyzing more recent data would help determine whether China’s economy is now significantly larger than that of the United States. While the GDP of China and the United States has increased over the years, the GDP of Switzerland and Nigeria stayed nearly flat, with only a slight increase after 2010.
Create a second chart
The summarise function calculates the sum of gdp_in_trillions for each group (each region and year combination). The na.rm = TRUE argument ensures that any missing values are removed before summing.
The geom_area function creates an area plot, where the areas under the curves are filled. The color = "white" argument outlines the areas with a white border for better visual separation.
scale_fill_brewer(palette = "Set2") applies a color palette from the RColorBrewer package, providing distinct colors for different regions.
nations_chart2 <- nations |>
  # Group Data by Region and Year
  group_by(region, year) |>
  # Calculate GDP in Trillions
  mutate(gdp_in_trillions = gdp_percap * population / 10^12, .after = gdp_percap) |>
  # Summarize Data
  summarise(gdp_in_trillions = sum(gdp_in_trillions, na.rm = TRUE)) |>
  # Create the area plot
  ggplot(aes(x = year, y = gdp_in_trillions, fill = region)) +
  geom_area(color = "white") +
  # Customize Theme and Appearance
  theme_minimal() +
  scale_fill_brewer(palette = "Set2") +
  ggtitle("World GDP Per Region \n (1990-2014)") +
  xlab("Year") +
  ylab("GDP in $ trillions") +
  labs(caption = "Source: World Bank") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.caption = element_text(hjust = 0.5))
`summarise()` has grouped output by 'region'. You can override using the
`.groups` argument.
nations_chart2
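The summarise() message printed above is informational: after summarising by region and year, the result is still grouped by region. As a minimal sketch on made-up data (toy and its values are hypothetical), the example below shows how na.rm = TRUE skips a missing value in the sum and how .groups = "drop" returns an ungrouped result, which also silences the message.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with one missing GDP value
toy <- tibble(
  region           = c("North", "North", "South", "South"),
  year             = c(2000, 2000, 2000, 2001),
  gdp_in_trillions = c(1.5, NA, 0.5, 0.7)
)

# Sum GDP per region and year, ignoring NA, and drop the grouping afterwards
regional <- toy |>
  group_by(region, year) |>
  summarise(gdp_in_trillions = sum(gdp_in_trillions, na.rm = TRUE),
            .groups = "drop")

regional
```

Without na.rm = TRUE, the North/2000 total would be NA rather than 1.5, because a single missing value makes the whole sum missing.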
The plot shows how the GDP of various regions has changed over time using different colors to signify each region. The area plot helps to understand each region’s contribution to the total world GDP over the years. This visualization is helpful for comparing the economic growth of different regions over time and understanding global economic trends.
The East Asia & Pacific region has experienced rapid GDP growth, especially after 2000, indicating a significant economic boom in countries like China, which has had a major impact on the global economy, as the first chart highlighted. Both North America and Europe & Central Asia have shown steady economic growth, with the growth of Europe & Central Asia appearing more pronounced. The other regions (South Asia, Latin America & Caribbean, Middle East & North Africa, and Sub-Saharan Africa) also grew, but not as much as East Asia & Pacific or North America; their growth remained low.