For this section, I selected a dataset shared by Taha Malik. It contains population and Gross Domestic Product (GDP) data for the USA, China, and India across three years: 2000, 2005, and 2010. The dataset is untidy because the year, which is a value, is embedded in the column names: each year has its own pair of population and GDP columns, so a single observation (one country in one year) is spread across several columns instead of occupying one row. This wide format makes it harder to analyze trends over time.
We manually reconstructed the dataset using tribble() to preserve its original wide format. Each country has separate columns for population and GDP in 2000, 2005, and 2010.
library(tibble)
library(ggplot2)
country_data <- tribble(
  ~Country, ~`2000_Population`, ~`2000_GDP`, ~`2005_Population`, ~`2005_GDP`, ~`2010_Population`, ~`2010_GDP`,
  "USA",    282162411,  10285, 295516599,  13094, 309327143,  14964,
  "China",  1262645000, 1198,  1307560000, 2286,  1340910000, 6087,
  "India",  1053050912, 476,   1139964932, 834,   1224614327, 1708
)
View(country_data)
The dataset was saved as a .csv file so that it can be uploaded to GitHub for reproducibility and remote access.
write.csv(country_data, "country_population_gdp.csv", row.names = FALSE)
We read the CSV file directly from GitHub using read_csv(). This ensures the data source is documented and reproducible.
library(readr)
country_raw <- read_csv("https://raw.githubusercontent.com/arutam-antunish/DATA607/refs/heads/main/country_population_gdp.csv")
## Rows: 3 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Country
## dbl (6): 2000_Population, 2000_GDP, 2005_Population, 2005_GDP, 2010_Populati...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(country_raw)
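We transformed the dataset from wide to tidy format in two steps. First, pivot_longer() split each column name at the underscore into a Year and a Variable, giving one row per country, year, and variable. Then pivot_wider() spread Variable back out so that Population and GDP each have their own column. The result is a tidy table with one row per country-year. The Year column was converted to integer for easier analysis.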
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
country_tidy <- country_raw %>%
  pivot_longer(
    cols = -Country,
    names_to = c("Year", "Variable"),
    names_sep = "_",
    values_to = "Value"
  ) %>%
  pivot_wider(names_from = Variable, values_from = Value) %>%
  mutate(Year = as.integer(Year))
View(country_tidy)
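As a point of comparison, the same reshape can likely be done in a single call by using the special ".value" name in names_to, which tells pivot_longer() to turn that part of each column name into its own output column. The sketch below is not the approach used above; the names country_wide and country_tidy_alt are illustrative, the data is rebuilt inline so the block runs on its own, and names_transform assumes tidyr 1.1.0 or later.

```r
library(tibble)
library(tidyr)

# Same values as the wide table above, rebuilt so this block is self-contained
country_wide <- tribble(
  ~Country, ~`2000_Population`, ~`2000_GDP`, ~`2005_Population`, ~`2005_GDP`, ~`2010_Population`, ~`2010_GDP`,
  "USA",    282162411,  10285, 295516599,  13094, 309327143,  14964,
  "China",  1262645000, 1198,  1307560000, 2286,  1340910000, 6087,
  "India",  1053050912, 476,   1139964932, 834,   1224614327, 1708
)

# ".value" keeps the second part of each name (Population / GDP) as a column,
# so no separate pivot_wider() step is needed
country_tidy_alt <- pivot_longer(
  country_wide,
  cols = -Country,
  names_to = c("Year", ".value"),
  names_sep = "_",
  names_transform = list(Year = as.integer)
)

country_tidy_alt
```

The two-step version used in this section makes the intermediate long table visible, which can be easier to follow; the one-step version is shorter once the ".value" convention is familiar.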
With the data now tidy, we can explore GDP and population trends over time. We’ll compare GDP growth across countries, analyze population growth rates, and examine GDP per capita.
We calculated GDP growth from 2000 to 2010 for each country, both in absolute and percentage terms.
gdp_growth <- country_tidy %>%
  group_by(Country) %>%
  summarise(
    GDP_2000 = GDP[Year == 2000],
    GDP_2010 = GDP[Year == 2010],
    Growth = GDP_2010 - GDP_2000,
    Percent_Growth = round((Growth / GDP_2000) * 100, 1)
  )
View(gdp_growth)
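To make the two growth measures concrete, here is the same arithmetic worked by hand for a single country. The values are China's GDP figures from the table at the top of this section; the variable names are only for illustration.

```r
# China's GDP in billions of USD, taken from the source table
gdp_2000 <- 1198
gdp_2010 <- 6087

# Absolute growth: difference between the end and start values
growth <- gdp_2010 - gdp_2000

# Percentage growth: absolute growth relative to the starting value
percent_growth <- round((growth / gdp_2000) * 100, 1)

growth          # 4889
percent_growth  # 408.1
```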
This line chart shows how GDP increased for each country over the decade.
# linewidth replaces size for lines as of ggplot2 3.4.0
ggplot(country_tidy, aes(x = Year, y = GDP, color = Country)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  labs(title = "GDP Growth (2000–2010)", x = "Year", y = "GDP (in billions USD)") +
  theme_classic()
We measured population growth over the same period to compare demographic expansion with economic growth.
pop_growth <- country_tidy %>%
group_by(Country) %>%
summarise(Pop_2000 = Population[Year == 2000],
Pop_2010 = Population[Year == 2010],
Growth = Pop_2010 - Pop_2000,
Percent_Growth = round((Growth / Pop_2000) * 100, 1))
View(pop_growth)
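The GDP and population growth summaries follow the same pattern, so one option is to wrap that pattern in a small function. The sketch below is an assumption about how this could be organized, not part of the original analysis: growth_summary and country_tidy_demo are hypothetical names, and the demo table simply repeats the tidy values from this section so the block runs on its own.

```r
library(dplyr)
library(tibble)

# Tidy values repeated from this section so the block is self-contained
country_tidy_demo <- tribble(
  ~Country, ~Year, ~Population, ~GDP,
  "USA",    2000L, 282162411,   10285,
  "USA",    2005L, 295516599,   13094,
  "USA",    2010L, 309327143,   14964,
  "China",  2000L, 1262645000,  1198,
  "China",  2005L, 1307560000,  2286,
  "China",  2010L, 1340910000,  6087,
  "India",  2000L, 1053050912,  476,
  "India",  2005L, 1139964932,  834,
  "India",  2010L, 1224614327,  1708
)

# Hypothetical helper: absolute and percentage growth of one column
# (given by name as a string) between two years, per country
growth_summary <- function(data, value_col, start_year = 2000, end_year = 2010) {
  data %>%
    group_by(Country) %>%
    summarise(
      Start = .data[[value_col]][Year == start_year],
      End = .data[[value_col]][Year == end_year],
      Growth = End - Start,
      Percent_Growth = round((Growth / Start) * 100, 1),
      .groups = "drop"
    )
}

gdp_growth_demo <- growth_summary(country_tidy_demo, "GDP")
pop_growth_demo <- growth_summary(country_tidy_demo, "Population")
```

Passing the column name as a string and looking it up with .data[[ ]] keeps the helper simple; it assumes every country has exactly one row for each of the two years.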
This chart illustrates population growth trends for each country from 2000 to 2010.
# linewidth replaces size for lines as of ggplot2 3.4.0
ggplot(country_tidy, aes(x = Year, y = Population, color = Country)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  labs(title = "Population Growth (2000–2010)", x = "Year", y = "Population") +
  theme_classic()
We calculated GDP per capita by dividing total GDP by population. Because GDP is recorded in billions of USD, we first multiplied it by 1e9 to convert it to dollars. This metric shows how economic output per person changed over time.
country_tidy <- country_tidy %>%
mutate(GDP_per_capita = round((GDP * 1e9) / Population, 2))
gdp_per_capita <- country_tidy %>%
select(Country, Year, GDP_per_capita)
View(gdp_per_capita)
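The unit conversion is easy to get wrong, so here is the calculation for one row worked by hand. The figures are the USA's 2010 values from the source table; the variable names are illustrative only.

```r
# USA, 2010: values from the source table
gdp_billions <- 14964      # GDP in billions of USD
population <- 309327143    # number of people

# Convert billions of USD to USD before dividing by population
gdp_dollars <- gdp_billions * 1e9
gdp_per_capita_usa_2010 <- round(gdp_dollars / population, 2)

gdp_per_capita_usa_2010  # roughly 48,376 USD per person
```

Without the 1e9 factor the result would be a tiny fraction rather than a dollar amount, which is a quick sanity check on the units.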
This plot shows how GDP per capita evolved over time, reflecting economic output per person.
ggplot(country_tidy, aes(x = Year, y = GDP_per_capita, color = Country)) +
geom_line(linewidth = 1.2) +
geom_point(size = 2) +
labs(title = "GDP Per Capita (2000–2010)",
x = "Year", y = "GDP per Capita (USD)") +
theme_classic()
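We cleaned and reshaped the dataset from wide to tidy format, organizing it by country, year, population, and GDP. This allowed us to analyze GDP growth, population changes, and GDP per capita trends from 2000 to 2010. We found that China had the highest GDP growth, both in absolute terms (about 4,889 billion USD) and in percentage terms (about 408%). India led in population increase, adding roughly 172 million people (about 16%). The USA maintained the highest GDP per capita throughout the period, at roughly 48,000 USD per person in 2010.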