DATA 607 Project 2 Part II

Introduction

For this section, I selected a dataset shared by Taha Malik. It contains population and Gross Domestic Product (GDP) data for the USA, China, and India for three years: 2000, 2005, and 2010. The dataset is untidy because the year is encoded in the column names: each year has its own pair of population and GDP columns, so the year variable is spread across column headers instead of being stored in its own column. This wide format makes it harder to analyze trends over time.

Untidy data

We manually reconstructed the dataset using tribble() to preserve its original wide format. Each country has separate columns for population and GDP in 2000, 2005, and 2010.

library(tibble)
library(ggplot2)

country_data <- tribble(
  ~Country, ~`2000_Population`, ~`2000_GDP`, ~`2005_Population`, ~`2005_GDP`, ~`2010_Population`, ~`2010_GDP`,
  "USA",    282162411,  10285, 295516599,  13094, 309327143,  14964,
  "China",  1262645000, 1198,  1307560000, 2286,  1340910000, 6087,
  "India",  1053050912, 476,   1139964932, 834,   1224614327, 1708
)

View(country_data)

Export to CSV

The dataset is saved as a .csv file, which is then uploaded to GitHub for reproducibility and remote access.

write.csv(country_data, "country_population_gdp.csv", row.names = FALSE)

Read the CSV from GitHub

We read the CSV file directly from GitHub using read_csv(). This ensures the data source is documented and reproducible.

library(readr)

country_raw <- read_csv("https://raw.githubusercontent.com/arutam-antunish/DATA607/refs/heads/main/country_population_gdp.csv")

## Rows: 3 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Country
## dbl (6): 2000_Population, 2000_GDP, 2005_Population, 2005_GDP, 2010_Populati...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

View(country_raw)
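The column-specification message above suggests declaring the column types explicitly. The following is a minimal sketch of that idea, assuming the same seven columns. The temporary file and the name country_raw_typed are illustrative only; the temporary file stands in for the GitHub URL so the example runs offline.

```r
library(readr)

# Write an inline copy of the CSV to a temporary file (stand-in for the GitHub URL).
csv_lines <- c(
  "Country,2000_Population,2000_GDP,2005_Population,2005_GDP,2010_Population,2010_GDP",
  "USA,282162411,10285,295516599,13094,309327143,14964",
  "China,1262645000,1198,1307560000,2286,1340910000,6087",
  "India,1053050912,476,1139964932,834,1224614327,1708"
)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp_csv)

# Declare the types up front: Country is text, every other column is numeric.
# Supplying col_types also suppresses the column-specification message.
country_raw_typed <- read_csv(
  tmp_csv,
  col_types = cols(Country = col_character(), .default = col_double())
)
```

Declaring the types this way makes the import fail loudly if a column ever arrives in an unexpected format, rather than silently guessing.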

Data transformation

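We transformed the dataset from wide to tidy format in two steps. First, pivot_longer() splits each column name at the underscore into a Year and a Variable, giving one row per country, year, and variable. Then pivot_wider() spreads Variable back out into separate Population and GDP columns. The result is a tidy table with one row per country-year. The Year column was converted to integer for easier analysis.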

library(tidyr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

country_tidy <- country_raw %>%
  pivot_longer(
    cols = -Country,
    names_to = c("Year", "Variable"),
    names_sep = "_",
    values_to = "Value"
  ) %>%
  pivot_wider(names_from = Variable, values_from = Value) %>%
  mutate(Year = as.integer(Year))

View(country_tidy)
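As an aside, tidyr can perform the same reshaping in a single call by using the special ".value" name in names_to. This tells pivot_longer() to treat that part of each column name as the output column name. The sketch below is an alternative to the two-step pipeline, not the method used in this report; the names country_wide and country_tidy_alt are illustrative.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Same wide data as in the report, rebuilt here so the example is self-contained.
country_wide <- tribble(
  ~Country, ~`2000_Population`, ~`2000_GDP`, ~`2005_Population`, ~`2005_GDP`, ~`2010_Population`, ~`2010_GDP`,
  "USA",    282162411,  10285, 295516599,  13094, 309327143,  14964,
  "China",  1262645000, 1198,  1307560000, 2286,  1340910000, 6087,
  "India",  1053050912, 476,   1139964932, 834,   1224614327, 1708
)

# ".value" keeps the second half of each name (Population, GDP) as a column,
# so no separate pivot_wider() step is needed.
country_tidy_alt <- country_wide %>%
  pivot_longer(
    cols = -Country,
    names_to = c("Year", ".value"),
    names_sep = "_"
  ) %>%
  mutate(Year = as.integer(Year))
```

Both approaches should yield nine rows with Country, Year, Population, and GDP columns. The two-step version makes the intermediate long table explicit, which can be easier to inspect.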

Analysis

With the data now tidy, we can explore GDP and population trends over time. We’ll compare GDP growth across countries, analyze population growth rates, and examine GDP per capita.

GDP growth over time

We calculated GDP growth from 2000 to 2010 for each country, both in absolute and percentage terms.

gdp_growth <- country_tidy %>%
  group_by(Country) %>%
  summarise(
    GDP_2000 = GDP[Year == 2000],
    GDP_2010 = GDP[Year == 2010],
    Growth = GDP_2010 - GDP_2000,
    Percent_Growth = round((Growth / GDP_2000) * 100, 1)
  )


View(gdp_growth)

This line chart shows how GDP increased for each country over the decade.

ggplot(country_tidy, aes(x = Year, y = GDP, color = Country)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  labs(title = "GDP Growth (2000–2010)", x = "Year", y = "GDP (in billions USD)") +
  theme_classic()

Population growth over time

We measured population growth over the same period to compare demographic expansion with economic growth.

pop_growth <- country_tidy %>%
  group_by(Country) %>%
  summarise(
    Pop_2000 = Population[Year == 2000],
    Pop_2010 = Population[Year == 2010],
    Growth = Pop_2010 - Pop_2000,
    Percent_Growth = round((Growth / Pop_2000) * 100, 1)
  )

View(pop_growth)
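The GDP and population summaries follow the same pattern, so they could be folded into one helper. The sketch below assumes a tidy table with Country and Year columns. The function growth_between() and the object names are hypothetical and are not part of the original analysis.

```r
library(tibble)
library(dplyr)

# Tidy sample rebuilt from the report's data (2000 and 2010 rows only).
sample_tidy <- tribble(
  ~Country, ~Year, ~Population, ~GDP,
  "USA",    2000L, 282162411,   10285,
  "USA",    2010L, 309327143,   14964,
  "China",  2000L, 1262645000,  1198,
  "China",  2010L, 1340910000,  6087,
  "India",  2000L, 1053050912,  476,
  "India",  2010L, 1224614327,  1708
)

# Hypothetical helper: absolute and percentage growth of one column between two years.
growth_between <- function(data, column, start_year = 2000, end_year = 2010) {
  data %>%
    group_by(Country) %>%
    summarise(
      Start = {{ column }}[Year == start_year],
      End = {{ column }}[Year == end_year],
      Growth = End - Start,
      Percent_Growth = round(Growth / Start * 100, 1)
    )
}

gdp_growth_check <- growth_between(sample_tidy, GDP)
pop_growth_check <- growth_between(sample_tidy, Population)
```

Comparing the two results side by side shows that the ranking depends on the measure. India's population grew most on both measures, but the USA grew faster than China in percentage terms (roughly 9.6% versus 6.2%), even though China added more people.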

This chart illustrates population growth trends for each country from 2000 to 2010.

ggplot(country_tidy, aes(x = Year, y = Population, color = Country)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  labs(title = "Population Growth (2000–2010)", x = "Year", y = "Population") +
  theme_classic()

GDP per capita trends

We calculated GDP per capita by dividing total GDP by population. Because GDP is recorded in billions of USD, we first multiplied it by 1e9 to convert it to dollars. This metric shows how economic output per person changed over time.

country_tidy <- country_tidy %>%
  mutate(GDP_per_capita = round((GDP * 1e9) / Population, 2))

gdp_per_capita <- country_tidy %>%
  select(Country, Year, GDP_per_capita)

View(gdp_per_capita)
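To see how each country compares with the USA, the per-capita figures can be expressed both as a ratio and as a dollar gap. This is a minimal sketch using the 2000 and 2010 rows of the report's data. The column names Ratio_to_USA and Gap_USD, and the object gap_vs_usa, are illustrative.

```r
library(tibble)
library(dplyr)

# Tidy sample rebuilt from the report's data (GDP in billions of USD).
sample_tidy <- tribble(
  ~Country, ~Year, ~Population, ~GDP,
  "USA",    2000L, 282162411,   10285,
  "USA",    2010L, 309327143,   14964,
  "China",  2000L, 1262645000,  1198,
  "China",  2010L, 1340910000,  6087,
  "India",  2000L, 1053050912,  476,
  "India",  2010L, 1224614327,  1708
)

gap_vs_usa <- sample_tidy %>%
  mutate(GDP_per_capita = GDP * 1e9 / Population) %>%
  group_by(Year) %>%
  mutate(
    USA_level = GDP_per_capita[Country == "USA"],        # USA value for that year
    Ratio_to_USA = round(USA_level / GDP_per_capita, 1), # how many times higher the USA is
    Gap_USD = round(USA_level - GDP_per_capita)          # absolute difference in dollars
  ) %>%
  ungroup() %>%
  filter(Country != "USA") %>%
  select(Country, Year, Ratio_to_USA, Gap_USD)
```

The two measures can point in different directions. For China, the ratio falls from roughly 38 to roughly 11 over the decade, while the dollar gap grows, so it helps to state which one a finding refers to.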

This plot shows how GDP per capita evolved over time, reflecting economic output per person.

ggplot(country_tidy, aes(x = Year, y = GDP_per_capita, color = Country)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  labs(title = "GDP Per Capita (2000–2010)", x = "Year", y = "GDP per Capita (USD)") +
  theme_classic()

Findings

China had the highest GDP growth, increasing fivefold from 2000 to 2010.
India’s GDP also grew significantly, more than tripling, but remained lower than that of China and the USA.
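Population growth was fastest in India (about 16.3%). In absolute terms China added the second-largest number of people, but in percentage terms the USA (about 9.6%) grew faster than China (about 6.2%).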
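GDP per capita rose sharply in China, from about $949 to about $4,539. China was already ahead of India in 2000 and widened that lead. It also narrowed the gap with the USA in relative terms (USA GDP per capita fell from roughly 38 times China's to roughly 11 times), although the absolute dollar gap grew.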
The USA maintained the highest GDP per capita throughout the decade, despite slower overall growth.

Conclusions

We cleaned and reshaped the dataset from wide to long format, organizing it by country, year, population, and GDP. This allowed us to analyze GDP growth, population changes, and GDP per capita trends from 2000 to 2010. We found that China had the highest GDP growth, India led in population increase, and the USA maintained the highest GDP per capita.