In this part, I worked with a dataset showing the population and GDP for three countries (USA, China, and India) across three different years: 2000, 2005, and 2010. The dataset is untidy because it is presented in a wide format, where each year has its own set of columns for both population and GDP. The goal was to tidy the dataset, reshape it into a long format, and make it easier to analyze and visualize economic and population trends over time.
To start, I loaded the tidyverse, which bundles the packages used here: tidyr for reshaping, dplyr for summarizing, and ggplot2 for plotting.
library(tidyverse)
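I recreated the dataset in wide format to match the structure of the original table, with population and GDP stored in separate columns for each of the years 2000, 2005, and 2010. Because data.frame() makes column names syntactically valid by default, names that start with a digit gain an X prefix, which is why the printed columns read X2000_Population, X2000_GDP, and so on.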
gdp_data <- data.frame(
  Country = c("USA", "China", "India"),
  `2000_Population` = c(282162411, 1262645000, 1053050912),
  `2000_GDP` = c(10285, 1198, 476),
  `2005_Population` = c(295516599, 1307560000, 1139964932),
  `2005_GDP` = c(13094, 2286, 834),
  `2010_Population` = c(309327143, 1340910000, 1224614327),
  `2010_GDP` = c(14964, 6087, 1708)
)
gdp_data
##   Country X2000_Population X2000_GDP X2005_Population X2005_GDP
## 1     USA        282162411     10285        295516599     13094
## 2   China       1262645000      1198       1307560000      2286
## 3   India       1053050912       476       1139964932       834
##   X2010_Population X2010_GDP
## 1        309327143     14964
## 2       1340910000      6087
## 3       1224614327      1708
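The X prefix in the printed names comes from the default check.names = TRUE behavior of data.frame(). As a minimal sketch, assuming one would rather keep the names exactly as typed, the comparison below shows the effect of switching that check off. The objects `checked` and `unchecked` are illustrative names only and are not part of the original analysis.

```r
# Illustrative only: compare column names with and without name checking.
checked <- data.frame(`2000_GDP` = c(10285, 1198, 476))
unchecked <- data.frame(`2000_GDP` = c(10285, 1198, 476), check.names = FALSE)

names(checked)    # "X2000_GDP"
names(unchecked)  # "2000_GDP"
```

Keeping the raw names means later code has to refer to them with backticks, which is one reason to leave the default in place and handle the prefix during reshaping instead.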
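The dataset was untidy because each year had separate columns for population and GDP, so the year was stored in the column names rather than in a variable of its own. To tidy it, I converted the dataset into long format using pivot_longer(). Each column name is split at the underscore: the first piece becomes a value in a new Year column, and the second piece (matched by the special ".value" placeholder) becomes the name of an output column. The result has one row per country and year, with Population and GDP as columns, which makes it easier to compare countries and analyze trends over time.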
gdp_long <- gdp_data %>%
  pivot_longer(
    cols = -Country,
    # drop the X that data.frame() prepended to names starting with a digit
    names_prefix = "X",
    # first piece of each name goes to Year; second piece names the value column
    names_to = c("Year", ".value"),
    names_sep = "_"
  )
gdp_long
## # A tibble: 9 × 4
##   Country Year  Population   GDP
##   <chr>   <chr>      <dbl> <dbl>
## 1 USA     2000   282162411 10285
## 2 USA     2005   295516599 13094
## 3 USA     2010   309327143 14964
## 4 China   2000  1262645000  1198
## 5 China   2005  1307560000  2286
## 6 China   2010  1340910000  6087
## 7 India   2000  1053050912   476
## 8 India   2005  1139964932   834
## 9 India   2010  1224614327  1708
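After reshaping, Year is stored as text. If a numeric year is needed (for example, to compute the number of years between observations), one option is readr's parse_number(), which skips any leading non-numeric characters. The sketch below uses a small stand-in table, `gdp_demo`, with hypothetical mixed labels to show that it handles values with or without an X prefix; it is not part of the original workflow.

```r
library(tidyverse)

# Stand-in for a few rows of the long table (USA only).
# The Year labels deliberately mix prefixed and unprefixed forms.
gdp_demo <- tibble(
  Country = "USA",
  Year = c("X2000", "2005", "2010"),
  GDP = c(10285, 13094, 14964)
)

# parse_number() extracts the first number it finds in each string.
gdp_demo_numeric <- gdp_demo %>%
  mutate(Year = as.integer(parse_number(Year)))

gdp_demo_numeric$Year
```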
Next, I summarized how much population and GDP changed over the decade for each country. Because both variables rise in every period for all three countries, the difference between each country's maximum and minimum equals its change from 2000 to 2010.
summary_table <- gdp_long %>%
  group_by(Country) %>%
  summarise(
    Population_Growth = max(Population) - min(Population),
    GDP_Growth = max(GDP) - min(GDP)
  )
summary_table
## # A tibble: 3 × 3
##   Country Population_Growth GDP_Growth
##   <chr>               <dbl>      <dbl>
## 1 China            78265000       4889
## 2 India           171563415       1232
## 3 USA              27164732       4679
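The table above reports absolute changes, which favor countries that started from a larger base. To compare rates of growth, a relative change can be computed as well. The following is a minimal sketch using the 2000 and 2010 GDP values from the table; `gdp_values`, `relative_growth`, and `GDP_Pct_Change` are illustrative names that do not appear in the original analysis.

```r
library(tidyverse)

# GDP values for the first and last year, copied from the dataset above.
gdp_values <- tribble(
  ~Country, ~Year, ~GDP,
  "USA",    2000, 10285,
  "USA",    2010, 14964,
  "China",  2000,  1198,
  "China",  2010,  6087,
  "India",  2000,   476,
  "India",  2010,  1708
)

# Percentage change from the earliest to the latest year for each country.
relative_growth <- gdp_values %>%
  arrange(Country, Year) %>%
  group_by(Country) %>%
  summarise(GDP_Pct_Change = 100 * (last(GDP) - first(GDP)) / first(GDP))

relative_growth
```

On these figures the change is roughly 45% for the USA, 259% for India, and 408% for China, so the ranking by relative growth differs from the ranking by absolute growth.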
To visualize the economic trends, I created a line chart showing GDP over time for each country. The chart illustrates how each nation's economy developed during the decade: China's GDP grew roughly fivefold (from 1198 to 6087), the fastest rate of the three.
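ggplot(gdp_long, aes(x = Year, y = GDP, color = Country, group = Country)) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  # the GDP column holds total GDP as given in the table, not a per-capita figure
  labs(title = "GDP Growth (2000–2010)", y = "GDP", x = "Year") +
  theme_minimal()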
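The chart above covers GDP only. One way to view population alongside it is to stack both measures into a single column and give each its own panel. The sketch below is one possible approach, not the method used in the original analysis; `trend_data`, `trend_stacked`, `trend_plot`, `Measure`, and `Value` are illustrative names, and the values are copied from the dataset above.

```r
library(tidyverse)

# Stand-in for the long table, with Year kept as text.
trend_data <- tribble(
  ~Country, ~Year,  ~Population, ~GDP,
  "USA",    "2000",  282162411, 10285,
  "USA",    "2005",  295516599, 13094,
  "USA",    "2010",  309327143, 14964,
  "China",  "2000", 1262645000,  1198,
  "China",  "2005", 1307560000,  2286,
  "China",  "2010", 1340910000,  6087,
  "India",  "2000", 1053050912,   476,
  "India",  "2005", 1139964932,   834,
  "India",  "2010", 1224614327,  1708
)

# Stack Population and GDP so that each measure can be drawn in its own panel.
trend_stacked <- trend_data %>%
  pivot_longer(cols = c(Population, GDP), names_to = "Measure", values_to = "Value")

# Free y scales are needed because the two measures differ by orders of magnitude.
trend_plot <- ggplot(trend_stacked, aes(x = Year, y = Value, color = Country, group = Country)) +
  geom_line() +
  geom_point() +
  facet_wrap(~Measure, scales = "free_y") +
  labs(title = "Population and GDP (2000–2010)", x = "Year", y = NULL) +
  theme_minimal()

trend_plot
```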
After tidying and transforming the dataset, I was able to summarize population and GDP changes and clearly visualize GDP trends from 2000 to 2010. This part of the project demonstrated how reshaping wide-format data into long format provides flexibility for comparison, statistical summaries, and visual analysis of growth over time.