Load required packages
library(tidyverse)  # attaches readr, dplyr and ggplot2, among others
Load the World Bank Dataset
# Treat the World Bank's '..' placeholder as NA so that numeric columns are parsed as numbers.
world_bank <- read_csv("C:/Users/SP KHALID/Downloads/WDI- World Bank Dataset.csv", na = "..")
world_bank
## # A tibble: 1,675 × 19
## Time `Time Code` `Country Name` `Country Code` Region `Income Group`
## <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 2000 YR2000 Brazil BRA Latin America… Upper middle …
## 2 2000 YR2000 China CHN East Asia & P… Upper middle …
## 3 2000 YR2000 France FRA Europe & Cent… High income
## 4 2000 YR2000 Germany DEU Europe & Cent… High income
## 5 2000 YR2000 India IND South Asia Lower middle …
## 6 2000 YR2000 Indonesia IDN East Asia & P… Upper middle …
## 7 2000 YR2000 Italy ITA Europe & Cent… High income
## 8 2000 YR2000 Japan JPN East Asia & P… High income
## 9 2000 YR2000 Korea, Rep. KOR East Asia & P… High income
## 10 2000 YR2000 Mexico MEX Latin America… Upper middle …
## # ℹ 1,665 more rows
## # ℹ 13 more variables: `GDP (constant 2015 US$)` <dbl>,
## # `GDP growth (annual %)` <dbl>, `GDP (current US$)` <dbl>,
## # `Unemployment, total (% of total labor force)` <dbl>,
## # `Inflation, consumer prices (annual %)` <dbl>, `Labor force, total` <dbl>,
## # `Population, total` <dbl>,
## # `Exports of goods and services (% of GDP)` <dbl>, …
dim(world_bank)
## [1] 1675 19
# Check column data types
glimpse(world_bank)
## Rows: 1,675
## Columns: 19
## $ Time <dbl> 2000, 20…
## $ `Time Code` <chr> "YR2000"…
## $ `Country Name` <chr> "Brazil"…
## $ `Country Code` <chr> "BRA", "…
## $ Region <chr> "Latin A…
## $ `Income Group` <chr> "Upper m…
## $ `GDP (constant 2015 US$)` <dbl> 1.18642e…
## $ `GDP growth (annual %)` <dbl> 4.387949…
## $ `GDP (current US$)` <dbl> 6.554482…
## $ `Unemployment, total (% of total labor force)` <dbl> NA, 3.70…
## $ `Inflation, consumer prices (annual %)` <dbl> 7.044141…
## $ `Labor force, total` <dbl> 80295093…
## $ `Population, total` <dbl> 17401828…
## $ `Exports of goods and services (% of GDP)` <dbl> 10.18805…
## $ `Imports of goods and services (% of GDP)` <dbl> 12.45171…
## $ `General government final consumption expenditure (% of GDP)` <dbl> 18.76784…
## $ `Foreign direct investment, net inflows (% of GDP)` <dbl> 5.033917…
## $ `Gross savings (% of GDP)` <dbl> 13.99170…
## $ `Current account balance (% of GDP)` <dbl> -4.04774…
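The `NA` in the unemployment column above comes from the `na` argument passed to `read_csv()`. A minimal sketch of its effect follows, using a made-up three-row extract (the column name `Unemployment` and its values are hypothetical, not taken from the dataset) and assuming readr 2.0 or later, which accepts literal data wrapped in `I()`:

```r
library(readr)

# Hypothetical extract; ".." stands in for a missing value.
toy_csv <- I("Time,Country Name,Unemployment\n2000,Brazil,..\n2000,China,3.7\n2001,Brazil,9.6\n")

# Without na = "..", the placeholder forces the whole column to character.
raw <- read_csv(toy_csv, show_col_types = FALSE)
class(raw$Unemployment)

# With na = "..", the column is parsed as numeric and ".." becomes NA.
toy <- read_csv(toy_csv, na = "..", show_col_types = FALSE)
class(toy$Unemployment)

# Count missing values per column.
colSums(is.na(toy))
```

Running `colSums(is.na(world_bank))` on the full dataset in the same way would show how many observations each indicator is missing.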
# Convert Time column to integer
world_bank$Time <- as.integer(world_bank$Time)
# Clean column names
library(janitor)
df <- world_bank |> clean_names()
glimpse(df)
## Rows: 1,675
## Columns: 19
## $ time <int> 2000, …
## $ time_code <chr> "YR200…
## $ country_name <chr> "Brazi…
## $ country_code <chr> "BRA",…
## $ region <chr> "Latin…
## $ income_group <chr> "Upper…
## $ gdp_constant_2015_us <dbl> 1.1864…
## $ gdp_growth_annual_percent <dbl> 4.3879…
## $ gdp_current_us <dbl> 6.5544…
## $ unemployment_total_percent_of_total_labor_force <dbl> NA, 3.…
## $ inflation_consumer_prices_annual_percent <dbl> 7.0441…
## $ labor_force_total <dbl> 802950…
## $ population_total <dbl> 174018…
## $ exports_of_goods_and_services_percent_of_gdp <dbl> 10.188…
## $ imports_of_goods_and_services_percent_of_gdp <dbl> 12.451…
## $ general_government_final_consumption_expenditure_percent_of_gdp <dbl> 18.767…
## $ foreign_direct_investment_net_inflows_percent_of_gdp <dbl> 5.0339…
## $ gross_savings_percent_of_gdp <dbl> 13.991…
## $ current_account_balance_percent_of_gdp <dbl> -4.047…
The dataset contains a time column holding years. Because R's Date class needs a full year-month-day value, I converted each year into a Date by assigning it January 1st. This gives the series a date index that can be used for time-series analysis.
data <- df |>
mutate(date = as.Date(paste0(time, "-01-01")))
gdp_growth_annual_percent
library(tsibble)
ts_data <- data |>
filter(country_name == "United States") |>
select(country_name, date, gdp_growth_annual_percent) |>
filter(!is.na(gdp_growth_annual_percent)) |>
as_tsibble(index = date)
ts_data
## # A tsibble: 25 x 3 [1D]
## country_name date gdp_growth_annual_percent
## <chr> <date> <dbl>
## 1 United States 2000-01-01 4.08
## 2 United States 2001-01-01 0.956
## 3 United States 2002-01-01 1.70
## 4 United States 2003-01-01 2.80
## 5 United States 2004-01-01 3.85
## 6 United States 2005-01-01 3.48
## 7 United States 2006-01-01 2.78
## 8 United States 2007-01-01 2.00
## 9 United States 2008-01-01 0.114
## 10 United States 2009-01-01 -2.58
## # ℹ 15 more rows
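The `[1D]` in the header above means tsibble has inferred a daily interval, because the gaps between consecutive January 1st dates (365 or 366 days) share only a one-day common unit. The sketch below illustrates this on the first five values printed above and shows one possible alternative, an integer year index, which tsibble should recognise as yearly. The name `toy` is introduced here for illustration only.

```r
library(tsibble)

# First five values copied from the printed tsibble above.
toy <- data.frame(
  year = 2000:2004,
  gdp_growth_annual_percent = c(4.08, 0.956, 1.70, 2.80, 3.85)
)

# Date index: the gaps are 366, 365, 365, 365 days (2000 is a leap year).
toy$date <- as.Date(paste0(toy$year, "-01-01"))
day_gaps <- as.integer(diff(toy$date))
by_date <- as_tsibble(toy, index = date)

# Integer year index: one observation per year.
by_year <- as_tsibble(toy[c("year", "gdp_growth_annual_percent")], index = year)

format(interval(by_date))
format(interval(by_year))
```

With a daily interval, any model term that depends on the index step, such as a linear trend, is expressed per day rather than per year.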
ts_data |>
  ggplot(aes(x = date, y = gdp_growth_annual_percent)) +
  geom_line(color = "blue") +
  labs(title = "GDP Growth Over Time (United States)",
       y = "GDP Growth (%)",
       x = "Year")
US GDP growth has been mostly positive but volatile. Two sharp contractions stand out, around 2009 (Global Financial Crisis, about -2.6%) and 2020 (COVID-19 pandemic). Outside of crisis years, growth has been relatively stable, hovering between 1.5% and 3.5% for most of the period and peaking near 4% in 2000. There is a strong V-shaped recovery after 2020, with growth spiking to ~6.5% in 2021 before normalizing back to ~2.5–3% by 2023–2024.
library(fpp3)
model <- ts_data |>
filter(country_name == "United States") |>
model(lm = TSLM(gdp_growth_annual_percent ~ trend()))
report(model)
## Series: gdp_growth_annual_percent
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7493 -0.3868 0.2890 0.6839 3.7358
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.063e+00 6.989e-01 2.951 0.00716 **
## trend() 3.342e-05 1.366e-04 0.245 0.80894
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.8 on 23 degrees of freedom
## Multiple R-squared: 0.002594, Adjusted R-squared: -0.04077
## F-statistic: 0.05983 on 1 and 23 DF, p-value: 0.80894
The trend coefficient (0.00003342) is statistically insignificant (p = 0.809). Because the tsibble index is a daily Date, the coefficient is measured per day, which works out to only about 0.012 percentage points per year. There is therefore no meaningful linear trend in US GDP growth over the sample period: growth has neither systematically risen nor fallen. The intercept of ~2.06% is the fitted growth rate near the start of the sample and, with a slope this close to zero, is effectively the average growth rate. The very low R² (0.0026) means the trend explains less than 0.3% of the variation in GDP growth, i.e. the model has virtually no predictive power. The wide residual range (−4.75 to +3.74) and the residual standard error of 1.8 reflect high volatility driven by external shocks (financial crises, pandemic), which a simple linear trend model cannot capture.
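The figures in the regression report can be cross-checked by hand. The sketch below uses only numbers copied from the `report(model)` output. The conversion factor of 365.25 days per year is an assumption, used to express the slope per year on the reading that `trend()` counts in days for this daily-indexed series.

```r
# Figures copied from the report(model) output above.
slope_per_day <- 3.342e-05
se_per_day    <- 1.366e-04
f_stat        <- 0.05983
resid_df      <- 23

# Assumed average year length, used to rescale the daily slope.
days_per_year <- 365.25
slope_per_year <- slope_per_day * days_per_year
se_per_year    <- se_per_day * days_per_year

# The t statistic is unchanged by rescaling the index.
t_value <- slope_per_day / se_per_day

# With a single regressor, R-squared can be recovered from the F statistic.
r_squared <- f_stat / (f_stat + resid_df)

round(c(slope_per_year = slope_per_year, se_per_year = se_per_year,
        t_value = t_value, r_squared = r_squared), 4)
```

The rescaled slope is roughly 0.012 percentage points per year with a standard error near 0.05, so the conclusion of no detectable trend does not depend on the unit of the index.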
pre_2008 <- ts_data |> filter(date < as.Date("2008-01-01"))
post_2008 <- ts_data |> filter(date >= as.Date("2008-01-01") & date < as.Date("2020-01-01"))
covid_period <- ts_data |> filter(date >= as.Date("2020-01-01"))
ggplot(ts_data, aes(x = date, y = gdp_growth_annual_percent)) +
geom_line() +
geom_smooth(data = pre_2008, method = "lm", se = FALSE, color = "blue") +
geom_smooth(data = post_2008, method = "lm", se = FALSE, color = "red") +
geom_smooth(data = covid_period, method = "lm", se = FALSE, color = "green") +
labs(title = "Multiple Trends in GDP Growth (US)")
Pre-2008 (Blue): The fitted line is nearly flat at ~2.5–3%, reflecting a steady expansion with no directional trend. Post-2008 (Red): A clear upward recovery trend from near 0% in 2008, gradually climbing back to ~3% over the following decade. Post-2020 (Green): A steep upward slope capturing the V-shaped rebound, though likely overstated because the segment has very few data points and is heavily influenced by the 2021 spike.
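The visual reading of the pre-2008 segment can be backed by a number. The sketch below fits the same kind of straight line that `geom_smooth(method = "lm")` draws, using the 2000–2007 values copied from the printed tsibble (rounded as displayed there, so the results are approximate). The data frame `pre_2008_toy` is introduced here for illustration and uses an integer year, so the slope is per year.

```r
# 2000-2007 values copied from the printed tsibble (rounded as displayed).
pre_2008_toy <- data.frame(
  year   = 2000:2007,
  growth = c(4.08, 0.956, 1.70, 2.80, 3.85, 3.48, 2.78, 2.00)
)

# Straight-line fit for the segment.
fit_pre <- lm(growth ~ year, data = pre_2008_toy)

slope_pre <- coef(fit_pre)[["year"]]   # about 0.011 points per year
mean_pre  <- mean(pre_2008_toy$growth) # about 2.71

c(slope = slope_pre, mean = mean_pre)
```

A slope of about 0.01 percentage points per year around a mean of about 2.7% is consistent with the flat blue line. The same approach applied to `post_2008` and `covid_period` would quantify the other two segments.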
ts_data |>
filter(country_name == "United States") |>
ggplot(aes(date, gdp_growth_annual_percent)) +
geom_line() +
geom_smooth(method = "loess", se = FALSE, color = "red")
The red smoothed trend line shows a gradual decline in average GDP growth from ~3% in the early 2000s down to ~1.5% around 2010, followed by a slow recovery. The smoothing absorbs the COVID shock well, showing that the long-run trend by 2023–2024 has returned to roughly the same level (~2.5–3%) as the pre-2008 era. The gap between the raw line and the smooth line around 2009 and 2020 highlights that those were outlier shocks, not structural turning points in the trend.
ts_data_year <- ts_data |>
filter(country_name == "United States") |>
mutate(datayear = year(date)) |>
select(datayear, gdp_growth_annual_percent) |>
as_tsibble(index = datayear)
library(feasts)
ts_data_year |>
ACF(gdp_growth_annual_percent) |>
autoplot()
All autocorrelation bars fall within the blue dashed significance bounds (about ±0.4 for a series of 25 observations), indicating no statistically significant autocorrelation at any lag. US GDP growth therefore behaves much like white noise around its mean: past values do not reliably predict future values, which is consistent with growth being driven by irregular shocks. The slight spike at lag 11 is likely a random artifact of the short series (25 years) and should not be interpreted as a meaningful cyclical pattern.
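The position of the dashed bounds follows from the series length. Under the usual white-noise approximation, the 95% bounds sit at plus or minus the 0.975 normal quantile divided by the square root of the number of observations. A quick check with n = 25, the row count of `ts_data`:

```r
# Number of annual observations in ts_data.
n_obs <- 25

# Approximate 95% significance bound for sample autocorrelations of white noise.
acf_bound <- qnorm(0.975) / sqrt(n_obs)

round(acf_bound, 3)
```

With so few observations the bounds are wide, so only fairly strong autocorrelation would register as significant.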
US GDP growth shows no statistically significant linear trend over the sample period, averaging ~2% annually, with volatility driven primarily by two major external shocks, the 2008 financial crisis and the 2020 pandemic, rather than any structural directional change.
This suggests that US economic growth is largely mean-reverting and shock-dependent. Simple trend-based forecasting is therefore inadequate, and models that account for structural breaks or external shocks would be more appropriate for policy or investment decisions.
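One simple way to account for a shock is an indicator (dummy) variable. The sketch below is an illustration under stated assumptions, not the analysis performed above: it uses only the 2000–2009 values copied from the printed tsibble, and the `crisis` column marking 2008–2009 is a hypothetical choice of shock window.

```r
# 2000-2009 values copied from the printed tsibble (rounded as displayed).
us_toy <- data.frame(
  year   = 2000:2009,
  growth = c(4.08, 0.956, 1.70, 2.80, 3.85, 3.48, 2.78, 2.00, 0.114, -2.58)
)

# Hypothetical shock indicator: 1 in the financial-crisis years, 0 otherwise.
us_toy$crisis <- as.integer(us_toy$year %in% 2008:2009)

# Intercept = mean growth in normal years; crisis = shift during the shock.
fit_shock <- lm(growth ~ crisis, data = us_toy)

coef(fit_shock)
```

On these ten points the intercept is about 2.7% and the crisis shift is about -3.9 percentage points. The same idea should carry over to the full series by adding the indicator column to the tsibble and including it as a regressor in `TSLM()`.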
How do crises (2008, COVID) impact trends?
Do high-income vs low-income countries behave differently?
Is inflation correlated with GDP growth?