Data-12

Load required packages

# tidyverse attaches readr, dplyr and ggplot2, among others
library(tidyverse)

Load the World Bank Dataset

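# The WDI export marks missing values as '..'; read them as NA in every column.
# Supplying `na` replaces readr's default c("", "NA"), so keep those as well.
world_bank <- read_csv("C:/Users/SP KHALID/Downloads/WDI- World Bank Dataset.csv", na = c("..", "", "NA"))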
world_bank

## # A tibble: 1,675 × 19
##     Time `Time Code` `Country Name` `Country Code` Region         `Income Group`
##    <dbl> <chr>       <chr>          <chr>          <chr>          <chr>         
##  1  2000 YR2000      Brazil         BRA            Latin America… Upper middle …
##  2  2000 YR2000      China          CHN            East Asia & P… Upper middle …
##  3  2000 YR2000      France         FRA            Europe & Cent… High income   
##  4  2000 YR2000      Germany        DEU            Europe & Cent… High income   
##  5  2000 YR2000      India          IND            South Asia     Lower middle …
##  6  2000 YR2000      Indonesia      IDN            East Asia & P… Upper middle …
##  7  2000 YR2000      Italy          ITA            Europe & Cent… High income   
##  8  2000 YR2000      Japan          JPN            East Asia & P… High income   
##  9  2000 YR2000      Korea, Rep.    KOR            East Asia & P… High income   
## 10  2000 YR2000      Mexico         MEX            Latin America… Upper middle …
## # ℹ 1,665 more rows
## # ℹ 13 more variables: `GDP (constant 2015 US$)` <dbl>,
## #   `GDP growth (annual %)` <dbl>, `GDP (current US$)` <dbl>,
## #   `Unemployment, total (% of total labor force)` <dbl>,
## #   `Inflation, consumer prices (annual %)` <dbl>, `Labor force, total` <dbl>,
## #   `Population, total` <dbl>,
## #   `Exports of goods and services (% of GDP)` <dbl>, …

dim(world_bank)

## [1] 1675   19

# Check column data types
glimpse(world_bank)

## Rows: 1,675
## Columns: 19
## $ Time                                                          <dbl> 2000, 20…
## $ `Time Code`                                                   <chr> "YR2000"…
## $ `Country Name`                                                <chr> "Brazil"…
## $ `Country Code`                                                <chr> "BRA", "…
## $ Region                                                        <chr> "Latin A…
## $ `Income Group`                                                <chr> "Upper m…
## $ `GDP (constant 2015 US$)`                                     <dbl> 1.18642e…
## $ `GDP growth (annual %)`                                       <dbl> 4.387949…
## $ `GDP (current US$)`                                           <dbl> 6.554482…
## $ `Unemployment, total (% of total labor force)`                <dbl> NA, 3.70…
## $ `Inflation, consumer prices (annual %)`                       <dbl> 7.044141…
## $ `Labor force, total`                                          <dbl> 80295093…
## $ `Population, total`                                           <dbl> 17401828…
## $ `Exports of goods and services (% of GDP)`                    <dbl> 10.18805…
## $ `Imports of goods and services (% of GDP)`                    <dbl> 12.45171…
## $ `General government final consumption expenditure (% of GDP)` <dbl> 18.76784…
## $ `Foreign direct investment, net inflows (% of GDP)`           <dbl> 5.033917…
## $ `Gross savings (% of GDP)`                                    <dbl> 13.99170…
## $ `Current account balance (% of GDP)`                          <dbl> -4.04774…

# Convert Time column to integer
world_bank$Time <- as.integer(world_bank$Time)

# Clean column names
library(janitor)
df <- world_bank |> clean_names()
glimpse(df)

## Rows: 1,675
## Columns: 19
## $ time                                                            <int> 2000, …
## $ time_code                                                       <chr> "YR200…
## $ country_name                                                    <chr> "Brazi…
## $ country_code                                                    <chr> "BRA",…
## $ region                                                          <chr> "Latin…
## $ income_group                                                    <chr> "Upper…
## $ gdp_constant_2015_us                                            <dbl> 1.1864…
## $ gdp_growth_annual_percent                                       <dbl> 4.3879…
## $ gdp_current_us                                                  <dbl> 6.5544…
## $ unemployment_total_percent_of_total_labor_force                 <dbl> NA, 3.…
## $ inflation_consumer_prices_annual_percent                        <dbl> 7.0441…
## $ labor_force_total                                               <dbl> 802950…
## $ population_total                                                <dbl> 174018…
## $ exports_of_goods_and_services_percent_of_gdp                    <dbl> 10.188…
## $ imports_of_goods_and_services_percent_of_gdp                    <dbl> 12.451…
## $ general_government_final_consumption_expenditure_percent_of_gdp <dbl> 18.767…
## $ foreign_direct_investment_net_inflows_percent_of_gdp            <dbl> 5.0339…
## $ gross_savings_percent_of_gdp                                    <dbl> 13.991…
## $ current_account_balance_percent_of_gdp                          <dbl> -4.047…
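To see how `clean_names()` derived these names, janitor also exports `make_clean_names()`, which applies the same rules to a plain character vector. The sketch below is a minimal check on two of the original WDI headers. The expected results are copied from the `glimpse()` output above.

```r
library(janitor)

# Apply janitor's snake_case rules to two original WDI column headers
raw_names <- c("GDP (constant 2015 US$)", "GDP growth (annual %)")
clean <- make_clean_names(raw_names)
print(clean)
```

The `%` sign becomes `percent` and other punctuation is dropped, which is why the response variable is named `gdp_growth_annual_percent`.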

Convert time column to date

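The dataset's time column holds years. R's Date class needs a full calendar date, so I converted each year to a Date by assigning it January 1st. Note that tsibble then infers a daily interval from these dates (the [1D] in the header printed below). For that reason, the year-indexed version built in the Extract year step suits annual data better.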

data <- df |>
  mutate(date = as.Date(paste0(time, "-01-01")))
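Consecutive January 1st dates are 365 or 366 days apart. The largest step that divides all of those gaps evenly is 1 day. This is consistent with tsibble reporting a `[1D]` interval for the series and treating the missing days in between as gaps. The base-R sketch below shows that arithmetic. `gcd` is a small helper defined here for illustration, not a library function.

```r
# January 1st of each year, built the same way as in the chunk above
dates <- as.Date(paste0(2000:2005, "-01-01"))

# Day gaps between consecutive observations (leap years give 366)
gaps <- as.integer(diff(dates))
print(gaps)

# Euclid's algorithm; hypothetical helper defined only for this example
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

# The common step size shared by all gaps is a single day
step <- Reduce(gcd, gaps)
print(step)
```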

Response Variable

gdp_growth_annual_percent

Create tsibble

library(tsibble)

ts_data <- data |>
  filter(country_name == "United States") |>
  select(country_name, date, gdp_growth_annual_percent) |>
  filter(!is.na(gdp_growth_annual_percent)) |>
  as_tsibble(index = date)
ts_data

## # A tsibble: 25 x 3 [1D]
##    country_name  date       gdp_growth_annual_percent
##    <chr>         <date>                         <dbl>
##  1 United States 2000-01-01                     4.08 
##  2 United States 2001-01-01                     0.956
##  3 United States 2002-01-01                     1.70 
##  4 United States 2003-01-01                     2.80 
##  5 United States 2004-01-01                     3.85 
##  6 United States 2005-01-01                     3.48 
##  7 United States 2006-01-01                     2.78 
##  8 United States 2007-01-01                     2.00 
##  9 United States 2008-01-01                     0.114
## 10 United States 2009-01-01                    -2.58 
## # ℹ 15 more rows

Plot over time for the US

ts_data |>
  filter(country_name == "United States") |>
  ggplot(aes(x = date, y = gdp_growth_annual_percent)) +
  geom_line(color = "blue") +
  labs(title = "GDP Growth Over Time (United States)",
       y = "GDP Growth (%)",
       x = "Year")

US GDP growth has been mostly positive but volatile, ranging from about −2.6% (2009) to about +6% (2021). Two sharp contractions are visible, around 2009 (Global Financial Crisis) and 2020 (COVID-19 pandemic). Outside of crisis years, growth has been relatively stable, hovering between 1.5% and 3.5% for most of the period. There is a strong V-shaped recovery after 2020: growth spiked to about 6% in 2021 before normalizing to roughly 2.5–3% in 2023–2024.

library(fpp3)

# Linear trend model. trend() counts steps of the tsibble's interval,
# which is one day here (see the [1D] header), so the slope is per day.
model <- ts_data |>
  model(lm = TSLM(gdp_growth_annual_percent ~ trend()))

report(model)

## Series: gdp_growth_annual_percent 
## Model: TSLM 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7493 -0.3868  0.2890  0.6839  3.7358 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 2.063e+00  6.989e-01   2.951  0.00716 **
## trend()     3.342e-05  1.366e-04   0.245  0.80894   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.8 on 23 degrees of freedom
## Multiple R-squared: 0.002594,    Adjusted R-squared: -0.04077
## F-statistic: 0.05983 on 1 and 23 DF, p-value: 0.80894

The trend coefficient (3.342e-05) looks tiny partly because the tsibble index is daily. It is therefore measured in percentage points per day, which is about 0.012 points per year. The coefficient is statistically insignificant (p = 0.809), so there is no evidence of a linear trend in US GDP growth over the sample period: growth has neither systematically risen nor fallen. The intercept of about 2.06% is the fitted growth rate at the start of the sample. Because the slope is negligible, this is close to the average growth rate. The very low R² (0.0026) means the trend explains less than 0.3% of the variation in GDP growth, i.e. the model has virtually no explanatory power. The wide residual range (−4.75 to +3.74 percentage points) and the residual standard error of 1.8 points reflect high volatility. That volatility is driven by external shocks (the financial crisis and the pandemic), which a simple linear trend cannot capture.
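Converting the slope from a per-day to a per-year rate is simple arithmetic. The sketch below copies the coefficient from `report(model)` above and uses 365.25 days as an approximate year length.

```r
# Slope from report(model), in percentage points per day
slope_per_day <- 3.342e-05

# Approximate change per year, and over the 24 years from 2000 to 2024
slope_per_year <- slope_per_day * 365.25
change_2000_2024 <- slope_per_year * 24

round(c(per_year = slope_per_year, over_sample = change_2000_2024), 3)
```

A linear rescaling like this changes the size of the slope but not its t value or p-value. The conclusion that there is no significant trend is therefore unchanged.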

Multiple Trends

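# Split the series at the 2008 financial crisis and the 2020 pandemic
pre_2008 <- ts_data |> filter(date < as.Date("2008-01-01"))
post_2008 <- ts_data |> filter(date >= as.Date("2008-01-01"), date < as.Date("2020-01-01"))
covid_period <- ts_data |> filter(date >= as.Date("2020-01-01"))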

Visualize

ggplot(ts_data, aes(x = date, y = gdp_growth_annual_percent)) +
  geom_line() +
  geom_smooth(data = pre_2008, method = "lm", se = FALSE, color = "blue") +
  geom_smooth(data = post_2008, method = "lm", se = FALSE, color = "red") +
  geom_smooth(data = covid_period, method = "lm", se = FALSE, color = "green") +
  labs(title = "Multiple Trends in GDP Growth (US)")

Pre-2008 (blue): growth was stable and roughly flat at about 2.5–3%, reflecting steady expansion with no directional trend.
2008–2019 (red): a clear upward recovery trend from near 0%, climbing back to about 3% over the following decade.
2020 onward (green): a steep upward slope capturing the V-shaped rebound. This slope is likely overstated, because the segment has only five data points and is dominated by the 2020 contraction and the 2021 spike.
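The three `geom_smooth()` lines are fitted independently, so they need not meet at 2008 or 2020. A continuous alternative is a piecewise linear (hinge) regression with knots at the break years. fable's `trend()` accepts a `knots` argument for this inside `TSLM()`.

The base-R sketch below builds the hinge terms by hand, using a hypothetical helper `hinge_terms()`. So that it runs on its own, it uses only the ten values printed in the tsibble above (2000–2009), with a single knot at 2008. With the full yearly series you would pass `knots = c(2008, 2020)`. With so few points the estimates only illustrate the mechanics.

```r
# US GDP growth (%), 2000-2009, copied from the printed tsibble
year <- 2000:2009
growth <- c(4.08, 0.956, 1.70, 2.80, 3.85, 3.48, 2.78, 2.00, 0.114, -2.58)

# Hypothetical helper: one hinge column max(0, year - k) per knot k
hinge_terms <- function(x, knots) {
  h <- sapply(knots, function(k) pmax(x - k, 0))
  h <- matrix(h, nrow = length(x))
  colnames(h) <- paste0("after_", knots)
  h
}

X <- hinge_terms(year, knots = 2008)
fit <- lm(growth ~ year + X)

# Slope before the knot is the "year" coefficient;
# after the knot it is "year" plus "Xafter_2008"
coef(fit)
```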

Smoothing

ts_data |>
  filter(country_name == "United States") |>
  ggplot(aes(date, gdp_growth_annual_percent)) +
  geom_line() +
  geom_smooth(method = "loess", se = FALSE, color = "red")

The red smoothed trend line shows a gradual decline in average GDP growth, from about 3% in the early 2000s to about 1.5% around 2010, followed by a slow recovery. The smoothing absorbs the COVID shock well: by 2023–2024 the long-run trend has returned to roughly the pre-2008 level (about 2.5–3%). The gap between the raw and smoothed lines around 2009 and 2020 is consistent with those being short-lived shocks rather than structural turning points in the trend. A smoother alone cannot formally test for structural breaks, however.
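The gap between the raw and smoothed lines can be quantified as loess residuals. `geom_smooth(method = "loess")` fits `stats::loess()` with a default `span` of 0.75, and the sketch below fits the same kind of smoother directly. So that it runs on its own, it uses only the ten printed values (2000–2009). Its numbers will therefore differ from the full-series plot.

```r
# US GDP growth (%), 2000-2009, copied from the printed tsibble
year <- 2000:2009
growth <- c(4.08, 0.956, 1.70, 2.80, 3.85, 3.48, 2.78, 2.00, 0.114, -2.58)

# Local quadratic smoother (loess defaults: degree = 2)
smooth_fit <- loess(growth ~ year, span = 0.75)

# Raw minus smoothed value for each year; a large |gap| flags a shock
gaps <- data.frame(year = year, gap = round(residuals(smooth_fit), 2))

# Three years furthest from the smoothed line
gaps[order(-abs(gaps$gap)), ][1:3, ]
```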

Extract year

ts_data_year <- ts_data |>
  filter(country_name == "United States") |>
  mutate(datayear = year(date)) |>
  select(datayear, gdp_growth_annual_percent) |>
  as_tsibble(index = datayear)
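Re-indexing by the integer year is what gives the series a regular annual step. The sketch below is a small check with tsibble's `has_gaps()`, again using only the ten printed values. The January-1st `Date` index is read as daily data with missing days, while the integer-year index has no gaps.

```r
library(tsibble)
library(lubridate)

# US GDP growth (%), 2000-2009, copied from the printed tsibble
growth <- c(4.08, 0.956, 1.70, 2.80, 3.85, 3.48, 2.78, 2.00, 0.114, -2.58)
dates <- as.Date(paste0(2000:2009, "-01-01"))

# Date index: the days between each January 1st count as gaps
daily <- tsibble(date = dates, growth = growth, index = date)

# Integer year index, as in ts_data_year
yearly <- tsibble(datayear = year(dates), growth = growth, index = datayear)

has_gaps(daily)
has_gaps(yearly)
```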

ACF/PACF

library(feasts)

ts_data_year |>
  ACF(gdp_growth_annual_percent) |>
  autoplot()

All autocorrelation bars fall within the blue dashed significance bounds (about ±0.39, i.e. ±1.96/√25), so there is no statistically significant autocorrelation at any lag. This suggests US GDP growth behaves much like white noise: past values do not reliably predict future values. That is consistent with growth being driven mainly by unpredictable shocks. The slight spike at lag 11 is most likely a random artifact, given the short series (25 annual observations). Annual data has no seasonal component, so the spike should not be read as a seasonal or cyclical pattern.
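Two quick checks support the reading above. First, the dashed bounds are approximately ±1.96/√T. Second, a portmanteau (Ljung–Box) test asks whether the first few autocorrelations are jointly zero. In the fpp3 workflow the test is available as `features(gdp_growth_annual_percent, ljung_box, lag = 10)` from feasts. The base-R sketch below uses `stats::Box.test()` on only the ten printed values, with a small lag because that subset is so short.

```r
# Approximate 95% bounds drawn on an ACF plot for T = 25 observations
n_obs <- 25
bound <- 1.96 / sqrt(n_obs)
print(round(bound, 3))

# US GDP growth (%), 2000-2009, copied from the printed tsibble
growth <- c(4.08, 0.956, 1.70, 2.80, 3.85, 3.48, 2.78, 2.00, 0.114, -2.58)

# Ljung-Box test that the first 3 autocorrelations are jointly zero
lb <- Box.test(growth, lag = 3, type = "Ljung-Box")
print(lb)
```

A large p-value means the test finds no evidence against white noise.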

Final Insights

Key Findings

US GDP growth shows no statistically significant linear trend over the sample period. It averages roughly 2% a year, and its volatility is driven mainly by two major external shocks (the 2008 financial crisis and the 2020 pandemic) rather than by any structural directional change.

Significance

These results are consistent with US economic growth being largely mean-reverting and shock-driven. Simple trend-based forecasting is therefore inadequate. Models that account for structural breaks or external shocks would be more appropriate for policy or investment decisions.
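One simple way to let a model account for external shocks is an intervention (dummy) variable that flags crisis years, so that the intercept estimates growth in ordinary years. The base-R sketch below uses the ten printed values. It flags only 2009, because 2020 is not among them; the choice of crisis years is an assumption for illustration.

```r
# US GDP growth (%), 2000-2009, copied from the printed tsibble
year <- 2000:2009
growth <- c(4.08, 0.956, 1.70, 2.80, 3.85, 3.48, 2.78, 2.00, 0.114, -2.58)

# 1 in flagged crisis years, 0 otherwise (illustrative choice of years)
crisis <- as.integer(year %in% 2009)

fit <- lm(growth ~ crisis)
coef(fit)
```

With a single flagged year, the dummy absorbs that observation exactly. The intercept therefore equals the mean of the other nine years (about 2.42%), and the crisis coefficient is the 2009 shortfall relative to that mean.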

Further Questions

How do crises (2008, COVID) impact trends?
Do high-income vs low-income countries behave differently?
Is inflation correlated with GDP growth?