Research Question: Has the average GDP growth rate significantly changed before and after the year 2000 across countries?
This project uses GDP data from the World Bank to examine how global economic growth has changed over time. The dataset includes several variables, but this analysis focuses on three relevant ones: Country (categorical), Year (categorical), and GDP growth (quantitative). These variables allow us to compare GDP growth across countries and across years.
To answer the research question, we compare two population means:
\(\mu_{\text{before}}\) = mean GDP growth before 2000
\(\mu_{\text{after}}\) = mean GDP growth after 2000
using an independent difference-in-means test. The hypotheses are:
Null hypothesis \(H_0\): \(\mu_{\text{before}} = \mu_{\text{after}}\)
Alternative hypothesis \(H_1\): \(\mu_{\text{before}} \neq \mu_{\text{after}}\)
Source of the dataset: World Bank. URL: https://www.openintro.org/data/index.php?data=gdp_countries
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
setwd("C:/Users/mezni/OneDrive/Desktop/Project 2")
gdp <- read.csv("gdp_countries (1).csv", sep=",", header=TRUE, check.names=TRUE, stringsAsFactors=FALSE)
head(gdp)
## country description year_1960 year_1970 year_1980
## 1 World GDP 1.384630e+12 2.987370e+12 1.129830e+13
## 2 Afghanistan GDP 5.377778e+08 1.748887e+09 3.641723e+09
## 3 Albania GDP NA NA NA
## 4 Algeria GDP 2.723593e+09 4.863487e+09 4.234638e+10
## 5 American Samoa GDP NA NA NA
## 6 Andorra GDP NA 7.861921e+07 4.464161e+08
## year_1990 year_2000 year_2010 year_2020
## 1 2.276210e+13 3.365130e+13 6.616270e+13 8.470540e+13
## 2 NA NA 1.585657e+10 1.980707e+10
## 3 2.028554e+09 3.480355e+09 1.192693e+10 1.479962e+10
## 4 6.204856e+10 5.479038e+10 1.612070e+11 1.451640e+11
## 5 NA NA 5.730000e+08 NA
## 6 1.029048e+09 1.429049e+09 3.449967e+09 NA
str(gdp)
## 'data.frame': 654 obs. of 9 variables:
## $ country : chr "World" "Afghanistan" "Albania" "Algeria" ...
## $ description: chr "GDP" "GDP" "GDP" "GDP" ...
## $ year_1960 : num 1.38e+12 5.38e+08 NA 2.72e+09 NA ...
## $ year_1970 : num 2.99e+12 1.75e+09 NA 4.86e+09 NA ...
## $ year_1980 : num 1.13e+13 3.64e+09 NA 4.23e+10 NA ...
## $ year_1990 : num 2.28e+13 NA 2.03e+09 6.20e+10 NA ...
## $ year_2000 : num 3.37e+13 NA 3.48e+09 5.48e+10 NA ...
## $ year_2010 : num 6.62e+13 1.59e+10 1.19e+10 1.61e+11 5.73e+08 ...
## $ year_2020 : num 8.47e+13 1.98e+10 1.48e+10 1.45e+11 NA ...
dim(gdp)
## [1] 654 9
sum(is.na(gdp))
## [1] 1332
This analysis uses an independent difference-in-means test because it compares the average GDP growth between two independent groups (before 2000 and after 2000) using a quantitative variable.
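# Keep only the rows that hold GDP growth rates
gdp_growth <- subset(gdp, description == "GDP growth")
# Reshape data from wide to long
# Note: cols = 3:8 covers year_1960 through year_2010; year_2020 (column 9)
# is not pivoted and stays as its own column
gdp_growth_long <- pivot_longer(
  gdp_growth,
  cols = 3:8,
  names_to = "Year",
  values_to = "GDP_growth"
)
# Drop rows with an NA in any column (this includes the year_2020 column)
gdp_growth_long <- na.omit(gdp_growth_long)
# Drop the first six rows by position
gdp_growth_long <- gdp_growth_long[-c(1:6), ]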
head(gdp_growth_long)
## # A tibble: 6 × 5
## country description year_2020 Year GDP_growth
## <chr> <chr> <dbl> <chr> <dbl>
## 1 Albania GDP growth -3.31 year_1990 -9.58
## 2 Albania GDP growth -3.31 year_2000 6.95
## 3 Albania GDP growth -3.31 year_2010 3.71
## 4 Algeria GDP growth -5.48 year_1970 8.86
## 5 Algeria GDP growth -5.48 year_1980 0.791
## 6 Algeria GDP growth -5.48 year_1990 0.800
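To prepare the dataset for analysis, I filtered the data to include only GDP growth observations, reshaped it from wide to long format using pivot_longer, removed missing values, and excluded the first six non-country rows. Because the reshape uses cols = 3:8, it covers year_1960 through year_2010. The year_2020 column is left in place, as the head() output shows, so 2020 growth values do not enter GDP_growth, and na.omit also drops any row whose year_2020 value is missing. After these steps, each row represents one country-year GDP growth observation, which is the unit used in the exploratory analysis and the statistical test.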
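The reshaping step could also select the year columns by name and drop aggregate rows by name rather than by position. The sketch below is only an illustration on a small made-up table: the values in `toy` are hypothetical, and treating "World" as the only non-country row is an assumption that would need to be checked against the real file. Applying this to the actual data would change the sample, so the results reported in this project would not carry over.

```r
library(tidyr)
library(dplyr)

# Toy stand-in for the gdp table (illustrative values, not World Bank data)
toy <- data.frame(
  country     = c("World", "Albania", "Algeria"),
  description = "GDP growth",
  year_1990   = c(2.9, -9.6, 0.8),
  year_2000   = c(4.5, 7.0, NA),
  year_2020   = c(-3.1, -3.3, -5.5),
  stringsAsFactors = FALSE
)

toy_long <- toy %>%
  filter(country != "World") %>%        # assumption: "World" is the only aggregate row
  pivot_longer(
    cols      = starts_with("year_"),   # every year column, including year_2020
    names_to  = "Year",
    values_to = "GDP_growth"
  ) %>%
  filter(!is.na(GDP_growth))            # drop only the missing country-year cells

print(toy_long)
```

Selecting columns with `starts_with("year_")` avoids depending on column positions, and filtering on `GDP_growth` alone keeps a country's other years when a single year is missing.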
summary(gdp_growth_long$GDP_growth)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -27.330 1.631 4.198 4.298 6.532 58.647
# Histogram to visualize the distribution of GDP growth rates
hist(gdp_growth_long$GDP_growth,
main = "Histogram of GDP Growth Rates",
xlab = "GDP Growth Rate",
col = "lightblue",
border = "black")
# Boxplot to show spread and outliers
boxplot(gdp_growth_long$GDP_growth,
main = "Boxplot of GDP Growth Rates",
ylab = "GDP Growth Rate",
col = "orange")
I computed summary statistics and created a histogram and boxplot of the
GDP growth variable. These visualizations reveal the spread, central
tendency, and presence of outliers in the data. This exploratory step
helps identify general patterns and prepares the dataset for hypothesis
testing by highlighting how GDP growth varies across observations.
# Create a variable for period (before/after 2000)
gdp_growth_long$Period <- ifelse(
as.numeric(gsub("year_", "", gdp_growth_long$Year)) < 2000,
"Before 2000", "After 2000"
)
t_test_result <- t.test(GDP_growth ~ Period, data = gdp_growth_long)
print(t_test_result)
##
## Welch Two Sample t-test
##
## data: GDP_growth by Period
## t = 0.34164, df = 578.72, p-value = 0.7327
## alternative hypothesis: true difference in means between group After 2000 and group Before 2000 is not equal to 0
## 95 percent confidence interval:
## -0.7297487 1.0370813
## sample estimates:
## mean in group After 2000 mean in group Before 2000
## 4.376382 4.222715
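To evaluate whether GDP growth rates differ before and after the year 2000, I created a categorical Period variable and conducted an independent difference-in-means t-test. Observations from years earlier than 2000 are labeled "Before 2000", and observations from 2000 onward, including the year 2000 itself, are labeled "After 2000". This method is appropriate because it compares the mean of a quantitative variable (GDP growth) across two independent groups (Before 2000 vs. After 2000). The hypotheses for this test are:
$H_0$: $\mu_{\text{before}} = \mu_{\text{after}}$
$H_1$: $\mu_{\text{before}} \neq \mu_{\text{after}}$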
The t-test examines whether the difference between the two sample means is statistically significant.
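The output above is labeled "Welch Two Sample t-test", which is the unequal-variance form that R's t.test uses by default. For reference, the statistic and its approximate degrees of freedom can be written as follows, where \(\bar{x}\), \(s^2\), and \(n\) denote the sample mean, sample variance, and sample size of each group (this notation is introduced here for illustration and does not appear elsewhere in the report):

```latex
t = \frac{\bar{x}_{\text{after}} - \bar{x}_{\text{before}}}
         {\sqrt{\dfrac{s_{\text{after}}^{2}}{n_{\text{after}}} + \dfrac{s_{\text{before}}^{2}}{n_{\text{before}}}}},
\qquad
\nu \approx
\frac{\left(\dfrac{s_{\text{after}}^{2}}{n_{\text{after}}} + \dfrac{s_{\text{before}}^{2}}{n_{\text{before}}}\right)^{2}}
     {\dfrac{\left(s_{\text{after}}^{2}/n_{\text{after}}\right)^{2}}{n_{\text{after}} - 1}
      + \dfrac{\left(s_{\text{before}}^{2}/n_{\text{before}}\right)^{2}}{n_{\text{before}} - 1}}
```

The approximation for \(\nu\) is why the reported degrees of freedom (578.72) are not a whole number.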
Discussion of Results
The exploratory analysis showed that most GDP growth values fall between −5% and 10%, with several extreme outliers visible in both the histogram and boxplot. These outliers suggest that certain countries experienced unusually high or low growth in specific years (the minimum is −27.33% and the maximum is 58.65%), but the overall distribution remains centered near 4%, with a median of 4.20% and a mean of 4.30%.
The results of the statistical test do not provide evidence of a change in average growth. The Welch two-sample t-test produced t = 0.34 on about 579 degrees of freedom and a p-value of 0.7327, which is far above the 0.05 significance level. Therefore, we fail to reject the null hypothesis \(H_0\): the data do not show that average GDP growth before 2000 differs from average GDP growth after 2000. The mean GDP growth was 4.22% before 2000 and 4.38% from 2000 onward, a difference of about 0.15 percentage points. The 95% confidence interval for the difference (after minus before), −0.73 to 1.04, includes zero, which is consistent with no detectable change.
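One way to keep the numbers quoted in a discussion in step with the test output is to read them from the object returned by t.test instead of copying them by hand. The sketch below runs on simulated data: the data frame `toy`, its values, and the seed are hypothetical stand-ins for gdp_growth_long, so its printed numbers have no bearing on the results of this project.

```r
# Simulated stand-in for gdp_growth_long (hypothetical values, fixed seed)
set.seed(1)
toy <- data.frame(
  Year       = rep(c("year_1990", "year_2000", "year_2010"), each = 20),
  GDP_growth = rnorm(60, mean = 4, sd = 5)
)

# Same rule as the report: years before 2000 vs. 2000 and later
toy$Period <- ifelse(
  as.numeric(sub("year_", "", toy$Year)) < 2000,
  "Before 2000", "After 2000"
)

res <- t.test(GDP_growth ~ Period, data = toy)  # Welch test by default

res$estimate   # the two group means
res$conf.int   # 95% CI for mean(After 2000) - mean(Before 2000)
res$p.value    # two-sided p-value

# Decision at the 0.05 significance level
decision <- if (res$p.value < 0.05) "reject H0" else "fail to reject H0"
decision
```

The groups are ordered alphabetically, so the estimated difference and its confidence interval are for "After 2000" minus "Before 2000", matching the order shown in the test output.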
Conclusion and Future Directions
This analysis finds no statistically significant evidence that average GDP growth across countries changed after the year 2000. The two period means are close (about 4.2% before 2000 and 4.4% from 2000 onward), so any broader structural or economic change at the global level is not visible in this comparison. These findings still highlight the importance of examining economic trends over time, since a similar average can hide large differences between individual countries and years.
In the future, this research could be extended by analyzing GDP trends by region, comparing developing and developed economies, or incorporating additional economic indicators such as inflation or trade balance.