library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.0.3
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.0.3
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(ggplot2)
library(readr)
## Warning: package 'readr' was built under R version 4.0.3

Load dataset

# First 3 rows
head(worldbank, n = 3)
## # A tibble: 3 x 37
##   country_name country_code adolescent_fert~     GDP electricity_acc~
##   <chr>        <chr>                   <dbl>   <dbl>            <dbl>
## 1 Afghanistan  AFG                     65.1  2.11e10             98.7
## 2 Albania      ALB                     19.6  1.45e10            100  
## 3 Algeria      DZA                      9.83 2.01e11            100  
## # ... with 32 more variables: age_dependency <dbl>, birds_threatened <dbl>,
## #   crude_births <dbl>, compulsory_education <dbl>, start_up_cost <dbl>,
## #   crude_deaths <dbl>, credit <dbl>, business_ease <dbl>, fertility <dbl>,
## #   fish_threatened <dbl>, telephone <dbl>, foreign_inflow <dbl>,
## #   foreign_outflow <dbl>, GDP_growth <dbl>, measles_immunization <dbl>,
## #   labor <dbl>, land <dbl>, life_expectancy_female <dbl>,
## #   life_expectancy_male <dbl>, life_expectancy_total <dbl>,
## #   mammals_threatened <dbl>, mobile_cell <dbl>, infant_deaths <dbl>,
## #   pop_ann_growth <dbl>, female_pop_perc <dbl>, male_pop_perc <dbl>,
## #   pop_total <dbl>, private_cred <dbl>, profit_tax <dbl>,
## #   parliament_women <dbl>, sex_ratio <dbl>, legal_rights <dbl>
# Last 3 rows
tail(worldbank, n = 3)
## # A tibble: 3 x 37
##   country_name country_code adolescent_fert~     GDP electricity_acc~
##   <chr>        <chr>                   <dbl>   <dbl>            <dbl>
## 1 Guatemala    GTM                     69.8  5.46e10             94.7
## 2 Hong Kong S~ HKG                      2.65 2.88e11            100  
## 3 Guinea       GIN                    133.   1.11e10             44  
## # ... with 32 more variables: age_dependency <dbl>, birds_threatened <dbl>,
## #   crude_births <dbl>, compulsory_education <dbl>, start_up_cost <dbl>,
## #   crude_deaths <dbl>, credit <dbl>, business_ease <dbl>, fertility <dbl>,
## #   fish_threatened <dbl>, telephone <dbl>, foreign_inflow <dbl>,
## #   foreign_outflow <dbl>, GDP_growth <dbl>, measles_immunization <dbl>,
## #   labor <dbl>, land <dbl>, life_expectancy_female <dbl>,
## #   life_expectancy_male <dbl>, life_expectancy_total <dbl>,
## #   mammals_threatened <dbl>, mobile_cell <dbl>, infant_deaths <dbl>,
## #   pop_ann_growth <dbl>, female_pop_perc <dbl>, male_pop_perc <dbl>,
## #   pop_total <dbl>, private_cred <dbl>, profit_tax <dbl>,
## #   parliament_women <dbl>, sex_ratio <dbl>, legal_rights <dbl>

Part 1) Introduction

Despite the size and complexity of nations, several aggregate statistics attempt to capture and communicate their health and wealth.

One of these aggregate statistics is a nation's Gross Domestic Product (GDP): the market value of the final goods and services produced in a given period by the labor and capital located within the nation's borders.

The Solow model describes a nation's economic growth by expressing its GDP as a function of its physical capital, human capital, and technical knowledge. We denote physical capital by \(K\), human capital by \(eL\) (where \(e\) is education and \(L\) is labor), and technical knowledge by \(A\). GDP can then be written as \[GDP = f(A, K, eL),\] where \(f\) is the aggregate production function. In this report we examine the relationship between GDP and the World Bank variables that correspond most closely to \(K\), \(e\), \(L\) and \(A\), and ask: can the apparent association between each of these variables and GDP be explained by chance?

The comparisons are:

  • GDP and Land (\(K\))
  • GDP and Labor (\(L\))
  • GDP and Compulsory Education (\(e\))
  • GDP and Business Ease (\(A\)), where ease of doing business (entrepreneurship) serves as a proxy for technical knowledge

GDP measures a nation's production in a given year. It does not measure the cumulative wealth the nation has built over its lifetime, or how the average individual is situated within it. Although GDP may track some aspects of a nation's health or wealth well, it cannot capture everything that contributes to the wellbeing of the average individual or of society, chiefly because some of those aspects have no price and are not bought and sold, for example personal health or gender parity.

We also focus on gender parity in this report. We treat the countries in the World Bank data as a simple random sample of the world's countries and ask: is there gender parity in government worldwide?

Part 2) Exploratory Data Analysis

According to the data dictionary, the variables listed above are defined as follows:

  • GDP: Gross domestic product (GDP) in constant 2010 U.S. dollars
  • labor: Labor force total of people ages 15 and older.
  • land: Land area in square kilometers
  • compulsory_education: Number of years children are schooled by law
  • business_ease: Score illustrates the distance of an economy to the “frontier.”
  • parliament_women: Percentage of parliamentary seats held by women

Figure 1: Land vs. GDP

We compare land and GDP for the countries that have both values recorded in the worldbank data. The points (land, GDP) do not appear to lie on a line or on an exponential curve; as Figure 1 below shows, they appear to lie on a power curve.

Note that if \(y = ax^b\), then \[\log_{10}(y) = \log_{10}(a) + b \cdot \log_{10}(x).\] Setting \(Y = \log_{10}(y)\), \(X = \log_{10}(x)\), \(m = b\) and \(B = \log_{10}(a)\), the previous equation becomes \[Y = m \cdot X + B.\]

Hence we compare \(\log_{10}(\text{land})\) vs. \(\log_{10}(\text{GDP})\).
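As a quick check of this transformation, here is a minimal sketch using exact points on a hypothetical power curve \(y = 3x^{0.5}\) (the constants are illustrative, not taken from the World Bank data). Fitting a line to the log-log values should recover the exponent \(b\) as the slope and \(\log_{10}(a)\) as the intercept.

```r
# Hypothetical power curve y = a * x^b with a = 3, b = 0.5 (illustrative values)
a <- 3
b <- 0.5
x <- c(10, 100, 1000, 10000, 100000)
y <- a * x^b

# On the log10-log10 scale the points lie on a line: Y = m * X + B
fit <- lm(log10(y) ~ log10(x))
coef(fit)  # slope equals b = 0.5; intercept equals log10(3), about 0.477
```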

# Figure 1
# GDP and Land
ggplot(worldbank, aes(x = log10(land), y = log10(GDP))) + # power curve fit
  geom_point(na.rm = TRUE) +  # remove missing values
  theme_classic() +
  labs(title = "Land vs. GDP",
       y = "log10(GDP, constant 2010 US$)",
       x = "log10(Land area, sq. km)",
       tag = "Figure 1") +
  geom_smooth(method = "lm", formula = y ~ x, na.rm = TRUE) # remove missing values

r <- cor(x = log10(worldbank$land),y = log10(worldbank$GDP), 
         use = "complete.obs")
cat("Correlation between land and GDP is", r)
## Correlation between land and GDP is 0.4253266

Note that land values are quantitative (ratio scaled) and are measured in square kilometers. Consider the summary statistics for the land variable below.

summary(worldbank$land)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##       30    69040   247040   942115   645390 16376870        2
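The summary shows a mean land area (942,115 sq. km) far above the median (247,040 sq. km), a sign of strong right skew. The following is a minimal sketch of how a base-10 log pulls such a variable toward symmetry, which is one reason to plot land on a log scale. It uses simulated log-normal values as a stand-in (not the World Bank data) and a hypothetical `skewness()` helper.

```r
# Hypothetical helper: sample skewness (third standardized moment)
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

set.seed(42)
# Simulated right-skewed stand-in for a variable like land area
sim_land <- rlnorm(1000, meanlog = 12, sdlog = 1.5)

skewness(sim_land)         # large and positive
skewness(log10(sim_land))  # close to 0 after the log transform
```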

Figure 2: Labor vs. GDP

The points in Figure 2 also appear to lie on a power curve, so we again compare \(\log_{10}(\text{labor})\) vs. \(\log_{10}(\text{GDP})\).

# Figure 2
# GDP and Labor
ggplot(worldbank, aes(x = log10(labor), y = log10(GDP))) + # power curve fit
  geom_point(na.rm = TRUE) + # remove missing values
  theme_classic() +
  labs(title = "Labor vs. GDP",
       y = "log10(GDP, constant 2010 US$)",
       x = "log10(Labor force, people)",
       tag = "Figure 2") +
  geom_smooth(method = "lm", formula = y ~ x, na.rm = TRUE) # remove missing values

r <- cor(x = log10(worldbank$labor),y = log10(worldbank$GDP), 
         use = "complete.obs")
cat("Correlation between labor and GDP is", r)
## Correlation between labor and GDP is 0.6913595

Note that labor values are quantitative (ratio scaled) and are measured in number of people. Consider the summary statistics for the labor variable below.

summary(worldbank$labor)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##     80919   2756367   6716746  26916615  20462002 783424134

Figure 3: Compulsory Education vs. GDP

We compare compulsory education and GDP for the countries that have both values recorded in the worldbank data. The points (compulsory_education, GDP) do not appear to lie on a line or on a power curve; as Figure 3 below shows, an exponential curve fits them best.

Note that if \(y = ab^x\), then \[\log_{10}(y) = \log_{10}(a) + x \cdot \log_{10}(b).\] Setting \(Y = \log_{10}(y)\), \(X = x\), \(m = \log_{10}(b)\) and \(B = \log_{10}(a)\), the previous equation becomes \[Y = m \cdot X + B.\]

Hence we compare compulsory_education vs. \(\log_{10}(\text{GDP})\).
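The same kind of check works for the exponential case. In this minimal sketch the curve \(y = 2 \cdot 1.5^x\) is hypothetical. Only \(y\) is log-transformed, so the fitted slope estimates \(\log_{10}(b)\) and the intercept estimates \(\log_{10}(a)\).

```r
# Hypothetical exponential curve y = a * b^x with a = 2, b = 1.5 (illustrative values)
a <- 2
b <- 1.5
x <- 0:10
y <- a * b^x

# Semi-log fit: log10(y) = log10(a) + x * log10(b)
fit <- lm(log10(y) ~ x)
coef(fit)     # intercept equals log10(2); slope equals log10(1.5)
10^coef(fit)  # back-transformed: a = 2 and b = 1.5
```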

# Figure 3
# GDP and Compulsory Education
ggplot(worldbank, aes(x = compulsory_education, y = log10(GDP))) + # exponential fit
  geom_point(na.rm = TRUE) + # remove missing values
  theme_classic() +
  labs(title = "Compulsory Education (years) vs. GDP",
       y = "log10(GDP, constant 2010 US$)",
       x = "Compulsory education (years)",
       tag = "Figure 3") +
  geom_smooth(method = "lm", formula = y ~ x, na.rm = TRUE) # remove missing values

r <- cor(x = worldbank$compulsory_education, y = log10(worldbank$GDP), 
    use = "complete.obs")
cat("Correlation between years of compulsory education and GDP is", r)
## Correlation between years of compulsory education and GDP is 0.2796978

Note that compulsory education values are quantitative and are measured in number of years (ratio scaled, since zero years is a true zero). Consider the summary statistics for the compulsory_education variable below.

summary(worldbank$compulsory_education)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   9.000  10.000   9.905  12.000  17.000       4

Figure 4: Business Ease vs. GDP

In Figure 4 below, the data also appear to lie on an exponential curve. Hence we compare business_ease vs. \(\log_{10}(\text{GDP})\).

# Figure 4
# GDP and Business Ease
ggplot(worldbank, aes(x = business_ease, y = log10(GDP))) + # exponential fit
  geom_point(na.rm = TRUE) + # remove missing values
  theme_classic() +
  labs(title = "Business Ease Score vs. GDP",
       y = "log10(GDP, constant 2010 US$)",
       x = "Business ease score",
       tag = "Figure 4") +
  geom_smooth(method = "lm", formula = y ~ x, na.rm = TRUE) # remove missing values

r <- cor(x = (worldbank$business_ease),y = log10(worldbank$GDP), 
    use = "complete.obs")
cat("Correlation between business ease score and GDP is", r)
## Correlation between business ease score and GDP is 0.5801141

Note that business ease score values are quantitative and are measured on a constructed index (interval scaled). Consider the summary statistics for the business_ease variable below.

summary(worldbank$business_ease)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   30.67   54.41   67.59   64.50   76.15   87.00       4

Part 3) Data Analysis and Inference

We use hypothesis testing to answer two questions. Can the apparent association between each of \(K, e, L, A\) and GDP be explained by chance? And is there gender parity in government worldwide?

Hypothesis Testing: Regression

We now test each regression slope in two similar ways: first by computing the test statistic by hand, and then by reading the slope estimate and its standard error from the output of summary() applied to a fitted lm() model.

# Standard Deviation Function
# Population SD (divides by n, not n - 1), which is what the
# slope standard-error formula used below expects
standard_deviation <- function(x, na.rm = TRUE){
  if (na.rm) x <- x[!is.na(x)]         # drop missing values first
  deviations <- x - mean(x)
  square_of_deviations <- deviations^2
  mean_of_square_of_deviations <- mean(square_of_deviations)
  root_mean_square_of_deviations <- sqrt(mean_of_square_of_deviations)
  return(root_mean_square_of_deviations)
}
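Method 1 relies on two identities: the least-squares slope is \(m = r \cdot SD_y / SD_x\), and its standard error is \(\sqrt{1 - r^2} \cdot SD_y / (\sqrt{n - 2} \cdot SD_x)\) when the SDs divide by \(n\), as `standard_deviation()` does. Base R's `sd()` divides by \(n - 1\) instead. This minimal sketch checks both identities against `lm()` on simulated data. The helper names `pop_sd()` and `slope_test()` are hypothetical.

```r
# Hypothetical helper: population SD (divide by n), like standard_deviation()
pop_sd <- function(x) sqrt(mean((x - mean(x))^2))

# Hypothetical helper: slope, its standard error, and the test statistic
slope_test <- function(x, y) {
  n  <- length(x)
  r  <- cor(x, y)
  m  <- r * pop_sd(y) / pop_sd(x)                               # least-squares slope
  se <- sqrt(1 - r^2) * pop_sd(y) / (sqrt(n - 2) * pop_sd(x))   # SE of the slope
  list(slope = m, se = se, z = m / se)
}

set.seed(1)
x <- rnorm(50)
y <- 2 + 0.5 * x + rnorm(50)   # simulated data, not World Bank values
manual <- slope_test(x, y)
fit <- coef(summary(lm(y ~ x)))

c(manual = manual$slope, lm = fit["x", "Estimate"])
c(manual = manual$se,    lm = fit["x", "Std. Error"])
```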

Land vs. GDP

First, we investigate the relationship between the land of a nation and its GDP.

Null: There is no correlation between land and GDP. The apparent correlation between land and GDP can be explained by chance.

Alternative: There is a positive correlation between land and GDP. The apparent correlation cannot be explained by chance. Under the Solow model, we expect a positive association between the two.

#Figure 1
# GDP vs. Land

# data frame with GDP and Land (only complete rows)
testing_data <- worldbank %>% 
  filter(!is.na(land), !is.na(GDP))

# ensure number of observations are equal
land_data <- pull(testing_data, land)
land_observations <- length(land_data) # length is 113
gdp_data <- pull(testing_data, GDP)
gdp_observations <- length(gdp_data) # length is 113

# prepare variables for hypothesis testing
sd_land <- standard_deviation(log10(land_data))
sd_gdp <- standard_deviation(log10(gdp_data))
r <- cor(log10(gdp_data), log10(land_data))
se <- sqrt(1-r^2)*sd_gdp / (sqrt(land_observations - 2)*sd_land) # n - 2 is the degrees of freedom
m <- r*sd_gdp / sd_land

# Z-test: Method 1
z_statistic <- (m - 0)/se
z_statistic
## [1] 4.951266
p_value <- 1-pnorm(z_statistic)
p_value
## [1] 3.68661e-07
cat("Is p-value .01 significant?", p_value < 0.01)
## Is p-value .01 significant? TRUE
# Z-test: Method 2
# lm() drops rows with missing values on its own
# (see "observations deleted due to missingness" below)
# summary() reports a t value and a two-sided p-value for a t-test;
# with 111 degrees of freedom the t value should be close to the z-statistic above
summary(lm(log10(worldbank$GDP) ~ log10(worldbank$land)))
## 
## Call:
## lm(formula = log10(worldbank$GDP) ~ log10(worldbank$land))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.03388 -0.55899  0.09062  0.48922  1.58963 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            9.25862    0.37508  24.684  < 2e-16 ***
## log10(worldbank$land)  0.34918    0.07052   4.951 2.65e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7495 on 111 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.1809, Adjusted R-squared:  0.1735 
## F-statistic: 24.52 on 1 and 111 DF,  p-value: 2.653e-06
z_statistic <- (0.34918-0)/0.07052
z_statistic
## [1] 4.951503
p_value <- 1-pnorm(z_statistic)
p_value
## [1] 3.682122e-07
cat("Is p-value .01 significant?", p_value < 0.01)
## Is p-value .01 significant? TRUE
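Rather than retyping the estimate and standard error, Method 2 can pull them straight from the fitted model. This minimal sketch, built around a hypothetical helper `slope_p_values()`, does that and also sets the one-sided normal p-value used in Method 1 beside the two-sided t p-value that summary() prints. The gap between 3.68e-07 and 2.65e-06 above reflects both choices: the number of tails and the reference distribution. The example uses simulated data.

```r
# Hypothetical helper: slope test statistics and p-values from a fitted lm()
slope_p_values <- function(fit, term) {
  row  <- coef(summary(fit))[term, ]
  stat <- row[["Estimate"]] / row[["Std. Error"]]
  df   <- df.residual(fit)
  list(
    statistic     = stat,
    p_one_sided_z = pnorm(stat, lower.tail = FALSE),             # Method 1 style
    p_one_sided_t = pt(stat, df = df, lower.tail = FALSE),
    p_two_sided_t = 2 * pt(abs(stat), df = df, lower.tail = FALSE) # as in summary()
  )
}

set.seed(7)
x <- rnorm(113)
y <- 1 + 0.3 * x + rnorm(113)   # simulated data, not World Bank values
fit <- lm(y ~ x)
res <- slope_p_values(fit, "x")
res
```

With the report's data this would presumably be called as `slope_p_values(lm(log10(GDP) ~ log10(land), data = worldbank), "log10(land)")`, where the coefficient name follows the formula term.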

Conclusion: Since the p-value is below 0.01, we reject the null hypothesis at the 1% level. The apparent correlation is unlikely to be explained by chance alone, which is consistent with the Solow model.

Labor vs. GDP

Now, we investigate the relationship between the labor force of a nation and its GDP.

Null: There is no correlation between labor force and GDP. The apparent correlation between labor force and GDP can be explained by chance.

Alternative: There is a positive correlation between labor force and GDP. The apparent correlation cannot be explained by chance. Under the Solow model, we expect a positive association between the two.

#Figure 2
# GDP vs. Labor
testing_data <- worldbank %>% 
  filter(!is.na(labor), !is.na(GDP))

labor_data <- pull(testing_data, labor)
labor_observations <- length(labor_data) # length is 114
gdp_data <- pull(testing_data, GDP)
gdp_observations <- length(gdp_data) # length is 114

sd_labor <- standard_deviation(log10(labor_data))
sd_gdp <- standard_deviation(log10(gdp_data))
r <- cor(log10(gdp_data), log10(labor_data))
se <- sqrt(1-r^2)*sd_gdp / (sqrt(labor_observations - 2)*sd_labor) # n - 2 is the degrees of freedom
m <- r*sd_gdp / sd_labor

# Z-test: Method 1
z_statistic <- (m - 0)/se
z_statistic
## [1] 10.12671
p_value <- 1-pnorm(z_statistic)
p_value
## [1] 0
cat("Is p-value .01 significant?", p_value < 0.01)
## Is p-value .01 significant? TRUE
# Z-test: Method 2
summary(lm(log10(worldbank$GDP) ~ log10(worldbank$labor)))
## 
## Call:
## lm(formula = log10(worldbank$GDP) ~ log10(worldbank$labor))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3593 -0.3962  0.0044  0.4718  1.0631 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             5.50866    0.55307    9.96   <2e-16 ***
## log10(worldbank$labor)  0.81291    0.08027   10.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5958 on 112 degrees of freedom
##   (6 observations deleted due to missingness)
## Multiple R-squared:  0.478,  Adjusted R-squared:  0.4733 
## F-statistic: 102.6 on 1 and 112 DF,  p-value: < 2.2e-16
z_statistic <- (0.81291-0)/0.08027
z_statistic
## [1] 10.1272
p_value <- 1-pnorm(z_statistic)
p_value
## [1] 0
cat("Is p-value .01 significant?", p_value < 0.01)
## Is p-value .01 significant? TRUE
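The p-values of 0 printed above come from floating-point rounding, not from a zero tail probability. Once `pnorm(z)` is closer to 1 than double precision can represent, `1 - pnorm(z)` evaluates to exactly 0. The following minimal sketch shows the effect at \(z = 10\) and the usual remedy, `lower.tail = FALSE`, which computes the upper tail directly.

```r
z <- 10
1 - pnorm(z)                  # exactly 0: pnorm(10) rounds to 1
pnorm(z, lower.tail = FALSE)  # about 7.6e-24, the actual upper-tail area
```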

Conclusion: Since the p-value is below 0.01, we reject the null hypothesis at the 1% level. The apparent correlation is unlikely to be explained by chance alone, which is consistent with the Solow model.

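Compulsory Education vs. GDP

Next, we investigate the relationship between a nation's years of compulsory education and its GDP.

Null: There is no correlation between compulsory education and GDP. The apparent correlation can be explained by chance.

Alternative: There is a positive correlation between compulsory education and GDP. The apparent correlation cannot be explained by chance. Under the Solow model, we expect a positive association between the two.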

#Figure 3
# GDP vs. Compulsory Education
testing_data <- worldbank %>% 
  filter(!is.na(compulsory_education), !is.na(GDP))

edu_data <- pull(testing_data, compulsory_education)
edu_observations <- length(edu_data) # length is 110
gdp_data <- pull(testing_data, GDP)
gdp_observations <- length(gdp_data) # length is 110

sd_edu <- standard_deviation(edu_data)
sd_gdp <- standard_deviation(log10(gdp_data))
r <- cor(log10(gdp_data), edu_data)
se <- sqrt(1-r^2)*sd_gdp / (sqrt(edu_observations - 2)*sd_edu) # n - 2 is the degrees of freedom
m <- r*sd_gdp / sd_edu

# Z-test: Method 1
z_statistic <- (m - 0)/se
z_statistic
## [1] 3.02754
p_value <- 1-pnorm(z_statistic)
p_value
## [1] 0.001232765
cat("Is p-value .01 significant?", p_value < 0.01)
## Is p-value .01 significant? TRUE
# Z-test: Method 2
summary(lm(log10(worldbank$GDP) ~ (worldbank$compulsory_education)))
## 
## Call:
## lm(formula = log10(worldbank$GDP) ~ (worldbank$compulsory_education))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8743 -0.5969 -0.0735  0.5938  1.9996 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    10.27679    0.28760  35.733  < 2e-16 ***
## worldbank$compulsory_education  0.08456    0.02793   3.028  0.00308 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7819 on 108 degrees of freedom
##   (10 observations deleted due to missingness)
## Multiple R-squared:  0.07823,    Adjusted R-squared:  0.0697 
## F-statistic: 9.166 on 1 and 108 DF,  p-value: 0.003084
z_statistic <- (0.08456-0)/0.02793
z_statistic
## [1] 3.027569
p_value <- 1-pnorm(z_statistic)
p_value
## [1] 0.001232647
cat("Is p-value .01 significant?", p_value < 0.01)
## Is p-value .01 significant? TRUE

Conclusion: Since the p-value is below 0.01, we reject the null hypothesis at the 1% level. The apparent correlation is unlikely to be explained by chance alone, which is consistent with the Solow model.

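Business Ease Score vs. GDP

Finally, we investigate the relationship between a nation's business ease score and its GDP.

Null: There is no correlation between business ease score and GDP. The apparent correlation can be explained by chance.

Alternative: There is a positive correlation between business ease score and GDP. The apparent correlation cannot be explained by chance. Under the Solow model, we expect a positive association between the two.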

#Figure 4
# GDP vs. Business Ease Score
testing_data <- worldbank %>% 
  filter(!is.na(business_ease), !is.na(GDP))

ease_data <- pull(testing_data, business_ease)
ease_observations <- length(ease_data) # length is 112
gdp_data <- pull(testing_data, GDP)
gdp_observations <- length(gdp_data) # length is 112

sd_ease <- standard_deviation(ease_data)
sd_gdp <- standard_deviation(log10(gdp_data))
r <- cor(log10(gdp_data), ease_data)
se <- sqrt(1-r^2)*sd_gdp / (sqrt(ease_observations - 2)*sd_ease) # n - 2 is the degrees of freedom
m <- r*sd_gdp / sd_ease

# Z-test: Method 1
z_statistic <- (m - 0)/se
z_statistic
## [1] 7.469643
p_value <- 1-pnorm(z_statistic)
p_value
## [1] 4.019007e-14
cat("Is p-value .01 significant?", p_value < 0.01)
## Is p-value .01 significant? TRUE
# Z-test: Method 2
summary(lm(log10(worldbank$GDP) ~ (worldbank$business_ease)))
## 
## Call:
## lm(formula = log10(worldbank$GDP) ~ (worldbank$business_ease))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.48067 -0.42435 -0.05161  0.49525  1.64097 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             8.834608   0.309468   28.55  < 2e-16 ***
## worldbank$business_ease 0.034643   0.004638    7.47 2.04e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6682 on 110 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.3365, Adjusted R-squared:  0.3305 
## F-statistic:  55.8 on 1 and 110 DF,  p-value: 2.039e-11
z_statistic <- (0.034643-0)/0.004638
z_statistic
## [1] 7.469383
p_value <- 1-pnorm(z_statistic)
p_value
## [1] 4.03011e-14
cat("Is p-value .01 significant?", p_value < 0.01)
## Is p-value .01 significant? TRUE

Conclusion: Since the p-value is below 0.01, we reject the null hypothesis at the 1% level. The apparent correlation is unlikely to be explained by chance alone, which is consistent with the Solow model.

Hypothesis Testing: Two Sample Z-Test

For the following test, we suppose that the percentages of parliamentary seats held by women in the provided data file constitute a simple random sample of the world's countries.

Gender Parity in Government

To answer the question of whether there is gender parity in government worldwide, we create a data frame containing the percentages of parliamentary seats held by men and by women. The percentage held by women is given in the data; for each country, we subtract it from 100% to estimate the percentage held by men. Since the women's data are assumed to be a simple random sample, we also treat the derived percentages for men as a simple random sample.

# share of women in gov.
women_in_gov <- worldbank$parliament_women[!is.na(worldbank$parliament_women)]

# share of men in gov.
men_in_gov <- 100 - women_in_gov

# data frame of gender parity in gov
gender_parity_df <- data.frame(
  percent_men = men_in_gov,
  percent_women = women_in_gov)

# check first couple of rows of df
head(gender_parity_df, n=2)
##   percent_men percent_women
## 1    72.14286      27.85714
## 2    74.24242      25.75758
# check last couple of rows of df
tail(gender_parity_df, n=2)
##     percent_men percent_women
## 113    87.34177      12.65823
## 114    78.07018      21.92982
# summary table of gender parity
summary(gender_parity_df)
##   percent_men     percent_women  
##  Min.   : 46.92   Min.   : 0.00  
##  1st Qu.: 69.05   1st Qu.:15.00  
##  Median : 79.12   Median :20.88  
##  Mean   : 77.16   Mean   :22.84  
##  3rd Qu.: 85.00   3rd Qu.:30.95  
##  Max.   :100.00   Max.   :53.08

Null: There is gender parity in government worldwide. On average, the percentages of parliamentary seats held by women and by men are equal, and the apparent difference can be explained by chance.

Alternative: On average, men hold more parliamentary seats than women, and the difference cannot be explained by chance. We suspect this is because women, on average, face more financial, educational, and cultural barriers worldwide.
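The test below uses a box model. Under the null of parity, each percentage is treated as if drawn from a box that holds the values 0 and 100 in equal proportions. This minimal sketch computes that box's standard deviation with the shortcut (big - small) * sqrt(p * (1 - p)), via a hypothetical helper `box_sd()`. The result is 50 in percentage points and 0.5 in proportions, so the SD has to be in the same units as the means being compared.

```r
# Hypothetical helper: SD of a box with two ticket values (shortcut formula)
box_sd <- function(big, small, p) (big - small) * sqrt(p * (1 - p))

box_sd(100, 0, 0.5)  # 50: percentage points
box_sd(1, 0, 0.5)    # 0.5: proportions

# Check against the population SD of an explicit 50/50 box of 0s and 100s
tickets <- rep(c(0, 100), times = 50)
sqrt(mean((tickets - mean(tickets))^2))  # 50
```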

# number of rows in gender parity df
sample1 <- nrow(gender_parity_df) # 114
sample2 <- nrow(gender_parity_df) # 114

# means
mean1 <- mean(gender_parity_df$percent_men, na.rm = TRUE)
mean2 <- mean(gender_parity_df$percent_women, na.rm = TRUE)

# SD of the box implied by the null (values 0 or 100 with p = 0.5),
# in percentage points to match the units of the means above
pool_sd <- (100 - 0)*sqrt(0.5*0.5)

# avg. diff
mean_diff <- mean1 - mean2

# se of men
se1 <- sqrt(sample1)*pool_sd
se_avg1 <- se1 / sample1

# se of women
se2 <- sqrt(sample2)*pool_sd
se_avg2 <- se2 / sample2 

# se diff
se_diff <- sqrt(se_avg1^2 + se_avg2^2)
cat("standard average error difference is", se_diff)
## standard average error difference is 6.622662
# z-statistic
z_statistic <- (mean_diff - 0) / se_diff
cat("\nz-statistic is", z_statistic)
## 
## z-statistic is 8.201301
# upper-tail p-value; lower.tail = FALSE avoids 1 - pnorm() rounding to 0
p_value <- pnorm(z_statistic, lower.tail = FALSE)
cat("\nIs p-value .01 significant?", p_value < 0.01)
## 
## Is p-value .01 significant? TRUE

Since the p-value is far below 0.01, we reject the null hypothesis. The apparent difference is unlikely to be due to chance: on average, men hold more parliamentary seats than women. Women's disproportionate lack of access to education and opportunity is one possible explanation, although this test does not examine it.
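Because percent_men is computed as 100 minus percent_women, the two columns are not independent samples. Their difference in means equals 100 - 2 × (mean share of women), so testing that difference against 0 is equivalent to testing whether the mean share of women equals 50. This minimal sketch of that one-sample z-test uses the observed sample SD rather than the box-model SD. The helper `parity_z_test()` is hypothetical, and the five-value vector is made-up input that only illustrates the mechanics; it is not World Bank data and is too small for a z-test in practice.

```r
# Hypothetical helper: one-sample z-test of the mean share against 50%
parity_z_test <- function(share, null_mean = 50) {
  share <- share[!is.na(share)]
  n  <- length(share)
  se <- sd(share) / sqrt(n)              # standard error of the sample mean
  z  <- (mean(share) - null_mean) / se
  list(n = n, mean = mean(share), se = se, z = z,
       p_value = pnorm(z))               # one-sided: mean share below 50
}

# Made-up example shares (percent of seats held by women)
example_shares <- c(20, 25, 30, 15, 10)
res <- parity_z_test(example_shares)
res
```

Applied to the report's data, the call would be `parity_z_test(worldbank$parliament_women)`.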

Part 4) Conclusion

The regression hypothesis tests in Part 3 found a statistically significant positive association between GDP and each of the available variables closest to the Solow-model parameters \(A, e, K, L\). Here, \(K\) was represented by land, \(L\) by labor force, \(e\) by years of compulsory education, and \(A\) by business ease score. In each case, chance alone is unlikely to explain the apparent correlation. The result is not surprising, but it is reassuring and consistent with the Solow model of economic growth.

Finally, we found evidence against gender parity in the share of parliamentary seats held by women worldwide. We suspect the difference reflects the greater challenges and barriers that women, on average, face in society worldwide.