Panel Data

Main

GDP per capita that might be contributed by women.

Tasks:

Find other research. Search terms: research on women’s contribution to GDP, existing research about women’s contribution to GDP, policies that might increase GDP per woman, policies for women’s advancement in society. The idea is to see the effects of government policies for women on GDP per capita.
Get the variables correctly formatted and fix the NAs.
Get a list of women’s laws and see how they change through time: https://wbl.worldbank.org/ and https://wbl.worldbank.org/en/aboutus.
Create a website/dashboard.
Do inferential statistics.
Talk more deeply about the x variables.
Learn when to use a pooled model and when to use a fixed or random effects model.
Start adding citations, for example for why you decided to use panel data.
Write about the Alliance: when it started, etc.
State your \(H_0\) and \(H_1\).
Find out what log does to the data.
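
On the last task, a minimal sketch of what taking logs does, using a made-up series (not the World Bank data) that grows at a constant 5% a year. The variable names and values are assumptions for illustration only. Changes in the level get bigger every year, while changes in the log stay constant and are close to the growth rate.

```r
# Hypothetical GDP per capita series growing 5% a year (illustrative values only).
gdp <- 5000 * 1.05^(0:9)

# Year-to-year changes in the level keep growing over time.
level_changes <- diff(gdp)

# Year-to-year changes in the log are constant: each equals log(1.05).
log_changes <- diff(log(gdp))

print(round(level_changes, 1))
print(round(log_changes, 4))
```

This is why a coefficient in a regression on log(GDP) is usually read as an approximate proportional change rather than a change in dollars.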

Links:

https://www.statista.com/statistics/519486/gender-parity-across-world-regions-in-2015/. Find out whether this is legitimate and which variables they use.
Some facts about women’s contribution to GDP: https://www.unwomen.org/en/what-we-do/economic-empowerment/facts-and-figures#notes.
There is also the “Women’s Workplace Equality Index”.

Proposition

The reasons I am choosing Colombia, Chile, Mexico, and Peru:
* These countries rank high among countries with high inequality for women. (Get citations and data.)
* These countries are also working on fixing the problem and making life more livable for women.
* These countries are also helping each other on the same topic. They have signed an agreement on economic and social cooperation. The treaty is called the Pacific Alliance. (Get more information.)

These are the first steps of many countries becoming one, through what are called trade blocs, unions, economic alliances, etc. For example, the European Union.

The data is going to come from the world bank:

“NY.GDP.PCAP.CD”, # GDP per capita (current US$)
“SP.POP.TOTL.FE.ZS”, # Population, female (% of total population).
“UIS.NE.1.G1.F”, # New entrants to Grade 1 of primary education, female.
“UIS.E.3.F”, # Enrolment in upper secondary education, F (number).
“SG.GEN.PARL.ZS”,# Proportion of seats held by women in national parliament.
“SL.TLF.TOTL.FE.ZS”, # Labor force, female (% of total labor force).
“SP.ADO.TFRT” # Adolescent fertility rate (births per 1,000 women ages 15-19)

The variable we are trying to explain is overall GDP per capita.

There are not that many variables on women in the World Bank data for these four countries. Why is there not more data?

Problems with the model:

Some of the variables would affect productivity at a future date, not at the date the data was collected. How can that have an effect on current GDP?
Some countries are bigger than others and thus show bigger changes.
Multicollinearity: in the regression, GDP per capita will be related to population growth.
First-grade enrollment would be correlated with population growth.
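
The multicollinearity concern can be checked with variance inflation factors (VIF). The sketch below computes them by hand in base R on simulated data; the variable names, the simulated relationship between them, and the "VIF above 5 or 10" reading are assumptions for illustration, not results from the World Bank series.

```r
set.seed(1)
n <- 60

# Simulated regressors: first_grade is built to track population closely.
population  <- rnorm(n)
first_grade <- population + rnorm(n, sd = 0.2)
labor_force <- rnorm(n)
X <- data.frame(population, first_grade, labor_force)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing x_j on the other x's.
vif <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

print(round(vif, 2))
```

A large VIF for two regressors that move together is a sign that their separate coefficients will be estimated imprecisely.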

The female share of the population was used because it would show the size of the female contribution, through the labor force, to overall GDP per capita. GDP per capita is used because it reflects the overall well-being of the people in these four countries. The x variables try to explain how much women contribute to GDP per capita. The variables chosen were not ideal, but they were the ones without missing data. The data run from 2000 to 2015, the range requested in the download code; other variables had no data or had many missing years.

The female share of the total population is included because, in one way or another, the women of a country contribute to its GDP per capita. As the female population fluctuates, GDP per capita would fluctuate too. For example, if the female share of the population is lower, GDP per capita might be different.

Why use panel data:

Data From the World Bank

The package used to gather data is wbstats.
The code below gathers data for the following countries: Chile, Colombia, Mexico, and Peru.

\(i's\) = Chile, Colombia, Mexico, Peru.
\(t's\) = 2000 to 2015, annually. \(T\) = 14 to 16 years per country, \(N\) = 60 observations in total.
\(x's\) = the six female-related World Bank indicators listed above,
\(y\) = GDP per capita (or maybe an index of GDP per capita, vacation, etc.)
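
With this notation, the models estimated later can be written in the usual one-way error-component form. This is the standard textbook specification, stated here as an assumption about what the plm calls fit; \(\alpha\), \(\beta\), \(\mu_i\) and \(\varepsilon_{it}\) are symbols introduced for this sketch.

\[
y_{it} = \alpha + x_{it}'\beta + \mu_i + \varepsilon_{it}, \qquad i \in \{\text{Chile, Colombia, Mexico, Peru}\}, \quad t = 2000, \dots, 2015
\]

Here \(\mu_i\) is the country-specific effect. Pooled OLS assumes \(\mu_i = 0\) for every country, the fixed effects (within) model treats each \(\mu_i\) as a parameter to be removed or estimated, and the random effects model treats \(\mu_i\) as a random draw uncorrelated with \(x_{it}\).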

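library(wbstats) # World Bank API client, provides wb()
library(dplyr)   # provides %>% and rename()
library(plm)     # panel data estimators and tests

df <- wb(country = c("CHL", "COL", "MEX", "PER"),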
        indicator = c("NY.GDP.PCAP.CD", # GDP per capita (current US$). growth rate. at 2000
                      "SP.POP.TOTL.FE.ZS", # Population, female (% of total population) 
                      "UIS.NE.1.G1.F", # New entrants to Grade 1 of primary education, female (number) 
                      "UIS.E.3.F", # Enrolment in upper secondary education, female (number)
                      "SG.GEN.PARL.ZS",# Proportion of seats held by women in national parliament
                      "SL.TLF.TOTL.FE.ZS", # Labor force, female (% of total labor force)
                      "SP.ADO.TFRT" # Adolescent fertility rate (births per 1,000 women ages 15-19)
                      ),
        startdate = 2000, enddate = 2015,
        return_wide = TRUE) # Returns the x variables as columns.

Outlier checks. Second method: standardized residuals. Flagging residuals larger than 2 or smaller than -2 is conservative and would say that there are a lot of outliers. Third method: leverage statistics. If the number is very large for an observation, that observation has an x variable that is far away from the average.
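
A minimal base R sketch of both checks on simulated data. The data, the planted outlier, and the "twice the average leverage" cutoff are assumptions for illustration; the cutoff is a common rule of thumb, not something fixed by the notes.

```r
set.seed(42)

# Simulated data: observation 30 has an extreme x, observation 5 an extreme y.
x <- c(rnorm(29), 8)
y <- 2 + 3 * x + rnorm(30)
y[5] <- y[5] + 10
fit <- lm(y ~ x)

# Standardized residuals: flag observations outside +/- 2.
std_res  <- rstandard(fit)
outliers <- which(abs(std_res) > 2)

# Leverage (hat values): flag observations well above the average leverage.
lev           <- hatvalues(fit)
high_leverage <- which(lev > 2 * mean(lev))

print(outliers)
print(high_leverage)
```

The two checks pick up different things: a large standardized residual means an unusual y given x, while high leverage means an unusual x regardless of y.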

Clean the Data

df <- df[, -c(1, 3)] %>%
  rename("Date" = "date",
         "Country" = "country",
         "GDP" = "NY.GDP.PCAP.CD",
         "Population" = "SP.POP.TOTL.FE.ZS",
         "Primary Education" = "UIS.NE.1.G1.F",
         "Enrolment in Upper Secondary Education" = "UIS.E.3.F",
         "Women in Parlement" = "SG.GEN.PARL.ZS",
         "Labor Force Female" = "SL.TLF.TOTL.FE.ZS",
         "Adolescent Fertility Rate" = "SP.ADO.TFRT")
#new_wb_cache <- wbcache()

Analysis

Descriptive Statistics

sum() Returns the sum of all the values present in its arguments.
mean() Generates the arithmetic mean.
geoMean() Finds the geometric mean (not in base R; provided by add-on packages such as EnvStats).
var() Finds the variance.
cor() Finds the correlation.
cov() Finds the covariance.
sd() Finds the standard deviation.
range() Returns the minimum and maximum.
min() Finds the minimum of a data set.
max() Finds the maximum of a data set.
median() Finds the median.
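
A short sketch of these functions on made-up numbers (not the World Bank data). The geometric mean is computed from logs in base R here so that no extra package is needed; that substitution is an assumption of this example.

```r
# Made-up values for illustration only.
gdp   <- c(4000, 5000, 8000, 10000)
labor <- c(35, 37, 40, 41)

# Geometric mean via logs (base R stand-in for geoMean()).
geo_mean <- exp(mean(log(gdp)))

stats <- c(sum    = sum(gdp),
           mean   = mean(gdp),
           geo    = geo_mean,
           var    = var(gdp),
           sd     = sd(gdp),
           min    = min(gdp),
           max    = max(gdp),
           median = median(gdp))
print(stats)

print(range(gdp))        # minimum and maximum together
print(cor(gdp, labor))   # correlation between the two series
print(cov(gdp, labor))   # covariance between the two series
```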

Set data as panel data

pdata = pdata.frame(df, index = c("Country", "Date")) # Splits the data into the cross-section ("id") and time dimensions.
# Before converting to panel data, the data frame needs one column for the ids and one for the time periods.
write.csv(pdata, file = "pdata.csv")
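
Before converting to panel data it can help to check that each country-year pair appears only once and to see whether the panel is balanced. A base R sketch on a small made-up data frame; the column names mirror the ones used above, and the dropped rows are an assumption that mimics missing years.

```r
# Made-up panel: 4 countries x 4 years, with two rows removed.
panel <- data.frame(
  Country = rep(c("Chile", "Colombia", "Mexico", "Peru"), each = 4),
  Date    = rep(2000:2003, times = 4)
)
panel <- panel[-c(3, 16), ]   # drop Chile 2002 and Peru 2003

# Observations per country: unequal counts mean an unbalanced panel.
obs_per_country <- table(panel$Country)
is_balanced     <- length(unique(obs_per_country)) == 1

# Each (Country, Date) pair should identify exactly one row.
any_duplicates <- any(duplicated(panel[, c("Country", "Date")]))

print(obs_per_country)
print(is_balanced)
print(any_duplicates)
```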

Pooled OLS estimation

pooldata <- plm(formula =  pdata$`GDP` ~ pdata$`Population` + pdata$`Enrolment in Upper Secondary Education` + pdata$`Primary Education` + pdata$`Women in Parlement` + pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`, data = pdata, model = "pooling")

summary(pooldata)

## Pooling Model
## 
## Call:
## plm(formula = pdata$GDP ~ pdata$Population + pdata$`Enrolment in Upper Secondary Education` + 
##     pdata$`Primary Education` + pdata$`Women in Parlement` + 
##     pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`, 
##     data = pdata, model = "pooling")
## 
## Unbalanced Panel: n = 4, T = 14-16, N = 60
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -3622.686  -569.746   -90.958   638.828  4190.245 
## 
## Coefficients:
##                                                   Estimate  Std. Error t-value
## (Intercept)                                    -4.3360e+05  7.8984e+04 -5.4897
## pdata$Population                                8.8603e+03  1.5700e+03  5.6433
## pdata$`Enrolment in Upper Secondary Education` -5.2157e-03  2.7836e-03 -1.8737
## pdata$`Primary Education`                       1.1122e-02  3.6443e-03  3.0519
## pdata$`Women in Parlement`                     -1.3491e+00  7.1619e+01 -0.0188
## pdata$`Labor Force Female`                      4.4939e+02  9.9580e+01  4.5129
## pdata$`Adolescent Fertility Rate`              -4.1382e+02  4.9705e+01 -8.3256
##                                                 Pr(>|t|)    
## (Intercept)                                    1.159e-06 ***
## pdata$Population                               6.648e-07 ***
## pdata$`Enrolment in Upper Secondary Education`   0.06649 .  
## pdata$`Primary Education`                        0.00355 ** 
## pdata$`Women in Parlement`                       0.98504    
## pdata$`Labor Force Female`                     3.590e-05 ***
## pdata$`Adolescent Fertility Rate`              3.391e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    741830000
## Residual Sum of Squares: 134070000
## R-Squared:      0.81928
## Adj. R-Squared: 0.79882
## F-statistic: 40.0445 on 6 and 53 DF, p-value: < 2.22e-16

# Does not take into account heteroskedasticity.
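
One common response to heteroskedasticity is to keep the coefficients and replace the standard errors with robust ones. The sketch below builds White (HC0) standard errors by hand for an ordinary lm fit on simulated data; the data and names are assumptions. For the plm models above, the vcovHC method that plm provides would be the natural tool instead.

```r
set.seed(7)
n <- 200

# Simulated data where the error variance grows with x.
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- residuals(fit)

# Sandwich estimator: (X'X)^-1 [X' diag(e^2) X] (X'X)^-1
bread    <- solve(crossprod(X))
meat     <- crossprod(X * e)   # each row of X is scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread

classical_se <- sqrt(diag(vcov(fit)))
robust_se    <- sqrt(diag(vcov_hc0))
print(rbind(classical_se, robust_se))
```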

Between Estimator

between <- plm(formula =  pdata$`GDP` ~ pdata$`Population` + pdata$`Enrolment in Upper Secondary Education` + pdata$`Primary Education` + pdata$`Women in Parlement` + pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`, data = pdata, model = "between")

summary(between)

## Oneway (individual) effect Between Model
## 
## Call:
## plm(formula = pdata$GDP ~ pdata$Population + pdata$`Enrolment in Upper Secondary Education` + 
##     pdata$`Primary Education` + pdata$`Women in Parlement` + 
##     pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`, 
##     data = pdata, model = "between")
## 
## Unbalanced Panel: n = 4, T = 14-16, N = 60
## Observations used in estimation: 4
## 
## Residuals:
## ALL 4 residuals are 0: no residual degrees of freedom!
## 
## Coefficients: (3 dropped because of singularities)
##                                                   Estimate Std. Error t-value
## (Intercept)                                    -1.2666e+05         NA      NA
## pdata$Population                                2.6139e+03         NA      NA
## pdata$`Enrolment in Upper Secondary Education`  1.3659e-02         NA      NA
## pdata$`Primary Education`                      -2.0375e-02         NA      NA
##                                                Pr(>|t|)
## (Intercept)                                          NA
## pdata$Population                                     NA
## pdata$`Enrolment in Upper Secondary Education`       NA
## pdata$`Primary Education`                            NA
## 
## Total Sum of Squares:    25510000
## Residual Sum of Squares: 0
## R-Squared:      1
## Adj. R-Squared: NaN
## F-statistic: NaN on 3 and 0 DF, p-value: NA
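
The output above has no standard errors because the between estimator regresses country means on country means. With four countries there are only four observations, so at most four parameters (the intercept plus three slopes) can be fitted, and they fit perfectly. A base R sketch of the same situation on simulated data, with four groups and four regressors so that one slope has to be dropped; all names and values are assumptions.

```r
set.seed(3)

# Simulated panel: 4 groups, 10 periods, 4 regressors.
panel <- data.frame(
  country = rep(c("A", "B", "C", "D"), each = 10),
  x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40), x4 = rnorm(40)
)
panel$y <- 1 + panel$x1 + rnorm(40)

# Between estimator by hand: collapse to one row of means per group, then OLS.
means <- aggregate(cbind(y, x1, x2, x3, x4) ~ country, data = panel, FUN = mean)
between_fit <- lm(y ~ x1 + x2 + x3 + x4, data = means)

print(coef(between_fit))          # one slope is NA: 4 rows cannot identify 5 parameters
print(df.residual(between_fit))   # no residual degrees of freedom left
```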

First Differences Estimator

firstdiff <- plm(formula =  pdata$`GDP` ~ pdata$`Population` + pdata$`Enrolment in Upper Secondary Education` + pdata$`Primary Education` + pdata$`Women in Parlement` + pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`, data = pdata, model = "fd")

summary(firstdiff)

## Oneway (individual) effect First-Difference Model
## 
## Call:
## plm(formula = pdata$GDP ~ pdata$Population + pdata$`Enrolment in Upper Secondary Education` + 
##     pdata$`Primary Education` + pdata$`Women in Parlement` + 
##     pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`, 
##     data = pdata, model = "fd")
## 
## Unbalanced Panel: n = 4, T = 14-16, N = 60
## Observations used in estimation: 56
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -2412.944  -394.884    89.717   515.660  1976.955 
## 
## Coefficients:
##                                                   Estimate  Std. Error t-value
## (Intercept)                                     6.1423e+02  2.5582e+02  2.4010
## pdata$Population                                1.4458e+02  7.1774e+03  0.0201
## pdata$`Enrolment in Upper Secondary Education` -8.4250e-04  3.7398e-03 -0.2253
## pdata$`Primary Education`                      -4.4979e-03  5.8751e-03 -0.7656
## pdata$`Women in Parlement`                     -8.4182e+01  4.1198e+01 -2.0434
## pdata$`Labor Force Female`                      1.1814e+02  3.3348e+02  0.3543
## pdata$`Adolescent Fertility Rate`               1.9506e+02  1.3104e+02  1.4886
##                                                Pr(>|t|)  
## (Intercept)                                     0.02019 *
## pdata$Population                                0.98401  
## pdata$`Enrolment in Upper Secondary Education`  0.82270  
## pdata$`Primary Education`                       0.44760  
## pdata$`Women in Parlement`                      0.04641 *
## pdata$`Labor Force Female`                      0.72465  
## pdata$`Adolescent Fertility Rate`               0.14301  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    39205000
## Residual Sum of Squares: 34105000
## R-Squared:      0.13007
## Adj. R-Squared: 0.023547
## F-statistic: 1.22105 on 6 and 49 DF, p-value: 0.31174
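
The first-difference estimator subtracts each observation's previous value within the same country, which removes anything constant over time for that country and costs one observation per country (60 observations become 56 above). A base R sketch on simulated data with large country effects; the data, names, and true slope of 2 are assumptions.

```r
set.seed(11)

# Simulated panel: 4 countries, 8 years, with very different country effects.
countries <- rep(c("A", "B", "C", "D"), each = 8)
effect    <- rep(c(100, -50, 20, 300), each = 8)
x <- rnorm(32)
y <- effect + 2 * x + rnorm(32, sd = 0.1)

# Difference within each country; the constant country effect cancels out.
dy <- unlist(tapply(y, countries, diff))
dx <- unlist(tapply(x, countries, diff))

fd_fit <- lm(dy ~ dx)
print(length(dy))     # 32 - 4 = 28 differenced observations
print(coef(fd_fit))   # slope should be near the true value of 2
```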

Fixed Effects

FixEff = plm(formula =  pdata$`GDP` ~ pdata$`Population` + pdata$`Enrolment in Upper Secondary Education` + pdata$`Primary Education` + pdata$`Women in Parlement` + pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`, data = pdata, model = "within")

summary(FixEff)

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = pdata$GDP ~ pdata$Population + pdata$`Enrolment in Upper Secondary Education` + 
##     pdata$`Primary Education` + pdata$`Women in Parlement` + 
##     pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`, 
##     data = pdata, model = "within")
## 
## Unbalanced Panel: n = 4, T = 14-16, N = 60
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -2056.765  -443.606    66.846   456.732  1472.914 
## 
## Coefficients:
##                                                   Estimate  Std. Error t-value
## pdata$Population                               -1.9445e+02  3.2505e+03 -0.0598
## pdata$`Enrolment in Upper Secondary Education` -2.5537e-03  1.6109e-03 -1.5853
## pdata$`Primary Education`                      -4.0641e-03  5.4584e-03 -0.7446
## pdata$`Women in Parlement`                     -1.0397e+02  4.2019e+01 -2.4744
## pdata$`Labor Force Female`                      1.4539e+03  1.1178e+02 13.0073
## pdata$`Adolescent Fertility Rate`              -1.1999e+02  4.7259e+01 -2.5390
##                                                Pr(>|t|)    
## pdata$Population                                0.95253    
## pdata$`Enrolment in Upper Secondary Education`  0.11921    
## pdata$`Primary Education`                       0.46003    
## pdata$`Women in Parlement`                      0.01678 *  
## pdata$`Labor Force Female`                      < 2e-16 ***
## pdata$`Adolescent Fertility Rate`               0.01428 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    360110000
## Residual Sum of Squares: 40350000
## R-Squared:      0.88795
## Adj. R-Squared: 0.86778
## F-statistic: 66.0388 on 6 and 50 DF, p-value: < 2.22e-16
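
The within model has no intercept in the output above because it works on deviations from each country's own mean. The slope it gives is the same as the one from OLS with a dummy variable per country. A base R sketch of that equivalence on simulated data; the names and values are assumptions.

```r
set.seed(5)

# Simulated panel: 4 countries, 15 years, country effects correlated with x.
country <- rep(c("A", "B", "C", "D"), each = 15)
x <- rnorm(60) + rep(c(0, 1, 2, 3), each = 15)
y <- rep(c(10, 20, 30, 40), each = 15) + 1.5 * x + rnorm(60)

# Within transformation: subtract each country's mean from its observations.
x_within <- x - ave(x, country)
y_within <- y - ave(y, country)
within_fit <- lm(y_within ~ x_within - 1)

# Least squares dummy variables: same slope, country effects as dummies.
lsdv_fit <- lm(y ~ x + factor(country))

print(c(within = coef(within_fit)[["x_within"]],
        lsdv   = coef(lsdv_fit)[["x"]]))
```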

Random Effect estimator

Lagrange Multiplier test between random and OLS

plmtest(pooldata)

## 
##  Lagrange Multiplier Test - (Honda) for unbalanced panels
## 
## data:  pdata$GDP ~ pdata$Population + pdata$`Enrolment in Upper Secondary Education` +  ...
## normal = -0.56452, p-value = 0.7138
## alternative hypothesis: significant effects

F test between fixed effects and OLS

pFtest(FixEff, pooldata)

## 
##  F test for individual effects
## 
## data:  pdata$GDP ~ pdata$Population + pdata$`Enrolment in Upper Secondary Education` +  ...
## F = 38.709, df1 = 3, df2 = 50, p-value = 4.434e-13
## alternative hypothesis: significant effects
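
The F statistic above can be reproduced from the residual sums of squares printed in the pooling and within summaries. The numbers below are copied from those outputs; the small gap to 38.709 comes from the rounding of the printed sums of squares.

```r
rss_pooled <- 134070000   # Residual Sum of Squares, pooling model
rss_within <- 40350000    # Residual Sum of Squares, within model
df1 <- 3                  # restrictions: n - 1 = 4 - 1 country effects
df2 <- 50                 # N - n - K = 60 - 4 - 6

# F = [(RSS_pooled - RSS_within) / df1] / [RSS_within / df2]
f_stat  <- ((rss_pooled - rss_within) / df1) / (rss_within / df2)
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

print(f_stat)
print(p_value)
```

The very small p-value rejects the null of no country effects, so the fixed effects model is preferred to pooled OLS here. The Honda test earlier in this section, with p-value 0.7138, does not reject its null, so it gives no evidence of random effects over pooled OLS.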

Hausman Test for fixed and random effects
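
A minimal sketch of how this test is usually run with plm. It uses the Grunfeld data set that ships with plm rather than the World Bank data, so the formula and index columns (inv, value, capital, firm, year) are assumptions that belong to that example only.

```r
library(plm)

# Example data shipped with plm: investment data for 10 firms over 20 years.
data("Grunfeld", package = "plm")

# Same formula, estimated once as fixed effects and once as random effects.
fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")
re <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "random")

# Hausman test: the null is that both estimators are consistent,
# in which case random effects is preferred for efficiency.
hausman <- phtest(fe, re)
print(hausman)
```

A small p-value would point to fixed effects; a large one would leave random effects as an acceptable choice.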

Visualising the Data

Mapping the Data

https://cran.r-project.org/web/packages/ExPanDaR/vignettes/use_ExPanD.html (this is an add-in)