Main
GDP per capita that might be contributed by women.
Tasks:
- Find other research. Search terms: research on women’s contribution to GDP (there is already research on this), policies that might increase GDP per woman, policies for women’s advancement in society. The idea is to see the effects of government policies for women on GDP per capita.
- Get the variables correctly formatted and fix the NAs.
- Get a list of women’s laws and see changes through time. https://wbl.worldbank.org/. https://wbl.worldbank.org/en/aboutus.
- Create a website/dashboard.
- Do inferential statistics.
- Talk more deeply about the x variables.
- Learn when to use the pooled model and when to use the fixed and random effects models.
- Start adding citations, for example:
- Why you decided to use panel data. Write about the Pacific Alliance (Alianza del Pacífico): when it started, etc. State your \(H_0\) and \(H_1\). Find out what log does to the data.
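On the question of what log does to the data, here is a minimal sketch. The series `gdp` is made up for illustration (it is not World Bank data) and grows at a constant 5% a year. Changes in levels keep getting larger, while changes in logs stay constant and approximate the growth rate.

```r
# Hypothetical GDP per capita series growing 5% a year (illustrative numbers only).
gdp <- 5000 * 1.05^(0:5)

# Differences in levels get larger as the series grows...
level_change <- diff(gdp)

# ...while differences in logs stay constant and approximate the growth rate.
log_change <- diff(log(gdp))

# Exact growth rate for comparison.
growth <- gdp[-1] / gdp[-length(gdp)] - 1

print(round(cbind(level_change, log_change, growth), 4))
```

In a regression, this is why a coefficient on a logged dependent variable is usually read as an approximate percentage change rather than a change in dollars.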
Links:
Proposition
The reasons I am choosing Colombia, Chile, Mexico, and Peru:
* These countries are high on the list of countries with high inequality for women. (Get citations and data.)
* These countries are also working on fixing the problem and making life more livable for women.
* These countries are also helping each other on the same topic. They have signed a multilateral agreement on economic and social cooperation, called the Pacific Alliance. (Get more information.)
These are the first steps of many countries becoming one, through trade blocs, unions, economic alliances, etc. For example, the European Union.
The data is going to come from the world bank:
- “NY.GDP.PCAP.CD”, # GDP per capita (current US$).
- “SP.POP.TOTL.FE.ZS”, # Population, female (% of total population).
- “UIS.NE.1.G1.F”, # New entrants to Grade 1 of primary education, female (number).
- “UIS.E.3.F”, # Enrolment in upper secondary education, female (number).
- “SG.GEN.PARL.ZS”, # Proportion of seats held by women in national parliament.
- “SL.TLF.TOTL.FE.ZS”, # Labor force, female (% of total labor force).
- “SP.ADO.TFRT” # Adolescent fertility rate (births per 1,000 women ages 15-19).
The variable we are trying to explain is overall GDP per capita.
There are not that many variables for women in the World Bank data for these four countries. Why is there not more data?
Problems with the model:
- Some of the variables would affect productivity at a future date, not at the date the data was taken. How can they have an effect on current GDP?
- Some countries are bigger than others, and thus show bigger changes.
- Multicollinearity. In the regression, GDP per capita will be related to population growth.
- First-grade entrants would be correlated with population growth.
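The multicollinearity concern in the list above can be checked with variance inflation factors (VIF). The sketch below uses simulated data only: `pop_growth`, `first_grade`, and `labor_force` are hypothetical stand-ins, with `first_grade` deliberately built to track `pop_growth`. It computes the VIF by hand in base R rather than relying on an extra package.

```r
set.seed(1)
n <- 60
pop_growth  <- rnorm(n)                               # hypothetical population growth
first_grade <- 0.9 * pop_growth + rnorm(n, sd = 0.3)  # built to track population growth
labor_force <- rnorm(n)                               # unrelated regressor

X <- data.frame(pop_growth, first_grade, labor_force)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing x_j on the other x's.
vif <- sapply(names(X), function(v) {
  f <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})

print(round(vif, 2))
```

A VIF near 1 means a regressor is nearly unrelated to the others. Large values (a common rule of thumb is above 10) suggest the coefficients on those regressors will be imprecisely estimated.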
The female share of the population was used because it would indicate the share of the labor force contributing to overall GDP per capita. GDP per capita is used because it shows the overall well-being of the people in these four countries. The x variables try to explain how much women contribute to GDP per capita. The variables chosen were not ideal, but they were the ones without missing data. The data runs from 2000 to 2016. The reason is that other variables had no data or had many missing years.
The female share of the total population is included because, one way or another, the women of a country are contributing to GDP per capita. So as the female population fluctuates, GDP per capita would fluctuate too. For example, if the female share of the population is lower, there might be a difference in GDP per capita.
Why use panel data:
Data From the World Bank
The package used to gather data is wbstats.
The code below gathers data for the following countries: Chile, Colombia, Mexico, and Peru.
- \(i's\) = Chile, Colombia, Mexico, Peru.
- \(t's\) = 2000 to 2016, annually. \(N\) = 14.
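- \(x's\) = The indicators for women listed above (female population share, primary entrants, upper secondary enrolment, seats in parliament, female labor force share, adolescent fertility).
- \(y\) = GDP per capita (or maybe an index of GDP per capita, vacation, etc.)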
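With this notation, the panel models estimated further down can be written in one general form. This is the standard textbook specification, not something taken from these notes. The symbols \(\alpha\), \(\beta\), \(u_i\), and \(\varepsilon_{it}\) are introduced here for illustration.

\[
y_{it} = \alpha + x_{it}'\beta + u_i + \varepsilon_{it}, \qquad i \in \{\text{Chile, Colombia, Mexico, Peru}\}
\]

Here \(u_i\) is a country-specific effect that does not change over time and \(\varepsilon_{it}\) is the remaining error. The pooling model ignores \(u_i\), the fixed effects (within) model treats each \(u_i\) as a separate constant, and the random effects model treats \(u_i\) as random and uncorrelated with the \(x's\).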
library(wbstats) # provides wb(); newer wbstats releases prefer wb_data()
df <- wb(country = c("CHL", "COL", "MEX", "PER"),
         indicator = c("NY.GDP.PCAP.CD", # GDP per capita (current US$). growth rate. at 2000
                       "SP.POP.TOTL.FE.ZS", # Population, female (% of total population)
                       "UIS.NE.1.G1.F", # New entrants to Grade 1 of primary education, female (number)
                       "UIS.E.3.F", # Enrolment in upper secondary education, female (number)
                       "SG.GEN.PARL.ZS", # Proportion of seats held by women in national parliament
                       "SL.TLF.TOTL.FE.ZS", # Labor force, female (% of total labor force)
                       "SP.ADO.TFRT" # Adolescent fertility rate (births per 1,000 women ages 15-19)
         ),
         startdate = 2000, enddate = 2015,
         return_wide = TRUE) # Returns the x variables as columns.
Checking for outliers. Second method: standardized residuals. Flagging residuals larger than 2 or smaller than -2 is conservative and will mark a lot of observations as outliers. Third method: leverage statistics. If the number is very large for an observation, that observation has an x variable that is far away from the average.
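The two checks described above can be sketched in base R. The data here are simulated for illustration: one observation is given an x value far from the average and another a large residual. `rstandard()` and `hatvalues()` are the base R functions for standardized residuals and for how far an observation's x is from the average (its leverage). The cutoffs are common rules of thumb, not values taken from these notes.

```r
set.seed(2)
x <- c(rnorm(29), 8)          # the last point is far from the average of x
y <- 1 + 2 * x + rnorm(30)
y[5] <- y[5] + 10             # shift one response to create a large residual

fit <- lm(y ~ x)

std_res <- rstandard(fit)     # standardized residuals
lev <- hatvalues(fit)         # leverage (diagonal of the hat matrix)

which(abs(std_res) > 2)       # the conservative rule
which(abs(std_res) > 3)       # a stricter rule
which(lev > 2 * mean(lev))    # rule of thumb: leverage above twice the average
```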
Clean the Data
library(dplyr) # provides %>% and rename()
df <- df[, -c(1, 3)] %>% # drop columns 1 and 3 (presumably the ISO code columns of the wide wb() output)
  rename("Date" = "date", "Country" = "country", "GDP" = "NY.GDP.PCAP.CD",
         "Population" = "SP.POP.TOTL.FE.ZS", "Primary Education" = "UIS.NE.1.G1.F",
         "Enrolment in Upper Secondary Education" = "UIS.E.3.F",
         "Women in Parlement" = "SG.GEN.PARL.ZS", "Labor Force Female" = "SL.TLF.TOTL.FE.ZS",
         "Adolescent Fertility Rate" = "SP.ADO.TFRT")
#new_wb_cache <- wbcache()
Analysis
Descriptive Statistics
sum() Returns the sum of all the values present in its arguments.
mean() Generates the arithmetic mean.
geoMean() Finds the geometric mean (not in base R; available, for example, in the EnvStats package).
var() Finds the variance.
cor() Finds the correlation.
cov() Finds the covariance.
sd() Finds the standard deviation.
range() Returns the minimum and maximum of a data set.
min() Finds the minimum of a data set.
max() Finds the maximum of a data set.
median() Finds the median.
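A short sketch of these functions on two small made-up vectors (`gdp` and `labor` are illustrative values, not the downloaded data). The geometric mean is computed with base R so no extra package is assumed.

```r
# Hypothetical values (illustrative, not World Bank data).
gdp   <- c(5100, 6200, 7400, 9800, 13500)   # GDP per capita
labor <- c(36.1, 37.0, 38.2, 39.5, 40.3)    # female labor force share (%)

sum(gdp); mean(gdp); median(gdp)
var(gdp); sd(gdp)               # sd() is the square root of var()
range(gdp); min(gdp); max(gdp)
cor(gdp, labor); cov(gdp, labor)

# Geometric mean in base R: exponentiate the mean of the logs.
geo_mean <- exp(mean(log(gdp)))
geo_mean
```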
Set data as panel data
library(plm) # provides pdata.frame() and plm()
pdata = pdata.frame(df, index = c("Country", "Date")) # Splits the data into the cross-section "id" and time dimensions.
# Before converting to panel data, the data frame needs one column identifying the ids (Country) and one identifying the ts (Date).
write.csv(pdata, file = "pdata.csv")
Pooled OLS estimation
pooldata <- plm(formula = pdata$`GDP` ~ pdata$`Population` + pdata$`Enrolment in Upper Secondary Education` + pdata$`Primary Education` + pdata$`Women in Parlement` + pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`, data = pdata, model = "pooling")
summary(pooldata)
## Pooling Model
##
## Call:
## plm(formula = pdata$GDP ~ pdata$Population + pdata$`Enrolment in Upper Secondary Education` +
## pdata$`Primary Education` + pdata$`Women in Parlement` +
## pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`,
## data = pdata, model = "pooling")
##
## Unbalanced Panel: n = 4, T = 14-16, N = 60
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -3622.686 -569.746 -90.958 638.828 4190.245
##
## Coefficients:
## Estimate Std. Error t-value
## (Intercept) -4.3360e+05 7.8984e+04 -5.4897
## pdata$Population 8.8603e+03 1.5700e+03 5.6433
## pdata$`Enrolment in Upper Secondary Education` -5.2157e-03 2.7836e-03 -1.8737
## pdata$`Primary Education` 1.1122e-02 3.6443e-03 3.0519
## pdata$`Women in Parlement` -1.3491e+00 7.1619e+01 -0.0188
## pdata$`Labor Force Female` 4.4939e+02 9.9580e+01 4.5129
## pdata$`Adolescent Fertility Rate` -4.1382e+02 4.9705e+01 -8.3256
## Pr(>|t|)
## (Intercept) 1.159e-06 ***
## pdata$Population 6.648e-07 ***
## pdata$`Enrolment in Upper Secondary Education` 0.06649 .
## pdata$`Primary Education` 0.00355 **
## pdata$`Women in Parlement` 0.98504
## pdata$`Labor Force Female` 3.590e-05 ***
## pdata$`Adolescent Fertility Rate` 3.391e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 741830000
## Residual Sum of Squares: 134070000
## R-Squared: 0.81928
## Adj. R-Squared: 0.79882
## F-statistic: 40.0445 on 6 and 53 DF, p-value: < 2.22e-16
# Does not take into account heteroskedasticity.
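One common way to address the heteroskedasticity noted above is to keep the OLS coefficients and replace the standard errors with heteroskedasticity-robust ones. The sketch below computes White (HC0) standard errors by hand on simulated data. It shows the idea only and is not the procedure used in these notes. All variable names are hypothetical.

```r
set.seed(3)
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = x)    # error spread grows with x (heteroskedastic)

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- residuals(fit)

# White (HC0) sandwich estimator: (X'X)^-1 X' diag(e^2) X (X'X)^-1
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)              # equals X' diag(e^2) X
robust_se <- sqrt(diag(bread %*% meat %*% bread))

cbind(classical = sqrt(diag(vcov(fit))), robust = robust_se)
```

For the plm objects in these notes, the packaged route would be something like `lmtest::coeftest(pooldata, vcov = plm::vcovHC(pooldata))`, which is not run here.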
Between Estimator
between <- plm(formula = pdata$`GDP` ~ pdata$`Population` + pdata$`Enrolment in Upper Secondary Education` + pdata$`Primary Education` + pdata$`Women in Parlement` + pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`, data = pdata, model = "between")
summary(between)
## Oneway (individual) effect Between Model
##
## Call:
## plm(formula = pdata$GDP ~ pdata$Population + pdata$`Enrolment in Upper Secondary Education` +
## pdata$`Primary Education` + pdata$`Women in Parlement` +
## pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`,
## data = pdata, model = "between")
##
## Unbalanced Panel: n = 4, T = 14-16, N = 60
## Observations used in estimation: 4
##
## Residuals:
## ALL 4 residuals are 0: no residual degrees of freedom!
##
## Coefficients: (3 dropped because of singularities)
## Estimate Std. Error t-value
## (Intercept) -1.2666e+05 NA NA
## pdata$Population 2.6139e+03 NA NA
## pdata$`Enrolment in Upper Secondary Education` 1.3659e-02 NA NA
## pdata$`Primary Education` -2.0375e-02 NA NA
## Pr(>|t|)
## (Intercept) NA
## pdata$Population NA
## pdata$`Enrolment in Upper Secondary Education` NA
## pdata$`Primary Education` NA
##
## Total Sum of Squares: 25510000
## Residual Sum of Squares: 0
## R-Squared: 1
## Adj. R-Squared: NaN
## F-statistic: NaN on 3 and 0 DF, p-value: NA
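The dropped coefficients and missing standard errors in the output above follow from how the between estimator works: it averages each country over time and then runs OLS on those averages. A minimal sketch with simulated data (the data frame `d` and its columns are made up for illustration):

```r
set.seed(4)
# Hypothetical panel: 4 countries, 15 years, 2 regressors (illustrative only).
d <- data.frame(country = rep(c("A", "B", "C", "D"), each = 15))
d$x1 <- rnorm(60)
d$x2 <- rnorm(60)
d$y  <- 1 + 2 * d$x1 - d$x2 + rnorm(60)

# The between estimator is OLS on the per-country averages.
means <- aggregate(cbind(y, x1, x2) ~ country, data = d, FUN = mean)
between_fit <- lm(y ~ x1 + x2, data = means)

nrow(means)                 # only 4 observations go into the regression
df.residual(between_fit)    # 4 observations minus 3 coefficients
```

With only four country averages, six regressors plus an intercept cannot all be estimated. That is consistent with the three dropped coefficients and the zero residual degrees of freedom reported above.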
First Differences Estimator
firstdiff <- plm(formula = pdata$`GDP` ~ pdata$`Population` + pdata$`Enrolment in Upper Secondary Education` + pdata$`Primary Education` + pdata$`Women in Parlement` + pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`, data = pdata, model = "fd")
summary(firstdiff)
## Oneway (individual) effect First-Difference Model
##
## Call:
## plm(formula = pdata$GDP ~ pdata$Population + pdata$`Enrolment in Upper Secondary Education` +
## pdata$`Primary Education` + pdata$`Women in Parlement` +
## pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`,
## data = pdata, model = "fd")
##
## Unbalanced Panel: n = 4, T = 14-16, N = 60
## Observations used in estimation: 56
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2412.944 -394.884 89.717 515.660 1976.955
##
## Coefficients:
## Estimate Std. Error t-value
## (Intercept) 6.1423e+02 2.5582e+02 2.4010
## pdata$Population 1.4458e+02 7.1774e+03 0.0201
## pdata$`Enrolment in Upper Secondary Education` -8.4250e-04 3.7398e-03 -0.2253
## pdata$`Primary Education` -4.4979e-03 5.8751e-03 -0.7656
## pdata$`Women in Parlement` -8.4182e+01 4.1198e+01 -2.0434
## pdata$`Labor Force Female` 1.1814e+02 3.3348e+02 0.3543
## pdata$`Adolescent Fertility Rate` 1.9506e+02 1.3104e+02 1.4886
## Pr(>|t|)
## (Intercept) 0.02019 *
## pdata$Population 0.98401
## pdata$`Enrolment in Upper Secondary Education` 0.82270
## pdata$`Primary Education` 0.44760
## pdata$`Women in Parlement` 0.04641 *
## pdata$`Labor Force Female` 0.72465
## pdata$`Adolescent Fertility Rate` 0.14301
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 39205000
## Residual Sum of Squares: 34105000
## R-Squared: 0.13007
## Adj. R-Squared: 0.023547
## F-statistic: 1.22105 on 6 and 49 DF, p-value: 0.31174
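The first-difference estimator above can be reproduced by hand: difference y and x within each country, then run OLS on the differences. The sketch below uses a simulated balanced panel (4 hypothetical countries, 15 years) where the true slope is set to 2 and x is correlated with a fixed country effect.

```r
set.seed(5)
d <- data.frame(country = rep(c("A", "B", "C", "D"), each = 15),
                year = rep(2001:2015, times = 4))
effect <- rep(c(10, 20, 30, 40), each = 15)   # fixed country effects
d$x <- rnorm(60) + effect / 10                # x is correlated with the country effect
d$y <- effect + 2 * d$x + rnorm(60, sd = 0.5)

# Difference y and x within each country (rows are already sorted by country and year).
dy <- unlist(tapply(d$y, d$country, diff))
dx <- unlist(tapply(d$x, d$country, diff))

fd_fit <- lm(dy ~ dx)       # the country effect drops out of the differenced data
length(dy)                  # one observation is lost per country
coef(fd_fit)["dx"]
```

Each country loses its first year to differencing, which is why the output above uses 56 of the 60 observations.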
Fixed Effects
FixEff = plm(formula = pdata$`GDP` ~ pdata$`Population` + pdata$`Enrolment in Upper Secondary Education` + pdata$`Primary Education` + pdata$`Women in Parlement` + pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`, data = pdata, model = "within")
summary(FixEff)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = pdata$GDP ~ pdata$Population + pdata$`Enrolment in Upper Secondary Education` +
## pdata$`Primary Education` + pdata$`Women in Parlement` +
## pdata$`Labor Force Female` + pdata$`Adolescent Fertility Rate`,
## data = pdata, model = "within")
##
## Unbalanced Panel: n = 4, T = 14-16, N = 60
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2056.765 -443.606 66.846 456.732 1472.914
##
## Coefficients:
## Estimate Std. Error t-value
## pdata$Population -1.9445e+02 3.2505e+03 -0.0598
## pdata$`Enrolment in Upper Secondary Education` -2.5537e-03 1.6109e-03 -1.5853
## pdata$`Primary Education` -4.0641e-03 5.4584e-03 -0.7446
## pdata$`Women in Parlement` -1.0397e+02 4.2019e+01 -2.4744
## pdata$`Labor Force Female` 1.4539e+03 1.1178e+02 13.0073
## pdata$`Adolescent Fertility Rate` -1.1999e+02 4.7259e+01 -2.5390
## Pr(>|t|)
## pdata$Population 0.95253
## pdata$`Enrolment in Upper Secondary Education` 0.11921
## pdata$`Primary Education` 0.46003
## pdata$`Women in Parlement` 0.01678 *
## pdata$`Labor Force Female` < 2e-16 ***
## pdata$`Adolescent Fertility Rate` 0.01428 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 360110000
## Residual Sum of Squares: 40350000
## R-Squared: 0.88795
## Adj. R-Squared: 0.86778
## F-statistic: 66.0388 on 6 and 50 DF, p-value: < 2.22e-16
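The within model above subtracts each country's own average from every variable before running OLS, which removes anything constant within a country. The sketch below, on simulated data with hypothetical names, shows that this gives the same slope as OLS with one dummy variable per country.

```r
set.seed(6)
d <- data.frame(country = rep(c("A", "B", "C", "D"), each = 15))
effect <- rep(c(10, 20, 30, 40), each = 15)   # fixed country effects
d$x <- rnorm(60) + effect / 10
d$y <- effect + 2 * d$x + rnorm(60, sd = 0.5)

# Within transformation: subtract each country's mean from y and x.
d$y_dm <- d$y - ave(d$y, d$country)
d$x_dm <- d$x - ave(d$x, d$country)
within_fit <- lm(y_dm ~ x_dm - 1, data = d)

# Same slope from OLS with one dummy per country (least squares dummy variables).
lsdv_fit <- lm(y ~ x + factor(country), data = d)

c(within = unname(coef(within_fit)["x_dm"]),
  lsdv   = unname(coef(lsdv_fit)["x"]))
```

This is also why the within output above reports no intercept: the country averages absorb it.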
Random Effects Estimator
Lagrange Multiplier test between random effects and OLS
plmtest(pooldata)
##
## Lagrange Multiplier Test - (Honda) for unbalanced panels
##
## data: pdata$GDP ~ pdata$Population + pdata$`Enrolment in Upper Secondary Education` + ...
## normal = -0.56452, p-value = 0.7138
## alternative hypothesis: significant effects
F test between fixed effects and OLS
pFtest(FixEff, pooldata)
##
## F test for individual effects
##
## data: pdata$GDP ~ pdata$Population + pdata$`Enrolment in Upper Secondary Education` + ...
## F = 38.709, df1 = 3, df2 = 50, p-value = 4.434e-13
## alternative hypothesis: significant effects
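As a check, the F statistic above can be reproduced by hand from the residual sums of squares printed in the pooling and within summaries. This sketch uses the rounded figures from those summaries, so it matches the reported value only approximately.

```r
# Values copied from the pooling and within summaries above.
rss_pool <- 134070000   # Residual Sum of Squares, pooling model
rss_fe   <- 40350000    # Residual Sum of Squares, within model
df1 <- 3                # number of countries minus 1
df2 <- 50               # residual degrees of freedom of the within model

# F = ((RSS_pooled - RSS_within) / df1) / (RSS_within / df2)
f_stat  <- ((rss_pool - rss_fe) / df1) / (rss_fe / df2)
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

f_stat    # close to the reported 38.709; the printed sums of squares are rounded
p_value
```

The test asks whether the country effects are jointly zero. A small p-value favors the fixed effects model over the pooling model.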
Hausman Test for fixed and random effects
Visualising the Data