pacman::p_load(fixest,ivreg,doBy,htmltools,shiny,DT, coefplot)
also installing the dependencies ‘Deriv’, ‘microbenchmark’
trying URL 'https://cran.rstudio.com/src/contrib/Deriv_4.1.3.tar.gz'
Content type 'application/x-gzip' length 37214 bytes (36 KB)
==================================================
downloaded 36 KB
trying URL 'https://cran.rstudio.com/src/contrib/microbenchmark_1.4.9.tar.gz'
Content type 'application/x-gzip' length 59607 bytes (58 KB)
==================================================
downloaded 58 KB
trying URL 'https://cran.rstudio.com/src/contrib/doBy_4.6.14.tar.gz'
Content type 'application/x-gzip' length 4486402 bytes (4.3 MB)
==================================================
downloaded 4.3 MB
* installing *source* package ‘Deriv’ ...
** package ‘Deriv’ successfully unpacked and MD5 sums checked
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (Deriv)
* installing *source* package ‘microbenchmark’ ...
** package ‘microbenchmark’ successfully unpacked and MD5 sums checked
** using staged installation
checking for gcc... x86_64-conda-linux-gnu-cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether x86_64-conda-linux-gnu-cc accepts -g... yes
checking for x86_64-conda-linux-gnu-cc option to accept ISO C89... none needed
checking how to run the C preprocessor... x86_64-conda-linux-gnu-cc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for stdint.h... (cached) yes
checking for stdlib.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking for uint64_t... yes
checking for error_at_line... yes
checking for mach_absolute_time... no
checking for library containing clock_gettime... -lrt
checking for best clockid_t to use with clock_gettime... CLOCK_MONOTONIC_RAW
configure: creating ./config.status
config.status: creating src/Makevars
config.status: creating src/config.h
** libs
x86_64-conda-linux-gnu-cc -I"/opt/conda/lib/R/include" -DNDEBUG -D_POSIX_C_SOURCE=200112L -DHAVE_CONFIG_H -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /opt/conda/include -I/opt/conda/include -Wl,-rpath-link,/opt/conda/lib -fpic -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /opt/conda/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/r-base-split_1630154298835/work=/usr/local/src/conda/r-base-4.1.1 -fdebug-prefix-map=/opt/conda=/usr/local/src/conda-prefix -c do_nothing.c -o do_nothing.o
x86_64-conda-linux-gnu-cc -I"/opt/conda/lib/R/include" -DNDEBUG -D_POSIX_C_SOURCE=200112L -DHAVE_CONFIG_H -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /opt/conda/include -I/opt/conda/include -Wl,-rpath-link,/opt/conda/lib -fpic -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /opt/conda/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/r-base-split_1630154298835/work=/usr/local/src/conda/r-base-4.1.1 -fdebug-prefix-map=/opt/conda=/usr/local/src/conda-prefix -c init.c -o init.o
x86_64-conda-linux-gnu-cc -I"/opt/conda/lib/R/include" -DNDEBUG -D_POSIX_C_SOURCE=200112L -DHAVE_CONFIG_H -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /opt/conda/include -I/opt/conda/include -Wl,-rpath-link,/opt/conda/lib -fpic -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /opt/conda/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/r-base-split_1630154298835/work=/usr/local/src/conda/r-base-4.1.1 -fdebug-prefix-map=/opt/conda=/usr/local/src/conda-prefix -c nanotimer.c -o nanotimer.o
x86_64-conda-linux-gnu-cc -shared -L/opt/conda/lib/R/lib -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -o microbenchmark.so do_nothing.o init.o nanotimer.o -lrt -L/opt/conda/lib/R/lib -lR
installing to /opt/conda/lib/R/library/00LOCK-microbenchmark/00new/microbenchmark/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (microbenchmark)
* installing *source* package ‘doBy’ ...
** package ‘doBy’ successfully unpacked and MD5 sums checked
** using staged installation
** R
** data
*** moving datasets to lazyload DB
** inst
** byte-compile and prepare package for lazy loading
Note: ... may be used in an incorrect context
Note: ... may be used in an incorrect context
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (doBy)
The downloaded source packages are in
‘/tmp/RtmpbUO7kP/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
doBy installed
also installing the dependencies ‘useful’, ‘dygraphs’
trying URL 'https://cran.rstudio.com/src/contrib/useful_1.2.6.tar.gz'
Content type 'application/x-gzip' length 54950 bytes (53 KB)
==================================================
downloaded 53 KB
trying URL 'https://cran.rstudio.com/src/contrib/dygraphs_1.1.1.6.tar.gz'
Content type 'application/x-gzip' length 318939 bytes (311 KB)
==================================================
downloaded 311 KB
trying URL 'https://cran.rstudio.com/src/contrib/coefplot_1.2.8.tar.gz'
Content type 'application/x-gzip' length 35905 bytes (35 KB)
==================================================
downloaded 35 KB
* installing *source* package ‘useful’ ...
** package ‘useful’ successfully unpacked and MD5 sums checked
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (useful)
* installing *source* package ‘dygraphs’ ...
** package ‘dygraphs’ successfully unpacked and MD5 sums checked
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (dygraphs)
* installing *source* package ‘coefplot’ ...
** package ‘coefplot’ successfully unpacked and MD5 sums checked
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (coefplot)
The downloaded source packages are in
‘/tmp/RtmpbUO7kP/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
coefplot installed
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
library(ggplot2)
library(tidyr)
library(ivreg)
library(dplyr)
Attaching package: ‘dplyr’
The following object is masked from ‘package:doBy’:
order_by
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(fixest)
library(doBy)
library(stargazer)
Please cite as:
Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(ipumsr)
library(coefplot)
setwd("/home/jovyan/ECON-448_COURSE_MATERIALS")
## install.packages("ipumsr")
##This code will read in the ipums data using the data dictionary. When you look at your data, the variables will have labels (as will the values).
## You will also get a nice data dictionary in RStudio's viewer pane (bottom right)
ddi <- read_ipums_ddi("ipumsi_00009.xml") ##Use whatever title your file has when it downloads, but the name here should be correct for this exercise.
ipumsr::ipums_view(ddi)
df_ipums <- read_ipums_micro(ddi)
Use of data from IPUMS-International is subject to conditions including that
users should cite the data appropriately. Use command `ipums_conditions()` for
more details.
df_analysis <-as.data.frame(df_ipums)
head(df_analysis)
Note that the echo = FALSE parameter was added to the
code chunk to prevent printing of the R code that generated the
plot.
#Much of the work we do to analyze data is actually processing (or cleaning) raw data to get it ready to be analyzed. This often means creating new variables, re-scaling existing variables, and even changing the unit of observation in each row. Let’s start by just looking at the data and the range of values. These can be cross-referenced with the data dictionary.
stargazer(df_analysis, type = "text")
=====================================================================
Statistic N Mean St. Dev. Min Max
---------------------------------------------------------------------
COUNTRY 5,778 72.000 0.000 72 72
YEAR 5,778 2,011.000 0.000 2,011 2,011
SAMPLE 5,778 72,201,101.000 0.000 72,201,101 72,201,101
SERIAL 5,778 28,818,525.000 15,908,041.000 3,000 61,644,000
HHWT 5,778 61.989 0.816 0 62
OWNERSHIP 5,778 1.323 0.572 0 9
OWNERSHIPD 5,778 158.956 56.817 0 999
ELECTRIC 5,778 1.442 0.517 0 2
CELL 5,778 1.071 0.295 0 2
INTERNET 5,778 1.043 0.249 0 2
TV 5,778 15.559 5.174 0 20
RADIO 5,778 1.623 0.506 0 2
TOILET 5,778 19.759 4.558 0 23
FLOOR 5,778 191.630 41.364 0 236
PERNUM 5,778 1.975 1.546 1 19
PERWT 5,778 62.000 0.000 62 62
AGE 5,778 42.705 17.268 15 94
SEX 5,778 2.000 0.000 2 2
MARST 5,778 2.304 0.695 2 4
MARSTD 5,778 244.694 65.862 210 400
CHBORN 5,778 3.501 2.756 0 14
CHSURV 5,778 13.336 29.273 0 99
CHBORNF 5,778 1.742 1.654 0 9
CHBORNM 5,778 1.758 1.670 0 9
RELIGION 5,778 5.529 1.546 1 9
RELIGIOND 5,778 5,535.263 1,554.382 1,000 9,999
YRSCHOOL 5,778 10.251 15.616 0 98
EMPSTAT 5,778 1.919 0.955 1 9
EMPSTATD 5,778 202.388 97.422 110 999
LABFORCE 5,778 1.593 0.506 1 8
YRSCHOOL_SP 2,883 12.713 20.909 0 99
---------------------------------------------------------------------
#We see that a lot of people have values like 9 or 99 for these variables. Check the data dictionary to be sure, but these are usually codes that mean “don’t know.” We don’t want to include these values in our analysis, so we have to figure out how to treat them. The easiest thing to do is to eliminate the individuals that have these values from our analysis data. Another option would be to set those values to N/A, but we can just filter them out for future computational ease.
#In addition to 99, I see that some of the education variables are coded as 90, 91, etc. I will filter those out as well.
df_analysis<-filter(df_analysis, YRSCHOOL<90) #Several complicated codes over 90 here, so filtering these out
df_analysis<-filter(df_analysis, YRSCHOOL_SP<90)
df_analysis<-filter(df_analysis, CHBORN<31) #Values over 30 are don't knows
stargazer(df_analysis, type = "text")
=====================================================================
Statistic N Mean St. Dev. Min Max
---------------------------------------------------------------------
COUNTRY 2,646 72.000 0.000 72 72
YEAR 2,646 2,011.000 0.000 2,011 2,011
SAMPLE 2,646 72,201,101.000 0.000 72,201,101 72,201,101
SERIAL 2,646 26,935,105.000 16,266,070.000 3,000 61,172,000
HHWT 2,646 62.000 0.000 62 62
OWNERSHIP 2,646 1.394 0.596 0 9
OWNERSHIPD 2,646 165.420 59.086 0 999
ELECTRIC 2,646 1.451 0.508 0 2
CELL 2,646 1.068 0.273 0 2
INTERNET 2,646 1.065 0.267 0 2
TV 2,646 15.896 5.026 0 20
RADIO 2,646 1.665 0.483 0 2
TOILET 2,646 19.656 4.502 0 23
FLOOR 2,646 191.871 40.091 0 236
PERNUM 2,646 2.114 1.345 1 18
PERWT 2,646 62.000 0.000 62 62
AGE 2,646 39.301 13.684 15 94
SEX 2,646 2.000 0.000 2 2
MARST 2,646 2.003 0.070 2 4
MARSTD 2,646 215.367 8.399 210 400
CHBORN 2,646 3.329 2.616 0 14
CHSURV 2,646 13.271 29.264 0 99
CHBORNF 2,646 1.637 1.589 0 9
CHBORNM 2,646 1.689 1.619 0 9
RELIGION 2,646 5.548 1.537 1 9
RELIGIOND 2,646 5,556.017 1,548.194 1,000 9,999
YRSCHOOL 2,646 8.070 4.779 0 18
EMPSTAT 2,646 1.882 0.950 1 9
EMPSTATD 2,646 198.126 96.365 110 999
LABFORCE 2,646 1.614 0.503 1 8
YRSCHOOL_SP 2,646 7.547 5.319 0 18
---------------------------------------------------------------------
#3.1
stargazer(df_analysis[c("CHBORN", "AGE", "YRSCHOOL", "EMPLOYED")], type = "text", summary.stat = c("n","mean","sd", "min", "max"), covariate.labels = c("Children Ever Born", "Age", "Years of School", "Dummary variale for employed status"), digits = 2, title="Summary Statistics", notes="Note: Data are 1% sample of ever-married women 15+ from 2011 Botswana census. Source: IPUMS International")
Summary Statistics
==============================================================================================================
Statistic N Mean St. Dev. Min Max
--------------------------------------------------------------------------------------------------------------
Children Ever Born 2,646 3.33 2.62 0 14
Age 2,646 39.30 13.68 15 94
Years of School 2,646 8.07 4.78 0 18
Dummary variale for employed status 2,646 0.51 0.50 0 1
--------------------------------------------------------------------------------------------------------------
Note: Data are 1% sample of ever-married women 15+ from 2011 Botswana census. Source: IPUMS International
#3.2
#Create a new data frame that has mean years of schooling by age.
##Note: I am using pipes (the %>%) to get R to do what I want. summaryBY would be another command you could use without piping.
df_means_by_CHBORN<- df_analysis %>%
group_by(CHBORN) %>%
summarise(YRSCHOOL.mean=mean(YRSCHOOL)) %>%
as.data.frame()
#Alternative code w/out pipes:
#df_means_by_age <- #summaryBy(CHILDREN+CMR+LIT+EMPSTAT+YRSCHOOL_SP+EMPSTAT_SP ~ YRSCHOOL, #FUN=c(mean),na.rm=TRUE, data=df_analysis)
head(df_means_by_CHBORN)
#Another way to look at the data is by plotting it.
#Can you look at number of children by years of schooling instead?
ggplot(df_means_by_CHBORN, aes(x=CHBORN,y=YRSCHOOL.mean))+geom_point()+geom_smooth(method=lm, formula= y~poly(x,2))
#3.3
m_ols<-lm(CHBORN~YRSCHOOL, data=df_analysis)
stargazer(m_ols, type="text")
Warning: length of NULL cannot be changedWarning: length of NULL cannot be changedWarning: length of NULL cannot be changedWarning: length of NULL cannot be changedWarning: length of NULL cannot be changed
===============================================
Dependent variable:
---------------------------
CHBORN
-----------------------------------------------
YRSCHOOL -0.268***
(0.009)
Constant 5.490***
(0.087)
-----------------------------------------------
Observations 2,646
R2 0.239
Adjusted R2 0.239
Residual Std. Error 2.282 (df = 2644)
F Statistic 831.654*** (df = 1; 2644)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
#3.4 ## I will choose the dummy variable of employ which is euqal to 1 if employed, otherwise, it will be zero. Also, I want to choose age. I expect the coefficient of YEARSCHOOL is negative, the coefficient of dummy variable employstat is negative, the coefficient of age is positive.For the yearschool, it is shown in the previous regression model that the coefficient is negative. For the dummy variable employed, I expect it to be negative since if women are employed, they need to spend more time on work instead of spending time on borning and raising their children. As a result, it is unlikely for them to have more kids with employed status,according to the Time costs of children model. For the age variable, since older people trend to have more children than younger people due to the society. For those older people, they need to have more children to support their future life, which is explained by the Investment model of fertility. However, for younger women, their fertility model are more likely to be others such as the quality as a substitute for quantity model. If people have higher ages, they will have more children. As a result, the coefficient of age I expect is positive.
#3.5
##Recode to a dummy variable that is 1 if the person is currently employed, and 0 otherwise. Currently EMPSTAT=2 means unemployed, =3 means inactive, and =9 means unknown/missing.
df_analysis$EMPLOYED<-ifelse(df_analysis$EMPSTAT==1,1,0)
summary(df_analysis$EMPLOYED)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 1.0000 0.5091 1.0000 1.0000
#Now use feols to create a series of dummy variables for each age (each coefficient + the intercept will be the average education at that age).
f_ols<-feols(CHBORN~YRSCHOOL+EMPLOYED+AGE, data=df_analysis)
summary(f_ols)
OLS estimation, Dep. Var.: CHBORN
Observations: 2,646
Standard-errors: IID
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 2.01246 Adj. R2: 0.407329
##The estimated value of childborn = -0.143456yearschool + 0.089433age -0.096561 + 1.020911. The coefficients match my expectations. The coefficient on education(yearschool) change once I add the control variables and it becomes less negative.