Reproducible research in Stata

When students or professors try to teach stata or use stata to analyze the data, most time they write report and run stata code seperately. It’s even become a diaster if you have many graphics as you have to save them firstly and put them into the word documents unless you know how to do markdown in Stata, or write annoy code in Latex way.

Now, take a look what I have done by writing Stata code in Rstudio and convert Rmarkdown into htmal, word, pdf.

So cool, any format you want in Stata code, just do it!

Set the engine and working directory

setwd('Z:/L14009/Dataset_part_1')

First of all, open Rstuido, open a new R Markdwon and set the engine by the following code:

library(knitr)

statapath <- 'C:/Users/lexfw5/AppData/Local/Microsoft/AppV/Client/Integration/23C3E6EE-8FC3-40A0-B00E-E88CAE052CA2/Root/VFS/ProgramFilesX86/Stata15/StataSE-64.exe'

opts_chunk$set(engine="stata", engine.path=statapath, comment="")

Make sure you can find the stata engine in lab from University or from your onw computer.

2 Simple STATA Code example


 cd "Z:\L14009\Dataset_part_1"

sysuse auto.dta


describe

summarize price mpg 

graph box price, over(mpg)
graph export "boxplot.png", replace


. 
.  cd "Z:\L14009\Dataset_part_1"
Z:\L14009\Dataset_part_1

. 
. sysuse auto.dta
(1978 Automobile Data)

. 
. 
. describe

Contains data from C:\ProgramData\App-V\23C3E6EE-8FC3-40A0-B00E-E88CAE052CA2\A2
> B88160-AA03-4753-8241-EE4D929DFCC7\Root\VFS\PROGRA~2\Stata15\ado\base/a/auto.
> dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2016 17:45
 size:         3,182                          (_dta has notes)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn Circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
-------------------------------------------------------------------------------
Sorted by: foreign

. 
. summarize price mpg 

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41

. 
. graph box price, over(mpg)

. graph export "boxplot.png", replace
(file boxplot.png written in PNG format)

. 

example boxplot

Adding more pictures is not so difficult like that.

cd "Z:\L14009\Dataset_part_1"

use bwght.dta

tabulate male, summarize(bwght)

use mus02psid92m.dta 


kdensity lnearns, normal bwidth(0.2) n(4000)

graph export "dens_lnearns.png", replace


. cd "Z:\L14009\Dataset_part_1"
Z:\L14009\Dataset_part_1

. 
. use bwght.dta

. 
. tabulate male, summarize(bwght)

 =1 if male |  Summary of birth weight, grammes
      child |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |   3321.5649   576.27992         665
          1 |   3404.9776   575.19393         723
------------+------------------------------------
      Total |    3365.014   577.01456       1,388

. 
. use mus02psid92m.dta 

. 
. 
. kdensity lnearns, normal bwidth(0.2) n(4000)

. 
. graph export "dens_lnearns.png", replace
(file dens_lnearns.png written in PNG format)

. 

example Density curve

2.1 Let’s run a regresion

cd "Z:\L14009\Dataset_part_1"

use mus03data.dta 

global x suppins phylim actlim totchr age female income 

summarize $x

regress ltotexp $x, vce(robust)

rvfplot

graph export "ltot_residual.png", replace 

predict uhat, residuals 

qnorm uhat 

graph export "ltot_qq.png", replace 


. cd "Z:\L14009\Dataset_part_1"
Z:\L14009\Dataset_part_1

. 
. use mus03data.dta 

. 
. global x suppins phylim actlim totchr age female income 

. 
. summarize $x

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     suppins |      3,064    .5812663    .4934321          0          1
      phylim |      3,064    .4255875    .4945125          0          1
      actlim |      3,064    .2836162    .4508263          0          1
      totchr |      3,064    1.754243    1.307197          0          7
         age |      3,064    74.17167    6.372938         65         90
-------------+---------------------------------------------------------
      female |      3,064    .5796345    .4936982          0          1
      income |      3,064    22.47472    22.53491         -1     312.46

. 
. regress ltotexp $x, vce(robust)

Linear regression                               Number of obs     =      2,955
                                                F(7, 2947)        =     126.97
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2289
                                                Root MSE          =     1.2023

------------------------------------------------------------------------------
             |               Robust
     ltotexp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     suppins |   .2556428   .0465982     5.49   0.000     .1642744    .3470112
      phylim |   .3020598    .057705     5.23   0.000     .1889136     .415206
      actlim |   .3560054   .0634066     5.61   0.000     .2316797    .4803311
      totchr |   .3758201   .0187185    20.08   0.000     .3391175    .4125228
         age |   .0038016   .0037028     1.03   0.305    -.0034587     .011062
      female |  -.0843275    .045654    -1.85   0.065    -.1738444    .0051894
      income |   .0025498   .0010468     2.44   0.015     .0004973    .0046023
       _cons |   6.703737   .2825751    23.72   0.000     6.149673    7.257802
------------------------------------------------------------------------------

. 
. rvfplot

. 
. graph export "ltot_residual.png", replace 
(file ltot_residual.png written in PNG format)

. 
. predict uhat, residuals 
(109 missing values generated)

. 
. qnorm uhat 

. 
. graph export "ltot_qq.png", replace 
(file ltot_qq.png written in PNG format)

. 

example Residual scatter

example QQ_plot

See the outliers in qqplot, so obvious.

Conclusion

It is fun,right ?

I cannot wait for posting this in my facebok. hahah ^_^

Later I will post all the lab codes with analysis and graph in my facebook, please stay tunes.