Exporting Output of Statistical Analysis

This tutorial describes a general process for automating the export of statistical analysis results to a report. It works through one example in depth, which provides the background needed to introduce the basic concepts.

The example used is that of a regression output. Many reports produced by working analysts include the output of a regression as the central finding. As the report goes through revisions, the analyst may often have to revise the output as refinements to the database or analysis are made. If revisions are few, it may be more efficient to simply copy and paste the regression output into the document each time one is made. However, if there are numerous revisions, or if the report is long and complex with many regressions, it becomes worthwhile to automate this process. This is particularly true if the analyst prepares many papers, so that the time invested in learning these approaches has a chance to pay off.

The tutorial is divided into three stages. First, it shows how to divert the output of the statistical function to an external file. Then it discusses how to bring the output into a LaTeX environment. Finally, it discusses how to embellish the output.

Send the Output to an External File

The first section here sets up the regression, then demonstrates ways of controlling the output generated.

Set up regression

This example will use the same data as the tutorial for building tables. This will enable users to replicate the results and validate their understanding. The database used is available with the default R system to allow for easy replication. It is loaded with the data.frame command after the workspace has been cleared with the rm command.

#clear the workspace
rm(list=ls(all=TRUE))
#make dataframe out of preloaded dataset
workset <- data.frame(women)
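
Before fitting anything, it can help to confirm what was loaded. The following is a minimal sketch, assuming the built-in women dataset that ships with R; the inspection calls are standard base R functions, and this check is an addition rather than a step of the original workflow:

```r
# Rebuild the working data frame from the built-in dataset
workset <- data.frame(women)

# Inspect the structure: column names, types, and a preview of the values
str(workset)

# Dimensions (rows, columns) and column names
dim(workset)
names(workset)
```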

Run the regression interactively

It is not enough to run the regression. The regression must save its results to a variable, and then a function must be used to extract the desired output. The following examples demonstrate how best to do this.

Simply running the regression sends the minimal output to the screen:

lm(workset$weight ~ workset$height)
## 
## Call:
## lm(formula = workset$weight ~ workset$height)
## 
## Coefficients:
##    (Intercept)  workset$height  
##         -87.52            3.45

Using the assignment operator sends the output to a variable with no display:

fit_try1 <- lm(workset$weight ~ workset$height)

Just printing the object will retrieve the same output as was shown by the interactive display:

print(fit_try1)
## 
## Call:
## lm(formula = workset$weight ~ workset$height)
## 
## Coefficients:
##    (Intercept)  workset$height  
##         -87.52            3.45

But summary will retrieve detailed output suitable for academic publication:

summary(fit_try1)
## 
## Call:
## lm(formula = workset$weight ~ workset$height)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -87.51667    5.93694  -14.74 1.71e-09 ***
## workset$height   3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
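
The summary object is also a list, so individual pieces can be pulled out rather than printed as one block. The following is a minimal sketch using the standard accessor coef() and the r.squared element of the summary; the names coef_table, slope, and r2 are illustrative and not part of the original tutorial:

```r
workset <- data.frame(women)
fit_try1 <- lm(workset$weight ~ workset$height)

# Coefficient table as a numeric matrix with columns
# Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit_try1))
print(coef_table)

# Individual statistics can be picked out by row and column name
slope <- coef_table["workset$height", "Estimate"]
r2 <- summary(fit_try1)$r.squared
```

Extracting values this way is useful when only a few numbers, rather than the full printout, are needed in the report.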

The capture command is key

The capture.output command provides an elegant way of diverting the output of a single function, here summary, to an external file. Note that one would use the less elegant sink() function if the output of more than one command were to be captured.

capture.output(summary(fit_try1),file="fit_try1.txt",append=FALSE,split=FALSE)

Use the readLines function to validate that it happened.

readLines("fit_try1.txt")
##  [1] ""                                                              
##  [2] "Call:"                                                         
##  [3] "lm(formula = workset$weight ~ workset$height)"                 
##  [4] ""                                                              
##  [5] "Residuals:"                                                    
##  [6] "    Min      1Q  Median      3Q     Max "                      
##  [7] "-1.7333 -1.1333 -0.3833  0.7417  3.1167 "                      
##  [8] ""                                                              
##  [9] "Coefficients:"                                                 
## [10] "                Estimate Std. Error t value Pr(>|t|)    "      
## [11] "(Intercept)    -87.51667    5.93694  -14.74 1.71e-09 ***"      
## [12] "workset$height   3.45000    0.09114   37.85 1.09e-14 ***"      
## [13] "---"                                                           
## [14] "Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1"
## [15] ""                                                              
## [16] "Residual standard error: 1.525 on 13 degrees of freedom"       
## [17] "Multiple R-squared:  0.991,\tAdjusted R-squared:  0.9903 "     
## [18] "F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14"        
## [19] ""
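
When the output of several commands needs to go to the same file, base R's sink() function is one option. The following is a minimal sketch; the file name fit_session.txt is a hypothetical choice, and the explicit print() calls are there so the example behaves the same whether it is run interactively or from a script:

```r
workset <- data.frame(women)
fit_try1 <- lm(workset$weight ~ workset$height)

out_file <- "fit_session.txt"   # hypothetical file name

sink(out_file)                  # start diverting console output to the file
print(fit_try1)                 # short output
print(summary(fit_try1))        # detailed output
sink()                          # stop diverting; output returns to the console

# Confirm that both outputs landed in the file
session_lines <- readLines(out_file)
```

Unlike capture.output, sink() stays active until it is switched off, so the closing sink() call should not be forgotten.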

A second regression can be added

Now if the reviewer of the work wanted to see the squared value of height added to the specification, this could be easily accommodated:

workset$height2 <- workset$height * workset$height
fit_try2 <- lm(workset$weight ~ workset$height+workset$height2)
capture.output(summary(fit_try2),file="fit_try2.txt",append=FALSE,split=FALSE)

Now note the different results

readLines("fit_try2.txt")
##  [1] ""                                                               
##  [2] "Call:"                                                          
##  [3] "lm(formula = workset$weight ~ workset$height + workset$height2)"
##  [4] ""                                                               
##  [5] "Residuals:"                                                     
##  [6] "     Min       1Q   Median       3Q      Max "                  
##  [7] "-0.50941 -0.29611 -0.00941  0.28615  0.59706 "                  
##  [8] ""                                                               
##  [9] "Coefficients:"                                                  
## [10] "                 Estimate Std. Error t value Pr(>|t|)    "      
## [11] "(Intercept)     261.87818   25.19677  10.393 2.36e-07 ***"      
## [12] "workset$height   -7.34832    0.77769  -9.449 6.58e-07 ***"      
## [13] "workset$height2   0.08306    0.00598  13.891 9.32e-09 ***"      
## [14] "---"                                                            
## [15] "Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1" 
## [16] ""                                                               
## [17] "Residual standard error: 0.3841 on 12 degrees of freedom"       
## [18] "Multiple R-squared:  0.9995,\tAdjusted R-squared:  0.9994 "     
## [19] "F-statistic: 1.139e+04 on 2 and 12 DF,  p-value: < 2.2e-16"     
## [20] ""
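
Once there is more than one regression, the export step can be wrapped in a small function and applied to each model in turn. The following is a minimal sketch under that assumption; export_fit is a hypothetical helper written for this illustration, not a base R function, and the file names are taken from the names of the list:

```r
workset <- data.frame(women)
workset$height2 <- workset$height * workset$height

# Named list of fitted models; the names become the file names
fits <- list(
  fit_try1 = lm(workset$weight ~ workset$height),
  fit_try2 = lm(workset$weight ~ workset$height + workset$height2)
)

# Hypothetical helper: write the summary of one model to <name>.txt
# and return the path that was written
export_fit <- function(fit, name, dir = ".") {
  path <- file.path(dir, paste0(name, ".txt"))
  capture.output(summary(fit), file = path, append = FALSE)
  path
}

# Apply the helper to every model and its name
paths <- mapply(export_fit, fits, names(fits))
```

Adding a third specification then only requires one more entry in the list.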

Read the Regression Results into a Document

For this example the LaTeX document preparation system will be used. Not only is this a popular system in academic circles, but it is also a leading contender when developing automated systems in a business context. The system is based on text markup, so the following example is LaTeX code that imports the regressions saved in the two text files above.

The LaTeX Code

The following LaTeX code imports the two text files into a LaTeX document:

\documentclass{article}
\usepackage{listings}
\begin{document}
Notice that this is the first try
\lstset{framesep=8pt}
\lstinputlisting[language=R,caption = First Try, firstline=5,lastline=20]{fit_try1.txt}
\lstset{framesep=8pt}
Then there is the second try.  Note that the listing number is 
automatically updated but that the caption was manually updated:
\lstset{xleftmargin=-2.9cm,xrightmargin=-1.5cm,framesep=8pt}
\lstinputlisting[language=R,caption = Second Try, firstline=5,lastline=20]{fit_try2.txt}
\end{document}

The Output

This creates a PDF with regression results updated in an automated fashion. There are a number of things to notice. First, several options are set to adjust how the regression output is imported into the document and how it appears. Comparing the two listings gives an idea of their importance.

To begin with, the firstline parameter controls whether the first few lines of the output are shown. In addition, a negative left margin lets the listing extend into the whitespace of the page margin. While this was not necessary in this example, R output is at times wide enough to make this a useful option.

[Figure: LaTeX output in PDF format]
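
The value passed to firstline can also be found from R rather than by counting lines by hand. The following is a minimal sketch, assuming the captured file is re-created as in the earlier section; the variable names and the choice to start at the "Residuals:" header (line 5 in the readLines output shown earlier) are illustrative:

```r
workset <- data.frame(women)
fit_try1 <- lm(workset$weight ~ workset$height)
capture.output(summary(fit_try1), file = "fit_try1.txt")

fit_lines <- readLines("fit_try1.txt")

# Line number of the "Residuals:" header, which skips the Call block
first_line <- grep("^Residuals:", fit_lines)[1]
last_line <- length(fit_lines)

# Build the corresponding LaTeX command as a string
tex_cmd <- sprintf(
  "\\lstinputlisting[language=R,caption = First Try, firstline=%d,lastline=%d]{fit_try1.txt}",
  first_line, last_line
)
cat(tex_cmd, "\n")
```

The resulting string could be pasted into the LaTeX source or written to a small file that the document includes.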