DEM 7283 - Example 1 - Introduction to R and review of Stat 1

Welcome to R. R is pretty similar to Stata: both are interpreted languages rather than compiled ones. This means you type something into R and it does it. Unlike SAS, there is no data step and there are no procs. The SAS and R book is very useful for going between those two programs.

R uses libraries (packages) to do different types of analysis, so we will need to install lots of different libraries to do different things. These need to be downloaded from the internet using the install.packages() command. You only need to install a package once, but you need to load it with library() in each new R session. E.g.

install.packages("lme4") will install the lme4 library. To use the functions within it, type

library(lme4)

Now you have access to those functions.

Below we will go through a simple R session where we review many of the concepts from DEM 7273. We will load a dataset, print some cases, do some descriptive statistics and plots, t tests, and some linear models and diagnostics of those models.

#Load some libraries that I need; you will need to install these first in order to use them
library(lmtest)
library(car)
library(Hmisc)
library(sandwich)
library(multcomp)
library(knitr)
library(lattice)
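
If you are not sure which of these packages are already installed, a quick check like the following may help. This is only a sketch: the helper name `find_missing` is made up for illustration, and the `install.packages()` line is left commented out so nothing is downloaded unless you choose to run it.

```r
# Packages loaded in the library() calls above
pkgs <- c("lmtest", "car", "Hmisc", "sandwich", "multcomp", "knitr", "lattice")

# find_missing() is a hypothetical helper, not part of any package:
# it returns the names in 'pkgs' that cannot be found on this machine
find_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- find_missing(pkgs)
print(to_install)

# Uncomment to install whatever is missing (needs an internet connection)
# if (length(to_install) > 0) install.packages(to_install)
```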

Read in a comma separated values file. In this case, I am using the Population Reference Bureau’s Population Data sheet from 2008. Here, I call my R object dat, though I could call it pretty much anything I want.

R creates objects through the assignment operator <-

dat<-read.csv("/media/ozd504/extra/gdrive//classes/dem7283/class17/data/PRB2008_All.csv", header=T)
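
The file path above is specific to the author's machine, so here is a small self-contained sketch of the same two ideas: assignment with `<-` and reading comma separated values. The three rows are copied from the head(dat) output in this example; the object names `csv_text` and `mini` are made up for illustration.

```r
# A tiny CSV typed inline; read.csv() can read from text instead of a file
csv_text <- "Country,Continent,TFR
Afghanistan,Asia,6.8
Albania,Europe,1.6
Algeria,Africa,2.3"

# The assignment operator <- stores the result in an object called mini
mini <- read.csv(text = csv_text, header = TRUE)

names(mini)   # variable names
nrow(mini)    # number of cases
mini$TFR      # pull out one column with $
```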

#print all of the variable names in the data set
names(dat)

##  [1] "Y"                                 "X"                                
##  [3] "ID"                                "Country"                          
##  [5] "Continent"                         "Region"                           
##  [7] "Year"                              "Population."                      
##  [9] "CBR"                               "CDR"                              
## [11] "Rate.of.natural.increase"          "Net.Migration.Rate"               
## [13] "ProjectedPopMid2025"               "ProjectedPopMid2050"              
## [15] "ProjectedPopChange_08_50Perc"      "IMR"                              
## [17] "WomandLifeTimeRiskMaternalDeath"   "TFR"                              
## [19] "PercPopLT15"                       "PercPopGT65"                      
## [21] "e0Total"                           "e0Male"                           
## [23] "e0Female"                          "PercUrban"                        
## [25] "PercPopinUrbanGT750k"              "PercPop1549HIVAIDS2001"           
## [27] "PercPop1549HIVAIDS2007"            "PercMarWomContraALL"              
## [29] "PercMarWomContraModern"            "PercPpUnderNourished0204"         
## [31] "MotorVehper1000Pop0005"            "PercPopwAccessImprovedWaterSource"
## [33] "GNIPPPperCapitaUSDollars"          "PopDensPerSqKM"                   
## [35] "PopDensPerSqMile"

#look at the first 5 cases
head(dat, n=5)

##   Y X  ID     Country Continent             Region Year Population. CBR CDR
## 1 1 1 115 Afghanistan      Asia South Central Asia 2008        32.7  47  21
## 2 2 2 178     Albania    Europe    Southern Europe 2008         3.2  13   6
## 3 3 3   1     Algeria    Africa  NORTHERN AFRICA   2008        34.7  22   4
## 4 4 4 179     Andorra    Europe    Southern Europe 2008         0.1  10   3
## 5 5 5  43      Angola    Africa      MIDDLE AFRICA 2008        16.8  47  21
##   Rate.of.natural.increase Net.Migration.Rate ProjectedPopMid2025
## 1                      2.6                  0                50.3
## 2                      0.7                 -3                 3.5
## 3                      1.8                 -1                43.3
## 4                      0.7                 26                 0.1
## 5                      2.7                  2                26.2
##   ProjectedPopMid2050 ProjectedPopChange_08_50Perc   IMR
## 1                81.9                          150 163.0
## 2                 3.6                           11   8.0
## 3                50.1                           44  27.0
## 4                 0.1                           -4   2.5
## 5                42.7                          155 132.0
##   WomandLifeTimeRiskMaternalDeath TFR PercPopLT15 PercPopGT65 e0Total e0Male
## 1                               8 6.8          45           2      43     43
## 2                             490 1.6          27           8      75     72
## 3                             220 2.3          30           5      72     71
## 4                              NA 1.2          15          12      NA     NA
## 5                              12 6.8          46           2      43     41
##   e0Female PercUrban PercPopinUrbanGT750k PercPop1549HIVAIDS2001
## 1       43        20                   12                     NA
## 2       79        45                   NA                     NA
## 3       74        63                   12                    0.1
## 4       NA        90                   NA                     NA
## 5       44        57                   27                    1.6
##   PercPop1549HIVAIDS2007 PercMarWomContraALL PercMarWomContraModern
## 1                     NA                  10                      9
## 2                     NA                  75                      8
## 3                    0.1                  61                     52
## 4                     NA                  NA                     NA
## 5                    2.1                   6                      5
##   PercPpUnderNourished0204 MotorVehper1000Pop0005
## 1                       NA                      6
## 2                        6                     85
## 3                        4                     91
## 4                       NA                    750
## 5                       35                     NA
##   PercPopwAccessImprovedWaterSource GNIPPPperCapitaUSDollars PopDensPerSqKM
## 1                                22                       NA             50
## 2                                97                     6580            113
## 3                                85                     5490             15
## 4                               100                       NA            182
## 5                                51                     4400             13
##   PopDensPerSqMile
## 1           129.50
## 2           292.67
## 3            38.85
## 4           471.38
## 5            33.67

###Other data formats

If you have data from Stata version 13, for example, you can use

library(readstata13)
dat1<-read.dta13("/media/ozd504/extra/gdrive//classes/dem7283/class17/data/prb2008.dta",convert.factors = F)

#or a Stata file from an earlier version:
library(foreign)
dat2<-read.dta("/media/ozd504/extra/gdrive/classes/dem7283/class17/data/prb2008_st12.dta",convert.factors = F)

#Or an Rdata file
load("/media/ozd504/extra/gdrive/classes/dem7283/class17/data/prb2008.Rdata")
#Note the load() command doesn't need an assignment operator.

Let’s have a look at some descriptive information about the data:

#Frequency Table of # of Countries by Continent
table(dat$Continent)

## 
##        Africa          Asia        Europe North America       Oceania 
##            56            51            45            27            17 
## South America 
##            13

#basic summary statistics for the variable TFR or the total fertility rate
summary(dat$TFR)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   1.775   2.500   3.032   4.000   7.100       1

We see the mean is 3.0322115 and that 1 case is missing. That case is Monaco.

##More Descriptive statistics

#just want a mean
mean(dat$TFR, na.rm=T)

## [1] 3.032212

#what does R give if there's a missing case and I don't specify na.rm=T?
mean(dat$TFR)

## [1] NA

NA means a value is missing, and R uses NA for all kinds of missing data. In this case, R won’t compute the mean because one case is missing. A missing case contributes no information to the estimation of the mean, so we drop it with na.rm=T.
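
The behavior of NA can be tried out on a small made-up vector. This is just a sketch with invented numbers, not the PRB data:

```r
# A made-up vector with one missing value
x <- c(1.2, 2.5, NA, 4.0)

mean(x)                # NA, because one value is missing
mean(x, na.rm = TRUE)  # mean of the three observed values

sum(is.na(x))          # how many values are missing
which(is.na(x))        # position of the missing value
```

The same is.na() pattern can be used on a data frame column to find which case is missing, for example by indexing the country names with is.na(dat$TFR).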

#standard deviation
sd(dat$TFR, na.rm=T)

## [1] 1.602616

#Quantiles
quantile(dat$TFR, na.rm=T)

##    0%   25%   50%   75%  100% 
## 1.000 1.775 2.500 4.000 7.100

The median is 2.5

#histogram of the total fertility rate
hist(dat$TFR, main="Histogram of Total Fertility Rate")

#Box plot for TFR* Continent
bwplot(TFR~Continent, dat,main="Boxplot of Total Fertility Rate by continent")

#scatter plot of TFR * IMR, the infant mortality rate
xyplot(TFR~IMR, data=dat, main="Bivariate Association between TFR and IMR")

##T-tests

If our outcome is continuous and approximately normal, then we can use a t-test to compare the means of two groups.

#t-test for Africa vs Rest of the world
#Using the I() function, which generates a T/F value, i.e. 2 groups, Africa or Not Africa
t.test(TFR~I(Continent=="Africa"), dat, var.equal=F)

## 
##  Welch Two Sample t-test
## 
## data:  TFR by I(Continent == "Africa")
## t = -10.995, df = 77.925, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.813823 -1.951027
## sample estimates:
## mean in group FALSE  mean in group TRUE 
##            2.390789            4.773214

This shows that the mean TFR for Africa is 4.77 and the mean for the rest of the world is 2.39. The test is highly significant (large t value, small p value), suggesting the means are not equal.
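
To see where the Welch t value and its fractional degrees of freedom come from, the calculation can be sketched by hand. The data below are simulated, not the PRB file, and the object names are made up for illustration.

```r
set.seed(7283)
# Simulated outcomes for two groups with different means and spreads
# (made-up numbers, not the PRB data)
g1 <- rnorm(40, mean = 2.4, sd = 1.0)
g2 <- rnorm(25, mean = 4.8, sd = 1.5)

# Each group's variance divided by its sample size
v1 <- var(g1) / length(g1)
v2 <- var(g2) / length(g2)

# Welch t statistic: difference in means over its standard error,
# with a separate variance for each group
t_manual <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom
df_manual <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

# Compare with t.test(); var.equal = FALSE requests the Welch test
tt <- t.test(g1, g2, var.equal = FALSE)
c(manual = t_manual, t.test = unname(tt$statistic))
c(manual = df_manual, t.test = unname(tt$parameter))
```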

Let’s illustrate this with a box and whisker plot.

dat$Africa<-recode(dat$Continent, recodes =' "Africa"="Africa"; else="notAfrica"', as.factor = T)
#BE SURE TO SPECIFY YOUR COMPARISON LEVEL!!! OTHERWISE R WILL TAKE THE FIRST ALPHANUMERIC VALUE!!!!
dat$Africa<-relevel(dat$Africa, ref = "notAfrica")
bwplot(TFR~Africa, data=dat)

Which shows the differences very clearly, but also the differences in variability, with Africa being much more variable overall.
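
The effect of relevel() on the reference level can be checked on a small made-up factor. This sketch uses only base R, and the object names are hypothetical:

```r
# A small made-up factor with the same two labels used above
grp <- factor(c("Africa", "notAfrica", "notAfrica", "Africa", "notAfrica"))

levels(grp)   # sorted order by default, so "Africa" comes first

# Make "notAfrica" the reference (first) level
grp2 <- relevel(grp, ref = "notAfrica")
levels(grp2)

# Only the level order changes; the data values stay the same
table(grp, grp2)
```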

###Linear model for a t-test

I have thrown away the t-test per se and have moved purely to the linear model for all two-group and multiple-group testing (the multi-group case is already the linear model!). For the two-group case, the linear model is:

\(TFR_i = \alpha + \beta * Africa_i + e_i\)

Where \(\beta\) tells us how much the mean shifts up or down for countries in Africa, on average. All non-African countries have the Africa variable == 0, so their estimated mean is just \(\alpha\).

#Simple ANOVA model version of a t-test
fit<-lm(TFR~Africa, dat)
summary(fit)

## 
## Call:
## lm(formula = TFR ~ Africa, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0732 -0.8908 -0.1908  0.6474  4.4092 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.39079    0.09778   24.45   <2e-16 ***
## AfricaAfrica  2.38242    0.18845   12.64   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.206 on 206 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.4369, Adjusted R-squared:  0.4342 
## F-statistic: 159.8 on 1 and 206 DF,  p-value: < 2.2e-16

anova(fit)

## Analysis of Variance Table
## 
## Response: TFR
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## Africa      1 232.28 232.277  159.83 < 2.2e-16 ***
## Residuals 206 299.38   1.453                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The (Intercept) estimate is 2.39, which we have already seen is the mean TFR for non-African countries. The estimate for AfricaAfrica is 2.38, which tells us the mean for Africa is 2.38 TFR points higher than for non-African countries, on average. This estimate is large relative to its standard error, so the t-statistic is large and the p-value for the test is small, giving good evidence that the parameter is not 0, or equivalently that the means are not equal.
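
Note that the t value from lm() (12.64) is not the same as the Welch t value reported earlier (-10.995), because the linear model pools the variance across the two groups. The link between lm() and the equal-variance t-test can be sketched with simulated data (made-up numbers and object names, not the PRB file):

```r
set.seed(1)
# Simulated data: a 0/1 group indicator and an outcome
group <- rep(c(0, 1), times = c(30, 20))
y <- 2.4 + 2.4 * group + rnorm(50)

fit_sim <- lm(y ~ group)
coef(fit_sim)

# Intercept = mean of the group coded 0; slope = difference in group means
mean(y[group == 0])
mean(y[group == 1]) - mean(y[group == 0])

# The slope's t statistic matches the pooled-variance t-test up to its sign
# (t.test subtracts the groups in the opposite order)
t_lm <- summary(fit_sim)$coefficients["group", "t value"]
t_tt <- t.test(y ~ group, var.equal = TRUE)$statistic
c(lm = t_lm, t.test = unname(t_tt))
```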

###ANOVA model for IMR by continent

The ANOVA model is also easily fit:

fit<-lm(IMR~Continent, dat)
anova(fit)

## Analysis of Variance Table
## 
## Response: IMR
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## Continent   5 139806 27961.1  47.245 < 2.2e-16 ***
## Residuals 201 118960   591.8                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#Post hoc tests using Bonferroni comparisons; note these pairwise tests use TFR, not the IMR outcome from the ANOVA above
pairwise.t.test(dat$TFR,dat$Continent,p.adjust.method = "bonf")

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  dat$TFR and dat$Continent 
## 
##               Africa  Asia    Europe  North America Oceania
## Asia          < 2e-16 -       -       -             -      
## Europe        < 2e-16 1.6e-06 -       -             -      
## North America < 2e-16 1.0000  0.0355  -             -      
## Oceania       2.0e-05 1.0000  1.8e-06 0.1365        -      
## South America 1.5e-07 1.0000  0.0083  1.0000        1.0000 
## 
## P value adjustment method: bonferroni
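
The Bonferroni adjustment itself is simple arithmetic: each p value is multiplied by the number of comparisons and capped at 1. A minimal sketch with made-up p values (not those from the table above):

```r
# Five made-up unadjusted p values
p_raw <- c(0.001, 0.012, 0.030, 0.200, 0.650)

# Bonferroni: multiply by the number of comparisons and cap at 1
m <- length(p_raw)
p_manual <- pmin(1, p_raw * m)

# Same adjustment with the built-in p.adjust()
p_builtin <- p.adjust(p_raw, method = "bonferroni")

cbind(p_raw, p_manual, p_builtin)

# Number of pairwise comparisons among 6 continents
n_pairs <- choose(6, 2)
n_pairs
```

With six continents there are 15 pairwise comparisons, which is why several adjusted p values in the table above are capped at 1.0000.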

###Basic OLS Multiple regression models

#fit the basic regression model
fit1<-lm(TFR~ IMR + log(GNIPPPperCapitaUSDollars)+ log(PopDensPerSqMile), data=dat)
summary(fit1)

## 
## Call:
## lm(formula = TFR ~ IMR + log(GNIPPPperCapitaUSDollars) + log(PopDensPerSqMile), 
##     data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.68839 -0.43922 -0.08561  0.43789  1.88767 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    4.270723   0.756144   5.648  6.6e-08 ***
## IMR                            0.033681   0.002691  12.518  < 2e-16 ***
## log(GNIPPPperCapitaUSDollars) -0.242478   0.071980  -3.369 0.000932 ***
## log(PopDensPerSqMile)         -0.063232   0.038083  -1.660 0.098658 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7032 on 172 degrees of freedom
##   (33 observations deleted due to missingness)
## Multiple R-squared:  0.8155, Adjusted R-squared:  0.8123 
## F-statistic: 253.4 on 3 and 172 DF,  p-value: < 2.2e-16

#do some diagnostic plots
plot(fit1)

###Assess the assumptions of the model:

#Normality of residuals from the first fit?
shapiro.test(rstudent(fit1))

## 
##  Shapiro-Wilk normality test
## 
## data:  rstudent(fit1)
## W = 0.9815, p-value = 0.01934

plot(density(rstudent(fit1)))

The test rejects normality (p = 0.019), but the density plot looks pretty normal. The Shapiro-Wilk test is known to be overly sensitive and can flag even small departures from normality.

Other assumptions of the model:

#test for heteroskedasticity
bptest(fit1)

## 
##  studentized Breusch-Pagan test
## 
## data:  fit1
## BP = 13.513, df = 3, p-value = 0.003648

We see evidence of heteroskedasticity, which can affect the standard errors used in our hypothesis tests. We can use the White correction for heteroskedasticity:

#make White-corrected t-statistics and p values
coeftest(fit1, vcov=vcovHC(fit1, type = "HC0"))

## 
## t test of coefficients:
## 
##                                 Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)                    4.2707229  0.8541396  5.0000 1.404e-06 ***
## IMR                            0.0336814  0.0032803 10.2678 < 2.2e-16 ***
## log(GNIPPPperCapitaUSDollars) -0.2424776  0.0802370 -3.0220  0.002895 ** 
## log(PopDensPerSqMile)         -0.0632321  0.0347513 -1.8196  0.070564 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These results are very similar to those above, which assume constant variance. The std. errors for IMR and GDP are a little higher, but nothing is substantively different.
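
For intuition, the White (HC0) covariance matrix can be sketched in base R as a "sandwich" of matrices. The data are simulated and the object names are made up; the formula is the standard HC0 estimator, which is assumed here to be what type = "HC0" refers to.

```r
set.seed(42)
# Simulated data whose error spread grows with x (heteroskedastic by design)
n <- 200
x <- runif(n, 1, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.3 * x)
fit_sim <- lm(y ~ x)

X <- model.matrix(fit_sim)    # design matrix
e <- residuals(fit_sim)       # OLS residuals
bread <- solve(crossprod(X))  # (X'X)^-1

# Classical covariance: sigma^2 (X'X)^-1, should match vcov()
v_classical <- sum(e^2) / df.residual(fit_sim) * bread

# HC0 (White) covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1
meat <- crossprod(X * e)      # equals X' diag(e^2) X
v_hc0 <- bread %*% meat %*% bread

cbind(classical_se = sqrt(diag(v_classical)), hc0_se = sqrt(diag(v_hc0)))
```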

#variance inflation factors
vif(fit1)

##                           IMR log(GNIPPPperCapitaUSDollars) 
##                      3.135891                      3.037229 
##         log(PopDensPerSqMile) 
##                      1.057620
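
A variance inflation factor is 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The sketch below checks this on simulated predictors (made-up data and names) using only base R:

```r
set.seed(3)
# Three simulated predictors; x2 is built to be correlated with x1
n <- 150
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)

# VIF for a predictor = 1 / (1 - R^2) from regressing it on the others
r2_x1 <- summary(lm(x1 ~ x2 + x3))$r.squared
r2_x2 <- summary(lm(x2 ~ x1 + x3))$r.squared
r2_x3 <- summary(lm(x3 ~ x1 + x2))$r.squared
vif_manual <- 1 / (1 - c(x1 = r2_x1, x2 = r2_x2, x3 = r2_x3))

# Equivalent shortcut: diagonal of the inverse correlation matrix
vif_cor <- diag(solve(cor(cbind(x1, x2, x3))))

rbind(vif_manual, vif_cor)
```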

###More complicated linear models:

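Here is the ANCOVA model, which adds Continent to the regression so that each continent gets its own intercept. The equality of slopes between groups is tested afterwards by adding an interaction term: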

#now we fit a model that includes differences between continents
fit2<-lm(TFR~ IMR+ log(GNIPPPperCapitaUSDollars)+ log(PopDensPerSqMile)+ Continent, data=dat)
summary(fit2)

## 
## Call:
## lm(formula = TFR ~ IMR + log(GNIPPPperCapitaUSDollars) + log(PopDensPerSqMile) + 
##     Continent, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.75404 -0.36363 -0.07422  0.39485  2.02107 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    3.883497   0.797412   4.870 2.57e-06 ***
## IMR                            0.031252   0.003198   9.772  < 2e-16 ***
## log(GNIPPPperCapitaUSDollars) -0.168481   0.073755  -2.284   0.0236 *  
## log(PopDensPerSqMile)         -0.040229   0.039873  -1.009   0.3145    
## ContinentAsia                 -0.384597   0.182012  -2.113   0.0361 *  
## ContinentEurope               -0.652007   0.224036  -2.910   0.0041 ** 
## ContinentNorth America        -0.306933   0.223039  -1.376   0.1706    
## ContinentOceania               0.288356   0.265690   1.085   0.2793    
## ContinentSouth America        -0.348024   0.257248  -1.353   0.1779    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6811 on 167 degrees of freedom
##   (33 observations deleted due to missingness)
## Multiple R-squared:  0.832,  Adjusted R-squared:  0.8239 
## F-statistic: 103.3 on 8 and 167 DF,  p-value: < 2.2e-16

#We can compare how model 2 performs compared to model 1 by using an F test
anova (fit1, fit2, test="F")

## Analysis of Variance Table
## 
## Model 1: TFR ~ IMR + log(GNIPPPperCapitaUSDollars) + log(PopDensPerSqMile)
## Model 2: TFR ~ IMR + log(GNIPPPperCapitaUSDollars) + log(PopDensPerSqMile) + 
##     Continent
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1    172 85.060                                
## 2    167 77.474  5     7.586 3.2704 0.007631 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
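
The F statistic in this table can be reproduced by hand from the residual sums of squares and degrees of freedom it reports. A short sketch, with made-up object names:

```r
# Numbers taken from the anova(fit1, fit2) table above
rss1 <- 85.060; df1 <- 172   # model 1 (no Continent)
rss2 <- 77.474; df2 <- 167   # model 2 (with Continent)

# F = (drop in RSS / number of added parameters) / (RSS2 / residual df2)
f_stat <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)

# Upper-tail probability from the F distribution
p_val <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

round(c(F = f_stat, p = p_val), 4)
```

Up to rounding of the inputs, this should agree with the F of 3.2704 and p value of 0.007631 shown in the table.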

#Now we fit a model with an interaction term between continent and GDP. In R this is easy: all we have to do is use the * operator
fit3<-lm(TFR~ IMR + log(GNIPPPperCapitaUSDollars)+ log(PopDensPerSqMile) + Continent*log(GNIPPPperCapitaUSDollars), data=dat)
summary(fit3)

## 
## Call:
## lm(formula = TFR ~ IMR + log(GNIPPPperCapitaUSDollars) + log(PopDensPerSqMile) + 
##     Continent * log(GNIPPPperCapitaUSDollars), data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.66998 -0.33461 -0.05822  0.30211  2.35124 
## 
## Coefficients:
##                                                       Estimate Std. Error
## (Intercept)                                           7.313792   0.993594
## IMR                                                   0.027531   0.003088
## log(GNIPPPperCapitaUSDollars)                        -0.552977   0.103346
## log(PopDensPerSqMile)                                -0.094902   0.038406
## ContinentAsia                                        -5.001775   1.031222
## ContinentEurope                                      -8.291472   1.711500
## ContinentNorth America                               -1.324479   1.643557
## ContinentOceania                                     -1.150323   1.762321
## ContinentSouth America                               -5.581892   3.431340
## log(GNIPPPperCapitaUSDollars):ContinentAsia           0.566799   0.124059
## log(GNIPPPperCapitaUSDollars):ContinentEurope         0.840081   0.178505
## log(GNIPPPperCapitaUSDollars):ContinentNorth America  0.156030   0.187778
## log(GNIPPPperCapitaUSDollars):ContinentOceania        0.188682   0.211207
## log(GNIPPPperCapitaUSDollars):ContinentSouth America  0.620699   0.388035
##                                                      t value Pr(>|t|)    
## (Intercept)                                            7.361 8.70e-12 ***
## IMR                                                    8.914 9.83e-16 ***
## log(GNIPPPperCapitaUSDollars)                         -5.351 2.95e-07 ***
## log(PopDensPerSqMile)                                 -2.471   0.0145 *  
## ContinentAsia                                         -4.850 2.87e-06 ***
## ContinentEurope                                       -4.845 2.95e-06 ***
## ContinentNorth America                                -0.806   0.4215    
## ContinentOceania                                      -0.653   0.5149    
## ContinentSouth America                                -1.627   0.1057    
## log(GNIPPPperCapitaUSDollars):ContinentAsia            4.569 9.68e-06 ***
## log(GNIPPPperCapitaUSDollars):ContinentEurope          4.706 5.38e-06 ***
## log(GNIPPPperCapitaUSDollars):ContinentNorth America   0.831   0.4072    
## log(GNIPPPperCapitaUSDollars):ContinentOceania         0.893   0.3730    
## log(GNIPPPperCapitaUSDollars):ContinentSouth America   1.600   0.1116    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6292 on 162 degrees of freedom
##   (33 observations deleted due to missingness)
## Multiple R-squared:  0.8609, Adjusted R-squared:  0.8497 
## F-statistic: 77.11 on 13 and 162 DF,  p-value: < 2.2e-16

#We can compare our model 2 with the interaction model to see if there is a significant difference in how GDP operates across continents
anova (fit2, fit3, test="F")

## Analysis of Variance Table
## 
## Model 1: TFR ~ IMR + log(GNIPPPperCapitaUSDollars) + log(PopDensPerSqMile) + 
##     Continent
## Model 2: TFR ~ IMR + log(GNIPPPperCapitaUSDollars) + log(PopDensPerSqMile) + 
##     Continent * log(GNIPPPperCapitaUSDollars)
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    167 77.474                                  
## 2    162 64.138  5    13.336 6.7369 9.889e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This test suggests that there is a significant interaction between GDP and Continent.
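
To read the interaction, recall that Africa is the reference continent, so the log(GNIPPPperCapitaUSDollars) coefficient is the GDP slope for Africa, and each interaction term is added to it to get the slope for another continent. A sketch of that arithmetic using the coefficients from summary(fit3) above (object names are made up):

```r
# Coefficients copied from summary(fit3) above
b_gdp <- -0.552977            # slope of log GNI for Africa (reference)
b_int <- c(Asia            = 0.566799,
           Europe          = 0.840081,
           `North America` = 0.156030,
           Oceania         = 0.188682,
           `South America` = 0.620699)

# Slope in each continent = reference slope + that continent's interaction
slopes <- c(Africa = b_gdp, b_gdp + b_int)
round(slopes, 3)
```

On this reading, the negative association between GDP and TFR is strongest in Africa, while in Asia the estimated slope is close to zero.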

###Transformations of outcomes

Sometimes, to coax residuals into normality or constant variance, we can try a transformation of the outcome, such as a log or square root. Here we examine a log transformed outcome, which can easily be done within the lm() function.

fit4<-lm(log(TFR) ~ IMR + log(GNIPPPperCapitaUSDollars)+ log(PopDensPerSqMile)+Continent, data=dat)
summary(fit4)

## 
## Call:
## lm(formula = log(TFR) ~ IMR + log(GNIPPPperCapitaUSDollars) + 
##     log(PopDensPerSqMile) + Continent, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50723 -0.14230 -0.02355  0.14074  0.58783 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    1.510110   0.254631   5.931 1.69e-08 ***
## IMR                            0.008227   0.001021   8.056 1.43e-13 ***
## log(GNIPPPperCapitaUSDollars) -0.062776   0.023551  -2.665  0.00844 ** 
## log(PopDensPerSqMile)         -0.032488   0.012732  -2.552  0.01162 *  
## ContinentAsia                 -0.098798   0.058120  -1.700  0.09102 .  
## ContinentEurope               -0.353186   0.071540  -4.937 1.91e-06 ***
## ContinentNorth America        -0.062651   0.071221  -0.880  0.38030    
## ContinentOceania               0.133485   0.084840   1.573  0.11753    
## ContinentSouth America        -0.074871   0.082145  -0.911  0.36337    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2175 on 167 degrees of freedom
##   (33 observations deleted due to missingness)
## Multiple R-squared:  0.829,  Adjusted R-squared:  0.8208 
## F-statistic: 101.2 on 8 and 167 DF,  p-value: < 2.2e-16

#Normality of errors
shapiro.test(rstudent(fit4))

## 
##  Shapiro-Wilk normality test
## 
## data:  rstudent(fit4)
## W = 0.99227, p-value = 0.4714

#Heteroskedasticity test
bptest(fit4)

## 
##  studentized Breusch-Pagan test
## 
## data:  fit4
## BP = 27.183, df = 8, p-value = 0.0006574

We still see the model has problems with heteroskedasticity.
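
Because fit4 models log(TFR), its coefficients are on a multiplicative scale: exponentiating a coefficient gives the ratio by which TFR changes for a one-unit change in the predictor. A small sketch using two coefficients copied from summary(fit4) above (object names are made up):

```r
# Coefficients copied from summary(fit4) above
b <- c(IMR = 0.008227, Europe = -0.353186)

# With a logged outcome, exp(b) is the multiplicative change in TFR
ratio <- exp(b)

# Expressed as a percent change in TFR
pct_change <- 100 * (ratio - 1)
round(pct_change, 2)
```

So, holding the other predictors constant, each additional point of IMR goes with a little under a 1 percent higher TFR, and European countries have a TFR roughly 30 percent lower than African countries.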

###Output of results

Here we make a nice table of the results between the model using the transformed outcome and that which used the un-transformed outcome.

library(stargazer)

## 
## Please cite as:

##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer

stargazer(fit2, fit4, title="Model results from untransformed and log transformed outcome",type="html", align=T, covariate.labels = c("IMR", "lnGDP", "lnDens"))

**Model results from untransformed and log transformed outcome**

| | Dependent variable: TFR (1) | Dependent variable: log(TFR) (2) |
|---|---|---|
| IMR | 0.031*** (0.003) | 0.008*** (0.001) |
| lnGDP | -0.168** (0.074) | -0.063*** (0.024) |
| lnDens | -0.040 (0.040) | -0.032** (0.013) |
| ContinentAsia | -0.385** (0.182) | -0.099* (0.058) |
| ContinentEurope | -0.652*** (0.224) | -0.353*** (0.072) |
| ContinentNorth America | -0.307 (0.223) | -0.063 (0.071) |
| ContinentOceania | 0.288 (0.266) | 0.133 (0.085) |
| ContinentSouth America | -0.348 (0.257) | -0.075 (0.082) |
| Constant | 3.883*** (0.797) | 1.510*** (0.255) |
| Observations | 176 | 176 |
| R² | 0.832 | 0.829 |
| Adjusted R² | 0.824 | 0.821 |
| Residual Std. Error (df = 167) | 0.681 | 0.217 |
| F Statistic (df = 8; 167) | 103.349*** | 101.220*** |

Standard errors are in parentheses.

Note: *p<0.1; **p<0.05; ***p<0.01

We can also make a table comparing the nested models:

stargazer(fit2, fit3, title="Model results from nested models",type="html", align=T, covariate.labels = c("IMR", "lnGDP", "lnDens"))

**Model results from nested models**

| | Dependent variable: TFR (1) | Dependent variable: TFR (2) |
|---|---|---|
| IMR | 0.031*** (0.003) | 0.028*** (0.003) |
| lnGDP | -0.168** (0.074) | -0.553*** (0.103) |
| lnDens | -0.040 (0.040) | -0.095** (0.038) |
| ContinentAsia | -0.385** (0.182) | -5.002*** (1.031) |
| ContinentEurope | -0.652*** (0.224) | -8.291*** (1.712) |
| ContinentNorth America | -0.307 (0.223) | -1.324 (1.644) |
| ContinentOceania | 0.288 (0.266) | -1.150 (1.762) |
| ContinentSouth America | -0.348 (0.257) | -5.582 (3.431) |
| log(GNIPPPperCapitaUSDollars):ContinentAsia | | 0.567*** (0.124) |
| log(GNIPPPperCapitaUSDollars):ContinentEurope | | 0.840*** (0.179) |
| log(GNIPPPperCapitaUSDollars):ContinentNorth America | | 0.156 (0.188) |
| log(GNIPPPperCapitaUSDollars):ContinentOceania | | 0.189 (0.211) |
| log(GNIPPPperCapitaUSDollars):ContinentSouth America | | 0.621 (0.388) |
| Constant | 3.883*** (0.797) | 7.314*** (0.994) |
| Observations | 176 | 176 |
| R² | 0.832 | 0.861 |
| Adjusted R² | 0.824 | 0.850 |
| Residual Std. Error | 0.681 (df = 167) | 0.629 (df = 162) |
| F Statistic | 103.349*** (df = 8; 167) | 77.115*** (df = 13; 162) |

Standard errors are in parentheses.

Note: *p<0.1; **p<0.05; ***p<0.01

Corey Sparks, PhD

January 9, 2017