Brian Surratt

Here is my screenshot

Including libraries

library(readxl)
library(car)
## Loading required package: carData
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(ggplot2)
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(ggpubr)
library(ggrepel)

Reading in the Excel file

Junkins <- read_excel("/Users/briansurratt/Library/CloudStorage/OneDrive-UniversityofTexasatSanAntonio/DEM 7283 Stats II/Homework 1/Junkins Data.xlsx")

Creation of variables

Junkins$south<- recode(Junkins$region, "3=1; else=0")
tabyl(Junkins$south)
##  Junkins$south  n percent
##              0 34    0.68
##              1 16    0.32
Junkins$northeast<- recode(Junkins$region, "1=1; else=0")
tabyl(Junkins$northeast)
##  Junkins$northeast  n percent
##                  0 41    0.82
##                  1  9    0.18
Junkins$midwest<- recode(Junkins$region, "2=1; else=0")
tabyl(Junkins$midwest)
##  Junkins$midwest  n percent
##                0 38    0.76
##                1 12    0.24
Junkins$west<- recode(Junkins$region, "4=1; else=0")
tabyl(Junkins$west)
##  Junkins$west  n percent
##             0 37    0.74
##             1 13    0.26
Junkins$relconssq<-Junkins$relcons*Junkins$relcons

Junkins$relconsln<-log(Junkins$relcons)

Junkins$relconsrec<-1/(Junkins$relcons)

geographic variation in age at marriage

model <-lm(t_ageFM~northeast + midwest + west, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ northeast + midwest + west, data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.43846 -0.55625  0.06563  0.75677  2.23750 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 27.20625    0.26624 102.188   <2e-16 ***
## northeast    1.03264    0.44373   2.327   0.0244 *  
## midwest      0.00625    0.40669   0.015   0.9878    
## west        -0.31779    0.39765  -0.799   0.4283    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.065 on 46 degrees of freedom
## Multiple R-squared:  0.1657, Adjusted R-squared:  0.1113 
## F-statistic: 3.045 on 3 and 46 DF,  p-value: 0.03805

Interpretation: The independent variables are dummy variables for region of residence in the United States (northeast, midwest, and west), with the south omitted as the reference category. A multiple linear regression model is estimated in R with age at first marriage as the dependent variable. Only the coefficient for the northeast is statistically significant (p<0.05): age at first marriage is 1.03 years higher in the northeast than in the south. The midwest and west coefficients are not statistically different from zero, so neither region differs significantly from the south.
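
Because the south is the omitted category, the intercept is the mean for the south and each coefficient is a difference from that mean. The sketch below illustrates this with made-up numbers; the data frame `toy`, the column `age`, and every value in it are hypothetical and are not the Junkins data.

```r
# Hypothetical toy data (NOT the Junkins data): two observations per region.
toy <- data.frame(
  region = c(1, 1, 2, 2, 3, 3, 4, 4),
  age    = c(28, 29, 27, 27.5, 26, 27, 26.5, 27.5)
)

# Dummy variables built with base R so the sketch needs no packages.
toy$northeast <- as.numeric(toy$region == 1)
toy$midwest   <- as.numeric(toy$region == 2)
toy$west      <- as.numeric(toy$region == 4)

# South (region 3) has no dummy, so it is the reference category.
fit <- lm(age ~ northeast + midwest + west, data = toy)

south_mean     <- mean(toy$age[toy$region == 3])
northeast_mean <- mean(toy$age[toy$region == 1])

# The intercept reproduces the mean of the omitted group.
intercept_matches <- isTRUE(all.equal(unname(coef(fit)["(Intercept)"]), south_mean))

# The northeast coefficient is the northeast mean minus the south mean.
ne_gap_matches <- isTRUE(all.equal(unname(coef(fit)["northeast"]),
                                   northeast_mean - south_mean))
```

Applied to the output above, the same logic implies a south mean of 27.21 and a northeast mean of about 27.21 + 1.03 = 28.24.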

model <-lm(t_ageFM~northeast, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ northeast, data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.65732 -0.50271  0.01768  0.74268  2.34268 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  27.1073     0.1642 165.054  < 2e-16 ***
## northeast     1.1316     0.3871   2.923  0.00527 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.052 on 48 degrees of freedom
## Multiple R-squared:  0.1511, Adjusted R-squared:  0.1334 
## F-statistic: 8.545 on 1 and 48 DF,  p-value: 0.005272

Interpretation: The independent variable is residence in the northeast region of the United States. A simple linear regression model is estimated in R with age at first marriage as the dependent variable. The coefficient for the northeast is statistically significant (p<0.01): age at first marriage is 1.13 years higher in the northeast than in the other three regions combined. The adjusted R-squared shows that residence in the northeast explains 13.34% of the variance in age at first marriage.

Questions:

scatterplot and correlation of DV with IV

plot(Junkins$relcons,Junkins$t_ageFM)

cor.test(Junkins$relcons,Junkins$t_ageFM)
## 
##  Pearson's product-moment correlation
## 
## data:  Junkins$relcons and Junkins$t_ageFM
## t = -4.8748, df = 48, p-value = 1.233e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7359186 -0.3537616
## sample estimates:
##        cor 
## -0.5754459

Interpretation: The Pearson’s correlation between percent very religious and age at first marriage is negative and moderate to strong (r = -0.575, p<0.001). As percent very religious increases, age at first marriage decreases.

tests of different specifications of religious concentration

model <-lm(t_ageFM~relcons, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ relcons, data = Junkins)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8183 -0.4097  0.0782  0.5650  1.8037 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 28.27091    0.23707 119.252  < 2e-16 ***
## relcons     -0.04911    0.01007  -4.875 1.23e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9335 on 48 degrees of freedom
## Multiple R-squared:  0.3311, Adjusted R-squared:  0.3172 
## F-statistic: 23.76 on 1 and 48 DF,  p-value: 1.233e-05

Interpretation: The independent variable is percent very religious and the dependent variable is age at first marriage. A simple linear regression model is estimated in R. The association between percent very religious and age at first marriage is statistically significant (p<0.001): each one-point increase in percent very religious is associated with a 0.049-year decrease in age at first marriage. The adjusted R-squared shows that percent very religious explains 31.72% of the variance.
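
In a regression with a single predictor, the multiple R-squared is the square of the Pearson correlation, so the two outputs can be checked against each other. A quick check using the values printed above:

```r
# Pearson correlation reported by cor.test() above.
r <- -0.5754459

# Squaring it should reproduce the Multiple R-squared of the simple regression.
r_squared <- r^2

# Compare with the 0.3311 reported by summary(model).
round(r_squared, 4)
```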

model <-lm(t_ageFM~relcons + relconssq, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ relcons + relconssq, data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.81884 -0.41111  0.08564  0.56738  1.80413 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.824e+01  3.623e-01  77.952   <2e-16 ***
## relcons     -4.607e-02  2.878e-02  -1.601    0.116    
## relconssq   -5.175e-05  4.593e-04  -0.113    0.911    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9432 on 47 degrees of freedom
## Multiple R-squared:  0.3313, Adjusted R-squared:  0.3029 
## F-statistic: 11.64 on 2 and 47 DF,  p-value: 7.81e-05

Interpretation: The independent variables are percent very religious and percent very religious squared. The dependent variable is age at first marriage. A multiple linear regression model is estimated in R. In this model, neither coefficient is statistically significant, and the adjusted R-squared falls from 0.3172 in the linear model to 0.3029, so the squared term does not improve the fit.
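
One way to test formally whether the squared term is needed is a nested-model F test with `anova()`. The sketch below uses simulated data, not the Junkins data; the variables `x` and `y` and the coefficients used to generate them are assumptions chosen only for illustration. It also shows that, when a single term is added, the F statistic equals the square of that term's t value.

```r
# Simulated data (hypothetical; NOT the Junkins data).
set.seed(1)
x <- runif(50, 5, 60)
y <- 28 - 0.05 * x + rnorm(50, sd = 1)

# Nested models: linear, then linear plus a squared term.
# I(x^2) builds the square inside the formula, so no extra column is needed.
linear    <- lm(y ~ x)
quadratic <- lm(y ~ x + I(x^2))

# F test of the null hypothesis that the squared term adds nothing.
cmp    <- anova(linear, quadratic)
f_stat <- cmp$F[2]

# t value of the squared term in the larger model.
t_val <- coef(summary(quadratic))["I(x^2)", "t value"]

# With one added term, F equals t squared.
f_equals_t_squared <- isTRUE(all.equal(f_stat, t_val^2))
```

The linear and squared terms are usually highly correlated with each other, which is a likely reason the linear coefficient also loses significance once the square is added.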

Questions:

model <-lm(t_ageFM~relconsln, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ relconsln, data = Junkins)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6441 -0.5908  0.0393  0.6119  1.9690 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  29.6203     0.5419  54.655  < 2e-16 ***
## relconsln    -0.8412     0.1911  -4.402 5.96e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9633 on 48 degrees of freedom
## Multiple R-squared:  0.2876, Adjusted R-squared:  0.2728 
## F-statistic: 19.38 on 1 and 48 DF,  p-value: 5.958e-05

Interpretation: The independent variable is the natural logarithm of percent very religious and the dependent variable is age at first marriage. A simple linear regression model is estimated in R. The association between the log of percent very religious and age at first marriage is statistically significant (p<0.001). The adjusted R-squared shows that the log of percent very religious explains 27.28% of the variance.

Questions:

model <-lm(t_ageFM~relconsrec, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ relconsrec, data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.44794 -0.49354 -0.09158  0.80604  2.17950 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  26.7445     0.2233 119.789  < 2e-16 ***
## relconsrec    6.6906     2.0013   3.343  0.00161 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.028 on 48 degrees of freedom
## Multiple R-squared:  0.1889, Adjusted R-squared:  0.172 
## F-statistic: 11.18 on 1 and 48 DF,  p-value: 0.001612

Interpretation: The independent variable is the reciprocal of percent very religious and the dependent variable is age at first marriage. A simple linear regression model is estimated in R. The association between the reciprocal of percent very religious and age at first marriage is statistically significant (p<0.01). The coefficient is positive because the reciprocal falls as percent very religious rises, so the direction of the relationship is unchanged. The adjusted R-squared shows that the reciprocal of percent very religious explains 17.2% of the variance.
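
Since all four specifications use the same dependent variable, their fit statistics can be compared directly. The sketch below simply collects the adjusted R-squared and residual standard error values printed in the outputs above; the data frame name `specs` and its column names are illustrative.

```r
# Fit statistics copied from the four summary() outputs above.
specs <- data.frame(
  specification = c("linear", "quadratic", "log", "reciprocal"),
  adj_r_squared = c(0.3172, 0.3029, 0.2728, 0.1720),
  resid_se      = c(0.9335, 0.9432, 0.9633, 1.028),
  stringsAsFactors = FALSE
)

# Sort from best to worst adjusted R-squared.
specs <- specs[order(-specs$adj_r_squared), ]

# Specification with the highest adjusted R-squared.
best <- specs$specification[1]
specs
```

On both measures the untransformed linear specification fits best, with the highest adjusted R-squared and the lowest residual standard error.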

Questions:

test of mediation

model <-lm(t_ageFM~northeast + relcons, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ northeast + relcons, data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.72217 -0.50792 -0.02583  0.54625  1.90289 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 28.10361    0.30729  91.457  < 2e-16 ***
## northeast    0.34786    0.40488   0.859 0.394607    
## relcons     -0.04375    0.01187  -3.686 0.000589 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.936 on 47 degrees of freedom
## Multiple R-squared:  0.3415, Adjusted R-squared:  0.3135 
## F-statistic: 12.19 on 2 and 47 DF,  p-value: 5.45e-05

Interpretation: The independent variables are residence in the northeast and percent very religious. The dependent variable is age at first marriage. A multiple linear regression model is estimated in R. The coefficient for percent very religious is statistically significant (p<0.001), but the coefficient for residence in the northeast is not. Holding region constant, a one-percentage-point increase in percent very religious is associated with a decline in age at first marriage of 0.044 years. The northeast coefficient falls from 1.13 in the simple model to 0.35 here and loses significance, which is consistent with religious concentration mediating the regional difference.
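
A rough way to gauge how much of the northeast difference is accounted for by percent very religious is the proportional change in the northeast coefficient between the two models. This is a descriptive difference-in-coefficients calculation using the estimates printed above, not a formal test of an indirect effect.

```r
# Northeast coefficient without relcons (simple model above).
b_total <- 1.1316

# Northeast coefficient after adding relcons (this model).
b_direct <- 0.34786

# Share of the original coefficient that disappears once relcons is controlled.
share_explained <- (b_total - b_direct) / b_total

# Expressed as a percentage, rounded to one decimal place.
round(100 * share_explained, 1)
```

By this measure, roughly two thirds of the northeast coefficient is accounted for by percent very religious.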

Questions:

test of moderation

model <-lm(t_ageFM~northeast + relcons + (northeast*relcons), Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ northeast + relcons + (northeast * relcons), 
##     data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.72617 -0.50502 -0.02841  0.57561  1.89866 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       28.11320    0.30988  90.724  < 2e-16 ***
## northeast         -0.23321    1.06750  -0.218 0.828031    
## relcons           -0.04417    0.01197  -3.689 0.000594 ***
## northeast:relcons  0.11804    0.20041   0.589 0.558754    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9426 on 46 degrees of freedom
## Multiple R-squared:  0.3464, Adjusted R-squared:  0.3038 
## F-statistic: 8.127 on 3 and 46 DF,  p-value: 0.0001898

Interpretation: The independent variables are residence in the northeast, percent very religious, and the interaction of the two (residence in the northeast multiplied by percent very religious). The dependent variable is age at first marriage. A multiple linear regression model is estimated in R. The coefficient for percent very religious is statistically significant (p<0.001), but the coefficients for residence in the northeast and for the interaction term are not, so there is no evidence that region moderates the effect of religious concentration. Outside the northeast, a one-percentage-point increase in percent very religious is associated with a decline in age at first marriage of 0.044 years.
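
In an interaction model the `relcons` coefficient is the slope when `northeast` is 0, and the slope for the northeast is that coefficient plus the interaction term. The sketch below computes both from the estimates printed above; the variable names are illustrative.

```r
# Estimates copied from the interaction model output above.
b_relcons     <- -0.04417   # slope of relcons when northeast = 0
b_interaction <-  0.11804   # change in that slope when northeast = 1

# Conditional slopes of relcons in each group.
slope_other     <- b_relcons
slope_northeast <- b_relcons + b_interaction

c(other = slope_other, northeast = slope_northeast)
```

The implied northeast slope is positive, but the interaction has a large standard error (0.200) and only 9 observations are in the northeast, so the two slopes cannot be distinguished statistically.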

Questions:

creating a nice summary table

Model.1 <- lm(t_ageFM~northeast + relcons, Junkins)
Model.2 <- lm (t_ageFM~northeast + relcons + (northeast*relcons), Junkins)


# https://ademos.people.uic.edu/Chapter13.html

stargazer(Model.1, Model.2,type="text", 
column.labels = c("Main Effects", "Interaction"), 
intercept.bottom = FALSE, 
single.row=FALSE,     
notes.append = FALSE, 
header=FALSE) 
## 
## ================================================================
##                                 Dependent variable:             
##                     --------------------------------------------
##                                       t_ageFM                   
##                          Main Effects           Interaction     
##                              (1)                    (2)         
## ----------------------------------------------------------------
## Constant                  28.104***              28.113***      
##                            (0.307)                (0.310)       
##                                                                 
## northeast                   0.348                 -0.233        
##                            (0.405)                (1.067)       
##                                                                 
## relcons                   -0.044***              -0.044***      
##                            (0.012)                (0.012)       
##                                                                 
## northeast:relcons                                  0.118        
##                                                   (0.200)       
##                                                                 
## ----------------------------------------------------------------
## Observations                  50                    50          
## R2                          0.341                  0.346        
## Adjusted R2                 0.313                  0.304        
## Residual Std. Error    0.936 (df = 47)        0.943 (df = 46)   
## F Statistic         12.186*** (df = 2; 47) 8.127*** (df = 3; 46)
## ================================================================
## Note:                                *p<0.1; **p<0.05; ***p<0.01