Anamika A Kumar

Here is my screenshot

Including libraries

library(readxl)
library(car)
## Loading required package: carData
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(ggplot2)
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(ggpubr)
library(ggrepel)
library(carData)

Reading in the Excel file

Junkins <- read_excel("C:\\Users\\anami\\OneDrive\\Documents\\Stat ll\\Assignment 1\\Junkins Data.xlsx")

Creation of variables

Junkins$south<- recode(Junkins$region, "3=1; else=0")
tabyl(Junkins$south)
##  Junkins$south  n percent
##              0 34    0.68
##              1 16    0.32
Junkins$northeast<- recode(Junkins$region, "1=1; else=0")
tabyl(Junkins$northeast)
##  Junkins$northeast  n percent
##                  0 41    0.82
##                  1  9    0.18
Junkins$midwest<- recode(Junkins$region, "2=1; else=0")
tabyl(Junkins$midwest)
##  Junkins$midwest  n percent
##                0 38    0.76
##                1 12    0.24
Junkins$west<- recode(Junkins$region, "4=1; else=0")
tabyl(Junkins$west)
##  Junkins$west  n percent
##             0 37    0.74
##             1 13    0.26
Junkins$relconssq<-Junkins$relcons*Junkins$relcons

Junkins$relconscu<-Junkins$relconssq*Junkins$relcons

Junkins$relconsln<-log(Junkins$relcons)

Junkins$relconsrec<-1/(Junkins$relcons)
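
The four region indicators above can also be built in one step from a labelled factor. The following is a minimal sketch on made-up data, not the Junkins file: the data frame `toy` is hypothetical, and the mapping 1 = northeast, 2 = midwest, 3 = south, 4 = west is an assumption read off the recode() calls above.

```r
# Hypothetical stand-in for the Junkins data: region codes only
toy <- data.frame(region = c(1, 2, 3, 4, 3, 2, 1, 4, 3, 3))

# Label the codes (mapping assumed from the recode() calls above)
toy$region_f <- factor(toy$region, levels = 1:4,
                       labels = c("northeast", "midwest", "south", "west"))

# One 0/1 indicator column per region; "- 1" drops the intercept so
# that every level gets its own column
dummies <- model.matrix(~ region_f - 1, data = toy)
colnames(dummies) <- levels(toy$region_f)
toy <- cbind(toy, dummies)

toy
```

Keeping the factor itself is often enough, since lm() creates the indicators automatically and relevel() can set the reference category.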

Geographic variation in age at marriage

model <-lm(t_ageFM~northeast + midwest + west, Junkins)
summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ northeast + midwest + west, data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.43846 -0.55625  0.06563  0.75677  2.23750 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 27.20625    0.26624 102.188   <2e-16 ***
## northeast    1.03264    0.44373   2.327   0.0244 *  
## midwest      0.00625    0.40669   0.015   0.9878    
## west        -0.31779    0.39765  -0.799   0.4283    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.065 on 46 degrees of freedom
## Multiple R-squared:  0.1657, Adjusted R-squared:  0.1113 
## F-statistic: 3.045 on 3 and 46 DF,  p-value: 0.03805

Interpretation:

The model estimates the average age at first marriage by U.S. region, with the South as the reference category. The intercept is the estimated average for the South, 27.21 years (p < .001; this test only says that the mean differs from zero). The average for the Northeast is 1.03 years higher than for the South, at 28.24 years, and this difference is statistically significant (p = .024). The Midwest and West averages do not differ significantly from the South (p = .988 and p = .428).
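
To see why the intercept is the reference-group mean and each coefficient is a difference from it, the sketch below fits the same kind of model on simulated data. The data frame `toy` and its values are made up for illustration and are not the Junkins data.

```r
set.seed(1)

# Simulated data: 4 regions, 10 states each; south listed first so that
# it becomes the reference level
toy <- data.frame(
  region = factor(rep(c("south", "northeast", "midwest", "west"), each = 10),
                  levels = c("south", "northeast", "midwest", "west")),
  age = rnorm(40, mean = 27, sd = 1)
)

fit <- lm(age ~ region, data = toy)
group_means <- tapply(toy$age, toy$region, mean)

# Intercept equals the mean of the reference group
coef(fit)["(Intercept)"]
group_means["south"]

# Intercept plus a region coefficient equals that region's mean
coef(fit)["(Intercept)"] + coef(fit)["regionnortheast"]
group_means["northeast"]
```

Applied to the output above, the same arithmetic gives 27.20625 + 1.03264 = 28.23889 years for the Northeast.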

model <-lm(t_ageFM~northeast, Junkins)
summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ northeast, data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.65732 -0.50271  0.01768  0.74268  2.34268 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  27.1073     0.1642 165.054  < 2e-16 ***
## northeast     1.1316     0.3871   2.923  0.00527 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.052 on 48 degrees of freedom
## Multiple R-squared:  0.1511, Adjusted R-squared:  0.1334 
## F-statistic: 8.545 on 1 and 48 DF,  p-value: 0.005272

Interpretation:

The model compares northeastern states with all other states. The average age at first marriage is estimated at 27.11 years for non-northeastern states and 28.24 years for northeastern states, a difference of 1.13 years that is statistically significant (p = .005). The model explains 15.1% of the variation in the average age at first marriage (adjusted R-squared 13.3%).
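
A regression on a single 0/1 indicator is the same test as a pooled-variance two-sample t-test. The sketch below checks this on simulated data. The data frame `toy` is hypothetical, with group sizes of 41 and 9 chosen only to mirror the tabyl() counts above.

```r
set.seed(2)

# Simulated data: 41 non-northeastern and 9 northeastern states
toy <- data.frame(northeast = rep(c(0, 1), times = c(41, 9)))
toy$age <- 27 + 1 * toy$northeast + rnorm(50)

fit <- summary(lm(age ~ northeast, data = toy))
tt  <- t.test(age ~ northeast, data = toy, var.equal = TRUE)

# t statistics agree up to sign (t.test reports group 0 minus group 1)
t_lm <- fit$coefficients["northeast", "t value"]
t_tt <- unname(tt$statistic)
c(lm = t_lm, t.test = t_tt)

# p-values are identical
p_lm <- fit$coefficients["northeast", "Pr(>|t|)"]
p_tt <- tt$p.value
c(lm = p_lm, t.test = p_tt)
```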

Scatterplot and correlation of DV with IV

plot(Junkins$relcons,Junkins$t_ageFM)

cor.test(Junkins$relcons,Junkins$t_ageFM)
## 
##  Pearson's product-moment correlation
## 
## data:  Junkins$relcons and Junkins$t_ageFM
## t = -4.8748, df = 48, p-value = 1.233e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7359186 -0.3537616
## sample estimates:
##        cor 
## -0.5754459

Interpretation:

The scatterplot shows a negative relationship between religious conservatism and the average age at first marriage: states with lower levels of religious conservatism tend to have a higher average age at first marriage. The estimated Pearson correlation is -.58, with a 95% confidence interval from -.74 to -.35. Because the interval excludes zero (p < .001), there is a statistically significant negative correlation between the two variables.
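
The confidence interval and t statistic reported by cor.test() can be recomputed by hand from the correlation and the sample size. The sketch below uses the Fisher z transformation, with r taken from the output above and n = 50 inferred from df = 48. It should land very close to the reported values.

```r
r <- -0.5754459   # sample correlation from the output above
n <- 50           # df = n - 2 = 48

# 95% confidence interval via the Fisher z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)
ci

# t statistic for H0: correlation = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
t_stat
```

Squaring r also gives about 0.331, the share of variance a simple linear regression on the same two variables would explain.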

Tests of different specifications of religious concentration

model <-lm(t_ageFM~relcons, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ relcons, data = Junkins)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8183 -0.4097  0.0782  0.5650  1.8037 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 28.27091    0.23707 119.252  < 2e-16 ***
## relcons     -0.04911    0.01007  -4.875 1.23e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9335 on 48 degrees of freedom
## Multiple R-squared:  0.3311, Adjusted R-squared:  0.3172 
## F-statistic: 23.76 on 1 and 48 DF,  p-value: 1.233e-05

Interpretation:

The intercept, 28.27 years, is the predicted average age at first marriage for a state with relcons equal to zero (p < .001). For every one-percentage-point increase in state religious conservatism, the average age at first marriage declines by about .049 years; this is statistically significant at p < .001. The model explains 33% of the variation in average age at first marriage (adjusted R-squared 32%).
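
The fitted line can be turned into predicted ages with simple arithmetic. The sketch below uses the two coefficients from the output above. The relcons values 0, 10 and 30 are illustrative choices, not values taken from the data.

```r
b0 <- 28.27091   # intercept from the output above
b1 <- -0.04911   # slope on relcons from the output above

relcons_values <- c(0, 10, 30)               # illustrative values only
predicted_age  <- b0 + b1 * relcons_values   # fitted line: b0 + b1 * x

data.frame(relcons = relcons_values, predicted_age = predicted_age)
```

A 10-point difference in relcons corresponds to a predicted difference of about 0.49 years (10 * 0.04911).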

model <-lm(t_ageFM~relcons + relconssq, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ relcons + relconssq, data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.81884 -0.41111  0.08564  0.56738  1.80413 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.824e+01  3.623e-01  77.952   <2e-16 ***
## relcons     -4.607e-02  2.878e-02  -1.601    0.116    
## relconssq   -5.175e-05  4.593e-04  -0.113    0.911    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9432 on 47 degrees of freedom
## Multiple R-squared:  0.3313, Adjusted R-squared:  0.3029 
## F-statistic: 11.64 on 2 and 47 DF,  p-value: 7.81e-05
model <-lm(t_ageFM~relcons + relconssq + relconscu, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ relcons + relconssq + relconscu, data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.74980 -0.53189  0.02847  0.54631  1.86677 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.859e+01  5.415e-01  52.791   <2e-16 ***
## relcons     -1.017e-01  7.065e-02  -1.439    0.157    
## relconssq    2.049e-03  2.481e-03   0.826    0.413    
## relconscu   -2.026e-05  2.351e-05  -0.862    0.393    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9458 on 46 degrees of freedom
## Multiple R-squared:  0.3419, Adjusted R-squared:  0.299 
## F-statistic: 7.968 on 3 and 46 DF,  p-value: 0.0002207
model <-lm(t_ageFM~relconsln, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ relconsln, data = Junkins)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6441 -0.5908  0.0393  0.6119  1.9690 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  29.6203     0.5419  54.655  < 2e-16 ***
## relconsln    -0.8412     0.1911  -4.402 5.96e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9633 on 48 degrees of freedom
## Multiple R-squared:  0.2876, Adjusted R-squared:  0.2728 
## F-statistic: 19.38 on 1 and 48 DF,  p-value: 5.958e-05
model <-lm(t_ageFM~relconsrec, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ relconsrec, data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.44794 -0.49354 -0.09158  0.80604  2.17950 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  26.7445     0.2233 119.789  < 2e-16 ***
## relconsrec    6.6906     2.0013   3.343  0.00161 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.028 on 48 degrees of freedom
## Multiple R-squared:  0.1889, Adjusted R-squared:  0.172 
## F-statistic: 11.18 on 1 and 48 DF,  p-value: 0.001612

Interpretation:

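The above models test different specifications of religious concentration as a predictor of the average age at first marriage across the 50 U.S. states. The linear specification fits best: its adjusted R-squared (0.317) is higher than that of the quadratic (0.303), cubic (0.299), logarithmic (0.273), and reciprocal (0.172) models, and neither the squared nor the cubic term is statistically significant.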
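
The same comparison can be set up more compactly by writing the transformations inside the formula and collecting the fit statistics in one table. This is a sketch on simulated data: `toy`, `x` and `y` are hypothetical stand-ins for the Junkins variables, and the numbers it prints have no bearing on the results above.

```r
set.seed(3)

# Simulated data with a truly linear relationship
toy <- data.frame(x = runif(50, 1, 60))
toy$y <- 28 - 0.05 * toy$x + rnorm(50, sd = 0.9)

# I() protects arithmetic inside a formula, so no extra columns are needed
m_lin  <- lm(y ~ x, data = toy)
m_quad <- lm(y ~ x + I(x^2), data = toy)
m_cub  <- lm(y ~ x + I(x^2) + I(x^3), data = toy)
m_log  <- lm(y ~ log(x), data = toy)
m_rec  <- lm(y ~ I(1 / x), data = toy)

# F tests for the nested polynomial models
anova(m_lin, m_quad, m_cub)

# Adjusted R-squared and AIC side by side (same response in every model)
fits <- list(linear = m_lin, quadratic = m_quad, cubic = m_cub,
             log = m_log, reciprocal = m_rec)
comparison <- data.frame(
  adj_r2 = sapply(fits, function(m) summary(m)$adj.r.squared),
  aic    = sapply(fits, AIC)
)
comparison
```

Plain R-squared can never fall when a term is added to a nested model, which is why the adjusted version (or AIC) is the better yardstick here. In the output above, R-squared rises from 0.3311 to 0.3419 across the polynomial models while adjusted R-squared falls.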

Test of mediation

model <-lm(t_ageFM~northeast + relcons, Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ northeast + relcons, data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.72217 -0.50792 -0.02583  0.54625  1.90289 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 28.10361    0.30729  91.457  < 2e-16 ***
## northeast    0.34786    0.40488   0.859 0.394607    
## relcons     -0.04375    0.01187  -3.686 0.000589 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.936 on 47 degrees of freedom
## Multiple R-squared:  0.3415, Adjusted R-squared:  0.3135 
## F-statistic: 12.19 on 2 and 47 DF,  p-value: 5.45e-05

Interpretation:

The objective of the test is to determine whether religious concentration accounts for the association between the northeast region and the average age at first marriage. Once relcons is included, the northeast coefficient falls from 1.13 to 0.35 and is no longer statistically significant (p = .395), while religious concentration remains significant (p < .001). This pattern is consistent with mediation: the higher age at first marriage in the Northeast appears to be explained largely by that region's level of religious concentration.
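
The logic of the mediation check can be written out as a decomposition of the total effect into a direct part and an indirect part that runs through the mediator. The sketch below uses simulated data. `toy` and the effect sizes are made up, and this is the classic regression-based decomposition rather than a formal significance test of the indirect effect.

```r
set.seed(4)
n <- 50

# Simulated data: region shifts the mediator, the mediator drives the outcome
toy <- data.frame(northeast = rep(c(0, 1), times = c(41, 9)))
toy$relcons <- 25 - 15 * toy$northeast + rnorm(n, sd = 8)
toy$age     <- 28 - 0.05 * toy$relcons + rnorm(n, sd = 0.9)

total  <- coef(lm(age ~ northeast, data = toy))["northeast"]       # total effect
a_path <- coef(lm(relcons ~ northeast, data = toy))["northeast"]   # region -> mediator
full   <- coef(lm(age ~ northeast + relcons, data = toy))
direct <- full["northeast"]                                        # direct effect
b_path <- full["relcons"]                                          # mediator -> outcome

# With OLS, total = direct + a * b holds exactly
indirect <- a_path * b_path
c(total = unname(total), direct = unname(direct), indirect = unname(indirect))
```

In the models above, the same bookkeeping gives an indirect component of roughly 1.13 - 0.35 = 0.78 years.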

model <-lm(t_ageFM~northeast + relcons + (northeast*relcons), Junkins)

summary(model)
## 
## Call:
## lm(formula = t_ageFM ~ northeast + relcons + (northeast * relcons), 
##     data = Junkins)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.72617 -0.50502 -0.02841  0.57561  1.89866 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       28.11320    0.30988  90.724  < 2e-16 ***
## northeast         -0.23321    1.06750  -0.218 0.828031    
## relcons           -0.04417    0.01197  -3.689 0.000594 ***
## northeast:relcons  0.11804    0.20041   0.589 0.558754    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9426 on 46 degrees of freedom
## Multiple R-squared:  0.3464, Adjusted R-squared:  0.3038 
## F-statistic: 8.127 on 3 and 46 DF,  p-value: 0.0001898

Interpretation:

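In the model, the interaction term (northeast*relcons) is not statistically significant (p = .559), so there is no evidence that the association between religious concentration and age at first marriage differs between northeastern and other states. Religious concentration remains statistically significant at p < .001, while the northeast coefficient is not significant. With the interaction included, the relcons coefficient is the slope for non-northeastern states, and the northeast coefficient is the regional gap when relcons equals zero.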
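
In an R formula, `a * b` already expands to both main effects plus their interaction, so the longer formula used above and the short form give the same model. The sketch below confirms this on simulated data (`toy` is hypothetical) and shows how the group-specific slopes are read off the coefficients.

```r
set.seed(5)

# Simulated data with no true interaction
toy <- data.frame(northeast = rep(c(0, 1), times = c(41, 9)),
                  relcons   = runif(50, 1, 60))
toy$age <- 28 - 0.05 * toy$relcons + 0.3 * toy$northeast + rnorm(50, sd = 0.9)

m_long  <- lm(age ~ northeast + relcons + (northeast * relcons), data = toy)
m_short <- lm(age ~ northeast * relcons, data = toy)

# Same terms, same estimates
all.equal(coef(m_long), coef(m_short))

# Slope of relcons within each group
b <- coef(m_short)
slope_other     <- unname(b["relcons"])
slope_northeast <- unname(b["relcons"] + b["northeast:relcons"])
c(other = slope_other, northeast = slope_northeast)
```

With the estimates above, the implied slope for northeastern states would be -0.04417 + 0.11804 = 0.07387, but the large standard error on the interaction term means it is estimated very imprecisely.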

Creating a nice summary table

Model.1 <- lm(t_ageFM~northeast, Junkins)
Model.2 <- lm (t_ageFM~northeast + relcons, Junkins)
stargazer(Model.1, Model.2,type="text", 
column.labels = c("Model 1", "Model 2"), 
intercept.bottom = FALSE, 
single.row=FALSE,     
notes.append = FALSE, 
header=FALSE) 
## 
## ================================================================
##                                 Dependent variable:             
##                     --------------------------------------------
##                                       t_ageFM                   
##                            Model 1               Model 2        
##                              (1)                   (2)          
## ----------------------------------------------------------------
## Constant                  27.107***             28.104***       
##                            (0.164)               (0.307)        
##                                                                 
## northeast                 1.132***                0.348         
##                            (0.387)               (0.405)        
##                                                                 
## relcons                                         -0.044***       
##                                                  (0.012)        
##                                                                 
## ----------------------------------------------------------------
## Observations                 50                     50          
## R2                          0.151                 0.341         
## Adjusted R2                 0.133                 0.313         
## Residual Std. Error    1.052 (df = 48)       0.936 (df = 47)    
## F Statistic         8.545*** (df = 1; 48) 12.186*** (df = 2; 47)
## ================================================================
## Note:                                *p<0.1; **p<0.05; ***p<0.01

Interpretation:

Model 1: The average age at first marriage is 28.24 years for the northeast region and 27.11 years for the other regions. The difference between them (1.13 years) is statistically significant at p<.01. Model 2: Once religious concentration is controlled, the northeast coefficient (0.35) is no longer statistically significant, while religious concentration is significant at p<.01. Model 2 fits better than Model 1: it explains 31.3% of the variation in the dependent variable (adjusted R-squared), whereas Model 1 explains only 13.3%. Model 2 also has a lower residual standard error (0.936 versus 1.052), although the standard errors of its individual coefficients are slightly larger.
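
The adjusted R-squared values in the table follow from the plain R-squared, the number of observations, and the number of predictors. The sketch below checks this with a small helper, `adj_r2`, which is a hypothetical name defined here and not part of any package. The R-squared inputs are the four-decimal values from the summary() output earlier.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

adj_model1 <- adj_r2(0.1511, n = 50, p = 1)   # Model 1: northeast only
adj_model2 <- adj_r2(0.3415, n = 50, p = 2)   # Model 2: northeast + relcons

round(c(model1 = adj_model1, model2 = adj_model2), 3)
```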