Here is my screenshot
library(readxl)
library(car)
## Loading required package: carData
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(ggplot2)
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(ggpubr)
library(ggrepel)
Junkins <- read_excel("/Users/briansurratt/Library/CloudStorage/OneDrive-UniversityofTexasatSanAntonio/DEM 7283 Stats II/Homework 1/Junkins Data.xlsx")
Junkins$south<- recode(Junkins$region, "3=1; else=0")
tabyl(Junkins$south)
## Junkins$south n percent
## 0 34 0.68
## 1 16 0.32
Junkins$northeast<- recode(Junkins$region, "1=1; else=0")
tabyl(Junkins$northeast)
## Junkins$northeast n percent
## 0 41 0.82
## 1 9 0.18
Junkins$midwest<- recode(Junkins$region, "2=1; else=0")
tabyl(Junkins$midwest)
## Junkins$midwest n percent
## 0 38 0.76
## 1 12 0.24
Junkins$west<- recode(Junkins$region, "4=1; else=0")
tabyl(Junkins$west)
## Junkins$west n percent
## 0 37 0.74
## 1 13 0.26
Junkins$relconssq<-Junkins$relcons*Junkins$relcons
Junkins$relconsln<-log(Junkins$relcons)
Junkins$relconsrec<-1/(Junkins$relcons)
model <-lm(t_ageFM~northeast + midwest + west, Junkins)
summary(model)
##
## Call:
## lm(formula = t_ageFM ~ northeast + midwest + west, data = Junkins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.43846 -0.55625 0.06563 0.75677 2.23750
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.20625 0.26624 102.188 <2e-16 ***
## northeast 1.03264 0.44373 2.327 0.0244 *
## midwest 0.00625 0.40669 0.015 0.9878
## west -0.31779 0.39765 -0.799 0.4283
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.065 on 46 degrees of freedom
## Multiple R-squared: 0.1657, Adjusted R-squared: 0.1113
## F-statistic: 3.045 on 3 and 46 DF, p-value: 0.03805
Interpretation: The independent variables are dummy variables for region of residence in the United States (northeast, midwest, and west), with the South as the omitted reference category. A multiple linear regression model is estimated in R with age at first marriage as the dependent variable. The coefficient for the northeast is statistically significant (p<0.05): age at first marriage is on average 1.03 years higher in the northeast than in the South. The midwest and west coefficients are not statistically significant, so those regions do not differ detectably from the South.
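To make the reference-category reading concrete, here is a minimal sketch on simulated data (the ages are made up; only the region counts 9, 12, 16, and 13 are taken from the tabyl output above). It assumes the coding 1 = northeast, 2 = midwest, 3 = south, 4 = west, and it uses a factor with the South as the reference level instead of hand-built dummies. The names `region_f` and `group_means` are illustrative only.

```r
# Simulated stand-in for the Junkins data (ages are made up).
set.seed(1)
region <- rep(1:4, times = c(9, 12, 16, 13))  # 1=northeast, 2=midwest, 3=south, 4=west
age    <- 27 + c(1, 0, 0, -0.3)[region] + rnorm(50)

# A factor with the South as reference level builds the three dummies automatically.
region_f <- relevel(
  factor(region, labels = c("northeast", "midwest", "south", "west")),
  ref = "south"
)
fit <- lm(age ~ region_f)

group_means <- tapply(age, region_f, mean)

# The intercept is the mean of the reference region (south).
intercept_is_south_mean <- isTRUE(all.equal(
  unname(coef(fit)["(Intercept)"]), unname(group_means["south"])
))

# Each slope is that region's mean minus the south mean.
ne_is_difference <- isTRUE(all.equal(
  unname(coef(fit)["region_fnortheast"]),
  unname(group_means["northeast"] - group_means["south"])
))

intercept_is_south_mean
ne_is_difference
```

Read this way, the intercept of 27.21 in the output above is the mean age at first marriage in the South, and each coefficient is a difference from that mean.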
model <-lm(t_ageFM~northeast, Junkins)
summary(model)
##
## Call:
## lm(formula = t_ageFM ~ northeast, data = Junkins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.65732 -0.50271 0.01768 0.74268 2.34268
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.1073 0.1642 165.054 < 2e-16 ***
## northeast 1.1316 0.3871 2.923 0.00527 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.052 on 48 degrees of freedom
## Multiple R-squared: 0.1511, Adjusted R-squared: 0.1334
## F-statistic: 8.545 on 1 and 48 DF, p-value: 0.005272
Interpretation: The independent variable is a dummy variable for residence in the northeast region of the United States. A simple linear regression model is estimated in R with age at first marriage as the dependent variable. The coefficient for the northeast is statistically significant (p<0.01): age at first marriage is on average 1.13 years higher in the northeast than in all other regions combined. The adjusted R-squared indicates that residence in the northeast explains 13.34% of the variance in age at first marriage.
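With a single dummy, the comparison group is everything coded 0, so the slope is the difference between the northeast mean and the pooled mean of the other three regions. The sketch below shows this on simulated data (made-up ages; the 9 versus 41 split is taken from the tabyl output) and checks that the slope's t statistic matches an equal-variance two-sample t-test.

```r
set.seed(2)
northeast <- rep(c(1, 0), times = c(9, 41))      # 9 northeast, 41 other
age       <- 27.1 + 1.1 * northeast + rnorm(50)  # made-up ages

fit     <- lm(age ~ northeast)
slope_t <- summary(fit)$coefficients["northeast", "t value"]

# Equal-variance two-sample t-test: northeast vs. everything else
tt <- t.test(age[northeast == 1], age[northeast == 0], var.equal = TRUE)

# The slope is the difference in group means ...
same_diff <- isTRUE(all.equal(
  unname(coef(fit)["northeast"]),
  mean(age[northeast == 1]) - mean(age[northeast == 0])
))
# ... and its t statistic is the pooled-variance t statistic.
same_t <- isTRUE(all.equal(slope_t, unname(tt$statistic)))

same_diff
same_t
```

This is why the northeast coefficient changes between the two models above: in the three-dummy model the comparison is with the South only, while here it is with all non-northeast observations together.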
Questions:
Why are the coefficient and p-value for northeast different between the multiple and the simple regression? When interpreting an IV, which coefficient (simple regression or multiple regression) is the better description of the relationship between an IV and the DV?
What is the difference in the R squared for the simple regression model and the multiple regression model? Which R squared is more meaningful in this case, the one from the simple regression or the multiple regression?
Does the summary table for a simple regression give the correlation between the IV and DV? (I guess not, since this is determined in the next R chunk.)
plot(Junkins$relcons,Junkins$t_ageFM)
cor.test(Junkins$relcons,Junkins$t_ageFM)
##
## Pearson's product-moment correlation
##
## data: Junkins$relcons and Junkins$t_ageFM
## t = -4.8748, df = 48, p-value = 1.233e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7359186 -0.3537616
## sample estimates:
## cor
## -0.5754459
Interpretation: The Pearson correlation between percent very religious and age at first marriage is negative and moderate to strong (r = -0.575, p<0.001). As percent very religious increases, age at first marriage decreases.
model <-lm(t_ageFM~relcons, Junkins)
summary(model)
##
## Call:
## lm(formula = t_ageFM ~ relcons, data = Junkins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8183 -0.4097 0.0782 0.5650 1.8037
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.27091 0.23707 119.252 < 2e-16 ***
## relcons -0.04911 0.01007 -4.875 1.23e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9335 on 48 degrees of freedom
## Multiple R-squared: 0.3311, Adjusted R-squared: 0.3172
## F-statistic: 23.76 on 1 and 48 DF, p-value: 1.233e-05
Interpretation: The independent variable is percent very religious and the dependent variable is age at first marriage. A simple linear regression model is estimated in R. The coefficient for percent very religious is negative and statistically significant (p<0.001): each one-percentage-point increase in percent very religious is associated with a 0.049-year lower age at first marriage. The adjusted R-squared indicates that percent very religious explains 31.72% of the variance in age at first marriage.
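In a simple regression, the Multiple R-squared is the square of the Pearson correlation, and the slope's t statistic and p-value are the same as those from cor.test (t = -4.875, p = 1.233e-05 in both outputs above). A short check, first with the reported correlation and then on simulated data (the variables `x` and `y` are made up):

```r
# Correlation reported by cor.test above
r <- -0.5754459
r^2   # about 0.3311, the Multiple R-squared of t_ageFM ~ relcons

# The same identity on simulated data
set.seed(3)
x   <- runif(50, 5, 50)
y   <- 28 - 0.05 * x + rnorm(50)
fit <- lm(y ~ x)

identity_holds <- isTRUE(all.equal(cor(x, y)^2, summary(fit)$r.squared))

# The correlation can be recovered from the fit: the sign comes from the slope.
r_from_fit <- sign(coef(fit)["x"]) * sqrt(summary(fit)$r.squared)

identity_holds
unname(r_from_fit)
cor(x, y)
```

So the summary table does not print the correlation directly, but it can be recovered as the square root of R-squared with the sign of the slope.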
model <-lm(t_ageFM~relcons + relconssq, Junkins)
summary(model)
##
## Call:
## lm(formula = t_ageFM ~ relcons + relconssq, data = Junkins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.81884 -0.41111 0.08564 0.56738 1.80413
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.824e+01 3.623e-01 77.952 <2e-16 ***
## relcons -4.607e-02 2.878e-02 -1.601 0.116
## relconssq -5.175e-05 4.593e-04 -0.113 0.911
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9432 on 47 degrees of freedom
## Multiple R-squared: 0.3313, Adjusted R-squared: 0.3029
## F-statistic: 11.64 on 2 and 47 DF, p-value: 7.81e-05
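Interpretation: The independent variables are percent very religious and percent very religious squared. The dependent variable is age at first marriage. A multiple linear regression model is estimated in R. In this model, neither coefficient is individually statistically significant, although the model as a whole is (F = 11.64, p<0.001). The squared term adds essentially nothing: R-squared barely changes (0.3313 versus 0.3311 for the linear model) and the adjusted R-squared falls from 0.3172 to 0.3029, so there is no evidence of a curvilinear relationship.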
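A predictor and its square are usually highly correlated, which inflates both standard errors (the standard error of relcons rises from 0.010 to 0.029 once relconssq is added). The sketch below, on simulated data with a truly linear relationship, shows one way to test the squared term with a nested-model F-test; with a single added term, that F equals the square of the term's t statistic. The variables `x` and `y` are made up.

```r
set.seed(4)
x <- runif(50, 5, 50)
y <- 28 - 0.05 * x + rnorm(50)     # truly linear, made-up data

linear    <- lm(y ~ x)
quadratic <- lm(y ~ x + I(x^2))    # I() squares x inside the formula

# Nested F-test: does the squared term improve the fit?
cmp    <- anova(linear, quadratic)
f_stat <- cmp$F[2]

# With one added term, F equals the squared t statistic of that term.
t_sq        <- summary(quadratic)$coefficients["I(x^2)", "t value"]^2
f_equals_t2 <- isTRUE(all.equal(f_stat, t_sq))

cmp
f_equals_t2

# x and x^2 are strongly correlated over this range.
cor(x, x^2)
```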
Questions:
model <-lm(t_ageFM~relconsln, Junkins)
summary(model)
##
## Call:
## lm(formula = t_ageFM ~ relconsln, data = Junkins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6441 -0.5908 0.0393 0.6119 1.9690
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.6203 0.5419 54.655 < 2e-16 ***
## relconsln -0.8412 0.1911 -4.402 5.96e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9633 on 48 degrees of freedom
## Multiple R-squared: 0.2876, Adjusted R-squared: 0.2728
## F-statistic: 19.38 on 1 and 48 DF, p-value: 5.958e-05
Interpretation: The independent variable is the natural logarithm of percent very religious and the dependent variable is age at first marriage. A simple linear regression model is estimated in R. The coefficient for the log of percent very religious is negative and statistically significant (p<0.001): a 1% relative increase in percent very religious is associated with a decline of approximately 0.008 years (-0.8412/100) in age at first marriage. The adjusted R-squared indicates that the log of percent very religious explains 27.28% of the variance in age at first marriage.
Questions:
model <-lm(t_ageFM~relconsrec, Junkins)
summary(model)
##
## Call:
## lm(formula = t_ageFM ~ relconsrec, data = Junkins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.44794 -0.49354 -0.09158 0.80604 2.17950
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.7445 0.2233 119.789 < 2e-16 ***
## relconsrec 6.6906 2.0013 3.343 0.00161 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.028 on 48 degrees of freedom
## Multiple R-squared: 0.1889, Adjusted R-squared: 0.172
## F-statistic: 11.18 on 1 and 48 DF, p-value: 0.001612
Interpretation: The independent variable is the reciprocal of percent very religious and the dependent variable is age at first marriage. A simple linear regression model is estimated in R. The coefficient for the reciprocal is positive and statistically significant (p<0.01). Because the reciprocal shrinks as percent very religious grows, a positive coefficient still means that age at first marriage declines as percent very religious increases. The adjusted R-squared indicates that the reciprocal of percent very religious explains 17.2% of the variance in age at first marriage.
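Because the linear, log, and reciprocal models share the same dependent variable and the same number of parameters, their fit can be compared directly. In the output above the adjusted R-squared is 0.3172 (linear), 0.2728 (log), and 0.172 (reciprocal), so the untransformed predictor fits best. A minimal sketch of such a side-by-side comparison on simulated data (`x`, `y`, and the `comparison` table are made up for illustration):

```r
set.seed(5)
x <- runif(50, 5, 50)
y <- 28 - 0.05 * x + rnorm(50)   # made-up data

# Three functional forms of the same predictor, same dependent variable
fits <- list(
  linear     = lm(y ~ x),
  log        = lm(y ~ log(x)),
  reciprocal = lm(y ~ I(1 / x))
)

comparison <- data.frame(
  adj_r2 = sapply(fits, function(m) summary(m)$adj.r.squared),
  aic    = sapply(fits, AIC)
)

# Best-fitting form first (highest adjusted R-squared, lowest AIC)
comparison[order(-comparison$adj_r2), ]
```

This comparison is only valid while the dependent variable is left untransformed; once the DV itself is logged, R-squared values are no longer on the same scale.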
Questions:
model <-lm(t_ageFM~northeast + relcons, Junkins)
summary(model)
##
## Call:
## lm(formula = t_ageFM ~ northeast + relcons, data = Junkins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.72217 -0.50792 -0.02583 0.54625 1.90289
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.10361 0.30729 91.457 < 2e-16 ***
## northeast 0.34786 0.40488 0.859 0.394607
## relcons -0.04375 0.01187 -3.686 0.000589 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.936 on 47 degrees of freedom
## Multiple R-squared: 0.3415, Adjusted R-squared: 0.3135
## F-statistic: 12.19 on 2 and 47 DF, p-value: 5.45e-05
Interpretation: The independent variables are residence in the northeast and percent very religious. The dependent variable is age at first marriage. A multiple linear regression model is estimated in R. The coefficient for percent very religious is statistically significant (p<0.001), but the coefficient for residence in the northeast is not: once percent very religious is held constant, the northeast difference shrinks from 1.13 to 0.35 years. Holding region constant, a one-percentage-point increase in percent very religious is associated with a decline in age at first marriage of 0.044 years.
Questions:
model <-lm(t_ageFM~northeast + relcons + (northeast*relcons), Junkins)
summary(model)
##
## Call:
## lm(formula = t_ageFM ~ northeast + relcons + (northeast * relcons),
## data = Junkins)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.72617 -0.50502 -0.02841 0.57561 1.89866
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.11320 0.30988 90.724 < 2e-16 ***
## northeast -0.23321 1.06750 -0.218 0.828031
## relcons -0.04417 0.01197 -3.689 0.000594 ***
## northeast:relcons 0.11804 0.20041 0.589 0.558754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9426 on 46 degrees of freedom
## Multiple R-squared: 0.3464, Adjusted R-squared: 0.3038
## F-statistic: 8.127 on 3 and 46 DF, p-value: 0.0001898
Interpretation: The independent variables are residence in the northeast, percent very religious, and their interaction (residence in the northeast multiplied by percent very religious). The dependent variable is age at first marriage. A multiple linear regression model is estimated in R. The coefficient for percent very religious is statistically significant (p<0.001), but neither the coefficient for residence in the northeast nor the interaction term is statistically significant. In this model the relcons coefficient applies to observations outside the northeast (northeast = 0): a one-percentage-point increase in percent very religious is associated with a decline in age at first marriage of 0.044 years. The northeast coefficient (-0.233) is the estimated northeast difference when percent very religious equals zero.
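In an interaction model the slope of relcons differs by group: it is the relcons coefficient when northeast = 0 and the sum of the relcons and interaction coefficients when northeast = 1. With the estimates above that sum is -0.04417 + 0.11804 = 0.074, but it rests on only 9 northeast observations and the interaction's standard error is 0.200. The sketch below uses simulated data (made-up values, no true interaction) to show that `northeast * relcons` already expands to both main effects plus the interaction, how to recover the two slopes, and how to test the interaction with a nested F-test.

```r
set.seed(6)
northeast <- rep(c(1, 0), times = c(9, 41))
relcons   <- runif(50, 5, 50)
age       <- 28 - 0.04 * relcons + 0.3 * northeast + rnorm(50)  # made-up data

main  <- lm(age ~ northeast + relcons)
inter <- lm(age ~ northeast * relcons)  # = northeast + relcons + northeast:relcons
long  <- lm(age ~ northeast + relcons + (northeast * relcons))  # longer spelling

# Both spellings of the formula give the same coefficients.
same_fit <- isTRUE(all.equal(coef(inter), coef(long)))

b <- coef(inter)
slope_other     <- b["relcons"]                           # slope when northeast = 0
slope_northeast <- b["relcons"] + b["northeast:relcons"]  # slope when northeast = 1

same_fit
unname(slope_other)
unname(slope_northeast)

# Nested F-test of the interaction term
anova(main, inter)
```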
Questions:
Model.1 <- lm(t_ageFM~northeast + relcons, Junkins)
Model.2 <- lm (t_ageFM~northeast + relcons + (northeast*relcons), Junkins)
# https://ademos.people.uic.edu/Chapter13.html
stargazer(Model.1, Model.2,type="text",
column.labels = c("Main Effects", "Interaction"),
intercept.bottom = FALSE,
single.row=FALSE,
notes.append = FALSE,
header=FALSE)
##
## ================================================================
## Dependent variable:
## --------------------------------------------
## t_ageFM
## Main Effects Interaction
## (1) (2)
## ----------------------------------------------------------------
## Constant 28.104*** 28.113***
## (0.307) (0.310)
##
## northeast 0.348 -0.233
## (0.405) (1.067)
##
## relcons -0.044*** -0.044***
## (0.012) (0.012)
##
## northeast:relcons 0.118
## (0.200)
##
## ----------------------------------------------------------------
## Observations 50 50
## R2 0.341 0.346
## Adjusted R2 0.313 0.304
## Residual Std. Error 0.936 (df = 47) 0.943 (df = 46)
## F Statistic 12.186*** (df = 2; 47) 8.127*** (df = 3; 46)
## ================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01