This analysis focuses on secondary school absences: are age, sex, and weekend alcohol consumption related to the number of absences a student accumulates? The homework being used is Homework 5.
library(tidyverse)
library(Zelig)
library(texreg)
library(mvtnorm)
library(radiant.data)
library(sjmisc)
library(lattice)
library(stargazer)
library(ggplot2)
library(ggthemes)
library(plotly)
library(devtools)
library(readr)
student <- read_csv("/Users/cruz/Desktop/students.csv", col_names = TRUE)
The dependent variable in this analysis is “absences”, the number of absences recorded for each student at this particular secondary school. The independent variables chosen to explain it are age, sex, and Walc (weekend student alcohol consumption).
lm0 <- lm(absences ~ age + sex + Walc, data = student)
summary(lm0)
Call:
lm(formula = absences ~ age + sex + Walc, data = student)
Residuals:
Min 1Q Median 3Q Max
-9.372 -4.618 -1.837 2.405 68.418
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -11.8414 5.1921 -2.281 0.02311 *
age 0.9730 0.3113 3.126 0.00190 **
sexM -1.6428 0.8205 -2.002 0.04595 *
Walc 0.9087 0.3206 2.835 0.00482 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.814 on 391 degrees of freedom
Multiple R-squared: 0.05399, Adjusted R-squared: 0.04673
F-statistic: 7.438 on 3 and 391 DF, p-value: 0.0000743
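As a worked illustration of how the lm0 coefficients combine, the sketch below computes predicted absences by hand from the estimates printed above. The student profile (age 17, Walc rating of 2) and the variable names are hypothetical choices made for illustration, not values taken from the analysis.

```r
# Coefficient estimates copied from the summary(lm0) output above
b_intercept <- -11.8414
b_age       <- 0.9730
b_sexM      <- -1.6428
b_walc      <- 0.9087

# Hypothetical student profile (assumed for illustration only)
age  <- 17
walc <- 2

# Predicted absences for a female (reference group) and a male student
pred_female <- b_intercept + b_age * age + b_walc * walc
pred_male   <- pred_female + b_sexM

round(c(female = pred_female, male = pred_male), 2)
```

Under these assumptions the additive model predicts about 6.5 absences for the female student and about 4.9 for the male student. Because lm0 has no interaction term, the gap between the two equals the sexM coefficient at every age and Walc rating.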
In the following model (lm1), which adds a sex-by-Walc interaction, the effect of age on absences is statistically significant: each additional year of age is associated with 0.993 more absences (p = 0.002). The sexM coefficient indicates that males have 0.916 fewer absences than females when Walc is held at zero, but this difference is not statistically significant (p = 0.588). In the additive model lm0 above, by contrast, males have 1.643 fewer absences and the difference is significant at the 5% level (p = 0.046). For females, the reference group, each one-point increase in the weekend alcohol consumption rating (“Walc”) is associated with 1.107 more absences (p = 0.032). Lastly, the interaction term (sex*Walc) shows that the Walc slope for males is 0.325 lower than the slope for females, which works out to about 0.782 additional absences per rating point. This is a difference in slopes rather than a difference in the likelihood of being absent, and it is not statistically significant (p = 0.623).
lm1 <- lm(absences ~ age + sex*Walc, data = student)
summary(lm1)
Call:
lm(formula = absences ~ age + sex * Walc, data = student)
Residuals:
Min 1Q Median 3Q Max
-9.853 -4.481 -1.762 2.412 68.583
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -12.5606 5.3985 -2.327 0.0205 *
age 0.9928 0.3142 3.160 0.0017 **
sexM -0.9157 1.6899 -0.542 0.5882
Walc 1.1071 0.5151 2.149 0.0322 *
sexM:Walc -0.3251 0.6604 -0.492 0.6227
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.821 on 390 degrees of freedom
Multiple R-squared: 0.05458, Adjusted R-squared: 0.04488
F-statistic: 5.628 on 4 and 390 DF, p-value: 0.0002057
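To make the interaction concrete, the following sketch derives the sex-specific Walc slopes and the male-female gap directly from the lm1 estimates printed above. The helper `sex_gap()` and the Walc values passed to it (1, 3, 5) are hypothetical, chosen only to illustrate how the gap changes with the rating.

```r
# Coefficient estimates copied from the summary(lm1) output above
b_sexM      <- -0.9157
b_walc      <- 1.1071
b_sexM_walc <- -0.3251

# Walc slope for each sex: the interaction term shifts the slope for males
slope_female <- b_walc
slope_male   <- b_walc + b_sexM_walc

# Male minus female difference in predicted absences at a given Walc rating
# (hypothetical helper, not part of the original analysis)
sex_gap <- function(walc) b_sexM + b_sexM_walc * walc

round(c(slope_female = slope_female, slope_male = slope_male), 3)
round(sex_gap(c(1, 3, 5)), 3)
```

In an interaction model the sexM coefficient on its own describes the gap only at Walc = 0, so the gap at any rating of interest has to be computed this way. Neither the sexM term nor the interaction is statistically significant in lm1, so these differences should be read as descriptive only.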
Verifying regression results
library(magrittr)
library(dplyr)
library(sjmisc)
abstudent <- student %>%
  select(absences, sex, Walc) %>%
  group_by(sex, Walc) %>%
  summarise(mean = mean(absences))
head(abstudent)
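The idea behind this check is to compare observed group means with what the regression predicts for the same groups. The sketch below shows that comparison on a small toy data frame. The values in `toy` are invented for illustration and are not taken from students.csv, and the model omits age to keep the example short.

```r
# Toy data: values are invented for illustration, not taken from students.csv
toy <- data.frame(
  sex      = factor(c("F", "F", "F", "F", "M", "M", "M", "M")),
  Walc     = c(1, 1, 2, 2, 1, 1, 2, 2),
  absences = c(4, 6, 8, 10, 2, 4, 5, 7)
)

# Observed mean absences in each sex-by-Walc cell
cell_means <- aggregate(absences ~ sex + Walc, data = toy, FUN = mean)

# Additive model, same form as lm0 but without age
fit <- lm(absences ~ sex + Walc, data = toy)

# Model prediction for each cell, placed next to the observed mean
cell_means$predicted <- predict(fit, newdata = cell_means)

print(cell_means)
```

When the predicted column tracks the observed means closely, the regression summarizes the group pattern well. Large gaps in particular cells would suggest a missing interaction or a nonlinear effect.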
stargazer(lm0, lm1, type = "html")
Dependent variable: absences

| | (1) | (2) |
| --- | --- | --- |
| age | 0.973*** | 0.993*** |
| | (0.311) | (0.314) |
| sexM | -1.643** | -0.916 |
| | (0.820) | (1.690) |
| Walc | 0.909*** | 1.107** |
| | (0.321) | (0.515) |
| sexM:Walc | | -0.325 |
| | | (0.660) |
| Constant | -11.841** | -12.561** |
| | (5.192) | (5.399) |
| Observations | 395 | 395 |
| R2 | 0.054 | 0.055 |
| Adjusted R2 | 0.047 | 0.045 |
| Residual Std. Error | 7.814 (df = 391) | 7.821 (df = 390) |
| F Statistic | 7.438*** (df = 3; 391) | 5.628*** (df = 4; 390) |

Note: *p<0.1; **p<0.05; ***p<0.01
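Model lm0 appears to be the better fit for these data: the interaction term added in lm1 is not statistically significant, and the adjusted R-squared drops slightly, from 0.047 to 0.045. The AIC and BIC comparison below serves as a further check.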
AIC(lm0,lm1)
BIC(lm0,lm1)
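Because lm0 is nested within lm1, the two models can also be compared with a partial F-test. The sketch below reconstructs that test from the R-squared values and degrees of freedom reported in the summaries above. The variable names are hypothetical, and the result is approximate because the reported R-squared values are rounded.

```r
# Values copied from the summary(lm0) and summary(lm1) output above
r2_lm0       <- 0.05399
r2_lm1       <- 0.05458
df_resid_lm1 <- 390  # residual degrees of freedom of lm1
n_added      <- 1    # lm1 adds one term: sexM:Walc

# Partial F statistic for the added interaction term
f_stat <- ((r2_lm1 - r2_lm0) / n_added) / ((1 - r2_lm1) / df_resid_lm1)

# Upper-tail p-value from the F distribution
p_val <- pf(f_stat, n_added, df_resid_lm1, lower.tail = FALSE)

round(c(F = f_stat, p = p_val), 3)
```

The F statistic comes out at roughly 0.24 with a p-value of roughly 0.62, which matches the squared t value of the sexM:Walc term, as expected when a single term is added. With the data loaded, `anova(lm0, lm1)` performs the same comparison directly.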
I plotted some of the variables used in this analysis to understand the relationships among them visually and to check that the regression output is plausible.
student <- student%>%
mutate(sex = as.factor(sex))
library(visreg)
abstudent2 <- lm(absences ~ age + sex + Walc, data=student)
visreg(abstudent2)
I am a very visually driven person, so the purpose of the extra plots is simply to help me understand the data and the variables I chose for this analysis.
In this graph we see that as the age of the students in this secondary school increases, so does the level of weekend alcohol consumption.
ggplot(student) +
  geom_smooth(aes(x = age, y = Walc), color = "cyan", fill = "blue") +
  geom_smooth(aes(x = age, y = Dalc), color = "aquamarine1", fill = "black") +
  theme_solarized()
ggplotly()
This graph suggests that absences and weekend alcohol consumption rise together up to a moderate consumption level, after which, interestingly, the relationship tapers off.
library(ggplot2)
ggplot(student) +
  geom_smooth(aes(x = absences, y = Walc), color = "cyan", fill = "blue") +
  geom_smooth(aes(x = absences, y = Dalc), color = "aquamarine1", fill = "black") +
  theme_dark() +
  scale_colour_stata()
ggplotly()
plot(absences ~ Walc, data = student)
plot(absences ~ age, data = student)
plot(absences ~ sex*Walc, data = student)
studentlm <- lm(absences ~ Walc, data = student)
library(visreg)
visreg(studentlm)
In this particular secondary school, females tend to have more absences than males.
ggplot(student)+
geom_smooth(aes(x = absences, y = sex), color= "cyan", fill = "blue") + theme_dark()
ggplotly()
In this secondary school, absences increase with age.
g2 <- ggplot(student, mapping = aes(x = age, y = absences))
g2 <- g2 + geom_smooth(color = "aqua marine" , fill = "cyan") + theme_dark()
ggplotly(g2)
*Note: I chose to use both static and interactive graphs solely to demonstrate skill with each. The last graph is built step by step, with each piece of code on its own line, to show that I can also construct an interactive graph in separate stages.
(Gershenson, Jacknowitz, and Brannegan 2017) (Chang 2012) (Maindonald and Braun 2010)
Chang, Winston. 2012. R Graphics Cookbook. Sebastopol, CA: O’Reilly Media, Inc.
Gershenson, Seth, Alison Jacknowitz, and Andrew Brannegan. 2017. “Are Student Absences Worth the Worry in U.S. Primary Schools?” Education Finance and Policy. MIT Press.
Maindonald, John, and W John Braun. 2010. Data Analysis and Graphics Using R: An Example-Based Approach. Cambridge: Cambridge University Press.