library(tidyverse)
library(Zelig)
library(texreg)
library(mvtnorm)
library(radiant.data)
library(sjmisc)
library(lattice)
library(readr)
student <- read_csv("/Users/cruz/Desktop/students.csv", col_names = TRUE)
Parsed with column specification:
cols(
.default = col_character(),
age = col_integer(),
Medu = col_integer(),
Fedu = col_integer(),
traveltime = col_integer(),
studytime = col_integer(),
failures = col_integer(),
famrel = col_integer(),
freetime = col_integer(),
goout = col_integer(),
Dalc = col_integer(),
Walc = col_integer(),
health = col_integer(),
absences = col_integer(),
G1 = col_integer(),
G2 = col_integer(),
G3 = col_integer()
)
See spec(...) for full column specifications.
head(student)
The dependent variable chosen in this analysis explains some of the underlying reason for “absences” in this particular secondary school. The independent variables chosen are age, sex, and Walc(Weekend Student Alcohol Consumption).
lm0 <- lm(absences ~ age + sex + Walc, data = student)
summary(lm0)
Call:
lm(formula = absences ~ age + sex + Walc, data = student)
Residuals:
Min 1Q Median 3Q Max
-9.372 -4.618 -1.837 2.405 68.418
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -11.8414 5.1921 -2.281 0.02311 *
age 0.9730 0.3113 3.126 0.00190 **
sexM -1.6428 0.8205 -2.002 0.04595 *
Walc 0.9087 0.3206 2.835 0.00482 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.814 on 391 degrees of freedom
Multiple R-squared: 0.05399, Adjusted R-squared: 0.04673
F-statistic: 7.438 on 3 and 391 DF, p-value: 0.0000743
As observed in the following model, the impact age has on absences is statistically significant. For every year “age” increase, absences go up by (.992). The data also displays that among sexes, males have (0.91) fewer absences than females in this particular school, it is important to note that this was not statistically significant. The independent variable “Walc” (weekend alcohol consumption) displays that as weekend alcohol consumption rating increased, absences increased by (1.107). Lastly, when the interaction term was introduced (sex*Walc) the data displayed that males who engaged in weekend alcohol consumption were (-0.325) less likely than females who engaged in weekend alcohol consumption to be absent but it is important to note that this interaction was not statistically significant.
lm1 <- lm(absences ~ age + sex*Walc, data = student)
summary(lm1)
Call:
lm(formula = absences ~ age + sex * Walc, data = student)
Residuals:
Min 1Q Median 3Q Max
-9.853 -4.481 -1.762 2.412 68.583
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -12.5606 5.3985 -2.327 0.0205 *
age 0.9928 0.3142 3.160 0.0017 **
sexM -0.9157 1.6899 -0.542 0.5882
Walc 1.1071 0.5151 2.149 0.0322 *
sexM:Walc -0.3251 0.6604 -0.492 0.6227
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.821 on 390 degrees of freedom
Multiple R-squared: 0.05458, Adjusted R-squared: 0.04488
F-statistic: 5.628 on 4 and 390 DF, p-value: 0.0002057
Verifying regression results
library(magrittr)
library(dplyr)
library(sjmisc)
abstudent <- student%>%
select(absences, sex, Walc)%>%
group_by(sex, Walc)%>%
summarise(mean = mean(absences))
head(abstudent)
Screenreg used just to see output of models in R.
library(texreg)
screenreg(list(lm0, lm1))
=================================
Model 1 Model 2
---------------------------------
(Intercept) -11.84 * -12.56 *
(5.19) (5.40)
age 0.97 ** 0.99 **
(0.31) (0.31)
sexM -1.64 * -0.92
(0.82) (1.69)
Walc 0.91 ** 1.11 *
(0.32) (0.52)
sexM:Walc -0.33
(0.66)
---------------------------------
R^2 0.05 0.05
Adj. R^2 0.05 0.04
Num. obs. 395 395
RMSE 7.81 7.82
=================================
*** p < 0.001, ** p < 0.01, * p < 0.05
library(texreg)
htmlreg(list(lm0, lm1))
Model 1 | Model 2 | ||
---|---|---|---|
(Intercept) | -11.84* | -12.56* | |
(5.19) | (5.40) | ||
age | 0.97** | 0.99** | |
(0.31) | (0.31) | ||
sexM | -1.64* | -0.92 | |
(0.82) | (1.69) | ||
Walc | 0.91** | 1.11* | |
(0.32) | (0.52) | ||
sexM:Walc | -0.33 | ||
(0.66) | |||
R2 | 0.05 | 0.05 | |
Adj. R2 | 0.05 | 0.04 | |
Num. obs. | 395 | 395 | |
RMSE | 7.81 | 7.82 | |
p < 0.001, p < 0.01, p < 0.05 |
I went and plotted some of the variables being used to visually understand some of the relationships occuring in this analysis and also to verify visually that the output was correct.
student <- student%>%
mutate(sex = as.factor(sex))
library(visreg)
package ‘visreg’ was built under R version 3.4.1
abstudent2 <- lm(absences ~ age + sex + Walc, data=student)
visreg(abstudent2)
As a very visually driven person the purpose of the extra plots is to simply help me visually understand the data and variables I chose for this analysis.
ggplot(student)+
geom_smooth(aes(x = age, y = Walc), color= "cyan", fill = "blue") + geom_smooth(aes(x = age, y = Dalc), color= "aqua marine1", fill = "black")
library(ggplot2)
ggplot(student)+
geom_smooth(aes(x = absences, y = Walc), color= "cyan", fill = "blue") + geom_smooth(aes(x = absences, y = Dalc), color= "Aqua Marine1", fill = "black")
plot(absences ~ Walc, data = student)
plot(absences ~ age, data = student)
plot(absences ~ sex*Walc, data = student)
studentlm <- lm(absences ~ Walc, data = student)
library(visreg)
visreg(studentlm)
ggplot(student)+
geom_smooth(aes(x = absences, y = sex), color= "cyan", fill = "blue")