This is my initial attempt at looking at the Portuguese student data.
#Read in data:
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
setwd("D:/UTS/36103 - Statistical Thinking for Data Science/36103 - Assessment 2/Part B")
port = read.csv("student-por.csv", header = TRUE, sep = ",")
str(port)
## 'data.frame': 649 obs. of 33 variables:
## $ school : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
## $ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 1 2 2 ...
## $ age : int 18 17 15 15 16 16 16 17 15 15 ...
## $ address : Factor w/ 2 levels "R","U": 2 2 2 2 2 2 2 2 2 2 ...
## $ famsize : Factor w/ 2 levels "GT3","LE3": 1 1 2 1 1 2 2 1 2 1 ...
## $ Pstatus : Factor w/ 2 levels "A","T": 1 2 2 2 2 2 2 1 1 2 ...
## $ Medu : int 4 1 1 4 3 4 2 4 3 3 ...
## $ Fedu : int 4 1 1 2 3 3 2 4 2 4 ...
## $ Mjob : Factor w/ 5 levels "at_home","health",..: 1 1 1 2 3 4 3 3 4 3 ...
## $ Fjob : Factor w/ 5 levels "at_home","health",..: 5 3 3 4 3 3 3 5 3 3 ...
## $ reason : Factor w/ 4 levels "course","home",..: 1 1 3 2 2 4 2 2 2 2 ...
## $ guardian : Factor w/ 3 levels "father","mother",..: 2 1 2 2 1 2 2 2 2 2 ...
## $ traveltime: int 2 1 1 1 1 1 1 2 1 1 ...
## $ studytime : int 2 2 2 3 2 2 2 2 2 2 ...
## $ failures : int 0 0 0 0 0 0 0 0 0 0 ...
## $ schoolsup : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 2 1 1 ...
## $ famsup : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
## $ fatherd : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ activities: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 2 ...
## $ nursery : Factor w/ 2 levels "no","yes": 2 1 2 2 2 2 2 2 2 2 ...
## $ higher : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ internet : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
## $ romantic : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
## $ famrel : int 4 5 4 3 4 5 4 4 4 5 ...
## $ freetime : int 3 3 3 2 3 4 4 1 2 5 ...
## $ goout : int 4 3 2 2 2 2 4 4 2 1 ...
## $ Dalc : int 1 1 2 1 1 1 1 1 1 1 ...
## $ Walc : int 1 1 3 1 2 2 1 1 1 1 ...
## $ health : int 3 3 3 5 5 5 3 1 1 5 ...
## $ absences : int 4 2 6 0 0 6 0 2 0 0 ...
## $ G1 : int 0 9 12 14 11 12 13 10 15 12 ...
## $ G2 : int 11 11 13 14 13 12 12 13 16 12 ...
## $ G3 : int 11 11 12 14 13 13 13 13 17 13 ...
summary(port)
## school sex age address famsize Pstatus
## GP:423 F:383 Min. :15.00 R:197 GT3:457 A: 80
## MS:226 M:266 1st Qu.:16.00 U:452 LE3:192 T:569
## Median :17.00
## Mean :16.74
## 3rd Qu.:18.00
## Max. :22.00
## Medu Fedu Mjob Fjob
## Min. :0.000 Min. :0.000 at_home :135 at_home : 42
## 1st Qu.:2.000 1st Qu.:1.000 health : 48 health : 23
## Median :2.000 Median :2.000 other :258 other :367
## Mean :2.515 Mean :2.307 services:136 services:181
## 3rd Qu.:4.000 3rd Qu.:3.000 teacher : 72 teacher : 36
## Max. :4.000 Max. :4.000
## reason guardian traveltime studytime
## course :285 father:153 Min. :1.000 Min. :1.000
## home :149 mother:455 1st Qu.:1.000 1st Qu.:1.000
## other : 72 other : 41 Median :1.000 Median :2.000
## reputation:143 Mean :1.569 Mean :1.931
## 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :4.000 Max. :4.000
## failures schoolsup famsup fatherd activities nursery
## Min. :0.0000 no :581 no :251 no :610 no :334 no :128
## 1st Qu.:0.0000 yes: 68 yes:398 yes: 39 yes:315 yes:521
## Median :0.0000
## Mean :0.2219
## 3rd Qu.:0.0000
## Max. :3.0000
## higher internet romantic famrel freetime
## no : 69 no :151 no :410 Min. :1.000 Min. :1.00
## yes:580 yes:498 yes:239 1st Qu.:4.000 1st Qu.:3.00
## Median :4.000 Median :3.00
## Mean :3.931 Mean :3.18
## 3rd Qu.:5.000 3rd Qu.:4.00
## Max. :5.000 Max. :5.00
## goout Dalc Walc health
## Min. :1.000 Min. :1.000 Min. :1.00 Min. :1.000
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.00 1st Qu.:2.000
## Median :3.000 Median :1.000 Median :2.00 Median :4.000
## Mean :3.185 Mean :1.502 Mean :2.28 Mean :3.536
## 3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:3.00 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.00 Max. :5.000
## absences G1 G2 G3
## Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.:10.0 1st Qu.:10.00 1st Qu.:10.00
## Median : 2.000 Median :11.0 Median :11.00 Median :12.00
## Mean : 3.659 Mean :11.4 Mean :11.57 Mean :11.91
## 3rd Qu.: 6.000 3rd Qu.:13.0 3rd Qu.:13.00 3rd Qu.:14.00
## Max. :32.000 Max. :19.0 Max. :19.00 Max. :19.00
head(port)
## school sex age address famsize Pstatus Medu Fedu Mjob Fjob
## 1 GP F 18 U GT3 A 4 4 at_home teacher
## 2 GP F 17 U GT3 T 1 1 at_home other
## 3 GP F 15 U LE3 T 1 1 at_home other
## 4 GP F 15 U GT3 T 4 2 health services
## 5 GP F 16 U GT3 T 3 3 other other
## 6 GP M 16 U LE3 T 4 3 services other
## reason guardian traveltime studytime failures schoolsup famsup
## 1 course mother 2 2 0 yes no
## 2 course father 1 2 0 no yes
## 3 other mother 1 2 0 yes no
## 4 home mother 1 3 0 no yes
## 5 home father 1 2 0 no yes
## 6 reputation mother 1 2 0 no yes
## fatherd activities nursery higher internet romantic famrel freetime
## 1 no no yes yes no no 4 3
## 2 no no no yes yes no 5 3
## 3 no no yes yes yes no 4 3
## 4 no yes yes yes yes yes 3 2
## 5 no no yes yes no no 4 3
## 6 no yes yes yes yes no 5 4
## goout Dalc Walc health absences G1 G2 G3
## 1 4 1 1 3 4 0 11 11
## 2 3 1 1 3 2 9 11 11
## 3 2 2 3 3 6 12 13 12
## 4 2 1 1 5 0 14 14 14
## 5 2 1 2 5 0 11 13 13
## 6 2 1 2 5 6 12 12 13
#Clean dataset for R
#Medu and Fedu are both factor variables according to the data dictionary:
port$Medu = as.factor(port$Medu)
port$Fedu = as.factor(port$Fedu)
#Travel time, study time are both factors:
port$traveltime = as.factor(port$traveltime)
port$studytime = as.factor(port$studytime)
#famrel, freetime, goout, Dalc, Walc and health are all factors as well:
port$famrel = as.factor(port$famrel)
port$freetime = as.factor(port$freetime)
port$goout = as.factor(port$goout)
port$Dalc = as.factor(port$Dalc)
port$Walc = as.factor(port$Walc)
port$health = as.factor(port$health)
str(port)
## 'data.frame': 649 obs. of 33 variables:
## $ school : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
## $ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 1 2 2 ...
## $ age : int 18 17 15 15 16 16 16 17 15 15 ...
## $ address : Factor w/ 2 levels "R","U": 2 2 2 2 2 2 2 2 2 2 ...
## $ famsize : Factor w/ 2 levels "GT3","LE3": 1 1 2 1 1 2 2 1 2 1 ...
## $ Pstatus : Factor w/ 2 levels "A","T": 1 2 2 2 2 2 2 1 1 2 ...
## $ Medu : Factor w/ 5 levels "0","1","2","3",..: 5 2 2 5 4 5 3 5 4 4 ...
## $ Fedu : Factor w/ 5 levels "0","1","2","3",..: 5 2 2 3 4 4 3 5 3 5 ...
## $ Mjob : Factor w/ 5 levels "at_home","health",..: 1 1 1 2 3 4 3 3 4 3 ...
## $ Fjob : Factor w/ 5 levels "at_home","health",..: 5 3 3 4 3 3 3 5 3 3 ...
## $ reason : Factor w/ 4 levels "course","home",..: 1 1 3 2 2 4 2 2 2 2 ...
## $ guardian : Factor w/ 3 levels "father","mother",..: 2 1 2 2 1 2 2 2 2 2 ...
## $ traveltime: Factor w/ 4 levels "1","2","3","4": 2 1 1 1 1 1 1 2 1 1 ...
## $ studytime : Factor w/ 4 levels "1","2","3","4": 2 2 2 3 2 2 2 2 2 2 ...
## $ failures : int 0 0 0 0 0 0 0 0 0 0 ...
## $ schoolsup : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 2 1 1 ...
## $ famsup : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
## $ fatherd : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ activities: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 2 ...
## $ nursery : Factor w/ 2 levels "no","yes": 2 1 2 2 2 2 2 2 2 2 ...
## $ higher : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ internet : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
## $ romantic : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
## $ famrel : Factor w/ 5 levels "1","2","3","4",..: 4 5 4 3 4 5 4 4 4 5 ...
## $ freetime : Factor w/ 5 levels "1","2","3","4",..: 3 3 3 2 3 4 4 1 2 5 ...
## $ goout : Factor w/ 5 levels "1","2","3","4",..: 4 3 2 2 2 2 4 4 2 1 ...
## $ Dalc : Factor w/ 5 levels "1","2","3","4",..: 1 1 2 1 1 1 1 1 1 1 ...
## $ Walc : Factor w/ 5 levels "1","2","3","4",..: 1 1 3 1 2 2 1 1 1 1 ...
## $ health : Factor w/ 5 levels "1","2","3","4",..: 3 3 3 5 5 5 3 1 1 5 ...
## $ absences : int 4 2 6 0 0 6 0 2 0 0 ...
## $ G1 : int 0 9 12 14 11 12 13 10 15 12 ...
## $ G2 : int 11 11 13 14 13 12 12 13 16 12 ...
## $ G3 : int 11 11 12 14 13 13 13 13 17 13 ...
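The repeated `as.factor()` calls above could also be written as a single step. The following is only a minimal sketch: `toy` is a made-up stand-in for `port` (its values are invented), and `factor_cols` is a hypothetical name for the list of integer-coded columns that the data dictionary treats as categorical.

```r
# Toy stand-in for `port`; the values are invented for illustration only.
toy <- data.frame(
  Medu = c(4L, 1L, 1L, 4L),
  studytime = c(2L, 2L, 3L, 1L),
  G3 = c(11L, 11L, 12L, 14L)
)

# Columns stored as integers but categorical in meaning (assumed list).
factor_cols <- c("Medu", "studytime")

# Convert all listed columns in one step; other columns are left unchanged.
toy[factor_cols] <- lapply(toy[factor_cols], as.factor)

str(toy)
```

For the real data, `factor_cols` would presumably hold the ten columns converted above (Medu, Fedu, traveltime, studytime, famrel, freetime, goout, Dalc, Walc, health).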
Examine correlations between variables:
library(corrplot)
## Warning: package 'corrplot' was built under R version 3.1.3
#G3 vs G2
qplot(data = port, x = G3, y = G2)
#G3 vs G1
qplot(data = port, x = G3, y = G1)
#G3 vs studytime
qplot(data = port, x = G3, y = studytime)
ggplot(data = port, aes(x = studytime, y = G3)) + geom_boxplot()
#G3 vs schoolsup
ggplot(data = port, aes(x = schoolsup, y = G3)) + geom_boxplot()
#G3 vs Fedu and Medu
ggplot(data = port, aes(x = Fedu, y = G3)) + geom_boxplot()
ggplot(data = port, aes(x = Medu, y = G3)) + geom_boxplot()
#G3 vs School
ggplot(data = port, aes(x = school, y = G3)) + geom_boxplot()
#G3 vs Sex
ggplot(data = port, aes(x = sex, y = G3)) + geom_boxplot()
#G3 vs age
ggplot(data = port, aes(x = age, y = G3)) + geom_point() + geom_smooth()
## Warning in loop_apply(n, do.ply): pseudoinverse used at 17
## Warning in loop_apply(n, do.ply): neighborhood radius 1
## Warning in loop_apply(n, do.ply): reciprocal condition number 0
## Warning in loop_apply(n, do.ply): pseudoinverse used at 17
## Warning in loop_apply(n, do.ply): neighborhood radius 1
## Warning in loop_apply(n, do.ply): reciprocal condition number 0
ggplot(data = port, aes(x = as.factor(age), y = G3)) + geom_boxplot()
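The plots above look at one pair of variables at a time. A correlation matrix over the numeric columns gives the same kind of overview in a single table. This is a hedged sketch on simulated data rather than on `port`: the `toy` data frame and its values are invented, and the `corrplot` call is optional and only runs if that package is installed. After the factor conversions, the numeric columns left in `port` would be age, failures, absences, G1, G2 and G3.

```r
# Toy stand-in for the numeric columns of `port`; values are simulated.
set.seed(1)
g1 <- round(rnorm(50, mean = 11, sd = 3))
toy <- data.frame(
  G1 = g1,
  G2 = g1 + round(rnorm(50, sd = 1)),
  absences = rpois(50, lambda = 4)
)
toy$G3 <- toy$G2 + round(rnorm(50, sd = 1))

# Keep only numeric columns, then compute pairwise Pearson correlations.
num_cols <- vapply(toy, is.numeric, logical(1))
cor_mat <- cor(toy[, num_cols])
print(round(cor_mat, 2))

# Optional: draw the matrix if the corrplot package is available.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_mat, method = "circle")
}
```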
Some simple linear models:
model = lm(G3 ~ studytime + sex + school + failures, data = port)
model2 = lm(G3 ~ ., data = port)
model3 = lm(G3 ~ G2 + G1, data = port)
#all but G2 and G1
model4 = lm(G3 ~ . - G2 - G1, data = port)
summary(model)
##
## Call:
## lm(formula = G3 ~ studytime + sex + school + failures, data = port)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.5978 -1.5978 -0.0682 1.6735 7.9607
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.4702 0.2680 46.525 < 2e-16 ***
## studytime2 0.6930 0.2597 2.668 0.00781 **
## studytime3 1.4922 0.3593 4.153 3.72e-05 ***
## studytime4 1.2731 0.5191 2.453 0.01445 *
## sexM -0.5585 0.2342 -2.385 0.01736 *
## schoolMS -1.5653 0.2368 -6.609 8.11e-11 ***
## failures -1.8367 0.1893 -9.704 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.809 on 642 degrees of freedom
## Multiple R-squared: 0.251, Adjusted R-squared: 0.244
## F-statistic: 35.85 on 6 and 642 DF, p-value: < 2.2e-16
summary(model2)
##
## Call:
## lm(formula = G3 ~ ., data = port)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.5265 -0.5037 0.0514 0.5846 5.1000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.854560 1.250375 0.683 0.494600
## schoolMS -0.185519 0.134305 -1.381 0.167711
## sexM -0.153675 0.121364 -1.266 0.205938
## age 0.020864 0.050095 0.416 0.677213
## addressU 0.119170 0.126645 0.941 0.347108
## famsizeLE3 0.040244 0.118088 0.341 0.733381
## PstatusT -0.005103 0.168697 -0.030 0.975880
## Medu1 -0.067572 0.554680 -0.122 0.903083
## Medu2 -0.109891 0.557499 -0.197 0.843808
## Medu3 -0.120981 0.565243 -0.214 0.830596
## Medu4 -0.278399 0.582215 -0.478 0.632708
## Fedu1 -0.407540 0.516947 -0.788 0.430809
## Fedu2 -0.292705 0.521354 -0.561 0.574720
## Fedu3 -0.290418 0.531760 -0.546 0.585177
## Fedu4 -0.260922 0.547096 -0.477 0.633597
## Mjobhealth 0.220224 0.264257 0.833 0.404978
## Mjobother -0.086282 0.146571 -0.589 0.556311
## Mjobservices 0.178138 0.180333 0.988 0.323648
## Mjobteacher 0.193490 0.253749 0.763 0.446057
## Fjobhealth -0.465222 0.365295 -1.274 0.203333
## Fjobother -0.318400 0.222592 -1.430 0.153138
## Fjobservices -0.457871 0.233865 -1.958 0.050729 .
## Fjobteacher -0.478839 0.335551 -1.427 0.154113
## reasonhome -0.113850 0.136652 -0.833 0.405109
## reasonother -0.353286 0.175400 -2.014 0.044454 *
## reasonreputation -0.173352 0.143267 -1.210 0.226776
## guardianmother -0.033753 0.128565 -0.263 0.792999
## guardianother 0.278752 0.256589 1.086 0.277764
## traveltime2 0.056036 0.119714 0.468 0.639902
## traveltime3 0.256481 0.206166 1.244 0.213984
## traveltime4 0.799538 0.351333 2.276 0.023227 *
## studytime2 0.173826 0.128056 1.357 0.175175
## studytime3 0.149322 0.177163 0.843 0.399660
## studytime4 0.056590 0.252628 0.224 0.822832
## failures -0.264892 0.102630 -2.581 0.010095 *
## schoolsupyes -0.193687 0.178631 -1.084 0.278688
## famsupyes 0.072967 0.110509 0.660 0.509335
## fatherdyes -0.160310 0.222389 -0.721 0.471291
## activitiesyes 0.021380 0.107987 0.198 0.843124
## nurseryyes -0.081564 0.131832 -0.619 0.536362
## higheryes 0.131153 0.189172 0.693 0.488397
## internetyes 0.095779 0.133404 0.718 0.473072
## romanticyes -0.024472 0.111866 -0.219 0.826912
## famrel2 0.010282 0.380464 0.027 0.978450
## famrel3 0.046733 0.321044 0.146 0.884315
## famrel4 0.035458 0.302463 0.117 0.906718
## famrel5 -0.072927 0.308297 -0.237 0.813092
## freetime2 0.131331 0.235913 0.557 0.577952
## freetime3 -0.032387 0.216783 -0.149 0.881291
## freetime4 -0.015481 0.230980 -0.067 0.946588
## freetime5 -0.127955 0.266493 -0.480 0.631307
## goout2 0.086207 0.223291 0.386 0.699583
## goout3 0.135375 0.217983 0.621 0.534822
## goout4 0.174081 0.232015 0.750 0.453379
## goout5 -0.134331 0.245601 -0.547 0.584626
## Dalc2 -0.176773 0.155852 -1.134 0.257167
## Dalc3 0.238108 0.249063 0.956 0.339463
## Dalc4 -0.950798 0.348980 -2.725 0.006634 **
## Dalc5 0.125173 0.407571 0.307 0.758864
## Walc2 0.031406 0.144265 0.218 0.827743
## Walc3 -0.128354 0.164042 -0.782 0.434270
## Walc4 0.019873 0.205035 0.097 0.922819
## Walc5 -0.010430 0.307608 -0.034 0.972963
## health2 -0.098271 0.204852 -0.480 0.631610
## health3 -0.139902 0.184673 -0.758 0.449019
## health4 -0.098856 0.190099 -0.520 0.603246
## health5 -0.249850 0.169188 -1.477 0.140284
## absences 0.012684 0.011975 1.059 0.289920
## G1 0.131879 0.038462 3.429 0.000649 ***
## G2 0.860129 0.036163 23.785 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.254 on 579 degrees of freedom
## Multiple R-squared: 0.8653, Adjusted R-squared: 0.8492
## F-statistic: 53.9 on 69 and 579 DF, p-value: < 2.2e-16
summary(model3)
##
## Call:
## lm(formula = G3 ~ G2 + G1, data = port)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5408 -0.4380 -0.0942 0.6296 5.7109
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.17128 0.21510 -0.796 0.426
## G2 0.89714 0.03392 26.448 <2e-16 ***
## G1 0.14890 0.03600 4.136 4e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.262 on 646 degrees of freedom
## Multiple R-squared: 0.8478, Adjusted R-squared: 0.8473
## F-statistic: 1799 on 2 and 646 DF, p-value: < 2.2e-16
summary(model4)
##
## Call:
## lm(formula = G3 ~ . - G2 - G1, data = port)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.2964 -1.3756 0.1337 1.6140 7.4313
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.363722 2.569709 2.866 0.004313 **
## schoolMS -1.209373 0.275001 -4.398 1.30e-05 ***
## sexM -0.699385 0.251905 -2.776 0.005674 **
## age 0.179789 0.103634 1.735 0.083299 .
## addressU 0.428819 0.264192 1.623 0.105103
## famsizeLE3 0.319969 0.246284 1.299 0.194394
## PstatusT 0.260558 0.352268 0.740 0.459806
## Medu1 0.080943 1.158579 0.070 0.944326
## Medu2 -0.073680 1.164607 -0.063 0.949577
## Medu3 0.172797 1.180822 0.146 0.883706
## Medu4 0.420433 1.215789 0.346 0.729610
## Fedu1 -0.905772 1.078986 -0.839 0.401553
## Fedu2 -0.519260 1.089048 -0.477 0.633683
## Fedu3 -0.617996 1.110448 -0.557 0.578064
## Fedu4 -0.246567 1.142449 -0.216 0.829202
## Mjobhealth 0.596969 0.551885 1.082 0.279839
## Mjobother -0.003492 0.306011 -0.011 0.990899
## Mjobservices 0.480087 0.376099 1.276 0.202293
## Mjobteacher 0.139466 0.529621 0.263 0.792388
## Fjobhealth -0.661926 0.763202 -0.867 0.386135
## Fjobother 0.035329 0.464560 0.076 0.939407
## Fjobservices -0.432429 0.488351 -0.885 0.376260
## Fjobteacher 0.653145 0.698759 0.935 0.350320
## reasonhome -0.096305 0.285113 -0.338 0.735652
## reasonother -0.538288 0.366164 -1.470 0.142083
## reasonreputation 0.103345 0.299057 0.346 0.729791
## guardianmother -0.372298 0.267928 -1.390 0.165199
## guardianother 0.158394 0.536027 0.295 0.767720
## traveltime2 0.150974 0.249925 0.604 0.546029
## traveltime3 0.527474 0.430539 1.225 0.221016
## traveltime4 -0.542620 0.730965 -0.742 0.458185
## studytime2 0.371038 0.267005 1.390 0.165174
## studytime3 0.965242 0.368079 2.622 0.008961 **
## studytime4 0.952045 0.523856 1.817 0.069674 .
## failures -1.470348 0.206756 -7.112 3.39e-12 ***
## schoolsupyes -1.147799 0.368400 -3.116 0.001926 **
## famsupyes -0.116344 0.230734 -0.504 0.614287
## fatherdyes -0.343111 0.463672 -0.740 0.459607
## activitiesyes 0.229514 0.225314 1.019 0.308798
## nurseryyes -0.138799 0.275319 -0.504 0.614356
## higheryes 1.621485 0.388541 4.173 3.46e-05 ***
## internetyes 0.197505 0.278668 0.709 0.478768
## romanticyes -0.368523 0.232836 -1.583 0.114020
## famrel2 0.362155 0.792338 0.457 0.647791
## famrel3 0.715030 0.670082 1.067 0.286380
## famrel4 1.215949 0.629529 1.932 0.053905 .
## famrel5 0.798496 0.642861 1.242 0.214702
## freetime2 0.575861 0.492363 1.170 0.242647
## freetime3 -0.183998 0.452889 -0.406 0.684690
## freetime4 -0.017855 0.482573 -0.037 0.970498
## freetime5 -0.066388 0.556748 -0.119 0.905124
## goout2 1.535577 0.461489 3.327 0.000932 ***
## goout3 1.130435 0.452953 2.496 0.012847 *
## goout4 0.924381 0.483488 1.912 0.056380 .
## goout5 0.531734 0.512206 1.038 0.299643
## Dalc2 -0.436419 0.325391 -1.341 0.180374
## Dalc3 0.146139 0.519983 0.281 0.778775
## Dalc4 -2.591048 0.724535 -3.576 0.000378 ***
## Dalc5 -0.416208 0.850094 -0.490 0.624600
## Walc2 -0.117762 0.300903 -0.391 0.695672
## Walc3 -0.208697 0.342708 -0.609 0.542786
## Walc4 -0.334184 0.428009 -0.781 0.435245
## Walc5 0.331736 0.642477 0.516 0.605814
## health2 -0.427290 0.427684 -0.999 0.318172
## health3 -0.840411 0.384427 -2.186 0.029204 *
## health4 -0.359019 0.396713 -0.905 0.365848
## health5 -1.057482 0.351130 -3.012 0.002711 **
## absences -0.031042 0.024876 -1.248 0.212589
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.621 on 581 degrees of freedom
## Multiple R-squared: 0.4098, Adjusted R-squared: 0.3417
## F-statistic: 6.02 on 67 and 581 DF, p-value: < 2.2e-16
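In the model that includes all terms (model2), G2 and G1 basically subsume the explanatory power of the other variables: adding every other predictor raises the adjusted R-squared only from 0.8473 (model3) to 0.8492 (model2).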
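One way to check a claim like this more formally is a nested-model F-test with `anova()`, which asks whether the extra terms in the larger model improve the fit beyond what the smaller model achieves. The sketch below uses simulated data, not `port`: the `toy` data frame, its coefficients and the names `small` and `full` are all made up for illustration.

```r
# Simulated stand-in for `port`; all values and effects are invented.
set.seed(42)
n <- 200
toy <- data.frame(
  G1 = rnorm(n, mean = 11, sd = 3),
  sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  absences = rpois(n, lambda = 4)
)
toy$G2 <- toy$G1 + rnorm(n)
toy$G3 <- 0.9 * toy$G2 + 0.1 * toy$G1 + rnorm(n)

# `small` is nested inside `full` (every term in `small` is also in `full`).
small <- lm(G3 ~ G2 + G1, data = toy)
full <- lm(G3 ~ ., data = toy)

# F-test of whether the additional terms in `full` improve on `small`.
print(anova(small, full))

# Adjusted R-squared penalises the extra parameters in `full`.
adj_r2 <- c(small = summary(small)$adj.r.squared,
            full = summary(full)$adj.r.squared)
print(adj_r2)
```

On the real data the equivalent call would be `anova(model3, model2)`.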
Model 4 suggests that the variables studytime, higher, failures, schoolsup, school, sex, goout, Dalc and health may be explanatory.
model5 = lm(G3 ~ studytime + higher + failures + schoolsup + school + sex + goout + Dalc + health, data = port)
summary(model5)
##
## Call:
## lm(formula = G3 ~ studytime + higher + failures + schoolsup +
## school + sex + goout + Dalc + health, data = port)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.0229 -1.5593 -0.0456 1.5956 8.2900
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.89443 0.60382 18.042 < 2e-16 ***
## studytime2 0.41754 0.25426 1.642 0.101056
## studytime3 1.08549 0.35003 3.101 0.002014 **
## studytime4 1.04300 0.49893 2.090 0.036978 *
## higheryes 1.86448 0.37113 5.024 6.61e-07 ***
## failures -1.48500 0.18792 -7.902 1.23e-14 ***
## schoolsupyes -1.34503 0.34968 -3.846 0.000132 ***
## schoolMS -1.56371 0.22891 -6.831 1.99e-11 ***
## sexM -0.45171 0.23355 -1.934 0.053547 .
## goout2 1.28523 0.44833 2.867 0.004286 **
## goout3 0.93228 0.43288 2.154 0.031645 *
## goout4 0.88889 0.45538 1.952 0.051384 .
## goout5 0.40716 0.47055 0.865 0.387212
## Dalc2 -0.63029 0.28190 -2.236 0.025712 *
## Dalc3 -0.04713 0.44807 -0.105 0.916259
## Dalc4 -2.63277 0.66734 -3.945 8.87e-05 ***
## Dalc5 -0.44183 0.69016 -0.640 0.522286
## health2 -0.13815 0.41490 -0.333 0.739274
## health3 -0.73109 0.37450 -1.952 0.051364 .
## health4 -0.15474 0.38581 -0.401 0.688505
## health5 -0.85973 0.33514 -2.565 0.010541 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.66 on 628 degrees of freedom
## Multiple R-squared: 0.343, Adjusted R-squared: 0.3221
## F-statistic: 16.4 on 20 and 628 DF, p-value: < 2.2e-16
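The coefficient table reports one p-value per factor level, so a variable such as goout or health can look significant at some levels and not at others. A term-level view can help here: `drop1()` with `test = "F"` tests each predictor as a whole by removing it from the model. This is only a sketch on simulated data: the `toy` data frame and the effect sizes are invented, and in the simulation health is deliberately given no effect on the grade.

```r
# Simulated stand-in for `port`; all values and effects are invented.
set.seed(7)
n <- 300
toy <- data.frame(
  studytime = factor(sample(1:4, n, replace = TRUE)),
  health = factor(sample(1:5, n, replace = TRUE)),
  failures = rpois(n, lambda = 0.3)
)
# Only studytime and failures affect the simulated grade; health does not.
toy$G3 <- 11 + 0.5 * as.integer(toy$studytime) -
  1.5 * toy$failures + rnorm(n, sd = 2)

fit <- lm(G3 ~ studytime + health + failures, data = toy)

# One F-test per term: each factor is tested as a whole, not level by level.
term_tests <- drop1(fit, test = "F")
print(term_tests)
```

On the real data the equivalent call would be `drop1(model5, test = "F")`.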