Read the dataset, answer some questions, and create a knitted PDF file. Then send it to me.
This data comes from a survey of students at two secondary schools (Gabriel Pereira and Mousinho da Silveira). It contains social, gender, and study-related information about the students. The table below lists the variables and their descriptions.
| Variable | Description |
|---|---|
| school | student’s school (binary: ‘GP’ - Gabriel Pereira or ‘MS’ - Mousinho da Silveira) |
| sex | student’s sex (binary: ‘F’ - female or ‘M’ - male) |
| age | student’s age (numeric: from 15 to 22) |
| address | student’s home address type (binary: ‘U’ - urban or ‘R’ - rural) |
| famsize | family size (binary: ‘LE3’ - less or equal to 3 or ‘GT3’ - greater than 3) |
| Pstatus | parent’s cohabitation status (binary: ‘T’ - living together or ‘A’ - apart) |
| Medu | mother’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education) |
| Fedu | father’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education) |
| Mjob | mother’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’) |
| Fjob | father’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’) |
| reason | reason to choose this school (nominal: close to ‘home’, school ‘reputation’, ‘course’ preference or ‘other’) |
| guardian | student’s guardian (nominal: ‘mother’, ‘father’ or ‘other’) |
| traveltime | Home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour) |
| studytime | Weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours) |
| failures | number of past class failures (numeric: n if 0 <= n < 3, else 3) |
| schoolsup | extra educational support (binary: yes or no) |
| famsup | family educational support (binary: yes or no) |
| paid | extra paid classes within the course subject (Math or Portuguese) (binary: yes or no) |
| activities | extra-curricular activities (binary: yes or no) |
| nursery | attended nursery school (binary: yes or no) |
| higher | wants to take higher education (binary: yes or no) |
| internet | Internet access at home (binary: yes or no) |
| romantic | with a romantic relationship (binary: yes or no) |
| famrel | quality of family relationships (numeric: from 1 - very bad to 5 - excellent) |
| freetime | free time after school (numeric: from 1 - very low to 5 - very high) |
| goout | going out with friends (numeric: from 1 - very low to 5 - very high) |
| Dalc | workday alcohol consumption (numeric: from 1 - very low to 5 - very high) |
| Walc | weekend alcohol consumption (numeric: from 1 - very low to 5 - very high) |
| health | current health status (numeric: from 1 - very bad to 5 - very good) |
| absences | number of school absences (numeric: from 0 to 93) |
| G1 | first period grade in math (numeric: from 0 to 20) |
| G2 | second period grade in math (numeric: from 0 to 20) |
| G3 | final grade in math (numeric: from 0 to 20, output target) |
midterm <- read.csv("./students.csv")
dim(midterm)
## [1] 395 33
nrow(midterm)
## [1] 395
str(midterm)
## 'data.frame': 395 obs. of 33 variables:
## $ school : Factor w/ 2 levels "GP","MS": 1 1 1 1 1 1 1 1 1 1 ...
## $ sex : Factor w/ 2 levels "F","M": 1 1 1 1 1 2 2 1 2 2 ...
## $ age : int 18 17 15 15 16 16 16 17 15 15 ...
## $ address : Factor w/ 2 levels "R","U": 2 2 2 2 2 2 2 2 2 2 ...
## $ famsize : Factor w/ 2 levels "GT3","LE3": 1 1 2 1 1 2 2 1 2 1 ...
## $ Pstatus : Factor w/ 2 levels "A","T": 1 2 2 2 2 2 2 1 1 2 ...
## $ Medu : int 4 1 1 4 3 4 2 4 3 3 ...
## $ Fedu : int 4 1 1 2 3 3 2 4 2 4 ...
## $ Mjob : Factor w/ 5 levels "at_home","health",..: 1 1 1 2 3 4 3 3 4 3 ...
## $ Fjob : Factor w/ 5 levels "at_home","health",..: 5 3 3 4 3 3 3 5 3 3 ...
## $ reason : Factor w/ 4 levels "course","home",..: 1 1 3 2 2 4 2 2 2 2 ...
## $ guardian : Factor w/ 3 levels "father","mother",..: 2 1 2 2 1 2 2 2 2 2 ...
## $ traveltime: int 2 1 1 1 1 1 1 2 1 1 ...
## $ studytime : int 2 2 2 3 2 2 2 2 2 2 ...
## $ failures : int 0 0 3 0 0 0 0 0 0 0 ...
## $ schoolsup : Factor w/ 2 levels "no","yes": 2 1 2 1 1 1 1 2 1 1 ...
## $ famsup : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
## $ paid : Factor w/ 2 levels "no","yes": 1 1 2 2 2 2 1 1 2 2 ...
## $ activities: Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 1 2 ...
## $ nursery : Factor w/ 2 levels "no","yes": 2 1 2 2 2 2 2 2 2 2 ...
## $ higher : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ internet : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
## $ romantic : Factor w/ 2 levels "no","yes": 1 1 1 2 1 1 1 1 1 1 ...
## $ famrel : int 4 5 4 3 4 5 4 4 4 5 ...
## $ freetime : int 3 3 3 2 3 4 4 1 2 5 ...
## $ goout : int 4 3 2 2 2 2 4 4 2 1 ...
## $ Dalc : int 1 1 2 1 1 1 1 1 1 1 ...
## $ Walc : int 1 1 3 1 2 2 1 1 1 1 ...
## $ health : int 3 3 3 5 5 5 3 1 1 5 ...
## $ absences : int 6 4 10 2 4 10 0 6 0 0 ...
## $ G1 : int 5 5 7 15 6 15 12 6 16 14 ...
## $ G2 : int 6 5 8 14 10 15 12 5 18 15 ...
## $ G3 : int 6 6 10 15 10 15 11 6 19 15 ...
table(midterm$sex)
##
## F M
## 208 187
by(midterm$age, midterm$sex, mean)
## midterm$sex: F
## [1] 16.73077
## ------------------------------------------------------------
## midterm$sex: M
## [1] 16.65775
by(midterm$age, midterm$school, mean)
## midterm$school: GP
## [1] 16.52149
## ------------------------------------------------------------
## midterm$school: MS
## [1] 18.02174
prop.table(table(midterm$famsize, midterm$school))
##
## GP MS
## GT3 0.63797468 0.07341772
## LE3 0.24556962 0.04303797
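The proportions above are joint proportions: all four cells sum to 1 across both schools. If the question is instead about the family-size split within each school, a column-wise table may be more useful. The sketch below is only an illustration under stated assumptions: it rebuilds the counts from the printed proportions (proportion times 395 students) rather than reading `students.csv`, and the names `counts`, `joint`, and `within_school` are hypothetical.

```r
# Counts recovered from the joint proportions above (proportion * 395 students).
# Assumption: hard-coded here so the example runs without students.csv.
counts <- matrix(
  c(252, 97, 29, 17),  # filled column by column: GP (GT3, LE3), then MS (GT3, LE3)
  nrow = 2,
  dimnames = list(famsize = c("GT3", "LE3"), school = c("GP", "MS"))
)

# Joint proportions: every cell divided by the grand total (matches the output above).
joint <- prop.table(counts)

# Proportions within each school: every column sums to 1.
within_school <- prop.table(counts, margin = 2)

print(round(joint, 4))
print(round(within_school, 4))
```

With the real data frame, the equivalent call would be `prop.table(table(midterm$famsize, midterm$school), margin = 2)`.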
prop.table(table(midterm$school, midterm$Pstatus))
##
## A T
## GP 0.096202532 0.787341772
## MS 0.007594937 0.108860759
by(midterm$absences, midterm$school, mean)
## midterm$school: GP
## [1] 5.965616
## ------------------------------------------------------------
## midterm$school: MS
## [1] 3.76087
by(midterm$Walc, midterm$internet, mean)
## midterm$internet: no
## [1] 2.257576
## ------------------------------------------------------------
## midterm$internet: yes
## [1] 2.297872
by(midterm$Dalc, midterm$internet, mean)
## midterm$internet: no
## [1] 1.409091
## ------------------------------------------------------------
## midterm$internet: yes
## [1] 1.495441
by(midterm$studytime, midterm$internet, mean)
## midterm$internet: no
## [1] 1.924242
## ------------------------------------------------------------
## midterm$internet: yes
## [1] 2.057751
by(midterm$G3, midterm$romantic, mean)
## midterm$romantic: no
## [1] 10.8365
## ------------------------------------------------------------
## midterm$romantic: yes
## [1] 9.575758
sd(midterm$G1)
## [1] 3.319195
sd(midterm$G2)
## [1] 3.761505
summary(midterm$G3)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 8.00 11.00 10.42 14.00 20.00
by(midterm$G3, midterm$failures, mean)
## midterm$failures: 0
## [1] 11.25321
## ------------------------------------------------------------
## midterm$failures: 1
## [1] 8.12
## ------------------------------------------------------------
## midterm$failures: 2
## [1] 6.235294
## ------------------------------------------------------------
## midterm$failures: 3
## [1] 5.6875
by(midterm$Medu, midterm$address, mean)
## midterm$address: R
## [1] 2.465909
## ------------------------------------------------------------
## midterm$address: U
## [1] 2.830619
by(midterm$Fedu, midterm$address, mean)
## midterm$address: R
## [1] 2.375
## ------------------------------------------------------------
## midterm$address: U
## [1] 2.563518
by(midterm$famrel, midterm$Pstatus, mean)
## midterm$Pstatus: A
## [1] 3.878049
## ------------------------------------------------------------
## midterm$Pstatus: T
## [1] 3.951977
mean(midterm$age)
## [1] 16.6962
sd(midterm$age)
## [1] 1.276043
sd(midterm$age)/(sqrt(nrow(midterm)))
## [1] 0.06420468
95% CI for the mean age: 16.6962 ± 1.96 * 0.06420468, i.e. from 16.6962 - 1.96 * 0.06420468 ≈ 16.57 to 16.6962 + 1.96 * 0.06420468 ≈ 16.82.
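The interval can also be computed in R rather than by hand. This is a minimal sketch, assuming the mean, standard deviation, and sample size printed above are hard-coded instead of read from `students.csv`. Using `qnorm(0.975)` in place of the rounded 1.96 is a choice made here, not something the original does, and the variable names are hypothetical.

```r
# Summary statistics copied from the output above (assumption: hard-coded
# so the example runs without students.csv).
n        <- 395
mean_age <- 16.6962
sd_age   <- 1.276043

# Standard error of the mean.
se_age <- sd_age / sqrt(n)

# Normal critical value for a two-sided 95% interval (about 1.96).
z <- qnorm(0.975)

# Lower and upper bounds of the confidence interval.
ci <- mean_age + c(-1, 1) * z * se_age
print(round(ci, 2))
```

With the data frame loaded, the same calculation would start from `mean(midterm$age)` and `sd(midterm$age)`. Because the sample is large (395 students), a t-based interval gives nearly the same bounds.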