Influences on students' language performance

Section 1 - Dataset description

This section describes basic information about the dataset.

student.por <- read.csv("~/konstanz/R/Final paper/student-por.csv", sep=";")
student.mat <- read.csv("~/konstanz/R/Final paper/student-mat.csv", sep=";")

1 + 2

I obtained this dataset from this website: http://goo.gl/hm2v4B
The data were collected at two Portuguese schools using school reports and questionnaires. I examine student performance in the subject Portuguese language.

3

#The number of rows and columns
nrow(student.por)

## [1] 649

ncol(student.por)

## [1] 33

4

Description of every column. The descriptions are taken from the website http://goo.gl/hm2v4B

#names of columns
names(student.por)

##  [1] "school"     "sex"        "age"        "address"    "famsize"   
##  [6] "Pstatus"    "Medu"       "Fedu"       "Mjob"       "Fjob"      
## [11] "reason"     "guardian"   "traveltime" "studytime"  "failures"  
## [16] "schoolsup"  "famsup"     "paid"       "activities" "nursery"   
## [21] "higher"     "internet"   "romantic"   "famrel"     "freetime"  
## [26] "goout"      "Dalc"       "Walc"       "health"     "absences"  
## [31] "G1"         "G2"         "G3"

school - student’s school (binary: ‘GP’ - Gabriel Pereira or ‘MS’ - Mousinho da Silveira)
sex - student’s sex (binary: ‘F’ - female or ‘M’ - male)
age - student’s age (numeric: from 15 to 22)
address - student’s home address type (binary: ‘U’ - urban or ‘R’ - rural)
famsize - family size (binary: ‘LE3’ - less or equal to 3 or ‘GT3’ - greater than 3)
Pstatus - parent’s cohabitation status (binary: ‘T’ - living together or ‘A’ - apart)
Medu - mother’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
Fedu - father’s education (numeric: 0 - none, 1 - primary education (4th grade), 2 - 5th to 9th grade, 3 - secondary education or 4 - higher education)
Mjob - mother’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’)
Fjob - father’s job (nominal: ‘teacher’, ‘health’ care related, civil ‘services’ (e.g. administrative or police), ‘at_home’ or ‘other’)
reason - reason to choose this school (nominal: close to ‘home’, school ‘reputation’, ‘course’ preference or ‘other’)
guardian - student’s guardian (nominal: ‘mother’, ‘father’ or ‘other’)
traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
failures - number of past class failures (numeric: n if 1<=n<3, else 4)
schoolsup - extra educational support (binary: yes or no)
famsup - family educational support (binary: yes or no)
paid - extra paid classes within the course subject (binary: yes or no)
activities - extra-curricular activities (binary: yes or no)
nursery - attended nursery school (binary: yes or no)
higher - wants to take higher education (binary: yes or no)
internet - Internet access at home (binary: yes or no)
romantic - with a romantic relationship (binary: yes or no)
famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
freetime - free time after school (numeric: from 1 - very low to 5 - very high)
goout - going out with friends (numeric: from 1 - very low to 5 - very high)
Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
health - current health status (numeric: from 1 - very bad to 5 - very good)
absences - number of school absences (numeric: from 0 to 93)
G1 - first period grade of Portuguese (numeric: from 0 to 20)
G2 - second period grade of Portuguese (numeric: from 0 to 20)
G3 - final grade of Portuguese (numeric: from 0 to 20, output target)

Section 2 - My questions

How did grades change from the first period to the final grade?
How does the final grade vary with students' age?
Is there a difference between boys and girls?
Are students in a romantic relationship better at Portuguese than others?
Is there an association between absences and final grades?

Section 3 - Analyses

1) How did grades change from the first period to the final grade?

First, I want to know the mean grade for each period.

#TASK 2
mean(student.por$G1)

## [1] 11.39908

mean(student.por$G2)

## [1] 11.57011

mean(student.por$G3)

## [1] 11.90601

I also want to know the median.

#TASK 2
median(student.por$G1)

## [1] 11

median(student.por$G2)

## [1] 11

median(student.por$G3)

## [1] 12

What is the standard deviation for each period?

#TASK 2
sd(student.por$G1)

## [1] 2.745265

sd(student.por$G2)

## [1] 2.913639

sd(student.por$G3)

## [1] 3.230656

I see that there is little difference between the periods, but students had slightly better final grades than first-period grades.
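The three statistics above can also be collected into a single table. The following is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical and stand in for `student.por`); it only illustrates the pattern and does not reproduce the figures above.

```r
# Toy data frame standing in for student.por (values are made up for illustration)
toy <- data.frame(
  G1 = c(10, 12, 9, 14, 11),
  G2 = c(11, 12, 10, 13, 11),
  G3 = c(11, 13, 10, 14, 12)
)

# One row per statistic, one column per grading period
grade.summary <- sapply(
  toy[, c("G1", "G2", "G3")],
  function(x) c(mean = mean(x), median = median(x), sd = sd(x))
)
grade.summary
```

With the real data, replacing `toy` by `student.por` should give the same means, medians and standard deviations as the separate calls above, side by side.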

I create a new column with the difference between G3 and G1.

student.por$dif <- student.por$G3 - student.por$G1

I calculate the mean and standard deviation of this difference.

mean(student.por$dif)

## [1] 0.5069337

sd(student.por$dif)

## [1] 1.820756

The average difference is about 0.5 points.
Is this difference significant?
I use a t-test.

#TASK 3
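# Note: alternative = "greater" tests whether mean(G1) > mean(G3)
test.1 <- t.test(x = student.por$G1,
                 y = student.por$G3,
                 alternative = "greater",
                 paired = FALSE)  # unpaired (Welch) test, as in the output below
test.1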

## 
##  Welch Two Sample t-test
## 
## data:  student.por$G1 and student.por$G3
## t = -3.0462, df = 1263.1, p-value = 0.9988
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -0.7808648        Inf
## sample estimates:
## mean of x mean of y 
##  11.39908  11.90601

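Results in APA style:
t(1263.1) = -3.0462, p = 0.9988 (one-sided, alternative: G1 greater than G3)
There is no evidence that first-period grades are higher than final grades. Because the alternative was G1 > G3, this test cannot show an increase; the negative t value points the other way, and the opposite one-sided p-value would be 1 - 0.9988 = 0.0012. Since both grades come from the same students, a paired test would also be more appropriate than the Welch test used here.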
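Because G1 and G3 are measured on the same students, a paired test on the differences is worth considering alongside the Welch test above. The following is a minimal sketch that uses only the mean and standard deviation of `dif` reported earlier and the number of rows; the variable names are my own, and it is a check on the arithmetic rather than a replacement for running `t.test(..., paired = TRUE)` on the data.

```r
# Summary figures copied from the output above
n        <- 649        # number of students (rows of student.por)
mean.dif <- 0.5069337  # mean of G3 - G1
sd.dif   <- 1.820756   # standard deviation of G3 - G1

# Paired t statistic: mean difference divided by its standard error
se.dif    <- sd.dif / sqrt(n)
t.paired  <- mean.dif / se.dif
df.paired <- n - 1

# Two-sided p-value from the t distribution
p.two.sided <- 2 * pt(-abs(t.paired), df = df.paired)

round(t.paired, 2)  # about 7.09
p.two.sided
```

A t value of this size would suggest that the average increase of about 0.5 points is unlikely to be due to chance alone, which the unpaired one-sided test cannot show.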

I would like to know whether there is an association between first-period and final grades. I create a scatterplot.

plot(x = student.por$G1,
     y = student.por$G3,
     main = "G1 and G3",
     xlab = "Grades 1",
     ylab = "Grades 3",
     col = "red")

It looks like there is a correlation, so I calculate it.

#Task 4
test.2 <- cor.test(~ G1 + G3, 
                    data = student.por)
test.2

## 
##  Pearson's product-moment correlation
## 
## data:  G1 and G3
## t = 37.329, df = 647, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.8003265 0.8493311
## sample estimates:
##       cor 
## 0.8263871

Results in APA style:
r(647) = 0.83, t(647) = 37.329, p < 2.2e-16
There is a strong positive correlation between first-period grades and final grades.
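The t value printed by `cor.test` follows directly from the correlation coefficient and the sample size, t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below recomputes it from the figures in the output above; the variable names are my own.

```r
# Figures copied from the cor.test output above
r <- 0.8263871  # sample Pearson correlation between G1 and G3
n <- 649        # number of students

# t statistic for testing that the true correlation is zero
df       <- n - 2
t.from.r <- r * sqrt(df) / sqrt(1 - r^2)

# Share of variance in G3 that is linearly associated with G1
r.squared <- r^2

round(t.from.r, 2)   # about 37.33
round(r.squared, 3)  # about 0.683
```

In other words, roughly two thirds of the variance in final grades is shared with first-period grades.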

2) How does the final grade vary with students' age?

What is the mean final grade for each age?

#Task 9
with(student.por, aggregate(G3 ~ age, FUN = mean))

##   age       G3
## 1  15 12.10714
## 2  16 11.99435
## 3  17 12.26816
## 4  18 11.77143
## 5  19  9.53125
## 6  20 12.00000
## 7  21 11.00000
## 8  22  5.00000

The lowest mean final grade is at age 22. But how many students are 22 years old?

table(student.por$age)

## 
##  15  16  17  18  19  20  21  22 
## 112 177 179 140  32   6   2   1

Just one student.
I create a plot of the distribution.

#Task8
boxplot(G3 ~ age,
        data = student.por,
        ylim = c(0, 20),
        ylab = "Final Grade",
        xlab = "Age",
        main = "Final Grade by Age")

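In the boxplot, students aged 17 to 18 appear to have slightly better grades than students aged 15 to 16, although the group means for ages 15 to 18 are all close to 12. The older students show more varied results. This could be because they have been in school longer than usual and because there are only a few of them.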
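Since the oldest age groups are very small, it can help to pool them before comparing. The sketch below combines the per-age means and counts from the two tables above into one mean for ages 15 to 18 and one for ages 19 to 22; the cut-off at 19 and the variable names are my own choices for illustration.

```r
# Per-age counts and mean final grades, copied from the tables above
ages   <- 15:22
counts <- c(112, 177, 179, 140, 32, 6, 2, 1)
means  <- c(12.10714, 11.99435, 12.26816, 11.77143, 9.53125, 12, 11, 5)

# Pool ages 19 and above (assumed cut-off) using count-weighted means
older        <- ages >= 19
mean.younger <- weighted.mean(means[!older], counts[!older])
mean.older   <- weighted.mean(means[older], counts[older])
n.older      <- sum(counts[older])

round(c(younger = mean.younger, older = mean.older), 2)
n.older
```

The pooled older group contains only 41 of the 649 students, so its lower mean rests on much less data than the mean of the younger group.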

3) Is there a difference between boys and girls?

First, I calculate the average final grade for boys and girls.

#Task 9
with(student.por, aggregate(G3 ~ sex, FUN = mean))

##   sex       G3
## 1   F 12.25326
## 2   M 11.40602

I create a subset for each sex.

grades.boys <- subset(student.por, subset = sex == "M")$G3 
grades.girl <- subset(student.por, subset = sex == "F")$G3

The mean grade of girls is higher than the mean grade of boys.
I create a histogram of final grades for girls.

#Task 7
hist(x = grades.girl,
     xlab = "Final Grades by girls",
     main = "Histogram of final grades by girls")
abline(v = mean(grades.girl),
       col = rgb(0, 0, 1, alpha = 1),
       lty = 2)
       
abline(v = median(grades.girl),
       col = rgb(1, 0, 0, alpha = 1),
       lty = 1)

# Legend uses the same line types as the two vertical lines above
legend("bottomright",
       legend = c("Mean", "Median"),
       col = c("blue", "red"),
       lty = c(2, 1))

And a histogram for boys.

#Task 7
hist(x = grades.boys,
     xlab = "Final Grades by boys",
     main = "Histogram of final grades by boys")
abline(v = mean(grades.boys),
       col = rgb(0, 0, 1, alpha = 1),
       lty = 2)
       
abline(v = median(grades.boys),
       col = rgb(1, 0, 0, alpha = 1),
       lty = 1)

# Legend uses the same line types as the two vertical lines above
legend("bottomright",
       legend = c("Mean", "Median"),
       col = c("blue", "red"),
       lty = c(2, 1))

Is the difference between the means significant? I calculate a t-test.

#Task 3
test.4 <- t.test(x = grades.boys,
                 y = grades.girl, 
                 alternative = "greater")
test.4

## 
##  Welch Two Sample t-test
## 
## data:  grades.boys and grades.girl
## t = -3.2747, df = 547.44, p-value = 0.9994
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -1.273535       Inf
## sample estimates:
## mean of x mean of y 
##  11.40602  12.25326

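Results in APA style:
t(547.44) = -3.2747, p = 0.001125
The one-sided test shown above (alternative: boys greater than girls) gives p = 0.9994, so on its own it only shows that boys do not score higher. The p = 0.001125 reported here is consistent with the two-sided version of the same test. Girls have a significantly higher mean final grade than boys.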
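The p-value in the printed test output (0.9994) and the one in the APA line (0.001125) differ because they answer different questions. The sketch below derives the three possible p-values from the reported t value and degrees of freedom; the variable names are my own.

```r
# Figures copied from the Welch test output above
t.value <- -3.2747
df      <- 547.44

# alternative = "greater": boys' mean is greater than girls' mean
p.boys.greater <- pt(t.value, df = df, lower.tail = FALSE)

# alternative = "less": boys' mean is lower than girls' mean
p.girls.greater <- pt(t.value, df = df)

# alternative = "two.sided": the means differ in either direction
p.two.sided <- 2 * pt(-abs(t.value), df = df)

round(p.boys.greater, 4)  # about 0.9994, as printed by t.test above
p.girls.greater
p.two.sided
```

Only the last two values speak to the question of whether girls score higher; calling `t.test` with `alternative = "less"` or `alternative = "two.sided"` should report them directly.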

4) Are students in a romantic relationship better at Portuguese than others?

What is the mean final grade for students with and without a romantic relationship?

with(student.por, aggregate(G3 ~ romantic, FUN = mean))

##   romantic       G3
## 1       no 12.12927
## 2      yes 11.52301

Students without a romantic relationship have higher grades.
I think that the values “yes” and “no” are too long, so I recode them.

#TASK1
student.por$romantic <- as.character(student.por$romantic) #first I need to convert the vector to character
student.por$romantic[student.por$romantic == "yes"] <- "y" 
student.por$romantic[student.por$romantic == "no"] <- "n"
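The same recoding can be done in one step with `factor()` and its `labels` argument. This is a minimal sketch on a made-up vector (`romantic` here is a hypothetical stand-in for `student.por$romantic`), shown as an alternative rather than as the method used in this paper.

```r
# Toy vector standing in for student.por$romantic (values are made up)
romantic <- c("yes", "no", "no", "yes", "no")

# Map each level to a shorter label in a single call
romantic.short <- factor(romantic,
                         levels = c("no", "yes"),
                         labels = c("n", "y"))

table(romantic.short)
as.character(romantic.short)
```

Keeping the result as a factor has the side benefit that the order of the groups is fixed in later tables and plots.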

I create a subset for each group.

grades.with <- subset(student.por, subset = romantic == "y")$G3 
grades.without <- subset(student.por, subset = romantic == "n")$G3

Is the difference significant?

#TASK 3
test.5 <- t.test(x = grades.without,
                 y = grades.with, 
                 alternative = "greater")
test.5

## 
##  Welch Two Sample t-test
## 
## data:  grades.without and grades.with
## t = 2.2129, df = 433.04, p-value = 0.01371
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  0.1546587       Inf
## sample estimates:
## mean of x mean of y 
##  12.12927  11.52301

Results in APA style:
t(433.04) = 2.2129, p = 0.01371 (one-sided)
Students without a romantic relationship have significantly higher final grades.

5) Is there an association between absences and final grades?

First, I want to see the mean final grade for each number of absences.

with(student.por, aggregate(G3 ~ absences, FUN = mean))

##    absences        G3
## 1         0 12.040984
## 2         1 12.416667
## 3         2 12.190909
## 4         3 10.428571
## 5         4 12.010753
## 6         5 11.750000
## 7         6 12.122449
## 8         7 13.000000
## 9         8 11.619048
## 10        9  9.714286
## 11       10 12.238095
## 12       11 11.200000
## 13       12 10.083333
## 14       13 14.000000
## 15       14 10.375000
## 16       15 11.000000
## 17       16 10.300000
## 18       18 12.333333
## 19       21 11.500000
## 20       22  8.000000
## 21       24  9.000000
## 22       26  8.000000
## 23       30 16.000000
## 24       32 14.000000

table(student.por$absences)

## 
##   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  18 
## 244  12 110   7  93  12  49   3  42   7  21   5  12   1   8   2  10   3 
##  21  22  24  26  30  32 
##   2   2   1   1   1   1

Is there a linear relationship between the number of absences and final grades? I fit a regression, create a plot with the regression line, and use the summary function.

#TASK 5
absences.lm <- lm(G3 ~ absences, data=student.por)
# absences on the x-axis and G3 on the y-axis, matching the model G3 ~ absences
with(student.por, plot(absences, G3))
abline(absences.lm, col="blue")

summary(absences.lm)

## 
## Call:
## lm(formula = G3 ~ absences, data = student.por)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.1388  -1.8207  -0.1388   1.9884   7.1157 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 12.13880    0.16099  75.399   <2e-16 ***
## absences    -0.06361    0.02725  -2.334   0.0199 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.22 on 647 degrees of freedom
## Multiple R-squared:  0.00835,    Adjusted R-squared:  0.006817 
## F-statistic: 5.448 on 1 and 647 DF,  p-value: 0.0199

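Results in APA style:
b = -0.064, t(647) = -2.334, p = 0.0199; F(1, 647) = 5.448, R-squared = 0.008
The number of absences is a significant but very weak negative linear predictor of the final grade: it explains less than 1% of the variance.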
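To make the size of the effect concrete, the sketch below recomputes the F statistic and R-squared from the slope's t value and uses the fitted coefficients to predict the final grade at a few absence counts. All figures are copied from the `summary(absences.lm)` output above; the variable names and the chosen absence counts are my own.

```r
# Coefficients and test statistics copied from the regression summary above
intercept <- 12.13880
slope     <- -0.06361
t.slope   <- -2.334
df.resid  <- 647

# With a single predictor, F equals the squared t value of the slope
f.from.t <- t.slope^2

# R-squared expressed through t and the residual degrees of freedom
r2.from.t <- t.slope^2 / (t.slope^2 + df.resid)

# Predicted final grade for 0, 10 and 30 absences
predicted <- intercept + slope * c(0, 10, 30)

round(f.from.t, 2)
round(r2.from.t, 4)
round(predicted, 2)
```

Under this model, ten additional absences correspond to a drop of only about 0.6 grade points, which fits the very small R-squared.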

Section 4 - Conclusion

I examined what influences student performance in the Portuguese language. A one-sided test gave no evidence that first-period grades are higher than final grades; on average, final grades were about 0.5 points higher. There is a strong positive correlation between first-period grades and final grades. I did not find a clear association between final grades and student age. This could be due to the small number of students older than 18. Moreover, these students have been at school longer than is common, and there may be particular reasons why they are still at high school. I found that girls have higher final grades than boys and that students without a romantic relationship have higher final grades than students in a romantic relationship. I examined the association between absences and final grades and found a significant but very weak negative linear relationship.
There are many influences on students' language performance, and this area should be examined further.

Final Paper

Julie Melicharová

February 2016