---
title: "Visualizations of Differences in Student Academic Achievement"
author: "Sangita Roy"
date: "April 15, 2018"
output: html_notebook
---

## **Introduction**

For this assignment, I used the "Students' Academic Performance Data set" from Kaggle (Amrieh, Hamtini, & Aljarah, 2015, 2016). This educational data set was collected from a learning management system (LMS) called Kalboard 360, using a learner activity tracker tool called the Experience API (xAPI). The data set contains 480 student records, of which 305 are male and 175 are female. Its 16 variables fall into three major categories: (1) demographic features such as gender and nationality; (2) academic background features such as educational stage, grade level, and section; and (3) behavioral features such as raising hands in class, viewing resources, parents answering surveys, and school satisfaction.


## **Research Question and Hypotheses**
The purpose of this analysis is to examine how males and females differ in their use of resources, hand raising, and academic performance. I created visualizations using ggplot extensions to show the relationships among several variables that I hypothesize affect student success in school.

I expect students who visit resources most often to obtain high marks, because revisiting the material refreshes their memory of the course content. Students who are serious about performing well in a course will keep returning to the resources to learn as much as possible about the course requirements and assignments, so the number of resource visits should be positively associated with grade level. I also expect students who raise their hands most often to have high grades, because the number of questions students ask or answer in class reflects their level of understanding of the subject. I will also investigate the strength of the relationship between the number of times students raised their hands in class and the number of times they visited resources.

Discussion groups are another variable of interest, because participation is an important part of the overall grade. I want to see how participating in discussion groups and viewing announcements relate to overall grades for males and females. Lastly, I will explore the relationship between participation in discussion groups and raised hands among males and females.


## Defining Variables

**grades1**: performance level of the student, derived from the `Class` column (ordered categorical dependent variable)

The students are classified into three numerical intervals based on their total grade/mark:

Low-Level: interval includes values from 0 to 69,

Middle-Level: interval includes values from 70 to 89,

High-Level: interval includes values from 90 to 100.


**visitedresources**: how many times the student visits course content (numeric: 0-100)

**raisedhands**: how many times the student raises his/her hand in the classroom (numeric: 0-100)

**Discussion**: how many times the student participates in discussion groups (numeric: 0-100)

**gender**: whether the student is male or female
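
The three grade intervals defined above can be illustrated with base R's `cut()`. This is only a sketch: the `marks` vector is hypothetical (the Kaggle file stores the L/M/H class, not the raw mark), and whole-number marks are assumed so that the boundaries 69/70 and 89/90 behave as described.

```{r}
# Hypothetical total marks; the data set itself only provides the L/M/H class
marks <- c(45, 69, 70, 89, 90, 100)

# Bin marks into the three intervals: 0-69 (L), 70-89 (M), 90-100 (H)
grade_level <- cut(marks,
                   breaks = c(0, 69, 89, 100),
                   labels = c("L", "M", "H"),
                   include.lowest = TRUE,   # keep a mark of 0 in the lowest bin
                   ordered_result = TRUE)   # ordered factor: L < M < H

grade_level
```

Because the result is an ordered factor, comparisons such as `grade_level < "H"` are meaningful, which matches how `grades1` is treated later in the notebook.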


## Importing and previewing the data set
```{r message=FALSE, warning=FALSE}
library(readr)
# Read the Kaggle CSV from its local path, then preview the first rows and count the records
Student_Academic_Data <- read_csv("C:\\Users\\Sangita Roy\\Desktop\\Student_Academic_Data.csv")
head(Student_Academic_Data)
nrow(Student_Academic_Data)
```
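
A quick sanity check after import can confirm that the file matches the counts quoted in the introduction (480 records, 305 male and 175 female). The helper below, `check_student_data`, is a hypothetical name introduced for illustration, and it assumes the `gender` column is coded `"M"`/`"F"`, as the filters later in this notebook suggest.

```{r}
# Hypothetical helper: compare a data frame against the counts stated in the introduction
check_student_data <- function(df, expected_rows = 480,
                               expected_gender = c(F = 175, M = 305)) {
  gender_counts <- table(df$gender)
  list(
    rows_ok   = nrow(df) == expected_rows,
    # isTRUE() turns a missing gender code (NA comparison) into FALSE
    gender_ok = isTRUE(all(gender_counts[names(expected_gender)] == expected_gender))
  )
}

# Run the check only if the import chunk has already been executed
if (exists("Student_Academic_Data")) {
  print(check_student_data(Student_Academic_Data))
}
```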


## Cleaning up the data set
```{r message=FALSE, warning=FALSE}
library(dplyr)
# Standardize column names and convert the categorical columns to factors
Student_Data1 <- Student_Academic_Data %>%
  rename(Nationality = NationalITy,
         visitedresources = VisITedResources,
         Grades = Class) %>%
  mutate(grades1 = factor(Grades, ordered = TRUE, levels = c("L", "M", "H")),  # ordered Low < Middle < High
         gender = as.factor(gender),
         StudentAbsenceDays = as.factor(StudentAbsenceDays),
         StageID = as.factor(StageID),
         Relation = as.factor(Relation))

head(Student_Data1)
```

## Viewing the differences in student performance based on the average number of hands raised in class
```{r message=FALSE, warning=FALSE}
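library(ggplot2)
# Average within each grade level; mean() inside aes() would use the overall mean for every row
# (fun = requires ggplot2 >= 3.3.0; older versions use fun.y)
ggplot(data = Student_Data1, aes(x = grades1, y = raisedhands, fill = grades1)) +
  stat_summary(fun = mean, geom = "bar")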
```

### Grouping the data by gender and grades

The data are summarized by the average number of hands raised, announcements viewed, participation in discussion groups, and visited resources.

```{r}
student2 <- Student_Data1 %>%
  group_by(gender, grades1) %>%
  summarize(N = n(),
            raisedhands = mean(raisedhands),
            AnnouncementsView = mean(AnnouncementsView),
            discussion = mean(Discussion),
            resources = mean(visitedresources))
print(student2)
```

### Gender differences in academic performance, number of times hands raised, announcements viewed, participation in discussion groups, and viewed resources
```{r}
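# Share of each grade level within female students
studentF <- student2 %>%
  filter(gender == "F") %>%
  mutate(percent = N / sum(N) * 100)
print(studentF)

# Share of each grade level within male students
studentM <- student2 %>%
  filter(gender == "M") %>%
  mutate(percent = N / sum(N) * 100)
print(studentM)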
```
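
The same within-gender percentages can also be computed in a single pipeline, without splitting the data into separate female and male tables. The sketch below is one possible approach, not a replacement for the chunk above; `grade_share_by_gender` is a hypothetical helper name, and it assumes the input has `gender` and `grades1` columns.

```{r message=FALSE, warning=FALSE}
library(dplyr)

# Hypothetical helper: count students per gender and grade level,
# then express each count as a percentage of that gender's total
grade_share_by_gender <- function(df) {
  df %>%
    count(gender, grades1, name = "N") %>%
    group_by(gender) %>%
    mutate(percent = N / sum(N) * 100) %>%
    ungroup()
}

# Apply it to the cleaned data if the earlier chunks have been run
if (exists("Student_Data1")) {
  print(grade_share_by_gender(Student_Data1))
}
```

Keeping both genders in one table makes it easier to compare the percentages side by side and to pass them to a single faceted plot.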

### Gender differences in student academic performance
```{r fig.height=6, fig.width=8, message=FALSE, warning=FALSE}
library(ggplot2)
library(cowplot)
femalegrades<-ggplot(data=studentF)+geom_col(aes(x=grades1, y=percent, fill=grades1))

malegrades<-ggplot(data=studentM)+geom_col(aes(x=grades1, y=percent, fill=grades1))
plot_grid(femalegrades, malegrades, labels=c("females", "males"))
```
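
Because `plot_grid()` draws two independent plots, each panel gets its own y-axis scale, which can make the bars harder to compare across genders. A possible alternative is to stack the two tables and facet by gender so that both panels share one axis. This is a sketch under assumptions: `plot_grade_share` is a hypothetical helper, and the axis labels are my own wording.

```{r message=FALSE, warning=FALSE}
library(ggplot2)

# Hypothetical helper: one faceted bar chart with a shared y-axis
# Expects columns gender, grades1, and percent
plot_grade_share <- function(df) {
  ggplot(df) +
    geom_col(aes(x = grades1, y = percent, fill = grades1)) +
    facet_wrap(~gender) +
    labs(x = "Grade level", y = "Percent of students within gender")
}

# Combine the female and male summaries if they exist, then plot them together
if (exists("studentF") && exists("studentM")) {
  print(plot_grade_share(dplyr::bind_rows(studentF, studentM)))
}
```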


### Gender differences in student performance based on the relationship between the number of raised hands and visited resources
```{r message=FALSE, warning=FALSE}
library(ggplot2)
library(ggthemes)
ggplot(data=Student_Data1)+
  geom_point(aes(x=raisedhands, y=visitedresources, color=grades1))+
  facet_wrap(~gender)+
  theme_solarized()
```
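
The scatter plot shows the direction of the relationship, but its strength can also be summarized with a correlation coefficient for each gender. The sketch below uses Spearman's rank correlation, which is my assumption rather than a method stated in this analysis; it was chosen because both counts are bounded between 0 and 100 and may not be normally distributed. `cor_by_gender` is a hypothetical helper name.

```{r message=FALSE, warning=FALSE}
library(dplyr)

# Hypothetical helper: Spearman correlation between raised hands and
# visited resources, computed separately for each gender
cor_by_gender <- function(df) {
  df %>%
    group_by(gender) %>%
    summarize(n = n(),
              rho = cor(raisedhands, visitedresources, method = "spearman"))
}

# Apply it to the cleaned data if the earlier chunks have been run
if (exists("Student_Data1")) {
  print(cor_by_gender(Student_Data1))
}
```

Values of `rho` near 1 would support the hypothesis that students who raise their hands more often also visit resources more often; values near 0 would indicate little monotonic association.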

### Gender differences in student performance based on the relationship between the number of announcements viewed and participation in discussion groups
```{r}
ggplot(data=Student_Data1)+
  geom_point(aes(x=AnnouncementsView, y=Discussion, color=grades1))+
  facet_wrap(~gender)+
  theme_grey()
```

### Gender differences in student performance based on the relationship between the number of raised hands and participation in discussion groups
```{r}
ggplot(data=Student_Data1)+
  geom_point(aes(x=raisedhands, y=Discussion, color=grades1))+
  facet_wrap(~gender)+
  theme_igray()
```




## **References**
Amrieh, E. A., Hamtini, T., & Aljarah, I. (2016). Mining Educational Data to Predict Student's academic Performance using Ensemble Methods. International Journal of Database Theory and Application, 9(8), 119-136.

Amrieh, E. A., Hamtini, T., & Aljarah, I. (2015, November). Preprocessing and analyzing educational data set using X-API for improving student's performance. In Applied Electrical Engineering and Computing Technologies (AEECT), 2015 IEEE Jordan Conference on (pp. 1-5). IEEE.