1. Introduction to the Data Set

The purpose of this research is to evaluate the factors that may affect students’ academic performance.

1.1 Basic Information

  • Data Set Name: Students’ Academic Performance Dataset (xAPI-Edu-Data)
  • Number of Instances: 480
  • Area: E-learning, Education, Predictive models, Educational Data Mining
  • Attribute Characteristics: Integer/Categorical
  • Number of Attributes: 17
  • Date: 2016-11-8
  • Website: https://www.kaggle.com/aljarah/xAPI-Edu-Data?select=xAPI-Edu-Data.csv
  • Source: Elaf Abu Amrieh, Thair Hamtini, and Ibrahim Aljarah, The University of Jordan, Amman, Jordan, http://www.Ibrahimaljarah.com, www.ju.edu.jo
  • Relevant Papers:
  1. Amrieh, E. A., Hamtini, T., & Aljarah, I. (2016). Mining Educational Data to Predict Student’s academic Performance using Ensemble Methods. International Journal of Database Theory and Application, 9(8), 119-136.
  2. Amrieh, E. A., Hamtini, T., & Aljarah, I. (2015, November). Preprocessing and analyzing educational data set using X-API for improving student’s performance. In Applied Electrical Engineering and Computing Technologies (AEECT), 2015 IEEE Jordan Conference on (pp. 1-5). IEEE.

1.2 Dataset Introduction

This is an educational data set collected from a learning management system (LMS) called Kalboard 360. Kalboard 360 is a multi-agent LMS designed to facilitate learning through the use of leading-edge technology. The system provides users with synchronous access to educational resources from any device with an Internet connection.

The data is collected using a learner activity tracker tool called the experience API (xAPI). The xAPI is a component of the training and learning architecture (TLA) that makes it possible to monitor learning progress and learner actions such as reading an article or watching a training video. The experience API helps learning activity providers to determine the learner, activity, and objects that describe a learning experience.

The dataset consists of 480 student records and 17 features. The features are classified into 3 major categories: (1) demographic features such as gender and nationality; (2) academic background features such as educational stage, grade level, and section; (3) behavioral features such as raising hands in class, opening resources, parents answering surveys, and parent school satisfaction.

The dataset consists of 305 males and 175 females. The students come from different origins: 179 students are from Kuwait, 172 from Jordan, 28 from Palestine, 22 from Iraq, 17 from Lebanon, 12 from Tunis, 11 from Saudi Arabia, 9 from Egypt, 7 from Syria, 6 each from the USA, Iran, and Libya, 4 from Morocco, and one from Venezuela.

The dataset was collected over two educational semesters: 245 student records were collected during the first semester and 235 during the second semester.

The data set also includes a school attendance feature, which classifies students into two categories based on their absence days: 191 students exceed 7 absence days and 289 students have fewer than 7 absence days.

This dataset also includes a new category of features: parent participation in the educational process. The parent participation feature has two sub-features: Parent Answering Survey and Parent School Satisfaction. 270 of the parents answered the survey and 210 did not; 292 of the parents are satisfied with the school and 188 are not.
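The counts quoted in this section can be checked against the CSV itself. The following is a minimal sketch, assuming pandas is installed and the Kaggle file has been saved locally; the helper name summarize_counts is illustrative and not part of the original work. The column names are the ones listed in section 1.3.

```python
import pandas as pd

# Columns whose category counts are quoted in the text
# (names as listed in section 1.3).
COUNT_COLUMNS = [
    "gender",
    "NationalITy",
    "Semester",
    "StudentAbsenceDays",
    "ParentAnsweringSurvey",
    "ParentschoolSatisfaction",
]


def summarize_counts(df: pd.DataFrame, columns=COUNT_COLUMNS) -> dict:
    """Return {column: {category: count}} for each listed column present in df."""
    return {
        col: df[col].value_counts().to_dict()
        for col in columns
        if col in df.columns
    }
```

A typical use would be summarize_counts(pd.read_csv("xAPI-Edu-Data.csv")), where the file path is an assumption about where the download was saved. The per-column totals should each add up to 480 if the file matches the description above.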

1.3 Attributes Details

  1. gender: student’s gender
    nominal: ‘Male’, ‘Female’

  2. NationalITy: student’s nationality
    nominal: ‘Kuwait’, ‘Lebanon’, ‘Egypt’, ‘SaudiArabia’, ‘USA’, ‘Jordan’, ‘Venezuela’, ‘Iran’, ‘Tunis’, ‘Morocco’, ‘Syria’, ‘Palestine’, ‘Iraq’, ‘Lybia’

  3. PlaceofBirth: student’s place of birth
    nominal: ‘Kuwait’, ‘Lebanon’, ‘Egypt’, ‘SaudiArabia’, ‘USA’, ‘Jordan’, ‘Venezuela’, ‘Iran’, ‘Tunis’, ‘Morocco’, ‘Syria’, ‘Palestine’, ‘Iraq’, ‘Lybia’

  4. StageID: educational level the student belongs to
    nominal: ‘lowerlevel’, ‘MiddleSchool’, ‘HighSchool’

  5. GradeID: grade the student belongs to
    nominal: ‘G-01’, ‘G-02’, ‘G-03’, ‘G-04’, ‘G-05’, ‘G-06’, ‘G-07’, ‘G-08’, ‘G-09’, ‘G-10’, ‘G-11’, ‘G-12’

  6. SectionID: classroom the student belongs to (nominal: ‘A’, ‘B’, ‘C’)

  7. Topic: course topic (nominal: ‘English’, ‘Spanish’, ‘French’, ‘Arabic’, ‘IT’, ‘Math’, ‘Chemistry’, ‘Biology’, ‘Science’, ‘History’, ‘Quran’, ‘Geology’)

  8. Semester: school year semester (nominal: ‘First’, ‘Second’)

  9. Relation: parent responsible for the student (nominal: ‘mom’, ‘father’)

  10. raisedhands: how many times the student raises his/her hand in the classroom (numeric: 0-100)

  11. VisITedResources: how many times the student visits course content (numeric: 0-100)

  12. AnnouncementsView: how many times the student checks new announcements (numeric: 0-100)

  13. Discussion: how many times the student participates in discussion groups (numeric: 0-100)

  14. ParentAnsweringSurvey: whether the parent answered the surveys provided by the school (nominal: ‘Yes’, ‘No’)

  15. ParentschoolSatisfaction: the degree of parent satisfaction with the school (nominal: ‘Yes’, ‘No’)

  16. StudentAbsenceDays: the number of absence days for each student (nominal: ‘above-7’, ‘under-7’)

  17. Class: the students are classified into three numerical intervals based on their total grade/mark:
  • L (Low-Level): interval includes values from 0 to 69,
  • M (Middle-Level): interval includes values from 70 to 89,
  • H (High-Level): interval includes values from 90 to 100.
    (Note: This classification uses a discretization mechanism to convert student performance from a numerical value to a nominal value, which represents the class label of the classification problem. To complete this step, the data collectors divided the data set into three nominal intervals based on each student’s total grade/mark, namely high level, middle level, and low level. The discretized data set includes 127 low-level students, 211 middle-level students, and 142 high-level students.)
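The discretization described in the note can be illustrated with a short sketch. It assumes integer total marks between 0 and 100 and uses pandas' cut function; the function name discretize_total_mark and the sample marks are hypothetical, since the published data set, as described here, carries only the resulting Class label and the collectors' exact procedure is not specified beyond the three intervals.

```python
import pandas as pd


def discretize_total_mark(marks: pd.Series) -> pd.Series:
    """Map numeric total marks (0-100) to the nominal labels L, M, H."""
    # Bin edges chosen so that, for integer marks,
    # 0-69 -> L, 70-89 -> M, 90-100 -> H.
    # include_lowest=True makes the first interval closed at 0.
    bins = [0, 69, 89, 100]
    return pd.cut(marks, bins=bins, labels=["L", "M", "H"], include_lowest=True)
```

For example, discretize_total_mark(pd.Series([0, 69, 70, 89, 90, 100])) places the first two marks in L, the next two in M, and the last two in H. Marks outside 0-100 would come back as missing values, which is one way to surface data entry errors.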

2. Objective

2.1 Reasons for Choosing this Data

The data set is very comprehensive. It contains 480 student records, each described by 17 characteristics. These characteristics are divided into three main categories:

  1. Demographic characteristics, such as gender and nationality.
  2. Academic background characteristics, such as educational stage, grade level, subjects, etc.
  3. Behavioral characteristics, such as raising hands in class, visiting resources, parents answering surveys, and parents’ satisfaction with the school, covering both students and parents.

These key indicators are closely related to the problem, allowing me to analyze the factors that may affect students’ academic performance from several angles.
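One way to make the three categories concrete is a simple lookup table over the column names from section 1.3. The sketch below is illustrative only: the text assigns just a few attributes explicitly, so the placement of Relation, Topic, Semester, AnnouncementsView, Discussion, and StudentAbsenceDays is my own assumption, and the names FEATURE_GROUPS and group_of are hypothetical.

```python
# Grouping of the 16 input attributes; "Class" is the target and is kept separate.
# Assignments not stated in the text (e.g. Relation, StudentAbsenceDays) are assumptions.
FEATURE_GROUPS = {
    "demographic": ["gender", "NationalITy", "PlaceofBirth", "Relation"],
    "academic_background": ["StageID", "GradeID", "SectionID", "Topic", "Semester"],
    "behavioral": [
        "raisedhands",
        "VisITedResources",
        "AnnouncementsView",
        "Discussion",
        "ParentAnsweringSurvey",
        "ParentschoolSatisfaction",
        "StudentAbsenceDays",
    ],
}

TARGET = "Class"


def group_of(column: str) -> str:
    """Return the name of the group a column belongs to."""
    for group, columns in FEATURE_GROUPS.items():
        if column in columns:
            return group
    raise KeyError(f"{column!r} is not assigned to a feature group")
```

Keeping the grouping in one place makes it easy to move an attribute to a different category later without touching the analysis code.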

2.2 Plan

  1. In line with my goal of analyzing the reasons behind students’ grade levels, the final data set will be visualized based on “Class” (total grade/mark).

  2. Some columns will be deleted. Both GradeID and StageID show the educational stage of students, and GradeID is divided into 12 categories, which makes it unnecessarily difficult to analyze. Therefore, GradeID will be deleted. PlaceofBirth will be deleted for a similar reason, since it is similar to NationalITy. SectionID, which indicates the classroom a student belongs to, and Semester will also be deleted.

  3. I will divide the remaining 13 factors into three categories: demographic characteristics, educational background, and student behavior, in order to comprehensively analyze the relationship between student performance and the other 12 factors. I will analyze which factors account for a larger proportion and which for a smaller one. Users can also interact with the page, analyzing the relationship between different characteristics and performance by toggling a factor on or off and changing the order of the factors.

  4. At the same time, I will also analyze possible internal connections between columns. For example, by gender, are female students more active in their learning behavior? Regarding the family background feature, does it affect the student’s enthusiasm for learning whether the father or the mother is responsible for the student? Which one has the more positive effect?
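The plan above can be sketched in pandas as follows. This is a rough outline under stated assumptions, not the final implementation: the function names are hypothetical, the column names come from section 1.3, and of the two stage columns the code drops whichever has more distinct categories, which is the rationale given in step 2.

```python
import pandas as pd

# Columns the plan removes regardless of the stage/grade choice.
ALWAYS_DROP = ["PlaceofBirth", "SectionID", "Semester"]

# Numeric activity counters used to compare how active groups of students are.
ACTIVITY_COLUMNS = ["raisedhands", "VisITedResources", "AnnouncementsView", "Discussion"]


def reduce_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the redundant columns, keeping 13 of the original 17."""
    # StageID and GradeID both encode the educational stage; drop the one
    # with more distinct categories, since it is the harder one to analyze.
    finer = max(["StageID", "GradeID"], key=lambda col: df[col].nunique())
    return df.drop(columns=ALWAYS_DROP + [finer])


def class_share_by(df: pd.DataFrame, factor: str) -> pd.DataFrame:
    """Share of each Class level within each value of `factor` (rows sum to 1)."""
    return pd.crosstab(df[factor], df["Class"], normalize="index")


def mean_activity_by(df: pd.DataFrame, factor: str) -> pd.DataFrame:
    """Average of the four activity counters for each value of `factor`."""
    return df.groupby(factor)[ACTIVITY_COLUMNS].mean()
```

For step 4, mean_activity_by(df, "gender") and mean_activity_by(df, "Relation") give a first look at whether female students, or students whose mother or father is the responsible parent, are more active; class_share_by with the same factor shows how that relates to the L/M/H class distribution. These are descriptive summaries only and do not by themselves establish that a factor causes better performance.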