1. Introduction

In this assignment, you are asked to create data visualizations using data from Kaggle, available at the following link: https://www.kaggle.com/datasets/desalegngeb/students-exam-scores?resource=download

The variables are as follows:

Data Dictionary (column description)

Build the visualizations in R Markdown and publish them on RPubs. Tutorials for R Markdown and RPubs are available at the following links: https://www.youtube.com/watch?v=K418swtFnik and https://www.youtube.com/watch?v=GJ36zamYVLg

Note: the data file to use is “Expanded Data” (Expanded_data_with_more_features.csv).

2. Import Data

# Read the CSV file (adjust the path to where the file is stored on your machine)
data <- read.csv("E:/Lecturer/Mata Kuliah/Pengantar Data Sains/Data Sain 2425/Expanded_data_with_more_features.csv")
# Display the first 6 rows of the dataset
head(data)
##   X Gender EthnicGroup         ParentEduc    LunchType TestPrep
## 1 0 female              bachelor's degree     standard     none
## 2 1 female     group C       some college     standard         
## 3 2 female     group B    master's degree     standard     none
## 4 3   male     group A associate's degree free/reduced     none
## 5 4   male     group C       some college     standard     none
## 6 5 female     group B associate's degree     standard     none
##   ParentMaritalStatus PracticeSport IsFirstChild NrSiblings TransportMeans
## 1             married     regularly          yes          3     school_bus
## 2             married     sometimes          yes          0               
## 3              single     sometimes          yes          4     school_bus
## 4             married         never           no          1               
## 5             married     sometimes          yes          0     school_bus
## 6             married     regularly          yes          1     school_bus
##   WklyStudyHours MathScore ReadingScore WritingScore
## 1            < 5        71           71           74
## 2         5 - 10        69           90           88
## 3            < 5        87           93           91
## 4         5 - 10        45           56           42
## 5         5 - 10        76           78           75
## 6         5 - 10        73           84           79
# Show the structure of the dataset: column types and first values
str(data)
## 'data.frame':    30641 obs. of  15 variables:
##  $ X                  : int  0 1 2 3 4 5 6 7 8 9 ...
##  $ Gender             : chr  "female" "female" "female" "male" ...
##  $ EthnicGroup        : chr  "" "group C" "group B" "group A" ...
##  $ ParentEduc         : chr  "bachelor's degree" "some college" "master's degree" "associate's degree" ...
##  $ LunchType          : chr  "standard" "standard" "standard" "free/reduced" ...
##  $ TestPrep           : chr  "none" "" "none" "none" ...
##  $ ParentMaritalStatus: chr  "married" "married" "single" "married" ...
##  $ PracticeSport      : chr  "regularly" "sometimes" "sometimes" "never" ...
##  $ IsFirstChild       : chr  "yes" "yes" "yes" "no" ...
##  $ NrSiblings         : int  3 0 4 1 0 1 1 1 3 NA ...
##  $ TransportMeans     : chr  "school_bus" "" "school_bus" "" ...
##  $ WklyStudyHours     : chr  "< 5" "5 - 10" "< 5" "5 - 10" ...
##  $ MathScore          : int  71 69 87 45 76 73 85 41 65 37 ...
##  $ ReadingScore       : int  71 90 93 56 78 84 93 43 64 59 ...
##  $ WritingScore       : int  74 88 91 42 75 79 89 39 68 50 ...
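The str() output shows that EthnicGroup, TestPrep and TransportMeans contain empty strings ("") where a value is missing, while NrSiblings already uses NA. A minimal base-R sketch for turning those blanks into NA before the analysis is below; `blank_to_na` and the four-row `demo` data frame (values copied from the output above) are hypothetical names used only for illustration.

```r
# Sketch: convert empty strings in every character column to NA.
# ASSUMPTION: `demo` is a hypothetical excerpt of the first rows shown by str(data);
# with the real dataset, call blank_to_na(data) instead.
blank_to_na <- function(df) {
  chr_cols <- vapply(df, is.character, logical(1))
  df[chr_cols] <- lapply(df[chr_cols], function(x) {
    x[!is.na(x) & x == ""] <- NA   # "" becomes NA_character_
    x
  })
  df
}

demo <- data.frame(
  EthnicGroup    = c("", "group C", "group B", "group A"),
  TestPrep       = c("none", "", "none", "none"),
  TransportMeans = c("school_bus", "", "school_bus", ""),
  NrSiblings     = c(3L, 0L, 4L, 1L),
  stringsAsFactors = FALSE
)
clean <- blank_to_na(demo)
colSums(is.na(clean))   # number of missing values per column
```

With the real data this would be `data <- blank_to_na(data)`. An alternative is to pass `na.strings = c("", "NA")` to read.csv(), which marks empty fields as missing at import time.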

3. Questions

  1. Examine the distributions of the math, reading, and writing scores and check whether there are outliers.
  2. What is an outlier? Explain how to detect outliers and how to handle them.
  3. Which weekly study-hours category (WklyStudyHours) is the most common among students?
  4. Examine the differences in student scores (math, writing, reading) across ethnic groups.
  5. Check the correlation between the math, reading, and writing scores using a correlation heatmap.
  6. Examine the differences in student scores across PracticeSport levels. Do students who practice sport more often have a higher average math score?
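The questions above can be approached with base R alone. The sketch below is one possible approach, not the required solution: it flags outliers with Tukey's fences (values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR) and shows capping at the fences as one handling option (others include removing values confirmed to be data-entry errors, or transforming the variable); it uses `table()` for the most frequent study-hours category, `aggregate()` for group means, and `cor()` with `heatmap()` for the correlation heatmap. The names `demo`, `scores`, `tukey_fences`, `iqr_outliers`, `cap_outliers` and `group_means` are hypothetical; `demo` holds the six rows printed by head(data) above.

```r
# Hypothetical sketch answering the six questions with base R only.
# ASSUMPTION: `demo` holds the six rows printed by head(data) above;
# replace `demo` with `data` to run it on the full dataset.
demo <- data.frame(
  EthnicGroup    = c("", "group C", "group B", "group A", "group C", "group B"),
  PracticeSport  = c("regularly", "sometimes", "sometimes", "never", "sometimes", "regularly"),
  WklyStudyHours = c("< 5", "5 - 10", "< 5", "5 - 10", "5 - 10", "5 - 10"),
  MathScore      = c(71, 69, 87, 45, 76, 73),
  ReadingScore   = c(71, 90, 93, 56, 78, 84),
  WritingScore   = c(74, 88, 91, 42, 75, 79),
  stringsAsFactors = FALSE
)
scores <- c("MathScore", "ReadingScore", "WritingScore")

# Q1-Q2: Tukey's fences, Q1 - k * IQR and Q3 + k * IQR (k = 1.5 by convention).
tukey_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  c(lower = q[[1]] - k * iqr, upper = q[[2]] + k * iqr)
}

# Positions of values outside the fences (NA values are never flagged).
iqr_outliers <- function(x, k = 1.5) {
  f <- tukey_fences(x, k)
  which(!is.na(x) & (x < f[["lower"]] | x > f[["upper"]]))
}

# One possible treatment: cap values at the fences instead of deleting rows.
cap_outliers <- function(x, k = 1.5) {
  f <- tukey_fences(x, k)
  pmin(pmax(x, f[["lower"]]), f[["upper"]])
}

sapply(demo[scores], summary)                       # min, quartiles, median, mean, max
boxplot(demo[scores], main = "Score distribution")  # points beyond the whiskers are candidate outliers
outliers <- lapply(demo[scores], iqr_outliers)

# Q3: most frequent weekly study-hours category.
study_tab <- table(demo$WklyStudyHours)
most_common <- names(which.max(study_tab))

# Q4 and Q6: mean of each score per group; rows whose group is NA are omitted.
group_means <- function(df, group) {
  aggregate(df[scores], by = df[group], FUN = mean, na.rm = TRUE)
}
demo$EthnicGroup[demo$EthnicGroup == ""] <- NA    # treat a blank ethnic group as missing
by_ethnic <- group_means(demo, "EthnicGroup")
by_sport  <- group_means(demo, "PracticeSport")

# Q5: correlation matrix of the three scores, drawn as a heatmap without dendrograms.
cm <- cor(demo[scores], use = "pairwise.complete.obs")
heatmap(cm, Rowv = NA, Colv = NA, scale = "none", margins = c(8, 8))

outliers; most_common; by_ethnic; by_sport; round(cm, 2)
```

On only six rows the fences are very narrow, so the two math scores flagged here (87 and 45) are an artifact of the tiny excerpt; run the same code on `data` before drawing conclusions. Note also that boxplot() derives its whiskers from Tukey hinges (fivenum()) rather than quantile(), so the points it marks can differ slightly from iqr_outliers() on some data.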

Write a further interpretation based on the results of your analysis.

Submit the report as an RPubs link.