Load libraries

Get Class 9 Data

https://docs.google.com/spreadsheets/d/15uE5Eo79JzMGZO6IsjZmjmF_zBx12jvuR-DjgJNby0c/edit?usp=sharing

## Sheet successfully identified: "JVM - Class 9 SA1 Scores"
## Accessing worksheet titled '9 COMBINED'.
## Parsed with column specification:
## cols(
##   `S No` = col_integer(),
##   Name = col_character(),
##   `CT-1_MTH` = col_double(),
##   `CT-1_SCI` = col_double(),
##   HYE_MTH = col_double(),
##   HYE_SCI = col_double(),
##   Avanti = col_character(),
##   Absent = col_logical()
## )
## # A tibble: 10 x 8
##    `S No` Name         `CT-1_MTH` `CT-1_SCI` HYE_MTH HYE_SCI Avanti Absent
##     <int> <chr>             <dbl>      <dbl>   <dbl>   <dbl> <chr>  <lgl> 
##  1      1 Abhishek Ma…       26           28    35.5      53 y      FALSE 
##  2      2 Aishwarya G…       37           32    53        64 y      FALSE 
##  3      3 Aman B             14           11    24        22 <NA>   FALSE 
##  4      4 Anjali Kuma…       24.5         27    29.5      43 <NA>   FALSE 
##  5      5 Anjali Pai         35           31    49.5      54 <NA>   FALSE 
##  6      6 Archita Sri…       21           24    29        32 <NA>   FALSE 
##  7      7 Aviral Chan…       37           31    65.5      63 y      FALSE 
##  8      8 Chulika Par…       33           35    70        64 <NA>   FALSE 
##  9      9 Dhiraj Chou…        5           19    12        28 <NA>   FALSE 
## 10     10 G Prashanth        34.5         33    64        58 <NA>   FALSE

Clean Data

## # A tibble: 10 x 8
##    `S No` Name         `CT-1_MTH` `CT-1_SCI` HYE_MTH HYE_SCI Avanti Absent
##     <int> <chr>             <dbl>      <dbl>   <dbl>   <dbl>  <dbl> <lgl> 
##  1      1 Abhishek Ma…       26           28    35.5      53      1 FALSE 
##  2      2 Aishwarya G…       37           32    53        64      1 FALSE 
##  3      3 Aman B             14           11    24        22      0 FALSE 
##  4      4 Anjali Kuma…       24.5         27    29.5      43      0 FALSE 
##  5      5 Anjali Pai         35           31    49.5      54      0 FALSE 
##  6      6 Archita Sri…       21           24    29        32      0 FALSE 
##  7      7 Aviral Chan…       37           31    65.5      63      1 FALSE 
##  8      8 Chulika Par…       33           35    70        64      0 FALSE 
##  9      9 Dhiraj Chou…        5           19    12        28      0 FALSE 
## 10     10 G Prashanth        34.5         33    64        58      0 FALSE
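
Comparing the two tables, the cleaning step appears to recode the Avanti column from "y" / missing to 1 / 0. The report's own code is not shown, so the following is only a minimal Python sketch of that recoding; the function name `recode_avanti` is hypothetical, and `None` stands in for the `<NA>` values printed above.

```python
from typing import Optional


def recode_avanti(flag: Optional[str]) -> int:
    """Map the raw Avanti flag to 1 (Avanti student) or 0 (not an Avanti student).

    Assumption: "y" marks an Avanti student and a missing value (None here,
    <NA> in the printed table) marks a non-Avanti student.
    """
    return 1 if flag is not None and flag.strip().lower() == "y" else 0


# Raw Avanti values for the first five rows of the table above.
raw_flags = ["y", "y", None, None, None]
cleaned = [recode_avanti(f) for f in raw_flags]
print(cleaned)  # [1, 1, 0, 0, 0]
```

The result matches the Avanti column of the first five rows of the cleaned table.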

Class Distribution by Groups

1 => Avanti student; 0 => not an Avanti student

## # A tibble: 2 x 2
##   `as.factor(Avanti)` count
##   <fct>               <int>
## 1 0                     131
## 2 1                      39

Let’s check Half Yearly Exam (HYE) scores - Maths

Looks similar for both groups

Half Yearly Exam - Science

Avanti kids cluster in the middle of the score range

HYE - Maths Density Plot

(The area under each density curve integrates to 1.) Non-Avanti students are more heavily represented at both the higher and lower ends for Maths

HYE - Science Density Plot

Avanti students slightly ahead in Science

Let’s check CT1 - Science

Avanti kids ahead in Science baseline too

Let’s check CT1 - Maths

Avanti kids seem to be doing better in Maths initially

Scatterplot of Chapter Test 1 (40 marks) vs Half Yearly Exam - Maths (80 marks)

Scores of 3 Avanti kids fell drastically in the HY exams, while 1 kid improved

Scatterplot of CT-1 (40 marks) vs HYE - Science (80 marks)

Fewer outliers in science

Let’s normalize the scores using min-max scaling to put every test on a common range of [0, 100]

## # A tibble: 6 x 8
##   `S No` Name          `CT-1_MTH` `CT-1_SCI` HYE_MTH HYE_SCI Avanti Absent
##    <int> <chr>              <dbl>      <dbl>   <dbl>   <dbl>  <dbl> <lgl> 
## 1      1 Abhishek Mah…         63         67      41      67      1 FALSE 
## 2      2 Aishwarya Gi…         92         78      64      82      1 FALSE 
## 3      3 Aman B                31         19      26      24      0 FALSE 
## 4      4 Anjali Kumar…         59         64      34      53      0 FALSE 
## 5      5 Anjali Pai            87         75      60      68      0 FALSE 
## 6      6 Archita Sriv…         49         56      33      38      0 FALSE
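
Min-max scaling maps the lowest score in a column to 0 and the highest to 100, with everything else placed linearly in between. The report's own code is not shown; the Python sketch below only illustrates the technique, and the helper name `min_max_scale` is hypothetical. It uses only the ten CT-1_MTH marks printed earlier, so its output will not match the table above, which is presumably scaled against the minimum and maximum of the full class.

```python
from typing import List


def min_max_scale(values: List[float], lo: float = 0.0, hi: float = 100.0) -> List[float]:
    """Linearly rescale values so the smallest maps to lo and the largest to hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant column")
    span = v_max - v_min
    return [lo + (v - v_min) * (hi - lo) / span for v in values]


# CT-1_MTH marks for the ten students printed earlier in the report.
ct1_mth = [26, 37, 14, 24.5, 35, 21, 37, 33, 5, 34.5]
scaled = [round(s) for s in min_max_scale(ct1_mth)]
print(scaled)
```

Within this ten-student sample the lowest mark (5) becomes 0 and the highest (37) becomes 100.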

Let’s compare the performance of the two groups using a linear model. First, Maths:

We are modelling the Half Yearly Exam Scores using CT-1 scores and the binary variable Avanti (1/0)

##        (Intercept)         `CT-1_MTH` as.factor(Avanti)1 
##          1.1676126          0.7491484         -4.0307705

The coefficient for the Avanti variable is negative (about -4.0): at the same CT-1 score, Avanti students are predicted to score about 4 points lower in the Maths HYE

Let’s check coefficients for Science

##        (Intercept)         `CT-1_SCI` as.factor(Avanti)1 
##         -2.9218286          0.9256939          1.7654543

Positive coefficient (about +1.8), indicating that Avanti kids do better in Science at the same CT-1 score!
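
To see what the two sets of coefficients mean in practice, the Python sketch below plugs them into the model equation HYE = intercept + slope × CT-1 + offset × Avanti. The coefficients are copied from the outputs above; the function name `predict_hye` and the example CT-1 score of 60 are illustrative assumptions, and the score is taken to be on whatever scale the models were fitted on.

```python
# Coefficients copied from the two model outputs above.
MATH_COEF = {"intercept": 1.1676126, "ct1": 0.7491484, "avanti": -4.0307705}
SCI_COEF = {"intercept": -2.9218286, "ct1": 0.9256939, "avanti": 1.7654543}


def predict_hye(coef: dict, ct1_score: float, avanti: int) -> float:
    """Predicted HYE score: intercept + slope * CT-1 score + Avanti offset."""
    return coef["intercept"] + coef["ct1"] * ct1_score + coef["avanti"] * avanti


# Two hypothetical students with the same CT-1 score of 60.
for name, coef in (("Maths", MATH_COEF), ("Science", SCI_COEF)):
    non_avanti = predict_hye(coef, 60, 0)
    avanti = predict_hye(coef, 60, 1)
    gap = avanti - non_avanti
    print(f"{name}: non-Avanti {non_avanti:.1f}, Avanti {avanti:.1f}, gap {gap:+.1f}")
```

Because the model has no interaction term, the gap between the two predictions equals the Avanti coefficient whatever CT-1 score is used.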

Up/down movement using a linear model - MTH

Students are labelled UP or DOWN if their actual score is at least 5% above or below the score predicted by the linear model. If |delta| < 5%, they are labelled SAME.
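
The labelling rule can be sketched in Python as follows. The report does not show its code or say exactly how delta is computed, so this sketch assumes delta is the difference between actual and predicted score expressed as a percentage of the predicted score; the function name `label_movement` and the example scores are hypothetical.

```python
def label_movement(actual: float, predicted: float, threshold_pct: float = 5.0) -> str:
    """Label a student UP, DOWN or SAME from actual vs. model-predicted score.

    Assumption: delta is measured relative to the predicted score, in percent.
    """
    if predicted == 0:
        raise ValueError("predicted score must be non-zero for a relative delta")
    delta_pct = 100.0 * (actual - predicted) / abs(predicted)
    if delta_pct >= threshold_pct:
        return "UP"
    if delta_pct <= -threshold_pct:
        return "DOWN"
    return "SAME"


# Hypothetical (actual, predicted) pairs, not taken from the class data.
examples = [(66.0, 60.0), (55.0, 60.0), (61.0, 60.0)]
print([label_movement(a, p) for a, p in examples])  # ['UP', 'DOWN', 'SAME']
```

If delta was instead measured in points on the normalized 0 to 100 scale, the comparison would use `actual - predicted` directly.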

## # A tibble: 2 x 5
## # Groups:   Avanti [2]
##   Avanti  DOWN  SAME    UP NET_perc_UP
##    <dbl> <int> <int> <int>       <dbl>
## 1      0    62    15    54          -6
## 2      1    16     8    15          -3

Net percentage UP is slightly better for Avanti kids in Maths (-3 vs -6), though negative for both groups!!

Up/down - SCI

## # A tibble: 2 x 5
## # Groups:   Avanti [2]
##   Avanti  DOWN  SAME    UP NET_perc_UP
##    <dbl> <int> <int> <int>       <dbl>
## 1      0    57    22    52          -4
## 2      1    16     5    18           5

Net percentage UP movement is clearly better for Avanti kids in Science (+5 vs -4)!!
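
The NET_perc_UP column is not defined in the text. One formula that reproduces all four reported values is 100 × (UP − DOWN) / group size, rounded to the nearest integer. The Python sketch below checks this assumed definition against the counts in the two tables above; the function name `net_perc_up` is hypothetical.

```python
def net_perc_up(down: int, same: int, up: int) -> int:
    """Net percentage moving UP: 100 * (UP - DOWN) / group size, rounded."""
    total = down + same + up
    return round(100 * (up - down) / total)


# (DOWN, SAME, UP) counts copied from the two tables above,
# keyed by (subject, Avanti flag).
counts = {
    ("Maths", 0): (62, 15, 54),
    ("Maths", 1): (16, 8, 15),
    ("Science", 0): (57, 22, 52),
    ("Science", 1): (16, 5, 18),
}
for (subject, avanti), (down, same, up) in counts.items():
    print(subject, avanti, net_perc_up(down, same, up))
```

This prints -6 and -3 for Maths and -4 and 5 for Science, matching the tables. The counts in each row also sum to the group sizes reported earlier (131 and 39).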