Data Description

*This dataset is about about combination of attendance rate for Semester 1 in SA Government Schools by year level and Index of Educational disadvantage by Government Schools in SA for year 2000.

*Combination of this two datasets are to analyse the relationship between the attendance levels of students and the index of educational disadvantage.The most disadvantage schools have an index of 1 and the least disadvantaged have an index of 7.

*Detailed explanation regarding the two datasets are done below separately.

*Both these datasets have a common variable of School Number and will be using this for the merge.

*Understanding the dataset, inspecting the variables and converting them to proper data type conversions are done in the Understand section.

#Variable Description

*School Number - Each school has a number in South Austrlia

*School Name - Name of the school

*Type of School - If it’s a Primary/Secondary/Aboroginal/Special school

*Suburb/Postcode - Suburb and Postcode of the school

*Attendance Rates from Reception to Secondary

*Index of Educational Disadvantage - Most disadvantage schools have an index of 1 and the least disadvantaged have an index of 7

#Dataset 1#

*Attendance rate for semester 1 in SA Government schools by school and year level in year 2000

*Calculation for Attendance Rate = (No of days attending school/ No of days student enrolled) x 100

*This dataset contians untidy data and will be dealt with this data in the Tidy data section.

*Lots of zeros can be seen in this dataset and they are for the year levels who doesn’t have such classes Example: Coromandel Valley Primary School doesn’t have attendaces after Year 7 as it’s a primary school.

*Also this dataset has missing values with NA and that will be dealt separately in the Scanning section.

#Dataset 2#

*The Index of Educational disadvantage for SA Government school in year 2000

*The most disadvantage schools have an index of 1 and the least disadvantaged have an index of 7

*Index is calculated using the measures of

  * Parental economic resource
  
  * Parental ecucation and occupation
  
  * Aboriginality
  
  * Student mobility
  

#Sample of Dataset 1

mydata0 <-read.csv ("C:/Data/attendance-rate-by-school-2000.csv")

mydata4 <- head(mydata0,10)


#Styling the Table


knitr:: kable(mydata4, booktabs=T) %>%
  kable_styling(latex_options = c("striped","scale down","hold_position"))
Census_Year School_Number School_Name Type_of_School Suburb Postcode Reception Year_1 Year_2 Year_3 Year_4 Year_5 Year_6 Year_7 Primary_Other Year_8 Year_9 Year_10 Year_11 Year_12 Secondary_Other
2020 104 Coromandel Valley Primary School Primary Education Coromandel Valley 5051 92.3 92.3 93.5 90.7 92.2 92.7 87.9 92.2 0.0 0 0 0 0 0 0
2020 106 Cowandilla Primary School Primary Education Cowandilla 5033 86.6 84.3 86.4 84.2 86.7 88.1 84.4 88.3 85.5 0 0 0 0 0 0
2020 107 Crafers Primary School Primary Education Crafers 5152 95.0 92.7 95.6 95.4 95.1 94.4 94.8 94.2 0.0 0 0 0 0 0 0
2020 111 Crystal Brook Primary School Primary Education Crystal Brook 5523 90.2 93.4 92.7 92.9 91.6 92.0 93.8 93.4 0.0 0 0 0 0 0 0
2020 114 Curramulka Primary School Primary Education Curramulka 5580 97.4 90.6 97.9 93.0 87.5 0.0 81.4 99.0 0.0 0 0 0 0 0 0
2020 116 Murray Bridge Special School Special Education Murray Bridge 5253 0.0 0.0 0.0 0.0 100.0 0.0 0.0 0.0 84.6 0 0 0 0 0 0
2020 120 Gilles Street Primary School Primary Education Adelaide 5000 87.7 84.7 85.9 85.2 88.3 88.9 86.3 87.8 84.3 0 0 0 0 0 0
2020 123 Echunga Primary School Primary Education Echunga 5153 93.3 93.3 89.8 92.1 95.7 94.7 90.7 94.7 0.0 0 0 0 0 0 0
2020 124 Eden Hills Primary School Primary Education Eden Hills 5050 90.3 92.2 91.0 92.6 92.8 92.0 92.8 91.4 0.0 0 0 0 0 0 0
2020 127 Edwardstown Primary School Primary Education Melrose Park 5039 86.6 88.0 87.2 89.1 83.7 87.2 83.7 85.7 0.0 0 0 0 0 0 0

#Sample of Dataset 2

mydata1 <-read.csv ("C:/Data/index-of-disadvantage-by-school-2020.csv")

mydata5 <- sample_n(mydata1,10)


#Styling the Table


kable(mydata5, booktabs=T) %>%
  kable_styling(latex_options = c("striped","scale down","hold_position"))
ID School_Number School_Name Suburb Post_Code Type_of_Schooling Index_of_Educational_Disadvantage
440 1100 Surrey Downs R-7 School Surrey Downs 5126 Primary Education 6
37 770 Birdwood High School Birdwood 5234 Secondary Education 6
237 895 Loxton High School Loxton 5333 Secondary Education 4
452 810 Thebarton Senior College Torrensville 5031 Secondary Education 3
56 752 Brinkworth Primary School Brinkworth 5464 Primary Education 4
370 1801 Port Lincoln Special School Port Lincoln 5606 Special Education 3
364 355 Port Augusta West Primary School Port Augusta West 5700 Primary Education 4
186 777 Jamestown Community School Jamestown 5491 Primary/Secondary Combined 6
310 936 Naracoorte South Primary School Naracoorte 5271 Primary Education 4
326 762 Oakbank Area School Oakbank 5243 Primary/Secondary Combined 6

#Sample of Merged Dataset

mydata2 <- merge(mydata0,mydata1,by="School_Number")

mydata6 <- sample_n(mydata2,10)


#Styling the Table


kable(mydata6, booktabs=T) %>%
  kable_styling(latex_options = c("striped","scale down","hold_position"))
School_Number Census_Year School_Name.x Type_of_School Suburb.x Postcode Reception Year_1 Year_2 Year_3 Year_4 Year_5 Year_6 Year_7 Primary_Other Year_8 Year_9 Year_10 Year_11 Year_12 Secondary_Other ID School_Name.y Suburb.y Post_Code Type_of_Schooling Index_of_Educational_Disadvantage
1858 2020 Greenwith Primary School Primary Education Greenwith 5125 93.8 94.1 92.3 92.2 92.4 90.4 90.2 90.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 157 Greenwith Primary School Greenwith 5125 Primary Education 6
935 2020 Para Hills School P-7 Primary Education Para Hills 5096 81.4 85.3 87.5 84.5 88.1 85.4 90.3 89.0 87.4 0.0 0.0 0.0 0.0 0.0 0.0 337 Para Hills School P-7 Para Hills 5096 Primary Education 4
1399 2020 Port Augusta Special School Special Education Port Augusta West 5700 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 55.6 0.0 0.0 0.0 0.0 0.0 79.3 363 Port Augusta Special School Port Augusta West 5700 Special Education 1
1001 2020 Amata Anangu School Aboriginal/Anangu Schools Amata 872 56.0 52.7 49.6 37.2 43.2 56.3 52.6 45.7 0.0 34.7 40.3 21.4 33.2 27.5 0.0 15 Amata Anangu School Amata 872 Aboriginal/Anangu Schools 1
391 2020 St Leonards Primary School Primary Education Glenelg North 5045 87.3 89.6 88.7 87.0 87.6 86.2 87.5 87.5 95.0 0.0 0.0 0.0 0.0 0.0 0.0 432 St Leonards Primary School Glenelg North 5045 Primary Education 6
962 2020 Riverland Special School Special Education Berri 5343 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 77.0 0.0 0.0 0.0 0.0 0.0 84.4 395 Riverland Special School Berri 5343 Special Education 3
1183 2020 North Ingle School Primary Education Ingle Farm 5098 90.6 89.2 88.4 90.8 88.6 91.0 92.8 90.7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 316 North Ingle School Ingle Farm 5098 Primary Education 4
475 2020 Westbourne Park Primary School Primary Education Westbourne Park 5041 92.4 91.6 93.0 93.3 92.3 92.6 91.9 92.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 487 Westbourne Park Primary School Westbourne Park 5041 Primary Education 7
447 2020 Unley Primary School Primary Education Unley 5061 90.7 89.0 90.7 90.0 92.0 89.2 90.3 87.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 464 Unley Primary School Unley 5061 Primary Education 7
1696 2020 Kenmore Park Anangu School Aboriginal/Anangu Schools Kenmore Park 872 9.1 69.8 33.3 87.5 58.1 79.9 68.3 0.0 75.0 67.9 87.9 100.0 0.0 0.0 0.0 204 Kenmore Park Anangu School Kenmore Park 872 Aboriginal/Anangu Schools 1

#Removing the unnecessary columns from the merged dataset

library(dplyr)
mydata7 <- select(mydata2, -ID, -School_Name.y,-Suburb.y ,-Post_Code,-Type_of_Schooling)

mydata7a <- sample_n(mydata7,5)

#Styling the Table


kable(mydata7a, booktabs=T) %>%
  kable_styling(latex_options = c("striped","scale down","hold_position"))
School_Number Census_Year School_Name.x Type_of_School Suburb.x Postcode Reception Year_1 Year_2 Year_3 Year_4 Year_5 Year_6 Year_7 Primary_Other Year_8 Year_9 Year_10 Year_11 Year_12 Secondary_Other Index_of_Educational_Disadvantage
239 2020 Macclesfield Primary School Primary Education Macclesfield 5153 91.5 94.8 92.8 84.0 90.9 90.9 93.1 92.3 0 0 0 0 0 0 0 6
270 2020 Mitcham Primary School Primary Education Kingswood 5062 91.3 91.4 92.6 92.2 91.9 90.9 91.0 89.9 0 0 0 0 0 0 0 7
647 2020 Linden Park Primary School Primary Education Linden Park 5065 90.7 89.6 91.3 91.3 93.5 93.4 91.4 90.0 0 0 0 0 0 0 0 7
1481 2020 Stirling North Primary School Primary Education Stirling North 5710 90.9 91.5 91.1 92.2 89.5 90.5 88.8 89.4 0 0 0 0 0 0 0 3
340 2020 Ascot Park Primary School Primary Education Park Holme 5043 87.2 80.7 90.3 86.1 88.1 87.4 84.6 82.6 0 0 0 0 0 0 0 3

Understand

#Identifying the number of variables and observations.

# No of Observations in the Dataset

nrow(mydata7)
## [1] 505
# No of Variables in the Dataset

ncol(mydata7)
## [1] 22

#Summary of the Dataset

summary(mydata7)
##  School_Number     Census_Year   School_Name.x      Type_of_School    
##  Min.   : 104.0   Min.   :2020   Length:505         Length:505        
##  1st Qu.: 387.0   1st Qu.:2020   Class :character   Class :character  
##  Median : 754.0   Median :2020   Mode  :character   Mode  :character  
##  Mean   : 770.9   Mean   :2020                                        
##  3rd Qu.:1018.0   3rd Qu.:2020                                        
##  Max.   :1919.0   Max.   :2020                                        
##                                                                       
##    Suburb.x            Postcode      Reception          Year_1     
##  Length:505         Min.   : 872   Min.   :  0.00   Min.   : 0.00  
##  Class :character   1st Qu.:5086   1st Qu.: 79.30   1st Qu.:80.30  
##  Mode  :character   Median :5164   Median : 87.50   Median :87.30  
##                     Mean   :5177   Mean   : 71.71   Mean   :72.33  
##                     3rd Qu.:5351   3rd Qu.: 90.50   3rd Qu.:90.70  
##                     Max.   :5734   Max.   :100.00   Max.   :98.60  
##                                                                    
##      Year_2           Year_3          Year_4           Year_5     
##  Min.   :  0.00   Min.   : 0.00   Min.   :  0.00   Min.   : 0.00  
##  1st Qu.: 82.05   1st Qu.:81.10   1st Qu.: 81.30   1st Qu.:80.45  
##  Median : 88.00   Median :88.40   Median : 88.20   Median :87.80  
##  Mean   : 72.75   Mean   :73.33   Mean   : 72.78   Mean   :72.50  
##  3rd Qu.: 91.03   3rd Qu.:91.10   3rd Qu.: 91.15   3rd Qu.:90.80  
##  Max.   :100.00   Max.   :98.30   Max.   :100.00   Max.   :98.30  
##  NA's   :1                        NA's   :2        NA's   :2      
##      Year_6          Year_7       Primary_Other        Year_8      
##  Min.   : 0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:80.45   1st Qu.: 78.45   1st Qu.:  0.00   1st Qu.:  0.00  
##  Median :87.60   Median : 87.35   Median :  0.00   Median :  0.00  
##  Mean   :71.99   Mean   : 71.91   Mean   : 20.56   Mean   : 22.26  
##  3rd Qu.:90.60   3rd Qu.: 90.58   3rd Qu.: 33.85   3rd Qu.: 57.20  
##  Max.   :97.20   Max.   :100.00   Max.   :100.00   Max.   :100.00  
##  NA's   :2       NA's   :3        NA's   :3                        
##      Year_9          Year_10         Year_11          Year_12      
##  Min.   :  0.00   Min.   :  0.0   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:  0.00   1st Qu.:  0.0   1st Qu.:  0.00   1st Qu.:  0.00  
##  Median :  0.00   Median :  0.0   Median :  0.00   Median :  0.00  
##  Mean   : 21.73   Mean   : 21.4   Mean   : 20.48   Mean   : 21.19  
##  3rd Qu.: 53.40   3rd Qu.: 52.8   3rd Qu.: 22.80   3rd Qu.: 33.40  
##  Max.   :100.00   Max.   :100.0   Max.   :100.00   Max.   :100.00  
##                                                                    
##  Secondary_Other   Index_of_Educational_Disadvantage
##  Min.   :  0.000   Min.   :1.000                    
##  1st Qu.:  0.000   1st Qu.:3.000                    
##  Median :  0.000   Median :5.000                    
##  Mean   :  9.438   Mean   :4.327                    
##  3rd Qu.:  0.000   3rd Qu.:6.000                    
##  Max.   :100.000   Max.   :7.000                    
## 

#Checking the types of Variables

str(mydata7)
## 'data.frame':    505 obs. of  22 variables:
##  $ School_Number                    : int  104 106 107 111 114 116 120 123 124 127 ...
##  $ Census_Year                      : int  2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##  $ School_Name.x                    : chr  "Coromandel Valley Primary School" "Cowandilla Primary School" "Crafers Primary School" "Crystal Brook Primary School" ...
##  $ Type_of_School                   : chr  "Primary Education" "Primary Education" "Primary Education" "Primary Education" ...
##  $ Suburb.x                         : chr  "Coromandel Valley" "Cowandilla" "Crafers" "Crystal Brook" ...
##  $ Postcode                         : int  5051 5033 5152 5523 5580 5253 5000 5153 5050 5039 ...
##  $ Reception                        : num  92.3 86.6 95 90.2 97.4 0 87.7 93.3 90.3 86.6 ...
##  $ Year_1                           : num  92.3 84.3 92.7 93.4 90.6 0 84.7 93.3 92.2 88 ...
##  $ Year_2                           : num  93.5 86.4 95.6 92.7 97.9 0 85.9 89.8 91 87.2 ...
##  $ Year_3                           : num  90.7 84.2 95.4 92.9 93 0 85.2 92.1 92.6 89.1 ...
##  $ Year_4                           : num  92.2 86.7 95.1 91.6 87.5 100 88.3 95.7 92.8 83.7 ...
##  $ Year_5                           : num  92.7 88.1 94.4 92 0 0 88.9 94.7 92 87.2 ...
##  $ Year_6                           : num  87.9 84.4 94.8 93.8 81.4 0 86.3 90.7 92.8 83.7 ...
##  $ Year_7                           : num  92.2 88.3 94.2 93.4 99 0 87.8 94.7 91.4 85.7 ...
##  $ Primary_Other                    : num  0 85.5 0 0 0 84.6 84.3 0 0 0 ...
##  $ Year_8                           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Year_9                           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Year_10                          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Year_11                          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Year_12                          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Secondary_Other                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Index_of_Educational_Disadvantage: int  7 4 7 5 5 3 4 7 7 6 ...

#Converting a Character variable to a Factor Variable and Labeling

mydata8 <-mydata7
mydata8$Type_of_school <- factor (c(mydata7$Type_of_School),levels = c("Aboriginal/Anangu Schools","Primary Education","Primary/Secondary Combined","Secondary Education","Special Education"),labels=c("ABO","PRIM","PRIM/SEC","SECONDRY","SPECIAL"))
mydata8$Type_of_school
##   [1] PRIM     PRIM     PRIM     PRIM     PRIM     SPECIAL  PRIM     PRIM    
##   [9] PRIM     PRIM     PRIM/SEC PRIM     PRIM     PRIM     PRIM     PRIM    
##  [17] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
##  [25] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM/SEC
##  [33] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
##  [41] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     SPECIAL 
##  [49] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM/SEC
##  [57] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     ABO      PRIM    
##  [65] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
##  [73] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM/SEC PRIM    
##  [81] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
##  [89] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
##  [97] ABO      PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [105] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [113] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [121] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [129] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [137] PRIM     PRIM     PRIM     PRIM/SEC PRIM     PRIM     PRIM     PRIM    
## [145] PRIM     PRIM/SEC PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [153] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [161] PRIM     PRIM     PRIM     PRIM     SPECIAL  PRIM     PRIM     PRIM    
## [169] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [177] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [185] SPECIAL  PRIM     PRIM     PRIM     PRIM     SECONDRY PRIM     PRIM    
## [193] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [201] SECONDRY PRIM/SEC PRIM/SEC PRIM     PRIM     PRIM     PRIM     PRIM    
## [209] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [217] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [225] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM/SEC PRIM    
## [233] PRIM/SEC PRIM     PRIM/SEC PRIM/SEC PRIM/SEC PRIM/SEC PRIM/SEC PRIM/SEC
## [241] PRIM     PRIM/SEC PRIM     PRIM/SEC PRIM/SEC PRIM     PRIM/SEC PRIM/SEC
## [249] PRIM/SEC PRIM/SEC PRIM     PRIM/SEC PRIM/SEC PRIM/SEC PRIM/SEC PRIM/SEC
## [257] PRIM/SEC PRIM     PRIM/SEC PRIM/SEC PRIM     PRIM     PRIM/SEC PRIM/SEC
## [265] PRIM/SEC SECONDRY SECONDRY SECONDRY PRIM/SEC PRIM/SEC PRIM/SEC SECONDRY
## [273] SECONDRY PRIM/SEC PRIM/SEC SECONDRY SECONDRY PRIM/SEC SECONDRY SECONDRY
## [281] SECONDRY SECONDRY SECONDRY SECONDRY SECONDRY SECONDRY SECONDRY SECONDRY
## [289] PRIM/SEC SECONDRY SECONDRY SECONDRY SECONDRY SECONDRY SECONDRY SECONDRY
## [297] SECONDRY SECONDRY SECONDRY SECONDRY SECONDRY SECONDRY SECONDRY PRIM    
## [305] SPECIAL  ABO      SECONDRY SECONDRY SECONDRY SECONDRY PRIM     PRIM    
## [313] PRIM     SECONDRY SECONDRY PRIM/SEC PRIM/SEC PRIM/SEC SECONDRY SECONDRY
## [321] PRIM     PRIM     PRIM/SEC PRIM     PRIM     PRIM     SECONDRY SECONDRY
## [329] SECONDRY SECONDRY PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [337] SECONDRY PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [345] PRIM/SEC SPECIAL  SECONDRY SECONDRY PRIM     PRIM     PRIM     PRIM    
## [353] PRIM     PRIM     SECONDRY SECONDRY PRIM     ABO      PRIM     PRIM    
## [361] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     SPECIAL  ABO     
## [369] ABO      ABO      PRIM     PRIM     ABO      SECONDRY SECONDRY SPECIAL 
## [377] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM/SEC
## [385] SECONDRY ABO      PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [393] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [401] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [409] PRIM     PRIM/SEC PRIM     PRIM     SECONDRY PRIM/SEC PRIM/SEC PRIM    
## [417] PRIM     PRIM     PRIM     PRIM     SECONDRY PRIM     PRIM     PRIM    
## [425] PRIM     PRIM     PRIM     PRIM     PRIM/SEC ABO      PRIM     PRIM    
## [433] SECONDRY PRIM     PRIM     SECONDRY ABO      PRIM     PRIM     PRIM    
## [441] PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM     PRIM    
## [449] SECONDRY SPECIAL  PRIM     PRIM     PRIM     SPECIAL  SPECIAL  SPECIAL 
## [457] SPECIAL  PRIM/SEC PRIM     PRIM/SEC PRIM/SEC PRIM     PRIM     PRIM    
## [465] PRIM/SEC ABO      ABO      PRIM     PRIM     PRIM     SPECIAL  PRIM    
## [473] PRIM     SECONDRY SECONDRY ABO      PRIM     PRIM     ABO      SECONDRY
## [481] SPECIAL  PRIM/SEC SECONDRY PRIM     PRIM     PRIM     PRIM     PRIM    
## [489] PRIM     PRIM     ABO      PRIM     PRIM     PRIM     PRIM/SEC PRIM/SEC
## [497] PRIM/SEC PRIM     SECONDRY PRIM     PRIM     PRIM     PRIM/SEC PRIM    
## [505] SECONDRY
## Levels: ABO PRIM PRIM/SEC SECONDRY SPECIAL

#Below Variables are included in the dataset

*Integer

*Factor with Labels

*Character

*Numerical

str(mydata8)
## 'data.frame':    505 obs. of  23 variables:
##  $ School_Number                    : int  104 106 107 111 114 116 120 123 124 127 ...
##  $ Census_Year                      : int  2020 2020 2020 2020 2020 2020 2020 2020 2020 2020 ...
##  $ School_Name.x                    : chr  "Coromandel Valley Primary School" "Cowandilla Primary School" "Crafers Primary School" "Crystal Brook Primary School" ...
##  $ Type_of_School                   : chr  "Primary Education" "Primary Education" "Primary Education" "Primary Education" ...
##  $ Suburb.x                         : chr  "Coromandel Valley" "Cowandilla" "Crafers" "Crystal Brook" ...
##  $ Postcode                         : int  5051 5033 5152 5523 5580 5253 5000 5153 5050 5039 ...
##  $ Reception                        : num  92.3 86.6 95 90.2 97.4 0 87.7 93.3 90.3 86.6 ...
##  $ Year_1                           : num  92.3 84.3 92.7 93.4 90.6 0 84.7 93.3 92.2 88 ...
##  $ Year_2                           : num  93.5 86.4 95.6 92.7 97.9 0 85.9 89.8 91 87.2 ...
##  $ Year_3                           : num  90.7 84.2 95.4 92.9 93 0 85.2 92.1 92.6 89.1 ...
##  $ Year_4                           : num  92.2 86.7 95.1 91.6 87.5 100 88.3 95.7 92.8 83.7 ...
##  $ Year_5                           : num  92.7 88.1 94.4 92 0 0 88.9 94.7 92 87.2 ...
##  $ Year_6                           : num  87.9 84.4 94.8 93.8 81.4 0 86.3 90.7 92.8 83.7 ...
##  $ Year_7                           : num  92.2 88.3 94.2 93.4 99 0 87.8 94.7 91.4 85.7 ...
##  $ Primary_Other                    : num  0 85.5 0 0 0 84.6 84.3 0 0 0 ...
##  $ Year_8                           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Year_9                           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Year_10                          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Year_11                          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Year_12                          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Secondary_Other                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Index_of_Educational_Disadvantage: int  7 4 7 5 5 3 4 7 7 6 ...
##  $ Type_of_school                   : Factor w/ 5 levels "ABO","PRIM","PRIM/SEC",..: 2 2 2 2 2 5 2 2 2 2 ...

Tidy & Manipulate Data I

#Replacing the zeros

library(dplyr)
mydata9 <- mydata8

mydata9 [mydata9 == 0] <- "_"



mydata13 <-mydata9


mydata13a <- sample_n(mydata9,6)


knitr:: kable(mydata13a, booktabs=T) %>%
  kable_styling(latex_options = c("striped","scale down","hold_position"))
School_Number Census_Year School_Name.x Type_of_School Suburb.x Postcode Reception Year_1 Year_2 Year_3 Year_4 Year_5 Year_6 Year_7 Primary_Other Year_8 Year_9 Year_10 Year_11 Year_12 Secondary_Other Index_of_Educational_Disadvantage Type_of_school
323 2020 Coorara Primary School Primary Education Morphett Vale 5162 88.8 91.3 85.9 85.8 87.1 84.2 84.3 78 83.1 _ _ _ _ _ _ 3 PRIM
449 2020 Uraidla Primary School Primary Education Uraidla 5142 93.2 89.8 89.9 93.1 92.4 94.7 92.7 92.9 _ _ _ _ _ _ _ 7 PRIM
1060 2020 O’Sullivan Beach Primary School Primary Education O’Sullivan Beach 5166 75 85.1 87.7 83.8 74.2 87.5 84.5 71.4 _ _ _ _ _ _ _ 2 PRIM
253 2020 Melrose Primary School Primary Education Melrose 5483 _ 95.9 88.8 95.5 85.1 92.3 79.9 72.5 _ _ _ _ _ _ _ 7 PRIM
1903 2020 Blair Athol North School B-7 Primary Education Blair Athol 5084 80.1 76.1 83.1 80.9 82 84.4 81.1 84.9 87.6 _ _ _ _ _ _ 2 PRIM
822 2020 Whyalla High School Secondary Education Whyalla 5600 _ _ _ _ _ _ _ _ _ 81.6 78.5 75.3 _ _ _ 2 SECONDRY

#Dropping/ Renaming and Ordering the Variables

library(dplyr)
mydata10 <- select(mydata9, -Census_Year, -Type_of_School)

mydata11 <- (rename(mydata10,School_No =School_Number,School=School_Name.x,Suburb=Suburb.x,Index_Edu_Disadvantage=Index_of_Educational_Disadvantage
))



col_order <- c("School_No","School","Type_of_school","Index_Edu_Disadvantage","Suburb","Postcode","Reception","Year_1","Year_2","Year_3","Year_4","Year_5","Year_6","Year_7","Primary_Other","Year_8","Year_9","Year_10","Year_11","Year_12","Secondary_Other")

mydata12 <- mydata11 %>% select(col_order)
mydata14 <- sample_n(mydata12,5)



#Styling the Table


kable(mydata14, booktabs=T) %>%
  kable_styling(latex_options = c("striped","scale down","hold_position"))
School_No School Type_of_school Index_Edu_Disadvantage Suburb Postcode Reception Year_1 Year_2 Year_3 Year_4 Year_5 Year_6 Year_7 Primary_Other Year_8 Year_9 Year_10 Year_11 Year_12 Secondary_Other
424 Tailem Bend Primary School PRIM 3 Tailem Bend 5260 88.2 84.7 87.9 86.3 92 87.9 87.7 87.6 _ _ _ _ _ _ _
237 Lyndoch Primary School PRIM 6 Lyndoch 5351 91.7 90.1 92.8 90.8 89.1 89.9 86.2 92.6 _ _ _ _ _ _ _
990 Augusta Park Primary School PRIM 1 Port Augusta 5700 74.1 76 77.3 76.7 70.1 80.8 68 74.4 66.6 _ _ _ _ _ _
740 Orroroo Area School PRIM/SEC 6 Orroroo 5431 91.9 98.3 94.7 95.4 89.7 95 89.3 95.4 _ 91.4 85.5 91.5 86.5 94 _
928 Grant High School SECONDRY 3 Mount Gambier 5290 _ _ _ _ _ _ _ _ _ 92.2 87.9 89.4 84.5 88.9 87.6

#Rechecking the Variable Types after the clean up

*Some variables have been turned back to Character when replacing the zeros with “_“.

summary(mydata12)
##    School_No         School           Type_of_school Index_Edu_Disadvantage
##  Min.   : 104.0   Length:505         ABO     : 16    Min.   :1.000         
##  1st Qu.: 387.0   Class :character   PRIM    :350    1st Qu.:3.000         
##  Median : 754.0   Mode  :character   PRIM/SEC: 60    Median :5.000         
##  Mean   : 770.9                      SECONDRY: 64    Mean   :4.327         
##  3rd Qu.:1018.0                      SPECIAL : 15    3rd Qu.:6.000         
##  Max.   :1919.0                                      Max.   :7.000         
##     Suburb             Postcode     Reception            Year_1         
##  Length:505         Min.   : 872   Length:505         Length:505        
##  Class :character   1st Qu.:5086   Class :character   Class :character  
##  Mode  :character   Median :5164   Mode  :character   Mode  :character  
##                     Mean   :5177                                        
##                     3rd Qu.:5351                                        
##                     Max.   :5734                                        
##     Year_2             Year_3             Year_4             Year_5         
##  Length:505         Length:505         Length:505         Length:505        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     Year_6             Year_7          Primary_Other         Year_8         
##  Length:505         Length:505         Length:505         Length:505        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##     Year_9            Year_10            Year_11            Year_12         
##  Length:505         Length:505         Length:505         Length:505        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  Secondary_Other   
##  Length:505        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

#Converting the Character Variables to Numeric

mydata15 <- mydata12

mydata15$Reception <- as.numeric (mydata12$Reception)

mydata15$Year_1 <- as.numeric (mydata12$Year_1)
mydata15$Year_2 <- as.numeric (mydata12$Year_2)
mydata15$Year_3 <- as.numeric (mydata12$Year_3)
mydata15$Year_4 <- as.numeric (mydata12$Year_4)
mydata15$Year_5 <- as.numeric (mydata12$Year_5)
mydata15$Year_6 <- as.numeric (mydata12$Year_6)
mydata15$Year_7 <- as.numeric (mydata12$Year_7)
mydata15$Primary_Other <- as.numeric (mydata14$Primary_Other)
mydata15$Year_8 <- as.numeric (mydata12$Year_8)
mydata15$Year_9 <- as.numeric (mydata12$Year_9)
mydata15$Year_10 <- as.numeric (mydata12$Year_10)
mydata15$Year_11 <- as.numeric (mydata12$Year_11)
mydata15$Year_12 <- as.numeric (mydata12$Year_12)
mydata15$Secondary_Other<- as.numeric (mydata12$Secondary_Other)

Tidy & Manipulate Data II

#Mutating Variables

Mutating below 3 variables

*Mean_Attendance_Rate

*Minimum_Attendance_Rate

*Maximum_Attendance_Rate
mydata16 <-mydata15

library(dplyr)

mydata17 <- mydata16 %>%
  
  
  mutate(Mean_Attendance_Rate=rowMeans(.[ , c("Reception","Year_1","Year_2","Year_3","Year_4","Year_5","Year_6","Year_7","Primary_Other","Year_8","Year_9","Year_10","Year_11","Year_12","Secondary_Other")], na.rm=TRUE))

mydata18 <- mydata17 %>%


  
  rowwise() %>% 
  

  
  mutate(Minimum_Attendance_Rate = min(Reception, Year_1,Year_2,Year_3,Year_4,Year_5,Year_6,Year_7,Primary_Other,Year_8,Year_9,Year_10,Year_11,Year_12,Secondary_Other, na.rm = T))  %>%

 
  
 mutate(Maximum_Attendance_Rate = max(Reception, Year_1,Year_2,Year_3,Year_4,Year_5,Year_6,Year_7,Primary_Other,Year_8,Year_9,Year_10,Year_11,Year_12,Secondary_Other, na.rm = T))

mydata19 <- head(mydata18,5)



#Styling the Table


kable(mydata19, booktabs=T) %>%
  kable_styling(latex_options = c("striped","scale_down","hold_position"))
School_No School Type_of_school Index_Edu_Disadvantage Suburb Postcode Reception Year_1 Year_2 Year_3 Year_4 Year_5 Year_6 Year_7 Primary_Other Year_8 Year_9 Year_10 Year_11 Year_12 Secondary_Other Mean_Attendance_Rate Minimum_Attendance_Rate Maximum_Attendance_Rate
104 Coromandel Valley Primary School PRIM 7 Coromandel Valley 5051 92.3 92.3 93.5 90.7 92.2 92.7 87.9 92.2 NA NA NA NA NA NA NA 91.72500 87.9 93.5
106 Cowandilla Primary School PRIM 4 Cowandilla 5033 86.6 84.3 86.4 84.2 86.7 88.1 84.4 88.3 NA NA NA NA NA NA NA 86.12500 84.2 88.3
107 Crafers Primary School PRIM 7 Crafers 5152 95.0 92.7 95.6 95.4 95.1 94.4 94.8 94.2 66.6 NA NA NA NA NA NA 91.53333 66.6 95.6
111 Crystal Brook Primary School PRIM 5 Crystal Brook 5523 90.2 93.4 92.7 92.9 91.6 92.0 93.8 93.4 NA NA NA NA NA NA NA 92.50000 90.2 93.8
114 Curramulka Primary School PRIM 5 Curramulka 5580 97.4 90.6 97.9 93.0 87.5 NA 81.4 99.0 NA NA NA NA NA NA NA 92.40000 81.4 99.0

#Merging the datasets

*This additional step had to be done because mydata18 has got the actual missing values as well as the values should be missed as well. The reason behind this was when converting to the numeric variable, the character variable which was “_” has converted to NA as well and its mixing up with the actual NA values.

*Therefore I will be merging the dataset -mydata12 which should have the missing values “_” in this format and actual missing values in NA format with the dataset - mydata18 where I can get the mutated variables

*Dropping unwanted variables

*Renaming variables

mydata19 <- merge(mydata12,mydata18, by="School_No")

mydata20 <-select(mydata19,-School.y,-Type_of_school.y,-Index_Edu_Disadvantage.y,-Suburb.y,-Postcode.y,-Reception.y,-Year_1.y,-Year_2.y,-Year_3.y,-Year_4.y,-Year_5.y,-Year_6.y,-Year_7.y,-Primary_Other.y,-Year_8.y,-Year_9.y,-Year_10.y,-Year_11.y,-Year_12.y,-Secondary_Other.y)

mydata21 <- rename(mydata20,School=School.x,Type_of_school =Type_of_school.x,Index_Edu_Disadvantage=Index_Edu_Disadvantage.x,Suburb=Suburb.x,Postcode=Postcode.x,Reception=Reception.x,Year_1=Year_1.x,Year_2=Year_2.x,Year_3=Year_3.x,Year_4=Year_4.x,Year_5=Year_5.x,Year_6=Year_6.x,Year_7=Year_7.x,Primary_Other=Primary_Other.x,Year_8=Year_8.x,Year_9=Year_9.x,Year_10=Year_10.x,Year_11=Year_11.x,Year_12=Year_12.x,Secondary_Other=Secondary_Other.x)

mydata22<-head(mydata21,5)


#Styling the Table


kable(mydata22, booktabs=T) %>%
  kable_styling(latex_options = c("striped","scale_down","hold_position"))
School_No School Type_of_school Index_Edu_Disadvantage Suburb Postcode Reception Year_1 Year_2 Year_3 Year_4 Year_5 Year_6 Year_7 Primary_Other Year_8 Year_9 Year_10 Year_11 Year_12 Secondary_Other Mean_Attendance_Rate Minimum_Attendance_Rate Maximum_Attendance_Rate
104 Coromandel Valley Primary School PRIM 7 Coromandel Valley 5051 92.3 92.3 93.5 90.7 92.2 92.7 87.9 92.2 _ _ _ _ _ _ _ 91.72500 87.9 93.5
106 Cowandilla Primary School PRIM 4 Cowandilla 5033 86.6 84.3 86.4 84.2 86.7 88.1 84.4 88.3 85.5 _ _ _ _ _ _ 86.12500 84.2 88.3
107 Crafers Primary School PRIM 7 Crafers 5152 95 92.7 95.6 95.4 95.1 94.4 94.8 94.2 _ _ _ _ _ _ _ 91.53333 66.6 95.6
111 Crystal Brook Primary School PRIM 5 Crystal Brook 5523 90.2 93.4 92.7 92.9 91.6 92 93.8 93.4 _ _ _ _ _ _ _ 92.50000 90.2 93.8
114 Curramulka Primary School PRIM 5 Curramulka 5580 97.4 90.6 97.9 93 87.5 _ 81.4 99 _ _ _ _ _ _ _ 92.40000 81.4 99.0

Scan I

#Identifying the Missing Values

mydata23 <- mydata21
options(max.print = 999999999)

#is.na(mydata23)#

#Count of Missing Values

sum(is.na(mydata23))
## [1] 14

#Calculating the Mean of Missing Value

mean(is.na(mydata23))
## [1] 0.001155116

#Finding where the data is missing

which(is.na(mydata23))
##  [1]  4137  5113  5147  5592  5730  6079  6123  6612  6627  6654  7402  7405
## [13]  7406 10981

#Replacing the missing values with Mean

mydata24<- mydata23
for (i in 1:ncol(mydata23)){mydata24[ ,i][is.na(mydata24[ ,i])] <- mean(mydata24[ ,i],na.rm=TRUE)}
mydata25 <- head(mydata24,15)


#Styling the Table


kable(mydata25, booktabs=T) %>%
  kable_styling(latex_options = c("striped","scale_down","hold_position"))
School_No School Type_of_school Index_Edu_Disadvantage Suburb Postcode Reception Year_1 Year_2 Year_3 Year_4 Year_5 Year_6 Year_7 Primary_Other Year_8 Year_9 Year_10 Year_11 Year_12 Secondary_Other Mean_Attendance_Rate Minimum_Attendance_Rate Maximum_Attendance_Rate
104 Coromandel Valley Primary School PRIM 7 Coromandel Valley 5051 92.3 92.3 93.5 90.7 92.2 92.7 87.9 92.2 _ _ _ _ _ _ _ 91.72500 87.9 93.5
106 Cowandilla Primary School PRIM 4 Cowandilla 5033 86.6 84.3 86.4 84.2 86.7 88.1 84.4 88.3 85.5 _ _ _ _ _ _ 86.12500 84.2 88.3
107 Crafers Primary School PRIM 7 Crafers 5152 95 92.7 95.6 95.4 95.1 94.4 94.8 94.2 _ _ _ _ _ _ _ 91.53333 66.6 95.6
111 Crystal Brook Primary School PRIM 5 Crystal Brook 5523 90.2 93.4 92.7 92.9 91.6 92 93.8 93.4 _ _ _ _ _ _ _ 92.50000 90.2 93.8
114 Curramulka Primary School PRIM 5 Curramulka 5580 97.4 90.6 97.9 93 87.5 _ 81.4 99 _ _ _ _ _ _ _ 92.40000 81.4 99.0
116 Murray Bridge Special School SPECIAL 3 Murray Bridge 5253 _ _ _ _ 100 _ _ _ 84.6 _ _ _ _ _ _ 100.00000 100.0 100.0
120 Gilles Street Primary School PRIM 4 Adelaide 5000 87.7 84.7 85.9 85.2 88.3 88.9 86.3 87.8 84.3 _ _ _ _ _ _ 86.85000 84.7 88.9
123 Echunga Primary School PRIM 7 Echunga 5153 93.3 93.3 89.8 92.1 95.7 94.7 90.7 94.7 _ _ _ _ _ _ _ 90.10000 66.6 95.7
124 Eden Hills Primary School PRIM 7 Eden Hills 5050 90.3 92.2 91 92.6 92.8 92 92.8 91.4 _ _ _ _ _ _ _ 91.88750 90.3 92.8
127 Edwardstown Primary School PRIM 6 Melrose Park 5039 86.6 88 87.2 89.1 83.7 87.2 83.7 85.7 _ _ _ _ _ _ _ 86.40000 83.7 89.1
128 Elliston Area School PRIM/SEC 5 Elliston 5670 81.6 80 93 87.3 91.9 96.6 92.1 89.4 _ 93.7 95.1 97.9 _ _ _ 90.78182 80.0 97.9
131 Norwood Primary School PRIM 7 Norwood 5067 89.7 90.3 92.7 91.7 91.8 91.9 90.6 87.9 _ _ _ _ _ _ _ 90.82500 87.9 92.7
135 Flinders Park Primary School PRIM 4 Flinders Park 5025 86.8 90.8 90.4 87.5 86.9 90.6 90.5 92.1 89.8 _ _ _ _ _ _ 86.91111 66.6 92.1
136 Lenswood Primary School PRIM 7 Lenswood 5240 94.6 86.1 91.4 93.2 91.2 93.4 91.6 87.6 _ _ _ _ _ _ _ 91.13750 86.1 94.6
138 Frances Primary School PRIM 7 Frances 5262 96.1 94.5 95.4 91.5 88.1 87.6 95.4 89.9 _ _ _ _ _ _ _ 92.31250 87.6 96.1

Scan II

#Detecting Outliers of numerical variables

Outliers are detected in the below variables

*Mean_Attendance_Rate

*Minimum_Attendance_Rate

*Maximum_Attendance_Rate

mydata26 <- select(mydata23,Mean_Attendance_Rate :Maximum_Attendance_Rate)


library(ggplot2)
par(mfrow = c(1, ncol(mydata26)))
invisible(lapply(1:ncol(mydata26), function(i) boxplot(mydata26[, i])))

#Finding the Outliers

*Finding the Outliers of the 3 variables seperatly.

Outliers1 <- boxplot(mydata26$Mean_Attendance_Rate,plot =FALSE)$out
Outliers1
##  [1] 100.00000  68.03636  68.86000  74.30000  60.00000  62.15000  74.48750
##  [8]  74.94615  72.78333  70.85000  64.12143  74.61250  68.12727  73.77778
## [15]  46.84286  42.33846  57.25000  68.68750  72.78333  47.11538  65.41250
## [22]  54.01538  53.64000  52.43846  52.88000  74.96250  74.04444  58.33750
## [29]  59.52308  58.60000  66.18000  53.43846  56.35000
Outliers2 <- boxplot(mydata26$Minimum_Attendance_Rate,plot =FALSE)$out
Outliers2
##  [1] 26.5 33.3 12.5 11.9 21.4 22.2  Inf 22.8 32.6 21.4 28.4 33.5  9.1 12.1
Outliers3 <- boxplot(mydata26$Maximum_Attendance_Rate,plot =FALSE)$out
Outliers3
##  [1] 74.3 71.7 79.0 78.6 76.2 80.5 79.1 80.8 80.8 72.8 56.3 78.5 -Inf 75.2 79.3
## [16] 63.1 72.6 61.6 81.0 79.3 72.4 71.7 75.3 66.9

#Summary of Outliers

summary(Outliers1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   42.34   56.35   65.41   64.30   72.78  100.00
summary(Outliers2)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.10   14.72   22.50     Inf   31.55     Inf
summary(Outliers3)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    -Inf   71.70   75.25    -Inf   79.15   81.00

#Remove Outliers - Mean_Attendance_Rate Variable

*Replacing the Outliers1 with the Median for Mean_Attendance_Rate Variable

*When comparing the Before & After boxplot charts, its clear that outliers have been controlled

mydata27 <- mydata26

mydata27$Mean_Attendance_Rate[mydata27$Mean_Attendance_Rate %in% boxplot(mydata27$Mean_Attendance_Rate)$out]<-median(mydata27$Mean_Attendance_Rate)

boxplot(mydata27$Mean_Attendance_Rate)

#Remove Outliers - Minimum_Attendance_Rate Variable

*Replacing the Outliers2 with the Median for Minimum_Attendance_Rate Variable

*When comparing the Before & After boxplot charts, its clear that outliers have been controlled

mydata28 <- mydata27

mydata28$Minimum_Attendance_Rate[mydata28$Minimum_Attendance_Rate %in% boxplot(mydata28$Minimum_Attendance_Rate)$out]<-median(mydata28$Minimum_Attendance_Rate)

boxplot(mydata28$Minimum_Attendance_Rate)

#Remove Outliers - Maximum_Attendance_Rate Variable

*Replacing the Outliers3 with the Median for Minimum_Attendance_Rate Variable

*When comparing the Before & After boxplot charts, its clear that outliers have been controlled

mydata29 <- mydata28

mydata29$Maximum_Attendance_Rate[mydata29$Maximum_Attendance_Rate %in% boxplot(mydata29$Maximum_Attendance_Rate)$out]<-median(mydata29$Maximum_Attendance_Rate)

boxplot(mydata29$Maximum_Attendance_Rate)

Transform

#Histogram for Mean_Attendance_Rate Variable

*Most appropriate Transformation is the Right skewed transformation which converts likely into a normal distribution `

#Applying the sqrt Transformation

sqrt_att <- sqrt (mydata29$Mean_Attendance_Rate)



hist(sqrt_att, 
     main = "Histogram of the Square Root of Mean Attendance Rate", 
     xlab = "Square Root of Mean Attendance Rate")

right_skew <- mydata29
right_skew <- right_skew$Mean_Attendance_Rate

hist(right_skew, 
     main = "Histogram of right-skewed dataset", 
     xlab = "Right-skewed Mean Attendance Rate")

right_skew_recip <- 1 / right_skew

hist(right_skew_recip)

#Histogram for Minimum_Attendance_Rate Variable

*Most appropriate Transformation is the Left skewed transformation which converts likely into a normal distribution. `

#Applying the sqrt Transformation

sqrt_min <- sqrt (mydata29$Minimum_Attendance_Rate)



hist(sqrt_min, 
     main = "Histogram of the Square Root of Minimum Attendance Rate", 
     xlab = "Square Root of Minimum Attendance Rate")

left_skewed <- mydata29
left_skewed <- left_skewed[,-1]

hist(left_skewed$Minimum_Attendance_Rate,
     
     main= "Histogram of left-skewed dataset",
     xlab="Left-skewed Minimum Attendance Rate")

right_skew <- mydata29
right_skew <- right_skew$Minimum_Attendance_Rate

hist(right_skew, 
     main = "Histogram of right-skewed dataset", 
     xlab = "Right-skewed Minimum Attendance Rate")

right_skew_recip <- 1 / right_skew

hist(right_skew_recip)

#Histogram for Maximum_Attendance_Rate Variable

*Most appropriate Transformation is the left skewed transformation which converts likely into a normal distribution

#Applying the sqrt Transformation

sqrt_max <- sqrt (mydata29$Maximum_Attendance_Rate)



hist(sqrt_max, 
     main = "Histogram of the Square Root of Maximum Attendance Rate", 
     xlab = "Square Root of Maximum Attendance Rate")

left_skewed <- mydata29
left_skewed <- left_skewed[,-1]

hist(left_skewed$Maximum_Attendance_Rate,
     
     main= "Histogram of left-skewed dataset",
     xlab="Left-skewed Maximum Attendance Rate")

right_skew <- mydata29
right_skew <- right_skew$Maximum_Attendance_Rate

hist(right_skew, 
     main = "Histogram of right-skewed dataset", 
     xlab = "Right-skewed Maximum Attendance Rate")

right_skew_recip <- 1 / right_skew

hist(right_skew_recip)

References

*Government of South Australia(2020)Attendance rate by school by year level 2020,Data SA,accessed 12 April 2022,https://data.sa.gov.au/data/dataset/attendance-percentage-by-school-by-year-level

*Government of South Australia(2020)Index of Educational Disadvantage by school,Data SA,accessed 15 April 2022,https://data.sa.gov.au/data/dataset/index-of-disadvantage-by-school

*Zhu, Hao. 2021. kableExtra: Construct Complex Table with Kable and Pipe Syntax.

Reflective journal