Lab 1

For this project, you will be working with data that are available through the Chicago Data Portal (https://data.cityofchicago.org/). Specifically, you will be working with data related to various characteristics of Chicago Public Schools. The data file for this project is available through the Google Classroom site and is titled CPS1617.csv. The file contains information for 661 CPS schools. You should use the data contained in this file and your knowledge of R to answer the following questions. You must submit a document that includes your answers and also the R code you used to produce the answers. Your material should be submitted through the Google Classroom site.

Objectives

  1. Use the data contained in the file.
  2. Answer 8 questions based on your R experimentation.

Questions

  1. Looking across all schools, what is the frequency distribution (ungrouped and grouped) of the total number of students?

  2. Looking at just elementary schools, what is the frequency distribution (ungrouped and grouped) of the total number of students?

  3. Looking at just high schools, what is the frequency distribution (ungrouped and grouped) of the total number of students?

    • For the grouped distribution, choose the bin size that you believe best communicates the results.
  4. What is the frequency distribution of Average ACT score across schools for which it is reported?

  5. Report the frequencies (and relative frequencies) for the following variables:

    • School Type
    • Primary Category
    • Overall School Rating
  6. For the following variables, report the mean, median, mode, variance, standard deviation, minimum, and maximum values:

    • Total number of students (across all schools)
    • Total number of students (separately for elementary, middle, and high schools)
    • ACT score
    • College Enrollment Rate
    • Graduation Rate
  7. Convert the Total Number of Students across all schools to z-scores. What School has the largest z-score and what is the z-score? What School has the smallest z-score and what is the z-score?

  8. Produce an ungrouped frequency distribution using the z-scores you computed. How does this distribution compare to the one you computed to answer Question 3?

Question 1

Looking across all schools, what is the frequency distribution (ungrouped and grouped) of the total number of students?

Ungrouped
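#Load the data (assumes CPS1617.csv is in the working directory)
example <- read.csv("CPS1617.csv", stringsAsFactors = TRUE)
#First specify the bin sequence (one bin per student count)
bins <- seq(0, 4500, by=1)
#Ask for ungrouped frequency distribution 
hist(example$Student_Count_Total, breaks = bins, main = "Total Students", xlab = "# of Students", col = "grey")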

Grouped
#First specify complex sequence
bins <- seq(0, 4500, by=100)
#Then ask for the histogram
hist(example$Student_Count_Total, breaks = bins, main = "Total Students", xlab = "# of Students", col = "grey")
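
The histograms above show the distributions graphically. If a tabular frequency distribution is also wanted, one possible approach is base R's table() together with cut(). This is only a sketch: the vector student_count is made-up data standing in for example$Student_Count_Total, and the bin width of 500 is an arbitrary choice for the illustration.

```r
# Hypothetical student counts standing in for example$Student_Count_Total
student_count <- c(120, 250, 250, 310, 480, 480, 480, 905, 1210)

# Ungrouped: one entry per distinct value
ungrouped <- table(student_count)

# Grouped: bins of width 500; right = FALSE makes each bin [lower, upper)
breaks <- seq(0, 1500, by = 500)
grouped <- table(cut(student_count, breaks = breaks, right = FALSE, dig.lab = 4))

print(ungrouped)
print(grouped)
```

With right = FALSE, a value that falls exactly on a break point is counted in the bin that starts at that break.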

Question 2

Looking at just elementary schools, what is the frequency distribution (ungrouped and grouped) of the total number of students?

Ungrouped
#Creates Data set of all elementary schools
#Selects subset Primary_Category that equals "ES" and puts it into "elementary" data set
elementary <- example[example$Primary_Category == "ES",]
#Next specify complex sequence
bins <- seq(0, 4500, by=1)
#Ask for ungrouped frequency distribution 
hist(elementary$Student_Count_Total, breaks = bins, main = "Total Students: elementary schools", xlab = "# of Students", col = "grey")

Grouped
#Creates Data set of all elementary schools
elementary <- example[example$Primary_Category == "ES",]
#Next specify complex sequence
bins <- seq(0, 4500, by=100)
#Ask for grouped frequency distribution 
hist(elementary$Student_Count_Total, breaks = bins, main = "Total Students: elementary schools", xlab = "# of Students", col = "grey")

Question 3

Looking at just high schools, what is the frequency distribution (ungrouped and grouped) of the total number of students?

Ungrouped
#Creates Data set of all high schools
highschool <- example[example$Primary_Category == "HS",]
#Next specify complex sequence
bins <- seq(0, 4500, by=1)
#Ask for ungrouped frequency distribution 
hist(highschool$Student_Count_Total, breaks = bins, main = "Total Students: high schools", xlab = "# of Students", col = "grey")

Grouped
#Creates Data set of all high schools
highschool <- example[example$Primary_Category == "HS", ]
#Specify high school values and remove outliers that are above 2500
no_outliers <- highschool[highschool$Student_Count_Total < 2500, ]
#Next specify complex sequence
bins <- seq(0, 2500, by=100)
#Ask for grouped frequency distribution 
hist(no_outliers$Student_Count_Total, breaks = bins, main = "Total Students: high schools", xlab = "# of Students", col = "grey")

For the grouped distribution, choose the bin size that you believe best communicates the results.

I chose a bin width of 100 over the range 0 to 2,500. After the outliers (schools with 2,500 or more students) are removed, this range fits all of the remaining data, and I believe it best communicates the results.

Question 4

What is the frequency distribution of Average ACT score across schools for which it is reported?

Ungrouped
#Creates Data set of all schools that have reported their ACT scores (drops missing values)
reporting_schools <- example[!is.na(example$Average_ACT_School), ]
#Next specify complex sequence
bins <- seq(12, 32, by=1)
#Ask for ungrouped frequency distribution 
hist(reporting_schools$Average_ACT_School, breaks = bins, main = "Average ACT Score", xlab = "ACT Score", col = "grey")

Grouped
#Creates Data set of all schools that have reported their ACT scores (drops missing values)
reporting_schools <- example[!is.na(example$Average_ACT_School), ]
#Next specify complex sequence
bins <- seq(12, 32, by=4)
#Ask for grouped frequency distribution 
hist(reporting_schools$Average_ACT_School, breaks = bins, main = "Average ACT Score", xlab = "ACT Score", col = "grey")

Question 5

Report the frequencies (and relative frequencies) for the following variables:

School Type
#Describes the frequency and relative percentages
prettyR::describe(example$School_Type)
## 
##  Factor 
##          
## x         Neighborhood Charter Magnet Small Citywide-Option
##   Count         400.00  123.00  43.00 26.00           21.00
##   Percent        60.51   18.61   6.51  3.93            3.18
##          
## x         Selective enrollment Regional gifted center Special Education
##   Count                  11.00                  10.00              8.00
##   Percent                 1.66                   1.51              1.21
##          
## x         Military academy Classical Career academy Contract
##   Count               6.00      5.00           4.00     4.00
##   Percent             0.91      0.76           0.61     0.61
## Mode Neighborhood
Primary Category
#Describes the frequency and relative percentages
prettyR::describe(example$Primary_Category)
## 
##  Factor 
##          
## x            ES     HS    MS
##   Count   470.0 181.00 10.00
##   Percent  71.1  27.38  1.51
## Mode ES
Overall School Rating
#Describes the frequency and relative percentages
prettyR::describe(example$Overall_Rating)
## 
##  Factor 
##          
## x         Level 1+ Level 1 Level 2+ Level 2 Inability to Rate Level 3
##   Count     204.00  200.00   134.00   87.00             27.00    9.00
##   Percent    30.86   30.26    20.27   13.16              4.08    1.36
## Mode Level 1+
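
The same kind of count and percent summary can probably also be produced with base R alone, using table() and prop.table(). The sketch below uses a made-up vector school_type in place of example$School_Type, and the freq_table data frame is a name chosen only for this illustration.

```r
# Hypothetical school types standing in for example$School_Type
school_type <- c("Neighborhood", "Charter", "Neighborhood", "Magnet",
                 "Neighborhood", "Charter", "Neighborhood", "Neighborhood")

counts <- table(school_type)                    # frequencies
percent <- round(100 * prop.table(counts), 2)   # relative frequencies (%)

# Combine into one data frame, sorted from most to least frequent
freq_table <- data.frame(
  category = names(counts),
  count    = as.vector(counts),
  percent  = as.vector(percent)
)
freq_table <- freq_table[order(-freq_table$count), ]
print(freq_table)
```

The first row of the sorted table is the mode of the variable.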

Question 6

For the following variables, report the mean, median, mode, variance, standard deviation, minimum, and maximum values:

Total number of students (across all schools)
#mean
mean(example$Student_Count_Total)
## [1] 559.0076
#median
median(example$Student_Count_Total)
## [1] 462
#mode
prettyR::Mode(example$Student_Count_Total)
## [1] ">1 mode"
#variance
var(example$Student_Count_Total)
## [1] 159203.1
#standard deviation
sd(example$Student_Count_Total)
## [1] 399.0026
#min value
min(example$Student_Count_Total)
## [1] 0
#max value
max(example$Student_Count_Total)
## [1] 4447
#get all values
psych::describe(example$Student_Count_Total)
##    vars   n   mean  sd median trimmed    mad min  max range skew kurtosis
## X1    1 661 559.01 399    462  503.27 266.87   0 4447  4447 3.07    19.13
##       se
## X1 15.52
Total number of students (separately for elementary, middle, and high schools)
#get all values
psych::describeBy(example$Student_Count_Total, example$Primary_Category)
## 
##  Descriptive statistics by group 
## group: ES
##    vars   n   mean     sd median trimmed    mad min  max range skew
## X1    1 470 537.14 271.56    471  504.51 226.84   4 1562  1558 1.17
##    kurtosis    se
## X1     1.37 12.53
## -------------------------------------------------------- 
## group: HS
##    vars   n   mean     sd median trimmed    mad min  max range skew
## X1    1 181 615.77 617.82    387   507.4 370.65   0 4447  4447 2.54
##    kurtosis    se
## X1     9.68 45.92
## -------------------------------------------------------- 
## group: MS
##    vars  n  mean     sd median trimmed    mad min  max range skew kurtosis
## X1    1 10 559.3 328.94    453  520.62 232.77 215 1213   998 0.78    -0.89
##        se
## X1 104.02
#highschool mode
highschool <- example[example$Primary_Category == "HS", ]
DescTools::Mode(highschool$Student_Count_Total)
## [1] 202 322
#elementary mode
elementary <- example[example$Primary_Category == "ES", ]
DescTools::Mode(elementary$Student_Count_Total)
## [1] 419
#middle mode
middle <- example[example$Primary_Category == "MS", ]
DescTools::Mode(middle$Student_Count_Total)
##  [1]  215  289  316  334  422  484  603  718  999 1213
ACT score
#get all values
psych::describe(example$Average_ACT_School)
##    vars   n  mean   sd median trimmed  mad  min  max range skew kurtosis
## X1    1 169 16.78 3.07   16.3    16.4 2.67 12.5 29.6  17.1 1.54      3.4
##      se
## X1 0.24
#Mode
DescTools::Mode(example$Average_ACT_School)
## [1] 14.6
College Enrollment Rate
#get all values
psych::describe(example$College_Enrollment_Rate_School)
##    vars   n mean    sd median trimmed   mad min  max range  skew kurtosis
## X1    1 159 48.9 25.23   51.1   49.67 28.76   0 93.4  93.4 -0.27    -1.05
##    se
## X1  2
#Mode
DescTools::Mode(example$College_Enrollment_Rate_School)
##  [1]  0.0  7.4  8.3 12.5 16.7 18.2 34.4 37.8 44.8 47.4 63.0 63.2 65.8 76.9
## [15] 79.6 80.4
Graduation Rate
#get all values
psych::describe(example$Graduation_Rate_School)
##    vars   n  mean    sd median trimmed  mad min  max range  skew kurtosis
## X1    1 121 72.89 19.36   76.3      76 13.2   0 97.8  97.8 -1.91     4.33
##      se
## X1 1.76
#Mode
DescTools::Mode(example$Graduation_Rate_School)
##  [1] 50.0 57.9 68.8 71.3 71.5 74.3 80.9 82.4 82.8 83.3 83.5 84.6 85.2 85.4
## [15] 91.2

Question 7

Convert the Total Number of Students across all schools to z-scores. What School has the largest z-score and what is the z-score? What School has the smallest z-score and what is the z-score?

Work
#standardizes set
example$zstudentcount <- scale(example$Student_Count_Total, center = TRUE, scale = TRUE)
#describes standardized set
psych::describe(example$zstudentcount)
##    vars   n mean sd median trimmed  mad  min  max range skew kurtosis   se
## X1    1 661    0  1  -0.24   -0.14 0.67 -1.4 9.74 11.15 3.07    19.13 0.04
#creates a table of all the schools with only the zstudentcount score
ex2 <- subset(example, select=c(Long_Name, zstudentcount))
#Sort based on z-scores (low to high, first case)
sort1.example <- ex2[order(example$zstudentcount) , ]
sort1.example[1, ]
##           Long_Name zstudentcount
## 142 YCCS-Virtual HS     -1.401012
#Sort based on z-scores (high to low, first case)
sort1.example <- ex2[order(-example$zstudentcount) , ]
sort1.example[1, ]
##                               Long_Name zstudentcount
## 660 Albert G Lane Technical High School      9.744277
Answer

YCCS-Virtual HS has the smallest z-score (-1.40). Albert G Lane Technical High School has the largest z-score (9.74).
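
As a cross-check on the scale() approach, a z-score can also be computed directly as (x - mean) / sd, and which.max() and which.min() can pick out the extreme rows without sorting. This is a minimal sketch under stated assumptions: the schools data frame and its values are made up, and only the column names mirror the ones used above.

```r
# Hypothetical data frame standing in for example
schools <- data.frame(
  Long_Name = c("School A", "School B", "School C", "School D"),
  Student_Count_Total = c(100, 400, 500, 1000)
)

# z = (x - mean) / sd, using the sample standard deviation as scale() does
x <- schools$Student_Count_Total
schools$z <- (x - mean(x)) / sd(x)

# Same numbers as scale(); as.vector() drops the matrix attributes
stopifnot(isTRUE(all.equal(schools$z, as.vector(scale(x)))))

schools[which.max(schools$z), c("Long_Name", "z")]  # largest z-score
schools[which.min(schools$z), c("Long_Name", "z")]  # smallest z-score
```

which.max() and which.min() return only the first extreme row, so ties would need a separate check.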

Question 8

Produce an ungrouped frequency distribution using the z-scores you computed. How does this distribution compare to the one you computed to answer Question 3?

Ungrouped frequency distribution
#Create histogram
hist(ex2$zstudentcount, main = "Total number of students: Standardized", xlab = "z score", col = "grey")

How does this distribution compare to the one you computed to answer Question 3?

The shape looks the same. Converting to z-scores only shifts the mean to zero and rescales the standard deviation to 1, so the x-axis now shows z-scores instead of student counts. Overall, the distribution is essentially unchanged.
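
One way to check that standardizing does not change the shape is to bin the raw values and the z-scores with equivalent break points and compare the counts. The sketch below does this on made-up data. The vector x and the bin width of 500 are assumptions chosen for the illustration, not values from the CPS file.

```r
# Hypothetical right-skewed counts standing in for example$Student_Count_Total
x <- c(100, 200, 250, 300, 350, 400, 450, 600, 900, 2500)
z <- (x - mean(x)) / sd(x)

# Bin the raw values, then map the same break points onto the z scale
raw_breaks <- seq(0, 2500, by = 500)
z_breaks <- (raw_breaks - mean(x)) / sd(x)

# plot = FALSE returns the bin counts without drawing the histogram
raw_counts <- hist(x, breaks = raw_breaks, plot = FALSE)$counts
z_counts <- hist(z, breaks = z_breaks, plot = FALSE)$counts

print(raw_counts)
print(z_counts)
```

Because a z-score is a linear transformation of the original value, each school stays in the corresponding bin, so the two sets of counts should match.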