Packages and Data

library(tidyverse)

setwd("C:/Users/chank/OneDrive - University of Illinois Chicago/PA 343 - Fusi/Week 9/Assignment")
data1 = read_csv("data1.csv")
data = read_csv("data.csv")

Unit of Observation

Each row (the unit of observation) represents one school, together with three characteristics measured for that school.

The three characteristics are:

  • Percentage of students eligible for subsidized meals (meals)
  • Percentage of parents with college degrees (colgrad)
  • Percentage of fully qualified teachers (fullqual)

Cleaning up the data

data1 = pivot_wider(data, 
                    id_cols = c("uid", "community_schooltype", "county", "api"), 
                    names_from = "variable_name", 
                    values_from ="percentage") %>% 
  separate(community_schooltype, into = c("community", "schooltype"))
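
The reshaping step above can be tried on a tiny made-up table. This is only a sketch: it assumes the raw file is in long format (one row per school and variable) and that community_schooltype joins its two parts with an underscore. The toy values, the object names toy_long and toy_wide, and the explicit sep = "_" are all assumptions, not taken from the real data.

```r
library(tidyverse)

# Hypothetical long-format data: one row per school and variable.
toy_long <- tibble(
  uid = c(1, 1, 1, 2, 2, 2),
  community_schooltype = c(rep("Urban_E", 3), rep("Rural_H", 3)),
  county = c(rep("A", 3), rep("B", 3)),
  api = c(rep(700, 3), rep(650, 3)),
  variable_name = rep(c("meals", "colgrad", "fullqual"), 2),
  percentage = c(40, 25, 90, 60, 10, 85)
)

# Spread the three variables into columns, then split the combined label.
toy_wide <- toy_long %>%
  pivot_wider(
    id_cols = c("uid", "community_schooltype", "county", "api"),
    names_from = "variable_name",
    values_from = "percentage"
  ) %>%
  separate(community_schooltype, into = c("community", "schooltype"), sep = "_")

toy_wide
```

After the reshape, each school occupies a single row, which is what makes the one-row-per-school description of the unit of observation hold.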

Exploratory Plots to Visualize Descriptive Statistics

Academic Performance Index - API

data1 %>% 
  ggplot() + 
  geom_density(mapping = aes(x = api)) +
  
  labs(title = "Distribution of the Academic Performance Index", 
       x ="Academic Performance Index - API")

summary(data1$api)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   336.8   593.6   684.1   680.9   770.1   958.5

The density plot is roughly bell-shaped, which is consistent with the output of summary(): the mean (680.9) and the median (684.1) are nearly equal, as expected for an approximately symmetric distribution.

Percentage of Students Eligible for Subsidized Meals

data1 %>% 
  ggplot() + 
  geom_density(mapping = aes(x = meals)) +
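  # Mean in green, median in red, so the two reference lines can be told apart
  geom_vline(xintercept = mean(data1$meals), color = "green") + 
  geom_vline(xintercept = median(data1$meals), color = "red") +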
  
  labs(title = "Distribution of % of Students Eligible for Subsidized Meals", 
       x = "% of Students Eligible for Subsidized Meals")

summary(data1$meals)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  0.00659 25.14071 49.40577 49.75635 74.44822 99.98998

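Unlike the API plot, this distribution is not bell-shaped; however, it is still consistent with the output of summary(). The mean (49.76) and the median (49.41) are nearly equal, and the quartiles sit close to 25%, 50%, and 75%, which is what a fairly even spread across the 0-100% range would produce.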

Percentage of Parents with College Degrees

data1 %>% 
  ggplot() + 
  geom_density(mapping = aes(x = colgrad)) +
  
  labs(title = "Distribution of % of Parents with a College Degree", 
       x = "% of Parents with College Degrees")

summary(data1$colgrad)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  0.06105  9.71679 18.12907 20.84864 29.44429 86.48472

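The density peaks at around 12.5% of parents with college degrees and then declines substantially as the percentage increases. The distribution is right-skewed: the mean (20.85) exceeds the median (18.13), and the maximum (86.48) lies far above the third quartile (29.44).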

Percentage of Fully Qualified Teachers

data1 %>% 
  ggplot() + 
  geom_density(mapping = aes(x =fullqual)) +
  
  geom_vline(xintercept = mean(data1$fullqual), 
             color = "green") + 
  geom_vline(xintercept = median(data1$fullqual), color = "red") +
  
  labs(title = "Distribution of % of Fully Qualified Teachers", 
       x = "% of Fully Qualified Teachers")

summary(data1$fullqual)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15.12   81.61   91.82   87.51   97.48  100.00

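The density rises steeply toward 100%, so the distribution is left-skewed: most schools have a high percentage of fully qualified teachers, and the mean (87.51) falls below the median (91.82) because a small number of schools have very low percentages (the minimum is 15.12).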
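
The shape of a distribution can be checked roughly by comparing the mean with the median, as the summary() outputs in this section allow. The sketch below uses made-up percentages and a hypothetical helper, skew_hint(); the mean-versus-median comparison is only a rule of thumb, not a formal skewness test.

```r
# Hypothetical percentages, bunched near 100 with a few low values.
fullqual_toy <- c(15, 60, 82, 90, 92, 95, 97, 98, 100, 100)

# Rule of thumb: a mean below the median hints at left skew,
# a mean above the median hints at right skew.
skew_hint <- function(x) {
  gap <- mean(x) - median(x)
  if (gap < 0) {
    "left-skewed"
  } else if (gap > 0) {
    "right-skewed"
  } else {
    "symmetric"
  }
}

skew_hint(fullqual_toy)
```

Applied to the real columns, the same comparison would be expected to flag fullqual as left-skewed and colgrad as right-skewed, in line with the summaries reported above.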

Distribution of Schools Based on Community Type

data1 %>% 
  ggplot() + 
  geom_bar(mapping = aes(x = community)) +
  
  labs(title = "Number of Schools by Community Type", 
       x = "Community Type: Urban, Suburban or Rural", 
       y = "Number of Schools")

This bar plot shows that suburban communities have the largest number of schools. Urban communities have about half as many schools as suburban communities, and rural communities have about half as many as urban communities (roughly 25% of the suburban count).
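
The ratios read off the bar plot can also be computed directly. The sketch below uses a made-up community column; the object names toy and community_counts and the counts themselves are illustrative assumptions, chosen only to mirror the rough 4:2:1 pattern described above.

```r
library(tidyverse)

# Hypothetical community labels; in the report these come from data1$community.
toy <- tibble(
  community = c(rep("Suburban", 8), rep("Urban", 4), rep("Rural", 2))
)

# Count schools per community type and express each count
# relative to the largest group.
community_counts <- toy %>%
  count(community, name = "n_schools") %>%
  mutate(ratio_to_largest = n_schools / max(n_schools)) %>%
  arrange(desc(n_schools))

community_counts
```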

Distribution of Schools by Grades/Types

data1 %>% 
  ggplot() + 
  geom_bar(mapping = aes(x = schooltype)) +
  
  labs(title = "Number of Schools by Type", 
       x = "School Type: Elementary, Middle or High School", 
       y = "Number of Schools")

This plot shows that elementary schools make up the large majority of schools, far outnumbering middle and high schools.

Correlation Between the Academic Performance Index (API) and Parents’ Education

data1 %>% 
  ggplot() + 
  geom_point(mapping = aes(x = colgrad, 
                           y = api)) + 
  labs(title = "Correlation between the API and Parents' Education", 
       x = "% of School Parents with College Degrees", 
       y = "Academic Performance Index - API")

cor(data1$api, data1$colgrad)
## [1] 0.6894666

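The plot and the correlation coefficient (about 0.69) indicate a moderately strong positive association: schools with a higher percentage of parents with college degrees tend to have a higher Academic Performance Index. This is an association across schools, not evidence of a causal effect.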
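
A correlation coefficient describes the strength of a linear relationship but not its size in API points. One way to get that is a simple linear fit, sketched below on made-up values; toy and fit are hypothetical names, and the numbers are chosen so that the slope is easy to verify by hand.

```r
library(tidyverse)

# Hypothetical schools: API rises by 4 points per percentage point of colgrad.
toy <- tibble(
  colgrad = c(5, 10, 20, 30, 40),
  api = c(620, 640, 680, 720, 760)
)

# Ordinary least squares fit of API on the percentage of college-educated parents.
fit <- lm(api ~ colgrad, data = toy)

# The slope is the expected change in API per one-point increase in colgrad.
slope <- coef(fit)[["colgrad"]]
slope
```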

Breaking down the correlation between API and Parents’ Education by School Type

school_type = c(
  E = "Elementary School", 
  M = "Middle School", 
  H = "High School")

data1 %>% 
  ggplot() + 
  geom_point(mapping = aes(x = colgrad, 
                           y = api, 
                           color = community)) + 
  labs(title = "Correlation Between the API and Parents' Education", 
       x = "% of School Parents with College Degrees", 
       y = "Academic Performance Index - API") +
  facet_wrap(~ schooltype, labeller = labeller(schooltype = school_type))

Breaking the plot down by school type (E, M, H) shows that the relationship is similar across the three types; however, the panel for elementary schools (E) is much denser, as one would expect from the previous plots.
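
The visual comparison across panels could be backed up by computing the correlation separately for each school type. The sketch below does this on a small made-up table; toy, cor_by_type, and every value in it are assumptions for illustration only.

```r
library(tidyverse)

# Hypothetical data: four schools per school type.
toy <- tibble(
  schooltype = rep(c("E", "M", "H"), each = 4),
  colgrad = c(5, 15, 25, 35, 10, 20, 30, 40, 8, 18, 28, 38),
  api = c(600, 650, 720, 780, 610, 640, 700, 760, 590, 660, 690, 750)
)

# One correlation coefficient (and group size) per school type.
cor_by_type <- toy %>%
  group_by(schooltype) %>%
  summarise(n_schools = n(), r = cor(api, colgrad))

cor_by_type
```

Reporting the group size next to each coefficient helps, because a coefficient based on many elementary schools is more stable than one based on far fewer high schools.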

Top 10 Schools with the Highest and Lowest Academic Performance

data1 %>% 
  arrange(desc(api)) %>% 
  # Rank schools from highest (1) to lowest (n) API without hard-coding the row count
  mutate(ranking = row_number()) %>% 
  # Keep the 10 highest-ranked and the 10 lowest-ranked schools
  filter(ranking <= 10 | ranking > n() - 10) %>% 
  mutate(position = if_else(ranking <= 10, "top", "bottom")) %>%
  
  ggplot() + 
  geom_col(mapping = aes(x = api, 
                         y = as.character(uid), 
                         fill = position)) + 
  labs(title = "The Gap Between Top and Lowest Performers", 
       subtitle = "Top and Bottom 10 Schools by API", 
       x = "Academic Performance Index - API", 
       y = "School Unique Identifier") + 
  geom_vline(xintercept = mean(data1$api), color = "red")

The 10 highest-performing schools are shown in blue and the 10 lowest-performing schools in red. The red vertical line marks the mean API across all schools in the data set, so both groups can be compared against the average.
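
Selecting the top and bottom schools can also be done without an explicit ranking column. The sketch below is one alternative, assuming a dplyr version that provides slice_max() and slice_min(); the table toy and its values are made up.

```r
library(tidyverse)

# Hypothetical data: 50 schools with API values from 400 to 890.
toy <- tibble(uid = 1:50, api = seq(400, 890, by = 10))

# Take the 10 highest and the 10 lowest API values and label each group.
top_bottom <- bind_rows(
  toy %>% slice_max(api, n = 10, with_ties = FALSE) %>% mutate(position = "top"),
  toy %>% slice_min(api, n = 10, with_ties = FALSE) %>% mutate(position = "bottom")
)

top_bottom
```

Setting with_ties = FALSE keeps exactly 10 rows per group even when several schools share the same API value.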

Correlation Between Percentage of Fully Qualified Teachers and Academic Performance Index (API)

data1 %>% 
  ggplot() + 
  geom_point(mapping = aes(x = fullqual, 
                           y = api))

The scatter plot suggests a positive association: schools with a higher percentage of fully qualified teachers tend to have a higher Academic Performance Index.
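
The strength of this relationship is not quantified in the report. A minimal sketch of how it could be checked with cor.test() follows; the vectors fullqual_toy and api_toy are made-up values, so the result says nothing about the real data.

```r
# Hypothetical values for eight schools.
fullqual_toy <- c(60, 70, 80, 85, 90, 95, 98, 100)
api_toy <- c(560, 600, 640, 690, 700, 740, 760, 800)

# Pearson correlation with a significance test.
res <- cor.test(fullqual_toy, api_toy)

r <- unname(res$estimate)  # correlation coefficient
p <- res$p.value           # p-value for the null hypothesis of zero correlation

c(r = r, p = p)
```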