Our dataset is the Chicago Public Schools Progress Report Cards data for 2011-2012. We used three maps and graphs to visualize the information. Together they tell a small story about CPS schools, their safety, and their environment.
Below is a map of the Chicago area with the location of each school plotted on it. The color of each circle shows the safety score of the school, with red representing a lower safety score and blue representing a higher one. The size of each circle also represents the safety score: the larger the circle, the safer the school. The intention of the map is to show where the safer schools are located.
From this map we see a band of blue schools in the north, curving up and to the left from Lake Michigan. The South Side of Chicago has mostly red dots, suggesting that the schools there are less safe.
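library(leaflet)
# Read the CPS progress report data (local path on the author's machine)
jake <- read.csv("C:/Users/Kajal/Downloads/Chicago_Public_Schools_-_Progress_Report_Cards__2011-2012_.csv")
# Rescale the safety score by 10 so it can be used directly as a marker radius
jake$Safety.Score <- jake$Safety.Score / 10
# Schools with no safety score are drawn in green
jake$Color <- rep("green", nrow(jake))
# Scored schools run from red (low) to blue (high); rgb() needs values in [0, 1]
has_score <- !is.na(jake$Safety.Score)
jake$Color[has_score] <- rgb(1 - jake$Safety.Score[has_score] / 10, 0, jake$Safety.Score[has_score] / 10)
m <- leaflet(jake) %>%
  setView(lng = -87.6298, lat = 41.8781, zoom = 12) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~Longitude, lat = ~Latitude, radius = ~Safety.Score, color = ~Color)
m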
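The color encoding described above can be checked on its own, without the map. The following is a minimal sketch, assuming safety scores on a 0 to 100 scale; the helper name `score_to_color` and the example scores are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: map a safety score (assumed 0-100) to a red-to-blue hex color.
# Missing scores get a fallback color, mirroring the green used on the map.
score_to_color <- function(score, fallback = "green") {
  colors <- rep(fallback, length(score))
  has_score <- !is.na(score)
  scaled <- score[has_score] / 100   # rgb() expects intensities in [0, 1]
  colors[has_score] <- rgb(1 - scaled, 0, scaled)
  colors
}

# Illustrative scores only, not taken from the dataset
score_to_color(c(0, 100, NA))
```

The lowest score maps to pure red, the highest to pure blue, and a missing score falls back to green.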
Below is a highcharter scatter plot of the Safety Score against the Environment Score, color-coded by the Safety Icon. The environment score represents the support found in the student's environment. The groups in the plot are the safety levels: a high safety score is labeled “Very Strong” and a low safety score is labeled “Very Weak”. We chose to segment the data by these levels to show the distinct positive correlation: as the students' support and environment improve, safety does as well. This also raises the question of what “safe” means. The CPS data does not clearly define how safety is measured, but safety could also include receiving essentials like warmth, food, and comfort at school that students may not receive at home.
library(highcharter)
jake$Safety.Icon <- factor(jake$Safety.Icon, levels = c("Very Weak", "Weak", "Average", "Strong", "Very Strong", "NDA"))
hchart(jake[!is.na(jake$Safety.Score) & !is.na(jake$Environment.Score), ], "scatter", hcaes(x = Safety.Score, y = Environment.Score, group = Safety.Icon)) %>%
  hc_title(text = "Safety Score and Environment Score grouped by Safety Icon")
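One way to put a number on the positive relationship described above is a correlation coefficient. This is only a sketch: the `toy` data frame is made up for illustration, so it implies nothing about the real dataset's values.

```r
# Toy values for illustration only; the column names follow the dataset
toy <- data.frame(
  Safety.Score      = c(2, 4, NA, 6, 8),
  Environment.Score = c(30, 35, 50, NA, 70)
)

# Pearson correlation, using only rows where both scores are present
r <- cor(toy$Safety.Score, toy$Environment.Score, use = "complete.obs")
r > 0
```

With the real data, the same call could be applied to `jake$Safety.Score` and `jake$Environment.Score`.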
# Strip the percent sign so attendance can be treated as a number
jake$Average.Student.Attendance <- as.numeric(sub("%", "", jake$Average.Student.Attendance))
jake$Average.Teacher.Attendance <- as.numeric(sub("%", "", jake$Average.Teacher.Attendance))
# Treat a teacher attendance of 0 as missing
jake$Average.Teacher.Attendance[jake$Average.Teacher.Attendance == 0] <- NA
hchart(jake, "scatter", hcaes(x = Average.Student.Attendance, y = Average.Teacher.Attendance)) %>%
  hc_title(text = "Student Attendance Versus Teacher Attendance")
Finally, here is another highcharter scatter plot, showing the relationship between student attendance and teacher attendance. The points concentrate where both student and teacher attendance are high. For the second plot below, we fit a loess curve. For student attendance between 60% and 85%, teacher attendance stays within a narrower range. Beyond that, the curve changes from a plateau to an upward slope, suggesting that very high teacher attendance and very high student attendance are correlated.
require(dplyr)
require(broom)
jake$Average.Student.Attendance<-as.numeric(sub("%","",jake$Average.Student.Attendance))
jake$Average.Teacher.Attendance<-as.numeric(sub("%","",jake$Average.Teacher.Attendance))
jake$Average.Teacher.Attendance[jake$Average.Teacher.Attendance==0] <- NA
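# Work with a reproducible random sample of 300 schools
set.seed(123)
jake <- sample_n(jake, 300)
# Fit a loess smoother of teacher attendance on student attendance
modlss <- loess(Average.Teacher.Attendance ~ Average.Student.Attendance, data = jake)
# Sort the fitted values by the x variable so the spline is drawn left to right
fit <- arrange(augment(modlss), Average.Student.Attendance)
hchart(jake, "scatter", hcaes(x = Average.Student.Attendance, y = Average.Teacher.Attendance)) %>%
  hc_add_series(fit, type = "spline", hcaes(x = Average.Student.Attendance, y = .fitted),
                name = "Fit", id = "fit")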
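Another way to draw a loess curve is to predict it on an evenly spaced grid of x-values, so the line is ordered from left to right by construction. The sketch below uses R's built-in `cars` data as a stand-in for the attendance columns; the names `mod`, `grid`, and `curve_df` are illustrative and not part of the original analysis.

```r
# Fit loess on the built-in cars data (a stand-in for the attendance data)
mod <- loess(dist ~ speed, data = cars)

# Evenly spaced x-values across the observed range
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 50))

# Predicted curve, already ordered by x
curve_df <- data.frame(speed = grid$speed, fitted = predict(mod, newdata = grid))
head(curve_df)
```

A data frame like `curve_df` could then be passed to `hc_add_series()` in place of the augmented fit.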