Introduction

In this report, we analyze data from 14 Chicago high schools. Each record includes identifying fields (school ID, name, school type, and address) followed by a set of performance metrics. The full data set is shown below.
| school_id | name_of_school | elementary_middle_or_high_school | street_address | city | zip_code | safety_score | family_involvement_score | environment_score | instruction_score | teachers_score | parent_engagement_score | parent_environment_score | rate_of_misconducts_per_100_students | graduation_rate_percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 610381 | Bronzeville Scholastic Academy High School | HS | 4934 S Wabash Ave | Chicago | 60615 | 41 | 38 | 42 | 43 | 15 | 51 | 43 | 16.2 | 85.3 |
| 609679 | Charles Allen Prosser Career Academy High School | HS | 2148 N Long Ave | Chicago | 60639 | 59 | 47 | 53 | 51 | 44 | 46 | 45 | 9.7 | 78.7 |
| 609753 | Chicago High School for Agricultural Sciences | HS | 3857 W 111th St | Chicago | 60655 | 87 | 77 | 49 | 47 | 40 | 52 | 55 | 5.2 | 70.7 |
| 609735 | Edward Tilden Career Community Academy High School | HS | 4747 S Union Ave | Chicago | 60609 | 34 | 57 | 45 | 37 | 35 | 51 | 47 | 13.0 | 41.1 |
| 609682 | Ellen H Richards Career Academy High School | HS | 5009 S Laflin St | Chicago | 60609 | 30 | 23 | 32 | 19 | 34 | 43 | 47 | 17.3 | 57.3 |
| 610383 | Greater Lawndale High School For Social Justice | HS | 3120 S Kostner Ave | Chicago | 60623 | 43 | 72 | 46 | 44 | 65 | 53 | 46 | 16.4 | 67.0 |
| 609766 | Jacqueline B Vaughn Occupational High School | HS | 4355 N Linder Ave | Chicago | 60641 | 57 | 72 | 33 | 20 | 69 | 63 | 57 | 3.7 | 65.3 |
| 609694 | John Hancock College Preparatory High School | HS | 4034 W 56th St | Chicago | 60629 | 51 | 44 | 44 | 41 | 42 | 50 | 44 | 7.5 | 56.1 |
| 609769 | Ray Graham Training Center High School | HS | 2347 S Wabash Ave | Chicago | 60616 | 90 | 82 | 52 | 39 | 52 | 53 | 58 | 3.7 | 10.3 |
| 609695 | Roald Amundsen High School | HS | 5110 N Damen Ave | Chicago | 60625 | 51 | 44 | 43 | 42 | 41 | 43 | 41 | 8.7 | 64.6 |
| 609733 | Roger C Sullivan High School | HS | 6631 N Bosworth Ave | Chicago | 60626 | 30 | 44 | 34 | 34 | 33 | 44 | 45 | 14.7 | 39.3 |
| 609755 | Whitney M Young Magnet High School | HS | 211 S Laflin St | Chicago | 60607 | 95 | 80 | 69 | 67 | 46 | 53 | 50 | 1.2 | 93.9 |
| 609711 | William Rainey Harper High School | HS | 6520 S Wood St | Chicago | 60636 | 22 | 32 | 39 | 42 | 31 | 49 | 53 | 63.6 | 37.0 |
| 610392 | World Language Academy High School | HS | 3120 S Kostner Ave | Chicago | 60623 | 51 | 33 | 49 | 47 | 49 | 50 | 46 | 4.0 | 69.4 |

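We will focus on the numeric performance metrics and explore the associations (or lack thereof) between them. Although school_id and zip_code are also numeric, they are identifiers rather than measurements, so they are left out. The fields we will use are shown below.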

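| safety_score | family_involvement_score | environment_score | instruction_score | teachers_score | parent_engagement_score | parent_environment_score | rate_of_misconducts_per_100_students | graduation_rate_percent |
|---|---|---|---|---|---|---|---|---|
| 41 | 38 | 42 | 43 | 15 | 51 | 43 | 16.2 | 85.3 |
| 59 | 47 | 53 | 51 | 44 | 46 | 45 | 9.7 | 78.7 |
| 87 | 77 | 49 | 47 | 40 | 52 | 55 | 5.2 | 70.7 |
| 34 | 57 | 45 | 37 | 35 | 51 | 47 | 13.0 | 41.1 |
| 30 | 23 | 32 | 19 | 34 | 43 | 47 | 17.3 | 57.3 |
| 43 | 72 | 46 | 44 | 65 | 53 | 46 | 16.4 | 67.0 |
| 57 | 72 | 33 | 20 | 69 | 63 | 57 | 3.7 | 65.3 |
| 51 | 44 | 44 | 41 | 42 | 50 | 44 | 7.5 | 56.1 |
| 90 | 82 | 52 | 39 | 52 | 53 | 58 | 3.7 | 10.3 |
| 51 | 44 | 43 | 42 | 41 | 43 | 41 | 8.7 | 64.6 |
| 30 | 44 | 34 | 34 | 33 | 44 | 45 | 14.7 | 39.3 |
| 95 | 80 | 69 | 67 | 46 | 53 | 50 | 1.2 | 93.9 |
| 22 | 32 | 39 | 42 | 31 | 49 | 53 | 63.6 | 37.0 |
| 51 | 33 | 49 | 47 | 49 | 50 | 46 | 4.0 | 69.4 |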
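A minimal sketch of how this numeric subset might be built in R follows. It assumes the full data set sits in a data frame named cchs (a hypothetical name; only cchs_num appears in the model output later in this report) and, to stay short, types in only the first two schools. Because school_id and zip_code are stored as numbers, selecting columns by type with Filter(is.numeric, cchs) would keep them, so the sketch picks the metric columns by name instead.

```r
# Hypothetical data frame 'cchs': an excerpt holding only the first two schools.
cchs <- data.frame(
  school_id = c(610381, 609679),
  name_of_school = c("Bronzeville Scholastic Academy High School",
                     "Charles Allen Prosser Career Academy High School"),
  elementary_middle_or_high_school = c("HS", "HS"),
  street_address = c("4934 S Wabash Ave", "2148 N Long Ave"),
  city = c("Chicago", "Chicago"),
  zip_code = c(60615, 60639),
  safety_score = c(41, 59),
  family_involvement_score = c(38, 47),
  environment_score = c(42, 53),
  instruction_score = c(43, 51),
  teachers_score = c(15, 44),
  parent_engagement_score = c(51, 46),
  parent_environment_score = c(43, 45),
  rate_of_misconducts_per_100_students = c(16.2, 9.7),
  graduation_rate_percent = c(85.3, 78.7)
)

# The nine performance metrics, listed explicitly by name.
metric_cols <- c("safety_score", "family_involvement_score", "environment_score",
                 "instruction_score", "teachers_score", "parent_engagement_score",
                 "parent_environment_score", "rate_of_misconducts_per_100_students",
                 "graduation_rate_percent")

# Selecting by type also keeps the numeric identifier columns.
by_type <- Filter(is.numeric, cchs)

# Selecting by name keeps only the metrics.
cchs_num <- cchs[, metric_cols]

# The identifier columns that type-based selection would have kept.
setdiff(names(by_type), names(cchs_num))
```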

Analysis

I’ll start with a correlation matrix giving the correlation coefficient between every pair of metrics, rounded to two decimal places.
| | safety_score | family_involvement_score | environment_score | instruction_score | teachers_score | parent_engagement_score | parent_environment_score | rate_of_misconducts_per_100_students | graduation_rate_percent |
|---|---|---|---|---|---|---|---|---|---|
| safety_score | 1.00 | 0.77 | 0.75 | 0.51 | 0.37 | 0.39 | 0.48 | -0.62 | 0.20 |
| family_involvement_score | 0.77 | 1.00 | 0.52 | 0.30 | 0.59 | 0.65 | 0.59 | -0.46 | -0.01 |
| environment_score | 0.75 | 0.52 | 1.00 | 0.90 | 0.15 | 0.14 | 0.08 | -0.36 | 0.35 |
| instruction_score | 0.51 | 0.30 | 0.90 | 1.00 | -0.08 | -0.04 | -0.15 | -0.11 | 0.44 |
| teachers_score | 0.37 | 0.59 | 0.15 | -0.08 | 1.00 | 0.58 | 0.41 | -0.38 | -0.02 |
| parent_engagement_score | 0.39 | 0.65 | 0.14 | -0.04 | 0.58 | 1.00 | 0.65 | -0.22 | 0.09 |
| parent_environment_score | 0.48 | 0.59 | 0.08 | -0.15 | 0.41 | 0.65 | 1.00 | 0.04 | -0.37 |
| rate_of_misconducts_per_100_students | -0.62 | -0.46 | -0.36 | -0.11 | -0.38 | -0.22 | 0.04 | 1.00 | -0.30 |
| graduation_rate_percent | 0.20 | -0.01 | 0.35 | 0.44 | -0.02 | 0.09 | -0.37 | -0.30 | 1.00 |
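The matrix above can likely be reproduced with base R's cor(), sketched below under the assumption that Pearson correlation (cor()'s default method) was used; the values it gives agree with the table for the four pairs examined in the rest of this report. The data are typed in from the table of numeric fields so the example runs on its own.

```r
# The nine metric columns for all 14 schools, copied from the numeric table.
cchs_num <- data.frame(
  safety_score = c(41, 59, 87, 34, 30, 43, 57, 51, 90, 51, 30, 95, 22, 51),
  family_involvement_score = c(38, 47, 77, 57, 23, 72, 72, 44, 82, 44, 44, 80, 32, 33),
  environment_score = c(42, 53, 49, 45, 32, 46, 33, 44, 52, 43, 34, 69, 39, 49),
  instruction_score = c(43, 51, 47, 37, 19, 44, 20, 41, 39, 42, 34, 67, 42, 47),
  teachers_score = c(15, 44, 40, 35, 34, 65, 69, 42, 52, 41, 33, 46, 31, 49),
  parent_engagement_score = c(51, 46, 52, 51, 43, 53, 63, 50, 53, 43, 44, 53, 49, 50),
  parent_environment_score = c(43, 45, 55, 47, 47, 46, 57, 44, 58, 41, 45, 50, 53, 46),
  rate_of_misconducts_per_100_students = c(16.2, 9.7, 5.2, 13.0, 17.3, 16.4, 3.7,
                                           7.5, 3.7, 8.7, 14.7, 1.2, 63.6, 4.0),
  graduation_rate_percent = c(85.3, 78.7, 70.7, 41.1, 57.3, 67.0, 65.3,
                              56.1, 10.3, 64.6, 39.3, 93.9, 37.0, 69.4)
)

# Pearson correlation (the default method of cor()), rounded to two decimals.
cm <- round(cor(cchs_num), 2)
print(cm)
```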

The correlation matrix above is symmetric about its main diagonal (top left to bottom right), so every pair of fields appears twice. To avoid this repetition, I’ll create a correlation plot that shows each association only once. In the plot, red indicates a positive correlation and blue a negative one, with deeper colors for stronger associations; white indicates little or no association.
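Because the matrix is symmetric, the unique pairs are exactly the cells above (or below) the diagonal. The sketch below uses base R's upper.tri() to list each pair once and rank the pairs by absolute correlation, which is one way to pick out the strongest associations. To keep it short, it uses a three-variable excerpt of the printed matrix. For the plot itself, packages such as corrplot offer a type = "lower" option that draws only one triangle, though the report does not say which plotting package it used.

```r
# A 3 x 3 excerpt of the rounded correlation matrix printed above.
vars <- c("safety_score", "family_involvement_score",
          "rate_of_misconducts_per_100_students")
cm <- matrix(c( 1.00,  0.77, -0.62,
                0.77,  1.00, -0.46,
               -0.62, -0.46,  1.00),
             nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Keep only the cells above the diagonal, so each pair appears once.
idx <- which(upper.tri(cm), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r = cm[idx]
)

# Sort from strongest to weakest association, ignoring the sign.
pairs <- pairs[order(-abs(pairs$r)), ]
print(pairs, row.names = FALSE)
```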

Positive Associations

1. Instruction Score and Environment Score

There is a strong positive association (0.9) between the instruction score and the environment score. Below is a scatter plot with a linear regression line to represent the association.

[Figure: scatter plot of instruction score and environment score with a fitted linear regression line]

The fitted model's call and coefficients are shown below.

## 
## Call:
## lm(formula = cchs_num$environment_score ~ cchs_num$instruction_score, 
##     data = mpg)
## 
## Coefficients:
##                (Intercept)  cchs_num$instruction_score  
##                    15.4210                      0.7227

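The fitted line is y = 0.7227x + 15.4210, where x is the instruction score and y is the environment score: each additional point of instruction score is associated with about 0.72 more points of environment score.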
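The fit can be reproduced as sketched below. The sketch passes data = cchs_num and names the columns directly, which is the idiomatic form of the lm() call. The data = mpg in the output above looks like a leftover (mpg is an example data set shipped with ggplot2); it has no effect there because the formula refers to the columns through cchs_num$. Only the two columns involved are typed in, and the plotting part is an approximation of the report's unshown plotting code that runs only if ggplot2 is installed.

```r
# Environment and instruction scores for the 14 schools (from the table above).
cchs_num <- data.frame(
  environment_score = c(42, 53, 49, 45, 32, 46, 33, 44, 52, 43, 34, 69, 39, 49),
  instruction_score = c(43, 51, 47, 37, 19, 44, 20, 41, 39, 42, 34, 67, 42, 47)
)

# Regress environment score (y) on instruction score (x), naming columns directly.
fit <- lm(environment_score ~ instruction_score, data = cchs_num)
coef(fit)

# Optional scatter plot with the regression line; stating formula = y ~ x
# explicitly silences the "`geom_smooth()` using formula" message.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cchs_num, aes(x = instruction_score, y = environment_score)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x)
}
```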

2. Safety Score and Family Involvement Score

There is a strong positive association (0.77) between the safety score and the family involvement score. Below is a scatter plot with a linear regression line to represent the association.

[Figure: scatter plot of safety score and family involvement score with a fitted linear regression line]

The fitted model's call and coefficients are shown below.

## 
## Call:
## lm(formula = cchs_num$safety_score ~ cchs_num$family_involvement_score, 
##     data = mpg)
## 
## Coefficients:
##                       (Intercept)  cchs_num$family_involvement_score  
##                            5.1842                             0.8972

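The fitted line is y = 0.8972x + 5.1842, where x is the family involvement score and y is the safety score: each additional point of family involvement is associated with about 0.90 more points of safety score.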
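The slope of a simple regression line is tied directly to the correlation: it equals r multiplied by the ratio of the standard deviations of y and x. The sketch below refits this model in base R (typing in only the two columns involved) and checks that identity, which is why a strong correlation does not by itself imply a steep line.

```r
# Safety and family involvement scores for the 14 schools (from the table above).
cchs_num <- data.frame(
  safety_score = c(41, 59, 87, 34, 30, 43, 57, 51, 90, 51, 30, 95, 22, 51),
  family_involvement_score = c(38, 47, 77, 57, 23, 72, 72, 44, 82, 44, 44, 80, 32, 33)
)

# Regress safety score (y) on family involvement score (x).
fit <- lm(safety_score ~ family_involvement_score, data = cchs_num)
slope <- coef(fit)[["family_involvement_score"]]

# In simple linear regression, the slope equals r * sd(y) / sd(x).
r <- cor(cchs_num$safety_score, cchs_num$family_involvement_score)
slope_from_r <- r * sd(cchs_num$safety_score) / sd(cchs_num$family_involvement_score)
all.equal(slope, slope_from_r)
```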

Negative Associations

1. Safety Score and Rate of Misconducts Per 100 Students

There is a strong negative association (-0.62) between the safety score and the rate of misconducts per 100 students. Below is a scatter plot with a linear regression line to represent the association.

[Figure: scatter plot of safety score and rate of misconducts per 100 students with a fitted linear regression line]

The fitted model's call and coefficients are shown below.

## 
## Call:
## lm(formula = cchs_num$rate_of_misconducts_per_100_students ~ 
##     cchs_num$safety_score, data = mpg)
## 
## Coefficients:
##           (Intercept)  cchs_num$safety_score  
##               35.2307                -0.4161

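The fitted line is y = -0.4161x + 35.2307, where x is the safety score and y is the rate of misconducts per 100 students: each additional point of safety score is associated with about 0.42 fewer misconducts per 100 students.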
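The fitted equation can be used to read off an expected misconduct rate for a given safety score. The sketch below does this with predict() for a hypothetical school scoring 50 on safety; the value 50 is arbitrary and chosen only for illustration. With only 14 schools, such predictions are rough, and one misconduct rate (63.6, at the lowest safety score) sits far above the rest and can strongly influence the line.

```r
# Safety scores and misconduct rates for the 14 schools (from the table above).
cchs_num <- data.frame(
  safety_score = c(41, 59, 87, 34, 30, 43, 57, 51, 90, 51, 30, 95, 22, 51),
  rate_of_misconducts_per_100_students = c(16.2, 9.7, 5.2, 13.0, 17.3, 16.4, 3.7,
                                           7.5, 3.7, 8.7, 14.7, 1.2, 63.6, 4.0)
)

# Regress misconduct rate (y) on safety score (x).
fit <- lm(rate_of_misconducts_per_100_students ~ safety_score, data = cchs_num)

# Predicted misconduct rate for a hypothetical school with a safety score of 50.
new_school <- data.frame(safety_score = 50)
pred <- predict(fit, newdata = new_school)
pred
```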

2. Family Involvement Score and Rate of Misconducts Per 100 Students

There is a moderate negative association (-0.46) between the family involvement score and the rate of misconducts per 100 students. Below is a scatter plot with a linear regression line to represent the association.

[Figure: scatter plot of family involvement score and rate of misconducts per 100 students with a fitted linear regression line]

The fitted model's call and coefficients are shown below.

## 
## Call:
## lm(formula = cchs_num$rate_of_misconducts_per_100_students ~ 
##     cchs_num$family_involvement_score, data = mpg)
## 
## Coefficients:
##                       (Intercept)  cchs_num$family_involvement_score  
##                           32.4568                            -0.3617

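The fitted line is y = -0.3617x + 32.4568, where x is the family involvement score and y is the rate of misconducts per 100 students: each additional point of family involvement is associated with about 0.36 fewer misconducts per 100 students.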
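A useful check on how much of the variation a single predictor explains: in simple linear regression, the model's R-squared equals the squared correlation. The sketch below refits this model in base R (typing in only the two columns involved) and compares summary(fit)$r.squared with cor()^2; with r near -0.46, the line accounts for only a modest share of the variation in misconduct rates.

```r
# Family involvement scores and misconduct rates for the 14 schools (from the table above).
cchs_num <- data.frame(
  family_involvement_score = c(38, 47, 77, 57, 23, 72, 72, 44, 82, 44, 44, 80, 32, 33),
  rate_of_misconducts_per_100_students = c(16.2, 9.7, 5.2, 13.0, 17.3, 16.4, 3.7,
                                           7.5, 3.7, 8.7, 14.7, 1.2, 63.6, 4.0)
)

# Regress misconduct rate (y) on family involvement score (x).
fit <- lm(rate_of_misconducts_per_100_students ~ family_involvement_score,
          data = cchs_num)
coef(fit)

# For a simple regression, R-squared equals the squared correlation.
r <- cor(cchs_num$family_involvement_score,
         cchs_num$rate_of_misconducts_per_100_students)
r_squared <- summary(fit)$r.squared
c(r = r, r_squared = r_squared)
```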