Assignment Direction 2
The data comes from the Chicago Data Portal and describes Chicago schools. It includes each school’s name, safety score, family involvement score, misconduct rate, graduation rate, and more.
Fields:
school_id
name_of_school
elementary_middle_or_high_school
street_address
city
zip_code
safety_score
family_involvement_score
environment_score
instruction_score
teachers_score
parent_engagement_score
parent_environment_score
rate_of_misconducts_per_100_students
graduation_rate_percent
Assignment Direction 3
We examine the strength of the linear correlation between every pair of numerical variables. The correlation coefficient matrix is shown below.
Correlation Matrix:
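A minimal sketch of how such a matrix could be computed in R. The data frame built here (`cchs_demo`) is invented stand-in data, not the real Chicago records, so the printed values will not match the report; only the column names are borrowed from it.

```r
# Hypothetical stand-in for the real `cchs` data frame; all values are invented.
set.seed(1)
n <- 30
safety <- runif(n, 20, 80)
cchs_demo <- data.frame(
  name_of_school = paste("School", seq_len(n)),  # non-numeric column
  safety_score = safety,
  family_involvement_score = safety + rnorm(n, sd = 10),
  rate_of_misconducts_per_100_students = 70 - 0.8 * safety + rnorm(n, sd = 8)
)

# Keep only the numeric columns, then compute pairwise Pearson correlations.
# use = "pairwise.complete.obs" skips missing values pair by pair.
numeric_cols <- cchs_demo[sapply(cchs_demo, is.numeric)]
cor_matrix <- cor(numeric_cols, use = "pairwise.complete.obs")
print(round(cor_matrix, 3))
```

Each cell holds the correlation between the row variable and the column variable, so the diagonal is always 1 and the matrix is symmetric.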
Assignment Direction 4
Strong, positive association
The graph indicates a strong, positive correlation between environment_score and instruction_score, since its correlation coefficient is 0.897. In this report, any coefficient greater than 0.5 is considered a strong, positive correlation. The scatterplot points follow a fairly linear, upward pattern, and the linear regression line confirms that this is an example of a strong, positive association. As environment_score increases, instruction_score tends to increase as well.
## [1] 0.8967704
## `geom_smooth()` using formula = 'y ~ x'
##
## Call:
## lm(formula = instruction_score ~ environment_score, data = cchs)
##
## Coefficients:
## (Intercept) environment_score
## -9.146 1.113
Formula: y = 1.113x - 9.146, where x is environment_score and y is instruction_score
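The output above is consistent with three steps: computing the correlation, fitting a linear model, and drawing a scatterplot with a fitted line. The sketch below walks through those steps, assuming ggplot2 was used for the plot (the `geom_smooth()` message suggests so). The data is simulated, so the numbers will differ from the report.

```r
# Simulated stand-in data; the real analysis uses the `cchs` data frame.
set.seed(42)
env <- runif(50, 20, 90)
demo <- data.frame(
  environment_score = env,
  instruction_score = -9 + 1.1 * env + rnorm(50, sd = 6)
)

# Step 1: Pearson correlation coefficient between the two scores.
r <- cor(demo$environment_score, demo$instruction_score)

# Step 2: least-squares line,
# instruction_score = intercept + slope * environment_score.
fit <- lm(instruction_score ~ environment_score, data = demo)
print(r)
print(coef(fit))

# Step 3: scatterplot with the fitted line (only if ggplot2 is installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(demo, aes(x = environment_score, y = instruction_score)) +
    geom_point() +
    geom_smooth(method = "lm", se = FALSE)
  print(p)
}
```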
————————————————————–
This graph shows a strong, positive correlation between family_involvement_score and safety_score, since its correlation coefficient is 0.769. Any coefficient greater than 0.5 is considered a strong, positive correlation. Although the points do not follow a straight, upward pattern as closely as in the previous graph, the correlation coefficient shows that this is still an example of a strong, positive association. As family_involvement_score increases, safety_score tends to increase as well.
## [1] 0.7692211
## `geom_smooth()` using formula = 'y ~ x'
##
## Call:
## lm(formula = safety_score ~ family_involvement_score, data = cchs)
##
## Coefficients:
## (Intercept) family_involvement_score
## 5.1842 0.8972
Formula: y = 0.8972x + 5.1842, where x is family_involvement_score and y is safety_score
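To show how the fitted line can be read, the sketch below plugs a value into the equation above. The helper name `predict_safety` and the example input of 50 are illustrative choices, not part of the original analysis.

```r
# Coefficients copied from the lm() output above.
intercept <- 5.1842
slope <- 0.8972

# Hypothetical helper: predicted safety_score for a given family_involvement_score.
predict_safety <- function(family_involvement_score) {
  slope * family_involvement_score + intercept
}

# Predicted safety score for a school with a family involvement score of 50.
predicted <- predict_safety(50)
print(predicted)

# Each additional point of family involvement adds `slope` points to the prediction.
print(predict_safety(51) - predict_safety(50))
```

Predictions like this are most trustworthy for inputs within the range of scores that appear in the data.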
—————————————————————-
Strong, negative association
This graph shows a strong, negative association between rate_of_misconducts_per_100_students and safety_score: as the misconduct rate on the x-axis increases, the safety score on the y-axis decreases. This makes sense, since schools with higher safety scores tend to have lower rates of misconduct. The correlation coefficient is -0.623, and any coefficient less than -0.5 is considered a strong, negative association.
## [1] -0.6225672
## `geom_smooth()` using formula = 'y ~ x'
##
## Call:
## lm(formula = safety_score ~ rate_of_misconducts_per_100_students,
## data = cchs)
##
## Coefficients:
## (Intercept) rate_of_misconducts_per_100_students
## 65.2308 -0.9315
Formula: y = -0.9315x + 65.2308, where x is rate_of_misconducts_per_100_students and y is safety_score
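The correlation coefficient is the same whichever variable is placed on the x-axis, but the regression line is not: the model above predicts safety_score from the misconduct rate, and swapping the roles would give a different slope. A small sketch with simulated data (not the real `cchs` values) illustrates the difference.

```r
# Simulated stand-in data with a negative relationship.
set.seed(7)
misconduct <- runif(40, 0, 60)
safety <- 65 - 0.9 * misconduct + rnorm(40, sd = 12)

# Correlation does not depend on the order of the variables.
r_xy <- cor(misconduct, safety)
r_yx <- cor(safety, misconduct)

# Regression slopes do depend on which variable is the response.
slope_safety_on_misconduct <- coef(lm(safety ~ misconduct))[["misconduct"]]
slope_misconduct_on_safety <- coef(lm(misconduct ~ safety))[["safety"]]

print(c(r_xy = r_xy, r_yx = r_yx))
print(c(slope_safety_on_misconduct, slope_misconduct_on_safety))

# The product of the two slopes equals the squared correlation,
# so this difference is zero up to floating-point error.
slope_product <- slope_safety_on_misconduct * slope_misconduct_on_safety
print(slope_product - r_xy^2)
```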
——————————————————————
This graph shows a medium, negative association between rate_of_misconducts_per_100_students and family_involvement_score, since the correlation coefficient is -0.464. It falls just short of a strong, negative association because the coefficient is not less than -0.5. The direction makes sense, though: schools with higher family involvement scores tend to have lower rates of misconduct.
## [1] -0.4640242
## `geom_smooth()` using formula = 'y ~ x'
##
## Call:
## lm(formula = family_involvement_score ~ rate_of_misconducts_per_100_students,
## data = cchs)
##
## Coefficients:
## (Intercept) rate_of_misconducts_per_100_students
## 61.0756 -0.5952
Formula: y = -0.5952x + 61.0756, where x is rate_of_misconducts_per_100_students and y is family_involvement_score
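The labeling rule used throughout this report (greater than 0.5 is strong and positive, less than -0.5 is strong and negative) can be written as a small helper. This is only a sketch: the function name `classify_correlation` is hypothetical, and the 0.3 lower bound for "medium" is an assumption, since the report does not state where "medium" ends.

```r
# Hypothetical helper that applies the report's 0.5 cutoff.
# The 0.3 lower bound for "medium" is an assumption, not taken from the report.
classify_correlation <- function(r, strong = 0.5, medium = 0.3) {
  direction <- if (r > 0) "positive" else if (r < 0) "negative" else "none"
  strength <- if (abs(r) > strong) {
    "strong"
  } else if (abs(r) >= medium) {
    "medium"
  } else {
    "weak"
  }
  paste(strength, direction, sep = ", ")
}

# Correlation coefficients reported in the four analyses above.
reported <- c(
  environment_vs_instruction = 0.8967704,
  family_involvement_vs_safety = 0.7692211,
  misconducts_vs_safety = -0.6225672,
  misconducts_vs_family_involvement = -0.4640242
)
labels <- sapply(reported, classify_correlation)
print(labels)
```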
Work Cited:
“Chicago Data Portal.” Chicago Data Portal. https://data.cityofchicago.org/.