Hypothesis

There is a strong relationship between average composite ACT scores and median household income by zip code in Georgia.

Introduction

To test my hypothesis, I am going to generate a map in Tableau that shows the average ACT scores by zip code in Georgia alongside the median household income by zip code. I am using data from the Atlanta Open Data portal. The dataset has two variables, the average ACT composite score and the school zip code for 2016, which I will use to generate the map in Tableau and to conduct statistical analysis in R. I will also use the school name variable in the R analysis.

Articles have suggested that there could be a relationship between ACT scores and median household income. According to the Huffington Post, if you want to get a good ACT score, it helps to be rich. The article goes on to say that lower-income students are more likely than higher-income students to have not taken the recommended course curriculum to be prepared for the exam. This could be due to a lack of resources at the schools that lower-income students attend.

I will generate bar charts, so I can see which schools and zip codes are performing the worst and the best in terms of ACT scores. My data set does not have data from all high schools in Georgia, but I think that it has enough data to possibly see trends between average ACT scores in 2016 and median household income amounts. The map in Tableau only shows ranges for median household income levels for each zip code, so I will be using Incomebyzipcode in order to get the actual numbers. This will allow me to put my bar charts into more perspective when I compare the highest performing schools and zip codes in terms of ACT average composite scores and the lowest performing schools and zip codes in terms of ACT average composite scores.

Tableau

Here’s a link to my map in Tableau:

Analysis

On the map, the larger circles, which mark higher average ACT scores, appear mostly in the Atlanta area, so ACT scores tend to be higher there. Almost all of the very large circles are in zip codes with median household incomes of $81,000 or greater. When I hover over the circles in those zip codes, I notice that many of them have average ACT composite scores of at least 22. On the other hand, when I hover over circles in zip codes with median household incomes of $2,490 to $41,300, it is very hard to find average ACT composite scores of at least 20. The map seems to indicate a trend in which richer zip codes perform a lot better on the ACT than poorer zip codes. I will explore this idea further with my statistical analysis in R.

Code Description

I read in the data from the Atlanta open data portal and my data set has 476 observations.

library(readr)
# Read the high school data downloaded from the Atlanta Open Data portal
High_School <- read_csv("~/Documents/High_Schools.csv")

Code Description

I ran code in order to round my Average High School ACT Composite Score in 2016 to whole numbers.

library(tidyverse)
# Keep only the 2016 ACT column, round it to whole numbers, and drop missing values
atlantaRounded <- High_School %>%
  select(AvgHi_ACT_Composite_Score_2016) %>%
  round() %>%
  drop_na()

Code Description

I ran the code below to make a bar chart of ACT composite score by school name. I used the tail function to pick out the 12 schools with the lowest average ACT composite scores in GA within the dataset. I used the filter function to tell R that I only wanted schools with an average ACT composite score greater than 1 on the bar chart, so missing values would not be included.

High_School %>%
  group_by(SCHOOL_NAME) %>%
  arrange(desc(AvgHi_ACT_Composite_Score_2016)) %>%
  # Drop schools without a usable score
  filter(AvgHi_ACT_Composite_Score_2016 > 1) %>%
  # After sorting in descending order, the last 12 rows are the lowest scores
  tail(12) %>%
  ggplot(aes(reorder(SCHOOL_NAME, AvgHi_ACT_Composite_Score_2016),
             AvgHi_ACT_Composite_Score_2016)) +
  geom_col() +
  coord_flip() +
  labs(title = "Worst 12 Schools for 2016 Average ACT Composite Scores",
       x = "School Name",
       y = "Average ACT Score")
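The arrange, filter, and tail steps above can also be expressed with dplyr's `slice_min()`. The following is only a sketch on made-up data: the school names and scores are hypothetical, and the 0 stands in for an assumed missing-value code, since the real dataset is not reproduced here.

```r
library(dplyr)

# Hypothetical toy data (NOT from the Atlanta dataset); 0 and NA mimic missing scores.
toy_schools <- tibble(
  SCHOOL_NAME = c("School A", "School B", "School C", "School D", "School E"),
  AvgHi_ACT_Composite_Score_2016 = c(21.3, 15.2, NA, 18.7, 0)
)

# Keep usable scores only, then take the 2 lowest (the analysis above uses 12).
lowest <- toy_schools %>%
  filter(AvgHi_ACT_Composite_Score_2016 > 1) %>%  # removes the NA and the 0
  slice_min(AvgHi_ACT_Composite_Score_2016, n = 2)

print(lowest)
```

Swapping `slice_min()` for `slice_max()` gives the highest-scoring schools in the same way. Note that `slice_min()` keeps ties by default, so it can return more than `n` rows when scores are equal.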

Analysis

According to the bar chart, Butler had the lowest average composite ACT score. Butler appeared to have an average composite score of 14, which is well below the national average of 20.8 reported by Prep Scholar. Hancock Central and the School of Technology at Carver had the 2nd and 3rd lowest, with composite scores of about 15. According to the map in Tableau, Butler High School is in zip code 30901, which has a median household income in the range of $2,490 to $41,300 a year. The same can be said for Hancock Central and the School of Technology at Carver, which are both in zip codes with median household incomes in the range of $2,490 to $41,300 a year.

Code Description

I wrote the code below to make a bar chart of ACT score by school name. I used the head function to generate a bar chart that shows the 12 schools with the highest average ACT composite scores in GA within the dataset. I used the arrange function to tell R that I wanted the schools sorted in descending order by average ACT composite score.

High_School %>%
  group_by(SCHOOL_NAME) %>%
  arrange(desc(AvgHi_ACT_Composite_Score_2016)) %>%
  # Drop schools without a usable score
  filter(AvgHi_ACT_Composite_Score_2016 > 1) %>%
  # After sorting in descending order, the first 12 rows are the highest scores
  head(12) %>%
  ggplot(aes(reorder(SCHOOL_NAME, AvgHi_ACT_Composite_Score_2016),
             AvgHi_ACT_Composite_Score_2016)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 12 Schools for ACT Composite Scores",
       x = "School Name",
       y = "Average ACT Score")

Analysis

According to the bar chart, the Gwinnett School of Mathematics, Science, and Technology had the highest average ACT composite score, about 28. It is a public special school according to The Davidson’s Institute, so it is an outlier, since most of the other schools in the dataset are regular public schools. The area it is in does not have a median household income of over $81,000 a year, which is definitely possible since it is a special school. However, the next two highest schools in terms of average ACT score were Northview and Johns Creek. Both are in zip codes with median annual household incomes of $81,000 or higher according to the map in Tableau. Therefore, we are seeing a trend between median household income and average ACT composite score, with the exception of the Gwinnett School of Mathematics, Science, and Technology, which is an outlier.

Code Description

I wrote the code below to make a bar chart showing the average ACT composite scores by zip code. I used the group_by function to group the schools by zip code. I used the summarize function to take the mean of the average composite ACT scores for the schools in each zip code. I then used the tail function to tell R that I only wanted the 12 zip codes with the lowest average ACT scores among their schools on the bar chart.

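High_School %>%
  group_by(SCHOOL_ZIP) %>%
  # Mean of the school-level averages within each zip code
  summarize(total = mean(AvgHi_ACT_Composite_Score_2016)) %>%
  arrange(desc(total)) %>%
  # Drop zip codes without a usable average
  filter(total > 1) %>%
  # After sorting in descending order, the last 12 rows are the lowest averages
  tail(12) %>%
  ggplot(aes(reorder(SCHOOL_ZIP, total), total)) +
  geom_col() +
  coord_flip() +
  labs(title = "12 Worst Zip Codes in Terms of 2016 Average ACT Composite Scores",
       x = "Zip Code",
       y = "Average ACT Score")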

Analysis

According to the bar chart, 31087 had the lowest average, and 31907, 39840, and 31044 were the next lowest. According to the map in Tableau, 31087 is in the income bracket of $2,490 to $41,300 a year; to be more specific, according to Incomebyzipcode its median household income is just $31,714. 31907, 39840, and 31044 all fall into the same income bracket of $2,490 to $41,300 a year. 39840 in particular has a median household income of $30,300 according to Incomebyzipcode. Based on these results, it is clear that the zip codes with the lowest average ACT scores have some of the lowest median household incomes in the state. I also noticed that the populations of most of these 12 zip codes are relatively low. For instance, according to Hometownlocator, 31087, 39840, and 31044 all had populations of less than 10,000. However, 31907 and some of the other zip codes had populations of over 30,000, so not all of the worst-performing zip codes are necessarily in small towns. A good number of them are, but not all.
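The income lookups in this section were done by hand. If the incomes were collected into a small table, they could be attached to the zip-level averages with a join. The sketch below assumes a hypothetical `zip_income` table and uses made-up school scores; only the three income figures are the ones quoted in this post.

```r
library(dplyr)

# Made-up school-level scores for illustration (NOT from the Atlanta dataset).
toy_scores <- tibble(
  SCHOOL_ZIP = c("30022", "30022", "31087", "39840", "39840"),
  AvgHi_ACT_Composite_Score_2016 = c(25.1, 24.7, 15.0, 16.2, NA)
)

# Hypothetical lookup table; incomes are the Incomebyzipcode figures quoted in this post.
zip_income <- tibble(
  SCHOOL_ZIP = c("30022", "31087", "39840"),
  median_income = c(105988, 31714, 30300)
)

zip_summary <- toy_scores %>%
  group_by(SCHOOL_ZIP) %>%
  # na.rm = TRUE averages over the schools that do have a score
  summarize(mean_act = mean(AvgHi_ACT_Composite_Score_2016, na.rm = TRUE)) %>%
  left_join(zip_income, by = "SCHOOL_ZIP")

print(zip_summary)
```

Using `na.rm = TRUE` is a design choice that differs from the code above, where a zip code with any missing school score gets an average of `NA` and is then removed by the filter. Which behavior is preferable depends on how much missing data the real dataset has.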

Code Description

I wrote the code below to make a bar chart showing the average ACT composite scores by zip code in 2016. I used the group_by function to group the schools by zip code. I used the summarize function to take the mean of the average ACT scores for the schools in each zip code. I then used the head function to tell R that I only wanted the 12 zip codes with the highest average ACT scores among their schools on the bar chart.

High_School %>%
  group_by(SCHOOL_ZIP) %>%
  # Mean of the school-level averages within each zip code
  summarize(total = mean(AvgHi_ACT_Composite_Score_2016)) %>%
  arrange(desc(total)) %>%
  # After sorting in descending order, the first 12 rows are the highest averages
  head(12) %>%
  ggplot(aes(reorder(SCHOOL_ZIP, total), total)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 12 Zip Codes in Terms of 2016 Average ACT Composite Scores",
       x = "Zip Code",
       y = "Average ACT Score")

Analysis

30044 had the highest average, about 28. According to the map in Tableau, it is in the $51,100 to $61,300 income bracket. However, this zip code in our data solely represents the Gwinnett School of Mathematics, Science, and Technology, which is an outlier as mentioned earlier, so this zip code is an exception. The 2nd, 3rd, and 4th highest zip codes, 30022, 30062, and 30269, are all in the income bracket of $81,000 to $250,000 per year according to Tableau. 30022, 30062, and 30005 all had average ACT composite scores in 2016 of about 25. In particular, according to Incomebyzipcode, 30022 has a median household income of $105,988, 30062 has a median household income of $99,101, and 30269 has a median household income of $95,837. With the exception of 30044, the zip codes with the three highest average ACT scores in 2016 have median household incomes more than double those of the zip codes with the three lowest average ACT scores. Top-performing zip codes such as 30044, 30022, and 30062 also have populations of at least 40,000 people according to Hometownlocator, so it appears to be relatively hard for a very rural zip code to perform highly.
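The "more than double" comparison can be checked with a little arithmetic. The sketch below uses the three incomes quoted above for the top zip codes (excluding 30044) and, as a conservative assumption, the $41,300 upper bound of the bracket that the lowest-scoring zip codes fall into, since exact incomes are not quoted for all of them.

```r
# Median household incomes quoted in this post for the top zip codes (excluding 30044).
top_income <- c("30022" = 105988, "30062" = 99101, "30269" = 95837)

# Conservative stand-in for the lowest-scoring zip codes: the top of their
# $2,490 to $41,300 bracket (an assumption; the actual incomes are lower).
bracket_upper <- 41300

# Ratio of each top zip code's income to the bracket's upper bound.
ratios <- top_income / bracket_upper
print(round(ratios, 2))

# TRUE if every top zip code has more than double the bracket's upper bound.
print(all(ratios > 2))
```

Because the upper bound overstates the incomes of the lowest-scoring zip codes, the true ratios would be larger than the ones computed here.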

Conclusion

My hypothesis that there is a strong relationship between average ACT scores and median household income by zip code in Georgia appears to be true. My map in Tableau indicated that many of the zip codes that perform the best on the ACT are located in the Atlanta area. The zip codes with the largest circles tended to have median household incomes of at least $95,837. Additionally, based on the bar charts that I generated in R, there definitely appears to be a trend in which the zip codes with lower average ACT scores tend to have lower median household incomes than zip codes with higher average ACT scores. For instance, the zip codes with the four lowest average ACT composite scores in 2016 all had median household incomes of $42,100 or lower. On the other hand, the top four zip codes, excluding 30044, which is an outlier as mentioned earlier, have median household incomes of $95,837 or higher. We can conclude that there appears to be a substantial difference in average ACT composite scores in 2016 between zip codes in lower-income areas and zip codes in higher-income areas. The same can be said for average ACT scores by school: schools that score higher tend to be in zip codes with higher median household incomes. While the schools and zip codes that performed the lowest on the ACT were spread out around Georgia, the zip codes that performed the best generally have larger populations than the zip codes that performed the worst. I would like to further my research by looking at how average ACT scores in zip codes can be influenced by other demographics such as race and ethnicity.
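One way to put a number on the strength of this relationship would be a correlation between zip-level average ACT score and median household income. The sketch below uses base R's `cor.test()` on entirely made-up values, so its output says nothing about Georgia; it only shows the shape such a check could take once the real incomes are attached to each zip code.

```r
# Entirely hypothetical zip-level values for illustration (NOT real data).
toy_zip <- data.frame(
  mean_act      = c(15.1, 16.4, 17.8, 19.2, 20.5, 22.3, 24.0, 25.2),
  median_income = c(30000, 34000, 41000, 48000, 56000, 72000, 90000, 104000)
)

# Pearson correlation (the default) with a test of the null hypothesis of no correlation.
result <- cor.test(toy_zip$mean_act, toy_zip$median_income)

print(result$estimate)  # correlation coefficient
print(result$p.value)   # p-value for the test
```

Pearson correlation assumes a roughly linear relationship; passing `method = "spearman"` would instead measure whether the two variables rise together in rank order, which may suit skewed income data better.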