Communities, Language, and Hardship

Chicagoland is a unique and diverse metropolis, made up of various regions, which consist of a number of neighborhoods.

Figure 1. Chicagoland Communities by Region, image courtesy of Wikipedia and reproduced for educational purposes only (https://en.wikipedia.org/wiki/Community_areas_in_Chicago#/media/File:Chicago_community_areas_map.svg). Accessed 28 May 2023.

While the dominant language of Chicago is English, many of these neighborhoods consist of communities with a high population of non-English speakers, with Spanish speakers being the dominant non-English speakers across the region, followed by Polish, Chinese, and a small number of African languages.

This lab sets out to examine neighborhoods with high populations of Polish and Chinese speakers. The goal is to determine whether Polish-speaking and Chinese-speaking communities each cluster within particular regions, and whether there is a correlation between high populations of Polish and Chinese speakers and hardship levels in the community.

Data Ingest and Preparation

Reading in Packages and Additional Data Sets

This lab begins with language data cleaned and prepared in a previous session. This data, which was obtained from the Chicago Open Data Portal, lists each community shown in Figure 1 along with its dominant non-English language and the additional languages spoken there. It does not include socioeconomic data for those communities. To correct this, additional data was imported and then merged with the language data.

library(tidyverse)
library(here)
library(janitor)
library(knitr)
library(kableExtra)

# Set our favorite Theme ----
theme_set(theme_classic())

# Read in Language Data

languages <- read_csv(here("data", "grouped_languages.csv"))
socioeconomic <- read_csv(here("data", "socioeconomic_indicators_2008-2012.csv"))
socioeconomic <- socioeconomic %>%
  clean_names() %>%
  rename(community_area = community_area_number) # Making join columns match

# Joining Languages w/ Socioeconomic data ----
languages <- merge(x = languages, y = socioeconomic, by = "community_area")
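
Because `merge()` performs an inner join by default, a community area that is missing from either table is dropped without any warning. The following is a minimal sketch of how that could be checked, using small made-up tibbles in place of the real files. The names `lang_demo` and `socio_demo` and all of their values are illustrative assumptions, not part of the lab data.

```r
library(dplyr)

# Toy stand-ins for the two data sets; values are placeholders, not real data
lang_demo <- tibble(community_area = c(1, 2, 3),
                    community = c("A", "B", "C"))
socio_demo <- tibble(community_area = c(1, 2, 4),
                     hardship_index = c(10, 20, 40))

# Rows of each table that have no partner in the other table
unmatched_lang  <- anti_join(lang_demo, socio_demo, by = "community_area")
unmatched_socio <- anti_join(socio_demo, lang_demo, by = "community_area")

# inner_join() keeps the same rows that merge() keeps by default
joined <- inner_join(lang_demo, socio_demo, by = "community_area")

print(unmatched_lang)
print(unmatched_socio)
print(nrow(joined))
```

If both `anti_join()` results are empty for the real data, the merge kept every community area.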

Grouping by Region

With the additional data merged to community languages, each community needs to be added to a region, as depicted by the legend in Figure 1.

# Assigning communities to regions in Chicago ----
farnorth = c("O'Hare", "Edison Park", "Norwood Park", "Jefferson Park", "Forest Glen",
             "North Park", "Albany Park", "West Ridge", "Lincoln Square", "Rogers Park",
             "Edgewater", "Uptown")
central = c("Near North Side", "Loop", "Near South Side")
farsoutheast = c("Chatham", "Avalon Park", "Burnside", "Calumet Heights", "South Chicago",
                 "Roseland", "Pullman", "South Deering", "East Side", "Riverdale",
                 "West Pullman", "Hegewisch")
farsouthwest = c("Ashburn", "Auburn Gresham", "Washington Heights", "Beverly",
                 "Mount Greenwood", "Morgan Park")
north = c("Avondale", "Logan Square", "North Center", "Lakeview", "Lincoln Park")
northwest = c("Dunning", "Montclare", "Portage Park", "Belmont Cragin", "Hermosa", "Irving Park")
south = c("Bridgeport", "Armour Square", "Fuller Park", "Douglas", "Oakland", "Grand Boulevard", 
          "Kenwood", "Washington Park", "Hyde Park", "Woodlawn", "Greater Grand Crossing",
          "South Shore")
southwest = c("Garfield Ridge", "Clearing", "Archer Heights", "West Elsdon", "West Lawn",
              "Brighton Park", "Gage Park", "Chicago Lawn", "McKinley Park", "New City",
              "West Englewood", "Englewood")
west = c("Austin", "Humboldt Park", "West Garfield Park", "East Garfield Park", "North Lawndale",
         "South Lawndale", "West Town", "Near West Side", "Lower West Side")


# Adding a column for region based on community
languages <- languages %>%
  mutate(region = case_when(community %in% farnorth ~ "Far North Side",
                            community %in% central ~ "Central",
                            community %in% farsoutheast ~ "Far Southeast Side",
                            community %in% farsouthwest ~ "Far Southwest Side",
                            community %in% north ~ "North Side",
                            community %in% northwest ~ "Northwest Side",
                            community %in% south ~ "South Side",
                            community %in% southwest ~"Southwest Side",
                            community %in% west ~ "West Side"))

kable(head(languages), booktabs = TRUE, format = "html")  %>% # Display table in readable format
  kable_styling(font_size = 10)
community_area community predominant_language dominant_language percent languages count meanlanguagecount count_higher com_count_higher community_area_name percent_of_housing_crowded percent_households_below_poverty percent_aged_16_unemployed percent_aged_25_without_high_school_diploma percent_aged_under_18_or_over_64 per_capita_income hardship_index region
1 Rogers Park SPANISH (9.9%) SPANISH 9 gujarati 70 41.641026 TRUE FALSE Rogers Park 7.7 23.6 8.7 18.2 27.5 23939 39 Far North Side
1 Rogers Park SPANISH (9.9%) SPANISH 9 hebrew 0 7.025641 FALSE FALSE Rogers Park 7.7 23.6 8.7 18.2 27.5 23939 39 Far North Side
1 Rogers Park SPANISH (9.9%) SPANISH 9 african_languages 332 111.923077 TRUE TRUE Rogers Park 7.7 23.6 8.7 18.2 27.5 23939 39 Far North Side
1 Rogers Park SPANISH (9.9%) SPANISH 9 arabic 165 141.076923 TRUE FALSE Rogers Park 7.7 23.6 8.7 18.2 27.5 23939 39 Far North Side
1 Rogers Park SPANISH (9.9%) SPANISH 9 armenian 65 4.923077 TRUE FALSE Rogers Park 7.7 23.6 8.7 18.2 27.5 23939 39 Far North Side
1 Rogers Park SPANISH (9.9%) SPANISH 9 cambodian_mon_khmer 19 13.256410 TRUE FALSE Rogers Park 7.7 23.6 8.7 18.2 27.5 23939 39 Far North Side
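
`case_when()` returns `NA` for any community that matches none of the region vectors, so a name spelled differently in the data than in the vectors (for example "Lake View" versus "Lakeview") ends up without a region. Since the plotting code calls `na.omit()`, such rows would then be dropped before plotting. Below is a small sketch of a check for this, using hypothetical demo vectors and a toy tibble rather than the real `languages` data.

```r
library(dplyr)

# Hypothetical region vectors and data, for illustration only
north_demo <- c("Avondale", "Lincoln Park")
west_demo  <- c("Austin")

communities_demo <- tibble(
  community = c("Avondale", "Lincoln Park", "Austin", "Lake View")
)

assigned <- communities_demo %>%
  mutate(region = case_when(community %in% north_demo ~ "North Side",
                            community %in% west_demo ~ "West Side"))

# Communities that matched no region vector have region == NA
unassigned <- assigned %>%
  filter(is.na(region)) %>%
  distinct(community)

print(unassigned)
```

Running the same `filter(is.na(region))` step on the real data would list any community whose spelling needs to be reconciled with the vectors above.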

Dominant Languages by Region

With each community assigned to its respective region, we can examine the dominant non-English languages. Spanish is by far the predominant non-English language, but Polish has high populations in the Far North and Northwest Sides, and Chinese in the Central and South Sides.

# Plot Dominant Languages ----

languages %>%
  na.omit() %>%
  group_by(dominant_language, region) %>%
  summarise(meanpercent = mean(percent)) %>%
  ggplot(aes(x = dominant_language, y = meanpercent)) +
  geom_col(aes(fill = dominant_language)) +
  facet_wrap(~ region) +
  theme(text = element_text(size = 15)) +
  labs(title = "Chicagoland Dominant Languages",
       subtitle = "Grouped by Region", 
       caption = "Data Courtesy of https://data.cityofchicago.org/",
       fill = "Dominant Non-English Language",
       y = "Percentage of Region (%)",
       x = "Non-English Language")

Figure 2. Dominant Non-English Languages. Outside of English, there are four dominant languages in Chicagoland, with Spanish being the dominant non-English language across the various regions.

Closer Examination of Regions with Polish and Chinese Speakers

With the Polish and Chinese speaking regions narrowed down, we next examine which communities in those regions have a high population of Polish and Chinese speakers.

Polish Speaking Communities

High populations of Polish speakers were found in the Northwest and Far North Sides of Chicago, with the former having a population of approximately 15% and the latter around 5%. Examining the data at the community level in these regions shows Polish speakers are found in Dunning, Edison Park, Forest Glen, Jefferson Park, Norwood Park, and O’Hare. These communities are grouped together on the western edge of the Far North and Northwest Sides.

# Plot Polish Speaking communities ----
languages %>%
  na.omit() %>%
  filter(region %in% c("Far North Side", "Northwest Side")) %>%
  group_by(dominant_language, community, region) %>%
  summarize(mean_percent = mean(percent)) %>%
  ggplot(aes(x = dominant_language, y = mean_percent)) + # Removed dangling `color =`, which had no value
  facet_wrap(~ community) +
  geom_col(aes(fill = dominant_language)) +
  theme(text = element_text(size = 15)) +
  labs(title = "Far North and Northwest Side Regions",
       subtitle = "High Population of Polish Speakers", 
       caption = "Data Courtesy of https://data.cityofchicago.org/",
       fill = "Dominant Non-English Language",
       y = "Percentage of Community (%)",
       x = "Non-English Language")

Figure 3. Dominant Non-English Languages. Polish speakers can be found in communities clustered together on the western side of the Far North and Northwest Sides of Chicago.

Chinese Speaking Communities

High populations of Chinese speakers were found in the Central and South Sides of Chicago, with the former having less than 5% and the latter between 15% and 20%. When examining the data at the community level, we find high populations of Chinese speakers in Armour Square (over 40%), Bridgeport, Douglas, Hyde Park, the Loop, and Near South Side. Again, most of these communities are clustered in the northern portion of the South Side and the middle to southern portions of the Central region. The exception is Hyde Park, which is separated from the cluster by a number of communities.

# Plot Chinese Speaking communities ----

languages %>%
  na.omit() %>%
  filter(region %in% c("Central", "South Side")) %>%
  group_by(dominant_language, community, region) %>%
  summarize(mean_percent = mean(percent)) %>%
  ggplot(aes(x = dominant_language, y = mean_percent)) +
  facet_wrap(~ community) +
  geom_col(aes(fill = dominant_language)) +
  theme(text = element_text(size = 15)) +
  labs(title = "Central and South Side Regions",
       subtitle = "High Population of Chinese Speakers", 
       caption = "Data Courtesy of https://data.cityofchicago.org/",
       fill = "Dominant Non-English Language",
       y = "Percentage of Community (%)",
       x = "Non-English Language")

Figure 4. Dominant Non-English Languages. Chinese speakers can be found in communities clustered together on the northern portion of South Side and mid- to southern portions of Central Side of Chicago.

Hardship Index compared to Polish and Chinese Communities

Finally, we examine whether there is a link between hardship and communities where Polish or Chinese is the dominant non-English language. According to the metadata for the socioeconomic indicators on the City of Chicago’s open data portal, the hardship index is a calculation which takes six socioeconomic factors into account: percent of housing crowded; percent of households below poverty; percent aged 16+ unemployed; percent aged 25+ without a high school diploma; percent aged under 18 or over 64; and per capita income. With these indicators, the hardship index is calculated as a value between 0 and 100, with 100 being extreme hardship.

Hardship in Polish Speaking Communities

With an understanding of what communities Polish speakers are living in, we can examine if there is a correlation between communities with high populations of Polish speakers and the hardship index associated with those communities.
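
The code behind Figure 5 is not shown in this section. By analogy with the Chinese-speaking plot later in the lab, it might look like the sketch below. This is an assumption, not the original code: the label `"POLISH"` is inferred from the uppercase `"CHINESE"` and `"SPANISH"` values, and `languages_demo` is a placeholder tibble with made-up values so the sketch runs on its own.

```r
library(dplyr)
library(ggplot2)

# Placeholder rows so the sketch is runnable; these are not real values
languages_demo <- tibble(
  region = c("Northwest Side", "Far North Side", "Far North Side"),
  community = c("Community A", "Community B", "Community C"),
  dominant_language = c("POLISH", "POLISH", "SPANISH"),
  percent = c(12, 5, 9),
  hardship_index = c(30, 10, 40)
)

# One row per Polish-dominant community
polish_summary <- languages_demo %>%
  na.omit() %>%
  filter(dominant_language == "POLISH") %>%  # assumed label, see lead-in
  group_by(region, community) %>%
  summarise(meanpercent = mean(percent),
            meanhardship = mean(hardship_index),
            .groups = "drop")

# Scatter plot of hardship against share of Polish speakers
polish_plot <- ggplot(polish_summary,
                      aes(x = meanpercent, y = meanhardship, color = community)) +
  geom_point(size = 8) +
  labs(title = "Hardship vs Percentage of Polish Speakers",
       subtitle = "Far North and Northwest Side Regions",
       color = "Community",
       y = "Community Hardship Index",
       x = "Percentage Polish Speakers (%)")

print(polish_plot)
```

With the real data, `languages_demo` would be replaced by the merged `languages` table.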

Figure 5. Hardship in Communities with High Populations of Polish Speakers. Within the Far North and Northwest Sides, communities with high populations of Polish speakers show a positive correlation between the share of Polish speakers and the hardship index.

Figure 5 suggests a correlation between the share of Polish speakers in a community and hardship. Communities with lower populations of Polish speakers, Forest Glen and Edison Park, have lower hardship indexes, whereas Dunning, which has the highest population of Polish speakers, also has the highest hardship index.

Hardship in Chinese Speaking Communities

Finally, we examine the communities with high populations of Chinese speakers to determine if there is a correlation between Chinese speaking population and hardship.

# Plot Hardship index in Chinese Speaking communities ----

languages %>%
  na.omit() %>%
  filter(dominant_language == "CHINESE") %>%
  group_by(region, community, percent, hardship_index) %>%
  summarise(meanpercent = mean(percent), meanhardship = mean(hardship_index)) %>%
  ggplot(aes(x = meanpercent, y = meanhardship, color = community)) +
  geom_point(size = 8) + 
  theme(text = element_text(size = 15)) +
  labs(title = "Hardship vs Percentage of Chinese Speakers",
       subtitle = "Central and South Side Regions", 
       caption = "Data Courtesy of https://data.cityofchicago.org/",
       color = "Community",
       y = "Community Hardship Index",
       x = "Percentage Chinese Speakers (%)")

Figure 6. Hardship in Communities with High Populations of Chinese Speakers. Within the Central and South Sides, communities with high populations of Chinese speakers show a positive correlation between the amount of Chinese speakers and the hardship index.

Figure 6 again suggests a correlation between communities with high populations of non-English speakers and hardship. Communities with lower populations of Chinese speakers show a lower hardship index, as seen in the Loop, Near South Side, and Hyde Park. As the Chinese speaking population increases, so does the hardship index, as seen in Armour Square, with a Chinese speaking population of over 40% and a hardship index of over 80. There is an outlier in Douglas, which has a relatively low population of Chinese speakers but a high hardship index. It is not immediately clear why, as Douglas does not have a large population of non-English speakers in general.
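
The relationship above is judged visually from the scatter plot. One way to put a number on it would be a correlation coefficient, sketched below with base R. The data frame `community_demo` and its values are placeholders invented for illustration, standing in for one row per community. With only a handful of communities, any such estimate would carry wide uncertainty.

```r
# Placeholder values standing in for one row per community; not real data
community_demo <- data.frame(
  community = c("A", "B", "C", "D", "E", "F"),
  meanpercent = c(2, 4, 5, 9, 15, 40),
  meanhardship = c(5, 12, 10, 30, 45, 80)
)

# Pearson correlation measures linear association
pearson_r <- cor(community_demo$meanpercent, community_demo$meanhardship)

# Spearman correlation uses ranks, so one extreme community has less influence
spearman_rho <- cor(community_demo$meanpercent, community_demo$meanhardship,
                    method = "spearman")

# cor.test() adds a p-value and confidence interval for the Pearson estimate
pearson_test <- cor.test(community_demo$meanpercent, community_demo$meanhardship)

print(pearson_r)
print(spearman_rho)
print(pearson_test$p.value)
```

Comparing the Pearson and Spearman values would also indicate how much a single community, such as an outlier, drives the apparent trend.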

Conclusion and Future Work

In this lab, we examined the communities with non-English predominant languages in Chicagoland in order to determine if there was a correlation between communities with higher populations of Polish and Chinese speakers and hardship levels. We first determined which regions and communities had higher populations of Polish and Chinese speakers, then plotted the hardship index as a function of the percentage of Polish and Chinese speakers. These plots suggest a positive correlation between higher non-English speaking populations and higher hardship indexes.

While there appears to be a positive correlation, future work needs to be done to determine if other factors contribute to hardship within the non-English communities as well as the English speaking communities. Additionally, work should be done to determine what draws Polish and Chinese speakers to the communities where their population is highest.