1 Introduction

This is a follow-up to the analysis I recently published about the Sterigenics plants in Willowbrook, IL, and the cancer diagnosis rates in the surrounding areas, which can be found at http://www.rpubs.com/astama/illinois_cancer. In that analysis, I found a statistically significant link between proximity to that specific Ethylene Oxide (EtO) emitting plant and cancer diagnosis rates in the 4 mile radius surrounding it. Since then, I have been introduced to the United States EPA’s 2014 National Air Toxics Assessment (NATA) map and the EPA’s Toxics Release Inventory (TRI) database. From these resources, I found more EtO emitting facilities around the state of Illinois, so I decided to see whether there was a significant link between those facilities and the cancer diagnosis rates in the surrounding areas. I would have liked to extend the analysis to other states that have EtO plants, such as Louisiana, Texas, Colorado, and West Virginia, but they didn’t have publicly available cancer diagnosis data down to the ZIP Code level like Illinois did, so I was not able to do this.

The NATA map (https://www.epa.gov/national-air-toxics-assessment/2014-nata-map) shows how many Metric Tons Per Year (TPY) of a wide variety of chemicals were emitted by facilities across the United States. In addition, the EPA calculated the additional cancer risk (per 1,000,000 people) that the people in each EPA land tract face due to the pollutants. If an area is highlighted in dark blue, it means that at least an additional 100 people per million were at risk of developing cancer. In almost all such areas around the country, there is a facility that emits Ethylene Oxide.

The EPA’s TRI Explorer (https://iaspub.epa.gov/triexplorer/tri_release.facility) allows you to search by year, state, and chemical to see how much of a particular chemical has been released by facilities around the country each year. The data from this website were in a rather messy format, so for the purpose of this analysis, I removed all rows and columns except for the “year” and “total emissions” columns. This resource reported emissions in pounds, so I converted to metric tons by dividing by 2204.6. The resulting emissions plots can be seen in each section below.
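The unit conversion described above can be sketched as follows. This is a minimal example, not the cleaning script itself: the helper name `pounds_to_metric_tons` and the sample values are hypothetical, and only the divisor 2204.6 comes from the text.

```r
# Pounds per metric ton, as used in this report
lb_per_metric_ton <- 2204.6

# Hypothetical helper: convert TRI emissions reported in pounds to metric tons
pounds_to_metric_tons <- function(pounds) {
  pounds / lb_per_metric_ton
}

# Illustrative values only (not taken from the TRI data)
example_pounds <- c(2204.6, 11023)
example_tons <- pounds_to_metric_tons(example_pounds)
print(example_tons)
```

The two illustrative inputs were picked to be exact multiples of the divisor, so they convert to 1 and 5 metric tons.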

The facilities I was able to find that release EtO are listed below.

Facility_Name                         Location
Sterigenics                           Willowbrook
Ele Corp                              McCook
Azko Nobel Surface Chemistry          McCook
Lutheran General Hospital             Park Ridge
Medline Industries/Steris Isomedix    Northfield
Vantage Specialties                   Gurnee
Lambent Technologies                  Skokie
Stepan Co                             Elwood/Joliet
Tate & Lyle                           Decatur

Further explanation of these facilities will be given later in the report.

Here is a link to my Github repository where all my code and data used can be found: https://github.com/athanasios8193/illinois_cancer

2 Cancer Rate Tables and Pollution Data

Following are plots showing EtO emissions from facilities around Illinois, along with cancer rate tables for the ZIP codes largely contained within 2.5 miles of each EtO emitting facility. Note that the baseline cancer diagnosis rate for the entire state was previously calculated to be 522 per 100,000 people per year.

There is a map at the end of this section that contains the cancer diagnosis rates for every ZIP code in Illinois. The 9 facilities I examine are all shown on the map with 2.5 mile circles surrounding them.

NOTE: All emissions plots are interactive. You can hover over each bar to see the exact number of tons per year released by the facility.

2.1 Sterigenics

Sterigenics was covered at length in my previous report. In this analysis, I shrank the radius from 4 miles to 2.5 miles, so I am now only considering Willowbrook and Darien.

The TRI report had a few years missing between 1989 and 1994. It’s interesting to see the very high number in 1988 and then the sharp decline in EtO emissions from 1998 to 1999.

zip population freq per_year one_in_every_per_year per_100000_per_year zip_vs_state
60561 23115 2341 156.0667 148.1098 675.1749 1.293305
60527 27486 2720 181.3333 151.5772 659.7298 1.263720

The overall rate per 100,000 people in the Sterigenics area is 666.79 and compared to the state baseline it is 1.277, which is 27.7% higher.
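The columns of the table above, and the overall rate, can be reproduced from the raw counts. The sketch below is a minimal example under two assumptions: that `freq` is the total number of diagnoses over the 15 years from 2001 to 2015 (which is what the ratio of `freq` to `per_year` implies), and that the state baseline is the 522.0538 used in the appendix code. The data frame name `steri` is illustrative.

```r
# Raw counts for the two Sterigenics ZIP codes, copied from the table above
steri <- data.frame(
  zip = c(60561, 60527),
  population = c(23115, 27486),
  freq = c(2341, 2720)
)

years <- 15            # assumed: diagnoses span 2001-2015
il_rate <- 522.0538    # state baseline per 100,000 people per year

# Per-ZIP columns
steri$per_year <- steri$freq / years
steri$one_in_every_per_year <- steri$population / steri$per_year
steri$per_100000_per_year <- steri$per_year / steri$population * 100000
steri$zip_vs_state <- steri$per_100000_per_year / il_rate

# Overall rate for the area: pool diagnoses and population first
per_steri <- sum(steri$per_year) * 100000 / sum(steri$population)
comp_steri <- per_steri / il_rate

print(steri)
print(round(c(per_steri, comp_steri), 3))
```

Pooling before dividing gives larger ZIP codes more weight, which is why the overall rate sits between the two per-ZIP rates and closer to that of 60527.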

2.2 Ele Corp and Azko Nobel Surface Chemistry

The Ele Corp and Azko Nobel facilities are both located in McCook, Illinois. These two facilities are located very near each other, so I decided to group them together for the sake of this analysis. Appearing below is the pollution graph. The Azko Nobel data ends in 2007, which is the same year that Ele first appears on the plot. Like Sterigenics, there is a very sharp decline in EtO emissions from 1998 to 1999.

It is important to note that 60525, La Grange, is sandwiched between the Sterigenics facility to the southwest and the Ele and Azko Nobel facilities to the east.

zip population freq per_year one_in_every_per_year per_100000_per_year zip_vs_state
60525 31168 3389 225.93333 137.9522 724.8888 1.3885327
60546 15668 1643 109.53333 143.0432 699.0894 1.3391138
60526 13576 1354 90.26667 150.3988 664.8988 1.2736213
60513 19047 1733 115.53333 164.8615 606.5697 1.1618912
60638 55026 5002 333.46667 165.0120 606.0165 1.1608316
60534 10649 876 58.40000 182.3459 548.4083 1.0504824
60402 63448 4263 284.20000 223.2512 447.9259 0.8580071
60501 11626 723 48.20000 241.2033 414.5880 0.7941480

The overall rate per 100,000 people in the Ele Corp/Azko Nobel area is 574.7 and compared to the state baseline it is 1.101, which is 10.1% higher.

2.3 Lutheran General

Lutheran General is a hospital in Park Ridge, Illinois. The US EPA did not have year by year data for Lutheran General. I found this facility from the 2014 NATA Map. According to the map, Lutheran General released 0.10 tons of Ethylene Oxide into the air that year. Since no other data are available for this facility, I don’t know how long they have been releasing EtO or how much they have been releasing.

The areas surrounding this hospital have some of the highest cancer diagnosis rates of all of the ZIP codes being analyzed.

zip population freq per_year one_in_every_per_year per_100000_per_year zip_vs_state
60714 29931 3808 253.8667 117.9005 848.1730 1.624685
60053 23260 2494 166.2667 139.8957 714.8180 1.369242
60068 37475 3997 266.4667 140.6367 711.0518 1.362028
60025 39105 4091 272.7333 143.3818 697.4385 1.335951
60016 59690 5181 345.4000 172.8141 578.6564 1.108423

The overall rate per 100,000 people in the Lutheran area is 688.66 and compared to the state baseline it is 1.319, which is 31.9% higher.

2.4 Medline Industries/Steris Isomedix

The EPA’s TRI report lists this facility as Steris Isomedix and the data ends at 2005. In the 2014 NATA Map, the facility is listed as Medline Industries. According to the map, the facility released 1.529 tons of EtO that year. I saw nothing on the TRI report about Medline Industries even though it appears on the NATA Map.

These results are pretty interesting. Only one ZIP code in this area has a cancer diagnosis rate higher than the state baseline.

zip population freq per_year one_in_every_per_year per_100000_per_year zip_vs_state
60048 29095 2367 157.80000 184.3790 542.3612 1.0388991
60031 37947 2481 165.40000 229.4256 435.8711 0.8349160
60064 15407 901 60.06667 256.4983 389.8661 0.7467929
60085 71714 3241 216.06667 331.9068 301.2894 0.5771233

The overall rate per 100,000 people in the Medline Industries/Steris Isomedix area is 388.77 and compared to the state baseline it is 0.745, which is 25.5% lower.

2.5 Vantage Specialties

The Vantage Specialties facility is located in Gurnee, not too far from the Medline Industries facility. Similar to Sterigenics and Azko Nobel, there is a sharp decline from 1998 to 1999. The EtO emissions then spike around 2010.

As with the Medline Industries facility, there is a surprisingly low cancer diagnosis rate in this area.

zip population freq per_year one_in_every_per_year per_100000_per_year zip_vs_state
60031 37947 2481 165.4000 229.4256 435.8711 0.8349160
60087 26978 1751 116.7333 231.1079 432.6982 0.8288384

The overall rate per 100,000 people in the Vantage Specialties area is 434.55 and compared to the state baseline it is 0.832, which is 16.8% lower.

2.6 Lambent Technologies

Lambent Technologies was located in Skokie. Data stopped being collected in 2005. The Lambent Technologies website says that they changed their name to Vantage Specialties.

This area contains some ZIP codes with very high cancer diagnosis rates.

zip population freq per_year one_in_every_per_year per_100000_per_year zip_vs_state
60712 12590 1468 97.86667 128.6444 777.3365 1.4889970
60203 4523 460 30.66667 147.4891 678.0161 1.2987475
60077 26825 2707 180.46667 148.6424 672.7555 1.2886709
60076 33415 3372 224.80000 148.6432 672.7518 1.2886637
60646 27177 2717 181.13333 150.0386 666.4950 1.2766787
60202 31361 2361 157.40000 199.2440 501.8973 0.9613899
60645 45274 3233 215.53333 210.0557 476.0643 0.9119065
60659 38104 2369 157.93333 241.2664 414.4797 0.7939406
60626 50139 2977 198.46667 252.6318 395.8329 0.7582225

The overall rate per 100,000 people in the Lambent Technologies area is 536.09 and compared to the state baseline it is 1.027, which is 2.7% higher.

2.7 Stepan Co

Stepan Co is a chemical company in Elwood/Joliet, Illinois. Its EtO emissions spiked sharply in 1997 and then dropped back down the next year.

The ZIP code the facility is in, 60421, has a higher cancer diagnosis rate than the adjacent ZIP code.

zip population freq per_year one_in_every_per_year per_100000_per_year zip_vs_state
60421 3968 349 23.26667 170.5444 586.3575 1.123175
60410 12687 748 49.86667 254.4184 393.0533 0.752898

The overall rate per 100,000 people in the Stepan Co area is 439.11 and compared to the state baseline it is 0.841, which is 15.9% lower.

2.8 Tate & Lyle

Tate & Lyle is a food manufacturing business in Decatur. The EtO emissions were very high before 1992 and then dropped to low amounts from 1993-1998. The emissions rose again in 1999 and have remained fairly constant since then.

The two more populous ZIP codes in the area had cancer diagnosis rates around 50% higher than the state baseline, while the ZIP code with the lowest rate was 33% below it.

zip population freq per_year one_in_every_per_year per_100000_per_year zip_vs_state
62526 34075 4064 270.933333 125.7689 795.1088 1.5230401
62521 35851 4115 274.333333 130.6841 765.2041 1.4657573
62523 1237 65 4.333333 285.4615 350.3099 0.6710226

The overall rate per 100,000 people in the Tate & Lyle area is 772.31 and compared to the state baseline it is 1.479, which is 47.9% higher.
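The overall rate quoted above is a pooled rate, not an average of the three per-ZIP rates, and the difference matters when ZIP codes vary a lot in size. The sketch below illustrates this with the Tate & Lyle numbers from the table; the variable names are illustrative.

```r
# Tate & Lyle ZIP codes, copied from the table above
tate <- data.frame(
  zip = c(62526, 62521, 62523),
  population = c(34075, 35851, 1237),
  per_year = c(270.933333, 274.333333, 4.333333)
)

# Rate per 100,000 people for each ZIP code
zip_rates <- tate$per_year / tate$population * 100000

# Unweighted mean: every ZIP code counts equally, however small
mean_of_rates <- mean(zip_rates)       # about 636.87

# Pooled rate: total diagnoses over total population
pooled_rate <- sum(tate$per_year) / sum(tate$population) * 100000  # about 772.31

print(round(c(mean_of_rates, pooled_rate), 2))
```

The small ZIP code 62523, with 1,237 people, pulls the unweighted mean far below the pooled rate even though it holds under 2% of the area's population.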

2.9 Map

Here is a map of every ZIP code in the state of Illinois, with a darker fill corresponding to a higher cancer diagnosis rate. The filled-in circles are centered on the EtO emitting facilities. Hovering over these circles will tell you which facility it is.
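The circles are drawn with leaflet, which takes circle radii in meters, so the mile distances have to be converted. The sketch below shows how the value 804.672 used in the map code in the appendix corresponds to half-mile rings out to 2.5 miles; the variable names are illustrative.

```r
# One international mile in meters
meters_per_mile <- 1609.344

# The map code multiplies by 804.672, which is half a mile
half_mile_m <- meters_per_mile / 2

# Five concentric rings, largest first so smaller rings are drawn on top
radii_m <- rev(seq(5)) * half_mile_m
radii_miles <- radii_m / meters_per_mile

print(radii_m)
print(radii_miles)
```

The outermost ring is therefore 2.5 miles, matching the radius used to choose ZIP codes in section 2.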

3 Deeper Statistical Analysis - \(\chi^2\) Test

In my original analysis, which focused only on Sterigenics, I used both t-testing and \(\chi^2\) testing. I mentioned there that it was more appropriate to use the \(\chi^2\) test over the t-test. In the t-tests, the cancer rates per 100,000 people were averaged and compared to the average rate of the rest of the state. Doing this misrepresents the distribution because of how the cancer rate per 100,000 people is calculated: each ZIP code's rate counts equally in the average regardless of its population. The \(\chi^2\) test is instead based on a contingency table. This table contains the number of people who were diagnosed with cancer or not diagnosed with cancer inside and outside of the EtO affected areas. The expected value of each cell is calculated and compared to the actual recorded value. If the resulting \(\chi^2\) statistic is unusually large, the difference between the two groups is statistically significant.
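For reference, the quantities described above can be written out as follows. This is the standard Pearson \(\chi^2\) formulation, where \(O_{ij}\) is the observed count in row \(i\) and column \(j\), \(R_i\) and \(C_j\) are the row and column totals, and \(N\) is the grand total. The symbols are notation chosen for this sketch, not taken from the Sterigenics-only analysis.

\[
E_{ij} = \frac{R_i \, C_j}{N}, \qquad \chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\]

As a worked example, the cell for cancer diagnoses in the EtO areas has marginal totals of 998,637 and 980,469 out of a grand total of 12,609,487, and an observed count of 85,361:

\[
E = \frac{998637 \times 980469}{12609487} \approx 77650, \qquad \frac{(85361 - 77650)^2}{77650} \approx 765.7
\]

This single cell accounts for most of the \(\chi^2\) statistic of 901.6 reported in section 3.1.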

Following is a table of all the EtO affected ZIP codes that I showed above.

zip population freq per_year one_in_every_per_year per_100000_per_year zip_vs_state
60714 29931 3808 253.866667 117.9005 848.1730 1.6246851
62526 34075 4064 270.933333 125.7689 795.1088 1.5230401
60712 12590 1468 97.866667 128.6444 777.3365 1.4889970
62521 35851 4115 274.333333 130.6841 765.2041 1.4657573
60525 31168 3389 225.933333 137.9522 724.8888 1.3885327
60053 23260 2494 166.266667 139.8957 714.8180 1.3692421
60068 37475 3997 266.466667 140.6367 711.0518 1.3620279
60546 15668 1643 109.533333 143.0432 699.0894 1.3391138
60025 39105 4091 272.733333 143.3818 697.4385 1.3359515
60203 4523 460 30.666667 147.4891 678.0161 1.2987475
60561 23115 2341 156.066667 148.1098 675.1749 1.2933052
60077 26825 2707 180.466667 148.6424 672.7555 1.2886709
60076 33415 3372 224.800000 148.6432 672.7518 1.2886637
60646 27177 2717 181.133333 150.0386 666.4950 1.2766787
60526 13576 1354 90.266667 150.3988 664.8988 1.2736213
60527 27486 2720 181.333333 151.5772 659.7298 1.2637200
60513 19047 1733 115.533333 164.8615 606.5697 1.1618912
60638 55026 5002 333.466667 165.0120 606.0165 1.1608316
60421 3968 349 23.266667 170.5444 586.3575 1.1231746
60016 59690 5181 345.400000 172.8141 578.6564 1.1084229
60534 10649 876 58.400000 182.3459 548.4083 1.0504824
60048 29095 2367 157.800000 184.3790 542.3612 1.0388991
60202 31361 2361 157.400000 199.2440 501.8973 0.9613899
60645 45274 3233 215.533333 210.0557 476.0643 0.9119065
60402 63448 4263 284.200000 223.2512 447.9259 0.8580071
60031 37947 2481 165.400000 229.4256 435.8711 0.8349160
60087 26978 1751 116.733333 231.1079 432.6982 0.8288384
60501 11626 723 48.200000 241.2033 414.5880 0.7941480
60659 38104 2369 157.933333 241.2664 414.4797 0.7939406
60626 50139 2977 198.466667 252.6318 395.8329 0.7582225
60410 12687 748 49.866667 254.4184 393.0533 0.7528980
60064 15407 901 60.066667 256.4983 389.8661 0.7467929
62523 1237 65 4.333333 285.4615 350.3099 0.6710226
60085 71714 3241 216.066667 331.9068 301.2894 0.5771233

The overall rate per 100,000 people in the EtO areas is 569.85 and compared to the state baseline it is 1.092, which is 9.2% higher.

Since the lowest population in the EtO affected areas is 1,237, I will remove all ZIP Codes with populations of 1,000 or fewer from the rest of the state before making the comparison. In the Sterigenics-only analysis, I removed all ZIP Codes with fewer than 5,000 people to make the samples more representative for comparison’s sake.

3.1 \(\chi^2\) Test

The point of this analysis is to determine if there is a link between cancer diagnosis rates and proximity to EtO facilities. Rather than perform a test for each facility individually, I will aggregate all the ZIP codes and do a test of EtO adjacent areas vs non-EtO adjacent areas.

In the table below, the first row contains the number of cancer diagnoses in the EtO adjacent areas from 2001-2015 in the first column and the number of people in those areas who were not diagnosed with cancer in the second column. The second row shows the same thing for the rest of the ZIP codes in Illinois.

Cancer No Cancer Sum
EtO Affected Areas 85361 913276 998637
Rest of State 895108 10715742 11610850
Sum 980469 11629018 12609487

A \(\chi^2\) test was performed on this contingency table and the results are below. The p-value is very small, which means that this result is statistically significant. The formula for calculating the \(\chi^2\) value is shown in my Sterigenics-only analysis.

## 
##  Pearson's Chi-squared test
## 
## data:  contingency_table
## X-squared = 901.6, df = 1, p-value < 2.2e-16

The table below shows how many people would be expected in each of the four groups based on the row and column totals. You can see that each observed count differs from its expected count by about 7,700. This difference matters most for the cancer diagnoses in the EtO affected areas because that count is at least an order of magnitude smaller than the others in the table.

Cancer No Cancer Sum
EtO Affected Areas 77650 920987 998637
Rest of State 902819 10708031 11610850
Sum 980469 11629018 12609487

The following table of residuals confirms that the number of cancer diagnoses in the EtO affected areas is significantly larger than expected. Correspondingly, the number of people in the EtO areas who had not been diagnosed with cancer is lower than expected, as is the number of cancer diagnoses for the rest of the state.

Cancer No Cancer
EtO Affected Areas 27.670184 -8.034474
Rest of State -8.114913 2.356293
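The expected counts, residuals, and test statistic can also be checked by hand. The sketch below is a minimal example, assuming the four counts are arranged with one row per area (EtO affected areas first) and one column per outcome (cancer first), so that the first row sums to the combined population of the EtO ZIP codes listed in section 3. It uses base R only and is meant as a cross-check on `chisq.test`, not a replacement for it.

```r
# Observed counts: one row per area, one column per outcome
observed <- matrix(c(85361, 913276,
                     895108, 10715742),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c('EtO Affected Areas', 'Rest of State'),
                                   c('Cancer', 'No Cancer')))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_resid <- (observed - expected) / sqrt(expected)

# Chi-squared statistic (no continuity correction) and its p-value, df = 1
chi_sq <- sum(pearson_resid^2)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

print(round(expected))
print(round(pearson_resid, 2))
print(round(chi_sq, 1))
```

The statistic is the sum of the squared residuals, so the large positive residual for cancer diagnoses in the EtO areas is what drives the result.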

4 Conclusions

Just like the previous analysis I performed, I found a statistically significant link between proximity to EtO plants and high cancer diagnosis rates. Interestingly, the areas around some facilities did not have cancer rates higher than the state baseline, but across all areas around EtO facilities taken together, the cancer diagnosis rate is 9.2% higher than the state baseline. Based on this, there is a lot of evidence to suggest that living near an EtO polluting facility made you more likely to be diagnosed with cancer than if you lived anywhere else in the state.

5 Appendix

5.1 References

US EPA Toxics Release Inventory

-United States Environmental Protection Agency. (2018). TRI Explorer (2017 Dataset (released October 2018)) [Internet database]. Retrieved from https://www.epa.gov/triexplorer, (November 02, 2018).

US EPA 2014 National Air Toxics Assessment

-United States Environmental Protection Agency. Retrieved from https://www.epa.gov/national-air-toxics-assessment/2014-national-air-toxics-assessment, (November 02, 2018).

Illinois Cancer Data

-Illinois Department of Public Health, Illinois State Cancer Registry, public dataset, 1986-2015, data as of November 2017

Illinois Population Data

-U.S. Census Bureau; Census 2010, Summary File 1, 5-Digit ZIP Code Tabulation within Illinois; generated by Athanasios Stamatoukos; using American FactFinder; http://factfinder.census.gov; (5 October 2018)

Motivation

I got the idea to do this analysis after seeing posts by “Stop Sterigenics” group members Richard Morton and Katherine M Howard.

5.2 Code

5.2.1 Loading and Introducing Data

library(dplyr)
library(ggplot2)
library(plotly)
library(knitr)

library(leaflet)
library(RColorBrewer)

library(tigris)
library(rgeos)
library(sp)
facility <- c('Sterigenics', 'Ele Corp', 'Azko Nobel Surface Chemistry', 'Lutheran General Hospital',
              'Medline Industries/Steris Isomedix', 'Vantage Specialties', 'Lambent Technologies',
              'Stepan Co', 'Tate & Lyle')
town <- c('Willowbrook', 'McCook', 'McCook', 'Park Ridge',
          'Northfield', 'Gurnee', 'Skokie',
          'Elwood/Joliet', 'Decatur')
facilitytable <- cbind(Facility_Name=facility, Location=town)
kable(facilitytable, align='c')
cancer <- read.csv('./Data/il_cancer_statistics.csv', header = TRUE, stringsAsFactors = FALSE)
il_rate <- 522.0538
sterigenics <- read.csv('./Data/facilities/sterigenics.csv', header=TRUE, stringsAsFactors = FALSE)
ele <- read.csv('./Data/facilities/ele_corp.csv', header=TRUE, stringsAsFactors = FALSE)
azko <- read.csv('./Data/facilities/azko.csv', header=TRUE, stringsAsFactors = FALSE)
steris <- read.csv('./Data/facilities/steris_isomedix.csv', header=TRUE, stringsAsFactors = FALSE)
vantage <- read.csv('./Data/facilities/vantage.csv', header=TRUE, stringsAsFactors = FALSE)
lambent <- read.csv('./Data/facilities/lambent.csv', header=TRUE, stringsAsFactors = FALSE)
stepan <- read.csv('./Data/facilities/stepan_co.csv', header=TRUE, stringsAsFactors = FALSE)
tatelyle <- read.csv('./Data/facilities/tate_lyle.csv', header=TRUE, stringsAsFactors = FALSE)
zipSterigenics <- c(60561, 60527)
zipEle <- c(60525, 60501, 60534, 60546, 60402, 60513, 60638, 60526)
# zipAzko <- c(60525, 60501, 60534, 60546, 60402, 60513, 60638)
zipLuthGen <- c(60068, 60714, 60053, 60016,60025)
zipMedline <- c(60085, 60064, 60048, 60031)
zipVantage <- c(60031, 60087, 60058)
zipStepan <- c(60410, 60421)
zipTateLyle <- c(62526, 62523, 62521)
zipLambent <- c(60712, 60659, 60645, 60646, 60626, 60202, 60076, 60077, 60203)
zipEtO <- unique(c(zipSterigenics, zipEle, zipLuthGen, zipMedline, zipVantage, zipStepan, zipTateLyle, zipLambent))

5.2.2 Plotting and Subsetting Data

g <- ggplot(sterigenics, aes(x=year, y=total_emissions_tons)) + geom_bar(stat='identity')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Sterigenics')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_steri <- subset(cancer, zip %in% zipSterigenics)
kable(arrange(cancer_steri[,-c(3,4)], desc(zip_vs_state)), align='c')
per_steri <- sum(cancer_steri$per_year)*100000/sum(cancer_steri$population)
comp_steri <- per_steri/il_rate
g <- ggplot() + geom_bar(data=ele, aes(year, total_emissions_tons), stat='identity', fill='red')
g <- g + geom_bar(data=azko, aes(year, total_emissions_tons), stat='identity', fill='blue')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Ele Corp (Red) and\nAzko Nobel Surface Chemistry (Blue)')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_ele <- subset(cancer, zip %in% zipEle)
kable(arrange(cancer_ele[,-c(3,4)], desc(zip_vs_state)), align='c')
per_ele <- sum(cancer_ele$per_year)*100000/sum(cancer_ele$population)
comp_ele <- per_ele/il_rate
cancer_luthgen <- subset(cancer, zip %in% zipLuthGen)
kable(arrange(cancer_luthgen[,-c(3,4)], desc(zip_vs_state)), align='c')
per_lutheran <- sum(cancer_luthgen$per_year)*100000/sum(cancer_luthgen$population)
comp_lutheran <- per_lutheran/il_rate
g <- ggplot(steris, aes(x=year, y=total_emissions_tons)) + geom_bar(stat='identity')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Steris Isomedix')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_medline <- subset(cancer, zip %in% zipMedline)
kable(arrange(cancer_medline[,-c(3,4)], desc(zip_vs_state)), align='c')
per_medline <- sum(cancer_medline$per_year)*100000/sum(cancer_medline$population)
comp_medline <- per_medline/il_rate
g <- ggplot(vantage, aes(x=year, y=total_emissions_tons)) + geom_bar(stat='identity')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Vantage Specialties')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_vantage <- subset(cancer, zip %in% zipVantage)
kable(arrange(cancer_vantage[,-c(3,4)], desc(zip_vs_state)), align='c')
per_vantage <- sum(cancer_vantage$per_year)*100000/sum(cancer_vantage$population)
comp_vantage <- per_vantage/il_rate
g <- ggplot(lambent, aes(x=year, y=total_emissions_tons)) + geom_bar(stat='identity')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Lambent Technologies')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_lambent <- subset(cancer, zip %in% zipLambent)
kable(arrange(cancer_lambent[,-c(3,4)], desc(zip_vs_state)), align='c')
per_lambent <- sum(cancer_lambent$per_year)*100000/sum(cancer_lambent$population)
comp_lambent <- per_lambent/il_rate
g <- ggplot(stepan, aes(x=year, y=total_emissions_tons)) + geom_bar(stat='identity')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Stepan Co')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_stepan <- subset(cancer, zip %in% zipStepan)
kable(arrange(cancer_stepan[,-c(3,4)], desc(zip_vs_state)), align='c')
per_stepan <- sum(cancer_stepan$per_year)*100000/sum(cancer_stepan$population)
comp_stepan <- per_stepan/il_rate
g <- ggplot(tatelyle, aes(x=year, y=total_emissions_tons)) + geom_bar(stat='identity')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Tate & Lyle')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_tatelyle <- subset(cancer, zip %in% zipTateLyle)
kable(arrange(cancer_tatelyle[,-c(3,4)], desc(zip_vs_state)), align='c')
per_tatelyle <- sum(cancer_tatelyle$per_year)*100000/sum(cancer_tatelyle$population)
comp_tatelyle <- per_tatelyle/il_rate

5.2.3 Map

datashp <- zctas(cb = TRUE, starts_with = cancer$zip)
data_map <- merge(datashp, cancer, by.x = 'GEOID10', by.y ='zip')

n <- leaflet(data_map) %>% addTiles()
n <- n %>% addMarkers(lat = ~lat, lng = ~long, 
                      popup=~paste('Per 100,000: ', as.character(round(per_100000_per_year, 2)), '<br>',
                                   'ZIP Code vs State: ', as.character(round(zip_vs_state, 2))),
                      label=~paste('Geographic Center of ', GEOID10, ' ZIP Code'),
                      clusterOptions = markerClusterOptions())
n <- n %>% addPolygons(data=data_map, weight=2, opacity = 1, fillOpacity = 0.5,
                       fillColor = ~colorQuantile('Greys', per_100000_per_year)(per_100000_per_year))
n <- n %>% addCircles(lat = 41.747375, lng = -87.939954,
                      label = 'Sterigenics Plant Radius',
                      color=rev(brewer.pal(5, 'Spectral')),
                      radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 41.805248, lng = -87.817419,
                      label = 'Ele Corp Plant Radius',
                      color=rev(brewer.pal(5, 'Spectral')),
                      radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 42.038210, lng = -87.847630,
                      label = 'Lutheran General Radius',
                      color=rev(brewer.pal(5, 'Spectral')),
                      radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 42.336930, lng = -87.889110,
                      label = 'Medline Industries Plant Radius',
                      color=rev(brewer.pal(5, 'Spectral')),
                      radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 41.442163, lng = -88.159382,
                      label = 'Stepan Co Plant Radius',
                      color=rev(brewer.pal(5, 'Spectral')),
                      radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 39.849503, lng = -88.918665,
                      label = 'Tate & Lyle Plant Radius',
                      color=rev(brewer.pal(5, 'Spectral')),
                      radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 42.383157, lng = -87.899563,
                      label = 'Vantage Specialties Plant Radius',
                      color=rev(brewer.pal(5, 'Spectral')),
                      radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 42.01284, lng = -87.717112,
                      label = 'Lambent Technologies Plant Radius',
                      color=rev(brewer.pal(5, 'Spectral')),
                      radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 41.805017, lng = -87.827975,
                      label = 'Azko Nobel Surface Chemistry Plant Radius',
                      color=rev(brewer.pal(5, 'Spectral')),
                      radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)

n

5.2.4 Chi-Square Test

cancer_EtO <- subset(cancer, zip %in% zipEtO)
kable(arrange(cancer_EtO[,-c(3,4)], desc(zip_vs_state)), align='c')
per_EtO <- sum(cancer_EtO$per_year)*100000/sum(cancer_EtO$population)
comp_EtO <- per_EtO/il_rate
cancer_rest <- subset(cancer, !(zip %in% zipEtO))
cancer_rest_big <- subset(cancer_rest, population > 1000)

EtO_cancer <- sum(cancer_EtO$freq)
EtO_no_cancer <- sum(cancer_EtO$population) - EtO_cancer
rest_cancer <- sum(cancer_rest_big$freq)
rest_no_cancer <- sum(cancer_rest_big$population) - rest_cancer

# Fill row by row so each row is an area and each column is an outcome
contingency_table <- matrix(c(EtO_cancer, EtO_no_cancer, rest_cancer, rest_no_cancer), nrow=2, byrow=TRUE)
contingency_table <- round(contingency_table, 0)

rownames(contingency_table) <- c('EtO Affected Areas', 'Rest of State')
colnames(contingency_table) <- c('Cancer', 'No Cancer')

kable(addmargins(round(contingency_table, 0)), align='c')
chitest <- chisq.test(contingency_table, correct=FALSE)
chitest
kable(addmargins(round(chitest$expected, 0)), align = 'c')
kable(chitest$residuals, align = 'c')