This is a follow-up to the analysis I recently published about the Sterigenics plants in Willowbrook, IL, and the cancer diagnosis rates in the surrounding areas, which can be found at http://www.rpubs.com/astama/illinois_cancer. In that analysis, I found a statistically significant link between proximity to that specific ethylene oxide (EtO) polluting plant and cancer diagnosis rates in the 4-mile radius surrounding it. Since then, I have been introduced to the United States EPA’s 2014 National Air Toxics Assessment (NATA) map and the EPA’s Toxics Release Inventory (TRI) database. From these resources, I found more EtO emitting facilities around the state of Illinois, so I decided to see if there was a significant link between those facilities and the cancer diagnosis rates in the surrounding areas. I would have liked to extend the analysis to other states that have EtO plants, such as Louisiana, Texas, Colorado, and West Virginia, but they didn’t have publicly available cancer diagnosis data down to the ZIP Code level like Illinois did, so I was not able to do this.
The NATA map (https://www.epa.gov/national-air-toxics-assessment/2014-nata-map) shows how many metric tons per year (TPY) of a wide variety of chemicals were emitted by facilities across the United States. In addition, the EPA calculated the additional cancer risk (per 1,000,000 people) that the people in each census tract face due to the pollutants. If an area is highlighted in dark blue, the estimated additional risk is at least 100 in a million. In almost all such areas around the country, there is a facility that emits EtO.
The EPA’s TRI Explorer (https://iaspub.epa.gov/triexplorer/tri_release.facility) allows you to search by year, state, and chemical to see how much of a particular chemical has been released by facilities around the country per year. The data from this website was in a rather messy format, so for the purpose of this analysis, I removed all rows and columns except for the “year” and “total emissions” columns. This resource reported emissions in pounds, so I converted to metric tons by dividing by 2204.6. The resulting emissions plots appear in each facility's section below.
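As a minimal sketch of that cleanup step, the conversion amounts to something like the following. The input column name `total_emissions_lbs` and the sample values are placeholders made up for illustration, not the actual TRI export; only the output column name `total_emissions_tons` matches what the plotting code in this report expects.

```r
# Hypothetical TRI extract after dropping everything except year and total emissions.
# The values below are invented for illustration and are NOT real TRI figures.
tri_raw <- data.frame(
  year = c(2001, 2002, 2003),
  total_emissions_lbs = c(2204.6, 4409.2, 1102.3)
)

LBS_PER_METRIC_TON <- 2204.6  # conversion factor stated in the text

# Keep the year and convert pounds to metric tons
tri_clean <- data.frame(
  year = tri_raw$year,
  total_emissions_tons = tri_raw$total_emissions_lbs / LBS_PER_METRIC_TON
)

print(tri_clean)
```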
The facilities I was able to find that release EtO are listed below.
| Facility_Name | Location |
|---|---|
| Sterigenics | Willowbrook |
| Ele Corp | McCook |
| Akzo Nobel Surface Chemistry | McCook |
| Lutheran General Hospital | Park Ridge |
| Medline Industries/Steris Isomedix | Northfield |
| Vantage Specialties | Gurnee |
| Lambent Technologies | Skokie |
| Stepan Co | Elwood/Joliet |
| Tate & Lyle | Decatur |
Further explanation of these facilities will be given later in the report.
Here is a link to my GitHub repository, where all of the code and data I used can be found: https://github.com/athanasios8193/illinois_cancer
Following are plots showing EtO pollution from facilities around Illinois and the ZIP codes largely contained within 2.5 miles of their respective EtO emitting facilities. Note that the baseline cancer diagnosis rate for the entire state was previously calculated to be 522 per 100,000 people per year.
There is a map at the end of this section that contains the cancer diagnosis rates for every ZIP code in Illinois. The 9 facilities I examine are all shown on the map with 2.5 mile circles surrounding them.
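For a sense of scale, here is a rough sketch of how distances like these can be checked in base R. This is not how the ZIP codes were selected (that was done by looking at the map); the `haversine_miles` and `miles_to_meters` helpers are hypothetical names introduced here, and the Earth radius is an approximation. The coordinates are the ones used for the two McCook circles on the map.

```r
# Great-circle (haversine) distance in miles between two lat/long points.
haversine_miles <- function(lat1, lon1, lat2, lon2) {
  earth_radius_miles <- 3958.8  # approximate mean Earth radius (assumption)
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * earth_radius_miles * asin(pmin(1, sqrt(a)))
}

# leaflet's addCircles() takes its radius in meters
miles_to_meters <- function(miles) miles * 1609.344

# The two McCook facilities, using the coordinates from the map code
ele_corp     <- c(lat = 41.805248, lon = -87.817419)  # Ele Corp
surface_chem <- c(lat = 41.805017, lon = -87.827975)  # Surface Chemistry plant

d_mccook <- haversine_miles(ele_corp[["lat"]], ele_corp[["lon"]],
                            surface_chem[["lat"]], surface_chem[["lon"]])

cat("Distance between the McCook facilities (miles):", round(d_mccook, 2), "\n")
cat("2.5 miles in meters:", miles_to_meters(2.5), "\n")
```

The two McCook plants come out at about half a mile apart, and 2.5 miles works out to the same outer radius as the five 804.672-meter (half-mile) rings drawn on the map.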
NOTE: All emissions plots are interactive. You can hover over each bar to see the exact number of tons per year released by the facility.
I covered Sterigenics in detail in my previous report. In this analysis, I shrank the radius from 4 miles to 2.5 miles, so I am now only considering Willowbrook and Darien.
The TRI report had a few years missing between 1989 and 1994. It’s interesting to see the very high number in 1988 and then the sharp decline in EtO emissions from 1998 to 1999.
| zip | population | freq | per_year | one_in_every_per_year | per_100000_per_year | zip_vs_state |
|---|---|---|---|---|---|---|
| 60561 | 23115 | 2341 | 156.0667 | 148.1098 | 675.1749 | 1.293305 |
| 60527 | 27486 | 2720 | 181.3333 | 151.5772 | 659.7298 | 1.263720 |
The overall rate per 100,000 people in the Sterigenics area is 666.79 and compared to the state baseline it is 1.277, which is 27.7% higher.
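The overall figure for an area is a pooled rate: total diagnoses per year divided by total population, rather than an average of the ZIP-level rates. A small sketch using the two Sterigenics rows from the table above (the data frame is typed in by hand here instead of being read from the CSV, and `n_years` assumes the 2001-2015 diagnosis window):

```r
# Rows copied from the Sterigenics table above
steri <- data.frame(
  zip        = c("60561", "60527"),
  population = c(23115, 27486),
  freq       = c(2341, 2720)   # total diagnoses over the study window
)

n_years <- 15         # 2001-2015
il_rate <- 522.0538   # state baseline per 100,000 people per year

# Diagnoses per year in each ZIP code
steri$per_year <- steri$freq / n_years

# Pooled rate per 100,000 people per year for the whole area
area_rate <- sum(steri$per_year) * 100000 / sum(steri$population)

# Ratio to the state baseline, and the same thing as a percent difference
ratio    <- area_rate / il_rate
pct_diff <- (ratio - 1) * 100

cat(sprintf("Rate: %.2f per 100,000; ratio to state: %.3f; difference: %.1f%%\n",
            area_rate, ratio, pct_diff))
```

The same three lines of arithmetic produce the overall numbers quoted for every other facility in this report.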
The Ele Corp and Akzo Nobel facilities are both located in McCook, Illinois. They are very near each other, so I grouped them together for this analysis. The pollution graph appears below. The Akzo Nobel data ends in 2007, which is the same year that Ele first appears on the plot. Like Sterigenics, there is a very sharp decline in EtO emissions from 1998 to 1999.
It is important to note that 60525, La Grange, is sandwiched between the Sterigenics facility to the southwest and the Ele and Akzo Nobel facilities to the east.
| zip | population | freq | per_year | one_in_every_per_year | per_100000_per_year | zip_vs_state |
|---|---|---|---|---|---|---|
| 60525 | 31168 | 3389 | 225.93333 | 137.9522 | 724.8888 | 1.3885327 |
| 60546 | 15668 | 1643 | 109.53333 | 143.0432 | 699.0894 | 1.3391138 |
| 60526 | 13576 | 1354 | 90.26667 | 150.3988 | 664.8988 | 1.2736213 |
| 60513 | 19047 | 1733 | 115.53333 | 164.8615 | 606.5697 | 1.1618912 |
| 60638 | 55026 | 5002 | 333.46667 | 165.0120 | 606.0165 | 1.1608316 |
| 60534 | 10649 | 876 | 58.40000 | 182.3459 | 548.4083 | 1.0504824 |
| 60402 | 63448 | 4263 | 284.20000 | 223.2512 | 447.9259 | 0.8580071 |
| 60501 | 11626 | 723 | 48.20000 | 241.2033 | 414.5880 | 0.7941480 |
The overall rate per 100,000 people in the Ele Corp/Akzo Nobel area is 574.7 and compared to the state baseline it is 1.101, which is 10.1% higher.
Lutheran General is a hospital in Park Ridge, Illinois. The US EPA did not have year by year data for Lutheran General. I found this facility from the 2014 NATA Map. According to the map, Lutheran General released 0.10 tons of Ethylene Oxide into the air that year. Since no other data are available for this facility, I don’t know how long they have been releasing EtO or how much they have been releasing.
The areas surrounding this hospital have some of the highest cancer diagnosis rates of all of the ZIP codes being analyzed.
| zip | population | freq | per_year | one_in_every_per_year | per_100000_per_year | zip_vs_state |
|---|---|---|---|---|---|---|
| 60714 | 29931 | 3808 | 253.8667 | 117.9005 | 848.1730 | 1.624685 |
| 60053 | 23260 | 2494 | 166.2667 | 139.8957 | 714.8180 | 1.369242 |
| 60068 | 37475 | 3997 | 266.4667 | 140.6367 | 711.0518 | 1.362028 |
| 60025 | 39105 | 4091 | 272.7333 | 143.3818 | 697.4385 | 1.335951 |
| 60016 | 59690 | 5181 | 345.4000 | 172.8141 | 578.6564 | 1.108423 |
The overall rate per 100,000 people in the Lutheran area is 688.66 and compared to the state baseline it is 1.319, which is 31.9% higher.
The EPA’s TRI report lists the Medline Industries facility as Steris Isomedix, and its data ends in 2005. In the 2014 NATA Map, the facility is listed as Medline Industries. According to the map, the facility released 1.529 tons of EtO that year. I saw nothing in the TRI report about Medline Industries even though it appears on the NATA Map.
These results are pretty interesting. Only one ZIP code in this area has a cancer diagnosis rate higher than the state baseline.
| zip | population | freq | per_year | one_in_every_per_year | per_100000_per_year | zip_vs_state |
|---|---|---|---|---|---|---|
| 60048 | 29095 | 2367 | 157.80000 | 184.3790 | 542.3612 | 1.0388991 |
| 60031 | 37947 | 2481 | 165.40000 | 229.4256 | 435.8711 | 0.8349160 |
| 60064 | 15407 | 901 | 60.06667 | 256.4983 | 389.8661 | 0.7467929 |
| 60085 | 71714 | 3241 | 216.06667 | 331.9068 | 301.2894 | 0.5771233 |
The overall rate per 100,000 people in the Medline Industries/Steris Isomedix area is 388.77 and compared to the state baseline it is 0.745, which is 25.5% lower.
The Vantage Specialties facility is located in Gurnee, not too far from the Medline Industries facility. Similar to Sterigenics and Akzo Nobel, there is a sharp decline from 1998 to 1999. The EtO emissions then spike around 2010.
As with the Medline Industries facility, there is a surprisingly low cancer diagnosis rate in this area.
| zip | population | freq | per_year | one_in_every_per_year | per_100000_per_year | zip_vs_state |
|---|---|---|---|---|---|---|
| 60031 | 37947 | 2481 | 165.4000 | 229.4256 | 435.8711 | 0.8349160 |
| 60087 | 26978 | 1751 | 116.7333 | 231.1079 | 432.6982 | 0.8288384 |
The overall rate per 100,000 people in the Vantage Specialties area is 434.55 and compared to the state baseline it is 0.832, which is 16.8% lower.
Lambent Technologies was located in Skokie. Data stopped being collected in 2005. The Lambent Technologies website says that they changed their name to Vantage Specialties.
This area contains some ZIP codes with very high cancer diagnosis rates.
| zip | population | freq | per_year | one_in_every_per_year | per_100000_per_year | zip_vs_state |
|---|---|---|---|---|---|---|
| 60712 | 12590 | 1468 | 97.86667 | 128.6444 | 777.3365 | 1.4889970 |
| 60203 | 4523 | 460 | 30.66667 | 147.4891 | 678.0161 | 1.2987475 |
| 60077 | 26825 | 2707 | 180.46667 | 148.6424 | 672.7555 | 1.2886709 |
| 60076 | 33415 | 3372 | 224.80000 | 148.6432 | 672.7518 | 1.2886637 |
| 60646 | 27177 | 2717 | 181.13333 | 150.0386 | 666.4950 | 1.2766787 |
| 60202 | 31361 | 2361 | 157.40000 | 199.2440 | 501.8973 | 0.9613899 |
| 60645 | 45274 | 3233 | 215.53333 | 210.0557 | 476.0643 | 0.9119065 |
| 60659 | 38104 | 2369 | 157.93333 | 241.2664 | 414.4797 | 0.7939406 |
| 60626 | 50139 | 2977 | 198.46667 | 252.6318 | 395.8329 | 0.7582225 |
The overall rate per 100,000 people in the Lambent Technologies area is 536.09 and compared to the state baseline it is 1.027, which is 2.7% higher.
Stepan Co is a chemical company in Elwood/Joliet, Illinois. Its EtO emissions spiked sharply in 1997 and then dropped back down the next year.
The ZIP code the facility is in, 60421, has a higher cancer diagnosis rate than the adjacent ZIP code.
| zip | population | freq | per_year | one_in_every_per_year | per_100000_per_year | zip_vs_state |
|---|---|---|---|---|---|---|
| 60421 | 3968 | 349 | 23.26667 | 170.5444 | 586.3575 | 1.123175 |
| 60410 | 12687 | 748 | 49.86667 | 254.4184 | 393.0533 | 0.752898 |
The overall rate per 100,000 people in the Stepan Co area is 439.11 and compared to the state baseline it is 0.841, which is 15.9% lower.
Tate & Lyle is a food manufacturing business in Decatur. EtO emissions were very high before 1992 and then dropped to low amounts from 1993 to 1998. They rose again in 1999 and have remained fairly constant since.
The two most populous ZIP codes in the area had cancer diagnosis rates around 50% higher than the state baseline (52% and 47%), while the ZIP code with the lowest rate was 33% lower.
| zip | population | freq | per_year | one_in_every_per_year | per_100000_per_year | zip_vs_state |
|---|---|---|---|---|---|---|
| 62526 | 34075 | 4064 | 270.933333 | 125.7689 | 795.1088 | 1.5230401 |
| 62521 | 35851 | 4115 | 274.333333 | 130.6841 | 765.2041 | 1.4657573 |
| 62523 | 1237 | 65 | 4.333333 | 285.4615 | 350.3099 | 0.6710226 |
The overall rate per 100,000 people in the Tate & Lyle area is 772.31 and compared to the state baseline it is 1.479, which is 47.9% higher.
Here is a map of every ZIP code in the state of Illinois with the darkness of the fill corresponding to a higher cancer diagnosis rate. The filled in circles are centered around EtO emitting facilities. Hovering over these circles will tell you which facility it is.
In my original analysis, which focused only on Sterigenics, I used both t-testing and \(\chi^2\) testing. I mentioned there that it was more appropriate to use the \(\chi^2\) test than the t-test. In the t-tests, the cancer rates per 100,000 people were averaged across ZIP codes and compared to the average rate of the rest of the state. Doing this misrepresents the distribution because each rate per 100,000 people is calculated from a different population, so a small ZIP code counts as much as a large one. The \(\chi^2\) test is based on a contingency table. This table contains the number of people who were or were not diagnosed with cancer, inside and outside the EtO affected areas. The expected value of each cell is calculated and compared to the observed value. If the resulting \(\chi^2\) statistic is unusually large, the difference between the groups is statistically significant.
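One way to see the problem with averaging ZIP-level rates is to compare the plain mean with the population-weighted (pooled) rate. The sketch below uses the three Tate & Lyle ZIP codes from the table earlier in this report, typed in by hand; it only illustrates the weighting issue and is not the t-test from the original analysis.

```r
# Tate & Lyle area rows, copied from the table earlier in this report
tate <- data.frame(
  zip                 = c("62526", "62521", "62523"),
  population          = c(34075, 35851, 1237),
  per_100000_per_year = c(795.1088, 765.2041, 350.3099)
)

# Plain average: every ZIP code counts equally, regardless of size
mean_of_rates <- mean(tate$per_100000_per_year)

# Population-weighted average: equivalent to pooling diagnoses and population
pooled_rate <- weighted.mean(tate$per_100000_per_year, w = tate$population)

cat(sprintf("Mean of ZIP rates: %.2f\n", mean_of_rates))
cat(sprintf("Pooled rate:       %.2f\n", pooled_rate))
```

The single ZIP code with 1,237 people pulls the plain average well below the pooled rate of roughly 772 per 100,000, even though it holds under 2% of the area's population.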
Following is a table of all the EtO affected ZIP codes that I showed above.
| zip | population | freq | per_year | one_in_every_per_year | per_100000_per_year | zip_vs_state |
|---|---|---|---|---|---|---|
| 60714 | 29931 | 3808 | 253.866667 | 117.9005 | 848.1730 | 1.6246851 |
| 62526 | 34075 | 4064 | 270.933333 | 125.7689 | 795.1088 | 1.5230401 |
| 60712 | 12590 | 1468 | 97.866667 | 128.6444 | 777.3365 | 1.4889970 |
| 62521 | 35851 | 4115 | 274.333333 | 130.6841 | 765.2041 | 1.4657573 |
| 60525 | 31168 | 3389 | 225.933333 | 137.9522 | 724.8888 | 1.3885327 |
| 60053 | 23260 | 2494 | 166.266667 | 139.8957 | 714.8180 | 1.3692421 |
| 60068 | 37475 | 3997 | 266.466667 | 140.6367 | 711.0518 | 1.3620279 |
| 60546 | 15668 | 1643 | 109.533333 | 143.0432 | 699.0894 | 1.3391138 |
| 60025 | 39105 | 4091 | 272.733333 | 143.3818 | 697.4385 | 1.3359515 |
| 60203 | 4523 | 460 | 30.666667 | 147.4891 | 678.0161 | 1.2987475 |
| 60561 | 23115 | 2341 | 156.066667 | 148.1098 | 675.1749 | 1.2933052 |
| 60077 | 26825 | 2707 | 180.466667 | 148.6424 | 672.7555 | 1.2886709 |
| 60076 | 33415 | 3372 | 224.800000 | 148.6432 | 672.7518 | 1.2886637 |
| 60646 | 27177 | 2717 | 181.133333 | 150.0386 | 666.4950 | 1.2766787 |
| 60526 | 13576 | 1354 | 90.266667 | 150.3988 | 664.8988 | 1.2736213 |
| 60527 | 27486 | 2720 | 181.333333 | 151.5772 | 659.7298 | 1.2637200 |
| 60513 | 19047 | 1733 | 115.533333 | 164.8615 | 606.5697 | 1.1618912 |
| 60638 | 55026 | 5002 | 333.466667 | 165.0120 | 606.0165 | 1.1608316 |
| 60421 | 3968 | 349 | 23.266667 | 170.5444 | 586.3575 | 1.1231746 |
| 60016 | 59690 | 5181 | 345.400000 | 172.8141 | 578.6564 | 1.1084229 |
| 60534 | 10649 | 876 | 58.400000 | 182.3459 | 548.4083 | 1.0504824 |
| 60048 | 29095 | 2367 | 157.800000 | 184.3790 | 542.3612 | 1.0388991 |
| 60202 | 31361 | 2361 | 157.400000 | 199.2440 | 501.8973 | 0.9613899 |
| 60645 | 45274 | 3233 | 215.533333 | 210.0557 | 476.0643 | 0.9119065 |
| 60402 | 63448 | 4263 | 284.200000 | 223.2512 | 447.9259 | 0.8580071 |
| 60031 | 37947 | 2481 | 165.400000 | 229.4256 | 435.8711 | 0.8349160 |
| 60087 | 26978 | 1751 | 116.733333 | 231.1079 | 432.6982 | 0.8288384 |
| 60501 | 11626 | 723 | 48.200000 | 241.2033 | 414.5880 | 0.7941480 |
| 60659 | 38104 | 2369 | 157.933333 | 241.2664 | 414.4797 | 0.7939406 |
| 60626 | 50139 | 2977 | 198.466667 | 252.6318 | 395.8329 | 0.7582225 |
| 60410 | 12687 | 748 | 49.866667 | 254.4184 | 393.0533 | 0.7528980 |
| 60064 | 15407 | 901 | 60.066667 | 256.4983 | 389.8661 | 0.7467929 |
| 62523 | 1237 | 65 | 4.333333 | 285.4615 | 350.3099 | 0.6710226 |
| 60085 | 71714 | 3241 | 216.066667 | 331.9068 | 301.2894 | 0.5771233 |
The overall rate per 100,000 people in the EtO areas is 569.85 and compared to the state baseline it is 1.092, which is 9.2% higher.
Since the smallest population among the EtO affected ZIP codes is 1,237, I will remove all ZIP codes with populations of 1,000 or fewer from the rest of the state. In the Sterigenics-only analysis, I removed all ZIP codes with fewer than 5,000 people to make the samples more representative for comparison’s sake.
The point of this analysis is to determine if there is a link between cancer diagnosis rates and proximity to EtO facilities. Rather than perform a test for each facility individually, I will aggregate all the ZIP codes and do a test of EtO adjacent areas vs non-EtO adjacent areas.
In the table below, the first row contains the number of cancer diagnoses in the EtO adjacent areas from 2001-2015 in the first column and the number of people in the areas who were not diagnosed with cancer in the second column. The second row shows the same thing for the rest of the ZIP codes in Illinois.
| | Cancer | No Cancer | Sum |
|---|---|---|---|
| EtO Affected Areas | 85361 | 913276 | 998637 |
| Rest of State | 895108 | 10715742 | 11610850 |
| Sum | 980469 | 11629018 | 12609487 |
A \(\chi^2\) test was performed on this contingency table and the results are below. The p-value is far below 0.05, which means that the difference in diagnosis rates between the two groups is statistically significant. The formula for calculating the \(\chi^2\) value is shown in my Sterigenics-only analysis.
##
## Pearson's Chi-squared test
##
## data: contingency_table
## X-squared = 901.6, df = 1, p-value < 2.2e-16
The table below shows how many people would be expected in each of the four cells if cancer diagnoses were independent of living in an EtO affected area, based on the row and column totals. Each observed count differs from its expected count by about 7,700. This difference matters most for the cancer diagnoses in the EtO affected areas because that expected count is at least an order of magnitude smaller than the others in the table.
| | Cancer | No Cancer | Sum |
|---|---|---|---|
| EtO Affected Areas | 77650 | 920987 | 998637 |
| Rest of State | 902819 | 10708031 | 11610850 |
| Sum | 980469 | 11629018 | 12609487 |
The following table of Pearson residuals, (observed - expected) / sqrt(expected), confirms that the number of cancer diagnoses in the EtO affected areas is much larger than expected. Correspondingly, the number of cancer diagnoses in the rest of the state and the number of people in the EtO affected areas who were not diagnosed with cancer are both lower than expected.
| | Cancer | No Cancer |
|---|---|---|
| EtO Affected Areas | 27.670184 | -8.034474 |
| Rest of State | -8.114913 | 2.356293 |
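To make the test less of a black box, here is a sketch that rebuilds the expected counts, the Pearson residuals, and the \(\chi^2\) statistic by hand and compares them with `chisq.test`. The counts are entered with `byrow = TRUE` so that each row is an area and each column an outcome; the statistic itself does not depend on that choice, since it is unchanged by transposing the table. The variable names are illustrative only.

```r
# Observed counts: rows are areas, columns are outcomes
observed <- matrix(
  c( 85361,   913276,    # EtO affected areas: cancer, no cancer
    895108, 10715742),   # rest of state:      cancer, no cancer
  nrow = 2, byrow = TRUE,
  dimnames = list(c("EtO Affected Areas", "Rest of State"),
                  c("Cancer", "No Cancer"))
)

# Expected count in each cell under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson residuals and the chi-squared statistic
pearson_resid <- (observed - expected) / sqrt(expected)
chi_sq <- sum(pearson_resid^2)

# A 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

# Cross-check against the built-in test (no continuity correction)
builtin <- chisq.test(observed, correct = FALSE)

print(round(expected))
print(pearson_resid)
cat("Chi-squared by hand:", chi_sq, " built-in:", unname(builtin$statistic), "\n")
```

The hand calculation and `chisq.test(..., correct = FALSE)` should agree, since both compute the same Pearson statistic.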
Just like the previous analysis I performed, I found a statistically significant link between proximity to EtO plants and high cancer diagnosis rates. Very interestingly, there were areas around some facilities that didn’t have cancer rates higher than the state baseline, but when looking at all areas around EtO facilities, there is a cancer diagnosis rate of 9.2% higher than the state baseline. Based on this, there is a lot of evidence to suggest that living near an EtO polluting facility made you more likely to be diagnosed with cancer than if you lived anywhere else in the state.
US EPA Toxics Release Inventory
-United States Environmental Protection Agency. (2018). TRI Explorer (2017 Dataset (released October 2018)) [Internet database]. Retrieved from https://www.epa.gov/triexplorer, (November 02, 2018).
US EPA 2014 National Air Toxics Assessment
-United States Environmental Protection Agency. Retrieved from https://www.epa.gov/national-air-toxics-assessment/2014-national-air-toxics-assessment, (November 02, 2018).
Illinois Cancer Data
-Illinois Department of Public Health, Illinois State Cancer Registry, public dataset, 1986-2015, data as of November 2017
Illinois Population Data
-U.S. Census Bureau; Census 2010, Summary File 1, 5-Digit ZIP Code Tabulation within Illinois; generated by Athanasios Stamatoukos; using American FactFinder; http://factfinder.census.gov; (5 October 2018)
Motivation
I got the idea to do this analysis after seeing posts by “Stop Sterigenics” group members Richard Morton and Katherine M Howard
library(dplyr)
library(ggplot2)
library(plotly)
library(knitr)
library(leaflet)
library(RColorBrewer)
library(tigris)
library(rgeos)
library(sp)
facility <- c('Sterigenics', 'Ele Corp', 'Akzo Nobel Surface Chemistry', 'Lutheran General Hospital',
'Medline Industries/Steris Isomedix', 'Vantage Specialties', 'Lambent Technologies',
'Stepan Co', 'Tate & Lyle')
town <- c('Willowbrook', 'McCook', 'McCook', 'Park Ridge',
'Northfield', 'Gurnee', 'Skokie',
'Elwood/Joliet', 'Decatur')
facilitytable <- cbind(Facility_Name=facility, Location=town)
kable(facilitytable, align='c')
cancer <- read.csv('./Data/il_cancer_statistics.csv', header = TRUE, stringsAsFactors = FALSE)
il_rate <- 522.0538
sterigenics <- read.csv('./Data/facilities/sterigenics.csv', header=TRUE, stringsAsFactors = FALSE)
ele <- read.csv('./Data/facilities/ele_corp.csv', header=TRUE, stringsAsFactors = FALSE)
azko <- read.csv('./Data/facilities/azko.csv', header=TRUE, stringsAsFactors = FALSE)
steris <- read.csv('./Data/facilities/steris_isomedix.csv', header=TRUE, stringsAsFactors = FALSE)
vantage <- read.csv('./Data/facilities/vantage.csv', header=TRUE, stringsAsFactors = FALSE)
lambent <- read.csv('./Data/facilities/lambent.csv', header=TRUE, stringsAsFactors = FALSE)
stepan <- read.csv('./Data/facilities/stepan_co.csv', header=TRUE, stringsAsFactors = FALSE)
tatelyle <- read.csv('./Data/facilities/tate_lyle.csv', header=TRUE, stringsAsFactors = FALSE)
zipSterigenics <- c(60561, 60527)
zipEle <- c(60525, 60501, 60534, 60546, 60402, 60513, 60638, 60526)
# zipAzko <- c(60525, 60501, 60534, 60546, 60402, 60513, 60638)
zipLuthGen <- c(60068, 60714, 60053, 60016,60025)
zipMedline <- c(60085, 60064, 60048, 60031)
zipVantage <- c(60031, 60087, 60058)
zipStepan <- c(60410, 60421)
zipTateLyle <- c(62526, 62523, 62521)
zipLambent <- c(60712, 60659, 60645, 60646, 60626, 60202, 60076, 60077, 60203)
zipEtO <- unique(c(zipSterigenics, zipEle, zipLuthGen, zipMedline, zipVantage, zipStepan, zipTateLyle, zipLambent))
g <- ggplot(sterigenics, aes(x=year, y=total_emissions_tons)) + geom_bar(stat='identity')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Sterigenics')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_steri <- subset(cancer, zip %in% zipSterigenics)
kable(arrange(cancer_steri[,-c(3,4)], desc(zip_vs_state)), align='c')
per_steri <- sum(cancer_steri$per_year)*100000/sum(cancer_steri$population)
comp_steri <- per_steri/il_rate
g <- ggplot() + geom_bar(data=ele, aes(year, total_emissions_tons), stat='identity', fill='red')
g <- g + geom_bar(data=azko, aes(year, total_emissions_tons), stat='identity', fill='blue')
# Keep the title on one source line; '\n' alone inserts the line break
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') +
  ggtitle('EtO Emissions from Ele Corp (Red) and\nAkzo Nobel Surface Chemistry (Blue)')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_ele <- subset(cancer, zip %in% zipEle)
kable(arrange(cancer_ele[,-c(3,4)], desc(zip_vs_state)), align='c')
per_ele <- sum(cancer_ele$per_year)*100000/sum(cancer_ele$population)
comp_ele <- per_ele/il_rate
cancer_luthgen <- subset(cancer, zip %in% zipLuthGen)
kable(arrange(cancer_luthgen[,-c(3,4)], desc(zip_vs_state)), align='c')
per_lutheran <- sum(cancer_luthgen$per_year)*100000/sum(cancer_luthgen$population)
comp_lutheran <- per_lutheran/il_rate
g <- ggplot(steris, aes(x=year, y=total_emissions_tons)) + geom_bar(stat='identity')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Steris Isomedix')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_medline <- subset(cancer, zip %in% zipMedline)
kable(arrange(cancer_medline[,-c(3,4)], desc(zip_vs_state)), align='c')
per_medline <- sum(cancer_medline$per_year)*100000/sum(cancer_medline$population)
comp_medline <- per_medline/il_rate
g <- ggplot(vantage, aes(x=year, y=total_emissions_tons)) + geom_bar(stat='identity')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Vantage Specialties')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_vantage <- subset(cancer, zip %in% zipVantage)
kable(arrange(cancer_vantage[,-c(3,4)], desc(zip_vs_state)), align='c')
per_vantage <- sum(cancer_vantage$per_year)*100000/sum(cancer_vantage$population)
comp_vantage <- per_vantage/il_rate
g <- ggplot(lambent, aes(x=year, y=total_emissions_tons)) + geom_bar(stat='identity')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Lambent Technologies')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_lambent <- subset(cancer, zip %in% zipLambent)
kable(arrange(cancer_lambent[,-c(3,4)], desc(zip_vs_state)), align='c')
per_lambent <- sum(cancer_lambent$per_year)*100000/sum(cancer_lambent$population)
comp_lambent <- per_lambent/il_rate
g <- ggplot(stepan, aes(x=year, y=total_emissions_tons)) + geom_bar(stat='identity')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Stepan Co')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_stepan <- subset(cancer, zip %in% zipStepan)
kable(arrange(cancer_stepan[,-c(3,4)], desc(zip_vs_state)), align='c')
per_stepan <- sum(cancer_stepan$per_year)*100000/sum(cancer_stepan$population)
comp_stepan <- per_stepan/il_rate
g <- ggplot(tatelyle, aes(x=year, y=total_emissions_tons)) + geom_bar(stat='identity')
g <- g + xlab('Year') + ylab('EtO Emissions per Year (Metric Tons)') + ggtitle('EtO Emissions from Tate & Lyle')
g <- g + xlim(1987, 2018)
ggplotly(g)
cancer_tatelyle <- subset(cancer, zip %in% zipTateLyle)
kable(arrange(cancer_tatelyle[,-c(3,4)], desc(zip_vs_state)), align='c')
per_tatelyle <- sum(cancer_tatelyle$per_year)*100000/sum(cancer_tatelyle$population)
comp_tatelyle <- per_tatelyle/il_rate
datashp <- zctas(cb = TRUE, starts_with = cancer$zip)
data_map <- merge(datashp, cancer, by.x = 'GEOID10', by.y ='zip')
n <- leaflet(data_map) %>% addTiles()
n <- n %>% addMarkers(lat = ~lat, lng = ~long,
popup=~paste('Per 100,000: ', as.character(round(per_100000_per_year, 2)), '<br>',
'ZIP Code vs State: ', as.character(round(zip_vs_state, 2))),
label=~paste('Geographic Center of ', GEOID10, ' ZIP Code'),
clusterOptions = markerClusterOptions())
n <- n %>% addPolygons(data=data_map, weight=2, opacity = 1, fillOpacity = 0.5,
fillColor = ~colorQuantile('Greys', per_100000_per_year)(per_100000_per_year))
n <- n %>% addCircles(lat = 41.747375, lng = -87.939954,
label = 'Sterigenics Plant Radius',
color=rev(brewer.pal(5, 'Spectral')),
radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 41.805248, lng = -87.817419,
label = 'Ele Corp Plant Radius',
color=rev(brewer.pal(5, 'Spectral')),
radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 42.038210, lng = -87.847630,
label = 'Lutheran General Radius',
color=rev(brewer.pal(5, 'Spectral')),
radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 42.336930, lng = -87.889110,
label = 'Medline Industries Plant Radius',
color=rev(brewer.pal(5, 'Spectral')),
radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 41.442163, lng = -88.159382,
label = 'Stepan Co Plant Radius',
color=rev(brewer.pal(5, 'Spectral')),
radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 39.849503, lng = -88.918665,
label = 'Tate & Lyle Plant Radius',
color=rev(brewer.pal(5, 'Spectral')),
radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 42.383157, lng = -87.899563,
label = 'Vantage Specialties Plant Radius',
color=rev(brewer.pal(5, 'Spectral')),
radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 42.01284, lng = -87.717112,
label = 'Lambent Technologies Plant Radius',
color=rev(brewer.pal(5, 'Spectral')),
radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n <- n %>% addCircles(lat = 41.805017, lng = -87.827975,
label = 'Akzo Nobel Surface Chemistry Plant Radius',
color=rev(brewer.pal(5, 'Spectral')),
radius = rev(seq(5))*804.672, opacity = 1, fillOpacity = 0.15)
n
cancer_EtO <- subset(cancer, zip %in% zipEtO)
kable(arrange(cancer_EtO[,-c(3,4)], desc(zip_vs_state)), align='c')
per_EtO <- sum(cancer_EtO$per_year)*100000/sum(cancer_EtO$population)
comp_EtO <- per_EtO/il_rate
cancer_rest <- subset(cancer, !(zip %in% zipEtO))
cancer_rest_big <- subset(cancer_rest, population > 1000)
EtO_cancer <- sum(cancer_EtO$freq)
EtO_no_cancer <- sum(cancer_EtO$population) - EtO_cancer
rest_cancer <- sum(cancer_rest_big$freq)
rest_no_cancer <- sum(cancer_rest_big$population) - rest_cancer
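# matrix() fills column-wise by default; byrow=TRUE puts each area on its own row
# (EtO, rest of state) and each outcome in its own column (cancer, no cancer)
contingency_table <- matrix(c(EtO_cancer, EtO_no_cancer, rest_cancer, rest_no_cancer), nrow=2, byrow=TRUE)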
contingency_table <- round(contingency_table, 0)
rownames(contingency_table) <- c('EtO Affected Areas', 'Rest of State')
colnames(contingency_table) <- c('Cancer', 'No Cancer')
kable(addmargins(round(contingency_table, 0)), align='c')
chitest <- chisq.test(contingency_table, correct=FALSE)
chitest
kable(addmargins(round(chitest$expected, 0)), align = 'c')
kable(chitest$residuals, align = 'c')