The work below explains how the Sustainability Team used basic data imputation methods to handle missing values in Macalester’s waste data. The data are missing primarily because of the change in Macalester’s waste management in June 2012: the previous vendor provided trash data, but the current vendor does not. The missing data are the overall trash numbers from June 2012 to September 2013, and the cardboard and bottles/cans data from July 2012 to September 2012.
Since we know the cause of our missing data, we can safely assume that the missingness is not completely at random. To handle the missing data, we considered three methods, each applied over a specified time frame: last observation carried forward, the mean, and the median. Because the data values vary so much, we narrowed the time frame to calendar months. This makes sense for trash, because our data show that certain months run high: in May, for example, trash values rise due to school events (Move-out).
We chose the median for the following reasons. Last observation carried forward is a conservative approach that assumes the value has not changed since it was last measured, an assumption that the month-to-month swings in our data do not support. The mean seemed to be a good choice until we looked at the variability of the values by month: even within a single calendar month, the numbers fluctuated widely. We therefore chose the median over the mean, because the median is robust to extreme values and so gives a more accurate representation of the missing data. The accompanying graph shows the median values by month.
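As a rough illustration of the month-wise imputation described above, the following Python sketch fills each missing value with a summary of the observed values for the same calendar month. It is not the team's actual code. The function name `impute_by_month` and all of the numbers are hypothetical, chosen only to show how a single unusually high May pulls the mean upward while leaving the median unchanged.

```python
from statistics import mean, median

# Hypothetical monthly trash weights keyed by (year, month).
# None marks a missing month. These numbers are invented for
# illustration and are NOT Macalester's data.
trash = {
    (2010, 5): 9000, (2011, 5): 21000, (2012, 5): 9400,
    (2010, 6): 4000, (2011, 6): 4300,
    (2012, 6): None, (2013, 5): None,
}


def impute_by_month(series, summary=median):
    """Fill missing values with a summary (median by default) of the
    observed values for the same calendar month.

    Assumes every calendar month with a missing value has at least
    one observed value; otherwise a KeyError is raised.
    """
    # Group the observed values by calendar month, ignoring the year.
    by_month = {}
    for (year, month), value in series.items():
        if value is not None:
            by_month.setdefault(month, []).append(value)

    # Keep observed values as they are; fill only the gaps.
    filled = {}
    for (year, month), value in series.items():
        filled[(year, month)] = summary(by_month[month]) if value is None else value
    return filled


median_filled = impute_by_month(trash)
mean_filled = impute_by_month(trash, summary=mean)

print("May 2013, median:", median_filled[(2013, 5)])
print("May 2013, mean:  ", round(mean_filled[(2013, 5)], 1))
```

In this made-up example the one extreme May value raises the mean-based fill well above the typical May, while the median-based fill stays at a typical value, which is the robustness argument made above.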
For the cardboard and bottles/cans data, we took a similar approach to fill the missing data points for the three missing months. Below are the values that we computed, followed by two graphs of the values by month, one for each category.
## [1] 3460
## [1] 5240
## [1] 9850
## [1] 1587
## [1] 1588
## [1] 2968
With our data now complete, the graph below shows the diversion rate over time.
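The report does not spell out how the diversion rate is computed. A common definition, assumed here, is the weight of material diverted from the landfill divided by the total weight of waste generated. The sketch below uses that assumed definition with a hypothetical function name and invented numbers; which waste streams count as diverted is also an assumption.

```python
def diversion_rate(diverted, trash):
    """Return the share of total waste that was diverted from the landfill.

    Assumed definition: diverted / (diverted + trash), with both
    arguments given in the same unit of weight.
    """
    total = diverted + trash
    if total == 0:
        raise ValueError("total waste is zero; the rate is undefined")
    return diverted / total


# Invented figures for a single month, not Macalester's data.
recycling = 3000
compost = 2000
trash = 5000

rate = diversion_rate(recycling + compost, trash)
print(f"Diversion rate: {rate:.0%}")
```

Because trash appears in the denominator, the rate cannot be computed for months where trash is missing, which is why the imputed values are needed before the rate can be plotted over the full period.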