Trash Collection Baltimore 2014-2023

Author

Paul Daniel-Orie

Project 01: Trash Collection Baltimore 2014-2023

This project loads a stored dataset on trash collection in Baltimore from 2014 to 2023 into R. I will use tools from the tidyverse, including dplyr and ggplot2, to load, analyze, and visualize the data, aiming to identify trends in trash collection in Baltimore.

The first step is to load the dataset into R and prepare it for analysis. This involves reading the data from a file stored on my workstation into RStudio, loading it into my global environment, cleaning the data, and transforming it into a suitable format for analysis and visualization. Tools from the tidyverse, such as dplyr for data manipulation and ggplot2 for visualization, will be instrumental in this process.

Load the required library and read the data from the working directory into the global environment

library(tidyverse)
setwd("C:/Users/olait/OneDrive/Desktop/data 110")
trashcollection <- read_csv("baltimore trash collection 2004-2023.csv")

Make headers lowercase and replace spaces with underscores

names(trashcollection) <- gsub(" ", "_", tolower(names(trashcollection)))
head(trashcollection)
# A tibble: 6 × 16
  dumpster month  year date      `weight_(tons)` `volume_(cubic_yards)`
     <dbl> <chr> <dbl> <chr>               <dbl>                  <dbl>
1        1 May    2014 5/16/2014            4.31                     18
2        2 May    2014 5/16/2014            2.74                     13
3        3 May    2014 5/16/2014            3.45                     15
4        4 May    2014 5/17/2014            3.1                      15
5        5 May    2014 5/17/2014            4.06                     18
6        6 May    2014 5/20/2014            2.71                     13
# ℹ 10 more variables: plastic_bottles <dbl>, polystyrene <dbl>,
#   cigarette_butts <dbl>, glass_bottles <dbl>, plastic_bags <dbl>,
#   wrappers <dbl>, sports_balls <dbl>, `homes_powered*` <dbl>, ...15 <lgl>,
#   ...16 <lgl>
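
The renamed headers still contain parentheses and an asterisk (for example weight_(tons) and homes_powered*), which is why R prints them in backticks. As a minimal sketch, assuming simpler names are acceptable, a hypothetical helper clean_names_simple() (not part of the original analysis) could strip those characters as well. It is shown on a few of the raw headers only, and the rest of this report keeps the names produced above.

```r
# Hypothetical helper (an assumption, not used elsewhere in this report):
# lowercase the names, drop punctuation, and join words with underscores.
clean_names_simple <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9 ]", "", x)  # remove parentheses, asterisks, and similar
  x <- trimws(x)
  gsub(" +", "_", x)              # turn runs of spaces into one underscore
}

raw_names <- c("Dumpster", "Weight (tons)", "Volume (cubic yards)", "Homes Powered*")
cleaned <- clean_names_simple(raw_names)
print(cleaned)
# "dumpster" "weight_tons" "volume_cubic_yards" "homes_powered"
```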

Notice that R reads the variable “dumpster” as continuous values (col_double). Because it identifies individual dumpsters, it should be treated as a factor rather than a number.

Use as.factor as another way to ensure numerical values are read as categorical

trashcollection$dumpster <- as.factor(trashcollection$dumpster)
head(trashcollection)
# A tibble: 6 × 16
  dumpster month  year date      `weight_(tons)` `volume_(cubic_yards)`
  <fct>    <chr> <dbl> <chr>               <dbl>                  <dbl>
1 1        May    2014 5/16/2014            4.31                     18
2 2        May    2014 5/16/2014            2.74                     13
3 3        May    2014 5/16/2014            3.45                     15
4 4        May    2014 5/17/2014            3.1                      15
5 5        May    2014 5/17/2014            4.06                     18
6 6        May    2014 5/20/2014            2.71                     13
# ℹ 10 more variables: plastic_bottles <dbl>, polystyrene <dbl>,
#   cigarette_butts <dbl>, glass_bottles <dbl>, plastic_bags <dbl>,
#   wrappers <dbl>, sports_balls <dbl>, `homes_powered*` <dbl>, ...15 <lgl>,
#   ...16 <lgl>
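
One caveat when storing numeric identifiers as factors: calling as.numeric() on a factor returns its internal level codes, not the original values. The toy vector below (made-up IDs, not taken from the dataset) sketches the difference and the usual way to recover the original numbers.

```r
# Toy identifiers, chosen so that codes and values differ
ids <- c(10, 20, 30)
ids_f <- as.factor(ids)

codes  <- as.numeric(ids_f)                # internal level codes: 1 2 3
values <- as.numeric(as.character(ids_f))  # original values: 10 20 30

print(levels(ids_f))
print(codes)
print(values)
```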

Check the dimensions of the data; the structure and summary that follow show where values are missing

dim(trashcollection)
[1] 630  16

Check the structure of the data

str(trashcollection)
spc_tbl_ [630 × 16] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ dumpster            : Factor w/ 629 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ month               : chr [1:630] "May" "May" "May" "May" ...
 $ year                : num [1:630] 2014 2014 2014 2014 2014 ...
 $ date                : chr [1:630] "5/16/2014" "5/16/2014" "5/16/2014" "5/17/2014" ...
 $ weight_(tons)       : num [1:630] 4.31 2.74 3.45 3.1 4.06 2.71 1.91 3.7 2.52 3.76 ...
 $ volume_(cubic_yards): num [1:630] 18 13 15 15 18 13 8 16 14 18 ...
 $ plastic_bottles     : num [1:630] 1450 1120 2450 2380 980 1430 910 3580 2400 1340 ...
 $ polystyrene         : num [1:630] 1820 1030 3100 2730 870 2140 1090 4310 2790 1730 ...
 $ cigarette_butts     : num [1:630] 126000 91000 105000 100000 120000 90000 56000 112000 98000 130000 ...
 $ glass_bottles       : num [1:630] 72 42 50 52 72 46 32 58 49 75 ...
 $ plastic_bags        : num [1:630] 584 496 1080 896 368 ...
 $ wrappers            : num [1:630] 1162 874 2032 1971 753 ...
 $ sports_balls        : num [1:630] 7 5 6 6 7 5 3 6 6 7 ...
 $ homes_powered*      : num [1:630] 0 0 0 0 0 0 0 0 0 0 ...
 $ ...15               : logi [1:630] NA NA NA NA NA NA ...
 $ ...16               : logi [1:630] NA NA NA NA NA NA ...
 - attr(*, "spec")=
  .. cols(
  ..   Dumpster = col_double(),
  ..   Month = col_character(),
  ..   Year = col_double(),
  ..   Date = col_character(),
  ..   `Weight (tons)` = col_number(),
  ..   `Volume (cubic yards)` = col_number(),
  ..   `Plastic Bottles` = col_number(),
  ..   Polystyrene = col_number(),
  ..   `Cigarette Butts` = col_number(),
  ..   `Glass Bottles` = col_number(),
  ..   `Plastic Bags` = col_number(),
  ..   Wrappers = col_number(),
  ..   `Sports Balls` = col_number(),
  ..   `Homes Powered*` = col_number(),
  ..   ...15 = col_logical(),
  ..   ...16 = col_logical()
  .. )
 - attr(*, "problems")=<externalptr> 

Summarise the trashcollection data to display the 5-number summary (plus the mean) of each column

summary(trashcollection) # display the 5-number summary of my data
    dumpster      month                year          date          
 1      :  1   Length:630         Min.   :2014   Length:630        
 2      :  1   Class :character   1st Qu.:2016   Class :character  
 3      :  1   Mode  :character   Median :2019   Mode  :character  
 4      :  1                      Mean   :2019                     
 5      :  1                      3rd Qu.:2021                     
 (Other):624                      Max.   :2023                     
 NA's   :  1                      NA's   :1                        
 weight_(tons)      volume_(cubic_yards) plastic_bottles    polystyrene    
 Min.   :   0.780   Min.   :   7.00      Min.   :     80   Min.   :    20  
 1st Qu.:   2.720   1st Qu.:  15.00      1st Qu.:   1025   1st Qu.:   440  
 Median :   3.205   Median :  15.00      Median :   1900   Median :  1040  
 Mean   :   6.411   Mean   :  30.44      Mean   :   3956   Mean   :  2921  
 3rd Qu.:   3.730   3rd Qu.:  15.00      3rd Qu.:   2780   3rd Qu.:  2258  
 Max.   :2019.540   Max.   :9589.00      Max.   :1246155   Max.   :920011  
                                                                           
 cigarette_butts    glass_bottles       plastic_bags       wrappers       
 Min.   :     500   Min.   :    0.00   Min.   :    24   Min.   :   180.0  
 1st Qu.:    3600   1st Qu.:   10.00   1st Qu.:   270   1st Qu.:   776.2  
 Median :    6000   Median :   18.00   Median :   551   Median :  1142.0  
 Mean   :   37254   Mean   :   42.86   Mean   :  1732   Mean   :  2851.2  
 3rd Qu.:   22000   3rd Qu.:   29.75   3rd Qu.:  1140   3rd Qu.:  1980.0  
 Max.   :11735100   Max.   :13502.00   Max.   :545554   Max.   :898129.0  
                                                                          
  sports_balls     homes_powered*      ...15          ...16        
 Min.   :   0.00   Min.   :    0.00   Mode:logical   Mode:logical  
 1st Qu.:   6.00   1st Qu.:   41.00   NA's:630       NA's:630      
 Median :  12.00   Median :   52.00                                
 Mean   :  27.15   Mean   :   95.38                                
 3rd Qu.:  20.00   3rd Qu.:   60.75                                
 Max.   :8553.00   Max.   :30020.00                                
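
The summary suggests two cleaning steps: the columns ...15 and ...16 are entirely NA, and one row has no dumpster number while the maxima are far above the third quartiles, which looks like a totals row. The sketch below assumes that reading and uses a small made-up data frame (toy, with hypothetical column names) to show how such rows and columns could be dropped in base R.

```r
# Toy data: the last row mimics a totals row with no dumpster id,
# and extra_15 mimics an empty column (both are assumptions for illustration)
toy <- data.frame(
  dumpster    = c(1, 2, 3, NA),
  weight_tons = c(4.31, 2.74, 3.45, 10.50),
  extra_15    = c(NA, NA, NA, NA)
)

# Keep only rows that have a dumpster id
toy_clean <- toy[!is.na(toy$dumpster), ]

# Drop columns in which every value is missing
all_na <- vapply(toy_clean, function(col) all(is.na(col)), logical(1))
toy_clean <- toy_clean[, !all_na, drop = FALSE]

print(toy_clean)
```

After this step the maximum of weight_tons in the toy data is 4.31 rather than the total of 10.50, so the summary statistics describe individual dumpsters only.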
                                                                   

Aggregate data by year (for simplicity)

While the trash collection data for Baltimore is very rich, I intend to study yearly trash collection by grouping the data by year to examine the trash trends.

yearly_data <- trashcollection |>
  mutate(year = format(as.Date(date, format = "%m/%d/%Y"), "%Y")) |>
  group_by(year) |>
  summarise(
    plastic_bottles = sum(plastic_bottles, na.rm = TRUE),
    polystyrene = sum(polystyrene, na.rm = TRUE),
    cigarette_butts = sum(cigarette_butts, na.rm = TRUE),
    glass_bottles = sum(glass_bottles, na.rm = TRUE),
    plastic_bags = sum(plastic_bags, na.rm = TRUE),
    wrappers = sum(wrappers, na.rm = TRUE)
  )
yearly_data
# A tibble: 11 × 7
   year  plastic_bottles polystyrene cigarette_butts glass_bottles plastic_bags
   <chr>           <dbl>       <dbl>           <dbl>         <dbl>        <dbl>
 1 2014            82590      102909         4162000          2057        38730
 2 2015           136130      177710         2856000          2327        98840
 3 2016           149210      178808         1887600          2010       113647
 4 2017           112340      112150          731000          1178       109890
 5 2018           123800      118550          803300           896        69950
 6 2019           115985       90400          329120           718        35221
 7 2020           138560       88810          335900           961        33434
 8 2021           128150       25690          213100          1015        21625
 9 2022           126280       10527          205000          1061         9819
10 2023           133110       14457          212080          1279        14398
11 <NA>          1246155      920011        11735100         13502       545554
# ℹ 1 more variable: wrappers <dbl>
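
The <NA> row in this output appears to come from a row without a date, and its values equal the sums of the ten yearly rows. As a sketch under that assumption, the same aggregation can skip rows with a missing date and sum every numeric column with across(). The tibble toy and its values are made up for illustration.

```r
library(dplyr)

# Toy data: the last row has no date and mimics a totals row
toy <- tibble(
  date            = c("5/16/2014", "6/1/2014", "1/5/2015", NA),
  plastic_bottles = c(100, 200, 50, 350),
  wrappers        = c(10, 20, 5, 35)
)

yearly_toy <- toy |>
  filter(!is.na(date)) |>                                      # drop rows without a date
  mutate(year = format(as.Date(date, format = "%m/%d/%Y"), "%Y")) |>
  group_by(year) |>
  summarise(across(where(is.numeric), \(x) sum(x, na.rm = TRUE)))

print(yearly_toy)
```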

View the structure of the dataset to verify column names

str(yearly_data)
tibble [11 × 7] (S3: tbl_df/tbl/data.frame)
 $ year           : chr [1:11] "2014" "2015" "2016" "2017" ...
 $ plastic_bottles: num [1:11] 82590 136130 149210 112340 123800 ...
 $ polystyrene    : num [1:11] 102909 177710 178808 112150 118550 ...
 $ cigarette_butts: num [1:11] 4162000 2856000 1887600 731000 803300 ...
 $ glass_bottles  : num [1:11] 2057 2327 2010 1178 896 ...
 $ plastic_bags   : num [1:11] 38730 98840 113647 109890 69950 ...
 $ wrappers       : num [1:11] 70849 131290 135940 127470 118980 ...

Create a data frame from the yearly totals above (the last row holds the column totals)

data <- data.frame(
  Year = c(2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 0),  # Add 0 for the total row
  Plastic_Bottles = c(82590, 136130, 149210, 112340, 123800, 115985, 138560, 128150, 126280, 133110, 1246155),
  Polystyrene = c(102909, 177710, 178808, 112150, 118550, 90400, 88810, 25690, 10527, 14457, 920011),
  Cigarette_Butts = c(4162000, 2856000, 1887600, 731000, 803300, 329120, 335900, 213100, 205000, 212080, 11735100),
  Glass_Bottles = c(2057, 2327, 2010, 1178, 896, 718, 961, 1015, 1061, 1279, 13502),
  Plastic_Bags = c(38730, 98840, 113647, 109890, 69950, 35221, 33434, 21625, 9819, 14398, 545554),
  Wrappers = c(70849, 131290, 135940, 127470, 118980, 67645, 53240, 42625, 50540, 99550, 898129)
)

data <- data[-nrow(data), ]  # Remove the last row, which holds the column totals rather than values for a single year

Calculate growth rates

growth_rates <- (data[-1, -1] / data[-nrow(data), -1] - 1) * 100  # Percentage change from year to year
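
Year-over-year growth is the current value divided by the previous value, minus 1; multiplying by 100 expresses it in percent. The sketch below applies that formula to the first three plastic bottle totals from the yearly table. The helper name yoy_growth_pct() is hypothetical.

```r
# Hypothetical helper: percentage change between consecutive elements of x
yoy_growth_pct <- function(x) {
  (x[-1] / x[-length(x)] - 1) * 100
}

# Plastic bottle totals for 2014, 2015, and 2016 from the yearly table
plastic_bottles <- c(82590, 136130, 149210)

growth <- yoy_growth_pct(plastic_bottles)
print(round(growth, 1))
# 64.8  9.6  (growth in 2015 and in 2016, in percent)
```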

Convert growth rates data into long format suitable for ggplot

library(tidyr)
growth_rates_long <- gather(data.frame(Year = data$Year[-1], growth_rates), Category, Growth_Rate, -Year)
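
gather() still works, but the tidyr documentation marks it as superseded by pivot_longer(). As a sketch, the same wide-to-long reshaping with pivot_longer() looks like this on a small data frame (growth_wide, with rounded growth values derived from the yearly table). One difference to be aware of: pivot_longer() orders the result row by row, while gather() orders it column by column.

```r
library(tidyr)

# Rounded growth rates (percent) for two categories, derived from the yearly table
growth_wide <- data.frame(
  Year            = c(2015, 2016),
  Plastic_Bottles = c(64.8, 9.6),
  Polystyrene     = c(72.7, 0.6)
)

growth_long <- pivot_longer(
  growth_wide,
  cols      = -Year,          # reshape every column except Year
  names_to  = "Category",
  values_to = "Growth_Rate"
)

print(growth_long)
```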

Define custom colors for each category

category_colors <- c("Plastic_Bottles" = "#115CF1",
                     "Polystyrene" = "#50F111",
                     "Cigarette_Butts" = "#EF541B",
                     "Glass_Bottles" = "#F49507",
                     "Plastic_Bags" = "#C50CF8",
                     "Wrappers" = "#b15928")

Plot growth rates using ggplot2

library(ggplot2)
ggplot(growth_rates_long, aes(x = Year, y = Growth_Rate, color = Category)) +
  geom_line() +
  labs(title = "Growth Rate of Waste Categories",
       x = "Year",
       y = "Growth Rate (%)",
       color = "Category")  +
  scale_color_manual(values = category_colors) +  # Set custom colors
  theme_minimal()

Create a data frame with the provided average annual waste data

average_annual_waste <- data.frame(
  Waste_Type = c("Avg_Plastic_Bottles", "Avg_Polystyrene", "Avg_Cigarette_Butts", "Avg_Glass_Bottles", "Avg_Plastic_Bags", "Avg_Wrappers"),
  # Averages over the ten yearly rows (2014-2023), excluding the totals row
  Average_Annual_Waste = c(124615.5, 92001.1, 1173510, 1350.2, 54555.4, 89812.9)
)
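
Rather than typing the averages by hand, they can be computed from the yearly totals, which avoids accidentally including the totals row in the mean. The sketch below does this for two of the categories with colMeans(). The data frame name yearly_totals is an assumption, and the values are copied from the yearly table.

```r
# Yearly totals for two categories, copied from the yearly table (totals row excluded)
yearly_totals <- data.frame(
  Year            = 2014:2023,
  Plastic_Bottles = c(82590, 136130, 149210, 112340, 123800,
                      115985, 138560, 128150, 126280, 133110),
  Polystyrene     = c(102909, 177710, 178808, 112150, 118550,
                      90400, 88810, 25690, 10527, 14457)
)

# Mean of every column except Year
avg <- colMeans(yearly_totals[, -1])
print(avg)
# Plastic_Bottles: 124615.5, Polystyrene: 92001.1
```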

Plotting with ggplot2

# Plotting with ggplot2 as histograms
plot2 <- ggplot(average_annual_waste, aes(x = Average_Annual_Waste, fill = Waste_Type)) +
  geom_histogram(binwidth = 500000, position = "identity", alpha = 0.7, color = "black") +
  facet_wrap(~ Waste_Type, scales = "free_y") +
  labs(title = "Distribution of Average Annual Waste by Type (2014-2023)",
       x = "Average Annual Waste",
       y = "Frequency",
       fill = "Waste Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        strip.text = element_text(size = 12))

# Print the plot
print(plot2)
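
Because each waste type has a single average, every histogram facet contains one observation. A bar chart may compare the averages more directly. The following is a sketch of that alternative, not the plot used in this report; avg_waste and plot_avg are hypothetical names, and the values are averages over the ten yearly totals for three of the categories.

```r
library(ggplot2)

# Averages over the ten yearly totals (2014-2023) for three categories
avg_waste <- data.frame(
  Waste_Type = c("Plastic_Bottles", "Polystyrene", "Glass_Bottles"),
  Average    = c(124615.5, 92001.1, 1350.2)
)

# One bar per waste type, ordered by size and flipped for readable labels
plot_avg <- ggplot(avg_waste,
                   aes(x = reorder(Waste_Type, Average), y = Average, fill = Waste_Type)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  labs(title = "Average Annual Waste by Type (2014-2023)",
       x = "Waste Type",
       y = "Average Annual Count") +
  theme_minimal()

print(plot_avg)
```

Cigarette butts are left out of this sketch because their average is roughly ten times larger than the next category and would flatten the other bars; a separate panel or a log scale could handle that.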

Analysis of Trash Collection in Baltimore from 2014 to 2023

The dataset detailing trash collection in Baltimore from 2014 to 2023 provides valuable insights into the city’s waste collection trends. The data encompasses various types of waste, mostly non-biodegradable materials such as plastic bottles, polystyrene (Styrofoam), cigarette butts, glass bottles, plastic bags, and wrappers. Each type of waste has a significant environmental impact due to its decomposition timeline:

  1. Plastic Bags: Plastic bags can take 10–20 years to decompose under typical conditions, but in landfills, they can persist for up to 1,000 years.

  2. Glass Bottles: Glass bottles can take up to 1 million years to decompose in a landfill, with some sources suggesting they may never fully decompose in certain conditions.

  3. Polystyrene (Styrofoam): Polystyrene is extremely durable and resistant to natural degradation. It can take over 500 years to decompose and is biologically inert, making it challenging for microorganisms to break down.

  4. Plastic Bottles: Plastic bottles can take approximately 450 years to break down in the ocean or in a landfill. The decomposition rate varies based on the type of plastic and environmental conditions.

  5. Wrappers: The decomposition of plastic wrappers depends on the type of plastic used and environmental factors. For instance, chip bags and candy wrappers can take 10–20 years to decompose, while cling wrap may take anywhere from 10 years to several hundred years.

Understanding these decomposition timelines underscores the importance of sustainable waste management practices and policies to mitigate the environmental impact of these materials, and hence the need to track and analyze their collection.

This analysis aims to explore the annual trends, the contribution of different trash types to the overall waste, and the year-over-year growth rates. Understanding these patterns is crucial for developing effective waste management policies and mitigating environmental impacts.

Average Annual Waste

Calculating the average annual waste provides an overview of the city’s waste landscape. The annual totals fluctuate from year to year, driven by both external factors and municipal efforts. Based on the yearly totals above, the average number of plastic bottles collected annually was approximately 124,600, while polystyrene and cigarette butts averaged about 92,000 and 1,173,500, respectively. These figures underscore the need for targeted policies addressing specific types of waste.

Year-over-Year Growth Rates

Investigating the year-over-year growth rates of different waste types provides insights into emerging trends and potential areas of concern. The plot of growth rates for 2015 through 2023 reveals periods of both sharp increases and decreases across all waste types. For instance, the growth rate for plastic bottles spiked in 2015 (about 65%) and 2020 (about 19%), possibly due to changes in consumption patterns or waste collection efficiency. Similarly, polystyrene fell by about 37% in 2017 after peaking in 2016, which might be attributed to regulatory measures or increased public awareness about its environmental impact.

Understanding these growth rates is vital for crafting effective waste management strategies. For example, a consistent increase in plastic bottle waste may necessitate policies promoting recycling or alternative packaging materials. Similarly, a significant reduction in certain types of waste could indicate the success of existing policies and provide a model for other waste types.

Implications for Waste Management Policies

The analysis of Baltimore’s trash collection data underscores the complexity of waste management and the need for multifaceted approaches. Policymakers must consider the diverse types of waste and their respective trends to devise comprehensive strategies. For instance, targeted recycling programs, public awareness campaigns, and regulatory measures can collectively address the varying waste types and their environmental impacts.

Moreover, the data highlights the importance of monitoring and adapting policies based on emerging trends. The fluctuating growth rates suggest that static policies may be insufficient to address the dynamic nature of waste generation and collection. Continuous data analysis and flexible policy frameworks are essential for effectively managing the city’s waste and mitigating its environmental footprint.

Conclusion

The dataset on Baltimore’s trash collection from 2014 to 2023 offers a rich source of information for understanding waste management trends. By analyzing average annual waste and year-over-year growth rates, we can gain valuable insights into the city’s waste landscape. These insights are crucial for developing targeted policies that address the specific challenges posed by different types of waste and promote sustainable waste management practices.