library(tidyverse)
setwd("C:/Users/olait/OneDrive/Desktop/data 110")
trashcollection <- read_csv("baltimore trash collection 2004-2023.csv")
Trash Collection Baltimore 2014-2023
Project 01: Trash Collection Baltimore 2014-2023
My project loads a dataset on trash collection in Baltimore from 2014 to 2023, stored on my drive, into R. I use tidyverse tools, including dplyr and ggplot2, to load, analyze, and visualize the data, with the aim of identifying trends in Baltimore's trash collection.
The first step is to load the dataset into R and prepare it for analysis. This involves reading the stored file from my workstation into RStudio, loading it into the global environment, cleaning it, and transforming it into a format suitable for analysis and visualization. Tidyverse packages such as dplyr (data manipulation) and ggplot2 (visualization) are central to this process.
Load the required libraries and read the data from the working directory into the global environment
Make the column headers lowercase and replace spaces with underscores
names(trashcollection)<- gsub(" ","_",tolower(names(trashcollection)))
head(trashcollection)
# A tibble: 6 × 16
dumpster month year date `weight_(tons)` `volume_(cubic_yards)`
<dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 1 May 2014 5/16/2014 4.31 18
2 2 May 2014 5/16/2014 2.74 13
3 3 May 2014 5/16/2014 3.45 15
4 4 May 2014 5/17/2014 3.1 15
5 5 May 2014 5/17/2014 4.06 18
6 6 May 2014 5/20/2014 2.71 13
# ℹ 10 more variables: plastic_bottles <dbl>, polystyrene <dbl>,
# cigarette_butts <dbl>, glass_bottles <dbl>, plastic_bags <dbl>,
# wrappers <dbl>, sports_balls <dbl>, `homes_powered*` <dbl>, ...15 <lgl>,
# ...16 <lgl>
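The `gsub()` call above only swaps spaces for underscores. Names such as `weight_(tons)` and `homes_powered*` therefore still contain characters that force backticks in later code. The minimal base-R sketch below assumes you also want to strip that punctuation. `clean_names_basic()` is a hypothetical helper, not part of any package. The janitor package's `clean_names()` does a fuller version of the same job.

```r
# Hypothetical helper: lowercase the names and collapse any run of
# non-alphanumeric characters (spaces, brackets, "*") into a single "_"
clean_names_basic <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z0-9]+", "_", x)
  gsub("^_+|_+$", "", x)  # drop leading/trailing underscores
}

raw_names <- c("Dumpster", "Weight (tons)", "Volume (cubic yards)", "Homes Powered*")
clean_names_basic(raw_names)
# returns c("dumpster", "weight_tons", "volume_cubic_yards", "homes_powered")
```

Applied as `names(trashcollection) <- clean_names_basic(names(trashcollection))`, later code could then refer to `weight_tons` without backticks.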
Notice that R reads the variable “dumpster” as continuous values (col_double). Dumpster numbers are identifiers, so make sure they are treated as categories (factors) rather than numbers.
Use as.factor() to convert the numeric values into a categorical variable
trashcollection$dumpster <- as.factor(trashcollection$dumpster)
head(trashcollection)
# A tibble: 6 × 16
dumpster month year date `weight_(tons)` `volume_(cubic_yards)`
<fct> <chr> <dbl> <chr> <dbl> <dbl>
1 1 May 2014 5/16/2014 4.31 18
2 2 May 2014 5/16/2014 2.74 13
3 3 May 2014 5/16/2014 3.45 15
4 4 May 2014 5/17/2014 3.1 15
5 5 May 2014 5/17/2014 4.06 18
6 6 May 2014 5/20/2014 2.71 13
# ℹ 10 more variables: plastic_bottles <dbl>, polystyrene <dbl>,
# cigarette_butts <dbl>, glass_bottles <dbl>, plastic_bags <dbl>,
# wrappers <dbl>, sports_balls <dbl>, `homes_powered*` <dbl>, ...15 <lgl>,
# ...16 <lgl>
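A toy vector shows why this conversion matters. The values are made up for illustration and do not come from the dataset. Numeric dumpster IDs invite meaningless arithmetic, while a factor is summarised as counts per category.

```r
# Toy dumpster IDs (illustrative only, not from the dataset)
dumpster_num <- c(1, 2, 3, 3)
mean(dumpster_num)            # 2.25: an "average dumpster" has no meaning

dumpster_fct <- as.factor(dumpster_num)
levels(dumpster_fct)          # "1" "2" "3": the IDs are now category labels
table(dumpster_fct)           # number of records per dumpster
```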
Check the dimensions and the summary to look for missing values
dim(trashcollection)
[1] 630 16
Check the structure of the data
str(trashcollection)
spc_tbl_ [630 × 16] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ dumpster : Factor w/ 629 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ month : chr [1:630] "May" "May" "May" "May" ...
$ year : num [1:630] 2014 2014 2014 2014 2014 ...
$ date : chr [1:630] "5/16/2014" "5/16/2014" "5/16/2014" "5/17/2014" ...
$ weight_(tons) : num [1:630] 4.31 2.74 3.45 3.1 4.06 2.71 1.91 3.7 2.52 3.76 ...
$ volume_(cubic_yards): num [1:630] 18 13 15 15 18 13 8 16 14 18 ...
$ plastic_bottles : num [1:630] 1450 1120 2450 2380 980 1430 910 3580 2400 1340 ...
$ polystyrene : num [1:630] 1820 1030 3100 2730 870 2140 1090 4310 2790 1730 ...
$ cigarette_butts : num [1:630] 126000 91000 105000 100000 120000 90000 56000 112000 98000 130000 ...
$ glass_bottles : num [1:630] 72 42 50 52 72 46 32 58 49 75 ...
$ plastic_bags : num [1:630] 584 496 1080 896 368 ...
$ wrappers : num [1:630] 1162 874 2032 1971 753 ...
$ sports_balls : num [1:630] 7 5 6 6 7 5 3 6 6 7 ...
$ homes_powered* : num [1:630] 0 0 0 0 0 0 0 0 0 0 ...
$ ...15 : logi [1:630] NA NA NA NA NA NA ...
$ ...16 : logi [1:630] NA NA NA NA NA NA ...
- attr(*, "spec")=
.. cols(
.. Dumpster = col_double(),
.. Month = col_character(),
.. Year = col_double(),
.. Date = col_character(),
.. `Weight (tons)` = col_number(),
.. `Volume (cubic yards)` = col_number(),
.. `Plastic Bottles` = col_number(),
.. Polystyrene = col_number(),
.. `Cigarette Butts` = col_number(),
.. `Glass Bottles` = col_number(),
.. `Plastic Bags` = col_number(),
.. Wrappers = col_number(),
.. `Sports Balls` = col_number(),
.. `Homes Powered*` = col_number(),
.. ...15 = col_logical(),
.. ...16 = col_logical()
.. )
- attr(*, "problems")=<externalptr>
Summarise the trashcollection data. For numeric columns, summary() reports the five-number summary (minimum, quartiles, maximum) plus the mean and the number of NA's
summary(trashcollection) # summary statistics for every column
dumpster month year date
1 : 1 Length:630 Min. :2014 Length:630
2 : 1 Class :character 1st Qu.:2016 Class :character
3 : 1 Mode :character Median :2019 Mode :character
4 : 1 Mean :2019
5 : 1 3rd Qu.:2021
(Other):624 Max. :2023
NA's : 1 NA's :1
weight_(tons) volume_(cubic_yards) plastic_bottles polystyrene
Min. : 0.780 Min. : 7.00 Min. : 80 Min. : 20
1st Qu.: 2.720 1st Qu.: 15.00 1st Qu.: 1025 1st Qu.: 440
Median : 3.205 Median : 15.00 Median : 1900 Median : 1040
Mean : 6.411 Mean : 30.44 Mean : 3956 Mean : 2921
3rd Qu.: 3.730 3rd Qu.: 15.00 3rd Qu.: 2780 3rd Qu.: 2258
Max. :2019.540 Max. :9589.00 Max. :1246155 Max. :920011
cigarette_butts glass_bottles plastic_bags wrappers
Min. : 500 Min. : 0.00 Min. : 24 Min. : 180.0
1st Qu.: 3600 1st Qu.: 10.00 1st Qu.: 270 1st Qu.: 776.2
Median : 6000 Median : 18.00 Median : 551 Median : 1142.0
Mean : 37254 Mean : 42.86 Mean : 1732 Mean : 2851.2
3rd Qu.: 22000 3rd Qu.: 29.75 3rd Qu.: 1140 3rd Qu.: 1980.0
Max. :11735100 Max. :13502.00 Max. :545554 Max. :898129.0
sports_balls homes_powered* ...15 ...16
Min. : 0.00 Min. : 0.00 Mode:logical Mode:logical
1st Qu.: 6.00 1st Qu.: 41.00 NA's:630 NA's:630
Median : 12.00 Median : 52.00
Mean : 27.15 Mean : 95.38
3rd Qu.: 20.00 3rd Qu.: 60.75
Max. :8553.00 Max. :30020.00
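`summary()` reports NA's column by column, but a per-column count is easier to scan. The minimal base-R sketch below uses a toy data frame with made-up values as a stand-in for `trashcollection`, whose `...15` and `...16` columns are entirely NA. It counts the missing values in each column and drops columns that hold nothing but NA.

```r
# Toy stand-in for trashcollection (made-up values)
toy <- data.frame(year = c(2014, 2015, NA), plastic_bottles = c(1450, 1120, 2570))
toy[["...15"]] <- NA   # mimics the empty ...15 column read from the spreadsheet

colSums(is.na(toy))    # number of missing values in each column

# Keep only the columns that contain at least one non-missing value
all_na <- vapply(toy, function(col) all(is.na(col)), logical(1))
toy_clean <- toy[, !all_na, drop = FALSE]
names(toy_clean)       # "year" "plastic_bottles"
```

In the tidyverse, the same clean-up could be written as `select(trashcollection, where(~ !all(is.na(.x))))`.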
Aggregate data by year (for simplicity)
The trash collection data for Baltimore is very rich. For simplicity and clarity, I study yearly trash collection: the records are grouped by year and each trash type is summed to examine trends over time.
yearly_data <- trashcollection |>
  # derive the year from the date column, which is stored as text such as "5/16/2014"
  mutate(year = format(as.Date(date, format = "%m/%d/%Y"), "%Y")) |>
  group_by(year) |>
  # total count of each trash type per year
  summarise(
    plastic_bottles = sum(plastic_bottles, na.rm = TRUE),
    polystyrene = sum(polystyrene, na.rm = TRUE),
    cigarette_butts = sum(cigarette_butts, na.rm = TRUE),
    glass_bottles = sum(glass_bottles, na.rm = TRUE),
    plastic_bags = sum(plastic_bags, na.rm = TRUE),
    wrappers = sum(wrappers, na.rm = TRUE)
  )
yearly_data
# A tibble: 11 × 7
year plastic_bottles polystyrene cigarette_butts glass_bottles plastic_bags
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2014 82590 102909 4162000 2057 38730
2 2015 136130 177710 2856000 2327 98840
3 2016 149210 178808 1887600 2010 113647
4 2017 112340 112150 731000 1178 109890
5 2018 123800 118550 803300 896 69950
6 2019 115985 90400 329120 718 35221
7 2020 138560 88810 335900 961 33434
8 2021 128150 25690 213100 1015 21625
9 2022 126280 10527 205000 1061 9819
10 2023 133110 14457 212080 1279 14398
11 <NA> 1246155 920011 11735100 13502 545554
# ℹ 1 more variable: wrappers <dbl>
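The `<NA>` row above is not a year. Its values (for example 1,246,155 plastic bottles) equal the sum of the ten yearly rows, so it is the spreadsheet's grand-total line, which has no date. That row also explains the very large maximums reported by `summary()`. The minimal base-R sketch below uses a toy data frame with made-up numbers to show one way of aggregating by year while leaving such a row out.

```r
# Toy records; the last row mimics the spreadsheet's grand-total row (no year)
trash <- data.frame(
  year            = c(2014, 2014, 2015, NA),
  plastic_bottles = c(100, 200, 300, 600),
  polystyrene     = c(10, 20, 30, 60)
)

# aggregate() omits rows whose grouping value is NA, so the total row is dropped
yearly <- aggregate(
  trash[, c("plastic_bottles", "polystyrene")],
  by = list(year = trash$year),
  FUN = sum, na.rm = TRUE
)
yearly
```

In the dplyr pipeline above, adding `filter(!is.na(date))` before `group_by(year)` would have the same effect.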
View the structure of the dataset to verify column names
str(yearly_data)
tibble [11 × 7] (S3: tbl_df/tbl/data.frame)
$ year : chr [1:11] "2014" "2015" "2016" "2017" ...
$ plastic_bottles: num [1:11] 82590 136130 149210 112340 123800 ...
$ polystyrene : num [1:11] 102909 177710 178808 112150 118550 ...
$ cigarette_butts: num [1:11] 4162000 2856000 1887600 731000 803300 ...
$ glass_bottles : num [1:11] 2057 2327 2010 1178 896 ...
$ plastic_bags : num [1:11] 38730 98840 113647 109890 69950 ...
$ wrappers : num [1:11] 70849 131290 135940 127470 118980 ...
Create a data frame from the yearly totals in yearly_data (the last row holds the grand totals from the <NA> row)
data <- data.frame(
Year = c(2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 0), # Add 0 for the total row
Plastic_Bottles = c(82590, 136130, 149210, 112340, 123800, 115985, 138560, 128150, 126280, 133110, 1246155),
Polystyrene = c(102909, 177710, 178808, 112150, 118550, 90400, 88810, 25690, 10527, 14457, 920011),
Cigarette_Butts = c(4162000, 2856000, 1887600, 731000, 803300, 329120, 335900, 213100, 205000, 212080, 11735100),
Glass_Bottles = c(2057, 2327, 2010, 1178, 896, 718, 961, 1015, 1061, 1279, 13502),
Plastic_Bags = c(38730, 98840, 113647, 109890, 69950, 35221, 33434, 21625, 9819, 14398, 545554),
Wrappers = c(70849, 131290, 135940, 127470, 118980, 67645, 53240, 42625, 50540, 99550, 898129)
)
data <- data[-nrow(data), ] # Remove the last row, which holds the grand totals rather than values for a particular year
Calculate growth rates
growth_rates <- (data[-1, -1] / data[-nrow(data), -1] - 1) * 100 # Percentage change from the previous year
Convert growth rates data into long format suitable for ggplot
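library(tidyr)
# pivot_longer() is the current replacement for gather(): one row per Year-Category pair
growth_rates_long <- pivot_longer(data.frame(Year = data$Year[-1], growth_rates),
                                  cols = -Year, names_to = "Category", values_to = "Growth_Rate")
Define custom colors for each category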
category_colors <- c("Plastic_Bottles" = "#115CF1",
"Polystyrene" = "#50F111",
"Cigarette_Butts" = "#EF541B",
"Glass_Bottles" = "#F49507",
"Plastic_Bags" = "#C50CF8",
"Wrappers" = "#b15928")
Plot growth rates using ggplot2
library(ggplot2)
ggplot(growth_rates_long, aes(x = Year, y = Growth_Rate, color = Category)) +
geom_line() +
labs(title = "Growth Rate of Waste Categories",
x = "Year",
y = "Growth Rate (%)",
color = "Category") +
scale_color_manual(values = category_colors) + # Set custom colors
theme_minimal()
Create a data frame with the average annual waste data
# Average each waste type over the ten years 2014-2023; `data` no longer contains the grand-total row
average_annual_waste <- data.frame(
  Waste_Type = paste0("Avg_", names(data)[-1]),
  Average_Annual_Waste = colMeans(data[, -1])
)
Plotting with ggplot2
# Plot the average for each waste type as a bar chart: each type has a single
# value, so a histogram (which counts values falling into bins) has nothing to show
plot2 <- ggplot(average_annual_waste, aes(x = Waste_Type, y = Average_Annual_Waste, fill = Waste_Type)) +
  geom_col(alpha = 0.7, color = "black") +
  # log scale keeps glass bottles (thousands) visible next to cigarette butts (millions)
  scale_y_log10(labels = scales::label_comma()) +
  labs(title = "Average Annual Waste by Type (2014-2023)",
       x = "Waste Type",
       y = "Average Annual Count (log scale)",
       fill = "Waste Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Print the plot
print(plot2)
Analysis of Trash Collection in Baltimore from 2014 to 2023
The dataset detailing trash collection in Baltimore from 2014 to 2023 provides valuable insights into the city’s waste collection trends. The data covers several types of waste, mostly non-biodegradable materials such as plastic bottles, polystyrene (Styrofoam), cigarette butts, glass bottles, plastic bags, and wrappers. Each type of waste has a significant environmental impact because of how long it takes to decompose:
Plastic Bags: Plastic bags can take 10–20 years to decompose under typical conditions, but in landfills, they can persist for up to 1,000 years.
Glass Bottles: Glass bottles can take up to 1 million years to decompose in a landfill, with some sources suggesting they may never fully decompose in certain conditions.
Polystyrene (Styrofoam): Polystyrene is extremely durable and resistant to natural degradation. It can take over 500 years to decompose and is biologically inert, making it challenging for microorganisms to break down.
Plastic Bottles: Plastic bottles can take approximately 450 years to break down in the ocean or in a landfill. The decomposition rate varies based on the type of plastic and environmental conditions.
Wrappers: The decomposition of plastic wrappers depends on the type of plastic used and environmental factors. For instance, chip bags and candy wrappers can take 10–20 years to decompose, while cling wrap may take anywhere from 10 years to several hundred years.
Understanding these decomposition timelines underscores the importance of sustainable waste management practices and policies to mitigate the environmental impact of these materials, and hence the need to track and analyze their collection.
This analysis aims to explore the annual trends, the contribution of different trash types to the overall waste, and the year-over-year growth rates. Understanding these patterns is crucial for developing effective waste management policies and mitigating environmental impacts.
Yearly Trends in Waste Collection
The data reveals significant annual variation in the amounts of the different types of waste collected. One noticeable pattern is how differently the quantity of each trash type fluctuates from year to year. For instance, the number of plastic bottles collected peaked in 2016 (149,210) and then fluctuated between about 112,000 and 139,000 per year. The data alone cannot tell us whether such changes were driven by policy or by economic conditions (for example, the recovery after unemployment hit 9.6% in 2010). Polystyrene waste remained relatively high in 2015 and 2016 but dropped sharply in the following years. This variation may reflect changes in consumer behavior, waste management practices, or both.
Cigarette butts consistently dominated the waste collected by item count. The highest figure was recorded in 2014 (4,162,000), and the trend was largely downward thereafter. This trend could reflect successful public health campaigns or shifts in smoking habits. Conversely, the collection of glass bottles remained relatively low compared with other waste types, which could suggest either effective recycling practices or lower consumption rates.
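The peak-year and range statements in this section can be checked directly from the yearly totals. The short base-R sketch below uses the plastic bottle and polystyrene totals copied from `yearly_data`.

```r
years           <- 2014:2023
plastic_bottles <- c(82590, 136130, 149210, 112340, 123800,
                     115985, 138560, 128150, 126280, 133110)
polystyrene     <- c(102909, 177710, 178808, 112150, 118550,
                     90400, 88810, 25690, 10527, 14457)

years[which.max(plastic_bottles)]               # peak year for plastic bottles
range(plastic_bottles[years > 2016])            # spread of the years after the peak
years[which.max(polystyrene)]                   # peak year for polystyrene
polystyrene[years == 2023] / max(polystyrene)   # 2023 as a share of the peak year
```

By this measure, the 2023 polystyrene count was about 8% of its 2016 peak.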
Average Annual Waste
Calculating the average annual waste provides an overview of the city’s waste landscape, while the yearly totals highlight the fluctuating nature of trash collection, driven by both external factors and municipal efforts. For example, averaged over 2014-2023 (each ten-year total divided by 10), about 124,616 plastic bottles, 92,001 pieces of polystyrene, and 1,173,510 cigarette butts were collected per year. These figures underscore the need for targeted policies addressing specific types of waste.
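An average over the study period should divide each ten-year total by ten, with the grand-total row excluded. The base-R sketch below uses the plastic bottle totals from `yearly_data`. It shows the correct mean, and then what happens if the total row is left in: the sum is doubled and divided by 11.

```r
plastic_bottles <- c(82590, 136130, 149210, 112340, 123800,
                     115985, 138560, 128150, 126280, 133110)  # 2014-2023

mean(plastic_bottles)                   # 124615.5, the 2014-2023 average

# Pitfall: appending the grand-total row before averaging
with_total <- c(plastic_bottles, sum(plastic_bottles))
mean(with_total)                        # about 226573.6, i.e. 2 * total / 11
```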
Year-over-Year Growth Rates
Investigating the year-over-year growth rates of different waste types provides insights into emerging trends and potential areas of concern. The plot of growth rates from 2015 to 2023 (2014 has no previous year to compare against) reveals periods of both sharp increases and decreases across all waste types. For instance, the growth rate for plastic bottles spiked in 2015 (about +65%) and 2020 (about +19%), possibly due to changes in consumption patterns or waste collection efficiency. Similarly, the growth rate of polystyrene turned sharply negative after 2016, with drops of roughly 37% in 2017 and 71% in 2021. This might be attributed to regulatory measures or increased public awareness about its environmental impact.
Understanding these growth rates is vital for crafting effective waste management strategies. For example, a consistent increase in plastic bottle waste may necessitate policies promoting recycling or alternative packaging materials. Similarly, a significant reduction in certain types of waste could indicate the success of existing policies and provide a model for other waste types.
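The year-over-year growth rate used here is g_t = 100 × (x_t / x_(t-1) - 1), the percentage change from the previous year. The base-R sketch below applies it to the plastic bottle totals copied from `yearly_data`. It reproduces the 2015 and 2020 spikes in plastic bottle collection.

```r
years           <- 2014:2023
plastic_bottles <- c(82590, 136130, 149210, 112340, 123800,
                     115985, 138560, 128150, 126280, 133110)

# Percentage change from the previous year; the first year has no predecessor
growth_pct <- 100 * (plastic_bottles[-1] / plastic_bottles[-length(plastic_bottles)] - 1)
names(growth_pct) <- years[-1]
round(growth_pct, 1)
```

The two largest positive values, about 64.8% in 2015 and 19.5% in 2020, match those spikes.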
Implications for Waste Management Policies
The analysis of Baltimore’s trash collection data underscores the complexity of waste management and the need for multifaceted approaches. Policymakers must consider the diverse types of waste and their respective trends to devise comprehensive strategies. For instance, targeted recycling programs, public awareness campaigns, and regulatory measures can collectively address the varying waste types and their environmental impacts.
Moreover, the data highlights the importance of monitoring and adapting policies based on emerging trends. The fluctuating growth rates suggest that static policies may be insufficient to address the dynamic nature of waste generation and collection. Continuous data analysis and flexible policy frameworks are essential for effectively managing the city’s waste and mitigating its environmental footprint.
Conclusion
The dataset on Baltimore’s trash collection from 2014 to 2023 offers a rich source of information for understanding waste management trends. By analyzing average annual waste and year-over-year growth rates, we can gain valuable insights into the city’s waste landscape. These insights are crucial for developing targeted policies that address the specific challenges posed by different types of waste and promote sustainable waste management practices.