This report investigates waste generation patterns and environmental impacts across Australian states using the National Waste Report 2022 dataset. The dataset spans the financial years from 2006–07 to 2020–21 and contains detailed records of waste types, quantities, sources (streams), and treatment methods (fates). The aim is to extract meaningful insights from the data through exploratory data analysis (EDA), statistical summaries, and visualisation techniques using R.
In addition to the primary dataset (Wastes.csv), a
secondary dataset (Year_State_ID.csv) is used to
incorporate temporal and geographic context such as economic growth
rates by state and year. For selected questions, relevant external data
(e.g., national building activity) is integrated to strengthen the
investigation.
The tasks include computing key statistics, identifying waste trends, examining the relationship between waste treatment and environmental impact scores, and investigating drivers of C&D waste using both internal and external data sources.
The analysis begins by importing the core datasets provided in the assignment:
Wastes.csv: Contains detailed waste records, including
type, category, stream, fate, tonnes, and descriptive metadata with
embedded environmental impact scores.Year_State_ID.csv: Maps each record to a year and
state, and includes economic growth figures.Below, we load both datasets and inspect their structure.
wastes <- read_csv("Wastes.csv")
year_state <- read_csv("Year_State_ID.csv")
glimpse(wastes)
## Rows: 36,718
## Columns: 9
## $ Case_ID <dbl> 34658, 34659, 34662, 34663, 34664, 34666, 34667, 34668…
## $ Year_State_ID <dbl> 1001, 1001, 1001, 1001, 1001, 1001, 1001, 1001, 1001, …
## $ Category <chr> "Biosolids", "Biosolids", "Building and demolition mat…
## $ Type <chr> "Biosolids", "Biosolids", "Asphalt", "Asphalt", "Aspha…
## $ Stream <chr> "C&I", "C&I", "C&D", "C&I", "MSW", "C&D", "C&I", "MSW"…
## $ Fate <chr> "Disposal", "Recycling", "Recycling", "Recycling", "Re…
## $ Tonnes <dbl> 2.018555e+04, 0.000000e+00, 0.000000e+00, 0.000000e+00…
## $ `Core_Non-core` <chr> "Core waste", "Core waste", "Core waste", "Core waste"…
## $ Description <chr> "A total of 20185.55 tonnes of Biosolids (C&I) was sen…
glimpse(year_state)
## Rows: 130
## Columns: 4
## $ ID <dbl> 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, …
## $ Year <chr> "2020-2021", "2020-2021", "2020-2021", "2020-2021", "2…
## $ State <chr> "ACT", "Australia", "NSW", "NT", "Qld", "SA", "Tas", "…
## $ Economic_Growth <dbl> 1.07, -0.03, -0.88, -0.49, -0.10, -1.31, -0.52, -0.87,…
These datasets are then preprocessed as needed in later sections to clean inconsistencies, extract variables (e.g., environmental scores), and merge additional contextual information (e.g., year, state).
# Clean and standardize waste categories
unique_categories <- wastes %>%
filter(!is.na(Category)) %>%
mutate(Category = str_trim(Category)) %>%
mutate(Category = str_replace_all(Category, "and", "&")) %>%
mutate(Category = str_to_title(Category)) %>%
mutate(Category = case_when(
str_to_lower(Category) %in% c("plastic") ~ "Plastics",
str_to_lower(Category) %in% c("organic") ~ "Organics",
str_to_lower(Category) == "metals & plastics" ~ "Metals & Plastic",
str_to_lower(Category) %in% c("hw", "hazardous wastes") ~ "Hazardous Wastes",
TRUE ~ Category
)) %>%
distinct(Category) %>%
arrange(Category) %>%
pull(Category)
length(unique_categories)
## [1] 16
kable(data.frame(Categories = unique_categories), caption = "Standardised Unique Waste Categories") %>%
kable_styling(full_width = FALSE)
| Categories |
|---|
| Aluminium |
| Ash |
| Biosolids |
| Building & Demolition Materials |
| Glass |
| Hazardous Wastes |
| Metals |
| Metals & Plastic |
| Organics |
| Paper & Cardboard |
| Plastics |
| Recycling |
| Space Debris |
| Textiles, Leather & Rubber (Excl. Tyres) |
| Toxic Wastes |
| Unclassified Materials |
Answer: A total of 16 unique and standardised waste categories were identified in the dataset after cleaning.
Explanation: To ensure accurate and meaningful categorisation, the
Category column was preprocessed through several steps: -
Missing values were removed to prevent distortion in
the count. - Whitespace was trimmed using
str_trim() to avoid discrepancies due to leading or
trailing spaces. - Text standardisation was applied:
all category names were converted to Title Case using
str_to_title(). - Common naming
inconsistencies were addressed, such as replacing variations
like "plastic" with "Plastics" and harmonising
"HW" and "hazardous wastes" under a single
label. - The term "and" was normalised to
"&" for alignment across entries.
These cleaning steps ensured that semantically identical entries were not treated as distinct, which is crucial for generating accurate statistics and visualisations in later sections of the report. The cleaned list of unique categories is presented in the table above.
negative_feedback <- wastes %>%
filter(str_detect(Description, "Environmental Impact Score: [23]/10"))
nrow(negative_feedback)
## [1] 6312
Answer: There are 6312 feedback comments with an environmental impact score of 2 or 3, which indicate a negative environmental outcome.
Explanation: To identify negative feedback cases, I filtered the
Description column for any text containing an
Environmental Impact Score of 2 or 3 using the regular
expression:“Environmental Impact Score: [23]/10”
wastes_clean <- wastes %>%
filter(!is.na(Category), !is.na(Stream), Tonnes > 0) %>%
mutate(Category = str_trim(Category),
Category = str_replace_all(Category, "and", "&"),
Category = str_to_title(Category)) %>%
mutate(Category = case_when(
str_to_lower(Category) %in% c("plastic") ~ "Plastics",
str_to_lower(Category) %in% c("organic") ~ "Organics",
str_to_lower(Category) == "metals & plastics" ~ "Metals & Plastic",
str_to_lower(Category) %in% c("hw", "hazardous wastes") ~ "Hazardous Wastes",
TRUE ~ Category
))
# Calculate fraction of waste per Category-Stream combo
waste_fraction <- wastes_clean %>%
group_by(Category, Stream) %>%
summarise(Total_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop") %>%
group_by(Category) %>%
mutate(Fraction = Total_Tonnes / sum(Total_Tonnes)) %>%
ungroup()
# Visualize: stacked bar chart
ggplot(waste_fraction, aes(x = reorder(Category, -Fraction), y = Fraction, fill = Stream)) +
geom_bar(stat = "identity", width = 0.7) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(
title = "Fraction of Waste Streams per Category",
x = "Waste Category",
y = "Fraction of Total Tonnes",
fill = "Waste Source"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "right",
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank()
)
Answer: The stacked bar chart above displays the fraction of waste tonnes by source (Stream) for each waste category. Each bar represents a waste category, and the segments within each bar show the proportion of total tonnes coming from municipal solid waste (MSW), commercial & industrial (C&I), and construction & demolition (C&D) sources.
Explanation: To generate this plot: - The dataset was
filtered to include only records with valid
Category, Stream, and positive
Tonnes values. - Categories were cleaned and
standardised to ensure consistency (e.g., harmonising naming variations
like "HW" and "Hazardous Wastes"). - Waste
tonnes were aggregated by Category and
Stream, then converted into
fractions relative to the total waste for each
category. - A stacked bar chart was created using
ggplot2, with percentages displayed on the y-axis for clear
interpretation.
This visualisation provides insight into how different waste categories are distributed across waste sources. For example: - Organics and MSW are closely linked, indicating household-generated organic waste is a significant contributor. - C&D dominates categories like Soils and Concrete, bricks and asphalt, reflecting their origin in construction activities. - C&I contributes more heavily to categories like Plastics and Hazardous Wastes, suggesting that industrial processes are key producers of these materials.
This analysis helps identify the primary generators of each waste category and informs targeted waste management strategies.
# Step 1: Merge and clean
waste_merged <- wastes %>%
left_join(year_state, by = c("Year_State_ID" = "ID"))
# Step 2: Compute total tonnes by Year and State, excluding bad entries
temp <- waste_merged %>%
filter(!is.na(State) & State != "Australia") %>%
group_by(Year, State) %>%
summarise(Total_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop")
# Step 3: Compute summary statistics per State
summary_stats <- temp %>%
group_by(State) %>%
summarise(
Total_Tonnes_Min = min(Total_Tonnes),
Total_Tonnes_Max = max(Total_Tonnes),
Total_Tonnes_Mean = mean(Total_Tonnes),
.groups = "drop"
) %>%
mutate(
Total_Tonnes_Min = comma(Total_Tonnes_Min, accuracy = 0.01),
Total_Tonnes_Max = comma(Total_Tonnes_Max, accuracy = 0.01),
Total_Tonnes_Mean = comma(Total_Tonnes_Mean, accuracy = 0.01)
)
# Step 4: Display table
kable(
summary_stats,
caption = "Statistical Summary of Total Waste Tonnes per State",
align = c("l", "r", "r", "r")
) %>%
kable_styling(full_width = FALSE, position = "center")
| State | Total_Tonnes_Min | Total_Tonnes_Max | Total_Tonnes_Mean |
|---|---|---|---|
| ACT | 10,254.25 | 939,356.31 | 261,622.42 |
| NSW | 1,117,130.60 | 21,707,178.83 | 13,952,193.65 |
| NT | 31,372.45 | 247,839.88 | 153,095.13 |
| Qld | 1,182,725.76 | 13,242,399.75 | 8,112,047.12 |
| SA | 350,233.78 | 4,641,795.18 | 2,317,120.21 |
| Tas | 270,085.88 | 787,983.47 | 532,748.27 |
| Vic | 1,035,186.42 | 15,468,345.78 | 10,142,362.32 |
| WA | 435,379.31 | 6,551,575.45 | 3,866,050.28 |
Answer: The table above presents the minimum,
maximum, and average total waste
tonnes recorded annually for each Australian state between 2006
and 2021. These statistics exclude aggregated national values (i.e.,
rows where State == "Australia"), ensuring state-level
granularity.
Explanation: To compute the summary: 1. I first merged the
waste data with year_state to associate each waste
record with its respective year and state. 2. Records where the
State was missing or marked as "Australia"
were excluded, as these do not represent individual states. 3. For each
Year and State combination, the total
waste tonnes were aggregated. 4. Summary statistics —
minimum, maximum, and
mean — were then computed for each state using a single
summarising operation. 5. All values were formatted for readability
using scales::comma().
This summary provides a concise yet informative overview of the variation in annual waste generation across states. For instance: - States with large populations and urban activity (e.g., NSW, Vic, Qld) show higher average and maximum waste outputs. - Smaller states like Tas and ACT report lower waste volumes, consistent with their size and economic activity. - The spread between min and max values also hints at changes in industrial or policy factors over time.
These insights can help environmental planners identify high-impact regions and assess the stability or volatility of waste generation patterns.
# Preprocess Year-State join
waste_joined <- wastes %>%
filter(str_to_lower(Type) == "food organics") %>%
left_join(year_state, by = c("Year_State_ID" = "ID")) %>%
mutate(Year = str_extract(Year, "^\\d{4}")) %>% # Extract first year
mutate(Year = as.integer(Year)) %>%
group_by(Year, State) %>%
summarise(Total_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop")
years <- sort(unique(waste_joined$Year))
ggplot(waste_joined, aes(x = Year, y = Total_Tonnes, color = State)) +
geom_line(size = 0.8) +
geom_point(size = 1.2) +
scale_x_continuous(
breaks = years, # Show every year on X-axis
labels = years
) +
labs(
title = "Yearly Trend of Food Organics Waste Tonnes by State",
x = "Year",
y = "Total Tonnes",
color = "State"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "right"
)
Answer:
The line chart above displays the yearly trend of food organics
waste tonnes for each state in Australia from 2006 to 2021. It
reveals distinct temporal patterns in organic waste generation across
states. States such as NSW, Vic, and
Qld consistently reported the highest volumes, with
noticeable increases over time, while smaller jurisdictions like
ACT and Tas contributed lower amounts.
The trend for most states shows gradual growth, with minor fluctuations
that may relate to policy shifts or population changes.
Explanation:
To create this chart, I filtered the dataset to include only records
where the Type is "Food organics". I then
merged this subset with the Year_State_ID.csv dataset to
obtain the corresponding Year and State
values. The year was extracted from financial year formats (e.g.,
“2014–15” → “2014”) using str_extract() and converted to
integer. I aggregated the total food organics waste by both
Year and State, and visualised the result
using a multi-line plot, where each line represents a different state.
The X-axis was manually set to display every year for clarity, and the
chart was formatted using theme_minimal() for clean
presentation.
# Join and preprocess year info
waste_with_year <- wastes %>%
left_join(year_state, by = c("Year_State_ID" = "ID")) %>%
mutate(Year = as.integer(str_extract(Year, "^\\d{4}"))) # Extract first 4-digit year
# 1. Most recycled Type
most_recycled <- waste_with_year %>%
filter(str_to_lower(Fate) == "recycling") %>%
group_by(Type, Year) %>%
summarise(Total_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(Total_Tonnes)) %>%
slice(1)
print("Most Recycled Type and Year:")
## [1] "Most Recycled Type and Year:"
print(most_recycled)
## # A tibble: 1 × 3
## Type Year Total_Tonnes
## <chr> <int> <dbl>
## 1 Bricks, concrete and pavers 2020 10478681.
# 2. Most disposed Type
most_disposed <- waste_with_year %>%
filter(str_to_lower(Fate) == "disposal") %>%
group_by(Type, Year) %>%
summarise(Total_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(Total_Tonnes)) %>%
slice(1)
print("Most Disposed Type and Year:")
## [1] "Most Disposed Type and Year:"
print(most_disposed)
## # A tibble: 1 × 3
## Type Year Total_Tonnes
## <chr> <int> <dbl>
## 1 Food organics 2008 4089186
# 3. Type with most increase from earliest to latest year
type_trend <- waste_with_year %>%
group_by(Type, Year) %>%
summarise(Total_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop")
type_increase <- type_trend %>%
group_by(Type) %>%
summarise(
Start_Year = min(Year),
End_Year = max(Year),
Start_Tonnes = Total_Tonnes[Year == Start_Year][1],
End_Tonnes = Total_Tonnes[Year == End_Year][1],
Increase = End_Tonnes - Start_Tonnes
) %>%
arrange(desc(Increase)) %>%
slice(1)
print("Waste Type with the Most Increase Across Years:")
## [1] "Waste Type with the Most Increase Across Years:"
print(type_increase)
## # A tibble: 1 × 6
## Type Start_Year End_Year Start_Tonnes End_Tonnes Increase
## <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 Bricks, concrete and pav… 2006 2020 2133152 10478681. 8345529.
Answer:
The most recycled waste type is Bricks, concrete and pavers in the year
2020, with a total of 10,478,680.72 tonnes recycled.
The most disposed waste type is Food organics in the year 2008, with a
total of 4,089,186.00 tonnes sent to disposal.
The waste type that experienced the largest increase in total tonnes
from the first to the last year in the dataset is Bricks, concrete and
pavers, increasing by 8,345,528.72 tonnes from 2006 to 2020.
Explanation:
To answer this question, I first merged the waste records with year and
state metadata using Year_State_ID.csv. Then, I extracted
the 4-digit year using str_extract() and converted it to an
integer format.
- For the most recycled and most
disposed types, I filtered the data by Fate,
grouped it by Type and Year, and calculated
the total waste tonnes for each combination. I sorted the results in
descending order and selected the top record.
- To identify the waste type with the greatest increase over
time, I grouped data by Type and
Year, calculated annual totals, and then compared each
type’s total in the first and last year. The difference was calculated
and sorted to identify the maximum increase.
These analyses provide insights into which materials are being
effectively recycled, which are most commonly disposed of, and which are
growing most rapidly — useful for shaping future waste management
priorities.
# ---- 1. Preprocessing and Merging ----
env_data <- wastes %>%
mutate(Impact_Score = as.numeric(str_extract(Description, "(?<=Score: )\\d+"))) %>%
left_join(year_state, by = c("Year_State_ID" = "ID")) %>%
mutate(
Year = as.integer(str_extract(Year, "^\\d{4}")),
Fate = str_to_title(str_trim(Fate)),
Stream = str_to_upper(str_trim(Stream)),
Type = str_trim(Type),
Category = str_to_title(str_trim(Category))
) %>%
filter(!is.na(Impact_Score), Tonnes > 0) %>%
mutate(
Fate = factor(Fate, levels = c("Disposal", "Energy Recovery", "Long-Term Storage", "Recycling", "Waste Reuse")),
Category = reorder(Category, Impact_Score, FUN = median, na.rm = TRUE)
)
# ---- 2. Impact Score by Waste Category ----
ggplot(env_data, aes(x = Category, y = Impact_Score, fill = Category)) +
geom_boxplot() +
labs(title = "Environmental Impact Score by Waste Category", x = "Waste Category", y = "Impact Score") +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none"
)
# ---- 3. Impact Score vs Waste Tonnes ----
ggplot(env_data, aes(x = Tonnes, y = Impact_Score)) +
geom_point(alpha = 0.3) +
scale_x_log10(labels = scales::comma) +
labs(title = "Tonnes vs Environmental Impact Score", x = "Waste Tonnes", y = "Impact Score") +
theme_minimal()
# ---- 4. Impact Score by Stream ----
ggplot(env_data, aes(x = Stream, y = Impact_Score, fill = Stream)) +
geom_boxplot() +
labs(title = "Impact Score by Waste Stream", x = "Stream", y = "Environmental Impact Score") +
theme_minimal()
# ---- 5. Impact Score vs Economic Growth ----
ggscatter(env_data, x = "Economic_Growth", y = "Impact_Score",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
xlab = "Economic Growth (%)", ylab = "Impact Score")
# ---- 6. Impact Score by Core/Non-core ----
ggplot(env_data, aes(x = `Core_Non-core`, y = Impact_Score, fill = `Core_Non-core`)) +
geom_boxplot() +
labs(title = "Impact Score by Core/Non-core Waste", x = "Core Type", y = "Impact Score") +
theme_minimal()
# ---- 7. Impact Score by State ----
state_summary <- env_data %>%
group_by(State) %>%
summarise(mean_score = mean(Impact_Score, na.rm = TRUE))
ggplot(state_summary, aes(x = reorder(State, mean_score), y = mean_score, fill = State)) +
geom_col() +
coord_flip() +
labs(title = "Average Impact Score by State", x = "State", y = "Impact Score") +
theme_minimal()
# ---- 8. Impact Score Over Time ----
year_summary <- env_data %>%
group_by(Year) %>%
summarise(mean_score = mean(Impact_Score, na.rm = TRUE))
ggplot(year_summary, aes(x = Year, y = mean_score)) +
geom_line(color = "steelblue", size = 1.2) +
geom_point() +
labs(title = "Trend of Environmental Impact Score Over Time", x = "Year", y = "Mean Impact Score") +
theme_minimal()
Answer:
The environmental impact score is influenced by several waste-related
factors and varies significantly across waste categories, streams,
treatment methods (fates), and locations. The most notable findings
are:
Waste Category:
Waste categories such as Organics, Biosolids, and
Ash tend to have higher environmental scores (5–8), suggesting
these are more sustainably managed or inherently less harmful.
Conversely, Toxic Wastes and Hazardous Wastes show
lower scores and wider variability, highlighting concern for ecological
risks.
Waste Stream (Source):
Municipal Solid Waste (MSW) displays the highest median impact
score compared to Commercial & Industrial (C&I) and
Construction & Demolition (C&D) waste. This implies MSW
management is relatively more environmentally friendly, likely due to
better collection and recycling systems in place for household
waste.
Waste Volume (Tonnes):
There is no strong correlation between the quantity of
waste and its environmental impact score. As seen in the scatterplot,
large and small waste quantities are spread across the entire score
range, suggesting that treatment method and waste type matter
more than volume in determining environmental impact.
Core vs Non-core Waste:
Both waste types show similar distributions of scores, but core waste
has a slightly broader spread, reflecting more variability in how it is
treated. Non-core waste (from mining, agriculture, etc.) may have more
standardised handling or less diverse processing.
State-wise Variation:
States like Vic, WA, and NSW report higher
average scores, indicating more effective waste management or
environmentally favourable policy implementation. ACT and
NT show lower averages, which may point to operational or
regulatory limitations.
Economic Growth:
The scatterplot shows a very weak and slightly negative
correlation between economic growth and environmental impact
score (r = -0.047, p < 0.001), meaning economic performance does not
necessarily coincide with more sustainable waste practices.
Temporal Trend:
The average impact score has gradually increased from 2013
onwards, rising from ~5.0 to nearly 5.8 in 2020. This suggests
an overall improvement in environmental outcomes, possibly due to better
recycling technologies, policy reforms (e.g., landfill levies, plastic
bans), or public awareness.
Explanation:
This analysis was based on cleaned and merged datasets from
Wastes.csv and Year_State_ID.csv, including
extraction of embedded impact scores from text descriptions.
Visualisations were used to compare scores across categories (boxplots),
evaluate relationships (scatterplots and time series), and understand
patterns at the state and national levels.
The findings demonstrate that environmental impact is primarily driven by the nature and treatment of waste, not the volume or economic conditions. These insights could inform more targeted sustainability policies, such as encouraging recycling of low-scoring categories or reviewing waste practices in underperforming states.
# Step 1: Filter the data
tyres_data <- wastes %>%
filter(
str_to_lower(Category) == "hazardous wastes",
str_to_lower(Type) == "tyres (t140)",
Tonnes > 0
)
# Step 2: Add Tonnes_range column
tyres_data <- tyres_data %>%
mutate(Tonnes_range = case_when(
Tonnes >= 0 & Tonnes < 10000 ~ "[0, 10000)",
Tonnes >= 10000 & Tonnes < 20000 ~ "[10000, 20000)",
Tonnes >= 20000 & Tonnes < 40000 ~ "[20000, 40000)",
Tonnes >= 40000 & Tonnes <= 80000 ~ "[40000, 80000]",
TRUE ~ "Out of range"
))
# Step 3: Join with year_state to get State
tyres_data <- tyres_data %>%
left_join(year_state, by = c("Year_State_ID" = "ID"))
# Step 4: Plot count of Tonnes_range per State
library(ggplot2)
ggplot(tyres_data, aes(x = State, fill = Tonnes_range)) +
geom_bar(position = "dodge") +
labs(
title = "Distribution of Hazardous Waste Tyres by Tonnes Range and State",
x = "State",
y = "Number of Records",
fill = "Tonnes Range"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1)
)
Answer:
The bar chart illustrates the distribution of hazardous waste tyre
records by tonnes range across Australian states. The
most common volume category is [0, 10,000) tonnes,
dominating in ACT, NT, and Tas with nearly 50 records each. Higher
volume categories such as [10,000, 20,000) and
[20,000, 40,000) appear more frequently in states like
SA, Vic, and Qld,
suggesting large-scale tyre disposal or treatment facilities.
NSW shows a relatively even spread across all four ranges, while
WA displays moderate activity across categories, with
slightly fewer in the lowest range compared to others.
This variation suggests differences in both the scale of waste generation and regional disposal infrastructure.
Explanation:
To produce this analysis: - I filtered the dataset to retain only
records for hazardous wastes of type “Tyres
(T140)” with positive tonne values. - A new column named
Tonnes_range was created using case_when() to
classify each record into one of four predefined volume brackets. - The
dataset was then merged with Year_State_ID.csv to retrieve
the associated state for each record. - Finally, I
generated a dodge-style bar chart to show the count of
records in each volume range by state.
This approach allows us to assess both quantity distribution and regional disposal patterns for tyre-related hazardous waste. The dominance of low-volume records in ACT, NT, and Tas may reflect smaller population sizes or more frequent reporting of small batches. In contrast, states like SA and Qld handling more mid-to-large volume entries may suggest centralized collection systems or industrial-level waste producers.
cd_waste <- wastes %>%
left_join(year_state, by = c("Year_State_ID" = "ID")) %>%
mutate(Year = as.integer(str_extract(Year, "^\\d{4}"))) %>%
filter(Stream == "C&D") %>%
group_by(Year) %>%
summarise(CD_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop")
# Load and clean ABS data
abs_data <- read_excel("ABS_Building_Trend.xlsx", sheet = "Data1",skip = 9)
colnames(abs_data)[1] <- "Date"
colnames(abs_data)[22] <- "Construction_Value"
construction_data <- abs_data %>%
select(Date, Construction_Value) %>%
filter(!is.na(Date), !is.na(Construction_Value)) %>%
mutate(
Quarter = as.yearqtr(Date, format = "%b-%Y"), # Convert "Sep-1978" format
Year = year(Quarter),
Construction_Value = as.numeric(Construction_Value)
) %>%
filter(Year >= 2006, Year <= 2020) %>%
group_by(Year) %>%
summarise(Yearly_Construction = sum(Construction_Value, na.rm = TRUE)) %>%
ungroup()
cd_waste <- wastes %>%
left_join(year_state, by = c("Year_State_ID" = "ID")) %>%
mutate(Year = as.integer(str_extract(Year, "^\\d{4}"))) %>%
filter(Stream == "C&D") %>%
group_by(Year) %>%
summarise(CD_Tonnes = sum(Tonnes, na.rm = TRUE)) %>%
ungroup()
cd_vs_building <- inner_join(cd_waste, construction_data, by = "Year")
# Normalize values to 0–1 for comparison
cd_vs_building_scaled <- cd_vs_building %>%
mutate(
CD_Waste_Scaled = (CD_Tonnes - min(CD_Tonnes)) / (max(CD_Tonnes) - min(CD_Tonnes)),
Building_Scaled = (Yearly_Construction - min(Yearly_Construction)) /
(max(Yearly_Construction) - min(Yearly_Construction))
)
ggplot(cd_vs_building_scaled, aes(x = Year)) +
geom_line(aes(y = CD_Waste_Scaled, color = "C&D Waste (Normalized)"), size = 1.2) +
geom_line(aes(y = Building_Scaled, color = "Building Activity (Normalized)"), size = 1.2) +
labs(
title = "Normalized Yearly Trend: C&D Waste vs Building Activity",
x = "Year",
y = "Normalized Value (0–1 Scale)",
color = "Metric"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.title = element_blank()
)
Answer:
The comparison between C&D (Construction and Demolition) waste and
building activity from 2006 to 2020 shows a clear upward trend in both
metrics, particularly after 2013. The normalized plot reveals that while
building activity experienced a sharp peak around 2010 followed by a
decline, C&D waste volumes rose steadily and even surpassed
construction growth from 2013 onwards.
This pattern indicates a partial correlation—higher building activity tends to be associated with more C&D waste, but the rate of waste generation may outpace construction growth in later years, possibly due to: - Increased urban renewal or demolition projects - Inefficient material reuse - Greater waste reporting or stricter regulations in recent years
The divergence after 2015 also suggests that waste minimization practices may not have improved proportionally with construction efficiency.
External Dataset Source - Link: ABS
Table A83779075K - Import Instruction:
The file was downloaded from ABS, and the "Data1" sheet was
imported using
read_excel("ABS_Building_Trend.xlsx", sheet = "Data1", skip = 9).
Column 1 contains dates, and column 21 contains total building value
across all sectors (Trend series). I converted quarterly dates to years
and aggregated values.
Insights: The analysis confirms that building activity is a key factor influencing C&D waste trends. Peaks and dips in construction output strongly correlate with changes in waste generation, reinforcing the conclusion that economic cycles, infrastructure projects, and urban development directly affect the volume of C&D waste produced in Australia.