Introduction

This report investigates waste generation patterns and environmental impacts across Australian states using the National Waste Report 2022 dataset. The dataset spans the financial years from 2006–07 to 2020–21 and contains detailed records of waste types, quantities, sources (streams), and treatment methods (fates). The aim is to extract meaningful insights from the data through exploratory data analysis (EDA), statistical summaries, and visualisation techniques using R.

In addition to the primary dataset (Wastes.csv), a secondary dataset (Year_State_ID.csv) is used to incorporate temporal and geographic context such as economic growth rates by state and year. For selected questions, relevant external data (e.g., national building activity) is integrated to strengthen the investigation.

The tasks include computing key statistics, identifying waste trends, examining the relationship between waste treatment and environmental impact scores, and investigating drivers of C&D waste using both internal and external data sources.

Data Loading and Preprocessing

The analysis begins by importing the core datasets provided in the assignment:

Below, we load both datasets and inspect their structure.

wastes <- read_csv("Wastes.csv")
year_state <- read_csv("Year_State_ID.csv")
glimpse(wastes)
## Rows: 36,718
## Columns: 9
## $ Case_ID         <dbl> 34658, 34659, 34662, 34663, 34664, 34666, 34667, 34668…
## $ Year_State_ID   <dbl> 1001, 1001, 1001, 1001, 1001, 1001, 1001, 1001, 1001, …
## $ Category        <chr> "Biosolids", "Biosolids", "Building and demolition mat…
## $ Type            <chr> "Biosolids", "Biosolids", "Asphalt", "Asphalt", "Aspha…
## $ Stream          <chr> "C&I", "C&I", "C&D", "C&I", "MSW", "C&D", "C&I", "MSW"…
## $ Fate            <chr> "Disposal", "Recycling", "Recycling", "Recycling", "Re…
## $ Tonnes          <dbl> 2.018555e+04, 0.000000e+00, 0.000000e+00, 0.000000e+00…
## $ `Core_Non-core` <chr> "Core waste", "Core waste", "Core waste", "Core waste"…
## $ Description     <chr> "A total of 20185.55 tonnes of Biosolids (C&I) was sen…
glimpse(year_state)
## Rows: 130
## Columns: 4
## $ ID              <dbl> 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, …
## $ Year            <chr> "2020-2021", "2020-2021", "2020-2021", "2020-2021", "2…
## $ State           <chr> "ACT", "Australia", "NSW", "NT", "Qld", "SA", "Tas", "…
## $ Economic_Growth <dbl> 1.07, -0.03, -0.88, -0.49, -0.10, -1.31, -0.52, -0.87,…

These datasets are then preprocessed as needed in later sections to clean inconsistencies, extract variables (e.g., environmental scores), and merge additional contextual information (e.g., year, state).

Q1: Unique Category Values

# Clean and standardize waste categories
unique_categories <- wastes %>%
  filter(!is.na(Category)) %>%
  mutate(Category = str_trim(Category)) %>%
  mutate(Category = str_replace_all(Category, "and", "&")) %>%
  mutate(Category = str_to_title(Category)) %>%
  mutate(Category = case_when(
    str_to_lower(Category) %in% c("plastic") ~ "Plastics",
    str_to_lower(Category) %in% c("organic") ~ "Organics",
    str_to_lower(Category) == "metals & plastics" ~ "Metals & Plastic",
    str_to_lower(Category) %in% c("hw", "hazardous wastes") ~ "Hazardous Wastes",
    TRUE ~ Category
  )) %>%
  distinct(Category) %>%
  arrange(Category) %>%
  pull(Category)

length(unique_categories)
## [1] 16
kable(data.frame(Categories = unique_categories), caption = "Standardised Unique Waste Categories") %>%
  kable_styling(full_width = FALSE)
Standardised Unique Waste Categories
Categories
Aluminium
Ash
Biosolids
Building & Demolition Materials
Glass
Hazardous Wastes
Metals
Metals & Plastic
Organics
Paper & Cardboard
Plastics
Recycling
Space Debris
Textiles, Leather & Rubber (Excl. Tyres)
Toxic Wastes
Unclassified Materials

Answer: A total of 16 unique and standardised waste categories were identified in the dataset after cleaning.

Explanation: To ensure accurate and meaningful categorisation, the Category column was preprocessed through several steps: - Missing values were removed to prevent distortion in the count. - Whitespace was trimmed using str_trim() to avoid discrepancies due to leading or trailing spaces. - Text standardisation was applied: all category names were converted to Title Case using str_to_title(). - Common naming inconsistencies were addressed, such as replacing variations like "plastic" with "Plastics" and harmonising "HW" and "hazardous wastes" under a single label. - The term "and" was normalised to "&" for alignment across entries.

These cleaning steps ensured that semantically identical entries were not treated as distinct, which is crucial for generating accurate statistics and visualisations in later sections of the report. The cleaned list of unique categories is presented in the table above.

Q2: Negative Feedback Comments (Score 2 or 3)

negative_feedback <- wastes %>%
  filter(str_detect(Description, "Environmental Impact Score: [23]/10"))
nrow(negative_feedback)
## [1] 6312

Answer: There are 6312 feedback comments with an environmental impact score of 2 or 3, which indicate a negative environmental outcome.

Explanation: To identify negative feedback cases, I filtered the Description column for any text containing an Environmental Impact Score of 2 or 3 using the regular expression:“Environmental Impact Score: [23]/10”

Q3: Fraction of Waste Tonnes by Source per Category

wastes_clean <- wastes %>%
  filter(!is.na(Category), !is.na(Stream), Tonnes > 0) %>%
  mutate(Category = str_trim(Category),
         Category = str_replace_all(Category, "and", "&"),
         Category = str_to_title(Category)) %>%
  mutate(Category = case_when(
    str_to_lower(Category) %in% c("plastic") ~ "Plastics",
    str_to_lower(Category) %in% c("organic") ~ "Organics",
    str_to_lower(Category) == "metals & plastics" ~ "Metals & Plastic",
    str_to_lower(Category) %in% c("hw", "hazardous wastes") ~ "Hazardous Wastes",
    TRUE ~ Category
  ))

# Calculate fraction of waste per Category-Stream combo
waste_fraction <- wastes_clean %>%
  group_by(Category, Stream) %>%
  summarise(Total_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop") %>%
  group_by(Category) %>%
  mutate(Fraction = Total_Tonnes / sum(Total_Tonnes)) %>%
  ungroup()

# Visualize: stacked bar chart
ggplot(waste_fraction, aes(x = reorder(Category, -Fraction), y = Fraction, fill = Stream)) +
  geom_bar(stat = "identity", width = 0.7) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  labs(
    title = "Fraction of Waste Streams per Category",
    x = "Waste Category",
    y = "Fraction of Total Tonnes",
    fill = "Waste Source"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "right",
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank()
  )

Answer: The stacked bar chart above displays the fraction of waste tonnes by source (Stream) for each waste category. Each bar represents a waste category, and the segments within each bar show the proportion of total tonnes coming from municipal solid waste (MSW), commercial & industrial (C&I), and construction & demolition (C&D) sources.

Explanation: To generate this plot: - The dataset was filtered to include only records with valid Category, Stream, and positive Tonnes values. - Categories were cleaned and standardised to ensure consistency (e.g., harmonising naming variations like "HW" and "Hazardous Wastes"). - Waste tonnes were aggregated by Category and Stream, then converted into fractions relative to the total waste for each category. - A stacked bar chart was created using ggplot2, with percentages displayed on the y-axis for clear interpretation.

This visualisation provides insight into how different waste categories are distributed across waste sources. For example: - Organics and MSW are closely linked, indicating household-generated organic waste is a significant contributor. - C&D dominates categories like Soils and Concrete, bricks and asphalt, reflecting their origin in construction activities. - C&I contributes more heavily to categories like Plastics and Hazardous Wastes, suggesting that industrial processes are key producers of these materials.

This analysis helps identify the primary generators of each waste category and informs targeted waste management strategies.

Q4: State-wise Total Waste Tonnes Summary

# Step 1: Merge and clean
waste_merged <- wastes %>%
  left_join(year_state, by = c("Year_State_ID" = "ID"))

# Step 2: Compute total tonnes by Year and State, excluding bad entries
temp <- waste_merged %>%
  filter(!is.na(State) & State != "Australia") %>%
  group_by(Year, State) %>%
  summarise(Total_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop")

# Step 3: Compute summary statistics per State
summary_stats <- temp %>%
  group_by(State) %>%
  summarise(
    Total_Tonnes_Min = min(Total_Tonnes),
    Total_Tonnes_Max = max(Total_Tonnes),
    Total_Tonnes_Mean = mean(Total_Tonnes),
    .groups = "drop"
  ) %>%
  mutate(
    Total_Tonnes_Min = comma(Total_Tonnes_Min, accuracy = 0.01),
    Total_Tonnes_Max = comma(Total_Tonnes_Max, accuracy = 0.01),
    Total_Tonnes_Mean = comma(Total_Tonnes_Mean, accuracy = 0.01)
  )


# Step 4: Display table
kable(
  summary_stats,
  caption = "Statistical Summary of Total Waste Tonnes per State",
  align = c("l", "r", "r", "r")
) %>%
  kable_styling(full_width = FALSE, position = "center")
Statistical Summary of Total Waste Tonnes per State
State Total_Tonnes_Min Total_Tonnes_Max Total_Tonnes_Mean
ACT 10,254.25 939,356.31 261,622.42
NSW 1,117,130.60 21,707,178.83 13,952,193.65
NT 31,372.45 247,839.88 153,095.13
Qld 1,182,725.76 13,242,399.75 8,112,047.12
SA 350,233.78 4,641,795.18 2,317,120.21
Tas 270,085.88 787,983.47 532,748.27
Vic 1,035,186.42 15,468,345.78 10,142,362.32
WA 435,379.31 6,551,575.45 3,866,050.28

Answer: The table above presents the minimum, maximum, and average total waste tonnes recorded annually for each Australian state between 2006 and 2021. These statistics exclude aggregated national values (i.e., rows where State == "Australia"), ensuring state-level granularity.

Explanation: To compute the summary: 1. I first merged the waste data with year_state to associate each waste record with its respective year and state. 2. Records where the State was missing or marked as "Australia" were excluded, as these do not represent individual states. 3. For each Year and State combination, the total waste tonnes were aggregated. 4. Summary statistics — minimum, maximum, and mean — were then computed for each state using a single summarising operation. 5. All values were formatted for readability using scales::comma().

This summary provides a concise yet informative overview of the variation in annual waste generation across states. For instance: - States with large populations and urban activity (e.g., NSW, Vic, Qld) show higher average and maximum waste outputs. - Smaller states like Tas and ACT report lower waste volumes, consistent with their size and economic activity. - The spread between min and max values also hints at changes in industrial or policy factors over time.

These insights can help environmental planners identify high-impact regions and assess the stability or volatility of waste generation patterns.

Q5: Yearly Trend of Food Organics

# Preprocess Year-State join
waste_joined <- wastes %>%
  filter(str_to_lower(Type) == "food organics") %>%
  left_join(year_state, by = c("Year_State_ID" = "ID")) %>%
  mutate(Year = str_extract(Year, "^\\d{4}")) %>%  # Extract first year
  mutate(Year = as.integer(Year)) %>%
  group_by(Year, State) %>%
  summarise(Total_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop")

years <- sort(unique(waste_joined$Year))

ggplot(waste_joined, aes(x = Year, y = Total_Tonnes, color = State)) +
  geom_line(size = 0.8) +
  geom_point(size = 1.2) +
  scale_x_continuous(
    breaks = years,  # Show every year on X-axis
    labels = years
  ) +
  labs(
    title = "Yearly Trend of Food Organics Waste Tonnes by State",
    x = "Year",
    y = "Total Tonnes",
    color = "State"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "right"
  )

Answer:
The line chart above displays the yearly trend of food organics waste tonnes for each state in Australia from 2006 to 2021. It reveals distinct temporal patterns in organic waste generation across states. States such as NSW, Vic, and Qld consistently reported the highest volumes, with noticeable increases over time, while smaller jurisdictions like ACT and Tas contributed lower amounts. The trend for most states shows gradual growth, with minor fluctuations that may relate to policy shifts or population changes.

Explanation:
To create this chart, I filtered the dataset to include only records where the Type is "Food organics". I then merged this subset with the Year_State_ID.csv dataset to obtain the corresponding Year and State values. The year was extracted from financial year formats (e.g., “2014–15” → “2014”) using str_extract() and converted to integer. I aggregated the total food organics waste by both Year and State, and visualised the result using a multi-line plot, where each line represents a different state. The X-axis was manually set to display every year for clarity, and the chart was formatted using theme_minimal() for clean presentation.

Q6: Waste Types Analysis

# Join and preprocess year info
waste_with_year <- wastes %>%
  left_join(year_state, by = c("Year_State_ID" = "ID")) %>%
  mutate(Year = as.integer(str_extract(Year, "^\\d{4}")))  # Extract first 4-digit year

# 1. Most recycled Type
most_recycled <- waste_with_year %>%
  filter(str_to_lower(Fate) == "recycling") %>%
  group_by(Type, Year) %>%
  summarise(Total_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(Total_Tonnes)) %>%
  slice(1)

print("Most Recycled Type and Year:")
## [1] "Most Recycled Type and Year:"
print(most_recycled)
## # A tibble: 1 × 3
##   Type                         Year Total_Tonnes
##   <chr>                       <int>        <dbl>
## 1 Bricks, concrete and pavers  2020    10478681.
# 2. Most disposed Type
most_disposed <- waste_with_year %>%
  filter(str_to_lower(Fate) == "disposal") %>%
  group_by(Type, Year) %>%
  summarise(Total_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(Total_Tonnes)) %>%
  slice(1)

print("Most Disposed Type and Year:")
## [1] "Most Disposed Type and Year:"
print(most_disposed)
## # A tibble: 1 × 3
##   Type           Year Total_Tonnes
##   <chr>         <int>        <dbl>
## 1 Food organics  2008      4089186
# 3. Type with most increase from earliest to latest year
type_trend <- waste_with_year %>%
  group_by(Type, Year) %>%
  summarise(Total_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop")

type_increase <- type_trend %>%
  group_by(Type) %>%
  summarise(
    Start_Year = min(Year),
    End_Year = max(Year),
    Start_Tonnes = Total_Tonnes[Year == Start_Year][1],
    End_Tonnes = Total_Tonnes[Year == End_Year][1],
    Increase = End_Tonnes - Start_Tonnes
  ) %>%
  arrange(desc(Increase)) %>%
  slice(1)

print("Waste Type with the Most Increase Across Years:")
## [1] "Waste Type with the Most Increase Across Years:"
print(type_increase)
## # A tibble: 1 × 6
##   Type                      Start_Year End_Year Start_Tonnes End_Tonnes Increase
##   <chr>                          <int>    <int>        <dbl>      <dbl>    <dbl>
## 1 Bricks, concrete and pav…       2006     2020      2133152  10478681. 8345529.

Answer:
The most recycled waste type is Bricks, concrete and pavers in the year 2020, with a total of 10,478,680.72 tonnes recycled.
The most disposed waste type is Food organics in the year 2008, with a total of 4,089,186.00 tonnes sent to disposal.
The waste type that experienced the largest increase in total tonnes from the first to the last year in the dataset is Bricks, concrete and pavers, increasing by 8,345,528.72 tonnes from 2006 to 2020.

Explanation:
To answer this question, I first merged the waste records with year and state metadata using Year_State_ID.csv. Then, I extracted the 4-digit year using str_extract() and converted it to an integer format.
- For the most recycled and most disposed types, I filtered the data by Fate, grouped it by Type and Year, and calculated the total waste tonnes for each combination. I sorted the results in descending order and selected the top record.
- To identify the waste type with the greatest increase over time, I grouped data by Type and Year, calculated annual totals, and then compared each type’s total in the first and last year. The difference was calculated and sorted to identify the maximum increase.
These analyses provide insights into which materials are being effectively recycled, which are most commonly disposed of, and which are growing most rapidly — useful for shaping future waste management priorities.

Q7: Factors Influencing Environmental Impact Score

# ---- 1. Preprocessing and Merging ----
env_data <- wastes %>%
  mutate(Impact_Score = as.numeric(str_extract(Description, "(?<=Score: )\\d+"))) %>%
  left_join(year_state, by = c("Year_State_ID" = "ID")) %>%
  mutate(
    Year = as.integer(str_extract(Year, "^\\d{4}")),
    Fate = str_to_title(str_trim(Fate)),
    Stream = str_to_upper(str_trim(Stream)),
    Type = str_trim(Type),
    Category = str_to_title(str_trim(Category))
  ) %>%
  filter(!is.na(Impact_Score), Tonnes > 0) %>%
  mutate(
    Fate = factor(Fate, levels = c("Disposal", "Energy Recovery", "Long-Term Storage", "Recycling", "Waste Reuse")),
    Category = reorder(Category, Impact_Score, FUN = median, na.rm = TRUE)
  )

# ---- 2. Impact Score by Waste Category ----
ggplot(env_data, aes(x = Category, y = Impact_Score, fill = Category)) +
  geom_boxplot() +
  labs(title = "Environmental Impact Score by Waste Category", x = "Waste Category", y = "Impact Score") +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"
  )

# ---- 3. Impact Score vs Waste Tonnes ----
ggplot(env_data, aes(x = Tonnes, y = Impact_Score)) +
  geom_point(alpha = 0.3) +
  scale_x_log10(labels = scales::comma) +
  labs(title = "Tonnes vs Environmental Impact Score", x = "Waste Tonnes", y = "Impact Score") +
  theme_minimal()

# ---- 4. Impact Score by Stream ----
ggplot(env_data, aes(x = Stream, y = Impact_Score, fill = Stream)) +
  geom_boxplot() +
  labs(title = "Impact Score by Waste Stream", x = "Stream", y = "Environmental Impact Score") +
  theme_minimal()

# ---- 5. Impact Score vs Economic Growth ----
ggscatter(env_data, x = "Economic_Growth", y = "Impact_Score",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Economic Growth (%)", ylab = "Impact Score")

# ---- 6. Impact Score by Core/Non-core ----
ggplot(env_data, aes(x = `Core_Non-core`, y = Impact_Score, fill = `Core_Non-core`)) +
  geom_boxplot() +
  labs(title = "Impact Score by Core/Non-core Waste", x = "Core Type", y = "Impact Score") +
  theme_minimal()

# ---- 7. Impact Score by State ----
state_summary <- env_data %>%
  group_by(State) %>%
  summarise(mean_score = mean(Impact_Score, na.rm = TRUE))

ggplot(state_summary, aes(x = reorder(State, mean_score), y = mean_score, fill = State)) +
  geom_col() +
  coord_flip() +
  labs(title = "Average Impact Score by State", x = "State", y = "Impact Score") +
  theme_minimal()

# ---- 8. Impact Score Over Time ----
year_summary <- env_data %>%
  group_by(Year) %>%
  summarise(mean_score = mean(Impact_Score, na.rm = TRUE))

ggplot(year_summary, aes(x = Year, y = mean_score)) +
  geom_line(color = "steelblue", size = 1.2) +
  geom_point() +
  labs(title = "Trend of Environmental Impact Score Over Time", x = "Year", y = "Mean Impact Score") +
  theme_minimal()

Answer:
The environmental impact score is influenced by several waste-related factors and varies significantly across waste categories, streams, treatment methods (fates), and locations. The most notable findings are:

Explanation:
This analysis was based on cleaned and merged datasets from Wastes.csv and Year_State_ID.csv, including extraction of embedded impact scores from text descriptions. Visualisations were used to compare scores across categories (boxplots), evaluate relationships (scatterplots and time series), and understand patterns at the state and national levels.

The findings demonstrate that environmental impact is primarily driven by the nature and treatment of waste, not the volume or economic conditions. These insights could inform more targeted sustainability policies, such as encouraging recycling of low-scoring categories or reviewing waste practices in underperforming states.

Q8: Hazardous Tyres Analysis

# Step 1: Filter the data
tyres_data <- wastes %>%
  filter(
    str_to_lower(Category) == "hazardous wastes",
    str_to_lower(Type) == "tyres (t140)",
    Tonnes > 0
  )

# Step 2: Add Tonnes_range column
tyres_data <- tyres_data %>%
  mutate(Tonnes_range = case_when(
    Tonnes >= 0 & Tonnes < 10000 ~ "[0, 10000)",
    Tonnes >= 10000 & Tonnes < 20000 ~ "[10000, 20000)",
    Tonnes >= 20000 & Tonnes < 40000 ~ "[20000, 40000)",
    Tonnes >= 40000 & Tonnes <= 80000 ~ "[40000, 80000]",
    TRUE ~ "Out of range"
  ))

# Step 3: Join with year_state to get State
tyres_data <- tyres_data %>%
  left_join(year_state, by = c("Year_State_ID" = "ID"))

# Step 4: Plot count of Tonnes_range per State
library(ggplot2)

ggplot(tyres_data, aes(x = State, fill = Tonnes_range)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Distribution of Hazardous Waste Tyres by Tonnes Range and State",
    x = "State",
    y = "Number of Records",
    fill = "Tonnes Range"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Answer:
The bar chart illustrates the distribution of hazardous waste tyre records by tonnes range across Australian states. The most common volume category is [0, 10,000) tonnes, dominating in ACT, NT, and Tas with nearly 50 records each. Higher volume categories such as [10,000, 20,000) and [20,000, 40,000) appear more frequently in states like SA, Vic, and Qld, suggesting large-scale tyre disposal or treatment facilities.
NSW shows a relatively even spread across all four ranges, while WA displays moderate activity across categories, with slightly fewer in the lowest range compared to others.

This variation suggests differences in both the scale of waste generation and regional disposal infrastructure.

Explanation:
To produce this analysis: - I filtered the dataset to retain only records for hazardous wastes of type “Tyres (T140)” with positive tonne values. - A new column named Tonnes_range was created using case_when() to classify each record into one of four predefined volume brackets. - The dataset was then merged with Year_State_ID.csv to retrieve the associated state for each record. - Finally, I generated a dodge-style bar chart to show the count of records in each volume range by state.

This approach allows us to assess both quantity distribution and regional disposal patterns for tyre-related hazardous waste. The dominance of low-volume records in ACT, NT, and Tas may reflect smaller population sizes or more frequent reporting of small batches. In contrast, states like SA and Qld handling more mid-to-large volume entries may suggest centralized collection systems or industrial-level waste producers.

Q9: Investigation of Factors Influencing Yearly Trend of Total C&D Waste Tonnes

cd_waste <- wastes %>%
  left_join(year_state, by = c("Year_State_ID" = "ID")) %>%
  mutate(Year = as.integer(str_extract(Year, "^\\d{4}"))) %>%
  filter(Stream == "C&D") %>%
  group_by(Year) %>%
  summarise(CD_Tonnes = sum(Tonnes, na.rm = TRUE), .groups = "drop")
# Load and clean ABS data
abs_data <- read_excel("ABS_Building_Trend.xlsx", sheet = "Data1",skip = 9)

colnames(abs_data)[1] <- "Date"
colnames(abs_data)[22] <- "Construction_Value"
construction_data <- abs_data %>%
  select(Date, Construction_Value) %>%
  filter(!is.na(Date), !is.na(Construction_Value)) %>%
  mutate(
    Quarter = as.yearqtr(Date, format = "%b-%Y"),  # Convert "Sep-1978" format
    Year = year(Quarter),
    Construction_Value = as.numeric(Construction_Value)
  ) %>%
  filter(Year >= 2006, Year <= 2020) %>%
  group_by(Year) %>%
  summarise(Yearly_Construction = sum(Construction_Value, na.rm = TRUE)) %>%
  ungroup()
cd_waste <- wastes %>%
  left_join(year_state, by = c("Year_State_ID" = "ID")) %>%
  mutate(Year = as.integer(str_extract(Year, "^\\d{4}"))) %>%
  filter(Stream == "C&D") %>%
  group_by(Year) %>%
  summarise(CD_Tonnes = sum(Tonnes, na.rm = TRUE)) %>%
  ungroup()
cd_vs_building <- inner_join(cd_waste, construction_data, by = "Year")

# Normalize values to 0–1 for comparison
cd_vs_building_scaled <- cd_vs_building %>%
  mutate(
    CD_Waste_Scaled = (CD_Tonnes - min(CD_Tonnes)) / (max(CD_Tonnes) - min(CD_Tonnes)),
    Building_Scaled = (Yearly_Construction - min(Yearly_Construction)) / 
                      (max(Yearly_Construction) - min(Yearly_Construction))
  )
ggplot(cd_vs_building_scaled, aes(x = Year)) +
  geom_line(aes(y = CD_Waste_Scaled, color = "C&D Waste (Normalized)"), size = 1.2) +
  geom_line(aes(y = Building_Scaled, color = "Building Activity (Normalized)"), size = 1.2) +
  labs(
    title = "Normalized Yearly Trend: C&D Waste vs Building Activity",
    x = "Year",
    y = "Normalized Value (0–1 Scale)",
    color = "Metric"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.title = element_blank()
  )

Answer:
The comparison between C&D (Construction and Demolition) waste and building activity from 2006 to 2020 shows a clear upward trend in both metrics, particularly after 2013. The normalized plot reveals that while building activity experienced a sharp peak around 2010 followed by a decline, C&D waste volumes rose steadily and even surpassed construction growth from 2013 onwards.

This pattern indicates a partial correlation—higher building activity tends to be associated with more C&D waste, but the rate of waste generation may outpace construction growth in later years, possibly due to: - Increased urban renewal or demolition projects - Inefficient material reuse - Greater waste reporting or stricter regulations in recent years

The divergence after 2015 also suggests that waste minimization practices may not have improved proportionally with construction efficiency.

External Dataset Source - Link: ABS Table A83779075K - Import Instruction:
The file was downloaded from ABS, and the "Data1" sheet was imported using read_excel("ABS_Building_Trend.xlsx", sheet = "Data1", skip = 9). Column 1 contains dates, and column 21 contains total building value across all sectors (Trend series). I converted quarterly dates to years and aggregated values.

Insights: The analysis confirms that building activity is a key factor influencing C&D waste trends. Peaks and dips in construction output strongly correlate with changes in waste generation, reinforcing the conclusion that economic cycles, infrastructure projects, and urban development directly affect the volume of C&D waste produced in Australia.

References