Mr. Trash Wheel

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
trash <- read.csv("Trash Wheels.csv")

The first column is not needed: the dumpster number carries no meaningful information for this analysis.

trash <- subset(trash, select = -Dumpster)
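
The trash_by_year data frame used in the next section is not defined in the code shown here. The sketch below shows one way such a yearly summary might be built. The raw column names (Weight, PlasticBottles) and the toy values are assumptions for illustration only and are not taken from the CSV.

```r
library(dplyr)

# Toy stand-in for the raw data: one row per dumpster deployment.
# Column names and values are assumed, not read from "Trash Wheels.csv".
trash_toy <- data.frame(
  Year = c(2014, 2014, 2015, 2015, 2015),
  Weight = c(4.3, 2.7, 3.1, 2.9, 3.4),
  PlasticBottles = c(1450, 1120, 2300, 1980, 2100)
)

# One row per year: number of deployments plus yearly totals.
trash_by_year_toy <- trash_toy |>
  group_by(Year) |>
  summarise(
    deployments = n(),
    weight = sum(Weight, na.rm = TRUE),
    plastic_bottles = sum(PlasticBottles, na.rm = TRUE)
  )

print(trash_by_year_toy)
```

Counting rows with n() gives a plain integer column. A count built with table() would instead carry class table, which may be what triggers the scale messages that ggpairs prints.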

Linear Regression

library(GGally)
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
ggpairs(trash_by_year, progress = FALSE)
Don't know how to automatically pick scale for object of type <table>.
Defaulting to continuous.

Is it more meaningful to look at the rate at which each type of trash is collected? Here rpd stands for rate per deployment: each yearly total divided by the number of deployments in that year.
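
As a small illustration of the rate calculation, the sketch below divides yearly totals by the deployment count on made-up numbers. The toy data frame and its values are assumptions, and dplyr::across() is used here only as a compact way to apply the same division to several columns.

```r
library(dplyr)

# Made-up yearly totals (not from the Trash Wheel data).
toy_by_year <- data.frame(
  Year = c(2014, 2015),
  deployments = c(40, 50),
  weight = c(120, 200),
  plastic_bottles = c(80000, 125000)
)

# Divide each total by that year's deployments and add an "_rpd" suffix.
toy_rpd <- toy_by_year |>
  mutate(across(c(weight, plastic_bottles),
                ~ .x / deployments,
                .names = "{.col}_rpd"))

print(toy_rpd)
```

Rates put years with different numbers of deployments on a comparable footing, which raw totals do not.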

trash_by_year_rpd <- mutate(trash_by_year, 
                            weight_rpd = weight / deployments,
                            plastic_bottles_rpd = plastic_bottles / deployments,
                            cigarette_butts_rpd = cigarette_butts / deployments,
                            plastic_bags_rpd = plastic_bags / deployments,
                            wrappers_rpd = wrappers / deployments,
                            polystyrene_rpd = polystyrene / deployments,
                            glass_bottles_rpd = glass_bottles / deployments)
ggpairs(trash_by_year_rpd, columns = 10:16, progress = FALSE)
Don't know how to automatically pick scale for object of type <table>.
Defaulting to continuous.

Backward Elimination

library(ggfortify)
fit1 <- lm(Year ~
             plastic_bottles_rpd + 
             cigarette_butts_rpd +
             plastic_bags_rpd +
             wrappers_rpd +
             polystyrene_rpd +
             glass_bottles_rpd,
           data = trash_by_year_rpd)
summary(fit1)

Call:
lm(formula = Year ~ plastic_bottles_rpd + cigarette_butts_rpd + 
    plastic_bags_rpd + wrappers_rpd + polystyrene_rpd + glass_bottles_rpd, 
    data = trash_by_year_rpd)

Residuals:
       1        2        3        4        5        6        7        8 
-0.33656 -0.75477  0.25248  0.02239  0.76111 -0.41942  0.37659 -0.11553 
       9       10 
 0.96963 -0.75594 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          2.039e+03  8.787e+00 232.034 1.77e-07 ***
plastic_bottles_rpd -2.573e-02  8.770e-03  -2.934   0.0608 .  
cigarette_butts_rpd -4.353e-03  7.007e-03  -0.621   0.5785    
plastic_bags_rpd    -6.013e-03  9.733e-03  -0.618   0.5805    
wrappers_rpd         1.665e-02  5.170e-03   3.221   0.0486 *  
polystyrene_rpd     -2.793e-02  1.350e-02  -2.070   0.1303    
glass_bottles_rpd   -1.671e-02  8.108e-03  -2.061   0.1314    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.028 on 3 degrees of freedom
Multiple R-squared:  0.9616,    Adjusted R-squared:  0.8848 
F-statistic: 12.52 on 6 and 3 DF,  p-value: 0.03144
autoplot(fit1, 1:4, nrow = 2, ncol = 2)

The multiple R-squared is 0.9616, and the adjusted R-squared, which penalizes the six predictors fitted to only ten observations, is 0.8848.

Plastic bags has the highest p-value (0.5805), so it is the first predictor eliminated.
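
The elimination step can also be done programmatically. The sketch below is a minimal illustration on the built-in mtcars data, not on the trash data: it reads the p-values from the coefficient table, finds the predictor with the largest one, and refits without it. The model formula is an arbitrary choice for the example.

```r
# Full model on a built-in dataset (illustrative only).
fit_full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Coefficient table: one row per term, p-values in column "Pr(>|t|)".
coefs <- summary(fit_full)$coefficients
p_values <- coefs[rownames(coefs) != "(Intercept)", "Pr(>|t|)"]

# Predictor with the highest p-value is the candidate for removal.
worst <- names(which.max(p_values))

# Refit with that predictor dropped from the formula.
fit_reduced <- update(fit_full, as.formula(paste(". ~ . -", worst)))

print(worst)
print(formula(fit_reduced))
```

Repeating this step until every remaining p-value falls below a chosen threshold is the usual backward elimination loop; here each step is kept manual so the diagnostics can be inspected along the way.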

fit2 <- lm(Year ~
             plastic_bottles_rpd + 
             cigarette_butts_rpd +
             wrappers_rpd +
             polystyrene_rpd +
             glass_bottles_rpd,
           data = trash_by_year_rpd)
summary(fit2)

Call:
lm(formula = Year ~ plastic_bottles_rpd + cigarette_butts_rpd + 
    wrappers_rpd + polystyrene_rpd + glass_bottles_rpd, data = trash_by_year_rpd)

Residuals:
       1        2        3        4        5        6        7        8 
-0.68670 -0.56646  0.15605  0.24012  0.99815 -0.64025  0.45093  0.04193 
       9       10 
 0.74020 -0.73396 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          2.035e+03  6.103e+00 333.515 4.85e-10 ***
plastic_bottles_rpd -2.199e-02  5.840e-03  -3.766   0.0197 *  
cigarette_butts_rpd -4.656e-03  6.427e-03  -0.724   0.5089    
wrappers_rpd         1.704e-02  4.718e-03   3.610   0.0225 *  
polystyrene_rpd     -2.546e-02  1.185e-02  -2.148   0.0981 .  
glass_bottles_rpd   -1.271e-02  4.491e-03  -2.830   0.0473 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.945 on 4 degrees of freedom
Multiple R-squared:  0.9567,    Adjusted R-squared:  0.9026 
F-statistic: 17.68 on 5 and 4 DF,  p-value: 0.007851
autoplot(fit2, 1:4, nrow = 2, ncol = 2)

The adjusted R-squared has gone up slightly to 0.9026.
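
The adjusted R-squared can rise when a predictor is dropped even though the multiple R-squared falls, because it rescales the unexplained share by the residual degrees of freedom: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), with n observations and p predictors. The sketch below applies that standard formula to the rounded values reported in the summaries above; the helper name adj_r2 is introduced here for illustration.

```r
# Adjusted R-squared from multiple R-squared, sample size n, and p predictors.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Multiple R-squared values as printed by summary() for fit1 and fit2.
reported <- data.frame(
  model = c("fit1", "fit2"),
  r2 = c(0.9616, 0.9567),
  p = c(6, 5)
)

# Ten yearly observations in both fits.
reported$adj_r2 <- adj_r2(reported$r2, n = 10, p = reported$p)

print(reported)
```

Because the inputs are rounded to four decimals, the results agree with the printed adjusted values only up to rounding.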

With a p-value of 0.5089, cigarette butts is the next on the chopping block.

fit3 <- lm(Year ~
             plastic_bottles_rpd +
             wrappers_rpd +
             polystyrene_rpd +
             glass_bottles_rpd,
           data = trash_by_year_rpd)
summary(fit3)

Call:
lm(formula = Year ~ plastic_bottles_rpd + wrappers_rpd + polystyrene_rpd + 
    glass_bottles_rpd, data = trash_by_year_rpd)

Residuals:
      1       2       3       4       5       6       7       8       9      10 
-0.8827 -0.4578  0.1158  0.3654  1.1188 -0.5011  0.3602 -0.4938  0.8799 -0.5047 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          2.032e+03  3.870e+00 525.054 4.76e-13 ***
plastic_bottles_rpd -2.013e-02  4.989e-03  -4.035  0.00997 ** 
wrappers_rpd         1.516e-02  3.749e-03   4.043  0.00990 ** 
polystyrene_rpd     -2.119e-02  9.776e-03  -2.167  0.08244 .  
glass_bottles_rpd   -1.044e-02  3.061e-03  -3.411  0.01903 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.899 on 5 degrees of freedom
Multiple R-squared:  0.951, Adjusted R-squared:  0.9118 
F-statistic: 24.27 on 4 and 5 DF,  p-value: 0.001794
autoplot(fit3, 1:4, nrow = 2, ncol = 2)

The adjusted R-squared is now 0.9118. Polystyrene is the only remaining predictor that is not significant at the 0.05 level (p = 0.0824). Can the model be improved by removing it?

fit4 <- lm(Year ~
             plastic_bottles_rpd +
             wrappers_rpd +
             glass_bottles_rpd,
           data = trash_by_year_rpd)
summary(fit4)

Call:
lm(formula = Year ~ plastic_bottles_rpd + wrappers_rpd + glass_bottles_rpd, 
    data = trash_by_year_rpd)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.0576 -0.6737 -0.1413  0.4898  1.8518 

Coefficients:
                      Estimate Std. Error  t value Pr(>|t|)    
(Intercept)          2.024e+03  1.020e+00 1983.910  < 2e-16 ***
plastic_bottles_rpd -1.644e-02  5.961e-03   -2.758 0.032962 *  
wrappers_rpd         1.025e-02  3.801e-03    2.698 0.035679 *  
glass_bottles_rpd   -1.571e-02  2.367e-03   -6.637 0.000565 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.143 on 6 degrees of freedom
Multiple R-squared:  0.905, Adjusted R-squared:  0.8575 
F-statistic: 19.05 on 3 and 6 DF,  p-value: 0.001807
autoplot(fit4, 1:4, nrow = 2, ncol = 2)

Although the adjusted R-squared has gone down, it is still relatively high at 0.8575. The line estimated in the Residuals vs Fitted plot is flatter and closer to 0. The Q-Q plot appears unchanged. Observations 2 and 5 (2015 and 2018, respectively) are labeled in the diagnostic plots. What will happen if they are removed from the model?
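
The observations that the diagnostic plots label can also be inspected numerically. The sketch below is a minimal illustration on the built-in mtcars data, not on the trash data: it ranks observations by Cook's distance and refits without the two most influential ones. The model and the choice of two points are assumptions made for the example.

```r
# Illustrative model on a built-in dataset.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Cook's distance measures how much each observation shifts the fit.
cooks <- cooks.distance(fit)

# Names of the two most influential observations.
top <- names(sort(cooks, decreasing = TRUE))[1:2]
print(top)

# Refit without them and compare adjusted R-squared.
fit_trimmed <- lm(mpg ~ wt + hp,
                  data = mtcars[!(rownames(mtcars) %in% top), ])

print(c(full = summary(fit)$adj.r.squared,
        trimmed = summary(fit_trimmed)$adj.r.squared))
```

A higher adjusted R-squared after trimming is not by itself a reason to drop points; removal needs a justification that comes from the data collection, not from the fit.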

adj_trash_by_year_rpd <- trash_by_year_rpd[-c(2,5),]
fit5 <- lm(Year ~
             plastic_bottles_rpd +
             wrappers_rpd +
             glass_bottles_rpd,
           data = adj_trash_by_year_rpd)
summary(fit5)

Call:
lm(formula = Year ~ plastic_bottles_rpd + wrappers_rpd + glass_bottles_rpd, 
    data = adj_trash_by_year_rpd)

Residuals:
       1        2        3        4        5        6        7        8 
 0.01732  0.36102 -0.89532 -0.09520  0.67918 -0.47092  0.26109  0.14283 

Coefficients:
                      Estimate Std. Error  t value Pr(>|t|)    
(Intercept)          2.025e+03  6.312e-01 3207.743 5.67e-14 ***
plastic_bottles_rpd -2.553e-02  4.868e-03   -5.245 0.006320 ** 
wrappers_rpd         1.165e-02  2.515e-03    4.631 0.009797 ** 
glass_bottles_rpd   -1.586e-02  1.379e-03  -11.505 0.000326 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6544 on 4 degrees of freedom
Multiple R-squared:  0.9748,    Adjusted R-squared:  0.9559 
F-statistic:  51.6 on 3 and 4 DF,  p-value: 0.00118
autoplot(fit5, 1:4, nrow = 2, ncol = 2)

Removing 2015 and 2018 increases the adjusted R-squared to 0.9559. However, the diagnostic plots for this fit label three observations, and the Residuals vs Fitted plot suggests a linear model may no longer be appropriate. Without additional information, it does not make sense to remove these two years. fit4 remains the best model.

trash_by_year_rpd_long <- trash_by_year_rpd[,c(1, 11:16)] |>
  pivot_longer(2:7)
library(streamgraph)
library(RColorBrewer)

streamgraph(trash_by_year_rpd_long, key = name, value = value, date = Year, 
            order = "inside-out", interpolate = "cardinal") |>
   sg_fill_brewer("BrBG") |>
  sg_axis_x(tick_interval = 1, tick_units = "Year") |>
  sg_title(title = "Types of Trash per Average Dumpster")
Warning in widget_html(name, package, id = x$id, style = css(width =
validateCssUnit(sizeInfo$width), : streamgraph_html returned an object of class
`list` instead of a `shiny.tag`.
Warning: `bindFillRole()` only works on htmltools::tag() objects (e.g., div(),
p(), etc.), not objects of type 'list'.
[Streamgraph: Types of Trash per Average Dumpster]
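
The streamgraph call above takes long-format data with one row per year and trash type. The sketch below shows that reshaping step on made-up numbers; the toy values are assumptions, and names_to and values_to are spelled out only to make explicit the default column names (name, value) that the call relies on.

```r
library(tidyr)

# Made-up wide data: one row per year, one column per trash type.
wide <- data.frame(
  Year = c(2014, 2015),
  plastic_bottles_rpd = c(2000, 2500),
  wrappers_rpd = c(1500, 1800)
)

# Long format: one row per (Year, trash type) pair.
long <- pivot_longer(wide,
                     cols = -Year,
                     names_to = "name",
                     values_to = "value")

print(long)
```

Selecting columns by name, as with cols = -Year here, is less fragile than positional indices such as 2:7 if the column order ever changes.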