Break Free From Plastic is a global movement whose main goal is a future free from plastic pollution. To date, more than 13,000 organizations and individuals around the world have joined it to demand reductions in single-use plastics and to combat the plastic pollution crisis. Sarah Sauve, a member of the movement, and a few of her friends carried out an audit in St. John's as a contribution to the global audit, which aims to identify the sources of plastics within a city. Looking into the data, we saw a variety of variables describing the plastics, including the parent company behind each source of plastic (parent_company), the country it came from (country), and the grand total of plastics (grand_total). Based on this, we decided to ask: which continent produces the most single-use plastics? We chose the total amount of plastics as our response variable and continent as our potential explanatory variable.
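library(tidyverse) # dplyr verbs (mutate, case_when, group_by) and ggplot2
library(ggpubr) # ggbarplot()
# Assumes `plastics` has already been loaded from the TidyTuesday 2021-01-26 data
# Map each country to a continent; names that match no list become 'Unknown'
plastics_with_continent <- plastics %>%
mutate(
continent = case_when(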
country %in% c('United States', 'Canada', 'Mexico', 'Guatemala', 'Belize', 'Honduras', 'El Salvador', 'Nicaragua', 'Costa Rica', 'Panama', 'Cuba', 'Jamaica', 'Trinidad and Tobago', 'Barbados', 'Saint Lucia', 'Grenada', 'Saint Vincent and the Grenadines', 'Antigua and Barbuda', 'Saint Kitts and Nevis', 'Dominica', 'Bahamas', 'Cayman Islands', 'Bermuda', 'Turks and Caicos Islands') ~ 'North America',
country %in% c('Brazil', 'Argentina', 'Colombia', 'Chile', 'Peru', 'Venezuela', 'Ecuador', 'Paraguay', 'Bolivia', 'Uruguay', 'Guyana', 'Suriname', 'French Guiana') ~ 'South America',
country %in% c('Germany', 'France', 'Italy', 'Spain', 'United Kingdom', 'Sweden', 'Netherlands', 'Belgium', 'Norway', 'Poland', 'Greece', 'Denmark', 'Portugal', 'Finland', 'Ireland', 'Switzerland', 'Austria', 'Czech Republic', 'Hungary', 'Romania', 'Bulgaria', 'Slovakia', 'Croatia', 'Slovenia', 'Estonia', 'Latvia', 'Lithuania', 'Malta', 'Luxembourg', 'Cyprus', 'Albania', 'Kosovo', 'North Macedonia', 'Montenegro', 'Serbia', 'Bosnia and Herzegovina', 'Moldova', 'Belarus', 'Ukraine', 'Russia') ~ 'Europe',
country %in% c('India', 'China', 'Japan', 'South Korea', 'Indonesia', 'Pakistan', 'Bangladesh', 'Vietnam', 'Philippines', 'Thailand', 'Myanmar', 'Malaysia', 'Singapore', 'Nepal', 'Sri Lanka', 'Afghanistan', 'Kazakhstan', 'Uzbekistan', 'Turkmenistan', 'Kyrgyzstan', 'Tajikistan', 'Armenia', 'Azerbaijan', 'Georgia', 'Mongolia', 'North Korea', 'Laos', 'Cambodia', 'Brunei') ~ 'Asia',
country %in% c('Nigeria', 'South Africa', 'Kenya', 'Egypt', 'Ethiopia', 'Uganda', 'Ghana', 'Tanzania', 'Morocco', 'Algeria', 'Angola', 'Sudan', 'Mozambique', 'Zambia', 'Zimbabwe', 'Namibia', 'Botswana', 'Tunisia', 'Madagascar', 'Rwanda', 'Burkina Faso', 'Mali', 'Niger', 'Malawi', 'Sierra Leone', 'Liberia', 'Mauritania', 'Cote d\'Ivoire', 'Togo', 'Benin', 'Gabon', 'Congo (Republic)', 'Congo (Democratic Republic)', 'Somalia', 'Central African Republic', 'Chad', 'Eritrea', 'South Sudan', 'Lesotho', 'Eswatini', 'Comoros', 'Seychelles', 'Mauritius', 'Cape Verde', 'Saint Helena', 'Ascension Island', 'Equatorial Guinea') ~ 'Africa',
country %in% c('Australia', 'New Zealand', 'Papua New Guinea', 'Fiji', 'Solomon Islands', 'Vanuatu', 'Samoa', 'Tonga', 'Kiribati', 'Tuvalu', 'Marshall Islands', 'Palau', 'Micronesia', 'Nauru', 'Cook Islands', 'Niue', 'French Polynesia', 'New Caledonia', 'Wallis and Futuna') ~ 'Oceania',
TRUE ~ 'Unknown'))
new_df_0 <- plastics_with_continent[, c("parent_company", "grand_total", "continent")]
new_df_0
## # A tibble: 13,380 × 3
## parent_company grand_total continent
## <chr> <dbl> <chr>
## 1 Grand Total 2668 South America
## 2 Unbranded 1838 South America
## 3 The Coca-Cola Company 257 South America
## 4 Secco 43 South America
## 5 Doble Cola 38 South America
## 6 Pritty 29 South America
## 7 PepsiCo 27 South America
## 8 Casoni 26 South America
## 9 Villa Del Sur - Levite 20 South America
## 10 Manaos 18 South America
## # ℹ 13,370 more rows
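Because `case_when()` matches country names exactly, any spelling or capitalization that differs from the lists above falls through to 'Unknown'. The following is a minimal sketch of how unmatched names could be listed before they are dropped. The toy rows, the `lookup` vector, and the variant spellings are hypothetical and stand in for the real `plastics` data.

```r
# Toy stand-in for the audit data; these rows are illustrative, not real values.
toy <- data.frame(
  country = c("Argentina", "NIGERIA", "United States of America", "Japan"),
  grand_total = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Hypothetical miniature version of the country-to-continent lists.
lookup <- c(
  "Argentina" = "South America",
  "Nigeria" = "Africa",
  "United States" = "North America",
  "Japan" = "Asia"
)

# Exact-name lookup: names missing from `lookup` come back as NA.
toy$continent <- unname(lookup[toy$country])
toy$continent[is.na(toy$continent)] <- "Unknown"

# Countries that would be silently excluded once 'Unknown' is filtered out.
unmatched <- sort(unique(toy$country[toy$continent == "Unknown"]))
share_unknown <- mean(toy$continent == "Unknown")

print(unmatched)
print(share_unknown)
```

Running the same kind of check on `plastics_with_continent` would show which country spellings need to be added to the lists, which matters here because a large share of rows ends up in the 'Unknown' group.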
# Sum grand_total for each continent and parent company
new_df <- new_df_0 %>%
group_by(continent, parent_company) %>%
summarise(total_grand_total = sum(grand_total, na.rm = TRUE)) %>%
arrange(continent, desc(total_grand_total))
## `summarise()` has grouped output by 'continent'. You can override using the
## `.groups` argument.
new_df
## # A tibble: 11,587 × 3
## # Groups: continent [7]
## continent parent_company total_grand_total
## <chr> <chr> <dbl>
## 1 Africa null 49290
## 2 Africa Unbranded 25228
## 3 Africa Grand Total 23499
## 4 Africa The Coca-Cola Company 9719
## 5 Africa Pure Water, Inc. 4988
## 6 Africa Inconnu 2996
## 7 Africa Pepsico 2288
## 8 Africa Blow-Chem Industries 1890
## 9 Africa Rite Foods Limited 1703
## 10 Africa Voltic Ghana Limited 1605
## # ℹ 11,577 more rows
is.factor(new_df$continent)
## [1] FALSE
new_df$continent <- as.factor(new_df$continent)
levels(new_df$continent)
## [1] "Africa" "Asia" "Europe" "North America"
## [5] "Oceania" "South America" "Unknown"
df.2 <- subset(new_df, continent != "Unknown")
df.2
## # A tibble: 8,262 × 3
## # Groups: continent [6]
## continent parent_company total_grand_total
## <fct> <chr> <dbl>
## 1 Africa null 49290
## 2 Africa Unbranded 25228
## 3 Africa Grand Total 23499
## 4 Africa The Coca-Cola Company 9719
## 5 Africa Pure Water, Inc. 4988
## 6 Africa Inconnu 2996
## 7 Africa Pepsico 2288
## 8 Africa Blow-Chem Industries 1890
## 9 Africa Rite Foods Limited 1703
## 10 Africa Voltic Ghana Limited 1605
## # ℹ 8,252 more rows
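# Keep rows with totals of at most 10,000; this drops the 9 largest rows, including the "null", "Unbranded", and "Grand Total" rows shown above
df.2 <- subset(df.2, subset = total_grand_total <= 10000)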
df.2
## # A tibble: 8,253 × 3
## # Groups: continent [6]
## continent parent_company total_grand_total
## <fct> <chr> <dbl>
## 1 Africa The Coca-Cola Company 9719
## 2 Africa Pure Water, Inc. 4988
## 3 Africa Inconnu 2996
## 4 Africa Pepsico 2288
## 5 Africa Blow-Chem Industries 1890
## 6 Africa Rite Foods Limited 1703
## 7 Africa Voltic Ghana Limited 1605
## 8 Africa Bakhresa Group 1513
## 9 Africa Philip Morris International 1269
## 10 Africa Master Chef 1142
## # ℹ 8,243 more rows
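The 10,000 cap removes the largest placeholder rows, but a placeholder row with a smaller total would pass through it. As a hedged alternative, the sketch below filters by name instead. The `placeholders` vector is an assumption about which labels are not real companies, and the Asia rows in the toy data are made up for illustration.

```r
# Toy data: the Africa values mirror the output above; the Asia rows are hypothetical.
toy <- data.frame(
  continent = c("Africa", "Africa", "Africa", "Africa", "Asia", "Asia"),
  parent_company = c("null", "Unbranded", "Grand Total",
                     "The Coca-Cola Company", "Grand Total", "PepsiCo"),
  total_grand_total = c(49290, 25228, 23499, 9719, 500, 27),
  stringsAsFactors = FALSE
)

# Assumed list of labels that are aggregates or missing brands, not companies.
placeholders <- c("null", "unbranded", "grand total")

# Filter by name (case-insensitive) versus filter by a numeric cap.
by_name <- subset(toy, !tolower(parent_company) %in% placeholders)
by_cap <- subset(toy, total_grand_total <= 10000)

print(by_name)
print(by_cap)
```

In this toy example the cap keeps the small hypothetical "Grand Total" row for Asia, while the name filter removes it.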
mean(df.2$total_grand_total)
## [1] 28.4814
median(df.2$total_grand_total)
## [1] 1
summary(df.2$total_grand_total)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 1.00 1.00 28.48 5.00 9719.00
sd(df.2$total_grand_total, na.rm = TRUE)
## [1] 288.6059
# Histogram of the per-company totals
hist(df.2$total_grand_total,
main = "Distribution of Grand Total of Plastics",
xlab = "Grand Total of Plastics",
ylab = "Frequency")
# Boxplot of totals by continent
ggplot(df.2, aes(x = continent, y = total_grand_total)) + geom_boxplot()
# Bar plot of continent means with standard-error bars (ggpubr)
ggbarplot(df.2, x="continent", y="total_grand_total",add="mean_se",label=FALSE,lab.vjust = -1.6)
PlasticLM<-lm(total_grand_total~continent,data=df.2)
qqnorm(resid(PlasticLM))
qqline(resid(PlasticLM))
hist(resid(PlasticLM))
When checking normality with the QQ plot, the points toward the upper end form a curve rather than following the line, which tells us the residuals are skewed to the right. The histogram of the residuals likewise suggests that the equal-variance assumption is violated. Because both assumptions appear to be violated, we ran a Kruskal-Wallis test, a rank-based method for comparing a quantitative variable across multiple groups (commonly interpreted as a comparison of medians), in addition to a one-way ANOVA.
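The equal-variance judgment above is visual. One way to complement it with numbers is to compare group standard deviations and run a Fligner-Killeen test, which is not part of the original analysis and is named here only as one option. The sketch below uses simulated right-skewed data; the group labels and rates are arbitrary assumptions, not the plastics data.

```r
# Simulated right-skewed data with one much more variable group (illustrative only).
set.seed(1)
sim <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 200)),
  value = c(rexp(200, rate = 1), rexp(200, rate = 1), rexp(200, rate = 0.05))
)

# Standard deviation per group: a quick numeric look at spread.
spread <- tapply(sim$value, sim$group, sd)
print(spread)

# Fligner-Killeen test of homogeneity of variances (stats package).
res <- fligner.test(value ~ group, data = sim)
print(res)
```

With the report's data, the analogous calls would use `total_grand_total ~ continent` and `data = df.2`.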
kruskal.test(total_grand_total~continent,data=df.2)
##
## Kruskal-Wallis rank sum test
##
## data: total_grand_total by continent
## Kruskal-Wallis chi-squared = 165.87, df = 5, p-value < 2.2e-16
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
is.factor(df.2$continent)
## [1] TRUE
colnames(df.2)
## [1] "continent" "parent_company" "total_grand_total"
str(df.2)
## gropd_df [8,253 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
## $ continent : Factor w/ 7 levels "Africa","Asia",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ parent_company : chr [1:8253] "The Coca-Cola Company" "Pure Water, Inc." "Inconnu" "Pepsico" ...
## $ total_grand_total: num [1:8253] 9719 4988 2996 2288 1890 ...
## - attr(*, "groups")= tibble [6 × 2] (S3: tbl_df/tbl/data.frame)
## ..$ continent: Factor w/ 7 levels "Africa","Asia",..: 1 2 3 4 5 6
## ..$ .rows : list<int> [1:6]
## .. ..$ : int [1:723] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ : int [1:4407] 724 725 726 727 728 729 730 731 732 733 ...
## .. ..$ : int [1:1979] 5131 5132 5133 5134 5135 5136 5137 5138 5139 5140 ...
## .. ..$ : int [1:345] 7110 7111 7112 7113 7114 7115 7116 7117 7118 7119 ...
## .. ..$ : int [1:70] 7455 7456 7457 7458 7459 7460 7461 7462 7463 7464 ...
## .. ..$ : int [1:729] 7525 7526 7527 7528 7529 7530 7531 7532 7533 7534 ...
## .. ..@ ptype: int(0)
## ..- attr(*, ".drop")= logi TRUE
df.2 <- df.2%>%
ungroup()
dunn_test(df.2, total_grand_total ~ continent)
## # A tibble: 15 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 total_gra… Africa Asia 723 4407 -7.81 5.89e-15 6.48e-14 ****
## 2 total_gra… Africa Europe 723 1979 -8.75 2.06e-18 2.68e-17 ****
## 3 total_gra… Africa North… 723 345 -11.5 1.01e-30 1.51e-29 ****
## 4 total_gra… Africa Ocean… 723 70 -0.290 7.72e- 1 7.72e- 1 ns
## 5 total_gra… Africa South… 723 729 -8.81 1.22e-18 1.71e-17 ****
## 6 total_gra… Asia Europe 4407 1979 -2.48 1.30e- 2 5.21e- 2 ns
## 7 total_gra… Asia North… 4407 345 -7.88 3.15e-15 3.78e-14 ****
## 8 total_gra… Asia Ocean… 4407 70 2.30 2.15e- 2 6.45e- 2 ns
## 9 total_gra… Asia South… 4407 729 -3.73 1.88e- 4 1.32e- 3 **
## 10 total_gra… Europe North… 1979 345 -6.40 1.52e-10 1.52e- 9 ****
## 11 total_gra… Europe Ocean… 1979 70 2.83 4.66e- 3 2.33e- 2 *
## 12 total_gra… Europe South… 1979 729 -1.90 5.80e- 2 1.16e- 1 ns
## 13 total_gra… North… Ocean… 345 70 5.48 4.36e- 8 3.92e- 7 ****
## 14 total_gra… North… South… 345 729 4.46 8.17e- 6 6.54e- 5 ****
## 15 total_gra… Ocean… South… 70 729 -3.41 6.57e- 4 3.94e- 3 **
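The `p.adj` column above appears consistent with a Holm correction applied to the 15 raw p-values (for example, the smallest raw value, 1.01e-30, times 15 gives about 1.51e-29). As a sketch, the adjustment can be reproduced from the rounded `p` column with base R's `p.adjust()`. Because the inputs are rounded, the results match the table only to about three significant figures.

```r
# Raw p-values copied (rounded) from the `p` column of the Dunn test output,
# in the same row order as the table.
p_raw <- c(5.89e-15, 2.06e-18, 1.01e-30, 7.72e-1, 1.22e-18,
           1.30e-2, 3.15e-15, 2.15e-2, 1.88e-4, 1.52e-10,
           4.66e-3, 5.80e-2, 4.36e-8, 8.17e-6, 6.57e-4)

# Holm step-down adjustment for multiple comparisons.
p_holm <- p.adjust(p_raw, method = "holm")

# Number of pairwise comparisons that remain significant at the 0.05 level.
n_sig <- sum(p_holm < 0.05)

print(signif(p_holm, 3))
print(n_sig)
```

The Asia versus Europe comparison shows why the adjustment matters: its raw p-value is below 0.05, but the adjusted value is not.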
PlasticLM
##
## Call:
## lm(formula = total_grand_total ~ continent, data = df.2)
##
## Coefficients:
## (Intercept) continentAsia continentEurope
## 85.03 -60.25 -71.31
## continentNorth America continentOceania continentSouth America
## -66.15 -61.12 -45.22
anova(PlasticLM)
## Analysis of Variance Table
##
## Response: total_grand_total
## Df Sum Sq Mean Sq F value Pr(>F)
## continent 5 2930263 586053 7.0618 1.347e-06 ***
## Residuals 8247 684406543 82989
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null hypothesis: The mean amount of plastic produced is the same across all continents. Alternative hypothesis: At least one continent differs from the others in the mean amount of plastic produced. The ANOVA table above gives F(5, 8247) = 7.06 and p = 1.35e-06, and the sums of squares give R² ≈ 0.0043. Because p < 0.05, we can reject the null hypothesis: the result is statistically significant, and at least one continent likely differs from the others in plastic production. The R² value tells us that continent explains only about 0.43% of the variation in the amount of plastic produced.
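The F statistic and R² can be checked directly from the sums of squares and degrees of freedom printed in the ANOVA table above. The following sketch only redoes that arithmetic; the variable names are ours.

```r
# Values copied from the ANOVA table for lm(total_grand_total ~ continent).
ss_continent <- 2930263
ss_resid <- 684406543
df_continent <- 5
df_resid <- 8247

# F is the ratio of the between-group to the residual mean square.
f_value <- (ss_continent / df_continent) / (ss_resid / df_resid)

# R-squared is the share of the total sum of squares explained by continent.
r_squared <- ss_continent / (ss_continent + ss_resid)

print(round(f_value, 4))
print(round(r_squared, 4))
```

The same R² is reported by `summary(PlasticLM)$r.squared`.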
Because the Kruskal-Wallis test gives a p-value below 0.05, we can reject the null hypothesis that plastic production is the same across all continents; the results provide statistical evidence that the amount of plastic produced differs among continents. In the Dunn post-hoc test, Africa differs significantly (adjusted p < 0.05) from Asia, Europe, North America, and South America, but not from Oceania, and the three smallest adjusted p-values all involve Africa. In the bar plot, the bar for Africa is the highest, indicating that Africa has the highest mean total of plastics, while the remaining continents have lower totals.
This analysis matters because plastic pollution is a major contributor to the global environmental crisis, since most plastic does not break down. Knowing where the most plastic is produced can help prioritize which continents to focus on, especially in policy making aimed at reducing plastic waste and minimizing its impact on human and ecosystem health.
One limitation of this analysis is that it does not explain why the differences in the data exist. It also has a limited scope: it does not account for the type of plastic, recycling rates, and similar factors.
Break Free From Plastic. (n.d.). About. https://www.breakfreefromplastic.org/about/
Reddy, S. (2018, September 24). Plastic Pollution Affects Sea Life Throughout the Ocean. Pewtrusts.org; The Pew Charitable Trusts. https://www.pewtrusts.org/en/research-and-analysis/articles/2018/09/24/plastic-pollution-affects-sea-life-throughout-the-ocean
rfordatascience. (2021). tidytuesday/data/2021/2021-01-26/readme.md at main · rfordatascience/tidytuesday. GitHub. https://github.com/rfordatascience/tidytuesday/blob/main/data/2021/2021-01-26/readme.md
Whitlock, M. C., & Schluter, D. (2025). Resources for The Analysis of Biological Data. Zoology.ubc.ca. https://whitlockschluter3e.zoology.ubc.ca/index.html