Introduction

Break Free From Plastic is a global movement whose main goal is a future free from plastic pollution. At present, more than 13,000 organizations and individuals around the world have come together to demand a reduction in single-use plastics to combat the plastic pollution crisis. Sarah Suave, a member of the movement, and a few of her friends carried out an audit in St. John's as a contribution to the global audit, which aims to understand the sources of plastics within the city. When we looked into the data, we saw a variety of variables describing the plastics, including the parent_company that is the source of the plastic, the country it came from, and the grand total of plastics. Based on this, we decided to ask: which continent produces the most single-use plastic? We chose the total amount of plastics as our response variable and continent as our potential explanatory variable.

[Image: PlasticPollution]

Exploring the Data (Descriptive Statistics)

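library(dplyr)    # %>%, mutate(), case_when(), group_by(), summarise()
library(ggplot2)  # ggplot(), geom_boxplot()
library(ggpubr)   # ggbarplot()

# `plastics` is assumed to be already loaded as a data frame
plastics_with_continent <- plastics %>%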
  mutate(
    continent = case_when(
      country %in% c('United States', 'Canada', 'Mexico', 'Guatemala', 'Belize', 'Honduras', 'El Salvador', 'Nicaragua', 'Costa Rica', 'Panama', 'Cuba', 'Jamaica', 'Trinidad and Tobago', 'Barbados', 'Saint Lucia', 'Grenada', 'Saint Vincent and the Grenadines', 'Antigua and Barbuda', 'Saint Kitts and Nevis', 'Dominica', 'Bahamas', 'Cayman Islands', 'Bermuda', 'Turks and Caicos Islands') ~ 'North America',
      country %in% c('Brazil', 'Argentina', 'Colombia', 'Chile', 'Peru', 'Venezuela', 'Ecuador', 'Paraguay', 'Bolivia', 'Uruguay', 'Guyana', 'Suriname', 'French Guiana') ~ 'South America',
      country %in% c('Germany', 'France', 'Italy', 'Spain', 'United Kingdom', 'Sweden', 'Netherlands', 'Belgium', 'Norway', 'Poland', 'Greece', 'Denmark', 'Portugal', 'Finland', 'Ireland', 'Switzerland', 'Austria', 'Czech Republic', 'Hungary', 'Romania', 'Bulgaria', 'Slovakia', 'Croatia', 'Slovenia', 'Estonia', 'Latvia', 'Lithuania', 'Malta', 'Luxembourg', 'Cyprus', 'Albania', 'Kosovo', 'North Macedonia', 'Montenegro', 'Serbia', 'Bosnia and Herzegovina', 'Moldova', 'Belarus', 'Ukraine', 'Russia') ~ 'Europe',
      country %in% c('India', 'China', 'Japan', 'South Korea', 'Indonesia', 'Pakistan', 'Bangladesh', 'Vietnam', 'Philippines', 'Thailand', 'Myanmar', 'Malaysia', 'Singapore', 'Nepal', 'Sri Lanka', 'Afghanistan', 'Kazakhstan', 'Uzbekistan', 'Turkmenistan', 'Kyrgyzstan', 'Tajikistan', 'Armenia', 'Azerbaijan', 'Georgia', 'Mongolia', 'North Korea', 'Laos', 'Cambodia', 'Brunei') ~ 'Asia',
      country %in% c('Nigeria', 'South Africa', 'Kenya', 'Egypt', 'Ethiopia', 'Uganda', 'Ghana', 'Tanzania', 'Morocco', 'Algeria', 'Angola', 'Sudan', 'Mozambique', 'Zambia', 'Zimbabwe', 'Namibia', 'Botswana', 'Tunisia', 'Madagascar', 'Rwanda', 'Burkina Faso', 'Mali', 'Niger', 'Malawi', 'Sierra Leone', 'Liberia', 'Mauritania', 'Cote d\'Ivoire', 'Togo', 'Benin', 'Gabon', 'Congo (Republic)', 'Congo (Democratic Republic)', 'Somalia', 'Central African Republic', 'Chad', 'Eritrea', 'South Sudan', 'Lesotho', 'Eswatini', 'Comoros', 'Seychelles', 'Mauritius', 'Cape Verde', 'Saint Helena', 'Ascension Island', 'Equatorial Guinea') ~ 'Africa',
      country %in% c('Australia', 'New Zealand', 'Papua New Guinea', 'Fiji', 'Solomon Islands', 'Vanuatu', 'Samoa', 'Tonga', 'Kiribati', 'Tuvalu', 'Marshall Islands', 'Palau', 'Micronesia', 'Nauru', 'Cook Islands', 'Niue', 'French Polynesia', 'New Caledonia', 'Wallis and Futuna') ~ 'Oceania',
      TRUE ~ 'Unknown'
    )
  )

new_df_0 <- plastics_with_continent[, c("parent_company", "grand_total", "continent")]
new_df_0

## # A tibble: 13,380 × 3
##    parent_company         grand_total continent    
##    <chr>                        <dbl> <chr>        
##  1 Grand Total                   2668 South America
##  2 Unbranded                     1838 South America
##  3 The Coca-Cola Company          257 South America
##  4 Secco                           43 South America
##  5 Doble Cola                      38 South America
##  6 Pritty                          29 South America
##  7 PepsiCo                         27 South America
##  8 Casoni                          26 South America
##  9 Villa Del Sur - Levite          20 South America
## 10 Manaos                          18 South America
## # ℹ 13,370 more rows
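
Hand-written country lists like the ones in the `case_when()` call are easy to get subtly wrong: a country can end up in two lists (only the first match is used), or in none (it becomes 'Unknown'). The following is a minimal base R sketch of how such lists could be checked. The shortened `continent_lookup` list and the `toy_countries` vector are hypothetical stand-ins, not the full lists or the audit data.

```r
# Hypothetical, shortened lookup lists (not the full lists used in the analysis)
continent_lookup <- list(
  "North America" = c("United States", "Canada", "Belize"),
  "South America" = c("Brazil", "Argentina", "Belize"),
  "Europe"        = c("Germany", "France")
)

# Flatten the named list into a two-column data frame
lookup_df <- stack(continent_lookup)
names(lookup_df) <- c("country", "continent")

# Countries that appear under more than one continent
dup_countries <- unique(lookup_df$country[duplicated(lookup_df$country)])
dup_countries

# Countries present in the (toy) data that no list covers;
# in a case_when() mapping these would fall through to 'Unknown'
toy_countries <- c("Brazil", "Germany", "Taiwan")
unmatched <- setdiff(toy_countries, lookup_df$country)
unmatched
```

Running the same kind of check against the real country column would show how many rows fall into 'Unknown' only because of spelling or naming differences.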

# Cleaned-up data frame: total plastics per continent and parent company
new_df <- new_df_0 %>%
  group_by(continent, parent_company) %>%
  summarise(total_grand_total = sum(grand_total, na.rm = TRUE)) %>%
  arrange(continent, desc(total_grand_total))

## `summarise()` has grouped output by 'continent'. You can override using the
## `.groups` argument.

new_df

## # A tibble: 11,587 × 3
## # Groups:   continent [7]
##    continent parent_company        total_grand_total
##    <chr>     <chr>                             <dbl>
##  1 Africa    null                              49290
##  2 Africa    Unbranded                         25228
##  3 Africa    Grand Total                       23499
##  4 Africa    The Coca-Cola Company              9719
##  5 Africa    Pure Water, Inc.                   4988
##  6 Africa    Inconnu                            2996
##  7 Africa    Pepsico                            2288
##  8 Africa    Blow-Chem Industries               1890
##  9 Africa    Rite Foods Limited                 1703
## 10 Africa    Voltic Ghana Limited               1605
## # ℹ 11,577 more rows

is.factor(new_df$continent)

## [1] FALSE

new_df$continent <- as.factor(new_df$continent)
levels(new_df$continent)

## [1] "Africa"        "Asia"          "Europe"        "North America"
## [5] "Oceania"       "South America" "Unknown"

df.2 <- subset(new_df, continent != "Unknown")
df.2

## # A tibble: 8,262 × 3
## # Groups:   continent [6]
##    continent parent_company        total_grand_total
##    <fct>     <chr>                             <dbl>
##  1 Africa    null                              49290
##  2 Africa    Unbranded                         25228
##  3 Africa    Grand Total                       23499
##  4 Africa    The Coca-Cola Company              9719
##  5 Africa    Pure Water, Inc.                   4988
##  6 Africa    Inconnu                            2996
##  7 Africa    Pepsico                            2288
##  8 Africa    Blow-Chem Industries               1890
##  9 Africa    Rite Foods Limited                 1703
## 10 Africa    Voltic Ghana Limited               1605
## # ℹ 8,252 more rows

df.2 <- subset(df.2, subset = total_grand_total <= 10000)
df.2

## # A tibble: 8,253 × 3
## # Groups:   continent [6]
##    continent parent_company              total_grand_total
##    <fct>     <chr>                                   <dbl>
##  1 Africa    The Coca-Cola Company                    9719
##  2 Africa    Pure Water, Inc.                         4988
##  3 Africa    Inconnu                                  2996
##  4 Africa    Pepsico                                  2288
##  5 Africa    Blow-Chem Industries                     1890
##  6 Africa    Rite Foods Limited                       1703
##  7 Africa    Voltic Ghana Limited                     1605
##  8 Africa    Bakhresa Group                           1513
##  9 Africa    Philip Morris International              1269
## 10 Africa    Master Chef                              1142
## # ℹ 8,243 more rows
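
The `<= 10000` cutoff removes the largest aggregate rows (such as "Grand Total", "Unbranded", and "null" for Africa), but it works by size rather than by label. As an alternative sketch, and not what the analysis above did, such rows could be dropped by name. The toy data frame and the `non_brand_labels` vector below are hypothetical illustrations.

```r
# Hypothetical toy data mimicking the structure of df.2
toy <- data.frame(
  continent = c("Africa", "Africa", "Africa", "Asia", "Asia"),
  parent_company = c("Grand Total", "Unbranded", "The Coca-Cola Company",
                     "Grand Total", "PepsiCo"),
  total_grand_total = c(23499, 25228, 9719, 500, 120)
)

# Labels assumed (for illustration) to mark non-company rows
non_brand_labels <- c("Grand Total", "Unbranded", "null")

# Filter by label: drops every aggregate row regardless of its size
by_name <- subset(toy, !(parent_company %in% non_brand_labels))

# Filter by threshold: a small "Grand Total" row slips through
by_threshold <- subset(toy, total_grand_total <= 10000)

by_name
by_threshold
```

In the toy data the threshold filter keeps the Asia "Grand Total" row because its value is below the cutoff, while the label filter removes it.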

mean(df.2$total_grand_total)

## [1] 28.4814

median(df.2$total_grand_total)

## [1] 1

summary(df.2$total_grand_total)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    1.00    1.00   28.48    5.00 9719.00

sd(df.2$total_grand_total, na.rm = TRUE)

## [1] 288.6059
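
The mean, median, and standard deviation above pool all continents together, while the research question compares continents. A minimal sketch of per-group summaries with base R's `tapply()` follows. The toy values are hypothetical and only illustrate the pattern of a right-skewed group.

```r
# Hypothetical toy data: two continents, one with a large outlier
toy <- data.frame(
  continent = c("Africa", "Africa", "Africa", "Asia", "Asia", "Asia"),
  total_grand_total = c(1, 5, 90, 1, 1, 4)
)

# Summaries computed separately for each continent
group_mean   <- tapply(toy$total_grand_total, toy$continent, mean)
group_median <- tapply(toy$total_grand_total, toy$continent, median)
group_sd     <- tapply(toy$total_grand_total, toy$continent, sd)

# Combine into one table for reading side by side
group_summary <- cbind(mean = group_mean, median = group_median, sd = group_sd)
group_summary
```

A large gap between a group's mean and median, as in the toy Africa group, is the same right skew that the overall summary shows.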

# Histogram of the response variable
hist(df.2$total_grand_total,
     main = "Distribution of Grand Total of Plastics",
     xlab = "Grand Total of Plastics",
     ylab = "Frequency")

ggplot(df.2, aes(x=continent,y=total_grand_total)) + geom_boxplot()

# Bar plot of the mean (with standard error bars) per continent, using ggpubr
ggbarplot(df.2, x="continent", y="total_grand_total",add="mean_se",label=FALSE,lab.vjust = -1.6)

PlasticLM<-lm(total_grand_total~continent,data=df.2)
qqnorm(resid(PlasticLM))
qqline(resid(PlasticLM))

hist(resid(PlasticLM))

Statistical Test (Inferential Statistics)

When checking normality with the QQ plot, we see that the points toward the upper end form a curve rather than following the line, which tells us the residuals are skewed to the right. Looking at variance, the histogram of residuals suggests that the equal-variance assumption is also violated. Because both assumptions are violated, we rely on a Kruskal-Wallis test, a rank-based alternative to one-way ANOVA that compares the distribution (informally, the medians) of a quantitative variable across multiple groups. We also report the one-way ANOVA for comparison.
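
A formal test could complement the visual check of equal variance. The sketch below uses `fligner.test()` from base R's stats package, which is rank-based and therefore less sensitive to non-normal data. The simulated groups, their sizes, and the exponential rates are all hypothetical and are not the audit data.

```r
set.seed(42)  # make the simulated toy data reproducible

# Hypothetical right-skewed data for three groups with different spreads
toy <- data.frame(
  continent = factor(rep(c("A", "B", "C"), each = 50)),
  total = c(rexp(50, rate = 1 / 80),
            rexp(50, rate = 1 / 20),
            rexp(50, rate = 1 / 15))
)

# Fligner-Killeen test of homogeneity of variances across groups
variance_check <- fligner.test(total ~ continent, data = toy)
variance_check$p.value
```

A small p-value from this test would be evidence against equal variances, supporting the choice of a rank-based test.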

kruskal.test(total_grand_total~continent,data=df.2)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  total_grand_total by continent
## Kruskal-Wallis chi-squared = 165.87, df = 5, p-value < 2.2e-16
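
The Kruskal-Wallis output reports significance but not the size of the effect. One commonly used effect size is eta squared based on the H statistic, computed as (H - k + 1) / (n - k). The sketch below plugs in the values from the output above (H = 165.87, k = 6 continents, n = 8,253 rows). The choice of this particular effect size is an assumption made for illustration, not part of the original analysis.

```r
# Values taken from the Kruskal-Wallis output and the size of df.2
h_statistic <- 165.87  # Kruskal-Wallis chi-squared
k_groups    <- 6       # number of continents compared
n_obs       <- 8253    # number of rows in df.2

# Eta squared based on H: share of rank variance explained by group
eta_squared_h <- (h_statistic - k_groups + 1) / (n_obs - k_groups)
round(eta_squared_h, 4)
```

The result is roughly 0.02, which is usually read as a small effect even though the p-value is extremely small.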

library(rstatix)

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

is.factor(df.2$continent)

## [1] TRUE

colnames(df.2)

## [1] "continent"         "parent_company"    "total_grand_total"

str(df.2)

## gropd_df [8,253 × 3] (S3: grouped_df/tbl_df/tbl/data.frame)
##  $ continent        : Factor w/ 7 levels "Africa","Asia",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ parent_company   : chr [1:8253] "The Coca-Cola Company" "Pure Water, Inc." "Inconnu" "Pepsico" ...
##  $ total_grand_total: num [1:8253] 9719 4988 2996 2288 1890 ...
##  - attr(*, "groups")= tibble [6 × 2] (S3: tbl_df/tbl/data.frame)
##   ..$ continent: Factor w/ 7 levels "Africa","Asia",..: 1 2 3 4 5 6
##   ..$ .rows    : list<int> [1:6] 
##   .. ..$ : int [1:723] 1 2 3 4 5 6 7 8 9 10 ...
##   .. ..$ : int [1:4407] 724 725 726 727 728 729 730 731 732 733 ...
##   .. ..$ : int [1:1979] 5131 5132 5133 5134 5135 5136 5137 5138 5139 5140 ...
##   .. ..$ : int [1:345] 7110 7111 7112 7113 7114 7115 7116 7117 7118 7119 ...
##   .. ..$ : int [1:70] 7455 7456 7457 7458 7459 7460 7461 7462 7463 7464 ...
##   .. ..$ : int [1:729] 7525 7526 7527 7528 7529 7530 7531 7532 7533 7534 ...
##   .. ..@ ptype: int(0) 
##   ..- attr(*, ".drop")= logi TRUE
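
The `str()` output above shows that `continent` still has 7 levels even though only 6 groups remain after the "Unknown" rows were removed. Subsetting a factor keeps unused levels. A minimal sketch of dropping them with base R's `droplevels()` follows, using a small hypothetical factor rather than the real data.

```r
# Hypothetical factor with an "Unknown" level
continent <- factor(c("Africa", "Asia", "Unknown", "Europe"))

# Removing the "Unknown" values does not remove the level itself
kept <- continent[continent != "Unknown"]
levels_before <- nlevels(kept)

# droplevels() discards levels that no longer occur in the data
kept <- droplevels(kept)
levels_after <- nlevels(kept)

c(before = levels_before, after = levels_after)
```

Dropping the unused level keeps empty categories out of plots and summary tables.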

df.2 <- df.2%>%
  ungroup()
dunn_test(df.2, total_grand_total ~ continent)

## # A tibble: 15 × 9
##    .y.        group1 group2    n1    n2 statistic        p    p.adj p.adj.signif
##  * <chr>      <chr>  <chr>  <int> <int>     <dbl>    <dbl>    <dbl> <chr>       
##  1 total_gra… Africa Asia     723  4407    -7.81  5.89e-15 6.48e-14 ****        
##  2 total_gra… Africa Europe   723  1979    -8.75  2.06e-18 2.68e-17 ****        
##  3 total_gra… Africa North…   723   345   -11.5   1.01e-30 1.51e-29 ****        
##  4 total_gra… Africa Ocean…   723    70    -0.290 7.72e- 1 7.72e- 1 ns          
##  5 total_gra… Africa South…   723   729    -8.81  1.22e-18 1.71e-17 ****        
##  6 total_gra… Asia   Europe  4407  1979    -2.48  1.30e- 2 5.21e- 2 ns          
##  7 total_gra… Asia   North…  4407   345    -7.88  3.15e-15 3.78e-14 ****        
##  8 total_gra… Asia   Ocean…  4407    70     2.30  2.15e- 2 6.45e- 2 ns          
##  9 total_gra… Asia   South…  4407   729    -3.73  1.88e- 4 1.32e- 3 **          
## 10 total_gra… Europe North…  1979   345    -6.40  1.52e-10 1.52e- 9 ****        
## 11 total_gra… Europe Ocean…  1979    70     2.83  4.66e- 3 2.33e- 2 *           
## 12 total_gra… Europe South…  1979   729    -1.90  5.80e- 2 1.16e- 1 ns          
## 13 total_gra… North… Ocean…   345    70     5.48  4.36e- 8 3.92e- 7 ****        
## 14 total_gra… North… South…   345   729     4.46  8.17e- 6 6.54e- 5 ****        
## 15 total_gra… Ocean… South…    70   729    -3.41  6.57e- 4 3.94e- 3 **
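
The `p.adj` column above is consistent with a Holm correction of the raw `p` column. The sketch below copies the 15 raw and adjusted p-values from the table and reproduces the adjustment with base R's `p.adjust()`. It only re-derives the printed values and makes no assumption beyond the Holm method.

```r
# Raw p-values copied from the Dunn test table, rows 1 to 15
p_raw <- c(5.89e-15, 2.06e-18, 1.01e-30, 7.72e-1, 1.22e-18,
           1.30e-2, 3.15e-15, 2.15e-2, 1.88e-4, 1.52e-10,
           4.66e-3, 5.80e-2, 4.36e-8, 8.17e-6, 6.57e-4)

# Adjusted p-values as printed in the table, same row order
p_adj_reported <- c(6.48e-14, 2.68e-17, 1.51e-29, 7.72e-1, 1.71e-17,
                    5.21e-2, 3.78e-14, 6.45e-2, 1.32e-3, 1.52e-9,
                    2.33e-2, 1.16e-1, 3.92e-7, 6.54e-5, 3.94e-3)

# Holm correction: the smallest p is multiplied by 15, the next by 14, and so on
p_holm <- p.adjust(p_raw, method = "holm")

# Number of pairwise comparisons significant at the 0.05 level after adjustment
n_significant <- sum(p_holm < 0.05)
n_significant
```

After adjustment, 11 of the 15 pairwise comparisons remain significant at the 0.05 level, matching the significance column of the table.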

PlasticLM

## 
## Call:
## lm(formula = total_grand_total ~ continent, data = df.2)
## 
## Coefficients:
##            (Intercept)           continentAsia         continentEurope  
##                  85.03                  -60.25                  -71.31  
## continentNorth America        continentOceania  continentSouth America  
##                 -66.15                  -61.12                  -45.22

anova(PlasticLM)

## Analysis of Variance Table
## 
## Response: total_grand_total
##             Df    Sum Sq Mean Sq F value    Pr(>F)    
## continent    5   2930263  586053  7.0618 1.347e-06 ***
## Residuals 8247 684406543   82989                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
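
The F statistic and the share of variance explained by continent can be recomputed directly from the sums of squares and degrees of freedom in the table above. The sketch below uses only numbers printed in that table. The variable names are chosen for illustration.

```r
# Values copied from the ANOVA table
ss_continent <- 2930263     # Sum Sq for continent
ss_residual  <- 684406543   # Sum Sq for residuals
df_continent <- 5
df_residual  <- 8247

# F = mean square between groups / mean square within groups
f_value <- (ss_continent / df_continent) / (ss_residual / df_residual)

# R squared = explained sum of squares / total sum of squares
r_squared <- ss_continent / (ss_continent + ss_residual)

# Adjusted R squared penalizes for the number of predictors
n_obs <- df_continent + df_residual + 1
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / df_residual

round(c(F = f_value, R2 = r_squared, adj_R2 = adj_r_squared), 4)
```

The F value reproduces the 7.06 in the table, and both versions of R squared are about 0.004, so continent accounts for well under 1% of the variation.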

Null hypothesis: The amount of plastic produced is the same across all continents.

Alternative hypothesis: At least one continent differs from the others in the amount of plastic produced.

The one-way ANOVA gives F(5, 8247) = 7.06, p = 1.35e-06. Since p < 0.05, we reject the null hypothesis: the result is statistically significant, and there is likely at least one continent that differs in plastic production. However, the R squared value (about 0.004) tells us that continent explains only about 0.4% of the variation in the amount of plastic, so the effect is small.

Conclusion

Based on a p-value less than 0.05 from the Kruskal-Wallis test, we reject the null hypothesis that plastic production is the same across all continents. The results show statistical evidence that the amount of plastic produced is not the same across all continents. In the Dunn post-hoc test, Africa compared with Asia, Europe, North America, and South America has an adjusted p-value less than 0.05, so Africa differs significantly from four of the other five continents; only the comparison between Africa and Oceania is not significant. Based on the bar plot, Africa has the highest mean plastic total: its bar is the highest on the graph, whereas the rest of the continents have lower totals.

This analysis is important because plastic pollution is a major contributor to the global environmental crisis, since most plastic does not break down. The results can help prioritize which continents to focus on, especially in policy making, to reduce plastic waste production and minimize its impact on human and ecosystem health.

One limitation of this analysis is that it does not explain why the differences in the data exist. It also has a limited scope: it does not account for the type of plastic, recycling rates, etc.

Which Continent Contributes the Most Towards Plastics Pollution?

Ja-kenya Stewart and Stacy Larose

2025-04-29
