The Relationship between Recycling Rates and Household Income in the 32 boroughs of London

by Katie Butler-Manuel

Introduction

I have decided to investigate the relationship between Recycling Rates and Household Income in different boroughs of London, testing the hypothesis that higher-income households recycle a greater proportion of their rubbish than lower-income households. Both the Household Income and Recycling Rates datasets cover 2012/2013, the most recent period for which household income data was available, and were downloaded from the Office for National Statistics.

What is the relationship between Recycling Rates (%) and Household Income in London?

Below is a scatter plot indicating the relationship between Recycling Rates and Household Income for the 32 different boroughs of London. A least squares line (linear model) has been added to the plot to describe this relationship.

library(ggplot2)

# Load the combined 2012/2013 borough-level data (one row per borough)
combined2013 <- read.csv("/Users/katherinebutler-manuel/Library/Mobile Documents/com~apple~CloudDocs/CCMF/combined2013.csv")

# Scatter plot of the boroughs with a least squares line and its confidence band
ggplot(data = combined2013, aes(x = Recycling_Rates, y = Average_Income)) +
  geom_point() +
  xlab("Recycling Rate per borough (%)") +
  ylab("Average Household Income per borough (£)") +
  ggtitle("Relationship between Household Income and Recycling Rates in London") +
  geom_smooth(method = "lm", color = "red", fill = "#69b3a2")
## `geom_smooth()` using formula 'y ~ x'

What are the properties of the linear model? What does this tell us about the relationship between Recycling Rates and Household Income in different boroughs of London?

The properties of the linear model are summarised as follows:

## 
## Call:
## lm(formula = Average_Income ~ Recycling_Rates, data = combined2013)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9827  -3545  -1984   3180  16904 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     38148.00    4281.26    8.91 6.25e-10 ***
## Recycling_Rates    21.85     121.38    0.18    0.858    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6207 on 30 degrees of freedom
## Multiple R-squared:  0.001079,   Adjusted R-squared:  -0.03222 
## F-statistic: 0.03239 on 1 and 30 DF,  p-value: 0.8584

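The R-squared value of 0.001079 means that Recycling Rates explain only about 0.1% of the variation in Household Income between boroughs, and the slope is not statistically significant (p = 0.858), so the model shows no evidence of a linear relationship between Recycling Rates and Household Income in London. However, it is possible that the high-income, low-recycling-rate data point (Kensington and Chelsea) is an outlier that is influencing the least squares line.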
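The figures in the summary above are linked to one another, and a short sketch can make that explicit. It uses only numbers copied from the summary (the variable names are illustrative) and relies on the standard identities for a regression with a single predictor. Because the inputs are rounded, the results should match the reported t value, p-value and R-squared only up to rounding.

```r
# Values copied from the model summary above
estimate  <- 21.85    # slope for Recycling_Rates
std_error <- 121.38   # standard error of the slope
df_resid  <- 30       # residual degrees of freedom (32 boroughs - 2 parameters)

# t value: the estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# With one predictor, R-squared follows from the t value and the degrees of freedom
r_squared <- t_value^2 / (t_value^2 + df_resid)

print(round(c(t = t_value, p = p_value, r_squared = r_squared), 4))
```

A slope that is less than one fifth of its standard error cannot be distinguished from zero with 32 data points, which is why the p-value is so large.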

Does the relationship between Household Income and Recycling Rates become more positive when “Kensington and Chelsea” is excluded from the plot?

The scatter plot and accompanying least squares line are shown below, excluding the data for this borough.

Note: the dplyr package is needed to use the filter() command.

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## `geom_smooth()` using formula 'y ~ x'
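The code for this step is not shown in the report, but the model call in the summary below indicates that the borough was excluded by keeping only rows with Average_Income below 55000. A minimal sketch of that filtering step follows. The data frame toy_boroughs and its values are hypothetical placeholders, not the real borough figures; only the column names and the 55000 threshold come from the report.

```r
library(dplyr)

# Hypothetical placeholder rows, NOT the real borough figures
toy_boroughs <- data.frame(
  Borough         = c("A", "B", "C", "D", "E"),
  Recycling_Rates = c(20, 28, 35, 41, 25),
  Average_Income  = c(36000, 41000, 38000, 43000, 60000)
)

# Keep only boroughs below the income threshold used in the report's model call
below_threshold <- toy_boroughs %>% filter(Average_Income < 55000)

# Refit the least squares line on the remaining rows
toy_model <- lm(Average_Income ~ Recycling_Rates, data = below_threshold)
print(coef(toy_model))
```

Filtering on an income threshold only removes the intended borough if no other borough exceeds it; filtering on the borough name would be a more direct alternative where a name column is available.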

The properties of the linear model for this scatter plot are:

## 
## Call:
## lm(formula = Average_Income ~ Recycling_Rates, data = combined2013 %>% 
##     filter(Average_Income < 55000))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -8541  -2907  -1344   2765  14217 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     35697.81    3817.90   9.350 2.96e-10 ***
## Recycling_Rates    77.29     107.45   0.719    0.478    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5423 on 29 degrees of freedom
## Multiple R-squared:  0.01753,    Adjusted R-squared:  -0.01635 
## F-statistic: 0.5174 on 1 and 29 DF,  p-value: 0.4777

The R-squared value is slightly higher when the data from Kensington and Chelsea is excluded (0.01753 as opposed to 0.001079). However, both values are close to 0 and neither slope is statistically significant (p = 0.478 and p = 0.858 respectively), so there is still no evidence of a linear relationship between Household Income and Recycling Rates in London.
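The values 0.01753 and 0.001079 are R-squared values rather than correlation coefficients. For a model with a single predictor, the Pearson correlation coefficient has the same sign as the slope and a magnitude equal to the square root of R-squared. The sketch below applies that identity to numbers copied from the two summaries; the variable names are illustrative.

```r
# R-squared values and slope estimates copied from the two model summaries
r_squared <- c(all_boroughs = 0.001079, excluding_outlier = 0.01753)
slope     <- c(all_boroughs = 21.85,    excluding_outlier = 77.29)

# With one predictor, r takes the sign of the slope and magnitude sqrt(R-squared)
r <- sign(slope) * sqrt(r_squared)
print(round(r, 3))
```

Both correlations are positive but weak (roughly 0.03 and 0.13), which is consistent with the conclusion that income and recycling rates are not linearly related at borough level.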

Further analysis

  • Other possible variables to investigate could be age and / or education level of the population
  • It might also be interesting to correlate the Rate of Recycling with the quality of environment to see whether boroughs with higher Recycling Rates also have a higher quality of environment
  • Alternatively, it might be more useful to analyse this relationship using the raw data rather than the borough averages, as the averages might conceal some useful detail.
