by Katie Butler-Manuel
I have decided to investigate the relationship between Recycling Rates and Household Income in different boroughs of London, testing the hypothesis that higher-income households recycle a greater proportion of their rubbish than lower-income households. Both the Household Income and Recycling Rates datasets are for 2012/2013, the most recent period for which household income data was available, and were downloaded from the Office for National Statistics.
Below is a scatter plot indicating the relationship between Recycling Rates and Household Income for the 32 different boroughs of London. A least squares line (linear model) has been added to the plot to describe this relationship.
library(ggplot2)
# Load the combined 2012/2013 recycling and income data
combined2013 <- read.csv("/Users/katherinebutler-manuel/Library/Mobile Documents/com~apple~CloudDocs/CCMF/combined2013.csv")
# Scatter plot with a least squares line and its confidence band
ggplot(data = combined2013, aes(x = Recycling_Rates, y = Average_Income)) +
  geom_point() +
  xlab("Recycling Rate per borough (%)") +
  ylab("Average Household Income per borough (£)") +
  ggtitle("Relationship between Household Income and Recycling Rates in London") +
  geom_smooth(method = "lm", color = "red", fill = "#69b3a2")
## `geom_smooth()` using formula 'y ~ x'
The properties of the linear model are summarised as follows:
##
## Call:
## lm(formula = Average_Income ~ Recycling_Rates, data = combined2013)
##
## Residuals:
##    Min     1Q Median     3Q    Max
##  -9827  -3545  -1984   3180  16904
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)     38148.00    4281.26    8.91 6.25e-10 ***
## Recycling_Rates    21.85     121.38    0.18    0.858
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6207 on 30 degrees of freedom
## Multiple R-squared: 0.001079, Adjusted R-squared: -0.03222
## F-statistic: 0.03239 on 1 and 30 DF, p-value: 0.8584
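The summary above is what `summary()` prints for a fitted `lm` object, and the formula is visible in its `Call:` line. As a rough sketch of how such a summary might be produced and queried, the block below fits the same formula to a simulated stand-in data frame. The object `toy` and its values are made up for illustration and are not the borough data, so the numbers it prints will differ from those reported here.

```r
# Simulated stand-in for combined2013: made-up values, NOT the real borough data.
set.seed(1)
toy <- data.frame(
  Recycling_Rates = runif(32, min = 15, max = 55),       # percent
  Average_Income  = rnorm(32, mean = 39000, sd = 6000)   # pounds
)

# Same model formula as in the Call line of the output above.
fit <- lm(Average_Income ~ Recycling_Rates, data = toy)
fit_summary <- summary(fit)
print(fit_summary)

# Individual pieces of the summary can be pulled out by name.
slope_p   <- coef(fit_summary)["Recycling_Rates", "Pr(>|t|)"]  # p-value of the slope
r_squared <- fit_summary$r.squared                             # Multiple R-squared
```

With 32 rows and two estimated coefficients, the fit has 30 residual degrees of freedom, matching the figure in the output above.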
The R-squared value of 0.001079 means the model explains only about 0.1% of the variation in Household Income, and the slope is not significantly different from zero (p = 0.858), so there is no evidence of a linear relationship between Recycling Rates and Household Income in London. However, it is possible that the high-income, low-recycling data point (Kensington and Chelsea) is an outlier that is influencing the least squares line.
The scatter plot and accompanying least squares line are shown below, excluding the data for this borough.
Note: the dplyr package is needed to use the filter() function.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## `geom_smooth()` using formula 'y ~ x'
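The filtering step is not shown in the text, but the `Call:` line of the second model indicates that rows were kept with `filter(Average_Income < 55000)`. A minimal sketch of that step follows, again using a simulated data frame (`toy`, with one deliberately high-income row) rather than the real data. The plot styling is an assumption based on the first plot.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in for combined2013 (NOT the real data), with one
# deliberately high-income row playing the role of the excluded borough.
set.seed(2)
toy <- data.frame(
  Recycling_Rates = runif(32, min = 15, max = 55),
  Average_Income  = runif(32, min = 30000, max = 50000)
)
toy$Average_Income[1] <- 58000

# Keep only rows below the income threshold that appears in the Call line.
toy_filtered <- toy %>% filter(Average_Income < 55000)

# Refit the same model on the reduced data.
fit_filtered <- lm(Average_Income ~ Recycling_Rates, data = toy_filtered)
print(summary(fit_filtered))

# Build the scatter plot with its least squares line; print(p) would draw it.
p <- ggplot(toy_filtered, aes(x = Recycling_Rates, y = Average_Income)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "red")
```

Filtering on an income threshold removes whichever rows exceed it, so it only isolates a single borough when exactly one row lies above 55000, as is the case in this toy data.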
The linear model characteristics for this scatter plot are:
##
## Call:
## lm(formula = Average_Income ~ Recycling_Rates, data = combined2013 %>%
## filter(Average_Income < 55000))
##
## Residuals:
##    Min     1Q Median     3Q    Max
##  -8541  -2907  -1344   2765  14217
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)     35697.81    3817.90   9.350 2.96e-10 ***
## Recycling_Rates    77.29     107.45   0.719    0.478
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5423 on 29 degrees of freedom
## Multiple R-squared: 0.01753, Adjusted R-squared: -0.01635
## F-statistic: 0.5174 on 1 and 29 DF, p-value: 0.4777
The R-squared value is slightly higher when the data from Kensington and Chelsea are excluded (0.01753 as opposed to 0.001079). However, both values are close to 0 and neither slope is statistically significant (p = 0.478 and p = 0.858 respectively), so the data provide no evidence of a linear relationship between Household Income and Recycling Rates in London.
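The two figures quoted above are the Multiple R-squared values from the model summaries. For a linear model with a single predictor, R-squared equals the square of the Pearson correlation coefficient, so the corresponding correlations are roughly 0.03 and 0.13. The sketch below checks that identity on simulated values (not the borough data) and then takes the square roots of the two reported figures.

```r
# Simulated values for illustration only, NOT the borough data.
set.seed(3)
x <- runif(32, min = 15, max = 55)
y <- rnorm(32, mean = 39000, sd = 6000)

r  <- cor(x, y)                        # Pearson correlation coefficient
r2 <- summary(lm(y ~ x))$r.squared     # Multiple R-squared from the fit

# For a one-predictor linear model these agree up to floating-point error.
print(all.equal(r^2, r2))

# Magnitude of the correlation implied by each reported R-squared value.
implied_r <- sqrt(c(all_boroughs = 0.001079, one_excluded = 0.01753))
print(round(implied_r, 2))
```

Both fitted slopes in the summaries are positive, so the implied correlations are positive as well, though still weak.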