Social Integration by Borough Analysis

Description:

The data used for this report was generated by the Greater London Authority and gives the percentage of people who agree that their local area is a place where people from different backgrounds get on well together. The question is asked in the Public Attitudes Survey, a continuous survey, and the data was last updated on March 31st, 2018. The London Datastore is a free and open data-sharing portal where anyone can access data relating to the capital.

This data covers 2012-2018 and includes the following locations:

* The thirty-two London boroughs
* London itself

Original Dataset Used and Cleaning Data For Analysis

original_data <- read.csv("PAS_social_integration.csv", stringsAsFactors = FALSE)
# Changing the column names to make the data easier to work with
names(original_data) <- c("Location", "Year", "Percent_that_agree")
str(original_data)

## 'data.frame':    198 obs. of  3 variables:
##  $ Location          : chr  "Barking and Dagenham" "Barnet" "Bexley" "Brent" ...
##  $ Year              : chr  "2012/13" "2012/13" "2012/13" "2012/13" ...
##  $ Percent_that_agree: num  84.5 96.1 95.9 94 94.2 ...
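
The structure above shows that Year is stored as a character label such as "2012/13" rather than as a number. A minimal sketch, in base R, of deriving a numeric start year from such a label. The helper name `start_year` and the example labels are illustrative assumptions, not part of the original analysis:

```r
# Hypothetical helper: take the first four characters of a label such as
# "2012/13" and convert them to an integer year.
start_year <- function(year_label) {
  as.integer(substr(year_label, 1, 4))
}

# Made-up labels in the same format as the Year column
example_labels <- c("2012/13", "2013/14", "2017/18")
start_year(example_labels)
```

If a numeric axis were wanted, the same helper could be applied with `original_data$Start_year <- start_year(original_data$Year)`, where `Start_year` is again a hypothetical column name.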

This chart compares the four London boroughs with the highest population, showing how the percentage of people who feel their neighborhood is inclusive changed over the years 2012-2018.

library(tidyverse)  # also attaches dplyr and ggplot2
# Keep only the four boroughs being compared
four_boroughs <- original_data %>% filter(Location %in% c("Enfield", "Ealing", "Croydon", "Barnet"))
# Year is a character column, so group = Location is needed for geom_line() to connect each borough's points
ggplot(four_boroughs, aes(x = Year, y = Percent_that_agree, col = Location, group = Location)) +
  geom_line() +
  geom_point() +
  ggtitle("Comparison of The Four London Boroughs With The Highest Population") +
  ylab("Percentage Of People That Feel Their Neighborhood Is Inclusive") +
  theme(text = element_text(size = 10, family = "serif"))

* Our plot shows that Enfield was arguably the most volatile of the compared boroughs, while Ealing was the most consistent over the six-year period.
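
One way to put a number on "volatile" is to compare the spread of each location's yearly percentages. The following is a minimal sketch in base R, assuming a data frame with the columns Location and Percent_that_agree. The helper name `volatility_by_location`, the output columns `Std_dev` and `Range`, and the toy values are all illustrative assumptions, and standard deviation is only one possible measure:

```r
# Hypothetical helper: summarise how much each location's percentage varies.
# Assumes a data frame with the columns Location and Percent_that_agree.
volatility_by_location <- function(data) {
  # One numeric vector of yearly percentages per location
  values <- split(data$Percent_that_agree, data$Location)
  result <- data.frame(
    Location = names(values),
    Std_dev = unname(vapply(values, sd, numeric(1))),
    Range = unname(vapply(values, function(x) max(x) - min(x), numeric(1)))
  )
  # Most volatile location first
  result[order(result$Std_dev, decreasing = TRUE), ]
}

# Illustrative, made-up values (not taken from the survey)
toy <- data.frame(
  Location = rep(c("A", "B"), each = 3),
  Percent_that_agree = c(90, 80, 95, 92, 93, 92)
)
volatility_by_location(toy)
```

Applied to the four boroughs in the plot, a table like this could support or challenge the visual impression that one borough varied more than the others.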

Tables of The 10 Best and Worst Locations Based on The Average Percentage of People Surveyed Who Agree

library(tidyverse)
library(knitr)
# Mean agreement percentage per location across all survey years
average_percent <- original_data %>%
  group_by(Location) %>%
  summarize(Percent_that_agree = mean(Percent_that_agree))
# Sort from highest to lowest average; the first ten rows are the best locations
top_10_best <- average_percent %>% arrange(desc(Percent_that_agree))
kable(top_10_best[1:10, ], caption = "On Average The 10 Best Locations")

On Average The 10 Best Locations
Location	Percent_that_agree
Wandsworth	95.43008
Kingston upon Thames	95.26439
Westminster	95.16698
Southwark	95.05725
Lewisham	95.03112
Richmond upon Thames	95.02470
Lambeth	94.68958
Kensington and Chelsea	94.63949
Tower Hamlets	94.59401
Greenwich	94.46188

library(tidyverse)
library(knitr)
# Re-sort the averages from lowest to highest; the first ten rows are the worst locations
bottom_10_worst <- top_10_best %>% arrange(Percent_that_agree)
kable(bottom_10_worst[1:10, ], caption = "On Average The 10 Worst Locations")

On Average The 10 Worst Locations
Location	Percent_that_agree
Barking and Dagenham	84.22376
Enfield	88.39967
Havering	89.25624
Haringey	89.79727
Redbridge	89.94341
Hounslow	90.10059
Hillingdon	90.22660
Waltham Forest	90.93332
Brent	90.99101
Bexley	91.42041

Plot of Barking and Dagenham, Which On Average Was Considered The Least Socially Inclusive

library(tidyverse)  # also attaches dplyr and ggplot2
# Keep only the rows for Barking and Dagenham
Bark_Dagen <- original_data %>% filter(Location == "Barking and Dagenham")
# group = 1 lets geom_line() connect the points across the character Year axis.
# theme_minimal() comes before theme() so that it does not overwrite the font settings.
ggplot(Bark_Dagen, aes(x = Year, y = Percent_that_agree, group = 1)) +
  geom_line(color = "red") +
  geom_point(color = "darkblue") +
  ggtitle("Barking and Dagenham Analysis from 2012-2018") +
  ylab("% Of People That Feel Their Neighborhood Is Inclusive") +
  theme_minimal() +
  theme(text = element_text(size = 10, family = "sans"))

* The chart makes one wonder why public perception dropped off so dramatically from the beginning of 2013 to the end of 2015. It would be interesting to do a deeper analysis comparing other socio-economic factors to tell more of the story.
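
A first step toward that deeper analysis could be to measure the change from one survey year to the next, which shows where the largest drop falls. A minimal sketch in base R, assuming the values are already in chronological order. The helper name `year_over_year_change` and the toy series are illustrative assumptions, not the real Barking and Dagenham figures:

```r
# Hypothetical helper: change in percentage points from the previous survey year.
# Assumes the values are already in chronological order.
year_over_year_change <- function(percent) {
  # The first year has no previous year to compare against, so it gets NA
  c(NA, diff(percent))
}

# Illustrative, made-up series (not the real Barking and Dagenham figures)
toy_percent <- c(90, 85, 78, 82)
changes <- year_over_year_change(toy_percent)
changes

# Position of the largest single-year drop in the series
which.min(changes)
```

The same helper could be applied to the filtered data with `year_over_year_change(Bark_Dagen$Percent_that_agree)`, provided the rows are sorted by Year.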

Summary:

It was interesting to work with open data for the first time, and the process of deciding what to analyze was compelling. Open data in general is engaging to work with because of the transparency it provides, which can help reduce corruption. My data set was relatively small and clean, so it did not require much manipulation before analysis. Overall, it has been rewarding to become more familiar with the basics of data analysis and to see how much there is still to learn.