Description:

The data used for this report was generated by the Greater London Authority to measure the percentage of people who agree that their local area is a place where people from different backgrounds get on well together. The question is asked in the Public Attitudes Survey, which is a continuous survey, and the data was last updated on March 31st, 2018. The London Datastore is a free and open data-sharing portal where anyone can access data relating to the capital.

This data ranges from 2012 to 2018 and covers London locations such as Barking and Dagenham, Barnet, Bexley, and Brent.

Original Dataset Used and Cleaning Data For Analysis

original_data <- read.csv("PAS_social_integration.csv", stringsAsFactors = FALSE)
# Changing the column names to make the data easier to work with
names(original_data) <- c("Location", "Year", "Percent_that_agree")
str(original_data)
## 'data.frame':    198 obs. of  3 variables:
##  $ Location          : chr  "Barking and Dagenham" "Barnet" "Bexley" "Brent" ...
##  $ Year              : chr  "2012/13" "2012/13" "2012/13" "2012/13" ...
##  $ Percent_that_agree: num  84.5 96.1 95.9 94 94.2 ...
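
The structure above shows that Year is stored as text such as "2012/13". As a minimal sketch of one optional cleaning step, the helpers below turn those labels into an ordered factor and a numeric start year. The function names `year_to_ordered` and `start_year` are hypothetical and not part of the original analysis, and the code assumes every label begins with a four-digit year.

```r
# Hypothetical helper: convert "2012/13"-style labels to an ordered factor.
# Assumes labels start with a four-digit year, so sorting them as text
# also sorts them chronologically.
year_to_ordered <- function(year) {
  factor(year, levels = sort(unique(year)), ordered = TRUE)
}

# Hypothetical helper: extract the starting calendar year as an integer,
# e.g. "2012/13" becomes 2012.
start_year <- function(year) {
  as.integer(substr(year, 1, 4))
}

# Possible usage with the data frame loaded above:
# original_data$Year_ordered <- year_to_ordered(original_data$Year)
# original_data$Start_year   <- start_year(original_data$Year)
```

An ordered factor keeps the survey years in sequence on a plot axis, while the integer start year allows a continuous axis if that is preferred.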

Chart comparing the four London boroughs with the highest population, showing how the percentage of people who feel their neighborhood is inclusive has changed over the years 2012-2018.

library(tidyverse)
library(dplyr)
library(ggplot2)
# Year is a character column, so group = Location is needed for geom_line() to connect each borough's points
ggplot(original_data %>% filter(Location %in% c("Enfield", "Ealing", "Croydon", "Barnet")),
       aes(x = Year, y = Percent_that_agree, col = Location, group = Location)) +
  geom_line() +
  geom_point() +
  ggtitle("Comparison of The Four London Boroughs With The Highest Population") +
  ylab("Percentage Of People That Feel Their Neighborhood Is Inclusive") +
  theme(text = element_text(size = 10, family = "serif"))

* Our plot shows that Enfield was arguably the most volatile of the compared boroughs, while Ealing was the most consistent over the six-year period.
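
One way to put a number on "volatile" is to compare the spread of each borough's yearly percentages. The sketch below is an assumption about how that could be measured, not part of the original analysis: the function name `volatility_by_location` is hypothetical, and standard deviation and range are only two of several possible measures. It uses base R and expects the column names set earlier (Location, Percent_that_agree).

```r
# Hypothetical helper: summarise how much each borough's percentage varies.
# Returns one row per location, sorted from most to least variable.
volatility_by_location <- function(data, locations) {
  subset_data <- data[data$Location %in% locations, ]

  # Standard deviation of the yearly percentages for each borough
  sds <- tapply(subset_data$Percent_that_agree, subset_data$Location, sd)

  # Gap between the highest and lowest yearly percentage for each borough
  ranges <- tapply(subset_data$Percent_that_agree, subset_data$Location,
                   function(x) max(x) - min(x))

  result <- data.frame(Location = names(sds),
                       SD = as.numeric(sds),
                       Range = as.numeric(ranges),
                       stringsAsFactors = FALSE)
  result <- result[order(-result$SD), ]
  row.names(result) <- NULL
  result
}

# Possible usage with the data frame loaded above:
# volatility_by_location(original_data, c("Enfield", "Ealing", "Croydon", "Barnet"))
```

A larger standard deviation or range would support the reading of the chart, although with only a handful of survey years per borough these figures should be treated as rough indicators.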

Tables of The 10 Best and Worst Locations Based on The Average Percentage of People Surveyed Who Agreed

library(tidyverse)
library(dplyr)
library(knitr)
average_percent <- original_data %>% group_by(Location) %>% summarize(Percent_that_agree = mean(Percent_that_agree))
top_10_best <- average_percent %>% arrange(desc(Percent_that_agree))
kable(top_10_best[1:10,], caption = "On Average The 10 Best Locations")
On Average The 10 Best Locations

Location                Percent_that_agree
Wandsworth              95.43008
Kingston upon Thames    95.26439
Westminster             95.16698
Southwark               95.05725
Lewisham                95.03112
Richmond upon Thames    95.02470
Lambeth                 94.68958
Kensington and Chelsea  94.63949
Tower Hamlets           94.59401
Greenwich               94.46188

library(tidyverse)
library(dplyr)
library(knitr)
bottom_10_worst <- top_10_best %>% arrange(Percent_that_agree)
kable(bottom_10_worst[1:10,], caption = "On Average The 10 Worst Locations")
On Average The 10 Worst Locations

Location                Percent_that_agree
Barking and Dagenham    84.22376
Enfield                 88.39967
Havering                89.25624
Haringey                89.79727
Redbridge               89.94341
Hounslow                90.10059
Hillingdon              90.22660
Waltham Forest          90.93332
Brent                   90.99101
Bexley                  91.42041

Plot of Barking and Dagenham, Which On Average Was Considered The Least Socially Inclusive

library(tidyverse)
library(dplyr)
library(ggplot2)
Bark_Dagen <- original_data %>% filter(Location == "Barking and Dagenham")
# group = 1 connects the points across the character Year axis;
# theme_minimal() comes before theme() so it does not override the text settings
ggplot(Bark_Dagen, aes(x = Year, y = Percent_that_agree, group = 1)) +
  geom_line(color = "red") +
  geom_point(color = "darkblue") +
  ggtitle("Barking and Dagenham Analysis from 2012-2018") +
  ylab("% Of People That Feel Their Neighborhood Is Inclusive") +
  theme_minimal() +
  theme(text = element_text(size = 10, family = "sans"))

* The chart raises the question of why public perception dropped off so dramatically from the beginning of 2013 to the end of 2015. It would be interesting to do a deeper analysis comparing other socio-economic factors to tell more of the story.
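
A first step toward that deeper analysis could be to tabulate the year-over-year change, which shows exactly where the drop occurs and how large it is. The sketch below is illustrative only: the function name `yearly_change` is hypothetical, and it assumes the Year labels (such as "2012/13") sort chronologically as text and that the column names match those set earlier.

```r
# Hypothetical helper: year-over-year change in percentage points
# for a single location.
yearly_change <- function(data, location) {
  rows <- data[data$Location == location, ]
  stopifnot(nrow(rows) > 0)  # fail early if the location is not in the data

  # "2012/13"-style labels sort chronologically when sorted as text
  rows <- rows[order(rows$Year), ]

  # The first year has no previous year to compare against, hence NA
  rows$Change <- c(NA, diff(rows$Percent_that_agree))

  row.names(rows) <- NULL
  rows[, c("Year", "Percent_that_agree", "Change")]
}

# Possible usage with the data frame loaded above:
# yearly_change(original_data, "Barking and Dagenham")
```

Negative values in the Change column mark the years in which agreement fell, which could then be lined up against other socio-economic indicators for the same periods.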

Summary:

It was interesting to work with open data for the first time, and the process of deciding what to analyze was compelling. Open data in general is engaging to work with because the transparency it provides can help reduce corruption. My data set was relatively small and clean, so it did not require much manipulation before analysis. Overall, it has been rewarding to become more familiar with the basics of data analysis and to see how much there is still to learn.