The data used for this report was generated by the Greater London Authority. It records the percentage of people who agree that their local area is a place where people from different backgrounds get on well together. The question is asked in the Public Attitudes Survey, a continuous survey, and the data was last updated on March 31st, 2018. The London Datastore is a free and open data-sharing portal where anyone can access data relating to the capital.
This data ranges from 2012-2018 and includes the following locations:
original_data <- read.csv("PAS_social_integration.csv", stringsAsFactors = FALSE)
# Changing the column names to make the data easier to work with
names(original_data) <- c("Location", "Year", "Percent_that_agree")
str(original_data)
## 'data.frame': 198 obs. of 3 variables:
## $ Location : chr "Barking and Dagenham" "Barnet" "Bexley" "Brent" ...
## $ Year : chr "2012/13" "2012/13" "2012/13" "2012/13" ...
## $ Percent_that_agree: num 84.5 96.1 95.9 94 94.2 ...
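The locations and survey years covered can be read straight from the data frame. The following is a minimal sketch, assuming the renamed columns `Location` and `Year` shown above; `describe_coverage` is a hypothetical helper name and is not part of the original analysis.

```r
# Hypothetical helper (not part of the original report): summarises which
# locations and survey years a data frame covers.
# Assumes columns named `Location` and `Year`, as set by names() above.
describe_coverage <- function(df) {
  list(
    locations = sort(unique(df$Location)),  # distinct locations, alphabetical
    years     = sort(unique(df$Year)),      # distinct survey years, e.g. "2012/13"
    n_rows    = nrow(df)                    # total number of observations
  )
}

# Usage with the data loaded above:
# coverage <- describe_coverage(original_data)
# coverage$locations
```

Returning a list keeps the three pieces of information together, so they can be printed or reused later in the report.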
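library(tidyverse)  # attaches dplyr and ggplot2, so separate library() calls are not needed
# Yearly percentages for the four boroughs compared in this report
four_boroughs <- original_data %>%
  filter(Location %in% c("Enfield", "Ealing", "Croydon", "Barnet"))
# Year is stored as text ("2012/13"), so group = Location is needed for
# geom_line() to connect each borough's points across years
ggplot(four_boroughs, aes(x = Year, y = Percent_that_agree, col = Location, group = Location)) +
  geom_line() +
  geom_point() +
  ggtitle("Comparison of The Four London Boroughs With The Highest Population") +
  ylab("Percentage Of People That Feel Their Neighborhood Is Inclusive") +
  theme(text = element_text(size = 10, family = "serif"))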
* Our plot shows that Enfield was arguably the most volatile of the compared boroughs, while Ealing was the most consistent over the six-year period.
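One way to put a number on "volatile" and "consistent" is to compare the spread of each borough's yearly percentages. The sketch below is only an illustration, not the method used to draw the conclusion above. It assumes the columns `Location` and `Percent_that_agree`, and `volatility_by_location` is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical helper (not part of the original report): standard deviation
# and range of the yearly percentages for each location, most variable first.
# Assumes columns named `Location` and `Percent_that_agree`.
volatility_by_location <- function(df) {
  df %>%
    group_by(Location) %>%
    summarize(
      sd_percent    = sd(Percent_that_agree),
      range_percent = max(Percent_that_agree) - min(Percent_that_agree),
      .groups = "drop"
    ) %>%
    arrange(desc(sd_percent))
}

# Usage with the four boroughs from the plot:
# original_data %>%
#   filter(Location %in% c("Enfield", "Ealing", "Croydon", "Barnet")) %>%
#   volatility_by_location()
```

A larger standard deviation or range would support calling a borough more volatile. With only six yearly values per borough, these figures should be read as rough indicators.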
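library(tidyverse)  # already attaches dplyr
library(knitr)      # provides kable() for the tables
# Mean percentage per location across all survey years
average_percent <- original_data %>% group_by(Location) %>% summarize(Percent_that_agree = mean(Percent_that_agree))
# Sort from highest to lowest average; the first ten rows are the best locations
top_10_best <- average_percent %>% arrange(desc(Percent_that_agree))
kable(top_10_best[1:10,], caption = "On Average The 10 Best Locations")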
| Location | Percent_that_agree |
|---|---|
| Wandsworth | 95.43008 |
| Kingston upon Thames | 95.26439 |
| Westminster | 95.16698 |
| Southwark | 95.05725 |
| Lewisham | 95.03112 |
| Richmond upon Thames | 95.02470 |
| Lambeth | 94.68958 |
| Kensington and Chelsea | 94.63949 |
| Tower Hamlets | 94.59401 |
| Greenwich | 94.46188 |
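library(tidyverse)  # already attaches dplyr
library(knitr)      # provides kable() for the tables
# Re-sort the averages from lowest to highest; the first ten rows are the worst locations
bottom_10_worst <- top_10_best %>% arrange(Percent_that_agree)
kable(bottom_10_worst[1:10,], caption = "On Average The 10 Worst Locations")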
| Location | Percent_that_agree |
|---|---|
| Barking and Dagenham | 84.22376 |
| Enfield | 88.39967 |
| Havering | 89.25624 |
| Haringey | 89.79727 |
| Redbridge | 89.94341 |
| Hounslow | 90.10059 |
| Hillingdon | 90.22660 |
| Waltham Forest | 90.93332 |
| Brent | 90.99101 |
| Bexley | 91.42041 |
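The two rankings above follow the same steps: average by location, sort, and keep ten rows. They could be folded into one function. This is a sketch under assumptions rather than the code used to produce the tables; `rank_locations` is a hypothetical name, and it relies on `slice_max()` and `slice_min()` from dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical helper (not part of the original report): mean percentage per
# location, returning the n highest (best = TRUE) or n lowest (best = FALSE).
# Assumes columns named `Location` and `Percent_that_agree`.
rank_locations <- function(df, n = 10, best = TRUE) {
  averages <- df %>%
    group_by(Location) %>%
    summarize(Percent_that_agree = mean(Percent_that_agree), .groups = "drop")
  if (best) {
    # Highest averages first
    slice_max(averages, Percent_that_agree, n = n, with_ties = FALSE)
  } else {
    # Lowest averages first
    slice_min(averages, Percent_that_agree, n = n, with_ties = FALSE)
  }
}

# Usage with the data loaded above:
# knitr::kable(rank_locations(original_data), digits = 1,
#              caption = "On Average The 10 Best Locations")
```

Passing `digits = 1` to `kable()` rounds the averages in the printed table, which may be easier to read than five decimal places.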
It was interesting to work with open data for the first time, and the process of deciding what to analyze was compelling. Open data in general is engaging to work with because the transparency it provides can help reduce corruption. My data set was relatively small and clean, so it did not require much manipulation before analysis. Overall, it has been rewarding to become more familiar with the basics of data analysis and to see how much there is still to learn.