For this assignment, you must find an open dataset that is from a country, city, or region outside of the United States. Browse some of the sites in the chapter notes, or find something on your own. Pick any data you find interesting and create a summary report of what you find interesting in the data. Your report must contain the following elements:
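Click here to get to the file source. This file lists rental properties in Vancouver with current rental standards issues, including each property's business operator, address, total number of units, and local area.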
I verified that the data is open by reading about the source: “We share open datasets through publications such as the Open Data Portal and VanMap under the Open Government License - Vancouver.” This dataset was retrieved from the above-mentioned Open Data Portal.
Let’s start by loading the libraries:
```r
# tidyverse attaches readr, dplyr, tidyr and ggplot2 (among others),
# so separate library() calls for those packages are not needed
library(tidyverse)
```
Now, let’s explore the file:
```r
# Read the rental standards file from the working directory
dataFile <- read_csv("rental-standards-current-issues-1.csv")
# str(dataFile)  # uncomment to see the structure of the file

# Keep properties with more than 100 total units
dataOver100Units <- dataFile %>% filter(TotalUnits > 100)

# Business operator, street number, street, total units and local area,
# sorted from most to fewest units
finalList <- dataOver100Units %>%
  select(BUSINESSOPERATOR, StreetNumber, Street, TotalUnits, GeoLocalArea) %>%
  arrange(desc(TotalUnits))

# slice_max(finalList, TotalUnits, n = 5)  # uncomment to view the top 5
```
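The boundary matters here: `filter(TotalUnits > 100)` drops a building with exactly 100 units, while `filter(TotalUnits >= 100)` keeps it. Likewise, `top_n()` without a `wt` argument ranks by the last column of the table (here `GeoLocalArea`), so naming the ordering column explicitly, for example with `slice_max()`, is safer for a top-5 view. A minimal sketch on a hypothetical four-row tibble (operator names and unit counts are made up):

```r
library(dplyr)

# Hypothetical toy data: column names mirror the real file, values are made up
toy <- tibble(
  BUSINESSOPERATOR = c("Op A", "Op B", "Op C", "Op D"),
  TotalUnits       = c(99, 100, 101, 250),
  GeoLocalArea     = c("West End", "Downtown", "Fairview", "West End")
)

# Strictly greater than 100: the 100-unit building is dropped
over100 <- toy %>% filter(TotalUnits > 100)

# Greater than or equal to 100: the 100-unit building is kept
atLeast100 <- toy %>% filter(TotalUnits >= 100)

# Top rows ranked explicitly by TotalUnits
top2 <- toy %>% slice_max(TotalUnits, n = 2)
```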
Now, let’s plot our data:
```r
# ggplot2 is already attached by library(tidyverse)
ggplot(finalList, aes(x = BUSINESSOPERATOR, y = TotalUnits, fill = GeoLocalArea)) +
  geom_col(position = "dodge") +
  coord_flip()
```
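Each row of `finalList` appears to be a single property (it carries a street number and street), so an operator with several large buildings shows up as several bars in the chart above. If the question is about units per business operator, one option is to aggregate first. The sketch below uses hypothetical rows and names; with the real data, `finalList` would take the place of `toy`.

```r
library(dplyr)

# Hypothetical toy rows: "Operator X" appears at two addresses
toy <- tibble(
  BUSINESSOPERATOR = c("Operator X", "Operator X", "Operator Y"),
  StreetNumber     = c(100, 200, 300),
  Street           = c("Example St", "Sample Ave", "Example St"),
  TotalUnits       = c(120, 180, 150)
)

# Collapse to one row per operator: total units and number of properties
byOperator <- toy %>%
  group_by(BUSINESSOPERATOR) %>%
  summarise(totalUnits = sum(TotalUnits), properties = n()) %>%
  arrange(desc(totalUnits))
```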
### A Look at the Average Number of Units per Property by Local Area
```r
# Mean total units for each local area
tableGeo <- finalList %>%
  group_by(GeoLocalArea) %>%
  summarise(mean = mean(TotalUnits))
```
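A mean built from one or two properties is less stable than one built from many, so it may help to report how many properties sit behind each local-area average. A minimal sketch, assuming hypothetical areas and unit counts, that adds `n()` to the same `group_by()`/`summarise()` pattern:

```r
library(dplyr)

# Hypothetical toy rows standing in for finalList
toy <- tibble(
  GeoLocalArea = c("Area A", "Area A", "Area B"),
  TotalUnits   = c(150, 200, 130)
)

# Same grouping as tableGeo, plus the number of properties behind each mean
toyGeo <- toy %>%
  group_by(GeoLocalArea) %>%
  summarise(mean = mean(TotalUnits), n = n()) %>%
  arrange(desc(mean))
```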
```r
# Pie chart of the mean units per local area; y refers to the "mean" column
ggplot(tableGeo, aes(x = "", y = mean, fill = GeoLocalArea)) +
  geom_col(width = 1) +
  coord_polar("y", start = 0) +
  theme_void()
```
The pie chart above makes the local areas look broadly similar in average units per property, but differences between slices are hard to judge by eye. Let’s check the exact values in a table.
```r
library(kableExtra)

# Sort local areas by mean units, highest first, and render a styled table
tableGeo %>%
  arrange(desc(mean)) %>%
  kable() %>%
  kable_styling()
```
| GeoLocalArea | Mean total units per property |
|---|---|
| West End | 178.3846 |
| Downtown | 176.1818 |
| Renfrew-Collingwood | 173.0000 |
| Mount Pleasant | 142.5000 |
| Strathcona | 129.1667 |
| Killarney | 123.0000 |
| Fairview | 118.0000 |
| Riley Park | 117.0000 |
| Grandview-Woodland | 101.0000 |
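The table above shows that the West End has the highest average number of total units per property (about 178), closely followed by Downtown (about 176), while Grandview-Woodland has the lowest (101).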
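To make the comparison concrete, the sketch below copies the means from the table (rounded to one decimal place), computes the gap between the top two areas and the overall spread, and draws a sorted bar chart, which is often easier to read than pie slices. The object names are illustrative; in the report itself, `tableGeo` could be used directly instead of retyping the values.

```r
library(dplyr)
library(ggplot2)

# Means copied from the table above, rounded to one decimal place
geoMeans <- tibble(
  GeoLocalArea = c("West End", "Downtown", "Renfrew-Collingwood", "Mount Pleasant",
                   "Strathcona", "Killarney", "Fairview", "Riley Park",
                   "Grandview-Woodland"),
  mean = c(178.4, 176.2, 173.0, 142.5, 129.2, 123.0, 118.0, 117.0, 101.0)
)

# Gap between the top two areas (West End minus Downtown)
topGap <- geoMeans$mean[1] - geoMeans$mean[2]

# Spread between the highest and lowest averages
spread <- max(geoMeans$mean) - min(geoMeans$mean)

# Horizontal bars sorted by value
p <- ggplot(geoMeans, aes(x = reorder(GeoLocalArea, mean), y = mean)) +
  geom_col() +
  coord_flip() +
  labs(x = "Local area", y = "Mean total units per property")
```

With these rounded values, `topGap` is about 2.2 units and `spread` is about 77.4 units, so the top three areas are close while the full range is wide.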