Instructions

For this assignment, you must find an open dataset from a country, city, or region outside of the United States. Browse some of the sites in the chapter notes, or find something on your own. Pick any data you like and create a summary report of what you find interesting in it. Your report must contain the following elements:

  1. a description of how you verified that the data is open and a link back to the website where you found it (10 points)
  2. code folding for all code, initially set to hide (10 points)
  3. at least two ggplot2 charts (20 points)
  4. at least one nicely formatted table, made with kable, that is short enough not to overwhelm the report (10 points)
  5. a narrative discussing what you find interesting along with any issues you might have had preparing the data (10 points)
  6. no typos or spelling errors (10 points)
  7. published on RPubs (10 points)
  8. a clickable link in Brightspace (please paste it in the comments when you submit) to your RPubs report that shows all the code you used to generate the report (i.e., no code chunks can be hidden) (10 points)
  9. in addition to the link, attach your Rmd file to this assignment (10 points)

Assignment Starts Here

Data and Source

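Click here to get to the file source. This file, rental-standards-current-issues-1.csv, lists rental properties in Vancouver. For each property it gives the business operator, the address, the total number of units, and the local area.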

I verified that the data is open by reading the source's statement about its data: “We share open datasets through publications such as the Open Data Portal and VanMap under the Open Government License - Vancouver.” This dataset was retrieved from the Open Data Portal mentioned in that statement.

Code (Note: code folding is initially set to hide)

Let’s start by loading the libraries:

  • tidyverse
  • tidyr
  • dplyr
  • readr
library(tidyverse) # also attaches readr, tidyr, dplyr and ggplot2
library(tidyr)
library(dplyr)
library(readr)

Now, let’s explore the file:

dataFile <- read_csv("rental-standards-current-issues-1.csv")
# str(dataFile) # Uncomment to see the structure of the file
# Keep properties with more than 100 total units and return the business operator,
# address (street number and street), number of units and local area
dataOver100Units <- dataFile %>% filter(TotalUnits > 100)
finalList <- dataOver100Units %>%
  select(BUSINESSOPERATOR, StreetNumber, Street, TotalUnits, GeoLocalArea) %>%
  arrange(desc(TotalUnits))
# slice_max(finalList, TotalUnits, n = 5) # Uncomment to view the 5 largest properties
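The next sketch runs the filtering and top-five steps on a small hypothetical tibble (`toy_props`, with made-up operator names and unit counts) instead of the real CSV. It makes two points. First, `TotalUnits > 100` drops a property with exactly 100 units, so use `>=` if "100 or more" is what you intend. Second, `slice_max()` ranks rows explicitly by `TotalUnits`. By contrast, `top_n()` called without a `wt` argument ranks by the last column, which in `finalList` is `GeoLocalArea`.

```r
library(dplyr)

# Hypothetical toy data standing in for rental-standards-current-issues-1.csv
toy_props <- tibble(
  BUSINESSOPERATOR = c("Operator A", "Operator B", "Operator C", "Operator D"),
  TotalUnits       = c(100, 150, 99, 210),
  GeoLocalArea     = c("West End", "Downtown", "Fairview", "West End")
)

# Strictly more than 100 units: the 100-unit property is excluded
over_100 <- toy_props %>% filter(TotalUnits > 100)

# 100 or more units: the 100-unit property is kept
at_least_100 <- toy_props %>% filter(TotalUnits >= 100)

# Largest properties first, ranked explicitly by TotalUnits
top_two <- at_least_100 %>% slice_max(TotalUnits, n = 2)

print(nrow(over_100))      # 2 rows
print(nrow(at_least_100))  # 3 rows
print(top_two$TotalUnits)  # 210 150
```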

Now, let’s plot our data:

A Look at the Number of Units per Business Operator

library(ggplot2)
ggplot(finalList, aes(x = BUSINESSOPERATOR, y = TotalUnits, fill = GeoLocalArea)) +
  geom_col(position = "dodge") + coord_flip() +
  labs(x = "Business operator", y = "Total units", fill = "Local area")
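The per-operator chart is easier to scan when the bars are sorted. The sketch below uses `reorder()` to order the operators by their units and `labs()` to name the axes. It runs on a hypothetical three-row data frame with made-up operator names rather than on `finalList`.

```r
library(ggplot2)

# Hypothetical toy data; the real chart uses finalList
toy_list <- data.frame(
  BUSINESSOPERATOR = c("Operator A", "Operator B", "Operator C"),
  TotalUnits       = c(150, 210, 120),
  GeoLocalArea     = c("Downtown", "West End", "Fairview")
)

# reorder() sorts the operators by units, so after coord_flip()
# the operator with the most units appears at the top
p <- ggplot(toy_list,
            aes(x = reorder(BUSINESSOPERATOR, TotalUnits),
                y = TotalUnits, fill = GeoLocalArea)) +
  geom_col() +
  coord_flip() +
  labs(x = "Business operator", y = "Total units", fill = "Local area")

print(p)  # draws the chart on the active graphics device
```

When an operator appears in several rows, `reorder()` averages their units by default.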

A Look at the Average Number of Units per Property by Local Area

tableGeo <- finalList %>% group_by(GeoLocalArea) %>% summarise(mean = mean(TotalUnits))

# Pie chart of the mean number of units in each local area
ggplot(tableGeo, aes(x = "", y = mean, fill = GeoLocalArea)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y", start = 0) + theme_void()
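Before comparing these averages, it helps to know how many properties each one is based on. A local area whose mean comes from a single property gives a weak basis for comparison. This sketch adds `n()` to the same `summarise()` call and runs on hypothetical toy data rather than on `finalList`.

```r
library(dplyr)

# Hypothetical toy data standing in for finalList
toy_list <- tibble(
  GeoLocalArea = c("West End", "West End", "Downtown", "Fairview"),
  TotalUnits   = c(150, 220, 180, 118)
)

# Average units per property in each local area, plus the number of
# properties each average is based on
area_summary <- toy_list %>%
  group_by(GeoLocalArea) %>%
  summarise(mean = mean(TotalUnits), n = n()) %>%
  arrange(desc(mean))

print(area_summary)
```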

The pie chart suggests that the local areas have broadly similar average numbers of units per property. Let’s check that in a table.

library(knitr)      # kable()
library(kableExtra) # kable_styling()
tableGeo %>% arrange(desc(mean)) %>% kable() %>% kable_styling()
GeoLocalArea          mean
West End              178.3846
Downtown              176.1818
Renfrew-Collingwood   173.0000
Mount Pleasant        142.5000
Strathcona            129.1667
Killarney             123.0000
Fairview              118.0000
Riley Park            117.0000
Grandview-Woodland    101.0000

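The table above shows that, among properties with more than 100 units, West End has the highest average number of units per property (about 178). Downtown follows closely (about 176), and Grandview-Woodland has the lowest (101).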
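The table of averages prints four decimal places, which is probably more precision than the report needs. `kable()`'s `digits` and `col.names` arguments can round the values and relabel the headers. The sketch below uses a hypothetical two-row data frame shaped like `tableGeo`. `format = "pipe"` is set only so that the printed output is predictable outside an R Markdown render.

```r
library(knitr)

# Hypothetical two-row summary with the same shape as tableGeo
toy_geo <- data.frame(
  GeoLocalArea = c("West End", "Grandview-Woodland"),
  mean         = c(178.3846, 101)
)

# digits = 1 rounds the numeric column; col.names gives readable headers
out <- kable(toy_geo, format = "pipe", digits = 1,
             col.names = c("Local area", "Mean total units"))
print(out)
```

In the report itself, you would drop `format = "pipe"` and keep piping into `kable_styling()`.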