The dataset on city-owned trees was downloaded from the City of Vancouver Open Data Catalogue.
The catalogue page also provides a legend for the column names.
# stringsAsFactors = TRUE keeps levels() below working on R >= 4.0
cityTreesDF <- read.delim("cityTreesDF.txt", header = TRUE, sep = ";", stringsAsFactors = TRUE)
# what is the size of this database (after cleaning)
dim(cityTreesDF)
## [1] 140969 5
# what are the names of neighbourhoods?
levels(cityTreesDF$NEIGHBOURHOOD_NAME)
## [1] "ARBUTUS RIDGE" "DOWNTOWN"
## [3] "DUNBAR - SOUTHLANDS" "FAIRVIEW"
## [5] "GRANDVIEW - WOODLANDS" "HASTINGS - SUNRISE"
## [7] "KENSINGTON-CEDAR COTTAGE" "KERRISDALE"
## [9] "KILLARNEY" "KITSILANO"
## [11] "MARPOLE" "MOUNT PLEASANT"
## [13] "OAKRIDGE" "RENFREW - COLLINGWOOD"
## [15] "RILEY PARK" "SHAUGHNESSY"
## [17] "SOUTH CAMBIE" "STRATHCONA"
## [19] "SUNSET" "VICTORIA - FRASERVIEW"
## [21] "WEST END" "WEST POINT GREY"
The height of trees in the city is classified into 10 categories, with the following names:
The distribution of tree heights can also be shown visually:
As a sanity check, height and diameter should be correlated. The median is a good measure for comparison because the data seem to contain many outliers.
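A minimal sketch of this check on a toy data frame. The column names HEIGHT_RANGE_ID and DIAMETER are assumptions (the text above does not name the columns), and the values are invented to show how one outlier affects the mean but not the median.

```r
# Toy stand-in for the cleaned data. HEIGHT_RANGE_ID and DIAMETER are
# assumed column names; the values are made up for illustration.
toyTrees <- data.frame(
  HEIGHT_RANGE_ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  DIAMETER        = c(3, 4, 60, 8, 9, 10, 14, 15, 16)
)

# Median and mean diameter within each height class
medByHeight  <- aggregate(DIAMETER ~ HEIGHT_RANGE_ID, data = toyTrees, FUN = median)
meanByHeight <- aggregate(DIAMETER ~ HEIGHT_RANGE_ID, data = toyTrees, FUN = mean)
print(medByHeight)
print(meanByHeight)

# Rank correlation between height class and median diameter
rho <- cor(medByHeight$HEIGHT_RANGE_ID, medByHeight$DIAMETER, method = "spearman")
print(rho)
```

In this toy example the single outlier (60) pulls the mean of the first class above the other two, while the medians still increase with height class.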
Actually, there is a difference between neighbourhoods. From the graph below we see that the “older” and “richer” residential neighbourhoods like Shaughnessy, Kitsilano, and Dunbar have bigger trees, whereas Downtown, being newer and primarily commercial, has smaller trees.
To explore this question, I first looked at the frequency of trees planted in each month, by year. The figure seems to suggest that the seasonal pattern is consistent from year to year.
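One way to build such a month-by-year frequency table is sketched below on toy data. The column name DATE_PLANTED and its YYYYMMDD format are assumptions for this sketch, not details confirmed above.

```r
# Toy planting dates. DATE_PLANTED and the YYYYMMDD format are assumed.
toyDates <- data.frame(
  DATE_PLANTED = c("20100115", "20100120", "20100310", "20110105",
                   "20110212", "20111130", "20111201")
)
planted <- as.Date(toyDates$DATE_PLANTED, format = "%Y%m%d")

# Extract year and month; month is a factor so empty months still appear
plantYear  <- as.integer(format(planted, "%Y"))
plantMonth <- factor(as.integer(format(planted, "%m")), levels = 1:12)

# Rows are years, columns are months 1..12
monthByYear <- table(year = plantYear, month = plantMonth)
print(monthByYear)
```

Each row of the table can then be plotted as its own line or panel to compare the seasonal shape across years.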
Then I aggregated over the years and looked only at monthly frequencies. I fitted a sinusoidal curve with glm to assess the pattern. Tree planting seems to peak around January and to cease in July!
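A sinusoidal fit of this kind could look like the sketch below. The monthly counts are invented, and the Poisson family is an assumption, since the text only says that glm was used. With a log link the sinusoid sits on the log scale, which leaves the positions of the peak and trough unchanged.

```r
# Toy monthly counts (Jan..Dec), invented for illustration only
month  <- 1:12
counts <- c(950, 900, 700, 450, 200, 60, 20, 40, 180, 420, 700, 900)

# First-harmonic sinusoid with a 12-month period. The Poisson family is an
# assumption; the original analysis may have used a different one.
fit <- glm(counts ~ sin(2 * pi * month / 12) + cos(2 * pi * month / 12),
           family = poisson)

# Evaluate the fitted curve on a fine grid to locate the peak and trough
grid <- data.frame(month = seq(1, 12, by = 0.1))
grid$fitted <- predict(fit, newdata = grid, type = "response")
peakMonth   <- grid$month[which.max(grid$fitted)]
troughMonth <- grid$month[which.min(grid$fitted)]
print(c(peak = peakMonth, trough = troughMonth))
```

Using both a sine and a cosine term lets glm estimate the phase of the cycle without fixing in advance which month is the peak.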