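# Load data
library(jsonlite)  # fromJSON()
library(knitr)
library(curl)      # curl() connections
library(XML)       # xmlParse(), used by the Zillow request below
library(xml2)
library(ggplot2)
# Open a connection to the 2015 Street Tree Census endpoint.
# Without a $limit parameter the API returns only the first 1,000 rows.
url <- curl(url = "https://data.cityofnewyork.us/resource/uvpi-gqnh.json")
tree_df <- fromJSON(url)
head(tree_df)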
##I'm still working on scraping the web for this data
##url2 <- curl(url = "http://www.zillow.com/webservice/GetRegionChildren.htm?zws-id=<X1-ZWz17zfucflgcr_8jmj2>&state=ny&city=newyorkcity&childtype=neighborhood")
##redf <- xmlParse(url2)
##head(redf)
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Is tree health associated with rent prices in the surrounding neighborhood?
What are the cases, and how many are there?
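The cases are individual trees. The full census has 683,788 cases; the API call above loads only the first 1,000 of them, as the summary output shows (Length:1000).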
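Since the data frame loaded in this document holds 1,000 rows rather than the full census, one way to reach more cases might be to page through the endpoint. The sketch below only builds the request URLs, assuming the endpoint accepts the Socrata query parameters `$limit`, `$offset`, and `$order`. The names `page_size`, `n_pages`, and `page_urls`, and the page size itself, are illustrative choices and not part of the original analysis.

```r
# Sketch only: build paged request URLs for the endpoint used in this document.
# page_size and n_pages are arbitrary illustrative values (assumptions).
base_url  <- "https://data.cityofnewyork.us/resource/uvpi-gqnh.json"
page_size <- 50000
n_pages   <- 3

# Offsets 0, 50000, 100000: one per page.
offsets <- (seq_len(n_pages) - 1) * page_size

# Ordering by tree_id is assumed here so that pages do not overlap.
page_urls <- sprintf("%s?$limit=%d&$offset=%d&$order=tree_id",
                     base_url, page_size, offsets)
print(page_urls)

# Fetching is left commented out because it needs network access:
# library(jsonlite)
# pages   <- lapply(page_urls, fromJSON)
# tree_df <- rbind_pages(pages)  # jsonlite helper that stacks pages of records
```

Raising `n_pages` until a page comes back empty would be one way to cover the whole census, at the cost of a much larger download.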
Describe the method of data collection.
The data were collected through a census: people went around NYC and recorded information about every street tree.
What type of study is this (observational/experiment)?
This is an observational study.
If you collected the data, state self-collected. If not, provide a citation/link.
https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh
What is the response variable? Is it quantitative or qualitative?
The response variable is the perceived health of the tree (Fair, Good, or Poor), which is qualitative.
You should have two independent variables, one quantitative and one qualitative.
There are a few explanatory variables. One is the property price (average rent) of the neighborhood, which is quantitative; another is how invested people are in taking care of the trees. Also, people’s perception of tree health could differ, which could introduce inaccuracies in the data.
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
summary(tree_df)
## address bbl bin
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## block_id boro_ct borocode
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## boroname brch_light brch_other
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## brch_shoe cb_num census_tract
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## cncldist council_district created_at
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## curb_loc guards health
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## latitude longitude nta
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## nta_name problems root_grate
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## root_other root_stone sidewalk
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## spc_common spc_latin st_assem
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## st_senate state status
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## steward stump_diam tree_dbh
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## tree_id trnk_light trnk_other
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## trunk_wire user_type x_sp
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## y_sp zip_city zipcode
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
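Every column in the summary above is of class character, so `summary()` reports only lengths and not means or counts. A possible fix is to convert the columns of interest before summarizing. The sketch below uses a small made-up data frame, `toy_df`, in place of `tree_df`; its values are invented for illustration, and only the column names `tree_dbh`, `health`, and `boroname` come from the dataset.

```r
# Toy stand-in for tree_df (invented values): columns arrive as character.
toy_df <- data.frame(
  tree_dbh = c("3", "21", "11", "7"),
  health   = c("Fair", "Good", "Good", "Poor"),
  boroname = c("Queens", "Brooklyn", "Bronx", "Queens"),
  stringsAsFactors = FALSE
)

# Convert the diameter to numeric and the categorical columns to factors.
toy_df$tree_dbh <- as.numeric(toy_df$tree_dbh)
toy_df$health   <- factor(toy_df$health, levels = c("Poor", "Fair", "Good"))
toy_df$boroname <- factor(toy_df$boroname)

# summary() now gives quartiles for tree_dbh and counts per level for the factors.
summary(toy_df)
```

Setting the `health` levels explicitly keeps the categories in a meaningful order in tables and plots, instead of alphabetical order.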
tree_df <- na.omit(tree_df)
plot(factor(tree_df$health), xlab = "tree health")
plot(factor(tree_df$boroname), xlab = "borough")
table(tree_df$boroname, tree_df$health)
##
## Fair Good Poor
## Bronx 16 64 6
## Brooklyn 39 253 17
## Manhattan 32 159 5
## Queens 74 191 25
## Staten Island 19 60 0
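The boroughs have different numbers of trees in this sample, so raw counts are hard to compare. A sketch of one way to put them on a common scale is to convert each row to proportions with `prop.table()`. The counts below are retyped from the table above; the names `health_tab` and `health_prop` are illustrative.

```r
# Counts copied from the borough-by-health table printed above.
health_tab <- matrix(
  c(16,  64,  6,
    39, 253, 17,
    32, 159,  5,
    74, 191, 25,
    19,  60,  0),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    boroname = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
    health   = c("Fair", "Good", "Poor")
  )
)

# margin = 1 divides each row by its row total, so each borough sums to 1.
health_prop <- prop.table(health_tab, margin = 1)
round(health_prop, 2)
```

In the analysis itself the same call could be applied directly to the result of `table()`, without retyping the counts.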
Explanatory: What is the explanatory variable(s), and what type is it (numerical/categorical)?
There are a few explanatory variables. One is the property price of the neighborhood, which is numerical; another is how invested people are in taking care of the trees. People’s perception of tree health could also differ, which could cause inaccuracies in the data.
I want to graph the data in a few ways. I want to plot the trees on a map so we can see their density. I also want to reshape the data so that each row is a neighborhood and the variables are average property value, average tree diameter, number of trees, and average tree health. I still need to find a database with reliable information about property costs by neighborhood.
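The reshaping step described above could be sketched as follows. The data frame `toy_df` holds invented values standing in for `tree_df`; only the column names `nta_name`, `tree_dbh`, and `health` come from the dataset. Because health is categorical, "average tree health" is represented here by the share of trees rated Good, which is an assumption about how it might be summarized. Property values are left out because no source has been chosen yet.

```r
# Toy stand-in for tree_df (invented values), with tree_dbh already numeric.
toy_df <- data.frame(
  nta_name = c("Astoria", "Astoria", "Park Slope", "Park Slope", "Park Slope"),
  tree_dbh = c(4, 10, 12, 6, 9),
  health   = c("Good", "Fair", "Good", "Good", "Poor"),
  stringsAsFactors = FALSE
)

# One row per neighborhood. table() and tapply() both order groups by the
# sorted neighborhood names, matching sort(unique(...)).
by_nta <- data.frame(
  nta_name  = sort(unique(toy_df$nta_name)),
  n_trees   = as.vector(table(toy_df$nta_name)),
  mean_dbh  = as.vector(tapply(toy_df$tree_dbh, toy_df$nta_name, mean)),
  prop_good = as.vector(tapply(toy_df$health == "Good", toy_df$nta_name, mean))
)
by_nta
```

Once a property-value table keyed by neighborhood is available, it could be joined to `by_nta` with `merge()` on the neighborhood column.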