DDP_Week4_Presentation

Rohit Joshi
August 11 2017

Introduction

The purpose of this project is to analyze the relationship between property area and CO2 output in NYC buildings and to see how that relationship differs by borough.

The simple app allows a user to select one or more boroughs and view a scatterplot of property area vs. CO2 output for buildings in the selected borough(s). The app also automatically calculates and displays a linear fit.

Getting and cleaning the data

The data used for this project comes from NYC Open Data, available here.

The data is in JSON format. After loading it with jsonlite's fromJSON function, we use complete.cases to keep only complete observations.

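library(jsonlite)

# Download the data set and flatten nested fields into columns
my_data <- fromJSON("https://data.cityofnewyork.us/resource/xwwh-wcee.json",
                    flatten = TRUE)

# Keep only rows with no missing values
my_data <- my_data[complete.cases(my_data), ]

# Rename the property area and CO2 columns, then convert them to numeric
colnames(my_data)[12] <- "area"
colnames(my_data)[16] <- "co2"
my_data$area <- as.numeric(my_data$area)
my_data$co2 <- as.numeric(my_data$co2)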
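To illustrate what the cleaning step does, here is a minimal sketch on a made-up data frame. The toy values and column names are assumptions for illustration only, not the NYC data. Numbers are stored as character strings here to mimic how they may arrive from a JSON source.

```r
# Toy data standing in for the JSON result (all values are made up)
toy <- data.frame(
  borough = c("Bronx", "Queens", NA, "Brooklyn"),
  area    = c("1000", "2500", "1800", NA),
  co2     = c("12.5", "30.1", "20.0", "15.2"),
  stringsAsFactors = FALSE
)

# Keep only rows with no missing value in any column
toy_clean <- toy[complete.cases(toy), ]

# Convert the character columns to numeric
toy_clean$area <- as.numeric(toy_clean$area)
toy_clean$co2  <- as.numeric(toy_clean$co2)

print(toy_clean)
```

Only the Bronx and Queens rows survive, because the other two rows each contain an NA.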

Building the plot

Plotly was used to make a scatterplot and trendline. Below is an example of the code, with the output on the next slide.

library(plotly)
library(curl)
library(webshot)
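# Scatterplot of area vs. CO2 with a least-squares trendline
p <- plot_ly(my_data, x = ~area) %>%
  add_markers(y = ~co2, name = "Points") %>%
  add_lines(y = ~fitted(lm(co2 ~ area)), name = "Linear fit")
# Save the interactive plot as a standalone HTML file
htmlwidgets::saveWidget(as_widget(p), file = "demo.html")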
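The trendline is an ordinary least-squares fit of co2 on area. As a rough sketch of what fitted(lm(...)) returns, the following uses made-up points (an assumption for illustration) that lie exactly on a line, so the recovered intercept and slope are easy to check by eye.

```r
# Made-up points lying exactly on co2 = 2 + 0.01 * area
toy <- data.frame(area = c(1000, 2000, 3000, 4000))
toy$co2 <- 2 + 0.01 * toy$area

# Fit the linear model co2 ~ area
fit <- lm(co2 ~ area, data = toy)

coef(fit)    # intercept and slope of the fitted line
fitted(fit)  # one fitted co2 value per row, used as the trendline's y values
```

With real data the points scatter around the line, and fitted() gives the height of the line at each building's area.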

The Plot and Interactive Tool

Below is the plot. The tool includes checkboxes for selecting one or more boroughs, and the plot updates automatically. Try it out!
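The borough selection can be sketched in plain R, without the app itself. The helper name filter_boroughs, the column name borough, and the toy values are all assumptions for illustration. In a Shiny app, the vector of selected boroughs would typically come from a checkbox input.

```r
# Hypothetical helper: keep rows whose borough is among the selected ones.
# The column name "borough" is an assumption about the data set.
filter_boroughs <- function(data, selected) {
  data[data$borough %in% selected, , drop = FALSE]
}

# Made-up data for illustration
toy <- data.frame(
  borough = c("Bronx", "Bronx", "Queens", "Queens", "Brooklyn"),
  area    = c(1000, 2000, 1500, 3000, 2500),
  co2     = c(10, 20, 18, 33, 24)
)

# Simulate a user ticking two boxes
chosen <- filter_boroughs(toy, c("Bronx", "Queens"))

# Refit the line on the selected rows only
fit <- lm(co2 ~ area, data = chosen)
nrow(chosen)
coef(fit)
```

Refitting on the filtered rows is what lets the trendline change as boroughs are added or removed.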