Data Preparation

City of Chicago - Locations of Array of Things sensor nodes

  • Give an overview of the sensor deployments, some of which are already live and some still planned, and locate the sensors on a map.
  • Recode the values in Status: Live becomes True and Planned becomes False.
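# Packages used in this section: dplyr (pipes, glimpse), ggplot2 and ggmap (plots), knitr (kable)
library(dplyr)
library(ggplot2)
library(ggmap)
library(knitr)

# Load the sensor locations
array_of_things_locations_data<-read.csv("https://raw.githubusercontent.com/hovig/MSDS_CUNY/master/DATA606/Project-Proposal/array-of-things-locations-1.csv")

# Recode Status: "Live" becomes "True", "Planned" becomes "False"
array_of_things_locations_data$Status<-as.character(array_of_things_locations_data$Status)
array_of_things_locations_data$Status[array_of_things_locations_data$Status=="Live"]<-"True"
array_of_things_locations_data$Status[array_of_things_locations_data$Status=="Planned"]<-"False"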

# Count the number of locations per status
status_of_things<-array_of_things_locations_data %>% 
                    group_by(Status) %>%
                    summarise(count=n())

# Data frame used for the bar chart of counts per status
dat <- data.frame(
    status = factor(status_of_things$Status, levels=status_of_things$Status),
    count = status_of_things$count
)

# Add random jitter (up to +/- 0.3 degrees) to the coordinates and round to two decimals
df<-round(data.frame(
  x = jitter(array_of_things_locations_data$Longitude, amount = .3),
  y = jitter(array_of_things_locations_data$Latitude, amount = .3)), 
  digits = 2)
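
The recoding step described above can also be sketched on a tiny, made-up data frame. The three rows below are hypothetical placeholders, not rows from the City of Chicago file, and the names `toy` and `status_lookup` are introduced only for illustration. The sketch uses a named lookup vector, which is one possible alternative to the two assignments used above, not the method used in this report.

```r
# Hypothetical toy data: these rows are illustrative, not taken from the real file
toy <- data.frame(
  Name   = c("node_a", "node_b", "node_c"),
  Status = c("Live", "Planned", "Live"),
  stringsAsFactors = FALSE
)

# Named lookup vector: original label -> recoded label
status_lookup <- c(Live = "True", Planned = "False")

# Index the lookup by the Status column; unname() drops the labels
toy$Status <- unname(status_lookup[toy$Status])

print(toy)
```

A lookup vector keeps the whole mapping in one place, which may be easier to extend if further status values appear in the data.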

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

What is the status of the plans for the City of Chicago's Array of Things project, and where are the sensors located?

Cases

What are the cases, and how many are there?

Each case represents a sensor. There are 41 observations in the given data set.

Data collection

Describe the method of data collection.

Type of study

What type of study is this (observational/experiment)?

This is an observational study.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

The data was collected by the City of Chicago and can be found here:

Response

What is the response variable, and what type is it (numerical/categorical)?

The response variable is Status, and it is categorical.

Explanatory

What is the explanatory variable, and what type is it (numerical/categorical)?

The explanatory variables are the count and the geolocation (latitude and longitude); both are numerical.

Relevant summary statistics

Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

glimpse(array_of_things_locations_data)
## Observations: 41
## Variables: 8
## $ Name          <fct> Ashland Av - Division St , Wabansia - Milwaukee,...
## $ Location.Type <fct> CDOT Placemaking Project, CDOT Placemaking Proje...
## $ Category      <fct> Urban Placemaking, Urban Placemaking, Urban Plac...
## $ Notes         <fct> , , , , , , , , , , , single node, Single node w...
## $ Status        <chr> "False", "False", "False", "False", "False", "Fa...
## $ Latitude      <dbl> 41.90351, 41.91235, 41.91409, 41.89200, 41.83866...
## $ Longitude     <dbl> -87.66716, -87.68214, -87.68302, -87.61164, -87....
## $ Location      <fct> (41.9035068, -87.6671648), (41.9123537, -87.6821...
kable(status_of_things)
Status   count
False       29
True        12
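
The counts in the table above can also be expressed as proportions, which speaks more directly to the research question about the status of the plans. The following is a minimal sketch that rebuilds the two counts by hand; the name `status_counts` and the `proportion` column are assumptions introduced here, not part of the original analysis.

```r
# Counts copied from the table above ("False" = planned, "True" = live)
status_counts <- data.frame(
  Status = c("False", "True"),
  count  = c(29, 12)
)

# Share of each status among all 41 locations
status_counts$proportion <- status_counts$count / sum(status_counts$count)

print(status_counts)
```

With these counts, roughly 71% of the locations are still planned and roughly 29% are live.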
# Bar chart of the number of locations per status
ggplot(data=dat, aes(x=status, y=count)) + 
    geom_bar(colour="black", fill="#DD8888", width=.8, stat="identity") + 
    xlab("Status type") + ylab("Status count per type") +
    ggtitle("Chicago's planning status")

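# NOTE: `map` is not defined in the code shown here; it has to be created beforehand (for example with ggmap's get_map())
ggmap(map, extent = 'device')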
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead
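
Because the map object used above is not defined in this section, the sensor locations can also be inspected without map tiles. The following is a minimal sketch, assuming only ggplot2 is available; it uses the first three coordinate pairs printed in the glimpse() output, and the names `sensors` and `p` are introduced here for illustration. It is a plain scatter plot, not a reproduction of the ggmap figure.

```r
library(ggplot2)

# First three coordinates and statuses as printed in the glimpse() output
sensors <- data.frame(
  Longitude = c(-87.66716, -87.68214, -87.68302),
  Latitude  = c(41.90351, 41.91235, 41.91409),
  Status    = c("False", "False", "False")
)

# Scatter plot of the locations, coloured by status; no map tiles are downloaded
p <- ggplot(sensors, aes(x = Longitude, y = Latitude, colour = Status)) +
  geom_point(size = 3) +
  coord_quickmap() +
  xlab("Longitude") + ylab("Latitude") +
  ggtitle("Array of Things sensor locations (sample)")

print(p)
```

Swapping `sensors` for the full data frame would plot all 41 locations; `coord_quickmap()` only approximates a map projection, which is usually adequate at city scale.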