City of Chicago - Locations of Array of Things Sensor Nodes

Part 1 - Introduction

The Array of Things (AoT) is an urban sensing project, a network of interactive, modular sensor boxes that will be installed around Chicago to collect real-time data on the city’s environment, infrastructure, and activity for research and public use.

A total of 500 nodes will be mounted around the city over the next two to three years. The first prototype nodes were installed in summer 2016 and in 2017, and more will be installed throughout 2018.

This final project addresses two questions:

  • What is the status of plans for the AoT project?
  • Where are the installed sensors located?

Part 2 - Data

  • Give an overview of the project locations, some of which are live and some still planned, and locate the sensors on a map.
  • Recode the values in Status: Live becomes True and Planned becomes False.
library(dplyr)    # %>%, group_by(), summarise(), glimpse()
library(ggplot2)  # ggplot()
library(knitr)    # kable()
library(ggmap)    # ggmap()

array_of_things_locations_data<-read.csv("https://raw.githubusercontent.com/hovig/MSDS_CUNY/master/DATA606/Project-Proposal/array-of-things-locations-1.csv")

# Recode Status as character: "Live" becomes "True", "Planned" becomes "False"
array_of_things_locations_data$Status<-as.character(array_of_things_locations_data$Status)
array_of_things_locations_data$Status[array_of_things_locations_data$Status=="Live"]<-"True"
t<-array_of_things_locations_data$Status[array_of_things_locations_data$Status=="True"]
array_of_things_locations_data$Status[array_of_things_locations_data$Status=="Planned"]<-"False"
f<-array_of_things_locations_data$Status[array_of_things_locations_data$Status=="False"]

status_of_things<-array_of_things_locations_data %>% 
                    group_by(Status) %>%
                    summarise(count=n())

dat <- data.frame(
    status = factor(status_of_things$Status, levels=status_of_things$Status),
    count = status_of_things$count
)

df<-round(data.frame(
  x = jitter(array_of_things_locations_data$Longitude, amount = .3),
  y = jitter(array_of_things_locations_data$Latitude, amount = .3)), 
  digits = 2)
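
The recoding and counting steps above can also be tried on a small in-memory data frame, without downloading the CSV. This is only an illustrative sketch in base R: the toy values are made up, not rows from the real data set, and it assumes Status only ever takes the values Live and Planned.

```r
# Toy data standing in for the Status column (hypothetical values)
toy <- data.frame(
  Status = c("Live", "Planned", "Planned", "Live", "Planned"),
  stringsAsFactors = FALSE
)

# Recode in one step: "Live" -> "True", everything else -> "False"
toy$Status <- ifelse(toy$Status == "Live", "True", "False")

# Count nodes per status, similar in spirit to group_by() + summarise(n())
status_counts <- as.data.frame(table(Status = toy$Status), stringsAsFactors = FALSE)
names(status_counts)[2] <- "count"
print(status_counts)
```

Unlike the two separate assignments above, ifelse() maps every value other than Live to False, so it should only be used when no other status values are present.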

Part 3 - Exploratory data analysis

  • Each case represents a sensor. There are 41 observations in the given data set.
glimpse(array_of_things_locations_data)
## Observations: 41
## Variables: 8
## $ Name          <fct> Ashland Av - Division St , Wabansia - Milwaukee,...
## $ Location.Type <fct> CDOT Placemaking Project, CDOT Placemaking Proje...
## $ Category      <fct> Urban Placemaking, Urban Placemaking, Urban Plac...
## $ Notes         <fct> , , , , , , , , , , , single node, Single node w...
## $ Status        <chr> "False", "False", "False", "False", "False", "Fa...
## $ Latitude      <dbl> 41.90351, 41.91235, 41.91409, 41.89200, 41.83866...
## $ Longitude     <dbl> -87.66716, -87.68214, -87.68302, -87.61164, -87....
## $ Location      <fct> (41.9035068, -87.6671648), (41.9123537, -87.6821...
kable(status_of_things)
Status   count
False       29
True        12
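
To put the counts in the table above in proportion, the share of each status can be computed directly. A minimal sketch using only the two counts reported above:

```r
# Counts taken from the status table above
counts <- c(False = 29, True = 12)

# Share of each status as a percentage of all listed nodes
shares <- round(100 * counts / sum(counts), 1)
print(shares)
```

By this calculation, a little under a third of the listed nodes were live in this snapshot of the data.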
# Bar chart of node counts per status; the bars use a fixed fill colour,
# so no fill aesthetic or legend is needed (dat has no `time` column)
ggplot(data=dat, aes(x=status, y=count)) + 
    geom_bar(colour="black", fill="#DD8888", width=.8, stat="identity") + 
    xlab("Status type") + ylab("Status count per type") +
    ggtitle("Chicago's planning status")

Longitude<-df$x
Latitude<-df$y
ggplot(df, aes(x=Longitude, y=Latitude)) + geom_point() + stat_smooth(method="lm", se=FALSE)

  • Feed the same geolocation values into a base R scatterplot with a fitted linear regression line (note that the axes are swapped relative to the plot above).
plot(Longitude~Latitude, data=df)
abline(lm(Longitude~Latitude, data=df))
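
Note that df was built with jitter(..., amount = .3) and then rounded to two decimals, so the points plotted above are not the raw coordinates. The sketch below shows how far that step can move a point. It uses the four complete coordinate pairs visible in the glimpse() output and an arbitrary seed, which is an assumption made only for reproducibility. Since one degree of latitude is about 111 km, a shift of 0.3 degrees is roughly 33 km, so some of the scatter in these plots may come from the jitter rather than from the source data.

```r
# First four coordinate pairs as printed by glimpse() above
lat <- c(41.90351, 41.91235, 41.91409, 41.89200)
lon <- c(-87.66716, -87.68214, -87.68302, -87.61164)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Same transformation as used to build df:
# uniform noise in [-0.3, 0.3] degrees, then rounding to two decimals
jittered <- round(data.frame(
  x = jitter(lon, amount = .3),
  y = jitter(lat, amount = .3)),
  digits = 2)

# Displacement of each point in degrees
shift <- data.frame(dx = jittered$x - lon, dy = jittered$y - lat)
print(shift)

# Rough north-south displacement in km (about 111 km per degree of latitude)
print(round(abs(shift$dy) * 111, 1))
```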

# `map` is a ggmap base-map object that is not created in the code shown here
ggmap(map, extent = 'device')
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead

Part 4 - Conclusion

This is an observational study that maps 41 nodes (devices) and identifies which ones are Live (True in this project) and which are still Planned (False).

The response variable in this study is the status, which is categorical. The explanatory variables are the count and the geolocation (longitude, latitude), which are numerical.

It would be worthwhile to continue this study once the City of Chicago completes the project, so that we can examine the spread of all 500 nodes and the data they will be streaming.

Comparing the two scatterplots with fitted regression lines above, we can conclude that the geolocation data need to be more accurate. We are not sure whether the coordinates were entered into the data set manually or read from the nodes themselves.
