The goal of this assignment is to give you practice in preparing different datasets for downstream analysis work.
Your task is to:
(1) Choose any three of the “wide” datasets identified in the Week 6 Discussion items. (You may use your own dataset; please don’t use my Sample Post dataset, since that was used in your Week 6 assignment!) For each of the three chosen datasets:
(2) Please include in your homework submission, for each of the three chosen datasets:
# Load the packages used throughout this report
library(dplyr)
library(tidyr)
library(knitr)
library(ggplot2)
library(ggmap)
# List of IoT Platforms
iot_platform_data <- read.csv("https://raw.githubusercontent.com/hovig/MSDS_CUNY/master/Data607/Project2/IoT%20Platforms.csv")
# Ride Austin
ride_data_1 <- read.csv("https://raw.githubusercontent.com/hovig/MSDS_CUNY/master/Data607/Project2/merged_ride_weather_data_1.csv")
ride_data_2 <- read.csv("https://raw.githubusercontent.com/hovig/MSDS_CUNY/master/Data607/Project2/merged_ride_weather_data_2.csv")
# City of Chicago - Locations of Array of Things sensor nodes
array_of_things_locations_data <- read.csv("https://raw.githubusercontent.com/hovig/MSDS_CUNY/master/Data607/Project2/array-of-things-locations-1.csv")
List of IoT Platforms
An overview of the IoT platforms data, followed by a check of which companies offer on-premises services.
glimpse(iot_platform_data)
## Observations: 34
## Variables: 7
## $ Company <fct> ClearBlade, Amazon, PTC, IBM, Microsoft, Ayla Ne...
## $ IoT.Platform <fct> ClearBlade IoT Platform, AWS IoT, Thingworx, Wat...
## $ Edge.Platform <fct> ClearBlade Edge, AWS Greengrass, none, none, non...
## $ SaaS.Cloud <fct> Google, AWS, AWS, Bluemix, Azure, AWS, Tradition...
## $ Any.Cloud <fct> Yes, No, No, No, No, No, No, , , , No, , No, , ,...
## $ On.Prem <fct> Yes, No, No, Yes, No, No, No, No, No, No, No, , ...
## $ Notes <fct> , , , , , , Cellular-based., SIP (system in pack...
iot_platform_data %>%
summarise(companies_total_count=n()) %>%
kable()
| companies_total_count |
|---|
| 34 |
iot_platform_data %>%
filter(On.Prem == "Yes") %>%
kable()
| Company | IoT.Platform | Edge.Platform | SaaS.Cloud | Any.Cloud | On.Prem | Notes |
|---|---|---|---|---|---|---|
| ClearBlade | ClearBlade IoT Platform | ClearBlade Edge | Google | Yes | Yes | |
| IBM | Watson IoT Platform | none | Bluemix | No | Yes | |
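The `glimpse()` output above shows that some `On.Prem` entries are blank, so a tally of every value can be a useful companion to the filter. The following is a minimal sketch on a made-up data frame (`toy_platforms` and its company names are hypothetical, not rows from the IoT platforms file). It assumes blanks should be treated as missing.

```r
library(dplyr)

# Hypothetical toy data: company names and values are illustrative only
toy_platforms <- data.frame(
  Company = c("A", "B", "C", "D"),
  On.Prem = c("Yes", "No", "", "Yes"),
  stringsAsFactors = FALSE
)

# Treat blank strings as missing, then count each distinct value
on_prem_counts <- toy_platforms %>%
  mutate(On.Prem = na_if(On.Prem, "")) %>%
  count(On.Prem)

print(on_prem_counts)
```

Counting before filtering makes it visible how many rows would be silently excluded by `On.Prem == "Yes"` because the value is blank rather than "No".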
Ride Austin
An overview of the data from Ride Austin (a transportation company based in Austin, TX), followed by a map of riders' starting locations.
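# Stack the two Ride Austin extracts into one data frame
ride_data <- rbind(ride_data_1, ride_data_2)
# Number of rides per rider rating
riders_of_ride_data <- ride_data %>%
  group_by(rider_rating) %>%
  summarise(count = n())
# Reshape the fare columns from wide to long: one row per ride and fare type
riders_fares_ride_data <- ride_data %>%
  gather(fares, n, base_fare:time_fare) %>%
  # Meant to strip "fares" from the key names; none of the names shown in the
  # output below contain it, so they stay unchanged
  mutate(fares = gsub("fares", "", fares)) %>%
  arrange(rider_id, rider_rating) %>%
  # Keep columns 53 to 55 by position (rating, fares, n in the output below)
  select(53, 54, 55)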
head(riders_fares_ride_data,10)
## rating fares n
## 1 5 base_fare 1.50
## 2 5 base_fare 1.50
## 3 5 base_fare 1.50
## 4 5 total_fare 25.84
## 5 5 total_fare 5.00
## 6 5 total_fare 5.00
## 7 5 rate_per_mile 1.50
## 8 5 rate_per_mile 1.50
## 9 5 rate_per_mile 1.50
## 10 5 rate_per_minute 0.25
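The `gather()` call above reshapes the fare columns from wide to long. In current tidyr releases `gather()` is superseded by `pivot_longer()`. Below is a minimal sketch of the same kind of reshape on a made-up three-row data frame. The `toy_rides` values and its reduced set of columns are invented for illustration and are not taken from the Ride Austin data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per ride, one column per fare component
toy_rides <- data.frame(
  rider_id     = c(1, 2, 3),
  rider_rating = c(5, 5, 4),
  base_fare    = c(1.50, 1.50, 1.50),
  total_fare   = c(25.84, 5.00, 7.25)
)

# Wide to long: every fare column becomes a (fares, n) key/value pair
toy_long <- toy_rides %>%
  pivot_longer(cols = base_fare:total_fare,
               names_to = "fares",
               values_to = "n") %>%
  arrange(rider_id, fares)

print(toy_long)
```

Unlike `gather()`, `pivot_longer()` emits its rows ride by ride rather than column by column, so the explicit `arrange()` is what fixes the final order here.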
kable(arrange(riders_of_ride_data, desc(rider_rating)))
| rider_rating | count |
|---|---|
| 5 | 56 |
| 4 | 1 |
| 1 | 1 |
refine_ride_data<-ride_data %>%
group_by(end_location_long,end_location_lat) %>%
summarize(count=n())
glimpse(refine_ride_data)
## Observations: 48
## Variables: 3
## $ end_location_long <dbl> -121.039, -97.788, -97.781, -97.776, -97.772...
## $ end_location_lat <dbl> 38.676, 30.258, 30.242, 30.236, 30.202, 30.2...
## $ count <int> 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,...
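# `map` is assumed to have been created beforehand (for example with ggmap's get_map()); that step is not shown here
ggmap(map, extent = 'device')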
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead
City of Chicago - Locations of Array of Things sensor nodes
An overview of the sensor node locations, some of which are already live while others are still planned, followed by a map of the sensors.
glimpse(array_of_things_locations_data)
## Observations: 41
## Variables: 8
## $ Name <fct> Ashland Av - Division St , Wabansia - Milwaukee,...
## $ Location.Type <fct> CDOT Placemaking Project, CDOT Placemaking Proje...
## $ Category <fct> Urban Placemaking, Urban Placemaking, Urban Plac...
## $ Notes <fct> , , , , , , , , , , , single node, Single node w...
## $ Status <fct> Planned, Planned, Planned, Planned, Planned, Pla...
## $ Latitude <dbl> 41.90351, 41.91235, 41.91409, 41.89200, 41.83866...
## $ Longitude <dbl> -87.66716, -87.68214, -87.68302, -87.61164, -87....
## $ Location <fct> (41.9035068, -87.6671648), (41.9123537, -87.6821...
status_of_things<-array_of_things_locations_data %>%
group_by(Status) %>%
summarise(count=n())
kable(status_of_things)
| Status | count |
|---|---|
| Live | 12 |
| Planned | 29 |
dat <- data.frame(
status = factor(status_of_things$Status, levels=status_of_things$Status),
count = status_of_things$count
)
# Bar chart of node counts per status; the fill is a fixed colour, so no fill mapping or legend is needed
ggplot(data = dat, aes(x = status, y = count)) +
  geom_bar(colour = "black", fill = "#DD8888", width = .8, stat = "identity") +
  xlab("Status type") + ylab("Status count per type") +
  ggtitle("Chicago's planning status")
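# `map` is assumed to have been created beforehand (for example with ggmap's get_map()); that step is not shown here
ggmap(map, extent = 'device')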
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead
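When map tiles are not at hand, the `Latitude` and `Longitude` columns can still be plotted directly with ggplot2. The following is a rough sketch under that assumption, not the method used for the map above. The `toy_nodes` data frame is hypothetical: its coordinates and statuses are illustrative and do not reproduce the Array of Things file.

```r
library(ggplot2)

# Hypothetical toy data: coordinates and statuses are illustrative only
toy_nodes <- data.frame(
  Latitude  = c(41.9035, 41.9124, 41.8920),
  Longitude = c(-87.6672, -87.6821, -87.6116),
  Status    = c("Planned", "Planned", "Live")
)

# Plot each node at its coordinates, coloured by status
node_plot <- ggplot(toy_nodes, aes(x = Longitude, y = Latitude, colour = Status)) +
  geom_point(size = 3) +
  coord_quickmap() +
  ggtitle("Sensor node locations (toy data)")

print(node_plot)
```

`coord_quickmap()` keeps the aspect ratio roughly true for a small area such as a single city, which is usually enough for a quick look at where live and planned nodes cluster.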