The goal of this assignment is to give you practice in preparing different datasets for downstream analysis work.

Your task is to:
(1) Choose any three of the “wide” datasets identified in the Week 6 Discussion items. (You may use your own dataset; please don’t use my Sample Post dataset, since that was used in your Week 6 assignment!) For each of the three chosen datasets:


(2) Please include in your homework submission, for each of the three chosen datasets:

Datasets used:
- List of IoT Platforms
- Ride Austin
- City of Chicago - Locations of Array of Things sensor nodes
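# Packages used below: dplyr (pipes, glimpse), tidyr (gather), knitr (kable), ggplot2 and ggmap (plots and maps)
library(dplyr)
library(tidyr)
library(knitr)
library(ggplot2)
library(ggmap)

# List of IoT Platforms
iot_platform_data <- read.csv("https://raw.githubusercontent.com/hovig/MSDS_CUNY/master/Data607/Project2/IoT%20Platforms.csv")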

# Ride Austin
ride_data_1<-read.csv("https://raw.githubusercontent.com/hovig/MSDS_CUNY/master/Data607/Project2/merged_ride_weather_data_1.csv")
ride_data_2<-read.csv("https://raw.githubusercontent.com/hovig/MSDS_CUNY/master/Data607/Project2/merged_ride_weather_data_2.csv")

# City of Chicago - Locations of Array of Things sensor nodes
array_of_things_locations_data<-read.csv("https://raw.githubusercontent.com/hovig/MSDS_CUNY/master/Data607/Project2/array-of-things-locations-1.csv")

List of IoT Platforms
An overview of the IoT platforms data, followed by a check of which companies offer on-premises services.

glimpse(iot_platform_data)
## Observations: 34
## Variables: 7
## $ Company       <fct> ClearBlade, Amazon, PTC, IBM, Microsoft, Ayla Ne...
## $ IoT.Platform  <fct> ClearBlade IoT Platform, AWS IoT, Thingworx, Wat...
## $ Edge.Platform <fct> ClearBlade Edge, AWS Greengrass, none, none, non...
## $ SaaS.Cloud    <fct> Google, AWS, AWS, Bluemix, Azure, AWS, Tradition...
## $ Any.Cloud     <fct> Yes, No, No, No, No, No, No, , , , No, , No, , ,...
## $ On.Prem       <fct> Yes, No, No, Yes, No, No, No, No, No, No, No, , ...
## $ Notes         <fct> , , , , , , Cellular-based., SIP (system in pack...
iot_platform_data %>%
  summarise(companies_total_count=n()) %>%
  kable()
companies_total_count
34
iot_platform_data %>%
  filter(On.Prem == "Yes") %>%
  kable()
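|Company    |IoT.Platform            |Edge.Platform   |SaaS.Cloud |Any.Cloud |On.Prem |Notes |
|:----------|:-----------------------|:---------------|:----------|:---------|:-------|:-----|
|ClearBlade |ClearBlade IoT Platform |ClearBlade Edge |Google     |Yes       |Yes     |      |
|IBM        |Watson IoT Platform     |none            |Bluemix    |No        |Yes     |      |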
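
The glimpse() output above shows blank entries in On.Prem, so filtering on "Yes" excludes both the "No" rows and the unanswered ones. A minimal sketch of making that distinction explicit follows; the `platforms` data frame and its values are hypothetical stand-ins, not rows from the CSV.

```r
# Hypothetical toy data; values are illustrative, not taken from the CSV.
platforms <- data.frame(
  Company = c("A", "B", "C", "D"),
  On.Prem = c("Yes", "No", "", "Yes"),
  stringsAsFactors = FALSE
)

# Recode blank strings as NA so that missing answers are explicit.
platforms$On.Prem[platforms$On.Prem == ""] <- NA

# Tabulate all three cases: "No", "Yes", and missing.
on_prem_counts <- table(platforms$On.Prem, useNA = "ifany")
print(on_prem_counts)

# Rows with a confirmed "Yes"; which() drops the NA comparisons.
on_prem_yes <- platforms[which(platforms$On.Prem == "Yes"), ]
print(on_prem_yes)
```

Tabulating before filtering shows how many companies gave no answer at all, which a plain count of the filtered rows would hide.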

Ride Austin
An overview of the Ride Austin data (Ride Austin is a transportation company based in Austin, TX), followed by a map of riders' starting locations.

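# Combine the two extracts into one data frame (both must share the same columns)
ride_data <- rbind(ride_data_1, ride_data_2)

# Number of rides per rider rating
riders_of_ride_data <- ride_data %>%
  group_by(rider_rating) %>%
  summarise(count = n())

# Reshape the fare columns from wide to long: one row per ride and fare component
riders_fares_ride_data <- ride_data %>%
  gather(fares, n, base_fare:time_fare) %>%
  mutate(fares = gsub("fares", "", fares)) %>%
  arrange(rider_id, rider_rating) %>%
  select(53, 54, 55)  # keeps three columns by position, so it depends on the CSV's column order
head(riders_fares_ride_data, 10)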
##    rating           fares     n
## 1       5       base_fare  1.50
## 2       5       base_fare  1.50
## 3       5       base_fare  1.50
## 4       5      total_fare 25.84
## 5       5      total_fare  5.00
## 6       5      total_fare  5.00
## 7       5   rate_per_mile  1.50
## 8       5   rate_per_mile  1.50
## 9       5   rate_per_mile  1.50
## 10      5 rate_per_minute  0.25
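
The wide-to-long reshaping used above can also be sketched with pivot_longer(), which supersedes gather() in tidyr 1.0.0 and later. The `rides` data frame below is a hypothetical toy example: its column names mirror some of the fare columns in the output, but it is not the Ride Austin data.

```r
library(tidyr)

# Hypothetical toy rides; column names mirror fare columns shown above,
# but the rows are illustrative only.
rides <- data.frame(
  rider_id = c(1, 2),
  base_fare = c(1.5, 1.5),
  total_fare = c(25.84, 5.00),
  rate_per_mile = c(1.5, 1.5)
)

# One row per (ride, fare component) instead of one column per component.
rides_long <- pivot_longer(
  rides,
  cols = base_fare:rate_per_mile,
  names_to = "fares",
  values_to = "n"
)
print(rides_long)
```

Naming the key and value columns through `names_to` and `values_to` keeps the call readable, and selecting the resulting columns by name afterwards avoids relying on column positions.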
riders_of_ride_data %>%
  arrange(desc(rider_rating)) %>%
  kable()
rider_rating count
5 56
4 1
1 1
refine_ride_data<-ride_data %>%
                    group_by(end_location_long,end_location_lat) %>%
                    summarize(count=n())
glimpse(refine_ride_data)
## Observations: 48
## Variables: 3
## $ end_location_long <dbl> -121.039, -97.788, -97.781, -97.776, -97.772...
## $ end_location_lat  <dbl> 38.676, 30.258, 30.242, 30.236, 30.202, 30.2...
## $ count             <int> 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,...
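# `map` must already exist at this point (for example, created with ggmap's get_map()); its definition is not shown here
ggmap(map, extent = 'device')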
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead
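
Because the `map` object passed to ggmap() is not defined in the code shown, here is a minimal sketch of plotting the grouped locations with ggplot2 alone, without fetching basemap tiles. It uses the first four rows visible in the glimpse() output above; the `drop_offs` name is an assumption made for illustration.

```r
library(ggplot2)

# First four rows of the glimpse() output above (coordinates as printed there).
drop_offs <- data.frame(
  end_location_long = c(-121.039, -97.788, -97.781, -97.776),
  end_location_lat  = c(38.676, 30.258, 30.242, 30.236),
  count             = c(3, 1, 1, 1)
)

# Plot each location, sized by its number of rides; no basemap is drawn.
p <- ggplot(drop_offs,
            aes(x = end_location_long, y = end_location_lat, size = count)) +
  geom_point(alpha = 0.6) +
  coord_quickmap() +
  labs(x = "Longitude", y = "Latitude", size = "Rides")
print(p)
```

Even without a basemap, a plot like this makes it easy to spot that the first location (longitude -121.039) lies far west of the Austin area, where the other points cluster around longitude -97.8.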

City of Chicago - Locations of Array of Things sensor nodes
An overview of the sensor node sites, some of which are already live and some still planned, followed by a map of the sensor locations.

glimpse(array_of_things_locations_data)
## Observations: 41
## Variables: 8
## $ Name          <fct> Ashland Av - Division St , Wabansia - Milwaukee,...
## $ Location.Type <fct> CDOT Placemaking Project, CDOT Placemaking Proje...
## $ Category      <fct> Urban Placemaking, Urban Placemaking, Urban Plac...
## $ Notes         <fct> , , , , , , , , , , , single node, Single node w...
## $ Status        <fct> Planned, Planned, Planned, Planned, Planned, Pla...
## $ Latitude      <dbl> 41.90351, 41.91235, 41.91409, 41.89200, 41.83866...
## $ Longitude     <dbl> -87.66716, -87.68214, -87.68302, -87.61164, -87....
## $ Location      <fct> (41.9035068, -87.6671648), (41.9123537, -87.6821...
status_of_things<-array_of_things_locations_data %>% 
                    group_by(Status) %>%
                    summarise(count=n())
kable(status_of_things)
Status count
Live 12
Planned 29
dat <- data.frame(
    status = factor(status_of_things$Status, levels=status_of_things$Status),
    count = status_of_things$count
)

# Bar chart of node counts by status; the bar fill is a fixed colour, so no fill aesthetic or legend is needed
ggplot(data = dat, aes(x = status, y = count)) +
    geom_bar(colour = "black", fill = "#DD8888", width = .8, stat = "identity") +
    xlab("Status type") + ylab("Status count per type") +
    ggtitle("Chicago's planning status")

# `map` here must be a basemap covering the Chicago sensor locations; its definition is not shown
ggmap(map, extent = 'device')
## Warning: `panel.margin` is deprecated. Please use `panel.spacing` property
## instead