Assignment:-
For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.
For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.
For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing.
Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
I came across NYCflights package and it has flights related data.
This package contains information about all flights that departed from NYC (i.e., EWR, JFK and LGA) in 2013: 336,776 flights with 16 variables. To help understand what causes delays, it also includes a number of other useful datasets: weather, planes, airports, airlines.
I loaded the data into mySQL in my local and will read the data and from it into dataframes and load them to Mongodb.
#library(nycflights13)
library(mongolite)
library(jsonlite)
library(RMySQL)
## Loading required package: DBI
library(sys)
#Importing data into dataframes
mydb = dbConnect(MySQL(), user=Sys.getenv("userid"), password=Sys.getenv("pwd"), dbname='week13-flights', host='localhost')
dbListTables(mydb)
## [1] "airlines" "airports" "flights" "planes" "weather"
airlines <- dbReadTable(mydb, "airlines")
head(airlines)
## carrier name
## 1 9E Endeavor Air Inc.
## 2 AA American Airlines Inc.
## 3 AS Alaska Airlines Inc.
## 4 B6 JetBlue Airways
## 5 DL Delta Air Lines Inc.
## 6 EV ExpressJet Airlines Inc.
airports <- dbReadTable(mydb, "airports")
head(airports)
## faa name lat lon alt tz dst
## 1 04G Lansdowne Airport 41.13047 -80.61958 1044 -5 A
## 2 06A Moton Field Municipal Airport 32.46057 -85.68003 264 -6 A
## 3 06C Schaumburg Regional 41.98934 -88.10124 801 -6 A
## 4 06N Randall Airport 41.43191 -74.39156 523 -5 A
## 5 09J Jekyll Island Airport 31.07447 -81.42778 11 -5 A
## 6 0A9 Elizabethton Municipal Airport 36.37122 -82.17342 1593 -5 A
## tzone
## 1 America/New_York
## 2 America/Chicago
## 3 America/Chicago
## 4 America/New_York
## 5 America/New_York
## 6 America/New_York
flights <- dbReadTable(mydb, "flights")
head(flights)
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## arr_delay carrier flight tailnum origin dest air_time distance hour
## 1 11 UA 1545 N14228 EWR IAH 227 1400 5
## 2 20 UA 1714 N24211 LGA IAH 227 1416 5
## 3 33 AA 1141 N619AA JFK MIA 160 1089 5
## 4 -18 B6 725 N804JB JFK BQN 183 1576 5
## 5 -25 DL 461 N668DN LGA ATL 116 762 6
## 6 12 UA 1696 N39463 EWR ORD 150 719 5
## minute time_hour
## 1 15 2013-01-01 05:00:00
## 2 29 2013-01-01 05:00:00
## 3 40 2013-01-01 05:00:00
## 4 45 2013-01-01 05:00:00
## 5 0 2013-01-01 06:00:00
## 6 58 2013-01-01 05:00:00
planes <- dbReadTable(mydb, "planes")
head(planes)
## tailnum year type manufacturer model engines
## 1 N10156 2004 Fixed wing multi engine EMBRAER EMB-145XR 2
## 2 N102UW 1998 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
## 3 N103US 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
## 4 N104UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
## 5 N10575 2002 Fixed wing multi engine EMBRAER EMB-145LR 2
## 6 N105UW 1999 Fixed wing multi engine AIRBUS INDUSTRIE A320-214 2
## seats speed engine
## 1 55 NA Turbo-fan
## 2 182 NA Turbo-fan
## 3 182 NA Turbo-fan
## 4 182 NA Turbo-fan
## 5 55 NA Turbo-fan
## 6 182 NA Turbo-fan
weather <- dbReadTable(mydb, "weather")
head(weather)
## origin year month day hour temp dewp humid wind_dir wind_speed
## 1 EWR 2013 1 1 0 37.04 21.92 53.97 230 10.35702
## 2 EWR 2013 1 1 1 37.04 21.92 53.97 230 13.80936
## 3 EWR 2013 1 1 2 37.94 21.92 52.09 230 12.65858
## 4 EWR 2013 1 1 3 37.94 23.00 54.51 230 13.80936
## 5 EWR 2013 1 1 4 37.94 24.08 57.04 240 14.96014
## 6 EWR 2013 1 1 6 39.02 26.06 59.37 270 10.35702
## wind_gust precip pressure visib time_hour
## 1 11.91865 0 1013.9 10 2012-12-31 19:00:00
## 2 15.89154 0 1013.0 10 2012-12-31 20:00:00
## 3 14.56724 0 1012.6 10 2012-12-31 21:00:00
## 4 15.89154 0 1012.7 10 2012-12-31 22:00:00
## 5 17.21583 0 1012.8 10 2012-12-31 23:00:00
## 6 11.91865 0 1012.0 10 2013-01-01 01:00:00
Creating Data collection in mongo.
I have started mondo deamon on the machine prior to running this
mon_airlines <- mongo(collection = "airlines", db = "db")
mon_airlines$remove('{}') # Had to empty the collection to avoid duplicates as i ran the program multiple times
mon_airlines$insert(airlines)
## List of 5
## $ nInserted : num 16
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
mon_airlines$count() # Check count to ensure record numbers
## [1] 16
mon_airports <- mongo(collection = "airports", db = "db")
mon_airports$remove('{}') # Had to empty the collection to avoid duplicates as i ran the program multiple times
mon_airports$insert(airports)
## List of 5
## $ nInserted : num 1458
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
mon_airports$count() # Check count to ensure record numbers
## [1] 1458
mon_flights <- mongo(collection = "flights", db = "db")
mon_flights$remove('{}') # Had to empty the collection to avoid duplicates as i ran the program multiple times
mon_flights$insert(flights)
## List of 5
## $ nInserted : num 327346
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
mon_flights$count() # Check count to ensure record numbers
## [1] 327346
mon_planes <- mongo(collection = "planes", db = "db")
mon_planes$remove('{}') # Had to empty the collection to avoid duplicates as i ran the program multiple times
mon_planes$insert(planes)
## List of 5
## $ nInserted : num 3252
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
mon_planes$count() # Check count to ensure record numbers
## [1] 3252
mon_weather <- mongo(collection = "weather", db = "db")
mon_weather$remove('{}') # Had to empty the collection to avoid duplicates as i ran the program multiple times
mon_weather$insert(weather)
## List of 5
## $ nInserted : num 23021
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
mon_weather$count() # Check count to ensure record numbers
## [1] 23021