NoSQL Migration

The objective of this project is to migrate data from Relational Database (MySQL) to NoSQL Database (MongoDB):
The datasets for this project is downloaded from http://openflights.org/data.html. The data consists of flights information for all airlines in the world. For the purpose of this project, I’ve selected all flights (airlines, airports, and routes) that have source and destination in the United States.
library(RMySQL)
## Warning: package 'RMySQL' was built under R version 3.3.2
## Loading required package: DBI
library(mongolite)
## Warning: package 'mongolite' was built under R version 3.3.2
(1) Download datasets from http://openflights.org. Clean and prepare it to be imported to MySQL database tables airlines, airports and routes. The structure of the database tables is shown here.
(1.a) Create MySQL Tables

Airlines_Table_Create_Script

Airports_Table_Create_Script

Routes_Table_Create_Script

(1.b) Load data into Airlines Table.

Table_Load_Script

(2) MySQL Database Schema
(2.a) Airlines/Airports/Routes Table Structure.

Tables

(2.b) Airlines Table Data

Airlines

(2.c) Airports Table Data

Airports

(2.d) Routes Table Data

Routes

(3) Connect to MySQL database and fetch data from airlines, airports and routes tables:
(3.a) Read data into 3 different data frames airlines_df, airports_d and routes_df.
flightsDB = dbConnect(MySQL(), dbname='flights', host='localhost', user='root', password='root')
dbListTables(flightsDB)
## [1] "airlines" "airports" "routes"
airlines_rs = dbSendQuery(flightsDB, "select * from airlines")
airlines_df = fetch(airlines_rs, n=-1)

airports_rs = dbSendQuery(flightsDB, "select * from airports")
## Warning in .local(conn, statement, ...): Decimal MySQL column 6 imported as numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 7 imported as numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 8 imported as numeric
## Warning in .local(conn, statement, ...): Decimal MySQL column 9 imported as numeric
airports_df = fetch(airports_rs, n=-1)

routes_rs = dbSendQuery(flightsDB, "select * from routes")
## Warning in .local(conn, statement, ...): Decimal MySQL column 7 imported as numeric
routes_df = fetch(routes_rs, n=-1)
(3.b) Check the size of data frames to make sure it retrieved all data from the database tables.
nrow(airlines_df)
## [1] 93
nrow(airports_df)
## [1] 153
nrow(routes_df)
## [1] 563
head(airlines_df, 10)
##    AIRLINE_ID                   NAME IATA_CODE ICAO_CODE     CALL_SIGN       COUNTRY ACTIVE
## 1          10            40 Mile Air        Q5       MLA      MILE-AIR United States      Y
## 2          22         Aloha Airlines        AQ       AAH         ALOHA United States      Y
## 3          24      American Airlines        AA       AAL      AMERICAN United States      Y
## 4          35          Allegiant Air        G4       AAY     ALLEGIANT United States      Y
## 5         109 Alaska Central Express        KO       AER       ACE AIR United States      Y
## 6         149     Air Cargo Carriers        2Q       SNC   NIGHT CARGO United States      Y
## 7         281  America West Airlines        HP       AWE        CACTUS United States      Y
## 8         282          Air Wisconsin        ZW       AWI AIR WISCONSIN United States      Y
## 9         397              Arrow Air        JW       APW         BIG A United States      Y
## 10        439        Alaska Airlines        AS       ASA        ALASKA United States      Y
head(airports_df, 10)
##    AIRPORT_ID                      NAME              CITY       COUNTRY IATA_FAA ICAO LATITUDE LONGITUDE ALTITUDE TIMEZONE DST     TIMEZONE_DATABASE
## 1        3433               Marina Muni          Fort Ord United States      OAR KOAR 36.68188 -121.7623      134       -8   A America/Los_Angeles\r
## 2        3434         Sacramento Mather        Sacramento United States      MHR KMHR 38.55390 -121.2976       96       -8   A America/Los_Angeles\r
## 3        3435          Bicycle Lake Aaf        Fort Irwin United States      BYS KBYS 35.28053 -116.6300     2350       -8   A America/Los_Angeles\r
## 4        3436      Twentynine Palms Eaf Twenty Nine Palms United States      NXP KNXP 34.29616 -116.1622     2051       -8   A America/Los_Angeles\r
## 5        3449                Travis Afb         Fairfield United States      SUU KSUU 38.26269 -121.9275       62       -8   A America/Los_Angeles\r
## 6        3453 Metropolitan Oakland Intl           Oakland United States      OAK KOAK 37.72128 -122.2207        9       -8   A America/Los_Angeles\r
## 7        3455         Port Angeles Cgas      Port Angeles United States      NOW KNOW 48.14148 -123.4141       13       -8   A America/Los_Angeles\r
## 8        3467              Spokane Intl           Spokane United States      GEG KGEG 47.61986 -117.5338     2376       -8   A America/Los_Angeles\r
## 9        3469        San Francisco Intl     San Francisco United States      SFO KSFO 37.61897 -122.3749       13       -8   A America/Los_Angeles\r
## 10       3484          Los Angeles Intl       Los Angeles United States      LAX KLAX 33.94254 -118.4081      126       -8   A America/Los_Angeles\r
head(routes_df, 10)
##    AIRLINE_CODE AIRLINE_ID SOURCE_AIRPORT_CODE SOURCE_AIRPORT_ID DEST_AIRPORT_CODE DEST_AIRPORT_ID CODESHARE STOPS EQUIPMENT
## 1            AA         24                 BUR              3644               PDX            3720         Y     0     CR7\r
## 2            AA         24                 BUR              3644               SEA            3577         Y     0     737\r
## 3            AA         24                 EUG              4099               LAX            3484         Y     0     CRJ\r
## 4            AA         24                 EUG              4099               PDX            3720         Y     0     DH4\r
## 5            AA         24                 FAT              3687               LAX            3484         Y     0     CRJ\r
## 6            AA         24                 LAS              3877               LAX            3484               0     738\r
## 7            AA         24                 LAX              3484               EUG            4099         Y     0     CRJ\r
## 8            AA         24                 LAX              3484               FAT            3687         Y     0     CRJ\r
## 9            AA         24                 LAX              3484               LAS            3877               0     738\r
## 10           AA         24                 LAX              3484               MFR            4101         Y     0     DH4\r
(4) Connect to MongoDB to import data into collections:
(4.a) Create 3 collections airlines, airports and routes.
airlines_collection = mongo(collection = "airlines", db="flights", url="mongodb://localhost")
airports_collection = mongo(collection = "airports", db="flights", url="mongodb://localhost")
routes_collection   = mongo(collection = "routes",   db="flights", url="mongodb://localhost")
(4.b) Insert data from airlines_df, airports_df and routes_df into MongoDB collections.
airlines_collection$insert(airlines_df)
## 
Complete! Processed total of 93 rows.
## $nInserted
## [1] 93
## 
## $nMatched
## [1] 0
## 
## $nRemoved
## [1] 0
## 
## $nUpserted
## [1] 0
## 
## $writeErrors
## list()
airports_collection$insert(airports_df)
## 
Complete! Processed total of 153 rows.
## $nInserted
## [1] 153
## 
## $nMatched
## [1] 0
## 
## $nRemoved
## [1] 0
## 
## $nUpserted
## [1] 0
## 
## $writeErrors
## list()
routes_collection$insert(routes_df)
## 
Complete! Processed total of 563 rows.
## $nInserted
## [1] 563
## 
## $nMatched
## [1] 0
## 
## $nRemoved
## [1] 0
## 
## $nUpserted
## [1] 0
## 
## $writeErrors
## list()
(4.c) Find airlines that have code WN.
airlines_collection$count()
## [1] 93
airlines_collection$find('{"IATA_CODE":"WN"}')
## 
 Found 1 records...
 Imported 1 records. Simplifying into dataframe...
##   AIRLINE_ID               NAME IATA_CODE ICAO_CODE CALL_SIGN       COUNTRY ACTIVE
## 1       4547 Southwest Airlines        WN       SWA SOUTHWEST United States      Y
(4.d) Find all distinct airports and find the airport with the code LAX.
airports_collection$count()
## [1] 153
airports_collection$distinct("IATA_FAA")
##   [1] "OAR" "MHR" "BYS" "NXP" "SUU" "OAK" "NOW" "GEG" "SFO" "LAX" "NID" "INS" "NKX" "HHR" "NTD" "EDW" "NLC" "GRF" "VCV" "BFL" "NUW" "SEA" "LGB" "TNX" "NZY" "MWH" "SPB" "BAB" "RIU" "BUR" "SAC" "SKA" "TCM" "MER" "MCC" "FAT" "IPL" "CXL" "CIC" "SFF" "PDX" "BFI" "SAN" "ONT" "NFL" "SJC" "PMD" "NJK" "BLI" "NUQ" "RAL" "SCK" "RNO" "VBG" "LSV" "RIV" "MOD" "SMF" "PAE" "PSP" "SNA" "LAS" "MMV" "PUW" "LWS" "MRY" "SBA" "VNY" "CLD" "RDD" "EUG" "MFR" "RDM" "BLH" "TVL" "NZJ" "ACV" "ALW" "CEC" "CLM" "EKO" "LMT" "OTH" "SBP" "SMX" "YKM" "PSC" "LPS" "PWT" "LKE" "FUL" "1RL" "PDT" "EAT" "OXR" "STS" "AVX" "MHV" "TOA" "ELY" "MMH" "FRD" "ESD" "AST" "ONP" "IYK" "VIS" "MCE" "BXS" "LVK" "MPI" "TRM" "SMO" "UDD" "OLM" "DWA" "APC" "SDM" "SBD" "SQL" "SEE" "TKF" "RBK" "SLE" "MYV" "CCR" "KLS" "HSH" "TTD" "HIO" "SPZ" "WHP" "MAE" "REI" "RIR" "TIW" "RNM" "LPC" "RNT" "POC" "PAO" "HWD" "NGZ" "L52" "LHM" "COE" "W04" "55S" "S40" "CLS" "M94" "S30" "CVO"
airports_collection$find('{"IATA_FAA":"LAX"}')
## 
 Found 1 records...
 Imported 1 records. Simplifying into dataframe...
##   AIRPORT_ID             NAME        CITY       COUNTRY IATA_FAA ICAO LATITUDE LONGITUDE ALTITUDE TIMEZONE DST     TIMEZONE_DATABASE
## 1       3484 Los Angeles Intl Los Angeles United States      LAX KLAX 33.94254 -118.4081      126       -8   A America/Los_Angeles\r
(4.e) Find all distinct airline codes and all routes that have LAX as source and destination airport.
routes_collection$count()
## [1] 563
routes_collection$distinct("AIRLINE_CODE")
##  [1] "AA" "AS" "B6" "DL" "FL" "G4" "HA" "K5" "KL" "M5" "NK" "SQ" "UA" "US" "VX" "WN" "ZK"
routes_collection$find('{"$and": [{"SOURCE_AIRPORT_CODE": "LAX"}, {"DEST_AIRPORT_CODE": "SFO"}]}')
## 
 Found 7 records...
 Imported 7 records. Simplifying into dataframe...
##   AIRLINE_CODE AIRLINE_ID SOURCE_AIRPORT_CODE SOURCE_AIRPORT_ID DEST_AIRPORT_CODE DEST_AIRPORT_ID CODESHARE STOPS                 EQUIPMENT
## 1           AA         24                 LAX              3484               SFO            3469               0                     738\r
## 2           AS        439                 LAX              3484               SFO            3469         Y     0                     738\r
## 3           DL       2009                 LAX              3484               SFO            3469         Y     0                     E75\r
## 4           UA       5209                 LAX              3484               SFO            3469               0 739 752 753 319 320 738\r
## 5           US       5265                 LAX              3484               SFO            3469               0                     738\r
## 6           VX       5331                 LAX              3484               SFO            3469               0                 320 319\r
## 7           WN       4547                 LAX              3484               SFO            3469               0             73W 73C 733\r
(4.f) Display the contents of all collections.

Collections

(5) Conclusion:

(5.a) MongoDB contains collections instead of tables and documents instead of rows in the collections.