Assignment Description

For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.

For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.

For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing.

Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.

Step 1 - Import the MYSQL flights database data into R.

Connect to the MYSQL Flights database.

# Open the database connection and list the database tables. 
connection <- dbConnect(RMySQL::MySQL(), user = 'root', password = '', dbname = 'flights', host = 'localhost')
tables <- dbListTables(connection)

tables
## [1] "airlines" "airports" "flights"  "planes"   "weather"

Import the MYSQL table data into R, and store it in dataframes.

For this assignment, I am only interested in the “airlines”, “airports”, and “flights” tables, so I will ignore the “planes” and “weather” tables.

airlines <- dbReadTable(connection, 'airlines')
airports <- dbReadTable(connection, 'airports')
flights <- dbReadTable(connection, 'flights')

# Close the database connection.
dbDisconnect(connection)
## [1] TRUE

Confirm that the database table data is now available in R.

head(airlines)
##   carrier                       name
## 1      9E        Endeavor Air Inc.\r
## 2      AA   American Airlines Inc.\r
## 3      AS     Alaska Airlines Inc.\r
## 4      B6          JetBlue Airways\r
## 5      DL     Delta Air Lines Inc.\r
## 6      EV ExpressJet Airlines Inc.\r

Remove the carriage returns (“”) from the “name” column of the airlines dataframe so that the data is cleaner.

airlines$name <- str_replace(airlines$name, '\\r', '')
colnames(airlines)
## [1] "carrier" "name"
head(airlines)
##   carrier                     name
## 1      9E        Endeavor Air Inc.
## 2      AA   American Airlines Inc.
## 3      AS     Alaska Airlines Inc.
## 4      B6          JetBlue Airways
## 5      DL     Delta Air Lines Inc.
## 6      EV ExpressJet Airlines Inc.
head(airports)
##   faa                           name      lat       lon  alt tz dst
## 1 04G              Lansdowne Airport 41.13047 -80.61958 1044 -5   A
## 2 06A  Moton Field Municipal Airport 32.46057 -85.68003  264 -5   A
## 3 06C            Schaumburg Regional 41.98934 -88.10124  801 -6   A
## 4 06N                Randall Airport 41.43191 -74.39156  523 -5   A
## 5 09J          Jekyll Island Airport 31.07447 -81.42778   11 -4   A
## 6 0A9 Elizabethton Municipal Airport 36.37122 -82.17342 1593 -4   A
head(flights)
##   year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013     1   1      517         2      830        11      UA  N14228
## 2 2013     1   1      533         4      850        20      UA  N24211
## 3 2013     1   1      542         2      923        33      AA  N619AA
## 4 2013     1   1      544        -1     1004       -18      B6  N804JB
## 5 2013     1   1      554        -6      812       -25      DL  N668DN
## 6 2013     1   1      554        -4      740        12      UA  N39463
##   flight origin dest air_time distance hour minute
## 1   1545    EWR  IAH      227     1400    5     17
## 2   1714    LGA  IAH      227     1416    5     33
## 3   1141    JFK  MIA      160     1089    5     42
## 4    725    JFK  BQN      183     1576    5     44
## 5    461    LGA  ATL      116      762    6     54
## 6   1696    EWR  ORD      150      719    6     54

Step 2 - Import the Data into MongoDB.

Prepare the flights data for insertion into the MongoDB database by merging the dataframes together.

# Merge the 3 dataframes (airlines, airports, and flights) together with the help of the dplyr package.
# We create a new object (flights_data) to contain the merged data. This object will then be imported into the MongoDB database.

# Add the flights dataframe to the flights_data object, and merge in the airlines dataframe.
flights_data <- flights %>%
  merge(airlines) %>%
  # To prevent conflicts with the "airports" table name column,
  # we rename the "airlines" name column to "airline".
  rename(airline = name) %>%
  rowwise() %>%
  ungroup()

# Now add the airports dataframe, and seperate the data into origin and destination airports.
flights_data <- airports %>%
  rename_(.dots = setNames(names(.), paste0('origin_', names(.)))) %>%
  rename(origin = origin_faa) %>%
  right_join(flights_data)

flights_data <- airports %>%
  rename_(.dots = setNames(names(.), paste0('dest_', names(.)))) %>%
  rename(dest = dest_faa) %>%
  right_join(flights_data)

Open a connection to the MongoDB database.

mongo_database <- mongo(collection = "flights_data", db = "flights_mongo")

Insert the flights data into the MongoDB database.

mongo_database$insert(flights_data)
## List of 5
##  $ nInserted  : num 336776
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

Confirm that the data has been successfully imported into MongoDB by running a query for flights flying out of JFK.

jfk_flights <- mongo_database$find('{"origin": "JFK"}') %>%
  group_by(airline, origin_name, dest_name) %>%
  summarise(total_flights = n()) %>%
  arrange(airline)

kable(jfk_flights, "html", escape = F) %>%
  kable_styling("striped", full_width = T) %>%
  column_spec(1, bold = T)
airline origin_name dest_name total_flights
American Airlines Inc. John F Kennedy Intl Austin Bergstrom Intl 5475
American Airlines Inc. John F Kennedy Intl Chicago Ohare Intl 5475
American Airlines Inc. John F Kennedy Intl Dallas Fort Worth Intl 5505
American Airlines Inc. John F Kennedy Intl Eagle Co Rgnl 1545
American Airlines Inc. John F Kennedy Intl Fort Lauderdale Hollywood Intl 2730
American Airlines Inc. John F Kennedy Intl General Edward Lawrence Logan Intl 21825
American Airlines Inc. John F Kennedy Intl George Bush Intercontinental 4110
American Airlines Inc. John F Kennedy Intl Los Angeles Intl 48255
American Airlines Inc. John F Kennedy Intl Mc Carran Intl 9585
American Airlines Inc. John F Kennedy Intl Miami Intl 33315
American Airlines Inc. John F Kennedy Intl Orlando Intl 10950
American Airlines Inc. John F Kennedy Intl San Diego Intl 5475
American Airlines Inc. John F Kennedy Intl San Francisco Intl 21330
American Airlines Inc. John F Kennedy Intl Seattle Tacoma Intl 5475
American Airlines Inc. John F Kennedy Intl Tampa Intl 4665
American Airlines Inc. John F Kennedy Intl NA 21030
Delta Air Lines Inc. John F Kennedy Intl Austin Bergstrom Intl 5355
Delta Air Lines Inc. John F Kennedy Intl Denver Intl 5475
Delta Air Lines Inc. John F Kennedy Intl Detroit Metro Wayne Co 5805
Delta Air Lines Inc. John F Kennedy Intl Fort Lauderdale Hollywood Intl 16245
Delta Air Lines Inc. John F Kennedy Intl General Edward Lawrence Logan Intl 14580
Delta Air Lines Inc. John F Kennedy Intl Hartsfield Jackson Atlanta Intl 28110
Delta Air Lines Inc. John F Kennedy Intl Jackson Hole Airport 30
Delta Air Lines Inc. John F Kennedy Intl Lambert St Louis Intl 15
Delta Air Lines Inc. John F Kennedy Intl Los Angeles Intl 37515
Delta Air Lines Inc. John F Kennedy Intl Louis Armstrong New Orleans Intl 1815
Delta Air Lines Inc. John F Kennedy Intl Mc Carran Intl 25095
Delta Air Lines Inc. John F Kennedy Intl Miami Intl 16395
Delta Air Lines Inc. John F Kennedy Intl Minneapolis St Paul Intl 375
Delta Air Lines Inc. John F Kennedy Intl Nashville Intl 15
Delta Air Lines Inc. John F Kennedy Intl Orlando Intl 21450
Delta Air Lines Inc. John F Kennedy Intl Philadelphia Intl 30
Delta Air Lines Inc. John F Kennedy Intl Phoenix Sky Harbor Intl 7035
Delta Air Lines Inc. John F Kennedy Intl Pittsburgh Intl 690
Delta Air Lines Inc. John F Kennedy Intl Portland Intl 6870
Delta Air Lines Inc. John F Kennedy Intl Ronald Reagan Washington Natl 30
Delta Air Lines Inc. John F Kennedy Intl Salt Lake City Intl 26220
Delta Air Lines Inc. John F Kennedy Intl San Antonio Intl 4545
Delta Air Lines Inc. John F Kennedy Intl San Diego Intl 8625
Delta Air Lines Inc. John F Kennedy Intl San Francisco Intl 27870
Delta Air Lines Inc. John F Kennedy Intl Seattle Tacoma Intl 18195
Delta Air Lines Inc. John F Kennedy Intl Southwest Florida Intl 1455
Delta Air Lines Inc. John F Kennedy Intl Tampa Intl 10710
Delta Air Lines Inc. John F Kennedy Intl NA 19965
Endeavor Air Inc. John F Kennedy Intl Austin Bergstrom Intl 30
Endeavor Air Inc. John F Kennedy Intl Baltimore Washington Intl 12840
Endeavor Air Inc. John F Kennedy Intl Buffalo Niagara Intl 11685
Endeavor Air Inc. John F Kennedy Intl Charleston Afb Intl 5205
Endeavor Air Inc. John F Kennedy Intl Charlotte Douglas Intl 4185
Endeavor Air Inc. John F Kennedy Intl Chicago Ohare Intl 15840
Endeavor Air Inc. John F Kennedy Intl Cincinnati Northern Kentucky Intl 9480
Endeavor Air Inc. John F Kennedy Intl Cleveland Hopkins Intl 5130
Endeavor Air Inc. John F Kennedy Intl Dallas Fort Worth Intl 5475
Endeavor Air Inc. John F Kennedy Intl Detroit Metro Wayne Co 11685
Endeavor Air Inc. John F Kennedy Intl General Edward Lawrence Logan Intl 13710
Endeavor Air Inc. John F Kennedy Intl General Mitchell Intl 2745
Endeavor Air Inc. John F Kennedy Intl Greater Rochester Intl 4095
Endeavor Air Inc. John F Kennedy Intl Hartsfield Jackson Atlanta Intl 825
Endeavor Air Inc. John F Kennedy Intl Indianapolis Intl 4860
Endeavor Air Inc. John F Kennedy Intl Jacksonville Intl 4095
Endeavor Air Inc. John F Kennedy Intl Kansas City Intl 4140
Endeavor Air Inc. John F Kennedy Intl Louis Armstrong New Orleans Intl 6540
Endeavor Air Inc. John F Kennedy Intl Louisville International Airport 690
Endeavor Air Inc. John F Kennedy Intl Martha's Vineyard 1065
Endeavor Air Inc. John F Kennedy Intl Memphis Intl 15
Endeavor Air Inc. John F Kennedy Intl Minneapolis St Paul Intl 16050
Endeavor Air Inc. John F Kennedy Intl Nashville Intl 5460
Endeavor Air Inc. John F Kennedy Intl Norfolk Intl 5715
Endeavor Air Inc. John F Kennedy Intl Philadelphia Intl 14100
Endeavor Air Inc. John F Kennedy Intl Pittsburgh Intl 11220
Endeavor Air Inc. John F Kennedy Intl Port Columbus Intl 180
Endeavor Air Inc. John F Kennedy Intl Raleigh Durham Intl 12645
Endeavor Air Inc. John F Kennedy Intl Richmond Intl 3735
Endeavor Air Inc. John F Kennedy Intl Ronald Reagan Washington Natl 16110
Endeavor Air Inc. John F Kennedy Intl San Antonio Intl 795
Endeavor Air Inc. John F Kennedy Intl Syracuse Hancock Intl 675
Endeavor Air Inc. John F Kennedy Intl Tampa Intl 45
Endeavor Air Inc. John F Kennedy Intl Washington Dulles Intl 8700
Envoy Air John F Kennedy Intl Baltimore Washington Intl 5475
Envoy Air John F Kennedy Intl Cincinnati Northern Kentucky Intl 5475
Envoy Air John F Kennedy Intl Cleveland Hopkins Intl 5475
Envoy Air John F Kennedy Intl Indianapolis Intl 5475
Envoy Air John F Kennedy Intl Nashville Intl 5475
Envoy Air John F Kennedy Intl Norfolk Intl 5475
Envoy Air John F Kennedy Intl Pittsburgh Intl 5475
Envoy Air John F Kennedy Intl Port Columbus Intl 10950
Envoy Air John F Kennedy Intl Raleigh Durham Intl 21900
Envoy Air John F Kennedy Intl Ronald Reagan Washington Natl 32910
Envoy Air John F Kennedy Intl Tampa Intl 3810
ExpressJet Airlines Inc. John F Kennedy Intl Birmingham Intl 15
ExpressJet Airlines Inc. John F Kennedy Intl Hartsfield Jackson Atlanta Intl 15
ExpressJet Airlines Inc. John F Kennedy Intl Washington Dulles Intl 21090
Hawaiian Airlines Inc. John F Kennedy Intl Honolulu Intl 5130
JetBlue Airways John F Kennedy Intl Albuquerque International Sunport 3810
JetBlue Airways John F Kennedy Intl Austin Bergstrom Intl 11205
JetBlue Airways John F Kennedy Intl Bob Hope 5565
JetBlue Airways John F Kennedy Intl Buffalo Niagara Intl 42045
JetBlue Airways John F Kennedy Intl Burlington Intl 20460
JetBlue Airways John F Kennedy Intl Charleston Afb Intl 9195
JetBlue Airways John F Kennedy Intl Charlotte Douglas Intl 10935
JetBlue Airways John F Kennedy Intl Chicago Ohare Intl 13575
JetBlue Airways John F Kennedy Intl Denver Intl 5070
JetBlue Airways John F Kennedy Intl Fort Lauderdale Hollywood Intl 44835
JetBlue Airways John F Kennedy Intl General Edward Lawrence Logan Intl 38355
JetBlue Airways John F Kennedy Intl Greater Rochester Intl 21090
JetBlue Airways John F Kennedy Intl Jacksonville Intl 15390
JetBlue Airways John F Kennedy Intl Long Beach 10020
JetBlue Airways John F Kennedy Intl Los Angeles Intl 25320
JetBlue Airways John F Kennedy Intl Louis Armstrong New Orleans Intl 16140
JetBlue Airways John F Kennedy Intl Martha's Vineyard 2250
JetBlue Airways John F Kennedy Intl Mc Carran Intl 19650
JetBlue Airways John F Kennedy Intl Metropolitan Oakland Intl 4680
JetBlue Airways John F Kennedy Intl Nantucket Mem 3975
JetBlue Airways John F Kennedy Intl Norman Y Mineta San Jose Intl 4920
JetBlue Airways John F Kennedy Intl Orlando Intl 49560
JetBlue Airways John F Kennedy Intl Palm Beach Intl 26085
JetBlue Airways John F Kennedy Intl Phoenix Sky Harbor Intl 5475
JetBlue Airways John F Kennedy Intl Pittsburgh Intl 1350
JetBlue Airways John F Kennedy Intl Portland Intl 4875
JetBlue Airways John F Kennedy Intl Portland Intl Jetport 19560
JetBlue Airways John F Kennedy Intl Raleigh Durham Intl 11955
JetBlue Airways John F Kennedy Intl Sacramento Intl 4260
JetBlue Airways John F Kennedy Intl Salt Lake City Intl 5475
JetBlue Airways John F Kennedy Intl San Diego Intl 9945
JetBlue Airways John F Kennedy Intl San Francisco Intl 15525
JetBlue Airways John F Kennedy Intl Sarasota Bradenton Intl 7110
JetBlue Airways John F Kennedy Intl Seattle Tacoma Intl 7710
JetBlue Airways John F Kennedy Intl Southwest Florida Intl 18630
JetBlue Airways John F Kennedy Intl Syracuse Hancock Intl 18990
JetBlue Airways John F Kennedy Intl Tampa Intl 25575
JetBlue Airways John F Kennedy Intl Washington Dulles Intl 10125
JetBlue Airways John F Kennedy Intl William P Hobby 10710
JetBlue Airways John F Kennedy Intl NA 49740
United Air Lines Inc. John F Kennedy Intl Los Angeles Intl 30885
United Air Lines Inc. John F Kennedy Intl San Francisco Intl 37125
US Airways Inc. John F Kennedy Intl Charlotte Douglas Intl 27930
US Airways Inc. John F Kennedy Intl Philadelphia Intl 510
US Airways Inc. John F Kennedy Intl Phoenix Sky Harbor Intl 16485
Virgin America John F Kennedy Intl Los Angeles Intl 26955
Virgin America John F Kennedy Intl Mc Carran Intl 5475
Virgin America John F Kennedy Intl Norman Y Mineta San Jose Intl 15
Virgin America John F Kennedy Intl Palm Springs Intl 285
Virgin America John F Kennedy Intl San Francisco Intl 21210

Advantages and Disadvantages of storing the data in a relational database vs. a NoSQL database.

The main advantage of storing this data in a NoSQL database over a relational database is the increased speed in data retrieval. Data is generally stored within a NoSQL database in denormalized form. This has an advantage over relational databases as table joins do not need to be performed everytime data is retrieved, ultimately resulting in faster data retrieval. Additionally, if the flights database continues to grow in size, and if it is opened for public access, a NoSQL database is much more scalable than a relational database and can handle the increase in size and transactions.

However, if our flights database were to remain at it’s current size, and remained unexposed to the public, a relational database may be a better choice. This would be especially true if we needed to write a lot of database queries in order to generate complex reports that pulled data from several different data types.