For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.
For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.
For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing.
Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
# Open the database connection and list the database tables.
connection <- dbConnect(RMySQL::MySQL(), user = 'root', password = '', dbname = 'flights', host = 'localhost')
tables <- dbListTables(connection)
tables
## [1] "airlines" "airports" "flights" "planes" "weather"
For this assignment, I am only interested in the “airlines”, “airports”, and “flights” tables, so I will ignore the “planes” and “weather” tables.
airlines <- dbReadTable(connection, 'airlines')
airports <- dbReadTable(connection, 'airports')
flights <- dbReadTable(connection, 'flights')
# Close the database connection.
dbDisconnect(connection)
## [1] TRUE
head(airlines)
## carrier name
## 1 9E Endeavor Air Inc.\r
## 2 AA American Airlines Inc.\r
## 3 AS Alaska Airlines Inc.\r
## 4 B6 JetBlue Airways\r
## 5 DL Delta Air Lines Inc.\r
## 6 EV ExpressJet Airlines Inc.\r
airlines$name <- str_replace(airlines$name, '\\r', '')
colnames(airlines)
## [1] "carrier" "name"
head(airlines)
## carrier name
## 1 9E Endeavor Air Inc.
## 2 AA American Airlines Inc.
## 3 AS Alaska Airlines Inc.
## 4 B6 JetBlue Airways
## 5 DL Delta Air Lines Inc.
## 6 EV ExpressJet Airlines Inc.
head(airports)
## faa name lat lon alt tz dst
## 1 04G Lansdowne Airport 41.13047 -80.61958 1044 -5 A
## 2 06A Moton Field Municipal Airport 32.46057 -85.68003 264 -5 A
## 3 06C Schaumburg Regional 41.98934 -88.10124 801 -6 A
## 4 06N Randall Airport 41.43191 -74.39156 523 -5 A
## 5 09J Jekyll Island Airport 31.07447 -81.42778 11 -4 A
## 6 0A9 Elizabethton Municipal Airport 36.37122 -82.17342 1593 -4 A
head(flights)
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum
## 1 2013 1 1 517 2 830 11 UA N14228
## 2 2013 1 1 533 4 850 20 UA N24211
## 3 2013 1 1 542 2 923 33 AA N619AA
## 4 2013 1 1 544 -1 1004 -18 B6 N804JB
## 5 2013 1 1 554 -6 812 -25 DL N668DN
## 6 2013 1 1 554 -4 740 12 UA N39463
## flight origin dest air_time distance hour minute
## 1 1545 EWR IAH 227 1400 5 17
## 2 1714 LGA IAH 227 1416 5 33
## 3 1141 JFK MIA 160 1089 5 42
## 4 725 JFK BQN 183 1576 5 44
## 5 461 LGA ATL 116 762 6 54
## 6 1696 EWR ORD 150 719 6 54
# Merge the 3 dataframes (airlines, airports, and flights) together with the help of the dplyr package.
# We create a new object (flights_data) to contain the merged data. This object will then be imported into the MongoDB database.
# Add the flights dataframe to the flights_data object, and merge in the airlines dataframe.
flights_data <- flights %>%
merge(airlines) %>%
# To prevent conflicts with the "airports" table name column,
# we rename the "airlines" name column to "airline".
rename(airline = name) %>%
rowwise() %>%
ungroup()
# Now add the airports dataframe, and seperate the data into origin and destination airports.
flights_data <- airports %>%
rename_(.dots = setNames(names(.), paste0('origin_', names(.)))) %>%
rename(origin = origin_faa) %>%
right_join(flights_data)
flights_data <- airports %>%
rename_(.dots = setNames(names(.), paste0('dest_', names(.)))) %>%
rename(dest = dest_faa) %>%
right_join(flights_data)
mongo_database <- mongo(collection = "flights_data", db = "flights_mongo")
mongo_database$insert(flights_data)
## List of 5
## $ nInserted : num 336776
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
jfk_flights <- mongo_database$find('{"origin": "JFK"}') %>%
group_by(airline, origin_name, dest_name) %>%
summarise(total_flights = n()) %>%
arrange(airline)
kable(jfk_flights, "html", escape = F) %>%
kable_styling("striped", full_width = T) %>%
column_spec(1, bold = T)
| airline | origin_name | dest_name | total_flights |
|---|---|---|---|
| American Airlines Inc. | John F Kennedy Intl | Austin Bergstrom Intl | 5475 |
| American Airlines Inc. | John F Kennedy Intl | Chicago Ohare Intl | 5475 |
| American Airlines Inc. | John F Kennedy Intl | Dallas Fort Worth Intl | 5505 |
| American Airlines Inc. | John F Kennedy Intl | Eagle Co Rgnl | 1545 |
| American Airlines Inc. | John F Kennedy Intl | Fort Lauderdale Hollywood Intl | 2730 |
| American Airlines Inc. | John F Kennedy Intl | General Edward Lawrence Logan Intl | 21825 |
| American Airlines Inc. | John F Kennedy Intl | George Bush Intercontinental | 4110 |
| American Airlines Inc. | John F Kennedy Intl | Los Angeles Intl | 48255 |
| American Airlines Inc. | John F Kennedy Intl | Mc Carran Intl | 9585 |
| American Airlines Inc. | John F Kennedy Intl | Miami Intl | 33315 |
| American Airlines Inc. | John F Kennedy Intl | Orlando Intl | 10950 |
| American Airlines Inc. | John F Kennedy Intl | San Diego Intl | 5475 |
| American Airlines Inc. | John F Kennedy Intl | San Francisco Intl | 21330 |
| American Airlines Inc. | John F Kennedy Intl | Seattle Tacoma Intl | 5475 |
| American Airlines Inc. | John F Kennedy Intl | Tampa Intl | 4665 |
| American Airlines Inc. | John F Kennedy Intl | NA | 21030 |
| Delta Air Lines Inc. | John F Kennedy Intl | Austin Bergstrom Intl | 5355 |
| Delta Air Lines Inc. | John F Kennedy Intl | Denver Intl | 5475 |
| Delta Air Lines Inc. | John F Kennedy Intl | Detroit Metro Wayne Co | 5805 |
| Delta Air Lines Inc. | John F Kennedy Intl | Fort Lauderdale Hollywood Intl | 16245 |
| Delta Air Lines Inc. | John F Kennedy Intl | General Edward Lawrence Logan Intl | 14580 |
| Delta Air Lines Inc. | John F Kennedy Intl | Hartsfield Jackson Atlanta Intl | 28110 |
| Delta Air Lines Inc. | John F Kennedy Intl | Jackson Hole Airport | 30 |
| Delta Air Lines Inc. | John F Kennedy Intl | Lambert St Louis Intl | 15 |
| Delta Air Lines Inc. | John F Kennedy Intl | Los Angeles Intl | 37515 |
| Delta Air Lines Inc. | John F Kennedy Intl | Louis Armstrong New Orleans Intl | 1815 |
| Delta Air Lines Inc. | John F Kennedy Intl | Mc Carran Intl | 25095 |
| Delta Air Lines Inc. | John F Kennedy Intl | Miami Intl | 16395 |
| Delta Air Lines Inc. | John F Kennedy Intl | Minneapolis St Paul Intl | 375 |
| Delta Air Lines Inc. | John F Kennedy Intl | Nashville Intl | 15 |
| Delta Air Lines Inc. | John F Kennedy Intl | Orlando Intl | 21450 |
| Delta Air Lines Inc. | John F Kennedy Intl | Philadelphia Intl | 30 |
| Delta Air Lines Inc. | John F Kennedy Intl | Phoenix Sky Harbor Intl | 7035 |
| Delta Air Lines Inc. | John F Kennedy Intl | Pittsburgh Intl | 690 |
| Delta Air Lines Inc. | John F Kennedy Intl | Portland Intl | 6870 |
| Delta Air Lines Inc. | John F Kennedy Intl | Ronald Reagan Washington Natl | 30 |
| Delta Air Lines Inc. | John F Kennedy Intl | Salt Lake City Intl | 26220 |
| Delta Air Lines Inc. | John F Kennedy Intl | San Antonio Intl | 4545 |
| Delta Air Lines Inc. | John F Kennedy Intl | San Diego Intl | 8625 |
| Delta Air Lines Inc. | John F Kennedy Intl | San Francisco Intl | 27870 |
| Delta Air Lines Inc. | John F Kennedy Intl | Seattle Tacoma Intl | 18195 |
| Delta Air Lines Inc. | John F Kennedy Intl | Southwest Florida Intl | 1455 |
| Delta Air Lines Inc. | John F Kennedy Intl | Tampa Intl | 10710 |
| Delta Air Lines Inc. | John F Kennedy Intl | NA | 19965 |
| Endeavor Air Inc. | John F Kennedy Intl | Austin Bergstrom Intl | 30 |
| Endeavor Air Inc. | John F Kennedy Intl | Baltimore Washington Intl | 12840 |
| Endeavor Air Inc. | John F Kennedy Intl | Buffalo Niagara Intl | 11685 |
| Endeavor Air Inc. | John F Kennedy Intl | Charleston Afb Intl | 5205 |
| Endeavor Air Inc. | John F Kennedy Intl | Charlotte Douglas Intl | 4185 |
| Endeavor Air Inc. | John F Kennedy Intl | Chicago Ohare Intl | 15840 |
| Endeavor Air Inc. | John F Kennedy Intl | Cincinnati Northern Kentucky Intl | 9480 |
| Endeavor Air Inc. | John F Kennedy Intl | Cleveland Hopkins Intl | 5130 |
| Endeavor Air Inc. | John F Kennedy Intl | Dallas Fort Worth Intl | 5475 |
| Endeavor Air Inc. | John F Kennedy Intl | Detroit Metro Wayne Co | 11685 |
| Endeavor Air Inc. | John F Kennedy Intl | General Edward Lawrence Logan Intl | 13710 |
| Endeavor Air Inc. | John F Kennedy Intl | General Mitchell Intl | 2745 |
| Endeavor Air Inc. | John F Kennedy Intl | Greater Rochester Intl | 4095 |
| Endeavor Air Inc. | John F Kennedy Intl | Hartsfield Jackson Atlanta Intl | 825 |
| Endeavor Air Inc. | John F Kennedy Intl | Indianapolis Intl | 4860 |
| Endeavor Air Inc. | John F Kennedy Intl | Jacksonville Intl | 4095 |
| Endeavor Air Inc. | John F Kennedy Intl | Kansas City Intl | 4140 |
| Endeavor Air Inc. | John F Kennedy Intl | Louis Armstrong New Orleans Intl | 6540 |
| Endeavor Air Inc. | John F Kennedy Intl | Louisville International Airport | 690 |
| Endeavor Air Inc. | John F Kennedy Intl | Martha's Vineyard | 1065 |
| Endeavor Air Inc. | John F Kennedy Intl | Memphis Intl | 15 |
| Endeavor Air Inc. | John F Kennedy Intl | Minneapolis St Paul Intl | 16050 |
| Endeavor Air Inc. | John F Kennedy Intl | Nashville Intl | 5460 |
| Endeavor Air Inc. | John F Kennedy Intl | Norfolk Intl | 5715 |
| Endeavor Air Inc. | John F Kennedy Intl | Philadelphia Intl | 14100 |
| Endeavor Air Inc. | John F Kennedy Intl | Pittsburgh Intl | 11220 |
| Endeavor Air Inc. | John F Kennedy Intl | Port Columbus Intl | 180 |
| Endeavor Air Inc. | John F Kennedy Intl | Raleigh Durham Intl | 12645 |
| Endeavor Air Inc. | John F Kennedy Intl | Richmond Intl | 3735 |
| Endeavor Air Inc. | John F Kennedy Intl | Ronald Reagan Washington Natl | 16110 |
| Endeavor Air Inc. | John F Kennedy Intl | San Antonio Intl | 795 |
| Endeavor Air Inc. | John F Kennedy Intl | Syracuse Hancock Intl | 675 |
| Endeavor Air Inc. | John F Kennedy Intl | Tampa Intl | 45 |
| Endeavor Air Inc. | John F Kennedy Intl | Washington Dulles Intl | 8700 |
| Envoy Air | John F Kennedy Intl | Baltimore Washington Intl | 5475 |
| Envoy Air | John F Kennedy Intl | Cincinnati Northern Kentucky Intl | 5475 |
| Envoy Air | John F Kennedy Intl | Cleveland Hopkins Intl | 5475 |
| Envoy Air | John F Kennedy Intl | Indianapolis Intl | 5475 |
| Envoy Air | John F Kennedy Intl | Nashville Intl | 5475 |
| Envoy Air | John F Kennedy Intl | Norfolk Intl | 5475 |
| Envoy Air | John F Kennedy Intl | Pittsburgh Intl | 5475 |
| Envoy Air | John F Kennedy Intl | Port Columbus Intl | 10950 |
| Envoy Air | John F Kennedy Intl | Raleigh Durham Intl | 21900 |
| Envoy Air | John F Kennedy Intl | Ronald Reagan Washington Natl | 32910 |
| Envoy Air | John F Kennedy Intl | Tampa Intl | 3810 |
| ExpressJet Airlines Inc. | John F Kennedy Intl | Birmingham Intl | 15 |
| ExpressJet Airlines Inc. | John F Kennedy Intl | Hartsfield Jackson Atlanta Intl | 15 |
| ExpressJet Airlines Inc. | John F Kennedy Intl | Washington Dulles Intl | 21090 |
| Hawaiian Airlines Inc. | John F Kennedy Intl | Honolulu Intl | 5130 |
| JetBlue Airways | John F Kennedy Intl | Albuquerque International Sunport | 3810 |
| JetBlue Airways | John F Kennedy Intl | Austin Bergstrom Intl | 11205 |
| JetBlue Airways | John F Kennedy Intl | Bob Hope | 5565 |
| JetBlue Airways | John F Kennedy Intl | Buffalo Niagara Intl | 42045 |
| JetBlue Airways | John F Kennedy Intl | Burlington Intl | 20460 |
| JetBlue Airways | John F Kennedy Intl | Charleston Afb Intl | 9195 |
| JetBlue Airways | John F Kennedy Intl | Charlotte Douglas Intl | 10935 |
| JetBlue Airways | John F Kennedy Intl | Chicago Ohare Intl | 13575 |
| JetBlue Airways | John F Kennedy Intl | Denver Intl | 5070 |
| JetBlue Airways | John F Kennedy Intl | Fort Lauderdale Hollywood Intl | 44835 |
| JetBlue Airways | John F Kennedy Intl | General Edward Lawrence Logan Intl | 38355 |
| JetBlue Airways | John F Kennedy Intl | Greater Rochester Intl | 21090 |
| JetBlue Airways | John F Kennedy Intl | Jacksonville Intl | 15390 |
| JetBlue Airways | John F Kennedy Intl | Long Beach | 10020 |
| JetBlue Airways | John F Kennedy Intl | Los Angeles Intl | 25320 |
| JetBlue Airways | John F Kennedy Intl | Louis Armstrong New Orleans Intl | 16140 |
| JetBlue Airways | John F Kennedy Intl | Martha's Vineyard | 2250 |
| JetBlue Airways | John F Kennedy Intl | Mc Carran Intl | 19650 |
| JetBlue Airways | John F Kennedy Intl | Metropolitan Oakland Intl | 4680 |
| JetBlue Airways | John F Kennedy Intl | Nantucket Mem | 3975 |
| JetBlue Airways | John F Kennedy Intl | Norman Y Mineta San Jose Intl | 4920 |
| JetBlue Airways | John F Kennedy Intl | Orlando Intl | 49560 |
| JetBlue Airways | John F Kennedy Intl | Palm Beach Intl | 26085 |
| JetBlue Airways | John F Kennedy Intl | Phoenix Sky Harbor Intl | 5475 |
| JetBlue Airways | John F Kennedy Intl | Pittsburgh Intl | 1350 |
| JetBlue Airways | John F Kennedy Intl | Portland Intl | 4875 |
| JetBlue Airways | John F Kennedy Intl | Portland Intl Jetport | 19560 |
| JetBlue Airways | John F Kennedy Intl | Raleigh Durham Intl | 11955 |
| JetBlue Airways | John F Kennedy Intl | Sacramento Intl | 4260 |
| JetBlue Airways | John F Kennedy Intl | Salt Lake City Intl | 5475 |
| JetBlue Airways | John F Kennedy Intl | San Diego Intl | 9945 |
| JetBlue Airways | John F Kennedy Intl | San Francisco Intl | 15525 |
| JetBlue Airways | John F Kennedy Intl | Sarasota Bradenton Intl | 7110 |
| JetBlue Airways | John F Kennedy Intl | Seattle Tacoma Intl | 7710 |
| JetBlue Airways | John F Kennedy Intl | Southwest Florida Intl | 18630 |
| JetBlue Airways | John F Kennedy Intl | Syracuse Hancock Intl | 18990 |
| JetBlue Airways | John F Kennedy Intl | Tampa Intl | 25575 |
| JetBlue Airways | John F Kennedy Intl | Washington Dulles Intl | 10125 |
| JetBlue Airways | John F Kennedy Intl | William P Hobby | 10710 |
| JetBlue Airways | John F Kennedy Intl | NA | 49740 |
| United Air Lines Inc. | John F Kennedy Intl | Los Angeles Intl | 30885 |
| United Air Lines Inc. | John F Kennedy Intl | San Francisco Intl | 37125 |
| US Airways Inc. | John F Kennedy Intl | Charlotte Douglas Intl | 27930 |
| US Airways Inc. | John F Kennedy Intl | Philadelphia Intl | 510 |
| US Airways Inc. | John F Kennedy Intl | Phoenix Sky Harbor Intl | 16485 |
| Virgin America | John F Kennedy Intl | Los Angeles Intl | 26955 |
| Virgin America | John F Kennedy Intl | Mc Carran Intl | 5475 |
| Virgin America | John F Kennedy Intl | Norman Y Mineta San Jose Intl | 15 |
| Virgin America | John F Kennedy Intl | Palm Springs Intl | 285 |
| Virgin America | John F Kennedy Intl | San Francisco Intl | 21210 |
The main advantage of storing this data in a NoSQL database over a relational database is the increased speed in data retrieval. Data is generally stored within a NoSQL database in denormalized form. This has an advantage over relational databases as table joins do not need to be performed everytime data is retrieved, ultimately resulting in faster data retrieval. Additionally, if the flights database continues to grow in size, and if it is opened for public access, a NoSQL database is much more scalable than a relational database and can handle the increase in size and transactions.
However, if our flights database were to remain at it’s current size, and remained unexposed to the public, a relational database may be a better choice. This would be especially true if we needed to write a lot of database queries in order to generate complex reports that pulled data from several different data types.