For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.
We begin by loading the flights database into R from MySQL by creating a connection and downloading the tables.
#knit with parameters to be prompted for a MySQL password
#load libraries
library(DBI)
library(RMySQL)
library(mongolite)
library(knitr)
#create a connection to the flights database in MySQL
flights_db = dbConnect(MySQL(), user='root', password= params$dbPassword, dbname='flights', host='localhost')
dbListTables(flights_db)
## [1] "airlines" "airports" "flights" "planes" "weather"
#load the flights tables
flights<-dbGetQuery(flights_db,"SELECT * FROM flights;")
airlines<-dbGetQuery(flights_db,"SELECT * FROM airlines;")
airports<-dbGetQuery(flights_db,"SELECT * FROM airports;")
planes<-dbGetQuery(flights_db,"SELECT * FROM planes;")
weather<-dbGetQuery(flights_db,"SELECT * FROM weather;")
#end connection to MySQL
dbDisconnect(flights_db)
## [1] TRUE
Next we load the tables into data frames in R.
#put flights tables into data frames
myflights <- data.frame(flights)
myairlines <- data.frame(airlines)
myairports <- data.frame(airports)
myplanes <- data.frame(planes)
myweather <- data.frame(weather)
#print the first few rows of the weather data frame
kable(head(myweather))
| origin | year | month | day | hour | temp | dewp | humid | wind_dir | wind_speed | wind_gust | precip | pressure | visib |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EWR | 2013 | 1 | 1 | 0 | 37.04 | 21.92 | 53.97 | 230 | 10.35702 | 11.91865 | 0 | 1013.9 | 10 |
| EWR | 2013 | 1 | 1 | 1 | 37.04 | 21.92 | 53.97 | 230 | 13.80936 | 15.89154 | 0 | 1013.0 | 10 |
| EWR | 2013 | 1 | 1 | 2 | 37.94 | 21.92 | 52.09 | 230 | 12.65858 | 14.56724 | 0 | 1012.6 | 10 |
| EWR | 2013 | 1 | 1 | 3 | 37.94 | 23.00 | 54.51 | 230 | 13.80936 | 15.89154 | 0 | 1012.7 | 10 |
| EWR | 2013 | 1 | 1 | 4 | 37.94 | 24.08 | 57.04 | 240 | 14.96014 | 17.21583 | 0 | 1012.8 | 10 |
| EWR | 2013 | 1 | 1 | 6 | 39.02 | 26.06 | 59.37 | 270 | 10.35702 | 11.91865 | 0 | 1012.0 | 10 |
Now we can load these data frames into a MongoDB database using the mongolite package in R.
#create a connection to MongoDB for the myflights data frame
mongodb <- mongo(collection = "myflights", db = "flights_db", url = "mongodb://localhost",
verbose = FALSE, options = ssl_options())
#insert the myflights data frame
mongodb$insert(myflights)
## List of 5
## $ nInserted : num 336776
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
#create a connection to MongoDB for the myairlines data frame
mongodb <- mongo(collection = "myairlines", db = "flights_db", url = "mongodb://localhost",
verbose = FALSE, options = ssl_options())
#insert the myairlines data frame
mongodb$insert(myairlines)
## List of 5
## $ nInserted : num 16
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
#create a connection to MongoDB for the myairports data frame
mongodb <- mongo(collection = "myairports", db = "flights_db", url = "mongodb://localhost",
verbose = FALSE, options = ssl_options())
#insert the myairports data frame
mongodb$insert(myairports)
## List of 5
## $ nInserted : num 1397
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
#create a connection to MongoDB for the myplanes data frame
mongodb <- mongo(collection = "myplanes", db = "flights_db", url = "mongodb://localhost",
verbose = FALSE, options = ssl_options())
#insert the myplanes data frame
mongodb$insert(myplanes)
## List of 5
## $ nInserted : num 3322
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
#create a connection to MongoDB for the myweather data frame
mongodb <- mongo(collection = "myweather", db = "flights_db", url = "mongodb://localhost",
verbose = FALSE, options = ssl_options())
#insert the myweather data frame
mongodb$insert(myweather)
## List of 5
## $ nInserted : num 8719
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
And finally we query the mongodb data base to show that the tables where loaded the MondoDB.
#query the flights mongodb to show that the flights table was inserted
mongoflights <- mongodb$find(limit = 5)
mongoflights
## origin year month day hour temp dewp humid wind_dir wind_speed
## 1 EWR 2013 1 1 0 37.04 21.92 53.97 230 10.35702
## 2 EWR 2013 1 1 1 37.04 21.92 53.97 230 13.80936
## 3 EWR 2013 1 1 2 37.94 21.92 52.09 230 12.65858
## 4 EWR 2013 1 1 3 37.94 23.00 54.51 230 13.80936
## 5 EWR 2013 1 1 4 37.94 24.08 57.04 240 14.96014
## wind_gust precip pressure visib
## 1 11.91865 0 1013.9 10
## 2 15.89154 0 1013.0 10
## 3 14.56724 0 1012.6 10
## 4 15.89154 0 1012.7 10
## 5 17.21583 0 1012.8 10
A relational database like MySQL uses tables which can be linked and queried using a structured query language. SQL databases are vertically scalable which allows for easliy increasing storage capacity. Some benefits of relational databases are maturity, compatibility, replicability, and cost effectivenes.
NoSQL databases are not table based, but rather store the data in column-oriented, document-oriented, graph-based or organized as a KeyValue store and therefore are more flexible in their format. The language used to query is a dynamic schema for unstructured data. A NoSQL database like MongoDB for example stores data in json form. NoSQL databases are horizontally scalable which allows for more traffic. Some benefits are scalability, manageablity, speed, and flexibility.