This is the hardest part of the whole project. Making sure that you get mongo installed and set up right is pretty straight forward, but must be done in the right order and tested. I used Homebrew
in Mac to install it via the command terminal. this can be done as follows:
`/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"`
You will need to update homebrew prior to installing MongoDB
`brew update`
`brew install mongodb`
You have options of where you choose to place this directory. I chose to install it to the default location data/db
so that I (and consequently R) would not have to search high and low to find it.
`mkdir -p /data/db` or change the location of the directory to your desired location
In order for you to interact with the MongoDB instance, it must be running. This takes two steps, first you have to run the MongoDB Daemon in a command terminal window type mongod
and then open a second command terminal window and type mongo
which will activate the database to receive and send data.
Remember when you are through using MongoDB to first exit the mongo instance using quit()
in the mongo
terminal, then close the daemon in the mongod
terminal by hitting control + c
at the same time, making sure you are in the terminal window at the time you do so.
Because I use SQLite to avoid annoying MY_SQL conflicts with my aging Mac system, you need to install RSQLite
to access the flights data in the database. To save this data into MongoDB, you will also need to install mongolite.
install.packages("RSQLite")
install.packages("mongolite")
Load both packages, and set the directory to point R toward the SQLite database.
require(RSQLite)
require(mongolite)
Make a connection to the database using DBI package’s dbConnect
making sure that you call the database by its correct name. I used “flights.sqlite” and checked the tables to be sure that they were all there.
setwd("/Volumes/Document_Drive/test")
db.flights = dbConnect(SQLite(), dbname="flights.sqlite")
dbListTables(db.flights)
[1] "airlines" "airports" "flights" "planes" "weather"
With all the right tables showing up, I made a query to join two tables, airlines and flights, to get the total number of delays for each airline and the total delay time. I set this query equal to the variable toMongo
so that I could pass this into the new NoSQL database.
I could just transfer SQL tables to MongoDB, but it defeats the purpose of a non-relational table, so I created what could be a nice bite of data for Mongo to chew on to show how to integrate R and MongoDB.
toMongo <- dbGetQuery(db.flights, "SELECT airlines.name as Airline, COUNT(*) as Delays, SUM(flights.arr_delay) as
Total_Delay FROM flights JOIN airlines ON flights.carrier=airlines.carrier WHERE flights.arr_delay>0 GROUP BY Airline ORDER BY Total_Delay DESC;")
dbDisconnect(db.flights)
Creating MongoDB database in R is very simple, you just expressly call the connection using the variable it was saved into and assign it to a new database using the method mongo
saving it to a new variable. Here the collection is toMongo
from the last chunk, and we are saving it in a MongoDB data base named, flights
.
Then you just insert this into your new database.
mongodata = mongo(collection = "test", db = "flights")
mongodata$insert(toMongo)
List of 5
$ nInserted : num 16
$ nMatched : num 0
$ nRemoved : num 0
$ nUpserted : num 0
$ writeErrors: list()
You can see the data in the new database by calling it up using $find
attribute of the connection.
mongodata$find()
Airline | Delays | Total_Delay |
---|---|---|
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
ExpressJet Airlines Inc. | 24484 | 1181808 |
JetBlue Airways | 23609 | 944574 |
United Air Lines Inc. | 22222 | 814458 |
Delta Air Lines Inc. | 16413 | 619485 |
Envoy Air | 11693 | 442604 |
American Airlines Inc. | 10706 | 409671 |
Endeavor Air Inc. | 6637 | 327023 |
Southwest Airlines Co. | 5304 | 216125 |
US Airways Inc. | 7349 | 213206 |
AirTran Airways Corporation | 1895 | 77874 |
Virgin America | 1746 | 76557 |
Frontier Airlines Inc. | 392 | 18651 |
Mesa Airlines Inc. | 258 | 13179 |
Alaska Airlines Inc. | 189 | 6495 |
Hawaiian Airlines Inc. | 97 | 3398 |
SkyWest Airlines Inc. | 10 | 606 |
This looks tabular because I passed it into a kable table, but within Mongo, it is saved in a binary structure called BSON, which allows you to search by any field name in a document and get a corresponding value much as you would with JSON. Here, `mongolite is doing this for us.
The advantages of using a relational store is that you can reduce redundancy in data by creating multiple tables to house attributes of tables which house more observations. An example would be a table of which had the categories of customers a store might have, with say 20 types, each of which might have 30 fields describing that type of customer. This could be called by a single customer-type key in the customer database which has 3-million records. This is far more efficient than storing all of the 30 fields with each customer to whom it applies, saving 30 million duplicate cells, and reducing query time, maintenance costs and physical computing power.
The disadvantages of Relational databases are their lack of ease in storing and accessing complex or non-tabular data efficiently. Searching such data is even less efficient. Also, at a certain point, distributing a relational system becomes cumbersome, requiring a lot of work to preserve the integrity of the individual tables as a database grows.
Advantages of using NoSQL is that you can store any shape, size or configuration of data within an individual record. This means JSON strings, the pages of “War and Peace”, the simplified binary description of RBG images in arrays, whole word documents, key-value pairs…almost anything which can be made into characters can be saved in NoSQL. SO it is flexible enough that you could keep every inch of your documents in one space.
NoSQL is also typically much more robust in a distributed environment playing nicely across nodes. They are highly scalable. And many types of computations and analysis can be done within a NoSQL that simply could not in a straight up SQL database.
Disadvantages of NoSQL, queries can be very complex, particularly for data that would be easily queryable in a traditional relation. They require more thought to administrate and plan for because the scope of the environment is not as clearly defined as with SQL. So you need to consider YOUR use case more clearly before you commit to much to a system.