In this assignment, I take flight data from a CSV file, migrate it to a MongoDB database, and describe the advantages and disadvantages of storing the data in a relational database versus a NoSQL database.

library(RMongo)
## Loading required package: rJava
library(rjson)
# connect to the 'test' database on a local MongoDB server (default port 27017)
mongo <- mongoDbConnect('test', '127.0.0.1', 27017)

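# read the flight data (columns depart, arrive and flighttime)
flights <- read.csv("flights.csv", stringsAsFactors = FALSE)
# convert each row into a named list, i.e. one MongoDB document per flight
# (seq_len keeps the CSV row order; split() by rownames would sort "1", "10", "11", ...)
L <- lapply(seq_len(nrow(flights)), function(i) as.list(flights[i, , drop = FALSE]))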

# optional: empty the collection first so that reruns do not insert duplicates
# output <- dbRemoveQuery(mongo, 'flights', '{}')

# insert the flights, one JSON document per row
# (seq_along avoids iterating over c(1, 0) when L is empty)
for (i in seq_along(L)) {
  dataJSON <- toJSON(L[[i]])
  output <- dbInsertDocument(mongo, "flights", dataJSON)
}

# example of a filtered query: flights departing from Seattle
#output <- dbGetQuery(mongo, 'flights', '{"depart": "Seattle"}')
# view all documents in the flights collection
output <- dbGetQuery(mongo, 'flights', '{}')
print(output)
##           arrive                     X_id flighttime        depart
## 1         Boston 5912649a08fd09ede72e2e3d        147       Atlanta
## 2        Atlanta 5912649a08fd09ede72e2e3e         99    Pittsburgh
## 3         Boston 5912649a08fd09ede72e2e3f        100    Pittsburgh
## 4        Detroit 5912649a08fd09ede72e2e40         60    Pittsburgh
## 5  San Francisco 5912649a08fd09ede72e2e41        399        Boston
## 6         Boston 5912649a08fd09ede72e2e42        319 San Francisco
## 7       Honolulu 5912649a08fd09ede72e2e43        362   Los Angeles
## 8    Los Angeles 5912649a08fd09ede72e2e44        336      Honolulu
## 9       New York 5912649a08fd09ede72e2e45         74        Boston
## 10        Boston 5912649a08fd09ede72e2e46         74      New York
## 11       Seattle 5912649a08fd09ede72e2e47        495      New York
## 12       Detroit 5912649a08fd09ede72e2e48        119       Atlanta
## 13      New York 5912649a08fd09ede72e2e49        495       Seattle
## 14      Honolulu 5912649a08fd09ede72e2e4a        355       Seattle
## 15       Seattle 5912649a08fd09ede72e2e4b        355      Honolulu
## 16        Boston 5912649a08fd09ede72e2e4c        313       Seattle
## 17       Seattle 5912649a08fd09ede72e2e4d        390        Boston
## 18    Pittsburgh 5912649a08fd09ede72e2e4e         99       Atlanta
## 19       Atlanta 5912649a08fd09ede72e2e4f        147        Boston
## 20       Detroit 5912649a08fd09ede72e2e50        108        Boston
## 21    Pittsburgh 5912649a08fd09ede72e2e51        100        Boston
## 22       Atlanta 5912649a08fd09ede72e2e52        119       Detroit
## 23        Boston 5912649a08fd09ede72e2e53        108       Detroit
## 24    Pittsburgh 5912649a08fd09ede72e2e54         60       Detroit
# disconnect mongoDB
dbDisconnect(mongo)

# compare the original CSV data with the documents read back from MongoDB
output <- output[, names(flights)]  # drop X_id and use the CSV column order
flights <- flights[with(flights, order(arrive, depart, flighttime)), ]
output <- output[with(output, order(arrive, depart, flighttime)), ]
all.equal(flights, output, check.attributes = FALSE)
## [1] TRUE
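A comparison that aligns columns by name rather than by position does not depend on the field order MongoDB happens to return. The sketch below is a self-contained illustration that needs no running database: `csv_data` holds three flights copied from the query output above, and `mongo_like` is a hypothetical stand-in for `dbGetQuery()` output (shuffled columns and rows plus an `X_id` column). The helper `same_records` is a hypothetical name, not part of RMongo.

```r
# Stand-in for the CSV data: three flights copied from the query output above
csv_data <- data.frame(
  depart = c("Atlanta", "Pittsburgh", "Seattle"),
  arrive = c("Boston", "Atlanta", "New York"),
  flighttime = c(147L, 99L, 495L),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for dbGetQuery() output: different column order,
# different row order, and the extra X_id column that MongoDB adds
mongo_like <- data.frame(
  arrive = c("New York", "Boston", "Atlanta"),
  X_id = c("id3", "id1", "id2"),
  flighttime = c(495, 147, 99),
  depart = c("Seattle", "Atlanta", "Pittsburgh"),
  stringsAsFactors = FALSE
)

# Align b's columns to a's by name (this drops X_id), sort both by every
# column, then compare values while ignoring row names
same_records <- function(a, b) {
  b <- b[, names(a), drop = FALSE]
  a <- a[do.call(order, a), , drop = FALSE]
  b <- b[do.call(order, b), , drop = FALSE]
  isTRUE(all.equal(a, b, check.attributes = FALSE))
}

same_records(csv_data, mongo_like)
```

Sorting on every column, instead of naming `arrive`, `depart` and `flighttime` explicitly, keeps the check usable if the CSV gains or loses fields.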

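Advantages of NoSQL versus a relational database:

1. flexibility: documents in a collection do not have to share a fixed schema, so the data is easier to manage than in a relational database;

2. mostly open source and low cost;

3. scalability, including support for MapReduce;

4. less complexity, since there is no relational schema (tables, keys, joins) to set up; and

5. rapid growth.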
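As an illustration of the MapReduce pattern only (plain in-memory base R, not MongoDB's built-in map-reduce feature), the sketch below computes the average flight time per departure city. The map step emits one (key, value) pair per document, the pairs are grouped by key, and the reduce step combines each group. The nine flights are copied from the query output above; `map_fn` and `reduce_fn` are hypothetical names.

```r
# Flights departing from Boston and Seattle, copied from the query output above
flights_demo <- data.frame(
  depart = c(rep("Boston", 6), rep("Seattle", 3)),
  arrive = c("San Francisco", "New York", "Seattle", "Atlanta", "Detroit",
             "Pittsburgh", "New York", "Honolulu", "Boston"),
  flighttime = c(399, 74, 390, 147, 108, 100, 495, 355, 313),
  stringsAsFactors = FALSE
)

# Treat each row as a document, as in the migration above
docs <- lapply(seq_len(nrow(flights_demo)), function(i) as.list(flights_demo[i, ]))

# Map: emit one (key, value) pair per document; value = c(total time, count)
map_fn <- function(doc) list(key = doc$depart, value = c(doc$flighttime, 1))
pairs <- Map(map_fn, docs)

# Shuffle: group the emitted values by key
keys <- vapply(pairs, function(p) p$key, character(1))
grouped <- split(lapply(pairs, function(p) p$value), keys)

# Reduce: add up (total, count) vectors per key, then finalize into an average
reduce_fn <- function(a, b) a + b
totals <- lapply(grouped, function(vals) Reduce(reduce_fn, vals))
avg_time <- vapply(totals, function(t) t[1] / t[2], numeric(1))
print(avg_time)  # Boston: 203, Seattle: about 387.67
```

The map step emits a (total, count) pair rather than the average itself because addition can be applied to partial results in any order, which is what lets a MapReduce system split the reduce work across machines; the division happens once at the end.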

Disadvantages of NoSQL versus a relational database such as MySQL:

1. less mature than MySQL because of its shorter history;

2. fewer reporting tools;

3. no standard query language comparable to SQL, which limits compatibility between products; and

4. the community is not yet well defined.

Conclusion:

NoSQL databases emerged with the rapid development of social media, big data, and web technologies, which created a need for more flexible data structures and simpler operation. NoSQL may not replace relational databases, but it is likely to grow into an important complement to them for meeting data storage demands.
