The problem statement is as follows:
“For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing. For the relational database, you might use the flights database, the tb database, the”data skills" database your team created for Project 3, or another database of your own choosing or creation.
For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing. Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
You may work on a small team on this project. Due end of day next Sunday".
## Loading required package: DBI
We chose the movies database from an earlier assignment. Right below, we are connecting to the MySQL database.
Displaying all tables in the MySQL database.
## [1] "movies" "responders" "responses"
Joining two tables in MySQL database and getting the data in the R variable df2.
s <- paste0("SELECT m.movie_name, r.resp_name, mr.response, r.city, r.state, r.ethnicity
FROM movies m JOIN responses mr
ON (m.movie_Id = mr.movie_Id)
JOIN responders r
ON (mr.resp_Id = r.resp_Id)
ORDER BY mr.resp_Id
")
rs2 <- dbSendQuery(mydb, s)
df2 <- fetch(rs2, n=-1)
kable(head(df2))| movie_name | resp_name | response | city | state | ethnicity |
|---|---|---|---|---|---|
| Shakespeare in Love | John | 3 | New York | New York | White |
| Solaris | John | NA | New York | New York | White |
| Enter the Dragon | John | 5 | New York | New York | White |
| Star Wars | John | 3 | New York | New York | White |
| Rumble in the Bronx | John | NA | New York | New York | White |
| Autumn Sonata | John | 5 | New York | New York | White |
Connecting to Mongo, and creating a databse, called “movie_responses_db”, and a collection, called ‘movie_responses’, in it.
Exporting data from R variable into Mongo database.
## List of 5
## $ nInserted : num 30
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
Checking Mongo commands at my disposal.
## <Mongo collection> 'movie_responses'
## $aggregate(pipeline = "{}", options = "{\"allowDiskUse\":true}", handler = NULL, pagesize = 1000, iterate = FALSE)
## $count(query = "{}")
## $disconnect(gc = TRUE)
## $distinct(key, query = "{}")
## $drop()
## $export(con = stdout(), bson = FALSE, query = "{}", fields = "{}", sort = "{\"_id\":1}")
## $find(query = "{}", fields = "{\"_id\":0}", sort = "{}", skip = 0, limit = 0, handler = NULL, pagesize = 1000)
## $import(con, bson = FALSE)
## $index(add = NULL, remove = NULL)
## $info()
## $insert(data, pagesize = 1000, stop_on_error = TRUE, ...)
## $iterate(query = "{}", fields = "{\"_id\":0}", sort = "{}", skip = 0, limit = 0)
## $mapreduce(map, reduce, query = "{}", sort = "{}", limit = 0, out = NULL, scope = NULL)
## $remove(query, just_one = FALSE)
## $rename(name, db = NULL)
## $replace(query, update = "{}", upsert = FALSE)
## $run(command = "{\"ping\": 1}", simplify = TRUE)
## $update(query, update = "{\"$set\":{}}", filters = NULL, upsert = FALSE, multiple = FALSE)
Selecting from Mongo database.
| movie_name | resp_name | response | city | state | ethnicity |
|---|---|---|---|---|---|
| Shakespeare in Love | John | 3 | New York | New York | White |
| Solaris | John | NA | New York | New York | White |
| Enter the Dragon | John | 5 | New York | New York | White |
| Star Wars | John | 3 | New York | New York | White |
| Rumble in the Bronx | John | NA | New York | New York | White |
| Autumn Sonata | John | 5 | New York | New York | White |
Here’s a screen shot of what was actually inserted into Mongo database
Comparison between RDBMS and NoSQL database:
RDBMS is structured, and therefore roots out data, not conforming to the database structure, during import or Insert phase.
NoSQL (in our case, Mongo) doesn’t have any structure, so it doesn’t check whether any new field is in the database. This may have unintended consequences.RDBMS tables store data in the form of rows, separated in several columns.
NoSQL databases store data in Collections (equivalent of tables), in JSON data format, which is a Key:Value data store. A NoSQL row is called Document.RDBMS supports indices.
Mongo supports indices.RDBMS supports Primary key, which can be applied on any column that has unique data.
Mongo supports indices, but provides default key _idRDBMS supports Foreign Key contraints, for referential integrity.
Mongo does not support referential integrity.RDBMS provides Group By clause, for data aggregation.
Mongo provides Aggregate clause, for data aggregation.RDBMS support Joins, Triggers.
Mongo does not.RDBMS contains predefined schemas.
Mongo contains dynamic schemas.RDBMS is not suitable for hierarchical data, though in Oracle queries, it can be achieved by “Connect by prior”.
Mongo is suitable for hierarchical data.RDBMS emphasizes on ACID properties (Atomicity, Consistency, Isolation and Durability).
Mongo emphasizes on CAP theorem (Consistency, Availability, Partition tolerance)