For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.
For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.
For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing.
Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
Libraries:
library(RMySQL)
library(dbConnect)
library(DBI)
library(mongolite)
mydb = dbConnect(MySQL(),
user='root', password='haley123', dbname='grads', host='localhost')
grads<- dbGetQuery(mydb, "select * from graduation")
head (grads)
## Demographic DBN School Name Cohort
## 1 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2003
## 2 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2004
## 3 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2005
## 4 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2006
## 5 Total Cohort 01M292 HENRY STREET SCHOOL FOR INTERNATIONAL 2006 Aug
## 6 Total Cohort 01M448 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 2001
## Total Cohort Total Grads - n Total Grads - % of cohort Total Regents - n
## 1 5 s s s
## 2 55 37 67.3 17
## 3 64 43 67.2 27
## 4 78 43 55.1 36
## 5 78 44 56.4 37
## 6 64 46 71.900000000000006 32
## Total Regents - % of cohort Total Regents - % of grads
## 1 s s
## 2 30.9 45.9
## 3 42.2 62.8
## 4 46.2 83.7
## 5 47.4 84.1
## 6 50 69.599999999999994
## Advanced Regents - n Advanced Regents - % of cohort
## 1 s s
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 7 10.9
## Advanced Regents - % of grads Regents w/o Advanced - n
## 1 s s
## 2 0 17
## 3 0 27
## 4 0 36
## 5 0 37
## 6 15.2 25
## Regents w/o Advanced - % of cohort Regents w/o Advanced - % of grads
## 1 s s
## 2 30.9 45.9
## 3 42.2 62.8
## 4 46.2 83.7
## 5 47.4 84.1
## 6 39.1 54.3
## Local - n Local - % of cohort Local - % of grads Still Enrolled - n
## 1 s s s s
## 2 20 36.4 54.1 15
## 3 16 25 37.200000000000003 9
## 4 7 9 16.3 16
## 5 7 9 15.9 15
## 6 14 21.9 30.4 10
## Still Enrolled - % of cohort Dropped Out - n Dropped Out - % of cohort
## 1 s s s
## 2 27.3 3 5.5
## 3 14.1 9 14.1
## 4 20.5 11 14.1
## 5 19.2 11 14.1
## 6 15.6 6 9.4
grads2<- dbGetQuery(mydb, "select distinct `School Name`, `Total Grads - n`,`Dropped Out - n`, `Still Enrolled - n`
from graduation limit 10")
grads2
## School Name Total Grads - n Dropped Out - n
## 1 HENRY STREET SCHOOL FOR INTERNATIONAL s s
## 2 HENRY STREET SCHOOL FOR INTERNATIONAL 37 3
## 3 HENRY STREET SCHOOL FOR INTERNATIONAL 43 9
## 4 HENRY STREET SCHOOL FOR INTERNATIONAL 43 11
## 5 HENRY STREET SCHOOL FOR INTERNATIONAL 44 11
## 6 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 46 6
## 7 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 33 1
## 8 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 67 11
## 9 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 75 4
## 10 UNIVERSITY NEIGHBORHOOD HIGH SCHOOL 64 11
## Still Enrolled - n
## 1 s
## 2 15
## 3 9
## 4 16
## 5 15
## 6 10
## 7 16
## 8 9
## 9 33
## 10 41
mongodb <- mongo(collection = "test", db = "school")
mongodb$insert(grads)
## List of 5
## $ nInserted : num 25096
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
mongodb2 <- mongo(collection = "test", db= "school2")
mongodb2$insert(grads2)
## List of 5
## $ nInserted : num 10
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
I noticed while using NoSql the data doesn’t need to be “Structured”- this means, every document in the collection can be different each other.
For example, If you’re using MySQL or PostgresSQL you would conform certain variables. Such as first name, last, email, phone number. You wouldn’t be able to add anything other information into the tables unless the table allows it.
While working in an unstructured database like MongoDB, the collection allows you to have more lead way. You can be able to have additional documents within the same collection.
A disadvantage for using NoSQL is that is still pretty new. Relational databases such as MySQL have been around longer, so NoSQL db can be stuck without inheriting needed support.