I am using the MySQL database created for Project #3 and bulk inserting its records into a new MongoDB database. I originally wanted to use Neo4j for this, but I ran into problems with the Neo4j R package.
# Load RMySQL for the relational source, dplyr for the joins, and mongolite for MongoDB.
library(RMySQL)
library(dplyr)
library(mongolite)
con <- dbConnect(MySQL(), user = 'data607p3', password = 'data607p3', dbname = 'job_skills', host = 'localhost')
jobs_db <- dbReadTable(con, 'jobs')
skills_db <- dbReadTable(con, 'skills')
job_skills_db <- dbReadTable(con, 'job_skills')
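As an optional sanity check before reading, DBI can list the tables in the connected schema; a minimal sketch, using only names already shown above:
# Confirm the jobs, skills, and job_skills tables exist in job_skills.
dbListTables(con)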
# Join tables to get skills for jobs.
job_skills_db <- jobs_db %>%
  inner_join(job_skills_db) %>%
  inner_join(skills_db)
## Joining, by = "job_id"
## Joining, by = "skill_id"
nrow(job_skills_db)
## [1] 253746
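For reference, the same joined result could be pulled straight from MySQL in a single query instead of joining in R; a minimal sketch, assuming only the key columns reported in the join messages above (job_id and skill_id):
# Equivalent join performed in MySQL itself.
job_skills_sql <- dbGetQuery(con, "
  SELECT j.job_id, j.job_name, s.skill_id, s.skill_name
  FROM jobs j
  INNER JOIN job_skills js ON js.job_id = j.job_id
  INNER JOIN skills s ON s.skill_id = js.skill_id
")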
# Connect to (and, on first insert, create) the DATA607 collection in the local MongoDB.
data607 <- mongo("DATA607")
data607$insert(job_skills_db)
## List of 5
## $ nInserted : num 253746
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
# Number of documents in the MongoDB collection
data607$count()
## [1] 253746
# Number of rows from the MySQL join; the counts match
nrow(job_skills_db)
## [1] 253746
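With the counts matching, the data now lives in MongoDB and the MySQL connection can be closed; a one-line cleanup using DBI's dbDisconnect:
# Close the MySQL connection now that the migration is verified.
dbDisconnect(con)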
rskills <- data607$find('{"skill_name" : "r"}')
head(rskills)
## job_id job_name skill_id skill_name
## 1 9 scheduling coordinator 42 r
## 2 11 school bus driver 42 r
## 3 18 data scientist 42 r
## 4 34 configuration manager 42 r
## 5 65 clinical counselor 42 r
## 6 85 certified police officer 42 r
# Return only the job and skill names for up to 50 data scientist postings.
test <- data607$find(
  query = '{ "job_name" : "data scientist" }',
  fields = '{"job_name" : true, "skill_name" : true}',
  limit = 50
)
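Beyond simple finds, mongolite also exposes MongoDB's aggregation pipeline. As a sketch, this counts how often each skill appears on data scientist postings, using only fields already in the collection (the top_skills name is just illustrative):
# Skill frequency among data scientist postings, most common first.
top_skills <- data607$aggregate('[
  { "$match" : { "job_name" : "data scientist" } },
  { "$group" : { "_id" : "$skill_name", "n" : { "$sum" : 1 } } },
  { "$sort"  : { "n" : -1 } }
]')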
Below are some of the pros and cons of SQL and NoSQL databases.
SQL (MySQL):
- Predefined schemas: all of your data must follow the same structure, and changing that structure later is both difficult and disruptive.
- Vertically scalable.
- Good for highly transactional systems.
NoSQL (MongoDB):
- Dynamic schema: flexibility to change your data schema without modifying any of your existing data (see the sketch after this list).
- Scalability: MongoDB is horizontally scalable.
- Flexibility: you can add new fields to documents without affecting existing documents or application performance.
- Each document can have its own unique structure.
- No expensive joins.
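As a quick illustration of the dynamic-schema point, mongolite accepts documents of different shapes in the same collection; a minimal sketch using a hypothetical scratch collection named demo:
# Hypothetical scratch collection to demonstrate schema flexibility.
demo <- mongo("demo")
demo$insert('{ "job_name" : "data scientist", "skill_name" : "r" }')
demo$insert('{ "job_name" : "statistician", "skills" : ["r", "sas"], "remote" : true }')
demo$count()   # both documents coexist despite their different structures
demo$drop()    # clean up the scratch collection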