In this assignment, data is migrated from a relational database (MySQL) to a NoSQL database (MongoDB Atlas).

RDBMS

Advantages
- Structured data
- Compatibility with many other tools

Disadvantages
- Difficult to scale horizontally (across additional servers)

NoSQL

Advantages
- Flexible schema: documents can carry additional attributes without altering a table definition
- Designed to scale horizontally across many servers

Disadvantages
- Less suited to highly interconnected data (joins are hard to express)
- The technology is still maturing
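Document stores usually work around the join limitation by embedding related data inside a single document, so that a read needs no join. The base R sketch below shows what that shape could look like for this data set: one document per skill, with its per-board counts nested inside.

The rows are copied from the MySQL output further down. embed_by_word is a hypothetical helper. This nested layout is an illustrative alternative, not the flat one-row-per-document design used in the migration below.

```r
# A few rows copied from the MySQL query output in this document.
rows <- data.frame(
  Board_Name = c("Indeed", "LinkedIn", "Indeed", "LinkedIn"),
  Word       = c("sql", "sql", "python", "python"),
  Occurences = c(842, 398, 841, 626),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one document per skill, embedding the
# per-board counts as a named sub-list so no join is needed at read time.
embed_by_word <- function(df) {
  lapply(split(df, df$Word), function(g) {
    list(Word   = g$Word[1],
         boards = as.list(setNames(g$Occurences, g$Board_Name)))
  })
}

docs <- embed_by_word(rows)
str(docs$sql)
```

The trade-off is that any update to a board's count must now touch the nested skill document, which is the usual cost of denormalizing.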

library(mongolite)
library(dplyr)
library(RMySQL)

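Set Up Connections

Set up the MongoDB Atlas and MySQL connections. The credentials are held in mongoatlasURL and mysqlpw, which are not defined in this document.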

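# MongoDB Atlas: collection "dsskills" in database "data607"
mongoatlas <- mongo(collection = "dsskills", db = "data607", url = mongoatlasURL)

# MySQL: the local "dsskills" database created for Project 3
drv2 <- dbDriver("MySQL")
con2 <- dbConnect(drv2, username = "dsc", password = mysqlpw, dbname = "dsskills", host = "localhost")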
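The connection code reads mongoatlasURL and mysqlpw, but neither is defined anywhere in this document. One common approach is to load the secrets from environment variables and fail early if one is missing. This sketch assumes that setup; the variable names MONGO_ATLAS_URL and MYSQL_PW and the helper require_env are hypothetical.

```r
# Hypothetical helper: read a required secret from an environment variable,
# stopping with a clear message if it is unset or empty.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) stop("environment variable ", name, " is not set")
  value
}

# Possible usage before opening the connections (variable names assumed):
# mongoatlasURL <- require_env("MONGO_ATLAS_URL")
# mysqlpw       <- require_env("MYSQL_PW")
```

Keeping credentials out of the script also keeps them out of the knitted report.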

Query MySQL

Query the database created for Project 3 (Data Science Skills), joining its three tables, and store the result in a data frame.

datascienceskills <- dbGetQuery(con2, "SELECT JobBoard.Board_Name, Skill_Name.Word, Board_Summary.Occurences
                                       FROM Board_Summary
                                       JOIN Skill_Name ON Board_Summary.SkillID = Skill_Name.WordID
                                       JOIN JobBoard ON JobBoard.BoardID = Board_Summary.BoardID
                                       ORDER BY Board_Summary.Occurences DESC;")
datascienceskills
##    Board_Name          Word Occurences
## 1      Indeed           sql        842
## 2      Indeed        python        841
## 3    LinkedIn        python        626
## 4      Indeed      modeling        613
## 5      Indeed      research        595
## 6      Indeed       tableau        567
## 7    LinkedIn    statistics        530
## 8    LinkedIn             r        514
## 9    LinkedIn      modeling        504
## 10     Indeed   programming        429
## 11     Indeed    statistics        420
## 12   LinkedIn           sql        398
## 13     Indeed   mathematics        394
## 14     Indeed             r        378
## 15   LinkedIn      research        370
## 16   LinkedIn communication        345
## 17     Indeed communication        327
## 18   LinkedIn   programming        273
## 19     Indeed visualization        260
## 20     Indeed       metrics        229
## 21   LinkedIn   mathematics        218
## 22   LinkedIn        hadoop        178
## 23   LinkedIn visualization        161
## 24     Indeed           etl        154
## 25   LinkedIn       tableau        143
## 26     Indeed        pandas        141
## 27     Indeed collaboration        131
## 28   LinkedIn           sas        120
## 29     Indeed           git        113
## 30   LinkedIn    tensorflow        109
## 31   LinkedIn           nlp         95
## 32     Indeed        hadoop         93
## 33     Indeed    tensorflow         87
## 34   LinkedIn       metrics         85
## 35   LinkedIn          hive         82
## 36   LinkedIn         azure         79
## 37     Indeed           nlp         76
## 38     Indeed         nosql         74
## 39     Indeed         numpy         71
## 40     Indeed           sas         65
## 41     Indeed         scipy         62
## 42   LinkedIn collaboration         59
## 43     Indeed    matplotlib         58
## 44   LinkedIn      qlikview         52
## 45     Indeed     designing         47
## 46     Indeed          hive         46
## 47     Indeed            bi         43
## 48   LinkedIn    matplotlib         38
## 49   LinkedIn         nosql         38
## 50     Indeed         azure         35
## 51     Indeed     mapreduce         26
## 52   LinkedIn            bi         24
## 53   LinkedIn        pandas         22
## 54   LinkedIn         numpy         21
## 55   LinkedIn         scipy         21
## 56   LinkedIn   probability         21
## 57     Indeed          perl         19
## 58   LinkedIn           git         16
## 59     Indeed    javascript         15
## 60   LinkedIn      spotfire         14
## 61     Indeed   probability         13
## 62     Indeed       mongodb         12
## 63     Indeed           api         12
## 64     Indeed     debugging          7
## 65     Indeed          ssis          5
## 66     Indeed          html          4
## 67     Indeed           css          3
## 68     Indeed      cloudera          3
## 69     Indeed      spotfire          3
## 70     Indeed      qlikview          3
## 71   LinkedIn     designing          3
## 72   LinkedIn           etl          2
## 73   LinkedIn     mapreduce          1
## 74     Indeed     judgement          1
## 75     Indeed            vb          1
## 76     Indeed          ssrs          1
## 77   LinkedIn    javascript          1
## 78   LinkedIn      cloudera          1
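To make the SQL concrete, the same two inner joins and the descending sort can be reproduced in base R with merge. The sketch uses tiny hypothetical versions of the three tables:

- The table and column names mirror the query.
- The Occurences values come from the output above.
- The BoardID and WordID values are invented for illustration.

```r
# Hypothetical miniature versions of the three Project 3 tables.
JobBoard   <- data.frame(BoardID = c(1, 2), Board_Name = c("Indeed", "LinkedIn"),
                         stringsAsFactors = FALSE)
Skill_Name <- data.frame(WordID = c(10, 20), Word = c("sql", "python"),
                         stringsAsFactors = FALSE)
Board_Summary <- data.frame(
  BoardID    = c(1, 1, 2, 2),
  SkillID    = c(10, 20, 10, 20),
  Occurences = c(842, 841, 398, 626)
)

# Inner join Board_Summary to Skill_Name, then to JobBoard, as in the SQL.
joined <- merge(Board_Summary, Skill_Name, by.x = "SkillID", by.y = "WordID")
joined <- merge(joined, JobBoard, by = "BoardID")

# Keep the selected columns and order by Occurences, descending.
flat <- joined[order(-joined$Occurences), c("Board_Name", "Word", "Occurences")]
rownames(flat) <- NULL
flat
```

The flattened result, one self-contained row per board and skill, is exactly the shape that is then inserted into MongoDB.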

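Insert into MongoDB

Each row of the data frame becomes one document in the collection. The returned summary reports 78 documents inserted and no write errors.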

mongoatlas$insert(datascienceskills)
## List of 5
##  $ nInserted  : num 78
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()
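The list returned by insert can be checked in code rather than by eye. The hypothetical helper check_insert below stops if any write errors were reported, or if nInserted differs from the number of source rows.

Running the insert a second time adds a second copy of every row. Calling mongoatlas$drop() before inserting is one way to keep reruns idempotent, assuming the collection holds nothing else.

```r
# Hypothetical helper: verify a mongolite insert summary against the
# data frame that was inserted.
check_insert <- function(res, df) {
  if (length(res$writeErrors) > 0) {
    stop(sprintf("%d write error(s) during insert", length(res$writeErrors)))
  }
  if (res$nInserted != nrow(df)) {
    stop(sprintf("inserted %d documents, expected %d", res$nInserted, nrow(df)))
  }
  invisible(TRUE)
}

# Possible usage with the live connection from this document:
# mongoatlas$drop()
# res <- mongoatlas$insert(datascienceskills)
# check_insert(res, datascienceskills)
```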

Query MongoDB

Retrieving every document in the collection returns the same 78 rows as the MySQL query.

mongoatlas$find("{}")
##    Board_Name          Word Occurences
## 1      Indeed           sql        842
## 2      Indeed        python        841
## 3    LinkedIn        python        626
## 4      Indeed      modeling        613
## 5      Indeed      research        595
## 6      Indeed       tableau        567
## 7    LinkedIn    statistics        530
## 8    LinkedIn             r        514
## 9    LinkedIn      modeling        504
## 10     Indeed   programming        429
## 11     Indeed    statistics        420
## 12   LinkedIn           sql        398
## 13     Indeed   mathematics        394
## 14     Indeed             r        378
## 15   LinkedIn      research        370
## 16   LinkedIn communication        345
## 17     Indeed communication        327
## 18   LinkedIn   programming        273
## 19     Indeed visualization        260
## 20     Indeed       metrics        229
## 21   LinkedIn   mathematics        218
## 22   LinkedIn        hadoop        178
## 23   LinkedIn visualization        161
## 24     Indeed           etl        154
## 25   LinkedIn       tableau        143
## 26     Indeed        pandas        141
## 27     Indeed collaboration        131
## 28   LinkedIn           sas        120
## 29     Indeed           git        113
## 30   LinkedIn    tensorflow        109
## 31   LinkedIn           nlp         95
## 32     Indeed        hadoop         93
## 33     Indeed    tensorflow         87
## 34   LinkedIn       metrics         85
## 35   LinkedIn          hive         82
## 36   LinkedIn         azure         79
## 37     Indeed           nlp         76
## 38     Indeed         nosql         74
## 39     Indeed         numpy         71
## 40     Indeed           sas         65
## 41     Indeed         scipy         62
## 42   LinkedIn collaboration         59
## 43     Indeed    matplotlib         58
## 44   LinkedIn      qlikview         52
## 45     Indeed     designing         47
## 46     Indeed          hive         46
## 47     Indeed            bi         43
## 48   LinkedIn    matplotlib         38
## 49   LinkedIn         nosql         38
## 50     Indeed         azure         35
## 51     Indeed     mapreduce         26
## 52   LinkedIn            bi         24
## 53   LinkedIn        pandas         22
## 54   LinkedIn         numpy         21
## 55   LinkedIn         scipy         21
## 56   LinkedIn   probability         21
## 57     Indeed          perl         19
## 58   LinkedIn           git         16
## 59     Indeed    javascript         15
## 60   LinkedIn      spotfire         14
## 61     Indeed   probability         13
## 62     Indeed       mongodb         12
## 63     Indeed           api         12
## 64     Indeed     debugging          7
## 65     Indeed          ssis          5
## 66     Indeed          html          4
## 67     Indeed           css          3
## 68     Indeed      cloudera          3
## 69     Indeed      spotfire          3
## 70     Indeed      qlikview          3
## 71   LinkedIn     designing          3
## 72   LinkedIn           etl          2
## 73   LinkedIn     mapreduce          1
## 74     Indeed     judgement          1
## 75     Indeed            vb          1
## 76     Indeed          ssrs          1
## 77   LinkedIn    javascript          1
## 78   LinkedIn      cloudera          1
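Since find("{}") should return exactly the rows that were inserted, the migration can be verified by comparing the two data frames as unordered sets of rows. This is a minimal sketch; row_keys and same_rows are hypothetical helpers.

It assumes both frames have the Board_Name, Word and Occurences columns shown above. Occurences is coerced to numeric, so an integer column on one side and a double column on the other still compare equal.

```r
# Hypothetical helper: build a sorted vector of "board|word|count" keys.
row_keys <- function(df) {
  sort(paste(df$Board_Name, df$Word, as.numeric(df$Occurences), sep = "|"))
}

# TRUE when both data frames contain the same rows, in any order.
same_rows <- function(a, b) identical(row_keys(a), row_keys(b))

# Possible usage with the live connections from this document:
# same_rows(datascienceskills, mongoatlas$find("{}"))
```

mongolite's find also takes query, sort and limit arguments, so a filtered read such as the top five LinkedIn skills could be written as mongoatlas$find('{"Board_Name": "LinkedIn"}', sort = '{"Occurences": -1}', limit = 5).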