In this assignment, a migration from an RDBMS (MySQL) to a NoSQL system (MongoDB Atlas) will be performed.
RDBMS
Advantages
- Structured data
- Compatibility with many other tools
Disadvantages
- Cannot scale horizontally easily
NoSQL
Advantages
- Flexible schema easily accommodates additional attributes, and the database scales horizontally
Disadvantages
- Poorly suited to highly interconnected data (joins are hard to do)
- Technology is still maturing
library(mongolite)  # MongoDB client
library(dplyr)      # data manipulation helpers
library(RMySQL)     # MySQL driver for DBI
Set up MongoDB Atlas & MySQL Connections
mongoatlas <- mongo(collection = "dsskills", db="data607", url = mongoatlasURL)
drv2 <- dbDriver("MySQL")
con2 <- dbConnect(drv2, username="dsc", password=mysqlpw, dbname ="dsskills", host="localhost")
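The connection calls above rely on `mongoatlasURL` and `mysqlpw`, which are defined outside the code shown. One possible way to supply such values without hard-coding them is to read them from environment variables. The sketch below is only an illustration: the helper `get_secret` and the variable name `DEMO_MYSQL_PASSWORD` are hypothetical and are not part of the original assignment.

```r
# Hypothetical helper: read a required secret from an environment variable
# and fail loudly if it is missing or empty.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable not set: ", name)
  }
  value
}

# Demonstration with a placeholder value. In practice the variable would be
# set outside the script (for instance in ~/.Renviron), not with Sys.setenv().
Sys.setenv(DEMO_MYSQL_PASSWORD = "placeholder")
demo_pw <- get_secret("DEMO_MYSQL_PASSWORD")
```

Under this assumption, `mysqlpw` and `mongoatlasURL` could each be assigned from a call to `get_secret()` before the connections are opened.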
Query the database created for Project 3 (Data Science Skills) and store the result in a data frame
datascienceskills <- dbGetQuery(con2, "
  SELECT JobBoard.Board_Name, Skill_Name.Word, Board_Summary.Occurences
  FROM Board_Summary
  JOIN Skill_Name
    ON Board_Summary.SkillID = Skill_Name.WordID
  JOIN JobBoard
    ON JobBoard.BoardID = Board_Summary.BoardID
  ORDER BY Board_Summary.Occurences DESC;")
datascienceskills
## Board_Name Word Occurences
## 1 Indeed sql 842
## 2 Indeed python 841
## 3 LinkedIn python 626
## 4 Indeed modeling 613
## 5 Indeed research 595
## 6 Indeed tableau 567
## 7 LinkedIn statistics 530
## 8 LinkedIn r 514
## 9 LinkedIn modeling 504
## 10 Indeed programming 429
## 11 Indeed statistics 420
## 12 LinkedIn sql 398
## 13 Indeed mathematics 394
## 14 Indeed r 378
## 15 LinkedIn research 370
## 16 LinkedIn communication 345
## 17 Indeed communication 327
## 18 LinkedIn programming 273
## 19 Indeed visualization 260
## 20 Indeed metrics 229
## 21 LinkedIn mathematics 218
## 22 LinkedIn hadoop 178
## 23 LinkedIn visualization 161
## 24 Indeed etl 154
## 25 LinkedIn tableau 143
## 26 Indeed pandas 141
## 27 Indeed collaboration 131
## 28 LinkedIn sas 120
## 29 Indeed git 113
## 30 LinkedIn tensorflow 109
## 31 LinkedIn nlp 95
## 32 Indeed hadoop 93
## 33 Indeed tensorflow 87
## 34 LinkedIn metrics 85
## 35 LinkedIn hive 82
## 36 LinkedIn azure 79
## 37 Indeed nlp 76
## 38 Indeed nosql 74
## 39 Indeed numpy 71
## 40 Indeed sas 65
## 41 Indeed scipy 62
## 42 LinkedIn collaboration 59
## 43 Indeed matplotlib 58
## 44 LinkedIn qlikview 52
## 45 Indeed designing 47
## 46 Indeed hive 46
## 47 Indeed bi 43
## 48 LinkedIn matplotlib 38
## 49 LinkedIn nosql 38
## 50 Indeed azure 35
## 51 Indeed mapreduce 26
## 52 LinkedIn bi 24
## 53 LinkedIn pandas 22
## 54 LinkedIn numpy 21
## 55 LinkedIn scipy 21
## 56 LinkedIn probability 21
## 57 Indeed perl 19
## 58 LinkedIn git 16
## 59 Indeed javascript 15
## 60 LinkedIn spotfire 14
## 61 Indeed probability 13
## 62 Indeed mongodb 12
## 63 Indeed api 12
## 64 Indeed debugging 7
## 65 Indeed ssis 5
## 66 Indeed html 4
## 67 Indeed css 3
## 68 Indeed cloudera 3
## 69 Indeed spotfire 3
## 70 Indeed qlikview 3
## 71 LinkedIn designing 3
## 72 LinkedIn etl 2
## 73 LinkedIn mapreduce 1
## 74 Indeed judgement 1
## 75 Indeed vb 1
## 76 Indeed ssrs 1
## 77 LinkedIn javascript 1
## 78 LinkedIn cloudera 1
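To make the three-table join in the query above easier to follow, it can be imitated on small in-memory data frames with base R `merge()`. This is a sketch only: the table and column names follow the query, the four counts are taken from the output above, and the ID values (1, 2, 10, 20) are made up for illustration.

```r
# Tiny stand-ins for the three MySQL tables; ID values are hypothetical.
job_board <- data.frame(
  BoardID = c(1, 2),
  Board_Name = c("Indeed", "LinkedIn"),
  stringsAsFactors = FALSE
)
skill_name <- data.frame(
  WordID = c(10, 20),
  Word = c("sql", "python"),
  stringsAsFactors = FALSE
)
board_summary <- data.frame(
  BoardID = c(1, 1, 2, 2),
  SkillID = c(10, 20, 20, 10),
  Occurences = c(842, 841, 626, 398)
)

# JOIN Skill_Name ON Board_Summary.SkillID = Skill_Name.WordID
joined <- merge(board_summary, skill_name, by.x = "SkillID", by.y = "WordID")
# JOIN JobBoard ON JobBoard.BoardID = Board_Summary.BoardID
joined <- merge(joined, job_board, by = "BoardID")

# SELECT the three output columns, ORDER BY Occurences DESC
flat <- joined[order(-joined$Occurences), c("Board_Name", "Word", "Occurences")]
rownames(flat) <- NULL
print(flat)
```

The result is a single flat table with one row per board and skill, which is the shape that gets written to MongoDB.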
mongoatlas$insert(datascienceskills)
## List of 5
## $ nInserted : num 78
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
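The insert reports 78 documents because each row of the data frame becomes one document in the collection. The sketch below approximates that row-to-document mapping with `jsonlite::toJSON()` on the first two rows of the output above. It is meant to show the shape of the stored documents, not the exact bytes mongolite sends to the server.

```r
library(jsonlite)

# First two rows of the query result, rebuilt by hand for illustration.
sample_rows <- data.frame(
  Board_Name = c("Indeed", "Indeed"),
  Word = c("sql", "python"),
  Occurences = c(842, 841),
  stringsAsFactors = FALSE
)

# dataframe = "rows" emits one JSON object per row, keyed by column name.
docs_json <- toJSON(sample_rows, dataframe = "rows")
print(docs_json)
```

Because the joined table is already denormalized, each document is self-contained and no join is needed when reading it back.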
mongoatlas$find("{}")
## Board_Name Word Occurences
## 1 Indeed sql 842
## 2 Indeed python 841
## 3 LinkedIn python 626
## 4 Indeed modeling 613
## 5 Indeed research 595
## 6 Indeed tableau 567
## 7 LinkedIn statistics 530
## 8 LinkedIn r 514
## 9 LinkedIn modeling 504
## 10 Indeed programming 429
## 11 Indeed statistics 420
## 12 LinkedIn sql 398
## 13 Indeed mathematics 394
## 14 Indeed r 378
## 15 LinkedIn research 370
## 16 LinkedIn communication 345
## 17 Indeed communication 327
## 18 LinkedIn programming 273
## 19 Indeed visualization 260
## 20 Indeed metrics 229
## 21 LinkedIn mathematics 218
## 22 LinkedIn hadoop 178
## 23 LinkedIn visualization 161
## 24 Indeed etl 154
## 25 LinkedIn tableau 143
## 26 Indeed pandas 141
## 27 Indeed collaboration 131
## 28 LinkedIn sas 120
## 29 Indeed git 113
## 30 LinkedIn tensorflow 109
## 31 LinkedIn nlp 95
## 32 Indeed hadoop 93
## 33 Indeed tensorflow 87
## 34 LinkedIn metrics 85
## 35 LinkedIn hive 82
## 36 LinkedIn azure 79
## 37 Indeed nlp 76
## 38 Indeed nosql 74
## 39 Indeed numpy 71
## 40 Indeed sas 65
## 41 Indeed scipy 62
## 42 LinkedIn collaboration 59
## 43 Indeed matplotlib 58
## 44 LinkedIn qlikview 52
## 45 Indeed designing 47
## 46 Indeed hive 46
## 47 Indeed bi 43
## 48 LinkedIn matplotlib 38
## 49 LinkedIn nosql 38
## 50 Indeed azure 35
## 51 Indeed mapreduce 26
## 52 LinkedIn bi 24
## 53 LinkedIn pandas 22
## 54 LinkedIn numpy 21
## 55 LinkedIn scipy 21
## 56 LinkedIn probability 21
## 57 Indeed perl 19
## 58 LinkedIn git 16
## 59 Indeed javascript 15
## 60 LinkedIn spotfire 14
## 61 Indeed probability 13
## 62 Indeed mongodb 12
## 63 Indeed api 12
## 64 Indeed debugging 7
## 65 Indeed ssis 5
## 66 Indeed html 4
## 67 Indeed css 3
## 68 Indeed cloudera 3
## 69 Indeed spotfire 3
## 70 Indeed qlikview 3
## 71 LinkedIn designing 3
## 72 LinkedIn etl 2
## 73 LinkedIn mapreduce 1
## 74 Indeed judgement 1
## 75 Indeed vb 1
## 76 Indeed ssrs 1
## 77 LinkedIn javascript 1
## 78 LinkedIn cloudera 1
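The two printouts look identical, but comparing them by eye is error-prone. A possible programmatic check is sketched below. The helper `same_rows` is hypothetical (it is not part of mongolite or the original assignment) and treats two data frames as equal when they hold the same rows, regardless of row or column order.

```r
# Hypothetical helper: TRUE when both data frames contain the same rows,
# ignoring row order and column order.
same_rows <- function(source_df, target_df) {
  if (!setequal(names(source_df), names(target_df))) return(FALSE)
  if (nrow(source_df) != nrow(target_df)) return(FALSE)
  cols <- sort(names(source_df))
  # Put columns in a fixed order, then sort rows by every column.
  sort_rows <- function(df) {
    df <- df[, cols, drop = FALSE]
    df <- df[do.call(order, unname(as.list(df))), , drop = FALSE]
    rownames(df) <- NULL
    df
  }
  a <- sort_rows(source_df)
  b <- sort_rows(target_df)
  # Compare column by column; all.equal tolerates integer vs. double storage.
  all(vapply(cols, function(col) isTRUE(all.equal(a[[col]], b[[col]])), logical(1)))
}

# Small self-contained demonstration with two rows from the output above.
src <- data.frame(
  Board_Name = c("Indeed", "LinkedIn"),
  Word = c("sql", "python"),
  Occurences = c(842, 626),
  stringsAsFactors = FALSE
)
shuffled <- src[c(2, 1), c("Word", "Occurences", "Board_Name")]
print(same_rows(src, shuffled))
```

With the live connections from this report, the call would be `same_rows(datascienceskills, mongoatlas$find("{}"))`. Note that running the insert a second time appends another 78 documents, in which case the row counts differ and the check returns FALSE.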