Reference: https://creativedata.atlassian.net/wiki/spaces/SAP/pages/53411843/R+-+Read+Write+files+from+MongoDB
Reference: https://stackoverflow.com/questions/12402184/rmongo-and-query-ing
Reference: https://cran.r-project.org/src/contrib/Archive/RMongo/
Reference: https://datascienceplus.com/using-mongodb-with-r/
Reference: https://github.com/tc/RMongo/blob/master/man/dbRemoveQuery-methods.Rd
Reference: https://www.bmc.com/blogs/sql-vs-nosql/
Reference: https://bitnine.net/blog-computing/sql-vs-nosql-comparative-advantages-and-disadvantages/
Reference: https://medium.com/@zhenwu93/relational-vs-non-relational-databases-8336870da8bc
library(DBI)
## Warning: package 'DBI' was built under R version 3.5.3
cn <- dbConnect(drv = RMySQL::MySQL(),
username = "admin",
password = "Data__607",
host = "database-1.cxdov2mcmzlo.us-east-2.rds.amazonaws.com",
port = 3306,
dbname = "data607project3")
data <- dbGetQuery(cn, "SELECT * FROM softwareskills")
data2C <- dbGetQuery(cn, "SELECT * FROM computingskills")
data3E<- dbGetQuery(cn, "SELECT * FROM educationlevels")
library(RMongo)
## Loading required package: rJava
## Warning: package 'rJava' was built under R version 3.5.3
##
## Attaching package: 'RMongo'
## The following objects are masked from 'package:DBI':
##
## dbDisconnect, dbGetQuery
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 3.5.3
library(stringr)
#Connection to mongodb
mongodb <- mongoDbConnect("skills", host = "localhost", port = 27017)
#Remove any existing collections
dbRemoveQuery(mongodb, 'softwareskills', '{ }')
## [1] "ok"
dbRemoveQuery(mongodb, 'computingskills', '{ }')
## [1] "ok"
dbRemoveQuery(mongodb, 'educationlevels', '{ }')
## [1] "ok"
#Function to insert data into mongodb
mongodb_insert <- function(connection, db, df)
{
#make json mongodb document
mongodata <- toJSON(df)
mongodata <- sub("[","",mongodata, fixed = TRUE)
mongodata <- sub("]","",mongodata, fixed = TRUE)
mongodata <- as.character(mongodata)
dbInsertDocument(connection, db, mongodata)
}
for(i in 1:nrow(data))
{
mongodb_insert(mongodb,"softwareskills",data[i,])
}
for(i in 1:nrow(data2C))
{
mongodb_insert(mongodb,"computingskills",data2C[i,])
}
for(i in 1:nrow(data3E))
{
mongodb_insert(mongodb,"educationlevels",data3E[i,])
}
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 3.5.3
#library(mongolite)
mongo <- mongoDbConnect("skills", "localhost", 27017)
kable(dbGetQuery(mongo, "softwareskills","{}")) #Get all the records
| Keyword | SimplyHired | Monster | X_id | id | Indeed | AngelList | |
|---|---|---|---|---|---|---|---|
| 10534 | Python | 5564 | 5353 | 5dbef0abe60dcb4d901833f9 | 1 | 7067 | 962 |
| 7359 | SQL | 4475 | 8970 | 5dbef0abe60dcb4d901833fa | 2 | 5504 | 559 |
| 6454 | R | 3709 | 3421 | 5dbef0abe60dcb4d901833fb | 3 | 4617 | 332 |
| 4458 | Java | 2549 | 4359 | 5dbef0abe60dcb4d901833fc | 4 | 3366 | 527 |
| 4388 | Hadoop | 2240 | 2858 | 5dbef0abe60dcb4d901833fd | 5 | 2982 | 194 |
| 3686 | Spark | 2262 | 2453 | 5dbef0abe60dcb4d901833fe | 6 | 2978 | 259 |
| 2727 | Excel | 1862 | 1674 | 5dbef0abe60dcb4d901833ff | 7 | 2287 | 627 |
| 2636 | Tableau | 1732 | 1826 | 5dbef0abe60dcb4d90183400 | 8 | 2183 | 76 |
| 2621 | AWS | 1793 | 1682 | 5dbef0abe60dcb4d90183401 | 9 | 2346 | 571 |
| 2589 | SAS | 1417 | 1153 | 5dbef0abe60dcb4d90183402 | 10 | 1744 | 37 |
| 1980 | Scala | 1175 | 1206 | 5dbef0abe60dcb4d90183403 | 11 | 1506 | 182 |
| 1977 | C++ | 1160 | 1094 | 5dbef0abe60dcb4d90183404 | 12 | 1567 | 459 |
| 1859 | Hive | 1161 | 40 | 5dbef0abe60dcb4d90183405 | 13 | 1534 | 91 |
| 1564 | Javascript | 992 | 1436 | 5dbef0abe60dcb4d90183406 | 14 | 1192 | 635 |
| 1417 | NoSQL | 942 | 1562 | 5dbef0abe60dcb4d90183407 | 15 | 1103 | 221 |
| 1332 | Azure | 800 | 752 | 5dbef0abe60dcb4d90183408 | 16 | 1205 | 90 |
| 1130 | TensorFlow | 686 | 643 | 5dbef0abe60dcb4d90183409 | 17 | 926 | 149 |
| 0 | C | 0 | 0 | 5dbef0abe60dcb4d9018340a | 18 | 0 | 0 |
| 1061 | PowerPoint | 712 | 706 | 5dbef0abe60dcb4d9018340b | 19 | 832 | 16 |
| 986 | Matlab | 691 | 625 | 5dbef0abe60dcb4d9018340c | 20 | 828 | 68 |
| 951 | Docker | 603 | 583 | 5dbef0abe60dcb4d9018340d | 21 | 792 | 228 |
| 931 | Git | 669 | 814 | 5dbef0abe60dcb4d9018340e | 22 | 813 | 232 |
| 839 | C# | 497 | 610 | 5dbef0abe60dcb4d9018340f | 23 | 645 | 459 |
| 780 | MySQL | 524 | 501 | 5dbef0abe60dcb4d90183410 | 24 | 659 | 226 |
| 736 | Ruby | 411 | 302 | 5dbef0abe60dcb4d90183411 | 25 | 535 | 262 |
| 711 | Microsoft Office | 427 | 661 | 5dbef0abe60dcb4d90183412 | 26 | 550 | 37 |
| 660 | SPSS | 428 | 416 | 5dbef0abe60dcb4d90183413 | 27 | 470 | 18 |
| 653 | MongoDB | 423 | 401 | 5dbef0abe60dcb4d90183414 | 28 | 584 | 159 |
| 639 | Pig | 422 | 477 | 5dbef0abe60dcb4d90183415 | 29 | 508 | 17 |
| 630 | Pandas | 450 | 333 | 5dbef0abe60dcb4d90183416 | 30 | 525 | 86 |
| 629 | Hbase | 419 | 495 | 5dbef0abe60dcb4d90183417 | 31 | 500 | 33 |
| 610 | Cassandra | 408 | 461 | 5dbef0abe60dcb4d90183418 | 32 | 490 | 90 |
| 585 | Numpy | 365 | 311 | 5dbef0abe60dcb4d90183419 | 33 | 433 | 88 |
| 521 | Perl | 332 | 386 | 5dbef0abe60dcb4d9018341a | 34 | 473 | 86 |
| 494 | Power BI | 347 | 275 | 5dbef0abe60dcb4d9018341b | 35 | 400 | 16 |
| 476 | Node | 317 | 95 | 5dbef0abe60dcb4d9018341c | 36 | 360 | 110 |
| 438 | PostgreSQL | 324 | 470 | 5dbef0abe60dcb4d9018341d | 37 | 376 | 194 |
| 436 | D3 | 141 | 173 | 5dbef0abe60dcb4d9018341e | 38 | 176 | 69 |
| 403 | Keras | 245 | 211 | 5dbef0abe60dcb4d9018341f | 39 | 302 | 50 |
| 382 | PHP | 212 | 133 | 5dbef0abe60dcb4d90183420 | 40 | 320 | 113 |
| 274 | Redis | 158 | 131 | 5dbef0abe60dcb4d90183421 | 41 | 194 | 136 |
| 270 | Alteryx | 170 | 201 | 5dbef0abe60dcb4d90183422 | 42 | 262 | 0 |
| 270 | Jupyter | 173 | 139 | 5dbef0abe60dcb4d90183423 | 43 | 198 | 22 |
| 256 | Stata | 155 | 124 | 5dbef0abe60dcb4d90183424 | 44 | 172 | 14 |
| 243 | Caffe | 149 | 155 | 5dbef0abe60dcb4d90183425 | 45 | 185 | 35 |
| 241 | PyTorch | 165 | 161 | 5dbef0abe60dcb4d90183426 | 46 | 178 | 32 |
library(kableExtra)
#library(mongolite)
mongo <- mongoDbConnect("skills", "localhost", 27017)
kable(dbGetQuery(mongo, "computingskills","{}")) #Get all the records
| Keyword | SimplyHired | Monster | X_id | id | Indeed | |
|---|---|---|---|---|---|---|
| 5701 | machine learning | 2561 | 2340 | 5dbef0abe60dcb4d90183427 | 1 | 3439 |
| 5168 | analysis | 2668 | 3306 | 5dbef0abe60dcb4d90183428 | 2 | 3500 |
| 4893 | statistics | 2308 | 2399 | 5dbef0abe60dcb4d90183429 | 3 | 2992 |
| 4517 | computer science | 2093 | 1900 | 5dbef0abe60dcb4d9018342a | 4 | 2739 |
| 3404 | communication | 1791 | 2053 | 5dbef0abe60dcb4d9018342b | 5 | 2344 |
| 2605 | mathematics | 1497 | 1815 | 5dbef0abe60dcb4d9018342c | 6 | 1961 |
| 1879 | visualization | 1153 | 1207 | 5dbef0abe60dcb4d9018342d | 7 | 1413 |
| 0 | AI composite | 0 | 0 | 5dbef0abe60dcb4d9018342e | 8 | 0 |
| 1310 | deep learning | 675 | 606 | 5dbef0abe60dcb4d9018342f | 9 | 979 |
| 0 | NLP composite | 0 | 0 | 5dbef0abe60dcb4d90183430 | 10 | 0 |
| 732 | software development | 481 | 784 | 5dbef0abe60dcb4d90183431 | 11 | 627 |
| 671 | neural networks | 421 | 305 | 5dbef0abe60dcb4d90183432 | 12 | 485 |
| 514 | data engineering | 0 | 200 | 5dbef0abe60dcb4d90183433 | 13 | 0 |
| 476 | project management | 330 | 348 | 5dbef0abe60dcb4d90183434 | 14 | 397 |
| 413 | software engineering | 250 | 512 | 5dbef0abe60dcb4d90183435 | 15 | 295 |
library(kableExtra)
#library(mongolite)
mongo <- mongoDbConnect("skills", "localhost", 27017)
kable(dbGetQuery(mongo, "educationlevels","{}")) #Get all the records
| Keyword | SimplyHired | Monster | X_id | id | Indeed | AngelList | |
|---|---|---|---|---|---|---|---|
| 2917 | MS | 2532 | 1985 | 5dbef0abe60dcb4d90183436 | 1 | 1821 | 288 |
| 3242 | PhD | 2221 | 1629 | 5dbef0abe60dcb4d90183437 | 2 | 1468 | 230 |
| 1568 | masters | 2704 | 2033 | 5dbef0abe60dcb4d90183438 | 3 | 887 | 165 |
| 677 | bachelors | 3326 | 2631 | 5dbef0abe60dcb4d90183439 | 4 | 578 | 97 |
| 1186 | MBA | 788 | 634 | 5dbef0abe60dcb4d9018343a | 5 | 675 | 63 |
| 31 | bootcamp | 129 | 74 | 5dbef0abe60dcb4d9018343b | 6 | 14 | 18 |
| 74 | Kaggle | 49 | 44 | 5dbef0abe60dcb4d9018343c | 7 | 12086 | 5 |
What is SQL?
SQL (Structured Query Language) is a programming language that is used to manage data in relational databases. Relational databases use relations (typically called tables) to store data and then match that data by using common characteristics within the dataset. Some common relational database management systems that use SQL are Oracle, Sybase, Microsoft SQL Server, Access, and Ingres. Cobb’s breakthrough paper describes a database where objects could be constructed and queried using something he called SQL, structured query language. He used SQL to create both data (in objects called tables) and the schema for that data, which describes fields in columns. A single record in a SQL database is called a row.
What is NoSQL?
A NoSQL database, on the other hand, is self-describing, so does not require a schema. Nor does it enforce relations between tables in all cases. All its documents are JSON documents, which are complete entities that one can readily read and understand. NoSQL refers to high-performance, non-relational databases that utilize a wide variety of data models. These databases are highly recognized for their ease-of-use, scalable performance, strong resilience, and wide availability. NoSQL database examples include MongoDB, MarkLogic, Couchbase, CloudDB, and Amazon’s Dynamo DB.
Major Differences
There are many differences between SQL and NoSQL, all of which are important to understand when making a decision about what might be the best data management system for your organization. These include differences in:
Language
Scalability
Community
Structure
The Language
One of the major differences between SQL relational and NoSQL non-relational databases is the language. As mentioned, SQL databases use Structured Query Language for defining and manipulating data. This allows SQL to be extremely versatile and widely-used - however, it also makes it more restrictive. SQL requires that you use predefined schemas to determine the structure of your data before you even begin to work with it. Your data must also follow the same structure as well, which can entail significant upfront preparation along with careful execution.
A NoSQL database features a dynamic schema for unstructured data and the data can be stored in many different ways, whether it be graph-based, document-oriented, column-oriented, or organized as a KeyValue store. This extreme flexibility allows you to create documents without first having to carefully plan and define their structure, add fields as you go, and vary the syntax from database to database. It also allows you to give each document its own unique structure, providing you with more freedom overall.
The Scalability
Another big difference between SQL and NoSQL is their scalability. In most SQL databases, they are vertically scalable, which means that you can increase the load on a single server by increasing components like RAM, SSD, or CPU. In contrast, NoSQL databases are horizontally scalable, which means that they can handle increased traffic simply by adding more servers to the database. NoSQL databases have the ability to become larger and much more powerful, making them the preferred choice for large or constantly evolving data sets.
The Community
Due to SQL’s maturity, it has a much stronger and more developed community compared to NoSQL. There are thousands of chats and forums available where experts can share knowledge and discuss SQL best practices, continuously enhancing skills. Although NoSQL is growing rapidly, its community is not as well defined as SQL due to the fact that it is still relatively new.
The Structure
Finally, a last thing to consider when debating SQL versus NoSQL is their structures. SQL databases are table-based which makes them a better option for applications that require multi-row transactions. Samples of these may be accounting systems or even legacy systems that were originally built for a relational structure. NoSQL databases can be key-value pairs, wide-column stores, graph databases, or document-based.