A SQL database “movie_reviews” has been created that contains ratings given by a set of viewers to 6 recent movies. This database is loaded and read into a data frame. The related file “reviews.sql” contains SQL commands that were issued to create the database. Configuration: MySQL version 5.7, Windows 10
library(RMySQL)
library(datasets)
# Open a MySql database containing movie ratings from movie viewers
dbcon = dbConnect(MySQL(), user = 'root', password = 'pwd001', dbname='movie_reviews')
# Check the database has table(s).
dbListTables(dbcon)
## [1] "responses"
dbListFields(dbcon, 'responses')
## [1] "reviewer" "movie" "rating"
# Read all rows in the responses table into data frame. Print data frame.
result = dbSendQuery(dbcon, "select * from responses")
mr.df = fetch(result, n=30)
print(mr.df)
## reviewer movie rating
## 1 Mark LaLaLand 1
## 2 Mark Moana 2
## 3 Mark Zootopia 4
## 4 Mark Arrival 4
## 5 Mark FindingDory 5
## 6 Mark SuicideSquad 3
## 7 Jane LaLaLand 5
## 8 Jane Moana 3
## 9 Jane Zootopia 3
## 10 Jane Arrival 4
## 11 Jane FindingDory 2
## 12 Jane SuicideSquad 2
## 13 Nate LaLaLand 4
## 14 Nate Moana 2
## 15 Nate Zootopia 2
## 16 Nate Arrival 4
## 17 Nate FindingDory 5
## 18 Nate SuicideSquad 5
## 19 Rob LaLaLand 3
## 20 Rob Moana 3
## 21 Rob Zootopia 4
## 22 Rob Arrival 1
## 23 Rob FindingDory 1
## 24 Rob SuicideSquad 2
## 25 Lisa LaLaLand 1
## 26 Lisa Moana 5
## 27 Lisa Zootopia 5
## 28 Lisa Arrival 3
## 29 Lisa FindingDory 3
## 30 Lisa SuicideSquad 4
library(mongolite)
# Create a MongoDB database and collection
mr <- mongo(collection = "movies", db = "reviews")
if (mr$count() > 0)
mr$drop()
data(mr.df)
mr$insert(mr.df)
## List of 5
## $ nInserted : num 30
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
mr$count()
## [1] 30
# Read back from the MongoDB database.
mout <- mr$find()
print(mout)
## reviewer movie rating
## 1 Mark LaLaLand 1
## 2 Mark Moana 2
## 3 Mark Zootopia 4
## 4 Mark Arrival 4
## 5 Mark FindingDory 5
## 6 Mark SuicideSquad 3
## 7 Jane LaLaLand 5
## 8 Jane Moana 3
## 9 Jane Zootopia 3
## 10 Jane Arrival 4
## 11 Jane FindingDory 2
## 12 Jane SuicideSquad 2
## 13 Nate LaLaLand 4
## 14 Nate Moana 2
## 15 Nate Zootopia 2
## 16 Nate Arrival 4
## 17 Nate FindingDory 5
## 18 Nate SuicideSquad 5
## 19 Rob LaLaLand 3
## 20 Rob Moana 3
## 21 Rob Zootopia 4
## 22 Rob Arrival 1
## 23 Rob FindingDory 1
## 24 Rob SuicideSquad 2
## 25 Lisa LaLaLand 1
## 26 Lisa Moana 5
## 27 Lisa Zootopia 5
## 28 Lisa Arrival 3
## 29 Lisa FindingDory 3
## 30 Lisa SuicideSquad 4
Relational databases such as MySQL allow more advanced querying and indexing. They store data in rows where each row contains values of all variables (columns) corresponding to that row. They allow multiple secondary indexes to be built and used with predictable performance. NoSQL databases are limited in the number of secondary indexes that can be supported.
On the other hand, with massive datasets that are common for web-scale data, a MySQL database suffers from limited scalability: it is usually not designed to scale to multiple nodes, thus limiting the amount of data that it can store under a single repository. In this regard the NoSQL databases are clearly superior: they are designed from the ground up to scale to hundreds, often thousands of nodes. For this reason, they are the only choice in data centers where data from millions of users must be stored and accessed efficiently. A relational database, for all its sophisticated features, simply cannot scale to those levels of size and speed.
Today, an IT administrator must learn to deploy both types of databases as required by the size, growth rates and usage of their data.