MongoDB-Trial

Silverio Vasquez

November 12, 2017

From MySQL to MongoDB

SQL Script to create movie database

Create movie database locally using this SQL file

Then extract movies database from MySQL into a dataframe and print out top and bottom to confirm data came in OK.

# Connect to MySQL database
con <- dbConnect(MySQL(),user='CUNY_SPS',password='',
                 host='localhost',dbname='tb')

# Download all data in database
rs<- dbSendQuery(con, "select * from tb_hw ORDER BY rating DESC")

# Store data in dataframe
movies <- fetch(rs) 
head(movies)

##        movie year   genre mins rating initials
## 1     Avatar 2009 Fantasy  302      5      sjv
## 2 Braveheart 1995   Drama  318      5      sjv
## 3       Heat 1995  Action  310      5      sjv
## 4     Avatar 2009 Fantasy  302      5       jv
## 5       Okja 2017   Drama  120      5       jv
## 6     Avatar 2009 Fantasy  302      5       cl

tail(movies)

##         movie year  genre mins rating initials
## 25 Terminator 1984 Sci-fi  107      2       jv
## 26       Okja 2017  Drama  120      2       cl
## 27 Braveheart 1995  Drama  318      2       np
## 28      Rocky 1976  Drama  120      1       jv
## 29       Heat 1995 Action  310      1       jv
## 30      Rocky 1976  Drama  120      1       np

Import movies data frame to MongoDB

Inserting a dataframe from R into a MongoDB collection is supereasy thanks to the mongolite package

my_db <-  mongo(collection='movies', db="ratings")
my_db$insert(movies)

## List of 5
##  $ nInserted  : num 30
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

Confirm data imported correctly:

my_db$count() == dim(movies)[1]

## [1] TRUE

Check first record in MongoDB:

my_db$iterate()$one()

## $movie
## [1] "Avatar"
## 
## $year
## [1] 2009
## 
## $genre
## [1] "Fantasy"
## 
## $mins
## [1] 302
## 
## $rating
## [1] 5
## 
## $initials
## [1] "sjv"

Export data from MongoDB to a data frame and confirm data frame matches the one uploaded:

# return data from MongoDB into a tmp data frame
tmp <- my_db$find()
# test if two data frames are identical
identical(movies,tmp)

## [1] TRUE

Query MongoDB

Return all movies rated >3 from any user:

top_5 <- my_db$distinct("movie", query = '{"rating":{"$gt":3}}')
print(top_5)

## [1] "Avatar"     "Braveheart" "Heat"       "Okja"       "Terminator"
## [6] "Rocky"

Return all movies rated by user “sjv”:

sjv <- my_db$find(query = '{"initials":"sjv"}')
sjv

##        movie year   genre mins rating initials
## 1     Avatar 2009 Fantasy  302      5      sjv
## 2 Braveheart 1995   Drama  318      5      sjv
## 3       Heat 1995  Action  310      5      sjv
## 4 Terminator 1984  Sci-fi  107      4      sjv
## 5      Rocky 1976   Drama  120      4      sjv
## 6       Okja 2017   Drama  120      3      sjv

Return all movies created after year 2000:

post2000 <- my_db$distinct("movie",query = '{"year":{"$gt":2000}}')
print(post2000)

## [1] "Avatar" "Okja"

Aggregate data and plot

Return average rating per movie and plot results:

avg_rating <- my_db$aggregate('[{"$group":{"_id":"$movie","average":{"$avg":"$rating"}}}]') 
names(avg_rating) <- c("movie","average_rating") 
ggplot(aes(movie,average_rating),data=avg_rating) + 
        geom_col(fill="sky blue") + ggtitle("Average Rating per Movie") + theme_classic()

Pros & Cons

The pros of NOSQL are that developers don’t have to worry about schemas, which can be time consuming to create if data connections are complex. NOSQL tends to be faster to retrieve data as well.

Some cons of NOSQL is the greater skill set needed to create/query NOSQL databases. Additionally, sometimes businesses want to keep/store the relationship between one item and another. For example, if I want to know how employees report to Bob, this could arguably be faster to query in a relational database.