library(RMySQL)
## Loading required package: DBI
# I originally thought tb was part of the flights dataset, which explains the dbname argument below
mydb = dbConnect(MySQL(), user = 'root', password = 'password1234', dbname = 'flights', host = 'localhost')
dbListFields(mydb, 'tb')
## [1] "country" "year" "sex" "child" "adult" "elderly"
Run a query using the dbSendQuery command to extract the data from MySQL
rs <- dbSendQuery(mydb, "SELECT * FROM tb")
mySQL_data <- fetch(rs, n = -1) # n = -1 fetches all remaining rows
Save the retrieved data as a csv file
write.csv(mySQL_data, "MySQLquery_db.csv",row.names = FALSE)
dbDisconnect(mydb)
## Warning: Closing open result sets
## [1] TRUE
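The warning above appears because the result set rs was never released before disconnecting. Clearing it first avoids the message:

dbClearResult(rs)
dbDisconnect(mydb)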
If you’re skipping the SQL approach, here’s the direct link to the csv in the GitHub repository: https://raw.githubusercontent.com/hvasquez81/DATA607-Week-12-Assignment/master/MySQLquery_db.csv
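If you take that route, the file can be read straight into R:

mySQL_data <- read.csv("https://raw.githubusercontent.com/hvasquez81/DATA607-Week-12-Assignment/master/MySQLquery_db.csv")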
The next thing to do here is open Neo4j Desktop and create a new project:
Step 1: Create a new project [screenshot]
Step 2: Set the graph name and password [screenshot]
Step 3: Start the new graph [screenshot]
Steps 4-7: [screenshots of the remaining setup steps]
In this situation I feel a NoSQL database isn’t as useful as it normally is. The data is fairly simple and not as diverse as a database from, say, a store like Amazon. The connections are short, so network analysis on a graph database wouldn’t show much. In this case, an RDBMS would be easier to work with if we were looking for aggregates such as the sums of the child, adult, and elderly counts by sex, across different years, or across different countries. The queries in MySQL would be much easier to write and much more effective at telling us what the data shows than anything Neo4j could.
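For example, an aggregate like the one below is a one-line query in SQL. This is just a sketch, assuming the MySQL connection mydb from earlier is still open (or has been re-opened):

# total adult cases per country and year, largest totals first
totals <- dbGetQuery(mydb, "SELECT country, year, SUM(adult) AS total_adult FROM tb GROUP BY country, year ORDER BY total_adult DESC")
head(totals)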
However, if we were to join another data set to tb, something with more information about these people and where exactly they’re migrating to and from, I would expect Neo4j to be far more useful for seeing what is going on. Something interesting that could come out of that is migration and immigration patterns between countries and across age groups. But since our data is limited to only 6 variables, with none relating one country to another, I would suggest keeping the dataset in MySQL for any analysis.
We have the MySQL data stored as a data frame in R. The next step is to connect to MongoDB.
Load the necessary packages
library(mongolite)
c = mongo(collection = "mySQL_data", db = "MySQLData") # note: 'c' masks base R's c() for the rest of the session
c$insert(mySQL_data)
## List of 5
## $ nInserted : num 3800
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
All 3800 rows from our MySQL tb table have been added!
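As a quick sanity check, the collection count should match the number of inserted rows (count is a standard mongolite method):

c$count() # should report 3800, matching nInserted above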
head(c$aggregate())
## _id country year sex child adult elderly
## 1 5bf203c196db8a2c2b5096d2 Afghanistan 1995 female NA NA NA
## 2 5bf203c196db8a2c2b5096d3 Afghanistan 1995 male NA NA NA
## 3 5bf203c196db8a2c2b5096d4 Afghanistan 1996 female NA NA NA
## 4 5bf203c196db8a2c2b5096d5 Afghanistan 1996 male NA NA NA
## 5 5bf203c196db8a2c2b5096d6 Afghanistan 1997 female 5 96 1
## 6 5bf203c196db8a2c2b5096d7 Afghanistan 1997 male 0 26 0
Here is the head of our collection in MongoDB. Note that an _id field is added every time you insert data into MongoDB, so each id is unique. Let’s take a look inside the terminal.
[Screenshot: MongoDB terminal]
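You can also query the collection directly from R. A small sketch using mongolite’s find method; the filter values below are just examples taken from the head() output above:

# pull the Afghanistan records for 1997, dropping the _id field from the result
c$find(query = '{"country" : "Afghanistan", "year" : 1997}', fields = '{"_id" : 0}')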