Grando Week 12 Assignment

options(width = 100)
#This is a standard setup I include so that my working directory is set correctly whether I work on one of my windows or linux machines.
if (Sys.info()["sysname"]=="Windows"){
  setwd("~/Masters/DATA607/Week12/Assignment")
} else {
  setwd("~/Documents/Masters/DATA607/Week12/Assignment")
}

Load the tb data

csv_file <- read.csv(file = "https://raw.githubusercontent.com/john-grando/Masters/master/DATA607/Week12/Assignment/tb.csv?token=AXxIejuTdx-_SiiG9LH5jwl9HvCeb1paks5aEk1LwA%3D%3D",
                     header = FALSE,
                     col.names = c("Country", "Year", "Gender","Val1", "Val2", "Val3"))

Create mongo db, collection, and insert tb data

library(mongolite)
my_collection <- mongo(collection = "tb_collection", db = "diseases")
my_collection$insert(csv_file)
## List of 5
##  $ nInserted  : num 3800
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

Check data structure

my_collection$iterate()$one()
## $Country
## [1] "Afghanistan"
## 
## $Year
## [1] 1995
## 
## $Gender
## [1] "female"
## 
## $Val1
## [1] -1
## 
## $Val2
## [1] -1
## 
## $Val3
## [1] -1

Why NoSQL?

Well, there are plenty of pro vs. con webpages stating different things about why NoSQL is more useful in some situations but I thought I would take the one reason that stuck with me the most and show an example. SQL databases are built upon structured data tables, and it is very difficult to change those structures on a consistent basis. Therefore, if your data structure is constantly changing then you can be in some trouble trying to map it to a SQL database. However, NoSQL (Mongo in this case) stores information as documents and does not mind if there is additional, or less data. For example, let’s just add a new attribute to an observation.

new_entry <- data.frame(Country=c("TestCountry"), Year=c(9999), Gender=("Test Gender"), Val1=999, Val2=999, Val3=999, Val4=999)
my_collection$insert(new_entry)
## List of 5
##  $ nInserted  : num 1
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

So, mongo said it was inserted but did it work?

my_collection$find(query = '{"Country":"TestCountry"}')
##       Country Year      Gender Val1 Val2 Val3 Val4
## 1 TestCountry 9999 Test Gender  999  999  999  999

Yes it did, but wait, what did it do to our existing dataset?

my_collection$find(query = '{"Country":"Congo"}')
##    Country Year Gender Val1 Val2 Val3
## 1    Congo 1995 female   17  915   11
## 2    Congo 1995   male   16 1012   15
## 3    Congo 1996 female   -1   -1   -1
## 4    Congo 1996   male   -1   -1   -1
## 5    Congo 1997 female   -1   -1   -1
## 6    Congo 1997   male   -1   -1   -1
## 7    Congo 1998 female   -1   -1   -1
## 8    Congo 1998   male   -1   -1   -1
## 9    Congo 1999 female   25  895   22
## 10   Congo 1999   male   17 1046   27
## 11   Congo 2000 female   -1   -1   -1
## 12   Congo 2000   male   -1   -1   -1
## 13   Congo 2001 female   53 1899  107
## 14   Congo 2001   male   31 2009   65
## 15   Congo 2002 female   -1   -1   -1
## 16   Congo 2002   male   -1   -1   -1
## 17   Congo 2003 female   -1   -1   -1
## 18   Congo 2003   male   -1   -1   -1
## 19   Congo 2004 female   38 1727   44
## 20   Congo 2004   male    9 2269   32
## 21   Congo 2005 female   -1   -1   -1
## 22   Congo 2005   male   -1   -1   -1
## 23   Congo 2006 female   44 1348   54
## 24   Congo 2006   male   32 1662   51
## 25   Congo 2007 female   45 1577   60
## 26   Congo 2007   male   28 1779   63
## 27   Congo 2008 female   56 1429   51
## 28   Congo 2008   male   31 1755   49
## 29   Congo 2009 female   65 1414   47
## 30   Congo 2009   male   50 1801   56
## 31   Congo 2010 female   49 1437   56
## 32   Congo 2010   male   41 1811   55
## 33   Congo 2011 female   72 1491   97
## 34   Congo 2011   male   58 1922   76
## 35   Congo 2012 female   63 1508  108
## 36   Congo 2012   male   46 2187   72
## 37   Congo 2013 female   46 1467   71
## 38   Congo 2013   male   35 2224   60

Nothing! That’s right! If we have new data with more information to insert, we can just do it and not worry about the existing data. However, note that this can cause a lot of workarounds and/or queries to make sure that we pull all the data we want. Additionally, there are many articles that can easily be googled which cite disadvantages of NoSQL, which are mainly security, consistency, and standardization.

Here is a query showing the Congo and our test country in one result. Note, a column is made for the new variable as long as one of the queried documents has a value (it is NA for all other documents)

my_collection$find(query = '{"$or" : [{"Country":"TestCountry"}, {"Country":"Congo"}]}')
##        Country Year      Gender Val1 Val2 Val3 Val4
## 1        Congo 1995      female   17  915   11   NA
## 2        Congo 1995        male   16 1012   15   NA
## 3        Congo 1996      female   -1   -1   -1   NA
## 4        Congo 1996        male   -1   -1   -1   NA
## 5        Congo 1997      female   -1   -1   -1   NA
## 6        Congo 1997        male   -1   -1   -1   NA
## 7        Congo 1998      female   -1   -1   -1   NA
## 8        Congo 1998        male   -1   -1   -1   NA
## 9        Congo 1999      female   25  895   22   NA
## 10       Congo 1999        male   17 1046   27   NA
## 11       Congo 2000      female   -1   -1   -1   NA
## 12       Congo 2000        male   -1   -1   -1   NA
## 13       Congo 2001      female   53 1899  107   NA
## 14       Congo 2001        male   31 2009   65   NA
## 15       Congo 2002      female   -1   -1   -1   NA
## 16       Congo 2002        male   -1   -1   -1   NA
## 17       Congo 2003      female   -1   -1   -1   NA
## 18       Congo 2003        male   -1   -1   -1   NA
## 19       Congo 2004      female   38 1727   44   NA
## 20       Congo 2004        male    9 2269   32   NA
## 21       Congo 2005      female   -1   -1   -1   NA
## 22       Congo 2005        male   -1   -1   -1   NA
## 23       Congo 2006      female   44 1348   54   NA
## 24       Congo 2006        male   32 1662   51   NA
## 25       Congo 2007      female   45 1577   60   NA
## 26       Congo 2007        male   28 1779   63   NA
## 27       Congo 2008      female   56 1429   51   NA
## 28       Congo 2008        male   31 1755   49   NA
## 29       Congo 2009      female   65 1414   47   NA
## 30       Congo 2009        male   50 1801   56   NA
## 31       Congo 2010      female   49 1437   56   NA
## 32       Congo 2010        male   41 1811   55   NA
## 33       Congo 2011      female   72 1491   97   NA
## 34       Congo 2011        male   58 1922   76   NA
## 35       Congo 2012      female   63 1508  108   NA
## 36       Congo 2012        male   46 2187   72   NA
## 37       Congo 2013      female   46 1467   71   NA
## 38       Congo 2013        male   35 2224   60   NA
## 39 TestCountry 9999 Test Gender  999  999  999  999
my_collection$drop()