1 Intro

MongoDB is a NoSQL database program that stores JSON-like documents with optional schemas. It is a cross-platform database whose source code is publicly available, and it is one of the most representative NoSQL database engines. I started learning Python for several reasons; one of them is that I want to insert web-crawled datasets, including text data, image URLs, and so on, into a NoSQL database.

2 MongoDB Installation

I highly recommend using MongoDB Atlas. Since cloud services increasingly dominate the IT industry, it is worth practicing with a good cloud product such as MongoDB Atlas. It offers a free tier and runs on the major cloud providers: AWS, GCP, and Azure. Details are available at the following link: https://www.mongodb.com/cloud/atlas

2.1 Note - Network Access

If you follow the instructions, building a MongoDB cluster is not difficult. However, when setting up Network Access, it can be confusing where to click. For educational purposes, click ALLOW ACCESS FROM ANYWHERE. This opens the cluster to every IP address, so it suits practice clusters only.

After clicking it, you can reach the MongoDB cluster in the cloud from any machine.

3 Connecting R to a MongoDB Cluster

There are many ways to connect from different languages. Unfortunately, the Atlas platform does not provide instructions for connecting from R. In R, the important step is to find the cluster URL and replace the username and password placeholders with your SCRAM credentials. R users should read the following link: https://docs.mongodb.com/manual/reference/connection-string/

In general, you can get a URI string by clicking the Connect button. Please see the image below.

Now, it’s time to code in R.
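
Before connecting, the URI string can be assembled in R instead of being edited by hand. The following is a minimal sketch, not an official recipe: build_mongo_uri() is a hypothetical helper, and the username, password, and host are placeholders. It percent-encodes the credentials with utils::URLencode(), which matters when a password contains reserved characters such as @ or !.

```r
# Hypothetical helper (not part of mongolite): builds a mongodb+srv URI.
build_mongo_uri <- function(user, password, host, db = "admin") {
  # Percent-encode reserved characters so they cannot break the URI.
  # repeated = TRUE forces encoding even if the text already contains "%".
  user_enc <- utils::URLencode(user, reserved = TRUE, repeated = TRUE)
  pass_enc <- utils::URLencode(password, reserved = TRUE, repeated = TRUE)
  sprintf("mongodb+srv://%s:%s@%s/%s", user_enc, pass_enc, host, db)
}

# Placeholder values (assumptions); replace them with your own cluster details.
url_example <- build_mongo_uri("my_user", "p@ss!word",
                               "cluster0.example.mongodb.net")
print(url_example)
```

To keep credentials out of the script, the arguments could be read with Sys.getenv() from environment variables of your choosing.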

3.1 R Package Installation

In R, run the code below to install and load mongolite. If you want to learn more about the package, please read the PDF file: https://cran.r-project.org/web/packages/mongolite/mongolite.pdf

## install the package if it is missing, then load it
if (!requireNamespace("mongolite", quietly = TRUE)) { install.packages("mongolite") }
library(mongolite)

3.2 Connecting to MongoDB

The mongo() function opens a connection to MongoDB.

url_path <- 'mongodb+srv://<username here>:<password here>@<cluster url>/admin'

# make a connection object that specifies the database and collection (dataset)
con <- mongo(collection = "listingsAndReviews", db = "sample_airbnb",
             url = url_path,
             verbose = TRUE)
  • The url_path above is only a sample, so you must find your own url_path.

  • The next code connects to the database called sample_airbnb, which is already loaded for free if you followed the instructions.

mongo_db <- mongo(collection = "listingsAndReviews", # Data Table
                  db = "sample_airbnb",              # DataBase
                  url = url_path,
                  verbose = TRUE)

print(mongo_db)
## <Mongo collection> 'listingsAndReviews' 
##  $aggregate(pipeline = "{}", options = "{\"allowDiskUse\":true}", handler = NULL, pagesize = 1000, iterate = FALSE) 
##  $count(query = "{}") 
##  $disconnect(gc = TRUE) 
##  $distinct(key, query = "{}") 
##  $drop() 
##  $export(con = stdout(), bson = FALSE, query = "{}", fields = "{}", sort = "{\"_id\":1}") 
##  $find(query = "{}", fields = "{\"_id\":0}", sort = "{}", skip = 0, limit = 0, handler = NULL, pagesize = 1000) 
##  $import(con, bson = FALSE) 
##  $index(add = NULL, remove = NULL) 
##  $info() 
##  $insert(data, pagesize = 1000, stop_on_error = TRUE, ...) 
##  $iterate(query = "{}", fields = "{\"_id\":0}", sort = "{}", skip = 0, limit = 0) 
##  $mapreduce(map, reduce, query = "{}", sort = "{}", limit = 0, out = NULL, scope = NULL) 
##  $remove(query, just_one = FALSE) 
##  $rename(name, db = NULL) 
##  $replace(query, update = "{}", upsert = FALSE) 
##  $run(command = "{\"ping\": 1}", simplify = TRUE) 
##  $update(query, update = "{\"$set\":{}}", filters = NULL, upsert = FALSE, multiple = FALSE)
  • After connecting to MongoDB via mongo(), printing the connection object lists its available methods, as shown above.

3.3 Data Import

You can import the data stored in the cluster into R.

data <- mongo_db$find(query = '{}')
## 
 Found 1000 records...
 Found 2000 records...
 Found 3000 records...
 Found 4000 records...
 Found 5000 records...
 Found 5555 records...
 Imported 5555 records. Simplifying into dataframe...
dplyr::glimpse(data)
## Observations: 5,555
## Variables: 41
## $ listing_url           <chr> "https://www.airbnb.com/rooms/10006546", "…
## $ name                  <chr> "Ribeira Charming Duplex", "Horto flat wit…
## $ summary               <chr> "Fantastic duplex apartment with three bed…
## $ space                 <chr> "Privileged views of the Douro River and R…
## $ description           <chr> "Fantastic duplex apartment with three bed…
## $ neighborhood_overview <chr> "In the neighborhood of the river, you can…
## $ notes                 <chr> "Lose yourself in the narrow streets and s…
## $ transit               <chr> "Transport: • Metro station and S. Bento r…
## $ access                <chr> "We are always available to help guests. T…
## $ interaction           <chr> "Cot - 10 € / night Dog - € 7,5 / night", …
## $ house_rules           <chr> "Make the house your home...", "I just hop…
## $ property_type         <chr> "House", "Apartment", "Condominium", "Apar…
## $ room_type             <chr> "Entire home/apt", "Entire home/apt", "Ent…
## $ bed_type              <chr> "Real Bed", "Real Bed", "Real Bed", "Real …
## $ minimum_nights        <chr> "2", "2", "3", "14", "1", "12", "3", "2", …
## $ maximum_nights        <chr> "30", "1125", "365", "1125", "1125", "360"…
## $ cancellation_policy   <chr> "moderate", "flexible", "strict_14_with_gr…
## $ last_scraped          <dttm> 2019-02-16 14:00:00, 2019-02-11 14:00:00,…
## $ calendar_last_scraped <dttm> 2019-02-16 14:00:00, 2019-02-11 14:00:00,…
## $ first_review          <dttm> 2016-01-03 14:00:00, NA, 2013-05-24 13:00…
## $ last_review           <dttm> 2019-01-20 14:00:00, NA, 2019-02-07 14:00…
## $ accommodates          <int> 8, 4, 2, 1, 2, 2, 4, 6, 8, 4, 4, 2, 3, 6, …
## $ bedrooms              <int> 3, 1, 1, 1, 1, 1, 1, 2, 1, 1, 0, 0, 1, 3, …
## $ beds                  <int> 5, 2, 1, 1, 1, 1, 3, 6, 8, 2, 2, 1, 2, 3, …
## $ number_of_reviews     <int> 51, 0, 96, 1, 0, 70, 70, 1, 1, 0, 5, 0, 3,…
## $ bathrooms             <dbl> 1.0, 1.0, 1.0, 1.5, 2.0, 1.0, 2.0, 1.0, 4.…
## $ amenities             <list> [<"TV", "Cable TV", "Wifi", "Kitchen", "P…
## $ price                 <dbl> 80, 317, 115, 40, 701, 135, 119, 527, 250,…
## $ security_deposit      <dbl> 200, NA, NA, NA, 1000, 0, 600, NA, 0, NA, …
## $ cleaning_fee          <dbl> 35, 187, 100, NA, 250, 135, 150, 211, 0, N…
## $ extra_people          <dbl> 15, 0, 0, 0, 0, 0, 40, 211, 40, 31, 0, 12,…
## $ guests_included       <dbl> 6, 1, 1, 1, 1, 1, 3, 1, 4, 1, 1, 1, 1, 1, …
## $ images                <df[,4]> <data.frame[25 x 4]>
## $ host                  <df[,16]> <data.frame[25 x 16]>
## $ address               <df[,7]> <data.frame[25 x 7]>
## $ availability          <df[,4]> <data.frame[25 x 4]>
## $ review_scores         <df[,7]> <data.frame[25 x 7]>
## $ reviews               <list> [<data.frame[51 x 6]>, <data.frame[0 x 0]…
## $ weekly_price          <dbl> NA, 1492, 650, NA, NA, NA, NA, NA, NA, NA,…
## $ monthly_price         <dbl> NA, 4849, 2150, NA, NA, NA, NA, NA, NA, NA…
## $ reviews_per_month     <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

The code is simple but powerful. As you can see, the number of records in R (5,555) is the same as the number stored in the cloud.
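
Loading the whole collection is not always necessary, because find() also accepts a JSON query, a field projection, and a limit (these arguments appear in the method listing printed in section 3.2). The following is a minimal sketch under stated assumptions: find_houses() is a hypothetical helper, the filter values are chosen only for illustration, and the field names are taken from the glimpse() output above.

```r
# Hypothetical helper: fetch a few houses with at least `min_bedrooms` bedrooms.
# `con` is assumed to be a mongolite connection such as mongo_db.
find_houses <- function(con, min_bedrooms = 2, n = 5) {
  # MongoDB queries are JSON strings; $gte means "greater than or equal to".
  query <- sprintf('{"property_type": "House", "bedrooms": {"$gte": %d}}',
                   as.integer(min_bedrooms))
  # Return only these fields and leave out the _id column.
  fields <- '{"name": 1, "bedrooms": 1, "price": 1, "_id": 0}'
  con$find(query = query, fields = fields, limit = n)
}

# Usage (requires a live connection):
# houses <- find_houses(mongo_db, min_bedrooms = 3, n = 10)
```

Filtering on the server this way avoids transferring all 5,555 records when only a handful of rows are needed.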

3.4 Creating a Database & Collection, and Inserting Data

To create a new database and collection, call mongo() with the collection= and db= arguments; MongoDB creates both when the first data is inserted. The sample code is below.

my_collection <- mongo(collection = "name_of_collection", db = "name_of_database", url = url_path) # create connection, database and collection
my_collection$insert(name_of_data_frame) # insert a data frame into the collection

Following this pattern, let’s insert the iris data into MongoDB. Here, it is important to remove the current connection object first: rm(mongo_db) in this case.

rm(mongo_db) # remove the previous connection object
data("iris")
iris_collection <- mongo(collection = "iris",     # Creating collection
                         db = "sample_dataset_R", # Creating DataBase
                         url = url_path,
                         verbose = TRUE)

# insert the iris data frame into the collection
iris_collection$insert(iris)
## 
Complete! Processed total of 150 rows.
## List of 5
##  $ nInserted  : num 150
##  $ nMatched   : num 0
##  $ nRemoved   : num 0
##  $ nUpserted  : num 0
##  $ writeErrors: list()

Now go to MongoDB Atlas and check the database and collection. The data has been inserted successfully into the cloud cluster.
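
One detail worth noting: the iris column names contain dots (for example Sepal.Length), and MongoDB treats a dot in a query key as a path into a nested document, which can make such fields awkward to query later. The following is a minimal sketch of an optional step that the workflow above does not perform: renaming the columns before inserting. The name iris_clean is an assumption used only for illustration.

```r
data("iris")

# Copy the data frame and replace dots in column names with underscores.
iris_clean <- iris
names(iris_clean) <- gsub(".", "_", names(iris_clean), fixed = TRUE)

print(names(iris_clean))

# The renamed data frame could then be inserted instead of iris:
# iris_collection$insert(iris_clean)
```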

After inserting the new data into MongoDB, it is fine to disconnect iris_collection.

rm(iris_collection)
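
Removing the object with rm() appears to leave the actual closing of the connection to R's garbage collector. The method listing printed in section 3.2 also shows count() and disconnect(), so the insert can be checked from R and the connection closed explicitly. The following is a minimal sketch, assuming a hypothetical helper named insert_and_verify(); it should be called before the connection object is removed.

```r
# Hypothetical helper: insert a data frame, report how many documents were
# added, and close the connection explicitly.
# `con` is assumed to be a mongolite connection such as iris_collection.
insert_and_verify <- function(con, df) {
  before <- con$count()   # documents in the collection before the insert
  con$insert(df)
  after <- con$count()    # documents in the collection after the insert
  con$disconnect()        # close the connection instead of waiting for gc
  after - before
}

# Usage (requires a live connection):
# added <- insert_and_verify(iris_collection, iris)
```

If the returned number equals nrow(df), every row reached the collection.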

