MongoDB is a NoSQL database program that stores JSON-like documents with optional schemas. It is a source-available, cross-platform database and one of the most representative NoSQL database engines. I started learning Python for several reasons. One of them is that I want to insert web-crawled datasets, including text data, image URLs, and so on, into a NoSQL database.
I highly recommend MongoDB Atlas. Since the cloud is rapidly coming to dominate the IT industry, it is worth practicing with a good cloud product like Atlas. It has a free tier and can be deployed on the major cloud providers: AWS, GCP, and Azure. Details are at the following link: https://www.mongodb.com/cloud/atlas
If you follow the instructions, it is not difficult to build a MongoDB cluster. However, when setting up Network Access, it is a bit confusing where to click. For educational purposes only, click ALLOW ACCESS FROM ANYWHERE.
After clicking it, the cluster accepts connections from any IP address, so users can freely access MongoDB in the cloud.
There are many ways to connect from different languages. Unfortunately, the platform does not provide any information on connecting from R. In R, the important part is to find the cluster URL and replace the username and password placeholders with your SCRAM credentials. R users should read the following link: https://docs.mongodb.com/manual/reference/connection-string/
In general, users can get a URI string by clicking the Connect button. Please see the image below.
Now, it’s time to code in R.
In R, type the code below to install and load mongolite. If you want to look at the package in more detail, please read the PDF manual: https://cran.r-project.org/web/packages/mongolite/mongolite.pdf
## install the package if it is missing, then load it
if (!requireNamespace("mongolite", quietly = TRUE)) { install.packages("mongolite") }
library(mongolite)
The mongo() function creates a connection to MongoDB.
url_path <- 'mongodb+srv://<username here>:<password here>@<cluster url>/admin'
# make a connection object that specifies the database and collection (dataset)
mongo_con <- mongo(collection = "listingsAndReviews", db = "sample_airbnb",
                   url = url_path,
                   verbose = TRUE)
The url_path above is only a template, so users must find their own url_path.
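Because the URI embeds the username and password, reserved characters such as @, : or / inside a password need to be percent-encoded, and it is usually safer not to hard-code credentials in a script. The sketch below assembles url_path from environment variables. The helper build_mongo_url() and the variable names MONGO_USER, MONGO_PASSWORD and MONGO_CLUSTER are assumptions made for illustration; they are not part of mongolite or Atlas.

```r
# Hypothetical helper: build an SRV connection string from its parts.
# Credentials are percent-encoded so reserved characters (e.g. "@", ":")
# do not break the URI. repeated = TRUE forces encoding even when the
# raw password happens to contain something that looks like "%40".
build_mongo_url <- function(user, password, cluster, auth_db = "admin") {
  sprintf("mongodb+srv://%s:%s@%s/%s",
          utils::URLencode(user, reserved = TRUE, repeated = TRUE),
          utils::URLencode(password, reserved = TRUE, repeated = TRUE),
          cluster,
          auth_db)
}

# Assumed environment variable names; set them in ~/.Renviron or the shell.
url_path <- build_mongo_url(
  user     = Sys.getenv("MONGO_USER"),
  password = Sys.getenv("MONGO_PASSWORD"),
  cluster  = Sys.getenv("MONGO_CLUSTER")
)
```

The resulting url_path can be passed to mongo(url = url_path, ...) exactly like the hand-written string.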
The next code connects to the database called sample_airbnb, which is already loaded for free if you followed the instructions.
mongo_db <- mongo(collection = "listingsAndReviews", # Data Table
db = "sample_airbnb", # DataBase
url = url_path,
verbose = TRUE)
print(mongo_db)
## <Mongo collection> 'listingsAndReviews'
## $aggregate(pipeline = "{}", options = "{\"allowDiskUse\":true}", handler = NULL, pagesize = 1000, iterate = FALSE)
## $count(query = "{}")
## $disconnect(gc = TRUE)
## $distinct(key, query = "{}")
## $drop()
## $export(con = stdout(), bson = FALSE, query = "{}", fields = "{}", sort = "{\"_id\":1}")
## $find(query = "{}", fields = "{\"_id\":0}", sort = "{}", skip = 0, limit = 0, handler = NULL, pagesize = 1000)
## $import(con, bson = FALSE)
## $index(add = NULL, remove = NULL)
## $info()
## $insert(data, pagesize = 1000, stop_on_error = TRUE, ...)
## $iterate(query = "{}", fields = "{\"_id\":0}", sort = "{}", skip = 0, limit = 0)
## $mapreduce(map, reduce, query = "{}", sort = "{}", limit = 0, out = NULL, scope = NULL)
## $remove(query, just_one = FALSE)
## $rename(name, db = NULL)
## $replace(query, update = "{}", upsert = FALSE)
## $run(command = "{\"ping\": 1}", simplify = TRUE)
## $update(query, update = "{\"$set\":{}}", filters = NULL, upsert = FALSE, multiple = FALSE)
Printing the object created by mongo() shows the details above, namely the methods available on the collection. Users can now import the data stored in the cluster.
data <- mongo_db$find(query = '{}')
##
Found 1000 records...
Found 2000 records...
Found 3000 records...
Found 4000 records...
Found 5000 records...
Found 5555 records...
Imported 5555 records. Simplifying into dataframe...
dplyr::glimpse(data)
## Observations: 5,555
## Variables: 41
## $ listing_url <chr> "https://www.airbnb.com/rooms/10006546", "…
## $ name <chr> "Ribeira Charming Duplex", "Horto flat wit…
## $ summary <chr> "Fantastic duplex apartment with three bed…
## $ space <chr> "Privileged views of the Douro River and R…
## $ description <chr> "Fantastic duplex apartment with three bed…
## $ neighborhood_overview <chr> "In the neighborhood of the river, you can…
## $ notes <chr> "Lose yourself in the narrow streets and s…
## $ transit <chr> "Transport: • Metro station and S. Bento r…
## $ access <chr> "We are always available to help guests. T…
## $ interaction <chr> "Cot - 10 € / night Dog - € 7,5 / night", …
## $ house_rules <chr> "Make the house your home...", "I just hop…
## $ property_type <chr> "House", "Apartment", "Condominium", "Apar…
## $ room_type <chr> "Entire home/apt", "Entire home/apt", "Ent…
## $ bed_type <chr> "Real Bed", "Real Bed", "Real Bed", "Real …
## $ minimum_nights <chr> "2", "2", "3", "14", "1", "12", "3", "2", …
## $ maximum_nights <chr> "30", "1125", "365", "1125", "1125", "360"…
## $ cancellation_policy <chr> "moderate", "flexible", "strict_14_with_gr…
## $ last_scraped <dttm> 2019-02-16 14:00:00, 2019-02-11 14:00:00,…
## $ calendar_last_scraped <dttm> 2019-02-16 14:00:00, 2019-02-11 14:00:00,…
## $ first_review <dttm> 2016-01-03 14:00:00, NA, 2013-05-24 13:00…
## $ last_review <dttm> 2019-01-20 14:00:00, NA, 2019-02-07 14:00…
## $ accommodates <int> 8, 4, 2, 1, 2, 2, 4, 6, 8, 4, 4, 2, 3, 6, …
## $ bedrooms <int> 3, 1, 1, 1, 1, 1, 1, 2, 1, 1, 0, 0, 1, 3, …
## $ beds <int> 5, 2, 1, 1, 1, 1, 3, 6, 8, 2, 2, 1, 2, 3, …
## $ number_of_reviews <int> 51, 0, 96, 1, 0, 70, 70, 1, 1, 0, 5, 0, 3,…
## $ bathrooms <dbl> 1.0, 1.0, 1.0, 1.5, 2.0, 1.0, 2.0, 1.0, 4.…
## $ amenities <list> [<"TV", "Cable TV", "Wifi", "Kitchen", "P…
## $ price <dbl> 80, 317, 115, 40, 701, 135, 119, 527, 250,…
## $ security_deposit <dbl> 200, NA, NA, NA, 1000, 0, 600, NA, 0, NA, …
## $ cleaning_fee <dbl> 35, 187, 100, NA, 250, 135, 150, 211, 0, N…
## $ extra_people <dbl> 15, 0, 0, 0, 0, 0, 40, 211, 40, 31, 0, 12,…
## $ guests_included <dbl> 6, 1, 1, 1, 1, 1, 3, 1, 4, 1, 1, 1, 1, 1, …
## $ images <df[,4]> <data.frame[25 x 4]>
## $ host <df[,16]> <data.frame[25 x 16]>
## $ address <df[,7]> <data.frame[25 x 7]>
## $ availability <df[,4]> <data.frame[25 x 4]>
## $ review_scores <df[,7]> <data.frame[25 x 7]>
## $ reviews <list> [<data.frame[51 x 6]>, <data.frame[0 x 0]…
## $ weekly_price <dbl> NA, 1492, 650, NA, NA, NA, NA, NA, NA, NA,…
## $ monthly_price <dbl> NA, 4849, 2150, NA, NA, NA, NA, NA, NA, NA…
## $ reviews_per_month <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
The code is very simple but powerful. As you can see, the number of records in the cloud and in R is the same.
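Downloading every document is fine for a sample this size, but find() also takes a JSON query and a field projection, and count() takes a query (both signatures appear in the method list printed earlier), so filtering can happen on the server. The following is a minimal sketch: make_query() is a hypothetical helper, the filter values are arbitrary, and the final calls only run when a mongo_db connection object exists.

```r
library(jsonlite)  # installed as a dependency of mongolite

# Hypothetical helper: turn named R values into a MongoDB JSON string.
make_query <- function(...) {
  as.character(toJSON(list(...), auto_unbox = TRUE))
}

# Filter: houses with at least 3 bedrooms.
query  <- make_query(property_type = "House", bedrooms = list(`$gte` = 3))
# Projection: keep a few fields and drop _id.
fields <- make_query(name = 1, price = 1, bedrooms = 1, `_id` = 0)

print(query)
print(fields)

# With a live connection, the strings plug straight into find() and count().
if (exists("mongo_db")) {
  houses  <- mongo_db$find(query = query, fields = fields, limit = 10)
  n_cloud <- mongo_db$count(query)  # counted on the server, nothing downloaded
}
```

Building the JSON from R lists avoids hand-escaping quotes, which becomes error-prone once queries contain nested operators.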
If users want to create a database and a collection, it is easy to do so with mongo() and its collection= and db= arguments. The sample code is below.
my_collection <- mongo(collection = "name_of_collection", db = "name_of_database", url = url_path) # create connection, database and collection
my_collection$insert(my_data) # my_data: the data frame to insert
Following that pattern, let’s insert the iris data into MongoDB. Before doing so, it is important to remove the current connection, which is rm(mongo_db) in this case.
rm(mongo_db) # remove the current connection object
data("iris")
iris_collection <- mongo(collection = "iris", # Creating collection
db = "sample_dataset_R", # Creating DataBase
url = url_path,
verbose = TRUE)
# insert code
iris_collection$insert(iris)
##
Complete! Processed total of 150 rows.
## List of 5
## $ nInserted : num 150
## $ nMatched : num 0
## $ nRemoved : num 0
## $ nUpserted : num 0
## $ writeErrors: list()
Now, go to the MongoDB cloud console and check the database and collection. The data has been successfully inserted into the MongoDB cloud.
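The same check can be made without leaving R. insert() reports a list like the one printed above (nInserted, writeErrors and so on), so a small helper can compare it with the data frame that was sent. The helper name insert_ok() is an assumption for illustration, and the mocked result only mimics the printed output so that the sketch runs without a cluster.

```r
# Hypothetical helper: TRUE when every row was inserted and no write errors
# were reported. `result` is a list shaped like the output of $insert().
insert_ok <- function(result, data) {
  isTRUE(result$nInserted == nrow(data)) && length(result$writeErrors) == 0
}

# Mocked result with the same fields as the output above (no cluster needed).
mock_result <- list(nInserted = 150, nMatched = 0, nRemoved = 0,
                    nUpserted = 0, writeErrors = list())
print(insert_ok(mock_result, iris))

# With a live connection, one would capture the result of the insert instead:
# result <- iris_collection$insert(iris)
# stopifnot(insert_ok(result, iris))
```

Note that running insert() a second time appends another 150 documents, so the check belongs with the original insert rather than after it.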
After inserting the new data into MongoDB, it is fine to disconnect iris_collection.
rm(iris_collection)
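rm() removes the connection object from the R session. The method list printed earlier also shows a disconnect() method, which closes the connection explicitly. One way to make sure that call happens even when an error occurs is an on.exit() wrapper. This is only a sketch: with_collection() is a hypothetical helper, and the mock connection stands in for a real mongo() object so that the block runs offline.

```r
# Hypothetical helper: run `f` on a connection and always disconnect afterwards.
with_collection <- function(con, f) {
  on.exit(con$disconnect(), add = TRUE)
  f(con)
}

# Mock connection exposing only the two methods used here (no cluster needed).
state <- new.env()
state$open <- TRUE
mock_con <- list(
  count      = function(query = "{}") 150,
  disconnect = function(gc = TRUE) { state$open <- FALSE; invisible(TRUE) }
)

n <- with_collection(mock_con, function(con) con$count())
print(n)
print(state$open)
```

With a real cluster, the first argument would be something like mongo(collection = "iris", db = "sample_dataset_R", url = url_path).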