Assignment Instructions
For this assignment, you should take information from a relational database and migrate it to a NoSQL database of your own choosing.
For the relational database, you might use the flights database, the tb database, the “data skills” database your team created for Project 3, or another database of your own choosing or creation.
For the NoSQL database, you may use MongoDB (which we introduced in week 7), Neo4j, or another NoSQL database of your choosing.
Your migration process needs to be reproducible. R code is encouraged, but not required. You should also briefly describe the advantages and disadvantages of storing the data in a relational database vs. your NoSQL database.
Pre-Requistes : Available Libraries
- knitr
- dplyr
- tidyr
- RMySQL
- mongolite
- RNeo4j
- neo4r
- kableExtra
- DT
- data.table
- ggplot2
Show Source Raw Data in Data Frame as Table
Source data of “airlines”
Kable
kable(head(airlines,10)) %>%
kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width = F,position = "left",font_size = 12) %>%
row_spec(0, background ="gray")
|
carrier
|
name
|
|
9E
|
Endeavor Air Inc.
|
|
AA
|
American Airlines Inc.
|
|
AS
|
Alaska Airlines Inc.
|
|
B6
|
JetBlue Airways
|
|
DL
|
Delta Air Lines Inc.
|
|
EV
|
ExpressJet Airlines Inc.
|
|
F9
|
Frontier Airlines Inc.
|
|
FL
|
AirTran Airways Corporation
|
|
HA
|
Hawaiian Airlines Inc.
|
|
MQ
|
Envoy Air
|
Data Table
DT::datatable(head(airlines,10), options = list(pagelength=5))
Select
DT::datatable(select(head(airlines,10), carrier:name), options = list(pagelength=5))
Knitr
knitr::kable(head(airlines,10), format = "html")
|
carrier
|
name
|
|
9E
|
Endeavor Air Inc.
|
|
AA
|
American Airlines Inc.
|
|
AS
|
Alaska Airlines Inc.
|
|
B6
|
JetBlue Airways
|
|
DL
|
Delta Air Lines Inc.
|
|
EV
|
ExpressJet Airlines Inc.
|
|
F9
|
Frontier Airlines Inc.
|
|
FL
|
AirTran Airways Corporation
|
|
HA
|
Hawaiian Airlines Inc.
|
|
MQ
|
Envoy Air
|
SQL
sqldf("select * from airlines")
Source data of “airports”
Kable
kable(head(airports,10)) %>%
kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width = F,position = "left",font_size = 12) %>%
row_spec(0, background ="gray")
|
faa
|
name
|
lat
|
lon
|
alt
|
tz
|
dst
|
tzone
|
|
04G
|
Lansdowne Airport
|
41.13047
|
-80.61958
|
1044
|
-5
|
A
|
America/New_York
|
|
06A
|
Moton Field Municipal Airport
|
32.46057
|
-85.68003
|
264
|
-6
|
A
|
America/Chicago
|
|
06C
|
Schaumburg Regional
|
41.98934
|
-88.10124
|
801
|
-6
|
A
|
America/Chicago
|
|
06N
|
Randall Airport
|
41.43191
|
-74.39156
|
523
|
-5
|
A
|
America/New_York
|
|
09J
|
Jekyll Island Airport
|
31.07447
|
-81.42778
|
11
|
-5
|
A
|
America/New_York
|
|
0A9
|
Elizabethton Municipal Airport
|
36.37122
|
-82.17342
|
1593
|
-5
|
A
|
America/New_York
|
|
0G6
|
Williams County Airport
|
41.46731
|
-84.50678
|
730
|
-5
|
A
|
America/New_York
|
|
0G7
|
Finger Lakes Regional Airport
|
42.88356
|
-76.78123
|
492
|
-5
|
A
|
America/New_York
|
|
0P2
|
Shoestring Aviation Airfield
|
39.79482
|
-76.64719
|
1000
|
-5
|
U
|
America/New_York
|
|
0S9
|
Jefferson County Intl
|
48.05381
|
-122.81064
|
108
|
-8
|
A
|
America/Los_Angeles
|
Data Table
DT::datatable(head(airports,10), options = list(pagelength=5))
Select
DT::datatable(select(head(airports,10), faa:tzone), options = list(pagelength=5))
Knitr
knitr::kable(head(airports,10), format = "html")
|
faa
|
name
|
lat
|
lon
|
alt
|
tz
|
dst
|
tzone
|
|
04G
|
Lansdowne Airport
|
41.13047
|
-80.61958
|
1044
|
-5
|
A
|
America/New_York
|
|
06A
|
Moton Field Municipal Airport
|
32.46057
|
-85.68003
|
264
|
-6
|
A
|
America/Chicago
|
|
06C
|
Schaumburg Regional
|
41.98934
|
-88.10124
|
801
|
-6
|
A
|
America/Chicago
|
|
06N
|
Randall Airport
|
41.43191
|
-74.39156
|
523
|
-5
|
A
|
America/New_York
|
|
09J
|
Jekyll Island Airport
|
31.07447
|
-81.42778
|
11
|
-5
|
A
|
America/New_York
|
|
0A9
|
Elizabethton Municipal Airport
|
36.37122
|
-82.17342
|
1593
|
-5
|
A
|
America/New_York
|
|
0G6
|
Williams County Airport
|
41.46731
|
-84.50678
|
730
|
-5
|
A
|
America/New_York
|
|
0G7
|
Finger Lakes Regional Airport
|
42.88356
|
-76.78123
|
492
|
-5
|
A
|
America/New_York
|
|
0P2
|
Shoestring Aviation Airfield
|
39.79482
|
-76.64719
|
1000
|
-5
|
U
|
America/New_York
|
|
0S9
|
Jefferson County Intl
|
48.05381
|
-122.81064
|
108
|
-8
|
A
|
America/Los_Angeles
|
SQL
sqldf("select * from airports")
Source data of “flights”
Kable
kable(head(flights,10)) %>%
kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width = F,position = "left",font_size = 12) %>%
row_spec(0, background ="gray")
|
year
|
month
|
day
|
dep_time
|
sched_dep_time
|
dep_delay
|
arr_time
|
sched_arr_time
|
arr_delay
|
carrier
|
flight
|
tailnum
|
origin
|
dest
|
air_time
|
distance
|
hour
|
minute
|
time_hour
|
|
2013
|
1
|
1
|
517
|
515
|
2
|
830
|
819
|
11
|
UA
|
1545
|
N14228
|
EWR
|
IAH
|
227
|
1400
|
5
|
15
|
2013-01-01 05:00:00
|
|
2013
|
1
|
1
|
533
|
529
|
4
|
850
|
830
|
20
|
UA
|
1714
|
N24211
|
LGA
|
IAH
|
227
|
1416
|
5
|
29
|
2013-01-01 05:00:00
|
|
2013
|
1
|
1
|
542
|
540
|
2
|
923
|
850
|
33
|
AA
|
1141
|
N619AA
|
JFK
|
MIA
|
160
|
1089
|
5
|
40
|
2013-01-01 05:00:00
|
|
2013
|
1
|
1
|
544
|
545
|
-1
|
1004
|
1022
|
-18
|
B6
|
725
|
N804JB
|
JFK
|
BQN
|
183
|
1576
|
5
|
45
|
2013-01-01 05:00:00
|
|
2013
|
1
|
1
|
554
|
600
|
-6
|
812
|
837
|
-25
|
DL
|
461
|
N668DN
|
LGA
|
ATL
|
116
|
762
|
6
|
0
|
2013-01-01 06:00:00
|
|
2013
|
1
|
1
|
554
|
558
|
-4
|
740
|
728
|
12
|
UA
|
1696
|
N39463
|
EWR
|
ORD
|
150
|
719
|
5
|
58
|
2013-01-01 05:00:00
|
|
2013
|
1
|
1
|
555
|
600
|
-5
|
913
|
854
|
19
|
B6
|
507
|
N516JB
|
EWR
|
FLL
|
158
|
1065
|
6
|
0
|
2013-01-01 06:00:00
|
|
2013
|
1
|
1
|
557
|
600
|
-3
|
709
|
723
|
-14
|
EV
|
5708
|
N829AS
|
LGA
|
IAD
|
53
|
229
|
6
|
0
|
2013-01-01 06:00:00
|
|
2013
|
1
|
1
|
557
|
600
|
-3
|
838
|
846
|
-8
|
B6
|
79
|
N593JB
|
JFK
|
MCO
|
140
|
944
|
6
|
0
|
2013-01-01 06:00:00
|
|
2013
|
1
|
1
|
558
|
600
|
-2
|
753
|
745
|
8
|
AA
|
301
|
N3ALAA
|
LGA
|
ORD
|
138
|
733
|
6
|
0
|
2013-01-01 06:00:00
|
Data Table
DT::datatable(head(flights,10), options = list(pagelength=5))
Select
DT::datatable(select(head(flights,10), year:time_hour), options = list(pagelength=5))
Knitr
knitr::kable(head(flights,10), format = "html")
|
year
|
month
|
day
|
dep_time
|
sched_dep_time
|
dep_delay
|
arr_time
|
sched_arr_time
|
arr_delay
|
carrier
|
flight
|
tailnum
|
origin
|
dest
|
air_time
|
distance
|
hour
|
minute
|
time_hour
|
|
2013
|
1
|
1
|
517
|
515
|
2
|
830
|
819
|
11
|
UA
|
1545
|
N14228
|
EWR
|
IAH
|
227
|
1400
|
5
|
15
|
2013-01-01 05:00:00
|
|
2013
|
1
|
1
|
533
|
529
|
4
|
850
|
830
|
20
|
UA
|
1714
|
N24211
|
LGA
|
IAH
|
227
|
1416
|
5
|
29
|
2013-01-01 05:00:00
|
|
2013
|
1
|
1
|
542
|
540
|
2
|
923
|
850
|
33
|
AA
|
1141
|
N619AA
|
JFK
|
MIA
|
160
|
1089
|
5
|
40
|
2013-01-01 05:00:00
|
|
2013
|
1
|
1
|
544
|
545
|
-1
|
1004
|
1022
|
-18
|
B6
|
725
|
N804JB
|
JFK
|
BQN
|
183
|
1576
|
5
|
45
|
2013-01-01 05:00:00
|
|
2013
|
1
|
1
|
554
|
600
|
-6
|
812
|
837
|
-25
|
DL
|
461
|
N668DN
|
LGA
|
ATL
|
116
|
762
|
6
|
0
|
2013-01-01 06:00:00
|
|
2013
|
1
|
1
|
554
|
558
|
-4
|
740
|
728
|
12
|
UA
|
1696
|
N39463
|
EWR
|
ORD
|
150
|
719
|
5
|
58
|
2013-01-01 05:00:00
|
|
2013
|
1
|
1
|
555
|
600
|
-5
|
913
|
854
|
19
|
B6
|
507
|
N516JB
|
EWR
|
FLL
|
158
|
1065
|
6
|
0
|
2013-01-01 06:00:00
|
|
2013
|
1
|
1
|
557
|
600
|
-3
|
709
|
723
|
-14
|
EV
|
5708
|
N829AS
|
LGA
|
IAD
|
53
|
229
|
6
|
0
|
2013-01-01 06:00:00
|
|
2013
|
1
|
1
|
557
|
600
|
-3
|
838
|
846
|
-8
|
B6
|
79
|
N593JB
|
JFK
|
MCO
|
140
|
944
|
6
|
0
|
2013-01-01 06:00:00
|
|
2013
|
1
|
1
|
558
|
600
|
-2
|
753
|
745
|
8
|
AA
|
301
|
N3ALAA
|
LGA
|
ORD
|
138
|
733
|
6
|
0
|
2013-01-01 06:00:00
|
SQL
sqldf("select * from flights")
Source data of “planes”
Kable
kable(head(planes,10)) %>%
kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width = F,position = "left",font_size = 12) %>%
row_spec(0, background ="gray")
|
tailnum
|
year
|
type
|
manufacturer
|
model
|
engines
|
seats
|
speed
|
engine
|
|
N10156
|
2004
|
Fixed wing multi engine
|
EMBRAER
|
EMB-145XR
|
2
|
55
|
NA
|
Turbo-fan
|
|
N102UW
|
1998
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N103US
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N104UW
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N10575
|
2002
|
Fixed wing multi engine
|
EMBRAER
|
EMB-145LR
|
2
|
55
|
NA
|
Turbo-fan
|
|
N105UW
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N107US
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N108UW
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N109UW
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N110UW
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
Data Table
DT::datatable(head(planes,10), options = list(pagelength=5))
Select
DT::datatable(select(head(planes,10), tailnum:engine), options = list(pagelength=5))
Knitr
knitr::kable(head(planes,10), format = "html")
|
tailnum
|
year
|
type
|
manufacturer
|
model
|
engines
|
seats
|
speed
|
engine
|
|
N10156
|
2004
|
Fixed wing multi engine
|
EMBRAER
|
EMB-145XR
|
2
|
55
|
NA
|
Turbo-fan
|
|
N102UW
|
1998
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N103US
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N104UW
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N10575
|
2002
|
Fixed wing multi engine
|
EMBRAER
|
EMB-145LR
|
2
|
55
|
NA
|
Turbo-fan
|
|
N105UW
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N107US
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N108UW
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N109UW
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
|
N110UW
|
1999
|
Fixed wing multi engine
|
AIRBUS INDUSTRIE
|
A320-214
|
2
|
182
|
NA
|
Turbo-fan
|
SQL
sqldf("select * from planes")
Source data of “weather”
Kable
kable(head(weather,10)) %>%
kable_styling(bootstrap_options = c("striped","hover","condensed","responsive"),full_width = F,position = "left",font_size = 12) %>%
row_spec(0, background ="gray")
|
origin
|
year
|
month
|
day
|
hour
|
temp
|
dewp
|
humid
|
wind_dir
|
wind_speed
|
wind_gust
|
precip
|
pressure
|
visib
|
time_hour
|
|
EWR
|
2013
|
1
|
1
|
1
|
39.02
|
26.06
|
59.37
|
270
|
10.35702
|
NA
|
0
|
1012.0
|
10
|
2013-01-01 01:00:00
|
|
EWR
|
2013
|
1
|
1
|
2
|
39.02
|
26.96
|
61.63
|
250
|
8.05546
|
NA
|
0
|
1012.3
|
10
|
2013-01-01 02:00:00
|
|
EWR
|
2013
|
1
|
1
|
3
|
39.02
|
28.04
|
64.43
|
240
|
11.50780
|
NA
|
0
|
1012.5
|
10
|
2013-01-01 03:00:00
|
|
EWR
|
2013
|
1
|
1
|
4
|
39.92
|
28.04
|
62.21
|
250
|
12.65858
|
NA
|
0
|
1012.2
|
10
|
2013-01-01 04:00:00
|
|
EWR
|
2013
|
1
|
1
|
5
|
39.02
|
28.04
|
64.43
|
260
|
12.65858
|
NA
|
0
|
1011.9
|
10
|
2013-01-01 05:00:00
|
|
EWR
|
2013
|
1
|
1
|
6
|
37.94
|
28.04
|
67.21
|
240
|
11.50780
|
NA
|
0
|
1012.4
|
10
|
2013-01-01 06:00:00
|
|
EWR
|
2013
|
1
|
1
|
7
|
39.02
|
28.04
|
64.43
|
240
|
14.96014
|
NA
|
0
|
1012.2
|
10
|
2013-01-01 07:00:00
|
|
EWR
|
2013
|
1
|
1
|
8
|
39.92
|
28.04
|
62.21
|
250
|
10.35702
|
NA
|
0
|
1012.2
|
10
|
2013-01-01 08:00:00
|
|
EWR
|
2013
|
1
|
1
|
9
|
39.92
|
28.04
|
62.21
|
260
|
14.96014
|
NA
|
0
|
1012.7
|
10
|
2013-01-01 09:00:00
|
|
EWR
|
2013
|
1
|
1
|
10
|
41.00
|
28.04
|
59.65
|
260
|
13.80936
|
NA
|
0
|
1012.4
|
10
|
2013-01-01 10:00:00
|
Data Table
DT::datatable(head(weather,10), options = list(pagelength=5))
Select
DT::datatable(select(head(weather,10), origin:time_hour), options = list(pagelength=5))
Knitr
knitr::kable(head(weather,10), format = "html")
|
origin
|
year
|
month
|
day
|
hour
|
temp
|
dewp
|
humid
|
wind_dir
|
wind_speed
|
wind_gust
|
precip
|
pressure
|
visib
|
time_hour
|
|
EWR
|
2013
|
1
|
1
|
1
|
39.02
|
26.06
|
59.37
|
270
|
10.35702
|
NA
|
0
|
1012.0
|
10
|
2013-01-01 01:00:00
|
|
EWR
|
2013
|
1
|
1
|
2
|
39.02
|
26.96
|
61.63
|
250
|
8.05546
|
NA
|
0
|
1012.3
|
10
|
2013-01-01 02:00:00
|
|
EWR
|
2013
|
1
|
1
|
3
|
39.02
|
28.04
|
64.43
|
240
|
11.50780
|
NA
|
0
|
1012.5
|
10
|
2013-01-01 03:00:00
|
|
EWR
|
2013
|
1
|
1
|
4
|
39.92
|
28.04
|
62.21
|
250
|
12.65858
|
NA
|
0
|
1012.2
|
10
|
2013-01-01 04:00:00
|
|
EWR
|
2013
|
1
|
1
|
5
|
39.02
|
28.04
|
64.43
|
260
|
12.65858
|
NA
|
0
|
1011.9
|
10
|
2013-01-01 05:00:00
|
|
EWR
|
2013
|
1
|
1
|
6
|
37.94
|
28.04
|
67.21
|
240
|
11.50780
|
NA
|
0
|
1012.4
|
10
|
2013-01-01 06:00:00
|
|
EWR
|
2013
|
1
|
1
|
7
|
39.02
|
28.04
|
64.43
|
240
|
14.96014
|
NA
|
0
|
1012.2
|
10
|
2013-01-01 07:00:00
|
|
EWR
|
2013
|
1
|
1
|
8
|
39.92
|
28.04
|
62.21
|
250
|
10.35702
|
NA
|
0
|
1012.2
|
10
|
2013-01-01 08:00:00
|
|
EWR
|
2013
|
1
|
1
|
9
|
39.92
|
28.04
|
62.21
|
260
|
14.96014
|
NA
|
0
|
1012.7
|
10
|
2013-01-01 09:00:00
|
|
EWR
|
2013
|
1
|
1
|
10
|
41.00
|
28.04
|
59.65
|
260
|
13.80936
|
NA
|
0
|
1012.4
|
10
|
2013-01-01 10:00:00
|
SQL
sqldf("select * from weather")
Load data into RDBMS - MySql
Load the tidy data from R data frame to mySQL Google Cloud relational database for data analysis and visual representation
Connect to mySQL Google Cloud
Create and Load Tables in mySQL Google Cloud
- Load into airlines table directly from the data frame via overwrite
# Write the data frame to the database
dbWriteTable(mySqlConn, name = "airlines", value = airlines, row.names = FALSE, overwrite=T)
- Load into airports table via drop if exists (dbSendQuery) and create table (dbSendQuery) followed by load (dbWriteTable)
# Drop table if it already exists
dbSendQuery(mySqlConn, "DROP TABLE IF EXISTS airports;")
# Create table
dbSendQuery(mySqlConn, "CREATE TABLE airports
(faa VARCHAR(100) NOT NULL,
name VARCHAR(100) NOT NULL,
lat NUMERIC(65,5),
lon NUMERIC(65,5),
alt INTEGER,
tz INTEGER,
dst VARCHAR(100),
tzone VARCHAR(100),
CONSTRAINT pk_airports PRIMARY KEY (faa)
);")
# Load table
dbWriteTable(mySqlConn, "airports", airports, overwrite=T)
- Load into flights table via drop (dbRemoveTable) if exists (dbExistsTable) and load (dbWriteTable)
# Drop table if it already exists
if (dbExistsTable(mySqlConn, "flights"))
dbRemoveTable(mySqlConn, "flights")
# Write the data frame to the database
dbWriteTable(mySqlConn, name = "flights", value = flights, row.names = FALSE)
- Load into planes table via load (dbWriteTable) if not exists (dbExistsTable)
if(!dbExistsTable(mySqlConn,"planes")) {
dbWriteTable(mySqlConn, value = planes, name = "planes", row.names=F, append = TRUE )
}
Load data into NoSql - MongoDB
Mongodb is one of the Nosql database. It is very famous for its fluid data structure.
- All the formats are loaded in JSON format.
- Databases hold collections of documents.
- Collections are analogous to tables of an RDBMS that store documents those who are not same in structure. This is possible because MongoDB is a Schema-free database.
Load the tidy data from R data frame to mongoDB NoSQL database for data analysis and visual representation
The mongolite package is more efficient than the RMongo package (unavailable from CRAN) which requires conversion to JSON, string manipulation, and several lines of additional code using the below functions:
- mongoDbConnect(“Flights”, host = “localhost”, port = 27017)
- dbInsertDocument(mongodb, ‘Table’, “jsonData”)
- dbShowCollections(mongodb)
- dbGetQuery(mongodb, “Table”, “{}”)
- dbRemoveQuery(mongodb, ‘Table’, convert(Table))
- dbDisconnect(mongodb)
Connect to NoSQL MongoDB Atlas
Using mongolite
#find cluster url with username and password replaced with SCRAM credentials
url_path = 'mongodb+srv://root:data607@cluster0-blj5o.mongodb.net/test'
#make connection object that specifies new database and collection (dataset)
mongoNYCFlightsConn <- mongo (collection = "nycflights13",
db = "CUNY_DATA607",
url = url_path,
#url = "mongodb://usr:pass@cluster0-shard-00-00-h8acf.mongodb.net:27017,cluster0-shard-00-01-12ucd.mongodb.net:27017,cluster0-shard-00-02-haucd.mongodb.net:27017/dbname?ssl=true&replicaSet=Cluster0-shard-0&authSource=admin",
verbose = TRUE)
#show commands for connection
mongoNYCFlightsConn
Using RMongo
The RMongo package (unavailable from CRAN) requires conversion to JSON, string manipulation, and several lines of additional code using the below functions: - mongoDbConnect(“Flights”, host = “localhost”, port = 27017) - dbInsertDocument(mongodb, ‘Table’, “jsonData”) - dbShowCollections(mongodb) - dbGetQuery(mongodb, “Table”, “{}”) - dbRemoveQuery(mongodb, ‘Table’, convert(Table))
- dbDisconnect(mongodb)
Show data
Airlines
knitr::include_graphics('CUNY_DATA607.airlines.jpg')

#
#plot(load.image("CUNY_DATA607.airlines.jpg"))
knitr::include_graphics('CUNY_DATA607.airlines-indexes.jpg')

#
#plot(load.image("CUNY_DATA607.airlines-indexes.jpg"))
Flights
knitr::include_graphics('CUNY_DATA607.flights.jpg')

#plot(load.image("CUNY_DATA607.flights.jpg"))
knitr::include_graphics('CUNY_DATA607.flights-indexes.jpg')

#plot(load.image("CUNY_DATA607.flights-indexes.jpg"))
NYCFlights13
knitr::include_graphics('CUNY_DATA607.nycflights13.jpg')

#plot(load.image("CUNY_DATA607.nycflights13.jpg"))
knitr::include_graphics('CUNY_DATA607.nycflights13-indexes.jpg')

#plot(load.image("CUNY_DATA607.nycflights13-indexes.jpg"))
NYCFlightsAirlines
knitr::include_graphics('CUNY_DATA607.nycflightsairlines.jpg')

#plot(load.image("CUNY_DATA607.nycflightsairlines.jpg"))
knitr::include_graphics('CUNY_DATA607.nycflightsairlines-indexes.jpg')

#plot(load.image("CUNY_DATA607.nycflightsairlines-indexes.jpg"))
Query data
First records in the collection
mongoNYCFlightsConn$iterate()$one() # first 1 record in the collection
## $carrier
## [1] "9E"
##
## $name
## [1] "Endeavor Air Inc."
kable(mongoFlightAirlinesConn$iterate()$one()) # first 1 records in the collection
Find distinct airlines from collection
mongoFlightAirlinesConn$distinct("carrier")
## [1] "UA" "AA" "B6" "DL" "EV" "MQ" "US" "WN" "VX" "FL" "AS"
mongoFlightAirlinesConn$distinct("carrier", "{\"origin\" : \"EWR\", \"dest\": \"IAH\" }")
## [1] "UA"
Find records in the collection
kable(mongoFlightAirlinesConn$find('{"origin":"EWR"}'))
##
Found 31 records...
Imported 31 records. Simplifying into dataframe...
|
carrier
|
name
|
year
|
month
|
day
|
dep_time
|
sched_dep_time
|
dep_delay
|
arr_time
|
sched_arr_time
|
arr_delay
|
flight
|
tailnum
|
origin
|
dest
|
air_time
|
distance
|
hour
|
minute
|
time_hour
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
517
|
515
|
2
|
830
|
819
|
11
|
1545
|
N14228
|
EWR
|
IAH
|
227
|
1400
|
5
|
15
|
2013-01-01 05:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
554
|
558
|
-4
|
740
|
728
|
12
|
1696
|
N39463
|
EWR
|
ORD
|
150
|
719
|
5
|
58
|
2013-01-01 05:00:00
|
|
B6
|
JetBlue Airways
|
2013
|
1
|
1
|
555
|
600
|
-5
|
913
|
854
|
19
|
507
|
N516JB
|
EWR
|
FLL
|
158
|
1065
|
6
|
0
|
2013-01-01 06:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
558
|
600
|
-2
|
923
|
937
|
-14
|
1124
|
N53441
|
EWR
|
SFO
|
361
|
2565
|
6
|
0
|
2013-01-01 06:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
559
|
600
|
-1
|
854
|
902
|
-8
|
1187
|
N76515
|
EWR
|
LAS
|
337
|
2227
|
6
|
0
|
2013-01-01 06:00:00
|
|
B6
|
JetBlue Airways
|
2013
|
1
|
1
|
601
|
600
|
1
|
844
|
850
|
-6
|
343
|
N644JB
|
EWR
|
PBI
|
147
|
1023
|
6
|
0
|
2013-01-01 06:00:00
|
|
AA
|
American Airlines Inc.
|
2013
|
1
|
1
|
606
|
610
|
-4
|
858
|
910
|
-12
|
1895
|
N633AA
|
EWR
|
MIA
|
152
|
1085
|
6
|
10
|
2013-01-01 06:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
607
|
607
|
0
|
858
|
915
|
-17
|
1077
|
N53442
|
EWR
|
MIA
|
157
|
1085
|
6
|
7
|
2013-01-01 06:00:00
|
|
MQ
|
Envoy Air
|
2013
|
1
|
1
|
608
|
600
|
8
|
807
|
735
|
32
|
3768
|
N9EAMQ
|
EWR
|
ORD
|
139
|
719
|
6
|
0
|
2013-01-01 06:00:00
|
|
DL
|
Delta Air Lines Inc.
|
2013
|
1
|
1
|
615
|
615
|
0
|
833
|
842
|
-9
|
575
|
N326NB
|
EWR
|
ATL
|
120
|
746
|
6
|
15
|
2013-01-01 06:00:00
|
|
US
|
US Airways Inc.
|
2013
|
1
|
1
|
622
|
630
|
-8
|
1017
|
1014
|
3
|
245
|
N807AW
|
EWR
|
PHX
|
342
|
2133
|
6
|
30
|
2013-01-01 06:00:00
|
|
EV
|
ExpressJet Airlines Inc.
|
2013
|
1
|
1
|
624
|
630
|
-6
|
909
|
840
|
29
|
4626
|
N11107
|
EWR
|
MSP
|
190
|
1008
|
6
|
30
|
2013-01-01 06:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
628
|
630
|
-2
|
1016
|
947
|
29
|
1665
|
N33289
|
EWR
|
LAX
|
366
|
2454
|
6
|
30
|
2013-01-01 06:00:00
|
|
US
|
US Airways Inc.
|
2013
|
1
|
1
|
629
|
630
|
-1
|
824
|
833
|
-9
|
1019
|
N426US
|
EWR
|
CLT
|
91
|
529
|
6
|
30
|
2013-01-01 06:00:00
|
|
EV
|
ExpressJet Airlines Inc.
|
2013
|
1
|
1
|
632
|
608
|
24
|
740
|
728
|
12
|
4144
|
N13553
|
EWR
|
IAD
|
52
|
212
|
6
|
8
|
2013-01-01 06:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
643
|
646
|
-3
|
922
|
940
|
-18
|
556
|
N497UA
|
EWR
|
PBI
|
146
|
1023
|
6
|
46
|
2013-01-01 06:00:00
|
|
US
|
US Airways Inc.
|
2013
|
1
|
1
|
643
|
645
|
-2
|
837
|
848
|
-11
|
926
|
N178US
|
EWR
|
CLT
|
91
|
529
|
6
|
45
|
2013-01-01 06:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
644
|
636
|
8
|
931
|
940
|
-9
|
1701
|
N75435
|
EWR
|
FLL
|
151
|
1065
|
6
|
36
|
2013-01-01 06:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
646
|
645
|
1
|
1023
|
1030
|
-7
|
1496
|
N38727
|
EWR
|
SNA
|
380
|
2434
|
6
|
45
|
2013-01-01 06:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
656
|
700
|
-4
|
948
|
1011
|
-23
|
1115
|
N24212
|
EWR
|
TPA
|
156
|
997
|
7
|
0
|
2013-01-01 07:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
659
|
700
|
-1
|
959
|
1008
|
-9
|
960
|
N838UA
|
EWR
|
RSW
|
164
|
1068
|
7
|
0
|
2013-01-01 07:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
701
|
700
|
1
|
1123
|
1154
|
-31
|
1203
|
N77296
|
EWR
|
SJU
|
188
|
1608
|
7
|
0
|
2013-01-01 07:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
715
|
713
|
2
|
911
|
850
|
21
|
544
|
N841UA
|
EWR
|
ORD
|
156
|
719
|
7
|
13
|
2013-01-01 07:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
723
|
725
|
-2
|
1013
|
1017
|
-4
|
962
|
N514UA
|
EWR
|
PBI
|
153
|
1023
|
7
|
25
|
2013-01-01 07:00:00
|
|
AS
|
Alaska Airlines Inc.
|
2013
|
1
|
1
|
724
|
725
|
-1
|
1020
|
1030
|
-10
|
11
|
N594AS
|
EWR
|
SEA
|
338
|
2402
|
7
|
25
|
2013-01-01 07:00:00
|
|
AA
|
American Airlines Inc.
|
2013
|
1
|
1
|
725
|
730
|
-5
|
1052
|
1040
|
12
|
2083
|
N4WRAA
|
EWR
|
DFW
|
238
|
1372
|
7
|
30
|
2013-01-01 07:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
727
|
730
|
-3
|
959
|
952
|
7
|
1162
|
N37462
|
EWR
|
DEN
|
254
|
1605
|
7
|
30
|
2013-01-01 07:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
732
|
645
|
47
|
1011
|
941
|
30
|
1111
|
N37456
|
EWR
|
MCO
|
145
|
937
|
6
|
45
|
2013-01-01 06:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
739
|
739
|
0
|
1104
|
1038
|
26
|
1479
|
N37408
|
EWR
|
IAH
|
249
|
1400
|
7
|
39
|
2013-01-01 07:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
746
|
746
|
0
|
1119
|
1129
|
-10
|
1668
|
N24224
|
EWR
|
SFO
|
373
|
2565
|
7
|
46
|
2013-01-01 07:00:00
|
|
MQ
|
Envoy Air
|
2013
|
1
|
1
|
749
|
710
|
39
|
939
|
850
|
49
|
3737
|
N508MQ
|
EWR
|
ORD
|
148
|
719
|
7
|
10
|
2013-01-01 07:00:00
|
kable(mongoFlightAirlinesConn$find('{"name":"Lansdowne Airport"}'))
##
Imported 0 records. Simplifying into dataframe...
mongoFlightAirlinesConn$find("{\"origin\" : \"EWR\"}", handler = NULL)
##
Found 31 records...
Imported 31 records. Simplifying into dataframe...
## carrier name year month day dep_time sched_dep_time
## 1 UA United Air Lines Inc. 2013 1 1 517 515
## 2 UA United Air Lines Inc. 2013 1 1 554 558
## 3 B6 JetBlue Airways 2013 1 1 555 600
## 4 UA United Air Lines Inc. 2013 1 1 558 600
## 5 UA United Air Lines Inc. 2013 1 1 559 600
## 6 B6 JetBlue Airways 2013 1 1 601 600
## 7 AA American Airlines Inc. 2013 1 1 606 610
## 8 UA United Air Lines Inc. 2013 1 1 607 607
## 9 MQ Envoy Air 2013 1 1 608 600
## 10 DL Delta Air Lines Inc. 2013 1 1 615 615
## 11 US US Airways Inc. 2013 1 1 622 630
## 12 EV ExpressJet Airlines Inc. 2013 1 1 624 630
## 13 UA United Air Lines Inc. 2013 1 1 628 630
## 14 US US Airways Inc. 2013 1 1 629 630
## 15 EV ExpressJet Airlines Inc. 2013 1 1 632 608
## 16 UA United Air Lines Inc. 2013 1 1 643 646
## 17 US US Airways Inc. 2013 1 1 643 645
## 18 UA United Air Lines Inc. 2013 1 1 644 636
## 19 UA United Air Lines Inc. 2013 1 1 646 645
## 20 UA United Air Lines Inc. 2013 1 1 656 700
## 21 UA United Air Lines Inc. 2013 1 1 659 700
## 22 UA United Air Lines Inc. 2013 1 1 701 700
## 23 UA United Air Lines Inc. 2013 1 1 715 713
## 24 UA United Air Lines Inc. 2013 1 1 723 725
## 25 AS Alaska Airlines Inc. 2013 1 1 724 725
## 26 AA American Airlines Inc. 2013 1 1 725 730
## 27 UA United Air Lines Inc. 2013 1 1 727 730
## 28 UA United Air Lines Inc. 2013 1 1 732 645
## 29 UA United Air Lines Inc. 2013 1 1 739 739
## 30 UA United Air Lines Inc. 2013 1 1 746 746
## 31 MQ Envoy Air 2013 1 1 749 710
## dep_delay arr_time sched_arr_time arr_delay flight tailnum origin dest
## 1 2 830 819 11 1545 N14228 EWR IAH
## 2 -4 740 728 12 1696 N39463 EWR ORD
## 3 -5 913 854 19 507 N516JB EWR FLL
## 4 -2 923 937 -14 1124 N53441 EWR SFO
## 5 -1 854 902 -8 1187 N76515 EWR LAS
## 6 1 844 850 -6 343 N644JB EWR PBI
## 7 -4 858 910 -12 1895 N633AA EWR MIA
## 8 0 858 915 -17 1077 N53442 EWR MIA
## 9 8 807 735 32 3768 N9EAMQ EWR ORD
## 10 0 833 842 -9 575 N326NB EWR ATL
## 11 -8 1017 1014 3 245 N807AW EWR PHX
## 12 -6 909 840 29 4626 N11107 EWR MSP
## 13 -2 1016 947 29 1665 N33289 EWR LAX
## 14 -1 824 833 -9 1019 N426US EWR CLT
## 15 24 740 728 12 4144 N13553 EWR IAD
## 16 -3 922 940 -18 556 N497UA EWR PBI
## 17 -2 837 848 -11 926 N178US EWR CLT
## 18 8 931 940 -9 1701 N75435 EWR FLL
## 19 1 1023 1030 -7 1496 N38727 EWR SNA
## 20 -4 948 1011 -23 1115 N24212 EWR TPA
## 21 -1 959 1008 -9 960 N838UA EWR RSW
## 22 1 1123 1154 -31 1203 N77296 EWR SJU
## 23 2 911 850 21 544 N841UA EWR ORD
## 24 -2 1013 1017 -4 962 N514UA EWR PBI
## 25 -1 1020 1030 -10 11 N594AS EWR SEA
## 26 -5 1052 1040 12 2083 N4WRAA EWR DFW
## 27 -3 959 952 7 1162 N37462 EWR DEN
## 28 47 1011 941 30 1111 N37456 EWR MCO
## 29 0 1104 1038 26 1479 N37408 EWR IAH
## 30 0 1119 1129 -10 1668 N24224 EWR SFO
## 31 39 939 850 49 3737 N508MQ EWR ORD
## air_time distance hour minute time_hour
## 1 227 1400 5 15 2013-01-01 05:00:00
## 2 150 719 5 58 2013-01-01 05:00:00
## 3 158 1065 6 0 2013-01-01 06:00:00
## 4 361 2565 6 0 2013-01-01 06:00:00
## 5 337 2227 6 0 2013-01-01 06:00:00
## 6 147 1023 6 0 2013-01-01 06:00:00
## 7 152 1085 6 10 2013-01-01 06:00:00
## 8 157 1085 6 7 2013-01-01 06:00:00
## 9 139 719 6 0 2013-01-01 06:00:00
## 10 120 746 6 15 2013-01-01 06:00:00
## 11 342 2133 6 30 2013-01-01 06:00:00
## 12 190 1008 6 30 2013-01-01 06:00:00
## 13 366 2454 6 30 2013-01-01 06:00:00
## 14 91 529 6 30 2013-01-01 06:00:00
## 15 52 212 6 8 2013-01-01 06:00:00
## 16 146 1023 6 46 2013-01-01 06:00:00
## 17 91 529 6 45 2013-01-01 06:00:00
## 18 151 1065 6 36 2013-01-01 06:00:00
## 19 380 2434 6 45 2013-01-01 06:00:00
## 20 156 997 7 0 2013-01-01 07:00:00
## 21 164 1068 7 0 2013-01-01 07:00:00
## 22 188 1608 7 0 2013-01-01 07:00:00
## 23 156 719 7 13 2013-01-01 07:00:00
## 24 153 1023 7 25 2013-01-01 07:00:00
## 25 338 2402 7 25 2013-01-01 07:00:00
## 26 238 1372 7 30 2013-01-01 07:00:00
## 27 254 1605 7 30 2013-01-01 07:00:00
## 28 145 937 6 45 2013-01-01 06:00:00
## 29 249 1400 7 39 2013-01-01 07:00:00
## 30 373 2565 7 46 2013-01-01 07:00:00
## 31 148 719 7 10 2013-01-01 07:00:00
mongoFlightAirlinesConn$find(paste('{"arr_delay" :{"$gt" : 80}}')) %>% kable() # flights with an arrival delay > 80mins
##
Imported 0 records. Simplifying into dataframe...
mongoFlightAirlines <- mongoFlightAirlinesConn$find()
##
Found 100 records...
Imported 100 records. Simplifying into dataframe...
kable(head(mongoFlightAirlines))
|
carrier
|
name
|
year
|
month
|
day
|
dep_time
|
sched_dep_time
|
dep_delay
|
arr_time
|
sched_arr_time
|
arr_delay
|
flight
|
tailnum
|
origin
|
dest
|
air_time
|
distance
|
hour
|
minute
|
time_hour
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
517
|
515
|
2
|
830
|
819
|
11
|
1545
|
N14228
|
EWR
|
IAH
|
227
|
1400
|
5
|
15
|
2013-01-01 05:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
533
|
529
|
4
|
850
|
830
|
20
|
1714
|
N24211
|
LGA
|
IAH
|
227
|
1416
|
5
|
29
|
2013-01-01 05:00:00
|
|
AA
|
American Airlines Inc.
|
2013
|
1
|
1
|
542
|
540
|
2
|
923
|
850
|
33
|
1141
|
N619AA
|
JFK
|
MIA
|
160
|
1089
|
5
|
40
|
2013-01-01 05:00:00
|
|
B6
|
JetBlue Airways
|
2013
|
1
|
1
|
544
|
545
|
-1
|
1004
|
1022
|
-18
|
725
|
N804JB
|
JFK
|
BQN
|
183
|
1576
|
5
|
45
|
2013-01-01 05:00:00
|
|
DL
|
Delta Air Lines Inc.
|
2013
|
1
|
1
|
554
|
600
|
-6
|
812
|
837
|
-25
|
461
|
N668DN
|
LGA
|
ATL
|
116
|
762
|
6
|
0
|
2013-01-01 06:00:00
|
|
UA
|
United Air Lines Inc.
|
2013
|
1
|
1
|
554
|
558
|
-4
|
740
|
728
|
12
|
1696
|
N39463
|
EWR
|
ORD
|
150
|
719
|
5
|
58
|
2013-01-01 05:00:00
|
Join flights and airlines collections
mongoFlightsConn$aggregate('[
{"$match" : {"arr_delay" :{"$gt" : 120}}},
{
"$project":
{
"_id" :0
}
},
{
"$lookup":
{
"from": "airlines",
"localField": "carrier",
"foreignField": "carrier",
"as": "airline_name"
}
},
{"$unwind" : "$airline_name"},
{
"$project":
{
"airline_name._id" :0, "airline_name.carrier":0
}
}
]') %>% kable()
##
Found 25 records...
Imported 25 records. Simplifying into dataframe...
|
year
|
month
|
day
|
dep_time
|
sched_dep_time
|
dep_delay
|
arr_time
|
sched_arr_time
|
arr_delay
|
carrier
|
flight
|
tailnum
|
origin
|
dest
|
air_time
|
distance
|
hour
|
minute
|
time_hour
|
airline_name
|
|
2013
|
1
|
1
|
811
|
630
|
101
|
1047
|
830
|
137
|
MQ
|
4576
|
N531MQ
|
LGA
|
CLT
|
118
|
544
|
6
|
30
|
2013-01-01 06:00:00
|
Envoy Air
|
|
2013
|
1
|
1
|
848
|
1835
|
853
|
1001
|
1950
|
851
|
MQ
|
3944
|
N942MQ
|
JFK
|
BWI
|
41
|
184
|
18
|
35
|
2013-01-01 18:00:00
|
Envoy Air
|
|
2013
|
1
|
1
|
957
|
733
|
144
|
1056
|
853
|
123
|
UA
|
856
|
N534UA
|
EWR
|
BOS
|
37
|
200
|
7
|
33
|
2013-01-01 07:00:00
|
United Air Lines Inc.
|
|
2013
|
1
|
1
|
1114
|
900
|
134
|
1447
|
1222
|
145
|
UA
|
1086
|
N76502
|
LGA
|
IAH
|
248
|
1416
|
9
|
0
|
2013-01-01 09:00:00
|
United Air Lines Inc.
|
|
2013
|
1
|
1
|
1505
|
1310
|
115
|
1638
|
1431
|
127
|
EV
|
4497
|
N17984
|
EWR
|
RIC
|
63
|
277
|
13
|
10
|
2013-01-01 13:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
1
|
1525
|
1340
|
105
|
1831
|
1626
|
125
|
B6
|
525
|
N231JB
|
EWR
|
MCO
|
152
|
937
|
13
|
40
|
2013-01-01 13:00:00
|
JetBlue Airways
|
|
2013
|
1
|
1
|
1549
|
1445
|
64
|
1912
|
1656
|
136
|
EV
|
4181
|
N21197
|
EWR
|
MCI
|
234
|
1092
|
14
|
45
|
2013-01-01 14:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
1
|
1558
|
1359
|
119
|
1718
|
1515
|
123
|
EV
|
5712
|
N826AS
|
JFK
|
IAD
|
53
|
228
|
13
|
59
|
2013-01-01 13:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
1
|
1732
|
1630
|
62
|
2028
|
1825
|
123
|
EV
|
4092
|
N16911
|
EWR
|
DAY
|
119
|
533
|
16
|
30
|
2013-01-01 16:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
1
|
1803
|
1620
|
103
|
2008
|
1750
|
138
|
MQ
|
4622
|
N504MQ
|
LGA
|
BNA
|
154
|
764
|
16
|
20
|
2013-01-01 16:00:00
|
Envoy Air
|
|
2013
|
1
|
1
|
1815
|
1325
|
290
|
2120
|
1542
|
338
|
EV
|
4417
|
N17185
|
EWR
|
OMA
|
213
|
1134
|
13
|
25
|
2013-01-01 13:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
1
|
1842
|
1422
|
260
|
1958
|
1535
|
263
|
EV
|
4633
|
N18120
|
EWR
|
BTV
|
46
|
266
|
14
|
22
|
2013-01-01 14:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
1
|
1856
|
1645
|
131
|
2212
|
2005
|
127
|
AA
|
181
|
N323AA
|
JFK
|
LAX
|
336
|
2475
|
16
|
45
|
2013-01-01 16:00:00
|
American Airlines Inc.
|
|
2013
|
1
|
1
|
1934
|
1725
|
129
|
2126
|
1855
|
151
|
MQ
|
4255
|
N909MQ
|
JFK
|
BNA
|
154
|
765
|
17
|
25
|
2013-01-01 17:00:00
|
Envoy Air
|
|
2013
|
1
|
1
|
1938
|
1703
|
155
|
2109
|
1823
|
166
|
EV
|
4300
|
N18557
|
EWR
|
RIC
|
68
|
277
|
17
|
3
|
2013-01-01 17:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
1
|
1942
|
1705
|
157
|
2124
|
1830
|
174
|
MQ
|
4410
|
N835MQ
|
JFK
|
DCA
|
60
|
213
|
17
|
5
|
2013-01-01 17:00:00
|
Envoy Air
|
|
2013
|
1
|
1
|
2006
|
1630
|
216
|
2230
|
1848
|
222
|
EV
|
4644
|
N14972
|
EWR
|
SAV
|
121
|
708
|
16
|
30
|
2013-01-01 16:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
1
|
2009
|
1808
|
121
|
2145
|
1942
|
123
|
EV
|
4440
|
N14143
|
EWR
|
PIT
|
65
|
319
|
18
|
8
|
2013-01-01 18:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
1
|
2115
|
1700
|
255
|
2330
|
1920
|
250
|
9E
|
3347
|
N924XJ
|
JFK
|
CVG
|
115
|
589
|
17
|
0
|
2013-01-01 17:00:00
|
Endeavor Air Inc.
|
|
2013
|
1
|
1
|
2119
|
1930
|
109
|
2358
|
2136
|
142
|
EV
|
4543
|
N13123
|
EWR
|
DSM
|
200
|
1017
|
19
|
30
|
2013-01-01 19:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
1
|
2205
|
1720
|
285
|
46
|
2040
|
246
|
AA
|
1999
|
N5DNAA
|
EWR
|
MIA
|
146
|
1085
|
17
|
20
|
2013-01-01 17:00:00
|
American Airlines Inc.
|
|
2013
|
1
|
1
|
2221
|
2000
|
141
|
2331
|
2124
|
127
|
EV
|
4462
|
N13566
|
EWR
|
BUF
|
56
|
282
|
20
|
0
|
2013-01-01 20:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
1
|
2312
|
2000
|
192
|
21
|
2110
|
191
|
EV
|
4312
|
N13958
|
EWR
|
DCA
|
44
|
199
|
20
|
0
|
2013-01-01 20:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
1
|
2343
|
1724
|
379
|
314
|
1938
|
456
|
EV
|
4321
|
N21197
|
EWR
|
MCI
|
222
|
1092
|
17
|
24
|
2013-01-01 17:00:00
|
ExpressJet Airlines Inc.
|
|
2013
|
1
|
2
|
126
|
2250
|
156
|
233
|
2359
|
154
|
B6
|
22
|
N636JB
|
JFK
|
SYR
|
49
|
209
|
22
|
50
|
2013-01-02 22:00:00
|
JetBlue Airways
|
Visualize trips per carrier
mongoFlightsConn$aggregate('[{"$group":{"_id":"$carrier", "Count": {"$sum":1}}}]') %>%
arrange(desc(Count)) %>%
ggplot(aes(x=reorder(`_id`,Count),y=Count))+
geom_bar(stat="identity",color='skyblue')+
coord_flip()+
xlab("Carriers")+
theme_bw()
##
Found 14 records...
Imported 14 records. Simplifying into dataframe...

Observations of MongoDB
- Setting the database needs little bit of knowledge and need to know how to start the server.
- There is no GUI for mongodb. Everything is performed in command prompt. Although there are some from 3rd party, it is not combined with database.
- When R interacts with mongodb, there is a lag between the operations.
- Writing to database takes time. It is not quick as mysql.
- Fetching the results are quick.
- There are no joins. So to perform joins, we need to do lot of operations.
Load data into GraphDB - Neo4J
Neo4j is an graphical database. - We can link the data of different tables. - All the data points are called as nodes. - The nodes can have label. - These labels will have properties of that specific node. - The relationship also been tagged as labels. It also can have properties. - All the interactions are performed via CYPER queries.
There was a working RNeo4j package so that R and and Neo4j could connect seemlessly but it appears this package has been taken down from CRAN
Connect to GraphDB Cloud
# load data into Neo4J
write.csv(airlines, file = "airlines.csv", row.names=FALSE, na="")
write.csv(airports, file = "airports.csv", row.names=FALSE, na="")
write.csv(planes, file = "planes.csv", row.names=FALSE, na="")
#load_csv(on_load = "", neoConn, url="airlines.csv", header = TRUE, periodic_commit = 1000, as = "csv", type = c("row", "graph"), output = c("r", "json"), include_stats = TRUE, include_meta = FALSE)
#LOAD CSV WITH HEADERS FROM "airlines.csv" AS population
#CREATE (a:Population {country: population.country, year: toInt(population.year), population: toInt(population.population)})
#RETURN a
#LOAD CSV FROM "airports.csv" AS line
#CREATE (b:Cases { country: line[0], year: toInt(line[1]), gender: line[2], cases: toInt(line[3])})
#RETURN b
Using RNeo4j
#setting up the database instance
#neo4j = startGraph("http://localhost:7474/db/data/","neo4j","admin")
neo4j = startGraph("bolt://hobby-nckncgjedgejgbkedmkbnccl.dbs.graphenedb.com:24787", #"https://hobby-nckncgjedgejgbkedmkbnccl.dbs.graphenedb.com:24780/db/data/",
username = "root",
password = "b.LVqKHBbkgC9I.NjBLWxHmLfTT0cji")
#Delete the previous relationships and nodes
clear(neo4j, input = FALSE)
cypher(neo4j,"MATCH ()-[r:Departs]-() delete r")
cypher(neo4j,"MATCH ()-[r:Arrival]-() delete r")
cypher(neo4j,"match(n) delete n")
#cypher(neo4j,"match(n:airline) delete n")
# For airlines
for(i in 1:nrow(airlines)) {
cypher(neo4j,paste0("create(carrier:airline{code:'",airlines[i,1],"',name:'",airlines[i,2],"'})",collapse = ""))
}
# For origin
#cypher(neo4j,"match(n:origin) delete n")
origin_codes <- unique(flightfact_mysql$origin)
for(i in 1:length(origin_codes)) {
cypher(neo4j,paste0("create(location:origin{origin_code:'",origin_codes[i],"'})",collapse = ""))
}
# For dest
#cypher(neo4j,"match(n:dest) delete n")
dest_codes <- unique(flightfact_mysql$dest)
for(i in 1:length(dest_codes)) {
cypher(neo4j,paste0("create(location:dest{dest_code:'",dest_codes[i],"'})",collapse = ""))
}
# For depart
#cypher(neo4j,"MATCH ()-[r:Departs]-() delete r")
depart <- unique(flightfact_mysql[,c("origin","carrier")])
for(i in 1:nrow(depart)) {
cypher(neo4j,paste0("match (air:airline{code:'",depart[i,2],"'}),(org:origin{origin_code:'",depart[i,1],"'}) create (air)-[r:Departs]->(org)"))
}
# For arrival
#cypher(neo4j,"MATCH ()-[r:Arrival]-() delete r")
arrival <- unique(flightfact_mysql[,c("dest","carrier")])
for(i in 1:nrow(arrival)) {
cypher(neo4j,paste0("match (air:airline{code:'",arrival[i,2],"'}),(dest:dest{dest_code:'",arrival[i,1],"'}) create (air)-[r:Arrival]->(dest)"))
}
Using neo4r
#neo4jConn <- neo4j_api$new(url = "bolt://fuchsia-lakin-mountain-hermann.graphstory.cloud:7687",
# user = "fuchsia_lakin_mountain_hermann", password = "vi4YzVJAt5MbSDPCp7vriTnv")
neoConn <- neo4j_api$new(url = "https://hobby-nckncgjedgejgbkedmkbnccl.dbs.graphenedb.com:24780/db/data/",
user = "root", password = "b.LVqKHBbkgC9I.NjBLWxHmLfTT0cji")
#neoConn$get_version()
#call_neo4j("MATCH (n) DETACH DELETE n", neoConn)
cypher_query <- " "
#Now load the Airlines data dataframe into Neo4J graph Cypher query
for(i in 1:nrow(airlines)) {
cypher_query <- paste(cypher_query, "CREATE", vec_to_cypher(airlines[i, ], "Airlines"), " ")
}
cypher_query
call_neo4j(paste(cypher_query,";"), neoConn)
cypher_query <- " "
#Now load the Airports data dataframe into Neo4J graph Cypher query
for(i in 1:nrow(airports)) {
cypher_query <- paste(cypher_query, "CREATE", vec_to_cypher(airports[i, ], "Airports"), " ")
}
cypher_query
call_neo4j(paste(cypher_query,";"), neoConn)
cypher_query <- " "
#Now load the Airports data dataframe into Neo4J graph Cypher query
for(i in 1:nrow(planes)) {
cypher_query <- paste(cypher_query, "CREATE", vec_to_cypher(planes[i, ], "Planes"), " ")
}
cypher_query
call_neo4j(paste(cypher_query,";"), neoConn)
cypher_query <- " "
#Now load the Airports data dataframe into Neo4J graph Cypher query
for(i in 1:nrow(flights)) {
cypher_query <- paste(cypher_query, "CREATE", vec_to_cypher(flights[i, ], "Flights"), " ")
}
cypher_query
call_neo4j(paste(cypher_query,";"), neoConn)
call_neo4j("MATCH (a:Flights), (b:Airlines) WHERE (a.carrier) = (b.carrier) CREATE (a) -[:Carrier]-> (b);", neo4jConn)
Observations of Neo4J
- Using cloud installation is easy with GrapheneDB and creating a database was simple.
- The data is seen in visual format. So it is easy to think more about data.
- It is done in step by step. So we need to understand the data more.
- Relationships play a major role. It gives meaning to the nodes.
- The grphical interface is very nice. But it becomes messy if you have more data and nodes.
- Fetching the required data is very quick and fast.
- Need to learn new query language called CYPER. Although it is easy for smaller operations.
Difference between the databases
| RDBMS |
Document-oriented |
| Relational database schema is strictly well defined. |
NoSQL database doesn’t have a fixed schema. |
| Data are stored in tables as columns and rows. |
Data are stored in collections with fields as key - value pairs under document. |
| Mature: Established database - backed with a huge community - somehow stable. |
Scalable: Horizontally scalable reducing the workload and scaling business with ease. |
| Row size cannot exceed 64KB, which is shared among all columns. |
Max document size is 16MB. |
| Joins are used to combine data from two or more tables |
NoSQL databases do not use joins. Collections join is implemented at application level. |
| Object names are not case-insensitive. |
Object names are case-sensitive, even on case-insensitive file systems. |
| Row nesting is not supported. |
Document nesting is allowed, maximum 100 documents. |
| Relational databases are ACID compliant and is implemented using transactions. |
NoSQL databases are not ACID compliant, transaction safety is implemented using CAP theorem. |
| Tables can have upto 1000 columns and fit into 64KB. |
No limit on key-value pairs, as long as it fits into 16MB |
| A maximum of 16 columns is permitted for compound index. |
Maximum 31 fields in a compound index. |
| Slow performance for high volume |
High-performance for high volume queries |
| Description |
Most widely used database |
Popular nosql database |
Getting popular |
| Type |
RDBMS |
Document Store |
Graph |
| Transaction |
ACID |
No ACID |
ACID |
| In-memory |
Yes |
Yes |
No |
| Application scenarios |
Traditional structure |
API’s and IOT |
Real-time recommendations |
| Scalability |
Easy |
Easy |
Medium |
| Main character |
Zylo data |
JSON format |
Relationship |
CleanUp
dbDisconnect(mySqlConn)
paste("Disconnected from MySQL on", date())
mongoNYCFlightsConn$drop()
mongoAirlinesConn$drop()
mongoFlightsConn$drop()
mongoFlightAirlinesConn$drop()