Model Data Relasional adalah suatu model basis data yang terdiri dari beberapa tabel berdimensi dua yg disebut (relasi atau tabel) , dengan masing-masing relation(relasi), yang tersusun atas tuple (baris) dan atribut (kolom) untuk menggambarkan sebuah basis data. Pada model relasional, tidak ada data yang kembar. Karena ada kunci primer (Primary Key). Kunci primer adalah satu item yang dipilih dalam suatu kolom yang unik dan tidak sama yang berfungsi untuk membedakan antara satu kolom dengan kolom lainnya.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v stringr 1.4.0
## v tidyr   1.1.2     v forcats 0.5.1
## v readr   1.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(nycflights13)
## Warning: package 'nycflights13' was built under R version 4.0.4
library(ggplot2)
library(maps)
## Warning: package 'maps' was built under R version 4.0.4
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map

Maskapai penerbangan di ambil dari data set airlines, kita memungkinkan untuk mencari nama lengkap maskapai dan fungsi str() kita bisa mengetahui struktur dari data set airlines.

str(airlines)
## tibble [16 x 2] (S3: tbl_df/tbl/data.frame)
##  $ carrier: chr [1:16] "9E" "AA" "AS" "B6" ...
##  $ name   : chr [1:16] "Endeavor Air Inc." "American Airlines Inc." "Alaska Airlines Inc." "JetBlue Airways" ...
airlines
## # A tibble: 16 x 2
##    carrier name                       
##    <chr>   <chr>                      
##  1 9E      Endeavor Air Inc.          
##  2 AA      American Airlines Inc.     
##  3 AS      Alaska Airlines Inc.       
##  4 B6      JetBlue Airways            
##  5 DL      Delta Air Lines Inc.       
##  6 EV      ExpressJet Airlines Inc.   
##  7 F9      Frontier Airlines Inc.     
##  8 FL      AirTran Airways Corporation
##  9 HA      Hawaiian Airlines Inc.     
## 10 MQ      Envoy Air                  
## 11 OO      SkyWest Airlines Inc.      
## 12 UA      United Air Lines Inc.      
## 13 US      US Airways Inc.            
## 14 VX      Virgin America             
## 15 WN      Southwest Airlines Co.     
## 16 YV      Mesa Airlines Inc.

Bandara di ambil dari data set airports, kita memungkinkan untuk mencari nama lengkap bandara dan fungsi str() kita bisa mengetahui struktur dari data set bandara.

str(airports)
## tibble [1,458 x 8] (S3: tbl_df/tbl/data.frame)
##  $ faa  : chr [1:1458] "04G" "06A" "06C" "06N" ...
##  $ name : chr [1:1458] "Lansdowne Airport" "Moton Field Municipal Airport" "Schaumburg Regional" "Randall Airport" ...
##  $ lat  : num [1:1458] 41.1 32.5 42 41.4 31.1 ...
##  $ lon  : num [1:1458] -80.6 -85.7 -88.1 -74.4 -81.4 ...
##  $ alt  : num [1:1458] 1044 264 801 523 11 ...
##  $ tz   : num [1:1458] -5 -6 -6 -5 -5 -5 -5 -5 -5 -8 ...
##  $ dst  : chr [1:1458] "A" "A" "A" "A" ...
##  $ tzone: chr [1:1458] "America/New_York" "America/Chicago" "America/Chicago" "America/New_York" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   id = col_double(),
##   ..   name = col_character(),
##   ..   city = col_character(),
##   ..   country = col_character(),
##   ..   faa = col_character(),
##   ..   icao = col_character(),
##   ..   lat = col_double(),
##   ..   lon = col_double(),
##   ..   alt = col_double(),
##   ..   tz = col_double(),
##   ..   dst = col_character(),
##   ..   tzone = col_character()
##   .. )
airports
## # A tibble: 1,458 x 8
##    faa   name                       lat    lon   alt    tz dst   tzone          
##    <chr> <chr>                    <dbl>  <dbl> <dbl> <dbl> <chr> <chr>          
##  1 04G   Lansdowne Airport         41.1  -80.6  1044    -5 A     America/New_Yo~
##  2 06A   Moton Field Municipal A~  32.5  -85.7   264    -6 A     America/Chicago
##  3 06C   Schaumburg Regional       42.0  -88.1   801    -6 A     America/Chicago
##  4 06N   Randall Airport           41.4  -74.4   523    -5 A     America/New_Yo~
##  5 09J   Jekyll Island Airport     31.1  -81.4    11    -5 A     America/New_Yo~
##  6 0A9   Elizabethton Municipal ~  36.4  -82.2  1593    -5 A     America/New_Yo~
##  7 0G6   Williams County Airport   41.5  -84.5   730    -5 A     America/New_Yo~
##  8 0G7   Finger Lakes Regional A~  42.9  -76.8   492    -5 A     America/New_Yo~
##  9 0P2   Shoestring Aviation Air~  39.8  -76.6  1000    -5 U     America/New_Yo~
## 10 0S9   Jefferson County Intl     48.1 -123.    108    -8 A     America/Los_An~
## # ... with 1,448 more rows

Pesawat terbang di ambil dari data set planes, kita memungkinkan untuk mencari nama lengkap planes dan fungsi str() kita bisa mengetahui struktur dari data set planes.

str(planes)
## tibble [3,322 x 9] (S3: tbl_df/tbl/data.frame)
##  $ tailnum     : chr [1:3322] "N10156" "N102UW" "N103US" "N104UW" ...
##  $ year        : int [1:3322] 2004 1998 1999 1999 2002 1999 1999 1999 1999 1999 ...
##  $ type        : chr [1:3322] "Fixed wing multi engine" "Fixed wing multi engine" "Fixed wing multi engine" "Fixed wing multi engine" ...
##  $ manufacturer: chr [1:3322] "EMBRAER" "AIRBUS INDUSTRIE" "AIRBUS INDUSTRIE" "AIRBUS INDUSTRIE" ...
##  $ model       : chr [1:3322] "EMB-145XR" "A320-214" "A320-214" "A320-214" ...
##  $ engines     : int [1:3322] 2 2 2 2 2 2 2 2 2 2 ...
##  $ seats       : int [1:3322] 55 182 182 182 55 182 182 182 182 182 ...
##  $ speed       : int [1:3322] NA NA NA NA NA NA NA NA NA NA ...
##  $ engine      : chr [1:3322] "Turbo-fan" "Turbo-fan" "Turbo-fan" "Turbo-fan" ...
planes
## # A tibble: 3,322 x 9
##    tailnum  year type          manufacturer   model  engines seats speed engine 
##    <chr>   <int> <chr>         <chr>          <chr>    <int> <int> <int> <chr>  
##  1 N10156   2004 Fixed wing m~ EMBRAER        EMB-1~       2    55    NA Turbo-~
##  2 N102UW   1998 Fixed wing m~ AIRBUS INDUST~ A320-~       2   182    NA Turbo-~
##  3 N103US   1999 Fixed wing m~ AIRBUS INDUST~ A320-~       2   182    NA Turbo-~
##  4 N104UW   1999 Fixed wing m~ AIRBUS INDUST~ A320-~       2   182    NA Turbo-~
##  5 N10575   2002 Fixed wing m~ EMBRAER        EMB-1~       2    55    NA Turbo-~
##  6 N105UW   1999 Fixed wing m~ AIRBUS INDUST~ A320-~       2   182    NA Turbo-~
##  7 N107US   1999 Fixed wing m~ AIRBUS INDUST~ A320-~       2   182    NA Turbo-~
##  8 N108UW   1999 Fixed wing m~ AIRBUS INDUST~ A320-~       2   182    NA Turbo-~
##  9 N109UW   1999 Fixed wing m~ AIRBUS INDUST~ A320-~       2   182    NA Turbo-~
## 10 N110UW   1999 Fixed wing m~ AIRBUS INDUST~ A320-~       2   182    NA Turbo-~
## # ... with 3,312 more rows
planes.se <- dplyr::select(planes, tailnum:manufacturer)
head(planes.se,n=5)
## # A tibble: 5 x 4
##   tailnum  year type                    manufacturer    
##   <chr>   <int> <chr>                   <chr>           
## 1 N10156   2004 Fixed wing multi engine EMBRAER         
## 2 N102UW   1998 Fixed wing multi engine AIRBUS INDUSTRIE
## 3 N103US   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 4 N104UW   1999 Fixed wing multi engine AIRBUS INDUSTRIE
## 5 N10575   2002 Fixed wing multi engine EMBRAER
str(weather)
## tibble [26,115 x 15] (S3: tbl_df/tbl/data.frame)
##  $ origin    : chr [1:26115] "EWR" "EWR" "EWR" "EWR" ...
##  $ year      : int [1:26115] 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
##  $ month     : int [1:26115] 1 1 1 1 1 1 1 1 1 1 ...
##  $ day       : int [1:26115] 1 1 1 1 1 1 1 1 1 1 ...
##  $ hour      : int [1:26115] 1 2 3 4 5 6 7 8 9 10 ...
##  $ temp      : num [1:26115] 39 39 39 39.9 39 ...
##  $ dewp      : num [1:26115] 26.1 27 28 28 28 ...
##  $ humid     : num [1:26115] 59.4 61.6 64.4 62.2 64.4 ...
##  $ wind_dir  : num [1:26115] 270 250 240 250 260 240 240 250 260 260 ...
##  $ wind_speed: num [1:26115] 10.36 8.06 11.51 12.66 12.66 ...
##  $ wind_gust : num [1:26115] NA NA NA NA NA NA NA NA NA NA ...
##  $ precip    : num [1:26115] 0 0 0 0 0 0 0 0 0 0 ...
##  $ pressure  : num [1:26115] 1012 1012 1012 1012 1012 ...
##  $ visib     : num [1:26115] 10 10 10 10 10 10 10 10 10 10 ...
##  $ time_hour : POSIXct[1:26115], format: "2013-01-01 01:00:00" "2013-01-01 02:00:00" ...
weather
## # A tibble: 26,115 x 15
##    origin  year month   day  hour  temp  dewp humid wind_dir wind_speed
##    <chr>  <int> <int> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>
##  1 EWR     2013     1     1     1  39.0  26.1  59.4      270      10.4 
##  2 EWR     2013     1     1     2  39.0  27.0  61.6      250       8.06
##  3 EWR     2013     1     1     3  39.0  28.0  64.4      240      11.5 
##  4 EWR     2013     1     1     4  39.9  28.0  62.2      250      12.7 
##  5 EWR     2013     1     1     5  39.0  28.0  64.4      260      12.7 
##  6 EWR     2013     1     1     6  37.9  28.0  67.2      240      11.5 
##  7 EWR     2013     1     1     7  39.0  28.0  64.4      240      15.0 
##  8 EWR     2013     1     1     8  39.9  28.0  62.2      250      10.4 
##  9 EWR     2013     1     1     9  39.9  28.0  62.2      260      15.0 
## 10 EWR     2013     1     1    10  41    28.0  59.6      260      13.8 
## # ... with 26,105 more rows, and 5 more variables: wind_gust <dbl>,
## #   precip <dbl>, pressure <dbl>, visib <dbl>, time_hour <dttm>
weather.se <- dplyr::filter(weather, temp >= 40)
print(weather.se)
## # A tibble: 19,401 x 15
##    origin  year month   day  hour  temp  dewp humid wind_dir wind_speed
##    <chr>  <int> <int> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>
##  1 EWR     2013     1     1    10  41    28.0  59.6      260      13.8 
##  2 EWR     2013     1     1    11  41    27.0  57.1      260      15.0 
##  3 EWR     2013     1     5    10  41    21.0  44.5      310      10.4 
##  4 EWR     2013     1     5    11  42.1  19.9  40.8      320      15.0 
##  5 EWR     2013     1     5    12  43.0  19.9  39.4      310      13.8 
##  6 EWR     2013     1     5    13  44.1  19.9  37.8      290      11.5 
##  7 EWR     2013     1     5    14  44.1  19.9  37.8      310       9.21
##  8 EWR     2013     1     5    15  43.0  19.0  37.9      310       8.06
##  9 EWR     2013     1     5    16  41    19.9  42.5      310       4.60
## 10 EWR     2013     1     6    11  43.0  30.0  59.9      240      10.4 
## # ... with 19,391 more rows, and 5 more variables: wind_gust <dbl>,
## #   precip <dbl>, pressure <dbl>, visib <dbl>, time_hour <dttm>

Operator pipe: %>%

Pipe operator: %>% dalam dplyr diimport dari package magrittr. Pipe digunakan untuk menyalurkan output dari suatu fungsi ke input fungsi lain. Alih-alih menggunakan fungsi tersarang, digunakanlah pipe yang dibaca dari kiri ke kanan.

head(select(weather, origin, year, temp), n = 5)
## # A tibble: 5 x 3
##   origin  year  temp
##   <chr>  <int> <dbl>
## 1 EWR     2013  39.0
## 2 EWR     2013  39.0
## 3 EWR     2013  39.0
## 4 EWR     2013  39.9
## 5 EWR     2013  39.0

dengan menggunakan pipe, dapat dituliskan sebagai berikut :

weather %>%
  select(origin, year, temp) %>%
  head(n = 5)
## # A tibble: 5 x 3
##   origin  year  temp
##   <chr>  <int> <dbl>
## 1 EWR     2013  39.0
## 2 EWR     2013  39.0
## 3 EWR     2013  39.0
## 4 EWR     2013  39.9
## 5 EWR     2013  39.0
weather.ar <- arrange(weather, temp)
head(weather.ar , n = 10)
## # A tibble: 10 x 15
##    origin  year month   day  hour  temp  dewp humid wind_dir wind_speed
##    <chr>  <int> <int> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>
##  1 EWR     2013     1    23     5  10.9 -4     50.2      270      10.4 
##  2 EWR     2013     1    23     6  10.9 -4     50.2      270      11.5 
##  3 EWR     2013     1    23     2  12.0 -7.06  41.3      270      10.4 
##  4 EWR     2013     1    23     3  12.0 -5.98  43.5      270       8.06
##  5 EWR     2013     1    23     4  12.0 -5.08  45.4      270       9.21
##  6 EWR     2013     1    23     7  12.0 -2.92  50.4      270      11.5 
##  7 EWR     2013     1    24     3  12.0  1.94  63.3       30       4.60
##  8 JFK     2013     1    23     4  12.0 -7.06  41.3      280      12.7 
##  9 JFK     2013     1    23     5  12.0 -5.08  45.4      280      15.0 
## 10 JFK     2013     1    23     6  12.0 -5.08  45.4      290      13.8 
## # ... with 5 more variables: wind_gust <dbl>, precip <dbl>, pressure <dbl>,
## #   visib <dbl>, time_hour <dttm>
weather.ab <- arrange(weather, desc(temp))
head(weather.ab , n = 10)
## # A tibble: 10 x 15
##    origin  year month   day  hour  temp  dewp humid wind_dir wind_speed
##    <chr>  <int> <int> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>
##  1 EWR     2013     7    18    15 100.   66.0  33.2      300       9.21
##  2 EWR     2013     7    19    16 100.   71.1  39.5      230      20.7 
##  3 EWR     2013     7    19    13  99.0  72.0  42.1      260      18.4 
##  4 EWR     2013     7    19    14  99.0  71.1  40.8      250      20.7 
##  5 EWR     2013     7    19    15  99.0  72.0  42.1      240      16.1 
##  6 LGA     2013     7    18    15  99.0  64.0  32.0      250      12.7 
##  7 LGA     2013     7    19    15  99.0  69.1  38.2      240      16.1 
##  8 LGA     2013     7    19    16  99.0  69.1  38.2      230      15.0 
##  9 EWR     2013     7    18    12  98.1  69.1  39.2      300       9.21
## 10 EWR     2013     7    18    13  98.1  66.9  36.4      300      11.5 
## # ... with 5 more variables: wind_gust <dbl>, precip <dbl>, pressure <dbl>,
## #   visib <dbl>, time_hour <dttm>
weather %>%
  mutate(weather_proportion = temp/dewp) %>%
  head(5)
## # A tibble: 5 x 16
##   origin  year month   day  hour  temp  dewp humid wind_dir wind_speed wind_gust
##   <chr>  <int> <int> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>     <dbl>
## 1 EWR     2013     1     1     1  39.0  26.1  59.4      270      10.4         NA
## 2 EWR     2013     1     1     2  39.0  27.0  61.6      250       8.06        NA
## 3 EWR     2013     1     1     3  39.0  28.0  64.4      240      11.5         NA
## 4 EWR     2013     1     1     4  39.9  28.0  62.2      250      12.7         NA
## 5 EWR     2013     1     1     5  39.0  28.0  64.4      260      12.7         NA
## # ... with 5 more variables: precip <dbl>, pressure <dbl>, visib <dbl>,
## #   time_hour <dttm>, weather_proportion <dbl>
weather %>%
  summarise(rata2 = mean(dewp), median = median(dewp), 
            minimum = min(dewp), maximum = max(dewp))
## # A tibble: 1 x 4
##   rata2 median minimum maximum
##   <dbl>  <dbl>   <dbl>   <dbl>
## 1    NA     NA      NA      NA