For your assignment you may be using a different dataset than the one included here.
Always read the instructions on Sakai carefully.
Tasks/questions to be completed/answered are highlighted in a larger bold font and numbered according to their section.
We are going to use tidyverse, a collection of R packages designed for data science.
## Loading required package: tidyverse
## -- Attaching packages --------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1 v purrr 0.2.4
## v tibble 1.4.2 v dplyr 0.7.4
## v tidyr 0.7.2 v stringr 1.2.0
## v readr 1.1.1 v forcats 0.2.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Name your dataset ‘mydata’ so it is easy to work with.
Commands: read.csv() head() mean() sub() as.numeric()
mydata <- read.csv("Taxi_Trips_Sample.csv")
mydata = na.omit(mydata)
head(mydata)
## Trip.ID
## 1 3e7d6d8ccf1425ae1dcd584f5c3ca303cf6362ed
## 2 3e7d6e5c4e87f01a475c8200b33777e85497da89
## 4 3e7d6efe43222b0ebc698583916674c648dd4520
## 5 3e7d6f001e9bcda8478a489cb53293d26328ac85
## 8 3e7d6f35332ed1069a218b63cef47c3be42a26fb
## 14 3e7d706fe60689dc646772c09e128e19ab022472
## Taxi.ID
## 1 b47c583b142d75b42882975eaab19c6cb98d82686016576cce6e305b1b99eb16aacfb9a21ff61c84873a6c3dde282756c162c538c8b69554fd8f811f3a8f60a2
## 2 bc1c0381e3bca623e6c04f3410f7b67201a9fc85c6b66d0f420a88099d38448f9b9874e246da49cf2ef32ea3d027eec9c5b484fe77dbfc033c389b5576ac66bd
## 4 0f831bff43d83f396f2e4950126c6137dcdb60fb4c8580ffe860203747a83a789b22f2f9e4fbdd0dd8ed8c310366d8935228ddbcadf708fb9691ca5dd1b6c802
## 5 e5274d6c103515af3ce705182d0bbbea7ca077a6f23b1736254f2de8ba3e1687dd77f5fb541b7f00b1ebc24cfde54caf5a9562f046a0559acbfe1e7159e17c1a
## 8 5cb1d0f30a10dc0c80525ef9cbf1fd1a0f2c57e7ae2a5efca20c69229e52ada226b39071f69d9e8bef81722dc96234dca57b26694b6ec4773eb79b1beeac5c83
## 14 631de3a4ad35b7d7d75949b33ad59e09820b51b3f58225df9f3f1c9aa65554e97ba9d6f868c94b5c2f170cf086a0ca7089dc2f6dab2c9096283f482659071447
## Trip.Start.Timestamp Trip.End.Timestamp Trip.Seconds Trip.Miles
## 1 08/19/2014 02:45:00 PM 08/19/2014 02:45:00 PM 480 0.15
## 2 09/23/2013 05:15:00 PM 09/23/2013 05:15:00 PM 420 0.00
## 4 05/10/2013 08:00:00 PM 05/10/2013 08:45:00 PM 2340 13.80
## 5 02/21/2016 07:15:00 PM 02/21/2016 07:15:00 PM 300 0.70
## 8 03/08/2014 03:45:00 PM 03/08/2014 04:30:00 PM 2220 13.30
## 14 05/05/2014 09:30:00 AM 05/05/2014 09:45:00 AM 900 0.10
## Pickup.Census.Tract Dropoff.Census.Tract Pickup.Community.Area
## 1 17031280100 17031839100 28
## 2 17031081800 17031281900 8
## 4 17031980000 17031060400 76
## 5 17031081500 17031081500 8
## 8 17031980100 17031242300 56
## 14 17031832600 17031081401 7
## Dropoff.Community.Area Fare Tips Tolls Extras Trip.Total
## 1 32 $7.05 $0.00 $0.00 $1.50 $8.55
## 2 28 $6.05 $0.00 $0.00 $0.00 $6.05
## 4 6 $31.25 $0.00 $0.00 $3.00 $34.25
## 5 8 $5.50 $0.00 $0.00 $0.00 $5.50
## 8 24 $30.45 $6.49 $0.00 $2.00 $38.94
## 14 8 $10.45 $0.00 $0.00 $0.00 $10.45
## Payment.Type Company Pickup.Centroid.Latitude
## 1 Cash 41.88530
## 2 Cash Taxi Affiliation Services 41.89322
## 4 Cash 41.97907
## 5 Cash 41.89251
## 8 Credit Card 41.78600
## 14 Cash Taxi Affiliation Services 41.91475
## Pickup.Centroid.Longitude Pickup.Centroid.Location
## 1 -87.64281 POINT (-87.642808 41.8853)
## 2 -87.63784 POINT (-87.637844 41.893216)
## 4 -87.90304 POINT (-87.90304 41.979071)
## 5 -87.62621 POINT (-87.626215 41.892508)
## 8 -87.75093 POINT (-87.750934 41.785999)
## 14 -87.65401 POINT (-87.654007 41.914747)
## Dropoff.Centroid.Latitude Dropoff.Centroid.Longitude
## 1 41.88099 -87.63275
## 2 41.87926 -87.64265
## 4 41.95067 -87.66654
## 5 41.89251 -87.62621
## 8 41.89951 -87.67960
## 14 41.89503 -87.61971
## Dropoff.Centroid..Location Community.Areas
## 1 POINT (-87.632746 41.880994) 29
## 2 POINT (-87.642649 41.879255) 37
## 4 POINT (-87.666536 41.950673) 75
## 5 POINT (-87.626215 41.892508) 37
## 8 POINT (-87.6796 41.899507) 53
## 14 POINT (-87.619711 41.895033) 68
fare = mydata$Fare
tips = mydata$Tips
tolls = mydata$Tolls
extras = mydata$Extras
head(fare)
## [1] $7.05 $6.05 $31.25 $5.50 $30.45 $10.45
## 892 Levels: $0.00 $0.01 $0.03 $0.05 $0.10 $0.11 $0.28 $0.30 ... $99.99
head(tips)
## [1] $0.00 $0.00 $0.00 $0.00 $6.49 $0.00
## 1081 Levels: $0.00 $0.01 $0.02 $0.03 $0.04 $0.05 $0.06 $0.08 ... $9.98
head(tolls)
## [1] $0.00 $0.00 $0.00 $0.00 $0.00 $0.00
## 25 Levels: $0.00 $0.60 $0.95 $1.00 $1.11 $1.20 $1.50 $1.90 ... $82.50
head(extras)
## [1] $1.50 $0.00 $3.00 $0.00 $2.00 $0.00
## 183 Levels: $0.00 $0.01 $0.03 $0.08 $0.10 $0.11 $0.20 $0.34 ... $9.50
** What happens when we try to use the mean() function? **
mean(fare)
## Warning in mean.default(fare): argument is not numeric or logical:
## returning NA
## [1] NA
mean(tips)
## Warning in mean.default(tips): argument is not numeric or logical:
## returning NA
## [1] NA
mean(tolls)
## Warning in mean.default(tolls): argument is not numeric or logical:
## returning NA
## [1] NA
mean(extras)
## Warning in mean.default(extras): argument is not numeric or logical:
## returning NA
## [1] NA
To resolve the warning, check whether the feature's data type is correct.
Notice the comma in some values, such as ‘1,234’.
Also notice the data type (a factor, not a number) and the dollar sign ‘$’.
To remove a character, in this case the comma from “1,234”, we must substitute it with an empty string.
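A quick way to confirm the data type before cleaning is sketched below. It uses a small made-up vector (`fare_example` is a hypothetical name, not part of the dataset) that mimics how the Fare column was read in as a factor.

```r
# Toy stand-in for the Fare column: currency strings stored as a factor
fare_example <- factor(c("$7.05", "$1,234.00", "$5.50"))

class(fare_example)        # reports the type, here "factor"
is.numeric(fare_example)   # FALSE, so mean() cannot average it

# Check for the characters that block numeric conversion
fare_text <- as.character(fare_example)
any(grepl(",", fare_text, fixed = TRUE))   # TRUE if any value has a comma
any(grepl("$", fare_text, fixed = TRUE))   # TRUE if any value has a "$"
```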
** substitute comma with “” **
fare_clean <- sub(pattern = "," , replacement = "", x = fare)
head(fare_clean)
## [1] "$7.05" "$6.05" "$31.25" "$5.50" "$30.45" "$10.45"
tips_clean <- sub(pattern = "," , replacement = "", x = tips)
head(tips_clean)
## [1] "$0.00" "$0.00" "$0.00" "$0.00" "$6.49" "$0.00"
tolls_clean <- sub(pattern = "," , replacement = "", x = tolls)
head(tolls_clean)
## [1] "$0.00" "$0.00" "$0.00" "$0.00" "$0.00" "$0.00"
extras_clean <- sub(pattern = "," , replacement = "", x = extras)
head(extras_clean)
## [1] "$1.50" "$0.00" "$3.00" "$0.00" "$2.00" "$0.00"
** substitute dollar sign with “” **
fare_clean <- sub(pattern = "\\$" , replacement = "", x = fare_clean)
head(fare_clean)
## [1] "7.05" "6.05" "31.25" "5.50" "30.45" "10.45"
tips_clean <- sub(pattern = "\\$" , replacement = "", x = tips_clean)
head(tips_clean)
## [1] "0.00" "0.00" "0.00" "0.00" "6.49" "0.00"
tolls_clean <- sub(pattern = "\\$" , replacement = "", x = tolls_clean)
head(tolls_clean)
## [1] "0.00" "0.00" "0.00" "0.00" "0.00" "0.00"
extras_clean <- sub(pattern = "\\$" , replacement = "", x = extras_clean)
head(extras_clean)
## [1] "1.50" "0.00" "3.00" "0.00" "2.00" "0.00"
** character to numeric - as.numeric()**
fare_clean <- as.numeric(fare_clean)
tips_clean <- as.numeric(tips_clean)
tolls_clean <- as.numeric(tolls_clean)
extras_clean <- as.numeric(extras_clean)
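The three steps above (remove the comma, remove the dollar sign, convert to numeric) could also be wrapped in one helper. This is only a sketch: `clean_currency` is a hypothetical name, and it uses `gsub()` rather than `sub()` so that every comma is removed, not just the first one.

```r
# Hypothetical helper: strip "$" and "," from currency strings, then convert
clean_currency <- function(x) {
  # as.character() first: calling as.numeric() directly on a factor
  # would return the internal level codes, not the displayed values
  as.numeric(gsub("[$,]", "", as.character(x)))
}

# Toy input, not taken from the dataset
example <- factor(c("$7.05", "$1,234.50", "$0.00"))
clean_currency(example)
```

Applied to the real data, a call might look like `fare_clean <- clean_currency(mydata$Fare)`.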
** mean with NA removed **
fare_clean <- mean(fare_clean , na.rm = TRUE)
fare_clean
## [1] 10.95067
tips_clean <- mean(tips_clean , na.rm = TRUE)
tips_clean
## [1] 1.096804
tolls_clean <- mean(tolls_clean , na.rm = TRUE)
tolls_clean
## [1] 0.005630209
extras_clean <- mean(extras_clean , na.rm = TRUE)
extras_clean
## [1] 0.7317954
** Clean variables to dataset **
mydata$Fare <- fare
mydata$Tips <- tips
mydata$Tolls <- tolls
mydata$Extras <- extras
head(mydata)
## Trip.ID
## 1 3e7d6d8ccf1425ae1dcd584f5c3ca303cf6362ed
## 2 3e7d6e5c4e87f01a475c8200b33777e85497da89
## 4 3e7d6efe43222b0ebc698583916674c648dd4520
## 5 3e7d6f001e9bcda8478a489cb53293d26328ac85
## 8 3e7d6f35332ed1069a218b63cef47c3be42a26fb
## 14 3e7d706fe60689dc646772c09e128e19ab022472
## Taxi.ID
## 1 b47c583b142d75b42882975eaab19c6cb98d82686016576cce6e305b1b99eb16aacfb9a21ff61c84873a6c3dde282756c162c538c8b69554fd8f811f3a8f60a2
## 2 bc1c0381e3bca623e6c04f3410f7b67201a9fc85c6b66d0f420a88099d38448f9b9874e246da49cf2ef32ea3d027eec9c5b484fe77dbfc033c389b5576ac66bd
## 4 0f831bff43d83f396f2e4950126c6137dcdb60fb4c8580ffe860203747a83a789b22f2f9e4fbdd0dd8ed8c310366d8935228ddbcadf708fb9691ca5dd1b6c802
## 5 e5274d6c103515af3ce705182d0bbbea7ca077a6f23b1736254f2de8ba3e1687dd77f5fb541b7f00b1ebc24cfde54caf5a9562f046a0559acbfe1e7159e17c1a
## 8 5cb1d0f30a10dc0c80525ef9cbf1fd1a0f2c57e7ae2a5efca20c69229e52ada226b39071f69d9e8bef81722dc96234dca57b26694b6ec4773eb79b1beeac5c83
## 14 631de3a4ad35b7d7d75949b33ad59e09820b51b3f58225df9f3f1c9aa65554e97ba9d6f868c94b5c2f170cf086a0ca7089dc2f6dab2c9096283f482659071447
## Trip.Start.Timestamp Trip.End.Timestamp Trip.Seconds Trip.Miles
## 1 08/19/2014 02:45:00 PM 08/19/2014 02:45:00 PM 480 0.15
## 2 09/23/2013 05:15:00 PM 09/23/2013 05:15:00 PM 420 0.00
## 4 05/10/2013 08:00:00 PM 05/10/2013 08:45:00 PM 2340 13.80
## 5 02/21/2016 07:15:00 PM 02/21/2016 07:15:00 PM 300 0.70
## 8 03/08/2014 03:45:00 PM 03/08/2014 04:30:00 PM 2220 13.30
## 14 05/05/2014 09:30:00 AM 05/05/2014 09:45:00 AM 900 0.10
## Pickup.Census.Tract Dropoff.Census.Tract Pickup.Community.Area
## 1 17031280100 17031839100 28
## 2 17031081800 17031281900 8
## 4 17031980000 17031060400 76
## 5 17031081500 17031081500 8
## 8 17031980100 17031242300 56
## 14 17031832600 17031081401 7
## Dropoff.Community.Area Fare Tips Tolls Extras Trip.Total
## 1 32 $7.05 $0.00 $0.00 $1.50 $8.55
## 2 28 $6.05 $0.00 $0.00 $0.00 $6.05
## 4 6 $31.25 $0.00 $0.00 $3.00 $34.25
## 5 8 $5.50 $0.00 $0.00 $0.00 $5.50
## 8 24 $30.45 $6.49 $0.00 $2.00 $38.94
## 14 8 $10.45 $0.00 $0.00 $0.00 $10.45
## Payment.Type Company Pickup.Centroid.Latitude
## 1 Cash 41.88530
## 2 Cash Taxi Affiliation Services 41.89322
## 4 Cash 41.97907
## 5 Cash 41.89251
## 8 Credit Card 41.78600
## 14 Cash Taxi Affiliation Services 41.91475
## Pickup.Centroid.Longitude Pickup.Centroid.Location
## 1 -87.64281 POINT (-87.642808 41.8853)
## 2 -87.63784 POINT (-87.637844 41.893216)
## 4 -87.90304 POINT (-87.90304 41.979071)
## 5 -87.62621 POINT (-87.626215 41.892508)
## 8 -87.75093 POINT (-87.750934 41.785999)
## 14 -87.65401 POINT (-87.654007 41.914747)
## Dropoff.Centroid.Latitude Dropoff.Centroid.Longitude
## 1 41.88099 -87.63275
## 2 41.87926 -87.64265
## 4 41.95067 -87.66654
## 5 41.89251 -87.62621
## 8 41.89951 -87.67960
## 14 41.89503 -87.61971
## Dropoff.Centroid..Location Community.Areas
## 1 POINT (-87.632746 41.880994) 29
## 2 POINT (-87.642649 41.879255) 37
## 4 POINT (-87.666536 41.950673) 75
## 5 POINT (-87.626215 41.892508) 37
## 8 POINT (-87.6796 41.899507) 53
## 14 POINT (-87.619711 41.895033) 68
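The `head(mydata)` output above still shows dollar signs because `fare`, `tips`, `tolls`, and `extras` hold the original values, while the `_clean` names were overwritten by their means. A minimal sketch of one way to store cleaned numeric columns instead is shown here on a toy data frame (`toy` and `money_cols` are hypothetical names; the values are copied from the first rows printed earlier).

```r
# Toy data frame standing in for mydata
toy <- data.frame(
  Fare = c("$7.05", "$6.05", "$31.25"),
  Tips = c("$0.00", "$0.00", "$0.00"),
  stringsAsFactors = FALSE
)

# Columns that hold currency strings
money_cols <- c("Fare", "Tips")

# Replace each currency column with its cleaned numeric version
toy[money_cols] <- lapply(toy[money_cols], function(x) {
  as.numeric(gsub("[$,]", "", as.character(x)))
})

str(toy)         # both columns are now numeric
mean(toy$Fare)   # the mean is computed without overwriting the column
```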
** Save clean data **
write_csv(mydata, path = "data/mydata_clean.csv")
In this task we must calculate the mean, standard deviation, maximum, and minimum for the given feature.
trip_seconds = mydata$Trip.Seconds
head(trip_seconds)
## [1] 480 420 2340 300 2220 900
** calculate the average **
mean(trip_seconds)
## [1] 720.8911
** calculate the standard deviation **
sd(trip_seconds)
## [1] 1005.201
** calculate the min **
min(trip_seconds)
## [1] 0
** calculate the max **
max(trip_seconds)
## [1] 74340
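The four statistics can also be collected into one named vector, which is convenient when the same summary is needed for several features. The sketch below uses only the six trip durations printed by `head(trip_seconds)`, so its results differ from the full-dataset values above; `trip_example` and `stats` are hypothetical names.

```r
# First six trip durations (seconds) from the head() output
trip_example <- c(480, 420, 2340, 300, 2220, 900)

# Collect the summary statistics in a single named vector
stats <- c(
  mean = mean(trip_example),
  sd   = sd(trip_example),
  min  = min(trip_example),
  max  = max(trip_example)
)
stats
```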
To find the outliers, we are going to look at the upper and lower limits.
An outlier is a value that “lies outside” most of the other values in a set of data.
A method to find upper and lower thresholds involves finding the interquartile range.
** quantile calculation for the given feature **
quantile(trip_seconds)
## 0% 25% 50% 75% 100%
## 0 300 540 840 74340
** Lower and upper quartile calculation **
lowerq = quantile(trip_seconds)[2]
lowerq
## 25%
## 300
upperq = quantile(trip_seconds)[4]
upperq
## 75%
## 840
** Interquartile range calculation **
iqr = upperq - lowerq
iqr
## 75%
## 540
The thresholds are the boundaries that determine whether a value is an outlier.
If the value falls above the upper threshold or below the lower threshold, it is an outlier.
** Calculating the upper threshold **
upper_threshold = (iqr * 1.5) + upperq
upper_threshold
## 75%
## 1650
** Calculating the lower threshold **
lower_threshold = lowerq - (iqr * 1.5)
lower_threshold
## 25%
## -510
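The quartile, IQR, and threshold steps above can be sketched as a single function. `iqr_thresholds` and its multiplier argument `k` are hypothetical names; the input is a toy vector made of the five quantile values printed earlier, not the full dataset.

```r
# Hypothetical helper: lower and upper outlier thresholds from the IQR
iqr_thresholds <- function(x, k = 1.5) {
  # 25th and 75th percentiles, without the "25%"/"75%" names
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE, na.rm = TRUE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

# Toy input: the five values from the quantile() output
five_numbers <- c(0, 300, 540, 840, 74340)
th <- iqr_thresholds(five_numbers)
th
```

For this toy input the quartiles are 300 and 840, so the function returns -510 and 1650, the same thresholds computed step by step above.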
** Identify outliers **
trip_seconds[ trip_seconds > upper_threshold][1:10]
## [1] 2340 2220 3300 1680 1860 1980 3360 4500 2760 1860
trip_seconds[ trip_seconds < lower_threshold][1:10]
## [1] NA NA NA NA NA NA NA NA NA NA
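The `NA` values in the second result do not mean that missing trips were found. No value is below the lower threshold, so the subset is empty, and indexing an empty vector with `[1:10]` returns `NA` for every position. A short sketch on toy values (the six durations from `head(trip_seconds)`, with the thresholds computed above) shows the difference between `[1:10]` and `head()`:

```r
# Toy durations and the thresholds computed in the text
x <- c(480, 420, 2340, 300, 2220, 900)
lower_threshold <- -510
upper_threshold <- 1650

below <- x[x < lower_threshold]
length(below)       # 0: nothing falls below the lower threshold
below[1:10]         # ten NAs, because positions 1 to 10 do not exist
head(below, 10)     # empty result: head() never pads with NA

# Counting outliers directly from the logical comparison
sum(x > upper_threshold)
```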
** Finding outlier records **
count(mydata[ trip_seconds > upper_threshold, ][1:10])
## # A tibble: 1 x 1
## n
## <int>
## 1 4837
count(mydata[ trip_seconds < lower_threshold, ][1:10])
## # A tibble: 1 x 1
## n
## <int>
## 1 0
There are 4837 outliers, all above the upper threshold. Outliers are irregular data points that deviate from the regular pattern of the rest of the data.
It can also be useful to visualize the data using a box-and-whisker plot.
The boxplot shows the IQR as the box, and points drawn beyond the whiskers lie outside the upper and lower thresholds.
p <- ggplot(data = mydata, aes(x = "", y = trip_seconds)) + geom_boxplot() + coord_flip()
p
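The numbers behind a boxplot can be inspected without drawing it. The sketch below uses base R's `boxplot.stats()`, which applies the same 1.5 × IQR rule; its hinges can differ slightly from `quantile()` for some sample sizes. The input is a toy vector of the five quantile values printed earlier, not the full dataset.

```r
# Toy input: the five values from the quantile() output
five_numbers <- c(0, 300, 540, 840, 74340)

bs <- boxplot.stats(five_numbers, coef = 1.5)
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # values beyond the whiskers, i.e. the plotted outlier points
```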
Chicago Taxi Dashboard: https://data.cityofchicago.org/Transportation/Taxi-Trips-Dashboard/spcw-brbq
Chicago Taxi Data Description: http://digital.cityofchicago.org/index.php/chicago-taxi-data-released
knitr::include_graphics("img01.png")
This graph confirms the expectation that most taxi trips are under 10 miles. Traveling more than 100 miles by taxi would be a poor economic choice and would take longer than flying, which is why there is a steep drop between 10 and 100 miles.
summary(mydata)
## Trip.ID
## 3e7d6d8ccf1425ae1dcd584f5c3ca303cf6362ed: 1
## 3e7d6e5c4e87f01a475c8200b33777e85497da89: 1
## 3e7d6efe43222b0ebc698583916674c648dd4520: 1
## 3e7d6f001e9bcda8478a489cb53293d26328ac85: 1
## 3e7d6f35332ed1069a218b63cef47c3be42a26fb: 1
## 3e7d706fe60689dc646772c09e128e19ab022472: 1
## (Other) :60705
## Taxi.ID
## 4f189764b8d9b6f71f7936ab414cac07634be0a00790ca179f9460521b7c9c3e5e102f5ba4e1c9cd18cdd9856dbf4f66ae8f13d8c82f8d2d4872f74b96938a24: 45
## 1b2865284986db761b656ee64ad35fa46df8f7da125dde86ddb79d798917d2a8ac1bc1e7865a340f797d7cc7ba275d9397e4ce03c88da96b088fab3088c07dc8: 38
## ad33dffdd6cd00795ea1a00a6a6db1a38482075d532b55e712741e9b4a2541375fcf642d01d35b51646d2a07b49376f167b5ddd1f7b0a5354afe07f514108365: 37
## 0861cb74337c620cb9ec639af7dc3aa99173b768caf750a2fd1ff17a8d9db86cad36772c7ff6ddaf2fda48de41bc82981145fe46693ed147d86ae194ee15c703: 36
## 21e6058fef096e0d45f7b9d47433974c9c4e1820c3e5bee61670db5e58eb32e58432f1b6cf09adf73fe8a36a25cfe880f2e797158b2c890ec52603c3635d759f: 36
## 2a29bcc02d98eb9e748a8427ffd773934047f77e20f70325b01f515ba28cb6e379194d8d65b7783c6791f390ea0d9833daf89a537be9be1db5b81e1db23a07c5: 36
## (Other) :60483
## Trip.Start.Timestamp Trip.End.Timestamp
## 02/05/2015 07:15:00 PM: 7 07/25/2014 06:45:00 PM: 8
## 02/27/2015 08:45:00 AM: 7 02/28/2014 08:45:00 PM: 7
## 04/25/2014 06:45:00 PM: 7 03/11/2016 08:15:00 PM: 7
## 05/17/2014 11:30:00 PM: 7 02/05/2015 07:45:00 PM: 6
## 07/25/2014 06:45:00 PM: 7 02/07/2014 09:00:00 PM: 6
## 02/07/2014 08:45:00 PM: 6 04/10/2015 09:45:00 AM: 6
## (Other) :60670 (Other) :60671
## Trip.Seconds Trip.Miles Pickup.Census.Tract
## Min. : 0.0 Min. : 0.000 Min. :1.703e+10
## 1st Qu.: 300.0 1st Qu.: 0.000 1st Qu.:1.703e+10
## Median : 540.0 Median : 0.900 Median :1.703e+10
## Mean : 720.9 Mean : 2.478 Mean :1.703e+10
## 3rd Qu.: 840.0 3rd Qu.: 1.900 3rd Qu.:1.703e+10
## Max. :74340.0 Max. :1770.000 Max. :1.703e+10
##
## Dropoff.Census.Tract Pickup.Community.Area Dropoff.Community.Area
## Min. :1.703e+10 Min. : 1.00 Min. : 1.00
## 1st Qu.:1.703e+10 1st Qu.: 8.00 1st Qu.: 8.00
## Median :1.703e+10 Median : 8.00 Median : 8.00
## Mean :1.703e+10 Mean :21.58 Mean :21.19
## 3rd Qu.:1.703e+10 3rd Qu.:32.00 3rd Qu.:32.00
## Max. :1.703e+10 Max. :77.00 Max. :77.00
##
## Fare Tips Tolls Extras
## $6.25 : 2162 $0.00 :40362 $0.00 :60694 $0.00 :34234
## $5.25 : 2041 $2.00 : 7355 $50.00 : 5 $1.00 :13299
## $5.85 : 1835 $3.00 : 2472 $1.50 : 3 $2.00 : 6253
## $5.65 : 1829 $1.00 : 2234 $2.00 : 2 $1.50 : 3483
## $5.45 : 1817 $4.00 : 595 $3.00 : 2 $3.00 : 1376
## $6.05 : 1804 $5.00 : 520 : 1 $4.00 : 657
## (Other):49223 (Other): 7173 (Other): 4 (Other): 1409
## Trip.Total Payment.Type
## $7.25 : 1540 Cash :39135
## $6.25 : 1455 Credit Card:21147
## $6.65 : 1361 Dispute : 37
## $7.05 : 1253 No Charge : 305
## $6.45 : 1251 Pcard : 9
## $7.45 : 1228 Prcard : 3
## (Other):52623 Unknown : 75
## Company Pickup.Centroid.Latitude
## :20771 Min. :41.74
## Taxi Affiliation Services :20403 1st Qu.:41.88
## Dispatch Taxi Affiliation : 6685 Median :41.89
## Blue Ribbon Taxi Association Inc.: 4589 Mean :41.90
## Choice Taxi Association : 3784 3rd Qu.:41.90
## Northwest Management LLC : 2236 Max. :42.02
## (Other) : 2243
## Pickup.Centroid.Longitude Pickup.Centroid.Location
## Min. :-87.90 POINT (-87.632746 41.880994): 8461
## 1st Qu.:-87.64 POINT (-87.620993 41.884987): 4963
## Median :-87.63 POINT (-87.626215 41.892508): 3787
## Mean :-87.65 POINT (-87.631864 41.892042): 3647
## 3rd Qu.:-87.62 POINT (-87.90304 41.979071) : 3016
## Max. :-87.58 POINT (-87.642649 41.879255): 2780
## (Other) :34057
## Dropoff.Centroid.Latitude Dropoff.Centroid.Longitude
## Min. :41.74 Min. :-87.90
## 1st Qu.:41.88 1st Qu.:-87.64
## Median :41.89 Median :-87.63
## Mean :41.90 Mean :-87.65
## 3rd Qu.:41.90 3rd Qu.:-87.62
## Max. :42.02 Max. :-87.58
##
## Dropoff.Centroid..Location Community.Areas
## POINT (-87.632746 41.880994): 7602 Min. : 1.00
## POINT (-87.620993 41.884987): 4394 1st Qu.:37.00
## POINT (-87.631864 41.892042): 3062 Median :37.00
## POINT (-87.626215 41.892508): 3057 Mean :40.71
## POINT (-87.642649 41.879255): 2696 3rd Qu.:38.00
## POINT (-87.90304 41.979071) : 2435 Max. :76.00
## (Other) :37465
The data covers the variables that describe taxi rides, and it is both quantitative and qualitative. There are 24 columns and 10,000 rows including the column titles. The dataset could be used to look for patterns and correlations among the variables. It includes miles driven, tips, tolls, extras, latitude and longitude, and much more.
Many relationships can be drawn from the data, but the most noteworthy is the relationship between costs. The total cost depends on the fare, tips, tolls, and extras: together, these variables make up the total cost. Another important relationship is between the fare and both the trip miles and the trip seconds, since taxi fares are based on the total distance and the total time spent in the taxi.
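The cost relationship can be checked directly. The sketch below uses the six rows printed by `head(mydata)`, typed in as numbers with the dollar signs already removed; `trips` and `computed` are hypothetical names, and the check covers only these six rows, not the whole dataset.

```r
# Cost columns from the six rows shown in the head() output
trips <- data.frame(
  Fare       = c(7.05, 6.05, 31.25, 5.50, 30.45, 10.45),
  Tips       = c(0, 0, 0, 0, 6.49, 0),
  Tolls      = 0,
  Extras     = c(1.50, 0, 3.00, 0, 2.00, 0),
  Trip.Total = c(8.55, 6.05, 34.25, 5.50, 38.94, 10.45)
)

# Sum of the components for each trip
computed <- with(trips, Fare + Tips + Tolls + Extras)

# all.equal() allows for floating-point rounding in the comparison
isTRUE(all.equal(computed, trips$Trip.Total))
```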
knitr::include_graphics("erdplus-diagram.png")