We present an analysis of taxi rider tipping behavior for New York City Yellow Cab trips during June 2015. The data were obtained from the NYC Taxi and Limousine Commission, which publishes its trip records online.
The dataset contains more than 12.3 million rides. Its 19 recorded variables include fare, distance and tip values, as well as the latitude and longitude of pickup and dropoff locations.
The data were cleaned to eliminate nonsensical values for distance and fare amounts.
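The exact cleaning rules are not given above, so the following is only a minimal sketch of what such a filter might look like. The upper limits are hypothetical values chosen for illustration, `fare_amount` is an assumed column name, and the toy data frame stands in for the real records.

```r
# Toy stand-in for the raw trip records; the real data have millions of rows.
raw <- data.frame(
  trip_distance = c(1.2, 0, 3.5, -2.0, 850, 17.9),
  fare_amount   = c(7.5, 5.0, 14.0, 9.0, 52.0, -52.0)
)

# Hypothetical plausibility limits, chosen for illustration only.
max_distance_miles <- 100
max_fare_dollars   <- 500

# Keep only rows whose distance and fare are positive and below the limits.
clean_trips <- function(trips) {
  keep <- trips$trip_distance > 0 &
    trips$trip_distance <= max_distance_miles &
    trips$fare_amount > 0 &
    trips$fare_amount <= max_fare_dollars
  trips[keep, , drop = FALSE]
}

cleaned <- clean_trips(raw)
nrow(cleaned)
```

Only the rows with a positive, plausible distance and fare survive; the thresholds would need to be tuned against the real data.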
Here we plot a histogram of trip distances and calculate the mean and median distance in miles. Note the presence of local maxima in the regions where the trip distance is within a range approximating the driving distance from Manhattan to two local airports, JFK International and LaGuardia.
```r
mean(taxi.with.loc$trip_distance)
## [1] 3.016046
median(taxi.with.loc$trip_distance)
## [1] 1.75
```
Here we calculate the percent tip as \[\text{percent tip} = \frac{\text{tip}}{\text{total}} \times 100\%\] and make plots from 100,000 observations sampled from the tidied data.
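As a rough illustration of this step, the sketch below computes the percent tip and draws a random sample. The column names `tip` and `total` and the toy data are assumptions, and the sample size is shrunk from 100,000 to 3 so the example runs on five rows.

```r
# Percent tip, following the formula in the text.
percent_tip <- function(tip, total) {
  tip / total * 100
}

# Toy data; `tip` and `total` are assumed column names, not the report's own.
rides <- data.frame(
  total = c(7, 10, 20, 40, 12.5),
  tip   = c(3, 2, 4, 0, 2.5)
)
rides$pct_tip <- percent_tip(rides$tip, rides$total)

# Simple random sample without replacement (100,000 rows in the report).
set.seed(1)
sample_size <- 3
sampled <- rides[sample(nrow(rides), sample_size), ]
```

Fixing the seed makes the sample reproducible, which helps when the plots are regenerated.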
The histogram of tip percentages shows, remarkably, that the number of riders who tip in the customary range of 15%-20% is just about equal to the number who do not add any tip at all.
When we plot percent tip versus total amount, more characteristics emerge. First, there appear to be two different styles of tipping. One is to tip a constant percentage regardless of the fare, represented by the horizontal lines of observations. The other style is to reduce the percent tip as the total amount paid for the ride [1] increases, shown by the observations that lie along the family of concave curves.
There could be several reasons for this characteristic. When the fare is low, such as $7, a rider may be more inclined to say ‘keep the change.’ In this case, if the rider gave the driver a ten dollar bill, the resulting tip would be 43%. Furthermore, as the fare increases, the absolute dollar amount of a given tip may already seem to meet or exceed a fair gratuity, so the rider may feel less need to scale the tip with the fare.
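The "keep the change" arithmetic can be checked with a short sketch. The function name and the extra fares of $8 and $9 are illustrative additions; only the $7 fare and the ten dollar bill come from the text.

```r
# Percent tip when the rider hands over `bill` dollars and takes no change.
keep_the_change_pct <- function(total, bill) {
  (bill - total) / total * 100
}

totals <- c(7, 8, 9)
round(keep_the_change_pct(totals, bill = 10))
## [1] 43 25 11
```

The same bill yields a rapidly shrinking percentage as the fare rises, which matches the downward-sloping pattern described above.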
Second, the data confirm the prevalence of the customary tipping percentages. We see many observations at 20% and another group at around 16.5%. A third, more generous group, falls on a line near 23%.
Finally, we see once again a prominent collection of observations at 0%.
Here the log-log plot of tip percent versus total amount shows more clearly that tipping strategies fall into two groups: tipping at a constant rate and tipping in inverse proportion to the total amount. In this plot, the groups appear as two families of lines, one with zero slope that corresponds to constant-rate tips and another with negative slope that corresponds to inverse-proportion tips. The negative slope gives the exponent of the relationship between percent tip and total amount, which is very nearly -1.
Because withholding a tip is ordinarily a signal of poor service, we hypothesize that among the 0 percent tippers there may be a preponderance of people who are unfamiliar with the local tipping customs, such as foreign tourists.
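One reading that would produce a slope of -1 is that these riders leave a fixed dollar amount regardless of the fare. This is an assumption for illustration, not something the data establish directly. Writing \(p\) for the percent tip, \(A\) for the total amount and \(T\) for the fixed dollar tip (symbols introduced here):

\[
p = 100\,\frac{T}{A}
\quad\Longrightarrow\quad
\log p = \log(100\,T) - \log A
\]

Each value of \(T\) then gives a straight line of slope \(-1\) with intercept \(\log(100\,T)\), which would account for a family of parallel lines. A constant-rate tipper with rate \(c\) has

\[
p = c
\quad\Longrightarrow\quad
\log p = \log c ,
\]

a line of zero slope.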
Visitors (international and domestic) to New York City in 2014 [2]
| Domestic | Foreign | Total |
|---|---|---|
| 44.5 million | 12.0 million | 56.5 million |
The total number of taxi rides in June 2015 ending with a 0% tip was 4,879,656. Using the 2014 statistics and assuming foreign visitors are spread evenly across the year (about 1.0 million per month), attributing all these trips to foreign tourists would imply an average of 4.88 trips per tourist, or between 2 and 3 round trips per person. This is not entirely unreasonable. However, tourists often travel in groups and the NYC subway is an attraction in itself, so foreign tourists alone are unlikely to account for every zero-tip ride. Surely some domestic tourists and native New Yorkers are among the non-tipping riders.
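The back-of-the-envelope figure can be reproduced as follows. The variable names are illustrative; the inputs are the numbers quoted in the text and table.

```r
zero_tip_rides      <- 4879656  # zero-percent-tip rides in June 2015 (from the text)
foreign_visitors_yr <- 12.0e6   # foreign visitors in 2014 (from the table)

# Assumption stated in the text: visitors are spread evenly over 12 months.
foreign_visitors_month <- foreign_visitors_yr / 12

trips_per_tourist <- zero_tip_rides / foreign_visitors_month
round_trips       <- trips_per_tourist / 2

round(trips_per_tourist, 2)
## [1] 4.88
round(round_trips, 2)
## [1] 2.44
```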
Below we plot the rates of zero-percent tip rides for selected areas that should be biased toward heavy tourist populations. We use the latitude and longitude of the pickup and dropoff locations to determine whether a trip originates or ends in one of three areas: Times Square, JFK International Airport, or LaGuardia Airport. LaGuardia is primarily a domestic airport, with shuttle service accommodating business commuters.
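A minimal sketch of how such an area test might be written is shown below. The bounding boxes are rough, hypothetical rectangles around each location and not the limits used in this analysis. The coordinate column names and `tip_amount` are assumed, and the three toy trips exist only to make the example runnable.

```r
# Hypothetical bounding boxes in degrees; rough illustrations only.
areas <- list(
  times_square = c(lat_min = 40.754, lat_max = 40.762,
                   lon_min = -73.990, lon_max = -73.982),
  jfk          = c(lat_min = 40.620, lat_max = 40.670,
                   lon_min = -73.830, lon_max = -73.740),
  laguardia    = c(lat_min = 40.765, lat_max = 40.785,
                   lon_min = -73.890, lon_max = -73.855)
)

# TRUE where a point lies inside the box.
in_box <- function(lat, lon, box) {
  lat >= box[["lat_min"]] & lat <= box[["lat_max"]] &
    lon >= box[["lon_min"]] & lon <= box[["lon_max"]]
}

# A trip touches an area if its pickup or its dropoff falls inside the box.
touches_area <- function(trips, box) {
  in_box(trips$pickup_latitude, trips$pickup_longitude, box) |
    in_box(trips$dropoff_latitude, trips$dropoff_longitude, box)
}

# Share of trips touching the area that recorded no tip.
zero_tip_rate <- function(trips, box) {
  hit <- touches_area(trips, box)
  mean(trips$tip_amount[hit] == 0)
}

# Toy trips: one from Times Square, one to LaGuardia, one from JFK.
trips <- data.frame(
  pickup_latitude   = c(40.758, 40.700, 40.641),
  pickup_longitude  = c(-73.985, -73.950, -73.778),
  dropoff_latitude  = c(40.730, 40.776, 40.710),
  dropoff_longitude = c(-73.990, -73.874, -74.000),
  tip_amount        = c(0, 2.5, 0)
)

rates <- sapply(areas, function(box) zero_tip_rate(trips, box))
```

A rectangle is a crude approximation of an airport or a neighborhood, so real use would call for tighter boxes or proper polygons.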
We see that the rates for JFK and Times Square exceed the rate for All Trips, but that LaGuardia is lower. This lends support to the proposition that tourists are over-represented in the non-tipping population.
We have seen evidence that tipping styles fall into two classes: tipping at a constant rate and tipping in inverse proportion to the total amount paid. We can’t conclude that these represent two distinct groups of people, although this is a possibility. It could be that riders choose between the two strategies on a ride-by-ride basis. However, the fact that the trend is seen even at low fare amounts does suggest that there may be separate groups represented by these styles.
Our examination also highlighted the high proportion of zero-percent tippers. We showed some evidence supporting the conjecture that tourists account for many of these rides.
[1] The total amount includes fare, taxes, fees and tolls.