EXPLORING UBER TRIPS

How far does Uber take you?

Neeraj Sehrawat, Student ID: s3711712
Ria Talwar, Student ID: s3729618
Radhika Santosh Zawar, Student ID: s3734939

Last updated: 28 October, 2018

Introduction

Problem Statement

Data

Descriptive Statistics and Visualisation

summary(uber$MILES.)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.50    2.90    6.00   10.57   10.40  310.30
sd(uber$MILES.)
## [1] 21.57911

Descriptive Statistics and Visualisation Contd:

boxplot(uber$MILES.)

Descriptive Statistics and Visualisation Contd:

library(outliers)  # provides scores()
library(magrittr)  # provides the %>% pipe
zscores_miles <- uber$MILES. %>% scores(type = "z")
zscores_miles %>% summary()
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -0.466509 -0.355290 -0.211632  0.000000 -0.007732 13.889971
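For reference, the z-score standardises each value by the sample mean and standard deviation. Plugging in the summary statistics reported above roughly reproduces the minimum and maximum z-scores. The small differences come from the rounded mean.

\[
z_i = \frac{x_i - \bar{x}}{s}, \qquad
z_{\min} \approx \frac{0.50 - 10.57}{21.58} \approx -0.467, \qquad
z_{\max} \approx \frac{310.30 - 10.57}{21.58} \approx 13.89
\]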

The number of outliers by the Z-score criterion (\(|z| > 3\)) is small: 22 out of 1155 observations, or about 1.9%. These long trips appear to be a genuine pattern in the data, and the reasons behind them are uncertain, so they are not removed.

length(which(abs(zscores_miles) > 3))
## [1] 22
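The |z| > 3 rule can also be applied without the outliers package. The sketch below uses base R's `scale()`, which centres by the sample mean and divides by the sample standard deviation, and checks it against a hand-computed z-score. The `miles` vector is a simulated right-skewed sample for illustration only; it is not the Uber data.

```r
# Hypothetical right-skewed trip lengths (log-normal), NOT the Uber data
set.seed(2018)
miles <- rlnorm(1155, meanlog = log(6), sdlog = 0.9)

# z-scores by hand: (x - mean) / sd, where sd() uses the n - 1 denominator
z_manual <- (miles - mean(miles)) / sd(miles)

# scale() returns a one-column matrix; as.vector() drops the matrix attributes
z_scale <- as.vector(scale(miles))

# Count and percentage of observations more than 3 SDs from the mean
n_out   <- sum(abs(z_manual) > 3)
pct_out <- 100 * n_out / length(miles)
n_out
round(pct_out, 1)
```

`sum()` on a logical vector counts the `TRUE` values, so it gives the same count as `length(which(...))`.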

Descriptive Statistics and Visualisation Contd: (preprocessing of data)

hist(uber$MILES., col = "yellow")

Descriptive Statistics and Visualisation Contd: (transformation of data)

uber$MILES_transformed <- log(uber$MILES.)


par(mfrow = c(1, 2))  # one row, two panels: before and after the log transform
hist(uber$MILES., col = "yellow", main = "Histogram for Uber Miles", xlab = "Uber Miles Before")
hist(uber$MILES_transformed, col = "yellow", main = "Histogram for Transformed Uber Miles", xlab = "Uber Miles After Transformation")
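The log transformation is defined here because the smallest trip in the summary above is 0.50 miles. As a rough numeric companion to the two histograms, the sketch below measures skewness before and after taking logs. It makes two assumptions:

- The data are a simulated log-normal sample (hypothetical, not the Uber data).
- Skewness is computed with a hand-written, moment-based function, which is one of several common definitions.

```r
# Hypothetical log-normal sample standing in for trip miles (NOT the Uber data)
set.seed(2018)
miles <- rlnorm(1000, meanlog = log(6), sdlog = 0.9)

# Moment-based sample skewness: mean cubed deviation / (mean squared deviation)^1.5
# (hand-rolled to avoid a package dependency)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew_raw <- skewness(miles)
skew_log <- skewness(log(miles))  # log() requires strictly positive values
round(c(raw = skew_raw, log = skew_log), 2)
```

A strongly positive value before the transformation and a value near zero afterwards matches the shift from a right-skewed histogram to a roughly symmetric one.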

Hypothesis Testing

\(H_0: \mu = \log 6 \approx 1.791759\)

\(H_A: \mu \ne \log 6\)

where \(\mu\) is the population mean of the log-transformed trip length (log miles).

log(6)
## [1] 1.791759
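Because the test is run on the log scale, \(\mu\) is the population mean of log(miles), not the mean trip length in miles. Exponentiating the mean of the logs gives the geometric mean of the original values. The hypotheses therefore ask whether the typical (geometric-mean) trip length is 6 miles:

\[
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} \log x_i
\;\Longrightarrow\;
e^{\bar{y}} = \left(\prod_{i=1}^{n} x_i\right)^{1/n},
\qquad
H_0: \mu = \log 6 \iff e^{\mu} = 6 \text{ miles}
\]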

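The one-sample t-test assumes that:

  1. The population standard deviation is unknown, which is true in this case.

  2. The data are approximately normally distributed (here, the log-transformed miles). With n = 1155, the test is also fairly robust to moderate departures from normality.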
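Assumption 2 can be checked with a normal Q-Q plot or a Shapiro-Wilk test on the transformed values. The sketch below runs base R's `shapiro.test()`, which accepts 3 to 5000 observations, on a simulated log-normal sample before and after taking logs. The data are hypothetical, not the Uber trips. With large samples the test flags even small departures from normality, so the Q-Q plot is often the more informative of the two.

```r
# Hypothetical log-normal trip lengths (NOT the Uber data)
set.seed(2018)
miles <- rlnorm(500, meanlog = log(6), sdlog = 0.9)

# Shapiro-Wilk test: W near 1 and a large p-value are consistent with normality
sw_raw <- shapiro.test(miles)
sw_log <- shapiro.test(log(miles))
c(W_raw = unname(sw_raw$statistic), W_log = unname(sw_log$statistic))
c(p_raw = sw_raw$p.value, p_log = sw_log$p.value)

# Visual check: points close to the reference line suggest approximate normality
par(mfrow = c(1, 2))
qqnorm(miles, main = "Raw miles")
qqline(miles)
qqnorm(log(miles), main = "log(miles)")
qqline(log(miles))
```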

Hypothesis Testing Contd:

\(\alpha = 0.05\)

We define a result as “unusual” if, assuming \(H_0\) is true, there is less than a 5% chance of observing that result or a more extreme one. This 5% threshold is the significance level (\(\alpha\)) of the test.

Reject \(H_0\):

if the t-statistic falls outside the critical values -1.96 and +1.96

if the p-value < 0.05 (the significance level \(\alpha\))

if the 95% CI for the mean of the log-transformed miles does not capture \(\mu_0 = \log 6 \approx 1.791759\)

otherwise, fail to reject \(H_0\)
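For a two-sided test, the three rules always give the same decision because they are built from the same standard error. Define the following:

- \(\bar{y}\) is the sample mean of the log-transformed miles.
- \(s\) is its standard deviation.
- \(n = 1155\) is the sample size.
- \(\mu_0 = \log 6\) is the hypothesised mean.

\[
t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}, \qquad
p = 2\,P\!\left(T_{n-1} \ge |t|\right), \qquad
\text{CI}_{95\%} = \bar{y} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
\]

Then \(|t| > t_{0.975,\,n-1}\), \(p < 0.05\), and \(\mu_0 \notin \text{CI}_{95\%}\) are equivalent statements.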

Hypothesis Testing Contd:

qt(p = 0.05/2, df = 1155-1, lower.tail = TRUE)
## [1] -1.962022

The one-sample t-test is simple to run in R with t.test(). It reports everything the three methods need: the t-statistic for the critical-value method, the p-value, and the confidence interval. We can therefore decide whether to reject or fail to reject the null hypothesis using the decision rules above.

t.test(uber$MILES_transformed, mu = log(6), alternative="two.sided")
## 
##  One Sample t-test
## 
## data:  uber$MILES_transformed
## t = -1.374, df = 1154, p-value = 0.1697
## alternative hypothesis: true mean is not equal to 1.791759
## 95 percent confidence interval:
##  1.69446 1.80891
## sample estimates:
## mean of x 
##  1.751685
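To show what `t.test()` computes, the sketch below reproduces the one-sample t statistic, two-sided p-value and 95% confidence interval by hand, then compares them with the built-in function. The vector `y` is a simulated stand-in for log-transformed miles (hypothetical data, not the Uber trips).

```r
# Hypothetical log-transformed trip lengths (NOT the Uber data)
set.seed(2018)
y   <- rnorm(1155, mean = 1.75, sd = 1)
mu0 <- log(6)
n   <- length(y)

# One-sample t statistic, two-sided p-value and 95% CI by hand
se      <- sd(y) / sqrt(n)
t_stat  <- (mean(y) - mu0) / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)
ci      <- mean(y) + c(-1, 1) * qt(0.975, df = n - 1) * se

# Compare with R's built-in test
tt <- t.test(y, mu = mu0, alternative = "two.sided")
c(t_by_hand = t_stat, t_from_t.test = unname(tt$statistic))
```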

Hypothesis Testing Contd:

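t-statistic: -1.374, which lies within the critical values ±1.962, so we fail to reject \(H_0\)

P-value: 0.1697 (about 0.17), which is greater than \(\alpha = 0.05\), so we fail to reject \(H_0\)

Confidence interval: lower bound 1.69, upper bound 1.81, which captures \(\log 6 \approx 1.79\), so we fail to reject \(H_0\)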
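The three decision rules can be applied directly to the reported numbers. The sketch below copies the rounded values from the `t.test()` output above, so the recomputed p-value only matches to about three decimals. It then exponentiates the log-scale estimate and interval, which expresses them as a geometric mean in miles.

```r
# Values copied from the t.test() output above (rounded)
t_stat   <- -1.374
dof      <- 1154
ci_log   <- c(1.69446, 1.80891)
mean_log <- 1.751685
mu0      <- log(6)
alpha    <- 0.05

# Rule 1: |t| beyond the two-sided critical value
t_crit   <- qt(1 - alpha / 2, dof)
reject_t <- abs(t_stat) > t_crit

# Rule 2: two-sided p-value below alpha
p_value  <- 2 * pt(-abs(t_stat), dof)
reject_p <- p_value < alpha

# Rule 3: hypothesised value outside the 95% CI
reject_ci <- mu0 < ci_log[1] || mu0 > ci_log[2]

c(reject_t = reject_t, reject_p = reject_p, reject_ci = reject_ci)  # all FALSE

# Back-transform to miles: exp(mean of logs) is the geometric mean
round(exp(c(estimate = mean_log, lower = ci_log[1], upper = ci_log[2])), 2)
```

On the mile scale the estimate is a geometric mean of roughly 5.76 miles, with an interval of about 5.44 to 6.10 miles, which contains 6.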

Discussion

Discussion (contd):

Strengths:

  1. The data were collected from three geographically and demographically different countries, namely the USA, Pakistan and Sri Lanka.

  2. The data cover one year, January to December 2016, which helps account for temporary fluctuations due to external factors such as weather, vacations and festivals.

  3. Random people were picked up from random locations, taking Uber for various purposes.

Limitations:

Discussion (contd):

Direction for future investigations:

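The hypothesis test shows that the sample mean of the log-transformed trip length is not unusual. That mean is 1.751685, corresponding to a geometric mean of about 5.76 miles; the untransformed sample mean is 10.57 miles. We therefore fail to reject \(H_0\). The data are consistent with a population whose mean log trip length is \(\log 6\), that is, a typical (geometric-mean) trip length of about 6 miles.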

References