In this post, I am going to talk about the paired-sample t-test. It is a very useful tool when you need to measure an improvement: say, whether a marketing promotion increased sales, where you compare average sales before and after the promotion. Here I am going to check whether reducing the speed limit in NYC lowered the number of accidents.
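To make the idea concrete before turning to the crash data, here is a minimal sketch of the marketing example in base R. The sales figures are invented purely for illustration, and the names `sales_before`, `sales_after`, and `t_manual` are my own. It shows that a paired test is really a test on the per-store differences.

```r
# Hypothetical weekly sales for the same 8 stores, before and after a promotion.
# These numbers are made up for illustration only.
sales_before <- c(120, 95, 130, 110, 105, 98, 140, 115)
sales_after  <- c(128, 99, 127, 118, 111, 97, 150, 121)

# A paired test works on the per-store differences, not on the two group means.
d <- sales_after - sales_before

# Test statistic computed by hand: mean difference divided by its standard error.
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

# The same test through base R's t.test(), passing the two vectors directly.
res <- t.test(sales_after, sales_before, paired = TRUE)

print(t_manual)
print(res)
```

The hand-computed statistic and the one reported by `t.test()` should agree, with degrees of freedom equal to the number of pairs minus one. The sketch passes two vectors rather than a formula because, as far as I know, recent R releases (4.4.0 onward) no longer accept `paired = TRUE` with the formula interface.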
library(tidyverse)
library(reshape2)
# Count crashes per street in the two years before the speed limit change
before_speedlimit <- crash_joined %>%
  filter(`CRASH DATE` >= "2012-11-07" & `CRASH DATE` <= "2014-11-06") %>%
  select(`ON STREET NAME`, `CRASH DATE`) %>%
  group_by(`ON STREET NAME`) %>% tally() %>%
  arrange(`ON STREET NAME`) %>%
  rename(`Frequency_before` = n)
# Count crashes per street in the two years after the speed limit change
after_speedlimit <- crash_joined %>%
  filter(`CRASH DATE` >= "2014-11-07" & `CRASH DATE` <= "2016-11-06") %>%
  select(`ON STREET NAME`, `CRASH DATE`) %>%
  group_by(`ON STREET NAME`) %>% tally() %>%
  arrange(`ON STREET NAME`) %>%
  rename(`Frequency_after` = n)
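# Keep only streets that appear in both periods (merge() defaults to an inner join),
# then reshape to long format: one row per street and period
speedlimit_ttest <- merge(before_speedlimit, after_speedlimit, by="ON STREET NAME") %>%
  filter(!is.na(`ON STREET NAME`)) %>%
  melt(., id.vars=c("ON STREET NAME"), variable.name = "Before and After")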
# Boxplot of per-street crash counts on a log10 scale, before vs. after
ggplot(speedlimit_ttest, aes(x=`Before and After`, y=log10(value))) +
  geom_boxplot(fill='steelblue') +
  labs(title="Accidents per Street Before and After Reducing Speed Limit to 25 MPH", x="Before and After", y="Accidents per Street (log10)") +
  theme_classic()
t.test(speedlimit_ttest$value ~ speedlimit_ttest$`Before and After`, paired= TRUE, conf.level=0.95)
##
## Paired t-test
##
## data: speedlimit_ttest$value by speedlimit_ttest$`Before and After`
## t = 0.23936, df = 5238, p-value = 0.8108
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.235193 1.578770
## sample estimates:
## mean of the differences
## 0.1717885
The boxplot shows no visible difference in the number of accidents per street before and after the speed limit was reduced to 25 MPH in NYC. The paired-sample t-test confirms that. The mean difference is 0.17 accidents per street, which is very minor, and the 95% confidence interval (-1.24 to 1.58) includes zero. With a p-value of 0.81, we fail to reject the null hypothesis, so there is no significant difference between the averages of the two periods.
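As a sanity check on how these numbers fit together, the following sketch rebuilds the confidence interval and p-value from the reported mean difference, t statistic, and degrees of freedom. It only uses values copied from the output above, and the variable names are my own. Because the printed values are rounded, expect agreement to about three decimal places rather than exactly.

```r
# Values copied from the paired t-test output above.
mean_diff <- 0.1717885
t_stat    <- 0.23936
df        <- 5238
ci        <- c(-1.235193, 1.578770)

# For a paired test df = n - 1, so this is the number of paired streets.
n_pairs <- df + 1

# Standard error implied by t = mean_diff / se.
se <- mean_diff / t_stat

# Rebuild the 95% confidence interval from the mean difference and the SE.
ci_rebuilt <- mean_diff + c(-1, 1) * qt(0.975, df) * se

# Two-sided p-value from the t statistic.
p_value <- 2 * pt(-abs(t_stat), df)

print(n_pairs)
print(round(ci_rebuilt, 3))
print(round(p_value, 4))
```

The rebuilt interval spans zero, which is the same conclusion as the large p-value: the data are consistent with no change in the average number of accidents per street.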