[Dataset] (https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/boot/amis.csv)

Description

In a study into the effect that warning signs have on speeding patterns, Cambridgeshire County Council considered 14 pairs of locations. The locations were paired to account for factors such as traffic volume and type of road. One site in each pair had a sign erected warning of the dangers of speeding and asking drivers to slow down. No action was taken at the second site. Three sets of measurements were taken at each site. Each set of measurements was nominally of the speeds of 100 cars but not all sites have exactly 100 measurements. These speed measurements were taken before the erection of the sign, shortly after the erection of the sign, and again after the sign had been in place for some time.

Usage

amis

Format

This data frame contains the following columns:

speed
Speeds of cars (in miles per hour).

period
A numeric column indicating the time that the reading was taken. A value of 1 indicates a reading taken before the sign was erected, a 2 indicates a reading taken shortly after erection of the sign and a 3 indicates a reading taken after the sign had been in place for some time.

warning
A numeric column indicating whether the location of the reading was chosen to have a warning sign erected. A value of 1 indicates presence of a sign and a value of 2 indicates that no sign was erected.

pair
A numeric column giving the pair number at which the reading was taken. Pairs were numbered from 1 to 14.

#install.packages("RCurl")
library(RCurl)
gitHubUrl <- getURL("https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/boot/amis.csv")
speedingPattern <- read.csv(text = gitHubUrl)
head(speedingPattern)
##   X speed period warning pair
## 1 1    26      1       1    1
## 2 2    26      1       1    1
## 3 3    26      1       1    1
## 4 4    26      1       1    1
## 5 5    27      1       1    1
## 6 6    28      1       1    1
str(speedingPattern)
## 'data.frame':    8437 obs. of  5 variables:
##  $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ speed  : int  26 26 26 26 27 28 28 28 28 29 ...
##  $ period : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ warning: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pair   : int  1 1 1 1 1 1 1 1 1 1 ...
summary(speedingPattern)
##        X            speed           period         warning     
##  Min.   :   1   Min.   :19.00   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2110   1st Qu.:33.00   1st Qu.:1.000   1st Qu.:1.000  
##  Median :4219   Median :37.00   Median :2.000   Median :2.000  
##  Mean   :4219   Mean   :37.82   Mean   :2.004   Mean   :1.507  
##  3rd Qu.:6328   3rd Qu.:42.00   3rd Qu.:3.000   3rd Qu.:2.000  
##  Max.   :8437   Max.   :67.00   Max.   :3.000   Max.   :2.000  
##       pair       
##  Min.   : 1.000  
##  1st Qu.: 4.000  
##  Median : 8.000  
##  Mean   : 7.559  
##  3rd Qu.:11.000  
##  Max.   :14.000

The dataset contains speeding measurements taken at different locations (pairs) which are numbered from 1 to 14. For each pair some measurements don’t have any warning sign and some have warning sign. Now let’s analyze the speed for various situations for location 1.

Dataset for pair 1 with no warning sign

no_warning_pair_1 <- speedingPattern[which(speedingPattern$warning == 2 & speedingPattern$pair == 1), ]
summary(no_warning_pair_1)
##        X              speed           period     warning       pair  
##  Min.   : 201.0   Min.   :21.00   Min.   :1   Min.   :2   Min.   :1  
##  1st Qu.: 275.8   1st Qu.:32.00   1st Qu.:1   1st Qu.:2   1st Qu.:1  
##  Median : 350.5   Median :36.00   Median :2   Median :2   Median :1  
##  Mean   :2117.2   Mean   :36.36   Mean   :2   Mean   :2   Mean   :1  
##  3rd Qu.:5725.2   3rd Qu.:40.00   3rd Qu.:3   3rd Qu.:2   3rd Qu.:1  
##  Max.   :5800.0   Max.   :56.00   Max.   :3   Max.   :2   Max.   :1

Dataset for pair 1 before warning sign

before_warning_pair_1 <- speedingPattern[which(speedingPattern$period == 1 & speedingPattern$warning == 1 & speedingPattern$pair == 1), ]
summary(before_warning_pair_1)
##        X              speed           period     warning       pair  
##  Min.   :  1.00   Min.   :26.00   Min.   :1   Min.   :1   Min.   :1  
##  1st Qu.: 25.75   1st Qu.:31.00   1st Qu.:1   1st Qu.:1   1st Qu.:1  
##  Median : 50.50   Median :34.00   Median :1   Median :1   Median :1  
##  Mean   : 50.50   Mean   :34.53   Mean   :1   Mean   :1   Mean   :1  
##  3rd Qu.: 75.25   3rd Qu.:37.25   3rd Qu.:1   3rd Qu.:1   3rd Qu.:1  
##  Max.   :100.00   Max.   :49.00   Max.   :1   Max.   :1   Max.   :1

Dataset for pair 1 after warning sign. Here the data set contains speed measurements right after the warning sign is erected and reading taken after the sign had been in place for some time

after_warning_pair_1 <- speedingPattern[which((speedingPattern$period == 2 | speedingPattern$period == 3) & speedingPattern$warning == 1 & speedingPattern$pair == 1), ]
summary(after_warning_pair_1)
##        X              speed           period       warning       pair  
##  Min.   : 101.0   Min.   :21.00   Min.   :2.0   Min.   :1   Min.   :1  
##  1st Qu.: 150.8   1st Qu.:32.00   1st Qu.:2.0   1st Qu.:1   1st Qu.:1  
##  Median :2900.5   Median :35.00   Median :2.5   Median :1   Median :1  
##  Mean   :2900.5   Mean   :35.37   Mean   :2.5   Mean   :1   Mean   :1  
##  3rd Qu.:5650.2   3rd Qu.:38.00   3rd Qu.:3.0   3rd Qu.:1   3rd Qu.:1  
##  Max.   :5700.0   Max.   :51.00   Max.   :3.0   Max.   :1   Max.   :1

Let’s plot the speed for various situations for pair 1

Box plot for data set with no warning

require(ggplot2)
ggplot(no_warning_pair_1,aes(y=speed, x=1)) + geom_boxplot()

Box plot for data set with before warning

ggplot(before_warning_pair_1,aes(y=speed, x=1)) + geom_boxplot()

Box plot for data set with after warning

ggplot(after_warning_pair_1,aes(y=speed, x=1)) + geom_boxplot()

Histogram for speed of all pairs without warning sign

speedingPatternNoWarning <- speedingPattern[which(speedingPattern$warning == 2), ]
ggplot(data=speedingPatternNoWarning) + geom_histogram(aes(x=speed))

Histogram for speed of all pairs before warning sign

speedingPatternBeforeWarning <- speedingPattern[which(speedingPattern$warning == 1 & speedingPattern$period == 1), ]
ggplot(data=speedingPatternBeforeWarning) + geom_histogram(aes(x=speed))

Histogram for speed of all pairs after warning sign

speedingPatternAfterWarning <- speedingPattern[which(speedingPattern$warning == 1 & (speedingPattern$period == 2 | speedingPattern$period == 3)), ]
ggplot(data=speedingPatternAfterWarning) + geom_histogram(aes(x=speed))

Scatterplot for pair vs speed for different periods

ggplot(speedingPattern, aes(x=pair, y=speed)) + geom_point(aes(color=period))

Barplot of speed vs count

ggplot(speedingPattern, aes(x=factor(speed))) + geom_bar(stat="count", width=0.7, fill="steelblue") + theme_minimal()

Analysis

We need to analyze if erecting warning signs has any effect on the speed of the vehicles. To do that we need to create a new data frame for mean speeds.

pairs <- unique(speedingPattern$pair)
no_warning <- rep(0, length(pairs))
before_warning <- rep(0, length(pairs))
after_warning <- rep(0, length(pairs))
for (i in pairs) {
        no_warning[i] <- mean(speedingPattern[which(speedingPattern$warning == 2 & speedingPattern$pair == i), "speed"])  
        before_warning[i] <- mean(speedingPattern[which(speedingPattern$period == 1 & speedingPattern$warning == 1 & speedingPattern$pair == i), "speed"])
        after_warning[i] <- mean(speedingPattern[which((speedingPattern$period == 2 | speedingPattern$period == 3) & speedingPattern$warning == 1 & speedingPattern$pair == i), "speed"])
}
mean_speeds <- data.frame(pair = pairs, no_warning = no_warning, before_warning = before_warning, after_warning = after_warning)
print(mean_speeds)
##    pair no_warning before_warning after_warning
## 1     1   36.36000          34.53      35.37000
## 2     2   33.69000          38.95      38.93000
## 3     3   37.66667          32.22      32.69543
## 4     4   36.85614          36.38      35.73077
## 5     5   39.35000          36.61      36.61749
## 6     6   39.63333          40.13      37.31500
## 7     7   41.92667          29.86      28.78000
## 8     8   38.35667          38.35      40.15500
## 9     9   38.81000          38.05      36.46500
## 10   10   35.83103          33.95      35.47000
## 11   11   34.37750          34.62      34.87500
## 12   12   42.48667          38.57      37.97500
## 13   14   46.36000          39.62      41.84500
## 14   13   45.32667          39.30      41.46000

We add a new column to the data frame for percentage change in speed before and after signs are erected.

mean_speeds$perc_change <- ((mean_speeds$after_warning - mean_speeds$before_warning) / mean_speeds$before_warning) * 100
print(mean_speeds)
##    pair no_warning before_warning after_warning perc_change
## 1     1   36.36000          34.53      35.37000  2.43266725
## 2     2   33.69000          38.95      38.93000 -0.05134788
## 3     3   37.66667          32.22      32.69543  1.47557875
## 4     4   36.85614          36.38      35.73077 -1.78458155
## 5     5   39.35000          36.61      36.61749  0.02044889
## 6     6   39.63333          40.13      37.31500 -7.01470222
## 7     7   41.92667          29.86      28.78000 -3.61687877
## 8     8   38.35667          38.35      40.15500  4.70664928
## 9     9   38.81000          38.05      36.46500 -4.16557162
## 10   10   35.83103          33.95      35.47000  4.47717231
## 11   11   34.37750          34.62      34.87500  0.73656846
## 12   12   42.48667          38.57      37.97500 -1.54264973
## 13   14   46.36000          39.62      41.84500  5.61585058
## 14   13   45.32667          39.30      41.46000  5.49618321

Let’s see how many locations have reduced speed after warning sign erection

pairs_with_reduced_speed <- nrow(mean_speeds[which(mean_speeds$perc_change < 0), ])
print(pairs_with_reduced_speed)
## [1] 6

What percentage of all locations have an effect on speed (ie reduced speed) with the experiment?

print(nrow(mean_speeds[which(mean_speeds$perc_change < 0), ]) / length(pairs))
## [1] 0.4285714

Which location has maximum reduction on speed?

print(mean_speeds[which.min(mean_speeds$perc_change), ])
##   pair no_warning before_warning after_warning perc_change
## 6    6   39.63333          40.13        37.315   -7.014702

Conclusion

The experiment shows less than half of the locations have shown effect on speed with warning sign. Location 6 has the maximum reduction (7.015 %) in speed after warning sign is erected.