# Load Libraries
library(tidyverse)
library(knitr)
# Read data
super <- read.csv("2013Coupes.csv")
dat <- filter(super,Price < 100000)
# Find Mean, Median and Standard Deviation of data w/o supercar and assign it to a variable name
reg <- dat %>%
  summarise(Mean = mean(Price), Median = median(Price), Standard_Dev = sd(Price))
# Find Mean, Median and Standard Deviation of data w/ supercar and assign it to a variable name
wporche <- super %>%
  summarise(Mean = mean(Price), Median = median(Price), Standard_Dev = sd(Price))
# Bind the two summaries into one table (row 1: without supercar, row 2: with supercar)
w2 <- rbind(reg,wporche)
#Display Summary Statistics
kable(w2)
| Mean | Median | Standard_Dev |
|---|---|---|
| 21145.8 | 20992 | 6193.661 |
| 155587.1 | 21807 | 445930.032 |
Row 1 shows the summary statistics for the data set without the supercar, and row 2 shows them for the data set with the supercar. Three checks help show how well the mean represents each data set: a skew check, an outlier test, and a standard deviation test.
Data sets are typically skewed left if mean < median < mode, and skewed right if mode < median < mean. In the data set without the supercar, the mean (21145.8) is slightly greater than the median (20992), suggesting the data is skewed slightly to the right. The same holds for the data set with the supercar, but there the mean (155587.1) is more than 7 times the median (21807), indicating the data is skewed far to the right.
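The mean-versus-median comparison above can be sketched in a few lines of base R. This is only an illustration: the prices are copied from the full data table at the end of this document, and the names `price_all`, `price_reg`, and `skew_check` are hypothetical, not part of the original analysis.

```r
# Prices copied from the full data table; the last value is the supercar.
price_all <- c(21807, 27795, 29145, 14403, 17209, 25732,
               13674, 13774, 27742, 20177, 1500000)
price_reg <- price_all[price_all < 100000]  # same cutoff used to build `dat`

# Hypothetical helper: compares the mean with the median.
# A positive difference (ratio above 1) suggests a right skew.
skew_check <- function(x) {
  c(mean = mean(x),
    median = median(x),
    difference = mean(x) - median(x),
    ratio = mean(x) / median(x))
}

skew_table <- rbind(without_supercar = skew_check(price_reg),
                    with_supercar = skew_check(price_all))
print(round(skew_table, 2))
```

The ratio column makes the two cases easy to compare: a value close to 1 indicates a mild skew, while a value several times larger indicates that a few extreme values are pulling the mean upward.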
Potential outliers can be identified in a data set by calculating the IQR, multiplying it by 1.5, and then adding the result to Q3 and subtracting it from Q1. Any values above the upper limit or below the lower limit are potential outliers.
q3outlier_value <- quantile(dat$Price,.75) + IQR(dat$Price) * 1.5
q1outlier_value <- quantile(dat$Price,.25) - IQR(dat$Price) * 1.5
outlier_test <- filter(dat,Price > q3outlier_value | Price < q1outlier_value)
kable(outlier_test)
| Vehicle.Type | Year | Make | Model | Price | MPG..city. | MPG..highway. | Horsepower | Cylinders |
|---|---|---|---|---|---|---|---|---|
The test returned no values, indicating that potential outliers were not identified using this methodology.
sq3outlier_value <- quantile(super$Price,.75) + IQR(super$Price) * 1.5
sq1outlier_value <- quantile(super$Price,.25) - IQR(super$Price) * 1.5
s_outlier_test <- filter(super,Price > sq3outlier_value | Price < sq1outlier_value)
kable(s_outlier_test)
| Vehicle.Type | Year | Make | Model | Price | MPG..city. | MPG..highway. | Horsepower | Cylinders |
|---|---|---|---|---|---|---|---|---|
| Coupe | 2013 | Porsche | 918 Spider | 1500000 | 22 | 22 | 875 | 8 |
The test returned the supercar, identifying it as a potential outlier.
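Because the same outlier test is run twice, the limits could also be wrapped in a small function. The following is a sketch in base R under stated assumptions: `iqr_fences` is a hypothetical helper, the prices are copied from the full data table at the end of this document, and `quantile()` is left at its default method, which is what `IQR()` uses as well.

```r
# Hypothetical helper: lower and upper limits for the 1.5 * IQR rule.
iqr_fences <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3
  spread <- q[2] - q[1]                           # equals IQR(x)
  c(lower = q[1] - k * spread, upper = q[2] + k * spread)
}

# Prices copied from the full data table; the last value is the supercar.
price_all <- c(21807, 27795, 29145, 14403, 17209, 25732,
               13674, 13774, 27742, 20177, 1500000)
price_reg <- price_all[price_all < 100000]

fences_reg <- iqr_fences(price_reg)
fences_all <- iqr_fences(price_all)

# Keep only the prices that fall outside the limits.
outliers_reg <- price_reg[price_reg < fences_reg[["lower"]] |
                          price_reg > fences_reg[["upper"]]]
outliers_all <- price_all[price_all < fences_all[["lower"]] |
                          price_all > fences_all[["upper"]]]

print(outliers_reg)
print(outliers_all)
```

Keeping the multiplier `k` as an argument makes it simple to try a stricter rule, such as 3 times the IQR, without rewriting the test.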
Values in a dataset that are 2 or more standard deviations from the mean are considered “far” from the mean. Any results returned from the test will be considered far from the mean using this methodology.
# Compute the mean and standard deviation from the data instead of hard-coding rounded values
sdtest <- filter(dat, abs(Price - mean(Price)) > 2 * sd(Price))
kable(sdtest)
| Vehicle.Type | Year | Make | Model | Price | MPG..city. | MPG..highway. | Horsepower | Cylinders |
|---|---|---|---|---|---|---|---|---|
None of the data points are considered far from the mean.
# Compute the mean and standard deviation from the data instead of hard-coding rounded values
sdtest2 <- filter(super, abs(Price - mean(Price)) > 2 * sd(Price))
kable(sdtest2)
| Vehicle.Type | Year | Make | Model | Price | MPG..city. | MPG..highway. | Horsepower | Cylinders |
|---|---|---|---|---|---|---|---|---|
| Coupe | 2013 | Porsche | 918 Spider | 1500000 | 22 | 22 | 875 | 8 |
The supercar is far from the mean.
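The two-standard-deviation test can also be expressed as z-scores, which show how far each price sits from the mean rather than only whether it crosses the threshold. This is a minimal base R sketch: the prices are copied from the full data table at the end of this document, and the names `price_all`, `z`, and `far` are hypothetical.

```r
# Prices copied from the full data table; the last value is the supercar.
price_all <- c(21807, 27795, 29145, 14403, 17209, 25732,
               13674, 13774, 27742, 20177, 1500000)

# z-score: distance from the mean in units of the sample standard deviation.
z <- (price_all - mean(price_all)) / sd(price_all)
print(round(z, 2))

# Prices more than two standard deviations from the mean.
far <- price_all[abs(z) > 2]
print(far)
```

Note that the supercar inflates the very standard deviation it is measured against, so the z-scores of the other ten cars end up close together and well inside the threshold.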
The data set without the supercar is skewed a little to the right, no potential outliers were identified, and all values are within two standard deviations of the mean. The data set with the supercar is skewed far to the right, and the supercar was returned as both a potential outlier and far from the mean. I was surprised that the standard deviation changed far more than the mean: adding the supercar raised the mean roughly sevenfold (21145.8 to 155587.1) but raised the standard deviation roughly 72-fold (6193.661 to 445930.032), while the median barely moved. I found this to be a valuable exercise.
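The size of each change can be checked by dividing the statistics with the supercar by the statistics without it. The sketch below uses base R only: the prices are copied from the full data table that follows, and `summary_stats` and `fold_change` are hypothetical names introduced for illustration.

```r
# Prices copied from the full data table; the last value is the supercar.
price_all <- c(21807, 27795, 29145, 14403, 17209, 25732,
               13674, 13774, 27742, 20177, 1500000)
price_reg <- price_all[price_all < 100000]

# Hypothetical helper: the same three statistics used in the summary table.
summary_stats <- function(x) {
  c(Mean = mean(x), Median = median(x), Standard_Dev = sd(x))
}

# Ratio of each statistic with the supercar to the same statistic without it.
fold_change <- summary_stats(price_all) / summary_stats(price_reg)
print(round(fold_change, 1))
```

A ratio near 1 means the statistic is barely affected by the supercar, which is one way to see why the median describes these prices better than the mean does.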
kable(super)
| Vehicle.Type | Year | Make | Model | Price | MPG..city. | MPG..highway. | Horsepower | Cylinders |
|---|---|---|---|---|---|---|---|---|
| Coupe | 2013 | Jaguar | XK | 21807 | 16 | 24 | 385 | 8 |
| Coupe | 2013 | Chevrolet | Camero | 27795 | 15 | 24 | 426 | 8 |
| Coupe | 2013 | Ford | Mustang | 29145 | 15 | 26 | 420 | 8 |
| Coupe | 2013 | Mercedes | E550 | 14403 | 17 | 27 | 402 | 8 |
| Coupe | 2013 | Audi | S5 | 17209 | 18 | 28 | 333 | 6 |
| Coupe | 2013 | BMW | M3 | 25732 | 14 | 20 | 414 | 8 |
| Coupe | 2013 | Mini | Coupe 2D | 13674 | 26 | 35 | 208 | 4 |
| Coupe | 2013 | Dodge | Challenger | 13774 | 16 | 25 | 375 | 8 |
| Coupe | 2013 | Cadillac | CTS-V | 27742 | 12 | 18 | 556 | 8 |
| Coupe | 2013 | Nissan | 370Z | 20177 | 19 | 26 | 332 | 6 |
| Coupe | 2013 | Porsche | 918 Spider | 1500000 | 22 | 22 | 875 | 8 |