library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
df <- read.csv('Auto Sales data.csv')
df |>
head(20)
## ORDERNUMBER QUANTITYORDERED PRICEEACH ORDERLINENUMBER SALES ORDERDATE
## 1 10107 30 95.70 2 2871.00 24/02/2018
## 2 10121 34 81.35 5 2765.90 07/05/2018
## 3 10134 41 94.74 2 3884.34 01/07/2018
## 4 10145 45 83.26 6 3746.70 25/08/2018
## 5 10168 36 96.66 1 3479.76 28/10/2018
## 6 10180 29 86.13 9 2497.77 11/11/2018
## 7 10188 48 114.84 1 5512.32 18/11/2018
## 8 10211 41 114.84 14 4708.44 15/01/2019
## 9 10223 37 107.18 1 3965.66 20/02/2019
## 10 10237 23 101.44 7 2333.12 05/04/2019
## 11 10251 28 113.88 2 3188.64 18/05/2019
## 12 10263 34 108.14 2 3676.76 28/06/2019
## 13 10275 45 92.83 1 4177.35 23/07/2019
## 14 10285 36 113.88 6 4099.68 27/08/2019
## 15 10299 23 112.93 9 2597.39 30/09/2019
## 16 10309 41 107.18 5 4394.38 15/10/2019
## 17 10318 46 94.74 1 4358.04 02/11/2019
## 18 10329 42 104.67 1 4396.14 15/11/2019
## 19 10341 41 188.73 9 7737.93 24/11/2019
## 20 10361 20 72.55 13 1451.00 17/12/2019
## DAYS_SINCE_LASTORDER STATUS PRODUCTLINE MSRP PRODUCTCODE
## 1 828 Shipped Motorcycles 95 S10_1678
## 2 757 Shipped Motorcycles 95 S10_1678
## 3 703 Shipped Motorcycles 95 S10_1678
## 4 649 Shipped Motorcycles 95 S10_1678
## 5 586 Shipped Motorcycles 95 S10_1678
## 6 573 Shipped Motorcycles 95 S10_1678
## 7 567 Shipped Motorcycles 95 S10_1678
## 8 510 Shipped Motorcycles 95 S10_1678
## 9 475 Shipped Motorcycles 95 S10_1678
## 10 432 Shipped Motorcycles 95 S10_1678
## 11 390 Shipped Motorcycles 95 S10_1678
## 12 350 Shipped Motorcycles 95 S10_1678
## 13 326 Shipped Motorcycles 95 S10_1678
## 14 292 Shipped Motorcycles 95 S10_1678
## 15 259 Shipped Motorcycles 95 S10_1678
## 16 245 Shipped Motorcycles 95 S10_1678
## 17 228 Shipped Motorcycles 95 S10_1678
## 18 216 Shipped Motorcycles 95 S10_1678
## 19 208 Shipped Motorcycles 95 S10_1678
## 20 186 Shipped Motorcycles 95 S10_1678
## CUSTOMERNAME PHONE
## 1 Land of Toys Inc. 2125557818
## 2 Reims Collectables 26.47.1555
## 3 Lyon Souveniers +33 1 46 62 7555
## 4 Toys4GrownUps.com 6265557265
## 5 Technics Stores Inc. 6505556809
## 6 Daedalus Designs Imports 20.16.1555
## 7 Herkku Gifts +47 2267 3215
## 8 Auto Canal Petit (1) 47.55.6555
## 9 Australian Collectors, Co. 03 9520 4555
## 10 Vitachrome Inc. 2125551500
## 11 Tekni Collectables Inc. 2015559350
## 12 Gift Depot Inc. 2035552570
## 13 La Rochelle Gifts 40.67.8555
## 14 Marta's Replicas Co. 6175558555
## 15 Toys of Finland, Co. 90-224 8555
## 16 Baane Mini Imports 07-98 9555
## 17 Diecast Classics Inc. 2155551555
## 18 Land of Toys Inc. 2125557818
## 19 Salzburg Collectables 6562-9555
## 20 Souveniers And Things Co. +61 2 9495 8555
## ADDRESSLINE1 CITY POSTALCODE COUNTRY
## 1 897 Long Airport Avenue NYC 10022 USA
## 2 59 rue de l'Abbaye Reims 51100 France
## 3 27 rue du Colonel Pierre Avia Paris 75508 France
## 4 78934 Hillside Dr. Pasadena 90003 USA
## 5 9408 Furth Circle Burlingame 94217 USA
## 6 184, chausse de Tournai Lille 59000 France
## 7 Drammen 121, PR 744 Sentrum Bergen N 5804 Norway
## 8 25, rue Lauriston Paris 75016 France
## 9 636 St Kilda Road Melbourne 3004 Australia
## 10 2678 Kingston Rd. NYC 10022 USA
## 11 7476 Moss Rd. Newark 94019 USA
## 12 25593 South Bay Ln. Bridgewater 97562 USA
## 13 67, rue des Cinquante Otages Nantes 44000 France
## 14 39323 Spinnaker Dr. Cambridge 51247 USA
## 15 Keskuskatu 45 Helsinki 21240 Finland
## 16 Erling Skakkes gate 78 Stavern 4110 Norway
## 17 7586 Pompton St. Allentown 70267 USA
## 18 897 Long Airport Avenue NYC 10022 USA
## 19 Geislweg 14 Salzburg 5020 Austria
## 20 Monitor Money Building, 815 Pacific Hwy Chatswood 2067 Australia
## CONTACTLASTNAME CONTACTFIRSTNAME DEALSIZE
## 1 Yu Kwai Small
## 2 Henriot Paul Small
## 3 Da Cunha Daniel Medium
## 4 Young Julie Medium
## 5 Hirano Juri Medium
## 6 Rance Martine Small
## 7 Oeztan Veysel Medium
## 8 Perrier Dominique Medium
## 9 Ferguson Peter Medium
## 10 Frick Michael Small
## 11 Brown William Medium
## 12 King Julie Medium
## 13 Labrune Janine Medium
## 14 Hernandez Marta Medium
## 15 Karttunen Matti Small
## 16 Bergulfsen Jonas Medium
## 17 Yu Kyung Medium
## 18 Yu Kwai Medium
## 19 Pipps Georg Large
## 20 Huxley Adrian Small
This summary looks at the PRICEEACH column of the data. Below, I’ve found the minimum, maximum and mean values.
min(df$PRICEEACH)
## [1] 26.88
mean(df$PRICEEACH)
## [1] 101.099
max(df$PRICEEACH)
## [1] 252.87
The unit’s probably not dollars, either… most cars that I’m used to seeing are around the $30k mark. What metric could it be using?
This summary looks at the STATUS column of the data - and all of its unique values.
unique(df$STATUS)
## [1] "Shipped" "Disputed" "In Process" "Cancelled" "On Hold"
## [6] "Resolved"
This is a recent record, so hopefully most of them have been shipped.
Which column would have the strongest relationship with the current shipment status? The price of a single unit of automobile? Or would using the total price of the shipment be better… or maybe just the date that it was ordered?
What sort of relationships can I draw between the colums?
I wonder if the maximum or minimum price per unit is an outlier…
The Interquartile Range Rule suggests a distance from the mean where a point becomes an outlier. I think it was 150% of the difference between the 3rd and 1st quartiles.
midSpread <- 1.5 * (quantile(df$PRICEEACH, .75) - quantile(df$PRICEEACH, .25))
midSpread
## 75%
## 87.5325
Now that I’ve found the IQR, I added and subtracted it from the mean.
mean(df$PRICEEACH) + midSpread
## 75%
## 188.6315
mean(df$PRICEEACH) - midSpread
## 75%
## 13.56645
The lower value’s less than the miniumum, yet the higher’s value’s also less than the maximum - so the max would be an outlier from the IQR rule.
I guess that makes sense - just what luxury cars will cost you!
I chose to create a histogram for the PRICEEACH column, as I had already found the variables for a boxplot.
ggplot(
data = df,
mapping = aes(x = PRICEEACH)
) +
geom_histogram(
binwidth = 10
)
I expected outliers in the maximum, but I didn’t expect the graph to look that right-skewed. Maybe most automobile shipments follow this trend…?
I created a histogram for the STATUS column, too.
ggplot(
data = df,
mapping = aes(x = STATUS)
) +
geom_bar()
Looks like they get their job done most of the time with relatively few disputes. They must be a trustworthy shipping company, even if this isn’t a yearly record.