Week 1 Data Dive

Importing libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)

Importing data

df <- read.csv('Auto Sales data.csv')
df |>
  head(20)
##    ORDERNUMBER QUANTITYORDERED PRICEEACH ORDERLINENUMBER   SALES  ORDERDATE
## 1        10107              30     95.70               2 2871.00 24/02/2018
## 2        10121              34     81.35               5 2765.90 07/05/2018
## 3        10134              41     94.74               2 3884.34 01/07/2018
## 4        10145              45     83.26               6 3746.70 25/08/2018
## 5        10168              36     96.66               1 3479.76 28/10/2018
## 6        10180              29     86.13               9 2497.77 11/11/2018
## 7        10188              48    114.84               1 5512.32 18/11/2018
## 8        10211              41    114.84              14 4708.44 15/01/2019
## 9        10223              37    107.18               1 3965.66 20/02/2019
## 10       10237              23    101.44               7 2333.12 05/04/2019
## 11       10251              28    113.88               2 3188.64 18/05/2019
## 12       10263              34    108.14               2 3676.76 28/06/2019
## 13       10275              45     92.83               1 4177.35 23/07/2019
## 14       10285              36    113.88               6 4099.68 27/08/2019
## 15       10299              23    112.93               9 2597.39 30/09/2019
## 16       10309              41    107.18               5 4394.38 15/10/2019
## 17       10318              46     94.74               1 4358.04 02/11/2019
## 18       10329              42    104.67               1 4396.14 15/11/2019
## 19       10341              41    188.73               9 7737.93 24/11/2019
## 20       10361              20     72.55              13 1451.00 17/12/2019
##    DAYS_SINCE_LASTORDER  STATUS PRODUCTLINE MSRP PRODUCTCODE
## 1                   828 Shipped Motorcycles   95    S10_1678
## 2                   757 Shipped Motorcycles   95    S10_1678
## 3                   703 Shipped Motorcycles   95    S10_1678
## 4                   649 Shipped Motorcycles   95    S10_1678
## 5                   586 Shipped Motorcycles   95    S10_1678
## 6                   573 Shipped Motorcycles   95    S10_1678
## 7                   567 Shipped Motorcycles   95    S10_1678
## 8                   510 Shipped Motorcycles   95    S10_1678
## 9                   475 Shipped Motorcycles   95    S10_1678
## 10                  432 Shipped Motorcycles   95    S10_1678
## 11                  390 Shipped Motorcycles   95    S10_1678
## 12                  350 Shipped Motorcycles   95    S10_1678
## 13                  326 Shipped Motorcycles   95    S10_1678
## 14                  292 Shipped Motorcycles   95    S10_1678
## 15                  259 Shipped Motorcycles   95    S10_1678
## 16                  245 Shipped Motorcycles   95    S10_1678
## 17                  228 Shipped Motorcycles   95    S10_1678
## 18                  216 Shipped Motorcycles   95    S10_1678
## 19                  208 Shipped Motorcycles   95    S10_1678
## 20                  186 Shipped Motorcycles   95    S10_1678
##                  CUSTOMERNAME            PHONE
## 1           Land of Toys Inc.       2125557818
## 2          Reims Collectables       26.47.1555
## 3             Lyon Souveniers +33 1 46 62 7555
## 4           Toys4GrownUps.com       6265557265
## 5        Technics Stores Inc.       6505556809
## 6    Daedalus Designs Imports       20.16.1555
## 7                Herkku Gifts    +47 2267 3215
## 8            Auto Canal Petit   (1) 47.55.6555
## 9  Australian Collectors, Co.     03 9520 4555
## 10            Vitachrome Inc.       2125551500
## 11    Tekni Collectables Inc.       2015559350
## 12            Gift Depot Inc.       2035552570
## 13          La Rochelle Gifts       40.67.8555
## 14       Marta's Replicas Co.       6175558555
## 15       Toys of Finland, Co.      90-224 8555
## 16         Baane Mini Imports       07-98 9555
## 17      Diecast Classics Inc.       2155551555
## 18          Land of Toys Inc.       2125557818
## 19      Salzburg Collectables        6562-9555
## 20  Souveniers And Things Co.  +61 2 9495 8555
##                               ADDRESSLINE1        CITY POSTALCODE   COUNTRY
## 1                  897 Long Airport Avenue         NYC      10022       USA
## 2                       59 rue de l'Abbaye       Reims      51100    France
## 3            27 rue du Colonel Pierre Avia       Paris      75508    France
## 4                       78934 Hillside Dr.    Pasadena      90003       USA
## 5                        9408 Furth Circle  Burlingame      94217       USA
## 6                  184, chausse de Tournai       Lille      59000    France
## 7              Drammen 121, PR 744 Sentrum      Bergen     N 5804    Norway
## 8                        25, rue Lauriston       Paris      75016    France
## 9                        636 St Kilda Road   Melbourne       3004 Australia
## 10                       2678 Kingston Rd.         NYC      10022       USA
## 11                           7476 Moss Rd.      Newark      94019       USA
## 12                     25593 South Bay Ln. Bridgewater      97562       USA
## 13            67, rue des Cinquante Otages      Nantes      44000    France
## 14                     39323 Spinnaker Dr.   Cambridge      51247       USA
## 15                           Keskuskatu 45    Helsinki      21240   Finland
## 16                  Erling Skakkes gate 78     Stavern       4110    Norway
## 17                        7586 Pompton St.   Allentown      70267       USA
## 18                 897 Long Airport Avenue         NYC      10022       USA
## 19                             Geislweg 14    Salzburg       5020   Austria
## 20 Monitor Money Building, 815 Pacific Hwy   Chatswood       2067 Australia
##    CONTACTLASTNAME CONTACTFIRSTNAME DEALSIZE
## 1               Yu             Kwai    Small
## 2          Henriot             Paul    Small
## 3         Da Cunha           Daniel   Medium
## 4            Young            Julie   Medium
## 5           Hirano             Juri   Medium
## 6            Rance          Martine    Small
## 7           Oeztan           Veysel   Medium
## 8          Perrier        Dominique   Medium
## 9         Ferguson            Peter   Medium
## 10           Frick          Michael    Small
## 11           Brown          William   Medium
## 12            King            Julie   Medium
## 13         Labrune           Janine   Medium
## 14       Hernandez            Marta   Medium
## 15       Karttunen            Matti    Small
## 16      Bergulfsen            Jonas   Medium
## 17              Yu            Kyung   Medium
## 18              Yu             Kwai   Medium
## 19           Pipps            Georg    Large
## 20          Huxley           Adrian    Small

Numeric Summary

This summary looks at the PRICEEACH column of the data. Below, I’ve found the minimum, maximum and mean values.

min(df$PRICEEACH)
## [1] 26.88
mean(df$PRICEEACH)
## [1] 101.099
max(df$PRICEEACH)
## [1] 252.87

The unit’s probably not dollars, either… most cars that I’m used to seeing are around the $30k mark. What metric could it be using?

Categorical Summary

This summary looks at the STATUS column of the data - and all of its unique values.

unique(df$STATUS)
## [1] "Shipped"    "Disputed"   "In Process" "Cancelled"  "On Hold"   
## [6] "Resolved"

This is a recent record, so hopefully most of them have been shipped.

Novel Investigations

  • Which column would have the strongest relationship with the current shipment status? The price of a single unit of automobile? Or would using the total price of the shipment be better… or maybe just the date that it was ordered?

  • What sort of relationships can I draw between the colums?

  • I wonder if the maximum or minimum price per unit is an outlier…

    The Interquartile Range Rule suggests a distance from the mean where a point becomes an outlier. I think it was 150% of the difference between the 3rd and 1st quartiles.

    midSpread <- 1.5 * (quantile(df$PRICEEACH, .75) - quantile(df$PRICEEACH, .25))
    midSpread
    ##     75% 
    ## 87.5325

    Now that I’ve found the IQR, I added and subtracted it from the mean.

    mean(df$PRICEEACH) + midSpread
    ##      75% 
    ## 188.6315
    mean(df$PRICEEACH) - midSpread
    ##      75% 
    ## 13.56645

    The lower value’s less than the miniumum, yet the higher’s value’s also less than the maximum - so the max would be an outlier from the IQR rule.

    I guess that makes sense - just what luxury cars will cost you!

Visual Representations

I chose to create a histogram for the PRICEEACH column, as I had already found the variables for a boxplot.

ggplot(
  data = df,
  mapping = aes(x = PRICEEACH)
) +
  geom_histogram(
    binwidth = 10
  )

I expected outliers in the maximum, but I didn’t expect the graph to look that right-skewed. Maybe most automobile shipments follow this trend…?

I created a histogram for the STATUS column, too.

ggplot(
  data = df,
  mapping = aes(x = STATUS)
) +
  geom_bar()

Looks like they get their job done most of the time with relatively few disputes. They must be a trustworthy shipping company, even if this isn’t a yearly record.