DATA 3010 Final

Importing Libraries and Data

Data-set Description

[1] "This is how many different industries are in this dataset:"
[1] 7
[1] "This is how many product categories are in the dataset:"
[1] 7
[1] "This is how many different countries suppliers are from:"
[1] 8
[1] "These are the different supplier certifications:"
[1] "None"      "ISO 9001"  "ISO 14001" "ISO 45001"
[1] "This is the number of shipping modes for sales:"
[1] 4
[1] "This is the number of carriers for sales:"
[1] 5
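
Counts like these can be produced with `length(unique(...))`. The sketch below is only an illustration on a small made-up data frame: the column names `Industry` and `Country`, the helper `count_distinct`, and all values are assumptions, since the actual column names are not shown in this report.

```r
# Toy stand-in for one of the project tables; names and values are hypothetical.
toy_suppliers <- data.frame(
  Industry = c("Retail", "Retail", "Automotive", "Healthcare"),
  Country  = c("USA", "Germany", "Germany", "Japan"),
  stringsAsFactors = FALSE
)

# Number of distinct values in a column, ignoring missing entries.
count_distinct <- function(df, var) {
  length(unique(na.omit(df[[var]])))
}

n_industries <- count_distinct(toy_suppliers, "Industry")
n_countries  <- count_distinct(toy_suppliers, "Country")

# cat() prints the label and the value on one line, without the [1] prefix.
cat("Industries in the toy data:", n_industries, "\n")
cat("Supplier countries in the toy data:", n_countries, "\n")
```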

Variable Description Table

          variable_name         dataset                                       description              general_type      units
1 On_Time_Delivery_Rate supplier_master            Historical on-time delivery percentage quantitative - continuous Percentage
2   Certification_Level supplier_master Supplier certification level (ISO standards etc.)     categorical - nominal        N/A
3              Category  product_master                                  Product Category     categorical - nominal        N/A
4             Unit_Cost  product_master            Cost per unit (raw manufacturing cost) quantitative - continuous        USD
5        Order_Quantity    sales_orders                         Amount of Product ordered   quantitative - discrete     Pieces
6            Unit_Price    sales_orders          Standard price per unit at time of order quantitative - continuous        USD
7              Discount    sales_orders           Discount applied as a fraction of price quantitative - continuous Proportion
8           Order_Total    sales_orders       Total revenue after discount, excluding VAT quantitative - continuous        USD
9      Profit_Per_Order    sales_orders       Profit for the order after subtracting COGS quantitative - continuous        USD
# A tibble: 1 × 4
  variable        mean median    sd
  <chr>          <dbl>  <dbl> <dbl>
1 Delivery Rate:  80.1     81  11.8
# A tibble: 1 × 4
  variable    mean median    sd
  <chr>      <dbl>  <dbl> <dbl>
1 Unit Cost:  100.   102.  59.5
# A tibble: 1 × 4
  variable     mean median    sd
  <chr>       <dbl>  <dbl> <dbl>
1 Unit Price:  177.   176.  70.7
# A tibble: 1 × 4
  variable   mean median     sd
  <chr>     <dbl>  <dbl>  <dbl>
1 Discount: 0.125   0.13 0.0722
# A tibble: 1 × 4
  variable      mean median    sd
  <chr>        <dbl>  <dbl> <dbl>
1 Order Total:  776.   703.  459.
# A tibble: 1 × 4
  variable           mean median    sd
  <chr>             <dbl>  <dbl> <dbl>
1 Profit Per Order:  280.   240.  243.
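
The six one-row tables above all report the same three statistics. A minimal base R sketch of a helper that computes them for any numeric column and stacks the results into one table is shown below. The helper `num_summary` and the toy data are hypothetical; this is not the code that produced the output above.

```r
# Hypothetical helper: mean, median, and sample standard deviation of one column.
num_summary <- function(df, var) {
  x <- df[[var]]
  data.frame(
    variable = var,
    mean     = mean(x, na.rm = TRUE),
    median   = median(x, na.rm = TRUE),
    sd       = sd(x, na.rm = TRUE)
  )
}

# Toy data standing in for sales_orders; the values are made up.
toy_orders <- data.frame(
  Unit_Price = c(100, 150, 200),
  Discount   = c(0.05, 0.10, 0.15)
)

# One row per variable, stacked into a single summary table.
summary_tbl <- do.call(
  rbind,
  lapply(c("Unit_Price", "Discount"), function(v) num_summary(toy_orders, v))
)
print(summary_tbl)
```

Stacking the rows keeps all variables in one table, which makes the means and standard deviations easier to compare side by side.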

Interpretation:

The mean on-time delivery rate was 80.07%, with a median of 81% and a standard deviation of 11.77 percentage points, so suppliers delivered on time about 80% of the time on average. The mean Unit Cost is 100 USD with a standard deviation of 59.45 USD; a spread of more than half the mean indicates large variation in Unit Cost. The mean Unit Price is 177.0 USD with a standard deviation of 70.75 USD. This

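# Summarise a categorical variable: frequency table plus the most frequent level.
cat_summary <- function(df, var) {
  # Count how often each level occurs (table() drops NA values by default).
  freq_tbl <- table(df[[var]])

  list(
    frequency_table = as.data.frame(freq_tbl),
    # which.max() returns the first maximum, so ties resolve to the first level.
    mode = names(freq_tbl)[which.max(freq_tbl)]
  )
}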
cat_summary(supplier_master, "Certification_Level")
$frequency_table
       Var1 Freq
1 ISO 14001   25
2 ISO 45001   20
3  ISO 9001   34
4      None   21

$mode
[1] "ISO 9001"
cat_summary(product_master, "Category")
$frequency_table
         Var1 Freq
1     Apparel   27
2      Beauty   34
3 Electronics   24
4        Food   37
5        Home   26
6  Industrial   23
7       Sport   29

$mode
[1] "Food"
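
Because `which.max()` returns only the first maximum, `cat_summary()` reports a single mode even when two levels tie. A hedged variant that returns every tied level is sketched below; `all_modes` is a hypothetical helper and the input vector is made up.

```r
# Hypothetical helper: return every level that attains the highest frequency.
all_modes <- function(x) {
  freq_tbl <- table(x)
  names(freq_tbl)[as.vector(freq_tbl) == max(freq_tbl)]
}

# Made-up certification values in which two levels tie.
toy_cert <- c("ISO 9001", "None", "ISO 9001", "None", "ISO 14001")

modes <- all_modes(toy_cert)
print(modes)
```

In the two tables above no tie occurs (34 for ISO 9001 and 37 for Food are unique maxima), so both approaches would report the same mode there.
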
supplier_master$otd_category <- cut(
  supplier_master$On_Time_Delivery_Rate,
  breaks = c(0, 60, 70, 80, 90, 100),
  include.lowest = TRUE,
  right = FALSE,
  labels = c("Very Poor", "Poor", "Medium", "Good", "Great")
)
table(supplier_master$Certification_Level, supplier_master$otd_category)
           
            Very Poor Poor Medium Good Great
  ISO 14001         0    9      4    9     3
  ISO 45001         0    5      2    4     9
  ISO 9001          0    6      8   11     9
  None              0    3     10    3     5
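
The binning above depends on how `cut()` treats values that fall exactly on a break. The sketch below reuses the same breaks and labels on a few made-up rates chosen to sit on or next to the bin edges; the `rates` vector is an assumption for illustration only.

```r
breaks <- c(0, 60, 70, 80, 90, 100)
labels <- c("Very Poor", "Poor", "Medium", "Good", "Great")

# Made-up rates chosen to sit on or next to the bin edges.
rates <- c(59.9, 60, 79.9, 80, 90, 100)

binned <- cut(rates, breaks = breaks, include.lowest = TRUE,
              right = FALSE, labels = labels)

# With right = FALSE each bin is [lower, upper); include.lowest = TRUE
# additionally closes the last bin so that exactly 100 is kept.
print(data.frame(rate = rates, category = binned))
```

A rate of exactly 60 therefore counts as "Poor" rather than "Very Poor", and a rate of exactly 100 counts as "Great" instead of becoming `NA`.
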
round(prop.table(table(supplier_master$Certification_Level, supplier_master$otd_category), margin = 1) * 100, 2)
           
            Very Poor  Poor Medium  Good Great
  ISO 14001      0.00 36.00  16.00 36.00 12.00
  ISO 45001      0.00 25.00  10.00 20.00 45.00
  ISO 9001       0.00 17.65  23.53 32.35 26.47
  None           0.00 14.29  47.62 14.29 23.81
round(prop.table(table(supplier_master$Certification_Level, supplier_master$otd_category), margin = 2) * 100, 2)
           
            Very Poor  Poor Medium  Good Great
  ISO 14001           39.13  16.67 33.33 11.54
  ISO 45001           21.74   8.33 14.81 34.62
  ISO 9001            26.09  33.33 40.74 34.62
  None                13.04  41.67 11.11 19.23
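
The two percentage tables differ only in the `margin` argument: `margin = 1` divides each cell by its row total, and `margin = 2` divides by its column total. The sketch below shows both on a made-up table of counts with an all-zero column. Such a column gives 0/0 = `NaN` under `margin = 2`, which is likely why the Very Poor column is blank in the column-percentage output above (the print method for tables shows missing values as empty by default).

```r
# Made-up 2 x 3 table of counts; the middle column is all zeros on purpose.
counts <- matrix(
  c(2, 6, 0, 0, 8, 4),
  nrow = 2,
  dimnames = list(c("A", "B"), c("Low", "Mid", "High"))
)

# Row percentages: each row sums to 100.
row_pct <- round(prop.table(counts, margin = 1) * 100, 2)

# Column percentages: each column with a nonzero total sums to 100,
# while the all-zero column becomes NaN.
col_pct <- round(prop.table(counts, margin = 2) * 100, 2)

print(row_pct)
print(col_pct)
```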

# A tibble: 7 × 8
  Category     mean median    sd   min    q1    q3   max
  <chr>       <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Apparel      91.1   94.4  66.4  8.55  24.4  147.  199.
2 Beauty      110.   128.   60.6  7.21  63.6  156.  198.
3 Electronics 110.   123.   57.0 12.9   75.1  154.  190.
4 Food        116.   129.   62.1  7.81  67.2  170.  197.
5 Home         93.5   89.2  56.1  7.11  62.5  131.  194.
6 Industrial   81.3   75.2  57.7  9.54  31.8  128.  195.
7 Sport        88.9   80.3  50.6 11.0   48.6  119.  190.
# A tibble: 7 × 8
  Category        n  mean    sd    se t_star lower upper
  <chr>       <int> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1 Apparel        27  91.1  66.4 12.8    2.06  64.8  117.
2 Beauty         30 108.   63.3 11.6    2.05  84.2  131.
3 Electronics    24 110.   57.0 11.6    2.07  86.2  134.
4 Food           30 123.   58.9 10.8    2.05 101.   145.
5 Home           26  93.5  56.1 11.0    2.06  70.9  116.
6 Industrial     23  81.3  57.7 12.0    2.07  56.3  106.
7 Sport          29  88.9  50.6  9.40   2.05  69.7  108.
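
The columns of the second table fit the usual t-based confidence interval for a mean: `se = sd / sqrt(n)`, `t_star` is the 97.5% quantile of the t distribution with `n - 1` degrees of freedom, and the bounds are `mean ± t_star * se`. The sketch below recomputes the Apparel row from its rounded summary values; `t_ci` is a hypothetical helper, and the 95% level is an assumption inferred from the `t_star` values.

```r
# Hypothetical helper: two-sided t confidence interval for a mean,
# computed from summary statistics rather than raw data.
t_ci <- function(xbar, s, n, conf = 0.95) {
  se     <- s / sqrt(n)
  t_star <- qt(1 - (1 - conf) / 2, df = n - 1)
  c(se = se, t_star = t_star,
    lower = xbar - t_star * se,
    upper = xbar + t_star * se)
}

# Rounded Apparel values taken from the table above.
apparel <- t_ci(xbar = 91.1, s = 66.4, n = 27)
print(round(apparel, 2))
```

The recomputed values agree with the Apparel row up to rounding (se about 12.8, t_star about 2.06, interval roughly 64.8 to 117).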

Call:
lm(formula = Discount ~ Order_Quantity, data = sales_orders)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.12787 -0.06448  0.00251  0.06411  0.12627 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.1233518  0.0013774  89.554   <2e-16 ***
Order_Quantity 0.0003762  0.0002553   1.473    0.141    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.07216 on 19998 degrees of freedom
Multiple R-squared:  0.0001086, Adjusted R-squared:  5.855e-05 
F-statistic: 2.171 on 1 and 19998 DF,  p-value: 0.1406
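
In this output the slope for Order_Quantity has a p-value of 0.141, above the usual 0.05 threshold, and R-squared is about 0.0001, so order quantity explains almost none of the variation in discount. The sketch below shows one way to pull the same quantities out of a fitted model programmatically. It runs on simulated data, so `toy_orders` and its value ranges are assumptions and its numbers will not match the output above.

```r
set.seed(1)

# Simulated stand-in for sales_orders: Discount is generated independently of
# Order_Quantity, so no real relationship exists in this toy data.
toy_orders <- data.frame(
  Order_Quantity = sample(1:10, 500, replace = TRUE),
  Discount       = runif(500, min = 0, max = 0.25)
)

fit <- lm(Discount ~ Order_Quantity, data = toy_orders)
fit_summary <- summary(fit)

# Coefficient table: one row per term, with columns
# Estimate, Std. Error, t value, and Pr(>|t|).
coef_tbl <- coef(fit_summary)
slope    <- coef_tbl["Order_Quantity", "Estimate"]
slope_p  <- coef_tbl["Order_Quantity", "Pr(>|t|)"]
r_sq     <- fit_summary$r.squared

# In simple regression the F statistic equals the squared slope t value.
t_val <- coef_tbl["Order_Quantity", "t value"]
f_val <- unname(fit_summary$fstatistic["value"])

cat("slope:", slope, " p-value:", slope_p, " R-squared:", r_sq, "\n")
```

The same identity holds in the output above: 1.473 squared is about 2.17, which matches the reported F-statistic, and the F-test p-value equals the slope p-value.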