Exploratory analysis and essay

Pre-work

Visit the following website and explore the range of sizes of this data set (from 100 to 5 million records): https://excelbianalytics.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/ or (new) https://www.kaggle.com/datasets

1). Select 2 files to download
2). Based on your computer’s capabilities (memory, CPU), select 2 files you can handle (recommended one small, one large)
3). Download the files
4). Review the structure and content of the tables, and think about the data sets (structure, size, dependencies, labels, etc)
5). Consider the similarities and differences in the two data sets you have downloaded
6). Think about how to analyze and predict an outcome based on the data sets available

Based on the data you have, think which two machine learning algorithms presented so far could be used to analyze the data.
#library(knitr)
#install.packages("tinytex")
#tinytex::install_tinytex()
I selected the sales data set in two sizes: small (100 sales records) and large (50,000 sales records). Please advise how to load large files to GitHub, which appears to have a 25 MB size limit.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
df_100_small <- read.csv("https://raw.githubusercontent.com/tponnada/hello-world/master/100%20Sales%20Records.csv")
df_50000_large <- read.csv("https://raw.githubusercontent.com/tponnada/hello-world/master/50000%20Sales%20Records.csv")
Once the files are loaded, we review the structure and content of the tables and consider the similarities and differences between the two data sets to understand which two machine learning algorithms can be applied in this context.
A visual examination shows that the small and the large data set have the same 14 variables with the same data types (similarities); they differ only in the number of rows. The variable types are appropriate for most variables. The exception is the two date columns (Order.Date and Ship.Date), which are read in as character and need conversion to Date in both data sets. The data clean-up code below does this by storing the parsed dates in two new columns, "Order Date" and "Ship Date", alongside the original character columns.
head(df_100_small)
## Region Country Item.Type
## 1 Australia and Oceania Tuvalu Baby Food
## 2 Central America and the Caribbean Grenada Cereal
## 3 Europe Russia Office Supplies
## 4 Sub-Saharan Africa Sao Tome and Principe Fruits
## 5 Sub-Saharan Africa Rwanda Office Supplies
## 6 Australia and Oceania Solomon Islands Baby Food
## Sales.Channel Order.Priority Order.Date Order.ID Ship.Date Units.Sold
## 1 Offline H 5/28/2010 669165933 6/27/2010 9925
## 2 Online C 8/22/2012 963881480 9/15/2012 2804
## 3 Offline L 5/2/2014 341417157 5/8/2014 1779
## 4 Online C 6/20/2014 514321792 7/5/2014 8102
## 5 Offline L 2/1/2013 115456712 2/6/2013 5062
## 6 Online C 2/4/2015 547995746 2/21/2015 2974
## Unit.Price Unit.Cost Total.Revenue Total.Cost Total.Profit
## 1 255.28 159.42 2533654.00 1582243.50 951410.50
## 2 205.70 117.11 576782.80 328376.44 248406.36
## 3 651.21 524.96 1158502.59 933903.84 224598.75
## 4 9.33 6.92 75591.66 56065.84 19525.82
## 5 651.21 524.96 3296425.02 2657347.52 639077.50
## 6 255.28 159.42 759202.72 474115.08 285087.64
head(df_50000_large)
## Region Country Item.Type Sales.Channel Order.Priority
## 1 Sub-Saharan Africa Namibia Household Offline M
## 2 Europe Iceland Baby Food Online H
## 3 Europe Russia Meat Online L
## 4 Europe Moldova Meat Online L
## 5 Europe Malta Cereal Online M
## 6 Asia Indonesia Meat Online H
## Order.Date Order.ID Ship.Date Units.Sold Unit.Price Unit.Cost Total.Revenue
## 1 8/31/2015 897751939 10/12/2015 3604 668.27 502.54 2408445.1
## 2 11/20/2010 599480426 1/9/2011 8435 255.28 159.42 2153286.8
## 3 6/22/2017 538911855 6/25/2017 4848 421.89 364.69 2045322.7
## 4 2/28/2012 459845054 3/20/2012 7225 421.89 364.69 3048155.2
## 5 8/12/2010 626391351 9/13/2010 1975 205.70 117.11 406257.5
## 6 8/20/2010 472974574 8/27/2010 2542 421.89 364.69 1072444.4
## Total.Cost Total.Profit
## 1 1811154.2 597290.9
## 2 1344707.7 808579.1
## 3 1768017.1 277305.6
## 4 2634885.2 413270.0
## 5 231292.2 174965.2
## 6 927042.0 145402.4
colnames(df_100_small)
## [1] "Region" "Country" "Item.Type" "Sales.Channel"
## [5] "Order.Priority" "Order.Date" "Order.ID" "Ship.Date"
## [9] "Units.Sold" "Unit.Price" "Unit.Cost" "Total.Revenue"
## [13] "Total.Cost" "Total.Profit"
colnames(df_50000_large)
## [1] "Region" "Country" "Item.Type" "Sales.Channel"
## [5] "Order.Priority" "Order.Date" "Order.ID" "Ship.Date"
## [9] "Units.Sold" "Unit.Price" "Unit.Cost" "Total.Revenue"
## [13] "Total.Cost" "Total.Profit"
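The visual comparison above can also be checked programmatically. The following is a minimal sketch on two made-up toy tables (the names small and large and all values are assumptions for illustration, not the sales files themselves); the same three checks could be pointed at df_100_small and df_50000_large.

```r
# Toy stand-ins for the two sales tables (hypothetical values).
small <- data.frame(Region = c("Asia", "Europe"),
                    Units.Sold = c(10L, 20L),
                    Total.Profit = c(100.5, 200.25),
                    stringsAsFactors = FALSE)
large <- data.frame(Region = c("Asia", "Europe", "Asia", "Europe"),
                    Units.Sold = c(5L, 15L, 25L, 35L),
                    Total.Profit = c(50, 150, 250, 350),
                    stringsAsFactors = FALSE)

# Same column names in the same order?
same_names <- identical(names(small), names(large))

# Same column classes?
same_types <- identical(sapply(small, class), sapply(large, class))

# Row counts are where the two tables differ.
row_counts <- c(small = nrow(small), large = nrow(large))

print(same_names)
print(same_types)
print(row_counts)
```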
df_100_small[['Order Date']] <- as.Date(df_100_small[['Order.Date']], "%m/%d/%Y")
df_100_small[['Ship Date']] <- as.Date(df_100_small[['Ship.Date']], "%m/%d/%Y")
df_50000_large[['Order Date']] <- as.Date(df_50000_large[['Order.Date']], "%m/%d/%Y")
df_50000_large[['Ship Date']] <- as.Date(df_50000_large[['Ship.Date']], "%m/%d/%Y")
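As a small illustration of the conversion used above, the sketch below parses the order and ship dates from the first two rows of the head() output with the same "%m/%d/%Y" format string. The variable names are assumptions for the example. Once the values are of class Date, arithmetic such as days between order and shipment becomes possible.

```r
# Order and ship dates of the first two rows shown by head(df_100_small).
order_chr <- c("5/28/2010", "8/22/2012")
ship_chr  <- c("6/27/2010", "9/15/2012")

# Parse month/day/year text into Date objects.
order_date <- as.Date(order_chr, format = "%m/%d/%Y")
ship_date  <- as.Date(ship_chr,  format = "%m/%d/%Y")

# Date arithmetic now works: days between order and shipment.
days_to_ship <- as.numeric(ship_date - order_date)

print(class(order_date))
print(days_to_ship)
```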
Both the small and the large data set cover roughly seven and a half years of sales history (early 2010 through mid-2017), broken down by region, country, item type and sales channel.
summary(df_100_small)
## Region Country Item.Type Sales.Channel
## Length:100 Length:100 Length:100 Length:100
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Order.Priority Order.Date Order.ID Ship.Date
## Length:100 Length:100 Min. :114606559 Length:100
## Class :character Class :character 1st Qu.:338922488 Class :character
## Mode :character Mode :character Median :557708561 Mode :character
## Mean :555020412
## 3rd Qu.:790755081
## Max. :994022214
## Units.Sold Unit.Price Unit.Cost Total.Revenue
## Min. : 124 Min. : 9.33 Min. : 6.92 Min. : 4870
## 1st Qu.:2836 1st Qu.: 81.73 1st Qu.: 35.84 1st Qu.: 268721
## Median :5382 Median :179.88 Median :107.28 Median : 752314
## Mean :5129 Mean :276.76 Mean :191.05 Mean :1373488
## 3rd Qu.:7369 3rd Qu.:437.20 3rd Qu.:263.33 3rd Qu.:2212045
## Max. :9925 Max. :668.27 Max. :524.96 Max. :5997055
## Total.Cost Total.Profit Order Date Ship Date
## Min. : 3612 Min. : 1258 Min. :2010-02-02 Min. :2010-02-25
## 1st Qu.: 168868 1st Qu.: 121444 1st Qu.:2012-02-14 1st Qu.:2012-02-24
## Median : 363566 Median : 290768 Median :2013-07-12 Median :2013-08-11
## Mean : 931806 Mean : 441682 Mean :2013-09-16 Mean :2013-10-09
## 3rd Qu.:1613870 3rd Qu.: 635829 3rd Qu.:2015-04-07 3rd Qu.:2015-04-28
## Max. :4509794 Max. :1719922 Max. :2017-05-22 Max. :2017-06-17
summary(df_50000_large)
## Region Country Item.Type Sales.Channel
## Length:50000 Length:50000 Length:50000 Length:50000
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Order.Priority Order.Date Order.ID Ship.Date
## Length:50000 Length:50000 Min. :100013196 Length:50000
## Class :character Class :character 1st Qu.:324007046 Class :character
## Mode :character Mode :character Median :550422394 Mode :character
## Mean :549733027
## 3rd Qu.:776782381
## Max. :999999463
## Units.Sold Unit.Price Unit.Cost Total.Revenue
## Min. : 1 Min. : 9.33 Min. : 6.92 Min. : 28
## 1st Qu.: 2498 1st Qu.: 81.73 1st Qu.: 35.84 1st Qu.: 276487
## Median : 5018 Median :154.06 Median : 97.44 Median : 781325
## Mean : 5000 Mean :265.65 Mean :187.32 Mean :1323716
## 3rd Qu.: 7493 3rd Qu.:421.89 3rd Qu.:263.33 3rd Qu.:1808642
## Max. :10000 Max. :668.27 Max. :524.96 Max. :6682032
## Total.Cost Total.Profit Order Date
## Min. : 21 Min. : 7.2 Min. :2010-01-01
## 1st Qu.: 160637 1st Qu.: 94150.9 1st Qu.:2011-11-15
## Median : 467104 Median : 279536.4 Median :2013-10-09
## Mean : 933157 Mean : 390558.7 Mean :2013-10-11
## 3rd Qu.:1190390 3rd Qu.: 564286.7 3rd Qu.:2015-09-04
## Max. :5249075 Max. :1738178.4 Max. :2017-07-28
## Ship Date
## Min. :2010-01-02
## 1st Qu.:2011-12-11
## Median :2013-11-02
## Mean :2013-11-05
## 3rd Qu.:2015-09-30
## Max. :2017-09-16
hist(df_100_small$`Total.Profit`)
hist(df_50000_large$`Total.Profit`)
### Machine-learning algorithms (Algorithm 1: Multiple linear regression)
Since the outcome of interest in the sales data set is numeric, I decided to use linear regression, a supervised machine learning approach that generates a numeric prediction. For example, the question we pose could be: what is the total profit given units sold, total cost, region and item type? To answer it, we fit a multiple regression model with units sold, total cost, region and item type as the independent variables and total profit as the dependent variable.
The model we obtain for the smaller data set of 100 records is:
Total Profit = 4.588e+04 + (4.290e+01 * Units.Sold) + (2.575e-01 * Total.Cost) + (4.477e+04 * RegionAustralia and Oceania) + (-4.772e+04 * RegionCentral America and the Caribbean) + (1.701e+04 * RegionEurope) + (1.190e+05 * RegionMiddle East and North Africa) + (-3.704e+04 * RegionNorth America) + (3.986e+03 * RegionSub-Saharan Africa) + (-3.080e+05 * Item.TypeBeverages) + (-7.325e+03 * Item.TypeCereal) + (4.769e+04 * Item.TypeClothes) + (3.265e+05 * Item.TypeCosmetics) + (-3.035e+05 * Item.TypeFruits) + (-7.427e+04 * Item.TypeHousehold) + (-4.952e+05 * Item.TypeMeat) + (-2.683e+05 * Item.TypeOffice Supplies) + (-1.967e+05 * Item.TypePersonal Care) + (-9.106e+04 * Item.TypeSnacks) + (-6.136e+04 * Item.TypeVegetables)
For a fictional order of 200 units of Clothes with a total cost of 100 in the Australia and Oceania region, the profit using the model formula comes out to $146,946 for model 1 (4.588e+04 + 4.290e+01 * 200 + 2.575e-01 * 100 + 4.477e+04 + 4.769e+04).
The model we obtain for the larger dataset of 50,000 records is
Total Profit = 1.161e+05 + (3.764e+01 * Units.Sold) + (2.163e-01 * Total.Cost) + (4.219e+03 * RegionAustralia and Oceania) + (1.636e+03 * RegionCentral America and the Caribbean) + (1.178e+03 * RegionEurope) + (3.432e+03 * RegionMiddle East and North Africa) + (-1.296e+03 * RegionNorth America) + (2.888e+03 * RegionSub-Saharan Africa) + (-2.628e+05 * Item.TypeBeverages) + (9.335e+03 * Item.TypeCereal) + (2.177e+04 * Item.TypeClothes) + (2.779e+05 * Item.TypeCosmetics) + (-3.025e+05 * Item.TypeFruits) + (-2.196e+04 * Item.TypeHousehold) + (-4.123e+05 * Item.TypeMeat) + (-2.425e+05 * Item.TypeOffice Supplies) + (-2.449e+05 * Item.TypePersonal Care) + (-1.360e+05 * Item.TypeSnacks) + (-8.900e+04 * Item.TypeVegetables)
For the same fictional order (200 units of Clothes with a total cost of 100 in the Australia and Oceania region), the profit using the model formula comes out to $149,639 for model 2 (close to the model 1 value but not the same). Both models fit reasonably well, as seen in the R^2 and adjusted R^2 values: 0.9615 and 0.9523 for model 1, and 0.9233 and 0.9232 for model 2.
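The two quoted profit figures can be reproduced by plugging the fictional order into each fitted equation. The sketch below copies the relevant coefficients from the two summary() tables and assumes Units.Sold = 200 and Total.Cost = 100, the inputs that reproduce the figures quoted in the text. The helper predict_profit() and the vector names are hypothetical and exist only for this example.

```r
# Coefficients copied from the two summary() tables. Only the terms that apply
# to Clothes in Australia and Oceania are needed; all other dummies are 0.
mod1 <- c(intercept = 4.588e+04, units = 4.290e+01, cost = 2.575e-01,
          region = 4.477e+04, item = 4.769e+04)
mod2 <- c(intercept = 1.161e+05, units = 3.764e+01, cost = 2.163e-01,
          region = 4.219e+03, item = 2.177e+04)

# Hypothetical helper: evaluates the fitted equation for one order.
predict_profit <- function(b, units_sold, total_cost) {
  unname(b["intercept"] + b["units"] * units_sold + b["cost"] * total_cost +
           b["region"] + b["item"])
}

pred1 <- predict_profit(mod1, units_sold = 200, total_cost = 100)
pred2 <- predict_profit(mod2, units_sold = 200, total_cost = 100)

print(round(c(model1 = pred1, model2 = pred2)))  # 146946 and 149639
```

With the fitted model objects available, predict(sales_mod1, newdata = ...) would give the same kind of result without copying coefficients by hand.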
sales_mod1 <- lm(data = df_100_small, Total.Profit ~ Units.Sold + Total.Cost + Region + Item.Type)
summary(sales_mod1)
##
## Call:
## lm(formula = Total.Profit ~ Units.Sold + Total.Cost + Region +
## Item.Type, data = df_100_small)
##
## Residuals:
## Min 1Q Median 3Q Max
## -274707 -48314 -1709 59132 192626
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.588e+04 5.681e+04 0.808 0.42175
## Units.Sold 4.290e+01 5.468e+00 7.846 1.62e-11
## Total.Cost 2.575e-01 2.162e-02 11.910 < 2e-16
## RegionAustralia and Oceania 4.477e+04 4.457e+04 1.004 0.31818
## RegionCentral America and the Caribbean -4.772e+04 4.882e+04 -0.978 0.33124
## RegionEurope 1.701e+04 3.802e+04 0.447 0.65582
## RegionMiddle East and North Africa 1.190e+05 4.464e+04 2.666 0.00928
## RegionNorth America -3.704e+04 6.653e+04 -0.557 0.57925
## RegionSub-Saharan Africa 3.986e+03 3.478e+04 0.115 0.90904
## Item.TypeBeverages -3.080e+05 5.418e+04 -5.684 2.06e-07
## Item.TypeCereal -7.325e+03 5.495e+04 -0.133 0.89429
## Item.TypeClothes 4.769e+04 4.912e+04 0.971 0.33460
## Item.TypeCosmetics 3.265e+05 4.818e+04 6.776 1.90e-09
## Item.TypeFruits -3.035e+05 5.291e+04 -5.736 1.66e-07
## Item.TypeHousehold -7.427e+04 6.288e+04 -1.181 0.24104
## Item.TypeMeat -4.952e+05 8.266e+04 -5.991 5.68e-08
## Item.TypeOffice Supplies -2.683e+05 5.751e+04 -4.665 1.22e-05
## Item.TypePersonal Care -1.967e+05 5.241e+04 -3.753 0.00033
## Item.TypeSnacks -9.106e+04 7.023e+04 -1.297 0.19848
## Item.TypeVegetables -6.136e+04 5.752e+04 -1.067 0.28931
##
## (Intercept)
## Units.Sold ***
## Total.Cost ***
## RegionAustralia and Oceania
## RegionCentral America and the Caribbean
## RegionEurope
## RegionMiddle East and North Africa **
## RegionNorth America
## RegionSub-Saharan Africa
## Item.TypeBeverages ***
## Item.TypeCereal
## Item.TypeClothes
## Item.TypeCosmetics ***
## Item.TypeFruits ***
## Item.TypeHousehold
## Item.TypeMeat ***
## Item.TypeOffice Supplies ***
## Item.TypePersonal Care ***
## Item.TypeSnacks
## Item.TypeVegetables
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 95730 on 80 degrees of freedom
## Multiple R-squared: 0.9615, Adjusted R-squared: 0.9523
## F-statistic: 105.1 on 19 and 80 DF, p-value: < 2.2e-16
sales_mod2 <- lm(data = df_50000_large, Total.Profit ~ Units.Sold + Total.Cost + Region + Item.Type)
summary(sales_mod2)
##
## Call:
## lm(formula = Total.Profit ~ Units.Sold + Total.Cost + Region +
## Item.Type, data = df_50000_large)
##
## Residuals:
## Min 1Q Median 3Q Max
## -397586 -59147 -83 58771 397660
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.161e+05 2.152e+03 53.962 < 2e-16
## Units.Sold 3.764e+01 2.379e-01 158.240 < 2e-16
## Total.Cost 2.163e-01 9.240e-04 234.117 < 2e-16
## RegionAustralia and Oceania 4.219e+03 2.054e+03 2.054 0.0400
## RegionCentral America and the Caribbean 1.636e+03 1.871e+03 0.874 0.3819
## RegionEurope 1.178e+03 1.531e+03 0.770 0.4415
## RegionMiddle East and North Africa 3.432e+03 1.811e+03 1.895 0.0581
## RegionNorth America -1.296e+03 3.385e+03 -0.383 0.7020
## RegionSub-Saharan Africa 2.888e+03 1.525e+03 1.894 0.0583
## Item.TypeBeverages -2.628e+05 2.380e+03 -110.453 < 2e-16
## Item.TypeCereal 9.335e+03 2.317e+03 4.029 5.61e-05
## Item.TypeClothes 2.177e+04 2.376e+03 9.162 < 2e-16
## Item.TypeCosmetics 2.779e+05 2.351e+03 118.177 < 2e-16
## Item.TypeFruits -3.025e+05 2.405e+03 -125.774 < 2e-16
## Item.TypeHousehold -2.196e+04 2.794e+03 -7.859 3.95e-15
## Item.TypeMeat -4.123e+05 2.483e+03 -166.027 < 2e-16
## Item.TypeOffice Supplies -2.425e+05 2.858e+03 -84.853 < 2e-16
## Item.TypePersonal Care -2.449e+05 2.354e+03 -104.026 < 2e-16
## Item.TypeSnacks -1.360e+05 2.324e+03 -58.523 < 2e-16
## Item.TypeVegetables -8.900e+04 2.324e+03 -38.299 < 2e-16
##
## (Intercept) ***
## Units.Sold ***
## Total.Cost ***
## RegionAustralia and Oceania *
## RegionCentral America and the Caribbean
## RegionEurope
## RegionMiddle East and North Africa .
## RegionNorth America
## RegionSub-Saharan Africa .
## Item.TypeBeverages ***
## Item.TypeCereal ***
## Item.TypeClothes ***
## Item.TypeCosmetics ***
## Item.TypeFruits ***
## Item.TypeHousehold ***
## Item.TypeMeat ***
## Item.TypeOffice Supplies ***
## Item.TypePersonal Care ***
## Item.TypeSnacks ***
## Item.TypeVegetables ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 104700 on 49980 degrees of freedom
## Multiple R-squared: 0.9233, Adjusted R-squared: 0.9232
## F-statistic: 3.165e+04 on 19 and 49980 DF, p-value: < 2.2e-16
The second type of machine learning algorithm I decided to use was logistic regression. Instead of modeling the response variable directly, as in linear regression, logistic regression models the probability of a particular response value. As determined earlier, there are 14 variables: 5 categorical variables (Region, Country, Item Type, Sales Channel, Order Priority), 2 dates (Order Date, Ship Date), 6 numeric variables (Units Sold, Unit Price, Unit Cost, Total Revenue, Total Cost and Total Profit) and 1 numeric identifier (Order ID). The data set is relatively clean, with no missing numeric values. Here, the dependent variable is Sales.Channel, recoded below as the 0/1 variable newcode (Offline = 0, Online = 1).
Using the base R sample() function, we partition each data set into training and test sets using a 75 percent to 25 percent split. We call the new data sets df_100_small_train and df_100_small_test, and df_50000_large_train and df_50000_large_test, respectively. The results show similar class distributions across the full, training and test sets, so we do not have a class imbalance problem. The match is closest in the large data set (about 50/50 in all three sets); in the small data set the split shifts to 48/52 in training and 56/44 in test.
To train a binomial logistic regression model using the glm() function, we pass three main arguments to it. The first argument (data) is the training data (df_100_small_train or df_50000_large_train). The second argument (family) is the type of regression model we intend to build. We set it to binomial, which tells glm() to build a binomial logistic regression model using the logit link function. Instead of setting family = binomial, we could also write family = binomial(link = "logit"). The last argument is the formula for the prediction problem, where we specify which features (predictors) to use to predict the class (response). For the small model, the formula newcode ~ . uses every other column in the training set to predict Sales.Channel through its 0/1 recoding newcode; for the large model, the predictors are listed explicitly. Note that . also picks up Sales.Channel itself, from which newcode is derived, along with the character date columns, which is consistent with the near-zero residual deviance on 0 degrees of freedom in the small model's summary.
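As a minimal sketch of the three glm() arguments just described, the example below fits a binomial logistic regression to simulated data. The data frame toy, its columns x and y, and the true coefficients 0.5 and 1.2 are all made-up assumptions for illustration and have nothing to do with the sales files.

```r
set.seed(42)

# Simulated data: one numeric predictor and a 0/1 response whose true
# log-odds are 0.5 + 1.2 * x (made-up values for illustration).
n <- 500
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, size = 1, prob = plogis(0.5 + 1.2 * toy$x))

# data, family and formula are the three arguments described above.
toy_mod <- glm(formula = y ~ x, family = binomial, data = toy)

print(coef(toy_mod))
print(toy_mod$family$link)
```

Writing the formula as y ~ . instead would use every other column of toy as a predictor, which is why it matters what else is stored in the training data.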
In linear regression, we interpreted the model coefficients as the average change in the value of the response as a result of a unit change in a particular predictor. In logistic regression, however, we interpret the model coefficients as the change in the log-odds of the response as a result of a unit change in the predictor variable. For example, the coefficient of -4.623e-06 for RegionAustralia and Oceania in the small model means that, relative to the baseline region (Asia, the level that does not appear in the coefficient table), an order from Australia and Oceania changes the log-odds of newcode being 1 (Online) by -4.623e-06. With a standard error of 1.424e+06 and a p-value of 1, this estimate is not distinguishable from zero.
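Log-odds are easier to read once converted to odds ratios or probabilities. The sketch below uses a hypothetical coefficient value (beta = 0.55, not taken from either model) to show the two standard conversions in base R: exp() for the odds ratio and plogis() for the inverse logit.

```r
# A coefficient on the log-odds scale (hypothetical value for illustration).
beta <- 0.55

# exp() turns a change in log-odds into a multiplicative change in odds.
odds_ratio <- exp(beta)

# plogis() is the inverse logit: it maps log-odds to a probability.
p_baseline <- plogis(0)          # log-odds of 0 is a 50/50 chance
p_shifted  <- plogis(0 + beta)   # same case after adding the coefficient

print(odds_ratio)
print(c(baseline = p_baseline, shifted = p_shifted))
```

The same conversion applied to a whole model would be exp(coef(model)), which returns one odds ratio per coefficient.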
# Recode Sales.Channel as a 0/1 variable (Offline = 0, Online = 1)
df_100_small <- df_100_small %>%
  mutate(newcode = case_when(
    (Sales.Channel == 'Offline') ~ '0',
    (Sales.Channel == 'Online') ~ '1'))
df_100_small$newcode <- as.numeric(df_100_small$newcode)
# Summarize the numeric columns only
df_100_small %>%
  keep(is.numeric) %>%
  summary()
## Order.ID Units.Sold Unit.Price Unit.Cost
## Min. :114606559 Min. : 124 Min. : 9.33 Min. : 6.92
## 1st Qu.:338922488 1st Qu.:2836 1st Qu.: 81.73 1st Qu.: 35.84
## Median :557708561 Median :5382 Median :179.88 Median :107.28
## Mean :555020412 Mean :5129 Mean :276.76 Mean :191.05
## 3rd Qu.:790755081 3rd Qu.:7369 3rd Qu.:437.20 3rd Qu.:263.33
## Max. :994022214 Max. :9925 Max. :668.27 Max. :524.96
## Total.Revenue Total.Cost Total.Profit newcode
## Min. : 4870 Min. : 3612 Min. : 1258 Min. :0.0
## 1st Qu.: 268721 1st Qu.: 168868 1st Qu.: 121444 1st Qu.:0.0
## Median : 752314 Median : 363566 Median : 290768 Median :0.5
## Mean :1373488 Mean : 931806 Mean : 441682 Mean :0.5
## 3rd Qu.:2212045 3rd Qu.:1613870 3rd Qu.: 635829 3rd Qu.:1.0
## Max. :5997055 Max. :4509794 Max. :1719922 Max. :1.0
# Recode Sales.Channel as a 0/1 variable (Offline = 0, Online = 1)
df_50000_large <- df_50000_large %>%
  mutate(newcode = case_when(
    (Sales.Channel == 'Offline') ~ '0',
    (Sales.Channel == 'Online') ~ '1'))
df_50000_large$newcode <- as.numeric(df_50000_large$newcode)
# Summarize the numeric columns only
df_50000_large %>%
  keep(is.numeric) %>%
  summary()
## Order.ID Units.Sold Unit.Price Unit.Cost
## Min. :100013196 Min. : 1 Min. : 9.33 Min. : 6.92
## 1st Qu.:324007046 1st Qu.: 2498 1st Qu.: 81.73 1st Qu.: 35.84
## Median :550422394 Median : 5018 Median :154.06 Median : 97.44
## Mean :549733027 Mean : 5000 Mean :265.65 Mean :187.32
## 3rd Qu.:776782381 3rd Qu.: 7493 3rd Qu.:421.89 3rd Qu.:263.33
## Max. :999999463 Max. :10000 Max. :668.27 Max. :524.96
## Total.Revenue Total.Cost Total.Profit newcode
## Min. : 28 Min. : 21 Min. : 7.2 Min. :0.0000
## 1st Qu.: 276487 1st Qu.: 160637 1st Qu.: 94150.9 1st Qu.:0.0000
## Median : 781325 Median : 467104 Median : 279536.4 Median :1.0000
## Mean :1323716 Mean : 933157 Mean : 390558.7 Mean :0.5007
## 3rd Qu.:1808642 3rd Qu.:1190390 3rd Qu.: 564286.7 3rd Qu.:1.0000
## Max. :6682032 Max. :5249075 Max. :1738178.4 Max. :1.0000
set.seed(1234)
sample_set_small <- sample(nrow(df_100_small), round(nrow(df_100_small)*.75), replace = FALSE)
df_100_small_train <- df_100_small[sample_set_small, ]
df_100_small_test <- df_100_small[-sample_set_small, ]
round(prop.table(table(select(df_100_small, Sales.Channel), exclude = NULL)), 4) * 100
## Sales.Channel
## Offline Online
## 50 50
round(prop.table(table(select(df_100_small_train, Sales.Channel), exclude = NULL)), 4) * 100
## Sales.Channel
## Offline Online
## 48 52
round(prop.table(table(select(df_100_small_test, Sales.Channel), exclude = NULL)), 4) * 100
## Sales.Channel
## Offline Online
## 56 44
set.seed(1234)
sample_set_large <- sample(nrow(df_50000_large), round(nrow(df_50000_large)*.75), replace = FALSE)
df_50000_large_train <- df_50000_large[sample_set_large, ]
df_50000_large_test <- df_50000_large[-sample_set_large, ]
round(prop.table(table(select(df_50000_large, Sales.Channel), exclude = NULL)), 4) * 100
## Sales.Channel
## Offline Online
## 49.93 50.07
round(prop.table(table(select(df_50000_large_train, Sales.Channel), exclude = NULL)), 4) * 100
## Sales.Channel
## Offline Online
## 49.89 50.11
round(prop.table(table(select(df_50000_large_test, Sales.Channel), exclude = NULL)), 4) * 100
## Sales.Channel
## Offline Online
## 50.07 49.93
df_100_small_mod <- glm(data = df_100_small_train, family = binomial, formula = newcode ~ .)
summary(df_100_small_mod)
##
## Call:
## glm(formula = newcode ~ ., family = binomial, data = df_100_small_train)
##
## Coefficients: (162 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.657e+01 1.468e+06 0 1
## RegionAustralia and Oceania -4.623e-06 1.424e+06 0 1
## RegionCentral America and the Caribbean -3.941e-13 8.723e+05 0 1
## RegionEurope -1.963e-07 2.195e+06 0 1
## RegionMiddle East and North Africa -3.856e-07 8.723e+05 0 1
## RegionNorth America 4.618e-06 2.518e+06 0 1
## RegionSub-Saharan Africa -4.410e-06 1.007e+06 0 1
## CountryAngola 4.410e-06 8.723e+05 0 1
## CountryAustralia -4.619e-06 3.901e+06 0 1
## CountryAustria 2.014e-07 1.424e+06 0 1
## CountryAzerbaijan 3.903e-07 8.723e+05 0 1
## CountryBulgaria -2.765e-09 2.195e+06 0 1
## CountryBurkina Faso -4.823e-06 2.937e+06 0 1
## CountryCameroon 4.208e-06 1.126e+06 0 1
## CountryCape Verde 4.412e-06 1.126e+06 0 1
## CountryComoros -2.124e-07 1.816e+06 0 1
## CountryCote d'Ivoire 9.030e-06 1.816e+06 0 1
## CountryDemocratic Republic of the Congo -4.116e-07 4.643e+06 0 1
## CountryDjibouti 4.415e-06 7.122e+05 0 1
## CountryEast Timor 7.809e-09 1.007e+06 0 1
## CountryFederated States of Micronesia -2.050e-07 3.341e+06 0 1
## CountryFiji 9.243e-06 4.419e+06 0 1
## CountryFrance -8.001e-10 2.467e+06 0 1
## CountryGabon 9.029e-06 1.884e+06 0 1
## CountryGrenada -2.003e-07 1.332e+06 0 1
## CountryHaiti 5.101e-09 8.723e+05 0 1
## CountryHonduras NA NA NA NA
## CountryIceland 2.067e-07 8.723e+05 0 1
## CountryKenya -4.075e-07 3.264e+06 0 1
## CountryKiribati 4.224e-06 1.745e+06 0 1
## CountryKuwait -1.201e-08 7.122e+05 0 1
## CountryLaos -4.432e-09 1.511e+06 0 1
## CountryLesotho 4.227e-06 1.332e+06 0 1
## CountryLibya -4.417e-06 5.036e+05 0 1
## CountryMacedonia 4.816e-06 1.511e+06 0 1
## CountryMadagascar 4.412e-06 1.126e+06 0 1
## CountryMali 4.223e-06 1.234e+06 0 1
## CountryMauritania 9.024e-06 2.937e+06 0 1
## CountryMexico NA NA NA NA
## CountryMoldova 4.610e-06 1.424e+06 0 1
## CountryMonaco -4.426e-06 3.225e+06 0 1
## CountryMyanmar -1.995e-07 8.723e+05 0 1
## CountryNiger 8.834e-06 1.007e+06 0 1
## CountryNorway -4.427e-06 4.671e+06 0 1
## CountryPortugal -9.045e-06 4.419e+06 0 1
## CountryRepublic of the Congo 9.029e-06 2.195e+06 0 1
## CountryRomania -4.418e-06 2.467e+06 0 1
## CountryRussia 1.919e-07 1.424e+06 0 1
## CountryRwanda 4.406e-06 1.007e+06 0 1
## CountrySamoa 4.423e-06 1.332e+06 0 1
## CountrySan Marino -4.424e-06 4.671e+06 0 1
## CountrySao Tome and Principe 8.635e-06 2.137e+06 0 1
## CountrySenegal -4.185e-07 2.980e+06 0 1
## CountrySierra Leone 4.406e-06 1.007e+06 0 1
## CountrySlovenia -4.427e-06 4.698e+06 0 1
## CountrySolomon Islands 4.617e-06 1.234e+06 0 1
## CountrySouth Sudan 1.365e-05 4.214e+06 0 1
## CountrySpain 1.963e-07 2.195e+06 0 1
## CountrySri Lanka 5.101e-09 1.332e+06 0 1
## CountrySwitzerland 2.014e-07 1.234e+06 0 1
## CountrySyria NA NA NA NA
## CountryThe Gambia -2.124e-07 1.745e+06 0 1
## CountryTurkmenistan NA NA NA NA
## CountryTuvalu NA NA NA NA
## CountryZambia NA NA NA NA
## Item.TypeBeverages 4.619e-06 3.868e+06 0 1
## Item.TypeCereal -8.763e-14 7.122e+05 0 1
## Item.TypeClothes -4.624e-06 2.252e+06 0 1
## Item.TypeCosmetics -4.628e-06 2.137e+06 0 1
## Item.TypeFruits -4.434e-06 2.617e+06 0 1
## Item.TypeHousehold -4.623e-06 1.424e+06 0 1
## Item.TypeMeat -4.623e-06 1.511e+06 0 1
## Item.TypeOffice Supplies -4.618e-06 2.195e+06 0 1
## Item.TypePersonal Care -9.241e-06 3.525e+06 0 1
## Item.TypeSnacks -4.824e-06 3.225e+06 0 1
## Item.TypeVegetables NA NA NA NA
## Sales.ChannelOnline 5.313e+01 1.424e+06 0 1
## Order.PriorityH 4.618e-06 2.195e+06 0 1
## Order.PriorityL 4.618e-06 2.467e+06 0 1
## Order.PriorityM 4.618e-06 2.137e+06 0 1
## Order.Date1/14/2017 NA NA NA NA
## Order.Date1/4/2011 NA NA NA NA
## Order.Date10/11/2013 9.532e-09 1.007e+06 0 1
## Order.Date10/13/2013 NA NA NA NA
## Order.Date10/13/2014 NA NA NA NA
## Order.Date10/14/2014 NA NA NA NA
## Order.Date10/21/2012 NA NA NA NA
## Order.Date10/23/2016 NA NA NA NA
## Order.Date10/28/2014 NA NA NA NA
## Order.Date10/30/2010 4.804e-06 1.670e+06 0 1
## Order.Date11/14/2015 NA NA NA NA
## Order.Date11/19/2016 NA NA NA NA
## Order.Date11/22/2011 NA NA NA NA
## Order.Date11/26/2010 NA NA NA NA
## Order.Date11/26/2011 NA NA NA NA
## Order.Date11/6/2014 NA NA NA NA
## Order.Date11/7/2011 NA NA NA NA
## Order.Date12/23/2010 NA NA NA NA
## Order.Date12/29/2013 NA NA NA NA
## Order.Date12/30/2010 NA NA NA NA
## Order.Date12/31/2016 NA NA NA NA
## Order.Date12/6/2016 NA NA NA NA
## Order.Date2/1/2013 NA NA NA NA
## Order.Date2/16/2012 NA NA NA NA
## Order.Date2/17/2012 NA NA NA NA
## Order.Date2/2/2010 NA NA NA NA
## Order.Date2/23/2015 NA NA NA NA
## Order.Date2/25/2017 NA NA NA NA
## Order.Date2/3/2014 NA NA NA NA
## Order.Date2/4/2015 NA NA NA NA
## Order.Date2/6/2010 NA NA NA NA
## Order.Date2/8/2017 NA NA NA NA
## Order.Date3/11/2017 NA NA NA NA
## Order.Date3/18/2012 NA NA NA NA
## Order.Date3/29/2016 NA NA NA NA
## Order.Date4/18/2014 NA NA NA NA
## Order.Date4/23/2011 NA NA NA NA
## Order.Date4/23/2012 NA NA NA NA
## Order.Date4/23/2013 NA NA NA NA
## Order.Date4/25/2015 NA NA NA NA
## Order.Date4/30/2012 NA NA NA NA
## Order.Date4/7/2014 NA NA NA NA
## Order.Date5/14/2014 NA NA NA NA
## Order.Date5/2/2014 NA NA NA NA
## Order.Date5/22/2017 NA NA NA NA
## Order.Date5/26/2011 NA NA NA NA
## Order.Date5/28/2010 NA NA NA NA
## Order.Date5/29/2012 NA NA NA NA
## Order.Date5/7/2010 NA NA NA NA
## Order.Date5/7/2016 NA NA NA NA
## Order.Date6/1/2016 NA NA NA NA
## Order.Date6/13/2012 NA NA NA NA
## Order.Date6/20/2014 NA NA NA NA
## Order.Date6/26/2013 NA NA NA NA
## Order.Date6/30/2010 NA NA NA NA
## Order.Date6/30/2016 NA NA NA NA
## Order.Date6/7/2012 NA NA NA NA
## Order.Date6/8/2012 NA NA NA NA
## Order.Date7/14/2015 NA NA NA NA
## Order.Date7/17/2012 NA NA NA NA
## Order.Date7/18/2014 NA NA NA NA
## Order.Date7/20/2013 NA NA NA NA
## Order.Date7/26/2011 NA NA NA NA
## Order.Date7/30/2015 NA NA NA NA
## Order.Date7/31/2012 NA NA NA NA
## Order.Date7/31/2015 NA NA NA NA
## Order.Date7/7/2014 NA NA NA NA
## Order.Date7/8/2012 NA NA NA NA
## Order.Date8/14/2015 NA NA NA NA
## Order.Date8/18/2013 NA NA NA NA
## Order.Date8/2/2014 NA NA NA NA
## Order.Date8/22/2012 NA NA NA NA
## Order.Date9/15/2011 NA NA NA NA
## Order.Date9/17/2012 NA NA NA NA
## Order.ID NA NA NA NA
## Ship.Date1/20/2011 NA NA NA NA
## Ship.Date1/23/2017 NA NA NA NA
## Ship.Date1/28/2014 NA NA NA NA
## Ship.Date1/31/2011 NA NA NA NA
## Ship.Date1/5/2011 NA NA NA NA
## Ship.Date1/7/2012 NA NA NA NA
## Ship.Date10/20/2012 NA NA NA NA
## Ship.Date10/23/2011 NA NA NA NA
## Ship.Date11/10/2014 NA NA NA NA
## Ship.Date11/14/2014 NA NA NA NA
## Ship.Date11/15/2011 NA NA NA NA
## Ship.Date11/15/2014 NA NA NA NA
## Ship.Date11/16/2013 NA NA NA NA
## Ship.Date11/17/2010 NA NA NA NA
## Ship.Date11/18/2015 NA NA NA NA
## Ship.Date11/25/2013 NA NA NA NA
## Ship.Date11/25/2016 NA NA NA NA
## Ship.Date11/30/2012 NA NA NA NA
## Ship.Date12/12/2014 NA NA NA NA
## Ship.Date12/14/2016 NA NA NA NA
## Ship.Date12/18/2016 NA NA NA NA
## Ship.Date12/25/2010 NA NA NA NA
## Ship.Date12/3/2011 NA NA NA NA
## Ship.Date12/31/2016 NA NA NA NA
## Ship.Date2/13/2017 NA NA NA NA
## Ship.Date2/21/2015 NA NA NA NA
## Ship.Date2/25/2010 NA NA NA NA
## Ship.Date2/25/2017 NA NA NA NA
## Ship.Date2/28/2012 NA NA NA NA
## Ship.Date2/6/2013 NA NA NA NA
## Ship.Date3/18/2010 NA NA NA NA
## Ship.Date3/2/2015 NA NA NA NA
## Ship.Date3/20/2012 NA NA NA NA
## Ship.Date3/20/2014 NA NA NA NA
## Ship.Date3/28/2017 NA NA NA NA
## Ship.Date4/19/2014 NA NA NA NA
## Ship.Date4/27/2011 NA NA NA NA
## Ship.Date4/29/2016 NA NA NA NA
## Ship.Date4/7/2012 NA NA NA NA
## Ship.Date5/10/2010 NA NA NA NA
## Ship.Date5/10/2016 NA NA NA NA
## Ship.Date5/18/2012 NA NA NA NA
## Ship.Date5/20/2013 NA NA NA NA
## Ship.Date5/28/2015 NA NA NA NA
## Ship.Date5/30/2014 NA NA NA NA
## Ship.Date5/8/2014 NA NA NA NA
## Ship.Date6/2/2012 NA NA NA NA
## Ship.Date6/27/2010 NA NA NA NA
## Ship.Date6/27/2012 NA NA NA NA
## Ship.Date6/28/2014 NA NA NA NA
## Ship.Date6/29/2016 NA NA NA NA
## Ship.Date6/3/2012 NA NA NA NA
## Ship.Date6/5/2017 NA NA NA NA
## Ship.Date6/8/2012 NA NA NA NA
## Ship.Date7/1/2013 NA NA NA NA
## Ship.Date7/11/2014 NA NA NA NA
## Ship.Date7/15/2011 NA NA NA NA
## Ship.Date7/24/2012 NA NA NA NA
## Ship.Date7/26/2016 NA NA NA NA
## Ship.Date7/27/2012 NA NA NA NA
## Ship.Date7/30/2014 NA NA NA NA
## Ship.Date7/5/2014 NA NA NA NA
## Ship.Date7/9/2012 NA NA NA NA
## Ship.Date8/1/2010 NA NA NA NA
## Ship.Date8/19/2014 NA NA NA NA
## Ship.Date8/25/2015 NA NA NA NA
## Ship.Date8/7/2013 NA NA NA NA
## Ship.Date8/8/2015 NA NA NA NA
## Ship.Date9/11/2012 NA NA NA NA
## Ship.Date9/15/2012 NA NA NA NA
## Ship.Date9/18/2013 NA NA NA NA
## Ship.Date9/3/2011 NA NA NA NA
## Ship.Date9/3/2015 NA NA NA NA
## Ship.Date9/30/2015 NA NA NA NA
## Units.Sold NA NA NA NA
## Unit.Price NA NA NA NA
## Unit.Cost NA NA NA NA
## Total.Revenue NA NA NA NA
## Total.Cost NA NA NA NA
## Total.Profit NA NA NA NA
## `Order Date` NA NA NA NA
## `Ship Date` NA NA NA NA
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1.0385e+02 on 74 degrees of freedom
## Residual deviance: 4.3512e-10 on 0 degrees of freedom
## AIC: 150
##
## Number of Fisher Scoring iterations: 25
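The pattern in the output above (standard errors in the millions, p-values of 1, a residual deviance near zero) is what a logistic regression typically produces when a predictor determines the response exactly. The sketch below reproduces that behaviour on made-up toy data, under the assumption that it mirrors how newcode is derived from Sales.Channel. The names toy, channel, code and leaky_mod are hypothetical.

```r
# Toy data: a 0/1 response built directly from a two-level channel column,
# mirroring how newcode is derived from Sales.Channel (values are made up).
toy <- data.frame(channel = rep(c("Offline", "Online"), each = 20))
toy$code <- as.numeric(toy$channel == "Online")

# Using the source column as a predictor separates the classes perfectly.
# glm() warns about this, so the warnings are silenced for the demo.
leaky_mod <- suppressWarnings(
  glm(code ~ channel, family = binomial, data = toy)
)

std_errors <- summary(leaky_mod)$coefficients[, "Std. Error"]

print(deviance(leaky_mod))  # effectively zero
print(std_errors)           # very large
```

Listing the predictors explicitly and leaving out the column the response was built from, as the formula for the large model does, avoids this situation.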
df_50000_large_mod <- glm(newcode ~ Region + Country + Item.Type + Order.Priority + Units.Sold + Unit.Price + Unit.Cost + Total.Cost + Total.Profit + Total.Revenue, data = df_50000_large_train, family=binomial)
summary(df_50000_large_mod)
##
## Call:
## glm(formula = newcode ~ Region + Country + Item.Type + Order.Priority +
## Units.Sold + Unit.Price + Unit.Cost + Total.Cost + Total.Profit +
## Total.Revenue, family = binomial, data = df_50000_large_train)
##
## Coefficients: (9 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.374e-02 1.464e-01 0.162 0.87118
## RegionAustralia and Oceania 1.535e-02 2.013e-01 0.076 0.93920
## RegionCentral America and the Caribbean -7.638e-02 1.892e-01 -0.404 0.68647
## RegionEurope 1.417e-01 1.984e-01 0.715 0.47487
## RegionMiddle East and North Africa 7.617e-02 2.010e-01 0.379 0.70478
## RegionNorth America -9.866e-02 1.968e-01 -0.501 0.61613
## RegionSub-Saharan Africa -1.229e-01 2.028e-01 -0.606 0.54449
## CountryAlbania -2.136e-01 1.982e-01 -1.078 0.28122
## CountryAlgeria -2.131e-01 2.018e-01 -1.056 0.29088
## CountryAndorra 1.789e-02 1.996e-01 0.090 0.92858
## CountryAngola 1.572e-01 2.011e-01 0.782 0.43421
## CountryAntigua and Barbuda -3.808e-02 1.859e-01 -0.205 0.83766
## CountryArmenia 1.403e-02 2.063e-01 0.068 0.94578
## CountryAustralia 8.874e-03 1.997e-01 0.044 0.96455
## CountryAustria -5.898e-02 1.968e-01 -0.300 0.76443
## CountryAzerbaijan 2.470e-01 2.021e-01 1.222 0.22155
## CountryBahrain -1.895e-03 1.988e-01 -0.010 0.99240
## CountryBangladesh 8.350e-02 1.943e-01 0.430 0.66732
## CountryBarbados 1.415e-02 1.918e-01 0.074 0.94119
## CountryBelarus -3.705e-01 2.040e-01 -1.817 0.06929
## CountryBelgium -1.249e-01 1.970e-01 -0.634 0.52600
## CountryBelize 4.445e-02 1.927e-01 0.231 0.81755
## CountryBenin 2.852e-01 2.061e-01 1.384 0.16646
## CountryBhutan 3.837e-02 2.024e-01 0.190 0.84962
## CountryBosnia and Herzegovina -4.190e-01 2.048e-01 -2.045 0.04081
## CountryBotswana 8.647e-03 2.030e-01 0.043 0.96603
## CountryBrunei 2.173e-01 2.018e-01 1.077 0.28156
## CountryBulgaria -2.880e-01 1.998e-01 -1.441 0.14949
## CountryBurkina Faso 1.669e-01 2.041e-01 0.818 0.41329
## CountryBurundi 5.060e-02 2.046e-01 0.247 0.80466
## CountryCambodia -2.868e-02 1.979e-01 -0.145 0.88478
## CountryCameroon -2.309e-02 2.055e-01 -0.112 0.91055
## CountryCanada 1.184e-01 1.998e-01 0.592 0.55358
## CountryCape Verde -9.923e-02 1.989e-01 -0.499 0.61788
## CountryCentral African Republic 5.488e-01 2.023e-01 2.713 0.00667
## CountryChad 1.067e-01 2.033e-01 0.525 0.59971
## CountryChina -6.176e-02 1.922e-01 -0.321 0.74793
## CountryComoros 2.778e-02 2.089e-01 0.133 0.89423
## CountryCosta Rica 6.503e-02 1.968e-01 0.330 0.74108
## CountryCote d'Ivoire 3.310e-01 2.108e-01 1.570 0.11645
## CountryCroatia -1.324e-01 2.006e-01 -0.660 0.50912
## CountryCuba 2.854e-01 1.929e-01 1.479 0.13901
## CountryCyprus -5.920e-02 1.968e-01 -0.301 0.76360
## CountryCzech Republic -1.109e-01 2.057e-01 -0.539 0.58960
## CountryDemocratic Republic of the Congo 2.654e-01 1.980e-01 1.340 0.18012
## CountryDenmark 7.771e-02 1.978e-01 0.393 0.69439
## CountryDjibouti -6.464e-02 2.015e-01 -0.321 0.74833
## CountryDominica 1.515e-01 1.944e-01 0.779 0.43578
## CountryDominican Republic 3.748e-02 1.868e-01 0.201 0.84098
## CountryEast Timor 3.410e-02 2.015e-01 0.169 0.86564
## CountryEgypt 6.554e-02 2.001e-01 0.327 0.74330
## CountryEl Salvador 9.484e-02 1.958e-01 0.484 0.62818
## CountryEquatorial Guinea -6.020e-02 2.047e-01 -0.294 0.76869
## CountryEritrea 2.777e-01 2.064e-01 1.346 0.17833
## CountryEstonia -1.138e-01 1.996e-01 -0.570 0.56853
## CountryEthiopia 2.783e-01 2.058e-01 1.352 0.17633
## CountryFederated States of Micronesia 6.232e-02 2.062e-01 0.302 0.76246
## CountryFiji 1.460e-01 2.046e-01 0.714 0.47539
## CountryFinland -1.780e-01 1.924e-01 -0.925 0.35488
## CountryFrance 2.870e-02 1.927e-01 0.149 0.88161
## CountryGabon 2.643e-01 1.992e-01 1.327 0.18455
## CountryGeorgia -1.176e-01 2.024e-01 -0.581 0.56135
## CountryGermany -1.508e-01 2.001e-01 -0.754 0.45112
## CountryGhana -5.567e-02 2.059e-01 -0.270 0.78689
## CountryGreece -7.896e-02 2.025e-01 -0.390 0.69661
## CountryGreenland -6.347e-03 1.957e-01 -0.032 0.97413
## CountryGrenada 2.254e-01 1.851e-01 1.218 0.22325
## CountryGuatemala -7.898e-02 1.905e-01 -0.415 0.67849
## CountryGuinea 4.396e-01 1.979e-01 2.221 0.02635
## CountryGuinea-Bissau 9.889e-02 2.069e-01 0.478 0.63258
## CountryHaiti 1.128e-01 1.981e-01 0.570 0.56899
## CountryHonduras -5.709e-03 1.876e-01 -0.030 0.97572
## CountryHungary -6.010e-02 2.020e-01 -0.298 0.76605
## CountryIceland -3.084e-01 2.023e-01 -1.525 0.12733
## CountryIndia 1.160e-01 2.062e-01 0.563 0.57358
## CountryIndonesia 5.622e-02 1.977e-01 0.284 0.77610
## CountryIran 3.026e-02 2.075e-01 0.146 0.88403
## CountryIraq -2.236e-01 1.978e-01 -1.131 0.25822
## CountryIreland -6.117e-02 1.964e-01 -0.312 0.75542
## CountryIsrael -2.567e-01 2.063e-01 -1.244 0.21354
## CountryItaly 3.288e-02 1.962e-01 0.168 0.86694
## CountryJamaica 1.057e-01 1.915e-01 0.552 0.58110
## CountryJapan -1.562e-01 2.051e-01 -0.762 0.44634
## CountryJordan 1.635e-01 2.075e-01 0.788 0.43059
## CountryKazakhstan -3.814e-02 2.036e-01 -0.187 0.85140
## CountryKenya 7.189e-02 2.036e-01 0.353 0.72401
## CountryKiribati -1.466e-01 2.020e-01 -0.726 0.46793
## CountryKosovo -2.232e-01 2.020e-01 -1.105 0.26919
## CountryKuwait 2.410e-02 1.996e-01 0.121 0.90388
## CountryKyrgyzstan 3.586e-01 2.010e-01 1.784 0.07446
## CountryLaos -1.179e-01 2.017e-01 -0.585 0.55869
## CountryLatvia -1.653e-01 1.970e-01 -0.839 0.40122
## CountryLebanon -9.530e-02 2.025e-01 -0.471 0.63795
## CountryLesotho 3.547e-01 2.025e-01 1.751 0.07992
## CountryLiberia 1.925e-01 2.032e-01 0.948 0.34330
## CountryLibya -5.811e-02 2.006e-01 -0.290 0.77201
## CountryLiechtenstein -3.346e-01 2.029e-01 -1.649 0.09919
## CountryLithuania -2.707e-01 1.963e-01 -1.379 0.16790
## CountryLuxembourg -2.015e-01 1.989e-01 -1.013 0.31093
## CountryMacedonia -3.071e-01 1.960e-01 -1.567 0.11708
## CountryMadagascar 2.436e-01 2.028e-01 1.201 0.22973
## CountryMalawi 1.040e-01 2.097e-01 0.496 0.62000
## CountryMalaysia 8.798e-03 1.960e-01 0.045 0.96419
## CountryMaldives -2.596e-02 1.925e-01 -0.135 0.89272
## CountryMali 1.755e-02 2.037e-01 0.086 0.93133
## CountryMalta -9.153e-02 1.936e-01 -0.473 0.63638
## CountryMarshall Islands 2.568e-01 2.057e-01 1.249 0.21177
## CountryMauritania -1.121e-01 2.010e-01 -0.558 0.57693
## CountryMauritius 8.079e-02 2.024e-01 0.399 0.68979
## CountryMexico -3.733e-02 1.936e-01 -0.193 0.84712
## CountryMoldova 3.828e-02 2.002e-01 0.191 0.84838
## CountryMonaco 3.742e-02 2.086e-01 0.179 0.85762
## CountryMongolia -1.339e-01 1.991e-01 -0.673 0.50120
## CountryMontenegro -1.794e-02 2.000e-01 -0.090 0.92855
## CountryMorocco 1.341e-01 2.051e-01 0.654 0.51316
## CountryMozambique -2.576e-02 2.018e-01 -0.128 0.89841
## CountryMyanmar -1.139e-01 1.990e-01 -0.572 0.56700
## CountryNamibia 4.303e-01 1.991e-01 2.161 0.03067
## CountryNauru 1.113e-02 2.025e-01 0.055 0.95618
## CountryNepal 2.864e-01 2.055e-01 1.393 0.16354
## CountryNetherlands -2.393e-01 1.943e-01 -1.232 0.21803
## CountryNew Zealand -1.524e-01 2.053e-01 -0.742 0.45789
## CountryNicaragua -7.772e-02 1.895e-01 -0.410 0.68171
## CountryNiger 7.651e-02 2.069e-01 0.370 0.71148
## CountryNigeria 1.623e-01 2.024e-01 0.802 0.42262
## CountryNorth Korea -5.370e-02 1.989e-01 -0.270 0.78716
## CountryNorway -1.441e-01 1.965e-01 -0.734 0.46322
## CountryOman -3.809e-02 2.040e-01 -0.187 0.85191
## CountryPakistan -7.631e-02 2.079e-01 -0.367 0.71360
## CountryPalau -2.007e-02 2.050e-01 -0.098 0.92204
## CountryPanama -2.233e-01 1.884e-01 -1.185 0.23583
## CountryPapua New Guinea 8.896e-02 2.060e-01 0.432 0.66582
## CountryPhilippines 1.891e-01 2.006e-01 0.942 0.34597
## CountryPoland 1.445e-01 2.038e-01 0.709 0.47829
## CountryPortugal -2.381e-01 2.034e-01 -1.171 0.24171
## CountryQatar -1.886e-02 2.001e-01 -0.094 0.92494
## CountryRepublic of the Congo 1.046e-01 2.074e-01 0.505 0.61388
## CountryRomania -2.981e-01 2.002e-01 -1.489 0.13648
## CountryRussia -7.115e-02 1.986e-01 -0.358 0.72020
## CountryRwanda 3.024e-01 2.037e-01 1.484 0.13775
## CountrySaint Kitts and Nevis 1.877e-01 1.861e-01 1.009 0.31314
## CountrySaint Lucia 1.496e-02 1.857e-01 0.081 0.93581
## CountrySaint Vincent and the Grenadines -8.930e-02 1.922e-01 -0.465 0.64211
## CountrySamoa -1.442e-02 1.934e-01 -0.075 0.94058
## CountrySan Marino -2.398e-01 1.975e-01 -1.214 0.22484
## CountrySao Tome and Principe 1.649e-02 1.987e-01 0.083 0.93386
## CountrySaudi Arabia 7.664e-02 1.958e-01 0.391 0.69552
## CountrySenegal 1.306e-01 2.026e-01 0.644 0.51934
## CountrySerbia -5.147e-01 2.013e-01 -2.557 0.01056
## CountrySeychelles 3.801e-02 2.009e-01 0.189 0.84996
## CountrySierra Leone 1.626e-01 2.024e-01 0.803 0.42175
## CountrySingapore 5.478e-02 1.924e-01 0.285 0.77580
## CountrySlovakia -9.291e-02 1.991e-01 -0.467 0.64078
## CountrySlovenia -2.897e-01 2.027e-01 -1.429 0.15302
## CountrySolomon Islands 7.083e-02 2.025e-01 0.350 0.72655
## CountrySomalia 3.343e-02 1.998e-01 0.167 0.86713
## CountrySouth Africa -9.526e-02 2.012e-01 -0.473 0.63592
## CountrySouth Korea 1.429e-01 1.967e-01 0.726 0.46761
## CountrySouth Sudan -3.754e-02 2.011e-01 -0.187 0.85195
## CountrySpain -3.600e-01 2.028e-01 -1.776 0.07581
## CountrySri Lanka 4.060e-02 1.962e-01 0.207 0.83609
## CountrySudan 9.619e-02 2.000e-01 0.481 0.63054
## CountrySwaziland 5.879e-02 2.086e-01 0.282 0.77804
## CountrySweden -9.805e-02 2.041e-01 -0.480 0.63100
## CountrySwitzerland -2.209e-01 2.034e-01 -1.086 0.27735
## CountrySyria -1.902e-01 2.075e-01 -0.917 0.35921
## CountryTaiwan -1.159e-01 1.959e-01 -0.592 0.55391
## CountryTajikistan 8.166e-02 1.977e-01 0.413 0.67957
## CountryTanzania -2.997e-01 2.100e-01 -1.427 0.15351
## CountryThailand 7.130e-02 1.932e-01 0.369 0.71210
## CountryThe Bahamas 3.842e-01 1.899e-01 2.023 0.04308
## CountryThe Gambia -8.433e-02 2.015e-01 -0.418 0.67560
## CountryTogo 1.355e-01 2.082e-01 0.651 0.51524
## CountryTonga -1.714e-01 2.043e-01 -0.839 0.40142
## CountryTrinidad and Tobago NA NA NA NA
## CountryTunisia -1.947e-01 2.012e-01 -0.968 0.33321
## CountryTurkey -2.739e-01 2.100e-01 -1.305 0.19197
## CountryTurkmenistan -9.191e-02 1.980e-01 -0.464 0.64247
## CountryTuvalu 4.688e-02 2.023e-01 0.232 0.81672
## CountryUganda 2.067e-01 2.018e-01 1.024 0.30568
## CountryUkraine -3.154e-01 1.958e-01 -1.611 0.10722
## CountryUnited Arab Emirates -7.991e-02 1.984e-01 -0.403 0.68706
## CountryUnited Kingdom -1.020e-01 2.009e-01 -0.508 0.61142
## CountryUnited States of America NA NA NA NA
## CountryUzbekistan 5.642e-02 1.953e-01 0.289 0.77271
## CountryVanuatu NA NA NA NA
## CountryVatican City NA NA NA NA
## CountryVietnam NA NA NA NA
## CountryYemen 2.538e-01 2.126e-01 1.194 0.23257
## CountryZambia 9.347e-02 2.051e-01 0.456 0.64853
## CountryZimbabwe NA NA NA NA
## Item.TypeBeverages -3.091e-02 5.893e-02 -0.524 0.60000
## Item.TypeCereal -5.367e-02 5.149e-02 -1.042 0.29726
## Item.TypeClothes -8.740e-02 5.289e-02 -1.652 0.09848
## Item.TypeCosmetics -9.111e-02 5.917e-02 -1.540 0.12364
## Item.TypeFruits -5.489e-02 6.107e-02 -0.899 0.36882
## Item.TypeHousehold -4.888e-02 6.193e-02 -0.789 0.42990
## Item.TypeMeat -8.891e-02 6.876e-02 -1.293 0.19595
## Item.TypeOffice Supplies -2.834e-02 6.752e-02 -0.420 0.67468
## Item.TypePersonal Care -4.829e-02 5.755e-02 -0.839 0.40144
## Item.TypeSnacks -1.933e-02 5.331e-02 -0.363 0.71691
## Item.TypeVegetables -7.770e-03 5.247e-02 -0.148 0.88227
## Order.PriorityH 1.668e-02 2.938e-02 0.568 0.57031
## Order.PriorityL 4.043e-05 2.933e-02 0.001 0.99890
## Order.PriorityM -3.654e-02 2.934e-02 -1.245 0.21303
## Units.Sold 4.020e-06 6.451e-06 0.623 0.53312
## Unit.Price NA NA NA NA
## Unit.Cost NA NA NA NA
## Total.Cost -1.228e-08 2.969e-08 -0.414 0.67917
## Total.Profit 5.080e-08 9.919e-08 0.512 0.60854
## Total.Revenue NA NA NA NA
##
## (Intercept)
## RegionAustralia and Oceania
## RegionCentral America and the Caribbean
## RegionEurope
## RegionMiddle East and North Africa
## RegionNorth America
## RegionSub-Saharan Africa
## CountryAlbania
## CountryAlgeria
## CountryAndorra
## CountryAngola
## CountryAntigua and Barbuda
## CountryArmenia
## CountryAustralia
## CountryAustria
## CountryAzerbaijan
## CountryBahrain
## CountryBangladesh
## CountryBarbados
## CountryBelarus .
## CountryBelgium
## CountryBelize
## CountryBenin
## CountryBhutan
## CountryBosnia and Herzegovina *
## CountryBotswana
## CountryBrunei
## CountryBulgaria
## CountryBurkina Faso
## CountryBurundi
## CountryCambodia
## CountryCameroon
## CountryCanada
## CountryCape Verde
## CountryCentral African Republic **
## CountryChad
## CountryChina
## CountryComoros
## CountryCosta Rica
## CountryCote d'Ivoire
## CountryCroatia
## CountryCuba
## CountryCyprus
## CountryCzech Republic
## CountryDemocratic Republic of the Congo
## CountryDenmark
## CountryDjibouti
## CountryDominica
## CountryDominican Republic
## CountryEast Timor
## CountryEgypt
## CountryEl Salvador
## CountryEquatorial Guinea
## CountryEritrea
## CountryEstonia
## CountryEthiopia
## CountryFederated States of Micronesia
## CountryFiji
## CountryFinland
## CountryFrance
## CountryGabon
## CountryGeorgia
## CountryGermany
## CountryGhana
## CountryGreece
## CountryGreenland
## CountryGrenada
## CountryGuatemala
## CountryGuinea *
## CountryGuinea-Bissau
## CountryHaiti
## CountryHonduras
## CountryHungary
## CountryIceland
## CountryIndia
## CountryIndonesia
## CountryIran
## CountryIraq
## CountryIreland
## CountryIsrael
## CountryItaly
## CountryJamaica
## CountryJapan
## CountryJordan
## CountryKazakhstan
## CountryKenya
## CountryKiribati
## CountryKosovo
## CountryKuwait
## CountryKyrgyzstan .
## CountryLaos
## CountryLatvia
## CountryLebanon
## CountryLesotho .
## CountryLiberia
## CountryLibya
## CountryLiechtenstein .
## CountryLithuania
## CountryLuxembourg
## CountryMacedonia
## CountryMadagascar
## CountryMalawi
## CountryMalaysia
## CountryMaldives
## CountryMali
## CountryMalta
## CountryMarshall Islands
## CountryMauritania
## CountryMauritius
## CountryMexico
## CountryMoldova
## CountryMonaco
## CountryMongolia
## CountryMontenegro
## CountryMorocco
## CountryMozambique
## CountryMyanmar
## CountryNamibia *
## CountryNauru
## CountryNepal
## CountryNetherlands
## CountryNew Zealand
## CountryNicaragua
## CountryNiger
## CountryNigeria
## CountryNorth Korea
## CountryNorway
## CountryOman
## CountryPakistan
## CountryPalau
## CountryPanama
## CountryPapua New Guinea
## CountryPhilippines
## CountryPoland
## CountryPortugal
## CountryQatar
## CountryRepublic of the Congo
## CountryRomania
## CountryRussia
## CountryRwanda
## CountrySaint Kitts and Nevis
## CountrySaint Lucia
## CountrySaint Vincent and the Grenadines
## CountrySamoa
## CountrySan Marino
## CountrySao Tome and Principe
## CountrySaudi Arabia
## CountrySenegal
## CountrySerbia *
## CountrySeychelles
## CountrySierra Leone
## CountrySingapore
## CountrySlovakia
## CountrySlovenia
## CountrySolomon Islands
## CountrySomalia
## CountrySouth Africa
## CountrySouth Korea
## CountrySouth Sudan
## CountrySpain .
## CountrySri Lanka
## CountrySudan
## CountrySwaziland
## CountrySweden
## CountrySwitzerland
## CountrySyria
## CountryTaiwan
## CountryTajikistan
## CountryTanzania
## CountryThailand
## CountryThe Bahamas *
## CountryThe Gambia
## CountryTogo
## CountryTonga
## CountryTrinidad and Tobago
## CountryTunisia
## CountryTurkey
## CountryTurkmenistan
## CountryTuvalu
## CountryUganda
## CountryUkraine
## CountryUnited Arab Emirates
## CountryUnited Kingdom
## CountryUnited States of America
## CountryUzbekistan
## CountryVanuatu
## CountryVatican City
## CountryVietnam
## CountryYemen
## CountryZambia
## CountryZimbabwe
## Item.TypeBeverages
## Item.TypeCereal
## Item.TypeClothes .
## Item.TypeCosmetics
## Item.TypeFruits
## Item.TypeHousehold
## Item.TypeMeat
## Item.TypeOffice Supplies
## Item.TypePersonal Care
## Item.TypeSnacks
## Item.TypeVegetables
## Order.PriorityH
## Order.PriorityL
## Order.PriorityM
## Units.Sold
## Unit.Price
## Unit.Cost
## Total.Cost
## Total.Profit
## Total.Revenue
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 51986 on 37499 degrees of freedom
## Residual deviance: 51778 on 37298 degrees of freedom
## AIC: 52182
##
## Number of Fisher Scoring iterations: 3
exp(coef(df_100_small_mod)["RegionAustralia and Oceania"])
## RegionAustralia and Oceania
## 0.9999954
exp(coef(df_100_small_mod)["RegionCentral America and the Caribbean"])
## RegionCentral America and the Caribbean
## 1
exp(coef(df_100_small_mod)["RegionEurope"])
## RegionEurope
## 0.9999998
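As a rough guide to reading these exponentiated coefficients: in a logistic regression, exp() of a coefficient is an odds ratio relative to the reference level, and plogis() converts log-odds into a probability. The sketch below is only an illustration. It uses two estimates copied from the 50,000-record model summary above, and the variable names are hypothetical.

```r
# Estimates copied from summary(df_50000_large_mod); the names are illustrative.
b_intercept <- 2.374e-02  # (Intercept): log-odds at the reference levels
b_car <- 5.488e-01        # CountryCentral African Republic

# Odds multiplier relative to the reference country, other terms held fixed
odds_ratio_car <- exp(b_car)

# Probability of "Online" when every factor is at its reference level
# and the numeric predictors are zero
p_reference <- plogis(b_intercept)

print(odds_ratio_car)
print(p_reference)
```

An odds ratio close to 1, as in the three calls above, means the term barely shifts the odds of an online sale.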
The test set for the 100-record data is very small (25 records), but we can still generate predictions for the 12,500-record test set of the larger data set. The results give the predicted probability that Sales.Channel = “Online” for each observation. For example, record 3 (Russia, in Europe) has a predicted probability of about 50% of using the online sales channel, while record 21 (Antigua & Barbuda) has a predicted probability of about 46%.
Let’s compare our model’s predicted values for Sales.Channel with the actual values in the test data set. To do this, we create a confusion matrix, which cross-tabulates the predicted and actual values. Using the base R table() function, we can create a simple confusion matrix. This tells us that our model has a prediction accuracy of about 49.1 percent, which is no better than guessing, since the two sales channels are almost evenly split in the test set.
filter(df_100_small_test, Country %in% c("Bangladesh", "Belize", "Brunei", "Costa Rica", "Iran", "Kyrgyzstan", "Lebanon", "Lithuania", "Malaysia", "Mongolia", "Mozambique", "New Zealand", "Nicaragua", "Pakistan", "Saudi Arabia", "Slovakia", "United Kingdom"))
## Region Country Item.Type
## 1 Asia Kyrgyzstan Vegetables
## 2 Asia Bangladesh Clothes
## 3 Asia Mongolia Personal Care
## 4 Australia and Oceania New Zealand Fruits
## 5 Central America and the Caribbean Costa Rica Personal Care
## 6 Asia Brunei Office Supplies
## 7 Europe Slovakia Vegetables
## 8 Middle East and North Africa Saudi Arabia Cereal
## 9 Europe United Kingdom Household
## 10 Central America and the Caribbean Belize Clothes
## 11 Europe Lithuania Office Supplies
## 12 Middle East and North Africa Pakistan Cosmetics
## 13 Middle East and North Africa Lebanon Clothes
## 14 Middle East and North Africa Iran Cosmetics
## 15 Central America and the Caribbean Nicaragua Beverages
## 16 Asia Malaysia Fruits
## 17 Sub-Saharan Africa Mozambique Household
## Sales.Channel Order.Priority Order.Date Order.ID Ship.Date Units.Sold
## 1 Online H 6/24/2011 814711606 7/12/2011 124
## 2 Online L 1/13/2017 187310731 3/1/2017 8263
## 3 Offline C 2/19/2014 832401311 2/23/2014 4901
## 4 Online H 9/8/2014 142278373 10/4/2014 2187
## 5 Offline L 5/8/2017 456767165 5/21/2017 6409
## 6 Online L 4/1/2012 320009267 5/8/2012 6708
## 7 Online H 10/6/2012 759224212 11/10/2012 171
## 8 Online M 3/25/2013 844530045 3/28/2013 4063
## 9 Online L 1/5/2012 955357205 2/14/2012 282
## 10 Offline M 7/25/2016 807025039 9/7/2016 5498
## 11 Offline H 10/24/2010 166460740 11/17/2010 8287
## 12 Offline L 7/5/2013 231145322 8/16/2013 9892
## 13 Online L 9/18/2012 663110148 10/8/2012 7884
## 14 Online H 11/15/2016 286959302 12/8/2016 6489
## 15 Offline C 2/8/2011 963392674 3/21/2011 8156
## 16 Offline L 11/11/2011 810711038 12/28/2011 6267
## 17 Offline L 2/10/2012 665095412 2/15/2012 5367
## Unit.Price Unit.Cost Total.Revenue Total.Cost Total.Profit Order Date
## 1 154.06 90.93 19103.44 11275.32 7828.12 2011-06-24
## 2 109.28 35.84 902980.64 296145.92 606834.72 2017-01-13
## 3 81.73 56.67 400558.73 277739.67 122819.06 2014-02-19
## 4 9.33 6.92 20404.71 15134.04 5270.67 2014-09-08
## 5 81.73 56.67 523807.57 363198.03 160609.54 2017-05-08
## 6 651.21 524.96 4368316.68 3521431.68 846885.00 2012-04-01
## 7 154.06 90.93 26344.26 15549.03 10795.23 2012-10-06
## 8 205.70 117.11 835759.10 475817.93 359941.17 2013-03-25
## 9 668.27 502.54 188452.14 141716.28 46735.86 2012-01-05
## 10 109.28 35.84 600821.44 197048.32 403773.12 2016-07-25
## 11 651.21 524.96 5396577.27 4350343.52 1046233.75 2010-10-24
## 12 437.20 263.33 4324782.40 2604860.36 1719922.04 2013-07-05
## 13 109.28 35.84 861563.52 282562.56 579000.96 2012-09-18
## 14 437.20 263.33 2836990.80 1708748.37 1128242.43 2016-11-15
## 15 47.45 31.79 387002.20 259279.24 127722.96 2011-02-08
## 16 9.33 6.92 58471.11 43367.64 15103.47 2011-11-11
## 17 668.27 502.54 3586605.09 2697132.18 889472.91 2012-02-10
## Ship Date newcode
## 1 2011-07-12 1
## 2 2017-03-01 1
## 3 2014-02-23 0
## 4 2014-10-04 1
## 5 2017-05-21 0
## 6 2012-05-08 1
## 7 2012-11-10 1
## 8 2013-03-28 1
## 9 2012-02-14 1
## 10 2016-09-07 0
## 11 2010-11-17 0
## 12 2013-08-16 0
## 13 2012-10-08 1
## 14 2016-12-08 1
## 15 2011-03-21 0
## 16 2011-12-28 0
## 17 2012-02-15 0
df_100_small_test1 <- df_100_small_test %>%
  filter(!Country %in% c('Bangladesh', 'Belize', 'Brunei', 'Costa Rica', 'Iran', 'Kyrgyzstan', 'Lebanon', 'Lithuania', 'Malaysia', 'Mongolia', 'Mozambique', 'New Zealand', 'Nicaragua', 'Pakistan', 'Saudi Arabia', 'Slovakia', 'United Kingdom'))
#df_100_small_mod_pred1 <- predict(df_100_small_mod, df_100_small_test1, type = 'response')
#head(df_100_small_mod_pred1)
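The country filter above was presumably needed because predict() cannot score factor levels that never appeared in the training data. Rather than typing the list by hand, the unseen levels could be found programmatically. The following is a minimal sketch on made-up data; the `train` and `test` frames and their values are hypothetical, not taken from the sales files.

```r
# Hypothetical toy data standing in for the train/test split
train <- data.frame(country = c("Chad", "Fiji", "Chad", "Peru"),
                    y = c(1, 0, 0, 1),
                    stringsAsFactors = FALSE)
test <- data.frame(country = c("Chad", "Iran", "Peru", "Laos"),
                   stringsAsFactors = FALSE)

# Levels present in the test set but absent from the training set
unseen <- setdiff(unique(test$country), unique(train$country))

# Keep only the test rows the model would be able to score
test_seen <- test[!test$country %in% unseen, , drop = FALSE]

print(unseen)
print(test_seen)
```

The same check would have to be repeated for every categorical predictor in the model formula, not only the country column.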
df_50000_large_mod_pred1 <- predict(df_50000_large_mod, df_50000_large_test, type = 'response')
head(df_50000_large_mod_pred1)
## 3 4 15 19 21 28
## 0.5043303 0.5330945 0.4523842 0.4962811 0.4600560 0.4679573
df_50000_large_mod_pred2 <- ifelse(df_50000_large_mod_pred1 >= 0.5, 1, 0)
head(df_50000_large_mod_pred2)
## 3 4 15 19 21 28
## 1 1 0 0 0 0
df_50000_large_mod_pred1_table <- table(df_50000_large_test$Sales.Channel, df_50000_large_mod_pred2)
df_50000_large_mod_pred1_table
## df_50000_large_mod_pred2
## 0 1
## Offline 3050 3209
## Online 3158 3083
sum(diag(df_50000_large_mod_pred1_table)) / nrow(df_50000_large_test)
## [1] 0.49064
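To put the accuracy figure in context, the sketch below rebuilds the confusion matrix from the four counts printed above and compares accuracy with a majority-class baseline. Treating “Online” as the positive class and the object names used here are assumptions made for illustration.

```r
# Counts copied from the confusion matrix above (rows = actual, columns = predicted)
cm <- matrix(c(3050, 3158, 3209, 3083), nrow = 2,
             dimnames = list(actual = c("Offline", "Online"),
                             predicted = c("0", "1")))

accuracy <- sum(diag(cm)) / sum(cm)

# Share of actual Online orders predicted as Online (1)
sensitivity <- cm["Online", "1"] / sum(cm["Online", ])

# Share of actual Offline orders predicted as Offline (0)
specificity <- cm["Offline", "0"] / sum(cm["Offline", ])

# Accuracy from always predicting the more common actual class
baseline <- max(rowSums(cm)) / sum(cm)

print(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, baseline = baseline))
```

With these counts the model's accuracy falls slightly below the majority-class baseline, which suggests the predictors carry little signal about the sales channel.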
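I tried improving the model further but ran into multicollinearity that I could not fully resolve. Unit price and unit cost, as well as total revenue and total cost, were correlated, but removing Unit.Price, Unit.Cost and Total.Revenue from the formula did not resolve the issue: the refitted model still reports 6 coefficients not defined because of singularities, all of them Country levels. This is consistent with Country being nested within Region (each country belongs to exactly one region), so the two factors carry redundant information.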
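One way to see where undefined coefficients come from is to fit a small model with a nested pair of factors. The sketch below uses simulated data with hypothetical `region` and `country` columns, so it only mirrors the structure of the sales data rather than reproducing it.

```r
set.seed(1)

# Simulated data: each country belongs to exactly one region (nested factors)
toy <- data.frame(
  region = factor(rep(c("A", "B"), each = 50)),
  country = factor(rep(c("a1", "a2", "b1", "b2"), each = 25)),
  y = rbinom(100, 1, 0.5)
)

# Region and country together are linearly dependent
fit_both <- glm(y ~ region + country, data = toy, family = binomial)
aliased <- names(coef(fit_both))[is.na(coef(fit_both))]
print(aliased)

# Keeping only the finer-grained factor removes the singularity
fit_country <- glm(y ~ country, data = toy, family = binomial)
print(anyNA(coef(fit_country)))
```

Under these assumptions, keeping only one of the two factors in the formula would be a natural next step for the sales model as well.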
library(stats)
library(corrplot)
## corrplot 0.92 loaded
df_50000_large_train %>%
  keep(is.numeric) %>%
  cor() %>%
  corrplot()
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
df_50000_large_mod1 <- glm(newcode ~ Region + Country + Item.Type + Order.Priority + Units.Sold + Total.Cost + Total.Profit, data = df_50000_large_train, family=binomial)
summary(df_50000_large_mod1)
##
## Call:
## glm(formula = newcode ~ Region + Country + Item.Type + Order.Priority +
## Units.Sold + Total.Cost + Total.Profit, family = binomial,
## data = df_50000_large_train)
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.374e-02 1.464e-01 0.162 0.87118
## RegionAustralia and Oceania 1.535e-02 2.013e-01 0.076 0.93920
## RegionCentral America and the Caribbean -7.638e-02 1.892e-01 -0.404 0.68647
## RegionEurope 1.417e-01 1.984e-01 0.715 0.47487
## RegionMiddle East and North Africa 7.617e-02 2.010e-01 0.379 0.70478
## RegionNorth America -9.866e-02 1.968e-01 -0.501 0.61613
## RegionSub-Saharan Africa -1.229e-01 2.028e-01 -0.606 0.54449
## CountryAlbania -2.136e-01 1.982e-01 -1.078 0.28122
## CountryAlgeria -2.131e-01 2.018e-01 -1.056 0.29088
## CountryAndorra 1.789e-02 1.996e-01 0.090 0.92858
## CountryAngola 1.572e-01 2.011e-01 0.782 0.43421
## CountryAntigua and Barbuda -3.808e-02 1.859e-01 -0.205 0.83766
## CountryArmenia 1.403e-02 2.063e-01 0.068 0.94578
## CountryAustralia 8.874e-03 1.997e-01 0.044 0.96455
## CountryAustria -5.898e-02 1.968e-01 -0.300 0.76443
## CountryAzerbaijan 2.470e-01 2.021e-01 1.222 0.22155
## CountryBahrain -1.895e-03 1.988e-01 -0.010 0.99240
## CountryBangladesh 8.350e-02 1.943e-01 0.430 0.66732
## CountryBarbados 1.415e-02 1.918e-01 0.074 0.94119
## CountryBelarus -3.705e-01 2.040e-01 -1.817 0.06929
## CountryBelgium -1.249e-01 1.970e-01 -0.634 0.52600
## CountryBelize 4.445e-02 1.927e-01 0.231 0.81755
## CountryBenin 2.852e-01 2.061e-01 1.384 0.16646
## CountryBhutan 3.837e-02 2.024e-01 0.190 0.84962
## CountryBosnia and Herzegovina -4.190e-01 2.048e-01 -2.045 0.04081
## CountryBotswana 8.647e-03 2.030e-01 0.043 0.96603
## CountryBrunei 2.173e-01 2.018e-01 1.077 0.28156
## CountryBulgaria -2.880e-01 1.998e-01 -1.441 0.14949
## CountryBurkina Faso 1.669e-01 2.041e-01 0.818 0.41329
## CountryBurundi 5.060e-02 2.046e-01 0.247 0.80466
## CountryCambodia -2.868e-02 1.979e-01 -0.145 0.88478
## CountryCameroon -2.309e-02 2.055e-01 -0.112 0.91055
## CountryCanada 1.184e-01 1.998e-01 0.592 0.55358
## CountryCape Verde -9.923e-02 1.989e-01 -0.499 0.61788
## CountryCentral African Republic 5.488e-01 2.023e-01 2.713 0.00667
## CountryChad 1.067e-01 2.033e-01 0.525 0.59971
## CountryChina -6.176e-02 1.922e-01 -0.321 0.74793
## CountryComoros 2.778e-02 2.089e-01 0.133 0.89423
## CountryCosta Rica 6.503e-02 1.968e-01 0.330 0.74108
## CountryCote d'Ivoire 3.310e-01 2.108e-01 1.570 0.11645
## CountryCroatia -1.324e-01 2.006e-01 -0.660 0.50912
## CountryCuba 2.854e-01 1.929e-01 1.479 0.13901
## CountryCyprus -5.920e-02 1.968e-01 -0.301 0.76360
## CountryCzech Republic -1.109e-01 2.057e-01 -0.539 0.58960
## CountryDemocratic Republic of the Congo 2.654e-01 1.980e-01 1.340 0.18012
## CountryDenmark 7.771e-02 1.978e-01 0.393 0.69439
## CountryDjibouti -6.464e-02 2.015e-01 -0.321 0.74833
## CountryDominica 1.515e-01 1.944e-01 0.779 0.43578
## CountryDominican Republic 3.748e-02 1.868e-01 0.201 0.84098
## CountryEast Timor 3.410e-02 2.015e-01 0.169 0.86564
## CountryEgypt 6.554e-02 2.001e-01 0.327 0.74330
## CountryEl Salvador 9.484e-02 1.958e-01 0.484 0.62818
## CountryEquatorial Guinea -6.020e-02 2.047e-01 -0.294 0.76869
## CountryEritrea 2.777e-01 2.064e-01 1.346 0.17833
## CountryEstonia -1.138e-01 1.996e-01 -0.570 0.56853
## CountryEthiopia 2.783e-01 2.058e-01 1.352 0.17633
## CountryFederated States of Micronesia 6.232e-02 2.062e-01 0.302 0.76246
## CountryFiji 1.460e-01 2.046e-01 0.714 0.47539
## CountryFinland -1.780e-01 1.924e-01 -0.925 0.35488
## CountryFrance 2.870e-02 1.927e-01 0.149 0.88161
## CountryGabon 2.643e-01 1.992e-01 1.327 0.18455
## CountryGeorgia -1.176e-01 2.024e-01 -0.581 0.56135
## CountryGermany -1.508e-01 2.001e-01 -0.754 0.45112
## CountryGhana -5.567e-02 2.059e-01 -0.270 0.78689
## CountryGreece -7.896e-02 2.025e-01 -0.390 0.69661
## CountryGreenland -6.347e-03 1.957e-01 -0.032 0.97413
## CountryGrenada 2.254e-01 1.851e-01 1.218 0.22325
## CountryGuatemala -7.898e-02 1.905e-01 -0.415 0.67849
## CountryGuinea 4.396e-01 1.979e-01 2.221 0.02635
## CountryGuinea-Bissau 9.889e-02 2.069e-01 0.478 0.63258
## CountryHaiti 1.128e-01 1.981e-01 0.570 0.56899
## CountryHonduras -5.709e-03 1.876e-01 -0.030 0.97572
## CountryHungary -6.010e-02 2.020e-01 -0.298 0.76605
## CountryIceland -3.084e-01 2.023e-01 -1.525 0.12733
## CountryIndia 1.160e-01 2.062e-01 0.563 0.57358
## CountryIndonesia 5.622e-02 1.977e-01 0.284 0.77610
## CountryIran 3.026e-02 2.075e-01 0.146 0.88403
## CountryIraq -2.236e-01 1.978e-01 -1.131 0.25822
## CountryIreland -6.117e-02 1.964e-01 -0.312 0.75542
## CountryIsrael -2.567e-01 2.063e-01 -1.244 0.21354
## CountryItaly 3.288e-02 1.962e-01 0.168 0.86694
## CountryJamaica 1.057e-01 1.915e-01 0.552 0.58110
## CountryJapan -1.562e-01 2.051e-01 -0.762 0.44634
## CountryJordan 1.635e-01 2.075e-01 0.788 0.43059
## CountryKazakhstan -3.814e-02 2.036e-01 -0.187 0.85140
## CountryKenya 7.189e-02 2.036e-01 0.353 0.72401
## CountryKiribati -1.466e-01 2.020e-01 -0.726 0.46793
## CountryKosovo -2.232e-01 2.020e-01 -1.105 0.26919
## CountryKuwait 2.410e-02 1.996e-01 0.121 0.90388
## CountryKyrgyzstan 3.586e-01 2.010e-01 1.784 0.07446
## CountryLaos -1.179e-01 2.017e-01 -0.585 0.55869
## CountryLatvia -1.653e-01 1.970e-01 -0.839 0.40122
## CountryLebanon -9.530e-02 2.025e-01 -0.471 0.63795
## CountryLesotho 3.547e-01 2.025e-01 1.751 0.07992
## CountryLiberia 1.925e-01 2.032e-01 0.948 0.34330
## CountryLibya -5.811e-02 2.006e-01 -0.290 0.77201
## CountryLiechtenstein -3.346e-01 2.029e-01 -1.649 0.09919
## CountryLithuania -2.707e-01 1.963e-01 -1.379 0.16790
## CountryLuxembourg -2.015e-01 1.989e-01 -1.013 0.31093
## CountryMacedonia -3.071e-01 1.960e-01 -1.567 0.11708
## CountryMadagascar 2.436e-01 2.028e-01 1.201 0.22973
## CountryMalawi 1.040e-01 2.097e-01 0.496 0.62000
## CountryMalaysia 8.798e-03 1.960e-01 0.045 0.96419
## CountryMaldives -2.596e-02 1.925e-01 -0.135 0.89272
## CountryMali 1.755e-02 2.037e-01 0.086 0.93133
## CountryMalta -9.153e-02 1.936e-01 -0.473 0.63638
## CountryMarshall Islands 2.568e-01 2.057e-01 1.249 0.21177
## CountryMauritania -1.121e-01 2.010e-01 -0.558 0.57693
## CountryMauritius 8.079e-02 2.024e-01 0.399 0.68979
## CountryMexico -3.733e-02 1.936e-01 -0.193 0.84712
## CountryMoldova 3.828e-02 2.002e-01 0.191 0.84838
## CountryMonaco 3.742e-02 2.086e-01 0.179 0.85762
## CountryMongolia -1.339e-01 1.991e-01 -0.673 0.50120
## CountryMontenegro -1.794e-02 2.000e-01 -0.090 0.92855
## CountryMorocco 1.341e-01 2.051e-01 0.654 0.51316
## CountryMozambique -2.576e-02 2.018e-01 -0.128 0.89841
## CountryMyanmar -1.139e-01 1.990e-01 -0.572 0.56700
## CountryNamibia 4.303e-01 1.991e-01 2.161 0.03067
## CountryNauru 1.113e-02 2.025e-01 0.055 0.95618
## CountryNepal 2.864e-01 2.055e-01 1.393 0.16354
## CountryNetherlands -2.393e-01 1.943e-01 -1.232 0.21803
## CountryNew Zealand -1.524e-01 2.053e-01 -0.742 0.45789
## CountryNicaragua -7.772e-02 1.895e-01 -0.410 0.68171
## CountryNiger 7.651e-02 2.069e-01 0.370 0.71148
## CountryNigeria 1.623e-01 2.024e-01 0.802 0.42262
## CountryNorth Korea -5.370e-02 1.989e-01 -0.270 0.78716
## CountryNorway -1.441e-01 1.965e-01 -0.734 0.46322
## CountryOman -3.809e-02 2.040e-01 -0.187 0.85191
## CountryPakistan -7.631e-02 2.079e-01 -0.367 0.71360
## CountryPalau -2.007e-02 2.050e-01 -0.098 0.92204
## CountryPanama -2.233e-01 1.884e-01 -1.185 0.23583
## CountryPapua New Guinea 8.896e-02 2.060e-01 0.432 0.66582
## CountryPhilippines 1.891e-01 2.006e-01 0.942 0.34597
## CountryPoland 1.445e-01 2.038e-01 0.709 0.47829
## CountryPortugal -2.381e-01 2.034e-01 -1.171 0.24171
## CountryQatar -1.886e-02 2.001e-01 -0.094 0.92494
## CountryRepublic of the Congo 1.046e-01 2.074e-01 0.505 0.61388
## CountryRomania -2.981e-01 2.002e-01 -1.489 0.13648
## CountryRussia -7.115e-02 1.986e-01 -0.358 0.72020
## CountryRwanda 3.024e-01 2.037e-01 1.484 0.13775
## CountrySaint Kitts and Nevis 1.877e-01 1.861e-01 1.009 0.31314
## CountrySaint Lucia 1.496e-02 1.857e-01 0.081 0.93581
## CountrySaint Vincent and the Grenadines -8.930e-02 1.922e-01 -0.465 0.64211
## CountrySamoa -1.442e-02 1.934e-01 -0.075 0.94058
## CountrySan Marino -2.398e-01 1.975e-01 -1.214 0.22484
## CountrySao Tome and Principe 1.649e-02 1.987e-01 0.083 0.93386
## CountrySaudi Arabia 7.664e-02 1.958e-01 0.391 0.69552
## CountrySenegal 1.306e-01 2.026e-01 0.644 0.51934
## CountrySerbia -5.147e-01 2.013e-01 -2.557 0.01056
## CountrySeychelles 3.801e-02 2.009e-01 0.189 0.84996
## CountrySierra Leone 1.626e-01 2.024e-01 0.803 0.42175
## CountrySingapore 5.478e-02 1.924e-01 0.285 0.77580
## CountrySlovakia -9.291e-02 1.991e-01 -0.467 0.64078
## CountrySlovenia -2.897e-01 2.027e-01 -1.429 0.15302
## CountrySolomon Islands 7.083e-02 2.025e-01 0.350 0.72655
## CountrySomalia 3.343e-02 1.998e-01 0.167 0.86713
## CountrySouth Africa -9.526e-02 2.012e-01 -0.473 0.63592
## CountrySouth Korea 1.429e-01 1.967e-01 0.726 0.46761
## CountrySouth Sudan -3.754e-02 2.011e-01 -0.187 0.85195
## CountrySpain -3.600e-01 2.028e-01 -1.776 0.07581
## CountrySri Lanka 4.060e-02 1.962e-01 0.207 0.83609
## CountrySudan 9.619e-02 2.000e-01 0.481 0.63054
## CountrySwaziland 5.879e-02 2.086e-01 0.282 0.77804
## CountrySweden -9.805e-02 2.041e-01 -0.480 0.63100
## CountrySwitzerland -2.209e-01 2.034e-01 -1.086 0.27735
## CountrySyria -1.902e-01 2.075e-01 -0.917 0.35921
## CountryTaiwan -1.159e-01 1.959e-01 -0.592 0.55391
## CountryTajikistan 8.166e-02 1.977e-01 0.413 0.67957
## CountryTanzania -2.997e-01 2.100e-01 -1.427 0.15351
## CountryThailand 7.130e-02 1.932e-01 0.369 0.71210
## CountryThe Bahamas 3.842e-01 1.899e-01 2.023 0.04308
## CountryThe Gambia -8.433e-02 2.015e-01 -0.418 0.67560
## CountryTogo 1.355e-01 2.082e-01 0.651 0.51524
## CountryTonga -1.714e-01 2.043e-01 -0.839 0.40142
## CountryTrinidad and Tobago NA NA NA NA
## CountryTunisia -1.947e-01 2.012e-01 -0.968 0.33321
## CountryTurkey -2.739e-01 2.100e-01 -1.305 0.19197
## CountryTurkmenistan -9.191e-02 1.980e-01 -0.464 0.64247
## CountryTuvalu 4.688e-02 2.023e-01 0.232 0.81672
## CountryUganda 2.067e-01 2.018e-01 1.024 0.30568
## CountryUkraine -3.154e-01 1.958e-01 -1.611 0.10722
## CountryUnited Arab Emirates -7.991e-02 1.984e-01 -0.403 0.68706
## CountryUnited Kingdom -1.020e-01 2.009e-01 -0.508 0.61142
## CountryUnited States of America NA NA NA NA
## CountryUzbekistan 5.642e-02 1.953e-01 0.289 0.77271
## CountryVanuatu NA NA NA NA
## CountryVatican City NA NA NA NA
## CountryVietnam NA NA NA NA
## CountryYemen 2.538e-01 2.126e-01 1.194 0.23257
## CountryZambia 9.347e-02 2.051e-01 0.456 0.64853
## CountryZimbabwe NA NA NA NA
## Item.TypeBeverages -3.091e-02 5.893e-02 -0.524 0.60000
## Item.TypeCereal -5.367e-02 5.149e-02 -1.042 0.29726
## Item.TypeClothes -8.740e-02 5.289e-02 -1.652 0.09848
## Item.TypeCosmetics -9.111e-02 5.917e-02 -1.540 0.12364
## Item.TypeFruits -5.489e-02 6.107e-02 -0.899 0.36882
## Item.TypeHousehold -4.888e-02 6.193e-02 -0.789 0.42990
## Item.TypeMeat -8.891e-02 6.876e-02 -1.293 0.19595
## Item.TypeOffice Supplies -2.834e-02 6.752e-02 -0.420 0.67468
## Item.TypePersonal Care -4.829e-02 5.755e-02 -0.839 0.40144
## Item.TypeSnacks -1.933e-02 5.331e-02 -0.363 0.71691
## Item.TypeVegetables -7.770e-03 5.247e-02 -0.148 0.88227
## Order.PriorityH 1.668e-02 2.938e-02 0.568 0.57031
## Order.PriorityL 4.043e-05 2.933e-02 0.001 0.99890
## Order.PriorityM -3.654e-02 2.934e-02 -1.245 0.21303
## Units.Sold 4.020e-06 6.451e-06 0.623 0.53312
## Total.Cost -1.228e-08 2.969e-08 -0.414 0.67917
## Total.Profit 5.080e-08 9.919e-08 0.512 0.60854
##
## Significance flags (wrapped last column of the coefficient table); terms not listed carry no flag:
## CountryBelarus                   .
## CountryBosnia and Herzegovina    *
## CountryCentral African Republic  **
## CountryGuinea                    *
## CountryKyrgyzstan                .
## CountryLesotho                   .
## CountryLiechtenstein             .
## CountryNamibia                   *
## CountrySerbia                    *
## CountrySpain                     .
## CountryThe Bahamas               *
## Item.TypeClothes                 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 51986 on 37499 degrees of freedom
## Residual deviance: 51778 on 37298 degrees of freedom
## AIC: 52182
##
## Number of Fisher Scoring iterations: 3
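The deviance lines above can be read as a test of the whole model against an intercept-only model. The following is a minimal sketch that only re-uses the rounded numbers printed in the summary; the variable names are illustrative and not part of the original analysis, and the AIC identity used here assumes a binary (0/1) response, for which the residual deviance equals minus twice the log-likelihood.

```r
# Values copied (rounded) from the summary output above
null_dev  <- 51986; null_df  <- 37499
resid_dev <- 51778; resid_df <- 37298

# Drop in deviance and the number of predictors that were actually estimated
dev_drop <- null_dev - resid_dev   # 208
df_used  <- null_df - resid_df     # 201

# Likelihood-ratio (chi-square) test: fitted model vs. intercept-only model
p_value <- pchisq(dev_drop, df = df_used, lower.tail = FALSE)
print(p_value)

# Consistency check: AIC = residual deviance + 2 * (estimated coefficients),
# where the count is the 201 predictors plus the intercept
n_coef    <- df_used + 1
aic_check <- resid_dev + 2 * n_coef
print(aic_check)
```

A drop of 208 in deviance bought with 201 degrees of freedom is small, which is in line with the coefficient table, where very few terms carry a significance flag.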
#vif(df_50000_large_mod1)
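Several Country rows in the table report NA for every statistic, which indicates aliased (linearly dependent) coefficients. One likely explanation, assuming each country belongs to exactly one region, is that Region is fully determined by Country, so the two factors cannot both be estimated in full. As far as I know, `vif()` from the car package stops with an error when a model has aliased coefficients, which may be why the call is commented out. The sketch below reproduces the effect on a small made-up data set (`toy`, `fit_nested` and `fit_country` are hypothetical names, not objects from this report) using base R only.

```r
set.seed(1)

# Toy data: four countries, each belonging to exactly one of two regions
toy <- data.frame(country = factor(rep(c("A", "B", "C", "D"), each = 25)))
toy$region <- factor(ifelse(toy$country %in% c("A", "B"), "North", "South"))
toy$y <- rbinom(nrow(toy), size = 1, prob = 0.5)

# Region is a function of country, so one dummy column is redundant
fit_nested <- glm(y ~ region + country, data = toy, family = binomial)
aliased <- names(coef(fit_nested))[is.na(coef(fit_nested))]
print(aliased)   # the coefficient(s) glm could not estimate

# Dropping the redundant factor removes the NA rows without changing the fit
fit_country <- glm(y ~ country, data = toy, family = binomial)
print(anyNA(coef(fit_country)))
print(all.equal(deviance(fit_nested), deviance(fit_country)))
```

If the same holds for the sales data, refitting without Region (or without Country) should give a model with no NA rows, on which a collinearity check can then be run.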