Visit the following website and explore the range of sizes of this dataset (from 100 to 5 million records).
https://eforexcel.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/
Based on your computer’s capabilities (memory, CPU), select 2 files you can handle (recommended: one small, one large).
Review the structure and content of the tables, and think about which two machine learning algorithms presented so far could be used to analyze the data, and how they can be applied in the suggested environment of the datasets.
Write a short essay explaining your selection. Then, select one of the 2 algorithms and explore how to analyze and predict an outcome based on the data available. This will be an exploratory exercise, so feel free to show errors and warnings that arise during the analysis. Test the code with both datasets selected and compare the results. Which result will you trust if you need to make a business decision? Do you think an analysis could be prone to errors when using too much data, or when using the least amount possible?
Develop your exploratory analysis of the data and the essay in the following 2 weeks. You’ll have until March 17 to submit both.
I have chosen the smallest file (100 sales records) and the 10000 sales records file.
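# load the packages used below: dplyr (%>%, mutate, select), class (knn), dummies (dummy.data.frame)
library(dplyr)
library(class)
library(dummies)
# define the filename manually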
filename1 <- "C:/Users/Lisa/OneDrive/CUNY/622/HW1/Sales100.csv"
# load the CSV file from the local directory
dataset100 <- read.csv(filename1, header=TRUE)
dataset100$Region<-as.factor(dataset100$Region)
dataset100$Country<-as.factor(dataset100$Country)
dataset100$Item.Type<-as.factor(dataset100$Item.Type)
dataset100$Sales.Channel<-as.factor(dataset100$Sales.Channel)
dataset100$Order.Priority<-as.factor(dataset100$Order.Priority)
dataset100$Ship.Date <- as.Date(dataset100$Ship.Date, "%m/%d/%Y")
dataset100$Order.Date <- as.Date(dataset100$Order.Date, "%m/%d/%Y")
dataset100<-dataset100 %>% mutate(Days=Ship.Date-Order.Date, Order.Day=format(dataset100$Order.Date, format="%a"), Order.Month=format(dataset100$Order.Date, format="%b"),Order.Year=format(dataset100$Order.Date, format="%Y"))
dataset100$Days<-as.numeric(dataset100$Days)
dataset100$Order.Day<-as.factor(dataset100$Order.Day)
dataset100$Order.Month<-as.factor(dataset100$Order.Month)
dataset100$Order.Year<-as.numeric(dataset100$Order.Year)
dataset100<-dataset100 %>% select(Region,Country, Item.Type, Sales.Channel, Order.Priority, Units.Sold, Unit.Price, Unit.Cost, Total.Cost, Total.Profit, Total.Revenue, Days, Order.Day, Order.Month, Order.Year)
dataset100_2<-dataset100
dim(dataset100)
## [1] 100 15
str(dataset100)
## 'data.frame': 100 obs. of 15 variables:
## $ Region : Factor w/ 7 levels "Asia","Australia and Oceania",..: 2 3 4 7 7 2 7 7 7 7 ...
## $ Country : Factor w/ 76 levels "Albania","Angola",..: 74 23 56 60 57 66 2 10 54 62 ...
## $ Item.Type : Factor w/ 12 levels "Baby Food","Beverages",..: 1 3 9 6 9 1 7 12 10 3 ...
## $ Sales.Channel : Factor w/ 2 levels "Offline","Online": 1 2 1 2 1 2 1 2 1 2 ...
## $ Order.Priority: Factor w/ 4 levels "C","H","L","M": 2 1 3 1 3 1 4 2 4 2 ...
## $ Units.Sold : int 9925 2804 1779 8102 5062 2974 4187 8082 6070 6593 ...
## $ Unit.Price : num 255.28 205.7 651.21 9.33 651.21 ...
## $ Unit.Cost : num 159.42 117.11 524.96 6.92 524.96 ...
## $ Total.Cost : num 1582244 328376 933904 56066 2657348 ...
## $ Total.Profit : num 951411 248406 224599 19526 639078 ...
## $ Total.Revenue : num 2533654 576783 1158503 75592 3296425 ...
## $ Days : num 30 24 6 15 5 17 4 10 42 42 ...
## $ Order.Day : Factor w/ 7 levels "Fri","Mon","Sat",..: 1 7 1 1 1 7 3 6 6 1 ...
## $ Order.Month : Factor w/ 12 levels "Apr","Aug","Dec",..: 9 2 9 7 4 4 1 6 6 1 ...
## $ Order.Year : num 2010 2012 2014 2014 2013 ...
summary(dataset100)
## Region Country
## Asia :11 The Gambia : 4
## Australia and Oceania :11 Australia : 3
## Central America and the Caribbean: 7 Djibouti : 3
## Europe :22 Mexico : 3
## Middle East and North Africa :10 Sao Tome and Principe: 3
## North America : 3 Sierra Leone : 3
## Sub-Saharan Africa :36 (Other) :81
## Item.Type Sales.Channel Order.Priority Units.Sold
## Clothes :13 Offline:50 C:22 Min. : 124
## Cosmetics :13 Online :50 H:30 1st Qu.:2836
## Office Supplies:12 L:27 Median :5382
## Fruits :10 M:21 Mean :5129
## Personal Care :10 3rd Qu.:7369
## Household : 9 Max. :9925
## (Other) :33
## Unit.Price Unit.Cost Total.Cost Total.Profit
## Min. : 9.33 Min. : 6.92 Min. : 3612 Min. : 1258
## 1st Qu.: 81.73 1st Qu.: 35.84 1st Qu.: 168868 1st Qu.: 121444
## Median :179.88 Median :107.28 Median : 363566 Median : 290768
## Mean :276.76 Mean :191.05 Mean : 931806 Mean : 441682
## 3rd Qu.:437.20 3rd Qu.:263.33 3rd Qu.:1613870 3rd Qu.: 635829
## Max. :668.27 Max. :524.96 Max. :4509794 Max. :1719922
##
## Total.Revenue Days Order.Day Order.Month Order.Year
## Min. : 4870 Min. : 0.00 Fri:19 Feb :13 Min. :2010
## 1st Qu.: 268721 1st Qu.: 9.75 Mon:14 Jul :12 1st Qu.:2012
## Median : 752314 Median :23.50 Sat:17 May :11 Median :2013
## Mean :1373488 Mean :23.36 Sun:11 Oct :11 Mean :2013
## 3rd Qu.:2212045 3rd Qu.:36.25 Thu:10 Jun :10 3rd Qu.:2015
## Max. :5997055 Max. :50.00 Tue:18 Apr : 9 Max. :2017
## Wed:11 (Other):34
## Set up for the larger file
# define the filename-manual procedure
filename2 <- "C:/Users/Lisa/OneDrive/CUNY/622/HW1/Sales10000.csv"
# load the CSV file from the local directory
dataset10000 <- read.csv(filename2, header=TRUE)
dataset10000$Region<-as.factor(dataset10000$Region)
dataset10000$Country<-as.factor(dataset10000$Country)
dataset10000$Item.Type<-as.factor(dataset10000$Item.Type)
dataset10000$Sales.Channel<-as.factor(dataset10000$Sales.Channel)
dataset10000$Order.Priority<-as.factor(dataset10000$Order.Priority)
dataset10000$Ship.Date <- as.Date(dataset10000$Ship.Date, "%m/%d/%Y")
dataset10000$Order.Date <- as.Date(dataset10000$Order.Date, "%m/%d/%Y")
dataset10000<-dataset10000 %>% mutate(Days=Ship.Date-Order.Date, Order.Day=format(dataset10000$Order.Date, format="%a"), Order.Month=format(dataset10000$Order.Date, format="%b"),Order.Year=format(dataset10000$Order.Date, format="%Y"))
dataset10000$Days<-as.numeric(dataset10000$Days)
dataset10000$Order.Day<-as.factor(dataset10000$Order.Day)
dataset10000$Order.Month<-as.factor(dataset10000$Order.Month)
dataset10000$Order.Year<-as.numeric(dataset10000$Order.Year)
dataset10000<-dataset10000 %>% select(Region,Country, Item.Type, Sales.Channel, Order.Priority, Units.Sold, Unit.Price, Unit.Cost, Total.Cost, Total.Profit, Total.Revenue, Days, Order.Day, Order.Month, Order.Year)
dataset10000_2<-dataset10000
dim(dataset10000)
## [1] 10000 15
str(dataset10000)
## 'data.frame': 10000 obs. of 15 variables:
## $ Region : Factor w/ 7 levels "Asia","Australia and Oceania",..: 7 4 5 7 4 7 1 1 7 3 ...
## $ Country : Factor w/ 185 levels "Afghanistan",..: 30 86 123 39 38 151 85 31 48 65 ...
## $ Item.Type : Factor w/ 12 levels "Baby Food","Beverages",..: 9 2 12 7 2 2 12 1 8 9 ...
## $ Sales.Channel : Factor w/ 2 levels "Offline","Online": 2 2 1 2 2 1 2 2 2 2 ...
## $ Order.Priority: Factor w/ 4 levels "C","H","L","M": 3 1 1 1 1 2 3 1 3 1 ...
## $ Units.Sold : int 4484 1075 6515 7683 3491 9880 4825 3330 2431 6197 ...
## $ Unit.Price : num 651.2 47.5 154.1 668.3 47.5 ...
## $ Unit.Cost : num 525 31.8 90.9 502.5 31.8 ...
## $ Total.Cost : num 2353921 34174 592409 3861015 110979 ...
## $ Total.Profit : num 566105 16835 411292 1273304 54669 ...
## $ Total.Revenue : num 2920026 51009 1003701 5134318 165648 ...
## $ Days : num 16 26 19 25 39 42 28 32 50 16 ...
## $ Order.Day : Factor w/ 7 levels "Fri","Mon","Sat",..: 5 2 5 6 6 6 4 2 1 3 ...
## $ Order.Month : Factor w/ 12 levels "Apr","Aug","Dec",..: 5 3 5 12 11 6 4 1 10 6 ...
## $ Order.Year : num 2011 2015 2011 2012 2015 ...
summary(dataset10000)
## Region Country
## Asia :1469 Lithuania : 72
## Australia and Oceania : 797 United Kingdom: 72
## Central America and the Caribbean:1019 Moldova : 71
## Europe :2633 Croatia : 70
## Middle East and North Africa :1264 Seychelles : 70
## North America : 215 Botswana : 69
## Sub-Saharan Africa :2603 (Other) :9576
## Item.Type Sales.Channel Order.Priority Units.Sold
## Personal Care : 888 Offline:4939 C:2555 Min. : 2
## Household : 875 Online :5061 H:2503 1st Qu.: 2531
## Clothes : 872 L:2494 Median : 4962
## Baby Food : 842 M:2448 Mean : 5003
## Office Supplies: 837 3rd Qu.: 7472
## Vegetables : 836 Max. :10000
## (Other) :4850
## Unit.Price Unit.Cost Total.Cost Total.Profit
## Min. : 9.33 Min. : 6.92 Min. : 125 Min. : 43.4
## 1st Qu.:109.28 1st Qu.: 56.67 1st Qu.: 164786 1st Qu.: 98329.1
## Median :205.70 Median :117.11 Median : 481606 Median : 289099.0
## Mean :268.14 Mean :188.81 Mean : 938266 Mean : 395089.3
## 3rd Qu.:437.20 3rd Qu.:364.69 3rd Qu.:1183822 3rd Qu.: 566422.7
## Max. :668.27 Max. :524.96 Max. :5241726 Max. :1738178.4
##
## Total.Revenue Days Order.Day Order.Month Order.Year
## Min. : 168 Min. : 0.00 Fri:1440 Jul : 926 Min. :2010
## 1st Qu.: 288551 1st Qu.:12.00 Mon:1437 Mar : 917 1st Qu.:2011
## Median : 800051 Median :25.00 Sat:1381 Jan : 908 Median :2013
## Mean :1333355 Mean :25.06 Sun:1467 May : 897 Mean :2013
## 3rd Qu.:1819143 3rd Qu.:37.00 Thu:1406 Jun : 873 3rd Qu.:2015
## Max. :6680027 Max. :50.00 Tue:1422 Apr : 850 Max. :2017
## Wed:1447 (Other):4629
length(unique(dataset10000$Country))
## [1] 185
# Note there are 185 countries in the 10000 set.
The goal is to predict Sales.Channel (Online or Offline) from the remaining variables.
Order.ID and Ship.Date were not used as predictors.
counts1 <- table(dataset100$Sales.Channel)
barplot(counts1, main="Sales Channel file 100 ",
xlab="Sales channel")
counts2 <- table(dataset10000$Sales.Channel)
barplot(counts2, main="Sales Channel file 10000 ",
xlab="Sales channel")
## Variable work
Order.Date and Ship.Date were converted from character to Date format, and the following variables were created:
Days: Ship.Date - Order.Date (the number of days between order and shipment)
Order.Day: Mon, Tue, ...
Order.Month: Jan, Feb, ...
Order.Year: 2010, 2011, ...
The Order.Date, Ship.Date and Order.ID variables were then dropped.
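As a small illustration of the date handling described above, the sketch below derives the same kind of features from a single pair of dates. The two dates are made-up values in the same m/d/Y text format as the CSV files, not rows taken from the data.

```r
# Hypothetical order and ship dates, written as text like the CSV columns
order_date <- as.Date("5/28/2010", "%m/%d/%Y")
ship_date  <- as.Date("6/27/2010", "%m/%d/%Y")

# Subtracting two Dates gives a difftime; as.numeric() turns it into a day count
days <- as.numeric(ship_date - order_date)

# format() extracts parts of the date as text (day and month names depend on the locale)
order_day   <- format(order_date, "%a")
order_month <- format(order_date, "%b")
order_year  <- as.numeric(format(order_date, "%Y"))

days
order_year
```

Because the day and month names come from the session locale, the factor levels of Order.Day and Order.Month may differ on a machine that is not set to English.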
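Let’s predict Sales.Channel using KNN.
There are no missing values.
KNN classifies by Euclidean distance, so the numeric features must be rescaled to a common range first; otherwise large-valued columns such as Total.Revenue would dominate the distance.
The preparation steps are: normalize the numeric features, create dummy variables for the categorical features, and split the class labels out of the dataset.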
normalize <-function (x) {
return ((x-min(x))/(max(x) - min(x)))
}
dataset100<-dataset100 %>%
mutate (Units.Sold=normalize(Units.Sold)) %>%
mutate (Unit.Price=normalize(Unit.Price)) %>%
mutate (Unit.Cost=normalize(Unit.Cost)) %>%
mutate (Total.Revenue=normalize(Total.Revenue)) %>%
mutate (Total.Profit=normalize(Total.Profit)) %>%
mutate (Total.Cost=normalize(Total.Cost)) %>%
mutate (Days=normalize(Days)) %>%
mutate (Order.Year=normalize(Order.Year))
summary(dataset100)
## Region Country
## Asia :11 The Gambia : 4
## Australia and Oceania :11 Australia : 3
## Central America and the Caribbean: 7 Djibouti : 3
## Europe :22 Mexico : 3
## Middle East and North Africa :10 Sao Tome and Principe: 3
## North America : 3 Sierra Leone : 3
## Sub-Saharan Africa :36 (Other) :81
## Item.Type Sales.Channel Order.Priority Units.Sold
## Clothes :13 Offline:50 C:22 Min. :0.0000
## Cosmetics :13 Online :50 H:30 1st Qu.:0.2767
## Office Supplies:12 L:27 Median :0.5365
## Fruits :10 M:21 Mean :0.5106
## Personal Care :10 3rd Qu.:0.7392
## Household : 9 Max. :1.0000
## (Other) :33
## Unit.Price Unit.Cost Total.Cost Total.Profit
## Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.1099 1st Qu.:0.05583 1st Qu.:0.03667 1st Qu.:0.06993
## Median :0.2588 Median :0.19372 Median :0.07988 Median :0.16845
## Mean :0.4059 Mean :0.35543 Mean :0.20598 Mean :0.25626
## 3rd Qu.:0.6493 3rd Qu.:0.49496 3rd Qu.:0.35734 3rd Qu.:0.36922
## Max. :1.0000 Max. :1.00000 Max. :1.00000 Max. :1.00000
##
## Total.Revenue Days Order.Day Order.Month Order.Year
## Min. :0.00000 Min. :0.0000 Fri:19 Feb :13 Min. :0.0000
## 1st Qu.:0.04403 1st Qu.:0.1950 Mon:14 Jul :12 1st Qu.:0.2857
## Median :0.12474 Median :0.4700 Sat:17 May :11 Median :0.4286
## Mean :0.22840 Mean :0.4672 Sun:11 Oct :11 Mean :0.4614
## 3rd Qu.:0.36834 3rd Qu.:0.7250 Thu:10 Jun :10 3rd Qu.:0.7143
## Max. :1.00000 Max. :1.0000 Tue:18 Apr : 9 Max. :1.0000
## Wed:11 (Other):34
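The column-by-column mutate() chain above could also be written as one step over all numeric columns. The following is a minimal base R sketch of that idea on a toy data frame; the toy values are illustrative only, and the approach assumes every numeric column should be rescaled and that no column is constant (a constant column would divide by zero).

```r
# Min-max scaling to the [0, 1] range, same formula as normalize() above
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

# Toy stand-in for the sales data (illustrative values only)
toy <- data.frame(
  Units.Sold    = c(124, 5382, 9925),
  Unit.Price    = c(9.33, 179.88, 668.27),
  Sales.Channel = factor(c("Online", "Offline", "Online"))
)

# Find the numeric columns and rescale each of them in place
num_cols <- sapply(toy, is.numeric)
toy[num_cols] <- lapply(toy[num_cols], normalize)

# Each numeric column should now run from 0 to 1
sapply(toy[num_cols], range)
```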
First, split off the dataset class labels:
dataset100_labels<-dataset100 %>% select(Sales.Channel)
dataset100<-dataset100 %>% select(-Sales.Channel)
colnames(dataset100)
## [1] "Region" "Country" "Item.Type" "Order.Priority"
## [5] "Units.Sold" "Unit.Price" "Unit.Cost" "Total.Cost"
## [9] "Total.Profit" "Total.Revenue" "Days" "Order.Day"
## [13] "Order.Month" "Order.Year"
Creating dummy variables:
dataset100<-dummy.data.frame(data=dataset100, sep="_")
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
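If the dummies package is not available, a similar one-hot encoding can be sketched with base R's model.matrix(). This is an alternative technique, not the call used above; the toy data frame and its column names are assumptions made for illustration.

```r
# Toy data with two categorical columns and one numeric column (illustrative)
toy <- data.frame(
  Region   = factor(c("Asia", "Europe", "Asia")),
  Priority = factor(c("H", "L", "M")),
  Units    = c(0.1, 0.5, 1.0)
)

# Ask for full (identity) contrasts so every factor level gets its own column
factor_cols    <- names(toy)[sapply(toy, is.factor)]
full_contrasts <- lapply(toy[factor_cols], contrasts, contrasts = FALSE)

# "- 1" drops the intercept; numeric columns pass through unchanged
onehot <- model.matrix(~ . - 1, data = toy, contrasts.arg = full_contrasts)
colnames(onehot)
```

Keeping one column per level is fine for a distance-based method such as KNN; for a regression model the usual treatment contrasts (one level dropped) avoid redundant columns.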
#colnames(dataset100)
set.seed(1234)
sample_index<-sample(nrow(dataset100), round(nrow(dataset100)*.75), replace=FALSE)
dataset100_train<-dataset100[sample_index,]
dataset100_test<-dataset100[-sample_index,]
#split class labels
dataset100_train_labels<-as.factor(dataset100_labels[sample_index,])
dataset100_test_labels<-as.factor(dataset100_labels[-sample_index,])
dataset100_pred1<-knn(
train =dataset100_train,
test=dataset100_test,
cl=dataset100_train_labels,
k=8)
#head(dataset100_pred1)
Let’s look at how our model actually did in predicting the right label…
dataset100_pred1_table<-table(dataset100_test_labels, dataset100_pred1)
dataset100_pred1_table
## dataset100_pred1
## dataset100_test_labels Offline Online
## Offline 7 7
## Online 6 5
sum(diag(dataset100_pred1_table))/nrow(dataset100_test)
## [1] 0.48
The correct prediction rate is 48% (12 of 25 test rows), slightly worse than a coin toss. With only 25 test rows, this is not distinguishable from random guessing.
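To judge how much an accuracy from 25 test rows can say, one option is an exact binomial confidence interval around it. The sketch below uses the counts from the confusion matrix above (7 + 5 correct out of 25); treating the test rows as independent trials is an assumption.

```r
# Correct predictions are the diagonal of the confusion matrix above
correct <- 7 + 5
n_test  <- 25

# Exact binomial test against the coin-toss rate of 0.5
bt <- binom.test(correct, n_test, p = 0.5)
ci <- bt$conf.int

# 95% interval for the true accuracy
ci
```

An interval that contains 0.5 means this test set cannot separate the model from random guessing.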
normalize <-function (x) {
return ((x-min(x))/(max(x) - min(x)))
}
dataset10000<-dataset10000 %>%
mutate (Units.Sold=normalize(Units.Sold)) %>%
mutate (Unit.Price=normalize(Unit.Price)) %>%
mutate (Unit.Cost=normalize(Unit.Cost)) %>%
mutate (Total.Revenue=normalize(Total.Revenue)) %>%
mutate (Total.Profit=normalize(Total.Profit)) %>%
mutate (Total.Cost=normalize(Total.Cost)) %>%
mutate (Days=normalize(Days)) %>%
mutate (Order.Year=normalize(Order.Year))
summary(dataset10000)
## Region Country
## Asia :1469 Lithuania : 72
## Australia and Oceania : 797 United Kingdom: 72
## Central America and the Caribbean:1019 Moldova : 71
## Europe :2633 Croatia : 70
## Middle East and North Africa :1264 Seychelles : 70
## North America : 215 Botswana : 69
## Sub-Saharan Africa :2603 (Other) :9576
## Item.Type Sales.Channel Order.Priority Units.Sold
## Personal Care : 888 Offline:4939 C:2555 Min. :0.0000
## Household : 875 Online :5061 H:2503 1st Qu.:0.2529
## Clothes : 872 L:2494 Median :0.4961
## Baby Food : 842 M:2448 Mean :0.5002
## Office Supplies: 837 3rd Qu.:0.7471
## Vegetables : 836 Max. :1.0000
## (Other) :4850
## Unit.Price Unit.Cost Total.Cost Total.Profit
## Min. :0.0000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.1517 1st Qu.:0.09604 1st Qu.:0.03141 1st Qu.:0.05655
## Median :0.2980 Median :0.21271 Median :0.09186 Median :0.16630
## Mean :0.3928 Mean :0.35111 Mean :0.17898 Mean :0.22728
## 3rd Qu.:0.6493 3rd Qu.:0.69062 3rd Qu.:0.22583 3rd Qu.:0.32585
## Max. :1.0000 Max. :1.00000 Max. :1.00000 Max. :1.00000
##
## Total.Revenue Days Order.Day Order.Month Order.Year
## Min. :0.00000 Min. :0.0000 Fri:1440 Jul : 926 Min. :0.0000
## 1st Qu.:0.04317 1st Qu.:0.2400 Mon:1437 Mar : 917 1st Qu.:0.1429
## Median :0.11975 Median :0.5000 Sat:1381 Jan : 908 Median :0.4286
## Mean :0.19958 Mean :0.5012 Sun:1467 May : 897 Mean :0.4771
## 3rd Qu.:0.27231 3rd Qu.:0.7400 Thu:1406 Jun : 873 3rd Qu.:0.7143
## Max. :1.00000 Max. :1.0000 Tue:1422 Apr : 850 Max. :1.0000
## Wed:1447 (Other):4629
dataset10000_labels<-dataset10000 %>% select(Sales.Channel)
dataset10000<-dataset10000 %>% select(-Sales.Channel)
#colnames(dataset100)
dataset10000<-dummy.data.frame(data=dataset10000, sep="_")
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
## Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE):
## non-list contrasts argument ignored
#colnames(dataset100)
set.seed(1234)
sample_index<-sample(nrow(dataset10000), round(nrow(dataset100)*.75), replace=FALSE)
dataset10000_train<-dataset10000[sample_index,]
dataset10000_test<-dataset10000[-sample_index,]
#split class labels
dataset10000_train_labels<-as.factor(dataset10000_labels[sample_index,])
dataset10000_test_labels<-as.factor(dataset10000_labels[-sample_index,])
dataset10000_pred1<-knn(
train =dataset10000_train,
test=dataset10000_test,
cl=dataset10000_train_labels,
k=75)
#head(dataset100_pred1)
dataset10000_pred1_table<-table(dataset10000_test_labels, dataset10000_pred1)
dataset10000_pred1_table
## dataset10000_pred1
## dataset10000_test_labels Offline Online
## Offline 4897 0
## Online 5028 0
sum(diag(dataset10000_pred1_table))/nrow(dataset10000_test)
## [1] 0.4934005
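The larger dataset also has a correct prediction rate below 50%, but this run is not a fair test. The split above sizes the training sample with nrow(dataset100), so only 75 of the 10000 rows are used for training and the other 9925 for testing. With k=75 equal to the training set size, every test row gets the same majority vote, which is why all predictions are Offline and the accuracy (0.4934) is simply the share of Offline rows in the test set.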
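A minimal sketch of a 75/25 split whose size is taken from the data frame being split (the sample() call above uses nrow(dataset100)), with k chosen well below the training set size. The data here is synthetic stand-in data, and setting k to the square root of the training size is a common rule of thumb rather than a value from this analysis, so no results for the sales data are implied.

```r
library(class)  # knn()

set.seed(1234)

# Synthetic stand-in for the normalized, dummy-encoded feature table
toy        <- data.frame(x1 = runif(1000), x2 = runif(1000))
toy_labels <- factor(sample(c("Offline", "Online"), 1000, replace = TRUE))

# Size the training sample from the same object that is being split
train_idx <- sample(nrow(toy), round(nrow(toy) * 0.75), replace = FALSE)

# Rule-of-thumb k (an assumption): square root of the training set size
k <- round(sqrt(length(train_idx)))

pred <- knn(
  train = toy[train_idx, ],
  test  = toy[-train_idx, ],
  cl    = toy_labels[train_idx],
  k     = k
)

# Confusion matrix on the held-out 25%
table(toy_labels[-train_idx], pred)
```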
Let’s predict Sales.Channel using a logistic model
set.seed(12345)
sample_set<-sample(nrow(dataset100_2), round(nrow(dataset100_2)*.75), replace=FALSE)
dataset100_2_train<-dataset100_2[sample_set,]
dataset100_2_test<-dataset100_2[-sample_set,]
dim(dataset100_2_train)
## [1] 75 15
dim(dataset100_2_test)
## [1] 25 15
glm.log<-glm(Sales.Channel ~ Region + Country + Item.Type + Order.Priority +
Units.Sold + Unit.Price +
Unit.Cost + Total.Cost +
Total.Profit + Total.Revenue +
Days + Order.Day + Order.Month + Order.Year, data=dataset100_2_train, family=binomial)
summary(glm.log)
##
## Call:
## glm(formula = Sales.Channel ~ Region + Country + Item.Type +
## Order.Priority + Units.Sold + Unit.Price + Unit.Cost + Total.Cost +
## Total.Profit + Total.Revenue + Days + Order.Day + Order.Month +
## Order.Year, family = binomial, data = dataset100_2_train)
##
## Deviance Residuals:
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [26] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [51] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## Coefficients: (31 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.657e+01 1.181e+06 0 1
## RegionAustralia and Oceania 9.028e-06 1.332e+06 0 1
## RegionCentral America and the Caribbean 2.657e+01 1.573e+06 0 1
## RegionEurope 5.313e+01 1.234e+06 0 1
## RegionMiddle East and North Africa 5.313e+01 1.234e+06 0 1
## RegionNorth America -2.657e+01 6.662e+05 0 1
## RegionSub-Saharan Africa 2.657e+01 9.079e+05 0 1
## CountryAngola -8.830e-06 1.511e+06 0 1
## CountryAustralia 7.970e+01 1.573e+06 0 1
## CountryAustria 5.011e-06 1.007e+06 0 1
## CountryAzerbaijan 5.313e+01 1.424e+06 0 1
## CountryBangladesh 7.970e+01 1.038e+06 0 1
## CountryBelize 2.657e+01 1.689e+06 0 1
## CountryBrunei 7.970e+01 9.079e+05 0 1
## CountryBulgaria 5.313e+01 1.126e+06 0 1
## CountryCape Verde 2.657e+01 1.154e+06 0 1
## CountryCosta Rica -5.313e+01 1.424e+06 0 1
## CountryCote d'Ivoire 2.657e+01 1.356e+06 0 1
## CountryDjibouti 2.657e+01 8.352e+05 0 1
## CountryEast Timor -2.657e+01 9.079e+05 0 1
## CountryFederated States of Micronesia 7.970e+01 1.651e+06 0 1
## CountryFiji -1.383e-05 1.745e+06 0 1
## CountryFrance 5.313e+01 1.007e+06 0 1
## CountryGrenada -2.657e+01 1.490e+06 0 1
## CountryHaiti 2.657e+01 1.208e+06 0 1
## CountryHonduras -1.018e-08 1.332e+06 0 1
## CountryIran 5.313e+01 1.007e+06 0 1
## CountryKenya 1.063e+02 1.234e+06 0 1
## CountryKiribati 7.970e+01 2.238e+06 0 1
## CountryKuwait 2.657e+01 9.753e+05 0 1
## CountryKyrgyzstan 1.594e+02 1.234e+06 0 1
## CountryLaos 5.313e+01 1.126e+06 0 1
## CountryLebanon 2.657e+01 9.079e+05 0 1
## CountryLesotho 2.657e+01 1.259e+06 0 1
## CountryLibya -4.627e-06 5.036e+05 0 1
## CountryMacedonia -5.313e+01 5.036e+05 0 1
## CountryMali 7.970e+01 1.934e+06 0 1
## CountryMexico NA NA NA NA
## CountryMoldova -2.657e+01 1.038e+06 0 1
## CountryMongolia -5.313e+01 1.126e+06 0 1
## CountryMozambique -2.657e+01 9.079e+05 0 1
## CountryMyanmar 2.657e+01 7.555e+05 0 1
## CountryNew Zealand 7.970e+01 1.532e+06 0 1
## CountryNicaragua NA NA NA NA
## CountryNiger 2.657e+01 9.079e+05 0 1
## CountryPakistan -2.657e+01 1.154e+06 0 1
## CountryPortugal 1.846e-05 1.745e+06 0 1
## CountryRepublic of the Congo -2.657e+01 1.651e+06 0 1
## CountryRomania 5.313e+01 1.007e+06 0 1
## CountryRussia -2.657e+01 9.079e+05 0 1
## CountryRwanda 2.001e-07 1.234e+06 0 1
## CountrySamoa 1.063e+02 1.332e+06 0 1
## CountrySan Marino -2.657e+01 1.612e+06 0 1
## CountrySao Tome and Principe 2.083e-07 1.332e+06 0 1
## CountrySenegal 2.657e+01 9.079e+05 0 1
## CountrySierra Leone 2.657e+01 1.038e+06 0 1
## CountrySlovenia -2.657e+01 1.098e+06 0 1
## CountrySouth Sudan -7.970e+01 1.154e+06 0 1
## CountrySpain -5.313e+01 1.234e+06 0 1
## CountrySri Lanka 5.313e+01 1.332e+06 0 1
## CountrySwitzerland -4.017e-06 1.234e+06 0 1
## CountrySyria NA NA NA NA
## CountryThe Gambia -2.657e+01 8.352e+05 0 1
## CountryTurkmenistan NA NA NA NA
## CountryTuvalu NA NA NA NA
## CountryUnited Kingdom 9.417e-06 1.234e+06 0 1
## CountryZambia NA NA NA NA
## Item.TypeBeverages -7.970e+01 1.490e+06 0 1
## Item.TypeCereal 9.028e-06 1.234e+06 0 1
## Item.TypeClothes -5.313e+01 1.424e+06 0 1
## Item.TypeCosmetics -5.313e+01 1.234e+06 0 1
## Item.TypeFruits -2.657e+01 1.447e+06 0 1
## Item.TypeHousehold -2.657e+01 9.079e+05 0 1
## Item.TypeMeat 5.313e+01 5.036e+05 0 1
## Item.TypeOffice Supplies -5.313e+01 1.332e+06 0 1
## Item.TypePersonal Care 9.023e-06 1.126e+06 0 1
## Item.TypeSnacks 4.404e-06 7.122e+05 0 1
## Item.TypeVegetables -1.063e+02 1.332e+06 0 1
## Order.PriorityH -5.313e+01 5.036e+05 0 1
## Order.PriorityL -2.657e+01 5.631e+05 0 1
## Order.PriorityM -5.313e+01 1.007e+06 0 1
## Units.Sold NA NA NA NA
## Unit.Price NA NA NA NA
## Unit.Cost NA NA NA NA
## Total.Cost NA NA NA NA
## Total.Profit NA NA NA NA
## Total.Revenue NA NA NA NA
## Days NA NA NA NA
## Order.DayMon NA NA NA NA
## Order.DaySat NA NA NA NA
## Order.DaySun NA NA NA NA
## Order.DayThu NA NA NA NA
## Order.DayTue NA NA NA NA
## Order.DayWed NA NA NA NA
## Order.MonthAug NA NA NA NA
## Order.MonthDec NA NA NA NA
## Order.MonthFeb NA NA NA NA
## Order.MonthJan NA NA NA NA
## Order.MonthJul NA NA NA NA
## Order.MonthJun NA NA NA NA
## Order.MonthMar NA NA NA NA
## Order.MonthMay NA NA NA NA
## Order.MonthNov NA NA NA NA
## Order.MonthOct NA NA NA NA
## Order.MonthSep NA NA NA NA
## Order.Year NA NA NA NA
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1.0396e+02 on 74 degrees of freedom
## Residual deviance: 4.3512e-10 on 0 degrees of freedom
## AIC: 150
##
## Number of Fisher Scoring iterations: 25
#Use coef() to access coeff
#coef(glm.log)
#summary(glm.log)$coef
The predict() function returns the predicted probability that Sales.Channel is Online given the predictors (the contrasts() output below shows that R codes Online as 1). type="response" tells R to return probabilities rather than log-odds.
glm.probs<- predict(glm.log, type="response")
glm.probs[1:10]
## 14 51 80 90 92 24
## 2.900701e-12 1.000000e+00 1.000000e+00 2.900701e-12 2.900701e-12 1.000000e+00
## 58 93 75 88
## 2.900701e-12 1.000000e+00 2.900701e-12 2.900701e-12
contrasts(dataset100_2$Sales.Channel)
## Online
## Offline 0
## Online 1
R created a dummy variable with Online coded as 1.
The probabilities must be converted to the class labels Offline or Online.
glm.pred<-rep("Offline",nrow(dataset100_2_train))
glm.pred[glm.probs>.5]="Online"
table(glm.pred,dataset100_2_train$Sales.Channel)
##
## glm.pred Offline Online
## Offline 37 0
## Online 0 38
mean(glm.pred ==dataset100_2_train$Sales.Channel)
## [1] 1
This means the model classified 100% of the training rows correctly, which is too good to be true. The summary shows why: the residual deviance has 0 degrees of freedom and 31 coefficients could not be estimated, so the model has as many parameters as there are training rows (75) and simply memorizes them.
The dataset is too small to be representative. For example, it has 100 observations, while the 10000-observation dataset contains 185 distinct countries. So even if every observation came from a different country, not all of the countries could be represented. We need to use a larger dataset.
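The effect can be reproduced on a toy example: when a factor has one level per row, a logistic model has as many parameters as observations and fits the training labels perfectly, whatever they are. The data below is invented for illustration, and warnings from glm() are suppressed only to keep the output short.

```r
# Ten rows, each with its own "country": one parameter per observation
toy <- data.frame(
  y = factor(c("Offline", "Online", "Offline", "Online", "Online",
               "Offline", "Online", "Offline", "Offline", "Online")),
  g = factor(paste0("country_", 1:10))
)

# glm() warns about fitted probabilities of 0 or 1; that is the symptom itself
fit <- suppressWarnings(glm(y ~ g, data = toy, family = binomial))

# No residual degrees of freedom are left
df.residual(fit)

# In-sample accuracy is perfect even though the labels carry no pattern
pred <- ifelse(fitted(fit) > 0.5, "Online", "Offline")
mean(pred == toy$y)
```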
Let’s check the test data.
glm.log_test<-glm(Sales.Channel ~ Region + Country + Item.Type + Order.Priority +
Units.Sold + Unit.Price +
Unit.Cost + Total.Cost +
Total.Profit + Total.Revenue +
Days + Order.Day + Order.Month + Order.Year, data=dataset100_2_test, family=binomial)
summary(glm.log_test)
##
## Call:
## glm(formula = Sales.Channel ~ Region + Country + Item.Type +
## Order.Priority + Units.Sold + Unit.Price + Unit.Cost + Total.Cost +
## Total.Profit + Total.Revenue + Days + Order.Day + Order.Month +
## Order.Year, family = binomial, data = dataset100_2_test)
##
## Deviance Residuals:
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## Coefficients: (36 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.557e+01 4.830e+05 0 1
## RegionAustralia and Oceania -9.757e-08 3.055e+05 0 1
## RegionEurope -8.657e-08 5.291e+05 0 1
## RegionMiddle East and North Africa 5.113e+01 4.320e+05 0 1
## RegionNorth America -5.113e+01 5.291e+05 0 1
## RegionSub-Saharan Africa -5.113e+01 3.055e+05 0 1
## CountryBurkina Faso 5.113e+01 5.291e+05 0 1
## CountryCameroon 5.113e+01 3.055e+05 0 1
## CountryComoros 5.113e+01 4.320e+05 0 1
## CountryDemocratic Republic of the Congo 1.023e+02 4.320e+05 0 1
## CountryGabon 5.239e-09 5.291e+05 0 1
## CountryIceland -7.809e-09 3.055e+05 0 1
## CountryLithuania -5.113e+01 5.291e+05 0 1
## CountryMadagascar 5.239e-09 5.291e+05 0 1
## CountryMalaysia -5.113e+01 5.291e+05 0 1
## CountryMali 5.113e+01 5.291e+05 0 1
## CountryMauritania -7.127e-16 3.055e+05 0 1
## CountryMexico NA NA NA NA
## CountryMonaco -5.113e+01 3.055e+05 0 1
## CountryMyanmar 1.165e-08 5.291e+05 0 1
## CountryNorway -1.060e-08 3.055e+05 0 1
## CountryRwanda 5.239e-09 5.291e+05 0 1
## CountrySaudi Arabia NA NA NA NA
## CountrySierra Leone NA NA NA NA
## CountrySlovakia NA NA NA NA
## CountrySolomon Islands 4.548e-09 5.291e+05 0 1
## CountryTurkmenistan NA NA NA NA
## Item.TypeBeverages -5.113e+01 3.055e+05 0 1
## Item.TypeCereal -5.113e+01 5.291e+05 0 1
## Item.TypeClothes NA NA NA NA
## Item.TypeCosmetics NA NA NA NA
## Item.TypeFruits NA NA NA NA
## Item.TypeOffice Supplies 5.239e-09 4.320e+05 0 1
## Item.TypePersonal Care NA NA NA NA
## Item.TypeVegetables NA NA NA NA
## Order.PriorityH NA NA NA NA
## Order.PriorityL NA NA NA NA
## Order.PriorityM NA NA NA NA
## Units.Sold NA NA NA NA
## Unit.Price NA NA NA NA
## Unit.Cost NA NA NA NA
## Total.Cost NA NA NA NA
## Total.Profit NA NA NA NA
## Total.Revenue NA NA NA NA
## Days NA NA NA NA
## Order.DayMon NA NA NA NA
## Order.DaySat NA NA NA NA
## Order.DaySun NA NA NA NA
## Order.DayThu NA NA NA NA
## Order.DayTue NA NA NA NA
## Order.DayWed NA NA NA NA
## Order.MonthDec NA NA NA NA
## Order.MonthFeb NA NA NA NA
## Order.MonthJan NA NA NA NA
## Order.MonthJul NA NA NA NA
## Order.MonthJun NA NA NA NA
## Order.MonthMar NA NA NA NA
## Order.MonthMay NA NA NA NA
## Order.MonthNov NA NA NA NA
## Order.MonthOct NA NA NA NA
## Order.Year NA NA NA NA
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3.4617e+01 on 24 degrees of freedom
## Residual deviance: 3.9425e-10 on 0 degrees of freedom
## AIC: 50
##
## Number of Fisher Scoring iterations: 24
glm.probs_test<- predict(glm.log_test, type="response")
glm.pred_test<-rep("Offline",nrow(dataset100_2_test))
glm.pred_test [glm.probs_test>.5]="Online"
table(glm.pred_test,dataset100_2_test$Sales.Channel)
##
## glm.pred_test Offline Online
## Offline 13 0
## Online 0 12
mean(glm.pred_test ==dataset100_2_test$Sales.Channel)
## [1] 1
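Refitting the model on the 25 test rows also gives 100% accuracy. Because glm.log_test is fitted to the same rows it then predicts (again with 0 residual degrees of freedom), this is another in-sample result rather than a test of glm.log.
It is time to move on to the larger dataset for more credible modeling results.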
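For an out-of-sample check, the model fitted on the training rows can be applied to the test rows with the newdata argument of predict(). The sketch below shows this on synthetic data; the column names are hypothetical, and dropping test rows whose factor level never appeared in training is one possible way to avoid a "new levels" error, not the only one.

```r
set.seed(1)

# Synthetic stand-in data: binary outcome, one numeric and one factor predictor
toy <- data.frame(
  y = factor(sample(c("Offline", "Online"), 200, replace = TRUE)),
  x = rnorm(200),
  g = factor(sample(letters[1:5], 200, replace = TRUE))
)

idx   <- sample(nrow(toy), 150)
train <- toy[idx, ]
test  <- toy[-idx, ]

# Fit on the training rows only
fit <- glm(y ~ x + g, data = train, family = binomial)

# Keep test rows whose factor level was seen in training
seen <- test$g %in% unique(train$g)

# Score the held-out rows with the training-set model
probs <- predict(fit, newdata = test[seen, ], type = "response")
pred  <- ifelse(probs > 0.5, "Online", "Offline")
acc   <- mean(pred == test$y[seen])
acc
```

With the 100-row sales file, many test countries do not occur in the training rows, so this kind of check would leave very few rows to score; that is another sign the small file cannot support a Country predictor.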
set.seed(12345)
sample_set<-sample(nrow(dataset10000_2), round(nrow(dataset10000_2)*.75), replace=FALSE)
dataset10000_2_train<-dataset10000_2[sample_set,]
dataset10000_2_test<-dataset10000_2[-sample_set,]
dim(dataset10000_2_train)
## [1] 7500 15
dim(dataset10000_2_test)
## [1] 2500 15
glm.log2<-glm(Sales.Channel ~ Region + Country + Item.Type + Order.Priority +
Units.Sold + Unit.Price +
Unit.Cost + Total.Cost +
Total.Profit + Total.Revenue +
Days + Order.Day + Order.Month + Order.Year, data=dataset10000_2_train, family=binomial)
summary(glm.log2)
##
## Call:
## glm(formula = Sales.Channel ~ Region + Country + Item.Type +
## Order.Priority + Units.Sold + Unit.Price + Unit.Cost + Total.Cost +
## Total.Profit + Total.Revenue + Days + Order.Day + Order.Month +
## Order.Year, family = binomial, data = dataset10000_2_train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6883 -1.1553 0.8458 1.1528 1.5899
##
## Coefficients: (9 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.489e+01 2.203e+01 -0.676 0.49930
## RegionAustralia and Oceania 5.589e-02 4.600e-01 0.121 0.90330
## RegionCentral America and the Caribbean -6.958e-02 4.518e-01 -0.154 0.87762
## RegionEurope 2.494e-01 4.276e-01 0.583 0.55973
## RegionMiddle East and North Africa 1.504e-01 4.348e-01 0.346 0.72940
## RegionNorth America 6.237e-01 4.358e-01 1.431 0.15232
## RegionSub-Saharan Africa 7.453e-01 4.425e-01 1.684 0.09210
## CountryAlbania -2.393e-01 4.327e-01 -0.553 0.58028
## CountryAlgeria 2.986e-01 4.488e-01 0.665 0.50575
## CountryAndorra 2.824e-02 4.583e-01 0.062 0.95087
## CountryAngola -1.263e+00 4.742e-01 -2.663 0.00776
## CountryAntigua and Barbuda -1.970e-01 4.648e-01 -0.424 0.67172
## CountryArmenia 1.385e-01 4.379e-01 0.316 0.75174
## CountryAustralia 6.731e-01 4.839e-01 1.391 0.16424
## CountryAustria -8.794e-02 4.169e-01 -0.211 0.83294
## CountryAzerbaijan 4.393e-01 4.414e-01 0.995 0.31963
## CountryBahrain 8.235e-02 4.176e-01 0.197 0.84367
## CountryBangladesh 2.378e-01 4.370e-01 0.544 0.58627
## CountryBarbados 9.515e-01 4.764e-01 1.997 0.04581
## CountryBelarus 3.672e-01 4.344e-01 0.845 0.39796
## CountryBelgium 1.649e-02 4.434e-01 0.037 0.97033
## CountryBelize 6.439e-01 5.023e-01 1.282 0.19991
## CountryBenin -8.527e-01 4.281e-01 -1.992 0.04637
## CountryBhutan 3.606e-01 4.355e-01 0.828 0.40764
## CountryBosnia and Herzegovina -5.324e-01 4.267e-01 -1.248 0.21214
## CountryBotswana -1.802e-01 4.339e-01 -0.415 0.67790
## CountryBrunei 1.884e-01 4.587e-01 0.411 0.68130
## CountryBulgaria -2.539e-01 4.559e-01 -0.557 0.57762
## CountryBurkina Faso 1.546e-02 4.731e-01 0.033 0.97393
## CountryBurundi -8.295e-01 4.280e-01 -1.938 0.05261
## CountryCambodia 6.896e-01 4.263e-01 1.618 0.10573
## CountryCameroon -6.862e-01 4.729e-01 -1.451 0.14677
## CountryCanada -8.176e-01 4.170e-01 -1.960 0.04994
## CountryCape Verde 3.051e-02 5.107e-01 0.060 0.95236
## CountryCentral African Republic -2.663e-01 4.558e-01 -0.584 0.55911
## CountryChad -2.478e-01 4.474e-01 -0.554 0.57966
## CountryChina 3.247e-01 4.403e-01 0.737 0.46088
## CountryComoros -1.354e-01 4.720e-01 -0.287 0.77425
## CountryCosta Rica 5.990e-01 4.546e-01 1.318 0.18760
## CountryCote d'Ivoire -5.445e-01 4.720e-01 -1.154 0.24866
## CountryCroatia 3.323e-01 4.078e-01 0.815 0.41517
## CountryCuba 3.209e-01 4.610e-01 0.696 0.48641
## CountryCyprus -8.766e-02 4.473e-01 -0.196 0.84464
## CountryCzech Republic -4.699e-01 4.741e-01 -0.991 0.32164
## CountryDemocratic Republic of the Congo -7.377e-01 4.375e-01 -1.686 0.09171
## CountryDenmark 4.379e-01 4.241e-01 1.032 0.30184
## CountryDjibouti -6.572e-01 4.462e-01 -1.473 0.14078
## CountryDominica 1.730e-01 4.583e-01 0.377 0.70580
## CountryDominican Republic 1.805e-01 4.941e-01 0.365 0.71483
## CountryEast Timor 1.193e+00 5.181e-01 2.303 0.02128
## CountryEgypt 2.672e-01 4.313e-01 0.620 0.53554
## CountryEl Salvador 2.825e-01 4.375e-01 0.646 0.51851
## CountryEquatorial Guinea -8.096e-01 4.645e-01 -1.743 0.08133
## CountryEritrea -1.201e-01 4.700e-01 -0.256 0.79833
## CountryEstonia -4.276e-01 4.389e-01 -0.974 0.33001
## CountryEthiopia -6.283e-01 4.326e-01 -1.453 0.14636
## CountryFederated States of Micronesia 1.131e-01 4.562e-01 0.248 0.80418
## CountryFiji 5.509e-01 4.491e-01 1.226 0.22001
## CountryFinland 5.300e-04 4.291e-01 0.001 0.99901
## CountryFrance -2.764e-02 4.234e-01 -0.065 0.94795
## CountryGabon -8.523e-01 4.718e-01 -1.806 0.07089
## CountryGeorgia -1.184e-01 4.408e-01 -0.269 0.78814
## CountryGermany 2.032e-02 4.290e-01 0.047 0.96221
## CountryGhana -4.965e-01 4.433e-01 -1.120 0.26276
## CountryGreece 2.506e-02 4.373e-01 0.057 0.95429
## CountryGreenland -5.182e-01 4.702e-01 -1.102 0.27043
## CountryGrenada -9.627e-02 4.436e-01 -0.217 0.82817
## CountryGuatemala 2.739e-01 4.458e-01 0.614 0.53895
## CountryGuinea -1.612e-01 4.487e-01 -0.359 0.71936
## CountryGuinea-Bissau -3.664e-01 4.691e-01 -0.781 0.43473
## CountryHaiti 9.655e-02 4.651e-01 0.208 0.83557
## CountryHonduras 4.383e-01 4.510e-01 0.972 0.33110
## CountryHungary -3.531e-01 4.499e-01 -0.785 0.43260
## CountryIceland 3.965e-02 4.260e-01 0.093 0.92583
## CountryIndia 6.465e-01 4.221e-01 1.532 0.12557
## CountryIndonesia -6.348e-02 4.542e-01 -0.140 0.88885
## CountryIran 4.868e-02 4.120e-01 0.118 0.90594
## CountryIraq 2.382e-01 4.244e-01 0.561 0.57458
## CountryIreland 2.908e-01 4.312e-01 0.674 0.50009
## CountryIsrael -2.813e-01 4.246e-01 -0.663 0.50765
## CountryItaly -4.895e-02 4.626e-01 -0.106 0.91573
## CountryJamaica 1.379e-01 4.618e-01 0.299 0.76516
## CountryJapan 2.901e-01 4.457e-01 0.651 0.51518
## CountryJordan 3.304e-01 4.399e-01 0.751 0.45255
## CountryKazakhstan 7.268e-01 4.437e-01 1.638 0.10140
## CountryKenya -3.955e-01 4.340e-01 -0.911 0.36212
## CountryKiribati -1.146e-01 4.518e-01 -0.254 0.79970
## CountryKosovo 6.104e-02 4.105e-01 0.149 0.88180
## CountryKuwait -1.022e-01 4.368e-01 -0.234 0.81501
## CountryKyrgyzstan 3.270e-03 4.309e-01 0.008 0.99395
## CountryLaos 4.132e-01 4.500e-01 0.918 0.35848
## CountryLatvia -6.617e-02 4.439e-01 -0.149 0.88151
## CountryLebanon 3.054e-01 4.487e-01 0.681 0.49608
## CountryLesotho -8.525e-01 4.434e-01 -1.923 0.05453
## CountryLiberia -6.694e-01 4.520e-01 -1.481 0.13859
## CountryLibya 1.642e-01 4.506e-01 0.364 0.71559
## CountryLiechtenstein 1.296e-01 4.109e-01 0.315 0.75255
## CountryLithuania 5.930e-01 4.297e-01 1.380 0.16758
## CountryLuxembourg 5.580e-01 4.307e-01 1.295 0.19519
## CountryMacedonia -1.348e-01 4.475e-01 -0.301 0.76318
## CountryMadagascar -6.766e-01 4.500e-01 -1.504 0.13265
## CountryMalawi -3.297e-01 4.227e-01 -0.780 0.43536
## CountryMalaysia -1.946e-02 4.700e-01 -0.041 0.96697
## CountryMaldives 5.267e-01 4.344e-01 1.213 0.22526
## CountryMali -5.687e-01 5.001e-01 -1.137 0.25542
## CountryMalta 7.839e-01 4.589e-01 1.708 0.08758
## CountryMarshall Islands 1.018e-01 4.687e-01 0.217 0.82803
## CountryMauritania 4.907e-02 4.629e-01 0.106 0.91558
## CountryMauritius -4.689e-01 4.410e-01 -1.063 0.28764
## CountryMexico -3.170e-01 4.453e-01 -0.712 0.47651
## CountryMoldova -3.422e-01 4.004e-01 -0.855 0.39267
## CountryMonaco 9.916e-02 4.550e-01 0.218 0.82747
## CountryMongolia 2.051e-01 4.399e-01 0.466 0.64108
## CountryMontenegro 7.217e-02 4.069e-01 0.177 0.85922
## CountryMorocco 4.810e-01 4.280e-01 1.124 0.26106
## CountryMozambique -1.032e+00 4.754e-01 -2.170 0.03000
## CountryMyanmar 3.605e-01 4.428e-01 0.814 0.41551
## CountryNamibia -4.220e-01 4.722e-01 -0.894 0.37153
## CountryNauru 5.760e-01 4.703e-01 1.225 0.22071
## CountryNepal 7.992e-01 4.764e-01 1.678 0.09343
## CountryNetherlands -5.212e-02 4.166e-01 -0.125 0.90044
## CountryNew Zealand 2.009e-01 4.632e-01 0.434 0.66452
## CountryNicaragua 5.918e-01 4.840e-01 1.223 0.22146
## CountryNiger -5.841e-01 4.405e-01 -1.326 0.18480
## CountryNigeria -8.956e-01 4.405e-01 -2.033 0.04205
## CountryNorth Korea 2.341e-01 4.424e-01 0.529 0.59674
## CountryNorway -3.744e-01 4.411e-01 -0.849 0.39604
## CountryOman 3.843e-01 4.530e-01 0.848 0.39618
## CountryPakistan -1.943e-01 4.709e-01 -0.413 0.67995
## CountryPalau 3.479e-01 4.898e-01 0.710 0.47760
## CountryPanama 9.361e-02 4.539e-01 0.206 0.83662
## CountryPapua New Guinea 5.144e-01 4.812e-01 1.069 0.28509
## CountryPhilippines -2.248e-01 4.456e-01 -0.505 0.61385
## CountryPoland -1.671e-02 4.468e-01 -0.037 0.97016
## CountryPortugal -3.991e-01 4.143e-01 -0.963 0.33540
## CountryQatar 4.984e-02 4.315e-01 0.116 0.90805
## CountryRepublic of the Congo -6.655e-01 4.586e-01 -1.451 0.14674
## CountryRomania 4.990e-01 4.417e-01 1.130 0.25865
## CountryRussia -2.279e-01 4.276e-01 -0.533 0.59415
## CountryRwanda -4.575e-01 4.407e-01 -1.038 0.29921
## CountrySaint Kitts and Nevis 5.154e-01 4.424e-01 1.165 0.24397
## CountrySaint Lucia 4.607e-01 5.038e-01 0.914 0.36049
## CountrySaint Vincent and the Grenadines 4.277e-01 4.938e-01 0.866 0.38640
## CountrySamoa 3.956e-01 4.671e-01 0.847 0.39698
## CountrySan Marino -5.267e-01 4.593e-01 -1.147 0.25146
## CountrySao Tome and Principe -3.625e-01 4.519e-01 -0.802 0.42246
## CountrySaudi Arabia 3.262e-01 4.628e-01 0.705 0.48092
## CountrySenegal -6.519e-01 4.227e-01 -1.542 0.12298
## CountrySerbia -1.925e-01 4.244e-01 -0.453 0.65021
## CountrySeychelles -4.212e-01 4.219e-01 -0.998 0.31808
## CountrySierra Leone -5.858e-01 4.579e-01 -1.279 0.20081
## CountrySingapore -3.813e-02 4.696e-01 -0.081 0.93528
## CountrySlovakia -3.963e-01 4.922e-01 -0.805 0.42063
## CountrySlovenia -2.354e-01 4.278e-01 -0.550 0.58212
## CountrySolomon Islands 5.210e-01 4.612e-01 1.130 0.25863
## CountrySomalia 2.123e-01 4.340e-01 0.489 0.62476
## CountrySouth Africa -6.290e-01 4.553e-01 -1.382 0.16709
## CountrySouth Korea 1.743e-01 4.254e-01 0.410 0.68198
## CountrySouth Sudan -7.927e-01 4.978e-01 -1.592 0.11128
## CountrySpain 6.553e-01 4.520e-01 1.450 0.14716
## CountrySri Lanka 1.160e-01 4.356e-01 0.266 0.79002
## CountrySudan -1.128e+00 4.799e-01 -2.351 0.01871
## CountrySwaziland -3.879e-01 4.384e-01 -0.885 0.37635
## CountrySweden 2.491e-01 4.492e-01 0.555 0.57913
## CountrySwitzerland -1.638e-01 4.517e-01 -0.363 0.71691
## CountrySyria 2.622e-01 4.744e-01 0.553 0.58054
## CountryTaiwan 5.800e-01 4.220e-01 1.375 0.16927
## CountryTajikistan 9.231e-01 4.818e-01 1.916 0.05535
## CountryTanzania -3.599e-01 4.587e-01 -0.785 0.43267
## CountryThailand 2.533e-01 4.425e-01 0.572 0.56707
## CountryThe Bahamas -3.801e-02 4.587e-01 -0.083 0.93395
## CountryThe Gambia -5.532e-02 4.670e-01 -0.118 0.90571
## CountryTogo -8.430e-01 4.556e-01 -1.850 0.06429
## CountryTonga -9.238e-02 4.943e-01 -0.187 0.85173
## CountryTrinidad and Tobago NA NA NA NA
## CountryTunisia -2.721e-01 4.341e-01 -0.627 0.53081
## CountryTurkey -4.081e-01 4.581e-01 -0.891 0.37305
## CountryTurkmenistan -1.848e-01 4.394e-01 -0.421 0.67407
## CountryTuvalu 2.326e-01 4.633e-01 0.502 0.61559
## CountryUganda 4.805e-02 4.481e-01 0.107 0.91460
## CountryUkraine -4.707e-02 4.192e-01 -0.112 0.91059
## CountryUnited Arab Emirates -5.428e-01 4.380e-01 -1.239 0.21523
## CountryUnited Kingdom -5.522e-02 4.017e-01 -0.137 0.89066
## CountryUnited States of America NA NA NA NA
## CountryUzbekistan 5.696e-01 4.432e-01 1.285 0.19871
## CountryVanuatu NA NA NA NA
## CountryVatican City NA NA NA NA
## CountryVietnam NA NA NA NA
## CountryYemen -2.095e-01 4.521e-01 -0.463 0.64315
## CountryZambia -3.338e-02 4.314e-01 -0.077 0.93832
## CountryZimbabwe NA NA NA NA
## Item.TypeBeverages -2.617e-01 1.351e-01 -1.936 0.05283
## Item.TypeCereal 7.359e-02 1.160e-01 0.635 0.52569
## Item.TypeClothes -1.350e-01 1.174e-01 -1.150 0.25021
## Item.TypeCosmetics 5.541e-02 1.333e-01 0.416 0.67762
## Item.TypeFruits -2.580e-01 1.406e-01 -1.835 0.06647
## Item.TypeHousehold -4.127e-02 1.382e-01 -0.299 0.76528
## Item.TypeMeat -2.611e-01 1.595e-01 -1.636 0.10174
## Item.TypeOffice Supplies -1.228e-01 1.542e-01 -0.797 0.42572
## Item.TypePersonal Care -1.492e-01 1.286e-01 -1.160 0.24612
## Item.TypeSnacks -1.603e-01 1.201e-01 -1.334 0.18205
## Item.TypeVegetables -2.212e-01 1.188e-01 -1.862 0.06261
## Order.PriorityH -9.341e-02 6.718e-02 -1.390 0.16438
## Order.PriorityL -5.083e-02 6.718e-02 -0.757 0.44932
## Order.PriorityM -6.427e-02 6.726e-02 -0.956 0.33931
## Units.Sold 2.302e-05 1.495e-05 1.540 0.12349
## Unit.Price NA NA NA NA
## Unit.Cost NA NA NA NA
## Total.Cost 9.646e-08 6.917e-08 1.395 0.16312
## Total.Profit -5.653e-07 2.316e-07 -2.441 0.01463
## Total.Revenue NA NA NA NA
## Days 2.868e-04 1.626e-03 0.176 0.86001
## Order.DayMon 5.713e-02 8.863e-02 0.645 0.51918
## Order.DaySat 8.407e-02 8.936e-02 0.941 0.34684
## Order.DaySun 7.435e-02 8.813e-02 0.844 0.39882
## Order.DayThu 2.778e-02 8.933e-02 0.311 0.75581
## Order.DayTue 1.913e-02 8.853e-02 0.216 0.82893
## Order.DayWed 1.111e-01 8.876e-02 1.252 0.21050
## Order.MonthAug 8.742e-02 1.169e-01 0.748 0.45447
## Order.MonthDec 2.089e-02 1.181e-01 0.177 0.85962
## Order.MonthFeb 1.929e-01 1.164e-01 1.657 0.09756
## Order.MonthJan 1.568e-02 1.134e-01 0.138 0.89008
## Order.MonthJul 2.007e-01 1.136e-01 1.767 0.07717
## Order.MonthJun 1.342e-01 1.141e-01 1.176 0.23954
## Order.MonthMar 9.468e-02 1.124e-01 0.842 0.39974
## Order.MonthMay -5.180e-02 1.139e-01 -0.455 0.64936
## Order.MonthNov 1.568e-01 1.210e-01 1.295 0.19520
## Order.MonthOct 7.551e-02 1.182e-01 0.639 0.52294
## Order.MonthSep 1.055e-01 1.200e-01 0.879 0.37928
## Order.Year 7.297e-03 1.094e-02 0.667 0.50476
##
## (Intercept)
## RegionAustralia and Oceania
## RegionCentral America and the Caribbean
## RegionEurope
## RegionMiddle East and North Africa
## RegionNorth America
## RegionSub-Saharan Africa .
## CountryAlbania
## CountryAlgeria
## CountryAndorra
## CountryAngola **
## CountryAntigua and Barbuda
## CountryArmenia
## CountryAustralia
## CountryAustria
## CountryAzerbaijan
## CountryBahrain
## CountryBangladesh
## CountryBarbados *
## CountryBelarus
## CountryBelgium
## CountryBelize
## CountryBenin *
## CountryBhutan
## CountryBosnia and Herzegovina
## CountryBotswana
## CountryBrunei
## CountryBulgaria
## CountryBurkina Faso
## CountryBurundi .
## CountryCambodia
## CountryCameroon
## CountryCanada *
## CountryCape Verde
## CountryCentral African Republic
## CountryChad
## CountryChina
## CountryComoros
## CountryCosta Rica
## CountryCote d'Ivoire
## CountryCroatia
## CountryCuba
## CountryCyprus
## CountryCzech Republic
## CountryDemocratic Republic of the Congo .
## CountryDenmark
## CountryDjibouti
## CountryDominica
## CountryDominican Republic
## CountryEast Timor *
## CountryEgypt
## CountryEl Salvador
## CountryEquatorial Guinea .
## CountryEritrea
## CountryEstonia
## CountryEthiopia
## CountryFederated States of Micronesia
## CountryFiji
## CountryFinland
## CountryFrance
## CountryGabon .
## CountryGeorgia
## CountryGermany
## CountryGhana
## CountryGreece
## CountryGreenland
## CountryGrenada
## CountryGuatemala
## CountryGuinea
## CountryGuinea-Bissau
## CountryHaiti
## CountryHonduras
## CountryHungary
## CountryIceland
## CountryIndia
## CountryIndonesia
## CountryIran
## CountryIraq
## CountryIreland
## CountryIsrael
## CountryItaly
## CountryJamaica
## CountryJapan
## CountryJordan
## CountryKazakhstan
## CountryKenya
## CountryKiribati
## CountryKosovo
## CountryKuwait
## CountryKyrgyzstan
## CountryLaos
## CountryLatvia
## CountryLebanon
## CountryLesotho .
## CountryLiberia
## CountryLibya
## CountryLiechtenstein
## CountryLithuania
## CountryLuxembourg
## CountryMacedonia
## CountryMadagascar
## CountryMalawi
## CountryMalaysia
## CountryMaldives
## CountryMali
## CountryMalta .
## CountryMarshall Islands
## CountryMauritania
## CountryMauritius
## CountryMexico
## CountryMoldova
## CountryMonaco
## CountryMongolia
## CountryMontenegro
## CountryMorocco
## CountryMozambique *
## CountryMyanmar
## CountryNamibia
## CountryNauru
## CountryNepal .
## CountryNetherlands
## CountryNew Zealand
## CountryNicaragua
## CountryNiger
## CountryNigeria *
## CountryNorth Korea
## CountryNorway
## CountryOman
## CountryPakistan
## CountryPalau
## CountryPanama
## CountryPapua New Guinea
## CountryPhilippines
## CountryPoland
## CountryPortugal
## CountryQatar
## CountryRepublic of the Congo
## CountryRomania
## CountryRussia
## CountryRwanda
## CountrySaint Kitts and Nevis
## CountrySaint Lucia
## CountrySaint Vincent and the Grenadines
## CountrySamoa
## CountrySan Marino
## CountrySao Tome and Principe
## CountrySaudi Arabia
## CountrySenegal
## CountrySerbia
## CountrySeychelles
## CountrySierra Leone
## CountrySingapore
## CountrySlovakia
## CountrySlovenia
## CountrySolomon Islands
## CountrySomalia
## CountrySouth Africa
## CountrySouth Korea
## CountrySouth Sudan
## CountrySpain
## CountrySri Lanka
## CountrySudan *
## CountrySwaziland
## CountrySweden
## CountrySwitzerland
## CountrySyria
## CountryTaiwan
## CountryTajikistan .
## CountryTanzania
## CountryThailand
## CountryThe Bahamas
## CountryThe Gambia
## CountryTogo .
## CountryTonga
## CountryTrinidad and Tobago
## CountryTunisia
## CountryTurkey
## CountryTurkmenistan
## CountryTuvalu
## CountryUganda
## CountryUkraine
## CountryUnited Arab Emirates
## CountryUnited Kingdom
## CountryUnited States of America
## CountryUzbekistan
## CountryVanuatu
## CountryVatican City
## CountryVietnam
## CountryYemen
## CountryZambia
## CountryZimbabwe
## Item.TypeBeverages .
## Item.TypeCereal
## Item.TypeClothes
## Item.TypeCosmetics
## Item.TypeFruits .
## Item.TypeHousehold
## Item.TypeMeat
## Item.TypeOffice Supplies
## Item.TypePersonal Care
## Item.TypeSnacks
## Item.TypeVegetables .
## Order.PriorityH
## Order.PriorityL
## Order.PriorityM
## Units.Sold
## Unit.Price
## Unit.Cost
## Total.Cost
## Total.Profit *
## Total.Revenue
## Days
## Order.DayMon
## Order.DaySat
## Order.DaySun
## Order.DayThu
## Order.DayTue
## Order.DayWed
## Order.MonthAug
## Order.MonthDec
## Order.MonthFeb .
## Order.MonthJan
## Order.MonthJul .
## Order.MonthJun
## Order.MonthMar
## Order.MonthMay
## Order.MonthNov
## Order.MonthOct
## Order.MonthSep
## Order.Year
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10396 on 7499 degrees of freedom
## Residual deviance: 10198 on 7279 degrees of freedom
## AIC: 10640
##
## Number of Fisher Scoring iterations: 4
glm.probs2<- predict(glm.log2, type="response")
contrasts(dataset10000_2$Sales.Channel)
## Online
## Offline 0
## Online 1
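# start with every training row labeled "Offline", then relabel rows whose predicted P(Online) exceeds 0.5
glm.pred2 <- rep("Offline", nrow(dataset10000_2_train))
glm.pred2[glm.probs2 > .5] <- "Online"
# confusion matrix: rows are predictions, columns are the actual channels
table(glm.pred2, dataset10000_2_train$Sales.Channel)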
##
## glm.pred2 Offline Online
## Offline 2062 1597
## Online 1650 2191
mean(glm.pred2 ==dataset10000_2_train$Sales.Channel)
## [1] 0.5670667
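This means the model classified about 57% of the training records correctly (4253 of 7500). Always predicting the more common class, Online (3788 of 7500), would already be right about 51% of the time, so the gain is small.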
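As a quick check of the arithmetic, the accuracy and a majority-class baseline can be recomputed from the four counts in the training confusion matrix. This is only a sketch: the object names `conf_train`, `accuracy` and `baseline` are made up for illustration and are not part of the original analysis.

```r
# Counts copied from the training confusion matrix printed above
# (rows = predicted channel, columns = actual channel)
conf_train <- matrix(c(2062, 1650, 1597, 2191), nrow = 2,
                     dimnames = list(predicted = c("Offline", "Online"),
                                     actual    = c("Offline", "Online")))

# Share of rows on the diagonal, i.e. correctly classified
accuracy <- sum(diag(conf_train)) / sum(conf_train)

# Accuracy of always guessing the more frequent actual class
baseline <- max(colSums(conf_train)) / sum(conf_train)

print(round(c(accuracy = accuracy, baseline = baseline), 4))
```

Comparing the two numbers side by side makes it easier to judge how much the model adds over a constant guess.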
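Let’s check the test data. Note that the code below fits a separate model on the test set instead of applying glm.log2 to it, so the accuracy it reports is in-sample for the test rows, not a true out-of-sample check.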
glm.log_test2<-glm(Sales.Channel ~ Region + Country + Item.Type + Order.Priority +
Units.Sold + Unit.Price +
Unit.Cost + Total.Cost +
Total.Profit + Total.Revenue +
Days + Order.Day + Order.Month + Order.Year, data=dataset10000_2_test, family=binomial)
summary(glm.log_test2)
##
## Call:
## glm(formula = Sales.Channel ~ Region + Country + Item.Type +
## Order.Priority + Units.Sold + Unit.Price + Unit.Cost + Total.Cost +
## Total.Profit + Total.Revenue + Days + Order.Day + Order.Month +
## Order.Year, family = binomial, data = dataset10000_2_test)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0024 -1.1087 0.5845 1.0877 2.0149
##
## Coefficients: (9 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.826e+01 4.074e+01 0.939 0.3477
## RegionAustralia and Oceania 4.006e-01 8.236e-01 0.486 0.6266
## RegionCentral America and the Caribbean -4.623e-01 8.472e-01 -0.546 0.5853
## RegionEurope -8.319e-01 8.550e-01 -0.973 0.3306
## RegionMiddle East and North Africa 1.848e-01 7.610e-01 0.243 0.8082
## RegionNorth America -3.200e-01 7.727e-01 -0.414 0.6788
## RegionSub-Saharan Africa 1.607e-01 7.625e-01 0.211 0.8330
## CountryAlbania 7.401e-01 8.510e-01 0.870 0.3845
## CountryAlgeria -1.041e+00 7.792e-01 -1.336 0.1817
## CountryAndorra 8.006e-01 8.047e-01 0.995 0.3198
## CountryAngola -4.364e-01 7.225e-01 -0.604 0.5459
## CountryAntigua and Barbuda -7.688e-02 9.369e-01 -0.082 0.9346
## CountryArmenia -5.427e-01 1.070e+00 -0.507 0.6119
## CountryAustralia -8.536e-01 8.048e-01 -1.061 0.2888
## CountryAustria 2.691e-01 8.979e-01 0.300 0.7644
## CountryAzerbaijan -5.454e-01 7.422e-01 -0.735 0.4624
## CountryBahrain -5.096e-01 7.566e-01 -0.674 0.5006
## CountryBangladesh 2.364e-01 7.228e-01 0.327 0.7436
## CountryBarbados -3.943e-01 9.079e-01 -0.434 0.6640
## CountryBelarus 1.858e-01 9.000e-01 0.206 0.8365
## CountryBelgium -1.581e-01 1.096e+00 -0.144 0.8853
## CountryBelize -2.126e-01 8.441e-01 -0.252 0.8012
## CountryBenin -8.352e-01 7.957e-01 -1.050 0.2939
## CountryBhutan -8.967e-01 8.021e-01 -1.118 0.2636
## CountryBosnia and Herzegovina 2.372e-02 1.016e+00 0.023 0.9814
## CountryBotswana -4.807e-01 6.808e-01 -0.706 0.4802
## CountryBrunei 8.619e-01 8.731e-01 0.987 0.3235
## CountryBulgaria 8.705e-01 8.858e-01 0.983 0.3257
## CountryBurkina Faso -3.201e-01 7.608e-01 -0.421 0.6740
## CountryBurundi -1.094e+00 7.442e-01 -1.470 0.1415
## CountryCambodia 2.244e-01 7.524e-01 0.298 0.7655
## CountryCameroon -1.821e+00 8.494e-01 -2.144 0.0320
## CountryCanada -5.098e-02 7.711e-01 -0.066 0.9473
## CountryCape Verde -8.689e-01 7.337e-01 -1.184 0.2363
## CountryCentral African Republic -9.248e-01 7.982e-01 -1.159 0.2466
## CountryChad -1.063e-01 9.072e-01 -0.117 0.9067
## CountryChina -5.000e-01 7.919e-01 -0.631 0.5278
## CountryComoros -6.388e-01 7.456e-01 -0.857 0.3916
## CountryCosta Rica 1.788e-01 8.866e-01 0.202 0.8402
## CountryCote d'Ivoire 3.194e-02 8.288e-01 0.039 0.9693
## CountryCroatia 2.867e-01 8.163e-01 0.351 0.7254
## CountryCuba 6.699e-01 9.017e-01 0.743 0.4575
## CountryCyprus 9.295e-01 8.721e-01 1.066 0.2865
## CountryCzech Republic 2.535e-01 8.320e-01 0.305 0.7606
## CountryDemocratic Republic of the Congo -9.230e-01 7.519e-01 -1.227 0.2197
## CountryDenmark 3.231e-02 8.731e-01 0.037 0.9705
## CountryDjibouti 9.193e-01 8.340e-01 1.102 0.2704
## CountryDominica -1.017e+00 9.299e-01 -1.094 0.2740
## CountryDominican Republic -5.841e-02 8.094e-01 -0.072 0.9425
## CountryEast Timor -5.749e-01 8.303e-01 -0.692 0.4887
## CountryEgypt -8.048e-01 8.061e-01 -0.998 0.3181
## CountryEl Salvador 8.381e-01 8.498e-01 0.986 0.3240
## CountryEquatorial Guinea 2.092e-01 8.938e-01 0.234 0.8149
## CountryEritrea 1.416e+00 1.196e+00 1.184 0.2365
## CountryEstonia 1.585e-01 8.232e-01 0.193 0.8473
## CountryEthiopia 2.614e-01 7.656e-01 0.341 0.7327
## CountryFederated States of Micronesia -3.483e-01 8.958e-01 -0.389 0.6974
## CountryFiji 5.331e-01 1.015e+00 0.525 0.5993
## CountryFinland 1.872e+00 1.033e+00 1.812 0.0699
## CountryFrance 4.219e-01 8.509e-01 0.496 0.6200
## CountryGabon -3.162e-01 7.737e-01 -0.409 0.6828
## CountryGeorgia 8.536e-01 8.249e-01 1.035 0.3008
## CountryGermany 7.248e-01 8.842e-01 0.820 0.4124
## CountryGhana 3.008e-01 7.976e-01 0.377 0.7061
## CountryGreece -5.336e-02 8.664e-01 -0.062 0.9509
## CountryGreenland 2.269e-01 8.680e-01 0.261 0.7937
## CountryGrenada 3.201e-01 8.544e-01 0.375 0.7079
## CountryGuatemala 9.443e-01 8.874e-01 1.064 0.2873
## CountryGuinea -2.913e-01 6.899e-01 -0.422 0.6729
## CountryGuinea-Bissau -9.337e-01 7.026e-01 -1.329 0.1839
## CountryHaiti -3.156e-01 8.743e-01 -0.361 0.7181
## CountryHonduras -4.138e-01 8.550e-01 -0.484 0.6284
## CountryHungary 9.500e-01 8.353e-01 1.137 0.2554
## CountryIceland 5.269e-01 8.386e-01 0.628 0.5298
## CountryIndia -2.456e-02 8.115e-01 -0.030 0.9759
## CountryIndonesia -9.844e-01 7.624e-01 -1.291 0.1966
## CountryIran -7.833e-02 8.404e-01 -0.093 0.9257
## CountryIraq 1.047e-01 8.919e-01 0.117 0.9065
## CountryIreland 1.826e+00 9.305e-01 1.962 0.0497
## CountryIsrael 5.798e-01 9.661e-01 0.600 0.5484
## CountryItaly 2.063e-01 8.839e-01 0.233 0.8154
## CountryJamaica -7.420e-01 9.448e-01 -0.785 0.4322
## CountryJapan -3.668e-01 7.504e-01 -0.489 0.6250
## CountryJordan 3.637e-01 7.958e-01 0.457 0.6476
## CountryKazakhstan -1.532e+00 9.920e-01 -1.545 0.1224
## CountryKenya -1.198e+00 7.743e-01 -1.547 0.1218
## CountryKiribati -1.059e+00 7.563e-01 -1.401 0.1613
## CountryKosovo -1.104e-01 8.643e-01 -0.128 0.8983
## CountryKuwait -8.392e-01 7.204e-01 -1.165 0.2440
## CountryKyrgyzstan -5.593e-01 8.224e-01 -0.680 0.4965
## CountryLaos -1.054e-01 7.888e-01 -0.134 0.8937
## CountryLatvia -3.771e-01 9.739e-01 -0.387 0.6986
## CountryLebanon -5.097e-01 7.852e-01 -0.649 0.5162
## CountryLesotho -5.407e-01 7.520e-01 -0.719 0.4721
## CountryLiberia -1.248e+00 8.889e-01 -1.404 0.1602
## CountryLibya -4.971e-01 7.417e-01 -0.670 0.5027
## CountryLiechtenstein 5.299e-01 8.421e-01 0.629 0.5291
## CountryLithuania 2.728e-01 7.662e-01 0.356 0.7218
## CountryLuxembourg -8.894e-01 1.038e+00 -0.857 0.3914
## CountryMacedonia -5.711e-01 8.882e-01 -0.643 0.5202
## CountryMadagascar -1.057e+00 7.849e-01 -1.347 0.1781
## CountryMalawi -1.074e+00 7.458e-01 -1.440 0.1498
## CountryMalaysia -6.450e-01 7.252e-01 -0.889 0.3738
## CountryMaldives -1.618e-01 9.516e-01 -0.170 0.8650
## CountryMali -9.796e-02 9.102e-01 -0.108 0.9143
## CountryMalta -1.311e+00 1.257e+00 -1.043 0.2969
## CountryMarshall Islands -2.440e+00 1.266e+00 -1.928 0.0539
## CountryMauritania -9.526e-01 7.501e-01 -1.270 0.2041
## CountryMauritius -7.874e-02 7.723e-01 -0.102 0.9188
## CountryMexico -2.409e-02 8.430e-01 -0.029 0.9772
## CountryMoldova 9.004e-02 8.542e-01 0.105 0.9161
## CountryMonaco 1.445e+00 8.868e-01 1.629 0.1033
## CountryMongolia -6.154e-01 7.613e-01 -0.808 0.4189
## CountryMontenegro 5.110e-01 8.198e-01 0.623 0.5331
## CountryMorocco -6.976e-01 6.910e-01 -1.010 0.3127
## CountryMozambique -8.577e-01 7.152e-01 -1.199 0.2305
## CountryMyanmar -2.439e-01 7.612e-01 -0.320 0.7487
## CountryNamibia -2.932e-01 7.337e-01 -0.400 0.6895
## CountryNauru -1.447e+00 8.680e-01 -1.667 0.0955
## CountryNepal -8.305e-01 8.126e-01 -1.022 0.3067
## CountryNetherlands 4.737e-01 9.177e-01 0.516 0.6057
## CountryNew Zealand -4.041e-01 7.898e-01 -0.512 0.6089
## CountryNicaragua 3.341e-01 9.193e-01 0.363 0.7163
## CountryNiger -1.971e+00 8.438e-01 -2.336 0.0195
## CountryNigeria 3.309e-01 8.787e-01 0.377 0.7065
## CountryNorth Korea -4.894e-01 7.497e-01 -0.653 0.5138
## CountryNorway 1.262e+00 1.069e+00 1.181 0.2375
## CountryOman -3.798e-01 7.371e-01 -0.515 0.6064
## CountryPakistan 2.425e-01 8.021e-01 0.302 0.7624
## CountryPalau -6.201e-01 7.684e-01 -0.807 0.4197
## CountryPanama 1.137e+00 1.047e+00 1.086 0.2775
## CountryPapua New Guinea -5.583e-01 8.199e-01 -0.681 0.4959
## CountryPhilippines -3.232e-02 7.989e-01 -0.040 0.9677
## CountryPoland -5.750e-01 8.818e-01 -0.652 0.5144
## CountryPortugal 1.828e+00 1.032e+00 1.771 0.0765
## CountryQatar -6.761e-01 7.551e-01 -0.895 0.3705
## CountryRepublic of the Congo 7.342e-02 7.371e-01 0.100 0.9207
## CountryRomania -1.908e-02 8.724e-01 -0.022 0.9826
## CountryRussia -1.990e-01 8.636e-01 -0.230 0.8178
## CountryRwanda -4.771e-01 6.791e-01 -0.703 0.4823
## CountrySaint Kitts and Nevis -6.664e-01 8.429e-01 -0.791 0.4292
## CountrySaint Lucia 6.715e-01 9.027e-01 0.744 0.4570
## CountrySaint Vincent and the Grenadines 2.765e-01 8.427e-01 0.328 0.7428
## CountrySamoa -3.970e-01 7.975e-01 -0.498 0.6186
## CountrySan Marino -2.947e-01 8.880e-01 -0.332 0.7399
## CountrySao Tome and Principe 3.787e-01 7.980e-01 0.475 0.6351
## CountrySaudi Arabia 2.173e-01 7.627e-01 0.285 0.7757
## CountrySenegal -3.456e-01 7.941e-01 -0.435 0.6634
## CountrySerbia 4.790e-01 8.275e-01 0.579 0.5627
## CountrySeychelles -1.418e-01 7.215e-01 -0.197 0.8442
## CountrySierra Leone 2.627e-01 8.756e-01 0.300 0.7641
## CountrySingapore 6.322e-01 8.835e-01 0.716 0.4743
## CountrySlovakia -2.057e-01 8.600e-01 -0.239 0.8109
## CountrySlovenia 8.744e-01 8.014e-01 1.091 0.2752
## CountrySolomon Islands -1.356e+00 8.937e-01 -1.517 0.1292
## CountrySomalia -3.547e-01 7.672e-01 -0.462 0.6438
## CountrySouth Africa -4.640e-02 8.285e-01 -0.056 0.9553
## CountrySouth Korea 8.686e-01 8.639e-01 1.005 0.3147
## CountrySouth Sudan -1.999e-01 7.086e-01 -0.282 0.7778
## CountrySpain 9.002e-01 8.854e-01 1.017 0.3093
## CountrySri Lanka -1.266e+00 8.232e-01 -1.538 0.1241
## CountrySudan -3.619e-02 7.145e-01 -0.051 0.9596
## CountrySwaziland -1.386e+00 7.220e-01 -1.919 0.0550
## CountrySweden -6.336e-01 9.405e-01 -0.674 0.5006
## CountrySwitzerland 1.459e+00 8.135e-01 1.794 0.0729
## CountrySyria -1.042e+00 7.796e-01 -1.336 0.1814
## CountryTaiwan -1.059e+00 7.925e-01 -1.336 0.1816
## CountryTajikistan 6.685e-01 1.012e+00 0.661 0.5087
## CountryTanzania -5.805e-01 7.861e-01 -0.739 0.4602
## CountryThailand 1.446e+00 9.461e-01 1.529 0.1263
## CountryThe Bahamas -7.063e-01 9.587e-01 -0.737 0.4613
## CountryThe Gambia -1.516e+01 3.103e+02 -0.049 0.9610
## CountryTogo -5.742e-01 7.137e-01 -0.805 0.4210
## CountryTonga -1.173e+00 8.418e-01 -1.393 0.1635
## CountryTrinidad and Tobago NA NA NA NA
## CountryTunisia -8.475e-01 7.651e-01 -1.108 0.2680
## CountryTurkey -1.340e+00 8.847e-01 -1.515 0.1298
## CountryTurkmenistan -5.935e-01 7.591e-01 -0.782 0.4343
## CountryTuvalu -6.961e-01 7.677e-01 -0.907 0.3645
## CountryUganda -1.296e+00 8.127e-01 -1.594 0.1109
## CountryUkraine 7.304e-01 8.952e-01 0.816 0.4146
## CountryUnited Arab Emirates -1.457e+00 7.210e-01 -2.021 0.0433
## CountryUnited Kingdom 1.253e+00 8.470e-01 1.480 0.1390
## CountryUnited States of America NA NA NA NA
## CountryUzbekistan -1.062e+00 8.305e-01 -1.279 0.2010
## CountryVanuatu NA NA NA NA
## CountryVatican City NA NA NA NA
## CountryVietnam NA NA NA NA
## CountryYemen 8.418e-01 9.521e-01 0.884 0.3766
## CountryZambia -6.463e-01 8.358e-01 -0.773 0.4393
## CountryZimbabwe NA NA NA NA
## Item.TypeBeverages 2.477e-01 2.537e-01 0.977 0.3288
## Item.TypeCereal 4.283e-02 2.207e-01 0.194 0.8461
## Item.TypeClothes -2.379e-01 2.225e-01 -1.069 0.2849
## Item.TypeCosmetics -4.978e-04 2.502e-01 -0.002 0.9984
## Item.TypeFruits 4.334e-01 2.670e-01 1.623 0.1046
## Item.TypeHousehold -4.746e-02 2.648e-01 -0.179 0.8577
## Item.TypeMeat 8.467e-02 2.903e-01 0.292 0.7706
## Item.TypeOffice Supplies 2.977e-01 2.838e-01 1.049 0.2942
## Item.TypePersonal Care -5.350e-02 2.406e-01 -0.222 0.8241
## Item.TypeSnacks 9.295e-02 2.320e-01 0.401 0.6887
## Item.TypeVegetables -2.573e-01 2.199e-01 -1.170 0.2419
## Order.PriorityH -1.365e-01 1.221e-01 -1.118 0.2635
## Order.PriorityL -1.238e-02 1.235e-01 -0.100 0.9202
## Order.PriorityM -5.833e-02 1.247e-01 -0.468 0.6400
## Units.Sold -9.806e-06 2.776e-05 -0.353 0.7239
## Unit.Price NA NA NA NA
## Unit.Cost NA NA NA NA
## Total.Cost -1.821e-07 1.264e-07 -1.440 0.1498
## Total.Profit 4.648e-07 4.267e-07 1.089 0.2760
## Total.Revenue NA NA NA NA
## Days -6.387e-04 3.002e-03 -0.213 0.8315
## Order.DayMon -8.152e-02 1.640e-01 -0.497 0.6191
## Order.DaySat -6.101e-02 1.685e-01 -0.362 0.7173
## Order.DaySun 1.534e-01 1.637e-01 0.937 0.3488
## Order.DayThu 1.396e-01 1.642e-01 0.850 0.3953
## Order.DayTue 8.353e-02 1.683e-01 0.496 0.6198
## Order.DayWed 5.096e-02 1.612e-01 0.316 0.7519
## Order.MonthAug 1.585e-01 2.133e-01 0.743 0.4574
## Order.MonthDec 5.356e-02 2.058e-01 0.260 0.7947
## Order.MonthFeb 3.081e-02 2.181e-01 0.141 0.8876
## Order.MonthJan -1.883e-01 2.083e-01 -0.904 0.3662
## Order.MonthJul 5.245e-03 2.037e-01 0.026 0.9795
## Order.MonthJun 6.941e-02 2.109e-01 0.329 0.7421
## Order.MonthMar -2.218e-01 2.104e-01 -1.054 0.2920
## Order.MonthMay 2.601e-02 2.082e-01 0.125 0.9006
## Order.MonthNov -6.481e-02 2.217e-01 -0.292 0.7700
## Order.MonthOct -3.519e-01 2.165e-01 -1.625 0.1041
## Order.MonthSep 1.591e-03 2.181e-01 0.007 0.9942
## Order.Year -1.878e-02 2.022e-02 -0.929 0.3531
##
## (Intercept)
## RegionAustralia and Oceania
## RegionCentral America and the Caribbean
## RegionEurope
## RegionMiddle East and North Africa
## RegionNorth America
## RegionSub-Saharan Africa
## CountryAlbania
## CountryAlgeria
## CountryAndorra
## CountryAngola
## CountryAntigua and Barbuda
## CountryArmenia
## CountryAustralia
## CountryAustria
## CountryAzerbaijan
## CountryBahrain
## CountryBangladesh
## CountryBarbados
## CountryBelarus
## CountryBelgium
## CountryBelize
## CountryBenin
## CountryBhutan
## CountryBosnia and Herzegovina
## CountryBotswana
## CountryBrunei
## CountryBulgaria
## CountryBurkina Faso
## CountryBurundi
## CountryCambodia
## CountryCameroon *
## CountryCanada
## CountryCape Verde
## CountryCentral African Republic
## CountryChad
## CountryChina
## CountryComoros
## CountryCosta Rica
## CountryCote d'Ivoire
## CountryCroatia
## CountryCuba
## CountryCyprus
## CountryCzech Republic
## CountryDemocratic Republic of the Congo
## CountryDenmark
## CountryDjibouti
## CountryDominica
## CountryDominican Republic
## CountryEast Timor
## CountryEgypt
## CountryEl Salvador
## CountryEquatorial Guinea
## CountryEritrea
## CountryEstonia
## CountryEthiopia
## CountryFederated States of Micronesia
## CountryFiji
## CountryFinland .
## CountryFrance
## CountryGabon
## CountryGeorgia
## CountryGermany
## CountryGhana
## CountryGreece
## CountryGreenland
## CountryGrenada
## CountryGuatemala
## CountryGuinea
## CountryGuinea-Bissau
## CountryHaiti
## CountryHonduras
## CountryHungary
## CountryIceland
## CountryIndia
## CountryIndonesia
## CountryIran
## CountryIraq
## CountryIreland *
## CountryIsrael
## CountryItaly
## CountryJamaica
## CountryJapan
## CountryJordan
## CountryKazakhstan
## CountryKenya
## CountryKiribati
## CountryKosovo
## CountryKuwait
## CountryKyrgyzstan
## CountryLaos
## CountryLatvia
## CountryLebanon
## CountryLesotho
## CountryLiberia
## CountryLibya
## CountryLiechtenstein
## CountryLithuania
## CountryLuxembourg
## CountryMacedonia
## CountryMadagascar
## CountryMalawi
## CountryMalaysia
## CountryMaldives
## CountryMali
## CountryMalta
## CountryMarshall Islands .
## CountryMauritania
## CountryMauritius
## CountryMexico
## CountryMoldova
## CountryMonaco
## CountryMongolia
## CountryMontenegro
## CountryMorocco
## CountryMozambique
## CountryMyanmar
## CountryNamibia
## CountryNauru .
## CountryNepal
## CountryNetherlands
## CountryNew Zealand
## CountryNicaragua
## CountryNiger *
## CountryNigeria
## CountryNorth Korea
## CountryNorway
## CountryOman
## CountryPakistan
## CountryPalau
## CountryPanama
## CountryPapua New Guinea
## CountryPhilippines
## CountryPoland
## CountryPortugal .
## CountryQatar
## CountryRepublic of the Congo
## CountryRomania
## CountryRussia
## CountryRwanda
## CountrySaint Kitts and Nevis
## CountrySaint Lucia
## CountrySaint Vincent and the Grenadines
## CountrySamoa
## CountrySan Marino
## CountrySao Tome and Principe
## CountrySaudi Arabia
## CountrySenegal
## CountrySerbia
## CountrySeychelles
## CountrySierra Leone
## CountrySingapore
## CountrySlovakia
## CountrySlovenia
## CountrySolomon Islands
## CountrySomalia
## CountrySouth Africa
## CountrySouth Korea
## CountrySouth Sudan
## CountrySpain
## CountrySri Lanka
## CountrySudan
## CountrySwaziland .
## CountrySweden
## CountrySwitzerland .
## CountrySyria
## CountryTaiwan
## CountryTajikistan
## CountryTanzania
## CountryThailand
## CountryThe Bahamas
## CountryThe Gambia
## CountryTogo
## CountryTonga
## CountryTrinidad and Tobago
## CountryTunisia
## CountryTurkey
## CountryTurkmenistan
## CountryTuvalu
## CountryUganda
## CountryUkraine
## CountryUnited Arab Emirates *
## CountryUnited Kingdom
## CountryUnited States of America
## CountryUzbekistan
## CountryVanuatu
## CountryVatican City
## CountryVietnam
## CountryYemen
## CountryZambia
## CountryZimbabwe
## Item.TypeBeverages
## Item.TypeCereal
## Item.TypeClothes
## Item.TypeCosmetics
## Item.TypeFruits
## Item.TypeHousehold
## Item.TypeMeat
## Item.TypeOffice Supplies
## Item.TypePersonal Care
## Item.TypeSnacks
## Item.TypeVegetables
## Order.PriorityH
## Order.PriorityL
## Order.PriorityM
## Units.Sold
## Unit.Price
## Unit.Cost
## Total.Cost
## Total.Profit
## Total.Revenue
## Days
## Order.DayMon
## Order.DaySat
## Order.DaySun
## Order.DayThu
## Order.DayTue
## Order.DayWed
## Order.MonthAug
## Order.MonthDec
## Order.MonthFeb
## Order.MonthJan
## Order.MonthJul
## Order.MonthJun
## Order.MonthMar
## Order.MonthMay
## Order.MonthNov
## Order.MonthOct
## Order.MonthSep
## Order.Year
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3464.9 on 2499 degrees of freedom
## Residual deviance: 3217.0 on 2279 degrees of freedom
## AIC: 3659
##
## Number of Fisher Scoring iterations: 13
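The summary lists 9 coefficients as "not defined because of singularities": Unit.Price, Unit.Cost, Total.Revenue and six countries. A plausible reading, stated here as an assumption and not checked against the data in this report, is that these columns are exact linear combinations of others. For example, Total.Revenue may equal Total.Cost plus Total.Profit, and each country may sit inside a single region. The toy sketch below uses simulated data with hypothetical names (`toy`, `cost`, `profit`, `revenue`) and shows `glm` returning `NA` for a redundant column.

```r
# Simulated data only: revenue is built as an exact sum of two other columns
set.seed(2)
n <- 200
toy <- data.frame(
  y      = rbinom(n, 1, 0.5),
  cost   = runif(n, 100, 1000),
  profit = runif(n, 10, 500)
)
toy$revenue <- toy$cost + toy$profit

# revenue adds no new information, so its coefficient cannot be estimated
fit_toy <- glm(y ~ cost + profit + revenue, data = toy, family = binomial)
print(coef(fit_toy))
```

If the same thing holds in the sales data, dropping the redundant columns from the formula would leave the fitted probabilities unchanged and remove the `NA` rows.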
# fitted probabilities from the model that was fit on the test rows
glm.probs_test2 <- predict(glm.log_test2, type = "response")
glm.pred_test2 <- rep("Offline", nrow(dataset10000_2_test))
glm.pred_test2[glm.probs_test2 > .5] <- "Online"
table(glm.pred_test2,dataset10000_2_test$Sales.Channel)
##
## glm.pred_test2 Offline Online
## Offline 739 451
## Online 488 822
mean(glm.pred_test2 ==dataset10000_2_test$Sales.Channel)
## [1] 0.6244
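The 62% above comes from a model that was fit on the test rows themselves. A common alternative is to fit on the training rows only and score the held-out rows through the `newdata` argument of `predict`. The sketch below shows that pattern on simulated data. The data frame `sim`, the split, and the reduced formula are assumptions for illustration, so the printed accuracy says nothing about the real sales files.

```r
# Simulated stand-in data; column names mirror the report, values are random
set.seed(1)
n <- 1000
sim <- data.frame(
  Sales.Channel  = factor(sample(c("Offline", "Online"), n, replace = TRUE)),
  Units.Sold     = sample(1:10000, n, replace = TRUE),
  Order.Priority = factor(sample(c("C", "H", "L", "M"), n, replace = TRUE))
)

# 75/25 split, the same proportions as the 7500/2500 rows in the report
train_idx <- sample(seq_len(n), size = 0.75 * n)
sim_train <- sim[train_idx, ]
sim_test  <- sim[-train_idx, ]

# Fit on the training rows only
fit <- glm(Sales.Channel ~ Units.Sold + Order.Priority,
           data = sim_train, family = binomial)

# Score the held-out rows with the training fit
probs_test <- predict(fit, newdata = sim_test, type = "response")
pred_test  <- ifelse(probs_test > 0.5, "Online", "Offline")

# Confusion matrix and out-of-sample accuracy
conf_test <- table(predicted = pred_test, actual = sim_test$Sales.Channel)
acc_test  <- mean(pred_test == sim_test$Sales.Channel)
print(conf_test)
print(acc_test)
```

Applied to the real data, the analogous call would pass dataset10000_2_test as `newdata` to the model fit on dataset10000_2_train. Because the factors were created before the split, both subsets should share the same levels, but that is worth confirming before relying on the result.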
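Conclusion - The logistic model is more usable on the larger dataset, because the 100-record dataset is too sparse and shows complete separation. On the 10,000-record dataset the correct prediction rate is approximately 57% to 62% (57% on the training split, 62% for the model refit on the test split), which is only modestly above the roughly 51% share of the majority class.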
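To illustrate what complete separation looks like in practice, here is a minimal sketch on a made-up six-row dataset (`toy_sep` is hypothetical and unrelated to the sales files). When a predictor splits the two classes perfectly, `glm` keeps pushing the slope upward and typically warns that fitted probabilities are numerically 0 or 1.

```r
# Toy data: y is 0 whenever x <= 3 and 1 whenever x >= 4, so x separates the classes
toy_sep <- data.frame(x = 1:6, y = c(0, 0, 0, 1, 1, 1))

# Collect warnings in a vector instead of printing them as they occur
sep_warnings <- character(0)
fit_sep <- withCallingHandlers(
  glm(y ~ x, data = toy_sep, family = binomial),
  warning = function(w) {
    sep_warnings <<- c(sep_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(sep_warnings)    # warnings raised during the fit
print(coef(fit_sep))   # very large coefficients are a symptom of separation
```

With only 100 records and more than a hundred country levels, many levels appear once or not at all, which makes this kind of perfect split much more likely than in the 10,000-record file.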