Applying machine learning to sales prediction

RPubs note: This version may not be the latest, see if Github copy is more recent. The code is there too.

Introduction

To demonstrate how previous sales data can be used to predict future sales, we’ll apply neural network library in R language to a dataset from UCI machine learning depository - Dresses_Attribute_Sales Data Set

The idea is to potentially apply the same method to improve other areas of sales. Say, we can come up with a set of attributes that describe qualities of a product page in an eCommerce store. Then, if these qualities do affect sales, we should be able to predict possible results of the product page changes.

I don’t really expect this to work, but this is an interesting exercise anyway.

Some observations on physical meaning of neural networks

I have to digress to explain why I’m doing this. I’m studying for Data Science specialization on Coursera, because I realized that for my day job it would be really useful to have better understanding of these technologies. After couple of courses, however entertaining they were, I realized, that it’s not quite what I expected. So I went and looked - where else this knowledge can be used?

I found almost right away that machine learning is the closest application there is. And this area is booming - see this video, for example. It looks like right now we are at a point where Moore law (lost by Intel, but maintained by Nvidia) allows to bridge performance gap from some decades-old machine learning theory to very practical applications. Every business will become IT business at its core, they say. And every IT business will become machine learning business at its core. Android’s amazing speech recognition is machine learning, for example.

So it is all very exciting. Question is - can a layperson like me use existing machine learning libraries without having good understanding of how math works inside? Is it available for some higher level usage similar to how WWW in early nineties allowed to use complex network and PC technologies by working at much higher HTML level and create some amazing results? This is an experiment to do that. Sure, R code can look scary. But you can consider most of the things I do in this file to be just some command-line replacement of Excel. Neural network magic is a black box, I use its very basic calls and at very primitive level, without good understanding.

I mean, I have some understanding, but it’s quite superficial. Like this. At a simplest level, artificial neurons are sets of weights for each input. If input is a picture (list of pixel values), then this set of weights would be same resolution as original input, and can even be rendered - it will show then a pattern to which this neuron is sensitive to. It’s hard for me to imagine what would happen with multi-layered network. They say that for animal image recognition networks the first layer collects graphic primitives like edges, next layer - higher level graphic objects, then at some point eyes and noses, then cats and dogs, then breeds and so on. But I don’t understand yet if it’s something that happens automatically. I only can imagine how the first level is rendered as a pattern a neuron recognizes, which happens in this amazing example where 74 lines of python code, while not containing any calls to neural net libraries, do character recognition better than humans do.

Now, getting back to this dress sales dataset. While working with it, I got a realization - there’s a simple intuitive explanation for neural network logic when it decides if a dress will sell or not. Each input (see section on neuralnet() usage) is a style attribute of a dress. Does it have long sleeves? Is it black? Is it sexy? And 170 or so similar questions are answered in the inputs for each dress. Then each neuron finds a weight for each input. This means that each neuron is actually a customer preference pattern. One neuron will have higher preference for red sexy dresses with long sleeves, smaller preference for brief silk dresses, and tolerance for cashmere. Another - different set of weights meaning imitation of a customer with a different taste - different set of preferences. So neurons are like people with different tastes.

When neural net is taught about dresses and if they will sell or not, it will iterate group of most typical and most sales-influential customer tastes.

I can imagine that if there will be two layers, then deeper layer will aggregate these tastes somehow, and so at the second levels neurons will represent some taste groups - for example, different ages, different cultural groups, different regions. This is a theory, I have no idea how close it is to the reality, but perhaps this is what I will be able to do, as I continue with this research. It seems intriguing to deconstruct a taught neural network to see what patterns it has extracted and what are their meanings.

It is highly probable, of course, that a) I misunderstand and imagining things b) this all had been asked and answered long time ago c) there are simple answers I ignore just because of my ignorance. Well, the path is everything, the destination is nothing. Let’s do it.

Attribute Information for the dataset:

These are the attributes the dataset in question contains:

Style: Bohemia,brief,casual,cute,fashion,flare,novelty,OL,party,sexy,vintage,work.
Price:Low,Average,Medium,High,Very-High
Rating:1-5
Size:S,M,L,XL,Free
Season:Autumn,winter,Spring,Summer
NeckLine:O-neck,backless,board-neck,Bowneck,halter,mandarin-collor,open,peterpan-collor,ruffled,scoop,slash-neck,square-collar,sweetheart,turndowncollar,V-neck.
SleeveLength:full,half,halfsleeves,butterfly,sleveless,short,threequarter,turndown,null
waiseline:dropped,empire,natural,princess,null.
Material:wool,cotton,mix etc
FabricType:shafoon,dobby,popline,satin,knitted,jersey,flannel,corduroy etc
Decoration:applique,beading,bow,button,cascading,crystal,draped,embroridary,feathers,flowers etc
Pattern type: solid,animal,dot,leapard etc
Recommendation:0,1

Data loading and processing

This document is also supposed to demonstrate how sales data can be processed in R, so various data processing steps are explained where possible.

# Let's load libraries first
# Use call like install.packages("dplyr"), if you miss any of the libraries.
library(digest) # get checksum for data loaded from file
library(dplyr) # for column selection and handy function piping syntax
library(tidyr) # for making numeric columns from style variables
library(nnet) # this neural net library allows for one layer of hidden neurons
library(neuralnet) # this one allows several layers, but work only with numbers
library(data.table) # for renaming columns list to list
# Update Java to 64bit version if xlsx library returns Java error
# http://javadl.sun.com/webapps/download/AutoDL?BundleId=109708
library(xlsx) # for reading Excel files

First, we change working directory and download the dataset.

#setwd(file.path(normalizePath("~"),"sales-neuralnet")) # Knitr FAQ says
# it's a bad practice, but use this line when running code in console
src<-"http://archive.ics.uci.edu/ml/machine-learning-databases/00289/Dresses_Attribute_Sales.rar"
file <- basename(src)
if(!file.exists(file)) download.file(src,file)

Now you have to unrar Dresses_Attribute_Sales.rar to the current folder! - it’s a manual step, not reproducible by the script
As it seems there’s no easy way to unrar in R, so do this step outside of R. Actually the R-downloaded file is broken somehow, so download it manually too.

We see that there’s a file where for different properties of a dress there’s a recommendation result - does it sell or does it not. As authors explain, “This dataset contain Attributes of dresses and their recommendations according to their sales.” There’s also another spreadsheet where sales data is provided. For now we’ll use Recommendation parameter directly, sales numbers can be used later.

# create platform-independent file path
xls<-file.path("Dresses_Attribute_Sales","Attribute DataSet.xlsx")

# read data from excel file to a data frame
if(!exists("loaded") || digest(loaded) != "c049d992c0c433cb3b0d1cbc7e348f6c")
        loaded<-read.xlsx(xls,1) # second argument is tab number
# preserve to skip file loading on console re-runs
data<-loaded

Data review

Let’s review our dataset - first its first five lines, then data structure of the dataset in R, then table dimensions.

# Top five rows
head(data)

##     Dress_ID    Style   Price Rating Size Season NeckLine SleeveLength
## 1 1006032852     Sexy     Low    4.6    M Summer   o-neck    sleevless
## 2 1212192089   Casual     Low    0.0    L Summer   o-neck        Petal
## 3 1190380701  vintage    High    0.0    L Automn   o-neck         full
## 4  966005983    Brief Average    4.6    L Spring   o-neck         full
## 5  876339541     cute     Low    4.5    M Summer   o-neck    butterfly
## 6 1068332458 bohemian     Low    0.0    M Summer   v-neck    sleevless
##   waiseline      Material FabricType Decoration Pattern.Type
## 1    empire          null    chiffon    ruffles       animal
## 2   natural    microfiber       null    ruffles       animal
## 3   natural      polyster       null       null        print
## 4   natural          silk    chiffon embroidary        print
## 5   natural chiffonfabric    chiffon        bow          dot
## 6    empire          null       null       null        print
##   Recommendation
## 1              1
## 2              0
## 3              0
## 4              1
## 5              0
## 6              0

# Memory object structure
str(data)

## 'data.frame':    500 obs. of  14 variables:
##  $ Dress_ID      : num  1.01e+09 1.21e+09 1.19e+09 9.66e+08 8.76e+08 ...
##  $ Style         : Factor w/ 13 levels "bohemian","Brief",..: 11 3 12 2 4 1 3 7 6 1 ...
##  $ Price         : Factor w/ 7 levels "Average","high",..: 5 5 3 1 5 5 1 1 1 5 ...
##  $ Rating        : num  4.6 0 0 4.6 4.5 0 0 0 0 0 ...
##  $ Size          : Factor w/ 7 levels "free","L","M",..: 3 2 2 2 3 3 7 1 1 1 ...
##  $ Season        : Factor w/ 8 levels "Automn","Autumn",..: 6 6 1 4 6 6 6 1 4 6 ...
##  $ NeckLine      : Factor w/ 17 levels "backless","boat-neck",..: 7 7 7 7 7 17 7 7 17 17 ...
##  $ SleeveLength  : Factor w/ 18 levels "butterfly","cap-sleeves",..: 12 8 4 4 1 12 4 9 9 12 ...
##  $ waiseline     : Factor w/ 5 levels "dropped","empire",..: 2 3 3 3 3 2 4 3 2 3 ...
##  $ Material      : Factor w/ 24 levels "acrylic","cashmere",..: 14 9 17 20 3 14 4 17 4 15 ...
##  $ FabricType    : Factor w/ 23 levels "batik","broadcloth",..: 3 12 12 3 3 12 12 2 2 3 ...
##  $ Decoration    : Factor w/ 25 levels "applique","beading",..: 21 21 14 8 3 14 14 12 2 14 ...
##  $ Pattern.Type  : Factor w/ 15 levels "animal","character",..: 1 1 12 12 3 12 13 9 13 9 ...
##  $ Recommendation: num  1 0 0 1 0 0 0 0 1 1 ...

# Frame dimensions
dim(data)

## [1] 500  14

So we have 500 lines in the table. In the data language it means we have 500 observations - with a given set of dress’ attributes, does it sell or not.

What is Dress_ID?

We have two columns that are different from other styles - Dress_ID and Rating. Supposedly the first is a unique id of a dress for which we observe the recommendation based on sales level. Let’s check this assumption. Are Dress_IDs unique for each record?

It shouldn’t be said while presenting results, but as this document is about procedure, it should be noted that this particular section was added later during the analysis when attempt of tidying data showed that it’s not actually unique, and so we need to understand what it means.

length(unique(data$Dress_ID))

## [1] 475

length(data$Dress_ID)

## [1] 500

It looks some dresses have several rows. Let’s look at them.

repeats<-table(data$Dress_ID)
dupes<-data[data$Dress_ID %in% names(repeats[repeats>1]),] %>%
  arrange(Dress_ID)
head(dupes)

##    Dress_ID  Style   Price Rating Size Season NeckLine SleeveLength
## 1 549159213 Casual Average    4.5 free Summer   o-neck        short
## 2 549159213 Casual Average    4.4 free Summer   o-neck   capsleeves
## 3 560474456   Sexy     low    4.3 free Winter   v-neck    sleevless
## 4 560474456   Sexy     low    4.4 free Automn   v-neck    sleevless
## 5 699738864 Casual     low    4.7    S summer   o-neck         full
## 6 699738864 Casual     Low    4.7    S Summer   o-neck         full
##   waiseline Material FabricType Decoration Pattern.Type Recommendation
## 1   natural   cotton       null       null        solid              1
## 2   natural   cotton       null       null        solid              0
## 3   natural     silk    worsted       lace        solid              1
## 4   natural   cotton    worsted       lace        solid              0
## 5   natural   cotton       null       null    patchwork              0
## 6   natural   cotton       null       null    patchwork              0

Apparently, each dress (with the same Dress_ID) can have variations - same dress with different sleeves, from different materials. It doesn’t make sense to make Dress_ID a style parameter (see below), so we’ll just ignore it, and will treat same dress with variations as a different dresses.

# Let's add explicit unique row numbers to represent observations
data$rnumber<-1:nrow(data)

What is Rating?

Presumably the rating is average grade given to a dress by store’s customers. We should be wary about this parameter, as can be correlated to the sales too directly. What is worse, for new inventory it won’t be available. So let’s keep an eye on it.

Predictions using nnet() library

Nnet() is a standard R library, it’s even included in the basic distribution package and doesn’t need to be installed separately. It also great because it allows to analyze data with so called “factor” variables - categorical ones. For example, what material a dress is made from - cotton, silk, etc. Its drawback is that it only allows for a single hidden layer of neurons.

# Make sure data types are good for nnet() - factors and numbers
df<-as.data.frame(
  lapply(
    dplyr::select(data,-Dress_ID,-rnumber),
    as.factor))
df$Rating<-as.numeric(df$Rating)
# select() above needed direct reference to dplyr library because R libraries
# like to overwrite this function very much

It’s time to train the neural network. This code creates an object which basically includes a set of coefficients to be applied to input parameters to predict (calculate) the outcome. This set of coefficients is the substance of the trained neural network.

It takes some time to train a neural network, but once it’s trained, predictions can be calculated very fast and “cheap”. In this case the dataset is small, and network is simple, so it trains also fast.

neurons<-5 # we'll play with this number below
df1<-df # make clean copy which can be mutilated if needed - will be used later
seed<-3 # initialize randomizer for reproducibility
# We'll set seed every time randomizer is in play to reset it after previous use
set.seed(seed)
nn<-nnet(Recommendation ~ ., df1, size=neurons)

## # weights:  771
## initial  value 338.247117 
## iter  10 value 310.502710
## iter  20 value 195.739258
## iter  30 value 138.190418
## iter  40 value 121.918684
## iter  50 value 118.273514
## iter  60 value 114.846922
## iter  70 value 113.185644
## iter  80 value 110.318486
## iter  90 value 109.172075
## iter 100 value 107.684641
## final  value 107.684641 
## stopped after 100 iterations

guess<-predict(nn, df1, type = "class")
table(df1$Recommendation,guess)

##    guess
##       0   1
##   0 251  38
##   1   8 199

Let’s make nnet() a placeholder function with this result to skip these runs when developing next parts. This code itself should be skipped when producing the final document.

# To skip empty runs while developing, overwrite it
nn.placeholder<-nn
if(exists("nnet")) rm(nnet)
nnet<-function(a,b,size=1) {
        print("Console run: skipped following nnet calls")
        nn.placeholder
        }

Well, most zeroes are classified as zeroes and vice versa. Let’s calculate a percentage of the correct guesses.

# We'll use this later, let's make a reusable function out of it
# Although I'm sure there's a simpler way to do this, some basic ready-made
# R function, which I'm just don't know. Well, I'll learn that later then
qualify<-function(real,guess){
  check<-table(real,guess)
  good.ones<-check[1,1]+check[2,2]
  bad.ones<-check[1,2]+check[2,1]
  paste0(as.character(round(100*good.ones/(good.ones+bad.ones))),'%')
  }
quality<-qualify(df1$Recommendation,guess)
print(quality)

## [1] "91%"

OK, so 91% doesn’t seem too bad for just 5 neurons. But what if we separate training and test sets?
First, let’s select lines we’ll use for training, and rest will go for testing. Let’s use 80% of data for training purposes.

nr<-dim(df)[1] # number of observations
share<-0.8 # this is our 80% parameter
set.seed(seed)
trainset<-sample.int(nr,round(share*nr))

The trainset variable now has numbers of observations we’ll use for training. Let’s split the dataset in two.

neurons<-5 
df1<-df
trainers<-df1[trainset,]
testers<-df1[-trainset,]
set.seed(seed)              
nn<-nnet(Recommendation ~ ., trainers, size=neurons)

## # weights:  771
## initial  value 273.048410 
## iter  10 value 241.482580
## iter  20 value 144.842478
## iter  30 value 95.991723
## iter  40 value 82.573174
## iter  50 value 77.544592
## iter  60 value 73.808236
## iter  70 value 71.405765
## iter  80 value 69.804317
## iter  90 value 68.856340
## iter 100 value 68.259184
## final  value 68.259184 
## stopped after 100 iterations

Now let’s see what happens to our prediction quality.

guess<-predict(nn, testers, type = "class")
quality<-qualify(testers$Recommendation,guess)
# Record for later comparison
nnets<-c(Neurons=neurons,Rating="Yes",Quality=quality)
nnets

## Neurons  Rating Quality 
##     "5"   "Yes"   "60%"

nnets.all<-nnets

Well, just 60%. We could have just flipped a coin.

Moreover, it included Rating, which is kind of cheating. Theoretically speaking, there’s a confounding variable - people both rate and buy what they like. Let’s drop the rating.

neurons<-5 
df1<-dplyr::select(df,-Rating)
trainers<-df1[trainset,]
testers<-df1[-trainset,]
set.seed(seed) 
nn<-nnet(Recommendation ~ ., trainers, size=neurons)

## # weights:  766
## initial  value 298.439688 
## iter  10 value 188.133627
## iter  20 value 117.675639
## iter  30 value 89.266456
## iter  40 value 80.714295
## iter  50 value 79.602084
## iter  60 value 79.080988
## iter  70 value 78.650563
## iter  80 value 76.371040
## iter  90 value 72.361793
## iter 100 value 72.022875
## final  value 72.022875 
## stopped after 100 iterations

guess<-predict(nn, testers, type = "class")
quality<-qualify(testers$Recommendation,guess)
# Record for later comparison
nnets<-c(Neurons=neurons,Rating="No",Quality=quality)
nnets

## Neurons  Rating Quality 
##     "5"    "No"   "61%"

nnets.all<-rbind(nnets.all,nnets)

Surprisingly, it didn’t drop at all. It could - if you play with the randomizer seed, you’ll see. But not by much. Probably the Rating wasn’t affecting sales as strong as we expected.

Let’s increase number of neurons. First, with Rating.

neurons<-30 
df1<-df
trainers<-df1[trainset,]
testers<-df1[-trainset,]
set.seed(seed)
# Add a new paramter to nnet() - MaxNWts, otherwise it exceeds default
# value for maximum number of weights and errors w/ "too many (4621) weights"
nn<-nnet(Recommendation ~ ., trainers, size=neurons, MaxNWts=10000)

## # weights:  4621
## initial  value 293.796148 
## iter  10 value 233.308647
## iter  20 value 174.044699
## iter  30 value 108.898880
## iter  40 value 57.163090
## iter  50 value 13.316643
## iter  60 value 4.838987
## iter  70 value 1.660317
## iter  80 value 0.127794
## iter  90 value 0.007926
## iter 100 value 0.000615
## final  value 0.000615 
## stopped after 100 iterations

guess<-predict(nn, testers, type = "class")
quality<-qualify(testers$Recommendation,guess)
# Record for later comparison
nnets<-c(Neurons=neurons,Rating="Yes",Quality=quality)
nnets

## Neurons  Rating Quality 
##    "30"   "Yes"   "57%"

nnets.all<-rbind(nnets.all,nnets)

Now - without the rating.

neurons<-30 
df1<-dplyr::select(df,-Rating)
trainers<-df1[trainset,]
testers<-df1[-trainset,]
set.seed(seed) 
nn<-nnet(Recommendation ~ ., trainers, size=neurons,MaxNWts=10000)

## # weights:  4591
## initial  value 482.279127 
## iter  10 value 157.513021
## iter  20 value 12.781400
## iter  30 value 1.647551
## iter  40 value 1.391606
## iter  50 value 1.386979
## iter  60 value 1.386534
## iter  70 value 1.386402
## iter  80 value 1.386370
## iter  90 value 1.386324
## final  value 1.386322 
## converged

guess<-predict(nn, testers, type = "class")
quality<-qualify(testers$Recommendation,guess)
# Record for later comparison
nnets<-c(Neurons=neurons,Rating="Yes",Quality=quality)
nnets

## Neurons  Rating Quality 
##    "30"   "Yes"   "47%"

nnets.all<-rbind(nnets.all,nnets)

So rating doesn’t matter here.

Is there a rule of thumb for the number of neurons? Here’s a detailed answer to this question by Nate Kohl at StackOverflow. In short, while every problem should be handled specifically, and trial-and-error always useful, but the rule of thumb is 2/3 of number of inputs plus number of outputs. Question is - if we are dealing with the factor variables, do we count variables or their levels? That’s a question for another day, but for now we’ve seen that crude pumping neurons number doesn’t help that much.

How many levels do we have in total, anyway?

# Count number of unique values in each column, then sum it up
# Let's also use R's trick - while we assign a value to a variable, parenthesis
# do also print the value
(levels.number<-sum(
        sapply(
                dplyr::select(df,-Rating,-Recommendation),
                function(x)
                        length(unique(x)))))

## [1] 170

# Then 2/3 will be, adding input Rating and output Recommendation
(neurons<-round((levels.number+2)*2/3))

## [1] 115

Just to be safe, let’s do 115 neurons.

df1<-df
trainers<-df1[trainset,]
testers<-df1[-trainset,]
set.seed(seed) 
nn<-nnet(Recommendation ~ ., trainers, size=neurons,MaxNWts=100000)

## # weights:  17711
## initial  value 814.369720 
## iter  10 value 233.274460
## iter  20 value 192.190614
## iter  30 value 137.987112
## iter  40 value 71.908269
## iter  50 value 15.057290
## iter  60 value 1.789717
## iter  70 value 0.077771
## iter  80 value 0.003328
## iter  90 value 0.000466
## iter 100 value 0.000186
## final  value 0.000186 
## stopped after 100 iterations

guess<-predict(nn, testers, type = "class")
quality<-qualify(testers$Recommendation,guess)
# Record for later comparison
nnets<-c(Neurons=neurons,Rating="Yes",Quality=quality)
nnets

## Neurons  Rating Quality 
##   "115"   "Yes"   "51%"

nnets.all<-rbind(nnets.all,nnets)

Let’s review our attempts.

print(data.frame(nnets.all), row.names = FALSE)

##  Neurons Rating Quality
##        5    Yes     60%
##        5     No     61%
##       30    Yes     57%
##       30    Yes     47%
##      115    Yes     51%

So, it doesn’t work. Some different approach is needed, obviously. Perhaps there should be more layers, not just more neurons? Let’s try to use neuralnet() library now, hoping that it will allow for more complex networks.

Predictions using neuralnet() library

R’s neuralnet() requires numeric input, so let’s convert it.

# Convert every column to character to prevent loss of the data when gathering
# square brackets keep data.frame type
data[]<-lapply(data,as.character)

# Repack factors to numeric columns
# I know there are faster ways to do it, but this one is more transparent to me
bins<-data %>% # piped functions follow
    
  # make it narrow, don't touch numeric variables and IDs
  gather(catnames,catvalues,-Dress_ID,-Rating,-Recommendation,-rnumber) %>%
  
  # make single column out of them
  unite(newfactor,catnames,catvalues,sep=".") %>%

  # add a new column - it's "1" for every record
  mutate( is = 1) %>%

  # create a column from each factor, and where there's no record, add "0"
  spread(newfactor, is, fill = 0)

# Now let's make it back numeric, except for ID
bins[]<-lapply(bins,as.numeric)
bins$Dress_ID<-as.factor(bins$Dress_ID)

Now we have 174 columns instead of 15, and they are all numeric, except for ID.

Try full dataset first, just to get a feeling.

# replace minuses in column names to avoid breaking formula in neuralnet call
cnames<-colnames(bins)
bad.cnames<-cnames[grepl("-",cnames,fixed=TRUE)]
## Presumably there's no columns with the same names except for minus sign
#fixed.cnames<-sapply(bad.cnames,function(x){gsub("-","",x,fixed=TRUE)})
#cnames[cnames %in% fixed.cnames]
## I was wrong! It duplicates this way. Let's add something instead of minus.
##And keep this exercise in here just to remember what fun it was
fixed.cnames<-sapply(bad.cnames,function(x){gsub("-",".",x,fixed=TRUE)})
cnames[cnames %in% fixed.cnames]

## character(0)

# setnames() call from data.table works by reference, so no copying
setnames(bins,old=bad.cnames,new=fixed.cnames)

# next line was used to generate list of columns below
# cat(paste0(names(bins)[5:174],sep="+"))
# also make function to avoid copy-pasting it again
bins.nn<-function(df,rep=1,hidden=c(1),threshold=0.1) {
        set.seed(seed)
        nn.obj<-neuralnet(Recommendation ~ Decoration.applique+ Decoration.beading+ Decoration.bow+ Decoration.button+ Decoration.cascading+ Decoration.crystal+ Decoration.draped+ Decoration.embroidary+ Decoration.feathers+ Decoration.flowers+ Decoration.hollowout+ Decoration.lace+ Decoration.NA+ Decoration.none+ Decoration.null+ Decoration.pearls+ Decoration.plain+ Decoration.pleat+ Decoration.pockets+ Decoration.rivet+ Decoration.ruched+ Decoration.ruffles+ Decoration.sashes+ Decoration.sequined+ Decoration.tassel+ Decoration.Tiered+ FabricType.batik+ FabricType.broadcloth+ FabricType.chiffon+ FabricType.Corduroy+ FabricType.dobby+ FabricType.flannael+ FabricType.flannel+ FabricType.jersey+ FabricType.knitted+ FabricType.knitting+ FabricType.lace+ FabricType.NA+ FabricType.null+ FabricType.organza+ FabricType.other+ FabricType.poplin+ FabricType.satin+ FabricType.sattin+ FabricType.shiffon+ FabricType.terry+ FabricType.tulle+ FabricType.wollen+ FabricType.woolen+ FabricType.worsted+ Material.acrylic+ Material.cashmere+ Material.chiffonfabric+ Material.cotton+ Material.knitting+ Material.lace+ Material.linen+ Material.lycra+ Material.microfiber+ Material.milksilk+ Material.mix+ Material.modal+ Material.model+ Material.NA+ Material.null+ Material.nylon+ Material.other+ Material.polyster+ Material.rayon+ Material.shiffon+ Material.silk+ Material.sill+ Material.spandex+ Material.viscos+ Material.wool+ NeckLine.backless+ NeckLine.boat.neck+ NeckLine.bowneck+ NeckLine.halter+ NeckLine.mandarin.collor+ NeckLine.NA+ NeckLine.NULL+ NeckLine.o.neck+ NeckLine.open+ NeckLine.peterpan.collor+ NeckLine.ruffled+ NeckLine.Scoop+ NeckLine.slash.neck+ NeckLine.sqare.collor+ NeckLine.sweetheart+ NeckLine.Sweetheart+ NeckLine.turndowncollor+ NeckLine.v.neck+ Pattern.Type.animal+ Pattern.Type.character+ Pattern.Type.dot+ Pattern.Type.floral+ Pattern.Type.geometric+ Pattern.Type.leapord+ Pattern.Type.leopard+ Pattern.Type.NA+ Pattern.Type.none+ Pattern.Type.null+ Pattern.Type.patchwork+ Pattern.Type.plaid+ Pattern.Type.print+ Pattern.Type.solid+ Pattern.Type.splice+ Pattern.Type.striped+ Price.Average+ Price.high+ Price.High+ Price.low+ Price.Low+ Price.Medium+ Price.NA+ Price.very.high+ Season.Automn+ Season.Autumn+ Season.NA+ Season.spring+ Season.Spring+ Season.summer+ Season.Summer+ Season.winter+ Season.Winter+ Size.free+ Size.L+ Size.M+ Size.s+ Size.S+ Size.small+ Size.XL+ SleeveLength.butterfly+ SleeveLength.cap.sleeves+ SleeveLength.capsleeves+ SleeveLength.full+ SleeveLength.half+ SleeveLength.halfsleeve+ SleeveLength.NULL+ SleeveLength.Petal+ SleeveLength.short+ SleeveLength.sleeevless+ SleeveLength.sleeveless+ SleeveLength.sleevless+ SleeveLength.sleveless+ SleeveLength.threequarter+ SleeveLength.threequater+ SleeveLength.thressqatar+ SleeveLength.turndowncollor+ SleeveLength.urndowncollor+ Style.bohemian+ Style.Brief+ Style.Casual+ Style.cute+ Style.fashion+ Style.Flare+ Style.Novelty+ Style.OL+ Style.party+ Style.sexy+ Style.Sexy+ Style.vintage+ Style.work+ waiseline.dropped+ waiseline.empire+ waiseline.NA+ waiseline.natural+ waiseline.null+ waiseline.princess,
                          data=df,
                          hidden=hidden,
                          lifesign="full",
                          lifesign.step=2000,
                          threshold=threshold,
                          rep=rep)
        return(nn.obj)}
n1<-bins.nn(bins,rep=1,hidden=c(5),threshold=0.02)

## hidden: 5    thresh: 0.02    rep: 1/1    steps:    2000  min thresh: 0.07388653044
##                                                    4000  min thresh: 0.03293984918
##                                                    6000  min thresh: 0.03293984918
##                                                    8000  min thresh: 0.02927448485
##                                                   10000  min thresh: 0.02625585969
##                                                   12000  min thresh: 0.02120308571
##                                                   12110  error: 0.55624  time: 20.3 secs

res1<-neuralnet::compute(n1,bins[,5:174])
## Some fun how I couldn't get it working
## First I got this
### Error in eval(expr, envir, enclos) : object 'NeckLine.boatneck' not found
## Looked like  minus sign in column names were treated as part of the formula
## So I made code above to delete minuses from column names
## Then I was getting following:
### Error in neurons[[i]] %*% weights[[i]] : non-conformable arguments
## Checking dimensions
#length(n1$model.list$variables)
#length(bins[1,5:174])
## Do not match, why?
#bins[1,5:174][! names(bins[1,5:174]) %in% n1$model.list$variables]
## SleeveLength.capsleeves.1
## It duplicates!
## I went up to see if fixed.cnames were present in cnames, and they were
## So I replaced minusis with dot
qualify(round(res1$net.result),bins$Recommendation)

## [1] "100%"

So it works fine for the full set. Let’s do testers/trainers again.

# do we have same number of lines?
dim(bins)

## [1] 500 174

dim(df)

## [1] 500  13

# We do
trainers<-bins[trainset,]
testers<-bins[-trainset,]
## I had another interesting hurdle here. To skip wait times on teaching the
## network, I had cache=TRUE instruction on the previous block. Guess what -
## it skipped that code block and didn't replace minuses in column names.
## And so I got errors. Thus I should be careful with using cache when
## code does some essential transformations. So I've split that chunk to two
## and cached only the neuralnet training part.
n5<-bins.nn(trainers,rep=1,hidden=c(5),threshold=0.02)

## hidden: 5    thresh: 0.02    rep: 1/1    steps:    2000  min thresh: 0.07135827152
##                                                    3926  error: 0.52462  time: 5.63 secs

res5<-neuralnet::compute(n5,testers[,5:174])
qualify(round(res5$net.result),testers$Recommendation)

## [1] "38%"

38% with 5 artificial neurons. Worse than random. Let’s try just one.

n2<-bins.nn(trainers,rep=1,hidden=c(1),threshold=0.02)

## hidden: 1    thresh: 0.02    rep: 1/1    steps:     775  error: 18.72795 time: 0.45 secs

res2<-neuralnet::compute(n2,testers[,5:174])
qualify(round(res2$net.result),testers$Recommendation)

## [1] "54%"

Only better - 54%. OK, let’s do three layers with 115 neurons in total and do 10 fitting runs.

n2<-bins.nn(trainers,rep=10,hidden=c(80,25,10),threshold=0.02)

## hidden: 80, 25, 10    thresh: 0.02    rep:  1/10    steps:     147   error: 0.50196  time: 3.23 secs
## hidden: 80, 25, 10    thresh: 0.02    rep:  2/10    steps:     122   error: 0.50089  time: 2.69 secs
## hidden: 80, 25, 10    thresh: 0.02    rep:  3/10    steps:     119   error: 0.50131  time: 2.59 secs
## hidden: 80, 25, 10    thresh: 0.02    rep:  4/10    steps:     114   error: 0.50516  time: 2.52 secs
## hidden: 80, 25, 10    thresh: 0.02    rep:  5/10    steps:     112   error: 0.50073  time: 2.44 secs
## hidden: 80, 25, 10    thresh: 0.02    rep:  6/10    steps:     129   error: 0.50561  time: 2.84 secs
## hidden: 80, 25, 10    thresh: 0.02    rep:  7/10    steps:     119   error: 0.50179  time: 2.63 secs
## hidden: 80, 25, 10    thresh: 0.02    rep:  8/10    steps:     109   error: 0.50043  time: 2.37 secs
## hidden: 80, 25, 10    thresh: 0.02    rep:  9/10    steps:     140   error: 0.50117  time: 3.1 secs
## hidden: 80, 25, 10    thresh: 0.02    rep: 10/10    steps:     121   error: 0.50216  time: 2.68 secs

res2<-neuralnet::compute(n2,testers[,5:174])
qualify(round(res2$net.result),testers$Recommendation)

## [1] "51%"

51% :(

To do

Check for NAs and handle them
See if low/Low spelling difference is something frequent, check other columns for spelling differences, unify
Find out why R-downloaded file is broken
~~Time neural net training times for demonstration purpose (for my PC’s power)~~ neuralnet() shows it automatically.
~~Apply neuralnet()~~
Should I somehow take in account that inputs and outputs are integers, when using neuralnet(), instead of rounding the result?
~~Describe how in neural network for product sales data each neuron can be explained as a customer priorities pattern~~
Check if there’s faster way to calculate prediction quality as a single number
Use actual sales numbers instead of “Recommendation” factor
For duplicated Dress_IDs, see if low/Low spelling should be fixed
Try to predict ratings instead of sales
See if predict() function is of nnet library or something more universal
Try to apply compression algorithms to reveal attributes that affect price more than others (is it possible at all with factor-based data?)
How price affect the sales, for example?
Learn alternative ways to repack factors to numeric columns (something about model.matrix() and nnet’s class.ind()
When deciding number of neurons with a 2/3 inputs+outputs rule, do we count factor variables or all they levels?

# Some code for future lookup
model.matrix(~ . + 0, 
             data=dfactors,
             contrasts.arg = lapply(data, contrasts, contrasts=FALSE))

## To do - technical
# - Make nnet call a function - takes seed, neurons, rating usage, returns list
# - Automate install.packages (and discuss reproducibility at forums)
# - Also discuss republishing to the same RPubs document from a different copy
# - Check if nnet() has successfully trained before checking it results
# - Find out why make.row.names isn't recognized by rbind properly
# - Envelope nnet() into a function and auto-calculate MaxNWts there too