Renew:
06/02 line 37~47 => Count the records in each OutcomeType
      line 50~58 => Not yet.
      line 61~168 => Breed
      line 174~202 => Age
      line 207~290

06/04 line 205~265 => Color
      line 465~617 => Random Forest

Data processing

The data set has 26,729 records and 9 explanatory variables, and it covers both dogs and cats.

library(dplyr)
library(ggplot2)
library(e1071)
library(randomForest)
library(caret)
library(gridExtra)
trainInit<-read.csv("train.csv")
head(trainInit)
##   AnimalID    Name            DateTime     OutcomeType OutcomeSubtype
## 1  A671945 Hambone 2014-02-12 18:22:00 Return_to_owner               
## 2  A656520   Emily 2013-10-13 12:44:00      Euthanasia      Suffering
## 3  A686464  Pearce 2015-01-31 12:28:00        Adoption         Foster
## 4  A683430         2014-07-11 19:09:00        Transfer        Partner
## 5  A667013         2013-11-15 12:52:00        Transfer        Partner
## 6  A677334    Elsa 2014-04-25 13:04:00        Transfer        Partner
##   AnimalType SexuponOutcome AgeuponOutcome
## 1        Dog  Neutered Male         1 year
## 2        Cat  Spayed Female         1 year
## 3        Dog  Neutered Male        2 years
## 4        Cat    Intact Male        3 weeks
## 5        Dog  Neutered Male        2 years
## 6        Dog  Intact Female        1 month
##                               Breed       Color
## 1             Shetland Sheepdog Mix Brown/White
## 2            Domestic Shorthair Mix Cream Tabby
## 3                      Pit Bull Mix  Blue/White
## 4            Domestic Shorthair Mix  Blue Cream
## 5       Lhasa Apso/Miniature Poodle         Tan
## 6 Cairn Terrier/Chihuahua Shorthair   Black/Tan

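Drop AnimalID, Name, and OutcomeSubtype.

# Drop the ID, name, and subtype columns by name rather than by position.
train <- trainInit[, !(names(trainInit) %in% c("AnimalID", "Name", "OutcomeSubtype"))]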
head(train)
##              DateTime     OutcomeType AnimalType SexuponOutcome
## 1 2014-02-12 18:22:00 Return_to_owner        Dog  Neutered Male
## 2 2013-10-13 12:44:00      Euthanasia        Cat  Spayed Female
## 3 2015-01-31 12:28:00        Adoption        Dog  Neutered Male
## 4 2014-07-11 19:09:00        Transfer        Cat    Intact Male
## 5 2013-11-15 12:52:00        Transfer        Dog  Neutered Male
## 6 2014-04-25 13:04:00        Transfer        Dog  Intact Female
##   AgeuponOutcome                             Breed       Color
## 1         1 year             Shetland Sheepdog Mix Brown/White
## 2         1 year            Domestic Shorthair Mix Cream Tabby
## 3        2 years                      Pit Bull Mix  Blue/White
## 4        3 weeks            Domestic Shorthair Mix  Blue Cream
## 5        2 years       Lhasa Apso/Miniature Poodle         Tan
## 6        1 month Cairn Terrier/Chihuahua Shorthair   Black/Tan
attach(train)

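Split the data into separate dog and cat sets.

# Index with the data frame's own column so the split does not depend on attach().
Dogtrain <- train[which(train$AnimalType == "Dog"), ]
Cattrain <- train[which(train$AnimalType != "Dog"), ]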
head(Dogtrain)
##               DateTime     OutcomeType AnimalType SexuponOutcome
## 1  2014-02-12 18:22:00 Return_to_owner        Dog  Neutered Male
## 3  2015-01-31 12:28:00        Adoption        Dog  Neutered Male
## 5  2013-11-15 12:52:00        Transfer        Dog  Neutered Male
## 6  2014-04-25 13:04:00        Transfer        Dog  Intact Female
## 9  2014-02-04 17:17:00        Adoption        Dog  Spayed Female
## 10 2014-05-03 07:48:00        Adoption        Dog  Spayed Female
##    AgeuponOutcome                             Breed       Color
## 1          1 year             Shetland Sheepdog Mix Brown/White
## 3         2 years                      Pit Bull Mix  Blue/White
## 5         2 years       Lhasa Apso/Miniature Poodle         Tan
## 6         1 month Cairn Terrier/Chihuahua Shorthair   Black/Tan
## 9        5 months     American Pit Bull Terrier Mix   Red/White
## 10         1 year                     Cairn Terrier       White
attach(Dogtrain)

n = 15595, p = 8

OutcomeType

First, count how many records each OutcomeType has.

## Return to owner = 4286
## Transfer = 3917
## Adoption = 6497
## Died = 50
## Euthanasia = 845
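
The counting step can be sketched in base R with `table()`. This is a minimal illustration on a small made-up vector; the values in `outcome` are hypothetical stand-ins for `Dogtrain$OutcomeType`, not the real data.

```r
# Toy stand-in for Dogtrain$OutcomeType (hypothetical values, not the real data).
outcome <- c("Adoption", "Transfer", "Adoption", "Return_to_owner",
             "Died", "Euthanasia", "Adoption", "Transfer")

# table() counts each distinct value; sort() puts the largest class first.
counts <- sort(table(outcome), decreasing = TRUE)
print(counts)
```

On the real data, the same call on the OutcomeType column would give the per-class totals listed above.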

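Because there are so few “Died” records, we decided to drop them. We also renamed the OutcomeType value Return_to_owner to Return, because the later XGBoost step raised an error saying that Return_to_owner exceeded 64 bits…….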

##               DateTime OutcomeType AnimalType SexuponOutcome
## 1  2014-02-12 18:22:00      Return        Dog  Neutered Male
## 3  2015-01-31 12:28:00    Adoption        Dog  Neutered Male
## 5  2013-11-15 12:52:00    Transfer        Dog  Neutered Male
## 6  2014-04-25 13:04:00    Transfer        Dog  Intact Female
## 9  2014-02-04 17:17:00    Adoption        Dog  Spayed Female
## 10 2014-05-03 07:48:00    Adoption        Dog  Spayed Female
##    AgeuponOutcome                             Breed       Color
## 1          1 year             Shetland Sheepdog Mix Brown/White
## 3         2 years                      Pit Bull Mix  Blue/White
## 5         2 years       Lhasa Apso/Miniature Poodle         Tan
## 6         1 month Cairn Terrier/Chihuahua Shorthair   Black/Tan
## 9        5 months     American Pit Bull Terrier Mix   Red/White
## 10         1 year                     Cairn Terrier       White
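
The filtering and renaming described above can be sketched as follows. The data frame `dogs` is a hypothetical five-row stand-in, and treating OutcomeType as a factor is an assumption; the source does not show the code it actually used.

```r
# Hypothetical miniature of the dog data; OutcomeType is assumed to be a factor.
dogs <- data.frame(
  OutcomeType = c("Return_to_owner", "Adoption", "Died", "Transfer", "Euthanasia"),
  AnimalType = "Dog",
  stringsAsFactors = TRUE
)

# Remove the "Died" rows, then drop the now-unused factor level.
dogs <- dogs[dogs$OutcomeType != "Died", ]
dogs$OutcomeType <- droplevels(dogs$OutcomeType)

# Rename the level Return_to_owner to the shorter Return.
levels(dogs$OutcomeType)[levels(dogs$OutcomeType) == "Return_to_owner"] <- "Return"

print(table(dogs$OutcomeType))
```

Calling `droplevels()` matters for the later models: an empty "Died" level would otherwise still appear as a row and column of zeros in every confusion matrix.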

Breed

##               DateTime OutcomeType AnimalType SexuponOutcome
## 1  2014-02-12 18:22:00      Return        Dog  Neutered Male
## 3  2015-01-31 12:28:00    Adoption        Dog  Neutered Male
## 5  2013-11-15 12:52:00    Transfer        Dog  Neutered Male
## 6  2014-04-25 13:04:00    Transfer        Dog  Intact Female
## 9  2014-02-04 17:17:00    Adoption        Dog  Spayed Female
## 10 2014-05-03 07:48:00    Adoption        Dog  Spayed Female
##    AgeuponOutcome       Color            breed
## 1          1 year Brown/White      Other Breed
## 3         2 years  Blue/White         Pit Bull
## 5         2 years         Tan Miniature Poodle
## 6         1 month   Black/Tan    Cairn Terrier
## 9        5 months   Red/White      Other Breed
## 10         1 year       White    Cairn Terrier

Age

Recode AgeuponOutcome into three categories: “Puppy”, “AdultDog”, and “OldDog”.

##               DateTime OutcomeType AnimalType SexuponOutcome       Color
## 1  2014-02-12 18:22:00      Return        Dog  Neutered Male Brown/White
## 3  2015-01-31 12:28:00    Adoption        Dog  Neutered Male  Blue/White
## 5  2013-11-15 12:52:00    Transfer        Dog  Neutered Male         Tan
## 6  2014-04-25 13:04:00    Transfer        Dog  Intact Female   Black/Tan
## 9  2014-02-04 17:17:00    Adoption        Dog  Spayed Female   Red/White
## 10 2014-05-03 07:48:00    Adoption        Dog  Spayed Female       White
##               breed      Age
## 1       Other Breed   OldDog
## 3          Pit Bull AdultDog
## 5  Miniature Poodle AdultDog
## 6     Cairn Terrier    Puppy
## 9       Other Breed    Puppy
## 10    Cairn Terrier   OldDog
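
One way to do this kind of recoding is to convert each age string to days and then bin it with `cut()`. The source does not state its cutoffs, so the one-year and eight-year boundaries below are placeholders, as are the helper names `age_in_days` and `age_group`; the sketch will not necessarily reproduce the labels in the table above.

```r
# Convert strings such as "3 weeks" or "1 year" into an approximate age in days.
age_in_days <- function(age) {
  parts <- strsplit(as.character(age), " ")
  n <- as.numeric(sapply(parts, `[`, 1))
  unit <- sub("s$", "", sapply(parts, `[`, 2))   # "years" -> "year"
  days_per_unit <- c(day = 1, week = 7, month = 30, year = 365)
  unname(n * days_per_unit[unit])
}

# Bin ages into three groups. Both cutoffs are ASSUMED, not taken from the source.
age_group <- function(age, puppy_below = 365, old_from = 8 * 365) {
  cut(age_in_days(age),
      breaks = c(-Inf, puppy_below, old_from, Inf),
      labels = c("Puppy", "AdultDog", "OldDog"),
      right = FALSE)                             # intervals are [lower, upper)
}

ages <- c("1 month", "5 months", "1 year", "2 years", "10 years")
print(data.frame(AgeuponOutcome = ages, Age = age_group(ages)))
```

Keeping the cutoffs as function arguments makes it easy to try other boundaries without touching the parsing step.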

Color

Map each color to one of six categories: “Simple”, “Double”, “Tricolor”, “Brindle”, “Tick”, and “Merle”. “Brindle” also covers colors that combine Brindle with Tick, or Brindle with Merle. Some colors, such as “Blue Tiger”, “Blue Cream”, and “Smoke”, still do not belong to Brindle. For these I consider only the colors themselves and classify them as “Simple”, “Double”, or “Tricolor”.

##               DateTime OutcomeType AnimalType SexuponOutcome
## 1  2014-02-12 18:22:00      Return        Dog  Neutered Male
## 3  2015-01-31 12:28:00    Adoption        Dog  Neutered Male
## 5  2013-11-15 12:52:00    Transfer        Dog  Neutered Male
## 6  2014-04-25 13:04:00    Transfer        Dog  Intact Female
## 9  2014-02-04 17:17:00    Adoption        Dog  Spayed Female
## 10 2014-05-03 07:48:00    Adoption        Dog  Spayed Female
##               breed      Age ColorFix
## 1       Other Breed   OldDog   Double
## 3          Pit Bull AdultDog   Double
## 5  Miniature Poodle AdultDog    Light
## 6     Cairn Terrier    Puppy   Double
## 9       Other Breed    Puppy   Double
## 10    Cairn Terrier   OldDog    Light
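
The color rules described above might be sketched as below. Several points are assumptions: that color components are separated by "/", that a name such as "Blue Cream" counts as one component, and that Tick takes precedence over Merle when both appear. The function name `color_group` is hypothetical, and the output uses the six category names from the paragraph, which may differ from the labels in the table.

```r
color_group <- function(color) {
  color <- as.character(color)

  # Count "/"-separated components: 1 -> Simple, 2 -> Double, 3 or more -> Tricolor.
  n_parts <- lengths(strsplit(color, "/", fixed = TRUE))
  out <- ifelse(n_parts >= 3, "Tricolor",
                ifelse(n_parts == 2, "Double", "Simple"))
  out[grepl("Tricolor", color, fixed = TRUE)] <- "Tricolor"

  # Pattern categories override the count. Brindle is assigned last, so a
  # color with Brindle plus Tick or Merle ends up as Brindle.
  out[grepl("Merle", color, fixed = TRUE)] <- "Merle"
  out[grepl("Tick", color, fixed = TRUE)] <- "Tick"
  out[grepl("Brindle", color, fixed = TRUE)] <- "Brindle"
  out
}

colors <- c("Tan", "Brown/White", "Blue Cream", "Brown Brindle/Blue Tick",
            "Blue Merle", "Blue Tick", "Tricolor")
print(data.frame(Color = colors, ColorFix = color_group(colors)))
```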

Time

##    DateTime OutcomeType AnimalType SexuponOutcome            breed
## 1  February      Return        Dog  Neutered Male      Other Breed
## 3   January    Adoption        Dog  Neutered Male         Pit Bull
## 5  November    Transfer        Dog  Neutered Male Miniature Poodle
## 6     April    Transfer        Dog  Intact Female    Cairn Terrier
## 9  February    Adoption        Dog  Spayed Female      Other Breed
## 10      May    Adoption        Dog  Spayed Female    Cairn Terrier
##         Age ColorFix
## 1    OldDog   Double
## 3  AdultDog   Double
## 5  AdultDog    Light
## 6     Puppy   Double
## 9     Puppy   Double
## 10   OldDog    Light

SVM

Fit the model with an SVM.

Case 1: randomly select 10,000 records as the training data.

## Time difference of 27.75559 secs
## The confusion Matrix of training data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       4094        247   2242     1436
##   Euthanasia        0          0      0        0
##   Return           19        145    266      242
##   Transfer         62        137    267      843
## The confusion Matrix of testing data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       2280        146   1288      772
##   Euthanasia        0          0      0        0
##   Return           10         81    129      138
##   Transfer         32         88     94      486
## Training Error Rate = 0.4797
## Testing Error Rate = 0.4778139
## The number of Adoption in the training data is  4175
## The number of Return to owner in the training data is 2775
## The number of Euthanasia in the training data is 529
## The number of Transfer in the training data is 2521
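
The random split and the two error rates above can be reproduced in base R. This sketch covers only the bookkeeping, not the SVM fit itself; the seed is arbitrary, the row total is taken from the sums of the two confusion matrices, and `error_rate` is a hypothetical helper name.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n_total <- 10000 + 5544            # totals of the two confusion matrices above
train_idx <- sample(n_total, 10000)                  # rows used for training
test_idx  <- setdiff(seq_len(n_total), train_idx)    # remaining rows for testing

classes <- c("Adoption", "Euthanasia", "Return", "Transfer")
as_cm <- function(v) {
  matrix(v, nrow = 4, byrow = TRUE,
         dimnames = list(Prediction = classes, Reference = classes))
}

# Confusion matrices copied from the output above.
cm_train <- as_cm(c(4094, 247, 2242, 1436,
                       0,   0,    0,    0,
                      19, 145,  266,  242,
                      62, 137,  267,  843))
cm_test  <- as_cm(c(2280, 146, 1288,  772,
                       0,   0,    0,    0,
                      10,  81,  129,  138,
                      32,  88,   94,  486))

# Error rate = 1 - (correct predictions on the diagonal) / (all predictions).
error_rate <- function(cm) 1 - sum(diag(cm)) / sum(cm)

print(error_rate(cm_train))
print(error_rate(cm_test))
print(colSums(cm_train))           # class sizes in the training data
```

The column sums match the class counts printed above, and the empty Euthanasia row shows that this model never predicts that class.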

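Case 2: randomly select 10,000 records as the training data, then make the training data more balanced, since euthanasia is usually the class with too few records….. In other words, the euthanasia records in the training data are topped up until their count is within 100 to 200 of the smallest other OutcomeType, without exceeding it.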

P.S. 06/06: repeated 10 times. Mean training error 0.59784
                               Mean testing error 0.5060188
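
A minimal sketch of this kind of balancing is shown below. The source does not say how the extra euthanasia records were produced; duplicating existing rows by sampling with replacement is an assumption, as is the gap of 150. The toy data frame reuses the case 1 class sizes for illustration only.

```r
set.seed(1)                                   # arbitrary seed

# Toy training set holding only labels, sized like the case 1 training data.
train <- data.frame(OutcomeType = rep(
  c("Adoption", "Return", "Euthanasia", "Transfer"),
  times = c(4175, 2775, 529, 2521)))

counts <- table(train$OutcomeType)
smallest_other <- min(counts[names(counts) != "Euthanasia"])
gap <- 150                                    # ASSUMED: any value from 100 to 200
target <- smallest_other - gap                # stays below the smallest other class

# Duplicate randomly chosen euthanasia rows until the target count is reached.
euth_rows <- which(train$OutcomeType == "Euthanasia")
n_extra <- max(0, target - length(euth_rows))
extra <- euth_rows[sample.int(length(euth_rows), n_extra, replace = TRUE)]
balanced <- train[c(seq_len(nrow(train)), extra), , drop = FALSE]

print(table(balanced$OutcomeType))
```

Only the training rows are duplicated; the test set keeps its original class proportions, so the test error stays comparable with case 1.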
## Time difference of 37.80116 secs
## The confusion Matrix of training data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       3821        671   2008     1230
##   Euthanasia       38        979    379      462
##   Return          318        442    306      162
##   Transfer         44        192     97      594
## The confusion Matrix of testing data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       2091         86   1054      759
##   Euthanasia       22        141    232      307
##   Return          145         45    167       79
##   Transfer         18         31     43      324
## Training Error Rate = 0.6043
## Testing Error Rate = 0.5088384
## The number of Adoption in the training data is  4221
## The number of Euthanasia in the training data is 2284
## The number of Return to owner in the training data is 2790
## The number of Transfer in the training data is 2448

Random Forest

Case 1: randomly select 10,000 records as the training data.

## Time difference of 1.327143 mins

## The confusion Matrix of training data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       3903        173   1363     1101
##   Euthanasia        0        103      5        2
##   Return          277        127   1177      283
##   Transfer         61        122    162     1141
## The confusion Matrix of testing data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       1958         99   1035      647
##   Euthanasia        0          7     16       10
##   Return          257        102    361      213
##   Transfer         41        111    167      520
## Training Error Rate = 0.3676
## Testing Error Rate = 0.4866522

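Case 2: randomly select 10,000 records as the training data, then make the training data more balanced, since euthanasia is usually the class with too few records….. In other words, the euthanasia records in the training data are topped up until their count is within 100 to 200 of the smallest other OutcomeType, without exceeding it.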

## Time difference of 1.373995 mins

## The confusion Matrix of training data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       3775        288   1323     1156
##   Euthanasia      169       1971    368      280
##   Return          159         52    944      176
##   Transfer         41         68    123      959
## The confusion Matrix of testing data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       1973         91    937      596
##   Euthanasia      122        131    240      194
##   Return          226         39    243      115
##   Transfer         32         56    108      441
## Training Error Rate = 0.4203
## Testing Error Rate = 0.497114
## The number of Adoption in the training data is  4144
## The number of Return to owner in the training data is 2758
## The number of Euthanasia in the training data is 2379
## The number of Transfer in the training data is 2571
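
One detail of the case 2 training error rates can be checked directly from the printed matrices. After balancing, the training set has more than 10,000 rows, yet the reported rates for both SVM and Random Forest equal the misclassified count divided by 10,000. That reading is an inference from the numbers, not something the source states. The sketch below computes both versions, with `summarise_cm` as a hypothetical helper name.

```r
as_cm <- function(v) matrix(v, nrow = 4, byrow = TRUE)

# Case 2 training confusion matrices copied from the SVM and Random Forest output.
svm_case2 <- as_cm(c(3821,  671, 2008, 1230,
                       38,  979,  379,  462,
                      318,  442,  306,  162,
                       44,  192,   97,  594))
rf_case2  <- as_cm(c(3775,  288, 1323, 1156,
                      169, 1971,  368,  280,
                      159,   52,  944,  176,
                       41,   68,  123,  959))

summarise_cm <- function(cm) {
  wrong <- sum(cm) - sum(diag(cm))            # off-diagonal = misclassified
  c(rows = sum(cm),
    misclassified = wrong,
    over_10000 = wrong / 10000,               # matches the reported rates
    over_rows = wrong / sum(cm))              # rate over the balanced set size
}

res <- rbind(SVM = summarise_cm(svm_case2),
             RandomForest = summarise_cm(rf_case2))
print(round(res, 4))
```

Dividing by the actual number of training rows gives roughly 0.5146 for the SVM and 0.3546 for Random Forest, so the training errors of cases 1 and 2 are better compared on that basis.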