\[Problem\]

\[Data Description\]

\[Data Processing\]

library(dplyr)
library(ggplot2)
library(e1071)
library(randomForest)
library(caret)
library(gridExtra)
library(data.table)
trainInit<-read.csv("train.csv")
head(trainInit)
##   AnimalID    Name            DateTime     OutcomeType OutcomeSubtype
## 1  A671945 Hambone 2014-02-12 18:22:00 Return_to_owner               
## 2  A656520   Emily 2013-10-13 12:44:00      Euthanasia      Suffering
## 3  A686464  Pearce 2015-01-31 12:28:00        Adoption         Foster
## 4  A683430         2014-07-11 19:09:00        Transfer        Partner
## 5  A667013         2013-11-15 12:52:00        Transfer        Partner
## 6  A677334    Elsa 2014-04-25 13:04:00        Transfer        Partner
##   AnimalType SexuponOutcome AgeuponOutcome
## 1        Dog  Neutered Male         1 year
## 2        Cat  Spayed Female         1 year
## 3        Dog  Neutered Male        2 years
## 4        Cat    Intact Male        3 weeks
## 5        Dog  Neutered Male        2 years
## 6        Dog  Intact Female        1 month
##                               Breed       Color
## 1             Shetland Sheepdog Mix Brown/White
## 2            Domestic Shorthair Mix Cream Tabby
## 3                      Pit Bull Mix  Blue/White
## 4            Domestic Shorthair Mix  Blue Cream
## 5       Lhasa Apso/Miniature Poodle         Tan
## 6 Cairn Terrier/Chihuahua Shorthair   Black/Tan
# Drop columns 1, 2, and 5 (AnimalID, Name, OutcomeSubtype).
train<-trainInit[,-c(1,2,5)]
head(train)
##              DateTime     OutcomeType AnimalType SexuponOutcome
## 1 2014-02-12 18:22:00 Return_to_owner        Dog  Neutered Male
## 2 2013-10-13 12:44:00      Euthanasia        Cat  Spayed Female
## 3 2015-01-31 12:28:00        Adoption        Dog  Neutered Male
## 4 2014-07-11 19:09:00        Transfer        Cat    Intact Male
## 5 2013-11-15 12:52:00        Transfer        Dog  Neutered Male
## 6 2014-04-25 13:04:00        Transfer        Dog  Intact Female
##   AgeuponOutcome                             Breed       Color
## 1         1 year             Shetland Sheepdog Mix Brown/White
## 2         1 year            Domestic Shorthair Mix Cream Tabby
## 3        2 years                      Pit Bull Mix  Blue/White
## 4        3 weeks            Domestic Shorthair Mix  Blue Cream
## 5        2 years       Lhasa Apso/Miniature Poodle         Tan
## 6        1 month Cairn Terrier/Chihuahua Shorthair   Black/Tan
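# Split by species. Indexing through train$AnimalType avoids attach(),
# and -which() would return zero rows if no dog were present.
Dogtrain<-train[which(train$AnimalType=="Dog"),]
Cattrain<-train[which(train$AnimalType!="Dog"),]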
head(Dogtrain)
##               DateTime     OutcomeType AnimalType SexuponOutcome
## 1  2014-02-12 18:22:00 Return_to_owner        Dog  Neutered Male
## 3  2015-01-31 12:28:00        Adoption        Dog  Neutered Male
## 5  2013-11-15 12:52:00        Transfer        Dog  Neutered Male
## 6  2014-04-25 13:04:00        Transfer        Dog  Intact Female
## 9  2014-02-04 17:17:00        Adoption        Dog  Spayed Female
## 10 2014-05-03 07:48:00        Adoption        Dog  Spayed Female
##    AgeuponOutcome                             Breed       Color
## 1          1 year             Shetland Sheepdog Mix Brown/White
## 3         2 years                      Pit Bull Mix  Blue/White
## 5         2 years       Lhasa Apso/Miniature Poodle         Tan
## 6         1 month Cairn Terrier/Chihuahua Shorthair   Black/Tan
## 9        5 months     American Pit Bull Terrier Mix   Red/White
## 10         1 year                     Cairn Terrier       White
attach(Dogtrain)

OutcomeType

  • First, count how many records each OutcomeType has.
## Return to owner = 4286
## Transfer = 3917
## Adoption = 6497
## Died = 50
## Euthanasia = 845
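  • Because “Died” has too few records, we decided to drop it. We also renamed “Return_to_owner” in the original OutcomeType to “Return”.
  • The rename is needed because XGBoost later raised an error saying that “Return_to_owner” exceeded 64 bits.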
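The two steps above can be sketched as follows. This is a minimal illustration on a hypothetical toy table (`Dogtrain_demo` is not part of the report); the real data comes from train.csv.

```r
# Hypothetical toy stand-in for Dogtrain.
Dogtrain_demo <- data.frame(
  OutcomeType = factor(c("Return_to_owner", "Adoption", "Died",
                         "Transfer", "Euthanasia", "Adoption"))
)

# Count the records in each outcome class.
print(table(Dogtrain_demo$OutcomeType))

# Drop the rare "Died" class and remove its now-empty factor level.
Dogtrain_demo <- Dogtrain_demo[Dogtrain_demo$OutcomeType != "Died", , drop = FALSE]
Dogtrain_demo$OutcomeType <- droplevels(Dogtrain_demo$OutcomeType)

# Shorten the "Return_to_owner" label to "Return".
lv <- levels(Dogtrain_demo$OutcomeType)
lv[lv == "Return_to_owner"] <- "Return"
levels(Dogtrain_demo$OutcomeType) <- lv
```

Calling `droplevels()` matters here: without it, “Died” would remain as an empty class and could still show up in later confusion matrices.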
##               DateTime OutcomeType AnimalType SexuponOutcome
## 1  2014-02-12 18:22:00      Return        Dog  Neutered Male
## 3  2015-01-31 12:28:00    Adoption        Dog  Neutered Male
## 5  2013-11-15 12:52:00    Transfer        Dog  Neutered Male
## 6  2014-04-25 13:04:00    Transfer        Dog  Intact Female
## 9  2014-02-04 17:17:00    Adoption        Dog  Spayed Female
## 10 2014-05-03 07:48:00    Adoption        Dog  Spayed Female
##    AgeuponOutcome                             Breed       Color
## 1          1 year             Shetland Sheepdog Mix Brown/White
## 3         2 years                      Pit Bull Mix  Blue/White
## 5         2 years       Lhasa Apso/Miniature Poodle         Tan
## 6         1 month Cairn Terrier/Chihuahua Shorthair   Black/Tan
## 9        5 months     American Pit Bull Terrier Mix   Red/White
## 10         1 year                     Cairn Terrier       White

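Breed

Because there are too many distinct dog breeds, we first plotted how often each breed appears,
then kept the 30 most frequent breeds and grouped the rest under Others (“Other Breed” in the output) for easier analysis.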
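A minimal sketch of this top-N grouping is shown below. The helper `group_top_breeds` is hypothetical, and its simplification rule (strip the " Mix" suffix, keep the first breed of a cross) is an assumption; the report does not show how it matched breed names, and its sample output suggests a somewhat different rule.

```r
# Keep the n_top most frequent breeds and relabel the rest "Other Breed".
group_top_breeds <- function(breed, n_top = 30) {
  # Assumed simplification: drop " Mix" and keep the first breed of a cross.
  simple <- sub(" Mix$", "", as.character(breed))
  simple <- sub("/.*$", "", simple)

  # Rank breeds by frequency and take the n_top most common names.
  counts <- sort(table(simple), decreasing = TRUE)
  top <- names(counts)[seq_len(min(n_top, length(counts)))]

  ifelse(simple %in% top, simple, "Other Breed")
}

breeds <- c("Pit Bull Mix", "Pit Bull Mix", "Pit Bull",
            "Cairn Terrier", "Cairn Terrier/Chihuahua Shorthair",
            "Shetland Sheepdog Mix")
grouped <- group_top_breeds(breeds, n_top = 2)
print(table(grouped))
```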

##               DateTime OutcomeType AnimalType SexuponOutcome
## 1  2014-02-12 18:22:00      Return        Dog  Neutered Male
## 3  2015-01-31 12:28:00    Adoption        Dog  Neutered Male
## 5  2013-11-15 12:52:00    Transfer        Dog  Neutered Male
## 6  2014-04-25 13:04:00    Transfer        Dog  Intact Female
## 9  2014-02-04 17:17:00    Adoption        Dog  Spayed Female
## 10 2014-05-03 07:48:00    Adoption        Dog  Spayed Female
##    AgeuponOutcome       Color            breed
## 1          1 year Brown/White      Other Breed
## 3         2 years  Blue/White         Pit Bull
## 5         2 years         Tan Miniature Poodle
## 6         1 month   Black/Tan    Cairn Terrier
## 9        5 months   Red/White      Other Breed
## 10         1 year       White    Cairn Terrier

Age

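Dog ages are spread very widely, from newborn up to 19 years, with some recorded in weeks or months, so for easier analysis we group them roughly into:
1. Puppies (under 1 year) (Puppy)
2. Adult dogs (1-7 years) (AdultDog)
3. Old dogs (7 years and older) (OldDog)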
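The bucketing can be sketched as below. The helper `age_group` is hypothetical, and two points are assumptions: the conversion factors to years, and that a dog of exactly 7 years counts as OldDog (the ranges as stated overlap at 7). The sketch follows the rule as written, so it labels a 1-year-old dog AdultDog, whereas the sample rows that follow show "1 year" as OldDog.

```r
# Convert strings such as "3 weeks" or "2 years" to an age group.
age_group <- function(age_text) {
  parts <- strsplit(as.character(age_text), " ", fixed = TRUE)
  value <- as.numeric(vapply(parts, function(p) p[1], character(1)))
  unit  <- sub("s$", "", vapply(parts, function(p) p[2], character(1)))

  # Assumed number of each unit per year.
  per_year <- c(day = 365, week = 52, month = 12, year = 1)
  years <- value / per_year[unit]

  # Left-closed intervals: [0, 1) Puppy, [1, 7) AdultDog, [7, Inf) OldDog.
  cut(years, breaks = c(-Inf, 1, 7, Inf), right = FALSE,
      labels = c("Puppy", "AdultDog", "OldDog"))
}

ages <- c("1 year", "2 years", "3 weeks", "1 month", "5 months", "10 years", "7 years")
age_labels <- as.character(age_group(ages))
print(age_labels)
```

Empty or missing age strings come out as `NA` rather than being forced into a group.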

##               DateTime OutcomeType AnimalType SexuponOutcome       Color
## 1  2014-02-12 18:22:00      Return        Dog  Neutered Male Brown/White
## 3  2015-01-31 12:28:00    Adoption        Dog  Neutered Male  Blue/White
## 5  2013-11-15 12:52:00    Transfer        Dog  Neutered Male         Tan
## 6  2014-04-25 13:04:00    Transfer        Dog  Intact Female   Black/Tan
## 9  2014-02-04 17:17:00    Adoption        Dog  Spayed Female   Red/White
## 10 2014-05-03 07:48:00    Adoption        Dog  Spayed Female       White
##               breed      Age
## 1       Other Breed   OldDog
## 3          Pit Bull AdultDog
## 5  Miniature Poodle AdultDog
## 6     Cairn Terrier    Puppy
## 9       Other Breed    Puppy
## 10    Cairn Terrier   OldDog

Color

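  • Because there are too many distinct colors, some of which also describe features such as spots or birthmarks,
    we first group them roughly into 6 classes: solid, two-color, three-color, spotted, patched, and birthmarked.
  • Next, we split colors further into dark and light to see whether that helps the analysis.
    We again plotted the counts first and found that colors ranked below 20th appear too rarely,
    so from the 20 most frequent colors we picked out the plain “colors”,
    divided them into dark and light, and used that split to subdivide the solid class.
    The final classes are:
  1. Dark (Heavy)
  2. Light (Light)
  3. Other solid colors (Others Simple)
  4. Two-color (Double)
  5. Three-color (Tricolor)
  6. Spotted (Brindle)
  7. Patched (Merle)
  8. Birthmarked (Tick)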
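One way to express this classification is sketched below. The helper `classify_color`, the `heavy` and `light` lookup lists, and the precedence (pattern keywords win over the "/" two-color test) are all assumptions; the report does not list which colors it treated as dark or light.

```r
# Map a raw Color string to one of the eight classes.
classify_color <- function(color,
                           heavy = c("Black", "Brown", "Chocolate", "Red", "Blue"),
                           light = c("White", "Tan", "Cream", "Yellow", "Buff")) {
  color <- as.character(color)

  # Start from the solid colors: dark, light, or neither.
  out <- rep("Others Simple", length(color))
  out[color %in% heavy] <- "Heavy"
  out[color %in% light] <- "Light"

  # A "/" separates two colors.
  out[grepl("/", color, fixed = TRUE)] <- "Double"

  # Pattern keywords override the classes above (assumed precedence).
  for (pattern in c("Tricolor", "Brindle", "Merle", "Tick")) {
    out[grepl(pattern, color, fixed = TRUE)] <- pattern
  }
  out
}

colors <- c("Brown/White", "Tan", "White", "Black",
            "Blue Merle", "Brown Brindle/White", "Tricolor", "Gray")
color_classes <- classify_color(colors)
print(color_classes)
```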
##               DateTime OutcomeType AnimalType SexuponOutcome
## 1  2014-02-12 18:22:00      Return        Dog  Neutered Male
## 3  2015-01-31 12:28:00    Adoption        Dog  Neutered Male
## 5  2013-11-15 12:52:00    Transfer        Dog  Neutered Male
## 6  2014-04-25 13:04:00    Transfer        Dog  Intact Female
## 9  2014-02-04 17:17:00    Adoption        Dog  Spayed Female
## 10 2014-05-03 07:48:00    Adoption        Dog  Spayed Female
##               breed      Age ColorFix
## 1       Other Breed   OldDog   Double
## 3          Pit Bull AdultDog   Double
## 5  Miniature Poodle AdultDog    Light
## 6     Cairn Terrier    Puppy   Double
## 9       Other Breed    Puppy   Double
## 10    Cairn Terrier   OldDog    Light

Time

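Because the date range is very wide, and neither longer units (years) nor shorter units (weeks, days, etc.)
are likely to reveal much about changes among the shelter animals,
we reduce each date to one of the 12 months and analyze by month.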
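A minimal sketch of the month extraction, assuming the DateTime strings use the `YYYY-MM-DD HH:MM:SS` layout seen in the data preview. The helper `to_month` is hypothetical. It indexes into the built-in `month.name` so the English month names do not depend on the system locale.

```r
# Convert a date-time string to an English month name.
to_month <- function(datetime_text) {
  parsed <- as.POSIXct(as.character(datetime_text),
                       format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  month_number <- as.integer(format(parsed, "%m"))
  factor(month.name[month_number], levels = month.name)
}

stamps <- c("2014-02-12 18:22:00", "2015-01-31 12:28:00", "2013-11-15 12:52:00")
month_labels <- as.character(to_month(stamps))
print(month_labels)
```

Keeping all 12 levels in calendar order means later tables and plots list the months chronologically instead of alphabetically.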

##    DateTime OutcomeType AnimalType SexuponOutcome            breed
## 1  February      Return        Dog  Neutered Male      Other Breed
## 3   January    Adoption        Dog  Neutered Male         Pit Bull
## 5  November    Transfer        Dog  Neutered Male Miniature Poodle
## 6     April    Transfer        Dog  Intact Female    Cairn Terrier
## 9  February    Adoption        Dog  Spayed Female      Other Breed
## 10      May    Adoption        Dog  Spayed Female    Cairn Terrier
##         Age ColorFix
## 1    OldDog   Double
## 3  AdultDog   Double
## 5  AdultDog    Light
## 6     Puppy   Double
## 9     Puppy   Double
## 10   OldDog    Light

\[Data Analysis\]

SVM

Case 1 :

Randomly select 10,000 records as the Training Data.
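The workflow for this case can be sketched as below, using `svm()` from the e1071 package loaded earlier. The toy data frame, its size, the seed, and the default kernel settings are assumptions for illustration; the report does not state which SVM parameters it used.

```r
library(e1071)

set.seed(1)  # make the random split reproducible

# Hypothetical toy data standing in for the processed dog table.
n <- 300
toy <- data.frame(
  OutcomeType = factor(sample(c("Adoption", "Euthanasia", "Return", "Transfer"),
                              n, replace = TRUE, prob = c(0.45, 0.05, 0.25, 0.25))),
  Age = factor(sample(c("Puppy", "AdultDog", "OldDog"), n, replace = TRUE)),
  ColorFix = factor(sample(c("Heavy", "Light", "Double"), n, replace = TRUE))
)

# Randomly pick the training rows (10000 in the report, 200 here).
train_idx <- sample(nrow(toy), 200)
train_set <- toy[train_idx, ]
test_set  <- toy[-train_idx, ]

# Fit the SVM and time it.
start <- Sys.time()
fit <- svm(OutcomeType ~ ., data = train_set)
print(Sys.time() - start)

# Print the confusion matrix and return the share of misclassified rows.
svm_error_rate <- function(model, data) {
  pred <- predict(model, newdata = data)
  cm <- table(Prediction = pred, Reference = data$OutcomeType)
  print(cm)
  1 - sum(diag(cm)) / sum(cm)
}
train_err <- svm_error_rate(fit, train_set)
test_err  <- svm_error_rate(fit, test_set)
```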

## Time difference of 27.35156 secs
## The confusion Matrix of training data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       4095        257   2320     1385
##   Euthanasia        0          0      0        0
##   Return           15        145    242      210
##   Transfer         62        149    240      880
## The confusion Matrix of testing data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       2279        136   1210      823
##   Euthanasia        0          0      0        0
##   Return           14         64    141      136
##   Transfer         32         93    133      483
## Training Error Rate = 0.4783
## Testing Error Rate = 0.4763709
## The number of Adoption in the training data is  4172
## The number of Return in the training data is 2802
## The number of Euthanasia in the training data is 551
## The number of Transfer in the training data is 2475

Case 2 :

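Randomly select 10,000 records as the Training Data,
then make the Training Data more balanced.
That is, add Euthanasia records to the Training Data
until their count is only 100 to 200 below that of the smallest other OutcomeType, without exceeding it.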
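One way to do this balancing is to duplicate randomly chosen Euthanasia rows, sketched below. The helper `balance_minority`, the resampling with replacement, and the gap of 150 (a value inside the stated 100 to 200 range) are assumptions; the report does not say how the extra records were produced.

```r
# Oversample one class until it sits `gap` rows below the smallest other class.
balance_minority <- function(data, label = "OutcomeType",
                             minority = "Euthanasia", gap = 150) {
  counts <- table(data[[label]])
  target <- min(counts[names(counts) != minority]) - gap
  n_extra <- target - counts[[minority]]
  minority_rows <- which(data[[label]] == minority)
  if (n_extra <= 0 || length(minority_rows) == 0) return(data)

  # Draw extra minority rows with replacement and append them.
  extra <- minority_rows[sample.int(length(minority_rows), n_extra, replace = TRUE)]
  data[c(seq_len(nrow(data)), extra), , drop = FALSE]
}

# Hypothetical toy training set with a rare Euthanasia class.
demo <- data.frame(
  OutcomeType = factor(rep(c("Adoption", "Return", "Transfer", "Euthanasia"),
                           times = c(400, 300, 280, 20)))
)
set.seed(1)
balanced <- balance_minority(demo)
print(table(balanced$OutcomeType))
```

Only the training set is resampled; the test set keeps its original class proportions so the testing error still reflects the real data.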

## Time difference of 37.08312 secs
## The confusion Matrix of training data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       4122       1109   2303     1405
##   Euthanasia       43        982    367      423
##   Return            0          2      6       24
##   Transfer         39        200    102      630
## The confusion Matrix of testing data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       2252        139   1227      803
##   Euthanasia       19        136    224      281
##   Return            1          4      3        9
##   Transfer         21         29     54      342
## Training Error Rate = 0.5117802
## Testing Error Rate = 0.5070346
## The number of Adoption in the training data is  4204
## The number of Euthanasia in the training data is 2293
## The number of Return in the training data is 2778
## The number of Transfer in the training data is 2482

Random Forest

Case 1 :

Randomly select 10,000 records as the Training Data.
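A minimal sketch of this case with `randomForest()` from the randomForest package loaded earlier. The toy data, the seed, and `ntree = 200` are assumptions; the report does not state its forest settings.

```r
library(randomForest)

set.seed(1)  # reproducible sampling and forest

# Hypothetical toy data standing in for the processed dog table.
n <- 300
toy <- data.frame(
  OutcomeType = factor(sample(c("Adoption", "Euthanasia", "Return", "Transfer"),
                              n, replace = TRUE, prob = c(0.45, 0.05, 0.25, 0.25))),
  Age = factor(sample(c("Puppy", "AdultDog", "OldDog"), n, replace = TRUE)),
  ColorFix = factor(sample(c("Heavy", "Light", "Double"), n, replace = TRUE))
)

# Random training subset (10000 rows in the report, 200 here).
train_idx <- sample(nrow(toy), 200)
train_set <- toy[train_idx, ]
test_set  <- toy[-train_idx, ]

# Fit the forest and time it; predictors must be factors or numerics.
start <- Sys.time()
rf_fit <- randomForest(OutcomeType ~ ., data = train_set, ntree = 200)
print(Sys.time() - start)

# Confusion matrix on the held-out rows.
rf_pred <- predict(rf_fit, newdata = test_set)
rf_cm <- table(Prediction = rf_pred, Reference = test_set$OutcomeType)
print(rf_cm)
```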

## Time difference of 1.302124 mins

## The confusion Matrix of training data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       3788        161   1241     1118
##   Euthanasia        0        135     10       10
##   Return          337        156   1369      295
##   Transfer         61        112    145     1062
## The confusion Matrix of testing data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       1944         86    870      612
##   Euthanasia        0         15     28       18
##   Return          334         83    479      264
##   Transfer         33         96    144      538
## Training Error Rate = 0.3646
## Testing Error Rate = 0.4632035
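An error rate is the share of counts that fall off the diagonal of the confusion matrix, so it always lies between 0 and 1. A minimal sketch, re-entering the Random Forest Case 1 training matrix shown above by hand:

```r
outcomes <- c("Adoption", "Euthanasia", "Return", "Transfer")

# Rows are predictions, columns are the reference labels.
cm <- matrix(c(3788, 161, 1241, 1118,
                  0, 135,   10,   10,
                337, 156, 1369,  295,
                 61, 112,  145, 1062),
             nrow = 4, byrow = TRUE,
             dimnames = list(Prediction = outcomes, Reference = outcomes))

# Correct predictions sit on the diagonal.
rf_train_error <- 1 - sum(diag(cm)) / sum(cm)
print(rf_train_error)

# Share of each true class that was predicted correctly.
per_class_recall <- diag(cm) / colSums(cm)
print(round(per_class_recall, 3))
```

Dividing by `sum(cm)` instead of a fixed sample size keeps the rate correct when the training set has been enlarged by resampling.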

Case 2 :

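Randomly select 10,000 records as the Training Data,
then make the Training Data more balanced.
That is, add Euthanasia records to the Training Data
until their count is only 100 to 200 below that of the smallest other OutcomeType, without exceeding it.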

## Time difference of 1.285407 mins

## The confusion Matrix of training data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       3804        317   1326     1074
##   Euthanasia      143       1862    371      284
##   Return          199         64    936      203
##   Transfer         50         93    119      955
## The confusion Matrix of testing data.
##             Reference
## Prediction   Adoption Euthanasia Return Transfer
##   Adoption       1936         93    941      651
##   Euthanasia      113        117    259      191
##   Return          222         35    217      108
##   Transfer         30         63    117      451
## Training Error Rate = 0.3595763
## Testing Error Rate = 0.5091991
## The number of Adoption in the training data is  4196
## The number of Return in the training data is 2752
## The number of Euthanasia in the training data is 2336
## The number of Transfer in the training data is 2516

\[Conclusion\]

\[Reference\]

https://www.kaggle.com/apapiu/visualizing-breeds-and-ages-by-outcome/code/notebook
https://www.kaggle.com/mrisdal/quick-dirty-randomforest/code
https://www.kaggle.com/fsmithus/reduced-model
https://cran.r-project.org/web/packages/randomForest/randomForest.pdf
https://cran.r-project.org/web/packages/xgboost/xgboost.pdf