Our data set contains 26,729 records and 10 variables.
It covers 15,595 dogs and 11,134 cats.
OutcomeType is our response variable, and OutcomeSubtype is its subcategory.
\[ \text{SVM: Training Log Loss} = 1.360251, \quad \text{Testing Log Loss} = 1.471125 \]
\[ \text{Random Forest: Training Log Loss} = 0.7314072, \quad \text{Testing Log Loss} = 1.908389 \]
\[ \text{XGBoost: Training Log Loss} = 0.9740993, \quad \text{Testing Log Loss} = 1.037753 \]
library(dplyr)
library(ggplot2)
library(e1071)
library(randomForest)
library(caret)
library(data.table)
library(xgboost)
library(Matrix)
library(formattable)
trainInit<-read.csv("train.csv")
head(trainInit)
## AnimalID Name DateTime OutcomeType OutcomeSubtype
## 1 A671945 Hambone 2014-02-12 18:22:00 Return_to_owner
## 2 A656520 Emily 2013-10-13 12:44:00 Euthanasia Suffering
## 3 A686464 Pearce 2015-01-31 12:28:00 Adoption Foster
## 4 A683430 2014-07-11 19:09:00 Transfer Partner
## 5 A667013 2013-11-15 12:52:00 Transfer Partner
## 6 A677334 Elsa 2014-04-25 13:04:00 Transfer Partner
## AnimalType SexuponOutcome AgeuponOutcome
## 1 Dog Neutered Male 1 year
## 2 Cat Spayed Female 1 year
## 3 Dog Neutered Male 2 years
## 4 Cat Intact Male 3 weeks
## 5 Dog Neutered Male 2 years
## 6 Dog Intact Female 1 month
## Breed Color
## 1 Shetland Sheepdog Mix Brown/White
## 2 Domestic Shorthair Mix Cream Tabby
## 3 Pit Bull Mix Blue/White
## 4 Domestic Shorthair Mix Blue Cream
## 5 Lhasa Apso/Miniature Poodle Tan
## 6 Cairn Terrier/Chihuahua Shorthair Black/Tan
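# Drop AnimalID, Name and OutcomeSubtype (columns 1, 2 and 5)
train <- trainInit[, -c(1, 2, 5)]
attach(train)
# Split by species; -which() would select zero rows if no dog were present
Dogtrain <- train[which(AnimalType == "Dog"), ]
Cattrain <- train[which(AnimalType != "Dog"), ]
# AnimalType (column 3) is constant within Dogtrain, so drop it
Dogtrain <- Dogtrain[, -3]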
head(Dogtrain)
## DateTime OutcomeType SexuponOutcome AgeuponOutcome
## 1 2014-02-12 18:22:00 Return_to_owner Neutered Male 1 year
## 3 2015-01-31 12:28:00 Adoption Neutered Male 2 years
## 5 2013-11-15 12:52:00 Transfer Neutered Male 2 years
## 6 2014-04-25 13:04:00 Transfer Intact Female 1 month
## 9 2014-02-04 17:17:00 Adoption Spayed Female 5 months
## 10 2014-05-03 07:48:00 Adoption Spayed Female 1 year
## Breed Color
## 1 Shetland Sheepdog Mix Brown/White
## 3 Pit Bull Mix Blue/White
## 5 Lhasa Apso/Miniature Poodle Tan
## 6 Cairn Terrier/Chihuahua Shorthair Black/Tan
## 9 American Pit Bull Terrier Mix Red/White
## 10 Cairn Terrier White
attach(Dogtrain)
| OutcomeType | Frequency |
|---|---|
| Adoption | 6,497 |
| Return_to_owner | 4,286 |
| Transfer | 3,917 |
| Euthanasia | 845 |
| Died | 50 |
## The following objects are masked from Dogtrain (pos = 3):
##
## AgeuponOutcome, Breed, Color, DateTime, OutcomeType,
## SexuponOutcome
## The following objects are masked from train:
##
## AgeuponOutcome, Breed, Color, DateTime, OutcomeType,
## SexuponOutcome
## DateTime OutcomeType SexuponOutcome AgeuponOutcome
## 1 2014-02-12 18:22:00 Return Neutered Male 1 year
## 3 2015-01-31 12:28:00 Adoption Neutered Male 2 years
## 5 2013-11-15 12:52:00 Transfer Neutered Male 2 years
## 6 2014-04-25 13:04:00 Transfer Intact Female 1 month
## 9 2014-02-04 17:17:00 Adoption Spayed Female 5 months
## 10 2014-05-03 07:48:00 Adoption Spayed Female 1 year
## Breed Color
## 1 Shetland Sheepdog Mix Brown/White
## 3 Pit Bull Mix Blue/White
## 5 Lhasa Apso/Miniature Poodle Tan
## 6 Cairn Terrier/Chihuahua Shorthair Black/Tan
## 9 American Pit Bull Terrier Mix Red/White
## 10 Cairn Terrier White
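Because the dog breed variable has too many levels, we first plot how often each breed occurs,
keep the 30 most frequent breeds, and group the rest into Others (shown as Other Breed in the output) to simplify the analysis.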
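The grouping step can be sketched as follows. This is a minimal illustration, not the code used for the report: `lump_top_n` is a hypothetical helper, the toy vector stands in for the Breed column, and stripping a trailing " Mix" is an assumed cleaning rule. The report's own rule for crosses such as "Lhasa Apso/Miniature Poodle" is not shown, so it is not reproduced here.

```r
# Hypothetical helper: keep the top_n most frequent values, lump the rest.
lump_top_n <- function(x, top_n = 30, other = "Other Breed") {
  x <- as.character(x)
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(top_n, length(counts)))]
  factor(ifelse(x %in% keep, x, other))
}

# Toy stand-in for the Breed column (assumed values).
breeds <- c("Pit Bull Mix", "Pit Bull Mix", "Cairn Terrier",
            "Cairn Terrier", "Shetland Sheepdog Mix")
clean  <- sub(" Mix$", "", breeds)        # assumed cleaning rule
lumped <- lump_top_n(clean, top_n = 2)    # top 2 only, for the toy data
print(table(lumped))
```

With the real data, `top_n = 30` would keep the 30 most frequent breeds.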
## DateTime OutcomeType SexuponOutcome AgeuponOutcome
## 1 2014-02-12 18:22:00 Return Neutered Male 1 year
## 3 2015-01-31 12:28:00 Adoption Neutered Male 2 years
## 5 2013-11-15 12:52:00 Transfer Neutered Male 2 years
## 6 2014-04-25 13:04:00 Transfer Intact Female 1 month
## 9 2014-02-04 17:17:00 Adoption Spayed Female 5 months
## 10 2014-05-03 07:48:00 Adoption Spayed Female 1 year
## Color breed
## 1 Brown/White Other Breed
## 3 Blue/White Pit Bull
## 5 Tan Miniature Poodle
## 6 Black/Tan Cairn Terrier
## 9 Red/White Other Breed
## 10 White Cairn Terrier
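Dog ages are spread very widely, from newborn up to 19 years, with some recorded in weeks or months, so we group them roughly into
1. Puppy (under 1 year) (Puppy)
2. Adult dog (1-7 years) (AdultDog)
3. Old dog (7 years and older) (OldDog)
to simplify the analysis.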
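One way to implement this grouping is to convert each age string to years and then bin it. The sketch below is an assumption-laden illustration: `age_to_years` and `age_group` are hypothetical helpers, the conversion factors (52 weeks, 12 months, 365 days per year) are approximations, and a dog of exactly 7 years is placed in OldDog. In the sample output that follows, rows with "1 year" appear as OldDog, which does not match the stated rule; parsing to a number before binning is one way to avoid that kind of mismatch.

```r
# Hypothetical helper: convert strings such as "3 weeks" to years.
age_to_years <- function(age) {
  per_year <- c(day = 365, week = 52, month = 12, year = 1)
  parts <- strsplit(as.character(age), " ", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 2) return(NA_real_)   # empty or malformed entry
    unit <- sub("s$", "", p[2])            # "years" -> "year"
    as.numeric(p[1]) / unname(per_year[unit])
  }, numeric(1))
}

# Bin into [0, 1), [1, 7), [7, Inf); the boundary at 7 is an assumption.
age_group <- function(years) {
  cut(years, breaks = c(-Inf, 1, 7, Inf),
      labels = c("Puppy", "AdultDog", "OldDog"), right = FALSE)
}

ages <- c("1 year", "2 years", "1 month", "5 months", "10 years", "3 weeks", "")
grp  <- age_group(age_to_years(ages))
print(data.frame(ages, grp))
```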
## DateTime OutcomeType SexuponOutcome Color
## 1 2014-02-12 18:22:00 Return Neutered Male Brown/White
## 3 2015-01-31 12:28:00 Adoption Neutered Male Blue/White
## 5 2013-11-15 12:52:00 Transfer Neutered Male Tan
## 6 2014-04-25 13:04:00 Transfer Intact Female Black/Tan
## 9 2014-02-04 17:17:00 Adoption Spayed Female Red/White
## 10 2014-05-03 07:48:00 Adoption Spayed Female White
## breed Age
## 1 Other Breed OldDog
## 3 Pit Bull AdultDog
## 5 Miniature Poodle AdultDog
## 6 Cairn Terrier Puppy
## 9 Other Breed Puppy
## 10 Cairn Terrier OldDog
## DateTime OutcomeType SexuponOutcome breed
## 1 2014-02-12 18:22:00 Return Neutered Male Other Breed
## 3 2015-01-31 12:28:00 Adoption Neutered Male Pit Bull
## 5 2013-11-15 12:52:00 Transfer Neutered Male Miniature Poodle
## 6 2014-04-25 13:04:00 Transfer Intact Female Cairn Terrier
## 9 2014-02-04 17:17:00 Adoption Spayed Female Other Breed
## 10 2014-05-03 07:48:00 Adoption Spayed Female Cairn Terrier
## Age ColorFix
## 1 OldDog Double
## 3 AdultDog Double
## 5 AdultDog Light
## 6 Puppy Double
## 9 Puppy Double
## 10 OldDog Light
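For coat color, we end up with the following groups (ColorFix):
1. Dark (Heavy)
2. Light (Light)
3. Other solid colors (Others Simple)
4. Two colors (Double)
5. Three colors (Tricolor)
6. Spotted (Brindle)
7. Patched (Merle)
8. Marked (Tick)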
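The mapping from raw Color strings to these groups is not shown in the report. The sketch below is one possible rule set, under stated assumptions: `color_group` is a hypothetical helper, the `dark` and `light` word lists are invented for illustration, and the precedence (pattern keywords override the two-color rule) is a guess.

```r
# Hypothetical helper: map a raw Color string to one of the eight groups.
color_group <- function(color) {
  color <- as.character(color)
  dark  <- c("Black", "Brown", "Chocolate", "Blue")  # assumed word list
  light <- c("White", "Tan", "Cream", "Yellow")      # assumed word list
  out <- rep("Others Simple", length(color))
  out[color %in% dark]  <- "Heavy"
  out[color %in% light] <- "Light"
  out[grepl("/", color, fixed = TRUE)]        <- "Double"
  out[grepl("Tricolor", color, fixed = TRUE)] <- "Tricolor"
  out[grepl("Brindle", color, fixed = TRUE)]  <- "Brindle"
  out[grepl("Merle", color, fixed = TRUE)]    <- "Merle"
  out[grepl("Tick", color, fixed = TRUE)]     <- "Tick"
  out
}

colors <- c("Brown/White", "Tan", "White", "Black", "Red",
            "Blue Merle", "Brown Brindle", "Tricolor", "Blue Tick")
groups <- color_group(colors)
print(data.frame(colors, groups))
```

The first three toy inputs match the ColorFix values in the output above (Brown/White is Double, Tan and White are Light).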
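The date range is wide, and neither long spans (years) nor short spans (weeks, days, and so on)
are likely to tell us much about how shelter outcomes change,
so we reduce each date to one of the 12 months for the analysis.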
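A minimal sketch of the month extraction, assuming DateTime is stored as text in the "YYYY-MM-DD HH:MM:SS" form shown in the data head. `to_month` is a hypothetical helper; it uses the built-in `month.name` constant so that the result does not depend on the system locale.

```r
# Hypothetical helper: reduce a timestamp string to its English month name.
to_month <- function(datetime) {
  dt <- as.POSIXct(as.character(datetime),
                   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  factor(month.name[as.integer(format(dt, "%m"))], levels = month.name)
}

stamps <- c("2014-02-12 18:22:00", "2015-01-31 12:28:00", "2013-11-15 12:52:00")
months_out <- to_month(stamps)
print(months_out)
```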
## DateTime OutcomeType SexuponOutcome breed Age ColorFix
## 1 February Return Neutered Male Other Breed OldDog Double
## 3 January Adoption Neutered Male Pit Bull AdultDog Double
## 5 November Transfer Neutered Male Miniature Poodle AdultDog Light
## 6 April Transfer Intact Female Cairn Terrier Puppy Double
## 9 February Adoption Spayed Female Other Breed Puppy Double
## 10 May Adoption Spayed Female Cairn Terrier OldDog Light
Randomly select 10,000 records as the training data for the SVM; the remaining records serve as the testing data.
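The random split can be sketched like this. `split_train_test` is a hypothetical helper, the seed is an arbitrary choice added for reproducibility, and the toy data frame stands in for the dog data.

```r
# Hypothetical helper: draw n_train rows at random, keep the rest for testing.
split_train_test <- function(data, n_train, seed = 1) {
  set.seed(seed)                                  # assumed seed
  idx <- sample.int(nrow(data), n_train)
  list(train = data[idx, , drop = FALSE],
       test  = data[setdiff(seq_len(nrow(data)), idx), , drop = FALSE])
}

toy   <- data.frame(id = 1:20, y = rep(c("a", "b"), 10))
parts <- split_train_test(toy, n_train = 15)
print(c(train = nrow(parts$train), test = nrow(parts$test)))
```

For the report's data, `n_train = 10000` would correspond to the split described here.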
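# RBF-kernel SVM (the e1071 default); probability = TRUE also fits class-probability estimates
model <- svm(OutcomeType ~ ., data = SVMtrain, cost = 100,
             gamma = 1, probability = TRUE)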
## Time difference of 1.923469 mins
## The confusion Matrix of training data.
##
## pred Adoption Euthanasia Return Transfer
## Adoption 3800 91 838 639
## Euthanasia 0 7 0 0
## Return 306 214 1710 261
## Transfer 95 218 191 1630
## The confusion Matrix of testing data.
##
## pred Adoption Euthanasia Return Transfer
## Adoption 1690 95 825 573
## Euthanasia 2 0 0 0
## Return 416 114 436 230
## Transfer 188 105 286 584
## Training Error Rate = 1.360251
## Testing Error Rate = 1.471125
## The number of Adoption in the training data is 4201
## The number of Return in the training data is 2739
## The number of Euthanasia in the training data is 530
## The number of Transfer in the training data is 2530
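The values printed as "Error Rate" above are larger than 1, so they cannot be misclassification rates; they appear to be multi-class log loss computed from the predicted class probabilities. A minimal sketch of that metric follows. `mlogloss` is a hypothetical helper, and the clipping constant `eps` is an assumption added to avoid `log(0)`.

```r
# Hypothetical helper: multi-class log loss.
# prob:  n x K matrix of predicted class probabilities
# label: integer vector with the true class index (1..K) for each row
mlogloss <- function(prob, label, eps = 1e-15) {
  p <- prob[cbind(seq_len(nrow(prob)), label)]   # probability of the true class
  p <- pmin(pmax(p, eps), 1 - eps)               # clip to avoid log(0)
  -mean(log(p))
}

# A uniform prediction over four classes scores log(4), about 1.386.
uniform <- matrix(0.25, nrow = 3, ncol = 4)
baseline <- mlogloss(uniform, c(1, 2, 4))
print(baseline)
```

For reference, a model that assigns probability 0.25 to each of four classes scores log(4), roughly 1.386, so log loss values close to or above that level indicate probability estimates no better than a uniform guess.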
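Randomly select 10,000 records as the training data for the SVM,
then make the training data more balanced.
That is, we add Euthanasia records to the training data
until their count is within 100 to 200 of the smallest other OutcomeType, without exceeding it.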
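The balancing step can be sketched as random oversampling of the minority class. This is an illustration under assumptions: `oversample_class` is a hypothetical helper, resampling existing rows with replacement is an assumed way of adding records, and the `gap` of 150 is simply a value inside the stated 100 to 200 range. The test data is left untouched.

```r
# Hypothetical helper: duplicate minority-class rows (sampling with
# replacement) until the class is `gap` rows below the smallest other class.
oversample_class <- function(data, y_col, minority, gap = 150, seed = 1) {
  set.seed(seed)                                       # assumed seed
  counts <- table(data[[y_col]])
  others <- counts[names(counts) != minority & counts > 0]
  target <- min(others) - gap
  pool   <- which(data[[y_col]] == minority)
  if (length(pool) == 0 || target <= length(pool)) return(data)
  extra <- pool[sample.int(length(pool), target - length(pool), replace = TRUE)]
  rbind(data, data[extra, , drop = FALSE])
}

toy <- data.frame(y = rep(c("a", "b", "e"), c(500, 400, 20)), x = seq_len(920))
bal <- oversample_class(toy, "y", minority = "e")
print(table(bal$y))
```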
# Same SVM as before, but with gamma = 5 and the balanced training data
model <- svm(OutcomeType ~ ., data = SVMtrain, cost = 100,
             gamma = 5, probability = TRUE)
## Time difference of 2.777853 mins
## The confusion Matrix of training data.
##
## trainEr Adoption Euthanasia Return Transfer
## Adoption 3871 124 1093 666
## Euthanasia 179 2094 243 179
## Return 76 72 1190 116
## Transfer 75 52 213 1569
## The confusion Matrix of testing data.
##
## testEr Adoption Euthanasia Return Transfer
## Adoption 1786 161 1020 837
## Euthanasia 125 61 163 97
## Return 221 47 194 126
## Transfer 164 45 170 327
## Training Error Rate = 1.730448
## Testing Error Rate = 1.65136
## The number of Adoption in the training data is 4201
## The number of Euthanasia in the training data is 2342
## The number of Return in the training data is 2739
## The number of Transfer in the training data is 2530
Randomly select 10,000 records as the training data for the random forest.
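# Random forest with 600 trees, trying 4 candidate predictors at each split
model <- randomForest(OutcomeType ~ ., data = RFtrain,
                      ntree = 600,
                      mtry = 4,
                      importance = TRUE)  # also compute variable importance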
## Time difference of 6.582578 mins
## The confusion Matrix of training data.
## Reference
## Prediction Adoption Euthanasia Return Transfer
## Adoption 3782 90 836 617
## Euthanasia 4 275 52 16
## Return 284 77 1596 196
## Transfer 131 88 255 1701
## The confusion Matrix of testing data.
## Reference
## Prediction Adoption Euthanasia Return Transfer
## Adoption 1596 89 733 508
## Euthanasia 14 30 51 53
## Return 465 101 463 260
## Transfer 221 94 300 566
## Training Error Rate = 0.7314072
## Testing Error Rate = 1.908389
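For comparison with the log loss figures, the share of misclassified test records can be read directly off a confusion matrix. The sketch below re-enters by hand the testing matrix printed above; the variable names are illustrative.

```r
# Testing confusion matrix of the random forest, copied from the output above
# (rows = predictions, columns = reference labels).
classes <- c("Adoption", "Euthanasia", "Return", "Transfer")
cm <- matrix(c(1596,  89, 733, 508,
                 14,  30,  51,  53,
                465, 101, 463, 260,
                221,  94, 300, 566),
             nrow = 4, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

accuracy   <- sum(diag(cm)) / sum(cm)   # correct predictions lie on the diagonal
error_rate <- 1 - accuracy
print(round(c(accuracy = accuracy, error_rate = error_rate), 4))
```

The matrix contains 2,655 correct predictions out of 5,544 records, so the test misclassification rate is about 0.52.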
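Randomly select 10,000 records as the training data for the random forest,
then make the training data more balanced.
That is, we add Euthanasia records to the training data
until their count is within 100 to 200 of the smallest other OutcomeType, without exceeding it.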
# Same random forest settings, fitted to the balanced training data
model <- randomForest(OutcomeType ~ .,
                      data = RFtrain,
                      mtry = 4,
                      ntree = 600,
                      importance = TRUE)
## Time difference of 6.824367 mins
## The confusion Matrix of training data.
## Reference
## Prediction Adoption Euthanasia Return Transfer
## Adoption 3450 122 667 508
## Euthanasia 198 2117 281 211
## Return 375 54 1710 266
## Transfer 129 25 142 1508
## The confusion Matrix of testing data.
## Reference
## Prediction Adoption Euthanasia Return Transfer
## Adoption 1526 62 598 465
## Euthanasia 141 82 221 161
## Return 504 84 467 275
## Transfer 174 61 200 523
## Training Error Rate = 0.8590911
## Testing Error Rate = 2.329743
## The number of Adoption in the training data is 4152
## The number of Return in the training data is 2800
## The number of Euthanasia in the training data is 2318
## The number of Transfer in the training data is 2493
First, we run xgboost on the full data without the preprocessing above, only with some predictor columns removed.
## DateTime OutcomeType AnimalType SexuponOutcome
## 1 2014-02-12 18:22:00 Return_to_owner Dog Neutered Male
## 3 2015-01-31 12:28:00 Adoption Dog Neutered Male
## 5 2013-11-15 12:52:00 Transfer Dog Neutered Male
## 6 2014-04-25 13:04:00 Transfer Dog Intact Female
## 9 2014-02-04 17:17:00 Adoption Dog Spayed Female
## 10 2014-05-03 07:48:00 Adoption Dog Spayed Female
## AgeuponOutcome Breed Color
## 1 1 year Shetland Sheepdog Mix Brown/White
## 3 2 years Pit Bull Mix Blue/White
## 5 2 years Lhasa Apso/Miniature Poodle Tan
## 6 1 month Cairn Terrier/Chihuahua Shorthair Black/Tan
## 9 5 months American Pit Bull Terrier Mix Red/White
## 10 1 year Cairn Terrier White
Set the parameters and run xgb.cv to find the best iteration.
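xgb_params <- list(
  objective = "multi:softprob",  # output one probability per class
  eta = 0.1,                     # learning rate
  max_depth = 6,                 # maximum tree depth
  colsample_bytree = 0.7,        # fraction of columns sampled per tree
  subsample = 0.7,               # fraction of rows sampled per boosting round
  num_class = 5)                 # five outcome classes, including Died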
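The `multi:softprob` objective expects class labels coded as integers from 0 to `num_class - 1` and a numeric feature matrix. The sketch below shows one way to prepare both in base R; it is an assumption about the preprocessing, not the report's code. The toy rows are taken from the data heads shown earlier, and `model.matrix` is used for one-hot encoding (`Matrix::sparse.model.matrix` would be a sparse alternative).

```r
# Outcome levels in alphabetical order, matching the probability columns
# shown in the output (Adoption, Died, Euthanasia, Return_to_owner, Transfer).
lvls <- c("Adoption", "Died", "Euthanasia", "Return_to_owner", "Transfer")
outcome <- factor(c("Return_to_owner", "Adoption", "Transfer",
                    "Transfer", "Adoption", "Adoption"), levels = lvls)
label <- as.integer(outcome) - 1L      # zero-based class codes: 0..4

# Toy predictors; one-hot encode the factors into a numeric matrix.
predictors <- data.frame(
  SexuponOutcome = factor(c("Neutered Male", "Neutered Male", "Neutered Male",
                            "Intact Female", "Spayed Female", "Spayed Female")),
  Age = factor(c("OldDog", "AdultDog", "AdultDog", "Puppy", "Puppy", "OldDog")))
X <- model.matrix(~ . - 1, data = predictors)

print(label)
print(dim(X))
```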
## [1] train-mlogloss:1.549853+0.003793 test-mlogloss:1.552224+0.003009
## Multiple eval metrics are present. Will use test_mlogloss for early stopping.
## Will train until test_mlogloss hasn't improved in 10 rounds.
##
## [11] train-mlogloss:1.225312+0.005037 test-mlogloss:1.247638+0.005736
## [21] train-mlogloss:1.092062+0.005594 test-mlogloss:1.131738+0.003287
## [31] train-mlogloss:1.020930+0.004125 test-mlogloss:1.075818+0.004526
## [41] train-mlogloss:0.978023+0.003041 test-mlogloss:1.048133+0.007007
## [51] train-mlogloss:0.948035+0.003411 test-mlogloss:1.033613+0.007910
## [61] train-mlogloss:0.924736+0.002959 test-mlogloss:1.025162+0.008137
## [71] train-mlogloss:0.905240+0.002458 test-mlogloss:1.020481+0.008152
## [81] train-mlogloss:0.887836+0.001817 test-mlogloss:1.017102+0.008694
## [91] train-mlogloss:0.872305+0.001962 test-mlogloss:1.015409+0.009068
## [101] train-mlogloss:0.857283+0.001713 test-mlogloss:1.014283+0.009263
## [111] train-mlogloss:0.844009+0.002195 test-mlogloss:1.013251+0.009401
## [121] train-mlogloss:0.831194+0.002237 test-mlogloss:1.012412+0.009654
## [131] train-mlogloss:0.818606+0.002341 test-mlogloss:1.012150+0.010097
## [141] train-mlogloss:0.806708+0.002933 test-mlogloss:1.012083+0.010252
## Stopping. Best iteration:
## [138] train-mlogloss:0.810012+0.002908 test-mlogloss:1.011778+0.009976
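The log above stops once the cross-validated test log loss has not improved for 10 rounds and then reports the best round. The rule can be sketched in plain R. `best_iteration` is a hypothetical helper and the loss vectors are made up; this illustrates the stopping rule only, not xgboost's internal implementation.

```r
# Hypothetical helper: index of the best round under early stopping.
# Training stops once `patience` rounds pass without a new minimum.
best_iteration <- function(loss, patience = 10) {
  best <- 1
  for (i in seq_along(loss)) {
    if (loss[i] < loss[best]) best <- i
    if (i - best >= patience) break
  }
  best
}

# Made-up validation losses: the minimum is at round 4.
loss <- c(1.50, 1.20, 1.10, 1.05, 1.06, 1.07, 1.08)
best <- best_iteration(loss, patience = 2)
print(best)
```

Because the loop stops as soon as the patience is exhausted, a later and lower loss is never seen; that is the trade-off this rule accepts.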
Build the model and train it for the 138 rounds selected above.
## [1] train-mlogloss:1.548160 test-mlogloss:1.552089
## [6] train-mlogloss:1.349085 test-mlogloss:1.370634
## [11] train-mlogloss:1.225749 test-mlogloss:1.260025
## [16] train-mlogloss:1.141708 test-mlogloss:1.187124
## [21] train-mlogloss:1.086716 test-mlogloss:1.141999
## [26] train-mlogloss:1.043181 test-mlogloss:1.108875
## [31] train-mlogloss:1.011296 test-mlogloss:1.086143
## [36] train-mlogloss:0.985441 test-mlogloss:1.069158
## [41] train-mlogloss:0.964426 test-mlogloss:1.057213
## [46] train-mlogloss:0.947094 test-mlogloss:1.047990
## [51] train-mlogloss:0.931634 test-mlogloss:1.041483
## [56] train-mlogloss:0.917805 test-mlogloss:1.037082
## [61] train-mlogloss:0.906150 test-mlogloss:1.033031
## [66] train-mlogloss:0.895792 test-mlogloss:1.030709
## [71] train-mlogloss:0.885934 test-mlogloss:1.028769
## [76] train-mlogloss:0.876595 test-mlogloss:1.026630
## [81] train-mlogloss:0.867169 test-mlogloss:1.025943
## [86] train-mlogloss:0.858633 test-mlogloss:1.025280
## [91] train-mlogloss:0.849955 test-mlogloss:1.024257
## [96] train-mlogloss:0.841065 test-mlogloss:1.024020
## [101] train-mlogloss:0.832991 test-mlogloss:1.022957
## [106] train-mlogloss:0.824987 test-mlogloss:1.022457
## [111] train-mlogloss:0.817332 test-mlogloss:1.022702
## [116] train-mlogloss:0.809637 test-mlogloss:1.022025
## [121] train-mlogloss:0.802716 test-mlogloss:1.022224
## [126] train-mlogloss:0.796048 test-mlogloss:1.022252
## [131] train-mlogloss:0.787958 test-mlogloss:1.022125
## [136] train-mlogloss:0.781307 test-mlogloss:1.023031
## [138] train-mlogloss:0.778695 test-mlogloss:1.023148
## Feature Gain Cover Frequency
## 1: DateTime 0.2525362 0.33468214 0.32568625
## 2: SexuponOutcome 0.2331548 0.09992673 0.06740927
## 3: AgeuponOutcome 0.2207229 0.21478670 0.16553950
## 4: Breed 0.1794696 0.23555338 0.24903859
## 5: Color 0.1141164 0.11505105 0.19232639
Compute the training log loss.
## Adoption Died Euthanasia Return_to_owner Transfer
## 1: 0.45878291 0.0017017922 0.053674176 0.36169776 0.12414335
## 2: 0.06730244 0.0028727080 0.061332077 0.29447994 0.57401288
## 3: 0.92318618 0.0008445218 0.003078837 0.02244440 0.05044603
## 4: 0.39830166 0.0022729915 0.026639974 0.39395362 0.17883176
## 5: 0.01725563 0.0022073565 0.157445490 0.19899887 0.62409264
## ---
## 9996: 0.43103918 0.0022689765 0.016445016 0.44577870 0.10446814
## 9997: 0.52740359 0.0035767918 0.037226059 0.30193165 0.12986192
## 9998: 0.35443890 0.0010068018 0.063138068 0.37588447 0.20553173
## 9999: 0.25305191 0.0008430329 0.020748764 0.58629858 0.13905773
## 10000: 0.02249851 0.0072929328 0.106823727 0.03305306 0.83033180
## class
## 1: 0
## 2: 4
## 3: 0
## 4: 0
## 5: 4
## ---
## 9996: 3
## 9997: 0
## 9998: 3
## 9999: 3
## 10000: 4
## The confusion Matrix of training data.
##
## y1 0 1 2 3 4
## 0 3856 0 1 242 69
## 1 8 8 0 6 16
## 2 103 0 183 137 123
## 3 973 0 11 1556 186
## 4 933 0 8 262 1319
## Training Error Rate = 0.7786949
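In the probability table above, the `class` column is consistent with taking the column with the largest probability in each row and numbering the classes from 0. A minimal sketch, using the first five rows printed above:

```r
# First five rows of the predicted probabilities, copied from the output above.
prob <- matrix(c(
  0.45878291, 0.0017017922, 0.053674176, 0.36169776, 0.12414335,
  0.06730244, 0.0028727080, 0.061332077, 0.29447994, 0.57401288,
  0.92318618, 0.0008445218, 0.003078837, 0.02244440, 0.05044603,
  0.39830166, 0.0022729915, 0.026639974, 0.39395362, 0.17883176,
  0.01725563, 0.0022073565, 0.157445490, 0.19899887, 0.62409264),
  nrow = 5, byrow = TRUE,
  dimnames = list(NULL, c("Adoption", "Died", "Euthanasia",
                          "Return_to_owner", "Transfer")))

# Column index of the row maximum, shifted to zero-based class codes.
pred_class <- max.col(prob, ties.method = "first") - 1L
print(pred_class)
```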
Compute the testing log loss.
## Adoption Died Euthanasia Return_to_owner Transfer class
## 1: 0.63511705 0.0005314900 0.015772911 0.1706489 0.17792968 0
## 2: 0.49277225 0.0015316177 0.074896708 0.2807500 0.15004942 0
## 3: 0.33833238 0.0020514179 0.026702372 0.2123156 0.42059824 4
## 4: 0.52782530 0.0006620213 0.005164368 0.2965512 0.16979709 0
## 5: 0.02366604 0.0041919844 0.092562102 0.3129480 0.56663185 4
## ---
## 5591: 0.02657869 0.0031870250 0.063063629 0.1351611 0.77200961 4
## 5592: 0.22057034 0.0013268294 0.106754571 0.5059654 0.16538292 3
## 5593: 0.47880501 0.0015834671 0.039816048 0.3957697 0.08402585 0
## 5594: 0.37256369 0.0176620018 0.017697802 0.5037267 0.08834974 3
## 5595: 0.51050073 0.0009238016 0.033122234 0.3270785 0.12837476 0
## The confusion Matrix of testing data.
##
## y2 0 2 3 4
## 0 1929 6 317 77
## 1 1 0 1 10
## 2 72 21 99 107
## 3 797 20 557 186
## 4 571 14 237 573
## Testing Error Rate = 1.023148
Next, run xgboost on the data preprocessed in the earlier sections.
## DateTime OutcomeType AnimalType SexuponOutcome
## 1 2014-02-12 18:22:00 Return_to_owner Dog Neutered Male
## 3 2015-01-31 12:28:00 Adoption Dog Neutered Male
## 5 2013-11-15 12:52:00 Transfer Dog Neutered Male
## 6 2014-04-25 13:04:00 Transfer Dog Intact Female
## 9 2014-02-04 17:17:00 Adoption Dog Spayed Female
## 10 2014-05-03 07:48:00 Adoption Dog Spayed Female
## AgeuponOutcome Breed Color
## 1 1 year Shetland Sheepdog Mix Brown/White
## 3 2 years Pit Bull Mix Blue/White
## 5 2 years Lhasa Apso/Miniature Poodle Tan
## 6 1 month Cairn Terrier/Chihuahua Shorthair Black/Tan
## 9 5 months American Pit Bull Terrier Mix Red/White
## 10 1 year Cairn Terrier White
Here we set the parameters and run xgb.cv to find the best iteration.
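xgb_params <- list(
  objective = "multi:softprob",  # output one probability per class
  eta = 0.1,                     # learning rate
  max_depth = 6,                 # maximum tree depth
  colsample_bytree = 0.7,        # fraction of columns sampled per tree
  subsample = 0.7,               # fraction of rows sampled per boosting round
  num_class = 4)                 # four outcome classes (Died excluded)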
## [1] train-mlogloss:1.353026+0.001894 test-mlogloss:1.354286+0.002167
## Multiple eval metrics are present. Will use test_mlogloss for early stopping.
## Will train until test_mlogloss hasn't improved in 10 rounds.
##
## [11] train-mlogloss:1.153811+0.008038 test-mlogloss:1.164682+0.011225
## [21] train-mlogloss:1.073628+0.001869 test-mlogloss:1.092965+0.005694
## [31] train-mlogloss:1.034891+0.000647 test-mlogloss:1.062665+0.005294
## [41] train-mlogloss:1.012826+0.001450 test-mlogloss:1.048729+0.004878
## [51] train-mlogloss:0.998796+0.001706 test-mlogloss:1.042443+0.005295
## [61] train-mlogloss:0.988680+0.001155 test-mlogloss:1.039640+0.005719
## [71] train-mlogloss:0.980706+0.001269 test-mlogloss:1.038768+0.006748
## [81] train-mlogloss:0.974023+0.000990 test-mlogloss:1.038960+0.006907
## Stopping. Best iteration:
## [71] train-mlogloss:0.980706+0.001269 test-mlogloss:1.038768+0.006748
Build the model and train it for the 71 rounds selected above.
## [1] train-mlogloss:1.361548 test-mlogloss:1.363891
## [6] train-mlogloss:1.240742 test-mlogloss:1.249884
## [11] train-mlogloss:1.157302 test-mlogloss:1.172423
## [16] train-mlogloss:1.104450 test-mlogloss:1.123224
## [21] train-mlogloss:1.067908 test-mlogloss:1.090902
## [26] train-mlogloss:1.043646 test-mlogloss:1.070384
## [31] train-mlogloss:1.026773 test-mlogloss:1.058267
## [36] train-mlogloss:1.014848 test-mlogloss:1.050070
## [41] train-mlogloss:1.006555 test-mlogloss:1.046505
## [46] train-mlogloss:0.999439 test-mlogloss:1.042665
## [51] train-mlogloss:0.993587 test-mlogloss:1.040677
## [56] train-mlogloss:0.987523 test-mlogloss:1.038515
## [61] train-mlogloss:0.983255 test-mlogloss:1.037505
## [66] train-mlogloss:0.978379 test-mlogloss:1.037405
## [71] train-mlogloss:0.974099 test-mlogloss:1.037753
## Feature Gain Cover Frequency
## 1: SexuponOutcome 0.4402829 0.2199861 0.13977939
## 2: Age 0.2045508 0.1281322 0.09694933
## 3: breed 0.1817956 0.3037280 0.30248190
## 4: DateTime 0.1047721 0.1924553 0.26232334
## 5: ColorFix 0.0685986 0.1556984 0.19846605
Compute the training log loss.
## Adoption Euthanasia Return_to_owner Transfer class
## 1: 0.51531571 0.02636277 0.26090056 0.1974209 0
## 2: 0.07967110 0.02886126 0.03955148 0.8519161 3
## 3: 0.37937066 0.04655133 0.41217107 0.1619070 2
## 4: 0.56485575 0.02584560 0.26619008 0.1431085 0
## 5: 0.03970097 0.29154360 0.27125841 0.3974970 3
## ---
## 9996: 0.38061544 0.08665598 0.33801147 0.1947171 0
## 9997: 0.33354595 0.05495799 0.42315775 0.1883383 2
## 9998: 0.05908908 0.12486818 0.19097961 0.6250631 3
## 9999: 0.51241267 0.02518433 0.24549526 0.2169078 0
## 10000: 0.37589514 0.08972079 0.33793738 0.1964467 0
## The confusion Matrix of training data.
##
## y1 0 1 2 3
## 0 3634 1 488 78
## 1 139 36 184 171
## 2 1532 10 963 234
## 3 1068 9 438 1015
## Training Error Rate = 0.9740993
Compute the testing log loss.
## Adoption Euthanasia Return_to_owner Transfer class
## 1: 0.39595342 0.033554636 0.38870567 0.1817863 0
## 2: 0.80586094 0.008442121 0.08419906 0.1014979 0
## 3: 0.31180611 0.090440273 0.37623608 0.2215175 2
## 4: 0.40552264 0.030381208 0.38745058 0.1766456 0
## 5: 0.04399491 0.116008461 0.11627511 0.7237215 3
## ---
## 5540: 0.38396648 0.031884395 0.36405474 0.2200943 0
## 5541: 0.79534328 0.011070084 0.07400112 0.1195855 0
## 5542: 0.76695311 0.008629197 0.07759160 0.1468261 0
## 5543: 0.36331820 0.031529091 0.41140535 0.1937474 2
## 5544: 0.44894662 0.021116264 0.36779380 0.1621433 0
## The confusion Matrix of testing data.
##
## y2 0 1 2 3
## 0 1912 1 347 36
## 1 96 6 129 83
## 2 884 7 473 183
## 3 630 9 216 532
## Testing Error Rate = 1.037753
Run xgboost on the preprocessed data, except for AgeuponOutcome and DateTime, which are handled separately here.
## DateTime OutcomeType AnimalType SexuponOutcome AgeuponOutcome
## 1 02 Return_to_owner Dog Neutered Male 10000
## 3 01 Adoption Dog Neutered Male 20000
## 5 11 Transfer Dog Neutered Male 20000
## 6 04 Transfer Dog Intact Female 100
## 9 02 Adoption Dog Spayed Female 500
## 10 05 Adoption Dog Spayed Female 10000
## breed ColorFix
## 1 Other Breed Double
## 3 Pit Bull Double
## 5 Miniature Poodle Light
## 6 Cairn Terrier Double
## 9 Other Breed Double
## 10 Cairn Terrier Light
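The head above suggests that DateTime has been reduced to a two-digit month number and that AgeuponOutcome has been turned into a numeric code in which one year counts 10000 and one month counts 100. A sketch of such an encoding follows. `age_code` and `month_code` are hypothetical helpers, and the multipliers for weeks (10) and days (1) are assumptions, since no such rows appear in the output.

```r
# Hypothetical helper: numeric code for an age string.
# year = 10000 and month = 100 follow the output; week and day are assumed.
age_code <- function(age) {
  weight <- c(day = 1, week = 10, month = 100, year = 10000)
  parts <- strsplit(as.character(age), " ", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 2) return(NA_real_)        # empty or malformed entry
    as.numeric(p[1]) * unname(weight[sub("s$", "", p[2])])
  }, numeric(1))
}

# Hypothetical helper: two-digit month number from a timestamp string.
month_code <- function(datetime) {
  dt <- as.POSIXct(as.character(datetime),
                   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  format(dt, "%m")
}

codes  <- age_code(c("1 year", "2 years", "1 month", "5 months"))
months <- month_code(c("2014-02-12 18:22:00", "2013-11-15 12:52:00"))
print(codes)
print(months)
```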
Set the parameters and run xgb.cv to find the best iteration.
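xgb_params <- list(
  objective = "multi:softprob",  # output one probability per class
  eta = 0.1,                     # learning rate
  max_depth = 6,                 # maximum tree depth
  colsample_bytree = 0.7,        # fraction of columns sampled per tree
  subsample = 0.7,               # fraction of rows sampled per boosting round
  num_class = 4)                 # four outcome classes (Died excluded)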
## [1] train-mlogloss:1.353026+0.001894 test-mlogloss:1.354286+0.002167
## Multiple eval metrics are present. Will use test_mlogloss for early stopping.
## Will train until test_mlogloss hasn't improved in 10 rounds.
##
## [11] train-mlogloss:1.153811+0.008038 test-mlogloss:1.164682+0.011225
## [21] train-mlogloss:1.073628+0.001869 test-mlogloss:1.092965+0.005694
## [31] train-mlogloss:1.034891+0.000647 test-mlogloss:1.062665+0.005294
## [41] train-mlogloss:1.012826+0.001450 test-mlogloss:1.048729+0.004878
## [51] train-mlogloss:0.998796+0.001706 test-mlogloss:1.042443+0.005295
## [61] train-mlogloss:0.988680+0.001155 test-mlogloss:1.039640+0.005719
## [71] train-mlogloss:0.980706+0.001269 test-mlogloss:1.038768+0.006748
## [81] train-mlogloss:0.974023+0.000990 test-mlogloss:1.038960+0.006907
## Stopping. Best iteration:
## [71] train-mlogloss:0.980706+0.001269 test-mlogloss:1.038768+0.006748
Build the model and train it for the 71 rounds selected above.
## [1] train-mlogloss:1.361548 test-mlogloss:1.363891
## [6] train-mlogloss:1.240742 test-mlogloss:1.249884
## [11] train-mlogloss:1.157302 test-mlogloss:1.172423
## [16] train-mlogloss:1.104450 test-mlogloss:1.123224
## [21] train-mlogloss:1.067908 test-mlogloss:1.090902
## [26] train-mlogloss:1.043646 test-mlogloss:1.070384
## [31] train-mlogloss:1.026773 test-mlogloss:1.058267
## [36] train-mlogloss:1.014848 test-mlogloss:1.050070
## [41] train-mlogloss:1.006555 test-mlogloss:1.046505
## [46] train-mlogloss:0.999439 test-mlogloss:1.042665
## [51] train-mlogloss:0.993587 test-mlogloss:1.040677
## [56] train-mlogloss:0.987523 test-mlogloss:1.038515
## [61] train-mlogloss:0.983255 test-mlogloss:1.037505
## [66] train-mlogloss:0.978379 test-mlogloss:1.037405
## [71] train-mlogloss:0.974099 test-mlogloss:1.037753
## Feature Gain Cover Frequency
## 1: SexuponOutcome 0.4402829 0.2199861 0.13977939
## 2: Age 0.2045508 0.1281322 0.09694933
## 3: breed 0.1817956 0.3037280 0.30248190
## 4: DateTime 0.1047721 0.1924553 0.26232334
## 5: ColorFix 0.0685986 0.1556984 0.19846605
Compute the training log loss.
## Adoption Euthanasia Return_to_owner Transfer class
## 1: 0.51531571 0.02636277 0.26090056 0.1974209 0
## 2: 0.07967110 0.02886126 0.03955148 0.8519161 3
## 3: 0.37937066 0.04655133 0.41217107 0.1619070 2
## 4: 0.56485575 0.02584560 0.26619008 0.1431085 0
## 5: 0.03970097 0.29154360 0.27125841 0.3974970 3
## ---
## 9996: 0.38061544 0.08665598 0.33801147 0.1947171 0
## 9997: 0.33354595 0.05495799 0.42315775 0.1883383 2
## 9998: 0.05908908 0.12486818 0.19097961 0.6250631 3
## 9999: 0.51241267 0.02518433 0.24549526 0.2169078 0
## 10000: 0.37589514 0.08972079 0.33793738 0.1964467 0
## The confusion Matrix of training data.
##
## y1 0 1 2 3
## 0 3634 1 488 78
## 1 139 36 184 171
## 2 1532 10 963 234
## 3 1068 9 438 1015
## Training Error Rate = 0.9740993
Compute the testing log loss.
## Adoption Euthanasia Return_to_owner Transfer class
## 1: 0.39595342 0.033554636 0.38870567 0.1817863 0
## 2: 0.80586094 0.008442121 0.08419906 0.1014979 0
## 3: 0.31180611 0.090440273 0.37623608 0.2215175 2
## 4: 0.40552264 0.030381208 0.38745058 0.1766456 0
## 5: 0.04399491 0.116008461 0.11627511 0.7237215 3
## ---
## 5540: 0.38396648 0.031884395 0.36405474 0.2200943 0
## 5541: 0.79534328 0.011070084 0.07400112 0.1195855 0
## 5542: 0.76695311 0.008629197 0.07759160 0.1468261 0
## 5543: 0.36331820 0.031529091 0.41140535 0.1937474 2
## 5544: 0.44894662 0.021116264 0.36779380 0.1621433 0
## The confusion Matrix of testing data.
##
## y2 0 1 2 3
## 0 1912 1 347 36
## 1 96 6 129 83
## 2 884 7 473 183
## 3 630 9 216 532
## Testing Error Rate = 1.037753
https://www.kaggle.com/apapiu/visualizing-breeds-and-ages-by-outcome/code/notebook
https://www.kaggle.com/mrisdal/quick-dirty-randomforest/code
https://www.kaggle.com/fsmithus/reduced-model
https://cran.r-project.org/web/packages/randomForest/randomForest.pdf
https://cran.r-project.org/web/packages/xgboost/xgboost.pdf
https://stackoverflow.com/questions/24197809/functionality-of-probability-true-in-svm-function-of-e1071-package-in-r
https://cran.r-project.org/web/packages/e1071/vignettes/svmdoc.pdf
Senior classmate 林子軒! by 潘星丞
https://stackoverflow.com/questions/16961921/plot-data-in-descending-order-as-appears-in-data-frame