Update log:
06/02: lines 37~47 => count the records for each OutcomeType; lines 50~58 => not done yet; lines 61~168 => Breed; lines 174~202 => Age; lines 207~290
06/04: lines 205~265 => Color
lines 465~617 => Random Forest
The data contain 26,729 records and 9 explanatory variables, covering both dogs and cats.
library(dplyr)
library(ggplot2)
library(e1071)
library(randomForest)
library(caret)
library(gridExtra)
trainInit<-read.csv("train.csv")
head(trainInit)
## AnimalID Name DateTime OutcomeType OutcomeSubtype
## 1 A671945 Hambone 2014-02-12 18:22:00 Return_to_owner
## 2 A656520 Emily 2013-10-13 12:44:00 Euthanasia Suffering
## 3 A686464 Pearce 2015-01-31 12:28:00 Adoption Foster
## 4 A683430 2014-07-11 19:09:00 Transfer Partner
## 5 A667013 2013-11-15 12:52:00 Transfer Partner
## 6 A677334 Elsa 2014-04-25 13:04:00 Transfer Partner
## AnimalType SexuponOutcome AgeuponOutcome
## 1 Dog Neutered Male 1 year
## 2 Cat Spayed Female 1 year
## 3 Dog Neutered Male 2 years
## 4 Cat Intact Male 3 weeks
## 5 Dog Neutered Male 2 years
## 6 Dog Intact Female 1 month
## Breed Color
## 1 Shetland Sheepdog Mix Brown/White
## 2 Domestic Shorthair Mix Cream Tabby
## 3 Pit Bull Mix Blue/White
## 4 Domestic Shorthair Mix Blue Cream
## 5 Lhasa Apso/Miniature Poodle Tan
## 6 Cairn Terrier/Chihuahua Shorthair Black/Tan
Drop AnimalID, Name and OutcomeSubtype (columns 1, 2 and 5).
train<-trainInit[,-c(1,2,5)]
head(train)
## DateTime OutcomeType AnimalType SexuponOutcome
## 1 2014-02-12 18:22:00 Return_to_owner Dog Neutered Male
## 2 2013-10-13 12:44:00 Euthanasia Cat Spayed Female
## 3 2015-01-31 12:28:00 Adoption Dog Neutered Male
## 4 2014-07-11 19:09:00 Transfer Cat Intact Male
## 5 2013-11-15 12:52:00 Transfer Dog Neutered Male
## 6 2014-04-25 13:04:00 Transfer Dog Intact Female
## AgeuponOutcome Breed Color
## 1 1 year Shetland Sheepdog Mix Brown/White
## 2 1 year Domestic Shorthair Mix Cream Tabby
## 3 2 years Pit Bull Mix Blue/White
## 4 3 weeks Domestic Shorthair Mix Blue Cream
## 5 2 years Lhasa Apso/Miniature Poodle Tan
## 6 1 month Cairn Terrier/Chihuahua Shorthair Black/Tan
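Split the data into dogs and cats. Referring to train$AnimalType directly, instead of calling attach(train), avoids masked variables when Dogtrain is attached later.
Dogtrain <- train[train$AnimalType == "Dog", ]
Cattrain <- train[train$AnimalType != "Dog", ]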
head(Dogtrain)
## DateTime OutcomeType AnimalType SexuponOutcome
## 1 2014-02-12 18:22:00 Return_to_owner Dog Neutered Male
## 3 2015-01-31 12:28:00 Adoption Dog Neutered Male
## 5 2013-11-15 12:52:00 Transfer Dog Neutered Male
## 6 2014-04-25 13:04:00 Transfer Dog Intact Female
## 9 2014-02-04 17:17:00 Adoption Dog Spayed Female
## 10 2014-05-03 07:48:00 Adoption Dog Spayed Female
## AgeuponOutcome Breed Color
## 1 1 year Shetland Sheepdog Mix Brown/White
## 3 2 years Pit Bull Mix Blue/White
## 5 2 years Lhasa Apso/Miniature Poodle Tan
## 6 1 month Cairn Terrier/Chihuahua Shorthair Black/Tan
## 9 5 months American Pit Bull Terrier Mix Red/White
## 10 1 year Cairn Terrier White
attach(Dogtrain)
Dog subset: n = 15595, p = 8
First, count the records for each OutcomeType.
## Return to owner = 4286
## Transfer = 3917
## Adoption = 6497
## Died = 50
## Euthanasia = 845
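These counts come from a one-way frequency table. A minimal base-R sketch: in the report it would be applied to Dogtrain$OutcomeType, but here it runs on a hypothetical toy vector so the block is self-contained.

```r
# Hypothetical toy vector standing in for Dogtrain$OutcomeType
outcome <- c("Adoption", "Transfer", "Adoption", "Died", "Return_to_owner")

counts <- table(outcome)          # one count per distinct OutcomeType
counts
sort(counts, decreasing = TRUE)   # largest classes first
```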
Because there are very few “Died” records, we drop them. We also rename Return_to_owner to Return in OutcomeType, because XGBoost later raised an error saying that Return_to_owner exceeded 64 bits.
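A minimal base-R sketch of this cleaning step, assuming OutcomeType is a factor or character column. The function name clean_outcomes and the toy rows are hypothetical. Rebuilding the factor matters because a leftover empty "Died" level would still appear as an all-zero row and column in table() output and confusion matrices.

```r
# Drop the rare "Died" outcome and shorten "Return_to_owner" to "Return"
clean_outcomes <- function(df) {
  # drop = FALSE keeps a one-column data frame from collapsing to a vector
  df <- df[df$OutcomeType != "Died", , drop = FALSE]
  outcome <- as.character(df$OutcomeType)
  outcome[outcome == "Return_to_owner"] <- "Return"
  # Rebuilding the factor also discards the now-empty "Died" level
  df$OutcomeType <- factor(outcome)
  df
}

# Hypothetical toy rows for illustration
toy <- data.frame(OutcomeType = factor(c("Return_to_owner", "Died", "Adoption", "Transfer")))
cleaned <- clean_outcomes(toy)
levels(cleaned$OutcomeType)
```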
## DateTime OutcomeType AnimalType SexuponOutcome
## 1 2014-02-12 18:22:00 Return Dog Neutered Male
## 3 2015-01-31 12:28:00 Adoption Dog Neutered Male
## 5 2013-11-15 12:52:00 Transfer Dog Neutered Male
## 6 2014-04-25 13:04:00 Transfer Dog Intact Female
## 9 2014-02-04 17:17:00 Adoption Dog Spayed Female
## 10 2014-05-03 07:48:00 Adoption Dog Spayed Female
## AgeuponOutcome Breed Color
## 1 1 year Shetland Sheepdog Mix Brown/White
## 3 2 years Pit Bull Mix Blue/White
## 5 2 years Lhasa Apso/Miniature Poodle Tan
## 6 1 month Cairn Terrier/Chihuahua Shorthair Black/Tan
## 9 5 months American Pit Bull Terrier Mix Red/White
## 10 1 year Cairn Terrier White
## DateTime OutcomeType AnimalType SexuponOutcome
## 1 2014-02-12 18:22:00 Return Dog Neutered Male
## 3 2015-01-31 12:28:00 Adoption Dog Neutered Male
## 5 2013-11-15 12:52:00 Transfer Dog Neutered Male
## 6 2014-04-25 13:04:00 Transfer Dog Intact Female
## 9 2014-02-04 17:17:00 Adoption Dog Spayed Female
## 10 2014-05-03 07:48:00 Adoption Dog Spayed Female
## AgeuponOutcome Color breed
## 1 1 year Brown/White Other Breed
## 3 2 years Blue/White Pit Bull
## 5 2 years Tan Miniature Poodle
## 6 1 month Black/Tan Cairn Terrier
## 9 5 months Red/White Other Breed
## 10 1 year White Cairn Terrier
Bin AgeuponOutcome into three categories: “Puppy”, “AdultDog” and “OldDog”.
## DateTime OutcomeType AnimalType SexuponOutcome Color
## 1 2014-02-12 18:22:00 Return Dog Neutered Male Brown/White
## 3 2015-01-31 12:28:00 Adoption Dog Neutered Male Blue/White
## 5 2013-11-15 12:52:00 Transfer Dog Neutered Male Tan
## 6 2014-04-25 13:04:00 Transfer Dog Intact Female Black/Tan
## 9 2014-02-04 17:17:00 Adoption Dog Spayed Female Red/White
## 10 2014-05-03 07:48:00 Adoption Dog Spayed Female White
## breed Age
## 1 Other Breed OldDog
## 3 Pit Bull AdultDog
## 5 Miniature Poodle AdultDog
## 6 Cairn Terrier Puppy
## 9 Other Breed Puppy
## 10 Cairn Terrier OldDog
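In the output above, the two dogs aged "1 year" (rows 1 and 10) are labeled OldDog while the "2 years" dogs are AdultDog, which suggests the binning may not handle the singular unit "year". The sketch below converts each age string to years, treating singular and plural units alike, and then bins with cut(). The report does not state its cutoffs, so the 1-year and 8-year boundaries and the function names are hypothetical.

```r
# Convert strings such as "1 year", "5 months", "3 weeks" to a number of years
age_in_years <- function(age) {
  parts <- strsplit(as.character(age), " ", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 2) return(NA_real_)         # empty or malformed entries
    n <- suppressWarnings(as.numeric(p[1]))
    unit <- sub("s$", "", p[2])                  # "years" -> "year", so both forms match
    per_year <- c(year = 1, month = 12, week = 52, day = 365)
    if (!unit %in% names(per_year)) return(NA_real_)
    n / per_year[[unit]]
  }, numeric(1))
}

# Hypothetical cutoffs: [0, 1) Puppy, [1, 8) AdultDog, [8, Inf) OldDog
bin_age <- function(age, adult_from = 1, old_from = 8) {
  cut(age_in_years(age),
      breaks = c(-Inf, adult_from, old_from, Inf),
      labels = c("Puppy", "AdultDog", "OldDog"),
      right = FALSE)
}

bin_age(c("1 year", "2 years", "1 month", "5 months", "3 weeks", "10 years"))
```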
Map each Color to one of six categories: “Simple”, “Double”, “Tricolor”, “Brindle”, “Tick” and “Merle”. “Brindle” also covers colors that combine Brindle with Tick or with Merle. Some colors, such as “Blue Tiger”, “Blue Cream” and “Smoke”, are not counted as Brindle; for these I only consider the colors themselves and classify them as “Simple”, “Double” or “Tricolor”.
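One way to express these rules as code is sketched below; the function name and the exact matching order are hypothetical. Brindle is checked first so that Brindle combined with Tick or Merle lands in "Brindle". Every other color is classified by counting its "/"-separated parts, so "Blue Cream" becomes Simple here, whereas the author may count blue and cream as two colors. The output that follows also contains a "Light" level that is not in the list of six, so the rules actually used evidently differ from this sketch.

```r
# Hypothetical rules for the six color categories described above
classify_color <- function(color) {
  vapply(as.character(color), function(x) {
    if (grepl("Brindle", x, fixed = TRUE))  return("Brindle")   # includes Brindle + Tick / Merle
    if (grepl("Merle", x, fixed = TRUE))    return("Merle")
    if (grepl("Tick", x, fixed = TRUE))     return("Tick")
    if (grepl("Tricolor", x, fixed = TRUE)) return("Tricolor")
    # Otherwise count the colors separated by "/"
    n_colors <- length(strsplit(x, "/", fixed = TRUE)[[1]])
    if (n_colors == 1) "Simple" else if (n_colors == 2) "Double" else "Tricolor"
  }, character(1), USE.NAMES = FALSE)
}

classify_color(c("Brown/White", "Tan", "Brown Brindle/White", "Blue Merle", "Blue Tick"))
```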
## DateTime OutcomeType AnimalType SexuponOutcome
## 1 2014-02-12 18:22:00 Return Dog Neutered Male
## 3 2015-01-31 12:28:00 Adoption Dog Neutered Male
## 5 2013-11-15 12:52:00 Transfer Dog Neutered Male
## 6 2014-04-25 13:04:00 Transfer Dog Intact Female
## 9 2014-02-04 17:17:00 Adoption Dog Spayed Female
## 10 2014-05-03 07:48:00 Adoption Dog Spayed Female
## breed Age ColorFix
## 1 Other Breed OldDog Double
## 3 Pit Bull AdultDog Double
## 5 Miniature Poodle AdultDog Light
## 6 Cairn Terrier Puppy Double
## 9 Other Breed Puppy Double
## 10 Cairn Terrier OldDog Light
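In the next output, DateTime has been reduced to the month name (for example, 2014-02-12 18:22:00 becomes February). A minimal sketch of that conversion, assuming the "YYYY-MM-DD HH:MM:SS" format seen above; the function name month_of is hypothetical. Reading the month digits and indexing base R's month.name avoids the locale dependence of format(..., "%B").

```r
# Extract the month name from "YYYY-MM-DD HH:MM:SS" strings
month_of <- function(datetime) {
  month_num <- as.integer(substr(as.character(datetime), 6, 7))  # the "MM" part
  factor(month.name[month_num], levels = month.name)             # calendar order
}

month_of(c("2014-02-12 18:22:00", "2015-01-31 12:28:00", "2013-11-15 12:52:00"))
```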
## DateTime OutcomeType AnimalType SexuponOutcome breed
## 1 February Return Dog Neutered Male Other Breed
## 3 January Adoption Dog Neutered Male Pit Bull
## 5 November Transfer Dog Neutered Male Miniature Poodle
## 6 April Transfer Dog Intact Female Cairn Terrier
## 9 February Adoption Dog Spayed Female Other Breed
## 10 May Adoption Dog Spayed Female Cairn Terrier
## Age ColorFix
## 1 OldDog Double
## 3 AdultDog Double
## 5 AdultDog Light
## 6 Puppy Double
## 9 Puppy Double
## 10 OldDog Light
Fit an SVM model. Case 1: randomly select 10,000 records as the training data; the remaining records serve as the testing data.
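A sketch of the random train/test split followed by an SVM fit with e1071::svm. The data here are a hypothetical toy stand-in with 300 rows and a 200-row training set instead of 10,000, and the seed is also hypothetical. The SVM step runs only if e1071 is installed.

```r
set.seed(2016)  # hypothetical seed; the report does not state one
classes <- c("Adoption", "Euthanasia", "Return", "Transfer")

# Hypothetical toy stand-in for the cleaned dog data
dogs <- data.frame(
  OutcomeType = factor(sample(classes, 300, replace = TRUE)),
  Age         = factor(sample(c("Puppy", "AdultDog", "OldDog"), 300, replace = TRUE)),
  ColorFix    = factor(sample(c("Simple", "Double", "Tricolor"), 300, replace = TRUE))
)

n_train   <- 200                          # the report uses 10,000
train_idx <- sample(nrow(dogs), n_train)  # random rows, without replacement
train_set <- dogs[train_idx, ]
test_set  <- dogs[-train_idx, ]           # every row not drawn becomes testing data

if (requireNamespace("e1071", quietly = TRUE)) {
  fit  <- e1071::svm(OutcomeType ~ ., data = train_set)  # factor response => classification
  pred <- predict(fit, test_set)
  print(table(Prediction = pred, Reference = test_set$OutcomeType))
  cat("Testing error rate:", mean(pred != test_set$OutcomeType), "\n")
}
```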
## Time difference of 27.75559 secs
## The confusion Matrix of training data.
## Reference
## Prediction Adoption Euthanasia Return Transfer
## Adoption 4094 247 2242 1436
## Euthanasia 0 0 0 0
## Return 19 145 266 242
## Transfer 62 137 267 843
## The confusion Matrix of testing data.
## Reference
## Prediction Adoption Euthanasia Return Transfer
## Adoption 2280 146 1288 772
## Euthanasia 0 0 0 0
## Return 10 81 129 138
## Transfer 32 88 94 486
## Training Error Rate = 0.4797
## Testing Error Rate = 0.4778139
## The number of Adoption in the training data is 4175
## The number of Return to transfer in the training data is 2775
## The number of Euthanasia in the training data is 529
## The number of Transfer in the training data is 2521
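Each error rate above equals one minus the share of records on the diagonal of the confusion matrix, and the four "number of ..." lines are its column sums (the Reference totals). The base-R sketch below checks this on the SVM case 1 training matrix copied from the output above. Dividing by sum(cm) rather than a fixed 10,000 keeps the rate correct when oversampling makes the training set larger.

```r
# SVM case 1 training confusion matrix (rows = Prediction, columns = Reference)
classes <- c("Adoption", "Euthanasia", "Return", "Transfer")
cm <- matrix(c(4094, 247, 2242, 1436,
                  0,   0,    0,    0,
                 19, 145,  266,  242,
                 62, 137,  267,  843),
             nrow = 4, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

# Error rate = misclassified / total, with the total taken from the matrix itself
error_rate <- function(cm) 1 - sum(diag(cm)) / sum(cm)

round(error_rate(cm), 4)  # 0.4797
colSums(cm)               # records per true OutcomeType in the training data
```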
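Case 2: randomly select 10,000 records as the training data, then make the training data more balanced, mainly because Euthanasia has far fewer records than the other outcomes. That is, Euthanasia records in the training data are oversampled until their count is within 100 to 200 of the smallest of the other OutcomeTypes, without exceeding it.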
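A base-R sketch of this balancing step. Because the dog data contain only 845 Euthanasia records in total, reaching roughly 2,300 in the training set means drawing existing Euthanasia rows with replacement. The function name oversample_class is hypothetical, and gap = 150 is an assumed value inside the 100 to 200 range described above.

```r
# Oversample one class (with replacement) until it sits `gap` records below
# the smallest of the other classes. gap = 150 is a hypothetical choice.
oversample_class <- function(df, target, cls, gap = 150) {
  counts <- table(as.character(df[[target]]))  # as.character ignores empty factor levels
  goal   <- min(counts[names(counts) != cls]) - gap
  have   <- sum(df[[target]] == cls)
  if (goal <= have || have == 0) return(df)    # nothing to do, or nothing to copy
  rows  <- which(df[[target]] == cls)
  extra <- rows[sample.int(length(rows), goal - have, replace = TRUE)]
  df[c(seq_len(nrow(df)), extra), , drop = FALSE]
}

set.seed(1)
toy <- data.frame(OutcomeType = rep(c("Adoption", "Transfer", "Euthanasia"), c(1000, 500, 50)))
balanced <- oversample_class(toy, "OutcomeType", "Euthanasia")
table(balanced$OutcomeType)
```

The returned data frame is larger than the original draw, so any training error rate computed on it should be divided by its actual number of rows.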
P.S. (06/06): over 10 repeated runs, the mean training error was 0.59784 and the mean testing error was 0.5060188.
## Time difference of 37.80116 secs
## The confusion Matrix of training data.
## Reference
## Prediction Adoption Euthanasia Return Transfer
## Adoption 3821 671 2008 1230
## Euthanasia 38 979 379 462
## Return 318 442 306 162
## Transfer 44 192 97 594
## The confusion Matrix of testing data.
## Reference
## Prediction Adoption Euthanasia Return Transfer
## Adoption 2091 86 1054 759
## Euthanasia 22 141 232 307
## Return 145 45 167 79
## Transfer 18 31 43 324
## Training Error Rate = 0.6043
## Testing Error Rate = 0.5088384
## The number of Adoption in the training data is 4221
## The number of Euthanasia in the training data is 2284
## The number of Return to transfer in the training data is 2790
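## The number of Transfer in the training data is 2448
Note that the training error rate of 0.6043 above is the 6,043 misclassified records divided by 10,000. After oversampling, the training set holds 11,743 records (the sum of the four counts above), so the actual training error rate is 6043 / 11743 ≈ 0.5146.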
Random Forest. Case 1: randomly select 10,000 records as the training data.
## Time difference of 1.327143 mins
## The confusion Matrix of training data.
## Reference
## Prediction Adoption Euthanasia Return Transfer
## Adoption 3903 173 1363 1101
## Euthanasia 0 103 5 2
## Return 277 127 1177 283
## Transfer 61 122 162 1141
## The confusion Matrix of testing data.
## Reference
## Prediction Adoption Euthanasia Return Transfer
## Adoption 1958 99 1035 647
## Euthanasia 0 7 16 10
## Return 257 102 361 213
## Transfer 41 111 167 520
## Training Error Rate = 0.3676
## Testing Error Rate = 0.4866522
Case 2: randomly select 10,000 records as the training data and balance it the same way as before: Euthanasia records are oversampled until their count is within 100 to 200 of the smallest other OutcomeType, without exceeding it.
## Time difference of 1.373995 mins
## The confusion Matrix of training data.
## Reference
## Prediction Adoption Euthanasia Return Transfer
## Adoption 3775 288 1323 1156
## Euthanasia 169 1971 368 280
## Return 159 52 944 176
## Transfer 41 68 123 959
## The confusion Matrix of testing data.
## Reference
## Prediction Adoption Euthanasia Return Transfer
## Adoption 1973 91 937 596
## Euthanasia 122 131 240 194
## Return 226 39 243 115
## Transfer 32 56 108 441
## Training Error Rate = 0.4203
## Testing Error Rate = 0.497114
## The number of Adoption in the training data is 4144
## The number of Return to transfer in the training data is 2758
## The number of Euthanasia in the training data is 2379
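## The number of Transfer in the training data is 2571
Note that the training error rate of 0.4203 above is the 4,203 misclassified records divided by 10,000. After oversampling, the training set holds 11,852 records (the sum of the four counts above), so the actual training error rate is 4203 / 11852 ≈ 0.3546.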