Penguin Data Set
This dataset contains size measurements for three differen kind penguin species observed on three islands in the Palmer Archipelago, Antarctica.These data were collected from 2007 - 2009 by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research Program, part of the US Long Term Ecological Research Network. Penguins dataset contains 8 variables (n = 344 penguins).
Overview
In this assignment I will build a Logistic regression model with binary outcome and a Multinomial logistic regression using palmer dataset and “Species” as the outcome variable. I will be conducting several data preparations to build the models.
Logistic Regression with Binary Outcome
- The penguin dataset has ‘species’ column. Please check how many categories you have in the species column. Conduct whatever data manipulation you need to do to be able to build a logistic regression with binary outcome. Please explain your reasoning behind your decision as you manipulate the outcome/dependent variable (species).
- Please make sure you are evaluating the independent variables appropriately in deciding which ones should be in the model.
- Provide variable interpretations in your model.
Data Preparation
## Rows: 344
## Columns: 8
## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, A...
## $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torge...
## $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34....
## $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18....
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, ...
## $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 347...
## $ sex <fct> male, female, female, NA, female, male, female, m...
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2...
## species island bill_length_mm bill_depth_mm
## Adelie :152 Biscoe :168 Min. :32.10 Min. :13.10
## Chinstrap: 68 Dream :124 1st Qu.:39.23 1st Qu.:15.60
## Gentoo :124 Torgersen: 52 Median :44.45 Median :17.30
## Mean :43.92 Mean :17.15
## 3rd Qu.:48.50 3rd Qu.:18.70
## Max. :59.60 Max. :21.50
## NA's :2 NA's :2
## flipper_length_mm body_mass_g sex year
## Min. :172.0 Min. :2700 female:165 Min. :2007
## 1st Qu.:190.0 1st Qu.:3550 male :168 1st Qu.:2007
## Median :197.0 Median :4050 NA's : 11 Median :2008
## Mean :200.9 Mean :4202 Mean :2008
## 3rd Qu.:213.0 3rd Qu.:4750 3rd Qu.:2009
## Max. :231.0 Max. :6300 Max. :2009
## NA's :2 NA's :2
As mentioned in part a, we will check what are the categories in the species. From the summary we already know that there are three categories but we will focus only on the species column with the below syntax. This will help us to understand how many in each category.
## Adelie Chinstrap Gentoo
## 152 68 124
species and its relationship with numerical variables.
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 3 x 6
## species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g year
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Adelie 38.8 18.3 190. 3701. 2008.
## 2 Chinstrap 48.8 18.4 196. 3733. 2008.
## 3 Gentoo 47.5 15.0 217. 5076. 2008.
penguins %>%
count(species) %>%
ggplot() + geom_col(aes(x = species, y = n, fill = species)) +
geom_label(aes(x = species, y = n, label = n)) +
scale_fill_manual(values = c("darkorange","purple","cyan4")) +
theme_minimal() +
labs(title = 'Penguins Species & Count') As you can see from the above visualization and the summaries, We see from the three categories Adelie has the highest count followed by Gentoo and Chinstrap.
Relationship with categorical features
In this section, my purpose is to analyze the relationship between species with other categorical features such as sex and the island.
plot_island<-ggplot(penguins, aes(x = island, y = species, color = species)) +
geom_jitter(size = 3)
plot_sex<-ggplot(penguins, aes(x = sex, y = species, color = species)) +
geom_jitter(size = 3)
plot_grid(plot_island, plot_sex, labels = "AUTO") From the above island graph we can see that the Adelie penguins can be found in all three islands while Chintstrap in only Dream island and Gentoo only in Biscoe island.
Relationship with numerical features
Penguin dataset also has four continuous variables.We can visualize the relationship between those variables from the below plots.
# Scatterplot 1: penguin flipper length versus body mass
plot1<-ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point(aes(color = species,
shape = species),
size = 2) +
scale_color_manual(values = c("darkorange","darkorchid","cyan"))
# Scatterplot 2: penguin bill length versus bill depth
plot2<-ggplot(data = penguins, aes(x = bill_length_mm, y = bill_depth_mm)) +
geom_point(aes(color = species,
shape = species),
size = 2) +
scale_color_manual(values = c("darkorange","darkorchid","cyan"))
plot3<-ggplot(penguins, aes(x = flipper_length_mm,
y = body_mass_g)) +
geom_point(aes(color = sex)) +
scale_color_manual(values = c("darkorange","cyan4"),
na.translate = FALSE) +
facet_wrap(~species)
plot_grid(plot1, plot2, plot3, labels = "AUTO")## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
ggpairs(penguins, columns = 3:6, title = "Correlogram of Variables",
ggplot2::aes(color = island),
progress = FALSE,
lower = list(continuous = wrap("smooth", alpha = 0.3, size = 0.1))) With above observations, we need to determine how are we going to proceed with three categories of outcome variable species to build a logistic regression with binary outcome. Being that Adelie is the only species that is in all three island, we can get a binary outcome of species as if whether the species is Adelie or not. To go more further with the data processing, I will remove the missing data and set the outcome to binary with Adelie or NOn Adelie, before we apply to the logistic regression model with all variables. Dealing with missing data is important as it can be problematic in predictive modeling.
Model1
library(dplyr)
set.seed(1234567)
# create test and train dataset
penguins_partition <- createDataPartition(penguins$target_adelie, p=0.8, list=FALSE)
train <- penguins[penguins_partition,]## Warning: The `i` argument of ``[`()` can't be a matrix as of tibble 3.0.0.
## Convert to a vector.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: glm.fit: algorithm did not converge
##
## Call:
## glm(formula = target_adelie ~ ., family = binomial("logit"),
## data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.409e-06 -2.409e-06 2.409e-06 2.409e-06 2.409e-06
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.657e+01 5.803e+07 0 1
## speciesChinstrap 5.313e+01 1.260e+05 0 1
## speciesGentoo 5.313e+01 2.114e+05 0 1
## islandDream 7.230e-12 8.171e+04 0 1
## islandTorgersen 1.606e-11 8.710e+04 0 1
## bill_length_mm 3.395e-13 9.858e+03 0 1
## bill_depth_mm 6.329e-12 2.812e+04 0 1
## flipper_length_mm -3.869e-12 4.590e+03 0 1
## body_mass_g -3.576e-14 7.675e+01 0 1
## sexmale 4.818e-11 7.377e+04 0 1
## year 3.839e-11 2.900e+04 0 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3.6605e+02 on 266 degrees of freedom
## Residual deviance: 1.5490e-09 on 256 degrees of freedom
## AIC: 22
##
## Number of Fisher Scoring iterations: 25
Model2
We can optimize the AIC and derive a better model.
## Start: AIC=22
## target_adelie ~ species + island + bill_length_mm + bill_depth_mm +
## flipper_length_mm + body_mass_g + sex + year
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: algorithm did not converge
## Df Deviance AIC
## - island 2 1.5490e-09 18
## - species 2 2.8409e-08 18
## - bill_length_mm 1 1.5490e-09 20
## - bill_depth_mm 1 1.5490e-09 20
## - flipper_length_mm 1 1.5490e-09 20
## - body_mass_g 1 1.5490e-09 20
## - sex 1 1.5490e-09 20
## - year 1 1.5490e-09 20
## <none> 1.5490e-09 22
## Warning: glm.fit: algorithm did not converge
##
## Step: AIC=18
## target_adelie ~ species + bill_length_mm + bill_depth_mm + flipper_length_mm +
## body_mass_g + sex + year
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: algorithm did not converge
## Df Deviance AIC
## - species 2 1.4122e-07 14
## - bill_length_mm 1 1.5490e-09 16
## - bill_depth_mm 1 1.5490e-09 16
## - flipper_length_mm 1 1.5490e-09 16
## - body_mass_g 1 1.5490e-09 16
## - sex 1 1.5490e-09 16
## - year 1 1.5490e-09 16
## <none> 1.5490e-09 18
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
##
## Step: AIC=14
## target_adelie ~ bill_length_mm + bill_depth_mm + flipper_length_mm +
## body_mass_g + sex + year
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Df Deviance AIC
## - year 1 0.000 12.000
## - flipper_length_mm 1 0.000 12.000
## - body_mass_g 1 0.000 12.000
## <none> 0.000 14.000
## - bill_depth_mm 1 6.423 18.423
## - sex 1 9.173 21.173
## - bill_length_mm 1 183.194 195.194
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
##
## Step: AIC=12
## target_adelie ~ bill_length_mm + bill_depth_mm + flipper_length_mm +
## body_mass_g + sex
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Df Deviance AIC
## - flipper_length_mm 1 0.000 10.000
## - body_mass_g 1 0.000 10.000
## <none> 0.000 12.000
## - bill_depth_mm 1 7.221 17.221
## - sex 1 9.175 19.175
## - bill_length_mm 1 190.509 200.509
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
##
## Step: AIC=10
## target_adelie ~ bill_length_mm + bill_depth_mm + body_mass_g +
## sex
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Df Deviance AIC
## - body_mass_g 1 0.00 8.00
## <none> 0.00 10.00
## - sex 1 9.20 17.20
## - bill_depth_mm 1 11.50 19.50
## - bill_length_mm 1 239.91 247.91
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
##
## Step: AIC=8
## target_adelie ~ bill_length_mm + bill_depth_mm + sex
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Df Deviance AIC
## <none> 0.000 8.000
## - sex 1 17.389 23.389
## - bill_depth_mm 1 26.376 32.376
## - bill_length_mm 1 267.598 273.598
##
## Call:
## glm(formula = target_adelie ~ bill_length_mm + bill_depth_mm +
## sex, family = binomial("logit"), data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.558e-04 -2.000e-08 2.000e-08 2.000e-08 4.648e-04
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1260.95 126865.82 -0.010 0.992
## bill_length_mm 96.41 7885.01 0.012 0.990
## bill_depth_mm -160.60 13178.47 -0.012 0.990
## sexmale -143.54 126874.52 -0.001 0.999
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3.6605e+02 on 266 degrees of freedom
## Residual deviance: 4.3094e-07 on 263 degrees of freedom
## AIC: 8
##
## Number of Fisher Scoring iterations: 25
Model 2 with optimized AIC has a AIC value of 8.
## [1] "VIF scores of predictors"
## bill_length_mm bill_depth_mm sex
## 13.090152 13.050076 1.011118
There is a high correlation in bill_length_mm and bill_depth_mm as their values are over 5. And a moderate correlation for variable sex.
model3_LR <- glm(target_adelie ~ bill_length_mm + bill_depth_mm, family=binomial(link='logit'), data=test, maxit = 100)## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
##
## Call:
## glm(formula = target_adelie ~ bill_length_mm + bill_depth_mm,
## family = binomial(link = "logit"), data = test, maxit = 100)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.251e-05 -2.110e-08 2.110e-08 2.110e-08 1.585e-05
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -188.8 1149506.0 0.000 1
## bill_length_mm 27.1 43437.1 0.001 1
## bill_depth_mm -54.2 97217.0 -0.001 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9.0523e+01 on 65 degrees of freedom
## Residual deviance: 5.6205e-10 on 63 degrees of freedom
## AIC: 6
##
## Number of Fisher Scoring iterations: 28
When comparing AIC values of all the models, Model 3 has a lower value of AIC and could be a good fit.
Variable Interpretation
Continuous variables
- Each one unit change in bill_length_mm will increase the log odds of being an Adelie by 2.377*10^-11
- Each one unit change in bill_depth_mm will increase the log odds of being an Adelie by 2.589*10^-11
- Each one unit change in flipper_length_mm will decrease the log odds of being an Adelie by 2.821*10^-12
- Each one unit change in body_mass_g will increase the log odds of being an Adelie by 1.452*10^-13
- Each one unit change in year will increase the log odds of being an Adelie by 6.835*10^-11
Categorical variables
- If the species is Chinstrap or Gentoo, the log odds of being an Adelie increased by 5.313*10^01
- If the island is Dream, the log odds of being an Adelie increased by 3.192*10^-11
- If the island is Torgersen, the log odds of being an Adelie increased by 1.043*10^-11
- If the sex is male, the log odds of being an Adelie is decreased by 1.773*10^-10
Model evaluation: AUC, Accuracy, TPR, FPR, TNR, FNR
We will check the predictions and confusion matrix for the Model 1
predictTrain = predict(model1_LR, type = "response")
train$target = round(predictTrain, 1)
mytable <- table(train$target_adelie, predictTrain > 0.5)
mytable##
## FALSE TRUE
## Adelie 117 0
## NotAdelie 0 150
## [1] 1
100 accuracy!!! Based on the Confusion Matrix, our results are as follows:
True Positive Result (TPR): 150 / 150 = 100% False Positive Result (FPR): 0 / 150 = 0% True Negative Result (TNR): 117 / 117 = 100% False Negative Result (FNR): 0 / 117 = 0%
predictTest = predict(model1_LR, type = "response", newdata = test)
test$target = round(predictTest, 1)
confusion_matrix <- table(test$target_adelie, predictTest > 0.5)
confusion_matrix##
## FALSE TRUE
## Adelie 29 0
## NotAdelie 0 37
tp = true positive tn = true negative fp = false positive fn = false negative
# TPR, TNR, FPR, and FNR
tp <- confusion_matrix[4]
tn <- confusion_matrix[1]
fp <- confusion_matrix[2]
fn <- confusion_matrix[3]
p <- sum(confusion_matrix[3], confusion_matrix[4])
n <- sum(confusion_matrix[1], confusion_matrix[2])
tpr <- tp / p
tnr <- tn / n
fpr <- fp / n
fnr <- fn / p
title_row <- c('(TPR)', '(TNR)', '(FPR)', '(FNR)')
values <- c(tpr, tnr, fpr, fnr)
results <- data.frame(title_row, values)
colnames(results) <- c('Measure', 'Value')
results## Measure Value
## 1 (TPR) 1
## 2 (TNR) 1
## 3 (FPR) 0
## 4 (FNR) 0
## [1] 1
95% of accuracy!
Based on accuracy and confusion matrix, model 1 seems a good fit.
Multinomial Logistic Regression
For multinomial logistic regression section I will be using our initial species variable. For that I will be getting data from the original dataset. Also will be using multinom() function in this case.
# newly taking data from the penguin dataset
data(penguins)
# Remove NAs
penguins <- penguins %>% na.omit()#Build multinomial logistic regression model
model1_MLR <- multinom(species~., data = penguins, model = TRUE)## # weights: 30 (18 variable)
## initial value 365.837892
## iter 10 value 38.413009
## iter 20 value 1.753187
## iter 30 value 0.032963
## final value 0.000004
## converged
## Start: AIC=36
## species ~ island + bill_length_mm + bill_depth_mm + flipper_length_mm +
## body_mass_g + sex + year
##
## # weights: 24 (14 variable)
## initial value 365.837892
## iter 10 value 39.684733
## iter 20 value 3.493800
## iter 30 value 0.041983
## iter 40 value 0.004550
## iter 50 value 0.002583
## iter 60 value 0.000942
## iter 70 value 0.000750
## iter 80 value 0.000166
## final value 0.000072
## converged
## # weights: 27 (16 variable)
## initial value 365.837892
## iter 10 value 107.613850
## iter 20 value 70.986260
## iter 30 value 70.288710
## iter 40 value 69.111777
## iter 50 value 68.116406
## final value 67.940910
## converged
## # weights: 27 (16 variable)
## initial value 365.837892
## iter 10 value 29.234711
## iter 20 value 2.653631
## iter 30 value 1.924314
## iter 40 value 0.144006
## iter 50 value 0.104312
## iter 60 value 0.015151
## iter 70 value 0.013014
## iter 80 value 0.011594
## iter 90 value 0.007703
## iter 100 value 0.005010
## final value 0.005010
## stopped after 100 iterations
## # weights: 27 (16 variable)
## initial value 365.837892
## iter 10 value 36.868693
## iter 20 value 1.102155
## iter 30 value 0.002861
## final value 0.000006
## converged
## # weights: 27 (16 variable)
## initial value 365.837892
## iter 10 value 14.861553
## iter 20 value 1.788791
## iter 30 value 0.460648
## iter 40 value 0.043136
## iter 50 value 0.020817
## iter 60 value 0.008312
## iter 70 value 0.006563
## iter 80 value 0.005939
## iter 90 value 0.002767
## iter 100 value 0.002125
## final value 0.002125
## stopped after 100 iterations
## # weights: 27 (16 variable)
## initial value 365.837892
## iter 10 value 38.605627
## iter 20 value 1.754644
## iter 30 value 0.004685
## iter 40 value 0.000130
## final value 0.000092
## converged
## # weights: 27 (16 variable)
## initial value 365.837892
## iter 10 value 14.312441
## iter 20 value 2.590536
## iter 30 value 0.589502
## iter 40 value 0.059470
## iter 50 value 0.014357
## iter 60 value 0.005840
## iter 70 value 0.003263
## iter 80 value 0.002841
## iter 90 value 0.002636
## iter 100 value 0.002421
## final value 0.002421
## stopped after 100 iterations
## Df AIC
## - island 4 28.000
## - flipper_length_mm 2 32.000
## - sex 2 32.000
## - body_mass_g 2 32.004
## - year 2 32.005
## - bill_depth_mm 2 32.010
## <none> 36.000
## - bill_length_mm 2 167.882
## # weights: 24 (14 variable)
## initial value 365.837892
## iter 10 value 39.684733
## iter 20 value 3.493800
## iter 30 value 0.041983
## iter 40 value 0.004550
## iter 50 value 0.002583
## iter 60 value 0.000942
## iter 70 value 0.000750
## iter 80 value 0.000166
## final value 0.000072
## converged
##
## Step: AIC=28
## species ~ bill_length_mm + bill_depth_mm + flipper_length_mm +
## body_mass_g + sex + year
##
## # weights: 21 (12 variable)
## initial value 365.837892
## iter 10 value 146.564554
## iter 20 value 113.998585
## iter 30 value 113.009733
## iter 40 value 112.969642
## iter 50 value 112.476681
## iter 60 value 111.106656
## iter 70 value 109.300764
## final value 107.543041
## converged
## # weights: 21 (12 variable)
## initial value 365.837892
## iter 10 value 36.356943
## iter 20 value 4.689074
## iter 30 value 3.216662
## iter 40 value 2.025031
## iter 50 value 1.937979
## iter 60 value 1.882920
## iter 70 value 1.636742
## iter 80 value 0.656217
## iter 90 value 0.488711
## iter 100 value 0.409244
## final value 0.409244
## stopped after 100 iterations
## # weights: 21 (12 variable)
## initial value 365.837892
## iter 10 value 29.688186
## iter 20 value 2.173176
## iter 30 value 0.018585
## iter 40 value 0.002157
## iter 50 value 0.001850
## iter 60 value 0.001671
## iter 70 value 0.000175
## iter 70 value 0.000092
## iter 70 value 0.000090
## final value 0.000090
## converged
## # weights: 21 (12 variable)
## initial value 365.837892
## iter 10 value 16.984156
## iter 20 value 3.900785
## iter 30 value 2.266374
## iter 40 value 0.537528
## iter 50 value 0.060681
## iter 60 value 0.031129
## iter 70 value 0.009115
## iter 80 value 0.004372
## iter 90 value 0.003783
## iter 100 value 0.003204
## final value 0.003204
## stopped after 100 iterations
## # weights: 21 (12 variable)
## initial value 365.837892
## iter 10 value 39.951831
## iter 20 value 4.346676
## iter 30 value 1.080490
## iter 40 value 0.395837
## iter 50 value 0.285150
## iter 60 value 0.184950
## iter 70 value 0.135562
## iter 80 value 0.018328
## iter 90 value 0.010902
## iter 100 value 0.009150
## final value 0.009150
## stopped after 100 iterations
## # weights: 21 (12 variable)
## initial value 365.837892
## iter 10 value 16.719027
## iter 20 value 3.047829
## iter 30 value 0.103484
## iter 40 value 0.063985
## iter 50 value 0.006201
## iter 60 value 0.003924
## iter 70 value 0.001733
## iter 80 value 0.001547
## iter 90 value 0.001513
## iter 100 value 0.001498
## final value 0.001498
## stopped after 100 iterations
## Df AIC
## - flipper_length_mm 2 24.000
## - year 2 24.003
## - body_mass_g 2 24.006
## - sex 2 24.018
## - bill_depth_mm 2 24.818
## <none> 28.000
## - bill_length_mm 2 239.086
## # weights: 21 (12 variable)
## initial value 365.837892
## iter 10 value 29.688186
## iter 20 value 2.173176
## iter 30 value 0.018585
## iter 40 value 0.002157
## iter 50 value 0.001850
## iter 60 value 0.001671
## iter 70 value 0.000175
## iter 70 value 0.000092
## iter 70 value 0.000090
## final value 0.000090
## converged
##
## Step: AIC=24
## species ~ bill_length_mm + bill_depth_mm + body_mass_g + sex +
## year
##
## # weights: 18 (10 variable)
## initial value 365.837892
## iter 10 value 145.597733
## iter 20 value 133.588157
## iter 30 value 133.557418
## iter 40 value 133.531903
## final value 133.358517
## converged
## # weights: 18 (10 variable)
## initial value 365.837892
## iter 10 value 29.102517
## iter 20 value 5.398473
## iter 30 value 4.687579
## iter 40 value 4.582359
## iter 50 value 4.561633
## iter 60 value 4.557843
## iter 70 value 4.557615
## iter 80 value 4.557541
## iter 90 value 4.556795
## iter 100 value 4.556677
## final value 4.556677
## stopped after 100 iterations
## # weights: 18 (10 variable)
## initial value 365.837892
## iter 10 value 9.505164
## iter 20 value 5.215222
## iter 30 value 2.122952
## iter 40 value 0.005871
## iter 50 value 0.000406
## iter 60 value 0.000290
## iter 70 value 0.000289
## iter 80 value 0.000289
## iter 90 value 0.000278
## iter 100 value 0.000275
## final value 0.000275
## stopped after 100 iterations
## # weights: 18 (10 variable)
## initial value 365.837892
## iter 10 value 32.702824
## iter 20 value 4.436647
## iter 30 value 1.747794
## iter 40 value 0.957199
## iter 50 value 0.208096
## iter 60 value 0.174845
## iter 70 value 0.164077
## iter 80 value 0.149767
## iter 90 value 0.142422
## iter 100 value 0.104917
## final value 0.104917
## stopped after 100 iterations
## # weights: 18 (10 variable)
## initial value 365.837892
## iter 10 value 25.561758
## iter 20 value 1.350538
## iter 30 value 0.065803
## iter 40 value 0.020499
## iter 50 value 0.016911
## iter 60 value 0.012712
## iter 70 value 0.005348
## iter 80 value 0.001290
## iter 90 value 0.000225
## iter 100 value 0.000216
## final value 0.000216
## stopped after 100 iterations
## Df AIC
## - year 2 20.000
## - body_mass_g 2 20.001
## - sex 2 20.210
## <none> 24.000
## - bill_depth_mm 2 29.113
## - bill_length_mm 2 286.717
## # weights: 18 (10 variable)
## initial value 365.837892
## iter 10 value 25.561758
## iter 20 value 1.350538
## iter 30 value 0.065803
## iter 40 value 0.020499
## iter 50 value 0.016911
## iter 60 value 0.012712
## iter 70 value 0.005348
## iter 80 value 0.001290
## iter 90 value 0.000225
## iter 100 value 0.000216
## final value 0.000216
## stopped after 100 iterations
##
## Step: AIC=20
## species ~ bill_length_mm + bill_depth_mm + body_mass_g + sex
##
## # weights: 15 (8 variable)
## initial value 365.837892
## iter 10 value 136.772749
## iter 20 value 133.782638
## iter 30 value 133.545483
## final value 133.538635
## converged
## # weights: 15 (8 variable)
## initial value 365.837892
## iter 10 value 30.179445
## iter 20 value 5.699489
## iter 30 value 5.175455
## iter 40 value 5.153196
## iter 50 value 5.070255
## iter 60 value 4.997789
## iter 70 value 4.893410
## iter 80 value 4.826209
## iter 90 value 4.708542
## iter 100 value 4.695168
## final value 4.695168
## stopped after 100 iterations
## # weights: 15 (8 variable)
## initial value 365.837892
## iter 10 value 11.953428
## iter 20 value 3.975600
## iter 30 value 2.172365
## iter 40 value 2.111553
## iter 50 value 1.881400
## iter 60 value 1.236767
## iter 70 value 1.133364
## iter 80 value 1.059253
## iter 90 value 0.588576
## iter 100 value 0.490499
## final value 0.490499
## stopped after 100 iterations
## # weights: 15 (8 variable)
## initial value 365.837892
## iter 10 value 20.098944
## iter 20 value 3.621850
## iter 30 value 1.160548
## iter 40 value 1.108738
## iter 50 value 1.063387
## iter 60 value 0.958674
## iter 70 value 0.926099
## iter 80 value 0.837719
## iter 90 value 0.797644
## iter 100 value 0.741759
## final value 0.741759
## stopped after 100 iterations
## Df AIC
## - body_mass_g 2 16.981
## - sex 2 17.484
## <none> 20.000
## - bill_depth_mm 2 25.390
## - bill_length_mm 2 283.077
## # weights: 15 (8 variable)
## initial value 365.837892
## iter 10 value 11.953428
## iter 20 value 3.975600
## iter 30 value 2.172365
## iter 40 value 2.111553
## iter 50 value 1.881400
## iter 60 value 1.236767
## iter 70 value 1.133364
## iter 80 value 1.059253
## iter 90 value 0.588576
## iter 100 value 0.490499
## final value 0.490499
## stopped after 100 iterations
##
## Step: AIC=16.98
## species ~ bill_length_mm + bill_depth_mm + sex
##
## # weights: 12 (6 variable)
## initial value 365.837892
## iter 10 value 150.109767
## iter 20 value 142.419807
## iter 30 value 142.047323
## final value 142.041293
## converged
## # weights: 12 (6 variable)
## initial value 365.837892
## iter 10 value 139.953898
## iter 20 value 133.697479
## iter 30 value 133.399000
## iter 40 value 133.326376
## final value 133.325373
## converged
## # weights: 12 (6 variable)
## initial value 365.837892
## iter 10 value 26.650783
## iter 20 value 23.943597
## iter 30 value 23.916873
## iter 40 value 23.901339
## iter 50 value 23.895442
## iter 60 value 23.894251
## final value 23.892065
## converged
## Df AIC
## <none> 16.981
## - sex 2 59.784
## - bill_depth_mm 2 278.651
## - bill_length_mm 2 296.083
## Call:
## multinom(formula = species ~ ., data = penguins, model = TRUE)
##
## Coefficients:
## (Intercept) islandDream islandTorgersen bill_length_mm bill_depth_mm
## Chinstrap -0.2496435 211.62912 97.64471 32.96624 -60.84017
## Gentoo 0.7633414 -68.84639 -140.79025 28.45770 -42.86214
## flipper_length_mm body_mass_g sexmale year
## Chinstrap 0.01515612 -0.0193027 -70.73544 -0.2242371
## Gentoo -2.79321661 0.1330873 -67.23109 -0.2391631
##
## Std. Errors:
## (Intercept) islandDream islandTorgersen bill_length_mm bill_depth_mm
## Chinstrap 0.02157370 0.02157370 6.871032e-16 0.2205894 0.1689163
## Gentoo 0.02114133 0.02114131 5.401394e-53 1.0993488 0.4376254
## flipper_length_mm body_mass_g sexmale year
## Chinstrap 4.937946 25.79546 0.02113835 42.20323
## Gentoo 4.439678 101.47836 0.02114132 42.45178
##
## Residual Deviance: 7.087421e-06
## AIC: 36.00001
## Call:
## multinom(formula = species ~ bill_length_mm + bill_depth_mm +
## sex, data = penguins, model = TRUE)
##
## Coefficients:
## (Intercept) bill_length_mm bill_depth_mm sexmale
## Chinstrap -242.9740 14.88938 -21.91411 -36.9051125
## Gentoo 121.0476 14.80645 -44.69711 -0.1301506
##
## Std. Errors:
## (Intercept) bill_length_mm bill_depth_mm sexmale
## Chinstrap 3.985819 9.369182 23.16685 23.33386
## Gentoo 4.071031 9.413971 23.17697 43.25185
##
## Residual Deviance: 0.9809986
## AIC: 16.981
Summary
When looking at the summary of model1_MLR, we can see that it has a high AIC value compared to the model2_MLR. Optimized AIC model has only three variables bill_length_mm, bill_depth_mm and sex.
Negative Coefficients of bill_depth_mm and sexmale in Chinstrap can be seen as negatively correlated with the species being Chinstraps. Also positive coefficient of bill_length_mm can be interpret as its being positively correlated with the species being Chinstrap. Chinstrap has a negative coefficient intercept which can be interpret as Chinstraps as a unpopular species.
Positive intercept of Gentoo can be represent Gentoo as one of the popular species. As in Chinstrap species, Gentoo also have a positively correlated bill_length_mm that is positively correlated with Gentoos.