A health company wants to understand the needs of its customers better. We want to help them predict two outcomes for given details of any US zipcode:
Revenue is linearly dependent on below variables and therefore can be estimated for any zipcode using below formula : \[Revenue = 31740 + 108.1*number_of_establishments + 0.08215*annual_payroll - 0.1162*male_population - 120.5*males_5_to_14_years + 59.21*females_15_to_44_years + 80.02*females_60_over_years - 124.9*males_under_5_years - 83.73*male_median_age \]
Using multinomial logistic regression, Popular product in any given zipcode can be predicted with an accuracy of 99% using below 6 predictors out of the 25 potential predictors:
Odds of popularity of different products can be calculated using formula:
\[ Log(P(Life)/P(College)) = -55.818868 + female_population(0.00003791523) - males_under_5_years(9.06659) - males_45_to_49_years(0.09553038) - females_45_to_49_years(0.0718931) + males_15_to_44_years(3.06368448) -0.09443131(male_median_age) \]
\[ Log (P(Retirement)/P(College)) = -7.114253 + female_population(0.00004654075) - males_under_5_years(8.943701) + males_45_to_49_years(9.49303753) - females_45_to_49_years(0.2191584) + males_15_to_44_years(0.02681199) - male_median_age(0.10628488) \]
First, we will create a master data set containing all variables in one table.
library("car") #contains vif function to check for multicolliearity amongst predictors
library("nnet") #contains multino function for multinomial regression
library("GGally") #Vizualize correlations
library("dplyr") #Data Manipulations
library("ggplot2") #Create Vizualizations
library("tidyr") #Transformation of data
library("leaps") #For best subset selection
library("purrr")
library("knitr")
library("kableExtra")
#Data import
setwd("C:/Users/Vivobook/Documents/Data/clarity")
geo_info <- read.csv("geo_info.csv")
census <- read.csv("census_train.csv", na.strings = "-")
dependent_variables <- read.csv("revenues_products.csv")
#creating a master table with independent and dependent variables in one
dependent_variables$Geographic.area.name <- as.character(dependent_variables$Geographic.area.name)
geo_info$Geographic.area.name <- as.character(geo_info$Geographic.area.name)
dependent_variables_2 <- left_join(dependent_variables, geo_info, by = c("Geographic.area.name" = "Geographic.area.name"))
master <- left_join(census, dependent_variables_2, by = "Id2")
Observation: In raw data missing numeric values were denoted by “-” placeholder. This has been changed to NA
Renaming and formatting data.
#Renaming columns
names(master) <- c("id", "total_population", "male_population", "female_population", "males_under_5_years", "females_under_5_years", "males_45_to_49_years", "females_45_to_49_years", "males_50_to_54_years", "females_50_to_54_years", "males_55_to_59_years", "females_55_to_59_years", "males_5_to_14_years", "females_5_to_14_years", "males_15_to_44_years", "females_15_to_44_years", "males_60_over_years", "females_60_over_years", "male_median_age", "female_median_age", "number_of_establishments", "number_of_paid_employees", "first_quarter_payroll", "annual_payroll", "Geographic.area.name", "revenue", "pop_product" )
#Correcting format
master$id <- as.numeric(master$id)
master$males_under_5_years <- as.numeric(master$males_under_5_years)
master$females_under_5_years <- as.numeric(master$females_under_5_years)
master$males_45_to_49_years <- as.numeric(master$males_45_to_49_years)
master$females_45_to_49_years <- as.numeric(master$females_45_to_49_years)
master$males_50_to_54_years <- as.numeric(master$males_50_to_54_years)
master$females_50_to_54_years <- as.numeric(master$females_50_to_54_years)
master$males_55_to_59_years <- as.numeric(master$males_55_to_59_years)
master$females_55_to_59_years <- as.numeric(master$females_55_to_59_years)
master$males_5_to_14_years <- as.numeric(master$males_5_to_14_years)
master$females_5_to_14_years <- as.numeric(master$females_5_to_14_years)
master$males_15_to_44_years <- as.numeric(master$males_15_to_44_years)
master$females_15_to_44_years <- as.numeric(master$females_15_to_44_years)
master$males_60_over_years <- as.numeric(master$males_60_over_years)
master$females_60_over_years <- as.numeric(master$females_60_over_years)
master$male_median_age <- as.numeric(master$male_median_age)
master$female_median_age <- as.numeric(master$female_median_age)
| Variable | Description |
|---|---|
| id | Area Zipcode |
| males_under_5_years | Proportion of males under 5 years of age |
| males_50_to_54_years | Proportion of males between 50 to 54 years |
| males_5_to_14_years | Proportion of males between 5 to 14 years |
| males_60_over_years | Proportion of males over 60 years |
| number_of_establishments | number of establishments in the area |
| Geographic.area.name | Area name along with zipcode |
| total_population | total_population of zipcode |
| females_under_5_years | Area Zipcode |
| females_50_to_54_years | Proportion of females under 5 years of age |
| females_5_to_14_years | Proportion of females between 50 to 54 years |
| females_60_over_years | Proportion of females between 5 to 14 years |
| number_of_paid_employees | Proportion of females over 60 years |
| revenue | Revenue generated. Dependent Variable |
| male_population | Total males population |
| males_45_to_49_years | Proportion of males between 45 and 49 years of age |
| males_55_to_59_years | Proportion of males between 55 and 59 years of age |
| males_15_to_44_years | Proportion of males between 15 and 44 years of age |
| male_median_age | Male median age |
| first_quarter_payroll | Income in first quarter of the year |
| pop_product | Popular product in zipcode. Categorical dependent variable |
| female_population | total female population of area |
| females_45_to_49_years | Proportion of females between 45 and 49 years of age |
| females_55_to_59_years | Proportion of females between 55 and 59 years of age |
| females_15_to_44_years | Proportion of females between 15 and 44 years of age |
| female_median_age | female median age of zipcode |
| annual_payroll | Average annual income of zipcode |
anyDuplicated(master)
## [1] 0
Observation: No duplicate rows in data
sapply(master, function(x) {sum(is.na(x))})
## id total_population male_population
## 0 0 0
## female_population males_under_5_years females_under_5_years
## 0 47 51
## males_45_to_49_years females_45_to_49_years males_50_to_54_years
## 47 51 47
## females_50_to_54_years males_55_to_59_years females_55_to_59_years
## 51 47 51
## males_5_to_14_years females_5_to_14_years males_15_to_44_years
## 47 51 47
## females_15_to_44_years males_60_over_years females_60_over_years
## 51 47 51
## male_median_age female_median_age number_of_establishments
## 91 99 0
## number_of_paid_employees first_quarter_payroll annual_payroll
## 0 0 0
## Geographic.area.name revenue pop_product
## 0 0 0
Observation: Most of the variables have missing values in them. Lets see how these missing values are distributed across rows:
missing_cols <- NULL
for(i in 1:nrow(master)){
count1 <- 0
for(j in 1:length(master)){
count1 <- count1 + is.na(master[i,j])
}
missing_cols[i] <- count1
}
abc<- as.data.frame(cbind(master$id, missing_cols))
names(abc)[1] <- "id"
abc <- abc %>%
filter(missing_cols != 0) %>%
arrange(-missing_cols)
kable(abc[1:20,]) %>%
kable_styling(bootstrap_options = c("striped", "hover", "responsive","bordered")) %>%
scroll_box(width = "100%", height = "500px")
| id | missing_cols |
|---|---|
| 28019 | 16 |
| 40363 | 16 |
| 11424 | 16 |
| 13138 | 16 |
| 10153 | 16 |
| 93943 | 16 |
| 20240 | 16 |
| 38132 | 16 |
| 83826 | 16 |
| 99164 | 16 |
| 48397 | 16 |
| 98158 | 16 |
| 10119 | 16 |
| 5740 | 16 |
| 19109 | 16 |
| 93064 | 16 |
| 19736 | 16 |
| 10199 | 16 |
| 23250 | 16 |
| 66115 | 16 |
Observations:
census_data <- anti_join(master, abc, by = "id")
nrow(census_data)
## [1] 6731
kable(round(cbind(
"Min" = apply(census_data[,-c(1,25,27)], 2, min),
"Q1" = apply(census_data[,-c(1,25,27)], 2, quantile)[1,],
"Q2" = apply(census_data[,-c(1,25,27)], 2, quantile)[2,],
"Mean" = apply(census_data[,-c(1,25,27)], 2, mean),
"Median" = apply(census_data[,-c(1,25,27)], 2, median),
"Q3" = apply(census_data[,-c(1,25,27)], 2, quantile)[3,],
"Q4" = apply(census_data[,-c(1,25,27)], 2, quantile)[4,],
"Max" = apply(census_data[,-c(1,25,27)], 2, max)
),2))
| Min | Q1 | Q2 | Mean | Median | Q3 | Q4 | Max | |
|---|---|---|---|---|---|---|---|---|
| total_population | 22.0 | 22.0 | 1143.5 | 10969.77 | 3797.0 | 3797.0 | 15875.0 | 112709.0 |
| male_population | 11.0 | 11.0 | 580.5 | 5398.79 | 1904.0 | 1904.0 | 7807.0 | 60595.0 |
| female_population | 11.0 | 11.0 | 560.0 | 5570.98 | 1885.0 | 1885.0 | 8039.0 | 52550.0 |
| males_under_5_years | 0.0 | 0.0 | 3.8 | 5.87 | 5.7 | 5.7 | 7.5 | 47.4 |
| females_under_5_years | 0.0 | 0.0 | 3.6 | 5.57 | 5.4 | 5.4 | 7.2 | 47.0 |
| males_45_to_49_years | 0.0 | 0.0 | 5.1 | 6.77 | 6.7 | 6.7 | 8.2 | 55.2 |
| females_45_to_49_years | 0.0 | 0.0 | 5.1 | 6.68 | 6.5 | 6.5 | 8.0 | 50.0 |
| males_50_to_54_years | 0.0 | 0.0 | 5.7 | 7.55 | 7.3 | 7.3 | 8.9 | 52.2 |
| females_50_to_54_years | 0.0 | 0.0 | 5.8 | 7.62 | 7.3 | 7.3 | 8.9 | 83.9 |
| males_55_to_59_years | 0.0 | 0.0 | 5.5 | 7.42 | 7.0 | 7.0 | 8.8 | 72.7 |
| females_55_to_59_years | 0.0 | 0.0 | 5.6 | 7.55 | 7.1 | 7.1 | 8.8 | 80.3 |
| males_5_to_14_years | 0.0 | 0.0 | 9.9 | 12.74 | 12.9 | 12.9 | 15.6 | 57.1 |
| females_5_to_14_years | 0.0 | 0.0 | 9.3 | 12.08 | 12.1 | 12.1 | 14.7 | 62.1 |
| males_15_to_44_years | 0.0 | 0.0 | 31.6 | 37.01 | 36.7 | 36.7 | 41.8 | 100.0 |
| females_15_to_44_years | 0.0 | 0.0 | 30.4 | 35.31 | 35.1 | 35.1 | 39.7 | 100.0 |
| males_60_over_years | 0.0 | 0.0 | 16.5 | 22.64 | 21.0 | 21.0 | 26.5 | 100.0 |
| females_60_over_years | 0.0 | 0.0 | 19.1 | 25.19 | 24.0 | 24.0 | 29.4 | 95.7 |
| male_median_age | 6.7 | 6.7 | 35.3 | 40.58 | 40.1 | 40.1 | 45.3 | 83.9 |
| female_median_age | 6.9 | 6.9 | 37.2 | 42.44 | 42.3 | 42.3 | 47.3 | 84.2 |
| number_of_establishments | 3.0 | 3.0 | 17.0 | 261.80 | 68.0 | 68.0 | 346.0 | 6723.0 |
| number_of_paid_employees | 1.0 | 1.0 | 103.0 | 4014.39 | 655.0 | 655.0 | 4434.5 | 163895.0 |
| first_quarter_payroll | 1.0 | 1.0 | 735.5 | 51145.04 | 5414.0 | 5414.0 | 39540.0 | 8176054.0 |
| annual_payroll | 19.0 | 19.0 | 3338.0 | 201949.57 | 23499.0 | 23499.0 | 168815.5 | 22165514.0 |
| revenue | 14098.0 | 14098.0 | 32904.0 | 74537.95 | 41287.0 | 41287.0 | 82411.0 | 2686772.0 |
Observations
census_data[,-c(1, 25, 27)] %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) + facet_wrap(~key, scales = "free") + geom_density()
Observations
census_data[,-c(1, 25, 27)] %>%
gather(-revenue, key = "var", value = "value") %>%
ggplot(aes(x = value, y = revenue)) +
geom_point() +
geom_smooth() +
facet_wrap(~ var, scales = "free") +
theme_bw()
Observation: All variables are follow approximately linear trend with respect to the dependent variable so we will predict revenue using linear regression modeling
M <- round(cor(census_data[,-c(1, 25,27)]),1)
ggcorr(M, nbreaks=8, palette='RdGy', label=TRUE, label_color='white')
Observations:
Removing unused levels
census_data$pop_product <- droplevels(census_data$pop_product)
barplot(table(census_data$pop_product))
Observation: Dependent variable is almost equally distributed between the 3 factors.
census_data[,-c(1,25,26)] %>%
gather(-pop_product, key = "var", value = "value") %>%
ggplot(aes(x = value, fill = pop_product)) + facet_wrap(~var, scales = "free") + geom_density()
Observations:
set.seed(100)
index <- sample(nrow(census_data), nrow(census_data)*0.8)
regression.train <- census_data[index,-c(1,25,27)]
regression.test <- census_data[-index,-c(1,25,27)]
We will use stepwise selection for feature selection. We will use CP as the selection criteria since it tends to favor fewer variables and is less likely to overfit our data with so may predictors having a high correlation.
nullmodel=lm(revenue~1, data=regression.train)
fullmodel=lm(revenue ~ ., data=regression.train)
model.regression <- step(nullmodel, scope=list(lower=nullmodel, upper=fullmodel),
direction = c("forward"), select = "CP", trace = FALSE)
summary(model.regression)
##
## Call:
## lm(formula = revenue ~ number_of_paid_employees + first_quarter_payroll +
## number_of_establishments + annual_payroll + male_population +
## males_5_to_14_years + females_15_to_44_years + females_60_over_years +
## males_15_to_44_years + males_60_over_years + males_under_5_years +
## male_median_age + total_population, data = regression.train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -102215 -3558 126 3619 130942
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.765e+04 2.666e+03 10.371 < 2e-16 ***
## number_of_paid_employees 4.764e-01 5.002e-02 9.524 < 2e-16 ***
## first_quarter_payroll 4.471e-02 3.325e-03 13.449 < 2e-16 ***
## number_of_establishments 1.068e+02 6.733e-01 158.628 < 2e-16 ***
## annual_payroll 6.723e-02 1.298e-03 51.790 < 2e-16 ***
## male_population -7.259e-01 2.325e-01 -3.122 0.001803 **
## males_5_to_14_years -1.026e+02 3.111e+01 -3.297 0.000982 ***
## females_15_to_44_years 8.776e+01 1.666e+01 5.269 1.43e-07 ***
## females_60_over_years 6.967e+01 1.654e+01 4.213 2.56e-05 ***
## males_15_to_44_years 4.229e+01 2.539e+01 1.665 0.095899 .
## males_60_over_years 6.076e+01 2.042e+01 2.976 0.002938 **
## males_under_5_years -1.175e+02 3.844e+01 -3.056 0.002252 **
## male_median_age -7.816e+01 3.394e+01 -2.303 0.021318 *
## total_population 2.212e-01 1.142e-01 1.937 0.052793 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7019 on 5370 degrees of freedom
## Multiple R-squared: 0.9946, Adjusted R-squared: 0.9946
## F-statistic: 7.56e+04 on 13 and 5370 DF, p-value: < 2.2e-16
Observations:
par(mfrow = c(2,2))
plot(model.regression)
Observations:
Let’s have a closer look at outliers:
par(mfrow = c(1,1))
plot(model.regression,4)
A general rule of thumb is that observations with a Cook’s Distance of more than 3 times the mean cooks distance are outliers
cooksd <- cooks.distance(model.regression)
cooks.data <- regression.train[which(cooksd < 3*mean(cooksd)),]
vif(model.regression)
## number_of_paid_employees first_quarter_payroll number_of_establishments
## 18.445215 51.067082 8.910991
## annual_payroll male_population males_5_to_14_years
## 82.176014 311.389271 3.081838
## females_15_to_44_years females_60_over_years males_15_to_44_years
## 3.135534 3.262169 7.991069
## males_60_over_years males_under_5_years male_median_age
## 5.081531 2.093824 8.870817
## total_population
## 311.712791
Observation: Some of predictors have multicollinearity so we will remove them
model.regression <- lm(revenue ~ number_of_establishments +
annual_payroll +
male_population +
males_5_to_14_years +
females_15_to_44_years +
females_60_over_years +
males_under_5_years +
male_median_age,
data = cooks.data)
summary(model.regression)
##
## Call:
## lm(formula = revenue ~ number_of_establishments + annual_payroll +
## male_population + males_5_to_14_years + females_15_to_44_years +
## females_60_over_years + males_under_5_years + male_median_age,
## data = cooks.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34773 -3562 71 3527 44962
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.174e+04 1.204e+03 26.367 < 2e-16 ***
## number_of_establishments 1.081e+02 5.058e-01 213.726 < 2e-16 ***
## annual_payroll 8.215e-02 3.598e-04 228.313 < 2e-16 ***
## male_population -1.162e-01 1.999e-02 -5.813 6.49e-09 ***
## males_5_to_14_years -1.205e+02 1.729e+01 -6.973 3.49e-12 ***
## females_15_to_44_years 5.921e+01 1.279e+01 4.629 3.75e-06 ***
## females_60_over_years 8.002e+01 1.199e+01 6.672 2.78e-11 ***
## males_under_5_years -1.249e+02 2.479e+01 -5.039 4.83e-07 ***
## male_median_age -8.373e+01 1.697e+01 -4.934 8.30e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5758 on 5347 degrees of freedom
## Multiple R-squared: 0.9931, Adjusted R-squared: 0.9931
## F-statistic: 9.592e+04 on 8 and 5347 DF, p-value: < 2.2e-16
par(mfrow = c(2,2))
plot(model.regression)
Observations:
#predicted values for test data
test.pred.regression <- predict(model.regression, newdata = regression.test)
# Train RMSE
sqrt(mean((model.regression$fitted.values - cooks.data$revenue)^2))
## [1] 5753.413
# Test RMSE
sqrt(mean((test.pred.regression - regression.test$revenue)^2))
## [1] 6298.322
Observations: Since the test and train RMSE are close, the model does not overfit.
summary(model.regression)
##
## Call:
## lm(formula = revenue ~ number_of_establishments + annual_payroll +
## male_population + males_5_to_14_years + females_15_to_44_years +
## females_60_over_years + males_under_5_years + male_median_age,
## data = cooks.data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34773 -3562 71 3527 44962
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.174e+04 1.204e+03 26.367 < 2e-16 ***
## number_of_establishments 1.081e+02 5.058e-01 213.726 < 2e-16 ***
## annual_payroll 8.215e-02 3.598e-04 228.313 < 2e-16 ***
## male_population -1.162e-01 1.999e-02 -5.813 6.49e-09 ***
## males_5_to_14_years -1.205e+02 1.729e+01 -6.973 3.49e-12 ***
## females_15_to_44_years 5.921e+01 1.279e+01 4.629 3.75e-06 ***
## females_60_over_years 8.002e+01 1.199e+01 6.672 2.78e-11 ***
## males_under_5_years -1.249e+02 2.479e+01 -5.039 4.83e-07 ***
## male_median_age -8.373e+01 1.697e+01 -4.934 8.30e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5758 on 5347 degrees of freedom
## Multiple R-squared: 0.9931, Adjusted R-squared: 0.9931
## F-statistic: 9.592e+04 on 8 and 5347 DF, p-value: < 2.2e-16
Interpretation
classification.train <- census_data[index,-c(1,25,26)]
classification.test <- census_data[-index,-c(1,25,26)]
multinom.model <- multinom(pop_product~. ,
data = classification.train)
step.model <- step(multinom.model, trace = FALSE)
summary(step.model)
Observations: Below variables are selected in our model:
Conducting a 2-tailed z test to measure significance of coefficients
z <- summary(step.model)$coefficients/summary(step.model)$standard.errors
(1-pnorm(abs(z),0,1))*2
## (Intercept) total_population male_population female_population
## life 0 0.3596849 0.1518196 0.1363326
## retirement 0 0.3346846 0.1945903 0.1749857
## males_under_5_years males_45_to_49_years females_45_to_49_years
## life 0 0 0
## retirement 0 0 0
## males_15_to_44_years male_median_age
## life 0.00000000 0
## retirement 0.01962579 0
Observation: some predictors do have significant effect on odds of either “life” or “retirement” with respect to “college” so we will remove some of them.
## # weights: 24 (14 variable)
## initial value 5914.928562
## iter 10 value 2799.583413
## iter 20 value 572.241697
## iter 30 value 338.922701
## iter 40 value 331.651033
## iter 50 value 329.990961
## final value 329.982646
## converged
## Call:
## multinom(formula = pop_product ~ female_population + males_under_5_years +
## males_45_to_49_years + females_45_to_49_years + males_15_to_44_years +
## male_median_age, data = classification.train)
##
## Coefficients:
## (Intercept) female_population males_under_5_years
## life -55.818868 3.791523e-05 -9.066590
## retirement -7.114253 4.654075e-05 -8.943701
## males_45_to_49_years females_45_to_49_years
## life -0.09553038 -0.0718931
## retirement 9.49303753 -0.2191584
## males_15_to_44_years male_median_age
## life 3.06368448 -0.09443131
## retirement 0.02681199 -0.10628488
##
## Std. Errors:
## (Intercept) female_population males_under_5_years
## life 0.0003247091 1.779576e-05 0.02054906
## retirement 0.0005119870 2.090553e-05 0.03809443
## males_45_to_49_years females_45_to_49_years
## life 0.01512256 0.03729657
## retirement 0.01570786 0.02652953
## males_15_to_44_years male_median_age
## life 0.01318473 0.01225210
## retirement 0.01523920 0.01047576
##
## Residual Deviance: 659.9653
## AIC: 687.9653
Observations:
#Confusion matrix for training data:
table(classification.train$pop_product, predict(step.model))
##
## college life retirement
## college 1758 15 18
## life 5 1666 8
## retirement 20 8 1886
#Confusion matrix for training data:
table(classification.test$pop_product, predict(step.model, newdata = classification.test))
##
## college life retirement
## college 446 1 2
## life 2 428 0
## retirement 3 4 461
Observation: As per the conusion matrix, in case of both training and testing data prediction is almost accurate with very few misclassifications
#Missclassification rate of training data
mean(classification.train$pop_product != predict(step.model))
## [1] 0.01374443
#Missclassification rate of testing data
mean(classification.test$pop_product != predict(step.model, newdata = classification.test))
## [1] 0.008908686
Observation The model has a very high accuracy (1- Misclassification rate) of close to 99%
all_data <- census_data[-index,-c(1,25,26)]
abc <- data.frame(female_population = all_data$female_population,
males_under_5_years = all_data$males_under_5_years,
males_45_to_49_years = all_data$males_45_to_49_years,
females_45_to_49_years = all_data$females_45_to_49_years,
males_15_to_44_years = all_data$males_15_to_44_years,
male_median_age = all_data$male_median_age,
predict(step.model, newdata = all_data, type = "probs"))
abc %>%
gather(c(college, life, retirement), key = "pop_product", value = "prob") %>%
gather(-c(pop_product,prob), key = "var", value = "value") %>%
ggplot(aes(x = value, y = prob, color = pop_product)) + facet_wrap(~var, scales = "free") +
geom_smooth(se = FALSE) + geom_point() + theme_minimal()
summary(step.model)
## Call:
## multinom(formula = pop_product ~ female_population + males_under_5_years +
## males_45_to_49_years + females_45_to_49_years + males_15_to_44_years +
## male_median_age, data = classification.train)
##
## Coefficients:
## (Intercept) female_population males_under_5_years
## life -55.818868 3.791523e-05 -9.066590
## retirement -7.114253 4.654075e-05 -8.943701
## males_45_to_49_years females_45_to_49_years
## life -0.09553038 -0.0718931
## retirement 9.49303753 -0.2191584
## males_15_to_44_years male_median_age
## life 3.06368448 -0.09443131
## retirement 0.02681199 -0.10628488
##
## Std. Errors:
## (Intercept) female_population males_under_5_years
## life 0.0003247091 1.779576e-05 0.02054906
## retirement 0.0005119870 2.090553e-05 0.03809443
## males_45_to_49_years females_45_to_49_years
## life 0.01512256 0.03729657
## retirement 0.01570786 0.02652953
## males_15_to_44_years male_median_age
## life 0.01318473 0.01225210
## retirement 0.01523920 0.01047576
##
## Residual Deviance: 659.9653
## AIC: 687.9653
Observations: