We are to embark on a study of occupational importance deploying data on the subject from some time back in Canada. Some might call it prestige. The outcome of interest is the importance attached to the occupation in a broad sample of Canadian individuals. The data are contained in Occupational.Importance in the Markdown below. To make the data active in the RStudio at any point, simply press play to the code chunk below.
## 'data.frame': 102 obs. of 7 variables:
## $ title : chr "GOV.ADMINISTRATORS" "GENERAL.MANAGERS" "ACCOUNTANTS" "PURCHASING.OFFICERS" ...
## $ education : num 13.1 12.3 12.8 11.4 14.6 ...
## $ income : num 12351 25879 9271 8865 8403 ...
## $ women : num 11.16 4.02 15.7 9.11 11.68 ...
## $ importance: num 68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
## $ census : num 1113 1130 1171 1175 2111 ...
## $ type : chr "prof" "prof" "prof" "prof" ...
The outcome:
Importance: The average importance rating for an occupation from a broad national survey.
The predictors:
* education: The average number of years of education for occupational incumbents.
* income: The average income of occupational incumbents, in dollars.
* women: The percentage of women in the occupation.
* census: The code of the occupation used in the survey (an ID variable).
* type: Professional and managerial(prof), white collar(wc), blue collar(bc), or other
* type: the factor version of type
What explains occupational importance as considered by society at large?
We are particularly interested in education, income, and the gender distribution.
qbinom(c(0.1,0.9), 101, 0.15)
## [1] 11 20
1-pbinom(22, 101, 0.15)
## [1] 0.02471246
pbinom(13, 101, 0.15)
## [1] 0.3324065
ppois(0,0.7)
## [1] 0.4965853
1-ppois(0,0.7)
## [1] 0.5034147
qpois(0.85,0.7)
## [1] 2
pnorm(500, 435, 40, lower.tail = FALSE)
## [1] 0.05208128
qnorm(c(0.25,0.75), 435, 40)
## [1] 408.0204 461.9796
The true answer depends on the sample size. We normally use the t distribution but a t with infinite degrees of freedom is normal. The key thing to note is two things. First, the \(t\) distribution of the mean is defined on \(n-1\) degrees of freedom. And the standard error of the mean is \(\frac{s}{\sqrt{n}}\). Both parts depend on n. So let’s sample from a normal and calculate this; any example will do. The analytical solution is that it will be normal with \(\mu=435\) and \(\sigma=\frac{\sigma}{\sqrt{n}}\). The great irony of the whole thing is that, in the limit, any interval about the mean converges to a constant because \(\lim_{n\rightarrow \infty} \frac{s}{\sqrt{n}} = 0\) for any finite standard deviation \(s\).
t.test(rnorm(100, 435, 40))
##
## One Sample t-test
##
## data: rnorm(100, 435, 40)
## t = 113.84, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 425.0026 440.0815
## sample estimates:
## mean of x
## 432.542
t.test(rnorm(1000, 435, 40))
##
## One Sample t-test
##
## data: rnorm(1000, 435, 40)
## t = 335.87, df = 999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 431.0425 436.1089
## sample estimates:
## mean of x
## 433.5757
t.test(rnorm(10000, 435, 40))
##
## One Sample t-test
##
## data: rnorm(10000, 435, 40)
## t = 1085.1, df = 9999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 433.9898 435.5606
## sample estimates:
## mean of x
## 434.7752
qf(0.95, 2, 77)
## [1] 3.115366
For context on the data, the variable names in Occupational.Importance are shown below.
names(Occupational.Importance)
## [1] "title" "education" "income" "women" "importance"
## [6] "census" "type"
library(ggplot2)
ggplot(data = Occupational.Importance) +
aes(x = importance) +
geom_density(adjust = 1, fill = "#0c4c8a") +
theme_minimal()
A density plot shows the data to have a bit of right skew but symmetry is not out of the question. To preview a brief focus on types, I will show the boxplots.
ggplot(data = Occupational.Importance) +
aes(x = type, y = importance, fill = type) +
geom_boxplot() +
theme_minimal()
OCI <- Occupational.Importance
library(car)
## Loading required package: carData
densityPlot(OCI$importance)
qqPlot(OCI$importance)
## [1] 24 21
# The qqplot shows some skewness. Is it normal by the Shapiro test?
shapiro.test(OCI$importance)
##
## Shapiro-Wilk normality test
##
## data: OCI$importance
## W = 0.97198, p-value = 0.02875
# Nope.
It is fairly symmetric. The mean and median are somewhat close to one another though the mean is bigger than the median; it is not much bigger. You could argue for either.
library(tidyverse)
library(skimr)
OCI %>% skim(importance)
## Skim summary statistics
## n obs: 102
## n variables: 7
##
## ── Variable type:numeric ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## variable missing complete n mean sd p0 p25 p50 p75 p100
## importance 0 102 102 46.83 17.2 14.8 35.23 43.6 59.28 87.2
## hist
## ▂▅▇▆▅▃▃▁
OCI$title[OCI$importance==max(OCI$importance)]
## [1] "PHYSICIANS"
OCI$title[OCI$importance==min(OCI$importance)]
## [1] "NEWSBOYS"
t.test(OCI$importance)
##
## One Sample t-test
##
## data: OCI$importance
## t = 27.492, df = 101, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 43.45405 50.21261
## sample estimates:
## mean of x
## 46.83333
With 95% confidence, average importance ranges from _43.4540517__ to 50.212615 .
What is the mean and standard deviation OF IMPORTANCE for each type of occupation?
OCI %>% group_by(type) %>% skim(importance) %>% filter(stat%in%c("mean","sd"))
## # A tibble: 8 x 6
## variable type stat level value formatted
## <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 importance numeric mean .all 35.5 35.53
## 2 importance numeric sd .all 10.0 10.02
## 3 importance numeric mean .all 34.7 34.73
## 4 importance numeric sd .all 17.7 17.68
## 5 importance numeric mean .all 67.8 67.85
## 6 importance numeric sd .all 8.68 8.68
## 7 importance numeric mean .all 42.2 42.24
## 8 importance numeric sd .all 9.52 9.52
OCI %>% skim(income, education, women) %>% filter(stat%in%c("mean","sd"))
## # A tibble: 6 x 6
## variable type stat level value formatted
## <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 income numeric mean .all 6798. 6797.9
## 2 income numeric sd .all 4246. 4245.92
## 3 education numeric mean .all 10.7 10.74
## 4 education numeric sd .all 2.73 2.73
## 5 women numeric mean .all 29.0 28.98
## 6 women numeric sd .all 31.7 31.72
WC <- OCI[OCI$type=="wc",]
Prof <- OCI[OCI$type=="prof",]
Are they different? TRUE
t.test(WC$education,Prof$education)$conf.int
## [1] -3.696390 -2.428519
## attr(,"conf.level")
## [1] 0.95
Are they different? TRUE
t.test(WC$income,Prof$income)
##
## Welch Two Sample t-test
##
## data: WC$income and Prof$income
## t = -5.2202, df = 39.673, p-value = 5.977e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7639.884 -3374.411
## sample estimates:
## mean of x mean of y
## 5052.304 10559.452
plot(x=OCI$education, y=OCI$income)
abline(lm(income~education, data=OCI), col="red")
summary(lm(income~education, data=OCI))
##
## Call:
## lm(formula = income ~ education, data = OCI)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5493.2 -2433.8 -41.9 1491.5 17713.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2853.6 1407.0 -2.028 0.0452 *
## education 898.8 127.0 7.075 2.08e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3483 on 100 degrees of freedom
## Multiple R-squared: 0.3336, Adjusted R-squared: 0.3269
## F-statistic: 50.06 on 1 and 100 DF, p-value: 2.079e-10
Yes. The probability of no relationship is basically zero and 0.334 as a proportion of the variance is explained.
confint(lm(income~education, data=OCI))
## 2.5 % 97.5 %
## (Intercept) -5645.1114 -62.05979
## education 646.7782 1150.84748
Each additional year of education is worth between 646.78 and 1150.85 dollars in income with 95% confidence.
plot(x=OCI$education, y=residuals(lm(income~education, data=OCI)))
Line.educ <- lm(income~education, data=OCI)
Poly.educ <- lm(income~poly(education, 2, raw=TRUE), data=OCI)
anova(Line.educ,Poly.educ)
## Analysis of Variance Table
##
## Model 1: income ~ education
## Model 2: income ~ poly(education, 2, raw = TRUE)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 100 1213392025
## 2 99 1123363857 1 90028168 7.934 0.005857 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The curve fits better. Why?
library(effects)
## Registered S3 methods overwritten by 'lme4':
## method from
## cooks.distance.influence.merMod car
## influence.merMod car
## dfbeta.influence.merMod car
## dfbetas.influence.merMod car
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
plot(allEffects(Poly.educ, partial.residuals=TRUE))
It appears to have a rate of return that is reasonably flat until secondary school and the rate increases more dramatically thereafter. This is consistent with a screening/signaling theory of education.
We will now turn to the multiple regression portion of the inquiry.
The following set of questions utilize the multiple regression below. I have embedded the object as OIM and the syntax appears below.
OIM <- lm(importance~education+income+women+type, data=Occupational.Importance)
Occupational.Importance$resids <- Occupational.Importance$importance - predict(OIM, newdata=Occupational.Importance)
summary(OIM)
##
## Call:
## lm(formula = importance ~ education + income + women + type,
## data = Occupational.Importance)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.9325 -4.7107 0.0764 5.4033 19.1735
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.6465650 5.4281200 0.303 0.7623
## education 3.2965572 0.6601724 4.993 2.69e-06 ***
## income 0.0011606 0.0002737 4.241 5.17e-05 ***
## women 0.0045517 0.0309262 0.147 0.8833
## typeother -1.7242964 4.0468736 -0.426 0.6710
## typeprof 7.4007262 4.0337748 1.835 0.0697 .
## typewc -1.8411949 2.7586817 -0.667 0.5061
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.551 on 95 degrees of freedom
## Multiple R-squared: 0.8188, Adjusted R-squared: 0.8074
## F-statistic: 71.56 on 6 and 95 DF, p-value: < 2.2e-16
OCI <- Occupational.Importance
Importance = 1.65 [blue collar] + 3.297 per year of education + 0.001 per CA dollar of income + 0.005 per percent women + 7.401[if Professional] - 1.84[if White Collar] + \epsilon
The education slope measures change in importance per additional year of education. The income slope measures change in importance per dollar of income.
The women slope measures changes in importance per additional percent of women in the job. The three slopes for type measure the difference between importance for Professional, Other, and White Collar types [vis a vis Blue Collar].
A better answer actually thinks about how much is explained by the type, taken as a whole. The anova table of the regression model gives us some information.
anova(OIM)
## Analysis of Variance Table
##
## Response: importance
## Df Sum Sq Mean Sq F value Pr(>F)
## education 1 21608.4 21608.4 378.9917 < 2.2e-16 ***
## income 1 2248.1 2248.1 39.4303 1.015e-08 ***
## women 1 5.3 5.3 0.0926 0.76154
## type 3 617.1 205.7 3.6077 0.01616 *
## Residuals 95 5416.5 57.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that the type matters in the sense that it accounts variation. You have some indication that this is true from the pivot table or other summary of importance by type.
From the t-tests in the table this is true. Though the analysis of variance makes this question a bit more complicated.
The metric of the residual standard error is percentage points in importance. It measures the degrees of freedom averaged standard deviation of the residual – the distance between the fitted and actual values. We can compare it to the standard deviation of the data to see how much error is reduced in explaining importance with these factors. Note, the original standard error of importance was 17.2044856.
R-squared. 0.8188191 The proportion of variance.
The F statistic answers the question, does the model explain more than random variation. What is the p-value for this statistic from the above table? 0
Let’s turn to education. As the number of years of education that an incumbent in the job increases by one year, importance/prestige increases by 3.29 points. With 95% confidence, this effect ranges from 1.986 to 4.607.
What occupation has the most prestige/importance compared to the expected/predicted/fitted value?
MEDICAL.TECHNICIANS
NEWSBOYS
quantile(OCI$resid, probs=c(0.25,0.75), na.rm=TRUE)
## 25% 75%
## -4.710664 5.403303
IQR(OCI$resids, na.rm=TRUE)
## [1] 10.11397
The 1st quartile is -4.71 and the 3rd quartile is 5.40. So the IQR is 5.40–4.71=10.11.
library(car)
densityPlot(OIM$residuals)
It is plausible.
qqPlot(OIM$residuals)
## [1] 31 53
shapiro.test(OIM$residuals)
##
## Shapiro-Wilk normality test
##
## data: OIM$residuals
## W = 0.98929, p-value = 0.5932
We cannot rule our normal; the p-value of the Shapiro test is over 0.59.
pnorm(10, 0, 7.551) - pnorm(-10, 0, 7.551)
## [1] 0.8146058
The probability of a prediction within 10 rating points is 0.8146058.
GM <- Occupational.Importance[Occupational.Importance$title=="GENERAL.MANAGERS",]
predict(OIM, newdata=GM, interval="confidence")
## fit lwr upr
## 2 79.51734 70.34393 88.69076
The 95% confidence interval gives average importance for a general manager to be between 70.34 and 88.69.
predict(OIM, newdata=GM, interval="prediction")
## fit lwr upr
## 2 79.51734 61.94286 97.09183
The 95% prediction interval gives the values of importance for a general manager to be between 61.94 and 97.09 with the aforementioned probability.
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
stepAIC(OIM)
## Start: AIC=419.17
## importance ~ education + income + women + type
##
## Df Sum of Sq RSS AIC
## - women 1 1.24 5417.7 417.19
## <none> 5416.5 419.17
## - type 3 617.09 6033.6 424.17
## - income 1 1025.40 6441.9 434.85
## - education 1 1421.67 6838.2 440.94
##
## Step: AIC=417.19
## importance ~ education + income + type
##
## Df Sum of Sq RSS AIC
## <none> 5417.7 417.19
## - type 3 621.13 6038.9 422.26
## - income 1 1378.12 6795.8 438.31
## - education 1 1443.36 6861.1 439.28
##
## Call:
## lm(formula = importance ~ education + income + type, data = Occupational.Importance)
##
## Coefficients:
## (Intercept) education income typeother typeprof
## 1.769393 3.305973 0.001139 -1.732226 7.487737
## typewc
## -1.719057
Step.Res <- stepAIC(OIM)
## Start: AIC=419.17
## importance ~ education + income + women + type
##
## Df Sum of Sq RSS AIC
## - women 1 1.24 5417.7 417.19
## <none> 5416.5 419.17
## - type 3 617.09 6033.6 424.17
## - income 1 1025.40 6441.9 434.85
## - education 1 1421.67 6838.2 440.94
##
## Step: AIC=417.19
## importance ~ education + income + type
##
## Df Sum of Sq RSS AIC
## <none> 5417.7 417.19
## - type 3 621.13 6038.9 422.26
## - income 1 1378.12 6795.8 438.31
## - education 1 1443.36 6861.1 439.28
The final model contains type, education, and income.
ModA <- lm(importance~education+income+type, data=OCI)
summary(ModA)
##
## Call:
## lm(formula = importance ~ education + income + type, data = OCI)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.0864 -4.8662 0.1436 5.3524 19.2652
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.7693934 5.3361820 0.332 0.7409
## education 3.3059733 0.6537085 5.057 2.04e-06 ***
## income 0.0011392 0.0002305 4.942 3.28e-06 ***
## typeother -1.7322264 4.0258430 -0.430 0.6680
## typeprof 7.4877370 3.9698331 1.886 0.0623 .
## typewc -1.7190573 2.6174653 -0.657 0.5129
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.512 on 96 degrees of freedom
## Multiple R-squared: 0.8188, Adjusted R-squared: 0.8093
## F-statistic: 86.75 on 5 and 96 DF, p-value: < 2.2e-16
anova(ModA)
## Analysis of Variance Table
##
## Response: importance
## Df Sum Sq Mean Sq F value Pr(>F)
## education 1 21608.4 21608.4 382.8938 < 2.2e-16 ***
## income 1 2248.1 2248.1 39.8362 8.515e-09 ***
## type 3 621.1 207.0 3.6688 0.01494 *
## Residuals 96 5417.7 56.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Everything matters. Education seems most important with income behind that and type still important but not on the order of the others.
It does not appear so. But we should investigate a bit. Looking at correlations among the variables, the relationships are the smallest. In the following scatterplots, the third row shows women on the y axis. The first column, education shows nothing obvious. The second column is negatively correlated and far more strongly than the previous. Not much going on. What if it depends on the type?
Occupational.Importance[,c("education","income","women")] %>% cor()
## education income women
## education 1.00000000 0.5775802 0.06185286
## income 0.57758023 1.0000000 -0.44105927
## women 0.06185286 -0.4410593 1.00000000
Occupational.Importance[,c("education","income","women")] %>% plot()
library(ggplot2)
ggplot(data = Occupational.Importance) +
aes(x = education, y = women) +
geom_point(color = "#0c4c8a") +
geom_smooth(span = 2) +
theme_minimal() +
facet_wrap(vars(type))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Above shows education.
ggplot(data = Occupational.Importance) +
aes(x = income, y = women) +
geom_point(color = "#0c4c8a") +
geom_smooth(span = 2) +
theme_minimal() +
facet_wrap(vars(type))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Not much there.
StepR <- lm(formula = importance ~ education + income + type, data = Occupational.Importance)
plot(effect("income",StepR, partial.residuals=TRUE))
StepR2 <- lm(formula = importance ~ education + poly(income, 2, raw=TRUE) + type, data = Occupational.Importance)
anova(StepR,StepR2)
## Analysis of Variance Table
##
## Model 1: importance ~ education + income + type
## Model 2: importance ~ education + poly(income, 2, raw = TRUE) + type
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 96 5417.7
## 2 95 4771.1 1 646.6 12.875 0.0005288 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A line is not quite sufficient.
summary(StepR2)
##
## Call:
## lm(formula = importance ~ education + poly(income, 2, raw = TRUE) +
## type, data = Occupational.Importance)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.6728 -3.9463 0.3106 4.2692 19.8789
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.310e+00 5.107e+00 -0.256 0.798161
## education 2.714e+00 6.384e-01 4.251 4.97e-05 ***
## poly(income, 2, raw = TRUE)1 3.112e-03 5.913e-04 5.263 8.76e-07 ***
## poly(income, 2, raw = TRUE)2 -7.851e-08 2.188e-08 -3.588 0.000529 ***
## typeother 1.883e+00 3.929e+00 0.479 0.632800
## typeprof 9.058e+00 3.770e+00 2.402 0.018228 *
## typewc 2.045e-01 2.527e+00 0.081 0.935678
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.087 on 95 degrees of freedom
## Multiple R-squared: 0.8404, Adjusted R-squared: 0.8303
## F-statistic: 83.38 on 6 and 95 DF, p-value: < 2.2e-16
Incorporating a curve gives me 84.86 percent of variation explained. All other things equal, professional is highest importance, then white and blue collar below; importance increases with education and increases at a decreasing rate with income.
shapiro.test(StepR2$residuals)
##
## Shapiro-Wilk normality test
##
## data: StepR2$residuals
## W = 0.9893, p-value = 0.594
What does the model say for a hypothetical professional with income 18000 and education 13; the 95% confidence interval for the average is:
Hypo.D <- data.frame(income=18000, education=13, type="prof")
predict(StepR2, newdata=Hypo.D, interval="confidence")
## fit lwr upr
## 1 73.61446 68.99409 78.23483
I want to conclude with what you can do with esquisse. Just visualizing this can tell almost all of the story in one picture. Or two, depending on how you wish. The first one is a scatterplot with a third dimension determied by size and color determined by the type of profession. It confirms the basic contours of our investigation before. Importance is increasing with both education and income; income is itself, in part, a function of education [that we cannot really show here]. There is also an apparent general grouping by type though it is embedded in income and education.
Occupational.Importance$SR2FV <- round(predict(StepR2, newdata=Occupational.Importance), digits=2)
Occupational.Importance$SR2Resid <- Occupational.Importance$importance - Occupational.Importance$SR2FV
ggplot(data = Occupational.Importance) +
aes(x = education, y = income, color = type, size = importance) +
geom_point() +
theme_minimal()
It is also apparent that this seems to hold even in the presence of dividing up the data to only examining what is going on within types of profession, but that profession types are partially defined by income and education and have limited variation within them excepting professionals.
ggplot(data = Occupational.Importance) +
aes(x = education, y = income, size = importance, color=type) +
geom_point() +
theme_minimal() +
facet_wrap(vars(type))
With the previous two plots in mind, I will close out the investigation with two plots that use plotly. The first shows education and income with color and size by importance with a hover on the residual from the final model that incorporate more than a line for the relationship between income and imporance.
library(plotly)
Occupational.Importance <- Occupational.Importance %>% mutate(labsM = paste("Importance: ",importance,"<br> Fit:", SR2FV, "<br>", title))
ggplot(Occupational.Importance, aes(x=education, y=income, text=labsM, size=SR2FV, color=type)) + geom_point() + labs(title="Occupational Importance:", caption = "Sized by Final Model Fitted Value", size="Fitted") + theme_minimal() -> p
ggplotly(p, tooltip = "text")
and
p <- plot_ly(Occupational.Importance, x=~education, y=~income, text = ~labsM, size=~SR2Resid, color=~type) %>% layout(title="Occupational Importance: Sized by Final Model Residual")
p
## No trace type specified:
## Based on info supplied, a 'scatter' trace seems appropriate.
## Read more about this trace type -> https://plot.ly/r/reference/#scatter
## No scatter mode specifed:
## Setting the mode to markers
## Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
Another tack at the same issue but with everything in a graphic.
ggplot(Occupational.Importance) + aes(x=education, y=income, label=type, group=type, size=importance, color=type, name=title) + geom_density2d(size=0.05) + geom_text() + labs(color="Women Pct") + scale_color_viridis_d() -> Plotly.Me
ggplotly(Plotly.Me)