Problem Set 4

TASK 1

# Inspect the dataset
str(States)

## 'data.frame':    50 obs. of  22 variables:
##  $ State           : Factor w/ 50 levels "Alabama","Alaska",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ HouseholdIncome : num  46.5 76.1 53.5 43.8 67.2 ...
##  $ Region          : Factor w/ 4 levels "MW","NE","S",..: 3 4 4 3 4 4 2 2 3 3 ...
##  $ Population      : num  4.88 0.74 7.02 3 39.54 ...
##  $ EighthGradeMath : num  269 274 280 274 276 ...
##  $ HighSchool      : num  87.1 92.8 87.1 89.1 87.4 91.5 92.4 89.2 89.4 87.5 ...
##  $ College         : num  26 26.5 27.4 24.7 34.5 39.6 42.7 33.4 28.8 30.9 ...
##  $ IQ              : num  95.7 99 97.4 97.5 95.5 ...
##  $ GSP             : num  40.3 70.9 43.1 38.5 67.7 ...
##  $ Vegetables      : num  80.7 81 79.2 80.7 78.6 82.6 83.1 82.8 80.6 81.9 ...
##  $ Fruit           : num  55.1 63.1 62.8 55.3 67.5 67 68.5 64.6 65.6 61.4 ...
##  $ Smokers         : num  20.9 21 15.6 22.3 11.3 14.6 12.7 17 16.1 17.5 ...
##  $ PhysicalActivity: num  42.8 58.3 52.7 45.4 57.5 58.7 51.9 46.3 49.5 46.1 ...
##  $ Obese           : num  36.2 29.5 29.5 37.1 25.8 23 27.4 33.5 30.7 32.5 ...
##  $ NonWhite        : num  31.6 34.7 22.5 22.7 39.4 15.8 23.3 30.9 24.3 40.6 ...
##  $ HeavyDrinkers   : num  5.45 7.33 5.57 5.32 5.95 7.3 6.45 6.4 7.61 5.82 ...
##  $ Electoral       : int  9 3 11 6 55 9 7 3 29 16 ...
##  $ ClintonVote     : num  34.4 36.5 45.1 33.6 61.7 ...
##  $ Elect2016       : Factor w/ 2 levels "D","R": 2 2 2 2 1 1 1 1 2 2 ...
##  $ TwoParents      : num  60.9 71.5 62.7 63.3 66.8 71.9 66.5 62.6 61 62.1 ...
##  $ StudentSpending : num  9.24 17.51 7.61 9.85 11.49 ...
##  $ Insured         : num  83.7 80.2 83.4 84.1 85.2 87.2 91.1 90.7 78.2 79.3 ...

summary(States)

##         State    HouseholdIncome Region    Population     EighthGradeMath
##  Alabama   : 1   Min.   :42.01   MW:13   Min.   : 0.579   Min.   :268.7  
##  Alaska    : 1   1st Qu.:50.91   NE:11   1st Qu.: 1.842   1st Qu.:276.6  
##  Arizona   : 1   Median :56.34   S :13   Median : 4.569   Median :281.2  
##  Arkansas  : 1   Mean   :57.85   W :13   Mean   : 6.501   Mean   :281.2  
##  California: 1   3rd Qu.:64.75           3rd Qu.: 7.309   3rd Qu.:285.7  
##  Colorado  : 1   Max.   :78.92           Max.   :39.537   Max.   :294.5  
##  (Other)   :44                                                           
##    HighSchool       College            IQ              GSP       
##  Min.   :86.40   Min.   :22.50   Min.   : 94.20   Min.   :34.03  
##  1st Qu.:89.03   1st Qu.:27.98   1st Qu.: 98.47   1st Qu.:46.26  
##  Median :91.35   Median :32.65   Median :100.85   Median :51.23  
##  Mean   :90.76   Mean   :33.08   Mean   :100.34   Mean   :52.89  
##  3rd Qu.:92.47   3rd Qu.:37.65   3rd Qu.:102.70   3rd Qu.:59.06  
##  Max.   :95.00   Max.   :50.90   Max.   :104.30   Max.   :73.53  
##                                                                  
##    Vegetables        Fruit          Smokers      PhysicalActivity
##  Min.   :76.10   Min.   :53.70   Min.   : 8.90   Min.   :41.90   
##  1st Qu.:80.62   1st Qu.:61.62   1st Qu.:15.03   1st Qu.:47.10   
##  Median :81.95   Median :63.30   Median :17.15   Median :50.40   
##  Mean   :81.87   Mean   :63.32   Mean   :17.33   Mean   :50.58   
##  3rd Qu.:83.17   3rd Qu.:66.65   3rd Qu.:19.30   3rd Qu.:53.98   
##  Max.   :87.60   Max.   :70.30   Max.   :26.00   Max.   :59.70   
##                                                                  
##      Obese          NonWhite     HeavyDrinkers     Electoral    
##  Min.   :23.00   Min.   : 5.40   Min.   :3.750   Min.   : 3.00  
##  1st Qu.:28.77   1st Qu.:14.35   1st Qu.:5.827   1st Qu.: 5.00  
##  Median :30.90   Median :21.75   Median :6.440   Median : 8.00  
##  Mean   :31.43   Mean   :23.05   Mean   :6.483   Mean   :10.70  
##  3rd Qu.:34.38   3rd Qu.:31.45   3rd Qu.:7.322   3rd Qu.:11.75  
##  Max.   :39.50   Max.   :74.90   Max.   :8.770   Max.   :55.00  
##                                                                 
##   ClintonVote    Elect2016   TwoParents    StudentSpending     Insured     
##  Min.   :21.63   D:20      Min.   :54.10   Min.   : 6.953   Min.   :75.20  
##  1st Qu.:35.83   R:30      1st Qu.:63.30   1st Qu.: 9.604   1st Qu.:83.42  
##  Median :45.91             Median :66.10   Median :11.319   Median :86.20  
##  Mean   :43.78             Mean   :66.53   Mean   :11.950   Mean   :86.24  
##  3rd Qu.:49.98             3rd Qu.:70.42   3rd Qu.:14.072   3rd Qu.:89.67  
##  Max.   :62.22             Max.   :80.60   Max.   :22.366   Max.   :95.80  
##

# Conduct multiple linear regression
model <- lm(ClintonVote ~ College + HouseholdIncome + NonWhite, data = States)

# Create regression tables using sjPlot
tab_model(model, show.std = TRUE, show.ci = FALSE, show.se = TRUE, show.p = TRUE)

	ClintonVote
Predictors	Estimates	std. Error	std. Beta	standardized std. Error	p
(Intercept)	0.99	6.01	0.00	0.09	0.870
College	1.12	0.23	0.69	0.14	<0.001
HouseholdIncome	-0.08	0.15	-0.07	0.14	0.625
NonWhite	0.43	0.08	0.53	0.10	<0.001
Observations	50
R² / R² adjusted	0.616 / 0.590

# Summary of the model
summary(model)

## 
## Call:
## lm(formula = ClintonVote ~ College + HouseholdIncome + NonWhite, 
##     data = States)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.697  -3.605  -1.036   4.495  13.686 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.99082    6.01413   0.165    0.870    
## College          1.12498    0.22514   4.997 8.88e-06 ***
## HouseholdIncome -0.07554    0.15345  -0.492    0.625    
## NonWhite         0.43123    0.08246   5.230 4.05e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.622 on 46 degrees of freedom
## Multiple R-squared:  0.6155, Adjusted R-squared:  0.5904 
## F-statistic: 24.55 on 3 and 46 DF,  p-value: 1.237e-09

The results show that ‘College’ and ‘NonWhite’ are significant predictors of ‘ClintonVote’, but ‘HouseholdIncome’ is not. Specifically, for each percentage point increase in the college-educated population, the percentage of votes for Clinton is expected to increase by 1.12 percentage points (p < 0.001). Similarly, for each percentage point increase in the non-white population, the percentage of votes for Clinton is expected to rise by 0.43 percentage points (p < 0.001). The standardized coefficients (0.69 for ‘College’ and 0.53 for ‘NonWhite’) show that these variables have a significant positive effect on Clinton’s vote share. On the other hand, ‘HouseholdIncome’ has a negligible negative effect on ‘ClintonVote’ (-0.08), which is not statistically significant (p = 0.625). The model explains approximately 61.6% of the variance in ‘ClintonVote’ (R-squared = 0.616), with an adjusted R-squared value of 0.590 indicating the model’s reliability. In conclusion, higher educational attainment and a larger non-white population are associated with higher percentages of votes for Clinton, whereas household income has no significant effect on voting results. The model fits the data well and explains a significant portion of the variation in voting behavior.

TASK 2

#install packages 
install.packages("stargazer")

## 
## The downloaded binary packages are in
##  /var/folders/t_/pz759vz53jz772p82qtw5p8r0000gn/T//RtmpGN4ooR/downloaded_packages

#Load packages 
library("stargazer")

## 
## Please cite as:

##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

# Initial model (without Region)
model1 <- lm(ClintonVote ~ College + HouseholdIncome + NonWhite, data = States)

# Extended model (with Region)
model2 <- lm(ClintonVote ~ College + HouseholdIncome + NonWhite + Region, data = States)

# Compare models using AIC and BIC
aic1 <- AIC(model1)
aic2 <- AIC(model2)
bic1 <- BIC(model1)
bic2 <- BIC(model2)

# Print AIC and BIC values
cat("AIC for model without Region: ", aic1, "\n")

## AIC for model without Region:  336.7667

cat("AIC for model with Region: ", aic2, "\n")

## AIC for model with Region:  326.4857

cat("BIC for model without Region: ", bic1, "\n")

## BIC for model without Region:  346.3268

cat("BIC for model with Region: ", bic2, "\n")

## BIC for model with Region:  341.7819

# Create regression tables using stargazer
stargazer(model1, model2, type = "text",
          title = "Regression Results",
          column.labels = c("Without Region", "With Region"),
          covariate.labels = c("College", "Household Income", "Non-White", "Region"),
          dep.var.labels = "Clinton Vote",
          star.cutoffs = c(0.05, 0.01, 0.001))

## 
## Regression Results
## =================================================================
##                                  Dependent variable:             
##                     ---------------------------------------------
##                                     Clinton Vote                 
##                         Without Region          With Region      
##                              (1)                    (2)          
## -----------------------------------------------------------------
## College                    1.125***               1.189***       
##                            (0.225)                (0.257)        
##                                                                  
## Household Income            -0.076                -0.429*        
##                            (0.153)                (0.171)        
##                                                                  
## Non-White                  0.431***               0.525***       
##                            (0.082)                (0.082)        
##                                                                  
## Region                                            8.031**        
##                                                   (2.721)        
##                                                                  
## RegionS                                            -3.404        
##                                                   (2.742)        
##                                                                  
## RegionW                                            5.658*        
##                                                   (2.772)        
##                                                                  
## Constant                    0.991                 14.806*        
##                            (6.014)                (7.159)        
##                                                                  
## -----------------------------------------------------------------
## Observations                  50                     50          
## R2                          0.616                  0.722         
## Adjusted R2                 0.590                  0.684         
## Residual Std. Error    6.622 (df = 46)        5.820 (df = 43)    
## F Statistic         24.548*** (df = 3; 46) 18.648*** (df = 6; 43)
## =================================================================
## Note:                               *p<0.05; **p<0.01; ***p<0.001

# Interpret the results
summary(model1)

## 
## Call:
## lm(formula = ClintonVote ~ College + HouseholdIncome + NonWhite, 
##     data = States)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.697  -3.605  -1.036   4.495  13.686 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.99082    6.01413   0.165    0.870    
## College          1.12498    0.22514   4.997 8.88e-06 ***
## HouseholdIncome -0.07554    0.15345  -0.492    0.625    
## NonWhite         0.43123    0.08246   5.230 4.05e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.622 on 46 degrees of freedom
## Multiple R-squared:  0.6155, Adjusted R-squared:  0.5904 
## F-statistic: 24.55 on 3 and 46 DF,  p-value: 1.237e-09

summary(model2)

## 
## Call:
## lm(formula = ClintonVote ~ College + HouseholdIncome + NonWhite + 
##     Region, data = States)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.3217  -4.1695   0.0435   4.2880  11.2477 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     14.80589    7.15877   2.068   0.0447 *  
## College          1.18862    0.25668   4.631 3.36e-05 ***
## HouseholdIncome -0.42863    0.17107  -2.506   0.0161 *  
## NonWhite         0.52461    0.08175   6.417 9.03e-08 ***
## RegionNE         8.03052    2.72062   2.952   0.0051 ** 
## RegionS         -3.40381    2.74204  -1.241   0.2212    
## RegionW          5.65798    2.77249   2.041   0.0474 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.82 on 43 degrees of freedom
## Multiple R-squared:  0.7224, Adjusted R-squared:  0.6836 
## F-statistic: 18.65 on 6 and 43 DF,  p-value: 1.545e-10

In this model that includes ‘Region’, ‘College’ and ‘NonWhite’ are still significant positive predictors of Clinton’s vote share. Each percentage point increase in the college-educated population corresponds with a 1.19 percentage point increase in votes for Clinton (p < 0.001). Similarly, each percentage point increase in the non-white population is associated with a 0.52 percentage point increase in votes for Clinton. Interestingly, household income becomes a significant predictor in this model. Each unit increase in household income is associated with a 0.43 percentage point decrease in votes for Clinton (p = 0.016).

The inclusion of the ‘Region’ variable reveals significant regional differences in voting patterns. Being in the Northeast region correlates with an 8.03 percentage point increase in Clinton votes with regard to the reference region (p = 0.005). The West region also has a positive effect, with Clinton’s votes increasing by 5.66 percentage points (p = 0.047). However, the South region has no significant impact on Clinton’s vote share (p = 0.221). The extended model explains a larger proportion of the variance in Clinton’s vote share (R-squared = 0.722), indicating a better fit.

I prefer the extened model with the ‘Region’ variable because it provides a more detailed and accurate understanding of the variation in Clinton’s vote share. It has a better fit and reveals more information.

TASK 3

# Load necessary libraries
library(effects)

## Loading required package: carData

## 
## Attaching package: 'carData'

## The following object is masked _by_ '.GlobalEnv':
## 
##     States

## lattice theme set by effectsTheme()
## See ?effectsTheme for details.

library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following object is masked from 'package:sjlabelled':
## 
##     as_label

# Fit the linear model with interaction
interaction_model <- lm(ClintonVote ~ College * NonWhite, data = States)

# Summary of the model
summary(interaction_model)

## 
## Call:
## lm(formula = ClintonVote ~ College * NonWhite, data = States)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.446  -3.543  -0.295   3.936  13.831 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -15.22574   13.26349  -1.148 0.256927    
## College            1.50058    0.39913   3.760 0.000479 ***
## NonWhite           0.99725    0.47769   2.088 0.042397 *  
## College:NonWhite  -0.01802    0.01457  -1.236 0.222664    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.532 on 46 degrees of freedom
## Multiple R-squared:  0.6259, Adjusted R-squared:  0.6015 
## F-statistic: 25.66 on 3 and 46 DF,  p-value: 6.636e-10

# Plot the interaction effect using the effects package
interaction_effect <- effect("College*NonWhite", interaction_model)
plot(interaction_effect, main="Interaction Effect of College and NonWhite on ClintonVote", xlab="College (%)", ylab="Predicted ClintonVote (%)")

# Detailed model summary
summary(interaction_model)

## 
## Call:
## lm(formula = ClintonVote ~ College * NonWhite, data = States)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.446  -3.543  -0.295   3.936  13.831 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -15.22574   13.26349  -1.148 0.256927    
## College            1.50058    0.39913   3.760 0.000479 ***
## NonWhite           0.99725    0.47769   2.088 0.042397 *  
## College:NonWhite  -0.01802    0.01457  -1.236 0.222664    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.532 on 46 degrees of freedom
## Multiple R-squared:  0.6259, Adjusted R-squared:  0.6015 
## F-statistic: 25.66 on 3 and 46 DF,  p-value: 6.636e-10

Including the interaction term between ‘College’ and ‘NonWhite’ in the model provides a more nuanced view of the relationship between education and voting patterns. The negative but non-significant interaction term implies that the positive effect of college education on Clinton’s vote percentage could decrease slightly as the percentage of the non-white population grows. However, because the interaction is not statistically significant, the moderating effect is not sufficient enough to draw a firm conclusion.

TASK 4

# Create the binary outcome variable
States$ClintonWin <- ifelse(States$ClintonVote > 50, 1, 0)

# Inspect the new variable
table(States$ClintonWin)

## 
##  0  1 
## 37 13

# Fit the logistic regression model
logistic_model <- glm(ClintonWin ~ College + HouseholdIncome + Region, data = States, family = binomial)

# Summary of the logistic regression model
summary(logistic_model)

## 
## Call:
## glm(formula = ClintonWin ~ College + HouseholdIncome + Region, 
##     family = binomial, data = States)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)   
## (Intercept)      -17.16240    6.31603  -2.717  0.00658 **
## College            0.28494    0.14633   1.947  0.05150 . 
## HouseholdIncome    0.07628    0.07376   1.034  0.30110   
## RegionNE           2.17592    1.40578   1.548  0.12166   
## RegionS          -16.31016 2420.64651  -0.007  0.99462   
## RegionW            2.58895    1.62430   1.594  0.11096   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 57.306  on 49  degrees of freedom
## Residual deviance: 25.780  on 44  degrees of freedom
## AIC: 37.78
## 
## Number of Fisher Scoring iterations: 18

library(modelsummary)

## `modelsummary` 2.0.0 now uses `tinytable` as its default table-drawing
##   backend. Learn more at: https://vincentarelbundock.github.io/tinytable/
## 
## Revert to `kableExtra` for one session:
## 
##   options(modelsummary_factory_default = 'kableExtra')
##   options(modelsummary_factory_latex = 'kableExtra')
##   options(modelsummary_factory_html = 'kableExtra')
## 
## Silence this message forever:
## 
##   config_modelsummary(startup_message = FALSE)

# Create regression tables using modelsummary
model_list <- list("Logistic Regression" = logistic_model)
modelsummary(model_list, statistic = c('std.error', 'p.value'), stars = TRUE)

tinytable_yhzfnu2wft1cu35hl4ll

	Logistic Regression
+ p < 0.1, * p < 0.05, p < 0.01, * p < 0.001
(Intercept)	-17.162**
	(6.316)
	(0.007)
College	0.285+
	(0.146)
	(0.052)
HouseholdIncome	0.076
	(0.074)
	(0.301)
RegionNE	2.176
	(1.406)
	(0.122)
RegionS	-16.310
	(2420.647)
	(0.995)
RegionW	2.589
	(1.624)
	(0.111)
Num.Obs.	50
AIC	37.8
BIC	49.3
Log.Lik.	-12.890
RMSE	0.29

# Create regression tables using sjPlot
tab_model(logistic_model, show.std = TRUE, show.ci = TRUE, show.se = TRUE, show.p = TRUE)

	ClintonWin
Predictors	Odds Ratios	std. Error	std. Beta	standardized std. Error	CI	standardized CI	p
(Intercept)	0.00	0.00	0.04	0.05	0.00 – Inf	0.00 – Inf	0.007
College	1.33	0.19	6.05	5.59	0.00 – Inf	0.00 – Inf	0.052
HouseholdIncome	1.08	0.08	2.07	1.46	0.00 – Inf	0.00 – Inf	0.301
Region [NE]	8.81	12.39	8.81	12.39	0.00 – Inf	0.00 – Inf	0.122
Region [S]	0.00	0.00	0.00	0.00	0.00 – Inf	0.00 – Inf	0.995
Region [W]	13.32	21.63	13.32	21.63	0.00 – Inf	0.00 – Inf	0.111
Observations	50
R² Tjur	0.564

The odds ratio of 1.33 indicates that for every percentage point increase in the college-educated population, the chances of voting for Clinton rise by 33%. This effect is marginally significant (p-value = 0.052), indicating that education may have a positive influence on voting behavior, though the data does not firmly support this conclusion. Household income, on the other hand, does not seem to have a significant impact on the likelihood of voting for Clinton. The odds ratio of 1.08 indicates that people with higher household incomes are slightly more likely to vote for Clinton, but this effect is not statistically significant (p = 0.301). This suggests that household income does not have a significant impact on voting outcomes in this model. The regional effects provide interesting insights, though they are not statistically significant. States in the Northeast have an odds ratio of 8.81, indicating that living in the Northeast increases the likelihood of voting for Clinton by about 8.81 times when compared to the reference region. However, this effect is statistically insignificant (p = 0.122). Similarly, the odds ratio for states in the West is 13.32, which indicates a strong positive influence on voting for Clinton, but again, this result is not statistically significant (p = 0.111). In contrast, the South has an odds ratio of 0.00, indicating a negative effect on voting for Clinton, though once again, this is not statistically significant (p = 0.995).

The model fit, shown by Tjur’s R-squared value of 0.564, suggests that the predictors in the model explain about 56.4% of the variation in the binary outcome (whether or not a state voted for Clinton). This indicates a reasonably good fit, emphasizing the importance of including educational and regional factors in voting behavior analysis.

TASK 5

library(effects)
# Compute the effects for College separated by Region
college_effect <- Effect(c("College", "Region"), logistic_model)

# Print the computed effects to verify
print(college_effect)

## 
##  College*Region effect
##        Region
## College          MW         NE            S          W
##      22 0.001529445 0.01331582 1.264117e-10 0.01998916
##      30 0.014747399 0.11650937 1.235255e-09 0.16618842
##      37 0.099098735 0.49216147 9.077778e-09 0.59427511
##      44 0.447017978 0.87687838 6.671179e-08 0.91499584
##      51 0.855922149 0.98125211 4.902589e-07 0.98751633

# Extract the predicted probabilities from the effects object
predicted_data <- as.data.frame(college_effect)

# Inspect the predicted data
print(head(predicted_data))

##   College Region         fit          se        lower     upper
## 1      22     MW 0.001529445 0.003839379 1.109538e-05 0.1745571
## 2      30     MW 0.014747399 0.022696798 7.002030e-04 0.2422794
## 3      37     MW 0.099098735 0.101499274 1.170994e-02 0.5052449
## 4      44     MW 0.447017978 0.370236482 4.115875e-02 0.9383607
## 5      51     MW 0.855922149 0.283699493 6.139181e-02 0.9981501
## 6      22     NE 0.013315818 0.032057321 1.130447e-04 0.6169967

# Create a ggplot2 plot
ggplot(predicted_data, aes(x = College, y = fit, color = Region)) +
  geom_line(linewidth = 1) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = Region), alpha = 0.2) +
  labs(title = "Predicted Probability of Voting for Clinton by College and Region",
       x = "Percentage of College Educated", y = "Predicted Probability of Voting for Clinton") +
  theme_minimal()

Problem Set 4

Anum Peshimam

2024-07-24