Data & Results Visualizations

We will be working with the Flextable, Stargazer and jtools packages in this tutorial. For more information about each one, see the following links:

Descriptive Data Results

Sometimes you need/want to create a table of results that includes the descriptive results from your data. This can mean things like demographic profile of your sample, counts of the number of unique outcomes for specific variables, and means/variances of continuous variables. Here, we use two packages to create nicely formatted and easily manipulated tables using the tabyl function from the janitor package and the flextable package.

names (anes)

## [1] "age"          "female"       "education"    "trump_vote"   "income_group"
## [6] "poc"          "ideo_str"     "party_str"    "donate"

We start by looking at the names of variables in our ANES data to find the ones that would be considered demographic information. We see 5 variables:

Education
Income
Biological Sex
Person of Color or Not
Age

Here we look at summarizing the education and age variables. The tabyl function allows you to create nicely formatted tibbles that you can then pass to flextable for exporting to specific file types as well as other functions.

We do two unique things in this code. First, we manually calculate the percentage with % sign using the mutate function then we relabel the row names so that they have options anyone can easily understand. We also set missing data to No Response.

educ_tab <- anes %>%  
  tabyl(education) %>% 
  filter(n != 0) %>% 
    mutate(percent = round(percent * 100, 1), 
         percent = paste0(percent, "%"))  %>% 
  mutate(Response = case_when(
     is.na(education) ~ "No Response", 
    education == 1 ~ "HS or Less",     
    education == 2 ~ "Some College",    
    education == 3 ~ "BA/BS",           
    education == 4 ~ "Advanced Degree",   
    TRUE ~ as.character(education)          
  )) %>% 
  rename(Count = n, Percent = percent) %>% 
 dplyr::select(-matches("valid_percent")) 

e<-flextable(educ_tab) #Save the tibble as a flextable 

print(e, preview = "docx")

## a flextable object.
## col_keys: `education`, `Count`, `Percent`, `Response` 
## header has 1 row(s) 
## body has 5 row(s) 
## original dataset sample: 
##  education Count Percent        Response
##          1   586   15.7%      HS or Less
##          2   943   25.2%    Some College
##          3   720   19.3%           BA/BS
##          4   557   14.9% Advanced Degree
##         NA   929   24.9%     No Response

Now, we look at summarizing age, which as a continuous variable, is done in a different way compared to the ordered response above.

age_avg<-anes %>% #Gets average income level in the sample
  summarise(mean = mean(age, na.rm = TRUE),  # Exclude NA values
            sd = sd(age, na.rm = TRUE),      # Exclude NA values
            n = sum(!is.na(age))) %>%        # Count non-NA values
  mutate(mean = round(mean, 1)) %>% 
  mutate(sd = round(sd, 1))

a<-flextable(age_avg)

Now that we have saved our flextables with demographic information, we then combine them and export to Word. Flextable handles a wide variety of format types so if you want a different one just change the format type. See this reference website for Flextable: https://ardata-fr.github.io/flextable-book/rendering.html

save_as_docx(
  "Education" = e,
  "Age" = a, 
  path = "example.docx")
getwd() #This is where your word doc will be

## [1] "C:/Users/Carey/Dropbox/UMass/2024-2025/Winter/DACSS 603/Code"

Model Estimation

Now, let’s turn our attention to visualizing regression results. We first will estimate the logit model, everything works the same for OLS, we worked on in class last week. Once we have saved those regression results, we then work with stargazer and jtools packages to return nicely formatted tables of results.

logit <- glm(donate ~ factor(party_str) + factor(ideo_str) +  factor(education) + factor(income_group) + female + poc + age + trump_vote , data = anes, family = binomial(link = "logit"))

Table of Regression Results

Stargazer

We start by reviewing stargazer code for presenting results. First, we show the most basic output, type = "text", using digits=3 to control the number of decimal points shown in the table. This gives us a long but narrow table of results. With only 1 model to show, it can be useful to make the results wide format. We do that next.

stargazer(logit, digits=3, type="text")

## 
## =================================================
##                           Dependent variable:    
##                       ---------------------------
##                                 donate           
## -------------------------------------------------
## factor(party_str)2               0.494           
##                                 (0.312)          
##                                                  
## factor(party_str)3               0.223           
##                                 (0.321)          
##                                                  
## factor(party_str)4             0.917***          
##                                 (0.301)          
##                                                  
## factor(ideo_str)2               -0.022           
##                                 (0.181)          
##                                                  
## factor(ideo_str)3              0.720***          
##                                 (0.175)          
##                                                  
## factor(ideo_str)4              1.089***          
##                                 (0.214)          
##                                                  
## factor(education)2             0.791***          
##                                 (0.198)          
##                                                  
## factor(education)3             1.161***          
##                                 (0.202)          
##                                                  
## factor(education)4             1.407***          
##                                 (0.210)          
##                                                  
## factor(income_group)2           -0.145           
##                                 (0.211)          
##                                                  
## factor(income_group)3            0.099           
##                                 (0.186)          
##                                                  
## factor(income_group)4           0.463**          
##                                 (0.190)          
##                                                  
## factor(income_group)5          0.729***          
##                                 (0.262)          
##                                                  
## female                         -0.266**          
##                                 (0.113)          
##                                                  
## poc                              0.031           
##                                 (0.139)          
##                                                  
## age                            0.036***          
##                                 (0.004)          
##                                                  
## trump_vote                     -0.742***         
##                                 (0.125)          
##                                                  
## Constant                       -4.689***         
##                                 (0.469)          
##                                                  
## -------------------------------------------------
## Observations                     2,116           
## Log Likelihood                -1,009.556         
## Akaike Inf. Crit.              2,055.112         
## =================================================
## Note:                 *p<0.1; **p<0.05; ***p<0.01

With long tables but only one or two models, it can be easier to display the coefficients with associated standard errors side-by-side rather than vertically. We do this by adding single.row=TRUE to the stargazer command.

stargazer(logit, digits=3, type="text",  single.row = TRUE)

## 
## =================================================
##                           Dependent variable:    
##                       ---------------------------
##                                 donate           
## -------------------------------------------------
## factor(party_str)2           0.494 (0.312)       
## factor(party_str)3           0.223 (0.321)       
## factor(party_str)4         0.917*** (0.301)      
## factor(ideo_str)2           -0.022 (0.181)       
## factor(ideo_str)3          0.720*** (0.175)      
## factor(ideo_str)4          1.089*** (0.214)      
## factor(education)2         0.791*** (0.198)      
## factor(education)3         1.161*** (0.202)      
## factor(education)4         1.407*** (0.210)      
## factor(income_group)2       -0.145 (0.211)       
## factor(income_group)3        0.099 (0.186)       
## factor(income_group)4       0.463** (0.190)      
## factor(income_group)5      0.729*** (0.262)      
## female                     -0.266** (0.113)      
## poc                          0.031 (0.139)       
## age                        0.036*** (0.004)      
## trump_vote                 -0.742*** (0.125)     
## Constant                   -4.689*** (0.469)     
## -------------------------------------------------
## Observations                     2,116           
## Log Likelihood                -1,009.556         
## Akaike Inf. Crit.              2,055.112         
## =================================================
## Note:                 *p<0.1; **p<0.05; ***p<0.01

Right now the columns are called ‘donate’, which reflects the variable name of the dependent variable in the model. This should be updated to be more descriptive of the DV. Here we use dep.var.labels=c("Donate Money") to update the name on the top of the model coefficients to Donate Money.

We also need to update the labels of the individual rows so that the coefficients can be more easily understood by your audience. In stargazer, you do this using the covariate.labels command. Importantly, you have to type in the new row names in the order they will appear in the output. So it is easy to miss one and get mislabel the variable in the model itself. Be careful and double-check your work on this. It can be helpful to print a stargazer model first so that you can update in order that you see in the output.

Because we have a lot of factors in this model, therefore a lot of coefficients, this is an extreme example of updating row labels. Only do this step at the very end when you are ready to display the results to others.

stargazer(logit, digits=3, type="text",  single.row = TRUE)

## 
## =================================================
##                           Dependent variable:    
##                       ---------------------------
##                                 donate           
## -------------------------------------------------
## factor(party_str)2           0.494 (0.312)       
## factor(party_str)3           0.223 (0.321)       
## factor(party_str)4         0.917*** (0.301)      
## factor(ideo_str)2           -0.022 (0.181)       
## factor(ideo_str)3          0.720*** (0.175)      
## factor(ideo_str)4          1.089*** (0.214)      
## factor(education)2         0.791*** (0.198)      
## factor(education)3         1.161*** (0.202)      
## factor(education)4         1.407*** (0.210)      
## factor(income_group)2       -0.145 (0.211)       
## factor(income_group)3        0.099 (0.186)       
## factor(income_group)4       0.463** (0.190)      
## factor(income_group)5      0.729*** (0.262)      
## female                     -0.266** (0.113)      
## poc                          0.031 (0.139)       
## age                        0.036*** (0.004)      
## trump_vote                 -0.742*** (0.125)     
## Constant                   -4.689*** (0.469)     
## -------------------------------------------------
## Observations                     2,116           
## Log Likelihood                -1,009.556         
## Akaike Inf. Crit.              2,055.112         
## =================================================
## Note:                 *p<0.1; **p<0.05; ***p<0.01

stargazer(logit, digits=3, type="text", dep.var.labels=c("Donate Money"),
          covariate.labels=c("PID Ind: Lean", "PID Ind: Weak",  "PID Ind: Strong",
                             "Ideo Moderate: Slight", "Ideo Moderate: Medium",  "Ideo Moderate: Extreme", "ED HS or Less: Some College", "ED HS or Less: BA", "ED HS or Less: Advanced Degree", 
"Inc 0-25K: 25-50K","Inc 0-25K: 50-100K", "Inc 0-25K: 100-250K", "Inc 0-25K: 250+K",
"Sex: Female or Not", "Person of Color or Not", "Age", "Supported Trump"),  single.row = TRUE)

## 
## ==========================================================
##                                    Dependent variable:    
##                                ---------------------------
##                                       Donate Money        
## ----------------------------------------------------------
## PID Ind: Lean                         0.494 (0.312)       
## PID Ind: Weak                         0.223 (0.321)       
## PID Ind: Strong                     0.917*** (0.301)      
## Ideo Moderate: Slight                -0.022 (0.181)       
## Ideo Moderate: Medium               0.720*** (0.175)      
## Ideo Moderate: Extreme              1.089*** (0.214)      
## ED HS or Less: Some College         0.791*** (0.198)      
## ED HS or Less: BA                   1.161*** (0.202)      
## ED HS or Less: Advanced Degree      1.407*** (0.210)      
## Inc 0-25K: 25-50K                    -0.145 (0.211)       
## Inc 0-25K: 50-100K                    0.099 (0.186)       
## Inc 0-25K: 100-250K                  0.463** (0.190)      
## Inc 0-25K: 250+K                    0.729*** (0.262)      
## Sex: Female or Not                  -0.266** (0.113)      
## Person of Color or Not                0.031 (0.139)       
## Age                                 0.036*** (0.004)      
## Supported Trump                     -0.742*** (0.125)     
## Constant                            -4.689*** (0.469)     
## ----------------------------------------------------------
## Observations                              2,116           
## Log Likelihood                         -1,009.556         
## Akaike Inf. Crit.                       2,055.112         
## ==========================================================
## Note:                          *p<0.1; **p<0.05; ***p<0.01

We can also export the stargazer output into different formats that we can then include in presentations and papers. Here, we export as type="html" with out="logit.htm" used to specify the name of the outputted results.

## 
## <table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
## <tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
## <tr><td style="text-align:left"></td><td>Donate Money</td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">PID Ind: Lean</td><td>0.494 (0.312)</td></tr>
## <tr><td style="text-align:left">PID Ind: Weak</td><td>0.223 (0.321)</td></tr>
## <tr><td style="text-align:left">PID Ind: Strong</td><td>0.917<sup>***</sup> (0.301)</td></tr>
## <tr><td style="text-align:left">Ideo Moderate: Slight</td><td>-0.022 (0.181)</td></tr>
## <tr><td style="text-align:left">Ideo Moderate: Medium</td><td>0.720<sup>***</sup> (0.175)</td></tr>
## <tr><td style="text-align:left">Ideo Moderate: Extreme</td><td>1.089<sup>***</sup> (0.214)</td></tr>
## <tr><td style="text-align:left">ED HS or Less: Some College</td><td>0.791<sup>***</sup> (0.198)</td></tr>
## <tr><td style="text-align:left">ED HS or Less: BA</td><td>1.161<sup>***</sup> (0.202)</td></tr>
## <tr><td style="text-align:left">ED HS or Less: Advanced Degree</td><td>1.407<sup>***</sup> (0.210)</td></tr>
## <tr><td style="text-align:left">Inc 0-25K: 25-50K</td><td>-0.145 (0.211)</td></tr>
## <tr><td style="text-align:left">Inc 0-25K: 50-100K</td><td>0.099 (0.186)</td></tr>
## <tr><td style="text-align:left">Inc 0-25K: 100-250K</td><td>0.463<sup>**</sup> (0.190)</td></tr>
## <tr><td style="text-align:left">Inc 0-25K: 250+K</td><td>0.729<sup>***</sup> (0.262)</td></tr>
## <tr><td style="text-align:left">Sex: Female or Not</td><td>-0.266<sup>**</sup> (0.113)</td></tr>
## <tr><td style="text-align:left">Person of Color or Not</td><td>0.031 (0.139)</td></tr>
## <tr><td style="text-align:left">Age</td><td>0.036<sup>***</sup> (0.004)</td></tr>
## <tr><td style="text-align:left">Supported Trump</td><td>-0.742<sup>***</sup> (0.125)</td></tr>
## <tr><td style="text-align:left">Constant</td><td>-4.689<sup>***</sup> (0.469)</td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>2,116</td></tr>
## <tr><td style="text-align:left">Log Likelihood</td><td>-1,009.556</td></tr>
## <tr><td style="text-align:left">Akaike Inf. Crit.</td><td>2,055.112</td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
## </table>

## [1] "C:/Users/Carey/Dropbox/UMass/2024-2025/Winter/DACSS 603/Code"

JTools

The jtools package provides us with another easy way to summarizing regression results. In addition to summarizing results in a tabular format, jtools also provides easy ways to create coefficient plots to visually display the results.

summ(logit)

## Warning in !is.null(rmarkdown::metadata$output) && rmarkdown::metadata$output
## %in% : 'length(x) = 2 > 1' in coercion to 'logical(1)'

Observations	2116 (1619 missing obs. deleted)
Dependent variable	donate
Type	Generalized linear model
Family	binomial
Link	logit

χ²(17)	340.71
Pseudo-R² (Cragg-Uhler)	0.22
Pseudo-R² (McFadden)	0.14
AIC	2055.11
BIC	2156.94

	Est.	S.E.	z val.	p
(Intercept)	-4.69	0.47	-10.00	0.00
factor(party_str)2	0.49	0.31	1.59	0.11
factor(party_str)3	0.22	0.32	0.70	0.49
factor(party_str)4	0.92	0.30	3.05	0.00
factor(ideo_str)2	-0.02	0.18	-0.12	0.90
factor(ideo_str)3	0.72	0.17	4.13	0.00
factor(ideo_str)4	1.09	0.21	5.08	0.00
factor(education)2	0.79	0.20	4.00	0.00
factor(education)3	1.16	0.20	5.74	0.00
factor(education)4	1.41	0.21	6.71	0.00
factor(income_group)2	-0.14	0.21	-0.69	0.49
factor(income_group)3	0.10	0.19	0.53	0.60
factor(income_group)4	0.46	0.19	2.43	0.01
factor(income_group)5	0.73	0.26	2.78	0.01
female	-0.27	0.11	-2.34	0.02
poc	0.03	0.14	0.22	0.82
age	0.04	0.00	9.41	0.00
trump_vote	-0.74	0.13	-5.91	0.00
Standard errors: MLE

The most basic way to display results with the jtools package is through the summ function. This is a quick and efficient way to review results of any model. However, it does not give us a lot of control over the look of the results as we want. For this, we turn to the export_summ function from jtools, which provides a wide variety of controls in terms of how the regression results look.

With the export_summ function, we can control the model names using model.names and the names of the coefficients using coefs=. Unlike Stargazer, you can worry less about getting labels out of order since you have to specify which variable the new name is associated with. However, if you leave a variable name out using the coefs= approach, that variable’s results will not be displayed.

export_summs(logit,  model.names = c("Logit: Donate Money or Not"), coefs=c("Intercept" = "(Intercept)", "PID: Ind to Lean" = "factor(party_str)2", "PID: Ind to Weak" = "factor(party_str)3", "PID: Ind to Strong" = "factor(party_str)4", "Ideo: Mod to Slight" = "factor(ideo_str)2", "Ideo: Mod to Medium" = "factor(ideo_str)3", "Ideo: Mod to Extreme" = "factor(ideo_str)4", "Educ: HS or less to Some College" = "factor(education)2", "Educ: HS or less to BA/BS" = "factor(education)3", "Educ: HS or less to Advanced Deg" = "factor(education)4",  "Inc: 0-25K to 25-50K" = "factor(income_group)2", "Inc: 0-25K to 50-100K" = "factor(income_group)3", "Inc: 0-25K to 100-250K" = "factor(income_group)4", "Inc: 0-25K to 250K+" = "factor(income_group)5", 
"Female" = "female", "Person of Color" = "poc", "Age" = "age", "Trump Supporter" = "trump_vote") , quiet = TRUE)

## Registered S3 methods overwritten by 'broom':
##   method            from  
##   tidy.glht         jtools
##   tidy.summary.glht jtools

	Logit: Donate Money or Not
Intercept	-4.69 ***
	(0.47)
PID: Ind to Lean	0.49
	(0.31)
PID: Ind to Weak	0.22
	(0.32)
PID: Ind to Strong	0.92 **
	(0.30)
Ideo: Mod to Slight	-0.02
	(0.18)
Ideo: Mod to Medium	0.72 ***
	(0.17)
Ideo: Mod to Extreme	1.09 ***
	(0.21)
Educ: HS or less to Some College	0.79 ***
	(0.20)
Educ: HS or less to BA/BS	1.16 ***
	(0.20)
Educ: HS or less to Advanced Deg	1.41 ***
	(0.21)
Inc: 0-25K to 25-50K	-0.14
	(0.21)
Inc: 0-25K to 50-100K	0.10
	(0.19)
Inc: 0-25K to 100-250K	0.46 *
	(0.19)
Inc: 0-25K to 250K+	0.73 **
	(0.26)
Female	-0.27 *
	(0.11)
Person of Color	0.03
	(0.14)
Age	0.04 ***
	(0.00)
Trump Supporter	-0.74 ***
	(0.13)
N	2116
AIC	2055.11
BIC	2156.94
Pseudo R2	0.22
* p < 0.001; p < 0.01; * p < 0.05.

We can also easily export jtools regression results into a wide variety of formats. Here we export to Word using to.file="docx", file.name=logit_jtools.docx". This specifies the file type as Word, “docx” and then the file name to be saved as a Word document, “.docx”. This will be saved in your current working directory.

We also estimate a second logit model to illustrate how to control the column names for multiple columns in the same table.

logit2 <- glm(donate ~ factor(party_str) + factor(ideo_str) +   factor(income_group) + female + poc + age + trump_vote , data = anes, family = binomial(link = "logit"))

export_summs(logit, logit2, digits=3,  model.names = c("Donate: W/Education", "Donate:WO/Education"), coefs=c("Intercept" = "(Intercept)", "PID: Ind to Lean" = "factor(party_str)2", "PID: Ind to Weak" = "factor(party_str)3", "PID: Ind to Strong" = "factor(party_str)4", "Ideo: Mod to Slight" = "factor(ideo_str)2", "Ideo: Mod to Medium" = "factor(ideo_str)3", "Ideo: Mod to Extreme" = "factor(ideo_str)4", "Educ: HS or less to Some College" = "factor(education)2", "Educ: HS or less to BA/BS" = "factor(education)3", "Educ: HS or less to Advanced Deg" = "factor(education)4",  "Inc: 0-25K to 25-50K" = "factor(income_group)2", "Inc: 0-25K to 50-100K" = "factor(income_group)3", "Inc: 0-25K to 100-250K" = "factor(income_group)4", "Inc: 0-25K to 250K+" = "factor(income_group)5", 
"Female" = "female", "Person of Color" = "poc", "Age" = "age", "Trump Supporter" = "trump_vote"), to.file = "docx", file.name = "logit_coef.docx")

	Donate: W/Education	Donate:WO/Education
Intercept	-4.689 ***	-4.015 ***
	(0.469)	(0.433)
PID: Ind to Lean	0.494	0.656 *
	(0.312)	(0.308)
PID: Ind to Weak	0.223	0.374
	(0.321)	(0.317)
PID: Ind to Strong	0.917 **	0.983 ***
	(0.301)	(0.297)
Ideo: Mod to Slight	-0.022	-0.016
	(0.181)	(0.178)
Ideo: Mod to Medium	0.720 ***	0.853 ***
	(0.175)	(0.171)
Ideo: Mod to Extreme	1.089 ***	1.158 ***
	(0.214)	(0.209)
Educ: HS or less to Some College	0.791 ***
	(0.198)
Educ: HS or less to BA/BS	1.161 ***
	(0.202)
Educ: HS or less to Advanced Deg	1.407 ***
	(0.210)
Inc: 0-25K to 25-50K	-0.145	-0.145
	(0.211)	(0.206)
Inc: 0-25K to 50-100K	0.099	0.272
	(0.186)	(0.179)
Inc: 0-25K to 100-250K	0.463 *	0.754 ***
	(0.190)	(0.182)
Inc: 0-25K to 250K+	0.729 **	1.070 ***
	(0.262)	(0.251)
Female	-0.266 *	-0.228 *
	(0.113)	(0.111)
Person of Color	0.031	0.008
	(0.139)	(0.136)
Age	0.036 ***	0.035 ***
	(0.004)	(0.004)
Trump Supporter	-0.742 ***	-0.914 ***
	(0.125)	(0.122)
N	2116	2136
AIC	2055.112	2121.100
BIC	2156.944	2206.100
Pseudo R2	0.221	0.191
* p < 0.001; p < 0.01; * p < 0.05.

Coefficient Plots

Sometimes you want to visualize your regression coefficients. You can do that using the jtools package along with the plot_summs function. The plot_summs function code is the same as the export_summs code we reviewed above. This function has a wide variety of options available. You can change the confidence level if needed using the inner_ci_level= option or add robust standard errors using robust=TRUE.

Note: Model names will only work with two or more models in the graph.

plot_summs(logit,  model.names = c("Logit: Donate Money or Not"), coefs=c( "PID: Ind to Lean" = "factor(party_str)2", "PID: Ind to Weak" = "factor(party_str)3", "PID: Ind to Strong" = "factor(party_str)4", "Ideo: Mod to Slight" = "factor(ideo_str)2", "Ideo: Mod to Medium" = "factor(ideo_str)3", "Ideo: Mod to Extreme" = "factor(ideo_str)4", "Educ: HS or less to Some College" = "factor(education)2", "Educ: HS or less to BA/BS" = "factor(education)3", "Educ: HS or less to Advanced Deg" = "factor(education)4",  "Inc: 0-25K to 25-50K" = "factor(income_group)2", "Inc: 0-25K to 50-100K" = "factor(income_group)3", "Inc: 0-25K to 100-250K" = "factor(income_group)4", "Inc: 0-25K to 250K+" = "factor(income_group)5", 
"Female" = "female", "Person of Color" = "poc", "Age" = "age"),
inner_ci_level = .9, robust=TRUE)

To graph multiple models together, you simply add the name of the additional model to the code. Here, we add logit2, to the code to compare the two logit models side-by-side. We also drop the inner_ci_level from the code since we don’t need multiple confidence interval levels on the plot.

It is also important to think about the colors you want to display in the graphs. Depending on your audience, you might need to use colors that are differentiable for people with issues with seeing colors, you might need to follow a specific style guide, or you might want to just play around with different colors to see what you like and what you don’t like. You can handle that easily with colors=. You must specify the exact number of colors as you have models. Since we are comparing two models, we specify two colors, colors=c("black", "grey").

plot_summs(logit, logit2,   model.names = c("Donate: W/Education", "Donate:WO/Education"), coefs=c( "PID: Ind to Lean" = "factor(party_str)2", "PID: Ind to Weak" = "factor(party_str)3", "PID: Ind to Strong" = "factor(party_str)4", "Ideo: Mod to Slight" = "factor(ideo_str)2", "Ideo: Mod to Medium" = "factor(ideo_str)3", "Ideo: Mod to Extreme" = "factor(ideo_str)4", "Educ: HS or less to Some College" = "factor(education)2", "Educ: HS or less to BA/BS" = "factor(education)3", "Educ: HS or less to Advanced Deg" = "factor(education)4",  "Inc: 0-25K to 25-50K" = "factor(income_group)2", "Inc: 0-25K to 50-100K" = "factor(income_group)3", "Inc: 0-25K to 100-250K" = "factor(income_group)4", "Inc: 0-25K to 250K+" = "factor(income_group)5", 
"Female" = "female", "Person of Color" = "poc", "Age" = "age"),
 robust=TRUE,  
           colors = c("black", "darkgrey"))

Other JTools Functions

Jtools has other functions that can be useful for understanding regression results. Here we look at a few basic ones. First, we can use vif=TRUE to get the variance inflation factor results for each IV in your model. This is important when testing for multicollinearity. We looked at a different approach previously to calculate the VIF. With the jtools approach, the VIF is listed in the same regression table as the coefficients. You wouldn’t necessarily want to show this model since VIF is not something that needs to be reported but it can efficiently help you understand if you have any issues with your IVs being highly correlated.

summ(logit, vif=TRUE)

Observations	2116 (1619 missing obs. deleted)
Dependent variable	donate
Type	Generalized linear model
Family	binomial
Link	logit

χ²(17)	340.71
Pseudo-R² (Cragg-Uhler)	0.22
Pseudo-R² (McFadden)	0.14
AIC	2055.11
BIC	2156.94

	Est.	S.E.	z val.	p	VIF
(Intercept)	-4.69	0.47	-10.00	0.00	NA
factor(party_str)2	0.49	0.31	1.59	0.11	1.21
factor(party_str)3	0.22	0.32	0.70	0.49	1.21
factor(party_str)4	0.92	0.30	3.05	0.00	1.21
factor(ideo_str)2	-0.02	0.18	-0.12	0.90	1.31
factor(ideo_str)3	0.72	0.17	4.13	0.00	1.31
factor(ideo_str)4	1.09	0.21	5.08	0.00	1.31
factor(education)2	0.79	0.20	4.00	0.00	1.24
factor(education)3	1.16	0.20	5.74	0.00	1.24
factor(education)4	1.41	0.21	6.71	0.00	1.24
factor(income_group)2	-0.14	0.21	-0.69	0.49	1.24
factor(income_group)3	0.10	0.19	0.53	0.60	1.24
factor(income_group)4	0.46	0.19	2.43	0.01	1.24
factor(income_group)5	0.73	0.26	2.78	0.01	1.24
female	-0.27	0.11	-2.34	0.02	1.06
poc	0.03	0.14	0.22	0.82	1.09
age	0.04	0.00	9.41	0.00	1.16
trump_vote	-0.74	0.13	-5.91	0.00	1.21
Standard errors: MLE

You can also easily apply different robust standard errors. We previously looked at a different way to apply robust standard errors. Jtools can help to simplify the reporting of models that use robust standard errors. Here, we look at the same model with three different standard error calculations. We specify robust="HC1" and robust="HC3" to specify the two different robust standard errors we covered in class this semester.

summ(logit, digits=3, robust = "HC1")

Observations	2116 (1619 missing obs. deleted)
Dependent variable	donate
Type	Generalized linear model
Family	binomial
Link	logit

χ²(17)	340.710
Pseudo-R² (Cragg-Uhler)	0.221
Pseudo-R² (McFadden)	0.144
AIC	2055.112
BIC	2156.944

	Est.	S.E.	z val.	p
(Intercept)	-4.689	0.497	-9.432	0.000
factor(party_str)2	0.494	0.331	1.492	0.136
factor(party_str)3	0.223	0.337	0.663	0.507
factor(party_str)4	0.917	0.320	2.868	0.004
factor(ideo_str)2	-0.022	0.181	-0.121	0.904
factor(ideo_str)3	0.720	0.174	4.135	0.000
factor(ideo_str)4	1.089	0.216	5.034	0.000
factor(education)2	0.791	0.205	3.863	0.000
factor(education)3	1.161	0.210	5.540	0.000
factor(education)4	1.407	0.213	6.597	0.000
factor(income_group)2	-0.145	0.214	-0.674	0.500
factor(income_group)3	0.099	0.191	0.515	0.607
factor(income_group)4	0.463	0.192	2.408	0.016
factor(income_group)5	0.729	0.261	2.793	0.005
female	-0.266	0.113	-2.348	0.019
poc	0.031	0.145	0.213	0.832
age	0.036	0.004	9.312	0.000
trump_vote	-0.742	0.124	-5.991	0.000
Standard errors: Robust, type = HC1

summ(logit, digits=3, robust = "HC3")

Observations	2116 (1619 missing obs. deleted)
Dependent variable	donate
Type	Generalized linear model
Family	binomial
Link	logit

χ²(17)	340.710
Pseudo-R² (Cragg-Uhler)	0.221
Pseudo-R² (McFadden)	0.144
AIC	2055.112
BIC	2156.944

	Est.	S.E.	z val.	p
(Intercept)	-4.689	0.499	-9.387	0.000
factor(party_str)2	0.494	0.334	1.481	0.139
factor(party_str)3	0.223	0.339	0.658	0.511
factor(party_str)4	0.917	0.322	2.846	0.004
factor(ideo_str)2	-0.022	0.182	-0.120	0.905
factor(ideo_str)3	0.720	0.175	4.111	0.000
factor(ideo_str)4	1.089	0.218	4.999	0.000
factor(education)2	0.791	0.206	3.847	0.000
factor(education)3	1.161	0.211	5.516	0.000
factor(education)4	1.407	0.214	6.566	0.000
factor(income_group)2	-0.145	0.216	-0.670	0.503
factor(income_group)3	0.099	0.193	0.512	0.609
factor(income_group)4	0.463	0.193	2.393	0.017
factor(income_group)5	0.729	0.263	2.766	0.006
female	-0.266	0.114	-2.335	0.020
poc	0.031	0.146	0.211	0.833
age	0.036	0.004	9.265	0.000
trump_vote	-0.742	0.124	-5.962	0.000
Standard errors: Robust, type = HC3

summ(logit, digits=3)

Observations	2116 (1619 missing obs. deleted)
Dependent variable	donate
Type	Generalized linear model
Family	binomial
Link	logit

χ²(17)	340.710
Pseudo-R² (Cragg-Uhler)	0.221
Pseudo-R² (McFadden)	0.144
AIC	2055.112
BIC	2156.944

	Est.	S.E.	z val.	p
(Intercept)	-4.689	0.469	-9.997	0.000
factor(party_str)2	0.494	0.312	1.587	0.113
factor(party_str)3	0.223	0.321	0.695	0.487
factor(party_str)4	0.917	0.301	3.051	0.002
factor(ideo_str)2	-0.022	0.181	-0.121	0.904
factor(ideo_str)3	0.720	0.175	4.128	0.000
factor(ideo_str)4	1.089	0.214	5.078	0.000
factor(education)2	0.791	0.198	4.000	0.000
factor(education)3	1.161	0.202	5.738	0.000
factor(education)4	1.407	0.210	6.714	0.000
factor(income_group)2	-0.145	0.211	-0.686	0.493
factor(income_group)3	0.099	0.186	0.531	0.595
factor(income_group)4	0.463	0.190	2.435	0.015
factor(income_group)5	0.729	0.262	2.780	0.005
female	-0.266	0.113	-2.345	0.019
poc	0.031	0.139	0.222	0.824
age	0.036	0.004	9.407	0.000
trump_vote	-0.742	0.125	-5.912	0.000
Standard errors: MLE

Here, we do the same thing but use export_summs to compare the models side-by-side and provide appropriate model names to each column of results. We also apply plot_summs as well to visualize the change in standard error size. We obviously would want to change the row labels if we were showing these results to an audience. But to simplify the code here, we exclude that portion.

export_summs(logit, logit, logit, robust = list(FALSE, "HC1", "HC3"),
           model.names = c("Logit SE", "HC1 SE", "HC3 SE"))

	Logit SE	HC1 SE	HC3 SE
(Intercept)	-4.69 ***	-4.69 ***	-4.69 ***
	(0.47)	(0.50)	(0.50)
factor(party_str)2	0.49	0.49	0.49
	(0.31)	(0.33)	(0.33)
factor(party_str)3	0.22	0.22	0.22
	(0.32)	(0.34)	(0.34)
factor(party_str)4	0.92 **	0.92 **	0.92 **
	(0.30)	(0.32)	(0.32)
factor(ideo_str)2	-0.02	-0.02	-0.02
	(0.18)	(0.18)	(0.18)
factor(ideo_str)3	0.72 ***	0.72 ***	0.72 ***
	(0.17)	(0.17)	(0.18)
factor(ideo_str)4	1.09 ***	1.09 ***	1.09 ***
	(0.21)	(0.22)	(0.22)
factor(education)2	0.79 ***	0.79 ***	0.79 ***
	(0.20)	(0.20)	(0.21)
factor(education)3	1.16 ***	1.16 ***	1.16 ***
	(0.20)	(0.21)	(0.21)
factor(education)4	1.41 ***	1.41 ***	1.41 ***
	(0.21)	(0.21)	(0.21)
factor(income_group)2	-0.14	-0.14	-0.14
	(0.21)	(0.21)	(0.22)
factor(income_group)3	0.10	0.10	0.10
	(0.19)	(0.19)	(0.19)
factor(income_group)4	0.46 *	0.46 *	0.46 *
	(0.19)	(0.19)	(0.19)
factor(income_group)5	0.73 **	0.73 **	0.73 **
	(0.26)	(0.26)	(0.26)
female	-0.27 *	-0.27 *	-0.27 *
	(0.11)	(0.11)	(0.11)
poc	0.03	0.03	0.03
	(0.14)	(0.14)	(0.15)
age	0.04 ***	0.04 ***	0.04 ***
	(0.00)	(0.00)	(0.00)
trump_vote	-0.74 ***	-0.74 ***	-0.74 ***
	(0.13)	(0.12)	(0.12)
N	2116	2116	2116
AIC	2055.11	2055.11	2055.11
BIC	2156.94	2156.94	2156.94
Pseudo R2	0.22	0.22	0.22
* p < 0.001; p < 0.01; * p < 0.05.

plot_summs(logit, logit, logit, robust = list(FALSE, "HC1", "HC3"),
           model.names = c("Logit SE", "HC1 SE", "HC SE3"))

plot_summs(logit, logit, logit, robust = list(FALSE, "HC1", "HC3"),
           model.names = c("Logit SE", "HC1 SE", "HC SE3"), 
           colors=c("black", "darkgrey", "darkkhaki" ))

Graphing Regression Results

There are any number of ways to graph results from regression output. Above, we looked at how to graph the coefficients from some model. Here, we are going to look at how to graph the predicted value of the outcome variable by the independent variables in the model. Specifically, we will look at graphing one variable at a time plus 2 or more variables. We will also review how to add confidence intervals to both line and bar plots.

We use the ggpredict function from the ggeffects package to save the predicted value of y at whatever independent variables we specify. This package works on a wide variety of regressions including OLS and most GLMs including logit/probit.

First step is to use ggpredict to save the predicted values of y for your specified independent variable(s) before graphing them. We start by examining the estimated impact of annual income on the probability of a person donating to a political party or candidate. We include terms=("income_group") to specify to the ggpredict function which independent variable we want to calculate the predicted value of y for.

We don’t initially save the results because we want to review what values the other IVs take on in this specific equation. By default, it uses the mean of the numeric variable and the reference group of the factor variables. Here that means we are initially looking at the impact of income_group for the reference group of education, party strength and ideological commitment. This might not be the best way to reveal the impact that income can have so you can change the specify value of the independent variables to something that is more representative of the data. Always, be up front and mention what values each IV is set to in the prediction since it can have a big influence.

For OLS, the values of the other IVs aren’t as important since OLS assumes linearity between x and y. Logit does not make that assumption so the values of the other IVs are more important to the visualization of the results. You want to select options that are common in the data and that reflect any significance you find in the model results.

ggpredict(logit, terms=("income_group"))

## # Predicted probabilities of donate
## 
## income_group | Predicted |       95% CI
## ---------------------------------------
##            1 |      0.03 | [0.02, 0.06]
##            2 |      0.03 | [0.01, 0.05]
##            3 |      0.03 | [0.02, 0.06]
##            4 |      0.05 | [0.02, 0.09]
##            5 |      0.06 | [0.03, 0.12]
## 
## Adjusted for:
## *  party_str =     1
## *   ideo_str =     1
## *  education =     1
## *     female =  1.54
## *        poc =  0.24
## *        age = 53.93
## * trump_vote =  0.42

low<-ggpredict(logit, terms=("income_group"))
inc<-ggpredict(logit, terms=c("income_group", "education [3]", "ideo_str [4]", "party_str [3:4]"))

Now that we have saved the predicted probabilities at the specified IV levels, we can graph the results using ggplot. It is important to know that the order you input your variables into the ggpredict function will be the way the variables are named and ordered in the new tibble that contains the predicted probabilities. Specifically, it will follow this naming convention based on the order of your terms:

term 1 = x
term 2 = group
term 3 = facet
term 4 = panel

You cannot specify more than 4 terms here though so that is a limitation for really complex models. Here, we are going to graph based on the first term, income variable or x, and term 4, party strength variable or panel.

We specify that x=x, y = predicted,fill = as.factor(panel) to create our graph telling ggplot that we want to create different colored parts of the graph based on what is in the fill part of the code. We specify panel since that is the second term we want to graph.

We do this twice. First, without cleaning the graph up and second by updating the x-axis labels and color scheme used in the plot

ggplot(inc, aes(x = x, y = predicted,fill = as.factor(panel))) +
  geom_bar(stat = "identity",  width = 0.7 , position = position_dodge()) +
  theme_minimal(base_size = 7) +
  labs(x = "Annual Income", y = "Predicted Probability of Donating", 
       title = "Bar Chart by Response Level and Group")+
  geom_errorbar(aes(ymin=conf.low, ymax=conf.high),
                linewidth=.3,    # Thinner lines
                width=.2, position = position_dodge(width=.7))

ggplot(inc, aes(x = factor(x, labels = c("0-25K", "25-50K", "50-100K", "100-250K", "250K+")), 
                 y = predicted, 
                 group = as.factor(panel), 
                 fill = as.factor(panel))) +
  geom_bar(stat = "identity", width = 0.7, position = position_dodge()) +
  scale_fill_manual(name = "Partisan Strength",
                    values = c("lightgrey", "darkgrey"),
                    labels = c("3" = "Weak", "4" = "Strong")) +  # Update legend labels
  theme_minimal(base_size = 10) +
  labs(x = "Annual Income", 
       y = "Predicted Probability of Donating", 
       title = "Bar Chart by Response Level and Group") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                linewidth = 0.3,    # Thinner lines
                width = 0.2, 
                position = position_dodge(width = 0.7))

Here, we change the values of the second variable to graph from 3 and 4 to 1 and 4. We then change the name of the variable in the legend to account for this shift.

We also add a note to the graph detailing the levels that the other variables are set to so that your audience can understand where the values came from.

inc2<-ggpredict(logit, terms=c("income_group", "education [3]", "ideo_str [4]", "party_str [1,4]"))

ggplot(inc2, aes(x = factor(x, labels = c("0-25K", "25-50K", "50-100K", "100-250K", "250K+")), 
                 y = predicted, 
                 group = as.factor(panel), 
                 fill = as.factor(panel))) +
  geom_bar(stat = "identity", width = 0.7, position = position_dodge()) +
  scale_fill_manual(name = "Partisan Strength",
                    values = c("lightgrey", "darkgrey"),
                    labels = c("1" = "Independent", "4" = "Strong")) +  # Update legend labels
  theme_minimal(base_size = 10) +
  labs(x = "Annual Income", 
       y = "Predicted Probability of Donating", 
       title = "Bar Chart by Response Level and Group") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                linewidth = 0.3,    # Thinner lines
                width = 0.2, 
                position = position_dodge(width = 0.7))

footnote_text <- str_wrap("Note: Education level is set to 'BA/BS', Ideology is 'Extreme', and Party Strength ranges from 'Independent' to 'Strong'.", width = 70)

# Create the ggplot with the footnote annotation
ggplot(inc2, aes(x = factor(x, labels = c("0-25K", "25-50K", "50-100K", "100-250K", "250K+")), 
                         y = predicted, 
                         group = as.factor(panel), 
                         fill = as.factor(panel))) +
  geom_bar(stat = "identity", width = 0.7, position = position_dodge()) +
  scale_fill_manual(name = "Partisan Strength",
                    values = c("lightgrey", "darkgrey"),
                    labels = c("1" = "Independent", "4" = "Strong")) +  # Update legend labels
  theme_minimal(base_size = 10) +
  labs(x = "Annual Income", 
       y = "Predicted Probability of Donating", 
       title = "Bar Chart by Response Level and Group", 
       subtitle= footnote_text) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                linewidth = 0.3,    # Thinner lines
                width = 0.2, 
                position = position_dodge(width = 0.7))

Next, we use age as the primary variable on the x-axis and switch to a line plot, geom_line, to account for the continuous nature of the variable. We also need to switch to geom_ribbon for the confidence intervals.

age<-ggpredict(logit, terms=c("age", "income_group [4]", "education [3]", "party_str [1,4]", "ideo_str"))

## `terms` must have not more than four values. Using first four values
##   now.

## Data were 'prettified'. Consider using `terms="age [all]"` to get smooth
##   plots.

age

## # Predicted probabilities of donate
## 
## age | Predicted |       95% CI
## ------------------------------
##  10 |      0.03 | [0.02, 0.06]
##  20 |      0.10 | [0.07, 0.16]
##  30 |      0.14 | [0.10, 0.21]
##  40 |      0.19 | [0.14, 0.27]
##  60 |      0.16 | [0.10, 0.27]
##  80 |      0.50 | [0.40, 0.61]
## 
## Adjusted for:
## *   ideo_str =    1
## *     female = 1.54
## *        poc = 0.24
## * trump_vote = 0.42

## 
## Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show
##   all rows.

ggplot(age, aes(x = x, y = predicted, 
                         group = as.factor(panel), 
                         color = as.factor(panel))) +
  geom_line(linewidth = 1) +  # Line for the predicted probabilities
  geom_point(size = 2) +  # Points on the lines
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = as.factor(panel)), alpha = 0.2) +  # Confidence intervals
  scale_color_manual(name = "Partisan Strength",
                     values = c("lightgrey", "darkgrey"),
                     labels = c("1" = "Independent", "4" = "Strong")) +  # Update legend labels
  scale_fill_manual(name = "Partisan Strength",
                    values = c("lightgrey", "darkgrey"),
                    labels = c("1" = "Independent", "4" = "Strong")) +  # Update fill legend labels
  theme_minimal(base_size = 10) +
  labs(x = "Age", 
       y = "Predicted Probability of Donating", 
       title = "Line Chart by Response Level and Group")

Graphing Interactions

If you have an interactive model, it is important to visualize the results to help audience understand the interactive impact. Here, we interact age and trump supporter or not once again predicting if a person donated money or not. We also simplify the model by using income as a numeric rather than factor variable.

logitx<-glm(logit <- glm(donate ~ factor(party_str) + factor(ideo_str) +  factor(education) + income_group + female + poc + age*factor(trump_vote) , data = anes, family = binomial(link = "logit")))

summary(logitx)

## 
## Call:
## glm(formula = logit <- glm(donate ~ factor(party_str) + factor(ideo_str) + 
##     factor(education) + income_group + female + poc + age * factor(trump_vote), 
##     data = anes, family = binomial(link = "logit")))
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.6375  -0.2818  -0.1411   0.1775   1.1748  
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -0.2200308  0.0669928  -3.284 0.001039 ** 
## factor(party_str)2       0.0332192  0.0383504   0.866 0.386477    
## factor(party_str)3       0.0037594  0.0384088   0.098 0.922037    
## factor(party_str)4       0.1136911  0.0365821   3.108 0.001910 ** 
## factor(ideo_str)2       -0.0075193  0.0254468  -0.295 0.767649    
## factor(ideo_str)3        0.1033517  0.0263243   3.926 8.91e-05 ***
## factor(ideo_str)4        0.1606807  0.0343048   4.684 2.99e-06 ***
## factor(education)2       0.1017441  0.0259974   3.914 9.38e-05 ***
## factor(education)3       0.1634451  0.0277790   5.884 4.66e-09 ***
## factor(education)4       0.2293728  0.0301045   7.619 3.84e-14 ***
## income_group             0.0295729  0.0082272   3.595 0.000332 ***
## female                  -0.0429262  0.0177991  -2.412 0.015964 *  
## poc                     -0.0030700  0.0214324  -0.143 0.886113    
## age                      0.0044638  0.0007221   6.182 7.60e-10 ***
## factor(trump_vote)1     -0.2326073  0.0631488  -3.683 0.000236 ***
## age:factor(trump_vote)1  0.0022266  0.0011109   2.004 0.045158 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.159248)
## 
##     Null deviance: 392.21  on 2115  degrees of freedom
## Residual deviance: 334.42  on 2100  degrees of freedom
##   (1619 observations deleted due to missingness)
## AIC: 2135.2
## 
## Number of Fisher Scoring iterations: 2

When graphing an interactive model, you need to specify the different values you want the two variables included in the interaction term to be set to. Here, since we are using age, a continuous variable, and trump supporter or not, a dichotomous variable, we specify first age and trump_vote [0, 1]. By not including any other information about age, ggpredict will select specific values that won’t always make sense. Here, it selects the ages of 10 to 80 split by 10 year intervals. This doesn’t make sense for this data since no one is under 18. Next, we specify the exact values of age we want to see graphed age [18:78, by=10] starting at age 18 and going to age 78 by 10 year intervals. All of this is customizable.

Because age is a continuous variable, we use geom_line along with geom_ribbon to graphically display the predicted probability of donating at each age group along with the associated 95% CI. We also group the results by the trump_vote variable to split the results by the interaction term.

age<-ggpredict(logitx, terms=c( "age","education [4]", "trump_vote [0,1]", "party_str [4]", "ideo_str [4]"))

## `terms` must have not more than four values. Using first four values
##   now.

age

## # Predicted values of donate
## 
## age | Predicted |        95% CI
## -------------------------------
##  10 |      0.19 | [ 0.10, 0.28]
##  20 |      0.04 | [-0.05, 0.13]
##  30 |      0.11 | [ 0.03, 0.19]
##  40 |      0.18 | [ 0.11, 0.25]
##  60 |      0.41 | [ 0.35, 0.47]
##  80 |      0.45 | [ 0.37, 0.53]
## 
## Adjusted for:
## *     ideo_str =    1
## * income_group = 2.93
## *       female = 1.54
## *          poc = 0.24

## 
## Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show
##   all rows.

age<-ggpredict(logitx, terms=c( "age [18:78, by=10]","education [4]", "trump_vote [0,1]", "party_str [4]", "ideo_str [4]"))

## `terms` must have not more than four values. Using first four values
##   now.

ggplot(age, aes(x = factor(x), 
                 y = predicted, 
                         group = as.factor(facet), 
                         color = as.factor(facet))) +
  geom_line(linewidth = 1) +  # Line for the predicted probabilities
  geom_point(size = 2) +  # Points on the lines
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.1) +  # Confidence intervals
  scale_color_manual(name = "Trump Supporter",
                     values = c("black", "darkgrey"),
                     labels = c("0" = "Non Trump Supporter", "1" = "Trump Supporter")) +  # Update legend labels
  theme_minimal(base_size = 10) +
  labs(x = "Age", 
       y = "Predicted Probability of Donating", 
       title = "Line Chart by Response Level and Group",
       subtitle="Note: Party Strength = Strong & Ideological Committment = Extreme")

We could also try this as a bar-plot if we wanted to see if we like how that looks compared to the line graph.

ggplot(age, aes(x = factor(x), 
                 y = predicted, 
                 group = facet, 
                 fill = facet)) +
  geom_bar(stat = "identity", width = 0.7, position = position_dodge()) +
  scale_fill_manual(name = "Trump Supporter or Not",
                    values = c("lightgrey", "darkgrey"),
                    labels = c("0" = "Non Trump", "1" = "Trump")) +  # Update legend labels
  theme_minimal(base_size = 10) +
  labs(x = "Age", 
       y = "Predicted Probability of Donating", 
       title = "Bar Chart by Response Level and Group") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                linewidth = 0.3,    # Thinner lines
                width = 0.2, 
                position = position_dodge(width = 0.7))

At this point, it would be up to you as the analyst to decide how to present the results. I personally like the bar chart graph better but if we selected more values for age then the line plot would probably be clearer.

age<-ggpredict(logitx, terms=c( "age [18:78, by=5]","education [4]", "trump_vote [0,1]", "party_str [4]", "ideo_str [4]"))

## `terms` must have not more than four values. Using first four values
##   now.

ggplot(age, aes(x = factor(x), 
                 y = predicted, 
                         group = as.factor(facet), 
                         color = as.factor(facet))) +
  geom_line(linewidth = 1) +  # Line for the predicted probabilities
  geom_point(size = 2) +  # Points on the lines
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.1) +  # Confidence intervals
  scale_color_manual(name = "Trump Supporter",
                     values = c("black", "darkgrey"),
                     labels = c("0" = "Non Trump Supporter", "1" = "Trump Supporter")) +  # Update legend labels
  theme_minimal(base_size = 10) +
  labs(x = "Age", 
       y = "Predicted Probability of Donating", 
       title = "Line Chart by Response Level and Group",
       subtitle="Note: Party Strength = Strong & Ideological Committment = Extreme")

Data Visualization Tutorial

CEStapleton

2025-01-22