Stata - `marginsplot` & `mplotoffset` commands for plotting average marginal effects

Mark Bounthavong

30 January 2025

Introduction

In Stata, users have a lot of flexibility in creating plots, particularly after the `margins` command has been executed. After fitting a regression model, we can use `margins` to estimate the average marginal effect of one variable at different values of another variable. Once the average marginal effects have been estimated, we can plot them with the `marginsplot` or `mplotoffset` commands. These are powerful tools for visualizing average marginal effects, particularly when the model includes interaction terms.

This article will review some basic features of the marginsplot and mplotoffset commands and provide some practical examples of customization.

Motivating example

We will use data from the Agency for Healthcare Research and Quality (AHRQ) Medical Expenditure Panel Survey (MEPS).

Load data from GitHub repository

We will load our data from the Stata-tutorials GitHub repository. In Stata, we can load a CSV file directly from its URL with the import delimited command. Once the data are loaded, we can explore them with the describe command.

// SELECT DIRECTORY / LOAD DATA FROM GITHUB
clear all
import delimited "https://raw.githubusercontent.com/mbounthavong/Stata-tutorials/refs/heads/main/Data/meps22.csv"

// DESCRIBE DATA
describe

. // SELECT DIRECTORY / LOAD DATA FROM GITHUB

. clear all

. import delimited "https://raw.githubusercontent.com/mbounthavong/Stata-tutori
> als/refs/heads/main/Data/meps22.csv"
(encoding automatically selected: ISO-8859-1)
(5 vars, 18,101 obs)

. 
. // DESCRIBE DATA
. describe

Contains data
 Observations:        18,101                  
    Variables:             5                  
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
dupersid        double  %10.0g                
age             byte    %8.0g                 
race            str43   %43s                  
povcat          str15   %15s                  
totexp          long    %12.0g                
-------------------------------------------------------------------------------
Sorted by: 
     Note: Dataset has changed since last saved.

.

The following variables are listed:

dupersid: This is the unique identifier of the respondent
age: Age of the respondent - Continuous data type
race: Race of the respondent - Categorical data type
povcat: Poverty category of the respondent - Categorical data type
totexp: Total healthcare expenditures (costs, $US) - Continuous data type

Create new variables

Let’s take a look at the race and povcat variables.

// RACE CATEGORIES
tab race, m

// POVERTY CATEGORIES
tab povcat, m

. // RACE CATEGORIES
. tab race, m

                                   race |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
       1 WHITE - NO OTHER RACE REPORTED |     13,639       75.35       75.35
       2 BLACK - NO OTHER RACE REPORTED |      2,683       14.82       90.17
3 AMER INDIAN/ALASKA NATIVE - NO OTHE.. |        145        0.80       90.97
 4 ASIAN/NATV HAWAIIAN/PACFC ISL-NO OTH |      1,090        6.02       96.99
              6 MULTIPLE RACES REPORTED |        544        3.01      100.00
----------------------------------------+-----------------------------------
                                  Total |     18,101      100.00

. 
. // POVERTY CATEGORIES
. tab povcat, m

         povcat |      Freq.     Percent        Cum.
----------------+-----------------------------------
1 POOR/NEGATIVE |      2,788       15.40       15.40
    2 NEAR POOR |        815        4.50       19.90
   3 LOW INCOME |      2,387       13.19       33.09
4 MIDDLE INCOME |      5,032       27.80       60.89
  5 HIGH INCOME |      7,079       39.11      100.00
----------------+-----------------------------------
          Total |     18,101      100.00

.

The race and povcat variables are stored as strings. To use them as factor variables in the regression, we need numeric versions, so we create new variables with numeric codes and value labels.

// NEW RACE VARIABLE
codebook race, tab(1000) /* Provides the full label in a tabulate format */

gen race1 = .
    replace race1 = 0 if race == "1 WHITE - NO OTHER RACE REPORTED"
    replace race1 = 1 if race == "2 BLACK - NO OTHER RACE REPORTED"
    replace race1 = 2 if race == "3 AMER INDIAN/ALASKA NATIVE - NO OTHER RACE"
    replace race1 = 3 if race == "4 ASIAN/NATV HAWAIIAN/PACFC ISL-NO OTH"
    replace race1 = 4 if race == "6 MULTIPLE RACES REPORTED"

label define race_lbl 0 "White" 1 "Black" 2 "AI/AN" 3 "Asian" 4 "Multiple"
label values race1 race_lbl
tab race1, m 

// NEW POVERTY VARIABLE
codebook povcat, tab(1000)  /* Provides the full label in a tabulate format */

gen poverty = .
    replace poverty = 0 if povcat == "1 POOR/NEGATIVE"
    replace poverty = 1 if povcat == "2 NEAR POOR"
    replace poverty = 2 if povcat == "3 LOW INCOME"
    replace poverty = 3 if povcat == "4 MIDDLE INCOME"
    replace poverty = 4 if povcat == "5 HIGH INCOME"
    
label define poverty_lbl 0 "Poor" 1 "Near Poor" 2 "Low-income" 3 "Middle-income" 4 "High-income"
label values poverty poverty_lbl
tab poverty, m

. // NEW RACE VARIABLE
. codebook race, tab(1000) /* Provides the full label in a tabulate format */

-------------------------------------------------------------------------------
race                                                                (unlabeled)
-------------------------------------------------------------------------------

                  Type: String (str43)

         Unique values: 5                         Missing "": 0/18,101

            Tabulation: Freq.  Value
                       13,639  "1 WHITE - NO OTHER RACE REPORTED"
                        2,683  "2 BLACK - NO OTHER RACE REPORTED"
                          145  "3 AMER INDIAN/ALASKA NATIVE - NO OTHER
                               RACE"
                        1,090  "4 ASIAN/NATV HAWAIIAN/PACFC ISL-NO
                               OTH"
                          544  "6 MULTIPLE RACES REPORTED"

               Warning: Variable has embedded blanks.

. 
. gen race1 = .
(18,101 missing values generated)

.         replace race1 = 0 if race == "1 WHITE - NO OTHER RACE REPORTED"
(13,639 real changes made)

.         replace race1 = 1 if race == "2 BLACK - NO OTHER RACE REPORTED"
(2,683 real changes made)

.         replace race1 = 2 if race == "3 AMER INDIAN/ALASKA NATIVE - NO OTHER 
> RACE"
(145 real changes made)

.         replace race1 = 3 if race == "4 ASIAN/NATV HAWAIIAN/PACFC ISL-NO OTH"
(1,090 real changes made)

.         replace race1 = 4 if race == "6 MULTIPLE RACES REPORTED"
(544 real changes made)

. 
. label define race_lbl 0 "White" 1 "Black" 2 "AI/AN" 3 "Asian" 4 "Multiple"

. label values race1 race_lbl

. tab race1, m 

      race1 |      Freq.     Percent        Cum.
------------+-----------------------------------
      White |     13,639       75.35       75.35
      Black |      2,683       14.82       90.17
      AI/AN |        145        0.80       90.97
      Asian |      1,090        6.02       96.99
   Multiple |        544        3.01      100.00
------------+-----------------------------------
      Total |     18,101      100.00

. 
. // NEW POVERTY VARIABLE
. codebook povcat, tab(1000)  /* Provides the full label in a tabulate format *
> /

-------------------------------------------------------------------------------
povcat                                                              (unlabeled)
-------------------------------------------------------------------------------

                  Type: String (str15)

         Unique values: 5                         Missing "": 0/18,101

            Tabulation: Freq.  Value
                        2,788  "1 POOR/NEGATIVE"
                          815  "2 NEAR POOR"
                        2,387  "3 LOW INCOME"
                        5,032  "4 MIDDLE INCOME"
                        7,079  "5 HIGH INCOME"

               Warning: Variable has embedded blanks.

. 
. gen poverty = .
(18,101 missing values generated)

.         replace poverty = 0 if povcat == "1 POOR/NEGATIVE"
(2,788 real changes made)

.         replace poverty = 1 if povcat == "2 NEAR POOR"
(815 real changes made)

.         replace poverty = 2 if povcat == "3 LOW INCOME"
(2,387 real changes made)

.         replace poverty = 3 if povcat == "4 MIDDLE INCOME"
(5,032 real changes made)

.         replace poverty = 4 if povcat == "5 HIGH INCOME"
(7,079 real changes made)

.         
. label define poverty_lbl 0 "Poor" 1 "Near Poor" 2 "Low-income" 3 "Middle-inco
> me" 4 "High-income"

. label values poverty poverty_lbl

. tab poverty, m 

      poverty |      Freq.     Percent        Cum.
--------------+-----------------------------------
         Poor |      2,788       15.40       15.40
    Near Poor |        815        4.50       19.90
   Low-income |      2,387       13.19       33.09
Middle-income |      5,032       27.80       60.89
  High-income |      7,079       39.11      100.00
--------------+-----------------------------------
        Total |     18,101      100.00

.
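As a cross-check, the same recoding can be sketched without typing out each category string, because every value of race and povcat begins with a one-digit code. This sketch assumes the data and the race1/poverty variables created above are in memory; the variable names race_code, povcat_code, race1_alt, and poverty_alt are hypothetical, chosen only for illustration.

```stata
* Assumes meps22.csv is loaded and race1/poverty were created as above.
* Hypothetical helper variables: race_code, povcat_code, race1_alt, poverty_alt.

* Each category string starts with a one-digit code ("1 WHITE ...", "5 HIGH INCOME")
gen race_code   = real(substr(race, 1, 1))
gen povcat_code = real(substr(povcat, 1, 1))

* Map the source codes onto the 0-based codes used above
* (race code 5 does not appear in these data; 6 = multiple races)
recode race_code   (1=0) (2=1) (3=2) (4=3) (6=4), generate(race1_alt)
recode povcat_code (1=0) (2=1) (3=2) (4=3) (5=4), generate(poverty_alt)
label values race1_alt race_lbl
label values poverty_alt poverty_lbl

* Stop with an error if the shortcut disagrees with the replace-based coding
assert race1_alt == race1
assert poverty_alt == poverty
```

Matching on the leading code is less fragile than matching full strings, where a single mistyped character silently leaves observations missing.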

Now that we have race1 and poverty variables in numeric form with value labels, we can use them as factor variables in our analysis.

Regression model

Next, we will fit a linear regression model with total healthcare expenditures (totexp) as the dependent variable and age, race1, and poverty as the independent variables. We will also include an interaction between age and poverty so that the difference in expenditures between poverty categories is allowed to vary with age; the margins command will help us interpret this interaction.

Here is the structural form of the linear regression model:

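\[\begin{aligned} Y_{i} = \beta_{0} + \beta_{1}Age_{i} + \beta_{2}Poverty_{i} + \beta_{3}(Age_{i} \times Poverty_{i}) + \beta_{4}Race_{i} + \epsilon_{i} \end{aligned}\]

Here, Poverty and Race stand for sets of indicator variables (with Poor and White as the reference categories), so the poverty, interaction, and race terms each contribute four coefficients, one per non-reference category.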

To run the linear regression model in Stata, we use the following code chunk. Note that we are running this model with age as a continuous term and poverty as a factor:

// REGRESSION MODEL
glm totexp c.age i.poverty c.age#i.poverty i.race1, family("Gaussian") link("identity") vce(robust)

. // REGRESSION MODEL
. glm totexp c.age i.poverty c.age#i.poverty i.race1, family("Gaussian") link("
> identity") vce(robust)

Iteration 0:  Log pseudolikelihood = -208498.48  

Generalized linear models                         Number of obs   =     18,101
Optimization     : ML                             Residual df     =     18,087
                                                  Scale parameter =   5.93e+08
Deviance         =  1.07194e+13                   (1/df) Deviance =   5.93e+08
Pearson          =  1.07194e+13                   (1/df) Pearson  =   5.93e+08

Variance function: V(u) = 1                       [Gaussian]
Link function    : g(u) = u                       [Identity]

                                                  AIC             =   23.03878
Log pseudolikelihood = -208498.4763               BIC             =   1.07e+13

------------------------------------------------------------------------------
             |               Robust
      totexp | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   237.6858     27.462     8.66   0.000     183.8613    291.5104
             |
     poverty |
  Near Poor  |   10488.48   11060.77     0.95   0.343    -11190.23     32167.2
 Low-income  |   246.2486   1748.971     0.14   0.888    -3181.671    3674.169
Middle-in~e  |  -196.0182   1421.265    -0.14   0.890    -2981.647     2589.61
High-income  |   2020.086   1574.106     1.28   0.199    -1065.105    5105.278
             |
     poverty#|
       c.age |
  Near Poor  |  -156.7318   171.3784    -0.91   0.360    -492.6272    179.1636
 Low-income  |  -20.34791   35.95424    -0.57   0.571    -90.81694    50.12111
Middle-in~e  |  -30.38333   30.51265    -1.00   0.319    -90.18702    29.42036
High-income  |  -54.06838   32.07875    -1.69   0.092    -116.9416    8.804808
             |
       race1 |
      Black  |  -1184.321   448.4658    -2.64   0.008    -2063.298   -305.3441
      AI/AN  |  -377.8293     1530.5    -0.25   0.805    -3377.554    2621.896
      Asian  |  -2736.707   514.3436    -5.32   0.000    -3744.802   -1728.612
   Multiple  |   1686.795   1039.337     1.62   0.105    -350.2694    3723.859
             |
       _cons |  -2353.181   1312.973    -1.79   0.073    -4926.559    220.1983
------------------------------------------------------------------------------

.
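The interaction can also be written with Stata's `##` operator, which expands `c.age##i.poverty` into `c.age i.poverty c.age#i.poverty`. The sketch below, assuming the variables created above, fits the model both ways to confirm that the coefficients match, and then uses `testparm` for a joint Wald test of the four age-by-poverty interaction terms. The matrix names `b_long` and `b_short` are illustrative.

```stata
* Assumes totexp, age, poverty, and race1 exist as created above.
* Model written out term by term (as in the text); save the coefficients
quietly glm totexp c.age i.poverty c.age#i.poverty i.race1, ///
    family(gaussian) link(identity) vce(robust)
matrix b_long = e(b)

* Same model using the ## (main effects + interaction) shorthand
quietly glm totexp c.age##i.poverty i.race1, ///
    family(gaussian) link(identity) vce(robust)
matrix b_short = e(b)

* Relative difference between the two coefficient vectors (should be ~0)
display mreldif(b_long, b_short)

* Joint Wald test that all four age-by-poverty interaction coefficients are 0
testparm c.age#i.poverty
```

Because both fits describe the same model, the estimates left in memory are identical to those shown above, so the margins commands that follow are unaffected.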

Now that we have the regression model output, we can move to the next step: using the `margins` command to estimate the average marginal effects.

Average marginal effect

For this example, we want to estimate the average marginal effect of poverty on total healthcare expenditures at various ages. In other words, we want to estimate the average difference in healthcare expenditures between each poverty category and the Poor (reference) category at ages 25, 35, 45, 55, and 65 years.
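Because the model is linear and race enters only additively, this difference is the same for every respondent at a given age, so the average marginal effect can be worked out by hand from the regression coefficients. Writing β2k and β3k for the poverty and age-by-poverty coefficients of category k (Poor is the reference), the race terms cancel and the difference at age a is shown below, with the Near Poor vs. Poor comparison at age 25 worked out from the regression table:

\[\begin{aligned}
\Delta_{k}(a) &= E[Y_{i} \mid Age_{i} = a, Poverty_{i} = k, Race_{i}] - E[Y_{i} \mid Age_{i} = a, Poverty_{i} = \text{Poor}, Race_{i}] = \beta_{2k} + \beta_{3k}\,a \\
\Delta_{\text{Near Poor}}(25) &= 10488.48 + (-156.7318)(25) = 10488.48 - 3918.295 = 6570.185 \approx 6570.19
\end{aligned}\]

Each 10-year increase in age therefore shifts the Near Poor vs. Poor difference by 10 × (−156.7318) ≈ −1567.32, which is the step between consecutive rows for 1.poverty in the margins output below.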

margins, dydx(poverty) at(age = (25 35 45 55 65))

. margins, dydx(poverty) at(age = (25 35 45 55 65))

Average marginal effects                                Number of obs = 18,101
Model VCE: Robust

Expression: Predicted mean totexp, predict()
dy/dx wrt:  1.poverty 2.poverty 3.poverty 4.poverty
1._at: age = 25
2._at: age = 35
3._at: age = 45
4._at: age = 55
5._at: age = 65

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
0.poverty    |  (base outcome)
-------------+----------------------------------------------------------------
1.poverty    |
         _at |
          1  |    6570.19   6818.189     0.96   0.335    -6793.216     19933.6
          2  |   5002.872   5140.642     0.97   0.330      -5072.6    15078.34
          3  |   3435.554   3498.399     0.98   0.326    -3421.182    10292.29
          4  |   1868.236   1981.288     0.94   0.346    -2015.018     5751.49
          5  |   300.9179   1219.147     0.25   0.805    -2088.567    2690.403
-------------+----------------------------------------------------------------
2.poverty    |
         _at |
          1  |  -262.4492     958.86    -0.27   0.784     -2141.78    1616.882
          2  |  -465.9284   718.3771    -0.65   0.517    -1873.922    942.0648
          3  |  -669.4075   609.3112    -1.10   0.272    -1863.636    524.8205
          4  |  -872.8866   696.4167    -1.25   0.210    -2237.838     492.065
          5  |  -1076.366   925.8908    -1.16   0.245    -2891.078    738.3469
-------------+----------------------------------------------------------------
3.poverty    |
         _at |
          1  |  -955.6014   754.4823    -1.27   0.205    -2434.359    523.1566
          2  |  -1259.435   561.0709    -2.24   0.025    -2359.114   -159.7559
          3  |  -1563.268   496.5502    -3.15   0.002    -2536.488   -590.0475
          4  |  -1867.101   603.7614    -3.09   0.002    -3050.452   -683.7506
          5  |  -2170.935   817.7396    -2.65   0.008    -3773.675   -568.1945
-------------+----------------------------------------------------------------
4.poverty    |
         _at |
          1  |   668.3766   862.0841     0.78   0.438    -1021.277     2358.03
          2  |   127.6928   639.9271     0.20   0.842    -1126.541    1381.927
          3  |  -412.9911   530.6915    -0.78   0.436    -1453.127    627.1452
          4  |  -953.6749   599.6412    -1.59   0.112     -2128.95    221.6002
          5  |  -1494.359   802.0691    -1.86   0.062    -3066.385    77.66783
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

.
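To see where these numbers come from, individual rows can be reproduced from the glm coefficients with `lincom`. This sketch assumes the glm model above is still the active estimation result (margins does not replace it unless its `post` option is used). The coefficient names are taken from the labels in the regression table; treat them as an assumption and check them with `glm, coeflegend` if your output differs. The scalar names are illustrative.

```stata
* Assumes the glm model above is still the active estimation result.

* Near Poor (1) vs Poor (0) at age 25: main effect + 25 x interaction
lincom _b[1.poverty] + 25*_b[1.poverty#c.age]
scalar np_25    = r(estimate)
scalar np_25_se = r(se)

* Middle-income (3) vs Poor (0) at age 45
lincom _b[3.poverty] + 45*_b[3.poverty#c.age]
scalar mid_45 = r(estimate)

* marginsplot graphs the most recent margins results, so re-run margins
* before continuing with the plotting steps
quietly margins, dydx(poverty) at(age = (25 35 45 55 65))
```

The estimates and standard errors should match the margins rows for 1.poverty at _at = 1 and 3.poverty at _at = 3, because in a linear model each contrast is a linear combination of coefficients, for which the delta-method standard error is exact.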

Once the average marginal effects are estimated, we use `marginsplot` to plot the differences in healthcare expenditures between poverty categories at ages 25, 35, 45, 55, and 65 years. Each difference uses the Poor category as the reference.

Plotting the average marginal effect using `marginsplot`

marginsplot
quietly graph export marginsplot1.svg, replace

. marginsplot

Variables that uniquely identify margins: age _deriv

. quietly graph export marginsplot1.svg, replace

.

Average difference in healthcare expenditures between poverty categories

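We can generate the same estimates and plot with the `r.` (reference-category) contrast operator applied to poverty, which compares each poverty category with the base category, Poor.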

margins r.poverty, at(age = (25 35 45 55 65))

marginsplot
quietly graph export marginsplot2.svg, replace

. margins r.poverty, at(age = (25 35 45 55 65))

Contrasts of predictive margins                         Number of obs = 18,101
Model VCE: Robust

Expression: Predicted mean totexp, predict()
1._at: age = 25
2._at: age = 35
3._at: age = 45
4._at: age = 55
5._at: age = 65

--------------------------------------------------------------
                           |         df        chi2     P>chi2
---------------------------+----------------------------------
               poverty@_at |
    (Near Poor vs Poor) 1  |          1        0.93     0.3352
    (Near Poor vs Poor) 2  |          1        0.95     0.3305
    (Near Poor vs Poor) 3  |          1        0.96     0.3261
    (Near Poor vs Poor) 4  |          1        0.89     0.3457
    (Near Poor vs Poor) 5  |          1        0.06     0.8050
   (Low-income vs Poor) 1  |          1        0.07     0.7843
   (Low-income vs Poor) 2  |          1        0.42     0.5166
   (Low-income vs Poor) 3  |          1        1.21     0.2719
   (Low-income vs Poor) 4  |          1        1.57     0.2101
   (Low-income vs Poor) 5  |          1        1.35     0.2450
(Middle-income vs Poor) 1  |          1        1.60     0.2053
(Middle-income vs Poor) 2  |          1        5.04     0.0248
(Middle-income vs Poor) 3  |          1        9.91     0.0016
(Middle-income vs Poor) 4  |          1        9.56     0.0020
(Middle-income vs Poor) 5  |          1        7.05     0.0079
  (High-income vs Poor) 1  |          1        0.60     0.4382
  (High-income vs Poor) 2  |          1        0.04     0.8418
  (High-income vs Poor) 3  |          1        0.61     0.4364
  (High-income vs Poor) 4  |          1        2.53     0.1117
  (High-income vs Poor) 5  |          1        3.47     0.0624
                    Joint  |          8       21.94     0.0050
--------------------------------------------------------------

----------------------------------------------------------------------------
                           |            Delta-method
                           |   Contrast   std. err.     [95% conf. interval]
---------------------------+------------------------------------------------
               poverty@_at |
    (Near Poor vs Poor) 1  |    6570.19   6818.189     -6793.216     19933.6
    (Near Poor vs Poor) 2  |   5002.872   5140.642       -5072.6    15078.34
    (Near Poor vs Poor) 3  |   3435.554   3498.399     -3421.182    10292.29
    (Near Poor vs Poor) 4  |   1868.236   1981.288     -2015.018     5751.49
    (Near Poor vs Poor) 5  |   300.9179   1219.147     -2088.567    2690.403
   (Low-income vs Poor) 1  |  -262.4492     958.86      -2141.78    1616.882
   (Low-income vs Poor) 2  |  -465.9284   718.3771     -1873.922    942.0648
   (Low-income vs Poor) 3  |  -669.4075   609.3112     -1863.636    524.8205
   (Low-income vs Poor) 4  |  -872.8866   696.4167     -2237.838     492.065
   (Low-income vs Poor) 5  |  -1076.366   925.8908     -2891.078    738.3469
(Middle-income vs Poor) 1  |  -955.6014   754.4823     -2434.359    523.1566
(Middle-income vs Poor) 2  |  -1259.435   561.0709     -2359.114   -159.7559
(Middle-income vs Poor) 3  |  -1563.268   496.5502     -2536.488   -590.0475
(Middle-income vs Poor) 4  |  -1867.101   603.7614     -3050.452   -683.7506
(Middle-income vs Poor) 5  |  -2170.935   817.7396     -3773.675   -568.1945
  (High-income vs Poor) 1  |   668.3766   862.0841     -1021.277     2358.03
  (High-income vs Poor) 2  |   127.6928   639.9271     -1126.541    1381.927
  (High-income vs Poor) 3  |  -412.9911   530.6915     -1453.127    627.1452
  (High-income vs Poor) 4  |  -953.6749   599.6412      -2128.95    221.6002
  (High-income vs Poor) 5  |  -1494.359   802.0691     -3066.385    77.66783
----------------------------------------------------------------------------

. 
. marginsplot

Variables that uniquely identify margins: age poverty

. quietly graph export marginsplot2.svg, replace

.

Average difference in healthcare expenditures between poverty categories
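The contrasts from `r.poverty` are differences between predictive margins, that is, the average predicted totexp for each poverty category at a given age. The minimal sketch below, assuming the glm model above is still active, estimates and plots those underlying predicted means; the matrix name `pm` is illustrative, and the column order noted in the comment is an assumption worth checking with `matrix list pm`.

```stata
* Assumes the glm model above is still the active estimation result.
* Average predicted totexp for each poverty category at each age
margins poverty, at(age = (25 35 45 55 65))
matrix pm = r(b)

* Columns are ordered 1._at#0.poverty, 1._at#1.poverty, ..., so columns 1 and 2
* hold Poor and Near Poor at age 25; their difference is the first contrast
display pm[1, 2] - pm[1, 1]

marginsplot

* marginsplot and mplotoffset use the most recent margins results,
* so restore the r.poverty contrasts before continuing
quietly margins r.poverty, at(age = (25 35 45 55 65))
```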

Using the `mplotoffset` command to add an offset

However, the estimates for the four poverty comparisons overlap at each age, which makes them hard to tell apart. To make the plot easier to read, we can offset the comparisons from one another along the x-axis. This requires the user-written `mplotoffset` command, which you can install in Stata with `ssc install mplotoffset`.

mplotoffset, offset(1.5)
quietly graph export marginsplot3.svg, replace

. mplotoffset, offset(1.5)

  Variables that uniquely identify margins: age poverty

. quietly graph export marginsplot3.svg, replace

.

Average difference in healthcare expenditures between poverty categories with an offset

Using the offset removes the overlap and improves the readability of the average marginal effect plot.

Improving the `mplotoffset` visualization - Changing the symbol

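The offset improves visibility, but the lines connecting the estimates within each poverty comparison are messy. We can remove them with further options to the `mplotoffset` command: `recast(dot)` draws the estimates as a dot plot instead of connected lines, and `dcol("none")` hides the dotted guide lines that a dot plot draws.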

Let’s change the symbol from a circle to a square and the color to navy. We’ll also add a reference line at 0, which indicates no difference in healthcare expenditures between the two poverty groups being compared, and color it cranberry.

mplotoffset, offset(1.5) ///
             plotopts(msymbol(square) msize(large) mcol("navy") dcol("none")) ///
             ciopts(lcol("navy")) ///
             recast(dot) ///
             yline(0, lcol("cranberry")) ///
             xtitle("Age (Years)") ///
             xlab( , nogrid) ///
             ytitle("Avg Difference in Total Healthcare Expenditures ($)") ylab(, nogrid) ///
             title("")  
quietly graph export marginsplot4.svg, replace

. mplotoffset, offset(1.5) ///
>                          plotopts(msymbol(square) msize(large) mcol("navy") d
> col("none")) ///
>                          ciopts(lcol("navy")) ///
>                          recast(dot) ///
>                          yline(0, lcol("cranberry")) ///
>                          xtitle("Age (Years)") ///
>                          xlab( , nogrid) ///
>                          ytitle("Avg Difference in Total Healthcare Expenditu
> res ($)") ylab(, nogrid) ///
>                          title("")      

  Variables that uniquely identify margins: age poverty

. quietly graph export marginsplot4.svg, replace

.

Average difference in healthcare expenditures between poverty categories with an offset and updated symbol

Improving the `mplotoffset` visualization - Changing the symbol & colors

Using the same color for every comparison is a little boring, so let’s give each comparison its own color.

We can do this by adding more options to the `mplotoffset` command.

Instead of a single plotopts() option that applies to every comparison, we use numbered options, one per comparison. For example, plot1opts() controls the first comparison (Near Poor v. Poor), plot2opts() the second (Low-income v. Poor), and so on.

The colors for the 95% confidence interval (CI) whiskers follow the same numbering pattern. Instead of plot#opts(), we use ci#opts(); hence, ci1opts() controls the 95% CI for the first comparison.

Here is the full code:

mplotoffset, offset(1.5) ///
             plot1opts(msymbol(square) msize(large) mcol("navy") dcol("none")) ///
             plot2opts(msymbol(square) msize(large) mcol("green") dcol("none")) ///
             plot3opts(msymbol(square) msize(large) mcol("cranberry") dcol("none")) ///
             plot4opts(msymbol(square) msize(large) mcol("orange") dcol("none")) ///
             ci1opts(lcol("navy")) ///
             ci2opts(lcol("green")) ///
             ci3opts(lcol("cranberry")) ///
             ci4opts(lcol("orange")) ///
             recast(dot) ///
             yline(0, lcol("cranberry")) ///
             xtitle("Age (Years)") xlab(, nogrid) ///
             ytitle("Avg Difference in Total Healthcare Expenditures ($)") ylab(, nogrid) ///
             title("") ///
             legend(order(1 "Near Poor v. Poor" 2 "Low-income v. Poor" 3 "Middle-income v. Poor" 4 "High-income v. Poor"))  
quietly graph export marginsplot5.svg, replace

. mplotoffset, offset(1.5) ///
>                          plot1opts(msymbol(square) msize(large) mcol("navy") 
> dcol("none")) ///
>                          plot2opts(msymbol(square) msize(large) mcol("green")
>  dcol("none")) ///
>                          plot3opts(msymbol(square) msize(large) mcol("cranber
> ry") dcol("none")) ///
>                          plot4opts(msymbol(square) msize(large) mcol("orange"
> ) dcol("none")) ///
>                          ci1opts(lcol("navy")) ///
>                          ci2opts(lcol("green")) ///
>                          ci3opts(lcol("cranberry")) ///
>                          ci4opts(lcol("orange")) ///
>                          recast(dot) ///
>                          yline(0, lcol("cranberry")) ///
>                          xtitle("Age (Years)") xlab(, nogrid) ///
>                          ytitle("Avg Difference in Total Healthcare Expenditu
> res ($)") ylab(, nogrid) ///
>                          title("") ///
>                          legend(order(1 "Near Poor v. Poor" 2 "Low-income v. 
> Poor" 3 "Middle-income v. Poor" 4 "High-income v. Poor"))  

  Variables that uniquely identify margins: age poverty

. quietly graph export marginsplot5.svg, replace

.

Average difference in healthcare expenditures between poverty categories with an offset and updated symbol and color

In the previous figure, the legend shows lines for the poverty group comparisons. We can show the plotted symbols instead by changing the numbers in the legend(order()) option. Instead of 1 "Near Poor v. Poor" 2 "Low-income v. Poor" 3 "Middle-income v. Poor" 4 "High-income v. Poor", we use 5 "Near Poor v. Poor" 6 "Low-income v. Poor" 7 "Middle-income v. Poor" 8 "High-income v. Poor": keys 1 to 4 display lines, whereas keys 5 to 8 display the symbols.

mplotoffset, offset(1.5) ///
             plot1opts(msymbol(square) msize(large) mcol("navy") dcol("none")) ///
             plot2opts(msymbol(square) msize(large) mcol("green") dcol("none")) ///
             plot3opts(msymbol(square) msize(large) mcol("cranberry") dcol("none")) ///
             plot4opts(msymbol(square) msize(large) mcol("orange") dcol("none")) ///
             ci1opts(lcol("navy")) ///
             ci2opts(lcol("green")) ///
             ci3opts(lcol("cranberry")) ///
             ci4opts(lcol("orange")) ///
             recast(dot) ///
             yline(0, lcol("cranberry")) ///
             xtitle("Age (Years)") xlab(, nogrid) ///
             ytitle("Avg Difference in Total Healthcare Expenditures ($)") ylab(, nogrid) ///
             title("") ///
             legend(order(5 "Near Poor v. Poor" 6 "Low-income v. Poor" 7 "Middle-income v. Poor" 8 "High-income v. Poor"))  
quietly graph export marginsplot6.svg, replace

. mplotoffset, offset(1.5) ///
>                          plot1opts(msymbol(square) msize(large) mcol("navy") 
> dcol("none")) ///
>                          plot2opts(msymbol(square) msize(large) mcol("green")
>  dcol("none")) ///
>                          plot3opts(msymbol(square) msize(large) mcol("cranber
> ry") dcol("none")) ///
>                          plot4opts(msymbol(square) msize(large) mcol("orange"
> ) dcol("none")) ///
>                          ci1opts(lcol("navy")) ///
>                          ci2opts(lcol("green")) ///
>                          ci3opts(lcol("cranberry")) ///
>                          ci4opts(lcol("orange")) ///
>                          recast(dot) ///
>                          yline(0, lcol("cranberry")) ///
>                          xtitle("Age (Years)") xlab(, nogrid) ///
>                          ytitle("Avg Difference in Total Healthcare Expenditu
> res ($)") ylab(, nogrid) ///
>                          title("") ///
>                          legend(order(5 "Near Poor v. Poor" 6 "Low-income v. 
> Poor" 7 "Middle-income v. Poor" 8 "High-income v. Poor"))  

  Variables that uniquely identify margins: age poverty

. quietly graph export marginsplot6.svg, replace

.

Average difference in healthcare expenditures between poverty categories with an offset and updated symbol, color, and legend
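The final command repeats the same marker and CI settings four times, with only the color changing. A more compact version can be sketched by building those options in a loop with local macros. It assumes `margins r.poverty, at(age = (25 35 45 55 65))` was the last margins command run and that `mplotoffset` is installed; the macro names `colors`, `col`, and `opts` are illustrative.

```stata
* Assumes margins r.poverty, at(age = (25 35 45 55 65)) was the last margins call
* and mplotoffset is installed (ssc install mplotoffset).
local colors navy green cranberry orange
local opts
forvalues k = 1/4 {
    local col : word `k' of `colors'
    * Marker settings for comparison k, then the matching CI color
    local opts `opts' plot`k'opts(msymbol(square) msize(large) mcol(`col') dcol(none))
    local opts `opts' ci`k'opts(lcol(`col'))
}

mplotoffset, offset(1.5) `opts' ///
    recast(dot) yline(0, lcol(cranberry)) ///
    xtitle("Age (Years)") xlab(, nogrid) ///
    ytitle("Avg Difference in Total Healthcare Expenditures ($)") ylab(, nogrid) ///
    title("") ///
    legend(order(5 "Near Poor v. Poor" 6 "Low-income v. Poor" ///
                 7 "Middle-income v. Poor" 8 "High-income v. Poor"))
```

With this layout, changing a color or symbol means editing one line instead of four.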

Conclusions

The marginsplot command in Stata is a remarkable tool for plotting the average marginal effects from a regression model. This is especially useful when interpreting interaction terms, because the raw coefficients from the regression output are difficult to interpret, particularly when a continuous variable is interacted with a categorical variable. The mplotoffset command extends marginsplot by adding an offset that improves the readability of the plot, and standard Stata graph options let us further customize the symbols, colors, and labels.

This tutorial provides only a few examples of the marginsplot and mplotoffset features. I encourage you to explore them in your own work.

Acknowledgements

The mplotoffset command was created by Nick Winter from the University of Virginia. (URL: https://econpapers.repec.org/software/bocbocode/s458344.htm)

Richard Williams’ paper on the margins command remains an invaluable introduction to using it to estimate average marginal effects: Williams R. Using the margins command to estimate and interpret adjusted predictions and marginal effects. Stata Journal. 2012;12(2):308-331.

Disclaimers & Disclosures

This is a work in progress and subject to future changes and updates.

This is for educational purposes only.
