Introduction
In Stata, users have a lot of flexibility in creating plots, particularly after the margins command has been executed. Once a regression command has been run, users can estimate the average marginal effect of a factor with respect to another variable using the margins command. Once the average marginal effect has been estimated, users can plot it using the marginsplot or mplotoffset commands. These are powerful tools that allow us to visualize the average marginal effects, particularly when we have interaction terms.
This article will review some basic features of the marginsplot and mplotoffset commands and provide some practical examples of customization.
Motivating example
We will use data from the Agency for Healthcare Research and Quality (AHRQ) Medical Expenditure Panel Survey (MEPS).
Load data from GitHub repository
We will load our data from the GitHub repository Stata tutorials. From Stata, we can load data using the import delimited command. Once the data is loaded, we can explore the data using the describe command.
// SELECT DIRECTORY / LOAD DATA FROM GITHUB
clear all
import delimited "https://raw.githubusercontent.com/mbounthavong/Stata-tutorials/refs/heads/main/Data/meps22.csv"
// DESCRIBE DATA
describe
. // SELECT DIRECTORY / LOAD DATA FROM GITHUB
. clear all
. import delimited "https://raw.githubusercontent.com/mbounthavong/Stata-tutori
> als/refs/heads/main/Data/meps22.csv"
(encoding automatically selected: ISO-8859-1)
(5 vars, 18,101 obs)
.
. // DESCRIBE DATA
. describe
Contains data
Observations: 18,101
Variables: 5
-------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-------------------------------------------------------------------------------
dupersid double %10.0g
age byte %8.0g
race str43 %43s
povcat str15 %15s
totexp long %12.0g
-------------------------------------------------------------------------------
Sorted by:
Note: Dataset has changed since last saved.
.
The following variables are listed:
dupersid: This is the unique identifier of the respondent
age: Age of the respondent - Continuous data type
race: Race of the respondent - Categorical data type
povcat: Poverty category of the respondent - Categorical data type
totexp: Total healthcare expenditures (costs, $US) - Continuous data type
Create new variables
Let’s take a look at the race and povcat variables.
// RACE CATEGORIES
tab race, m
// POVERTY CATEGORIES
tab povcat, m
. // RACE CATEGORIES
. tab race, m
race | Freq. Percent Cum.
----------------------------------------+-----------------------------------
1 WHITE - NO OTHER RACE REPORTED | 13,639 75.35 75.35
2 BLACK - NO OTHER RACE REPORTED | 2,683 14.82 90.17
3 AMER INDIAN/ALASKA NATIVE - NO OTHE.. | 145 0.80 90.97
4 ASIAN/NATV HAWAIIAN/PACFC ISL-NO OTH | 1,090 6.02 96.99
6 MULTIPLE RACES REPORTED | 544 3.01 100.00
----------------------------------------+-----------------------------------
Total | 18,101 100.00
.
. // POVERTY CATEGORIES
. tab povcat, m
povcat | Freq. Percent Cum.
----------------+-----------------------------------
1 POOR/NEGATIVE | 2,788 15.40 15.40
2 NEAR POOR | 815 4.50 19.90
3 LOW INCOME | 2,387 13.19 33.09
4 MIDDLE INCOME | 5,032 27.80 60.89
5 HIGH INCOME | 7,079 39.11 100.00
----------------+-----------------------------------
Total | 18,101 100.00
.
The race and povcat variables are in string format. We want to change these into numeric variables for our computations. We can do this by creating new variables in Stata.
// NEW RACE VARIABLE
codebook race, tab(1000) /* Provides the full label in a tabulate format */
gen race1 = .
replace race1 = 0 if race == "1 WHITE - NO OTHER RACE REPORTED"
replace race1 = 1 if race == "2 BLACK - NO OTHER RACE REPORTED"
replace race1 = 2 if race == "3 AMER INDIAN/ALASKA NATIVE - NO OTHER RACE"
replace race1 = 3 if race == "4 ASIAN/NATV HAWAIIAN/PACFC ISL-NO OTH"
replace race1 = 4 if race == "6 MULTIPLE RACES REPORTED"
label define race_lbl 0 "White" 1 "Black" 2 "AI/AN" 3 "Asian" 4 "Mulitple"
label values race1 race_lbl
tab race1, m
// NEW POVERTY VARIABLE
codebook povcat, tab(1000) /* Provides the full label in a tabulate format */
gen poverty = .
replace poverty = 0 if povcat == "1 POOR/NEGATIVE"
replace poverty = 1 if povcat == "2 NEAR POOR"
replace poverty = 2 if povcat == "3 LOW INCOME"
replace poverty = 3 if povcat == "4 MIDDLE INCOME"
replace poverty = 4 if povcat == "5 HIGH INCOME"
label define poverty_lbl 0 "Poor" 1 "Near Poor" 2 "Low-income" 3 "Middle-income" 4 "High-income"
label values poverty poverty_lbl
tab poverty, m
. // NEW RACE VARIABLE
. codebook race, tab(1000) /* Provides the full label in a tabulate format */
-------------------------------------------------------------------------------
race (unlabeled)
-------------------------------------------------------------------------------
Type: String (str43)
Unique values: 5 Missing "": 0/18,101
Tabulation: Freq. Value
13,639 "1 WHITE - NO OTHER RACE REPORTED"
2,683 "2 BLACK - NO OTHER RACE REPORTED"
145 "3 AMER INDIAN/ALASKA NATIVE - NO OTHER
RACE"
1,090 "4 ASIAN/NATV HAWAIIAN/PACFC ISL-NO
OTH"
544 "6 MULTIPLE RACES REPORTED"
Warning: Variable has embedded blanks.
.
. gen race1 = .
(18,101 missing values generated)
. replace race1 = 0 if race == "1 WHITE - NO OTHER RACE REPORTED"
(13,639 real changes made)
. replace race1 = 1 if race == "2 BLACK - NO OTHER RACE REPORTED"
(2,683 real changes made)
. replace race1 = 2 if race == "3 AMER INDIAN/ALASKA NATIVE - NO OTHER
> RACE"
(145 real changes made)
. replace race1 = 3 if race == "4 ASIAN/NATV HAWAIIAN/PACFC ISL-NO OTH"
(1,090 real changes made)
. replace race1 = 4 if race == "6 MULTIPLE RACES REPORTED"
(544 real changes made)
.
. label define race_lbl 0 "White" 1 "Black" 2 "AI/AN" 3 "Asian" 4 "Mulitple"
. label values race1 race_lbl
. tab race1, m
race1 | Freq. Percent Cum.
------------+-----------------------------------
White | 13,639 75.35 75.35
Black | 2,683 14.82 90.17
AI/AN | 145 0.80 90.97
Asian | 1,090 6.02 96.99
Mulitple | 544 3.01 100.00
------------+-----------------------------------
Total | 18,101 100.00
.
. // NEW POVERTY VARIABLE
. codebook povcat, tab(1000) /* Provides the full label in a tabulate format *
> /
-------------------------------------------------------------------------------
povcat (unlabeled)
-------------------------------------------------------------------------------
Type: String (str15)
Unique values: 5 Missing "": 0/18,101
Tabulation: Freq. Value
2,788 "1 POOR/NEGATIVE"
815 "2 NEAR POOR"
2,387 "3 LOW INCOME"
5,032 "4 MIDDLE INCOME"
7,079 "5 HIGH INCOME"
Warning: Variable has embedded blanks.
.
. gen poverty = .
(18,101 missing values generated)
. replace poverty = 0 if povcat == "1 POOR/NEGATIVE"
(2,788 real changes made)
. replace poverty = 1 if povcat == "2 NEAR POOR"
(815 real changes made)
. replace poverty = 2 if povcat == "3 LOW INCOME"
(2,387 real changes made)
. replace poverty = 3 if povcat == "4 MIDDLE INCOME"
(5,032 real changes made)
. replace poverty = 4 if povcat == "5 HIGH INCOME"
(7,079 real changes made)
.
. label define poverty_lbl 0 "Poor" 1 "Near Poor" 2 "Low-income" 3 "Middle-inco
> me" 4 "High-income"
. label values poverty poverty_lbl
. tab poverty, m
poverty | Freq. Percent Cum.
--------------+-----------------------------------
Poor | 2,788 15.40 15.40
Near Poor | 815 4.50 19.90
Low-income | 2,387 13.19 33.09
Middle-income | 5,032 27.80 60.89
High-income | 7,079 39.11 100.00
--------------+-----------------------------------
Total | 18,101 100.00
.
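As an alternative to the replace statements above, the encode command can convert a string variable into a labeled numeric variable in one step. The sketch below assumes that the alphabetical ordering of the strings is an acceptable coding; the variable names race2 and poverty2 are hypothetical and are not used elsewhere in this tutorial. Note that encode numbers the categories from 1 rather than 0 and keeps the original long strings as value labels.

```stata
// ALTERNATIVE (sketch): convert the string variables with encode
// race2 and poverty2 are illustrative names, not part of the original tutorial
encode race, generate(race2)
encode povcat, generate(poverty2)

// Check the new coding against the original strings
tab race2, m
tab poverty2, m
label list race2 poverty2
```

The manual approach used in this tutorial gives more control over the numeric codes and the shortened labels, which is why the remaining examples continue with race1 and poverty.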
Now that we have race1 and poverty variables that are in the correct factor format, we can start using these in our analysis.
Regression model
Next, we will create a linear regression model where the total expenditure (totexp) is the dependent variable with age, race1, and poverty as the independent variables. We will also create an interaction between age and poverty so that we can apply the margins command.
Here is the structural form of the linear regression model:
\[\begin{aligned} Y_{i} = \beta_{0} + \beta_{1}Age_{i} + \beta_{2}Poverty_{i} + \beta_{3}(Age_{i} \times Poverty_{i}) + \beta_{4}Race_{i} + \epsilon_{i} \end{aligned}\]
Here, \(Y_{i}\) denotes the total healthcare expenditures (totexp) for respondent \(i\).
To run the linear regression model in Stata, we use the following code chunk. Note that we are running this model with age as a continuous term and poverty as a factor:
// REGRESSION MODEL
glm totexp c.age i.poverty c.age#i.poverty i.race1, family("Gaussian") link("identity") vce(robust)
. // REGRESSION MODEL
. glm totexp c.age i.poverty c.age#i.poverty i.race1, family("Gaussian") link("
> identity") vce(robust)
Iteration 0: Log pseudolikelihood = -208498.48
Generalized linear models Number of obs = 18,101
Optimization : ML Residual df = 18,087
Scale parameter = 5.93e+08
Deviance = 1.07194e+13 (1/df) Deviance = 5.93e+08
Pearson = 1.07194e+13 (1/df) Pearson = 5.93e+08
Variance function: V(u) = 1 [Gaussian]
Link function : g(u) = u [Identity]
AIC = 23.03878
Log pseudolikelihood = -208498.4763 BIC = 1.07e+13
------------------------------------------------------------------------------
| Robust
totexp | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | 237.6858 27.462 8.66 0.000 183.8613 291.5104
|
poverty |
Near Poor | 10488.48 11060.77 0.95 0.343 -11190.23 32167.2
Low-income | 246.2486 1748.971 0.14 0.888 -3181.671 3674.169
Middle-in~e | -196.0182 1421.265 -0.14 0.890 -2981.647 2589.61
High-income | 2020.086 1574.106 1.28 0.199 -1065.105 5105.278
|
poverty#|
c.age |
Near Poor | -156.7318 171.3784 -0.91 0.360 -492.6272 179.1636
Low-income | -20.34791 35.95424 -0.57 0.571 -90.81694 50.12111
Middle-in~e | -30.38333 30.51265 -1.00 0.319 -90.18702 29.42036
High-income | -54.06838 32.07875 -1.69 0.092 -116.9416 8.804808
|
race1 |
Black | -1184.321 448.4658 -2.64 0.008 -2063.298 -305.3441
AI/AN | -377.8293 1530.5 -0.25 0.805 -3377.554 2621.896
Asian | -2736.707 514.3436 -5.32 0.000 -3744.802 -1728.612
Mulitple | 1686.795 1039.337 1.62 0.105 -350.2694 3723.859
|
_cons | -2353.181 1312.973 -1.79 0.073 -4926.559 220.1983
------------------------------------------------------------------------------
.
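A GLM with a Gaussian family and identity link is an ordinary linear regression, so the same model could also be fitted with regress. The following is a minimal sketch, not the command used in this tutorial; it uses the ## factor-variable shorthand, which expands to the main effects plus the interaction. The coefficients should match the glm output, although the robust standard errors may differ slightly because the two commands apply different small-sample adjustments.

```stata
// SKETCH: the same linear model fitted by ordinary least squares
// c.age##i.poverty expands to c.age i.poverty c.age#i.poverty
regress totexp c.age##i.poverty i.race1, vce(robust)
```

If you run this sketch, rerun the glm command afterward so that the margins results below are based on the model shown in this tutorial.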
Now that we have the regression model output, we can move to the next step, which is to use the margins command to estimate the average marginal effect.
Average marginal effect
For this example, we want to estimate the average marginal effect of poverty on total healthcare expenditures with respect to various ages. In other words, we want to estimate the average difference in healthcare expenditures between poverty levels at various ages.
margins, dydx(poverty) at(age = (25 35 45 55 65))
. margins, dydx(poverty) at(age = (25 35 45 55 65))
Average marginal effects Number of obs = 18,101
Model VCE: Robust
Expression: Predicted mean totexp, predict()
dy/dx wrt: 1.poverty 2.poverty 3.poverty 4.poverty
1._at: age = 25
2._at: age = 35
3._at: age = 45
4._at: age = 55
5._at: age = 65
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
0.poverty | (base outcome)
-------------+----------------------------------------------------------------
1.poverty |
_at |
1 | 6570.19 6818.189 0.96 0.335 -6793.216 19933.6
2 | 5002.872 5140.642 0.97 0.330 -5072.6 15078.34
3 | 3435.554 3498.399 0.98 0.326 -3421.182 10292.29
4 | 1868.236 1981.288 0.94 0.346 -2015.018 5751.49
5 | 300.9179 1219.147 0.25 0.805 -2088.567 2690.403
-------------+----------------------------------------------------------------
2.poverty |
_at |
1 | -262.4492 958.86 -0.27 0.784 -2141.78 1616.882
2 | -465.9284 718.3771 -0.65 0.517 -1873.922 942.0648
3 | -669.4075 609.3112 -1.10 0.272 -1863.636 524.8205
4 | -872.8866 696.4167 -1.25 0.210 -2237.838 492.065
5 | -1076.366 925.8908 -1.16 0.245 -2891.078 738.3469
-------------+----------------------------------------------------------------
3.poverty |
_at |
1 | -955.6014 754.4823 -1.27 0.205 -2434.359 523.1566
2 | -1259.435 561.0709 -2.24 0.025 -2359.114 -159.7559
3 | -1563.268 496.5502 -3.15 0.002 -2536.488 -590.0475
4 | -1867.101 603.7614 -3.09 0.002 -3050.452 -683.7506
5 | -2170.935 817.7396 -2.65 0.008 -3773.675 -568.1945
-------------+----------------------------------------------------------------
4.poverty |
_at |
1 | 668.3766 862.0841 0.78 0.438 -1021.277 2358.03
2 | 127.6928 639.9271 0.20 0.842 -1126.541 1381.927
3 | -412.9911 530.6915 -0.78 0.436 -1453.127 627.1452
4 | -953.6749 599.6412 -1.59 0.112 -2128.95 221.6002
5 | -1494.359 802.0691 -1.86 0.062 -3066.385 77.66783
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
.
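Because the model is linear, each discrete change reported above can be reproduced by hand from the regression coefficients. As a sketch, write \(\beta_{2,k}\) for the coefficient on poverty level \(k\) and \(\beta_{3,k}\) for the coefficient on its interaction with age; this level-specific subscript notation is an assumption introduced here for illustration. The difference in expected expenditures between level \(k\) and the Poor reference group at age \(a\) is then:

```latex
\[
\begin{aligned}
\Delta_{k}(a) &= \beta_{2,k} + \beta_{3,k}\,a \\
\Delta_{\text{Near Poor}}(25) &= 10488.48 + (-156.7318)(25) \approx 6570.2 \\
\Delta_{\text{Middle-income}}(45) &= -196.0182 + (-30.38333)(45) \approx -1563.27
\end{aligned}
\]
```

These values agree, up to rounding of the displayed coefficients, with the dy/dx entries for 1.poverty at age 25 and 3.poverty at age 45 in the margins output. The age main effect and the race terms cancel because they are identical in both groups being compared.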
Once we have estimated the average marginal effects, we can use marginsplot to plot the difference in healthcare expenditures between poverty groups at age = 25, 35, 45, 55, and 65 years. The differences in healthcare expenditures use the poverty category Poor as the reference.
Plotting the average marginal effect using marginsplot
marginsplot
quietly graph export marginsplot1.svg, replace
. marginsplot
Variables that uniquely identify margins: age _deriv
. quietly graph export marginsplot1.svg, replace
.
We can generate the same results and plot using the r. contrast operator (here, r.poverty), which compares each poverty level with the reference level.
margins r.poverty, at(age = (25 35 45 55 65))
marginsplot
quietly graph export marginsplot2.svg, replace
. margins r.poverty, at(age = (25 35 45 55 65))
Contrasts of predictive margins Number of obs = 18,101
Model VCE: Robust
Expression: Predicted mean totexp, predict()
1._at: age = 25
2._at: age = 35
3._at: age = 45
4._at: age = 55
5._at: age = 65
--------------------------------------------------------------
| df chi2 P>chi2
---------------------------+----------------------------------
poverty@_at |
(Near Poor vs Poor) 1 | 1 0.93 0.3352
(Near Poor vs Poor) 2 | 1 0.95 0.3305
(Near Poor vs Poor) 3 | 1 0.96 0.3261
(Near Poor vs Poor) 4 | 1 0.89 0.3457
(Near Poor vs Poor) 5 | 1 0.06 0.8050
(Low-income vs Poor) 1 | 1 0.07 0.7843
(Low-income vs Poor) 2 | 1 0.42 0.5166
(Low-income vs Poor) 3 | 1 1.21 0.2719
(Low-income vs Poor) 4 | 1 1.57 0.2101
(Low-income vs Poor) 5 | 1 1.35 0.2450
(Middle-income vs Poor) 1 | 1 1.60 0.2053
(Middle-income vs Poor) 2 | 1 5.04 0.0248
(Middle-income vs Poor) 3 | 1 9.91 0.0016
(Middle-income vs Poor) 4 | 1 9.56 0.0020
(Middle-income vs Poor) 5 | 1 7.05 0.0079
(High-income vs Poor) 1 | 1 0.60 0.4382
(High-income vs Poor) 2 | 1 0.04 0.8418
(High-income vs Poor) 3 | 1 0.61 0.4364
(High-income vs Poor) 4 | 1 2.53 0.1117
(High-income vs Poor) 5 | 1 3.47 0.0624
Joint | 8 21.94 0.0050
--------------------------------------------------------------
----------------------------------------------------------------------------
| Delta-method
| Contrast std. err. [95% conf. interval]
---------------------------+------------------------------------------------
poverty@_at |
(Near Poor vs Poor) 1 | 6570.19 6818.189 -6793.216 19933.6
(Near Poor vs Poor) 2 | 5002.872 5140.642 -5072.6 15078.34
(Near Poor vs Poor) 3 | 3435.554 3498.399 -3421.182 10292.29
(Near Poor vs Poor) 4 | 1868.236 1981.288 -2015.018 5751.49
(Near Poor vs Poor) 5 | 300.9179 1219.147 -2088.567 2690.403
(Low-income vs Poor) 1 | -262.4492 958.86 -2141.78 1616.882
(Low-income vs Poor) 2 | -465.9284 718.3771 -1873.922 942.0648
(Low-income vs Poor) 3 | -669.4075 609.3112 -1863.636 524.8205
(Low-income vs Poor) 4 | -872.8866 696.4167 -2237.838 492.065
(Low-income vs Poor) 5 | -1076.366 925.8908 -2891.078 738.3469
(Middle-income vs Poor) 1 | -955.6014 754.4823 -2434.359 523.1566
(Middle-income vs Poor) 2 | -1259.435 561.0709 -2359.114 -159.7559
(Middle-income vs Poor) 3 | -1563.268 496.5502 -2536.488 -590.0475
(Middle-income vs Poor) 4 | -1867.101 603.7614 -3050.452 -683.7506
(Middle-income vs Poor) 5 | -2170.935 817.7396 -3773.675 -568.1945
(High-income vs Poor) 1 | 668.3766 862.0841 -1021.277 2358.03
(High-income vs Poor) 2 | 127.6928 639.9271 -1126.541 1381.927
(High-income vs Poor) 3 | -412.9911 530.6915 -1453.127 627.1452
(High-income vs Poor) 4 | -953.6749 599.6412 -2128.95 221.6002
(High-income vs Poor) 5 | -1494.359 802.0691 -3066.385 77.66783
----------------------------------------------------------------------------
.
. marginsplot
Variables that uniquely identify margins: age poverty
. quietly graph export marginsplot2.svg, replace
.
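Both approaches report the same contrasts, and any one of them can be cross-checked against the regression coefficients with lincom. The sketch below assumes that the glm results are still the active estimation results; this holds here because margins was run without the post option.

```stata
// SKETCH: cross-check two contrasts against the regression coefficients
// Run after the glm command; margins without the post option leaves e(b) intact

// Near Poor vs Poor at age 25
lincom _b[1.poverty] + 25*_b[1.poverty#c.age]

// Middle-income vs Poor at age 45
lincom _b[3.poverty] + 45*_b[3.poverty#c.age]
```

The point estimates should correspond to the first Near Poor vs Poor row and the third Middle-income vs Poor row of the contrast table above.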
Using the mplotoffset command to add an offset
However, the average differences in healthcare expenditures between the poverty categories overlap at each age. To make this a little easier to view, we can add an offset. We’ll need the mplotoffset package, which you can install in Stata using the following code chunk:
ssc install mplotoffset
.
mplotoffset, offset(1.5)
quietly graph export marginsplot3.svg, replace
. mplotoffset, offset(1.5)
Variables that uniquely identify margins: age poverty
. quietly graph export marginsplot3.svg, replace
.
Using the offset removes the overlap and improves the visibility of the average marginal effect plot.
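When the plotting commands are kept in a do-file that is run repeatedly, it can be convenient to install mplotoffset only when it is missing. The following is a minimal sketch of one common guard pattern, using capture and the _rc return code; it is a suggestion rather than a required step.

```stata
// SKETCH: install mplotoffset only if it is not already available
capture which mplotoffset
if _rc != 0 {
    ssc install mplotoffset
}
```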
Improving the mplotoffset visualization - Changing the symbol
The offset helps to improve visibility, but the lines connecting the various poverty group comparisons are messy. We can remove these lines with further options to the mplotoffset command.
Let’s change the symbol from a circle to a square. We can also change the color to navy. We’ll also add a neutral line at 0, which denotes no difference in healthcare expenditures between the two poverty groups. We’ll color the neutral line cranberry (adding lpattern(dash) inside yline() would also make it dashed).
mplotoffset, offset(1.5) ///
plotopts(msymbol(square) msize(large) mcol("navy") dcol("none")) ///
ciopts(lcol("navy")) ///
recast(dot) ///
yline(0, lcol("cranberry")) ///
xtitle("Age (Years)") ///
xlab( , nogrid) ///
ytitle("Avg Difference in Total Healthcare Expenditures ($)") ylab(, nogrid) ///
title("")
quietly graph export marginsplot4.svg, replace
. mplotoffset, offset(1.5) ///
> plotopts(msymbol(square) msize(large) mcol("navy") d
> col("none")) ///
> ciopts(lcol("navy")) ///
> recast(dot) ///
> yline(0, lcol("cranberry")) ///
> xtitle("Age (Years)") ///
> xlab( , nogrid) ///
> ytitle("Avg Difference in Total Healthcare Expenditu
> res ($)") ylab(, nogrid) ///
> title("")
Variables that uniquely identify margins: age poverty
. quietly graph export marginsplot4.svg, replace
.
Improving the mplotoffset visualization - Changing the symbol & colors
Having the same colors is a little boring, so let’s change them. We can do this by adding options to the mplotoffset command.
We will need to change the colors for each comparison using the plotopts() option. However, we need to add an identifier for each of the comparisons. For example, the option for the first comparison is plot1opts(), which denotes the Near Poor v. Poor comparison.
The colors for the 95% confidence interval (CI) whiskers follow a similar coding pattern. Instead of plotopts(), we use ciopts(). Hence, ci1opts() denotes the 95% CI for the first comparison.
Here is the full code:
mplotoffset, offset(1.5) ///
plot1opts(msymbol(square) msize(large) mcol("navy") dcol("none")) ///
plot2opts(msymbol(square) msize(large) mcol("green") dcol("none")) ///
plot3opts(msymbol(square) msize(large) mcol("cranberry") dcol("none")) ///
plot4opts(msymbol(square) msize(large) mcol("orange") dcol("none")) ///
ci1opts(lcol("navy")) ///
ci2opts(lcol("green")) ///
ci3opts(lcol("cranberry")) ///
ci4opts(lcol("orange")) ///
recast(dot) ///
yline(0, lcol("cranberry")) ///
xtitle("Age (Years)") xlab(, nogrid) ///
ytitle("Avg Difference in Total Healthcare Expenditures ($)") ylab(, nogrid) ///
title("") ///
legend(order(1 "Near Poor v. Poor" 2 "Low-income v. Poor" 3 "Middle-income v. Poor" 4 "High-income v. Poor"))
quietly graph export marginsplot5.svg, replace
. mplotoffset, offset(1.5) ///
> plot1opts(msymbol(square) msize(large) mcol("navy")
> dcol("none")) ///
> plot2opts(msymbol(square) msize(large) mcol("green")
> dcol("none")) ///
> plot3opts(msymbol(square) msize(large) mcol("cranber
> ry") dcol("none")) ///
> plot4opts(msymbol(square) msize(large) mcol("orange"
> ) dcol("none")) ///
> ci1opts(lcol("navy")) ///
> ci2opts(lcol("green")) ///
> ci3opts(lcol("cranberry")) ///
> ci4opts(lcol("orange")) ///
> recast(dot) ///
> yline(0, lcol("cranberry")) ///
> xtitle("Age (Years)") xlab(, nogrid) ///
> ytitle("Avg Difference in Total Healthcare Expenditu
> res ($)") ylab(, nogrid) ///
> title("") ///
> legend(order(1 "Near Poor v. Poor" 2 "Low-income v.
> Poor" 3 "Middle-income v. Poor" 4 "High-income v. Poor"))
Variables that uniquely identify margins: age poverty
. quietly graph export marginsplot5.svg, replace
.
The legend has lines denoting the different poverty group comparisons. We can change this to the symbols used in the figure by changing the legend(order()) option. Instead of using the keys 1 "Near Poor v. Poor" 2 "Low-income v. Poor" 3 "Middle-income v. Poor" 4 "High-income v. Poor", we will shift the numbers to 5 "Near Poor v. Poor" 6 "Low-income v. Poor" 7 "Middle-income v. Poor" 8 "High-income v. Poor", which refer to the symbols instead of the lines.
mplotoffset, offset(1.5) ///
plot1opts(msymbol(square) msize(large) mcol("navy") dcol("none")) ///
plot2opts(msymbol(square) msize(large) mcol("green") dcol("none")) ///
plot3opts(msymbol(square) msize(large) mcol("cranberry") dcol("none")) ///
plot4opts(msymbol(square) msize(large) mcol("orange") dcol("none")) ///
ci1opts(lcol("navy")) ///
ci2opts(lcol("green")) ///
ci3opts(lcol("cranberry")) ///
ci4opts(lcol("orange")) ///
recast(dot) ///
yline(0, lcol("cranberry")) ///
xtitle("Age (Years)") xlab(, nogrid) ///
ytitle("Avg Difference in Total Healthcare Expenditures ($)") ylab(, nogrid) ///
title("") ///
legend(order(5 "Near Poor v. Poor" 6 "Low-income v. Poor" 7 "Middle-income v. Poor" 8 "High-income v. Poor"))
quietly graph export marginsplot6.svg, replace
. mplotoffset, offset(1.5) ///
> plot1opts(msymbol(square) msize(large) mcol("navy")
> dcol("none")) ///
> plot2opts(msymbol(square) msize(large) mcol("green")
> dcol("none")) ///
> plot3opts(msymbol(square) msize(large) mcol("cranber
> ry") dcol("none")) ///
> plot4opts(msymbol(square) msize(large) mcol("orange"
> ) dcol("none")) ///
> ci1opts(lcol("navy")) ///
> ci2opts(lcol("green")) ///
> ci3opts(lcol("cranberry")) ///
> ci4opts(lcol("orange")) ///
> recast(dot) ///
> yline(0, lcol("cranberry")) ///
> xtitle("Age (Years)") xlab(, nogrid) ///
> ytitle("Avg Difference in Total Healthcare Expenditu
> res ($)") ylab(, nogrid) ///
> title("") ///
> legend(order(5 "Near Poor v. Poor" 6 "Low-income v.
> Poor" 7 "Middle-income v. Poor" 8 "High-income v. Poor"))
Variables that uniquely identify margins: age poverty
. quietly graph export marginsplot6.svg, replace
.
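The plot1opts() to plot4opts() and ci1opts() to ci4opts() options in the commands above follow a regular pattern, so they could also be assembled in a loop. The following is a minimal sketch, assuming the margins r.poverty results are still in memory; the local macro names colors, opts, and i are hypothetical helpers introduced only for this illustration. It is intended to draw the same figure as the last command and should be run from a do-file because it uses /// line continuation.

```stata
// SKETCH: build the plot#opts() and ci#opts() options in a loop
// Assumes the margins r.poverty results from above are still in memory

// One color per comparison, in the order of the poverty levels
local colors navy green cranberry orange

// Start with an empty option list and a counter for the plot number
local opts
local i = 1
foreach c of local colors {
    local opts `opts' plot`i'opts(msymbol(square) msize(large) mcolor(`c') dcolor(none))
    local opts `opts' ci`i'opts(lcolor(`c'))
    local ++i
}

mplotoffset, offset(1.5) `opts' ///
    recast(dot) ///
    yline(0, lcolor(cranberry)) ///
    xtitle("Age (Years)") xlabel(, nogrid) ///
    ytitle("Avg Difference in Total Healthcare Expenditures ($)") ylabel(, nogrid) ///
    title("") ///
    legend(order(5 "Near Poor v. Poor" 6 "Low-income v. Poor" 7 "Middle-income v. Poor" 8 "High-income v. Poor"))
```

With this layout, changing the palette only requires editing the colors macro, at the cost of making the final command slightly less explicit than the written-out version.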
Conclusions
The marginsplot command in Stata is a remarkable tool that allows us to plot the average marginal effects from a regression model. This is quite important when we are trying to interpret the interaction term in the regression model. Conventional coefficients from the regression output are difficult to interpret, particularly when a continuous term is interacted with a categorical term. The mplotoffset command takes marginsplot to a different level by allowing us to incorporate an offset to improve the visuals of the plot. Further, using the Stata graph options allows us to add different symbols, colors, and labels to the marginsplot.
This tutorial only provides some examples of the marginsplot and mplotoffset features. You are encouraged to explore these features with your own work.
Acknowledgements
The mplotoffset command was created by Nick Winter from the University of Virginia. (URL: https://econpapers.repec.org/software/bocbocode/s458344.htm)
Richard Williams’ paper on the margins command continues to be an invaluable introduction on how to use it to estimate the average marginal effects. Stata Journal. 2012;12(2):308-331
Disclaimers & Disclosures
This is a work in progress and subject to future changes and updates.
This is for educational purposes only.