Trade policy is back on front pages. In 2025 the United States announced a baseline tariff on most imports under IEEPA and later modified reciprocal tariff rates by executive order on July 31, 2025, with further adjustments announced in August. These moves revived questions about how broad tariffs ripple through prices, sourcing, and trade volumes, and how exemptions or phased schedules may change effective rates country by country.
The near-term political news also underscores why a country-level cross-section is useful: several partners negotiated temporary reprieves or country-specific terms (e.g., Mexico’s 90-day pause to work on a broader deal), implying heterogeneous exposure to tariffs across countries at a point in time. That heterogeneity motivates regressions that control for development and connectivity when we ask whether higher tariffs are associated with lower import intensity (imports as % of GDP).
This workshop estimates the association between tariffs and trade openness using a one-year cross-section of countries. Theory predicts that a higher import tariff raises domestic relative prices and tends to reduce import volumes; in general equilibrium, tariffs can also affect the terms of trade and reallocate production and consumption across sectors (classics include Gorman, 1959; Leith, 1971). Modern empirical work further separates tariff from non-tariff changes in trade agreements and finds measurable impacts on trade margins (Cheong, Kwak, & Tang, 2018). Recent analysis also clarifies how a permanent import tariff can affect trade balances under minimal structure, motivating a look at net trade as well as imports and exports (Costinot & Werning, 2025). Concretely, we begin with imports,
\text{Imports}_i \;=\; \alpha \;+\; \beta_{\text{tariff}}\cdot \text{Tariff}_i \;+\; \gamma' Z_i \;+\; u_i,
where Z_i collects income, size, and connectivity controls, and we expect \beta_{\text{tariff}}<0 if higher tariffs compress import intensity. We then replicate the exercise for exports (tariffs may also depress exports via input costs and retaliation) and for net trade, where \text{net trade}_i=\text{exports}_i-\text{imports}_i, to summarize overall external balance in the same cross-section.
Academic work helps us choose covariates. A large literature shows that digital connectivity lowers search and coordination costs, expanding both the extensive margin (who trades) and the intensive margin (how much). Early and influential evidence includes Freund & Weinhold (2004) and Osnago & Tan (2016), who document positive links between Internet adoption and trade flows. These insights justify our working set of regressors, tariffs (policy wedge), income (GDP per capita, PPP), market size/geography (population and area or density), and connectivity (Internet %). Finally, we extend the model in two useful directions:
interactions with dummy variables for World Bank income group, to test whether certain countries are more affected by tariffs than others, and
a polynomial term in the tariff rate to allow for curvature (e.g., marginal effects that get stronger as tariffs rise).
All analyses are cross‑sectional (one year only).
2 OLS regression
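We begin by defining a linear relationship between an outcome y (e.g., imports as % of GDP) and predictors x_1,\dots,x_k (e.g., tariffs, income, population, connectivity):
y_i \;=\; \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + u_i, \qquad i=1,\dots,n.
In matrix form, y = X\beta + u, where X is the n \times (k+1) matrix of regressors, including a column of ones for the intercept. OLS chooses \hat{\beta} to minimize the sum of squared residuals:
\min_{\beta}\; (y - X\beta)'(y - X\beta).
Differentiating with respect to \beta and setting the result to zero yields the closed-form estimator: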
\hat{\beta} \;=\; (X'X)^{-1} X'y \quad \text{(provided \(X'X\) is invertible)}.
The fitted values and residuals are
\hat{y} \;=\; X\hat{\beta}, \qquad \hat{u} \;=\; y - \hat{y}.
An unbiased estimator of the error variance under the classical assumptions is
\hat{\sigma}^2 \;=\; \frac{\hat{u}'\hat{u}}{\,n - k - 1\,}.
The classical standard error of \hat{\beta}_j is
\operatorname{se}(\hat{\beta}_j) \;=\; \sqrt{ \, \hat{\sigma}^2 \cdot \big[(X'X)^{-1}\big]_{jj} \, } .
With normally distributed errors, the usual t-tests and confidence intervals follow:
\frac{\hat{\beta}_j - \beta_j}{\operatorname{se}(\hat{\beta}_j)} \;\sim\; t_{\,n-k-1}.
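The formulas above can be traced step by step in NumPy. The following is a minimal sketch on simulated data (the sample size, coefficients, and random seed are illustrative assumptions, not the workshop dataset); it computes \hat{\beta}, \hat{\sigma}^2, the classical standard errors, and the t-statistics for H_0: \beta_j = 0.

```python
import numpy as np

# Simulated data (hypothetical): n observations, k = 2 regressors plus a constant
rng = np.random.default_rng(0)
n, k = 200, 2
x = rng.normal(size=(n, k))
u = rng.normal(size=n)
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 1] + u

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), x])

# Closed-form estimator: solve (X'X) b = X'y instead of inverting explicitly
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# Fitted values, residuals, and the unbiased error-variance estimate
y_hat = X @ beta_hat
u_hat = y - y_hat
sigma2_hat = (u_hat @ u_hat) / (n - k - 1)

# Classical standard errors and t-statistics for H0: beta_j = 0
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(XtX)))
t_stats = beta_hat / se

print(np.round(beta_hat, 3), np.round(se, 3), np.round(t_stats, 2))
```

Solving the normal equations with np.linalg.solve is numerically safer than forming the inverse, though the inverse is still needed here for the diagonal elements that enter the standard errors.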
2.0.1 Simple vs. multiple regression
In a simple regression y_i=\beta_0+\beta_1 x_{i1}+u_i, \hat{\beta}_1 measures the average change in y associated with a one-unit change in x_{1}. This is easy to read but fragile: any factor correlated with both y and x_1 gets absorbed into u_i, potentially biasing \hat{\beta}_1.
In a multiple regression y_i=\beta_0+\beta_1 x_{i1}+\cdots+\beta_k x_{ik}+u_i, \hat{\beta}_j measures the partial association between x_j and y, ceteris paribus (holding the other regressors fixed). This is why, when we study tariffs and trade, we include income, size/geography, and connectivity: to reduce omitted-variable bias and read the tariff coefficient as a conditional association rather than a proxy for broader development or access differences.
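A small simulation can make the omitted-variable problem concrete. In this sketch (all names and numbers are hypothetical), a confounder z drives both x1 and y; the true slope on x1 is -2, but the simple regression absorbs part of the effect of z into its estimate.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)                      # confounder (e.g., development)
x1 = 0.8 * z + rng.normal(size=n)           # regressor correlated with z
y = 1.0 - 2.0 * x1 + 3.0 * z + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x1": x1, "z": z})

# Simple regression omits z; multiple regression controls for it
b_simple = smf.ols("y ~ x1", data=df).fit().params["x1"]
b_multiple = smf.ols("y ~ x1 + z", data=df).fit().params["x1"]

print(round(b_simple, 2), round(b_multiple, 2))
```

The gap between the two estimates is the omitted-variable bias, which here equals the effect of z on y times the slope from regressing z on x1.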
2.0.2 The OLS assumptions (cross-section)
These standard conditions underpin unbiasedness, consistency, and (with a few extras) efficiency, plus valid small-sample inference. Under exogeneity, OLS is unbiased: \mathbb{E}[\hat\beta \mid X]=\beta. With i.i.d. sampling, OLS is also consistent (estimates converge to the true parameters as the sample grows). If we also assume homoskedasticity and no autocorrelation, OLS is efficient among linear unbiased estimators (BLUE): it achieves the smallest sampling variance in that class, \operatorname{Var}(\hat\beta \mid X)=\sigma^2 (X'X)^{-1}. Finally, adding normal errors justifies exact small-sample t tests and confidence intervals. We will test these conditions in the next workshop; here they provide the scaffold for interpreting coefficients.
| Assumption | Mathematical statement | Meaning (plain language) |
| --- | --- | --- |
| Linearity in parameters | y_i=\beta_0+\sum_{j=1}^k \beta_j x_{ij}+u_i | The model is linear in the coefficients \beta. You can include transformations like \log x, x^2, or interactions; OLS remains linear in \beta. |
| Exogeneity (zero conditional mean) | \mathbb{E}[u_i \mid X] = 0 | Regressors are uncorrelated with unobservables that affect y. Delivers unbiased and consistent \hat{\beta}. Violations cause omitted-variable or simultaneity bias. |
| Homoskedasticity | \operatorname{Var}(u_i \mid X)=\sigma^2 | Error variance is constant across observations; with no autocorrelation this makes OLS BLUE. If false, classical standard errors are wrong, so you have to use robust standard errors. |
| No autocorrelation (cross-section) | \operatorname{Cov}(u_i,u_j \mid X)=0,\; i\neq j | Country errors are uncorrelated. Spatial clustering (neighbors share shocks) can violate this; cluster-robust standard errors help. |
| No perfect multicollinearity | \operatorname{rank}(X)=k{+}1 | No regressor is an exact linear combination of others. |
| Normality (for exact small-sample inference) | u_i \sim \mathcal{N}(0,\sigma^2) | Not required for unbiasedness; gives exact t/F in small samples. Otherwise rely on large-sample or robust inference. |
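The homoskedasticity row mentions robust standard errors. As a hedged illustration, the sketch below simulates errors whose spread grows with x (an assumption built into the fake data) and compares classical standard errors with heteroskedasticity-robust ones, requested in statsmodels through the cov_type argument of fit().

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(0, 10, size=n)
# Error spread grows with x, so homoskedasticity fails by construction
u = rng.normal(scale=0.5 + 0.5 * x, size=n)
df = pd.DataFrame({"x": x, "y": 1.0 + 2.0 * x + u})

classical = smf.ols("y ~ x", data=df).fit()               # nonrobust covariance
robust = smf.ols("y ~ x", data=df).fit(cov_type="HC1")    # robust covariance

# Coefficients are identical; only the standard errors change
print(classical.params["x"], robust.params["x"])
print(classical.bse["x"], robust.bse["x"])
```

The point estimates do not move because the robust option changes only the estimated covariance matrix, not the OLS fit itself.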
3 Load data and prepare variables
We will:
read the Excel file, and
create logs for skewed scale variables.
3.1 Python libraries used in this workshop
We lean on a small, standard toolkit that’s available in RStudio’s Python engine and in Google Colab without extra installs:
pandas. This is our workhorse for data wrangling. A DataFrame (think: a labeled table) lets us clean columns, merge country attributes, compute logs, and build the final modeling dataset. You’ll see patterns like pd.read_excel(url_or_path) to load data, df.rename(...) and df.assign(...) to create clean variables (e.g., ln_population = log(population + 1)), and df.dropna() to keep complete cases. Two core objects:
DataFrame: 2-D table with labeled rows/columns (e.g., one row per country).
Series: 1-D labeled array (e.g., a single variable/column).
NumPy. Provides fast, vectorized math. We mostly use it through pandas, but call numpy directly for safe transforms, e.g. np.log(x + 1) to avoid log(0), or to create polynomial terms like x**2. NumPy arrays (ndarray) are the low-level structures that make column-wise operations fast and memory-efficient.
statsmodels. Our statistics and econometrics engine. We’ll use the formula API (statsmodels.formula.api as smf) to specify models in a clear syntax:
Categoricals: Just include string columns (e.g., continent, wb_income_group) and statsmodels will dummy-encode them automatically; or use C(var) to force categorical treatment if needed.
Interactions: var1 * var2 expands to main effects + interaction; var1 : var2 is interaction only.
Polynomials: Wrap with I() to treat math literally, e.g., I(tariff_weighted**2).
We’ll also use statsmodels.iolib.summary2.summary_col to print a compact comparison table of models (simple, multiple, interactions, polynomial) with sample size and R^2.
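To make the formula syntax concrete, here is a minimal sketch on simulated data (the columns x, g, and y are hypothetical placeholders). It fits one model with a categorical regressor, one with an interaction, and one with a squared term, then prints the coefficient labels that statsmodels generates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 300
df = pd.DataFrame({
    "x": rng.uniform(0, 10, size=n),
    "g": rng.choice(["a", "b"], size=n),   # string column, treated as categorical
})
df["y"] = 5.0 - 1.0 * df["x"] + 2.0 * (df["g"] == "b") + rng.normal(size=n)

m_cat = smf.ols("y ~ x + g", data=df).fit()           # dummy for g, baseline "a"
m_int = smf.ols("y ~ x * g", data=df).fit()           # main effects + x:g
m_sq = smf.ols("y ~ x + I(x**2)", data=df).fit()      # squared term via I()

for m in (m_cat, m_int, m_sq):
    print(list(m.params.index))
```

Reading the printed labels is a quick way to check which category was dropped as the baseline and whether an interaction was expanded as intended.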
Matplotlib. Simple plots for quick data exploration, such as two-way scatter plots and a correlation heatmap. It’s part of the standard stack and doesn’t require any configuration for our use (e.g., plt.scatter(x, y)).
In short: pandas to shape the data, NumPy for numerical transforms (logs, squares), statsmodels to estimate and summarize regressions, and matplotlib to visualize relationships before we model. This keeps the workflow transparent and reproducible across RStudio and Colab.
3.2 Importing data from the internet
For this workshop we will download the data directly from GitHub. The dataset consists of the following variables:
| Variable (column name) | What it is & role in the regression |
| --- | --- |
| imports_pct_gdp | Imports of goods & services (% of GDP). Primary dependent variable (DV) to measure trade openness on the import side. We expect a negative association with tariffs when conditioning on controls. |
| exports_pct_gdp | Exports of goods & services (% of GDP). Alternative DV for robustness. Tariffs can affect exports via higher input costs, retaliation, or supply-chain re-routing. Sign is ambiguous a priori. |
| trade_pct_gdp | Total trade (% of GDP) = Exports + Imports. Optional DV for a one-number openness measure; can be used as a robustness outcome. |
| tariff_weighted | Applied, weighted-mean tariff (%). Main policy regressor (independent variable, IV). Captures the effective tariff burden based on import shares. In polynomial specs, also include I(tariff_weighted**2) to allow curvature. |
| gdppc_ppp_const / ln_gdppc_ppp_const | GDP per capita, PPP (constant intl $). Development/market depth control. Use the log transform (ln_gdppc_ppp_const) to reduce skew and interpret coefficients as semi-elasticities. |
| population / ln_population | Total population. Market size control. Larger economies often have lower trade/GDP ratios due to internal absorption. Prefer the log version. |
| surface_area / ln_surface_area | Country surface area (km²). Geography/scale control. Can capture remoteness and internal transport costs. Prefer the log version. |
| pop_density / ln_pop_density | Population density (people/km²). Optional structure control. Do not include together with both population and surface_area (risk of multicollinearity). |
| internet_users_pct | Individuals using the Internet (% of population). Connectivity control (IV). Proxy for information/search and coordination channels. |
| continent | Region label (categorical). Use as dummies to absorb broad regional level differences; also useful in interactions (tariff_weighted*continent) to test heterogeneous tariff effects by region. |
| wb_income_group | World Bank income category (categorical). Use as dummies for baseline differences in development; also for interactions (tariff_weighted*wb_income_group) to test income-level heterogeneity. |
For logged variables on non-negative scales, guard against \ln(0), for example with \ln(x+1) or with a very small positive offset (the loading code below adds 10^{-7}). Keep percentage variables (e.g., tariffs, Internet %) in levels unless you have a specific elasticity interpretation in mind.
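The choice of offset is not innocuous, and a quick comparison shows why. The sketch below applies three transforms to a few values taken from the data preview (3.38, 101.38, 2777689) plus a hypothetical zero that does not come from the dataset.

```python
import numpy as np

# Three values from the data preview plus a hypothetical zero
x = np.array([0.0, 3.38, 101.38, 2777689.0])

with np.errstate(divide="ignore"):
    plain = np.log(x)          # ln(x): -inf at zero
plus_one = np.log1p(x)         # ln(x + 1): finite at zero, distorts small x
tiny = np.log(x + 1e-7)        # ln(x + 1e-7): close to ln(x) away from zero

print(np.round(plain, 3))
print(np.round(plus_one, 3))
print(np.round(tiny, 3))
```

Adding 1 keeps zeros at zero but shifts small values noticeably, while a tiny offset leaves positive values almost unchanged and maps zeros to a large negative number that can act as an outlier. When a variable has no zeros, as appears to be the case for the logged variables in the descriptive table, the plain log is equivalent to the tiny-offset version for practical purposes.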
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Set or reset display formats
# pd.options.display.float_format = None
pd.options.display.float_format = "{:.2f}".format

# >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# EDIT THIS: set your path accordingly
url = "https://github.com/chechurris/CD2010/raw/refs/heads/main/wbdata_trade.xlsx"
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

# Read the first sheet (or specify sheet_name="...")
data = pd.read_excel(url)

# Create logs for skewed scale variables (a tiny offset guards against log(0))
for col in ["gdppc_ppp_const", "population", "surface_area", "pop_density"]:
    if col in data.columns:
        data[f"ln_{col}"] = np.log(data[col] + 1e-7)

data.head(8)
|  | time | time_code | country | iso3c | gdp_const | gdp_nominal | gdppc_const | gdppc_nominal | gdppc_ppp_const | gdppc_ppp_current | ... | internet_users_pct | pop_density | population | surface_area | continent | wb_income_group | ln_gdppc_ppp_const | ln_population | ln_surface_area | ln_pop_density |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2022 | YR2022 | Albania | ALB | 14385329993.53 | 19017244116.72 | 5178.88 | 6846.43 | 17111.95 | 19444.71 | ... | 82.60 | 101.38 | 2777689.00 | 28750.00 | Europe | Upper middle income | 9.75 | 14.84 | 10.27 | 4.62 |
| 1 | 2022 | YR2022 | Algeria | DZA | 206670488126.83 | 225638456572.14 | 4544.47 | 4961.55 | 14782.20 | 15836.09 | ... | 74.80 | 19.09 | 45477389.00 | 2381741.00 | Africa | Upper middle income | 9.60 | 17.63 | 14.68 | 2.95 |
| 2 | 2022 | YR2022 | Angola | AGO | 84883445838.16 | 104399746853.40 | 2382.02 | 2929.69 | 7397.49 | 7924.89 | ... | 42.10 | 28.58 | 35635029.00 | 1246700.00 | Africa | Lower middle income | 8.91 | 17.39 | 14.04 | 3.35 |
| 3 | 2022 | YR2022 | Argentina | ARG | 598603016935.29 | 632790070063.12 | 13182.79 | 13935.68 | 27627.96 | 29597.69 | ... | 88.40 | 16.59 | 45407904.00 | 2780400.00 | South America | Upper middle income | 10.23 | 17.63 | 14.84 | 2.81 |
| 4 | 2022 | YR2022 | Australia | AUS | 1587133480804.53 | 1690858246994.43 | 61009.81 | 64997.01 | 59883.65 | 65871.77 | ... | 97.00 | 3.38 | 26014399.00 | 7741220.00 | Oceania | High income | 11.00 | 17.07 | 15.86 | 1.22 |
| 5 | 2022 | YR2022 | Austria | AUT | 427236226728.30 | 471773629830.38 | 47250.97 | 52176.66 | 65661.45 | 70734.94 | ... | 93.60 | 109.57 | 9041851.00 | 83879.00 | Europe | High income | 11.09 | 16.02 | 11.34 | 4.70 |
| 6 | 2022 | YR2022 | Azerbaijan | AZE | 56789627316.90 | 78807470588.24 | 5599.59 | 7770.59 | 21051.24 | 22552.09 | ... | 88.00 | 122.71 | 10141756.00 | 86600.00 | Asia | Upper middle income | 9.95 | 16.13 | 11.37 | 4.81 |
| 7 | 2022 | YR2022 | Bahamas, The | BHS | 12795362392.88 | 13896800000.00 | 32186.51 | 34957.16 | 34342.79 | 36791.25 | ... | 94.70 | 39.71 | 397538.00 | 13880.00 | North America | High income | 10.44 | 12.89 | 9.54 | 3.68 |

8 rows × 25 columns
3.2.1 Choice of outcome and predictors
For the baseline specification we take Imports (% of GDP) as the dependent variable. As predictors we use: tariff (weighted mean), log GDP per capita (PPP, constant), log population, log surface area, and internet users (%). This set captures policy exposure, development, market size, geographic scale, and digital adoption; five distinct levers that theory suggests should matter for trade intensity.
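The baseline set uses population and surface area but leaves out population density, in line with the multicollinearity warning in the variable table. The sketch below illustrates the extreme case with simulated log variables (hypothetical values): when log density is exactly log population minus log area, the design matrix loses full column rank.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
ln_population = rng.normal(16, 2, size=n)
ln_surface_area = rng.normal(12, 2, size=n)
# Exact identity in logs: density = population / area
ln_pop_density = ln_population - ln_surface_area

X_ok = np.column_stack([np.ones(n), ln_population, ln_surface_area])
X_bad = np.column_stack([X_ok, ln_pop_density])

rank_ok = np.linalg.matrix_rank(X_ok)
rank_bad = np.linalg.matrix_rank(X_bad)
print(rank_ok, X_ok.shape[1])    # rank equals the number of columns
print(rank_bad, X_bad.shape[1])  # rank is one less than the number of columns
```

In the workshop data the relation is only approximate (for Albania in the preview, 14.84 - 10.27 = 4.57 against a reported 4.62), so including all three would give near-perfect rather than perfect collinearity: estimable, but with inflated standard errors.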
4 Descriptive statistics and visual exploration
Before running regressions, it is essential to understand the data’s level and spread. We examine descriptive statistics, two‑way scatter plots of the dependent variable against each regressor, and a correlation heatmap. This helps detect outliers, nonlinear patterns, and near‑redundant variables.
# Summary stats for variables used
desc = data.drop(columns=["country","iso3c","continent","wb_income_group"], errors="ignore").describe().T
desc[["count","mean","std","min","25%","50%","75%","max"]]
|  | count | mean | std | min | 25% | 50% | 75% | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gdp_const | 128.00 | 664424051724.42 | 2465768654505.64 | 482629585.76 | 14593597529.00 | 63070517151.70 | 366728761759.98 | 21443388432051.03 |
| gdp_nominal | 128.00 | 742741833719.01 | 2861572424714.76 | 518180029.41 | 17727732488.17 | 72558819155.97 | 407091463782.61 | 26006893000000.00 |
| gdppc_const | 128.00 | 18045.42 | 22306.89 | 253.69 | 2343.29 | 7008.16 | 26611.60 | 109642.67 |
| gdppc_nominal | 128.00 | 20788.73 | 25908.96 | 250.63 | 2984.89 | 8368.96 | 30951.56 | 121613.94 |
| gdppc_ppp_const | 128.00 | 30039.53 | 27469.16 | 829.39 | 7367.57 | 21008.89 | 47322.66 | 133571.96 |
| gdppc_ppp_current | 128.00 | 32566.11 | 30169.37 | 888.52 | 7892.84 | 22506.71 | 51117.25 | 143094.95 |
| trade_pct_gdp | 128.00 | 98.51 | 57.60 | 26.89 | 60.93 | 85.59 | 123.36 | 384.88 |
| exports_pct_gdp | 128.00 | 47.86 | 31.80 | 4.97 | 26.26 | 40.84 | 62.06 | 194.49 |
| imports_pct_gdp | 128.00 | 50.64 | 28.00 | 15.29 | 29.04 | 43.66 | 63.26 | 190.39 |
| tariff_simple | 128.00 | 6.72 | 5.63 | 0.00 | 1.95 | 4.56 | 11.48 | 26.31 |
| tariff_weighted | 128.00 | 5.28 | 5.33 | 0.00 | 1.33 | 3.21 | 7.99 | 29.52 |
| internet_users_pct | 125.00 | 72.52 | 24.71 | 11.00 | 57.70 | 82.10 | 91.50 | 100.00 |
| pop_density | 125.00 | 239.29 | 755.96 | 2.20 | 27.91 | 82.79 | 150.42 | 7851.01 |
| population | 128.00 | 50870541.05 | 180798717.14 | 64749.00 | 2818151.50 | 10435607.00 | 33230723.50 | 1425423212.00 |
| surface_area | 125.00 | 813220.12 | 2108325.92 | 300.00 | 38390.00 | 147570.00 | 587295.00 | 15634410.00 |
| ln_gdppc_ppp_const | 128.00 | 9.78 | 1.17 | 6.72 | 8.90 | 9.95 | 10.76 | 11.80 |
| ln_population | 128.00 | 16.06 | 1.84 | 11.08 | 14.85 | 16.16 | 17.32 | 21.08 |
| ln_surface_area | 125.00 | 11.83 | 2.17 | 5.70 | 10.56 | 11.90 | 13.28 | 16.56 |
| ln_pop_density | 125.00 | 4.34 | 1.43 | 0.79 | 3.33 | 4.42 | 5.01 | 8.97 |
# Pairwise scatter plots: Imports %GDP vs each IV
ivs = []
for v in ["tariff_weighted","ln_gdppc_ppp_const","ln_population","ln_surface_area","internet_users_pct"]:
    if v in data.columns:
        ivs.append(v)

fig, axs = plt.subplots(2, 3, figsize=(12, 8))
axs = axs.ravel()
for i, v in enumerate(ivs):
    axs[i].scatter(data[v], data["imports_pct_gdp"], s=14)
    axs[i].set_xlabel(v)
    axs[i].set_ylabel("Imports (%GDP)")
    axs[i].set_title(f"Imports (%GDP) vs {v}")

# Hide any spare subplot
for j in range(len(ivs), len(axs)):
    axs[j].axis("off")

plt.tight_layout()
plt.show()
# Cleaner correlation heatmap using seaborn
import seaborn as sns

# Pick DV + ~5 IVs (edit if you chose different names)
corr_vars = [
    "imports_pct_gdp",
    "tariff_weighted",
    "ln_gdppc_ppp_const",
    "ln_population",
    "ln_surface_area",
    "internet_users_pct",
]
corr_p = data[corr_vars].corr(method="pearson")

plt.figure(figsize=(7,5))
sns.heatmap(corr_p, annot=True, vmin=-1, vmax=1, cmap="vlag", fmt=".2f", linewidths=.5, linecolor="white")
plt.title("Pearson Correlation: Trade & Covariates")
plt.tight_layout()
plt.show()
The scatter plots show how imports as a share of GDP co-move with each predictor in isolation, while the heatmap provides a compact view of linear associations among all variables. Patterns here should inform your expectations about coefficient signs and magnitudes in the regression, but remember that bivariate associations can be misleading when variables are correlated with one another.
5 Simple vs. multiple regression
We now estimate two models. First, a simple regression of imports on tariffs only. Second, a multiple regression that includes the additional controls. Comparing the tariff coefficient across these two models illustrates how multiple regression corrects for confounding influences; large changes indicate that some of the simple relationship was due to other factors (like size or development).
# Simple OLS: only tariff as predictor
m_simple = smf.ols(
    formula="imports_pct_gdp ~ tariff_weighted",
    data=data
).fit()
print(m_simple.summary())
5.1 Reading your first regression output (Imports % of GDP ~ Tariff %)
Below is a short guide to the main statistics you see in the statsmodels summary, and how to interpret the tariff coefficient for this simple OLS:
5.1.1 R^2 and Adjusted R^2
R^2 is the fraction of the variance of the dependent variable explained by the model: R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}, where RSS is the residual sum of squares and TSS is the total sum of squares. Values lie between 0 and 1, and a higher value indicates a better in-sample fit. However, an implausibly high R^2 can also signal problems with the OLS assumptions.
Adjusted R^2 penalizes adding regressors that don’t help (useful for multivariate regressions): \text{Adj }R^2 = 1 - \frac{\text{RSS}/(n-k-1)}{\text{TSS}/(n-1)}. If you add irrelevant variables, Adj R^2 can go down, acting as a simple complexity check. In the first regression there’s only one regressor, so R^2 and Adj R^2 are usually very close.
5.1.2 t-tests (individual significance)
Each coefficient \hat\beta_j is tested against 0 with a t-statistic: t_j = \frac{\hat\beta_j-0}{\operatorname{se}(\hat\beta_j)}.
A large |t| (e.g., |t|>1.96) and a small p-value (e.g., p<0.05) mean the variable is individually significant.
For the tariff slope, the t-test asks: is the relationship between tariffs and imports statistically different from zero?
5.1.3 F-test (joint significance)
The summary shows an overall F-statistic with Prob (F-statistic). In the simple model it essentially tests whether the slope is zero (same idea as the t-test), but in multiple regression it tests whether all slopes are zero jointly (i.e., the model has explanatory power beyond a constant).
5.1.4 AIC and BIC (fit vs. parsimony)
AIC and BIC reward goodness of fit but penalize model size. Lower is better.
BIC penalizes complexity more than AIC, so it favors simpler models when evidence is similar. Use these criteria to compare specifications, nested or non-nested (e.g., linear vs. polynomial), provided they are estimated on the same sample.
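The fit statistics described above can be rebuilt from the residuals and the log-likelihood. The sketch below does so on simulated data (the variable names and coefficients are hypothetical) and leaves the statsmodels attributes available for comparison; the AIC and BIC lines follow the convention of counting the k slopes plus the intercept as parameters.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 150
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1.0 + 0.8 * df["x1"] - 0.3 * df["x2"] + rng.normal(size=n)

res = smf.ols("y ~ x1 + x2", data=df).fit()
k = int(res.df_model)                                   # number of slopes

rss = float(np.sum(res.resid ** 2))                     # residual sum of squares
tss = float(np.sum((df["y"] - df["y"].mean()) ** 2))    # total sum of squares

r2 = 1 - rss / tss
adj_r2 = 1 - (rss / (n - k - 1)) / (tss / (n - 1))
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))            # H0: all slopes are zero
aic = -2 * res.llf + 2 * (k + 1)
bic = -2 * res.llf + np.log(n) * (k + 1)

print(r2, adj_r2, f_stat, aic, bic)
print(res.rsquared, res.rsquared_adj, res.fvalue, res.aic, res.bic)
```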
5.2 Interpreting the tariff coefficient (units and meaning)
Dependent variable: imports_pct_gdp is measured in percentage points of GDP (e.g., 50 means imports are 50% of GDP).
Regressor: tariff_weighted is measured in percentage points (e.g., 5 means a 5% tariff).
So the slope \hat\beta_{\text{tariff}} is interpreted as:
A 1-percentage-point increase in the tariff rate is associated with a \hat\beta_{\text{tariff}} percentage-point change in imports as % of GDP, ceteris paribus.
Examples to anchor the units:
If \hat\beta_{\text{tariff}} = -2.01 and it’s statistically significant, then raising tariffs from 5.28% (the sample mean) to 6.28% is associated with imports/GDP falling by 2.01 percentage points (e.g., from 50.64% to 48.63% of GDP), on average.
If \hat\beta_{\text{tariff}} is close to zero and not significant, your data don’t show a clear linear association in this simple bivariate frame.
Intercept: The constant \hat\beta_0 is the model’s predicted imports_pct_gdp when tariff_weighted = 0. In cross-country data, the intercept is usually just a baseline level, in this case 61.28%; the slope carries the substantive economic interpretation.
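The summary that follows reports five regressors (Df Model: 5), so it corresponds to the multiple regression rather than the simple one. One way such a model could be specified with the formula API is sketched below. The data here are simulated under the workshop's column names, and the object name m_multiple and all numeric values are assumptions for illustration, so the printed estimates will not match the reported table.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the workshop data (hypothetical values)
rng = np.random.default_rng(6)
n = 120
sim = pd.DataFrame({
    "tariff_weighted": rng.uniform(0, 20, size=n),
    "ln_gdppc_ppp_const": rng.normal(9.8, 1.2, size=n),
    "ln_population": rng.normal(16, 1.8, size=n),
    "ln_surface_area": rng.normal(11.8, 2.2, size=n),
    "internet_users_pct": rng.uniform(10, 100, size=n),
})
sim["imports_pct_gdp"] = (
    150 - 2.0 * sim["tariff_weighted"] - 6.0 * sim["ln_surface_area"]
    + rng.normal(scale=15, size=n)
)

# Multiple OLS: tariff plus development, size, geography, and connectivity
m_multiple = smf.ols(
    "imports_pct_gdp ~ tariff_weighted + ln_gdppc_ppp_const + ln_population"
    " + ln_surface_area + internet_users_pct",
    data=sim,
).fit()
print(m_multiple.params.round(3))
```

With the real dataset, passing data=data instead of the simulated frame and calling print(m_multiple.summary()) would produce a table in the same layout as the one shown next.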
OLS Regression Results
==============================================================================
Dep. Variable: imports_pct_gdp R-squared: 0.544
Model: OLS Adj. R-squared: 0.525
Method: Least Squares F-statistic: 27.71
Date: Mon, 18 Aug 2025 Prob (F-statistic): 2.26e-18
Time: 13:21:56 Log-Likelihood: -518.16
No. Observations: 122 AIC: 1048.
Df Residuals: 116 BIC: 1065.
Df Model: 5
Covariance Type: nonrobust
======================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------------
Intercept 173.4888 31.223 5.556 0.000 111.648 235.330
tariff_weighted -2.0940 0.439 -4.770 0.000 -2.963 -1.224
ln_gdppc_ppp_const -2.2177 3.768 -0.589 0.557 -9.680 5.245
ln_population -1.6214 1.405 -1.154 0.251 -4.404 1.161
ln_surface_area -6.2471 1.132 -5.518 0.000 -8.490 -4.005
internet_users_pct 0.1232 0.172 0.716 0.475 -0.217 0.464
==============================================================================
Omnibus: 6.263 Durbin-Watson: 1.849
Prob(Omnibus): 0.044 Jarque-Bera (JB): 5.885
Skew: 0.526 Prob(JB): 0.0527
Kurtosis: 3.225 Cond. No. 1.58e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.58e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
In the multiple model, the tariff coefficient measures the partial association between tariff rates and imports holding development, population, surface area, and internet adoption constant. A negative and statistically significant estimate would suggest that higher tariffs are associated with lower import intensity in the cross‑section. Magnitude matters: if the coefficient is -2.1, then a one‑percentage‑point increase in the tariff rate is linked to 2.1 percentage points lower imports/GDP, ceteris paribus. Notice that the effect of tariffs did not change significantly from the simple to the multivariate model. How would you interpret the remaining coefficients?
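The inference columns of the reported table can be checked by hand from the coefficient, its standard error, and the residual degrees of freedom. This sketch plugs in the tariff_weighted row of the multiple-regression summary above and uses scipy (installed alongside statsmodels) for the t distribution; the 5-point scenario is an illustrative assumption.

```python
from scipy import stats

coef, se = -2.0940, 0.439    # tariff_weighted row of the reported summary
df_resid = 116               # "Df Residuals" in the same summary

# t-statistic and two-sided p-value for H0: beta = 0
t_stat = coef / se
p_value = 2 * stats.t.sf(abs(t_stat), df_resid)

# 95% confidence interval: coef +/- critical value * standard error
t_crit = stats.t.ppf(0.975, df_resid)
ci_low, ci_high = coef - t_crit * se, coef + t_crit * se

print(round(t_stat, 3), round(ci_low, 3), round(ci_high, 3))

# Implied change in imports/GDP (percentage points) for a tariff increase
for delta in (1, 5):
    print(delta, round(coef * delta, 2))
```

Because the model is linear, the implied change scales proportionally with the size of the tariff increase; the confidence interval gives a sense of how much that implied change could vary.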
6 Estimation with dummy variables and interactions
Countries differ systematically by continent and income group. To capture these differences, we include dummy variables (one‑hot indicators) for continent and income, omitting one category from each set to serve as the baseline. We then add interactions between tariffs and selected dummies to test whether the tariff–imports relationship differs by region or income.
The dummy coefficients reflect level differences relative to the omitted baseline category. With the formula interface used here, the baseline is the first category in alphabetical order, which should be “High income” for wb_income_group. The interaction coefficients show how the association between tariffs and imports differs in each category relative to that baseline. For example, a negative and significant tariff_weighted:wb_income_group[T.Low income] term would imply that low‑income countries have a more negative tariff–imports association than high‑income countries, after controlling for other factors.
# Drop unknown income group
data = data[data["wb_income_group"] != "Unknown"]

# Interactions: `*` = main effects + interaction with tariff
m_interact = smf.ols(
    formula=(
        "imports_pct_gdp ~ "
        "ln_population + ln_surface_area + internet_users_pct "
        "+ tariff_weighted * wb_income_group"
    ),
    data=data
).fit()
print(m_interact.summary())
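Interaction coefficients are differences from the baseline slope, so the tariff slope for a given income group is the main tariff coefficient plus that group's interaction term. The sketch below recovers group-specific slopes on simulated data; the true slopes, sample size, and noise level are hypothetical, and the controls are left out to keep the example short.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 400
groups = ["High income", "Low income", "Lower middle income", "Upper middle income"]
true_slope = {"High income": -1.0, "Low income": -3.0,
              "Lower middle income": -2.0, "Upper middle income": -1.5}

sim = pd.DataFrame({
    "tariff_weighted": rng.uniform(0, 20, size=n),
    "wb_income_group": rng.choice(groups, size=n),
})
sim["imports_pct_gdp"] = (
    60 + sim["wb_income_group"].map(true_slope) * sim["tariff_weighted"]
    + rng.normal(scale=5, size=n)
)

m = smf.ols("imports_pct_gdp ~ tariff_weighted * wb_income_group", data=sim).fit()

# Baseline slope belongs to the omitted category ("High income")
base = m.params["tariff_weighted"]
slopes = {"High income": base}
for g in groups[1:]:
    slopes[g] = base + m.params[f"tariff_weighted:wb_income_group[T.{g}]"]

for g, s in slopes.items():
    print(g, round(s, 2))
```

Reporting these combined slopes alongside the raw interaction terms usually makes the heterogeneity easier to read than the regression table alone.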
7 Polynomial regression: allowing curvature in tariffs
A polynomial regression is still linear in the parameters; you simply add higher‑order terms of a predictor to allow curvature. For tariffs, including a squared term lets the marginal effect of tariffs depend on the tariff level itself. Formally,
\text{Imports} = \alpha + \beta_1 \cdot \text{Tariff} + \beta_2 \cdot \text{Tariff}^2 + \gamma^\top Z + u.
The marginal effect of tariffs is then \partial \text{Imports} / \partial \text{Tariff} = \beta_1 + 2\beta_2 \cdot \text{Tariff}. If \beta_2 \neq 0, curvature matters. No special estimator is required; you just add the squared term.
What does the polynomial contribute? If the squared term is significant and the adjusted R^2 improves, the data favor curvature—perhaps small tariff changes matter little at low levels but bite more as tariffs climb. If the squared term is small and insignificant and fit does not improve, keep the linear model for parsimony. The goal is interpretability with enough flexibility to capture important non‑linear patterns.
OLS Regression Results
==============================================================================
Dep. Variable: imports_pct_gdp R-squared: 0.570
Model: OLS Adj. R-squared: 0.546
Method: Least Squares F-statistic: 23.65
Date: Mon, 18 Aug 2025 Prob (F-statistic): 1.23e-17
Time: 13:21:56 Log-Likelihood: -483.14
No. Observations: 114 AIC: 980.3
Df Residuals: 107 BIC: 999.4
Df Model: 6
Covariance Type: nonrobust
===========================================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------------------
Intercept 208.3254 33.660 6.189 0.000 141.599 275.052
tariff_weighted -4.6382 1.370 -3.386 0.001 -7.354 -1.922
I(tariff_weighted ** 2) 0.1360 0.072 1.893 0.061 -0.006 0.278
ln_gdppc_ppp_const -4.7052 3.946 -1.192 0.236 -12.527 3.117
ln_population -1.6004 1.444 -1.108 0.270 -4.463 1.263
ln_surface_area -6.5670 1.165 -5.636 0.000 -8.877 -4.257
internet_users_pct 0.1229 0.176 0.700 0.486 -0.225 0.471
==============================================================================
Omnibus: 4.495 Durbin-Watson: 2.023
Prob(Omnibus): 0.106 Jarque-Bera (JB): 4.073
Skew: 0.457 Prob(JB): 0.130
Kurtosis: 3.147 Cond. No. 2.09e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.09e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
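The marginal-effect formula can be evaluated directly with the two tariff coefficients reported in the polynomial summary above. This short sketch treats those point estimates as given and ignores their sampling uncertainty; the tariff levels chosen for evaluation are illustrative, apart from 5.28, which is the sample mean from the descriptive table.

```python
# Point estimates from the polynomial summary:
# tariff_weighted and I(tariff_weighted ** 2)
b1, b2 = -4.6382, 0.1360


def marginal_effect(tariff):
    """d(Imports)/d(Tariff) = b1 + 2 * b2 * tariff, in pp of GDP per tariff point."""
    return b1 + 2 * b2 * tariff


for t in (0.0, 5.28, 10.0, 20.0):
    print(t, round(marginal_effect(t), 2))

# Tariff level at which the estimated marginal effect crosses zero
turning_point = -b1 / (2 * b2)
print(round(turning_point, 2))
```

With a positive squared term, the estimated negative association weakens as tariffs rise rather than strengthening. The squared term's p-value in the summary is 0.061, so this curvature is only marginally supported and should be read with caution.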
8 Presenting your results
Raw regression output is verbose and hard to compare across models. A compact side-by-side table, such as the one produced by summary_col, puts the specifications in adjacent columns and is much easier to read and report.
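A minimal sketch of such a comparison table is shown below, using summary_col on three models fitted to simulated data. The data, the model names, and the objects m1 to m3 are hypothetical; with the workshop data you would pass your own fitted models instead.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Simulated stand-in data (hypothetical values)
rng = np.random.default_rng(8)
n = 120
sim = pd.DataFrame({
    "tariff_weighted": rng.uniform(0, 20, size=n),
    "ln_surface_area": rng.normal(11.8, 2.2, size=n),
})
sim["imports_pct_gdp"] = (
    120 - 2.0 * sim["tariff_weighted"] - 5.0 * sim["ln_surface_area"]
    + rng.normal(scale=15, size=n)
)

m1 = smf.ols("imports_pct_gdp ~ tariff_weighted", data=sim).fit()
m2 = smf.ols("imports_pct_gdp ~ tariff_weighted + ln_surface_area", data=sim).fit()
m3 = smf.ols(
    "imports_pct_gdp ~ tariff_weighted + I(tariff_weighted**2) + ln_surface_area",
    data=sim,
).fit()

# One column per model, significance stars, and the sample size as an extra row
table = summary_col(
    [m1, m2, m3],
    model_names=["Simple", "Multiple", "Polynomial"],
    stars=True,
    float_format="%.3f",
    info_dict={"N": lambda r: f"{int(r.nobs)}"},
)
print(table)
```

Checking the N row matters in practice: models estimated on different samples (for example after dropping rows with missing values) are not directly comparable on AIC, BIC, or R^2.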
You now have a complete cross‑section pipeline on tariffs and trade: descriptive analysis, simple vs. multiple OLS, dummy variables and interactions for income, and a polynomial extension for curvature. In your write‑up, emphasize economic mechanisms and magnitudes, not just statistical significance. In the next workshop we will rigorously check model assumptions and discuss robustness (e.g., heteroskedasticity‑robust standard errors, and alternative outcomes like Trade % GDP and Exports % GDP).
10 References
Cheong, J., Kwak, D. W., & Tang, K. K. (2018). The trade effects of tariffs and non-tariff changes of preferential trade agreements. Economic Modelling, 70, 370–382. https://doi.org/10.1016/j.econmod.2017.08.011
Costinot, A., & Werning, I. (2025). How tariffs affect trade deficits (NBER Working Paper No. 33709). National Bureau of Economic Research. https://www.nber.org/papers/w33709
Freund, C. L., & Weinhold, D. (2004). The effect of the Internet on international trade. Journal of International Economics, 62(1), 171–189. https://doi.org/10.1016/S0022-1996(03)00059-X
Gorman, W. M. (1959). The effect of tariffs on the level and terms of trade. Journal of Political Economy, 67(3), 246–265. https://doi.org/10.1086/258174
Leith, J. C. (1971). The effects of tariffs on production, consumption, and trade: A revised analysis. American Economic Review, 61(1), 74–81. https://www.jstor.org/stable/1910542
Osnago, A., & Tan, S. W. (2016). Disaggregating the impact of the Internet on international trade (Policy Research Working Paper No. 7785). World Bank. https://openknowledge.worldbank.org/bitstreams/0208884c-459b-56b2-af92-a4f029fe0a17/download
Politico. (2025, July 31). Trump issues order imposing new global tariff rates effective Aug. 7. https://www.politico.com/news/2025/07/31/trump-executive-order-higher-tariff-rates-00487913
Reuters. (2025, July 31). Trump gives Mexico 90-day tariff reprieve as deadline for higher duties looms. https://www.reuters.com/world/americas/trump-gives-mexico-90-day-tariff-reprieve-deadline-higher-duties-looms-2025-07-31/
The White House. (2025, April 2). Fact sheet: President Donald J. Trump declares national emergency to increase our competitive edge, protect our sovereignty, and strengthen our national and economic security. https://www.whitehouse.gov/fact-sheets/2025/04/fact-sheet-president-donald-j-trump-declares-national-emergency-to-increase-our-competitive-edge-protect-our-sovereignty-and-strengthen-our-national-and-economic-security/
The White House. (2025, July 31). Further modifying the reciprocal tariff rates [Executive order]. https://www.whitehouse.gov/presidential-actions/2025/07/further-modifying-the-reciprocal-tariff-rates/