Review correlation and ANOVA

CD2010 Introduction to Econometrics

Author: Sergio Castellanos-Gamboa, PhD

Affiliation: Tecnológico de Monterrey

Published: August 12, 2025

0.1 Before You Begin: Important Instructions for All Workshops

Welcome to our workshop series! Please read these instructions carefully before starting any activity. Following these guidelines will make your work smoother and ensure that your submissions are graded without issues.

0.1.1 Working Environment

We will use Google Colab for all workshops. Colab runs Python in the cloud — you don’t need to install anything locally.

Access Colab at: https://colab.research.google.com/
Sign in with your institutional Google account for access to all features.
Always save a copy of the notebook to your Google Drive:
- Go to File → Save a copy in Drive.

0.1.2 Loading Data

You may work with datasets provided by the instructor or with public datasets available online. Each workshop includes the Python code needed to load its data. It is still a good idea to keep your files, such as datasets and your own notes, in a dedicated Google Drive folder:

Create a folder in your Google Drive named econ_workshops (or similar).
Upload your datasets there.

0.1.3 Output and Submission Format

After completing the workshop, export your notebook as PDF:
- In Colab: File → Print → Save as PDF.
Only submit the PDF file through Canvas. Do not submit .ipynb or .py files unless explicitly requested.
Include all outputs, tables, and graphs in your PDF — make sure you run all cells before exporting.

0.1.4 Naming Convention

Name your PDF file using the following format: Lastname_Firstname_WorkshopX.pdf

Example (hypothetical student name): Garcia_Maria_Workshop1.pdf

0.1.5 Deadlines

All assignments must be uploaded to Canvas before the stated deadline. Late submissions are not accepted.

Once you have read and understood these instructions, you are ready to begin the workshop!

1 Introduction

In this lecture, we will review two key statistical tools:

Correlation analysis — measures the strength and direction of the relationship between two continuous variables.
ANOVA (Analysis of Variance) — tests whether the means of different groups are significantly different.

We will work through both theoretical concepts and practical Python examples using simulated datasets.
Our examples will involve three types of companies:
- SMEs
- Startups
- Big Companies

2 Correlation Analysis

2.1 Concept

The Pearson correlation coefficient measures the linear relationship between two variables $X$ and $Y$:

$$
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \cdot \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
$$

$r$ ranges from $-1$ (perfect negative correlation) to $+1$ (perfect positive correlation).
$r = 0$ indicates no linear correlation; the variables may still be related in a nonlinear way.
Pearson targets linear relationships. Spearman is the Pearson correlation computed on the ranks of the data, so it captures any monotonic trend and is less sensitive to outliers.
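To connect the formula to code, here is a minimal sketch on simulated data (the seed and sample size are illustrative choices, not from the workshop). It computes $r$ term by term, checks it against `np.corrcoef`, and shows that Spearman's coefficient is simply Pearson applied to ranks (`scipy.stats.rankdata`).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # illustrative seed
x = rng.normal(50, 10, 200)
y = 1.5 * x + rng.normal(0, 5, 200)

def pearson_r(x, y):
    """Pearson r computed term by term from the formula above."""
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2)))

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]

# Spearman = Pearson applied to the ranks of each variable
rho_manual = pearson_r(stats.rankdata(x), stats.rankdata(y))
rho_scipy = stats.spearmanr(x, y)[0]

print(f"Pearson by hand: {r_manual:.4f}, numpy: {r_numpy:.4f}")
print(f"Spearman via ranks: {rho_manual:.4f}, scipy: {rho_scipy:.4f}")
```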

2.2 Example: Simulated Data

We will simulate:
1. A correlated dataset (high $r$)
2. An uncorrelated dataset ($r$ near zero)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(42)

# Simulate correlated data
n = 500
x_corr = np.random.normal(50, 10, n)
y_corr = x_corr * 1.5 + np.random.normal(0, 5, n)

# Simulate uncorrelated data
x_uncorr = np.random.normal(50, 10, n)
y_uncorr = np.random.normal(50, 10, n)

df_corr = pd.DataFrame({'X': x_corr, 'Y': y_corr})
df_uncorr = pd.DataFrame({'X': x_uncorr, 'Y': y_uncorr})

# Correlation coefficients
corr_val = df_corr.corr().iloc[0,1]
uncorr_val = df_uncorr.corr().iloc[0,1]

print(f"Correlation (correlated data): {corr_val:.3f}")
print(f"Correlation (uncorrelated data): {uncorr_val:.3f}")

# Plot
fig, axs = plt.subplots(1, 2, figsize=(12, 5))
sns.regplot(x='X', y='Y', data=df_corr, ax=axs[0])
axs[0].set_title(f"Correlated Data (r={corr_val:.2f})")
sns.regplot(x='X', y='Y', data=df_uncorr, ax=axs[1])
axs[1].set_title(f"Uncorrelated Data (r={uncorr_val:.2f})")
plt.show()

Correlation (correlated data): 0.947
Correlation (uncorrelated data): -0.022
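The uncorrelated pair above is independent by construction, but a near-zero $r$ does not by itself imply independence. A small deterministic sketch (the grid of x values is an illustrative assumption): with $y = x^2$ on a symmetric grid, $y$ is fully determined by $x$ yet both Pearson and Spearman are zero, because the relationship is not monotonic. With $y = e^x$, the relationship is monotonic but curved, so Spearman is exactly 1 while Pearson stays below 1.

```python
import numpy as np
from scipy import stats

x = np.linspace(-3, 3, 201)   # evenly spaced, symmetric around 0

# Non-monotone dependence: y is fully determined by x, yet r is zero
y_quad = x**2
r_quad = np.corrcoef(x, y_quad)[0, 1]

# Monotone but nonlinear: ranks agree perfectly, raw values are not linear
y_exp = np.exp(x)
r_exp = np.corrcoef(x, y_exp)[0, 1]
rho_exp = stats.spearmanr(x, y_exp)[0]

print(f"y = x^2:    Pearson r = {r_quad:.3f}")
print(f"y = exp(x): Pearson r = {r_exp:.3f}, Spearman rho = {rho_exp:.3f}")
```

The practical lesson: always look at the scatter plot before trusting a single correlation number.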

3 Overview

This workshop uses the ISLR Carseats dataset (downloaded online via statsmodels.api.datasets.get_rdataset) to practice:

Correlation between business variables (e.g., Sales, Price, Advertising, Income).
Two-way plots (scatter, pair plots) and box plots.
ANOVA (one-way and two-way with interaction) using categorical marketing/retail factors (ShelveLoc, Urban, US).
A short preview of regression models motivated by the ANOVA results, including AIC comparison and robust standard errors.

Business context. Sales is unit sales (in thousands) at each store. We'll ask two questions: Do mean sales differ across store characteristics (e.g., shelf location quality)? Which continuous predictors co-move with Sales?

This workshop bridges statistical analysis with real business decision-making. Carseats is a simulated dataset of child car seat sales at 400 retail stores, together with characteristics of each store and its local market. It is a strong example for applied econometrics because:

Variables cover pricing, promotion, demographics, and store characteristics.
We can illustrate relationships (correlation), differences between groups (ANOVA), and predictive modeling (regression).

Why ANOVA in business?

ANOVA lets us test if average outcomes (e.g., sales) differ significantly between groups, such as stores with good vs poor shelf location, or urban vs rural markets. This is essential for deciding where to invest, how to segment customers, or which store attributes matter most.

4 Setup and Data

We pull the dataset directly from the R ISLR package using statsmodels.api.datasets.get_rdataset. This works in Colab (internet required).

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Display / plotting style
pd.set_option("display.precision", 3)
sns.set_context("notebook")

# Load Carseats from ISLR via statsmodels (downloads from online R datasets repo)
carseats = sm.datasets.get_rdataset("Carseats", package="ISLR", cache=True).data.copy()

# Inspect
carseats.head()

	Sales	CompPrice	Income	Advertising	Population	Price	ShelveLoc	Age	Education	Urban	US
0	9.50	138	73	11	276	120	Bad	42	17	Yes	Yes
1	11.22	111	48	16	260	83	Good	65	10	Yes	Yes
2	10.06	113	35	10	269	80	Medium	59	12	Yes	Yes
3	7.40	117	100	4	466	97	Medium	55	14	Yes	Yes
4	4.15	141	64	3	340	128	Bad	38	13	Yes	No

4.0.1 Variable notes (selected)

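Sales: Unit sales in thousands (response).
CompPrice: Price charged by the competitor at each location.
Income: Community income level (in thousands of dollars).
Advertising: Local advertising budget (in thousands of dollars).
Price: Price the company charges for car seats at each location.
ShelveLoc: Factor with levels Bad/Medium/Good (quality of the shelving location).
Age: Average age of the local population.
Education: Education level at each location.
Urban: Yes/No (whether the store is in an urban or rural location).
US: Yes/No (whether the store is in the US).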

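We'll create a log-Sales variable (optional, useful for variance stabilization) and a simple margin proxy to connect with business intuition. Because some stores report Sales = 0, the small 1e-6 offset in the code maps them to about −13.8 on the log scale (see the min of log_Sales in the summary), which creates extreme low values; interpret log_Sales with care.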

carseats["log_Sales"] = np.log(carseats["Sales"] + 1e-6)
# Simple price margin proxy: competitor price minus our price (higher may indicate room to price higher)
carseats["margin_proxy"] = carseats["CompPrice"] - carseats["Price"]
carseats.describe()

	Sales	CompPrice	Income	Advertising	Population	Price	Age	Education	log_Sales	margin_proxy
count	400.000	400.000	400.000	400.000	400.000	400.000	400.000	400.000	400.000	400.000
mean	7.496	124.975	68.657	6.635	264.840	115.795	53.322	13.900	1.887	9.180
std	2.824	15.335	27.986	6.650	147.376	23.677	16.200	2.621	0.927	19.263
min	0.000	77.000	21.000	0.000	10.000	24.000	25.000	10.000	-13.816	-46.000
25%	5.390	115.000	42.750	0.000	139.000	100.000	39.750	12.000	1.685	-4.000
50%	7.490	125.000	69.000	5.000	272.000	117.000	54.500	14.000	2.014	9.000
75%	9.320	135.000	91.000	12.000	398.500	131.000	66.000	16.000	2.232	21.250
max	16.270	175.000	120.000	29.000	509.000	191.000	80.000	18.000	2.789	57.000
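The minimum of log_Sales in the summary (−13.816) equals log(1e-6): it comes from stores with Sales = 0 (the min of Sales is 0.000). A minimal sketch on a few hypothetical sales values compares that offset with `np.log1p`, which computes log(1 + Sales) and maps zero sales to 0. Switching to log1p would be our own modeling choice, not part of the original workshop, and it slightly changes how coefficients on the log scale are read.

```python
import numpy as np
import pandas as pd

# Hypothetical sales values, including a store with zero sales
sales = pd.Series([0.0, 0.5, 2.0, 7.5, 16.27])

log_offset = np.log(sales + 1e-6)  # the transformation used in the workshop
log_1p = np.log1p(sales)           # log(1 + Sales), a common alternative

print(pd.DataFrame({
    "Sales": sales,
    "log(Sales + 1e-6)": log_offset,
    "log1p(Sales)": log_1p,
}))
```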

5 Correlation matrices (business metrics)

What correlation tells us:
Correlation measures how two variables move together.

Positive correlation: As one increases, the other tends to increase.
Negative correlation: As one increases, the other tends to decrease.
Zero correlation: No linear relationship.

Pearson vs Spearman:

Pearson measures linear association between continuous variables.
Spearman is rank-based and works well when the data have outliers or follow a nonlinear but monotonic pattern.

num_cols = ["Sales","Price","CompPrice","Income","Advertising","Age","Education","margin_proxy"]
corr_p = carseats[num_cols].corr(method="pearson")
corr_s = carseats[num_cols].corr(method="spearman")
print("Pearson correlation:\n", corr_p, "\n")
print("Spearman correlation:\n", corr_s)

Pearson correlation:
               Sales  Price  CompPrice  Income  Advertising    Age  Education  \
Sales         1.000 -0.445      0.064   0.152        0.270 -0.232     -0.052   
Price        -0.445  1.000      0.585  -0.057        0.045 -0.102      0.012   
CompPrice     0.064  0.585      1.000  -0.081       -0.024 -0.100      0.025   
Income        0.152 -0.057     -0.081   1.000        0.059 -0.005     -0.057   
Advertising   0.270  0.045     -0.024   0.059        1.000 -0.005     -0.034   
Age          -0.232 -0.102     -0.100  -0.005       -0.005  1.000      0.006   
Education    -0.052  0.012      0.025  -0.057       -0.034  0.006      1.000   
margin_proxy  0.598 -0.764      0.077   0.005       -0.074  0.046      0.006   

              margin_proxy  
Sales                0.598  
Price               -0.764  
CompPrice            0.077  
Income               0.005  
Advertising         -0.074  
Age                  0.046  
Education            0.006  
margin_proxy         1.000   

Spearman correlation:
               Sales  Price  CompPrice  Income  Advertising    Age  Education  \
Sales         1.000 -0.408      0.068   0.155        0.275 -0.236     -0.034   
Price        -0.408  1.000      0.542  -0.049        0.036 -0.118      0.024   
CompPrice     0.068  0.542      1.000  -0.066       -0.029 -0.109      0.021   
Income        0.155 -0.049     -0.066   1.000        0.058  0.003     -0.061   
Advertising   0.275  0.036     -0.029   0.058        1.000  0.005     -0.046   
Age          -0.236 -0.118     -0.109   0.003        0.005  1.000      0.004   
Education    -0.034  0.024      0.021  -0.061       -0.046  0.004      1.000   
margin_proxy  0.566 -0.749      0.092   0.007       -0.089  0.052     -0.005   

              margin_proxy  
Sales                0.566  
Price               -0.749  
CompPrice            0.092  
Income               0.007  
Advertising         -0.089  
Age                  0.052  
Education           -0.005  
margin_proxy         1.000

After viewing the matrix:

Focus on:

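Sales and Price: r ≈ −0.45, a moderate negative correlation. Stores that charge more tend to sell fewer units, although correlation alone does not prove that raising prices lowers sales.
Sales and Advertising: r ≈ 0.27, a weak-to-moderate positive correlation. Higher advertising budgets tend to go with higher sales.
margin_proxy: r ≈ 0.60 with Sales, so stores priced below their competitor tend to sell more. Its strong correlation with Price (r ≈ −0.76) is partly mechanical, because margin_proxy is computed from Price.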

5.1 Heatmap

plt.figure(figsize=(7,5))
sns.heatmap(corr_p, annot=True, vmin=-1, vmax=1, cmap="vlag")
plt.title("Pearson Correlation — Carseats Business Metrics")
plt.tight_layout()
plt.show()

Interpretation prompt. How do Sales co-move with Price and Advertising? What does margin_proxy suggest?

How to read the heatmap:

Color scale: blue for negative, red for positive (vlag palette).
Diagonal: always 1.0, since each variable is perfectly correlated with itself.
Strength (rule of thumb): |r| > 0.7 is strong, 0.3 to 0.7 is moderate, and below 0.3 is weak.
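These thresholds describe the size of a correlation, not whether it is statistically distinguishable from zero. A minimal sketch of the usual test of H0: ρ = 0, using t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The helper `corr_pvalue` is our own illustrative name. Applied to Sales vs Income (r = 0.152, n = 400 from the Pearson matrix), even this weak correlation is significant, because the sample is large. The formula is cross-checked against `scipy.stats.pearsonr` on simulated data.

```python
import numpy as np
from scipy import stats

def corr_pvalue(r, n):
    """Two-sided p-value for H0: rho = 0 via t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
    return 2 * stats.t.sf(abs(t), df=n - 2)

# Sales vs Income in Carseats: r = 0.152 with n = 400 (from the Pearson matrix)
print(f"p-value for r = 0.152, n = 400: {corr_pvalue(0.152, 400):.4f}")

# Cross-check the formula against scipy on simulated data
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 0.3 * x + rng.normal(size=100)
r, p_scipy = stats.pearsonr(x, y)
print(f"formula p = {corr_pvalue(r, 100):.6f}, scipy p = {p_scipy:.6f}")
```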

6 Two-Way Plots

Two-way plots help diagnose linearity, clusters, and heteroskedasticity.

# Sales vs Price
sns.jointplot(data=carseats, x="Price", y="Sales", kind="reg", height=5)
plt.show()

Scatter with regression line:
The slope shows the direction of the relationship. If the vertical spread of points around the line changes systematically with Price (for example, fanning out), that suggests heteroskedasticity.

# Pair plot of selected variables
sns.pairplot(
    carseats[["Sales","Price","CompPrice","Income","Advertising","margin_proxy"]],
    kind="reg", diag_kind="hist", corner=True
)
plt.show()

Pair plot:

Check if clouds of points form linear patterns or curves.
Look for outliers — extreme points may distort correlation.
Detect clusters — groups of stores that behave differently.

7 Box Plots (Group Comparisons)

We compare Sales across ShelveLoc (Bad/Medium/Good) and across Urban/US segments.

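What you're seeing:
- Box: middle 50% of the data (interquartile range, IQR).
- Line inside box: median.
- Whiskers: extend to the most extreme observations within 1.5 × IQR of the box (seaborn's default).
- Dots beyond whiskers: outliers.
- Small black points: every individual store, overlaid by stripplot.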
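A minimal sketch of the box-plot rule above, written as a hypothetical helper `box_stats` (not a seaborn function). It computes the quartiles, applies the 1.5 × IQR fences, and reports the whisker ends and outliers. It is shown on toy data; for Carseats you could pass, for example, `carseats.loc[carseats["ShelveLoc"] == "Bad", "Sales"]`.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Quartiles, whisker ends, and outliers under the whis x IQR rule."""
    v = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = v[(v >= lo_fence) & (v <= hi_fence)]
    return {
        "q1": q1, "median": med, "q3": q3,
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": np.sort(v[(v < lo_fence) | (v > hi_fence)]),
    }

# Toy data with one extreme value (hypothetical, not from Carseats)
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Here the upper whisker stops at 9 rather than 100, and 100 is flagged as an outlier.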

plt.figure(figsize=(7,4))
sns.boxplot(data=carseats, x="ShelveLoc", y="Sales")
sns.stripplot(data=carseats, x="ShelveLoc", y="Sales", size=3, alpha=0.4, color="k")
plt.title("Sales by Shelf Location Quality")
plt.show()

plt.figure(figsize=(7,4))
sns.boxplot(data=carseats, x="Urban", y="Sales")
sns.stripplot(data=carseats, x="Urban", y="Sales", size=3, alpha=0.4, color="k")
plt.title("Sales by Urban Segment")
plt.show()

plt.figure(figsize=(7,4))
sns.boxplot(data=carseats, x="US", y="Sales")
sns.stripplot(data=carseats, x="US", y="Sales", size=3, alpha=0.4, color="k")
plt.title("Sales by US vs non-US")
plt.show()

Business question. Does shelf location quality drive higher average sales? Are there urban or US effects that matter?

Interpretation examples:

ShelveLoc: If "Good" has a clearly higher median than "Bad", shelf placement may be a key driver of sales.
Urban: If medians are similar, location type might not be a strong differentiator.
US: Differences could imply market-level conditions.

Managerial link:
Insights here can guide store design, merchandising, and market targeting.

8 ANOVA

8.1 One-Way ANOVA: Sales ~ ShelveLoc

Tests whether the mean of a continuous variable differs across k groups.

- Null hypothesis: all group means are equal.
- Alternative: at least one group mean differs.
- F-statistic: between-group mean square divided by within-group mean square.
- A large F and a small p-value lead us to reject the null.
- η² (eta-squared): proportion of total variance explained by the grouping variable, η² = SS_between / SS_total.
- Rules of thumb (Cohen): 0.01 = small, 0.06 = medium, 0.14 = large.
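The definitions above can be checked by hand. A minimal sketch on three simulated groups (the group means, spread, and sizes are hypothetical): it computes the between- and within-group sums of squares, the F-statistic and its p-value, and η², then compares F with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Three simulated groups with hypothetical means
groups = [rng.normal(mu, 2.0, 40) for mu in (5.5, 7.3, 10.2)]

all_y = np.concatenate(groups)
grand_mean = all_y.mean()
k, N = len(groups), all_y.size

ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ((all_y - grand_mean) ** 2).sum()

F = (ss_between / (k - 1)) / (ss_within / (N - k))   # MS_between / MS_within
p = stats.f.sf(F, k - 1, N - k)
eta_sq = ss_between / ss_total

F_scipy, p_scipy = stats.f_oneway(*groups)
print(f"F by hand = {F:.3f}, scipy = {F_scipy:.3f}, p = {p:.3g}, eta^2 = {eta_sq:.3f}")
```

The same arithmetic with the Carseats ANOVA table gives η² = 1009.531 / (1009.531 + 2172.744) ≈ 0.317.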

Diagnostics:

QQ plot: If points follow the line, residuals are roughly normal.
Levene's test: Checks equal variances across groups. If p < 0.05, we reject equal variances at the 5% level.
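With `center="median"` (the Brown-Forsythe variant used in the diagnostics below), Levene's statistic is simply a one-way ANOVA F computed on the absolute deviations of each observation from its group median. A minimal sketch on simulated groups with different spreads (all values hypothetical) confirms this against `scipy.stats.levene`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Simulated groups with different spreads (hypothetical)
groups = [rng.normal(7, sd, 60) for sd in (1.5, 2.0, 3.0)]

# Brown-Forsythe form of Levene: one-way ANOVA on |y - group median|
abs_dev = [np.abs(g - np.median(g)) for g in groups]
F_manual = stats.f_oneway(*abs_dev)[0]
W_scipy = stats.levene(*groups, center="median")[0]
print(f"ANOVA on |deviations|: {F_manual:.4f}; scipy levene: {W_scipy:.4f}")
```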

Tukey HSD:

Post-hoc test after significant ANOVA.
Compares each pair of groups and adjusts for multiple comparisons.

m1 = ols("Sales ~ C(ShelveLoc)", data=carseats).fit()
anova_1 = anova_lm(m1, typ=2)
anova_1

	sum_sq	df	F	PR(>F)
C(ShelveLoc)	1009.531	2.0	92.23	1.267e-33
Residual	2172.744	397.0	NaN	NaN

8.1.1 Effect size (η²)

ss_effect = anova_1.loc["C(ShelveLoc)", "sum_sq"]
ss_total  = ss_effect + anova_1.loc["Residual", "sum_sq"]
eta_sq = ss_effect / ss_total
print(f"Eta-squared (η²) for ShelveLoc: {eta_sq:.3f}")

Eta-squared (η²) for ShelveLoc: 0.317

8.1.2 Diagnostics

# Normality (QQ plot) and variance homogeneity (Levene)
sm.qqplot(m1.resid, line="45", fit=True)
plt.title("QQ Plot — ANOVA Residuals (Sales ~ ShelveLoc)")
plt.show()

groups = [g["Sales"].values for _, g in carseats.groupby("ShelveLoc")]
W, p = stats.levene(*groups, center="median")
print(f"Levene's test for equal variances: W={W:.3f}, p={p:.3f}")

Levene's test for equal variances: W=0.809, p=0.446

8.1.3 Post-hoc comparisons (Tukey HSD)

from statsmodels.stats.multicomp import pairwise_tukeyhsd
tukey = pairwise_tukeyhsd(endog=carseats["Sales"], groups=carseats["ShelveLoc"], alpha=0.05)
print(tukey.summary())

Multiple Comparison of Means - Tukey HSD, FWER=0.05
===================================================
group1 group2 meandiff p-adj  lower   upper  reject
---------------------------------------------------
   Bad   Good   4.6911   0.0  3.8714  5.5108   True
   Bad Medium   1.7837   0.0    1.11  2.4573   True
  Good Medium  -2.9074   0.0 -3.6107 -2.2041   True
---------------------------------------------------

8.2 Two-Way ANOVA: Sales ~ ShelveLoc * Urban

Two categorical variables + their interaction.

Main effect: The effect of one factor, averaged over the levels of the other.
Interaction: When the effect of one factor changes depending on the level of the other.

Interaction plot:

Roughly parallel lines → little or no interaction.
Non-parallel or crossing lines → possible interaction; confirm it with the interaction F-test, because cell means are estimated with noise.

Business application: If shelf location matters more in urban stores than in rural ones, marketing strategy should be location-specific.

m2 = ols("Sales ~ C(ShelveLoc) * C(Urban)", data=carseats).fit()
anova_2 = anova_lm(m2, typ=2)
anova_2

	sum_sq	df	F	PR(>F)
C(ShelveLoc)	1010.472	2.0	91.778	1.900e-33
C(Urban)	1.698	1.0	0.308	5.790e-01
C(ShelveLoc):C(Urban)	2.079	2.0	0.189	8.280e-01
Residual	2168.967	394.0	NaN	NaN
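For the highest-order term, the Type II interaction row can be cross-checked by comparing nested models: fit the additive model and the interaction model, then pass both to `anova_lm`. A minimal sketch on simulated data with no true interaction; the factor names `Shelf` and `Urban` and all effect sizes are hypothetical stand-ins for the Carseats variables.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(5)
n = 300
df = pd.DataFrame({
    "Shelf": rng.choice(["Bad", "Medium", "Good"], n),   # hypothetical factor
    "Urban": rng.choice(["No", "Yes"], n),               # hypothetical factor
})
shelf_effect = df["Shelf"].map({"Bad": 0.0, "Medium": 1.8, "Good": 4.7})
df["Sales"] = 5.5 + shelf_effect + rng.normal(0, 2.3, n)  # no true interaction

additive = ols("Sales ~ C(Shelf) + C(Urban)", data=df).fit()
full = ols("Sales ~ C(Shelf) * C(Urban)", data=df).fit()

nested = anova_lm(additive, full)   # F-test for the added interaction terms
type2 = anova_lm(full, typ=2)       # Type II table, as in the workshop

F_nested = nested["F"].iloc[1]
F_type2 = type2.loc["C(Shelf):C(Urban)", "F"]
print(f"nested F = {F_nested:.4f}, Type II interaction F = {F_type2:.4f}")
```

Both approaches ask the same question: does adding the interaction terms reduce the residual sum of squares by more than chance would?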

8.2.1 Interaction plot

# Mean Sales per ShelveLoc × Urban cell, with shelf quality in its natural order
# (groupby sorts alphabetically, which would put Good before Medium on the x-axis)
order = ["Bad", "Medium", "Good"]
means = carseats.groupby(["ShelveLoc","Urban"])["Sales"].mean().unstack("Urban").reindex(order)

plt.figure(figsize=(7,4))
for u in means.columns:
    plt.plot(means.index, means[u], marker="o", label=f"Urban={u}")
plt.title("Interaction: Sales by ShelveLoc × Urban")
plt.xlabel("ShelveLoc")
plt.ylabel("Mean Sales")
plt.legend()
plt.show()

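Interpretation prompt. In this sample the interaction is not significant (p ≈ 0.83), so the data do not support tailoring by Urban segment. If the interaction were significant, how would you tailor merchandising or store layout by ShelveLoc × Urban segment?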

9 From ANOVA to Regression (Preview)

ANOVA with categorical factors is a special case of linear regression with dummy variables. We now add continuous controls used by retail/marketing teams.

Why regression?
Adding continuous predictors lets us:

Control for other factors.
Estimate marginal effects.

Interpreting coefficients:

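For categorical variables, each coefficient is the difference in the expected outcome relative to a reference category (by default the first level in sorted order, e.g., Bad for ShelveLoc).
For continuous variables, each coefficient is the expected change in the outcome per one-unit change in the predictor, holding the other predictors constant.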
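A minimal sketch of the ANOVA-regression link on simulated data (the factor name `Shelf`, its means, and the noise level are hypothetical). With treatment coding, the intercept equals the mean of the reference group (Bad), each dummy coefficient equals a difference in group means, and the regression's overall F-statistic equals the one-way ANOVA F.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.formula.api import ols

rng = np.random.default_rng(6)
# Hypothetical data: three shelf-quality groups with different mean sales
df = pd.DataFrame({"Shelf": np.repeat(["Bad", "Medium", "Good"], 50)})
df["Sales"] = (df["Shelf"].map({"Bad": 5.5, "Medium": 7.3, "Good": 10.2})
               + rng.normal(0, 2.3, len(df)))

fit = ols("Sales ~ C(Shelf)", data=df).fit()
means = df.groupby("Shelf")["Sales"].mean()

# Treatment coding: reference level = first in sorted order ("Bad")
print(fit.params)
print("Bad mean:", round(means["Bad"], 4),
      " Good - Bad:", round(means["Good"] - means["Bad"], 4))

# The regression's overall F-test is the one-way ANOVA F-test
F_anova = stats.f_oneway(*[g["Sales"] for _, g in df.groupby("Shelf")])[0]
print(f"regression F = {fit.fvalue:.4f}, one-way ANOVA F = {F_anova:.4f}")
```

This is why the Good and Medium coefficients in a ShelveLoc-only model match the Tukey mean differences relative to Bad.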

AIC:

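AIC = 2k − 2 ln(L̂), where k is the number of estimated parameters and L̂ the maximized likelihood. Lower AIC = better balance of fit and simplicity.

Be cautious: AIC penalizes extra parameters but can still favor overly complex models, and it is only comparable across models fit to the same observations and the same response (e.g., not Sales vs log_Sales).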

Robust SEs:

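Remain valid when the error variance differs across observations (heteroskedasticity).
Leave the coefficients unchanged; only standard errors, t-statistics, and p-values change, which can change which variables are significant.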

9.1 Model set

# Univariate (ShelveLoc only)
lm1 = ols("Sales ~ C(ShelveLoc)", data=carseats).fit()

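# Add store-location factors (Urban, US)
lm2 = ols("Sales ~ C(ShelveLoc) + C(Urban) + C(US)", data=carseats).fit()

# Interaction between ShelveLoc and Urban
lm3 = ols("Sales ~ C(ShelveLoc)*C(Urban)", data=carseats).fit()

# Add continuous marketing/econ controls.
# Caution: margin_proxy = CompPrice - Price exactly, so these three columns are
# perfectly collinear and their separate coefficients are not identified
# (this is what Note [2] in the output below flags).
lm4 = ols("Sales ~ C(ShelveLoc)*C(Urban) + Price + CompPrice + Income + Advertising + Age + Education + margin_proxy", data=carseats).fit()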

summary1, summary2, summary3, summary4 = lm1.summary(), lm2.summary(), lm3.summary(), lm4.summary()
print(summary4)

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.873
Model:                            OLS   Adj. R-squared:                  0.869
Method:                 Least Squares   F-statistic:                     242.1
Date:                Tue, 12 Aug 2025   Prob (F-statistic):          3.75e-166
Time:                        13:04:59   Log-Likelihood:                -569.87
No. Observations:                 400   AIC:                             1164.
Df Residuals:                     388   BIC:                             1212.
Df Model:                          11                                         
Covariance Type:            nonrobust                                         
==========================================================================================================
                                             coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------------------
Intercept                                  5.8024      0.591      9.812      0.000       4.640       6.965
C(ShelveLoc)[T.Good]                       4.6908      0.292     16.083      0.000       4.117       5.264
C(ShelveLoc)[T.Medium]                     1.8360      0.252      7.299      0.000       1.341       2.331
C(Urban)[T.Yes]                           -0.0239      0.249     -0.096      0.924      -0.514       0.466
C(ShelveLoc)[T.Good]:C(Urban)[T.Yes]       0.2067      0.343      0.603      0.547      -0.467       0.881
C(ShelveLoc)[T.Medium]:C(Urban)[T.Yes]     0.1643      0.290      0.566      0.572      -0.406       0.735
Price                                     -0.0327      0.001    -21.859      0.000      -0.036      -0.030
CompPrice                                  0.0299      0.002     12.647      0.000       0.025       0.035
Income                                     0.0157      0.002      8.482      0.000       0.012       0.019
Advertising                                0.1154      0.008     14.880      0.000       0.100       0.131
Age                                       -0.0462      0.003    -14.487      0.000      -0.052      -0.040
Education                                 -0.0200      0.020     -1.021      0.308      -0.059       0.019
margin_proxy                               0.0626      0.002     30.822      0.000       0.059       0.067
==============================================================================
Omnibus:                        1.001   Durbin-Watson:                   2.006
Prob(Omnibus):                  0.606   Jarque-Bera (JB):                0.955
Skew:                           0.120   Prob(JB):                        0.620
Kurtosis:                       2.992   Cond. No.                     4.04e+16
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 9.17e-27. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
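Note [2] is not a false alarm: margin_proxy = CompPrice − Price exactly, so the design matrix is singular and statsmodels falls back on a pseudoinverse solution. A minimal sketch on simulated data (the coefficients and scales are hypothetical) shows the consequences: the matrix rank is one less than the number of columns, dropping margin_proxy leaves the fitted values unchanged, and the reduced model's Price slope equals the full model's Price coefficient minus the margin_proxy coefficient. By that algebra, dropping margin_proxy from lm4 should give a Price slope of about −0.0327 − 0.0626 = −0.0953 and a CompPrice slope of about 0.0299 + 0.0626 = 0.0925, with the same fitted values, R², and AIC.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

rng = np.random.default_rng(9)
n = 400
df = pd.DataFrame({
    "Price": rng.normal(116, 24, n),       # simulated, roughly Carseats-like scale
    "CompPrice": rng.normal(125, 15, n),
})
df["margin_proxy"] = df["CompPrice"] - df["Price"]   # exact linear combination
df["Sales"] = 13 - 0.1 * df["Price"] + 0.09 * df["CompPrice"] + rng.normal(0, 1.5, n)

full = ols("Sales ~ Price + CompPrice + margin_proxy", data=df).fit()
reduced = ols("Sales ~ Price + CompPrice", data=df).fit()

# Rank check: 4 columns (intercept + 3 slopes) but only rank 3
print("columns:", full.model.exog.shape[1],
      "rank:", np.linalg.matrix_rank(full.model.exog))

# Same column space, so identical fitted values
print("max |fitted diff|:", np.abs(full.fittedvalues - reduced.fittedvalues).max())

# Reduced Price slope = full Price coefficient minus full margin_proxy coefficient
print(reduced.params["Price"], full.params["Price"] - full.params["margin_proxy"])
```

The practical fix is to keep either margin_proxy or the two price variables, not all three.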

9.2 Model comparison (AIC)

aic_table = pd.DataFrame({
    "model": ["lm1: ShelveLoc", "lm2: +Urban+US", "lm3: +Interaction", "lm4: +Controls"],
    "AIC":   [lm1.aic, lm2.aic, lm3.aic, lm4.aic]
}).sort_values("AIC")
aic_table

	model	AIC
3	lm4: +Controls	1163.736
1	lm2: +Urban+US	1809.466
0	lm1: ShelveLoc	1818.063
2	lm3: +Interaction	1823.368

9.3 Robust standard errors (HC1)

lm4_robust = lm4.get_robustcov_results(cov_type="HC1")
print(lm4_robust.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.873
Model:                            OLS   Adj. R-squared:                  0.869
Method:                 Least Squares   F-statistic:                     269.1
Date:                Tue, 12 Aug 2025   Prob (F-statistic):          6.15e-174
Time:                        13:04:59   Log-Likelihood:                -569.87
No. Observations:                 400   AIC:                             1164.
Df Residuals:                     388   BIC:                             1212.
Df Model:                          11                                         
Covariance Type:                  HC1                                         
==========================================================================================================
                                             coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------------------------
Intercept                                  5.8024      0.629      9.223      0.000       4.565       7.039
C(ShelveLoc)[T.Good]                       4.6908      0.302     15.558      0.000       4.098       5.284
C(ShelveLoc)[T.Medium]                     1.8360      0.266      6.893      0.000       1.312       2.360
C(Urban)[T.Yes]                           -0.0239      0.268     -0.089      0.929      -0.551       0.503
C(ShelveLoc)[T.Good]:C(Urban)[T.Yes]       0.2067      0.355      0.583      0.560      -0.490       0.904
C(ShelveLoc)[T.Medium]:C(Urban)[T.Yes]     0.1643      0.303      0.543      0.588      -0.431       0.760
Price                                     -0.0327      0.002    -21.041      0.000      -0.036      -0.030
CompPrice                                  0.0299      0.002     13.144      0.000       0.025       0.034
Income                                     0.0157      0.002      8.321      0.000       0.012       0.019
Advertising                                0.1154      0.007     15.551      0.000       0.101       0.130
Age                                       -0.0462      0.003    -13.811      0.000      -0.053      -0.040
Education                                 -0.0200      0.020     -0.997      0.319      -0.059       0.019
margin_proxy                               0.0626      0.002     32.984      0.000       0.059       0.066
==============================================================================
Omnibus:                        1.001   Durbin-Watson:                   2.006
Prob(Omnibus):                  0.606   Jarque-Bera (JB):                0.955
Skew:                           0.120   Prob(JB):                        0.620
Kurtosis:                       2.992   Cond. No.                     4.04e+16
==============================================================================

Notes:
[1] Standard Errors are heteroscedasticity robust (HC1)
[2] The smallest eigenvalue is 9.17e-27. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.

ValueWarning: covariance of constraints does not have full rank. The number of constraints is 12, but rank is 11

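Reporting tip. Present the preferred model's coefficients with robust SEs and interpret the magnitude and sign of key effects (e.g., the expected change in Sales for Good vs Bad shelf location, holding other factors constant). In lm4, the C(ShelveLoc)[T.Good] coefficient applies to non-urban stores, because the ShelveLoc × Urban interaction adjusts the effect for urban stores.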