Workshop 3, Stats for AI

Author

Alberto Dorantes

Published

August 19, 2025

Abstract
This is an INDIVIDUAL workshop. In this workshop we practice with exercises related to the Introduction to Linear Regression Models.

Create a new Google Colab Notebook for this Workshop and share it with me (cdorante@tec.mx) with Edit privileges.

For this Workshop, you have to submit the Jupyter file (.ipynb) instead of the Colab link. You can download it directly from Colab.

The material related to this workshop is covered in Chapters 6 and 7 of my ebook. You can find my ebook at:

https://www.apradie.com/StatsBAbook/

1 CHALLENGE 1

You have to download monthly stock prices (from Yahoo Finance) from Jan 2021 to July 2023 for:

  1. The Mexican IPC market index (ticker = “^MXX”)

  2. The Walmart company (ticker = “WMT.MX”)

  3. The Alfa company (ticker = “ALFAA.MX”)

You have to:

  1. Calculate descriptive statistics for the 3 variables (find the right data transformation) that give a good overview of these variables.

  2. Calculate the covariance and correlation between: a) the IPC return vs the Walmart return, and b) the Walmart return vs the Alfa return. Explain how you calculated your results, and interpret the correlations.

  3. What can you say about the statistical significance of the previous 2 correlations? Calculate the corresponding p-values and explain with YOUR OWN WORDS.

2 CHALLENGE 2

Run a regression model to examine whether the Alfa return is related to the IPC return. You want to see how sensitive the Alfa return is with respect to the IPC return. Decide which should be the dependent variable and which the independent variable, run the model, and interpret the regression coefficients with your OWN WORDS.

3 CHALLENGE 3

Download the following dataset:

https://www.apradie.com/datos/SALES1.xlsx

It is an Excel file with monthly historical sales of 2 chocolate products at a wholesaler with more than 150 stores across all Mexican states. You have to do the following analysis:

  1. Show important descriptive statistics about these 2 products that give you a good overview of sales and price performance over time. You have to decide what type of data wrangling and/or transformations you need to do, and what type of descriptive statistics can be relevant in this context. Think that you are responsible for the sales of these products at a national level, and you have to present your analysis to your boss.

  2. For each product, design a regression model to estimate its direct price elasticity. Price elasticity refers to how sensitive the demand for a consumer product is to price changes. In other words, it is the average percentage change in volume sold with respect to a percentage change in price. You have to:

     a) Decide the data wrangling/transformations and decide the dependent and independent variable(s) for each model,

     b) For each product, run a regression model to estimate the price elasticity, and interpret the model WITH YOUR OWN WORDS.

  3. Besides changes in price, what other variables can you consider that might influence changes in sales volume? Just explain your ideas about this with your OWN WORDS.