Project 1, Stats for AI
1 Objective
You have historical monthly data of all types of candies sold in all modern retail stores around the country.
The objectives of your analysis are the following:
- Provide an understanding of the candy category by manufacturer (fab) in terms of a) market share composition by year, and b) sales and price performance over time. Identify the top fabs that together account for 80% of the market share over the last 12 months. Show the descriptive statistics relevant to your analysis and assumptions.
Present your analysis in tables and selected graphs to provide a good understanding of the data.
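As a starting point for the market share tables, the calculation can be sketched as below. This is a minimal sketch, not a required implementation: the column names `fab`, `month`, and `sales_value`, the toy data, and the helper names `share_by_year` and `top_fabs` are all assumptions, and the real files may be organised differently.

```python
import pandas as pd

# Hypothetical long-format data: one row per fab and month (column names are assumptions).
sales = pd.DataFrame({
    "fab": ["A", "B", "C"] * 24,
    "month": pd.date_range("2022-01-01", periods=24, freq="MS").repeat(3),
    "sales_value": [60.0, 30.0, 10.0] * 24,
})

def share_by_year(df):
    """Market share (%) of each fab within each calendar year."""
    year = df["month"].dt.year.rename("year")
    yearly = df.groupby([year, "fab"])["sales_value"].sum()
    totals = yearly.groupby(level="year").transform("sum")
    return (100 * yearly / totals).unstack("fab")

def top_fabs(df, threshold=0.80, months=12):
    """Smallest set of leading fabs whose cumulative share reaches `threshold`
    over the most recent `months` months."""
    cutoff = df["month"].max() - pd.DateOffset(months=months - 1)
    recent = df[df["month"] >= cutoff]
    share = recent.groupby("fab")["sales_value"].sum().sort_values(ascending=False)
    share = share / share.sum()
    # Keep every fab up to and including the one that crosses the threshold.
    n_keep = int((share.cumsum() < threshold).sum()) + 1
    return share.iloc[:n_keep]

print(share_by_year(sales))
print(top_fabs(sales))
```

Whether the fab that crosses the 80% line is included or excluded is a judgment call; this sketch includes it, and that assumption should be stated in the report.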
- Design and run regression model(s) to estimate price sensitivity (direct price elasticity) by manufacturer (by fabs, not by product).
You have to present the regression output for the top fab (the fab with the highest market share) and provide a clear interpretation of it: parameter estimates (beta coefficients), t-statistics, p-values, 95% confidence intervals, and R-squared.
For the rest of the fabs, present a summary of your results in terms of price sensitivity and their corresponding statistical significance. You can use tables and relevant graphs to present your results. You have to explain your summary.
- Considering ONLY the top 2 fabs, propose how you could improve your model(s) design to incorporate the following factors: cross-price elasticity, seasonality and growth trend.
For each fab, consider the cross-elasticity with respect to the other fab's price.
Run both models, show the regression output, and interpret the results.
In addition, respond to the following questions:
Did the estimates of direct price elasticity change compared with the previous models? If so, how much did their magnitude and statistical significance change? Why do you think the estimates changed, and which estimate is the best? Explain clearly in your own words.
What can you say about the level of competition between the top 2 fabs?
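One possible way to write the extended models for the top 2 fabs is the pair of log-log equations below. This is a sketch under assumptions, not the required design: a linear time index $t$ stands in for the growth trend, eleven monthly dummies $D_{m,t}$ (January as the baseline) stand in for seasonality, and all symbols are introduced here for illustration.

```latex
\begin{aligned}
\ln Q_{1,t} &= \alpha_1 + \beta_{11}\ln P_{1,t} + \beta_{12}\ln P_{2,t} + \gamma_1 t + \sum_{m=2}^{12}\delta_{1,m} D_{m,t} + \varepsilon_{1,t} \\
\ln Q_{2,t} &= \alpha_2 + \beta_{22}\ln P_{2,t} + \beta_{21}\ln P_{1,t} + \gamma_2 t + \sum_{m=2}^{12}\delta_{2,m} D_{m,t} + \varepsilon_{2,t}
\end{aligned}
```

Here $\beta_{11}$ and $\beta_{22}$ are the direct price elasticities and $\beta_{12}$ and $\beta_{21}$ are the cross-price elasticities. A positive and statistically significant cross-price elasticity would suggest that the two fabs behave as substitutes, which is one way to frame the question about competition.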
- Propose how you could forecast price and sales by product and by fab for the next 24 months. Which types of models can you use to do this? You have to do your own research. Consider not only regression models but any type of machine learning/AI model. (Extra points if you implement one model to produce this forecast by fab.)
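Whatever models you propose, it helps to have a simple baseline to compare them against. The sketch below is one such baseline, a seasonal naive forecast with drift; it is an illustration only and not a substitute for the research this objective asks for. The function name `seasonal_naive_forecast` and its parameters are assumptions.

```python
import numpy as np

def seasonal_naive_forecast(history, horizon=24, season=12):
    """Repeat the last full season, shifted by the average year-over-year change."""
    y = np.asarray(history, dtype=float)
    if len(y) < 2 * season:
        raise ValueError("need at least two full seasons of history")
    last = y[-season:]
    # Mean change between the last season and the one before it.
    drift = (last - y[-2 * season:-season]).mean()
    steps = np.arange(horizon)
    # Each forecast year adds one more multiple of the drift.
    return last[steps % season] + drift * (steps // season + 1)

# Toy monthly series: a fixed seasonal pattern that grows by 10 every year.
pattern = np.arange(12, dtype=float)
hist = np.concatenate([pattern + 10 * k for k in range(4)])
fc = seasonal_naive_forecast(hist)
print(fc)
```

A candidate model that cannot beat this baseline on held-out months is unlikely to be worth its extra complexity.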
2 PART 1 - Data wrangling and data understanding
You have to download (from Canvas) the following files:
DATOS_VENTA_2024.xlsx: historical monthly data for all products sold over the last 4 years
DATA_CATALOG_2024.xlsx: product catalog (includes product description, fab, subcategory, etc.)
You have to:
Propose and implement efficient data management process(es) to prepare the data for descriptive statistics and modelling. You have to propose not only data cleaning and variable transformations, but also data structure/shaping. Remember the dataset structure types (long format, wide format).
Do the corresponding data analysis to address objective 1.
Run and interpret the regression model for the top fab.
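The reshaping step mentioned above can be sketched as follows. This is a minimal sketch that assumes the sales file is in wide format (one column per month) and that both files share a product key; the column names `product_id`, `fab`, and `sales_value`, the toy values, and the helper `to_long` are hypothetical, and the real files may be laid out differently.

```python
import pandas as pd

# Hypothetical wide-format sales: one row per product, one column per month.
# The real column names in DATOS_VENTA_2024.xlsx may differ.
wide = pd.DataFrame({
    "product_id": [101, 102],
    "2024-01": [500.0, 200.0],
    "2024-02": [550.0, 180.0],
})
# Hypothetical catalog giving the manufacturer (fab) of each product.
catalog = pd.DataFrame({"product_id": [101, 102], "fab": ["A", "B"]})

def to_long(wide_df, catalog_df):
    """Reshape to one row per product and month, then attach the fab."""
    long_df = wide_df.melt(id_vars="product_id", var_name="month", value_name="sales_value")
    long_df["month"] = pd.to_datetime(long_df["month"], format="%Y-%m")
    # validate="m:1" raises an error if the catalog has duplicate product ids.
    return long_df.merge(catalog_df, on="product_id", how="left", validate="m:1")

long_sales = to_long(wide, catalog)
print(long_sales)
```

Long format is usually the more convenient shape for group-by summaries and regression, while wide format can be easier to read in a report table. Products that have no catalog match come out of the left join with a missing `fab`, which is worth checking before aggregating.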
3 PART 2 - Data modelling
You have to design and run models to address objectives 2 and 3.
In addition, you have to address objective 4.
Go to Canvas and check the DEADLINES for each part.