Model Critique
For this lab, you’ll be working with a group of other classmates, and
each group will be assigned to critique a lab from a previous week.
Your group will have three goals:
- Create an explicit business scenario which might leverage the data
(and methods) used in the lab.
- Critique the models (or analyses) present in the lab based on this
scenario.
- Devise a list of ethical and epistemological concerns that might
pertain to this lab in the context of your business scenario.
Goal 1: Business Scenario
First, create your own context for the lab. This should be a business
use-case such as “a real estate firm aims to present housing trends
(and recommendations) for their clients in Ames, Iowa”.
You do not need to solve the problem; you only need to define
it.
Your scenario should include the following:
- Customer or Audience: who exactly will use your
results?
- Problem Statement: identify a business need or a
possible customer request. It should be actionable; that is, it should
call for a specific action.
- E.g., the statement “we need to analyze sales data” is not a good
problem statement, but “the company needs to know if they should stop
selling product A” is better.
- Scope: What variables from the data (in the lab)
can address the issue presented in your problem statement? What analyses
would you use? You’ll need to define any assumptions you feel need to be
made before you move forward.
- If you feel the content in the lab cannot sufficiently address the
issue, try to devise a more applicable problem statement.
- Objective: Define your success criteria. In other
words, suppose you started working on this problem in earnest; how will
you know when you are done? For example, you might want to “identify the
factors that most influence
<some variable>.”
- Note: words like “identify”, “maximize”, “determine”, etc.
could be useful here. Feel free to find the right action verbs that work
for you!
Goal 2: Model Critique
Since this is a class, and not a workplace, we need to be careful not
to present too much information to you all at once. For this reason, our
labs are often not as analytically rigorous or thorough as they might be
in practice. So here, your goal is to:
Present a list of at least 3 (improved) analyses you would
recommend for your business scenario. Each proposed analysis
should be accompanied by a “proof of concept” R implementation. (As
usual, execute R code blocks here in the RMarkdown
file.)
In the lab your group has been assigned, consider issues with models,
statistical improvements, interpretations, analyses, visualizations,
etc. Use this notebook as a sandbox for trying out different
code and investigating the data from a different perspective. Take
notes on all the issues you see, and propose your solutions (even if you
might need to request more data or resources to accomplish those
solutions).
You’ll want to consider the following:
- Analytical issues, such as the current model assumptions.
- Issues with the data itself.
- Statistical improvements; what do we know now that we didn’t know
(or at least didn’t use) then? Are there other methods that would be
appropriate?
- Are there better visualizations which could have been used?
Feel free to use the reading for the week associated with your
assigned lab to help refresh your memory on the concepts presented.
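As a starting point for the "model assumptions" item above, here is a minimal sketch of residual diagnostics in base R. It is only an illustration: the built-in `mtcars` data and the formula `mpg ~ wt + hp` are stand-ins chosen for this example, not the data or model from your assigned lab.

```r
# Illustrative only: mtcars and this formula are stand-ins for the
# data and model in your assigned lab.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residuals vs. fitted values: curvature hints at non-linearity,
# and a funnel shape hints at non-constant variance.
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line hint at non-normal errors.
qqnorm(resid(fit))
qqline(resid(fit))

# Shapiro-Wilk test as a rough numerical companion to the Q-Q plot.
sw <- shapiro.test(resid(fit))
sw$p.value
```

Swap in the model from your lab for `fit`; the same plots apply to any `lm` object. Treat the test's p-value as one piece of evidence alongside the plots, not as a verdict.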
Goal 3: Ethical and Epistemological Concerns
Review the materials from the Week 5 lesson on Ethics and
Epistemology. These include the lecture slides, the lecture video, and the
reading. You should also consider doing supplementary research on the
topic at hand (e.g., news outlets, historical articles, etc.). Some
issues you might want to consider include:
- Overcoming biases (existing or potential).
- Possible risks or societal implications.
- Crucial issues which might not be measurable.
- Who would be affected by this project, and how does that affect your
critique?
Example
For example, in Weeks 10-11, we used the year built, square footage,
elevation, and the number of bedrooms to determine the price of an
apartment. A few questions you might ask are:
- Is this a “good” selection of variables? What could we be missing,
or are there potential biases inherent in the groups of apartments
here?
- Nowhere in the lab do we investigate the assumptions of a linear
model. Is the relationship between the response (i.e., \(\log(\text{price})\)) and each of these
variables linear? Do the error terms have constant variance, and are
they approximately normally distributed?
- Is it possible that our conclusions are more appropriate for some
group(s) of the data and not others?
- What if assumptions are not met? What could happen to this model if
it were deployed on a platform like Zillow?
- Consider different evaluation metrics between models. What is a
practical use for these values?
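To make the last question concrete, the sketch below compares two candidate models using AIC and hold-out RMSE. Everything in it is an assumption made for illustration: the `apartments` data frame is simulated, its column names (`year_built`, `sqft`, `bedrooms`, `price`) are hypothetical, and the coefficients used to generate prices are arbitrary. Replace it with the actual lab data before drawing any conclusions.

```r
# Simulated stand-in data; column names and coefficients are hypothetical.
set.seed(1)
n <- 200
apartments <- data.frame(
  year_built = sample(1950:2020, n, replace = TRUE),
  sqft       = runif(n, 400, 2000),
  bedrooms   = sample(1:4, n, replace = TRUE)
)
apartments$price <- exp(
  11 + 0.0006 * apartments$sqft +
    0.003 * (apartments$year_built - 1950) +
    rnorm(n, sd = 0.2)
)

# Hold out 50 rows so the models are scored on data they did not see.
train_idx <- sample(seq_len(n), size = 150)
train <- apartments[train_idx, ]
test  <- apartments[-train_idx, ]

# Two candidate models for log(price).
fit_small <- lm(log(price) ~ sqft, data = train)
fit_full  <- lm(log(price) ~ sqft + year_built + bedrooms, data = train)

# Root mean squared error on the log scale.
rmse <- function(fit, newdata) {
  sqrt(mean((log(newdata$price) - predict(fit, newdata = newdata))^2))
}

comparison <- data.frame(
  model     = c("sqft only", "full"),
  aic       = c(AIC(fit_small), AIC(fit_full)),
  test_rmse = c(rmse(fit_small, test), rmse(fit_full, test))
)
comparison
```

AIC rewards fit while penalizing extra parameters, whereas hold-out RMSE estimates how far predictions fall from unseen values. In a business scenario, the RMSE (here on the log scale) is often the easier number to explain to a client, since it translates into a typical relative error in predicted price.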
Share your model critique in this notebook as your data dive
submission for the week. Make sure to include your own R code
that runs the routines you suggest.