For this lab, you’ll be working with a group of other classmates, and each group will be assigned a lab from a previous week. Your goal is to critique the models (or analyses) present in the lab.
First, review the materials from the Lesson on Ethics and Epistemology (week 5?). This includes lecture slides, the lecture video, or the reading. You can use these as reference materials for this lab. You may also consider the reading for the week associated with the lab, or supplementary research on the topic at hand (e.g., news outlets, historical articles, etc.).
For the lab your group has been assigned, consider issues with models, interpretations, analyses, visualizations, etc. Use this notebook as a sandbox for trying out different code, and investigating the data from a different perspective. Take notes on all the issues you see, and possible solutions (even if you would need to request more data or resources to accomplish those solutions).
Share your model critique in this notebook as your data dive submission for the week.
As a start, think about the context of the lab and consider the following:
Analytical issues, such as model assumptions
Overcoming biases (existing or potential)
Possible risks or societal implications
Crucial issues which might not be measurable
Treat this exercise as if the analyses in your assigned lab (i.e., the one you are critiquing) were to be published, made available to the public in a press release, or used at some large company (e.g., for mpg data, imagine if Toyota used the conclusions to drive strategic decisions).
# your code here
If you were unable to attend class, select a notes_*.Rmd file from a previous week (not including weeks 1 or 3), and complete the analysis above. Share your critique below.
As I was unable to attend this lecture, I picked Week 7: Hypothesis Testing.
The null hypothesis could be better explained by first giving a complete explanation of Fisher's significance testing, followed by a complete explanation of Neyman-Pearson testing. Clearer instructions could also be provided on how to carry out both kinds of tests and in which cases each is appropriate. The only instructions mentioned are:
In this subsection, we expound on the Neyman-Pearson testing paradigm. Again, this method is not preferred, but it is still sometimes used in industry, especially when sample sizes are low.
This could be detailed further by giving hard limits on data size and by expanding on which industry standards or requirements allow for this test.
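As a rough illustration of the kind of side-by-side explanation I think would help, here is a minimal sketch on simulated data (the groups, means, and α below are my own assumptions, not the lab's). It runs a single t-test and reads the result both ways: Fisher-style, reporting the p-value as graded evidence, and Neyman-Pearson-style, comparing it to a pre-chosen α to make a reject / fail-to-reject decision.

```r
# Minimal sketch (simulated data, not from the lab): one test, two readings
set.seed(42)

control   <- rnorm(30, mean = 100, sd = 15)   # hypothetical baseline group
treatment <- rnorm(30, mean = 108, sd = 15)   # hypothetical treated group

test <- t.test(treatment, control, alternative = "two.sided")

# Fisher-style reading: report the p-value as a continuous measure of evidence
cat("p-value:", round(test$p.value, 4), "\n")

# Neyman-Pearson-style reading: fix alpha *before* seeing the data,
# then make a binary reject / fail-to-reject decision
alpha <- 0.05
if (test$p.value < alpha) {
  cat("Reject the null hypothesis at alpha =", alpha, "\n")
} else {
  cat("Fail to reject the null hypothesis at alpha =", alpha, "\n")
}
```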
The testing process could use more information on selecting the p-value threshold and the α-level. The lab advises against choosing an arbitrary number such as 0.05 but does not expand on what a better choice would look like. The β value (which the lab labels the False Positive Rate) also has no further content explaining why it matters or why it should be low or high (a sketch of how these choices fit together follows the quoted steps below). The section mentions:
Devise a null hypothesis.
Infer the distribution of the random variable represented in the null hypothesis.
Choose a practical and informed False Negative Rate, called the "α-level". Let this inform a "critical value" based on whether you're using a two-tailed or one-tailed test.
If we end up rejecting the null hypothesis, this is the theoretical probability we would have sampled our data given that it was actually "true".
Your null hypothesis should be associated with a difference in parameter values (e.g., p₁ − p₂ or μ − 0). Decide on the least extreme but most practical difference worth measuring, and call this Δ.
This value can be informed by an effect size calculation or conversations with co-workers.
E.g., "we don't need to change our pricing plan until the CTR increases by 0.01, so we'll choose Δ = 0.01".
Choose a practical and informed False Positive Rate (β).
- If the difference Δ is true, what is the likelihood of detecting it with this test?
Calculate an appropriate sample size using the above measures, and ensure you have enough data before moving forward.
Run the hypothesis test, and draw conclusions.
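To make the roles of the α-level, Δ, and the sample-size step concrete, a short sketch like the following could accompany these steps. It uses base R's power.prop.test on a difference in proportions; the baseline CTR of 0.10 and the 0.05 / 0.80 thresholds are my own assumed values for illustration, and Δ = 0.01 is taken from the quoted pricing example.

```r
# Hypothetical sample-size calculation for a difference in CTR (proportions).
# The baseline rate and thresholds below are assumed values for illustration.
baseline_ctr <- 0.10   # assumed current click-through rate
delta        <- 0.01   # smallest difference worth detecting (from the quoted example)
alpha        <- 0.05   # significance level (assumed, not a recommendation)
power        <- 0.80   # probability of detecting delta if it is real
                       # (the quantity the quoted sub-bullet asks about)

n_needed <- power.prop.test(p1        = baseline_ctr,
                            p2        = baseline_ctr + delta,
                            sig.level = alpha,
                            power     = power)

# n_needed$n is the required sample size *per group*; collect at least this
# much data before running the test, as the sample-size step above suggests.
ceiling(n_needed$n)
```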
The best definition of the p-value in the lab is given as:
Assuming the null hypothesis is true, the p-value is the theoretical probability of sampling data with a statistic more extreme than the one calculated.
While this makes sense, the data visualizations before it could use further explanation of what their structure actually means. The definition is followed by:
So, we say "assuming the display ad has no effect on revenue, then 17 of 100 samples this large would yield a difference in revenue of 93.6 or more."
In other words (assuming there is no effect), would you be surprised if 17% of all (theoretical) samples reflect a difference of 93.6 or more? This is where you use your sample size, judgement, and discussions to decide if this is sufficient evidence to reject the null hypothesis, and claim that there is a difference.
This does not explain well what we accomplished in this lab by calculating the p-value. If a worked example followed this section, instead of moving straight on to A/B testing, we could see the process end to end and learn best practices for carrying it out.
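As a sketch of the kind of follow-up example I mean, here is a minimal end-to-end run on simulated data (the revenue numbers, group sizes, and α are all my own assumptions, not the lab's data or its 93.6 statistic): state the null hypothesis, compute the observed difference in means, approximate the p-value with a permutation test, and compare it to a pre-chosen α.

```r
# End-to-end sketch on simulated data (not the lab's revenue data).
set.seed(7)

# Hypothetical daily revenue without and with the display ad
no_ad   <- rnorm(50, mean = 1000, sd = 200)
with_ad <- rnorm(50, mean = 1040, sd = 200)

# Null hypothesis: the ad has no effect on mean revenue (difference = 0)
observed_diff <- mean(with_ad) - mean(no_ad)

# Permutation test: shuffle group labels many times and count how often a
# difference at least as extreme as the observed one appears by chance
combined <- c(no_ad, with_ad)
n        <- length(no_ad)
perm_diffs <- replicate(10000, {
  shuffled <- sample(combined)
  mean(shuffled[(n + 1):(2 * n)]) - mean(shuffled[1:n])
})

p_value <- mean(abs(perm_diffs) >= abs(observed_diff))

alpha <- 0.05
cat("Observed difference:", round(observed_diff, 1), "\n")
cat("Permutation p-value:", round(p_value, 4), "\n")
cat(if (p_value < alpha) "Reject" else "Fail to reject",
    "the null hypothesis at alpha =", alpha, "\n")
```

The permutation p-value here plays the same role as the "17 of 100 samples" interpretation quoted above: it is the share of theoretical samples, under the assumption of no effect, that show a difference at least as extreme as the one observed.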