This is Quiz 3 from the Data Scientist’s toolbox course within the Data Science Specialization. This publication is intended as a learning resource, all answers are documented and explained.
1. We take a random sample of individuals in a population and identify whether they smoke and if they have cancer. We observe that there is a strong relationship between whether a person in the sample smoked or not and whether they have lung cancer. We claim that the smoking is related to lung cancer in the larger population. We explain we think that the reason for this relationship is because cigarette smoke contains known carcinogens such as arsenic and benzene, which make cells in the lungs become cancerous. directory?
Descriptive analysis is for identifying characteristics and describing datasets. Inferential analysis aims to test hypothesis about general populations based on sample sets. Predictive analytics tries to guess the outcome of an event. Causal analysis generally requires careful study design and randomized studies although some techniques exist to infer causation.
2. What is the most important thing in Data Science?
Garbage in, garbage out. If you don’t start in the right place you’ll end up way off course.
3. Suppose you have forked a repository called datascientist on Github but it isn’t on your local computer yet. Which of the following is the command to bring the directory to your local computer?
This is basically just a heatmap of population density.
4. What is an experimental design tool that can be used to address variables that may be confounders at the design phase of an experiment?
Self-explanatory
5. What is the reason behind the explosion of interest in big data?