State your research question, a description of the variables you’ll use, and your data sources.
Do chocolate bars with certain cacao percentages and bean types have higher ratings than other chocolate bars?
- Numerical outcome variable: rating (`rating`)
- Categorical explanatory/predictor variable: bean type (`bean_type_new`)
- Numerical explanatory/predictor variable: cacao percentage (`cocoa_percent`)
- ID variable: `ID`, numbering the bars from 1 to 1795
Use the `clean_names()` function from the janitor package, then keep only the variables you are going to use with `select()` (from dplyr). The first six rows of the result:

| cocoa_percent | rating | bean_type_new | ID |
|---|---|---|---|
| 63 | 3.75 | other | 1 |
| 70 | 2.75 | other | 2 |
| 70 | 3.00 | other | 3 |
| 70 | 3.50 | other | 4 |
| 70 | 3.50 | other | 5 |
| 70 | 2.75 | Criollo | 6 |
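A minimal sketch of the cleaning step in R, assuming the janitor and dplyr packages are installed. The raw column names (`Cocoa Percent`, `Rating`, `Bean Type`, `Company Location`) and the data frame names `raw` and `chocolate` are hypothetical stand-ins; the values are the six rows from the table, used only for illustration. In practice you would start from the data frame read in from your own data source.

```r
library(janitor)
library(dplyr)

# Hypothetical raw data with messy, space-containing column names.
# Values are the six illustrative rows from the table.
raw <- data.frame(
  `Cocoa Percent`    = c(63, 70, 70, 70, 70, 70),
  Rating             = c(3.75, 2.75, 3.00, 3.50, 3.50, 2.75),
  `Bean Type`        = c("other", "other", "other", "other", "other", "Criollo"),
  `Company Location` = NA,  # hypothetical extra column that will be dropped
  check.names = FALSE       # keep the spaces so clean_names() has work to do
)

chocolate <- raw %>%
  clean_names() %>%                            # "Cocoa Percent" -> cocoa_percent, etc.
  select(cocoa_percent, rating, bean_type) %>% # keep only the analysis variables
  mutate(ID = row_number())                    # add a 1..n row identifier

print(chocolate)
```

`clean_names()` converts every header to snake_case by default, so later code can refer to columns without backticks; `select()` then drops anything not needed for the research question.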
Create exploratory data analysis (EDA) visualizations of your data. These are preliminary and can change before the final submission; the only requirement at this point is that every measurement variable in your dataset appears in at least one visualization, so you can check that it works as expected.
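One possible set of EDA plots, sketched with ggplot2 (assumed to be installed). It uses the six illustrative rows from the table and a hypothetical data frame name, `chocolate`; with the full cleaned dataset the same calls apply unchanged. Each measurement variable (`rating`, `cocoa_percent`, `bean_type_new`) appears in at least one plot; `ID` is an identifier rather than a measurement, so it is not plotted. The plot types are one reasonable choice, not the only one.

```r
library(ggplot2)

# Illustrative data: the six rows from the table (replace with the full dataset).
chocolate <- data.frame(
  cocoa_percent = c(63, 70, 70, 70, 70, 70),
  rating        = c(3.75, 2.75, 3.00, 3.50, 3.50, 2.75),
  bean_type_new = c("other", "other", "other", "other", "other", "Criollo"),
  ID            = 1:6
)

# Outcome on its own: distribution of ratings.
p_rating <- ggplot(chocolate, aes(x = rating)) +
  geom_histogram(binwidth = 0.25)

# Numerical predictor vs. outcome; horizontal jitter separates the many
# bars that share the same cacao percentage.
p_cocoa <- ggplot(chocolate, aes(x = cocoa_percent, y = rating)) +
  geom_jitter(width = 0.5, height = 0)

# Categorical predictor vs. outcome.
p_bean <- ggplot(chocolate, aes(x = bean_type_new, y = rating)) +
  geom_boxplot()

# All three variables together, coloring points by bean type.
p_all <- ggplot(chocolate, aes(x = cocoa_percent, y = rating, color = bean_type_new)) +
  geom_point()

print(p_rating)
print(p_cocoa)
print(p_bean)
print(p_all)
```

The last plot previews the research question directly: if one bean type's points sit consistently higher at a given cacao percentage, that is worth testing formally.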