In your final paper, you will conduct an extensive set of analyses in R on a dataset of your choice. You can use any dataset you’d like; however, it would be best if your dataset had at least 100 rows and 5 columns. If you have data for a Bachelor’s or Master’s thesis, you are very welcome to use that. If you do not have a dataset, see the next section, “How do I get a dataset?”.
You are welcome to use the same dataset as another student in the class, and you are welcome to work together on your analyses. However, you must each write your own code and text and turn in your own work!
If you do not have a dataset, here are a few places to get one:
R Datasets: run “library(help = "datasets")” to see a list of the datasets preloaded in R. Make sure to use one with several columns and (hopefully) at least 100 rows.
Datasets from the UCI machine learning database (http://archive.ics.uci.edu/ml/datasets.html). This database has 307 datasets from many fields. If you want to use one of these datasets, I recommend you use those with a Multivariate data type and a Regression default task (http://goo.gl/hm2v4B).
Flights dataset: Use this dataset on flights leaving the Houston airport (http://nathanieldphillips.com/wp-content/uploads/2015/04/Flights.txt)
Create your own new dataset!: You are welcome to collect your own data by, for example, conducting a survey of students.
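For the built-in R datasets mentioned above, one rough way to find candidates that meet the size guideline is sketched below. The helper name `meets_guideline` and its default thresholds are illustrative assumptions taken from the "100 rows and 5 columns" suggestion; they are not a course requirement.

```r
# Names of all objects in the built-in 'datasets' package
# (this package is attached by default in a standard R session)
dataset_names <- ls("package:datasets")

# Hypothetical helper: TRUE if the named object is a data frame
# with at least min_rows rows and min_cols columns
meets_guideline <- function(name, min_rows = 100, min_cols = 5) {
  obj <- get(name, envir = as.environment("package:datasets"))
  is.data.frame(obj) && nrow(obj) >= min_rows && ncol(obj) >= min_cols
}

# Keep only the names that pass the check
candidates <- dataset_names[vapply(dataset_names, meets_guideline, logical(1))]
print(candidates)

# Inspect one candidate before committing to it
dim(iris)
str(iris)
```

Datasets stored as time series, matrices, or lists are skipped by this check, so it is only a first filter; a dataset that falls slightly short may still be usable.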
Your paper must be written entirely in an RMarkdown document and knitted to either an HTML or PDF document. Make sure all of your R code is printed in the document by including “echo = TRUE” in the chunk options (or leave the option out, since code is printed by default). You must include your name, the date, and the title of the course on the first page of your paper. There is no minimum or maximum page length.
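As a minimal sketch, the echo option can also be set once for the whole document rather than chunk by chunk. This assumes the knitr package is installed and that the code sits in a setup chunk near the top of the RMarkdown file; the file name in the commented line is hypothetical.

```r
# Assumption: this runs in a setup chunk at the top of the .Rmd file.
# It asks knitr to print the R code of every chunk in the knitted document.
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(echo = TRUE)
}

# Knitting can also be started from the R console instead of the Knit button.
# "final_paper.Rmd" is a hypothetical file name:
# rmarkdown::render("final_paper.Rmd", output_format = "html_document")
```

Setting the option globally is only a convenience; an individual chunk can still override it in its own chunk options.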
If for some reason you absolutely cannot get R to knit your document to an HTML or PDF file, you can turn in a print-out of your R code; however, you may receive a 15% penalty to your grade.
There should be four sections in your paper: Dataset description, Questions, Analyses, and Conclusion:
Your paper should start with a description of the dataset. Make sure you answer these four questions (in paragraph form).
Next, you should list 5 questions that you would like to answer. For example, if I were analyzing the ChickWeight dataset, I could ask the following:
In this section, you should conduct the relevant analyses to answer each of your five research questions. I expect to see all of the relevant R code in a chunk, and I expect you to include the main result in your written text using a mini-chunk.
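A short sketch of this pattern, assuming "mini-chunk" refers to RMarkdown's inline code syntax: the result is computed and stored in a regular chunk, then referenced in the text. The question and the variable name `mean_final_weight` are illustrative only.

```r
# Regular chunk: compute the main result and store it in a variable.
# ChickWeight is a built-in dataset; Time is in days and weight in grams.
final_weights <- ChickWeight$weight[ChickWeight$Time == 21]
mean_final_weight <- round(mean(final_weights), 1)

# In the written text, an inline chunk could then report the value:
#   The mean chick weight on day 21 was `r mean_final_weight` grams.
print(mean_final_weight)
```

Reporting the number this way keeps the text in sync with the code: if the analysis changes, the value in the sentence updates the next time the document is knitted.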
You do not need to restrict yourself to one analysis for each question. For example, to answer the question “How did the chicken weights change over time?”, I could create a plot, calculate a regression and/or calculate the mean (or median) weight at each time point.
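The three approaches named above could look roughly like this for the ChickWeight example. This is one possible sketch, not the required set of analyses; the object names are arbitrary.

```r
# Mean and median weight at each time point
mean_by_time <- aggregate(weight ~ Time, data = ChickWeight, FUN = mean)
median_by_time <- aggregate(weight ~ Time, data = ChickWeight, FUN = median)
print(mean_by_time)

# Simple linear regression of weight on time
weight_model <- lm(weight ~ Time, data = ChickWeight)
print(summary(weight_model))

# Plot the mean weight at each time point and add the regression line
plot(mean_by_time$Time, mean_by_time$weight,
     type = "b",
     xlab = "Time (days)",
     ylab = "Mean weight (grams)",
     main = "Chick weight over time")
abline(weight_model, lty = 2)
```

Note that this regression treats every measurement as independent even though each chick is weighed repeatedly, so it is best read as a descriptive summary of the overall trend.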
Across your analyses, you need to use each of the following 8 commands at least once. You do not need to use all of them for each of your analysis questions!
Write a brief summary of your main conclusions in a few paragraphs.
I will grade your paper based on how well you followed the instructions above and on how clean and well formatted your code is. If you follow the instructions and have well-formatted code, you’ll get a good grade.