1. Prepare your Assignment 2 report using this R Markdown template. Feel free to DELETE the instructional text provided in the template.
Once you finalise your report, run all R chunks and Preview your notebook in HTML (by clicking Preview). Make sure your code and outputs are visible.
2. Upload the report as a PDF file via the File Upload tab under the Assignment 2 page in Canvas (see the instructions file for details). After you attach the file, click Submit assignment.
The easiest way to produce a PDF file from the R Markdown file is to run all R chunks, then Preview your notebook in HTML (by clicking Preview) → Open in Browser (Chrome) → right-click on the report in Chrome → click Print and set the Destination option to Save as PDF.
3. Publish the report to RPubs (see here) and enter your report’s RPubs URL into the Website URL tab under the Assignment 2 RPubs Link Submission page in Canvas (see the instructions file for details), then submit this too. This online version of the report will be used for marking. Failure to submit your link will delay your feedback and risks late penalties.
If you have any questions regarding the assignment instructions or the R Markdown template, please post them on the discussion board.
Provide the packages required to reproduce the report. Make sure you have fulfilled minimum requirement #10.
# This is the R chunk for the required packages
In your own words, provide a brief summary of the preprocessing. Explain the steps that you have taken to preprocess your data. Write this section last after you have performed all data preprocessing. (Word count Max: 300 words)
Provide a clear description of the data sets, their sources, and their variables. In this section, you must also provide the R code, with output (the head of each data set), that you used to import/read/scrape the data. You need to fulfil minimum requirement #1 and merge at least two data sets to create the one you are going to work on. In addition to the R code and output, explain the steps that you have taken.
# This is the R chunk for the Data Section
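As a rough illustration of the import-and-merge step described above, the following sketch reads two tiny data sets and joins them on a common key. Everything in it is hypothetical: the inlined CSV text stands in for real files or URLs, and the names `sales`, `stores` and `store_id` are invented for the example. It uses base R only; `dplyr::left_join()` would be a common alternative.

```r
# Hypothetical data: two small CSV sources, inlined so the sketch is
# self-contained. In a real report these would be files or URLs.
sales_csv <- "store_id,year,revenue
1,2021,100
2,2021,150
3,2021,90"

stores_csv <- "store_id,city
1,Melbourne
2,Sydney
4,Perth"

# Import each source
sales  <- read.csv(text = sales_csv, stringsAsFactors = FALSE)
stores <- read.csv(text = stores_csv, stringsAsFactors = FALSE)

# Show the first rows of each data set with head(), not View()
head(sales)
head(stores)

# Left join on the common key: keep every row of sales and attach
# the city where store_id matches (unmatched rows get NA)
merged <- merge(sales, stores, by = "store_id", all.x = TRUE)
head(merged)
```

Here `all.x = TRUE` keeps the unmatched store in the result, which makes the mismatch between the two sources visible instead of silently dropping a row.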
Summarise the types of variables and data structures, check the attributes in the data and apply proper data type conversions. In addition to the R code and output, briefly explain the steps that you have taken. In this section, show that you have fulfilled minimum requirements #2-#4.
# This is the R chunk for the Understand Section
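The kind of inspection and conversion this section asks for might look like the sketch below. The `survey` data frame and its columns are invented for illustration, and what exactly requirements #2-#4 demand is defined in the instructions file, not here.

```r
# Hypothetical data: every column arrives as character, as often
# happens after importing from text files
survey <- data.frame(
  id           = c("1", "2", "3"),
  satisfaction = c("low", "high", "medium"),
  visit_date   = c("2023-01-05", "2023-02-10", "2023-03-15"),
  stringsAsFactors = FALSE
)

# Inspect structure, dimensions and column classes
str(survey)
dim(survey)
sapply(survey, class)

# Apply type conversions
survey$id         <- as.integer(survey$id)
survey$visit_date <- as.Date(survey$visit_date)  # ISO yyyy-mm-dd text
survey$satisfaction <- factor(
  survey$satisfaction,
  levels  = c("low", "medium", "high"),  # explicit order, not alphabetical
  ordered = TRUE
)

# Confirm the result
str(survey)
levels(survey$satisfaction)
```

Setting `levels` explicitly matters for ordered factors: without it R would sort the labels alphabetically and place "high" before "low".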
Explain why your data (or one of the data sets) does not conform to the tidy data principles (minimum requirement #5). Apply the required steps to reshape the data into a tidy format. In addition to the R code and output, explain everything that you do in this step.
# This is the R chunk for the Tidy & Manipulate Data I
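One common way data fails to be tidy is when column names hold values of a variable (for example, one column per year). The sketch below reshapes such a table from wide to long in base R. The `wide` table and its values are made up; `tidyr::pivot_longer()` is a widely used alternative for the same operation.

```r
# Hypothetical untidy data: the year, which is a variable, is stored
# in the column names instead of in a column of its own
wide <- data.frame(
  country = c("A", "B"),
  `2020`  = c(10, 20),
  `2021`  = c(12, 25),
  check.names = FALSE  # keep "2020"/"2021" as literal column names
)

year_cols <- setdiff(names(wide), "country")

# Stack the year columns: one row per country-year observation
long <- data.frame(
  country = rep(wide$country, times = length(year_cols)),
  year    = rep(as.integer(year_cols), each = nrow(wide)),
  value   = unlist(wide[year_cols], use.names = FALSE)
)

# Sort for readability and reset row names
long <- long[order(long$country, long$year), ]
rownames(long) <- NULL
head(long)
```

After reshaping, each variable (`country`, `year`, `value`) has its own column and each row is a single observation.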
Create/mutate at least one variable from the existing variables (minimum requirement #6). In addition to the R code and output, explain everything that you do in this step.
# This is the R chunk for the Tidy & Manipulate Data II
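A minimal sketch of deriving new variables from existing ones is shown below. The `people` data frame is invented, and the band cut-offs are illustrative values chosen for the example only. `dplyr::mutate()` would express the same step in tidyverse style.

```r
# Hypothetical data
people <- data.frame(
  height_cm = c(170, 160, 182),
  weight_kg = c(70, 55, 90)
)

# New numeric variable computed from two existing ones
people$bmi <- people$weight_kg / (people$height_cm / 100)^2

# New categorical variable derived from the numeric one.
# The cut-offs here are illustrative assumptions.
people$bmi_band <- cut(
  people$bmi,
  breaks = c(-Inf, 18.5, 25, Inf),
  labels = c("under", "normal", "over"),
  right  = FALSE  # intervals are closed on the left: [18.5, 25)
)

head(people)
```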
Scan the data for missing values, special values and obvious errors (i.e. inconsistencies). In this step, you should fulfil minimum requirement #7. In addition to the R code and output, explain your methodology (i.e. why you chose it and what actions you took to handle these values) and communicate your results clearly.
# This is the R chunk for the Scan I
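The scan described above could be approached along the following lines. The `readings` data, the 0-100 range rule for humidity, and the choice of mean imputation are all assumptions made for the sake of the example; the appropriate rule and treatment depend on your own data and should be justified in the report.

```r
# Hypothetical data containing NA, NaN, Inf and an impossible value
readings <- data.frame(
  sensor   = c("a", "b", "c", "d", "e"),
  temp_c   = c(21.5, NA, Inf, 19.0, NaN),
  humidity = c(40, 55, 120, NA, 60)  # 120 cannot be a percentage
)

# Missing values per column (is.na() is TRUE for both NA and NaN)
na_counts <- colSums(is.na(readings))
na_counts

# Special values (Inf, -Inf, NaN) in the numeric columns
num_cols <- names(readings)[sapply(readings, is.numeric)]
special_counts <- sapply(
  readings[num_cols],
  function(x) sum(is.infinite(x) | is.nan(x))
)
special_counts

# Obvious errors: an assumed rule that humidity lies in [0, 100]
bad_humidity <- which(readings$humidity < 0 | readings$humidity > 100)
bad_humidity

# One possible treatment: recode special and impossible values as NA,
# then impute temperature with the mean of the valid readings
clean <- readings
clean$temp_c[!is.finite(clean$temp_c)] <- NA
clean$humidity[bad_humidity] <- NA
clean$temp_c[is.na(clean$temp_c)] <- mean(clean$temp_c, na.rm = TRUE)
clean
```

Working on a copy (`clean`) keeps the original values available, so the report can show the data before and after treatment.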
Scan the numeric data for outliers. In this step, you should fulfil minimum requirement #8. In addition to the R code and output, explain your methodology (i.e. why you chose it and what actions you took to handle these values) and communicate your results clearly.
# This is the R chunk for the Scan II
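As one possible approach to the outlier scan, the sketch below applies Tukey's fences (1.5 times the interquartile range beyond the quartiles) to a made-up numeric vector, then caps the flagged values at the fences. Both the detection rule and the capping treatment are choices made for this example, not the required method; other methods such as z-scores may suit your data better.

```r
# Hypothetical numeric variable with one suspicious value
x <- c(12, 14, 15, 15, 16, 17, 18, 19, 20, 95)

# Tukey's fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR
q     <- unname(quantile(x, probs = c(0.25, 0.75)))
iqr   <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Flag values outside the fences
is_outlier <- x < lower | x > upper
outliers   <- x[is_outlier]
outliers

# One possible treatment: cap values at the fences
capped <- pmin(pmax(x, lower), upper)
summary(capped)

# A boxplot gives a quick visual check of the same variable
boxplot(x, horizontal = TRUE, main = "Hypothetical variable x")
```

Capping keeps the number of observations unchanged, whereas deleting outliers would shrink the data set; whichever you choose, state the reason in the report.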
Apply an appropriate transformation to at least one of the variables. In addition to the R code and output, explain everything that you do in this step. In this step, you should fulfil minimum requirement #9.
# This is the R chunk for the Transform Section
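To illustrate what a transformation step might involve, the sketch below applies a base-10 logarithm to a made-up right-skewed variable and compares skewness before and after. The `income` values and the small `skewness()` helper are hypothetical, written for this example only; the log transform is just one option, and it requires strictly positive values.

```r
# Hypothetical right-skewed variable (all values strictly positive)
income <- c(20, 25, 30, 35, 40, 60, 120, 400)

# Helper defined for this sketch: moment-based sample skewness
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Log transformation to reduce right skew
log_income <- log10(income)

# Compare skewness before and after
skew_before <- skewness(income)
skew_after  <- skewness(log_income)
c(before = skew_before, after = skew_after)

# Visual comparison of the two distributions
par(mfrow = c(1, 2))
hist(income, main = "Original")
hist(log_income, main = "log10 transformed")
par(mfrow = c(1, 1))
```

Showing the histograms side by side is a simple way to communicate why the transformation was applied and what it changed.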
NOTE: The order of the tasks may sometimes differ from the order given here. For example, you may need to tidy the data sets first in order to create the common key for merging. In such cases, you may order the sections differently.
Any further or optional preprocessing tasks can be added to the template using an additional section in the R Markdown file. Make sure your code is visible (within the margin of the page). Do not use View() to show your data; instead, show the first rows using head().