Requirements (Please Read Carefully):

  1. Submit your report as an HTML file knitted from R Markdown, along with the .Rmd file.

  2. Organize your report using different levels of headers.

  3. Include the question, code, result/graph, and explanation for each problem in your report.

  4. Polish your graphs so they are easy to read.

  5. AI is NOT allowed for this assignment.


Rubrics:


1. Data visualization and exploration tasks with the gpa data set


The gpa data set is available through the openintro package in R. Answer each of the following questions with an appropriate graph. For each graph, summarize your findings in plain text to answer the question.

Task list:
  1. By doing your own research, give the precise meaning of each variable.

  2. Visualize the relationship between studyweek and gpa. What does your graph indicate?

  3. Visualize the relationship between out and gpa. What does your graph indicate?

  4. Visualize the relationship between out and sleepnight. What does your graph indicate?

  5. Visualize the relationship between gender and studyweek. What does your graph indicate?

  6. Visualize the relationship between gender and out. What does your graph indicate?

  7. Present a question of your own interest related to this data set. Answer your question with analysis or visualization.


2. Data visualization tasks with the loans_full_schema data set


Complete the following data visualization tasks using the full loans_full_schema data set (55 columns) in the openintro library. For each task, summarize what you learn from the graph accurately and concisely.

  1. Create a histogram of a numeric variable of your choice and overlay a density curve on the histogram. Carefully choose the number of bins, bin width, and bin boundaries to make the plot informative. What does this graph indicate?

  2. Create a graph to study the effect of a categorical/discrete variable on the distribution of a numeric variable. What does this graph indicate?

  3. Create a bin heatmap (2D density plot) to study the relationship between two numeric variables of your choice. Summarize the findings from the graph.

  4. Use facet_wrap to create an informative plot. Summarize the findings from the graph.

  5. Use facet_grid to create an informative plot. Summarize the findings from the graph.

  6. Present a question of your own interest related to this data set. Answer your question with analysis or visualization.


3. Data visualization and exploration tasks with the ames data set


The ames data set is available through the openintro package in R.

  1. Write an introductory paragraph for the data set that provides basic information: what the data set is about, the number of samples and features, and the scope the features cover.

  2. Use a plot to analyze how area correlates with price. Summarize your findings from the graph.

  3. Use a plot to analyze how Bldg.Type correlates with price. Explain the meaning of each label of Bldg.Type and summarize your findings from the graph.

  4. Use a plot to analyze how Bldg.Type and area together correlate with price. Summarize your findings from the graph.

  5. (Bonus - 5 Points) This task may require self-study: use a plot to study how area and Year.Built together correlate with price. Summarize your findings from the graph.

  6. (Bonus - 5 Points) Present a question of your own interest related to this data set. Answer your question with analysis or visualization.