📊 Individual Project: Data Analysis Report and Presentation

This project contributes 20% to your final course grade. You will work individually on this task.

🎯 Objective

Choose a publicly available dataset and apply the statistical and data analysis techniques learned in this course. The project is designed to assess your ability to handle real-world data, conduct appropriate statistical analysis, and communicate your findings effectively.

🧭 Project Tasks

You are required to:

  1. Import and prepare the data in R (e.g., using read_csv() or read_excel()).
    This includes:

    • Classifying variables into categorical, ordinal, numeric interval, or numeric ratio scales.
    • Cleaning the data, such as:
      • Handling or removing missing values when appropriate.
      • Identifying and removing duplicates.
      • Merging redundant or inconsistent factor levels where necessary for analysis.
  2. Generate appropriate summary statistics (e.g., using summary(), skimr::skim(), or dplyr summaries).
    This step should also involve identifying potential outliers and choosing an appropriate strategy for handling them.

  3. Create a meaningful visualization, using ggplot2 or other visualization libraries.
    Make sure to:

    • Clearly communicate the insights from the plot.
    • Address any outliers appropriately in the visual display.
  4. Assess goodness of fit to evaluate whether a theoretical distribution fits the data (e.g., a QQ plot as a visual check, or a Kolmogorov–Smirnov or chi-squared test).

  5. Conduct an appropriate statistical test (e.g., chi-squared test, t-test, Mann–Whitney U test, ANOVA, regression, or correlation) and interpret the result in a real-world context.

  6. Write a concise report (maximum 4 pages, not including code) using R Markdown, clearly explaining your methodology, analysis, and conclusions. Your report should be written in clear English with full sentences and logical structure.

  7. Prepare a short presentation (maximum 4 slides) summarizing your findings, created in R Markdown (e.g., xaringan, ioslides, or beamer).
    You will present this to the class, and it should be engaging, well-paced (no more than 5 minutes), and demonstrate your understanding. You may be asked questions by peers and the instructor.
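
As a rough illustration of tasks 1 to 3, the sketch below walks through import, cleaning, summary statistics, and a plot. It is only one possible approach, not a template you must follow. It assumes the built-in `airquality` dataset as a stand-in for a dataset of your choice, writes it to a temporary CSV so there is something to import, uses base R functions (`read.csv()` in place of `read_csv()`), and applies the 1.5 × IQR rule as one of several reasonable outlier strategies. The plot is drawn only if ggplot2 is installed.

```r
# Illustrative sketch: `airquality` ships with R and stands in for your own data.
csv_path <- tempfile(fileext = ".csv")
write.csv(airquality, csv_path, row.names = FALSE)

# Task 1: import (readr::read_csv(csv_path) would be the tidyverse equivalent).
aq <- read.csv(csv_path)

# Classify variables: Month is ordinal, so store it as an ordered factor.
aq$Month <- factor(aq$Month, levels = 5:9,
                   labels = c("May", "Jun", "Jul", "Aug", "Sep"),
                   ordered = TRUE)

# Clean: count missing values per column, then drop duplicated rows and
# rows where the variable of interest (Ozone) is missing.
na_counts <- colSums(is.na(aq))
print(na_counts)
aq_clean <- aq[!duplicated(aq) & !is.na(aq$Ozone), ]

# Task 2: summary statistics, then flag outliers with the 1.5 * IQR rule.
print(summary(aq_clean$Ozone))
q <- quantile(aq_clean$Ozone, c(0.25, 0.75), names = FALSE)
fence <- 1.5 * IQR(aq_clean$Ozone)
aq_clean$outlier <- aq_clean$Ozone < q[1] - fence |
  aq_clean$Ozone > q[2] + fence
print(table(aq_clean$outlier))

# Task 3: boxplot per month; flagged points are coloured rather than removed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(aq_clean, aes(x = Month, y = Ozone)) +
    geom_boxplot(outlier.shape = NA) +            # hide default outlier dots
    geom_jitter(aes(colour = outlier), width = 0.15) +
    labs(title = "Ozone by month", y = "Ozone (ppb)",
         colour = "Flagged outlier")
  print(p)
}
```

Flagging outliers in a separate column, rather than deleting them, keeps the decision visible and reversible; whether to keep, transform, or remove them depends on your dataset and should be justified in the report.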

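Tasks 4 and 5 might look roughly like the following sketch, again using the built-in `airquality` data as an assumed stand-in. The choice of a normal distribution for `Temp`, and of a Welch t-test comparing May with August, are illustrative assumptions only; your own data may call for a different distribution or test.

```r
# Illustrative sketch: built-in data stands in for your own dataset.
temp <- airquality$Temp   # daily maximum temperature in degrees Fahrenheit

# Task 4: goodness of fit. First a visual check against the normal distribution.
qqnorm(temp, main = "Normal QQ plot of Temp")
qqline(temp)

# Kolmogorov-Smirnov test against a normal with the sample mean and sd.
# Caveats: estimating the parameters from the same data makes the p-value
# only approximate, and tied values in `temp` trigger a warning.
ks <- ks.test(temp, "pnorm", mean = mean(temp), sd = sd(temp))
print(ks)

# Task 5: Welch two-sample t-test. Do May and August differ in mean Temp?
may_aug <- subset(airquality, Month %in% c(5, 8))
may_aug$Month <- factor(may_aug$Month, levels = c(5, 8),
                        labels = c("May", "Aug"))
tt <- t.test(Temp ~ Month, data = may_aug)
print(tt)
```

When interpreting output like this, state the hypotheses, report the test statistic and p-value, and translate the result into plain language about the data (here, whether August days were warmer than May days on average).
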
🗓️ Timeline

You will work on your project throughout the course. In each class, you will have the opportunity to apply the skills learned that week to your project.

Individual Project Rubric (20% of Final Grade)

| Criteria | Excellent (100%) | Good (75%) | Satisfactory (50%) | Poor (25%) | Unsatisfactory (0%) | Weight |
|---|---|---|---|---|---|---|
| 1. Data Import | Data is correctly imported from a public source using R; code is clean and reproducible. | Data is imported with minor issues or excessive code complexity. | Data is imported but with help or manual intervention. | Attempt made but not working correctly. | No attempt or completely incorrect. | 10% |
| 2. Summary Statistics | Includes appropriate numeric summaries; well-formatted and meaningful. | Basic summaries are included but may lack completeness or formatting. | Some summary stats present but minimal or poorly presented. | Attempted but mostly incorrect. | Not attempted. | 10% |
| 3. Visualization | Relevant, correctly implemented ggplot2 visualizations; clearly interpreted in the report. | Visualization is relevant but lacks explanation or polish. | Basic plot is shown without interpretation. | Poor or incorrect use of visualizations. | Not attempted. | 10% |
| 4. Goodness of Fit Test | Appropriate theoretical distribution tested; test implemented correctly; result clearly interpreted. | Test implemented with minor issues or weak interpretation. | Correct test used but poorly explained. | Attempted but wrong test or flawed implementation. | Not attempted. | 15% |
| 5. Statistical Test | Suitable test chosen and explained (e.g., t-test, ANOVA); real-world conclusion is correct. | Test mostly correct but interpretation is unclear or incomplete. | Basic test done with weak link to context. | Attempted but test choice or conclusion is wrong. | Not attempted. | 15% |
| 6. Report | Clear, concise, structured, in full sentences; fits 4-page limit and includes all sections. | Mostly clear but has structural or language issues. | Understandable but poorly structured or verbose. | Hard to follow or poorly written. | Missing or completely incoherent. | 10% |
| 7. Presentation | Within 5 minutes; engaging; clear slides; speaker answers questions well. | Mostly clear; minor issues with timing or Q&A. | Presentation is basic or lacks clarity. | Poorly presented or hard to understand. | Not presented. | 10% |
| 8. Extras | Project includes additional material outside the scope (e.g., networks, power law, own idea). | Attempt at additional material is relevant but shallow. | Some creativity or extras, not well executed. | Extras attempted but irrelevant or poorly done. | No extras. | 20% |

Additional Notes

Declaration of AI usage

Fedor used ChatGPT 4o to generate this document using his prompts. Specifically, Fedor outlined the main structure — the scope of the project, the list of tasks, the list of grading rubrics and their weights, and the list of additional notes. ChatGPT was then used to improve grammar and clarity, to fill the rubric matrix with specific descriptions, and to format everything in markdown.