Enhancing Statistical Testing and Interactive Visualization

Objective: To equip students with the skills to perform more sophisticated statistical tests within their Exploratory Data Analysis (EDA) workflow using the ggstatsplot package, create interactive visualizations, and publish reports to RPubs, all using the Breast Cancer Wisconsin dataset.

Prerequisites: Students have a basic understanding of:

Task Description:

Students will perform an EDA on the Breast Cancer Wisconsin dataset, using ggstatsplot for statistical tests and plotly for interactive visualizations. The entire analysis will be documented in an R Markdown document and published on RPubs.

Steps:

Part 1: EDA with ggstatsplot for Statistical Tests (Breast Cancer Wisconsin Dataset)

  1. Load Necessary Libraries and Data: Load the tidyverse and ggstatsplot libraries. Load the Breast Cancer Wisconsin dataset (students may need to load it from a file or use a readily available version if provided).
  2. Explore the Breast Cancer Wisconsin Dataset: Briefly explore the structure and variables of the dataset.
  3. Formulate Statistical Questions (Breast Cancer Wisconsin): Based on the Breast Cancer Wisconsin dataset, formulate statistical questions that can be addressed with different types of statistical tests (e.g., comparing features between benign and malignant tumors, or exploring relationships between cellular characteristics). These questions form the bivariate-analysis part of the EDA.
  4. Visualize and Test with ggstatsplot (Breast Cancer Wisconsin): For each formulated question, use appropriate functions from the ggstatsplot package with the Breast Cancer Wisconsin data to create informative visualizations and perform the corresponding statistical test, displaying the results on the plot.
  5. Interpret the Results (Breast Cancer Wisconsin): Provide a brief interpretation of the statistical results for each test.
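Steps 1 to 5 of Part 1 can be sketched in R as follows. This is a minimal, hedged example, not a reference solution. It assumes the `mlbench` package is installed and uses its `BreastCancer` data frame (the original Wisconsin data: nine cytological scores from 1 to 10 plus a `Class` of benign or malignant). If your course provides a different version, such as a CSV of the diagnostic WDBC data, the column names will differ. The three questions and the derived `mitoses_group` variable are hypothetical illustrations, not part of the assignment.

```r
# Step 1: load packages and data
library(tidyverse)
library(ggstatsplot)
library(mlbench)  # assumption: provides the BreastCancer data frame

data("BreastCancer", package = "mlbench")

# The nine features are stored as factors with levels "1"-"10";
# convert them to numeric scores and drop rows with missing Bare.nuclei.
bc <- BreastCancer %>%
  as_tibble() %>%
  select(-Id) %>%
  mutate(across(-Class, ~ as.numeric(as.character(.x)))) %>%
  drop_na()

# Step 2: explore structure, summaries and class balance
glimpse(bc)
summary(bc)
count(bc, Class)

# Steps 3-4, question 1 (hypothetical): does clump thickness differ
# between benign and malignant tumors? The scores are ordinal, so a
# non-parametric test (Mann-Whitney U for two groups) is requested.
p_thickness <- ggbetweenstats(
  data  = bc,
  x     = Class,
  y     = Cl.thickness,
  type  = "nonparametric",
  title = "Clump thickness by tumor class"
)
p_thickness

# Question 2 (hypothetical): are cell size and cell shape associated?
# type = "nonparametric" requests Spearman's rho; marginal = FALSE
# omits the marginal distribution plots.
p_size_shape <- ggscatterstats(
  data     = bc,
  x        = Cell.size,
  y        = Cell.shape,
  type     = "nonparametric",
  marginal = FALSE,
  title    = "Cell size vs. cell shape"
)
p_size_shape

# Question 3 (hypothetical): is a mitoses score above 1 associated with
# tumor class? ggbarstats runs a chi-squared test of independence.
bc <- bc %>%
  mutate(mitoses_group = if_else(Mitoses > 1, "above 1", "1"))

p_mitoses <- ggbarstats(
  data  = bc,
  x     = mitoses_group,
  y     = Class,
  title = "Mitoses group by tumor class"
)
p_mitoses
```

For Step 5, read the subtitle each ggstatsplot function adds to the plot: it reports the test statistic, the p-value, an effect size with its confidence interval, and the sample size. A useful interpretation discusses the size and direction of the effect, not only whether p falls below 0.05.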

Part 2: Creating Interactive Plots with plotly (Iris Dataset Example)

library(ggplot2)
library(plotly)
# Load the iris dataset
data(iris)

# Create a ggplot2 scatter plot with tooltip information
p_iris <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species,
                           text = paste("Sepal Length: ", Sepal.Length, "<br>",
                                        "Sepal Width: ", Sepal.Width, "<br>",
                                        "Petal Length: ", Petal.Length, "<br>",
                                        "Petal Width: ", Petal.Width))) +
  geom_point() +
  labs(title = "Iris Flower Sepal Dimensions",
       x = "Sepal Length (cm)",
       y = "Sepal Width (cm)",
       color = "Species") +
  theme_minimal()

# Convert the ggplot2 object to an interactive plotly object, show only the custom tooltip, and hide the modebar
fig_iris <- ggplotly(p_iris, tooltip = "text") %>%
  config(displayModeBar = FALSE)

# Display the interactive plot
fig_iris
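The same pattern carries over to the Breast Cancer Wisconsin data, which is where the assessment expects the interactive plot. The sketch below is hedged: it assumes the `mlbench` package and its `BreastCancer` data frame, and the choice of variables and tooltip fields is illustrative only. The modebar is hidden with plotly's `config(displayModeBar = FALSE)` option.

```r
library(tidyverse)
library(plotly)
library(mlbench)  # assumption: provides the BreastCancer data frame

data("BreastCancer", package = "mlbench")

# Convert the factor-coded scores to numbers and drop incomplete rows
bc <- BreastCancer %>%
  select(-Id) %>%
  mutate(across(-Class, ~ as.numeric(as.character(.x)))) %>%
  drop_na()

# Tooltip text is built from the raw scores, so jitter does not distort it
p_bc <- ggplot(bc, aes(x = Cell.size, y = Cell.shape, color = Class,
                       text = paste0("Class: ", Class, "<br>",
                                     "Cell size: ", Cell.size, "<br>",
                                     "Cell shape: ", Cell.shape, "<br>",
                                     "Clump thickness: ", Cl.thickness))) +
  # Scores are integers from 1 to 10, so jitter reduces overplotting
  geom_jitter(width = 0.2, height = 0.2, alpha = 0.6) +
  labs(title = "Cell Size vs. Cell Shape by Tumor Class",
       x = "Uniformity of cell size (1-10)",
       y = "Uniformity of cell shape (1-10)",
       color = "Class") +
  theme_minimal()

# Convert to plotly, show only the custom tooltip, and hide the modebar
fig_bc <- ggplotly(p_bc, tooltip = "text") %>%
  config(displayModeBar = FALSE)

fig_bc
```

Jitter is a design choice here: without it, hundreds of points share the same integer grid positions and most tooltips would be unreachable.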

Part 3: Reproducible Reporting and Publication

  1. Document in R Markdown: Write up the entire EDA process in a well-structured R Markdown document, including the code, the visualizations of the Breast Cancer Wisconsin data made with ggplot2, ggstatsplot, and ggplotly, and appropriate interpretations.
  2. Create an RPubs Account: Create an account on RPubs: https://rpubs.com/
  3. Publish to RPubs: In RStudio, knit the R Markdown document to HTML and publish it to RPubs.
  4. Submit the RPubs Link: Submit the link to the published report.
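Step 3 is normally done with RStudio's Knit and Publish buttons, but the same workflow can be scripted. A minimal sketch, assuming the `rmarkdown` and `rsconnect` packages are installed, pandoc is available (it ships with RStudio), and the report is saved as `breast_cancer_eda.Rmd`, a hypothetical file name. `rsconnect::rpubsUpload()` returns a `continueUrl` that must be opened in a browser to finish publishing; the resulting page address is the link to submit in Step 4.

```r
library(rmarkdown)
library(rsconnect)

# Hypothetical file name: replace with the path to your own report
rmd_file <- "breast_cancer_eda.Rmd"

# Knit to HTML; render() returns the path of the generated file
html_file <- render(rmd_file, output_format = "html_document")

# Upload only in an interactive session, because finishing the
# publication requires signing in to RPubs in a browser
if (interactive()) {
  result <- rpubsUpload(
    title       = "Breast Cancer Wisconsin EDA",
    contentFile = html_file,
    originalDoc = rmd_file
  )
  if (!is.null(result$continueUrl)) {
    browseURL(result$continueUrl)
  } else {
    stop(result$error)
  }
}
```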

Assessment Criteria:

  • Use of ggstatsplot: Correct application of ggstatsplot for statistical tests on the Breast Cancer Wisconsin dataset.

  • Interpretation of Results: Accurate interpretation of statistical test results.

  • Creation of Interactive Plot: Successful creation of an interactive plotly plot from a ggplot2 object as part of your EDA, with tooltips and no modebar.

  • Reproducibility: Well-structured and reproducible R Markdown document.

  • Clarity and Communication: Clear explanations, appropriate visualizations, and logical flow.

  • RPubs Publication: Successful publication of the report on RPubs.