Assignment 2

There are so many packages and functions for making visualizations that it’s really important to be able to read documentation and learn new functions. Fortunately, the design of ggplot means that many functions work very similarly, so once you have learned the basics, it’s quite easy to learn more on your own.

The purpose of this assignment is for you to practice this sort of learning. I’ve picked out a few functions that work much like the examples we’ve looked at already. Your assignment is to pick out one from each question below and show how it works in this R markdown document. This is the sort of practice I do all the time when I learn a new R skill.

For each question below, pick one of the functions and demonstrate how the function works by creating an example. I describe briefly what the function or package does and provide a link to documentation or a tutorial which I think will help you learn to use the functions. Do not simply copy examples from these (or any other) web pages or documentation. Instead, adapt the examples to create your own example.

You will need to install packages to access some of these functions. Include the appropriate library function call in this .Rmd document, but do not include the code to install the package. (I sometimes write the code to install the package, but put a # mark at the start of the line to mark it as a comment so it is not executed when I knit the document.) This is because you need to call library each time you use a package, but you only need to install it once.
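As a minimal sketch of this pattern, a setup chunk might look like the following. The ggrepel package is used purely as an example here; substitute whichever package your chosen function needs.

```r
# install.packages("ggrepel")  # run once in the console; left commented out so knitting skips it
library(ggplot2)  # loaded every time the document is knitted
library(ggrepel)  # provides geom_text_repel() and geom_label_repel()
```

Keeping the install line as a comment documents the dependency for other readers without reinstalling the package on every knit.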

Question 1 (3 marks)

Create a visualization that uses one of the three following functions:

The ggrepel package provides the geoms geom_text_repel and geom_label_repel which add labels that do not overlap and are placed near data points or are connected to points with lines. See the vignette.
The ggtext package allows you to add colour, italics, bold, and other formatting using markdown and other formatting markup. Documentation.
The function expression makes it possible to add mathematical expressions to text elements in a plot. This function is in the package base which is always installed and always loaded, so you don’t need a library function call. You can find examples in this tutorial and the help page for plotmath.

library(ggplot2)
library(ggrepel)

ggplot(mtcars, aes(x = mpg, y = wt, label = rownames(mtcars))) +
  geom_point() +
  geom_text_repel(max.overlaps = 20) +  # labels that overlap more than 20 other items are dropped; raise this if labels go missing
  theme_minimal() +
  ggtitle("Car Mileage vs Weight with Repelled Labels")

Explain in a sentence or two what your example does and how to use the function.


# The example creates a scatter plot from the `mtcars` dataset, with each point showing a car's mileage (mpg) against its weight (wt).
# It uses the `geom_text_repel()` function from the `ggrepel` package, which reads the `label` aesthetic and adds a text label to each point,
# moving the labels apart so they do not overlap each other or the points, which makes the plot more readable.
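The same idea carries over to `geom_label_repel()`, the other geom in ggrepel, which draws each label in a box. The sketch below is only an illustration: the `model` column, the `efficient` subset, and the 25 mpg cutoff are choices made here, not part of the assignment. It labels a subset of the cars by passing a smaller data frame to the label layer.

```r
library(ggplot2)
library(ggrepel)

# Copy mtcars and store the car names in an ordinary column
# (`model` is a column name chosen for this sketch)
cars <- mtcars
cars$model <- rownames(cars)

# Assumption for illustration: label only cars above 25 mpg to reduce clutter
efficient <- subset(cars, mpg > 25)

p <- ggplot(cars, aes(x = mpg, y = wt)) +
  geom_point() +                                            # all cars as points
  geom_label_repel(data = efficient, aes(label = model)) +  # boxed labels for the subset only
  labs(x = "Miles per gallon", y = "Weight (1000 lbs)") +
  theme_minimal()

p
```

Labelling a subset is one way to keep a crowded plot readable without raising `max.overlaps`.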

Question 2 (3 marks)

Create a visualization that uses one of the three following functions. These are alternatives to boxplots, histograms, or points and error bars (from stat_summary).

Use geom_violin to show the distribution of data. (This is part of the tidyverse; there is nothing to install.) Documentation
Show uncertainty in a set of data using stat_eye, stat_histinterval, stat_gradientinterval or similar functions from package ggdist. Documentation
Show the distribution of data using either geom_density or stat_ecdf as an alternative to geom_histogram. (These are both part of the tidyverse and documented here and here)

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_violin() +
  ggtitle("Violin Plot of MPG by Number of Cylinders") +
  xlab("Number of Cylinders") +
  ylab("Miles Per Gallon (MPG)") +
  theme_minimal()

Write a couple of sentences that describes the meaning of your visualization and how you interpret the graphical display.


# The violin plot shows the distribution of miles per gallon (MPG) for cars grouped by number of cylinders (cyl).
# The wider sections of each violin indicate higher density, showing where most of the data points are concentrated.
# From the plot, cars with 4 cylinders have a wider spread of MPG values, while cars with 8 cylinders cover a narrower and lower range, suggesting more consistent (but poorer) fuel efficiency.
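A reading taken from the shape of a violin can be checked against the numbers. The following is a small base R sketch, not part of the original answer, that computes the minimum, maximum, and spread of mpg within each cylinder group; the name `summary_by_cyl` is chosen here for illustration.

```r
# Split mpg by number of cylinders, then summarise each group
summary_by_cyl <- do.call(
  rbind,
  lapply(split(mtcars$mpg, mtcars$cyl), function(x) {
    c(min = min(x), max = max(x), spread = max(x) - min(x))
  })
)

# One row per cylinder group ("4", "6", "8"), one column per statistic
summary_by_cyl
```

Comparing the `spread` column across rows gives a quick numeric counterpart to the vertical extent of each violin, including the 6-cylinder group.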

Question 3 (3 marks)

These are more complex functions, so your goal should be simply to replicate an example from the documentation. Pick one of the two functions below, find an example, write code to repeat the example below and describe what visualization your code creates.

Make an animated plot using package gganimate and its documentation. This builds directly on ggplot, creating new aesthetics to describe animated changes over time.

Make a 3d scatterplot using package plotly (Documentation). This is a very different plotting package and style compared to ggplot.

library(plotly)

library(webshot)  # lets knitr capture a static screenshot of the widget when knitting to PDF

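# Use the built-in iris data set (150 flowers, 3 species)
data <- iris

# 3D scatterplot: one marker per flower, coloured by species
fig <- plot_ly(data = data, x = ~Sepal.Length, y = ~Sepal.Width, z = ~Petal.Length, 
               type = "scatter3d", mode = "markers", color = ~Species, 
               marker = list(size = 5))

# Printing the object renders the interactive widget
fig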

Describe what your example does here. Make specific links between your code and the visualization.

Because the plot is interactive, I have attached both the PDF and the HTML versions of this document.

This R code creates a 3D scatter plot using the plotly package and the built-in iris dataset. The plot represents the relationship between three key flower attributes: sepal length, sepal width, and petal length.

x = ~Sepal.Length, y = ~Sepal.Width, z = ~Petal.Length: These arguments define the three dimensions in the plot. The sepal length is plotted on the x-axis, the sepal width on the y-axis, and the petal length on the z-axis. This 3D structure allows us to see how the different flower attributes relate to one another.

color = ~Species: This argument assigns a different color to the data points of each species (setosa, versicolor, and virginica). The plot visually distinguishes the species, allowing you to see how they spread across the different attributes.

mode = "markers": This specifies that we are plotting individual data points (markers) rather than lines or other geometries. Each point represents one flower.

marker = list(size = 5): This controls the size of the data points on the plot. Here, a fixed size of 5 is applied to all points to ensure they are clearly visible.

The resulting plot is interactive: you can rotate and zoom the 3D view to explore the data. Hovering over a point displays its values and its species, which makes it easier to see how the attributes differ across species.
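For the stand-alone HTML copy of an interactive plot, one possible approach is `saveWidget()` from the htmlwidgets package. The sketch below is an assumption about how such a file could be produced, not a record of how the attached file was made; the output path and the axis titles are chosen here for illustration.

```r
library(plotly)
library(htmlwidgets)

# Rebuild the figure so this sketch is self-contained
fig <- plot_ly(data = iris, x = ~Sepal.Length, y = ~Sepal.Width, z = ~Petal.Length,
               type = "scatter3d", mode = "markers", color = ~Species,
               marker = list(size = 5))

# For 3D plots the axis settings live under `scene`
fig <- plotly::layout(fig, scene = list(
  xaxis = list(title = "Sepal length (cm)"),
  yaxis = list(title = "Sepal width (cm)"),
  zaxis = list(title = "Petal length (cm)")
))

# Hypothetical output location; selfcontained = FALSE writes the JavaScript
# dependencies to a folder next to the HTML file and does not need pandoc
out_file <- file.path(tempdir(), "iris_3d.html")
saveWidget(fig, file = out_file, selfcontained = FALSE)
```

Opening the saved file in a browser keeps the rotate, zoom, and hover behaviour that a PDF cannot show.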

If you use gganimate, note that on some systems the plot may not work in your knitted document, but it should work in the RStudio environment. That is OK.
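For orientation, a minimal gganimate sketch might look like the following. It is an illustration written for this note rather than an example taken from the gganimate documentation, and the choice of `mtcars` and of `cyl` as the animated state is an assumption made here.

```r
library(ggplot2)
library(gganimate)

# An ordinary ggplot, plus one transition layer that steps through the
# cylinder groups; {closest_state} in the title is filled in by gganimate
anim <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  transition_states(cyl, transition_length = 2, state_length = 1) +
  labs(title = "Cylinders: {closest_state}")

# Printing `anim` (or calling animate(anim)) renders the frames. Rendering is
# left out of this sketch because it depends on the renderer available on
# your system, which is also why a knitted document may fail to show it.
```

Building the object and rendering it are separate steps, so the plot definition can be checked even when rendering is not possible.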

Knit and submit (1 mark)

When you are done, knit your document to PDF to be sure that all the code works without errors.

Then upload your Rmd file and knitted PDF file to BS.

Utkarsh Pandey, B00925880

Feb 7th, 2025
