Welcome to your first graphics challenge! Follow these instructions, step by step, to produce the intended graphic. We will continue to build on this, getting more and more advanced as the semester goes on.
We are going to start by getting your system set up to create your code for the semester. You will be working from this single file, all semester long! This will allow you to keep it for use in the future.
If you have not previously done so, go to this webpage and download R: http://cran.r-project.org/
Then go to this webpage and download RStudio: http://www.rstudio.com/products/RStuido
Once you have done this, open RStudio.
Here are some helpful resources for understanding R Markdown.
Video Tutorial: https://rmarkdown.rstudio.com/authoring_quick_tour.html
Help Page: https://rmarkdown.rstudio.com/lesson-15.HTML
Here is a PDF version that I find really helpful! https://posit.co/wp-content/uploads/2022/10/rmarkdown-1.pdf
# Run the following code in your console (directly in the code running area) to download the packages that allow you to create R Markdown files. This code will download each package if you do not have it, and skip the download if you have already installed it.
if (!require("rmarkdown")) {
  install.packages("rmarkdown", dependencies = TRUE)
  library(rmarkdown)
}
if (!require("knitr")) {
  install.packages("knitr")
  library(knitr)
}
if (!require("tinytex")) {
  install.packages("tinytex")
  library(tinytex)
  tinytex::install_tinytex() # Install the TinyTeX distribution
}
Replace the “Untitled1” title with “YOURLASTNAME Data Visualization Code”.
Click the little disk to save the file, and give it the same name. Be careful about where you save the file, as all of your data will need to be stored in the same location. I recommend creating a folder on your desktop for this course, saving this file there, and exclusively using that folder moving forward. And remember to save your work often.
Place the following code in the file right under the header, but remove the leading hashtags. I have to include them here so that the code displays for you. Any code that you write has to go between these two lines, or it will not run or display to me. Anything you write outside of them is interpreted as plain text instead of code. This means that outside of this region, you can write notes or messages to me just like you would in a Word document, and you do not need hashtags. This can help you keep really neat files (see the help PDF for organization tricks).
#```{r,message=F, warning=FALSE}
#```
# List any packages you need to use here
packages <- c("ggplot2", "readr", "tidyverse", "dplyr", "ggpubr")
# Check to see if any of your listed packages need to be installed
check_install_packages <- function(pkg) {
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
    library(pkg, character.only = TRUE)
  }
}
# Download the packages and read in the libraries if necessary
sapply(packages, check_install_packages)
Import your data for today’s activity by using a demo data set called “USArrests”. Use your text to figure this out. Remember, you can start with data("").
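As a hedged illustration of the general pattern, here is how loading and inspecting a built-in demo data set might look. The `mtcars` data set is an assumption chosen only for this example; it is not the data set assigned above.

```r
# Load a built-in demo data set into the workspace
# (mtcars is used here only as an example)
data("mtcars")

# Look at the first six rows to see the variables and their values
head(mtcars)

# Check the size of the data set: number of rows, then number of columns
dim(mtcars)

# Open the help page that documents each variable (run this interactively)
# ?mtcars
```

The same three steps (load, look, read the help page) should work for any of the demo data sets that ship with R.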
Look at the data, and in the “open text” space of this R Markdown document, write a description of the variables you see. You can type this out just as you would in a Word document. Hint: Use the head function to see the data, and the ? function to learn more about it. Be sure to answer:
# The general format is a call to ggplot, followed by the dataframe name (mtcars), followed by the X and Y variables of the graphic.
ggplot(mtcars, aes(x = mpg, y = hp)) +
  # You then indicate the type of graph to make (in this case, a dotplot using points).
  geom_point()
- These are the absolute basics of a ggplot. You have to tell it what data to use, what variables to choose, and what type of graph to make.
scale_color_manual code.
If you have done it correctly, you should get this graphic!
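To show how the basic pieces fit together with manually chosen colors, here is a minimal sketch. It assumes the built-in `iris` data set and three color names that I picked for illustration; neither comes from this assignment.

```r
library(ggplot2)

# Map a categorical variable (Species) to point color, then choose the
# colors by hand. The data set and color names are example assumptions.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point() +
  scale_color_manual(values = c(setosa = "darkorange2",
                                versicolor = "steelblue",
                                virginica = "forestgreen"))
```

Typing `p` on its own line draws the plot. Naming each color by its category, as done here, keeps the colors matched to the right group even if the order of the categories changes.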
Use any other demo data set to create another graph. You have full freedom to create anything you’d like! Explore other graph types and aesthetic options!
You are now ready to save the assignment to your own webpage! Let me walk you through that. This is how you will be turning in your projects this semester.
Teghipco, A., Newman-Norlund, R., Fridriksson, J. et al. Distinct brain morphometry patterns revealed by deep learning improve prediction of post-stroke aphasia severity. Commun Med 4, 115 (2024). https://doi.org/10.1038/s43856-024-00541-8
We will be recreating Figure 7 from this publication. They did not use ggplot to make their graphics, but they do employ a number of skills that will be useful for you to learn in ggplot! Once you read the paper, you can download the data that was used to make the graph. It came from Supplementary Data File 9. You can download this Excel file directly by going to the webpage shown in the citation and opening the supporting information.
Download Supp. Data File 9
Save it as a CSV file named “Violin_Plot_Data.csv”. Make sure it is saved in the same folder on your desktop as your code file.
Open up the R Markdown file you used to complete Assignment 1, and start a new coding block using the {r} code that we discussed in the last module.
Make sure you give the assignment a name by inserting a header above the new code chunk.
Read in your new data using the read.csv("") function and name the new dataframe “CAM”.
In order to complete today’s activity, you will also need to add two packages to the “list” of packages you created last week. You need to add “see” to your list, which looked like this in the last module:
packages <- c("ggplot2", "readr", "tidyverse", "dplyr", "ggpubr")
In order to use them, don’t forget to run the whole series of code that checks whether each package is installed and then loads the library for each package.
When you use head(CAM) to look at your data, you are going to notice that it is in wide format. This means that you have each measurement in a new column instead of a new row.
## F1Performance Repeat1 Repeat2 Repeat3 Repeat4 Repeat5 Repeat6
## 1 SVMWithGradCAMMaps 0.670051 0.701571 0.680628 0.710660 0.648649 0.715686
## 2 SVMWithDeepShapMaps 0.673913 0.610390 0.630872 0.618357 0.662577 0.608696
## Repeat7 Repeat8 Repeat9 Repeat10 Repeat11 Repeat12 Repeat13 Repeat14
## 1 0.713568 0.684932 0.699029 0.687500 0.720812 0.716418 0.666667 0.683417
## 2 0.623529 0.642857 0.607477 0.645833 0.631579 0.660099 0.662420 0.610778
## Repeat15 Repeat16 Repeat17 Repeat18 Repeat19 Repeat20
## 1 0.666667 0.663317 0.691943 0.680412 0.686869 0.686551
## 2 0.701754 0.659091 0.577540 0.666667 0.678571 0.596685
ggplot can’t use data in this format (see your assigned reading). So the first thing we will need to do is reformat the data to “long” format.
I am going to provide you with the code, but make sure you understand how it works.
# Give your newly formatted data a name you will recognize, in this case "data_long"
data_long <- CAM %>%
  # Pivot the data from having many columns to many rows
  pivot_longer(
    cols = starts_with("Repeat"), # Select columns to pivot
    names_to = "Repeat",          # Name the column that holds the old column names
    values_to = "values")         # Name the column that holds the measurements
Now you have variables you can use in your graphs (F1Performance, Repeat, and values). Your newly formatted data should now look like this:
## # A tibble: 6 × 3
## F1Performance Repeat values
## <chr> <chr> <dbl>
## 1 SVMWithGradCAMMaps Repeat1 0.670
## 2 SVMWithGradCAMMaps Repeat2 0.702
## 3 SVMWithGradCAMMaps Repeat3 0.681
## 4 SVMWithGradCAMMaps Repeat4 0.711
## 5 SVMWithGradCAMMaps Repeat5 0.649
## 6 SVMWithGradCAMMaps Repeat6 0.716
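If the wide-to-long step is still unclear, here is a small sketch of the same idea on a made-up table. The dataframe `wide`, its `model` column, and its numbers are hypothetical and exist only for this example.

```r
library(dplyr)
library(tidyr)

# A tiny made-up "wide" table: one row per model, one column per repeat
wide <- data.frame(
  model   = c("A", "B"),
  Repeat1 = c(0.61, 0.72),
  Repeat2 = c(0.64, 0.70),
  Repeat3 = c(0.59, 0.75)
)

# Pivot every column whose name starts with "Repeat" into two columns:
# one holding the old column name and one holding the measurement
long_example <- wide %>%
  pivot_longer(
    cols = starts_with("Repeat"),
    names_to = "Repeat",
    values_to = "values"
  )

# 2 models x 3 repeats gives 6 rows and 3 columns
dim(long_example)
```

Each row of `long_example` now holds a single measurement, which is the shape ggplot expects.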
Let’s keep in mind, this is the graphic you are trying to replicate.
At this point, you should see something like this, which is the absolute minimum you must have to create the plot:
This is very obviously not what you want, so we have lots of work to do!
alpha= command).
Use the draw_quantiles command to add 25%, 50%, and 75% quantile lines.
Use the quantile.size command to make the quantile lines thicker.
Now, your graph should look a bit like this:
Go back to your text, or the internet and find the code for flipping the axes in ggplot.
You should now get this:
geom_violin line. Within your new geometry command:
Now your graphic should look a bit like this!
Your graph should now look like this:
stat_summary(fun = median, geom = "point", shape = 21, size = 3, fill = "white", color = "black", stroke = 1.5)
This tells it to calculate the median and add it as a point with a circular shape, at a size of 3, with white filling and a black border, with the border at a size of 1.5. Feel free to adapt these numbers to see what changes! Make sure whatever you end with mirrors the figure as closely as possible.
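To see the median marker on its own, here is a minimal sketch on a different data set. It assumes the built-in `iris` data and an `alpha` value that I chose for illustration.

```r
library(ggplot2)

# A basic violin plot with a white median point on each violin.
# iris and alpha = 0.5 are example assumptions, not the assignment data.
p_median <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_violin(alpha = 0.5) +
  stat_summary(fun = median, geom = "point", shape = 21, size = 3,
               fill = "white", color = "black", stroke = 1.5)
```

Typing `p_median` draws the plot. Because `stat_summary` calculates the median separately for each x category, you get one point per violin without computing anything yourself.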
Your graph should now look like this:
theme() command. Keeping all of the theme elements together can really help keep your code organized.
plot.title = element_text(hjust = 0.5, face="bold")
panel.grid.major.y = element_blank() and panel.grid.minor.x = element_blank().
Now, you should have something like this!
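Here is a hedged sketch of what keeping the theme elements together in a single `theme()` call can look like. The `mtcars` data, the title text, and the particular grid lines removed are assumptions for this example only.

```r
library(ggplot2)

# All theme adjustments live inside one theme() call at the end of the plot.
# The data set, labels, and removed grid lines are example assumptions.
p_theme <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_violin() +
  labs(title = "Fuel economy by cylinder count",
       x = "Cylinders", y = "Miles per gallon") +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"), # centered, bold title
    panel.grid.major.x = element_blank(),                  # drop vertical major grid lines
    panel.grid.minor.y = element_blank()                   # drop horizontal minor grid lines
  )
```

Putting one theme element per line, as above, makes it easy to comment a single change in or out while you experiment.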
You can use the geom_text command to do this. I am going to give you the outline for the code, because this can be tricky, but you need to fill in all the indicators after the “=” signs for it to work.
You can use:
geom_text(aes(x = "", label = "", y =), vjust =, color = "", size = )
You will need to do this twice! Once for each x axis category (keep in mind you flipped your axes, so your x axis looks like your y).
If you are having trouble, I will show you how I got the top one to work:
geom_text(aes(x = "SVMWithGradCAMMaps", label = "SVM + GRAD-CAM++", y = 0.64), vjust = -4.5, color = "darkorange2", size = 4.5)
geom_text tells it to add text. “x=” tells it where to put the text on the x-axis. In this case, I want the GRAD-CAM label to line up with the GradCAMMaps violin. “label=” tells it what to write. “y=” tells it where to put the text on the y-axis. “vjust=” moves the text up and down so that it doesn’t overlap your data. “color=” and “size=” set what the text will look like. When you are looking at your graph, be sure to “knit” the file so that you know what it looks like at that size. Otherwise, you may have to come back and change all of these numbers so that it looks correct on your webpage.
Your graph should now look like this:
Now you should have this beauty!
Change geom_violin to geom_violinhalf in your code.
I bet you still notice slight differences, like the fact that your data isn’t lined up well anymore. That is because there is a better way to do this in ggplot! And that is your challenge. Make sure you include the code you built to create this graph, and the graph itself, in the knitted document on your webpage.
Create a new code chunk!
You are now tasked with creating a violin plot with a boxplot overlay. This is the basic requirement. Do anything else you’d like that you think enhances the plot to make it “publication ready”. It should be visually appealing, and sophisticated enough to show me you understand how these two graph types work! Feel free to experiment.
Here are a couple of webpages where you can find more code that may help as you build this graphic.
https://r-graph-gallery.com/violin_and_boxplot_ggplot2.html
https://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html
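As a starting point only, here is a minimal sketch of how a boxplot can be layered on top of a violin. It assumes the built-in `ToothGrowth` data, and the widths, transparency, and labels are illustrative choices of mine rather than requirements.

```r
library(ggplot2)

# The violin is drawn first, then a narrow boxplot is drawn on top of it.
# ToothGrowth and all styling values here are example assumptions.
p_overlay <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = factor(dose))) +
  geom_violin(alpha = 0.4) +
  geom_boxplot(width = 0.15, fill = "white", outlier.shape = NA) +
  labs(x = "Dose", y = "Tooth length", fill = "Dose")
```

Layers are drawn in the order you add them, so the boxplot has to come after the violin to sit on top. Your own version should go well beyond this.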
In order to get full credit, you need to complete this activity and push it to your RPubs webpage. The new webpage should include:
Today we are going to be learning how to show data frequency in a number of ways, including histograms, density plots, and bubble plots.
Start by downloading the “log_population_data.csv” data from Blackboard.
Read in the data and name it “population_data”.
Look at the data using either ? or head() so that you know what data is available.
Create a stat_density_2d plot where the x axis shows log current population and the y axis shows log past population. Set the geometry as a polygon and the colour as white.
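Here is a hedged sketch of the same kind of plot on a different data set. It assumes the built-in `faithful` data, and the "Blues" palette and its direction are my own choices for illustration.

```r
library(ggplot2)

# A 2D density plot drawn as filled polygons with white outlines.
# faithful and the "Blues" palette are example assumptions.
p_density <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  stat_density_2d(aes(fill = after_stat(level)),
                  geom = "polygon", colour = "white") +
  scale_fill_distiller(palette = "Blues", direction = 1)
```

Typing `p_density` draws the plot. The `after_stat(level)` mapping fills each polygon by its estimated density level, which is what gives the fill scale something to color.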
scale_fill_distiller. You can choose any color palette you’d like. I chose palette number 4 in the reverse direction.
Now we are going to work on adding more graphical elements to plots you already know how to make! We have previously learned how to make dotplots, bubble plots, and regression lines. Today we are going to explore using bubble plots to show sample size, and how adding density plots to the graphic can help you interpret biases in the data or the data themselves.
Add "ggExtra" to your list of required packages.
Now that you have your data, we need to do some calculations with it. For example, the best way to look at these data happens to be in log format. In the previous dataset, we gave you log-transformed data. This time, we are going to show you how to calculate that yourself if needed.
Use ? or head() to see the variables.
long <- longevity_data %>% # create a new dataframe called "long" that contains all your newly calculated variables
  mutate( # mutate tells the program to perform new calculations
    log_mass = log10(mass_g), # create a new column called "log_mass" that log-transforms the mass values
    log_lifespan = log10(maximum_lifespan_yr)) %>% # create a new column called "log_lifespan" that log-transforms the lifespan values
  group_by(order) %>% # group the rows by "order" (group of animals) so the next calculation is done within each order
  mutate(order_size = n()) # calculate the sample size of each order and put it in a column called "order_size"
# Now you have a sample size for each order, and you have transformed each mass and lifespan value to log form.
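To check your understanding of what the grouped `mutate` does, here is a small sketch on a made-up table. The dataframe `animals` and its three rows are hypothetical; only the column names follow the code above.

```r
library(dplyr)

# A tiny made-up table: two rodents and one carnivore
animals <- data.frame(
  order = c("Rodentia", "Rodentia", "Carnivora"),
  mass_g = c(10, 1000, 100000),
  maximum_lifespan_yr = c(1, 10, 100)
)

animals_long <- animals %>%
  mutate(log_mass = log10(mass_g),
         log_lifespan = log10(maximum_lifespan_yr)) %>%
  group_by(order) %>%           # later steps are calculated within each order
  mutate(order_size = n()) %>%  # n() counts the rows in the current group
  ungroup()                     # drop the grouping once the count is stored

animals_long$order_size  # 2 2 1
animals_long$log_mass    # 1 3 5
```

Both rodent rows receive an `order_size` of 2 because `n()` is evaluated once per group and then repeated on every row of that group. The final `ungroup()` is my addition so that later steps are not accidentally calculated by group.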
Add the size= command to the aes of the ggplot to size the dots by the variable “order_size”.
annotate function:
Now we are going to add density plots to show how the data is distributed.
p = ggplot(long, aes(......
+. This is not an additional line of the ggplot code. Rather, it is a new command we are telling it to run after the creation of plot “p”.
ggExtra::ggMarginal(p, type = "density", groupFill = TRUE, alpha = 0.4)
This will use the package ggExtra to add a density plot in the margins of the “p” plot, at 40% opacity (alpha = 0.4), and grouped by the same colors used in plot “p”.
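Here is a minimal sketch of the two-step pattern (save the plot first, then pass it to `ggMarginal`). It assumes the built-in `iris` data and that the ggExtra package is already installed; the object names are hypothetical.

```r
library(ggplot2)
library(ggExtra)

# Step 1: save the scatterplot as an object. The color mapping is what
# groupFill = TRUE uses to split the marginal densities by group.
p_scatter <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point()

# Step 2: a separate command (no "+") that wraps the saved plot and adds
# density curves in the margins.
p_marginal <- ggExtra::ggMarginal(p_scatter, type = "density",
                                  groupFill = TRUE, alpha = 0.4)
```

Typing `p_marginal` draws the combined figure. Note that `groupFill = TRUE` needs a color or fill variable mapped in the original plot to group by.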
Use your newly formed skills to create any kind of frequency plot you’d like. I encourage you to include more than one plotting element (a mixture of histogram, density, regression, dotplot, etc.) to practice integrating multiple types of data visualization. You can use any publicly available data that you’d like in order to complete it.
In order to get full credit, you need to complete this activity and push it to your RPubs webpage. The new webpage should include:
Now that you have learned how to make lots of different graphics and have learned the basics of altering ggplot aesthetics, we are ready to put multiple graphs together into publication quality images!
First, let’s briefly discuss how to make multipanel plots by faceting based on a variable within the dataset.
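As a hedged illustration of faceting, here is a minimal sketch that splits one plot into panels by a variable. It assumes the built-in `ToothGrowth` data, which is not the data set used in this activity.

```r
library(ggplot2)

# One panel per value of "supp" (the supplement type in ToothGrowth).
# The data set and variables are example assumptions.
p_facet <- ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_point() +
  facet_wrap(~ supp)
```

Typing `p_facet` draws the panels. All panels share the same axes by default, which makes them easy to compare side by side.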
Use the ChickWeight data set that is publicly available in R.
Now, let’s talk about how to combine multiple plots that were created independently into a single figure! Sometimes, we don’t want to facet_wrap by a specific variable. Instead, we want to take two or more independent plots that help answer a single question, and show them in the same location.
There are lots of ways to do that. I am going to share my favorite approach with you here. I find it to be the easiest to manipulate and give you the most freedom.
You are going to create 3 separate plots. I want you to come up with these on your own! You can decide on the colors and other aesthetics of the plots, but you do need to make three general plot types. The plots you need to make using the CO2 dataset are:
Once you have created your three plots, you are going to use this help page to combine all three plots into a single attractive figure. This should be something you think would be ready for submission to a journal for publication. This graph should:
You can use the ggarrange function, which will work in most cases.
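Here is a minimal sketch of combining three independent plots with `ggarrange` from the ggpubr package. It assumes the built-in `mtcars` data, and the three plot types and the layout are my own illustrative choices rather than the ones required above.

```r
library(ggplot2)
library(ggpubr)

# Three plots built independently (example assumptions, not the CO2 plots)
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()
p3 <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)

# Arrange them side by side and label the panels A, B, and C
figure <- ggarrange(p1, p2, p3, ncol = 3, nrow = 1, labels = c("A", "B", "C"))
```

Typing `figure` draws the combined image. Changing `ncol` and `nrow` is the quickest way to try different layouts.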
In order to get full credit, you need to complete this activity and push it to your RPubs webpage. The new webpage should include:
This final assessment is designed to test your skills in data visualization using ggplot2 in R. You will demonstrate your ability to prepare a well-organized R markdown file and to create insightful visual representations from a real-world dataset.
This assignment is an opportunity to showcase your growth in data visualization and your ability to convey complex information effectively through graphical representations. Good luck, and enjoy the process of discovering and sharing data-driven stories!