Welcome to Data Visualization in R!

Graphics Assignment 1: GGplot Basics

Welcome to your first graphics challenge! Follow these instructions, step by step, to produce the intended graphic. We will continue to build on this, getting more and more advanced as the semester goes on.

We are going to start by getting your system set up to create your code for the semester. You will be working from this single file, all semester long! This will allow you to keep it for use in the future.

Setting up R and RStudio
  1. Let’s make sure you have R set up and ready to go! Use the directions in Chapter 1 of the text to do so. Here are some links to help you through it. You must have R and RStudio set up before you can move on.

If you have not previously done so, go to this webpage and download R: http://cran.r-project.org/

Then go to this webpage and download RStudio: http://www.rstudio.com/products/RStudio

Once you have done this, open RStudio.

Setting up R Markdown
  1. Now we are going to start by getting R markdown all set up. R markdown is a cool and useful tool that allows you to write code, submit your code, and also upload your results all at one time, as a web browser document! It makes it easier to grade, spot mistakes, and allows you to save your submitted assignments. It can take a bit to learn, but I promise it is worth it!

Here are some helpful resources in aiding you to understand R Markdown.

Video Tutorial: https://rmarkdown.rstudio.com/authoring_quick_tour.html

Help Page: https://rmarkdown.rstudio.com/lesson-15.HTML

Here is a PDF version that I find really helpful! https://posit.co/wp-content/uploads/2022/10/rmarkdown-1.pdf

#Run the following code in your console (directly in the code running area) to download the package that allows you to create R markdown files. This code will download the package if you do not have it, and skip the download if you have already done so previously.

if (!require("rmarkdown")) {
  install.packages("rmarkdown", dependencies = TRUE)
  library(rmarkdown)
}

if (!require("knitr")) {
  install.packages("knitr")
  library(knitr)
}

if (!require("tinytex")) {
  install.packages("tinytex")
  library(tinytex)
  tinytex::install_tinytex()  # Install TinyTeX distribution
}
  1. Click on the icon that looks like a white page with a little “+” on it on the top left hand corner of the program, and select “R Markdown”. This choice of file type is very important. From this point on we will be working in the “Code Editor”. This is the only way to save your work! Do not write code directly in the console.
  • You can erase everything after line 6. But make sure you leave the header on there!
  1. Replace the “Untitled1” title with “YOURLASTNAME Data Visualization Code”.

  2. Click the little disk to save the file, and name it the same thing. Be very selective of where you save the file, as all of your data will need to be stored in the same location. I recommend creating a folder on your desktop for this course, saving this file there, and exclusively using that folder moving forward. And remember to save your work often.

  3. Place the following code in the file right under the header, but remove the leading hashtags. I have to put those in for the code to display to you. Any code that you write has to go between these two lines, or it will not run or display to me. Anything you write outside of this is interpreted as plain text instead of code. This means that outside of this region, you can write notes, or write me messages just like you would in a Word document. You do not need hashtags there. It can allow you to keep really neat files (see the help PDF for organization tricks).

  • This is what we do to create a new “chunk” of code.
#```{r, message=FALSE, warning=FALSE}


#```
  1. In your code editing space, which should appear as a grey bar between these two new lines, read in your necessary packages. If you use this formatting, you can use it each week as we move forward without altering it, although you may need to add new packages to it as we progress. After you paste in the code, press the little green right-pointing arrow in the top right-hand corner of the grey chunk. This will run all code within the grey chunk.
# List any packages you need to use here
packages <- c("ggplot2", "readr", "tidyverse", "dplyr", "ggpubr")

#Check to see if any of your listed packages need to be installed
check_install_packages <- function(pkg){
  if (!require(pkg, character.only = TRUE)) {
    install.packages(pkg, dependencies = TRUE)
    library(pkg, character.only = TRUE)
  }
}

# Download the packages and read in the libraries if necessary
sapply(packages, check_install_packages)
  1. Import your data for today’s activity by using a demo data set called “USArrests”. Use your text to figure this out. Remember, you can start with data("").

  2. Look at the data, and in the “open text” space of this R Markdown document, write a description of the variables you see. You can type this out just as you would in a Word document. Hint: Use the head function to see the data, and the ? function to learn more about it. Be sure to answer:

  • What variables are available?
  • How is each variable defined or calculated?
  • Is each one numerical or categorical?
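As a minimal sketch of this inspection step, here is the same workflow on a different built-in data set, iris. It is used purely for illustration and is not part of the assignment; the name var_types is made up for this example.

```r
# Load a built-in demo data set (iris is used here only as an illustration)
data("iris")

# Show the first six rows to see the variable names and some values
head(iris)

# Show the type of each column: "numeric" is numerical,
# "factor" is how R stores categorical data
var_types <- sapply(iris, class)
print(var_types)

# Open the help page that defines each variable (interactive sessions only)
if (interactive()) help("iris")
```

The help page is usually where you find out how each variable was measured, which head() alone cannot tell you.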
  1. Now we are going to make a graph! For this part we will switch to another built-in demo data set, mtcars. I am going to give you most of the code to do so, as we learn the basics. Copy this code into a new code chunk. Label that chunk by creating a header in your document. You can do so by putting a “##” followed by one space, and the title of the section. You can make this chunk “GGplot Graphic Code”.
#General format is going to be calling a ggplot, followed by the dataframe name (mtcars), followed by defining the X and Y variables of the graphic.
ggplot(mtcars, aes(x = mpg, y = hp)) +
    #You then indicate the type of graph to make (in this case, a scatterplot using points).
    geom_point()

- These are the absolute basics of a ggplot. You have to tell it what data to use, what variables to choose, and what type of graph to make.

  1. Using your text and any other resources you need, I want you to take this graph that I have given you and do all of the following:
  • Change the dots to a size of 2.4, and star shaped
  • Use the minimal theme to display the graphic
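  • Color, or group, the dots by the “cyl” variable. When you do this, keep in mind that “color” controls points, lines, and outlines, while “fill” controls the inside of shapes such as bars and violins. Also keep in mind that scale_color_manual only works with categorical data, so a numeric variable like cyl needs to be wrapped in factor() first.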
  • Move the legend to the bottom of the graph.
  • Title your graphic “Effect of Horsepower on Fuel Efficiency”
  • Give a subtitle of “Categorized by Number of Cylinders”
  • Label your axes “Horsepower” (for hp) and “Fuel Efficiency (MPG)” (for mpg), making sure each label matches the variable plotted on that axis.
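To show how these kinds of changes fit together, here is a hedged sketch on the iris data set rather than the assignment graph. The shapes, colors, and labels are arbitrary choices for illustration, and the object names p_iris and p_cyl are made up.

```r
library(ggplot2)

# Illustration on iris (NOT the assignment graph): custom point shape and
# size, colors mapped to a categorical variable, a minimal theme,
# a bottom legend, and custom labels.
p_iris <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point(shape = 17, size = 3) +   # shape 17 is a filled triangle
  scale_color_manual(values = c("darkorange", "purple", "steelblue")) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  labs(
    title = "Petal Length vs. Sepal Length",
    subtitle = "Colored by Species",
    x = "Sepal Length (cm)",
    y = "Petal Length (cm)"
  )

# A numeric grouping variable has to be converted with factor() before
# it can be used with a manual (discrete) color scale.
p_cyl <- ggplot(mtcars, aes(x = wt, y = qsec, color = factor(cyl))) +
  geom_point() +
  scale_color_manual(values = c("grey20", "tomato", "dodgerblue"))
```

Type p_iris or p_cyl on its own line to draw the plot. Points use color here; fill would only matter for shapes that have an interior, such as shapes 21 to 25.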

If you have done it correctly, you should get this graphic!

  1. Use any other demo data set to create another graph. You have full freedom to create anything you’d like! Explore other graph types and aesthetic options!

  2. You are now ready to save the assignment to your own webpage! Let me walk you through that. This is how you will be turning in your projects this semester.

  • Go to the top of the window and hit the “Knit” button, making sure it is knitting to HTML.
  • It will show you the product on the right hand side. Make any changes you’d like so that it is neatly formatted and “pretty”. After you make changes, just hit “knit” again to see the adaptations.
  • Hit the “Publish” button above the preview it gives you.
  • Select “Rpubs” and follow the directions for creating your own page. You will have to create an account. Be sure to save your login info.
  • Save the page to your Rpubs by clicking publish. Name it the same thing as this document. Remember, if you make any changes, you will need to republish each time!
  1. Copy and paste the URL of your newly created webpage into the submission box on BB for me to grade it and leave comments.

Graphics Assignment 2: Violin and Boxplots

Start by reading this paper, or at a minimum, looking over the graphics.

Teghipco, A., Newman-Norlund, R., Fridriksson, J. et al. Distinct brain morphometry patterns revealed by deep learning improve prediction of post-stroke aphasia severity. Commun Med 4, 115 (2024). https://doi.org/10.1038/s43856-024-00541-8

Today’s Assignment:

We will be recreating Figure 7 from this publication. They did not use GGplot to make their graphics, but they do employ a number of skills that will be useful for you to learn in GGplot! Once you read the paper, you can download the data that was used to make the graph. This came from Supplementary Data File 9. You can download this Excel file directly by going to the webpage shown in the citation, and going to the supporting information.

Downloading the Data
  1. Download Supp. Data File 9

  2. Save it as a CSV file named “Violin_Plot_Data.csv”. Make sure it is saved in the same folder on your desktop as your code file.

  3. Open up the R Markdown file you used to complete Assignment 1, and start a new coding block using the {r} code that we discussed in the last module.

  4. Make sure you give the assignment a name by inserting a header above the new code chunk.

  5. Read in your new data using the read.csv("") function and name the new dataframe “CAM”
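As a small sketch of the read-in step, the example below writes a tiny made-up CSV to a temporary folder and reads it back. The file name toy_scores.csv, its contents, and the object names are all hypothetical stand-ins for your real file.

```r
# Write a tiny example CSV to a temporary file (a stand-in for a file
# saved in your course folder; the file name and values are made up).
toy_path <- file.path(tempdir(), "toy_scores.csv")
write.csv(
  data.frame(Model = c("A", "B"), Run1 = c(0.61, 0.72), Run2 = c(0.64, 0.70)),
  toy_path,
  row.names = FALSE
)

# Read the CSV into a data frame
toy <- read.csv(toy_path)

# Confirm that the columns and values came through as expected
head(toy)
str(toy)
```

When you pass only a file name, such as read.csv("Violin_Plot_Data.csv"), R looks for it relative to the working directory. When you knit, that is by default the folder containing your R Markdown file, which is why the data and the code file need to live in the same folder.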

Installing New Packages
  1. In order to complete today’s activity, you will also need to add a new package to the “list” of packages you created last week. You need to add “see” to your list, which looked like this last module: packages <- c("ggplot2", "readr", "tidyverse", "dplyr", "ggpubr").

  2. In order to use it, don’t forget to run the whole series of code that checks to see if it’s installed and then loads the library for each package.

Data Formatting
  1. If you use head(CAM) to look at your data, you are going to notice that it is in wide format. This means that you have each measurement in a new column instead of a new row.
##         F1Performance  Repeat1  Repeat2  Repeat3  Repeat4  Repeat5  Repeat6
## 1  SVMWithGradCAMMaps 0.670051 0.701571 0.680628 0.710660 0.648649 0.715686
## 2 SVMWithDeepShapMaps 0.673913 0.610390 0.630872 0.618357 0.662577 0.608696
##    Repeat7  Repeat8  Repeat9 Repeat10 Repeat11 Repeat12 Repeat13 Repeat14
## 1 0.713568 0.684932 0.699029 0.687500 0.720812 0.716418 0.666667 0.683417
## 2 0.623529 0.642857 0.607477 0.645833 0.631579 0.660099 0.662420 0.610778
##   Repeat15 Repeat16 Repeat17 Repeat18 Repeat19 Repeat20
## 1 0.666667 0.663317 0.691943 0.680412 0.686869 0.686551
## 2 0.701754 0.659091 0.577540 0.666667 0.678571 0.596685

GGplot can’t use data in this format (see your assigned reading). So the first thing we will need to do is reformat the data to “long” format.

I am going to provide you with the code, but make sure you understand how it works.

  1. Paste this code into your R code chunk. If you did it properly, you should see the code is colored. If not, you are not properly in a coding chunk and it will only be recognized as text.
#give your newly formatted data a name you will recognize, in this case "data_long"
data_long <- CAM %>%
  #Pivot the data from having many columns to many rows
  pivot_longer(
    cols = starts_with("Repeat"),  # Select columns to pivot
    names_to = "Repeat", 
    values_to = "values") #give the newly created column a name

Now you have variables you can use in your graphs (F1Performance, Repeat, and values). Your newly formatted data should now look like this:

## # A tibble: 6 × 3
##   F1Performance      Repeat  values
##   <chr>              <chr>    <dbl>
## 1 SVMWithGradCAMMaps Repeat1  0.670
## 2 SVMWithGradCAMMaps Repeat2  0.702
## 3 SVMWithGradCAMMaps Repeat3  0.681
## 4 SVMWithGradCAMMaps Repeat4  0.711
## 5 SVMWithGradCAMMaps Repeat5  0.649
## 6 SVMWithGradCAMMaps Repeat6  0.716
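To see what the pivot does on something small enough to check by eye, here is a sketch with a made-up wide table. The table toy_wide and its values are invented for illustration only.

```r
library(tidyr)
library(dplyr)

# A tiny made-up wide table: one row per model, one column per repeat
toy_wide <- tibble(
  Model   = c("ModelA", "ModelB"),
  Repeat1 = c(0.61, 0.72),
  Repeat2 = c(0.64, 0.70),
  Repeat3 = c(0.59, 0.75)
)

# Pivot to long format: one row per model-repeat combination
toy_long <- toy_wide %>%
  pivot_longer(
    cols = starts_with("Repeat"),  # the columns to stack
    names_to = "Repeat",           # old column names are stored here
    values_to = "values"           # the cell values are stored here
  )

print(toy_long)

# 2 models x 3 repeat columns gives 6 rows
nrow(toy_long)
```

The same arithmetic is a quick sanity check on your real data: 2 rows times 20 repeat columns should give 40 rows in data_long.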
Figure Recreation

Let’s keep in mind, this is the graphic you are trying to replicate.

  1. You are going to start by writing the base formula for your ggplot. Remember, this needs to include your Data + Aesthetics + Geometry. So you need to tell it that you are using your data_long dataframe with “values” on the y-axis and “F1Performance” on the x-axis. Then that you will be using the “violin plot” as the geometry. Now you will notice that the final graph actually has values on the x-axis, and you are telling it to do it on the y-axis. This is required for a violin plot as you start building. We will tell it to flip them later.

At this point, you should see something like this, which is the absolute minimum you must have to create the plot:

This is very obviously not what you want, so we have lots of work to do!

  1. Let’s adapt the violin plot geometry to do a few things:
  • Make the violin plots transparent at a level that seems comparable to the graphic you are trying to make. Look up the formatting on how to do this. (Hint: it uses the alpha= command).
  • Make the size equal to 2 (in ggplot2 3.4.0 and later, line thickness is set with linewidth instead of size). Pay special attention to what this does to the graph.
  • use the draw_quantiles command to add 25%, 50%, and 75% quantile lines.
  • use the quantile.size command to make the quantile lines thicker.
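Here is a hedged sketch of these geom_violin options on the built-in ToothGrowth data, not the assignment data. The specific numbers are arbitrary, and it assumes ggplot2 3.4.0 or later for linewidth; newer ggplot2 releases may print a deprecation notice for draw_quantiles.

```r
library(ggplot2)

# Illustration on ToothGrowth (not the assignment data).
# supp is categorical (x axis), len is numeric (y axis).
p_violin <- ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_violin(
    alpha = 0.5,                         # 0 is invisible, 1 is fully opaque
    linewidth = 1,                       # thickness of the violin outline
    draw_quantiles = c(0.25, 0.5, 0.75)  # lines at the three quartiles
  )
```

Type p_violin to draw it, then try changing one number at a time to see what each argument controls.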

Now, your graph should look a bit like this:

  1. Let’s address that small issue about the axes needing to be flipped!

Go back to your text, or the internet, and find the code for flipping the axes in ggplot.

You should now get this:

  1. We are getting closer! Now we are going to add the points to the graphic. However, if you look at the graph you are trying to replicate, the points are under the violin plot. GGplot draws layers in the order that you type them. So to get the points to plot under the violin plot, you need to add them first! So you are going to add the new geometry before the geom_violin line. Within your new geometry command:
  • make sure you use the appropriate geometry to add ‘dots’ to your graphic.
  • Use your text to remind yourself how to add them as “jitter” points rather than traditional points. Group the coloring by your categorical variable.
  • Set the transparency to be 20% transparent (or 80% visible).
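The layer-order idea can be sketched on the ToothGrowth data like this. It is an illustration under assumed settings (the jitter width and alpha values are arbitrary), not the assignment answer.

```r
library(ggplot2)

p_layers <- ggplot(ToothGrowth, aes(x = supp, y = len)) +
  # Added first, so the points are drawn underneath the violins
  geom_jitter(aes(color = supp), width = 0.15, alpha = 0.8) +
  # Added second, so the violins are drawn on top; they are partly
  # transparent so the points still show through
  geom_violin(aes(fill = supp), alpha = 0.4) +
  # Swap the axes so the numeric variable runs horizontally
  coord_flip()
```

Swapping the geom_jitter and geom_violin lines is a quick way to convince yourself that order matters: the points then sit on top of the violins.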

Now your graphic should look a bit like this!

  1. We are now going to do a color match! You can go to this webpage to see colors that are available within ggplot: https://sape.inf.usi.ch/quick-reference/ggplot2/colour.
  • Choose an orange and purple that you feel match the figure as closely as possible.
  • You will need two additional lines of code: one to color the data points, and one to fill the violin plots. Make sure you get both as close as possible!
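As a sketch of how the two scales divide the work, the example below reuses one named color vector for both points and violins on the ToothGrowth data. The two colors are arbitrary picks, not a claim about which ones best match the paper's figure.

```r
library(ggplot2)

# One color per group, named by the levels of the grouping variable
group_cols <- c(OJ = "darkorange2", VC = "purple3")

p_cols <- ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_jitter(aes(color = supp), width = 0.15) +
  geom_violin(aes(fill = supp), alpha = 0.4) +
  scale_color_manual(values = group_cols) +  # colors the points
  scale_fill_manual(values = group_cols)     # fills the violins
```

Naming the vector by group level means the colors stay attached to the right group even if the order of the levels changes.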

Your graph should now look like this:

  1. Now you will also notice that the graph you are trying to make shows one more summary statistic. They appear to highlight the median for each group with a single white filled dot. We can very easily add summary statistics to ggplots because it pulls from the raw data directly! In order to do this, we can use this line of code:

stat_summary(fun = median, geom = "point", shape = 21, size = 3, fill = "white", color = "black", stroke = 1.5)

This tells it to calculate the median, add it as a point, with a circular shape, at a size of 3, with white filling and a black border, with the border at a size of 1.5. Feel free to adapt these numbers to see what changes! Make sure whatever you end with mirrors the figure as closely as possible.
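One way to check that stat_summary is plotting what you think it is, sketched here on the ToothGrowth data, is to compute the same medians by hand and compare them with where the dots land. The object names are made up, and this assumes ggplot2 3.3.0 or later for the fun argument.

```r
library(ggplot2)

p_median <- ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_violin(alpha = 0.4) +
  # One white dot per group, placed at that group's median
  stat_summary(fun = median, geom = "point", shape = 21, size = 3,
               fill = "white", color = "black", stroke = 1.5)

# The dots should sit at these per-group medians
group_medians <- tapply(ToothGrowth$len, ToothGrowth$supp, median)
print(group_medians)
```

Replacing median with mean in both places shows how easily the summary statistic can be swapped.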

Your graph should now look like this:

  1. Now let’s address this pesky theme. Yours has a grey background, and lines that look dissimilar, among many other things that need to be done. Let’s get those all out of the way!
  • Start by making the theme “minimal”, and you should get this:

  1. We are getting much closer! There are still many “aesthetic” changes to be made though. All of these changes can be made within one single theme() command. Keeping all of the theme elements together can really help keep your code organized.
  • remove the y axis title, text, and ticks.
  • make the x axis line thicker and black, like the image, by increasing the size, and coloring it black.
  • while you don’t have a title on there yet, set the aesthetics now. Add this in too: plot.title = element_text(hjust = 0.5, face="bold")
  • remove the legend
  • remove the horizontal “major” grid lines in the y direction and the minor gridlines in the x direction. The code to do this is: panel.grid.major.y = element_blank() and panel.grid.minor.x = element_blank().
  • make the major grid lines in the x direction grey, dashed, and set it to a linewidth of 1.5.
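The general pattern of a single theme() call can be sketched as follows on the ToothGrowth data. The elements and values chosen here are arbitrary examples rather than the settings the assignment asks for, and the linewidth arguments assume ggplot2 3.4.0 or later.

```r
library(ggplot2)

p_theme <- ggplot(ToothGrowth, aes(x = supp, y = len, fill = supp)) +
  geom_violin(alpha = 0.4) +
  theme_minimal() +   # apply the complete theme first...
  theme(              # ...then override individual elements in one place
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.line.y = element_line(color = "black", linewidth = 1),
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_line(color = "grey", linetype = "dashed",
                                      linewidth = 0.5),
    plot.title = element_text(hjust = 0.5, face = "bold")
  )
```

Order matters here: a complete theme such as theme_minimal() replaces every theme setting, so if it comes after your theme() call it will undo your customizations.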

Now, you should have something like this!

  1. Awesome! You might also notice that the graph you are trying to make has some text labels. Let’s go ahead and get those on there!

You can use the geom_text command to do this. I am going to give you the outline for the code, because this can be tricky, but you need to fill in all the indicators after the “=” signs for it to work.

You can use: geom_text(aes(x = "", label = "", y =), vjust =, color = "", size = ). You will need to do this twice! Once for each x axis category (Keep in mind you flipped your axes, so your x axis looks like your y).

If you are having trouble, I will show you how I got the top one to work:

geom_text(aes(x = "SVMWithGradCAMMaps", label = "SVM + GRAD-CAM++", y = 0.64), vjust = -4.5, color = "darkorange2", size = 4.5)

geom_text tells it to add text. “x=” tells it where to put it on the x-axis. In this case, I want the GRAD-CAM label to line up with the GradCAMMaps violin. “label=” tells it what to write. “y=” tells it where to put it on the y-axis. “vjust” allows it to move up and down so that it doesn’t overlap your data. “color” and “size” tell it what the text will look like. When you are looking at your graph, be sure to “knit” the file so that you know what it looks like at that size. Otherwise, you may have to come back and change all of these numbers so that it looks correct on your webpage.

Your graph should now look like this:

  1. You are very close now! A couple small changes still need to be made. First, their values go up in intervals of 0.02, and ours do not. So…
  • adjust the y-axis to be continuous, with limits that match theirs (min and max values), have the “breaks” and the “labels” start and end at your set values, and go up “by” 0.02. You may have to use your book to figure this bit out!
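As a sketch of how limits, breaks, and labels work together, here is an axis built with seq() on the ToothGrowth data. The limits and step size are made up for this data set and are not the values from the paper's figure.

```r
library(ggplot2)

# Evenly spaced tick positions from 0 to 35 in steps of 5 (made-up values)
axis_breaks <- seq(from = 0, to = 35, by = 5)
print(axis_breaks)

p_axis <- ggplot(ToothGrowth, aes(x = supp, y = len)) +
  geom_violin() +
  scale_y_continuous(
    limits = c(0, 35),      # smallest and largest value shown
    breaks = axis_breaks,   # where the ticks and grid lines go
    labels = axis_breaks    # what is printed at each tick
  ) +
  coord_flip()              # the y scale still controls the numeric axis
```

Be careful with limits on a scale: any data outside them are dropped before the plot is drawn, so choose limits that contain all of your values.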

  1. Lastly, you need to add a title! Add the title to match the figure you are replicating, and rename the axis title to match as well.

Now you should have this beauty!

Now make it a half graph; you will notice you lose the quantiles.
  1. So you probably notice one more big difference. The figure from the paper has the violin plots cut in “half”. GGplot can actually do that as well! But it uses a slightly different command and package. It uses that “see” package you installed earlier.
  • to do this, you simply change your geom_violin to geom_violinhalf in your code.

I bet you still notice slight differences, like the fact that your data isn’t lined up well anymore. That is because there is a better way to do this in GGplot! And that is your challenge. Make sure you include both the code you built to create this graph and the graph itself in the knitted document on your webpage.

Instead, create a violin plot with a boxplot overlay.

Create a new code chunk!

You are now tasked with creating a violin plot with a boxplot overlay. This is the basic requirement. Do anything else you’d like that you think enhances the plot to make it “publication ready”. It should be visually appealing, and sophisticated enough to show me you understand how these two graph types work! Feel free to experiment.

Here are a couple webpages that can help you find more code that can be helpful to you as you build this graphic.

https://r-graph-gallery.com/violin_and_boxplot_ggplot2.html

https://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html

What do I need to turn in for a grade?

In order to get full credit, you need to complete this activity and push it to your RPubs webpage. The new webpage should include:

  1. The code chunk used to produce the replica graph
  2. The replica graph with full violins
  3. The replica graph with half violins
  4. Your new boxplot + violin plot graphic

Graphics Assignment 3: Depictions of Frequency

2D Density Graphics:

Today we are going to be learning how to show data frequency in a number of ways, including histograms, density plots, and bubble plots.

  1. Start by downloading the “log_population_data.csv” data off of blackboard.

  2. Read in the data and name it “population_data”.

  3. Look at the data using either ? or head() so that you know what data is available.

  4. Create a stat_density_2d plot where the x axis shows log current population and the y axis shows log past population. Set the geometry as a polygon and the colour as white.

  1. Change the color palette by using scale_fill_distiller. You can choose any color palette you’d like. I chose palette number 4 in the reverse direction.

  1. Change the theme to minimal.
  2. Change the labels to match the ones shown here.
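For reference, a filled 2D density plot can be sketched on the built-in faithful data as shown below. This is an illustration on different data with an arbitrary palette, and it assumes ggplot2 3.3.0 or later for after_stat().

```r
library(ggplot2)

p_density <- ggplot(faithful, aes(x = eruptions, y = waiting)) +
  stat_density_2d(
    aes(fill = after_stat(level)),  # fill each contour by its density level
    geom = "polygon",               # draw filled shapes instead of lines
    colour = "white"                # outline color of each contour
  ) +
  # palette picks a ColorBrewer palette by number; direction = 1 flips
  # the default ordering of the colors
  scale_fill_distiller(palette = 4, direction = 1) +
  theme_minimal() +
  labs(x = "Eruption time (min)", y = "Waiting time (min)", fill = "Density")
```

The fill mapping is what turns the contours into a heat-map-like picture; without it, every polygon gets the same default fill.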

Adding Density to Margins:

Now we are going to work on adding more graphical elements to plots you already know how to make! We have learned previously how to do dotplots, bubble plots, and adding regression lines. Today we are going to explore using bubble plots to show sample size, and how adding density plots to the graphic can help you interpret biases in data or the data themselves.

  1. Start by adding "ggExtra" to your list of required packages.
  2. Download the “longevity_data.csv” data file and add it to the same file location where you saved this code file.
  3. Read in the CSV data file and name it “longevity_data” in R.

Now that you have your data, we need to do some calculations with it. For example, the best way to look at this data happens to be in log format. In the previous dataset, we gave you log transformed data. This time, we are going to show you how to calculate that yourself if needed.

  1. Open your data using ? or head() to see the variables.
  2. You will see that you have mass data and longevity data. We want the log values of each of these rather than raw values. Copy and paste this code in order to perform those calculations.
long <- longevity_data %>% #create a new dataframe called "long" that contains all your newly calculated variables
  mutate( #mutate tells the program to perform new calculations
    log_mass = log10(mass_g),                          # create a new column called "log_mass" which Log-transforms mass values
    log_lifespan = log10(maximum_lifespan_yr))  %>%          # create a new column called "log_lifespan" that log-transforms lifespan values
  group_by(order) %>%        # group the rows by "order" (group of animals) so that the next step is calculated within each order
  mutate(order_size = n())      #calculate the sample size of each order and put it in a column called "order_size". 

#Now you have a sample size for each order, and you have transformed each mass and lifespan value to log form. 
  1. Make sure you read the notes on each line to understand what it did.
  2. Now look at the data again, and you will see three new variables calculated in newly created columns. Those are the ones we will use to create our plot.
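To check your understanding of mutate, group_by, and n(), here is a sketch on a tiny made-up table where the answers can be worked out by hand. The table toy_animals and its values are invented for illustration.

```r
library(dplyr)

# Five made-up animals in two orders
toy_animals <- tibble(
  species = c("a", "b", "c", "d", "e"),
  order   = c("Rodentia", "Rodentia", "Rodentia", "Primates", "Primates"),
  mass_g  = c(10, 100, 1000, 5000, 50000)
)

toy_result <- toy_animals %>%
  mutate(log_mass = log10(mass_g)) %>%  # log10(10) = 1, log10(100) = 2, ...
  group_by(order) %>%                   # later steps run within each order
  mutate(order_size = n()) %>%          # rows per order, repeated on each row
  ungroup()                             # optional: drop the grouping when done

print(toy_result)
```

Because mutate (rather than summarise) is used after group_by, every original row is kept and the group size is simply repeated on each row, which is what lets you size each point by its order's sample size.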
Let’s Make our Plot!
  1. Begin by creating a dotplot with your new dataframe called “long” in which log mass is on the x axis, log lifespan is on the y axis, and you color by class.
  2. Make the points transparent at 30%.
  3. Now add the size= command to the aes of the ggplot to size the dots by the variable “order_size”.

  1. Add a regression line based on the “linear model” option (lm) with no standard error. Color the lines by class, and make them solid lines.
  2. Scale the color theme to be “lightgreen” and “darkslategray”.

  1. Change the title to: “Bubble Chart of Longevity and Body Mass”
  2. Change the x label to: “Log (Body Mass [g])”
  3. Change the y label to: “Log (Maximum Lifespan [yr])”
  4. Change the theme to “minimal”

  1. Remove all legends from the graph
  2. Change the plot title to be size 14 and bold font.
  3. Change the axis titles to be size 12 and bold font.

  1. Add two text annotations using the annotate function:
  • The first should be “Aves”, in the corresponding color, size 5 and bold. Place it in the correct spot so that you can read it easily.
  • Repeat this for the “Mammals” label.
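The pieces above (sized points, per-group regression lines, theme text settings, and annotations) can be sketched together on the mtcars data. This is an illustration with made-up colors, label positions, and a made-up grouping column called gearbox, not the longevity plot itself.

```r
library(ggplot2)

# Turn the 0/1 transmission code into a labeled categorical variable
# (in mtcars, am = 0 is automatic and am = 1 is manual)
mtcars_labeled <- transform(
  mtcars,
  gearbox = factor(am, labels = c("Automatic", "Manual"))
)

p_bubble <- ggplot(mtcars_labeled, aes(x = wt, y = mpg, color = gearbox)) +
  geom_point(aes(size = cyl), alpha = 0.3) +   # bubble size from a variable
  geom_smooth(method = "lm", se = FALSE, linetype = "solid") +  # one line per group
  scale_color_manual(values = c(Automatic = "darkslategray", Manual = "seagreen3")) +
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 14, face = "bold"),
    axis.title = element_text(size = 12, face = "bold")
  ) +
  # annotate() draws each label once at the coordinates you give it
  annotate("text", x = 4.5, y = 20, label = "Automatic",
           color = "darkslategray", size = 5, fontface = "bold") +
  annotate("text", x = 2, y = 32, label = "Manual",
           color = "seagreen3", size = 5, fontface = "bold") +
  labs(title = "Fuel Efficiency and Weight", x = "Weight (1000 lbs)",
       y = "Miles per Gallon")
```

With the legend removed, the annotations are the only thing telling the reader which color is which group, so place them where they cannot be mistaken.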

Now we are going to add density plots to show how the data is distributed.

  1. First, we need to assign your whole plot code to a specific variable name so that we can “call” it back. To do this, simply put “p=” before the first line of your ggplot code. It will look like this: p = ggplot(long, aes(......
  2. The last thing you are going to do is paste this code underneath your ggplot code. Do not include the +. This is not an additional line of the ggplot code. Rather, it is a new command we are telling it to run after creating plot “p”.

ggExtra::ggMarginal(p, type = "density", groupFill = TRUE, alpha = 0.4)

This will use the package ggExtra to add a density plot in the margins of the “p” plot, at an alpha of 0.4 (40% opaque), and grouped by the same colors used in plot “p”.

Interpretation Questions:
  1. What is the benefit to adding density plots in the margin of your graphics?
  2. Explain how you were able to depict 6 different measures in a single graphic. Be sure to clearly list the element and how it was depicted.
  3. What is the relationship between longevity and body mass? Is it more extreme in mammals or aves?
  4. Is the data more biased toward smaller/larger or long/short lived animals? How do you know and why do you think that is?
  5. Is there an element missing from this graphic that you feel should be there? Hint: There is one that could be helpful if added that is not depicted currently in any other way on the graphic.

Create your own!

Use your newly formed skills to create any kind of frequency plot you’d like. I encourage you to include more than one plotting element (a mixture of histogram, density, regression, dotplot, etc.) to practice integrating multiple types of data visualization. You can use any of the publicly available data that you’d like in order to complete it.

What do I need to turn in for a grade?

In order to get full credit, you need to complete this activity and push it to your RPubs webpage. The new webpage should include:

  1. The code chunk used to produce the 2d Density plot
  2. The bubble chart with regression and density depictions
  3. Answers to your interpretation questions
  4. Your newly designed graphic

Graphics Assignment 4: Multipanel Graphics

Now that you have learned how to make lots of different graphics and have learned the basics of altering ggplot aesthetics, we are ready to put multiple graphs together into publication quality images!

Creating Panels Based on a Variable

First, let’s just briefly discuss how to make multipanel plots by faceting based on a variable within the dataset.

  1. Practice using the ChickWeight data set that is publicly available in R.
  2. Create a ggplot where you show time on the x axis, and weight on the y axis.
  3. Add individual lines, colored by “Chick” id, and set to an alpha value of 0.1.
  4. Add a smoothed regression line, which is black in color and set to 1.2 in size. Include SE coloring in the line, and leave it set to the default “loess” for statistics.
  5. Facet_wrap by Diet type, creating four side by side panels.
  6. Format the aesthetics to be similar to this plot.
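A faceted plot with a smoothed line can be sketched on the mpg data that ships with ggplot2. It uses points where your plot uses individual lines, and the variables and styling are arbitrary; the linewidth argument assumes ggplot2 3.4.0 or later.

```r
library(ggplot2)

p_facets <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # Raw data in the background, faded so the trend stands out
  geom_point(aes(color = class), alpha = 0.4) +
  # One smoothed trend per panel, with the standard-error ribbon shown
  geom_smooth(method = "loess", se = TRUE, color = "black", linewidth = 1.2) +
  # One panel per drive type, all in a single row
  facet_wrap(~ drv, nrow = 1) +
  theme_minimal() +
  theme(legend.position = "none")
```

To draw one line per individual instead of points, the usual approach is geom_line with a group aesthetic set to the individual's id.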

Creating Panels From Independent Graphics

Now, let’s talk about how to combine multiple plots that were created independently into a single figure! Sometimes, we don’t want to facet_wrap by a specific variable. Instead, we want to take two or more independent plots that help answer a single question, and show them in the same location.

There are lots of ways to do that. I am going to share my favorite approach with you here. I find it to be the easiest to manipulate and give you the most freedom.

You are going to create 3 separate plots. I want you to come up with these on your own! You can decide on the colors and other aesthetics of the plots, but you do need to make three general plot types. The plots you need to make using the CO2 dataset are:

  1. A violin plot with at least one additional graphing element that shows uptake by treatment, grouped by Type
  2. A line plot that shows the relationship between ambient CO2 and uptake, colored and grouped by Treatment, faceted by Type. Make sure each plot also has individual data points.
  3. Any additional plot of your choice that shows a new element.

Once you have created your three plots, you are going to use this help page to combine all three plots into a single attractive figure. This should be something you think would be ready for submission to a journal for publication. This graph should:

  1. Have all three graphics laid out in a way that is easy and fun to look at.
  2. Include panel labels (ex. A, B, C)
  3. Include an overall title for the figure

You can use the ggarrange function, which will work in most cases.

ggarrange help page: https://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/81-ggplot2-easy-way-to-mix-multiple-graphs-on-the-same-page/
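As one possible layout, here is a hedged sketch that nests two ggarrange calls and adds an overall title with annotate_figure. The three plots are throwaway examples on the ToothGrowth data, the title is made up, and the arranging step only runs if the ggpubr package is installed.

```r
library(ggplot2)

# Three simple, independent plots (placeholders for your own)
p1 <- ggplot(ToothGrowth, aes(x = supp, y = len)) + geom_boxplot()
p2 <- ggplot(ToothGrowth, aes(x = dose, y = len, color = supp)) + geom_point()
p3 <- ggplot(ToothGrowth, aes(x = len)) + geom_histogram(bins = 10)

if (requireNamespace("ggpubr", quietly = TRUE)) {
  # Bottom row: two panels side by side, labeled B and C
  bottom_row <- ggpubr::ggarrange(p2, p3, ncol = 2, labels = c("B", "C"))

  # Full figure: one wide panel (A) on top of the bottom row
  figure <- ggpubr::ggarrange(p1, bottom_row, nrow = 2, labels = "A")

  # Add an overall title above all of the panels
  figure <- ggpubr::annotate_figure(
    figure,
    top = ggpubr::text_grob("Tooth Growth Overview", face = "bold", size = 14)
  )
}
```

Nesting one arrangement inside another is what allows panels of different widths; a single ggarrange call with ncol and nrow gives a regular grid instead.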

What do I need to turn in for a grade?

In order to get full credit, you need to complete this activity and push it to your RPubs webpage. The new webpage should include:

  1. Your faceted plot using the chick growth data.
  2. Your final multipanel figure plot.

Final Assignment: Finalizing your R Markdown and Analysing a Dataset

Overview:

This final assessment is designed to test your skills in data visualization using ggplot2 in R. You will demonstrate your ability to prepare a well-organized R markdown file and to create insightful visual representations from a real-world dataset.

Objectives:

  1. Refine and organize an R markdown file into a structured, navigable webpage.
  2. Utilize a publicly available dataset to generate multiple graphics, showcasing diverse analytical conclusions.
  3. Experiment with various ggplot2 visualization techniques, including at least one plot type not covered during the course.

Instructions:

Part 1: R Markdown File Enhancement
Format Your Document:
  1. Transform your R markdown file into a cleanly formatted webpage. You may have to do a bit of research on resources to help you learn to do this! You can use any resources you find to help you. RPubs and R Markdown files are incredibly versatile and customizable if you put in the time. They can be a great resource for your “future” self and for others who use your code.
  • Add a table of contents or interactive tabs to enhance navigation.
  • Restructure your headers and subheaders to guide the reader through the content logically.
  1. Annotate Your Code:
  • Provide detailed comments on your code snippets to explain their functionality.
  • This will not only help you remember the purpose of each code block but also assist anyone else who might use your script in the future.
  1. Organizational Structure:
  • Arrange your content and code in a tidy and logical order to facilitate ease of understanding and navigation.
Part 2: Data Visualization
Dataset Selection:
  1. Find a publicly available dataset from a reputable source (e.g., government databases, research institutions, academic datasets repositories).
  2. Ensure the dataset is relevant to a real-world application and has sufficient complexity to support varied analyses.
Graphic Creation:
  1. Develop a minimum of four distinct graphics using ggplot2, each illustrating a different conclusion or insight from the dataset.
  2. Include a variety of ggplot2 functions and aesthetics to demonstrate your mastery of the tool.
Innovative Plot Type:
  1. Create at least one graphic using a plot type that was not explicitly covered in the course materials. This will challenge your ability to research and implement new visualization techniques.
Multipanel Figure:
  1. One of your submissions must be a multipanel figure, incorporating multiple plots in a single, cohesive graphic.
  • This could be structured as a grid of related visuals that together tell a comprehensive story about the dataset.
  • Not all of your figures have to go into the multipanel figure. But you must have at least one multipanel.
Submission Requirements:
  1. Final R Markdown File: Submit your completed R markdown file, which should be executable without errors and include all your annotations and structured sections. This submission should be via your Rpubs page link.
  2. Graphics: Include all required graphics both within the R markdown file and as separate image files for easy review.
  3. Dataset Link and Description: Provide a link to your dataset and a brief description of its origin, structure, and relevance to your chosen visualizations.
Evaluation Criteria:
  1. Clarity and Organization of Markdown File: The document should be well-structured, easy to navigate, and aesthetically pleasing.
  2. Code Quality and Annotation: Code should be efficiently written and clearly annotated.
  3. Creativity and Analytical Depth in Visualizations: Graphics should not only be technically well-constructed but also insightful, displaying a deep understanding of the data and its implications.
  4. Adherence to Assignment Specifications: All elements of the assignment prompt must be addressed thoroughly.

This assignment is an opportunity to showcase your growth in data visualization and your ability to convey complex information effectively through graphical representations. Good luck, and enjoy the process of discovering and sharing data-driven stories!