Introduction

R is a useful tool for data analysis and visualization, but it’s not perfect: it’s great at some things and less good at others. Used well, it can be a valuable part of a research project. Keep in mind that working in R is an iterative process. You start with a question or research goal, which leads you to find or make a dataset. Often this raises more questions than it answers. Other times you find enough to start thinking about how to show your findings, so you experiment and make visuals until you find something that feels impactful.

Getting Started with R

Make sure you install R first, and then RStudio.

You can install R from here: https://cran.r-project.org/

You can install RStudio from here: https://posit.co/download/rstudio-desktop/

Open RStudio, then make a new script (File > New File > R Script) and save it somewhere you’ll remember.

Now you should see four different panes:

Console: you can write code here, but R won’t save it.

Script: the script you’re writing. R will save this.

Environment: all the dataframes and variables you create will be saved here but not the code itself.

Viewer: where you can look at plots, file paths, and get help.

Keyboard Shortcuts

Here are a few tools and shortcuts that you’ll use often and that make using R a bit easier:

<- is called the assignment operator. Type it with alt + - on Windows or option + - on Mac.

%>% is called the pipe operator. Type it with ctrl + shift + m on Windows or command + shift + m on Mac.

A # means the rest of the line after the symbol is a comment, not code. R skips over it.

ctrl + enter on Windows or command + enter on Mac will run the line of code that your cursor is on (or whatever code you have selected).
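Here is a small sketch of the assignment operator, the pipe, and a comment working together. The numbers and the names observations and mean_root are made up for illustration, and I load magrittr directly only so the example runs on its own (library(tidyverse) also gives you the pipe).

```r
library(magrittr) # provides %>%; library(tidyverse) loads it too

# made-up numbers, stored under a name with the assignment operator
observations <- c(4, 9, 16)

# the pipe passes the result on its left into the function on its right,
# so this reads as "take observations, then sqrt, then mean"
mean_root <- observations %>%
  sqrt() %>%
  mean()

mean_root # same result as mean(sqrt(observations))
```

Reading the pipe as "then" usually makes long chains of steps easier to follow than nesting functions inside each other.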

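If you ever need help, type “?” followed by the name of the function you are having trouble with in the console window, for example ?read.csv. If you don’t know the name of the function, type “??” followed by a word to search all of the help pages, for example ??histogram.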

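Also, it’s worth noting that R does not allow spaces in the names of things you create, like variables and dataframes. I suggest using an underscore instead. If a file name has a space in it and you can’t change it, put the whole name in quotes when you reference it. If a column name has a space in it, wrap it in backticks when you reference it.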
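Here is a minimal sketch of dealing with a name that has a space in it. The dataframe example_data and its column “time collected” are made up for illustration; they are not from the Block Party data.

```r
# a made-up dataframe whose column name contains a space
# check.names = FALSE stops R from quietly changing the name to time.collected
example_data <- data.frame(`time collected` = c("10:05", "10:20"),
                           check.names = FALSE)

# backticks let you refer to a name that has a space in it
first_time <- example_data$`time collected`[1]

# renaming the column with an underscore makes it easier to work with
names(example_data)[names(example_data) == "time collected"] <- "time_collected"
names(example_data)
```

Renaming once near the top of your script is usually less work than typing backticks every time you use the column.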

Libraries and Packages

Libraries are pre-written packages of functions. Using them saves a lot of time. These are the libraries I used:

library(tidyverse)
library(scales)
library(sf)
library(ggplot2)
library(svglite)
## Warning: package 'svglite' was built under R version 4.2.2

Always put them at the top of your script! You’ll need to run these lines each time you start up RStudio.

You might notice you get a warning. That’s okay! A warning is different from an error.

Each library has to be installed before you can load it. Type the following in your console, replacing “library_name” with the name inside the parentheses of each of the libraries above, and keep the quotes. You only need to install a library once.

# install.packages("library_name")
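If you are not sure which libraries you already have, one possible approach is to ask R which ones are missing before installing anything. This is only a sketch: find_missing is a helper name I made up, and the install line is left as a comment so nothing gets downloaded until you decide to run it.

```r
# the packages this tutorial loads (ggplot2 comes with tidyverse)
needed_packages <- c("tidyverse", "scales", "sf", "svglite")

# find_missing is a made-up helper: it returns the names in `needed`
# that do not appear in the list of installed packages
find_missing <- function(needed, installed = rownames(installed.packages())) {
  setdiff(needed, installed)
}

missing_packages <- find_missing(needed_packages)
missing_packages # prints the packages you still need to install, if any

# remove the # below to install everything that is missing in one go
# install.packages(missing_packages)
```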

Import Data

Download the dataset you’re going to use. In this case I downloaded the Block Party data that Sofia shared with me. Here is the format for that:

# data_set_name <- read_csv("url/file_name")

And here is what it looked like for me:

raw_van_alen_block_party_1 <- read_csv("221108_Van Alen Block Party (1).csv")

A few things to note:

Your file name may be different from mine.

The path to your file (the “url” part above) may be different as well.

Pay attention to those details. They can cause a lot of frustration if you’re not careful.

Also it’s up to you what you substitute for “data_set_name”. Just make sure it’s clear!
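Before plotting anything, it can help to check that R can find the file and that the columns came in the way you expect. The sketch below writes a tiny made-up CSV to a temporary folder so that it runs on its own; the file name, the coordinates, and the Posture values are invented. I use read.csv() from base R here so no extra packages are needed, and read_csv() takes the file path in the same way.

```r
# write a tiny made-up file so there is something to read
example_path <- file.path(tempdir(), "example block party.csv")
example_rows <- data.frame(latitude = c(40.7001, 40.7002),
                           longitude = c(-73.9901, -73.9902),
                           Posture = c("Sitting", "Standing"))
write.csv(example_rows, example_path, row.names = FALSE)

# check that the path is right before trying to read the file
file.exists(example_path)

# read it in, then look at the column names, the column types, and the first rows
example_data <- read.csv(example_path)
names(example_data)
str(example_data)
head(example_data)
```

If file.exists() comes back FALSE, the problem is the path or the file name, not the data.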

Picking Data and Visualizing

I looked through the dataset Sofia sent me and I thought about what I could visualize and how. The first thing I thought of was to make a map by plotting latitude and longitude. This is really a scatter plot where the x axis is longitude and the y axis is latitude. In the formula below, “column_name_1” and “column_name_2” can be the names of any columns in the dataset you’re using. Use whatever variables you want to look at.

Here’s the formula to make a scatter plot:

# ggplot(data_set_name, aes(x = column_name_1, y = column_name_2)) + 
#   geom_point()

Here’s my version:

ggplot(raw_van_alen_block_party_1, aes(x = longitude, y = latitude)) + 
  geom_point(size = 0.7, alpha = 0.25) # I added size and alpha (transparency) to make it a little easier to read

Already we can see how all of these points are arrayed along a line (a street). Let’s make it a little more interesting. R can also change the size and color of these points according to a variable, like this:

# ggplot(data_set_name, aes(x = column_name_1, y = column_name_2, color = column_name_3, size = column_name_4)) + 
#   geom_point()
ggplot(raw_van_alen_block_party_1, 
       aes(x = longitude, y = latitude, 
           color = Posture)) + # making the color of each dot correspond to the "Posture" column in raw_van_alen_block_party_1
  geom_point(alpha = 0.25) # alpha is one fixed value, not a column, so it goes here instead of inside aes()
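My version only uses color, so here is a sketch of what mapping size to a column could look like. Everything in example_points is made up, including the group_size column, which I am using as a stand-in for any numeric column you might have.

```r
library(ggplot2)

# made-up points; group_size is a hypothetical numeric column
example_points <- data.frame(
  longitude = c(-73.9901, -73.9902, -73.9903, -73.9904),
  latitude = c(40.7001, 40.7002, 40.7003, 40.7004),
  Posture = c("Sitting", "Standing", "Standing", "Sitting"),
  group_size = c(1, 2, 4, 1)
)

# columns go inside aes(); fixed values like alpha go in geom_point()
size_plot <- ggplot(example_points,
                    aes(x = longitude, y = latitude,
                        color = Posture, size = group_size)) +
  geom_point(alpha = 0.5)

size_plot
```

Anything inside aes() gets its own legend, which is a quick way to check that you mapped the column you meant to.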

And we can add a title and make the background look a little nicer:

ggplot(raw_van_alen_block_party_1, 
       aes(x = longitude, y = latitude, 
           color = Posture)) + # making the color of each dot correspond to the "Posture" column in raw_van_alen_block_party_1
  geom_point(alpha = 0.25) + # alpha is one fixed value, so it goes here instead of inside aes()
  labs(title = "Locations of Stationary People Observed") +
  theme_bw() # this changes the whole visual to a nicer color scheme

Take a careful look at what you’ve created. Do you notice anything wrong with it? In this case I notice that the scales of the x and y axes are quite different. That means this map isn’t quite accurate. Let’s fix that:

ggplot(raw_van_alen_block_party_1, 
       aes(x = longitude, y = latitude, 
           color = Posture)) + 
  geom_point(alpha = 0.25) +
  labs(title = "Locations of People Observed") +
  coord_fixed() + # one unit on the x axis now takes up the same space as one unit on the y axis
  theme_bw() 
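Strictly speaking, coord_fixed() draws one degree of longitude the same length as one degree of latitude, which is only true on the ground at the equator. Further from the equator, a degree of longitude covers less distance, by roughly the cosine of the latitude. Here is a rough sketch of how you could correct for that. The latitudes are made up, so swap in the latitude column from your own data.

```r
# made-up latitudes standing in for your data's latitude column
example_latitudes <- c(40.7001, 40.7002, 40.7003)

# cos() expects radians, so convert from degrees first
mean_latitude_radians <- mean(example_latitudes) * pi / 180

# how many times longer a degree of latitude is than a degree of longitude here
aspect_ratio <- 1 / cos(mean_latitude_radians)
aspect_ratio # about 1.3 for these made-up latitudes

# then pass it to coord_fixed() in place of the default ratio of 1:
# ggplot(data_set_name, aes(x = longitude, y = latitude)) + 
#   geom_point() + 
#   coord_fixed(ratio = aspect_ratio)
```

ggplot2 also has coord_quickmap(), which makes a similar approximation for you. For a single street the difference is small, but it adds up over larger areas.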

Assign it a name. Do this by typing whatever you want to name this plot, then an assignment operator (alt + - or option + -), before the code for the plot:

van_alen_plot_1 <- ggplot(raw_van_alen_block_party_1, 
       aes(x = longitude, y = latitude, 
           color = Posture)) + 
  geom_point(alpha = 0.25) +
  labs(title = "Locations of Stationary People Observed") +
  coord_fixed() + # keeping the axis fix from the previous step
  theme_bw()

Save it:

ggsave("C:/Users/likms/Desktop/DUE/Other/van_alen_plot_1.png", 
       plot = van_alen_plot_1, 
       units = "in", 
       height = 5, width = 7)


ggsave("C:/Users/likms/Desktop/DUE/Other/van_alen_plot_1.svg", 
       plot = van_alen_plot_1, 
       units = "in", 
       height = 5, width = 7)
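The file paths above only work on my computer. One option, sketched below, is to build the paths with file.path() relative to the folder your script is working in, so the same code works on someone else’s computer too. The folder name “plots” is just something I picked for this example.

```r
# a folder for saved plots, inside your current working directory
output_folder <- "plots"
dir.create(output_folder, showWarnings = FALSE) # does nothing if it already exists

# file.path() joins the pieces with the right separator
png_path <- file.path(output_folder, "van_alen_plot_1.png")
svg_path <- file.path(output_folder, "van_alen_plot_1.svg")

# then use the paths in ggsave() the same way as above:
# ggsave(png_path, plot = van_alen_plot_1, units = "in", height = 5, width = 7)
# ggsave(svg_path, plot = van_alen_plot_1, units = "in", height = 5, width = 7)
```

You can type getwd() in the console to see which folder R is currently working in.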

I am saving it two ways: as a .png and as a .svg (scalable vector graphic). A png is a pretty common image format that supports transparency. An svg is useful if you might modify the plot in InDesign.

I took an image (just a screenshot from Google Maps) and overlaid our scatter plot on top in InDesign. It looks like this:

Made with R and InDesign

Note

One of the problems with this process is that you never know how good a visualization will be until you’ve made it. It’s a process of iterating and experimenting until you get something you’re happy with. For example, I also tried to make a timeline, but it looked like this:

# first I created a new dataframe to work from in which I made date and time into two separate columns
modified_van_alen_block_party_1 <- raw_van_alen_block_party_1 %>% 
  separate(collected, into = c("date_collected", "time_collected"), sep = " ")

# then I plotted it the same way as above
ggplot(modified_van_alen_block_party_1, 
       aes(x = time_collected, y = Posture)) + 
  geom_point() +
  labs(title = "Posture of People Observed Over Time") +
  theme_bw() +
  theme(axis.text.x = element_text(size = 5, angle = 90, hjust = 1, vjust = 0.5)) # shrink and rotate the x axis labels
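One likely reason the timeline is hard to read is that separate() leaves time_collected as text, so R treats every distinct time as its own category and gives each one a label. A possible fix, sketched below, is to turn the text into a real date-time so the x axis becomes a continuous timeline. I don’t know exactly how the collected column is formatted, so the dates, times, and format string here are assumptions, and the Posture values are made up.

```r
library(ggplot2)

# made-up rows; the "year-month-day hour:minute:second" layout is an assumption
example_times <- data.frame(
  collected = c("2022-11-08 14:05:00", "2022-11-08 14:20:00", "2022-11-08 15:10:00"),
  Posture = c("Sitting", "Standing", "Sitting")
)

# convert the text into a date-time that R can place on a continuous axis
example_times$collected_at <- as.POSIXct(example_times$collected,
                                         format = "%Y-%m-%d %H:%M:%S",
                                         tz = "UTC")

# a label every 30 minutes instead of one label per observation
timeline_plot <- ggplot(example_times, aes(x = collected_at, y = Posture)) +
  geom_point() +
  scale_x_datetime(date_breaks = "30 min", date_labels = "%H:%M") +
  theme_bw()

timeline_plot
```

If as.POSIXct() gives you NA values, the format string doesn’t match your data and needs adjusting.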

Making other types of visuals

In the examples above, the type of plot is defined by “geom_point”. If you want to make a different type of plot you can use “geom_bar” or “geom_col” for a bar/column chart, “geom_histogram” for a histogram, “geom_sf” for a map with a shapefile, and a lot more.

ggplot(modified_van_alen_block_party_1, 
       aes(x = Posture)) + 
  geom_bar() +
  labs(title = "Posture of People Observed") +
  theme_bw()
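The difference between “geom_bar” and “geom_col” is that geom_bar counts the rows for you, while geom_col expects you to give it the height of each bar. Here is a small sketch of the geom_col version. The postures are made up, and people is just the name I chose for the count column.

```r
library(ggplot2)

# made-up observations
example_postures <- data.frame(Posture = c("Sitting", "Standing", "Standing"))

# count how many rows there are for each posture
posture_counts <- as.data.frame(table(Posture = example_postures$Posture),
                                responseName = "people")
posture_counts

# geom_col uses the counts we calculated as the bar heights
col_plot <- ggplot(posture_counts, aes(x = Posture, y = people)) +
  geom_col() +
  labs(title = "Posture of People Observed") +
  theme_bw()

col_plot
```

This is handy when your data already comes as totals instead of one row per observation.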