SO859 Week 1: Getting Started With R

The markdown file for this class contains all the code you need to load the libraries we will use throughout the course. We will call on the same libraries for most of our sessions. Click the ‘show’ buttons next to each section to reveal the code (if viewing the page on Moodle). In class, you can run the code directly from the markdown file, or copy it into the console or a script file.

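# Set working directory (replace this path with the folder where you saved the course files)
setwd("C:/Users/eflaherty/iCloudDrive/Teaching/so_859_2025")
# List the files in that folder to check that the data files are there
dir("C:/Users/eflaherty/iCloudDrive/Teaching/so_859_2025")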

# Load required libraries (install any that are missing first, e.g. install.packages("tmap"))
library(haven)
library(tmaptools)
library(GISTools)
library(tidyverse)
library(sp)
library(spdep)
library(tmap)
library(grid)
library(readxl)
library(bslib)

# Load data (Stata dataset)
ess10_ie <- read_stata("C:/Users/eflaherty/iCloudDrive/Teaching/so_859_2025/ess10_ie.dta")

# Examine the object's class, and view the data grid
class(ess10_ie)
View(ess10_ie)

# Remember, if you ever want to remove an object from the environment pane, use rm()
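Because read_stata() keeps Stata value labels, columns that carry labels arrive as labelled vectors rather than plain numbers or factors; running class() on a single column of ess10_ie shows which kind you have. The sketch below uses a small made-up vector (the name sex and its codes are hypothetical, not taken from the ESS file) to show two haven helpers: as_factor() turns the value labels into factor levels, and zap_labels() strips the labels and keeps the numeric codes.

```r
library(haven)

# A made-up labelled vector standing in for a Stata variable with value labels
# (hypothetical data, not taken from ess10_ie)
sex <- labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2))
class(sex)

# Convert the value labels into factor levels (handy for colour or facet mappings)
sex_factor <- as_factor(sex)
levels(sex_factor)

# Drop the labels and keep the underlying numeric codes (handy for arithmetic)
sex_codes <- zap_labels(sex)
class(sex_codes)
```

Which form you want depends on the job: factors suit grouping and legends, while the bare codes suit calculations.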

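The code below demonstrates the basic elements of a ggplot command: the data, the aesthetic mapping of variables to the x and y axes, and a geom layer that draws them. We will create a simple scatterplot using the variables ‘nwspol’ (time spent following news about politics and current affairs) and ‘agea’ (respondent's age).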

# Creating a scatterplot
plot1.1 <- ggplot(data = ess10_ie,
            mapping = aes(x = agea,
                          y = nwspol))
plot1.1 + geom_point()
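A ggplot object is built in stages: ggplot() stores the data and the aesthetic mappings, but no data points are drawn until a geom layer is added with +. The sketch below uses R's built-in mtcars data (not the ESS file; the object names are illustrative) to show this by counting the layers stored in each plot object.

```r
library(ggplot2)

# Data and aesthetic mappings only: no geom yet, so no data points are drawn
base_plot <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg))
length(base_plot$layers)

# Adding a geom creates a layer; base_plot itself is left unchanged
points_plot <- base_plot + geom_point()
length(points_plot$layers)

# Printing the object (or typing its name at the console) renders it
print(points_plot)
```

This is why plot1.1 can be stored once and then combined with different geoms.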

The graph above plots time spent following news about politics and current affairs (in minutes on a typical day) against age. What do you notice? Is there anything else we could do to make this more legible, or to explore the relationship further?
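Two things often make a plot like this hard to read. Many respondents share the same age and report the same round numbers of minutes, so points pile on top of each other. Survey files also sometimes store non-response as large numeric codes that stretch the y axis. The sketch below uses simulated data: the column names mimic agea and nwspol, but the values and the 7777 non-response code are made up for illustration, so check the value labels in your own file before filtering. It drops the assumed codes, then uses transparency (alpha) and a little random jitter to show where points are densest.

```r
library(ggplot2)

# Simulated stand-in for the ESS variables: values are made up for illustration
set.seed(42)
sim <- data.frame(
  agea   = sample(18:90, 500, replace = TRUE),
  nwspol = sample(c(0, 15, 30, 60, 90, 120), 500, replace = TRUE)
)
# Assume ten respondents carry a large non-response code (hypothetical 7777)
sim$nwspol[1:10] <- 7777

# Keep plausible answers only; check your own file's value labels for the real codes
sim_clean <- subset(sim, nwspol < 7777)

# alpha makes overlapping points semi-transparent;
# jitter nudges identical values apart so stacked points become visible
p_jitter <- ggplot(sim_clean, aes(x = agea, y = nwspol)) +
  geom_point(alpha = 0.3, position = position_jitter(width = 0.3, height = 3))
print(p_jitter)
```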

plot1.2 <- ggplot(data = ess10_ie,
            mapping = aes(x = agea,
                          y = nwspol))
# Fit a smoothed trend line on its own, without the data points
plot1.2 + geom_smooth()

# Then add in the data points
plot1.2 + geom_point() + geom_smooth()

# Try a straight-line fit from a linear model instead
plot1.2 + geom_point() + geom_smooth(method = "lm")
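When no method is given, geom_smooth() picks one automatically. In current versions of ggplot2 it fits a loess curve when the largest group has fewer than 1,000 observations, and otherwise a generalised additive model (formula y ~ s(x, bs = "cs"), fitted with the mgcv package), printing a message saying which it used. Setting method and formula explicitly makes the choice visible and reproducible. The sketch below compares the three options on the built-in mtcars data; the object names are illustrative.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Local regression (what the automatic choice picks for small data sets like this)
p_loess <- base + geom_smooth(method = "loess", formula = y ~ x)

# Generalised additive model with a penalised spline, as used for larger data
p_gam <- base + geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))

# Straight-line fit from an ordinary linear model
p_lm <- base + geom_smooth(method = "lm", formula = y ~ x)

# layer_data() returns the fitted values ggplot2 computed for a layer (layer 2 here)
head(layer_data(p_lm, 2))
```

Comparing the three curves side by side is a quick check on whether a straight line is a reasonable summary.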

Notice the effect of adding the smoothing layer? Let’s try some more examples with a different dataset: use the code below (click Show to reveal) to load some country-level data.

country_data <- read_stata("C:/Users/eflaherty/iCloudDrive/Teaching/so_859_2025/country_data_so875.dta")

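Now let’s try some graph adjustments to better display our relationships. Start with a linear fit, then rescale the axes to make the plot more legible. There are both aesthetic and statistical considerations here: a log scale spreads out the many lower-income countries bunched at the left of the plot (income is typically right-skewed), but it also changes the shape of the relationship that the smoother is fitted to.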

plot1.3 <- ggplot(data = country_data,
            mapping = aes(x = gni_pc,
                          y = life))
plot1.3 + geom_point() + geom_smooth(method = "lm")

# What do you notice about the relationship: is it straightforwardly linear? Try a smoother instead and compare the result.
plot1.3 + geom_point() + geom_smooth()

# Now rescale the gni_pc axis to a log10 scale, so each equal step along the axis is a tenfold increase in income
plot1.4 <- ggplot(data = country_data,
            mapping = aes(x = gni_pc,
                          y = life))
plot1.4 + geom_point() +
    geom_smooth(method = "gam") +
    scale_x_log10()
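On a log10 axis, equal distances represent equal ratios: the gap from $1,000 to $10,000 is drawn the same width as the gap from $10,000 to $100,000. The sketch below uses made-up incomes (a hypothetical income column, not country_data) to confirm that scale_x_log10() transforms the data before the smoother is fitted, so the fitted line is computed on the log scale rather than merely relabelled.

```r
library(ggplot2)

# Hypothetical, right-skewed incomes spanning about two orders of magnitude
df <- data.frame(
  income = c(800, 1500, 3000, 6000, 12000, 25000, 50000, 90000),
  life   = c(55, 60, 65, 69, 73, 77, 80, 82)
)

# Equal steps on a log10 scale are tenfold increases
log10(c(1000, 10000, 100000))

p_log <- ggplot(df, aes(x = income, y = life)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_x_log10()

# Scale transformations are applied before statistics are computed,
# so the x values ggplot2 works with are log10(income)
layer_data(p_log, 1)$x
```

If you only want the axis to look logarithmic while fitting on the original scale, a coordinate transformation is the alternative, but it answers a different statistical question.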

# Finally, let's clean our graph up a little. We will use a pre-made labelling scheme from the scales package (its dollar() function), though you can also write your own. We will also add axis labels, a title, a subtitle and a caption.
plot1.5 <- ggplot(data = country_data, mapping = aes(x = gni_pc, y=life))
plot1.5 + geom_point(color = "black", alpha = 0.2, size = 2) +
    geom_smooth(color = "blue", method = "gam") +
    scale_x_log10(labels = scales::dollar) +
    labs(x = "GNI Per Capita", y = "Life Expectancy in Years",
         title = "Per Capita National Income and Life Expectancy",
         subtitle = "Data points are country-years",
         caption = "Source: World Bank Databank.")
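The labels argument accepts any function that takes the break values and returns character strings, so you can write your own labelling scheme if scales has nothing suitable. The sketch below defines a hypothetical helper, thousands_dollar(), that shows incomes in thousands of dollars, and applies it to made-up data.

```r
library(ggplot2)

# Hypothetical labeller: 25000 becomes "$25k"
thousands_dollar <- function(x) paste0("$", x / 1000, "k")

thousands_dollar(c(1000, 10000, 100000))

# Made-up data for illustration only
df <- data.frame(
  income = c(800, 1500, 3000, 6000, 12000, 25000, 50000, 90000),
  life   = c(55, 60, 65, 69, 73, 77, 80, 82)
)

p_custom <- ggplot(df, aes(x = income, y = life)) +
  geom_point() +
  scale_x_log10(labels = thousands_dollar) +
  labs(x = "GNI Per Capita (thousands)", y = "Life Expectancy in Years")
print(p_custom)
```

Recent versions of scales also provide label_dollar(), whose scale and suffix arguments can produce a similar result without a custom function.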

We can also add a third variable by mapping development status to colour. The variable uses the U.N.'s binary classification: 0 = developing/transition, 1 = developed.

# First we load a copy of the data with development status and EU membership encoded as text (strings), so that ggplot treats them as categories rather than numbers.
library(readxl)
country_data2 <- read_excel("C:/Users/eflaherty/iCloudDrive/Teaching/so_859_2025/country_data.xlsx")
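Why does the encoding matter? ggplot2 treats a numeric 0/1 column as continuous: it draws a colour gradient and does not split the data into groups, so you still get a single smoother (ggplot2 may warn that the colour aesthetic was dropped during the statistical transformation). A character or factor column is discrete, so each category gets its own colour and its own smoother. The sketch below shows the difference on the built-in mtcars data, where am is a numeric 0/1 variable; wrapping it in factor() is an alternative to re-reading the data with text codes.

```r
library(ggplot2)

# am is stored as numeric 0/1: colour becomes a continuous gradient, one smoother
p_numeric <- ggplot(mtcars, aes(x = wt, y = mpg, colour = am)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# factor(am) is discrete: one colour, one fill and one fitted line per category
p_factor <- ggplot(mtcars, aes(x = wt, y = mpg,
                               colour = factor(am), fill = factor(am))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Count the separate groups ggplot2 used when fitting the smoother (layer 2)
length(unique(layer_data(p_numeric, 2)$group))
length(unique(layer_data(p_factor, 2)$group))
```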

Then we map the ‘un_developed’ variable to colour. We also map it to fill, so that the confidence bands around each smoother are shaded to match their colour category. Watch what happens in the output. What can we tell about the difference between developed and developing countries by comparing the smoothers?

plot1.6 <- ggplot(data = country_data2,
            mapping = aes(x = gni_pc,
                          y = life,
                          color = un_developed,
                          fill = un_developed))
plot1.6 + geom_point() +
    geom_smooth(method = "gam") +
    scale_x_log10()

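What if we want to keep just one fitted line, but colour the points by group? We can do this by writing the colour mapping into geom_point() instead of ggplot(): mappings set in ggplot() are inherited by every layer, whereas mappings set inside a geom apply only to that layer.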

plot1.7 <- ggplot(data = country_data2, mapping = aes(x = gni_pc, y = life))
plot1.7 + geom_point(mapping = aes(color = un_developed)) +
    geom_smooth(method = "gam") +
    scale_x_log10()
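The difference between the two plots comes from inheritance: aesthetics mapped inside ggplot() are passed to every layer, while aesthetics mapped inside a geom apply to that layer only. The sketch below, using mtcars with factor(am) standing in for un_developed, counts the groups each smoother is fitted to; the object names are illustrative.

```r
library(ggplot2)

# Colour mapped globally: inherited by geom_smooth(), so one line per category
p_global <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(am))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Colour mapped only in geom_point(): points are coloured, the smoother is pooled
p_local <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(am))) +
  geom_smooth(method = "lm", formula = y ~ x)

length(unique(layer_data(p_global, 2)$group))
length(unique(layer_data(p_local, 2)$group))
```

A layer can also opt out of the global mappings entirely with inherit.aes = FALSE.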

We can also map a continuous variable to colour, and ggplot will display a colour-ramp legend. As with maps, we can also specify which breaks we wish to use (equal interval, quantiles, etc.).

plot1.8 <- ggplot(data = country_data2,
            mapping = aes(x = gni_pc,
                          y = life))
plot1.8 + geom_point(mapping = aes(color = pop_grow)) +
    scale_x_log10() +
    geom_smooth(method = "gam")

# ggsave() saves the most recently displayed plot, in the format given by the file extension
ggsave(filename = "plot1.8.png")
ggsave(filename = "plot1.8.jpeg")
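One way to control the breaks is to cut the continuous variable into classes yourself and map the result to colour. ggplot2's cut_interval() makes classes of equal width (equal interval), and cut_number() makes classes with roughly equal numbers of observations (quantiles). The sketch below applies both to simulated, right-skewed values (a hypothetical growth column standing in for pop_grow) so you can compare the class counts. scale_colour_binned() is another option if you prefer to keep the variable continuous.

```r
library(ggplot2)

set.seed(1)
# Hypothetical right-skewed variable standing in for pop_grow
df <- data.frame(x = rnorm(100), y = rnorm(100), growth = rexp(100))

# Equal-interval classes: same width, very different counts for skewed data
df$growth_interval <- cut_interval(df$growth, n = 4)
# Quantile classes: each class holds about a quarter of the observations
df$growth_quantile <- cut_number(df$growth, n = 4)

table(df$growth_interval)
table(df$growth_quantile)

p_quant <- ggplot(df, aes(x = x, y = y, colour = growth_quantile)) +
  geom_point() +
  labs(colour = "Growth (quartiles)")
print(p_quant)
```

Equal intervals preserve the distances between values, while quantiles guarantee that every colour is used by a similar number of points; which is better depends on what the map or plot needs to show.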

Let’s end with a fully annotated graph, and save the output as both JPEG and PNG.

plot1.9 <- ggplot(data = country_data2,
            mapping = aes(x = gni_pc,
                          y = life))
plot1.9 + geom_point(mapping = aes(color = pop_grow)) +
    scale_x_log10(labels = scales::comma) +
    geom_smooth(method = "gam") +
    labs(x = "GNI Per Capita", y = "Life Expectancy in Years",
         title = "Per Capita National Income and Life Expectancy",
         subtitle = "Data points are country-years",
         caption = "Source: World Bank Databank.")

# Save the annotated plot as both JPEG and PNG
ggsave(filename = "plot1.9.jpeg")
ggsave(filename = "plot1.9.png")
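By default ggsave() saves the last plot displayed and takes its size from the current graphics device, so the same code can produce files of different dimensions on different computers. Passing the plot object and the size explicitly avoids both problems. The sketch below stores a plot in an object, adds a legend title with labs(colour = ...), and writes PNG and JPEG copies to a temporary folder; the object name, file names, size and resolution are illustrative choices.

```r
library(ggplot2)

# Keep the finished plot in an object so ggsave() does not rely on last_plot()
p_final <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(aes(colour = wt)) +
  geom_smooth(method = "loess", formula = y ~ x) +
  labs(x = "Horsepower", y = "Miles per Gallon",
       colour = "Weight (1000 lbs)",
       title = "Horsepower and Fuel Economy",
       caption = "Source: R's built-in mtcars data.")

# Write to a temporary folder here; use your own file names in practice
out_png  <- file.path(tempdir(), "example_plot.png")
out_jpeg <- file.path(tempdir(), "example_plot.jpeg")

# A fixed size (in inches) and resolution give the same output on any machine
ggsave(filename = out_png,  plot = p_final, width = 7, height = 5, dpi = 300)
ggsave(filename = out_jpeg, plot = p_final, width = 7, height = 5, dpi = 300)

file.exists(c(out_png, out_jpeg))
```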

Eoin Flaherty

v1: 2024-09-28