Goals

  1. Give you a sense of what is possible with R Markdown
  2. Show you the basic components so you can publish your work online
  3. Build something, make mistakes, and iterate

In the longer term: Use this file and tutorial as a starting point, then go to YouTube, Google or your preferred AI to keep going. Ideally, you’ll be able to articulate what you want a little better after this tutorial.

Remember: Almost nothing works well the first time. It just needs to work well enough for you to learn something and prove a concept. This tutorial is meant to be rough and ready ™. It’s okay if you’re unsure about some part of the process. Getting comfortable will take time.

Components of an R-based report

R Markdown file (.Rmd)

  • Contains all the pieces we need and will be rendered into our outputs
  • Make a new .Rmd file in RStudio: click File -> New File -> R Markdown

Outputs (what you publish/share)

  • PDF
  • Word
  • HTML (website)
  • Graphs (PNG/JPG)

Publishing (if we have time)

The Anatomy of an R Markdown File

  1. YAML
    • lives at the top of the Rmd file
    • sets the outputs for making PDFs, HTML, Word docs, etc
    • Ideally, set up a template or example that has all the options you like and just copy/paste when you want to use it
  2. Markdown code
    • All of your text is written in markdown
      • Make headers by putting a hash symbol (#) in front of the text
      • More hash symbols mean a smaller header (# H1, ## H2, ### H3)
      • Using headers correctly will make it easy to create tables of contents (see Table of contents for example)
    • Download the Cheatsheet
    • See RMarkdown documentation for help in the future: https://rmarkdown.rstudio.com/lesson-1.html
  3. Code Chunks
    • Where all the code lives!
    • Used to work with
      • data
      • calculations
      • graphs and data viz
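To make the three parts concrete, here is a minimal sketch that writes a tiny .Rmd file from R. The title, file name, and chunk contents are made up for illustration, and the fence is built with strrep() only so the backticks don't clash with this example.

```r
# Build the three parts of an R Markdown file as text and write them to disk.
fence <- strrep("`", 3)  # three backticks in a row

rmd_lines <- c(
  # 1. YAML header: sits at the top, between two lines of three dashes
  "---",
  'title: "My first report"',   # hypothetical title
  "output: html_document",      # could also be pdf_document or word_document
  "---",
  "",
  # 2. Markdown text: headers use # symbols
  "# Summary",
  "",
  "Some text written in **markdown**.",
  "",
  # 3. Code chunk: R code between fenced lines
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)

rmd_path <- file.path(tempdir(), "example_report.Rmd")  # hypothetical file name
writeLines(rmd_lines, rmd_path)

# To render it (requires the rmarkdown package and pandoc), you could run:
# rmarkdown::render(rmd_path)
```

Opening the written file in RStudio and clicking Knit should do the same thing as the commented render call.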

Coding in R

Set up your workspace

Although R has a lot of built-in functionality, we almost always load additional packages (libraries) to add certain features. Which packages you use will depend on what you’re trying to do. Ask Google or an LLM for help figuring out what you need.

# Install packages (only needed once). They add features that R doesn't have built in.
# Lines starting with # are code comments: they don't run, they're for your reference.
# Remove the leading # from a line below to run that install.
# install.packages("dplyr")
# install.packages("ggplot2")
# install.packages("tidyverse")
# install.packages("readr")
# install.packages("knitr")

# helps with data manipulation
library(dplyr)

# helps with plotting
library(ggplot2)

# a collection of packages; it includes dplyr, ggplot2, and readr, plus tidyr for reshaping
library(tidyverse)

# work with CSV data
library(readr)
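If you're not sure which of these packages are already installed, a small check like the one below can help. This is only a sketch: the package list is copied from the install and library lines above, and the install call is left commented out so nothing gets installed by accident.

```r
# Packages this tutorial mentions in its install and library lines
needed_pkgs <- c("dplyr", "ggplot2", "tidyverse", "readr", "knitr")

# requireNamespace() returns TRUE if a package is installed, without attaching it
is_installed <- vapply(needed_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install them:
  # install.packages(missing_pkgs)
} else {
  message("All packages are installed.")
}
```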

Loading data

food_security_df <- read.csv('data/raw_data/FAOSTAT_data_en_1-19-2026.csv')

Data cleanup/setup

# Remove the columns we don't want
food_security_df_modified <- food_security_df %>%
  select(-Domain, -Element)

# Remove rows where Value is NA or an empty string
food_security_df_modified <- food_security_df_modified %>%
  filter(!is.na(Value), Value != "")

Summary stats

# Let's grab only one indicator for now
df_filtered_stability <- food_security_df_modified %>%
  filter(Item == "Political stability and absence of violence/terrorism (index)")


summary(df_filtered_stability$Value)
##    Length     Class      Mode 
##       390 character character

It looks like our values are being treated like text (strings). We need to make them numeric. R needs to be told what kind of values it’s working with so we can get summary statistics.

# Right now our values are being treated like text (strings), so we need to make them numeric.
# R needs to be told what kind of values it's working with.
df_filtered_stability$value_num <- as.numeric(df_filtered_stability$Value)
df_filtered_stability$Year <- as.numeric(df_filtered_stability$Year)


summary(df_filtered_stability$value_num)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -2.78000 -0.66000  0.01500 -0.07741  0.76000  1.88000
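Here is a small sketch of the same text-to-number conversion on made-up values (these are not the real data), which also shows what happens to entries that can't be converted.

```r
# Toy values standing in for the Value column (not the real data)
value_chr <- c("0.76", "-2.78", "", "n/a")

summary(value_chr)   # reports Length / Class / Mode, as in the first output above

# as.numeric() converts what it can; anything it can't parse becomes NA
# (suppressWarnings() hides the "NAs introduced by coercion" warning)
value_num <- suppressWarnings(as.numeric(value_chr))

is.na(value_num)     # FALSE FALSE TRUE TRUE
summary(value_num)   # a numeric summary, with the NAs counted separately
```

If a conversion produces more NAs than you expected, it's worth looking at the original text values before moving on.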

Basic Graphs

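# Line plot: group = Area draws one line per area.
# Without it, geom_line() connects every point into a single zigzag.
ggplot(df_filtered_stability, aes(x = Year, y = value_num, group = Area)) +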
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = c(2021, 2022)) +
  labs(
    x = "Year",
    y = "Political stability index",
    title = "Political Stability, 2021–2022"
  ) +
  theme_minimal()

#Simple box plot
ggplot(df_filtered_stability, aes(x = factor(Year), y = value_num)) +
  geom_boxplot() +
  labs(
    x = "Year",
    y = "Political stability index",
    title = "Political Stability by Year"
  ) +
  theme_minimal()

#Pretty box plot
ggplot(df_filtered_stability, aes(x = factor(Year), y = value_num, fill = factor(Year))) +
  geom_boxplot(
    width = 0.6,
    alpha = 0.8,
    outlier.shape = NA
  ) +
  geom_jitter(
    width = 0.15,
    alpha = 0.35,
    size = 1.5
  ) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    x = "Year",
    y = "Political stability index",
    title = "Political Stability and Absence of Violence/Terrorism",
    subtitle = "Distribution by year"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "none",
    plot.title = element_text(face = "bold"),
    axis.title = element_text(face = "bold")
  )

# Violin plot (stored in temp_plot so we can save it below)

temp_plot <- ggplot(df_filtered_stability, aes(x = factor(Year), y = value_num, fill = factor(Year))) +
  geom_violin(
    trim = FALSE,
    alpha = 0.8
  ) +
  geom_boxplot(
    width = 0.12,
    fill = "white",
    outlier.shape = NA
  ) +
  geom_jitter(
    width = 0.1,
    alpha = 0.3,
    size = 1.3
  ) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    x = "Year",
    y = "Political stability index",
    title = "Political Stability and Absence of Violence/Terrorism",
    subtitle = "Distribution by year"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "none",
    plot.title = element_text(face = "bold"),
    axis.title = element_text(face = "bold")
  )

Saving plots as images

#Saving the plots as images
ggsave(
  filename = "political_stability_violin_2021_2022_2.png",
  plot = temp_plot,
  width = 8,
  height = 6,
  dpi = 300
)

Regressions

Let’s investigate the relationship between political stability and GDP.

# Let's see how stability and GDP compare
reg_df <- food_security_df_modified %>% 
  filter(
    Item == "Political stability and absence of violence/terrorism (index)" 
    |Item == "Gross domestic product per capita, PPP, (constant 2021 international $)"
    ) 


reg_df_years <- reg_df %>% 
  filter(
    Year == 2021 | Year == 2022
  )

# Reshaping the data so we can run the regressions. Right now it's long, we need it wide (i.e. we're going to pivot the table)
df_wide <- reg_df_years %>%
  filter(Item %in% c(
    "Gross domestic product per capita, PPP, (constant 2021 international $)",
    "Political stability and absence of violence/terrorism (index)"
  )) %>%
  #making sure our values are treated as numbers
  mutate(
    Value = as.numeric(Value),     # if Value is character
    Year  = as.integer(Year)
  ) %>%
  #pivoting the table
  select(Area, Year, Item, Value) %>%
  pivot_wider(
    names_from = Item,
    values_from = Value
  ) %>%
  # renaming columns for ease of use
  rename(
    gdp = `Gross domestic product per capita, PPP, (constant 2021 international $)`,
    pol_stab   = `Political stability and absence of violence/terrorism (index)`
  ) %>%
  # getting rid of any blank values is good practice
  drop_na(gdp, pol_stab)
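If the long-to-wide step feels abstract, this toy sketch shows the same pivot on four made-up rows. The area names and values are invented, and it uses pivot_wider() from tidyr, which is loaded as part of the tidyverse.

```r
library(tidyr)

# Long format: one row per Area / Year / Item combination (made-up values)
long_df <- data.frame(
  Area  = c("A", "A", "B", "B"),
  Year  = c(2021L, 2021L, 2021L, 2021L),
  Item  = c("gdp", "pol_stab", "gdp", "pol_stab"),
  Value = c(1000, 0.5, 2000, -0.2)
)

# Wide format: one row per Area / Year, with one column per Item
wide_df <- pivot_wider(long_df, names_from = Item, values_from = Value)

wide_df   # 2 rows, with columns Area, Year, gdp, pol_stab
```

The regression needs the wide shape because lm() expects each variable in its own column.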

Running the regression

Raw GDP

# Running the regression on raw GDP
m1 <- lm(gdp ~ pol_stab, data = df_wide)
summary(m1)
## 
## Call:
## lm(formula = gdp ~ pol_stab, data = df_wide)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -39475 -13757  -2334  10534  93768 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    26106       1158   22.54   <2e-16 ***
## pol_stab       14200       1196   11.87   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22430 on 374 degrees of freedom
## Multiple R-squared:  0.2737, Adjusted R-squared:  0.2718 
## F-statistic:   141 on 1 and 374 DF,  p-value: < 2.2e-16

Log GDP, since GDP tends to skew right

# Log GDP, since it tends to be skewed right
m2 <- lm(log(gdp) ~ pol_stab, data = df_wide)
summary(m2)
## 
## Call:
## lm(formula = log(gdp) ~ pol_stab, data = df_wide)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3856 -0.6761  0.2010  0.6945  2.0400 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.60146    0.04899  195.99   <2e-16 ***
## pol_stab     0.69955    0.05058   13.83   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9488 on 374 degrees of freedom
## Multiple R-squared:  0.3384, Adjusted R-squared:  0.3366 
## F-statistic: 191.3 on 1 and 374 DF,  p-value: < 2.2e-16
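
One way to read the pol_stab coefficient from the log model is to convert it back to a percentage. This is a rough sketch using the estimate copied from the output above. It describes an association in this data, not a causal effect.

```r
# Coefficient on pol_stab copied from the log(gdp) output above
b_pol_stab <- 0.69955

# In a log-linear model, a one-unit increase in x multiplies the predicted y by exp(b)
multiplier <- exp(b_pol_stab)
pct_change <- (multiplier - 1) * 100

round(multiplier, 2)   # about 2.01
round(pct_change, 0)   # about 101 (percent)
```

In words: a one-point higher stability index goes along with roughly double the predicted GDP per capita in this model.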

Time Fixed Effects

# With time fixed effects
m3 <- lm(log(gdp) ~ pol_stab + factor(Year), data = df_wide)
summary(m3)
## 
## Call:
## lm(formula = log(gdp) ~ pol_stab + factor(Year), data = df_wide)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.3680 -0.6812  0.2064  0.6954  2.0225 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       9.58378    0.06950 137.894   <2e-16 ***
## pol_stab          0.69959    0.05064  13.815   <2e-16 ***
## factor(Year)2022  0.03519    0.09797   0.359     0.72    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9499 on 373 degrees of freedom
## Multiple R-squared:  0.3386, Adjusted R-squared:  0.3351 
## F-statistic: 95.48 on 2 and 373 DF,  p-value: < 2.2e-16

Controlling for Area

# Controlling for area
m4 <- lm(log(gdp) ~ pol_stab + factor(Area) + factor(Year), data = df_wide)
summary(m4)

Graphing Regression results

ggplot(df_wide, aes(x = pol_stab, y = log(gdp))) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = TRUE) +
  theme_minimal() +
  labs(
    x = "Political stability and absence of violence/terrorism (index)",
    y = "log(GDP per capita, PPP)",
    title = "GDP and Political Stability"
  )
## `geom_smooth()` using formula = 'y ~ x'

Nice regression tables

library(broom)
library(gt)

tidy(m1) %>%
  gt() %>%
  fmt_number(columns = c(estimate, std.error), decimals = 3) %>%
  tab_header(
    title = "Regression results",
    subtitle = "Dependent variable: GDP per capita"
  )
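
Regression results
Dependent variable: GDP per capita

| term        | estimate   | std.error | statistic | p.value      |
|-------------|------------|-----------|-----------|--------------|
| (Intercept) | 26,105.629 | 1,158.398 | 22.53597  | 1.176300e-71 |
| pol_stab    | 14,200.471 | 1,196.006 | 11.87324  | 8.254138e-28 |
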
# Making things nicer, rounding vals
tidy(m2) %>%
  mutate(
    estimate  = round(estimate, 3),
    std.error = round(std.error, 3),
    statistic = round(statistic, 2),
    p.value   = round(p.value, 3)
  ) %>% 
  gt() %>%
  fmt_number(columns = c(estimate, std.error), decimals = 2) %>%
  tab_header(
    title = "Regression results",
    subtitle = "Dependent variable: log(GDP per capita)"
  )

Regression results
Dependent variable: log(GDP per capita)

| term        | estimate | std.error | statistic | p.value |
|-------------|----------|-----------|-----------|---------|
| (Intercept) | 9.60     | 0.05      | 195.99    | 0       |
| pol_stab    | 0.70     | 0.05      | 13.83     | 0       |

For Future Reference

Keyboard Shortcuts

  • Make a new code chunk (Command + Option + I on Mac, Ctrl + Alt + I on Windows/Linux)
  • Insert the assignment operator <- (Option + Dash on Mac, Alt + Dash on Windows/Linux)

Code Chunk Options

| Chunk option | What it controls | Why it matters in reports | Plain-language explanation |
|---|---|---|---|
| echo | Whether the R code is shown | Separates analysis from presentation | Show or hide the code |
| eval | Whether the code is executed | Lets you display example code without running it | Run this code or not |
| message | Package and function messages | Keeps output clean and professional | Hide messages like “Attaching package…” |
| warning | Warning messages | Prevents confusing output for readers | Show or hide warnings |
| include | Whether code and results appear | Allows silent background computation | Run this but don’t show anything |
| results | How printed output is treated | Required for tables and formatted text | Control how output is displayed |
| fig.width / fig.height | Figure size (in inches) | Ensures readable, publication-ready figures | Control plot size |
| fig.cap | Figure caption text | Enables figure numbering and captions | Text shown under the figure |
| cache | Whether results are saved | Speeds up slow reports | Don’t re-run unless code changes |
| error | Behavior when errors occur | Useful for teaching and debugging | Keep knitting even if this fails |

lm(y ~ x, data = df)
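
Rather than repeating the same options on every chunk, you can set defaults once. Here is a minimal sketch using knitr's opts_chunk. The specific values are just examples, and this would normally go in a setup chunk near the top of the .Rmd file.

```r
library(knitr)

# Set defaults for every chunk in the document; individual chunks can still override them
opts_chunk$set(
  echo = FALSE,      # hide the code
  message = FALSE,   # hide package messages
  warning = FALSE,   # hide warnings
  fig.width = 8,     # figure width in inches
  fig.height = 6     # figure height in inches
)

opts_chunk$get("echo")   # FALSE
```

A single chunk can still show its code by setting echo = TRUE in its own header.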

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
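
The chunk that produced the output below isn't shown here. The column names and values match R's built-in cars dataset, so it was presumably something like this:

```r
# cars is a dataset that ships with R (columns: speed, dist)
summary(cars)
```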

##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00