R Markdown
In the longer term: Use this file and tutorial as a starting point, then go to YouTube, Google, or your preferred AI to keep going. Ideally, you’ll be able to articulate what you want a little better after this tutorial.
Remember: Almost nothing works well the first time. It just needs to work well enough for you to learn something and prove a concept. This tutorial is meant to be rough and ready ™. It’s okay if you’re unsure about some part of the process. Getting comfortable will take time.
File -> New File -> R Markdown
Although R has a lot of built-in functionality, we almost always import a separate library to add certain features. Which library you use will depend on what you’re trying to do. Ask Google or an LLM for help figuring out what you need.
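If you're not sure which packages are already on your machine, a quick check like the one below can help. This is only a sketch: `missing_packages()` is a helper name made up for this example, not something built into R.

```r
# Hypothetical helper: returns the packages from `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "stats" and "utils" ship with R, so they should never show up as missing.
missing_packages(c("stats", "utils"))

# To install whatever is missing from the packages used in this tutorial, uncomment:
# to_install <- missing_packages(c("dplyr", "ggplot2", "tidyverse", "readr", "knitr"))
# if (length(to_install) > 0) install.packages(to_install)
```

The install lines are left commented out on purpose, so nothing gets downloaded until you decide to run them.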
#install.packages("dplyr")
#install.packages("ggplot2")
#install.packages("tidyverse")
#install.packages("readr")
#install.packages("knitr")
# Install packages to work with, they help you do things that R doesn't have built-in by default
# install.packages("dplyr") <- this is a code comment, it doesn't run, it's for your reference
# helps with data manipulation
library(dplyr)
#helps with plotting
library(ggplot2)
# also helps with data manipulation
library(tidyverse)
# work with CSV data
library(readr)
food_security_df <- read.csv('data/raw_data/FAOSTAT_data_en_1-19-2026.csv')
# Remove the columns we don't want
food_security_df_modified <- food_security_df %>%
  select(-Domain) %>%
  select(-Element)
# Removing rows with NA values
food_security_df_modified <- food_security_df_modified[!is.na(food_security_df_modified$Value) & food_security_df_modified$Value != "", ]
# Let's grab only one indicator for now
df_filtered_stability <- food_security_df_modified %>%
  filter(Item == "Political stability and absence of violence/terrorism (index)")
summary(df_filtered_stability$Value)
## Length Class Mode
## 390 character character
It looks like our values are being treated like text (strings). We need to make them numeric. R needs to be told what kind of values it’s working with so we can get summary statistics.
# Right now our values are being treated like text (strings); we need to make them numeric.
# R needs to be told what kind of values it's working with.
df_filtered_stability$value_num <- as.numeric(df_filtered_stability$Value)
df_filtered_stability$Year <- as.numeric(df_filtered_stability$Year)
summary(df_filtered_stability$value_num)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.78000 -0.66000 0.01500 -0.07741 0.76000 1.88000
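To see why the conversion matters, here is a toy version of the same step. The three values are made up for illustration; they are not read from the FAOSTAT file.

```r
# Made-up values stored as text, like the Value column after read.csv()
values_as_text <- c("0.76", "-0.66", "1.88")
class(values_as_text)        # "character"

# Convert the text to numbers so R can do arithmetic on it
values_as_numbers <- as.numeric(values_as_text)
class(values_as_numbers)     # "numeric"
mean(values_as_numbers)

# Anything that can't be read as a number becomes NA (R also prints a warning,
# which is silenced here to keep the output tidy)
suppressWarnings(as.numeric(c("1.5", "not a number")))
```

If you ever see unexpected `NA` values after `as.numeric()`, it usually means some entries in the column weren't plain numbers to begin with.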
ggplot(df_filtered_stability, aes(x = Year, y = value_num)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = c(2021, 2022)) +
  labs(
    x = "Year",
    y = "Political stability index",
    title = "Political Stability, 2021–2022"
  ) +
  theme_minimal()
# Simple box plot
ggplot(df_filtered_stability, aes(x = factor(Year), y = value_num)) +
  geom_boxplot() +
  labs(
    x = "Year",
    y = "Political stability index",
    title = "Political Stability by Year"
  ) +
  theme_minimal()
# Pretty box plot
ggplot(df_filtered_stability, aes(x = factor(Year), y = value_num, fill = factor(Year))) +
  geom_boxplot(
    width = 0.6,
    alpha = 0.8,
    outlier.shape = NA
  ) +
  geom_jitter(
    width = 0.15,
    alpha = 0.35,
    size = 1.5
  ) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    x = "Year",
    y = "Political stability index",
    title = "Political Stability and Absence of Violence/Terrorism",
    subtitle = "Distribution by year"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "none",
    plot.title = element_text(face = "bold"),
    axis.title = element_text(face = "bold")
  )
# Violin plot
temp_plot <- ggplot(df_filtered_stability, aes(x = factor(Year), y = value_num, fill = factor(Year))) +
  geom_violin(
    trim = FALSE,
    alpha = 0.8
  ) +
  geom_boxplot(
    width = 0.12,
    fill = "white",
    outlier.shape = NA
  ) +
  geom_jitter(
    width = 0.1,
    alpha = 0.3,
    size = 1.3
  ) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    x = "Year",
    y = "Political stability index",
    title = "Political Stability and Absence of Violence/Terrorism",
    subtitle = "Distribution by year"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    legend.position = "none",
    plot.title = element_text(face = "bold"),
    axis.title = element_text(face = "bold")
  )
# Saving the plot as an image
ggsave(
  filename = "political_stability_violin_2021_2022_2.png",
  plot = temp_plot,
  width = 8,
  height = 6,
  dpi = 300
)
Let’s investigate the relationship between political stability and GDP.
# Let's see how stability and GDP compare
reg_df <- food_security_df_modified %>%
  filter(
    Item == "Political stability and absence of violence/terrorism (index)" |
      Item == "Gross domestic product per capita, PPP, (constant 2021 international $)"
  )
reg_df_years <- reg_df %>%
  filter(
    Year == 2021 | Year == 2022
  )
# Reshaping the data so we can run the regressions. Right now it's long; we need it wide (i.e., we're going to pivot the table)
df_wide <- reg_df_years %>%
  filter(Item %in% c(
    "Gross domestic product per capita, PPP, (constant 2021 international $)",
    "Political stability and absence of violence/terrorism (index)"
  )) %>%
  # making sure our values are treated as numbers
  mutate(
    Value = as.numeric(Value), # in case Value is stored as character
    Year = as.integer(Year)
  ) %>%
  # pivoting the table
  select(Area, Year, Item, Value) %>%
  pivot_wider(
    names_from = Item,
    values_from = Value
  ) %>%
  # renaming columns for ease of use
  rename(
    gdp = `Gross domestic product per capita, PPP, (constant 2021 international $)`,
    pol_stab = `Political stability and absence of violence/terrorism (index)`
  ) %>%
  # getting rid of any blank values is good practice
  drop_na(gdp, pol_stab)
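If "long" versus "wide" is hard to picture, a tiny made-up table can help. The areas and numbers below are invented for illustration (they are not from the FAOSTAT file), and the sketch assumes the `tidyr` package is installed, which it is if you installed `tidyverse`.

```r
library(tidyr)

# Long format: one row per (Area, Year, Item) combination
toy_long <- data.frame(
  Area  = c("A", "A", "B", "B"),
  Year  = c(2021L, 2021L, 2021L, 2021L),
  Item  = c("gdp", "pol_stab", "gdp", "pol_stab"),
  Value = c(1000, 0.5, 2000, -0.2)
)

# Wide format: one row per (Area, Year), one column per Item
toy_wide <- pivot_wider(toy_long, names_from = Item, values_from = Value)
toy_wide
```

The wide table has one row per area and year, with `gdp` and `pol_stab` side by side. That's the shape `lm()` needs, since each variable in the formula has to be its own column.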
#running the regression
# Raw GDP
m1 <- lm(gdp ~ pol_stab, data = df_wide)
summary(m1)
##
## Call:
## lm(formula = gdp ~ pol_stab, data = df_wide)
##
## Residuals:
## Min 1Q Median 3Q Max
## -39475 -13757 -2334 10534 93768
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26106 1158 22.54 <2e-16 ***
## pol_stab 14200 1196 11.87 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22430 on 374 degrees of freedom
## Multiple R-squared: 0.2737, Adjusted R-squared: 0.2718
## F-statistic: 141 on 1 and 374 DF, p-value: < 2.2e-16
# Log GDP, since it tends to be skewed right
m2 <- lm(log(gdp) ~ pol_stab, data = df_wide)
summary(m2)
##
## Call:
## lm(formula = log(gdp) ~ pol_stab, data = df_wide)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3856 -0.6761 0.2010 0.6945 2.0400
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.60146 0.04899 195.99 <2e-16 ***
## pol_stab 0.69955 0.05058 13.83 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9488 on 374 degrees of freedom
## Multiple R-squared: 0.3384, Adjusted R-squared: 0.3366
## F-statistic: 191.3 on 1 and 374 DF, p-value: < 2.2e-16
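Because the outcome is now `log(gdp)`, the `pol_stab` coefficient isn't in dollars anymore. One common way to read it is as a multiplier on GDP. The sketch below copies the estimate from the `summary(m2)` output above and does the arithmetic. Treat the result as an association in this sample, not proof that stability causes higher GDP.

```r
# Coefficient on pol_stab, copied by hand from the summary(m2) output
b_pol_stab <- 0.69955

# In a model with a logged outcome, a one-unit increase in x
# multiplies the predicted outcome by exp(b)
multiplier <- exp(b_pol_stab)
pct_change <- (multiplier - 1) * 100

multiplier   # about 2, i.e. roughly double
pct_change   # about 101 (percent)
```

So a one-point increase in the stability index goes along with roughly twice the GDP per capita in this model.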
# With time fixed effects
m3 <- lm(log(gdp) ~ pol_stab + factor(Year), data = df_wide)
summary(m3)
##
## Call:
## lm(formula = log(gdp) ~ pol_stab + factor(Year), data = df_wide)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.3680 -0.6812 0.2064 0.6954 2.0225
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.58378 0.06950 137.894 <2e-16 ***
## pol_stab 0.69959 0.05064 13.815 <2e-16 ***
## factor(Year)2022 0.03519 0.09797 0.359 0.72
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9499 on 373 degrees of freedom
## Multiple R-squared: 0.3386, Adjusted R-squared: 0.3351
## F-statistic: 95.48 on 2 and 373 DF, p-value: < 2.2e-16
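Wrapping `Year` in `factor()` tells `lm()` to treat it as a category rather than a number. Behind the scenes, R builds a 0/1 column for every year except the first one (the baseline, 2021 here), which is why the output has a row called `factor(Year)2022`. A small sketch with made-up rows shows what that column looks like:

```r
# Four made-up observations, two per year
toy <- data.frame(Year = c(2021, 2021, 2022, 2022))

# model.matrix() shows the columns lm() would build from the formula
dummy_columns <- model.matrix(~ factor(Year), data = toy)
dummy_columns
```

The `factor(Year)2022` column is 0 for the 2021 rows and 1 for the 2022 rows, so its coefficient is the average difference in `log(gdp)` between 2022 and 2021 after accounting for `pol_stab`.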
# Controlling for area
m4 <- lm(log(gdp) ~ pol_stab + factor(Area) + factor(Year), data = df_wide)
summary(m4)
ggplot(df_wide, aes(x = pol_stab, y = log(gdp))) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = TRUE) +
  theme_minimal() +
  labs(
    x = "Political stability and absence of violence/terrorism (index)",
    y = "log(GDP per capita, PPP)",
    title = "GDP and Political Stability"
  )
## `geom_smooth()` using formula = 'y ~ x'
library(broom)
library(gt)
tidy(m1) %>%
  gt() %>%
  fmt_number(columns = c(estimate, std.error), decimals = 3) %>%
  tab_header(
    title = "Regression results",
    subtitle = "Dependent variable: GDP per capita"
  )
Regression results
Dependent variable: GDP per capita

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 26,105.629 | 1,158.398 | 22.53597 | 1.176300e-71 |
| pol_stab | 14,200.471 | 1,196.006 | 11.87324 | 8.254138e-28 |
# Making things nicer by rounding the values
tidy(m2) %>%
  mutate(
    estimate = round(estimate, 3),
    std.error = round(std.error, 3),
    statistic = round(statistic, 2),
    p.value = round(p.value, 3)
  ) %>%
  gt() %>%
  fmt_number(columns = c(estimate, std.error), decimals = 2) %>%
  tab_header(
    title = "Regression results",
    subtitle = "Dependent variable: log(GDP per capita)"
  )
Regression results
Dependent variable: log(GDP per capita)

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 9.60 | 0.05 | 195.99 | 0 |
| pol_stab | 0.70 | 0.05 | 13.83 | 0 |
| Chunk option | What it controls | Why it matters in reports | Plain-language explanation |
|---|---|---|---|
| `echo` | Whether the R code is shown | Separates analysis from presentation | Show or hide the code |
| `eval` | Whether the code is executed | Lets you display example code without running it | Run this code or not |
| `message` | Package and function messages | Keeps output clean and professional | Hide messages like “Attaching package…” |
| `warning` | Warning messages | Prevents confusing output for readers | Show or hide warnings |
| `include` | Whether code and results appear | Allows silent background computation | Run this but don’t show anything |
| `results` | How printed output is treated | Required for tables and formatted text | Control how output is displayed |
| `fig.width` / `fig.height` | Figure size (in inches) | Ensures readable, publication-ready figures | Control plot size |
| `fig.cap` | Figure caption text | Enables figure numbering and captions | Text shown under the figure |
| `cache` | Whether results are saved | Speeds up slow reports | Don’t re-run unless code changes |
| `error` | Behavior when errors occur | Useful for teaching and debugging | Keep knitting even if this fails |
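These options can be set for a single chunk in its header (for example `{r, echo = FALSE}`), or once for the whole document. The sketch below shows the whole-document version using `knitr::opts_chunk$set()`. The specific values are only an example, not settings this tutorial requires.

```r
# Usually placed in a setup chunk near the top of the .Rmd file.
# These values are just an example; change them to suit your report.
knitr::opts_chunk$set(
  echo = FALSE,      # hide the code
  message = FALSE,   # hide package messages
  warning = FALSE,   # hide warnings
  fig.width = 8,     # figure width in inches
  fig.height = 6     # figure height in inches
)
```

Any option set in an individual chunk header overrides these defaults for that chunk.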
lm(y ~ x, data = df)
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00