Data Visualization using R

Assignment — LA1

Anusha Y K & Asha N Bhat

Overview

What this assignment explores:

    1. Extract live data from Our World in Data via a direct CSV URL.
    2. Clean and filter the dataset: remove regional aggregates and handle missing values.
    3. Map countries to continents using the countrycode package.
    4. Generate a categorical bubble chart with ggplot2 and interpret the health-wealth relationship.

Step 1

Load required libraries

The three-package toolkit for this analysis is:

  • ggplot2 - R's widely used grammar-of-graphics plotting package. It draws the bubble chart.

  • dplyr - Provides pipe-friendly verbs (filter(), mutate()) for cleaning and reshaping the dataset.

  • countrycode - Converts country names into standardized continent labels, which form the categorical grouping for our chart.
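A minimal setup sketch for the three packages listed above. The install line is commented out because it only needs to run once per machine.

```r
# One-time setup, only if the packages are missing:
# install.packages(c("ggplot2", "dplyr", "countrycode"))

library(ggplot2)      # plotting with the grammar of graphics
library(dplyr)        # filter(), mutate(), and the %>% pipe
library(countrycode)  # country name -> continent lookup
```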

Step 2

Data extraction

Pulling live data from Our World in Data

Source - Our World in Data “Life Expectancy vs GDP per Capita” dataset, loaded directly via url() with no manual download.

Column standardization - All column names are lowercased and every non-alphanumeric character is replaced with an underscore. This gives predictable names and prevents “object not found” errors when columns are referenced during plotting.

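Code
# Our World in Data grapher export for "Life expectancy vs GDP per capita"
url_path <- "https://ourworldindata.org/grapher/life-expectancy-vs-gdp-per-capita.csv"

# read.csv() reads straight from the URL connection; no manual download needed
data_raw <- read.csv(url(url_path))

# Standardize column names: lowercase, then replace every non-alphanumeric
# character (read.csv() has already turned spaces into ".") with "_"
names(data_raw) <- tolower(names(data_raw))
names(data_raw) <- gsub("[^a-z0-9]", "_", names(data_raw))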
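To see what the two renaming lines above actually produce, here is a small offline sketch. The header strings are hypothetical examples, not the exact OWID column names. read.csv() (with its default check.names = TRUE) first passes headers through make.names(), and the lowercase and underscore steps are then applied.

```r
# Hypothetical example headers; the real OWID column names may differ
raw_headers <- c("Entity", "Code", "GDP per capita, PPP", "Population (historical)")

# What read.csv() does by default: invalid characters become "."
after_read <- make.names(raw_headers)

# The two standardization steps from the code above
cleaned <- tolower(after_read)
cleaned <- gsub("[^a-z0-9]", "_", cleaned)

print(after_read)  # "Entity" "Code" "GDP.per.capita..PPP" "Population..historical."
print(cleaned)     # "entity" "code" "gdp_per_capita__ppp" "population__historical_"
```

Note the doubled and trailing underscores. If tidier names are wanted, an extra gsub("_+", "_", x) followed by gsub("^_|_$", "", x) would collapse and trim them. That refinement is a suggestion, not part of the original code.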

Step 3

Data cleaning & categorical grouping

  • Filtering, renaming, and mapping countries to continents

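  • Why filter? The raw data includes rows such as “World” and “High Income” that are regional or income-group aggregates, not countries. Aggregates without a country code are removed by keeping only rows whose code column is non-empty. “World” carries the special code OWID_WRL, however, so it must be excluded separately (or dropped later, when countrycode() finds no continent for it).

  • Continent mapping - countrycode() translates country names into the continent labels Africa, Americas, Asia, Europe, and Oceania, which become the color grouping in the chart.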
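The cleaning steps are described above without code. Below is a minimal dplyr sketch of them. It assumes the standardized table has columns named entity, code, life_expectancy, gdp_per_capita, and population. These short names are hypothetical: the real OWID headers are longer and would first be shortened with rename(). The toy rows and values are made up purely for illustration. The toy table also holds a single year, and the source does not say which year its chart uses.

```r
library(dplyr)
library(countrycode)

# Toy stand-in for the standardized OWID table (made-up values, hypothetical column names)
data_raw <- data.frame(
  entity          = c("World", "High-income countries", "Kenya", "France", "Japan", "Brazil"),
  code            = c("OWID_WRL", "", "KEN", "FRA", "JPN", "BRA"),
  year            = 2020,
  life_expectancy = c(70, 80, 60, 80, 85, 75),
  gdp_per_capita  = c(15000, 45000, 5000, 45000, 40000, NA),
  population      = c(8e9, 1e9, 5e7, 7e7, 1.2e8, 2e8)
)

data_clean <- data_raw %>%
  # Drop aggregates: rows with no code, plus "World" (code OWID_WRL)
  filter(code != "", code != "OWID_WRL") %>%
  # Handle missing values: keep only rows that have all three plotted variables
  filter(!is.na(life_expectancy), !is.na(gdp_per_capita), !is.na(population)) %>%
  # Map country names to continent labels
  mutate(continent = countrycode(entity, origin = "country.name", destination = "continent")) %>%
  # Any row countrycode could not match is not a country; drop it
  filter(!is.na(continent))

print(data_clean[, c("entity", "continent")])
```

Matching on the ISO code instead (countrycode(code, origin = "iso3c", destination = "continent")) is an alternative that avoids name-spelling issues.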

Step 4

The bubble chart - Visualizing three numeric dimensions and one category at once

Aesthetic mappings

  • x axis — GDP per capita (log scale)

  • y axis — Life expectancy in years

  • size — Population of the country

  • color — Continent category

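Why log scale on x? - GDP per capita ranges from about $800 to about $100,000. On a linear axis, the few very rich countries stretch the scale, so most countries are squeezed against the left edge and the chart becomes hard to read. scale_x_log10() gives equal ratios equal spacing, so $1,000 to $10,000 takes up as much width as $10,000 to $100,000.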
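A quick worked check of the log-scale argument uses three illustrative GDP values spanning the quoted range. It computes where each value would sit along the axis, from 0 (left) to 1 (right), on a linear axis and on a log10 axis.

```r
# Illustrative GDP per capita values spanning the range quoted above
gdp <- c(800, 10000, 100000)

# Relative axis position, rescaled to 0 (left edge) .. 1 (right edge)
linear_pos <- (gdp - min(gdp)) / (max(gdp) - min(gdp))
log_pos    <- (log10(gdp) - log10(min(gdp))) / (log10(max(gdp)) - log10(min(gdp)))

round(linear_pos, 3)  # 0.000 0.093 1.000
round(log_pos, 3)     # 0.000 0.523 1.000
```

On a linear axis, a country at $10,000 sits less than a tenth of the way across. On a log axis, it sits roughly in the middle, which is why middle-income countries stay distinguishable.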

BUBBLE CHART

ggplot2 code - Full plotting block with chunk options
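The original plotting block and its chunk options are not reproduced in this text. Below is a minimal sketch, not the authors' code, of a ggplot2 bubble chart that uses the four aesthetic mappings listed above, alpha transparency, and size encoding. It runs on a made-up toy table. The column names (gdp_per_capita, life_expectancy, population, continent) are hypothetical stand-ins for the standardized OWID columns.

```r
library(ggplot2)

# Toy stand-in for the cleaned data (made-up values, hypothetical column names)
data_clean <- data.frame(
  entity          = c("Country A", "Country B", "Country C", "Country D", "Country E"),
  continent       = c("Africa", "Asia", "Americas", "Europe", "Oceania"),
  gdp_per_capita  = c(2000, 8000, 15000, 45000, 50000),
  life_expectancy = c(62, 72, 76, 82, 83),
  population      = c(4e7, 9e8, 1e8, 6e7, 2e7)
)

p <- ggplot(
  data_clean,
  aes(x = gdp_per_capita, y = life_expectancy,
      size = population, colour = continent)
) +
  geom_point(alpha = 0.6) +                          # transparency keeps overlapping bubbles visible
  scale_x_log10(labels = scales::label_dollar()) +   # log10 x axis with dollar tick labels
  scale_size_area(max_size = 15, name = "Population",
                  labels = scales::label_comma()) +  # bubble area proportional to population
  labs(
    title  = "Life expectancy vs GDP per capita",
    x      = "GDP per capita (log scale)",
    y      = "Life expectancy (years)",
    colour = "Continent"
  ) +
  theme_minimal()

print(p)
```

In R Markdown, this code would sit in a chunk whose header carries the figure options, for example {r bubble-chart, fig.width = 9, fig.height = 6}. Those values are placeholders because the original chunk options are not given. scale_size_area() is used so that bubble area, rather than radius, tracks population.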

Key insights

  • What the chart reveals about global health and wealth: a regional breakdown

  • AF - Africa: predominantly in the lower-left quadrant, with significant room for growth in both wealth and life expectancy

  • AS - Asia: the largest bubbles (India, China); a middle band in transition, with very large populations and rising longevity across a wide range of GDP

  • AM - Americas: a wide spread; North America leads in GDP, while Latin America shows strong health outcomes relative to income

  • EU - Europe: clustered in the top-right corner, reflecting high longevity and economic stability
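The quadrant readings above come from inspecting the chart. As a sketch of how they could be checked numerically, per-continent medians can be summarised with dplyr. The table here is a made-up toy example with hypothetical column names, not OWID data.

```r
library(dplyr)

# Toy stand-in for the cleaned data (made-up values, hypothetical column names)
data_clean <- data.frame(
  continent       = c("Africa", "Africa", "Europe", "Europe", "Asia", "Asia"),
  gdp_per_capita  = c(2000, 6000, 40000, 50000, 5000, 20000),
  life_expectancy = c(60, 66, 81, 83, 70, 78),
  population      = c(5e7, 2e7, 6e7, 8e7, 1.4e9, 5e7)
)

continent_summary <- data_clean %>%
  group_by(continent) %>%
  summarise(
    median_gdp  = median(gdp_per_capita),   # typical position on the x axis
    median_life = median(life_expectancy),  # typical position on the y axis
    total_pop   = sum(population),          # combined bubble "weight"
    .groups = "drop"
  ) %>%
  arrange(median_gdp)

print(continent_summary)
```

Medians are used rather than means so that a single very rich or very poor country does not pull a continent's summary.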

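Conclusion - Summary and takeaways

What we showed

  • A raw CSV of thousands of rows was transformed into a single, intuitive visual narrative that highlights global inequality, population scale, and the diminishing returns of wealth on health. Because the x axis is logarithmic, a steady rise across it means each additional dollar of GDP per capita buys less extra life expectancy as incomes grow.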

  • Techniques used:

      • live data extraction

      • dplyr pipes

      • countrycode mapping

      • log scale

      • alpha transparency

      • size encoding
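The "diminishing returns" takeaway can be made precise under one assumption that the source does not state: the trend in the chart is roughly a straight line against log GDP. Write life expectancy as e, GDP per capita as g, and the hypothetical fitted intercept and slope as α and β:

```latex
e(g) = \alpha + \beta \log_{10} g,
\qquad
\frac{de}{dg} = \frac{\beta}{g \ln 10},
\qquad
e(2g) - e(g) = \beta \log_{10} 2 \approx 0.301\,\beta .
```

Every doubling of income adds the same number of years, so going from $1,000 to $2,000 (an extra $1,000) buys as much life expectancy as going from $50,000 to $100,000 (an extra $50,000). The marginal gain per dollar, β / (g ln 10), therefore shrinks as g grows.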