Sushmitha - 1NT23IS227, Sirisha B.A - 1NT23IS211

Plot a bubble chart showing life expectancy vs GDP per capita, sized by population and colored by continent.

Dataset: https://ourworldindata.org/grapher/life-expectancy-vs-gdp-per-capita

# Load necessary libraries
library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(readr)
library(scales)

Attaching package: 'scales'
The following object is masked from 'package:readr':

    col_factor
library(countrycode)

# Set working directory (update this path if needed)
setwd("C:/Users/sushm/OneDrive/Documents")

# Load dataset
df <- read_csv("life-expectancy-vs-gdp-per-capita.csv")
Rows: 64877 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): Entity, Code, 900793-annotations, World regions according to OWID
dbl (4): Year, Period life expectancy at birth - Sex: total - Age: 0, GDP pe...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Rename columns for clarity
colnames(df) <- c("Country", "Code", "Year", "Life_Expectancy", "GDP_per_Capita", 
                   "Annotations", "Population", "Continent")

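Renaming by position silently mislabels columns if the CSV layout ever changes. A minimal sketch of a guard, assuming the headers visible in the read_csv() output (Entity, Code, Year, World regions according to OWID) sit in the positions the rename above expects. The helper name rename_checked and the toy data frame are hypothetical, and "col4" to "col7" are placeholders rather than the real headers.

```r
# Hypothetical helper: rename by position only after confirming that the
# headers we can identify are where the positional rename expects them.
rename_checked <- function(data, new_names) {
  stopifnot(ncol(data) == length(new_names))
  stopifnot(identical(names(data)[1:3], c("Entity", "Code", "Year")))
  stopifnot(names(data)[ncol(data)] == "World regions according to OWID")
  names(data) <- new_names
  data
}

# Toy one-row frame; "col4" to "col7" are placeholders, not the real headers
toy <- data.frame(Entity = "A", Code = "AAA", Year = 2000,
                  col4 = 1, col5 = 1, col6 = "", col7 = 1,
                  check.names = FALSE)
toy[["World regions according to OWID"]] <- "X"

renamed <- rename_checked(
  toy,
  c("Country", "Code", "Year", "Life_Expectancy", "GDP_per_Capita",
    "Annotations", "Population", "Continent")
)
names(renamed)
```

If a check fails, stopifnot() halts with an error instead of letting mislabelled columns flow into the plot.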
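# Assign continent names using the countrycode package. This overwrites the
# OWID region column; aggregates and historical entities with no match become NA.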
df$Continent <- countrycode(df$Country, "country.name", "continent")
Warning: Some values were not matched unambiguously: Africa, Africa (UN), Akrotiri and Dhekelia, Americas, Americas (UN), Asia, Asia (excl. China and India), Asia (UN), Austria-Hungary, British Indian Ocean Territory, Cocos Islands, Czechoslovakia, Duchy of Modena and Reggio, Duchy of Parma and Piacenza, East Asia (MPD), East Germany, Eastern Europe (MPD), England and Wales, Europe, Europe (UN), European Union (27), Federal Republic of Central America, Grand Duchy of Baden, Grand Duchy of Tuscany, High-and-upper-middle-income countries, High-income countries, Kingdom of Bavaria, Kingdom of Sardinia, Kingdom of Saxony, Kingdom of the Two Sicilies, Kingdom of Wurttemberg, Kosovo, Land-locked Developing Countries (LLDC), Latin America (MPD), Latin America and the Caribbean, Latin America and the Caribbean (UN), Least developed countries, Less developed regions, Less developed regions, excluding least developed countries, Low-and-Lower-middle-income countries, Low-and-middle-income countries, Low-income countries, Lower-middle-income countries, Micronesia (country), Middle-income countries, Middle East and North Africa (MPD), More developed regions, No income group available, North America, Northern America, Northern America (UN), Northern Ireland, Oceania, Oceania (UN), Orange Free State, Scotland, Serbia and Montenegro, Small Island Developing States (SIDS), South America, South and South East Asia (MPD), South Georgia and the South Sandwich Islands, Sub Saharan Africa (MPD), Upper-middle-income countries, Western Europe (MPD), Western offshoots (MPD), World, Yemen Arab Republic, Yemen People's Republic, Yugoslavia
Warning: Some strings were matched more than once, and therefore set to <NA> in the result: Asia (excl. China and India),Asia,Asia
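The warnings above come from matching on entity names, which include aggregates such as "World" and historical states. One alternative, sketched here under assumptions, is to match on the ISO code column instead and handle special cases by hand. The toy vector below stands in for df$Code; "OWID_WRL" and "OWID_KOS" are assumed to be the OWID-style codes for the world aggregate and Kosovo, and assigning Kosovo to Europe is a manual choice, not something the dataset states.

```r
library(countrycode)

# Toy vector standing in for df$Code (the last two codes are assumptions)
codes <- c("IND", "BRA", "OWID_WRL", "OWID_KOS")

continent <- countrycode(
  codes,
  origin = "iso3c",
  destination = "continent",
  warn = FALSE,                             # unmatched codes become NA without a warning
  custom_match = c("OWID_KOS" = "Europe")   # assumption: manual override for Kosovo
)

data.frame(code = codes, continent = continent)
```

Codes that are not valid ISO3 values stay NA, so the later step that drops rows with a missing continent still removes the aggregates.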
# Remove rows with missing continent values
df_clean <- df[!is.na(df$Continent), ]

# Get unique continents
continents <- unique(df_clean$Continent)

# Loop through each continent and generate a separate graph
for (continent in continents) {
  continent_data <- df_clean %>% filter(Continent == continent)
  
  # Plot bubble chart for each continent
  p <- ggplot(continent_data, aes(x = GDP_per_Capita, y = Life_Expectancy, 
                                  size = Population, color = Continent)) +
    geom_point(alpha = 0.5) +  # Reduce opacity to avoid clutter
    scale_x_log10(labels = scales::dollar_format()) +  # Log scale for GDP per capita
    scale_size(range = c(0.1,2), name = "Population") +  # Adjust bubble sizes
    labs(
      title = paste("Life Expectancy vs GDP per Capita -", continent),
      x = "GDP per Capita (log scale)",
      y = "Life Expectancy",
      size = "Population",
      color = "Continent"
    ) +
    theme_minimal() +  # Clean theme
    theme(legend.position = "right")  # Move legend to the right for better visibility
  
  # Print the plot
  print(p)
}
Warning: Removed 10032 rows containing missing values or values outside the scale range
(`geom_point()`).

Warning: Removed 11784 rows containing missing values or values outside the scale range
(`geom_point()`).

Warning: Removed 10906 rows containing missing values or values outside the scale range
(`geom_point()`).

Warning: Removed 3766 rows containing missing values or values outside the scale range
(`geom_point()`).

Warning: Removed 9400 rows containing missing values or values outside the scale range
(`geom_point()`).
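
The loop above draws one chart per continent and pools every year, so the colour never varies within a plot and rows with missing values are dropped with a warning. A minimal sketch of a single chart that matches the task statement more closely: one year, all continents on shared axes, missing rows removed beforehand. The helper names prepare_year and plot_bubble, the year 2020, and the toy rows are all hypothetical; the toy values are made up and only show how the calls fit together.

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper: keep one year and drop rows missing any plotted variable
prepare_year <- function(data, target_year) {
  data %>%
    filter(Year == target_year,
           !is.na(GDP_per_Capita), !is.na(Life_Expectancy),
           !is.na(Population), !is.na(Continent))
}

# Hypothetical helper: one bubble chart with all continents on shared axes
plot_bubble <- function(plot_data, target_year) {
  ggplot(plot_data, aes(x = GDP_per_Capita, y = Life_Expectancy,
                        size = Population, color = Continent)) +
    geom_point(alpha = 0.6) +
    scale_x_log10(labels = scales::label_dollar()) +   # log scale for GDP per capita
    scale_size_area(max_size = 12, labels = scales::label_comma()) +
    labs(
      title = paste("Life Expectancy vs GDP per Capita,", target_year),
      x = "GDP per Capita (log scale)",
      y = "Life Expectancy",
      size = "Population",
      color = "Continent"
    ) +
    theme_minimal()
}

# Made-up placeholder rows, only to show the calls; not real figures
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  Year = c(2020, 2020, 2020, 2019),
  Life_Expectancy = c(70, 80, NA, 75),
  GDP_per_Capita = c(2000, 40000, 10000, 15000),
  Population = c(5e6, 6e7, 1e6, 2e7),
  Continent = c("Africa", "Europe", "Asia", "Americas")
)

toy_2020 <- prepare_year(toy, 2020)
p <- plot_bubble(toy_2020, 2020)
print(p)
```

With the real data the call would be plot_bubble(prepare_year(df_clean, 2020), 2020), choosing a year for which GDP, life expectancy and population are all recorded.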