In this analysis, we aim to study global trends using the Gapminder dataset. Specifically, we will:
1. Explore life expectancy trends across continents.
2. Investigate the relationship between GDP per capita and life expectancy.
3. Visualize population distribution geographically.
We will use the Gapminder dataset, available in R through the
gapminder package. Using dplyr, we will filter the data to a single
year, select the relevant columns, and derive a total-GDP column.
# Load dplyr (provides %>%, filter, select, mutate) and the Gapminder dataset
library(dplyr)
library(gapminder)
## Warning: package 'gapminder' was built under R version 4.4.2
data <- gapminder
# Filter data for the year 2007
filtered_data <- data %>%
  filter(year == 2007) %>%
  select(country, continent, year, lifeExp, gdpPercap, pop)
# Create a new column for GDP in billions
filtered_data <- filtered_data %>%
  mutate(gdp_billion = (gdpPercap * pop) / 1e9)
# Display the first few rows of the wrangled data
head(filtered_data)
## # A tibble: 6 × 7
##   country     continent  year lifeExp gdpPercap      pop gdp_billion
##   <fct>       <fct>     <int>   <dbl>     <dbl>    <int>       <dbl>
## 1 Afghanistan Asia       2007    43.8      975. 31889923        31.1
## 2 Albania     Europe     2007    76.4     5937.  3600523        21.4
## 3 Algeria     Africa     2007    72.3     6223. 33333216       207.
## 4 Angola      Africa     2007    42.7     4797. 12420476        59.6
## 5 Argentina   Americas   2007    75.3    12779. 40301927       515.
## 6 Australia   Oceania    2007    81.2    34435. 20434176       704.
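As a quick sanity check on the derived column, the arithmetic can be reproduced by hand. The sketch below uses the rounded values printed above for two countries, so it agrees with the table only to about three significant figures. The `check` data frame is a name introduced here for illustration and is not part of the original analysis.

```r
# Rounded values copied from the head(filtered_data) output (illustrative only)
check <- data.frame(
  country   = c("Afghanistan", "Albania"),
  gdpPercap = c(975, 5937),
  pop       = c(31889923, 3600523)
)

# Total GDP = GDP per capita x population, rescaled to billions
check$gdp_billion <- check$gdpPercap * check$pop / 1e9

# Round to three significant figures, as the tibble printout does
print(signif(check$gdp_billion, 3))
```

Because the inputs are already rounded, small differences from the full-precision column are expected beyond the third significant figure.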
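We will generate a summary table showing the top 5 countries by life expectancy in 2007 for each continent. Oceania contributes only two rows because the dataset contains only two countries for that continent.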
# Sort by life expectancy (descending), then keep the first 5 rows within each continent
summary_table <- filtered_data %>%
  group_by(continent) %>%
  arrange(desc(lifeExp)) %>%
  slice_head(n = 5) %>%
  ungroup()
# Display the table
knitr::kable(summary_table, caption = "Top 5 Countries by Life Expectancy in 2007 (Grouped by Continent)")
| country | continent | year | lifeExp | gdpPercap | pop | gdp_billion |
|---|---|---|---|---|---|---|
| Reunion | Africa | 2007 | 76.442 | 7670.123 | 798094 | 6.121479 |
| Libya | Africa | 2007 | 73.952 | 12057.499 | 6036914 | 72.790086 |
| Tunisia | Africa | 2007 | 73.923 | 7092.923 | 10276158 | 72.887998 |
| Mauritius | Africa | 2007 | 72.801 | 10956.991 | 1250882 | 13.705903 |
| Algeria | Africa | 2007 | 72.301 | 6223.367 | 33333216 | 207.444852 |
| Canada | Americas | 2007 | 80.653 | 36319.235 | 33390141 | 1212.704378 |
| Costa Rica | Americas | 2007 | 78.782 | 9645.061 | 4133884 | 39.871565 |
| Puerto Rico | Americas | 2007 | 78.746 | 19328.709 | 3942491 | 76.203261 |
| Chile | Americas | 2007 | 78.553 | 13171.639 | 16284741 | 214.496727 |
| Cuba | Americas | 2007 | 78.273 | 8948.103 | 11416987 | 102.160375 |
| Japan | Asia | 2007 | 82.603 | 31656.068 | 127467972 | 4035.134797 |
| Hong Kong, China | Asia | 2007 | 82.208 | 39724.979 | 6980412 | 277.296718 |
| Israel | Asia | 2007 | 80.745 | 25523.277 | 6426679 | 164.029909 |
| Singapore | Asia | 2007 | 79.972 | 47143.180 | 4553009 | 214.643321 |
| Korea, Rep. | Asia | 2007 | 78.623 | 23348.140 | 49044790 | 1145.104610 |
| Iceland | Europe | 2007 | 81.757 | 36180.789 | 301931 | 10.924102 |
| Switzerland | Europe | 2007 | 81.701 | 37506.419 | 7554661 | 283.348281 |
| Spain | Europe | 2007 | 80.941 | 28821.064 | 40448191 | 1165.759889 |
| Sweden | Europe | 2007 | 80.884 | 33859.748 | 9031088 | 305.790367 |
| France | Europe | 2007 | 80.657 | 30470.017 | 61083916 | 1861.227941 |
| Australia | Oceania | 2007 | 81.235 | 34435.367 | 20434176 | 703.658359 |
| New Zealand | Oceania | 2007 | 80.204 | 25185.009 | 4115771 | 103.655730 |
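The same "top rows per group" pattern can also be written with dplyr's `slice_max()`, which combines the sort and the slice in one step. The following is a minimal sketch on a small tibble whose values are copied from the table above. The names `toy` and `top2` are made up for this example, and `n = 2` is used only to keep the output short.

```r
library(dplyr)

# A few rows copied from the summary table (illustrative subset)
toy <- tibble(
  country   = c("Australia", "New Zealand", "Japan", "Israel", "Singapore"),
  continent = c("Oceania", "Oceania", "Asia", "Asia", "Asia"),
  lifeExp   = c(81.235, 80.204, 82.603, 80.745, 79.972)
)

# Keep the 2 highest life expectancies within each continent
top2 <- toy %>%
  group_by(continent) %>%
  slice_max(lifeExp, n = 2, with_ties = FALSE) %>%
  ungroup()

print(top2)
```

With `with_ties = FALSE` the result never exceeds `n` rows per group, which matches the behavior of `slice_head()` after sorting. The default (`with_ties = TRUE`) may return extra rows when values are tied.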
We will use ggplot2 to visualize the relationship
between GDP per capita and life expectancy.
# Load ggplot2 for plotting
library(ggplot2)
# Scatterplot of GDP per capita vs Life Expectancy
plot1 <- ggplot(filtered_data, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point(alpha = 0.7) +
  scale_x_log10() +
  labs(
    title = "Life Expectancy vs GDP Per Capita (2007)",
    x = "GDP Per Capita (Log Scale)",
    y = "Life Expectancy",
    color = "Continent",
    size = "Population"
  ) +
  theme_minimal()
plot1
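To illustrate why the x-axis uses a log scale, the sketch below compares the spread of GDP per capita on the raw and log10 scales. It uses only the six rounded rows printed earlier by `head(filtered_data)`, so the correlations it prints are illustrative and not a result of the full analysis. All variable names here are introduced for the example.

```r
# Rounded values from the six rows shown by head(filtered_data)
gdp_percap <- c(975, 5937, 6223, 4797, 12779, 34435)
life_exp   <- c(43.8, 76.4, 72.3, 42.7, 75.3, 81.2)

# On the raw scale the richest of these countries has roughly 35 times the
# GDP per capita of the poorest; on the log10 scale the span is about 1.5 units
raw_ratio <- max(gdp_percap) / min(gdp_percap)
log_span  <- diff(range(log10(gdp_percap)))

# Correlation with life expectancy on each scale (six points only, illustrative)
cor_raw <- cor(gdp_percap, life_exp)
cor_log <- cor(log10(gdp_percap), life_exp)

print(c(raw_ratio = raw_ratio, log_span = log_span,
        cor_raw = cor_raw, cor_log = cor_log))
```

Compressing the range this way keeps low-income countries from being squeezed against the left edge of the plot.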
Using leaflet, we will visualize the population
distribution for the year 2007.
# Load required libraries
library(dplyr)
library(leaflet)
library(gapminder)
# Filter data for the year 2007, keeping only the columns the map needs
filtered_data <- gapminder %>%
  filter(year == 2007) %>%
  select(country, pop, continent)
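# Load the coordinates data (must contain the columns name, latitude, and longitude).
# The path below is machine-specific; adjust it to where longitude.csv is stored.
coordinates_data <- read.csv("C:\\Users\\vaibhav gupta\\Downloads\\longitude.csv", stringsAsFactors = FALSE)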
# Join the filtered data with coordinates data; countries without a matching name are dropped
map_data <- filtered_data %>%
  inner_join(coordinates_data, by = c("country" = "name"))
# Scale marker radius by the square root of population so marker area tracks population
map_data <- map_data %>%
  mutate(radius = sqrt(pop) / 1000) # Adjust divisor for better scaling
# Create a Leaflet map to visualize population distribution
leaflet(data = map_data) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~longitude, lat = ~latitude,
    radius = ~radius, # Use the scaled radius
    popup = ~paste0(
      "<b>", country, "</b><br>",
      "Population: ", formatC(pop, format = "d", big.mark = ","),
      "<br>Continent: ", continent
    ),
    color = "blue", # Set circle color (can modify for continents)
    stroke = FALSE, fillOpacity = 0.7
  )
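Because `inner_join()` keeps only matching rows, any country whose name is spelled differently in the coordinates file disappears from the map without warning. One way to check for this is an `anti_join()`, sketched below on made-up stand-in data. The `pops` and `coords` tibbles, the approximate coordinates, and the "South Korea" spelling are assumptions for illustration and do not reflect the actual contents of longitude.csv.

```r
library(dplyr)

# Populations copied from the summary table above
pops <- tibble(
  country = c("Japan", "Australia", "Korea, Rep."),
  pop     = c(127467972, 20434176, 49044790)
)

# Hypothetical coordinates file with approximate values and a different
# spelling for one country
coords <- tibble(
  name      = c("Japan", "Australia", "South Korea"),
  latitude  = c(36, -25, 36),
  longitude = c(138, 133, 128)
)

# Rows of pops that have no match in coords would be dropped by inner_join()
unmatched <- anti_join(pops, coords, by = c("country" = "name"))
print(unmatched$country)
```

Running the same check with `filtered_data` and `coordinates_data` would list the countries missing from the map, which could then be fixed by recoding names before the join.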
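This analysis highlights global socio-economic disparities in 2007. For example, the highest life expectancy in Africa (Reunion, 76.4 years) is below the fifth-highest in Europe (France, 80.7 years). Gaps of this kind indicate where targeted interventions may matter most.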