Country-Level Data Analysis Using a Parallel Coordinates Plot

Author

Sanjana and Suchitra

Introduction

This report analyzes country-level indicators from the gapminder dataset (life expectancy, GDP per capita, and population) using visualization techniques in R.

Objective

To compare life expectancy, GDP per capita, and population across countries.

Step 1: Load Libraries

library(ggplot2)
library(dplyr)
library(GGally)
library(gapminder)

Step 2: Load Dataset

data <- gapminder
head(data)
# A tibble: 6 × 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1952    28.8  8425333      779.
2 Afghanistan Asia       1957    30.3  9240934      821.
3 Afghanistan Asia       1962    32.0 10267083      853.
4 Afghanistan Asia       1967    34.0 11537966      836.
5 Afghanistan Asia       1972    36.1 13079460      740.
6 Afghanistan Asia       1977    38.4 14880372      786.

Step 3: Data Preprocessing

data_latest <- data %>%
  filter(year == max(year))

health_data <- data_latest %>%
  select(country, lifeExp, gdpPercap, pop)

head(health_data)
# A tibble: 6 × 4
  country     lifeExp gdpPercap      pop
  <fct>         <dbl>     <dbl>    <int>
1 Afghanistan    43.8      975. 31889923
2 Albania        76.4     5937.  3600523
3 Algeria        72.3     6223. 33333216
4 Angola         42.7     4797. 12420476
5 Argentina      75.3    12779. 40301927
6 Australia      81.2    34435. 20434176
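
A quick sanity check can confirm that the filter kept exactly one row per country and that no indicator is missing. The sketch below rebuilds data_latest from the gapminder package so that it runs on its own; the names latest_year and rows_per_country are illustrative and not part of the original analysis.

```r
library(dplyr)
library(gapminder)

# Rebuild the Step 3 object so this check is self-contained
data_latest <- gapminder %>%
  filter(year == max(year))

# Most recent year present in the data
latest_year <- max(gapminder$year)

# Each country should appear exactly once after filtering
rows_per_country <- data_latest %>%
  count(country)

stopifnot(all(rows_per_country$n == 1))

# None of the three indicators should contain missing values
stopifnot(!anyNA(data_latest[, c("lifeExp", "gdpPercap", "pop")]))

latest_year
nrow(data_latest)
```

If either stopifnot() call fails, the later plots would mix several years per country or silently drop rows.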

Step 4: Bar Chart

top_life <- health_data %>%
  arrange(desc(lifeExp)) %>%
  head(10)

ggplot(top_life, aes(x = reorder(country, lifeExp), y = lifeExp)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Countries by Life Expectancy",
       x = "Country",
       y = "Life Expectancy") +
  theme_minimal()

Step 5: Scatter Plot

ggplot(health_data, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  # State the formula explicitly so geom_smooth() does not print a message
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "GDP vs Life Expectancy",
       x = "GDP per Capita",
       y = "Life Expectancy") +
  theme_minimal()
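
The strength of the relationship in the scatter plot can also be put into numbers. The following is a minimal sketch, assuming the same health_data as in Step 3 (rebuilt here so the block runs on its own). It computes the Pearson correlation on the raw GDP scale and on a log10 scale, plus the slope of the straight-line fit. The log10 comparison is an addition for illustration and is not part of the original analysis; the variable names cor_raw, cor_log, and slope are hypothetical.

```r
library(dplyr)
library(gapminder)

# Rebuild the Step 3 object so this block is self-contained
health_data <- gapminder %>%
  filter(year == max(year)) %>%
  select(country, lifeExp, gdpPercap, pop)

# Pearson correlation on the raw GDP scale
cor_raw <- cor(health_data$gdpPercap, health_data$lifeExp)

# Same correlation after log10-transforming GDP per capita
cor_log <- cor(log10(health_data$gdpPercap), health_data$lifeExp)

# Slope of the straight-line fit that geom_smooth(method = "lm") draws
fit <- lm(lifeExp ~ gdpPercap, data = health_data)
slope <- unname(coef(fit)["gdpPercap"])

c(raw = cor_raw, log10 = cor_log, slope = slope)
```

Comparing the two correlations gives a rough indication of whether a straight line on the raw scale is a reasonable summary, or whether a log-scaled x-axis would describe the pattern better.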

Step 6: Box Plot

ggplot(data_latest, aes(x = continent, y = lifeExp)) +
  geom_boxplot() +
  labs(title = "Life Expectancy by Continent",
       x = "Continent",
       y = "Life Expectancy") +
  theme_minimal()
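
The box plot can be backed by a numeric summary per continent. This sketch, assuming the data_latest object from Step 3 (rebuilt here so the block is self-contained), reports the median, interquartile range, and range of life expectancy for each continent. The name continent_summary and its column names are illustrative.

```r
library(dplyr)
library(gapminder)

# Rebuild the Step 3 object so this block is self-contained
data_latest <- gapminder %>%
  filter(year == max(year))

# One row per continent, sorted from lowest to highest median life expectancy
continent_summary <- data_latest %>%
  group_by(continent) %>%
  summarise(
    n_countries    = n(),
    median_lifeExp = median(lifeExp),
    iqr_lifeExp    = IQR(lifeExp),
    min_lifeExp    = min(lifeExp),
    max_lifeExp    = max(lifeExp),
    .groups = "drop"
  ) %>%
  arrange(median_lifeExp)

continent_summary
```

The median corresponds to the line inside each box and the IQR to the height of the box, so the table and the plot can be read side by side.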

Step 7: Parallel Coordinates Plot

parallel_data <- health_data %>%
  select(country, lifeExp, gdpPercap, pop)

ggparcoord(
  data = parallel_data,
  columns = 2:4,
  groupColumn = 1,
  scale = "uniminmax",
  alphaLines = 0.5
) +
  labs(title = "Parallel Coordinates Plot",
       x = "Indicators",
       y = "Scaled Values") +
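  theme_minimal() +
  # Hide the legend: one entry per country would crowd out the plot
  theme(legend.position = "none")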
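
The scale = "uniminmax" option rescales each variable separately so that its minimum maps to 0 and its maximum to 1. The sketch below reproduces that rescaling by hand to show what the y-axis values mean; rescale_minmax is a hypothetical helper written for this illustration, not a GGally function.

```r
library(dplyr)
library(gapminder)

# Rebuild the Step 3 object so this block is self-contained
health_data <- gapminder %>%
  filter(year == max(year)) %>%
  select(country, lifeExp, gdpPercap, pop)

# Hypothetical helper: rescale a numeric vector to the [0, 1] range
rescale_minmax <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Apply the rescaling to each indicator independently
scaled_data <- health_data %>%
  mutate(across(c(lifeExp, gdpPercap, pop), rescale_minmax))

summary(scaled_data[, c("lifeExp", "gdpPercap", "pop")])
```

Because each column is scaled on its own, a heavily skewed variable such as population places most countries near 0, with only the largest countries near 1.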

Step 8: Interpretation

- Countries with higher GDP per capita generally have higher life expectancy
- Europe shows consistently high life expectancy
- Africa has lower life expectancy and wider variation
- The parallel coordinates plot reveals clusters of similar-performing countries
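
One way to check the continent-level and cluster observations is to color the parallel coordinates lines by continent instead of by country. This is a sketch of a variant plot, not the plot used in Step 7; the names continent_data and p_continent are illustrative.

```r
library(dplyr)
library(ggplot2)
library(GGally)
library(gapminder)

# Keep continent as the grouping column, followed by the three indicators
continent_data <- gapminder %>%
  filter(year == max(year)) %>%
  select(continent, lifeExp, gdpPercap, pop) %>%
  as.data.frame()

# Same settings as Step 7, but lines are colored by continent (column 1)
p_continent <- ggparcoord(
  data = continent_data,
  columns = 2:4,
  groupColumn = 1,
  scale = "uniminmax",
  alphaLines = 0.5
) +
  labs(title = "Parallel Coordinates Plot by Continent",
       x = "Indicators",
       y = "Scaled Values",
       colour = "Continent") +
  theme_minimal()

p_continent
```

With only five colors, the legend stays readable and lines from the same continent that follow similar paths become easier to spot.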

Visualization

The visualizations compare health and economic indicators across countries. The bar chart ranks the ten countries with the highest life expectancy in the most recent year of data. The scatter plot indicates a positive relationship between GDP per capita and life expectancy, suggesting that wealthier nations tend to have better health outcomes. The box plot highlights differences in life expectancy between continents, with some continents showing a much wider spread than others. The parallel coordinates plot further reveals patterns, clusters, and disparities among countries across life expectancy, GDP per capita, and population.