Chapter 1 Introduction

1.1 Project introduction

The World Economic Data Visualization project uses graphical representations to illustrate and communicate economic data on a global scale. The objective is to make complex economic information more accessible, understandable, and insightful for the audience. Examples include maps or bar plots showing GDP by region, line graphs illustrating economic growth over time, and box or violin plots comparing economic indicators across continents.

1.2 Uniform plot environment: fonts, colors, spacing and layout

All visualizations share one theme so that the plots look cohesive. The theme is based on the “Lato” font. Axis titles are removed by default, and axis labels are rendered in a subtle gray. Text sizes and margins are adjusted for a balanced layout. Axis ticks, grid lines, and the plot and panel backgrounds all use light gray tones. Titles, subtitles, and captions follow one consistent style but differ in size and margins. This standardized design language improves the readability and coherence of the plots.

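# Packages used throughout this document
library(ggplot2)      # plotting and theming
library(dplyr)        # data manipulation and %>%
library(tidyr)        # pivot_longer(), drop_na()
library(forcats)      # fct_reorder()
library(lubridate)    # make_date()
library(ggtext)       # element_markdown()
library(rcartocolor)  # carto_pal()
library(ggstream)     # geom_stream()
library(gghighlight)  # gghighlight()
library(here)         # here()

# This theme extends the 'theme_minimal' that comes with ggplot2.
# The "Lato" font is used as the base font. This is similar
# to the original font in Cedric's work, Avenir Next Condensed.
theme_set(theme_minimal(base_family = "Lato"))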

theme_update(
  # Remove title for both x and y axes
  axis.title = element_blank(),
  # Axes labels are grey
  axis.text = element_text(color = "grey40"),
  # The x and y axis labels share the same size; only the margin side differs.
  axis.text.x = element_text(size = 24, margin = margin(t = 2)),
  axis.text.y = element_text(size = 24, margin = margin(r = 2)),
  # Also, the ticks have a very light grey color
  axis.ticks = element_line(color = "grey91", linewidth = .5),
  # Set the length of the axis ticks.
  axis.ticks.length.x = unit(0.1, "lines"),
  axis.ticks.length.y = unit(0.1, "lines"),
  # Uncomment to remove the grid lines that come with ggplot2 plots by default
  # panel.grid = element_blank(),
  # Customize margin values (top, right, bottom, left)
  plot.margin = ggplot2::margin(8, 20, 8, 20),
  # Use a light grey color for the background of both the plot and the panel
  plot.background = element_rect(fill = "grey98", color = "grey98"),
  panel.background = element_rect(fill = "grey98", color = "grey98"),
  # Customize title appearance
  plot.title = element_text(
    color = "grey5", 
    size = 34, 
    face = "bold",
    margin = margin(t = 15)
  ),
  # Customize subtitle appearance
  plot.subtitle = element_markdown(
    color = "grey30", 
    size = 22,
    lineheight = 0.4,
    margin = margin(t = 5, b = 12)
  ),
  # Title and caption are going to be aligned
  plot.title.position = "plot",  #whole plot top
  plot.caption.position = "plot",
  plot.caption = element_text(
    color = "grey30", 
    size = 18,
    lineheight = 1.2, 
    hjust = 0,
    margin = margin(t = 10) # Large margin on the top of the caption.
  )
)

Uniform use of colors

A consistent color scheme gives the visualizations a professional look. Two palettes are used throughout: a deeper one (pal) and a lighter one (light.pal), which together add visual depth and contrast.

pal=c("#003f5c",
      "#2f4b7c",
      "#665191",
      "#a05195",
      "#d45087",
      "#f95d6a",
      "#ff7c43",
      "#ffa600"
      )
light.pal=c("#E69F00", 
            "#56B4E9",
            "#009E73",
            "#F0E442",
            "#0072B2",
            "#999999")

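As a quick sanity check on the two palettes, the following sketch validates the hex codes and draws one swatch row per palette. The helper names `is_hex` and `show_palette` are illustrative and not part of the project code.

```r
# Palettes copied from the definitions above
pal <- c("#003f5c", "#2f4b7c", "#665191", "#a05195",
         "#d45087", "#f95d6a", "#ff7c43", "#ffa600")
light.pal <- c("#E69F00", "#56B4E9", "#009E73",
               "#F0E442", "#0072B2", "#999999")

# Hypothetical helper: TRUE for strings of the form "#RRGGBB"
is_hex <- function(x) grepl("^#[0-9A-Fa-f]{6}$", x)

# Hypothetical helper: draw one equal-height bar per color
show_palette <- function(colors, main) {
  barplot(rep(1, length(colors)), col = colors, border = NA,
          axes = FALSE, names.arg = colors, las = 2, main = main)
}

# Stop early if a color code was mistyped
stopifnot(all(is_hex(pal)), all(is_hex(light.pal)))

# Only draw the swatches in an interactive session
if (interactive()) {
  show_palette(pal, "pal")
  show_palette(light.pal, "light.pal")
}
```
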
Chapter 2 Dataset Preprocessing

2.1 Dataset description

For this project, I will use two datasets.

2.1.1 First dataset: World economic indicators in 2016

This dataset is a compilation of world development indicator data for the specific year 2016. The United Nations Statistics Division (UNSD) of the Department of Economic and Social Affairs (DESA) launched an internet-based data service for the global user community. It brings UN statistical databases within easy reach of users through a single entry point (http://data.un.org/), where users can search and download a variety of statistical resources of the UN system. The dataset contains key statistical indicators for each country. It covers 35 indicators:

  • Country: The name of the country.

  • Region: The geographical region to which the country belongs.

  • Surface Area (km2): The total land area of the country in square kilometers.

  • Population in thousands (2016): The total population of the country in thousands for the year 2016.

  • Population Density (per km2, 2016): The number of people per square kilometer of land in 2016.

  • Sex Ratio (m per 100 f, 2016): The number of males per 100 females in the population in 2016.

  • GDP (Gross Domestic Product): The total economic output of the country in current US dollars.

  • GDP Growth Rate: The annual percentage growth rate of the GDP, adjusted for inflation (constant 2005 prices).

  • GDP per Capita (current US$): The GDP divided by the population, giving the economic output per person.

  • Economy: Agriculture, Industry, Services (% of GVA): The percentage contribution of each sector to Gross Value Added (GVA) in the economy.

  • Employment: Agriculture, Industry, Services (% of employed): The percentage of the workforce employed in each sector.

  • Unemployment (% of labor force): The percentage of the labor force that is unemployed.

  • Labor Force Participation (female/male pop. %): The percentage of the female and male population that is part of the labor force.

  • Agricultural Production Index, Food Production Index: Indices reflecting the agricultural and food production levels.

  • International Trade: Exports, Imports, Balance (million US$): Information on a country’s international trade activities and balance.

  • Balance of Payments, Current Account (million US$): The balance of payments, accounting for all economic transactions between a country and the rest of the world.

  • Population Growth Rate (average annual %): The average annual percentage change in the population.

  • Urban Population (% of total population): The percentage of the population living in urban areas.

  • Urban Population Growth Rate (average annual %): The average annual percentage change in the urban population.

  • Fertility Rate, Total (live births per woman): The average number of live births per woman.

  • Life Expectancy at Birth (females/males, years): The average number of years a newborn is expected to live, separated by gender.

  • Population Age Distribution (0-14 / 60+ years, %): The percentage of the population in age brackets 0-14 and 60+ years.

  • International Migrant Stock, Refugees and others of concern to UNHCR: Information on international migration and refugee populations.

  • Infant Mortality Rate (per 1000 live births): The number of deaths of infants under one year old per 1000 live births.

  • Health: Total Expenditure (% of GDP), Physicians (per 1000 pop.): Health-related indicators, including total expenditure as a percentage of GDP and the number of physicians per 1000 population.

  • Education: Government Expenditure (% of GDP), Primary/Secondary/Tertiary Gross Enrolment Ratios: Education-related indicators, including government expenditure as a percentage of GDP and gross enrolment ratios in primary, secondary, and tertiary education.

  • Seats Held by Women in National Parliaments (%): The percentage of seats in the national parliament held by women.

  • Mobile-Cellular Subscriptions (per 100 inhabitants): The number of mobile-cellular subscriptions per 100 inhabitants.

  • Individuals Using the Internet (per 100 inhabitants): The percentage of the population using the internet.

  • Threatened Species (number): The number of species classified as threatened.

  • Forested Area (% of land area): The percentage of the land area covered by forests.

  • CO2 Emission Estimates (million tons/tons per capita): Information on carbon dioxide emissions, both total and per capita.

  • Energy Production, Primary (Petajoules), Energy Supply per Capita (Gigajoules): Indicators related to energy production and supply per capita.

  • Population Using Improved Drinking Water/Sanitation Facilities (urban/rural, %): The percentage of the population with access to improved drinking water and sanitation facilities, categorized by urban and rural areas.

  • Net Official Development Assistance Received (% of GNI): The percentage of Gross National Income received as official development assistance.
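Several indicators in the list above pack two values into a single field, separated by a slash (for example female/male labor force participation). Below is a minimal sketch of splitting such a field into two numeric columns with base R. The helper name `split_pair`, the output column names, and the use of "-99" as a placeholder for a missing entry are assumptions made for illustration.

```r
# Hypothetical helper: split "a/b" strings into two numeric columns
split_pair <- function(x, names) {
  parts <- strsplit(as.character(x), "/", fixed = TRUE)
  # Keep only well-formed pairs; anything else becomes NA/NA
  parts <- lapply(parts, function(p) if (length(p) == 2) p else c(NA, NA))
  out <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  # Convert both halves to numbers; non-numeric text becomes NA
  out[] <- lapply(out, function(col) suppressWarnings(as.numeric(col)))
  names(out) <- names
  out
}

# Toy input: two paired values and one placeholder without a slash
labor <- c("19.3/83.6", "40.2/61.0", "-99")
split_pair(labor, c("labor.female", "labor.male"))
```

Keeping malformed entries as NA rather than dropping them preserves the row alignment with the rest of the table.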

2.1.2 Second dataset: World economic time series from 2000-2020

The second dataset presents GDP data from 1960 to 2020 for more than 200 countries. It includes GDP, annual GDP growth, GDP per capita, and annual GDP per capita growth. Only the years 2000-2020 are used in this project.

2.2 Data cleaning and preprocessing

2.2.1 Data preprocessing for first dataset in specific year 2016

The economic data for 2016 is the basis for the analysis of the world’s top economies. Beyond economic output, I examine correlations with the unemployment rate, education levels, modernization ratio, and health expenditures. Each of these both influences and reflects the economic health of nations, and examining them together gives insight into the interplay of economic, social, and health dynamics.

# economic data for the specific year 2016
economic=read.csv("Data/country_profile_variables.csv")
dim(economic)
## [1] 229  50
economic=economic[,-50]
colnames(economic)= c("country","region","area","pop","pop.density","sex.ratio",
                     "GDP","growth.rate","GDP.capita","economy.agri","economy.indu",
                     "economy.serv","employ.agri","employ.indu","employ.serv",
                     "unemploy","labor.female.male","agri.index","food.index",
                     "export.million","import.million","balance.million","balance.payment",
                     "pop.growth","urban.ratio","urban.growth","fertility.ratio",
                     "life.female.male","pop.junior.senior","migrant.stock","refugee",
                     "infant.mortality","health.expend","health.physician","edu.expend",
                     "edu.primary","edu.secondary","edu.tertiary","seat.female","mobile",
                     "cellular","internet","threat.species","forest","CO2",
                     "energy.primary","energy.capita","pop.water","pop.sanitation")

# Map each region to its continent
continents=c("Asia","Europe","Africa","Oceania","North America","South America")
economic %>%
  mutate(continents= case_when(
    region %in% c("SouthernAsia", "WesternAsia", "South-easternAsia", "EasternAsia", "CentralAsia") ~ "Asia",
    region %in% c("SouthernEurope", "WesternEurope", "EasternEurope","NorthernEurope") ~ "Europe",
    region %in% c("NorthernAfrica", "MiddleAfrica", "WesternAfrica","SouthernAfrica", "EasternAfrica") ~ "Africa",
    region %in% c("Polynesia","Oceania", "Melanesia", "Micronesia") ~ "Oceania",
    region %in% c("Caribbean","NorthernAmerica", "CentralAmerica") ~ "North America",
    region %in% c("SouthAmerica") ~ "South America"
  )) ->economic
head(economic,2)
##       country         region   area   pop pop.density sex.ratio   GDP
## 1 Afghanistan   SouthernAsia 652864 35530        54.4     106.3 20270
## 2     Albania SouthernEurope  28748  2930       106.9     101.9 11541
##   growth.rate GDP.capita economy.agri economy.indu economy.serv employ.agri
## 1        -2.4      623.2         23.3         23.3         53.3        61.6
## 2         2.6     3984.2         22.4         26.0         51.7        41.4
##   employ.indu employ.serv unemploy labor.female.male agri.index food.index
## 1        10.0        28.5      8.6         19.3/83.6        125        125
## 2        18.3        40.3     15.8         40.2/61.0        134        134
##   export.million import.million balance.million balance.payment pop.growth
## 1           1458           3568           -2110           -5121        3.2
## 2           1962           4669           -2707           -1222       -0.1
##   urban.ratio urban.growth fertility.ratio life.female.male pop.junior.senior
## 1        26.7          4.0             5.3        63.5/61.0          43.2/4.1
## 2        57.4          2.2             1.7        79.9/75.6         17.4/19.0
##   migrant.stock refugee infant.mortality health.expend health.physician
## 1     382.4/1.2  1513.1             68.6           8.2              0.3
## 2      57.6/2.0     8.8             14.6           5.9              1.3
##   edu.expend edu.primary edu.secondary edu.tertiary seat.female mobile cellular
## 1        3.3  91.1/131.6     39.7/70.7     3.7/13.3        27.7   61.6      8.3
## 2        3.5 111.7/115.5     92.5/98.8    68.1/48.7        22.9  106.4     63.3
##   internet threat.species  forest CO2 energy.primary energy.capita pop.water
## 1       42            2.1 9.8/0.3  63              5     78.2/47.0 45.1/27.0
## 2      130           28.2 5.7/2.0  84             36     94.9/95.2 95.5/90.2
##   pop.sanitation continents
## 1          21.43       Asia
## 2           2.96     Europe
caption="Visualization by Ting Wei  •  Data by The UNData  •  World Economic Visualization project: Employing visuals to communicate diverse global economic data effectively through graphs and charts. "
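The region-to-continent mapping above can also be written as a lookup table, which makes unmapped regions easy to spot because they come back as NA. This is a base R sketch, not the code used in the project; the names `continent.regions`, `lookup`, and `region_to_continent` are hypothetical, and the continent labels follow the `continents` vector.

```r
# Regions grouped by continent (region spellings as they appear in the data)
continent.regions <- list(
  "Asia"          = c("SouthernAsia", "WesternAsia", "South-easternAsia",
                      "EasternAsia", "CentralAsia"),
  "Europe"        = c("SouthernEurope", "WesternEurope", "EasternEurope",
                      "NorthernEurope"),
  "Africa"        = c("NorthernAfrica", "MiddleAfrica", "WesternAfrica",
                      "SouthernAfrica", "EasternAfrica"),
  "Oceania"       = c("Polynesia", "Oceania", "Melanesia", "Micronesia"),
  "North America" = c("Caribbean", "NorthernAmerica", "CentralAmerica"),
  "South America" = c("SouthAmerica")
)

# Named vector: names are regions, values are continents
lookup <- setNames(
  rep(names(continent.regions), lengths(continent.regions)),
  unlist(continent.regions, use.names = FALSE)
)

# Hypothetical helper: unknown regions map to NA
region_to_continent <- function(region) unname(lookup[as.character(region)])

# "Atlantis" is a made-up region used to show the NA behavior
region_to_continent(c("SouthernAsia", "SouthernEurope", "Caribbean", "Atlantis"))
```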

2.2.2 Data preprocessing for second dataset from 2000-2020

My analysis focuses on the period 2000-2020 and looks at the trajectory of GDP, growth rates, and GDP per capita. Restricting the analysis to these two decades keeps the examination of national economic development specific and allows a closer look at the patterns and fluctuations in the global economy.

Nine highlighted countries:

I examine economic development (2000-2020) in the top 8 countries by GDP, plus Poland. The trajectories of the United States, China, Japan, Germany, the United Kingdom, India, France, Italy, and Poland cover diverse economic landscapes and reveal the trends, challenges, and opportunities that shaped the growth of these nations over the two decades.

# GDP from 2000-2020
df <- read.csv("data/gdp.csv")

# Rank all rows by 2020 GDP (largest first)
df %>% select(Country.Name,X2020) %>% filter(!is.na(X2020)) %>% arrange(desc(X2020))->country.gdp2020

# Positions are picked by hand from this ranking: row 1 is "World",
# followed by the 7 largest national economies
country.gdp2020[[1]][c(1,14,18,22,27,32,33,34)]->top.countries

# Top 8 national economies by GDP, plus Poland
country.gdp2020[[1]][c(14,18,22,27,32,33,34,37,64)]->highlighted.countries
highlighted.countries
## [1] "United States"  "China"          "Japan"          "Germany"       
## [5] "United Kingdom" "India"          "France"         "Italy"         
## [9] "Poland"
df %>% filter(Country.Name %in% top.countries) %>% 
  select(c(Country.Name,paste0("X",seq(2000,2020,1)))) -> df.top.countries
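# "others" = World total minus the 7 listed countries
is.world <- df.top.countries$Country.Name == "World"
others <- df.top.countries[is.world,-1] - colSums(df.top.countries[!is.world,-1])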
others$Country.Name="others"
df.top.countries <- rbind(df.top.countries, others)

# convert to long format
df.long <- pivot_longer(df.top.countries, 
                        cols = starts_with("X"), 
                        names_to = "Year", 
                        values_to = "GDP")

# Parse the year from the former column name ("X2000" -> 2000)
df.long$Year= as.numeric(sub("^X","",df.long$Year))
df.long$GDP.billion= round(df.long$GDP/1e+9)
head(df.long, 30)
## # A tibble: 30 × 4
##    Country.Name  Year     GDP GDP.billion
##    <chr>        <dbl>   <dbl>       <dbl>
##  1 China         2000 1.21e12        1211
##  2 China         2001 1.34e12        1339
##  3 China         2002 1.47e12        1471
##  4 China         2003 1.66e12        1660
##  5 China         2004 1.96e12        1955
##  6 China         2005 2.29e12        2286
##  7 China         2006 2.75e12        2752
##  8 China         2007 3.55e12        3550
##  9 China         2008 4.59e12        4594
## 10 China         2009 5.10e12        5102
## # ℹ 20 more rows
# GDP per capita from 2000-2020
df02 <- read.csv("data/gdp_per_capita.csv")

# convert to long format
df02 %>%  
  select(c(Country.Name,paste0("X",seq(2000,2020,1)))) %>%
  drop_na() %>% 
  filter(Country.Name %in% highlighted.countries) %>%
  pivot_longer(cols = starts_with("X"), 
               names_to = "Year",
               values_to = "GDP.capita") -> df02.long

# Parse the year from the former column name ("X2000" -> 2000) and store it as a date
df02.long$Year <- make_date(year = as.integer(sub("^X","",df02.long$Year)))
df02.long$Country.Name =factor(df02.long$Country.Name,
                  levels = c(highlighted.countries),
                  labels =  c(highlighted.countries))
head(df02.long,30)
## # A tibble: 30 × 3
##    Country.Name Year       GDP.capita
##    <fct>        <date>          <dbl>
##  1 China        2000-01-01       959.
##  2 China        2001-01-01      1053.
##  3 China        2002-01-01      1149.
##  4 China        2003-01-01      1289.
##  5 China        2004-01-01      1509.
##  6 China        2005-01-01      1753.
##  7 China        2006-01-01      2099.
##  8 China        2007-01-01      2694.
##  9 China        2008-01-01      3468.
## 10 China        2009-01-01      3832.
## # ℹ 20 more rows
# GDP growth rate from 2000-2020
df03 <- read.csv("data/gdp_growth.csv")

# convert to long format
df03 %>%  
  select(c(Country.Name,paste0("X",seq(2000,2020,1)))) %>%
  drop_na() %>% 
  filter(Country.Name %in% highlighted.countries) %>%
  pivot_longer(cols = starts_with("X"), 
               names_to = "Year", 
               values_to = "growth.rate") -> df03.long

# Parse the year from the former column name ("X2000" -> 2000) and store it as a date
df03.long$Year <- make_date(year = as.integer(sub("^X","",df03.long$Year)))
df03.long$Country.Name =factor(df03.long$Country.Name,
                  levels = c(highlighted.countries),
                  labels =  c(highlighted.countries))
head(df03.long,30)
## # A tibble: 30 × 3
##    Country.Name Year       growth.rate
##    <fct>        <date>           <dbl>
##  1 China        2000-01-01        8.49
##  2 China        2001-01-01        8.34
##  3 China        2002-01-01        9.13
##  4 China        2003-01-01       10.0 
##  5 China        2004-01-01       10.1 
##  6 China        2005-01-01       11.4 
##  7 China        2006-01-01       12.7 
##  8 China        2007-01-01       14.2 
##  9 China        2008-01-01        9.65
## 10 China        2009-01-01        9.40
## # ℹ 20 more rows
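
The reshaping steps in this section can be tried on a small table. The sketch below uses base R as a stand-in for `pivot_longer()`: it computes an "others" row as the World total minus the listed countries, then converts the wide table to long format with the year parsed from the column names. All numbers are invented toy values, and `wide`, `wide2`, and `long` are illustrative names.

```r
# Toy wide table: one row per country, one column per year (invented values)
wide <- data.frame(
  Country.Name = c("China", "United States", "World"),
  X2000 = c(1, 10, 20),
  X2001 = c(2, 11, 25)
)

# "others" = World total minus the sum of the listed countries
is.world <- wide$Country.Name == "World"
others <- wide[is.world, -1] - colSums(wide[!is.world, -1])
others$Country.Name <- "others"
wide2 <- rbind(wide[!is.world, ], others)

# Wide to long: one row per country and year
year.cols <- grep("^X", names(wide2), value = TRUE)
long <- data.frame(
  Country.Name = rep(wide2$Country.Name, each = length(year.cols)),
  Year = rep(as.integer(sub("^X", "", year.cols)), times = nrow(wide2)),
  # Transpose so that values run year by year within each country
  GDP = as.vector(t(as.matrix(wide2[year.cols])))
)
long
```

Parsing the year from the column name avoids having to know in advance how many countries remain in the table.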

Chapter 3 Visualization

3.1 Barplot: Major global economic indicators

Barplots are an effective way to compare the Gross Domestic Product (GDP), GDP per capita, and growth rate of different countries. Each country is represented by its own bar, which allows a clear visual comparison of economic indicators. The length of each bar reflects the magnitude of the indicator, while colors distinguish between nations.

3.1.1 TOP 15 GDP Countries in 2016

GDP.top15.country=economic %>%
  dplyr::filter(GDP!=-99) %>%
  group_by(country) %>%
  summarise(GDP.trillion=sum(GDP)/1000000) %>%
  slice_max(GDP.trillion,n=15) %>%
  mutate(country.order=fct_reorder(country,GDP.trillion))

ggplot(GDP.top15.country, aes(y=country.order,x=GDP.trillion,fill=country.order))+
  geom_col()+
  theme(legend.position = "none")+
  scale_x_continuous(breaks = seq(0,17.5,by=2.5),
                     labels = paste0("$ ",seq(0,17.5,by=2.5),"T")) +
  geom_text(aes(label=paste0("$",round(GDP.trillion,2),"T")),
            hjust=1.1,
            size=8,
            color="white",
            fontface="bold"
            ) +
  scale_fill_manual(values=c(carto_pal(n=12,"Antique"),pal[1:3]))+
  labs(title = "Barplot comparing 2016 GDP (Trillion USD) across countries. Top: USA, China.",
       subtitle="
This <i>barplot</i> illustrates the <b>top 15</b> countries with the highest GDP (Trillion USD) in 2016, with the United States leading the list, followed by China. The graph vividly depicts the<br> economic disparities among these nations in 2016, emphasizing their significant roles in the global economy.",
       x="GDP ( Trillion USD )",
       caption = caption) +
   theme(axis.title.x = element_text(size=24,margin = margin(t=10),
                                    color="grey5")) -> GDP.top15.pic
GDP.top15.pic
#ggsave(here("output/pic1_GDP.jpg"), plot = GDP.top15.pic,width = 8, height = 5, units="in",device ="jpg")
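
The barplot code above drops rows where GDP equals -99 and divides GDP by 1,000,000 before ranking. Here is a minimal base R sketch of those steps on toy data. It assumes that -99 marks a missing value and that GDP is stored in million USD; both are inferred from the code rather than stated in the dataset description. The country names and values are invented.

```r
# Toy data: GDP in million USD, with -99 assumed to mean "missing"
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  GDP = c(18000000, -99, 11000000, 4400000)
)

# Recode the sentinel as NA so it cannot leak into sums or means
toy$GDP[toy$GDP == -99] <- NA

# Million USD -> trillion USD
toy$GDP.trillion <- toy$GDP / 1e6

# Top 2 by GDP; na.last = NA drops the missing rows from the ranking
top <- toy[order(toy$GDP.trillion, decreasing = TRUE, na.last = NA), ][1:2, ]
top
```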

3.1.2 TOP 15 GDP per Capita Countries in 2016

economic %>%
  filter(GDP.capita!=-99) %>%
  select(country,GDP.capita) %>%
  slice_max(GDP.capita,n=15) %>%
  mutate(country.order=fct_reorder(country,GDP.capita))  ->GDP.capita.top15

economic %>%
  filter(GDP.capita!=-99) %>%
  summarise(mean=round(mean(GDP.capita))) %>% as.numeric() -> GDP.capita.mean  

ggplot(GDP.capita.top15, aes(y=country.order,x=GDP.capita,fill=country.order))+
  geom_col()+
  theme(legend.position = "none")+
  scale_x_continuous(breaks = seq(0,200000,by=25000),
                     labels = c("0",paste0("$ ",format(seq(25000,200000,by=25000),big.mark = ",",trim = TRUE)))) +
  geom_text(aes(label=paste0("$ ",format(round(GDP.capita,0),big.mark=",",trim=TRUE))),
            hjust=1.1,
            size=8,
            color="white",
            fontface="bold"
  )  +
  geom_vline(xintercept = GDP.capita.mean  ,
             color="white",
             linewidth=.8,
             linetype="dotted") +
  scale_fill_manual(values=c(carto_pal(n=12,"Antique"),pal[1:3]))+
  annotate("text",
           label=paste0("$ ",format(GDP.capita.mean,big.mark = ",")),
           x=GDP.capita.mean,
           y=16,
           size=8,
           color="grey20",
           hjust=0.5
           ) +
  coord_cartesian(clip="off") +
  labs(title = "Compare 2016 GDP per capita (USD) rankings: Liechtenstein, Monaco lead among various countries",
       subtitle="This <i>barplot</i> illustrates the <b>top 15</b> countries with the highest GDP per capita (USD) in 2016. Liechtenstein takes the lead, followed by Monaco and Luxembourg. It highlights<br> variations in economic performance compared to the overall GDP rankings.",
       x="GDP per Capita ( USD )",
       caption = caption) +
   theme(axis.title.x = element_text(size=24,margin = margin(t=10),
                                    color="grey5")) -> pic2.GDP.capita

ggsave(here("output/pic2_GDP.jpg"), plot = pic2.GDP.capita,width = 8, height = 5, units="in",device ="jpg")

3.1.3 TOP 15 GDP Growth Rate Countries in 2016

economic$growth.rate=as.numeric(economic$growth.rate)
economic %>%
  filter(growth.rate!=-99) %>%
  select(country,growth.rate) %>%
  slice_max(growth.rate,n=15) %>%
  mutate(country.order=fct_reorder(country,growth.rate))  ->growth.rate.top15

economic %>% 
  filter(growth.rate!=-99) %>%
  summarise(mean=round(mean(growth.rate),1)) %>%
  as.numeric() -> growth.rate.mean  

ggplot(growth.rate.top15, 
       aes(y=country.order,x=growth.rate,fill=country.order))+
  geom_col()+
  theme(legend.position = "none")+
  scale_x_continuous(breaks = seq(0,27,by=3),
                     labels = paste0(seq(0,27,by=3),"%")) +
  geom_text(aes(label=paste0(round(growth.rate,1),"% ")),
            hjust=1.1,
            size=8,
            color="white",
            fontface="bold"
  )  +
  scale_fill_manual(values=c(carto_pal(n=12,"Antique"),pal[1:3]))+
  geom_vline(xintercept = growth.rate.mean  ,
             color="white",
             linewidth=.8,
             linetype="dotted") +
  annotate("text",
           label=paste0(round(growth.rate.mean,2),"%"),
           x=growth.rate.mean,
           y=16,
           size=8,
           color="grey40",
           hjust=0.5
  ) +
  coord_cartesian(clip="off") +
  labs(title = "GDP growth rate by country in 2016",
       subtitle="<b>Which countries have the highest growth rate?</b> This <i>barplot</i> highlights the <b>top 15</b> countries with the highest growth rates in 2016. Topping the list are Ireland, <br>Nauru, and Ethiopia, showcasing their rapid development.",
       x="GDP growth rate",
       caption = caption)  +
   theme(axis.title.x = element_text(size=24,margin = margin(t=10),
                                    color="grey5"))->pic3.GDP.growth

ggsave(here("output/pic3_GDPGrowthRate.jpg"), plot =pic3.GDP.growth,width = 8, height = 5, units="in",device ="jpg")

3.2 Stacked Area chart and Lineplot: Economic dynamics

3.2.1 Stacked area chart: Global GDP (2000-2020)

df.long %>% filter(Country.Name=="World") %>% 
  filter(Year %in% c(2000,2005,2010,2015,2020)) ->whole.gdp

df.long %>% filter(!Country.Name=="World")->df.long

df.long$Country.Name =factor(df.long$Country.Name,
                  levels = c(top.countries[2:8],"others"),
                  labels =  c(top.countries[2:8],"others"))

df.long %>% filter(Year==2020) %>% arrange(desc(GDP)) %>% 
  select(Country.Name,GDP.billion)->max.df
max.df[c(2:8,1),]->max.df 

ggplot(data=df.long,
       aes(x=Year,
           y=GDP.billion,
           fill=Country.Name 
           ))  +
  geom_stream(type="ridge",bw=1) +
  scale_fill_manual(values = pal) +
  theme(legend.position = "none",
        panel.grid = element_blank(),
        axis.text.y = element_blank()
        ) +
  scale_x_continuous(breaks = c(2000,2005,2010,2015,2020),
                     labels =c(2000,2005,2010,2015,2020),
                     expand = c(0,0),
                     limits = c(1998,2028))+
  scale_y_continuous(expand = c(0,0),
                     limits = c(0,110000)) +
#each country's wealth in 2020 (one label per row of max.df)
  annotate("text",
           x=2020.5,
           y=c(87000,68750,59550,54000,49000,45000,41000,20000),
           label=paste0(max.df$Country.Name," $",
                        format(max.df$GDP.billion,big.mark=",",trim=TRUE)," Billion"),
           hjust=0,
           color=pal,
           size=8,
           fontface='bold'
  ) +
# vertical segments 
  geom_segment(data =whole.gdp ,
               aes(x=Year,xend=Year,y=0,yend=GDP.billion+17000),
               inherit.aes = F) +
  geom_point(data=whole.gdp,
             aes(x=Year,y=GDP.billion+17000),
             inherit.aes = F) +
  geom_hline(yintercept = 0) +
  annotate("text",
           x=whole.gdp$Year+0.5,
           y=whole.gdp$GDP.billion+22000,
           label=paste0("$",format(whole.gdp$GDP.billion,big.mark = ",",trim=TRUE)," Billion"),
           size=10,
           fontface="bold")+
  labs(title = "The aggregated global GDP development from 2000 to 2020",
       subtitle = "
",
       caption =caption) -> pic10.stackedArea

ggsave(here("output/pic10.jpg"), plot =pic10.stackedArea ,width = 8, height = 5, units="in",device ="jpg") 

The stacked area chart shows that the United States has remained the leading economy, with steady growth over the past 20 years. The chart also captures China’s rapid rise and the gradual narrowing of the gap between the United States and China. Japan, Germany, and the United Kingdom rank third, fourth, and fifth, respectively.
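
The country labels in the chart are placed at hand-tuned y positions. As an alternative, label positions can be derived from the data by taking the midpoint of each stacked band. The sketch below assumes the bands are stacked from bottom to top in the order given and ignores the smoothing that `geom_stream()` applies, so the results are only starting values. The function name `stack_midpoints` and the toy numbers are hypothetical.

```r
# Hypothetical helper: vertical midpoint of each band in a stack
stack_midpoints <- function(values) {
  top <- cumsum(values)   # upper edge of each band
  top - values / 2        # midpoint = upper edge minus half the band height
}

# Toy 2020 values, listed from the bottom band to the top band
gdp.2020 <- c(others = 30, C = 10, B = 20, A = 40)
stack_midpoints(gdp.2020)
```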

3.2.2 Line Plot: Global GDP per Capita (2000-2020)

df02.long %>% slice_max(Year)->max.df

ggplot(data=df02.long,
       aes(x=Year,
           y=GDP.capita,
           color=factor(Country.Name)))+
  theme(legend.position = "none") +
  geom_line(linewidth=.9)+
  geom_point(data=max.df,
             aes(x=Year,y=GDP.capita,color=Country.Name)) +
  geom_text(data = max.df,
            aes(x=Year,y=GDP.capita,color=Country.Name,
                label=paste0("$",format(round(GDP.capita),big.mark=",",trim=TRUE))),
            size=8,
            hjust=1,
            vjust=-.5,
            fontface="bold") +
  gghighlight(use_direct_label = F,
              unhighlighted_params = list(color=alpha("grey80",.4),
                                          size=.4)
              ) +
  geom_hline(yintercept = 30000,
             linewidth=.6,
             color="grey40") +
  facet_wrap(~factor(Country.Name)) +
  coord_cartesian(clip = "off") +
  theme(strip.text = element_text(face="bold",
                                  size=22,
                                  color = "grey40"),
        axis.text  = element_text(size=4)
        ) +
  scale_x_date(date_labels = "%y") +
  scale_y_continuous(breaks = seq(0,70000,by=30000),
                     labels = paste0("$",format(seq(0,70000,by=30000),big.mark = ",",trim = TRUE))) +
  scale_color_manual(values=c(pal[c(1,3,4,5,6,2,7,8)],"#1B9E77")) +
  labs(title = "The global GDP per capita development from 2000 to 2020",
       subtitle = "These <i>line graphs</i> show <b>GDP per capita (USD)</b> from 2000 to 2020 for the 9 highlighted countries. The United States has a high GDP per capita and keeps <br>growing, while China and India have high GDP but relatively low GDP per capita. GDP per capita in the other countries fluctuates around a relatively stable level. ",
       caption =caption) -> pic11.gdp.capita

ggsave(here("output/pic11.jpg"), plot = pic11.gdp.capita ,width = 8, height = 5, units="in",device ="jpg") 

This plot shows GDP per capita from 2000 to 2020 for nine highlighted countries. The United States has a consistently high GDP per capita and sustained growth. China and India, despite their large total GDP, have relatively low GDP per capita. GDP per capita in the other countries is relatively stable, with periodic fluctuations. Poland has a lower GDP per capita than the other European countries shown.

3.2.2 Line Plot: Global GDP Growth Rate (2000-2020)

df03.long %>% slice_max(Year)->max.df
df03.long %>% filter(Year== make_date(2009)) -> finance.crisis

ggplot(data=df03.long,
       aes(x=Year,
           y=growth.rate,
           color=factor(Country.Name)))+
  theme(legend.position = "none") +
  geom_line(linewidth=.9)+
  geom_point(data=max.df,
             aes(x=Year,y=growth.rate,color=Country.Name)) +
  geom_text(data = max.df,
            aes(x=Year,y=growth.rate,color=Country.Name,
                label=paste0(round(growth.rate,2),"%")),
            size=8,
            hjust=1.2,
            vjust=.2,
            fontface="bold") +
  geom_point(data=finance.crisis,
             aes(x=Year,y=growth.rate,color=Country.Name)) +
  geom_text(data = finance.crisis,
            aes(x=Year,y=growth.rate,color=Country.Name,
                label=paste0(round(growth.rate,2),"%")),
            size=8,
            hjust=-.1,
            vjust=2.5,
            fontface="bold")+
  gghighlight(use_direct_label = F,
              unhighlighted_params = list(color=alpha("grey80",.4),
                                          size=.4)
              ) +
  geom_hline(yintercept = 0,
             linewidth=.4,
             color="grey40") +
  geom_vline(xintercept =make_date(2009) ,
             linewidth=.4,
             color="red",
             linetype="dashed") +
  facet_wrap(~factor(Country.Name)) +
  coord_cartesian(clip = "off") +
  theme(strip.text = element_text(face="bold",
                                  size=22,
                                  color = "grey40"),
        axis.text  = element_text(size=4)
        ) +
  scale_x_date(date_labels = "%y") +
  scale_y_continuous(breaks = seq(-10,15,by=5),
                     labels = paste0(seq(-10,15,by=5),"%")) +
  scale_color_manual(values=c(pal[c(1,3,4,5,6,2,7,8)],"#1B9E77")) +
  labs(title = "The global GDP growth rate from 2000 to 2020",
       subtitle = "These <i>line graphs</i> depict the global GDP growth rate from 2000 to 2020, highlighting 9 specific countries. China, India, and Poland exhibit higher growth rates, with China<br> leading significantly. On the other hand, other nations show comparatively lower growth rates.<br>The impact of the 2008 financial crisis is evident, causing a pronounced decline in the growth rate in 2009, particularly affecting developed countries. <br>Additionally, the global COVID-19 pandemic in 2020 triggered another sharp decline in the growth rates of all countries.",
       caption =caption) -> pic12.gdp.growth

ggsave(here("output/pic12.jpg"), pic12.gdp.growth ,width = 8, height = 5, units="in",device ="jpg") 

These line charts show GDP growth rates from 2000 to 2020 for the nine highlighted countries. China, India, and Poland had markedly higher growth rates, with China expanding at a remarkable pace, while the other countries grew more modestly. The 2008 financial crisis changed the picture sharply, and developed countries were hit hardest: in 2009 their growth dropped significantly, and some fell into negative growth. The COVID-19 pandemic in 2020 triggered another sharp decline in the growth rates of all countries. These events underscore how vulnerable economies are to external shocks, whether financial crises or pandemics, and the need for resilient economic policies and international cooperation to maintain global economic stability.
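For reference, a growth rate of the kind plotted above is conventionally the percentage change from the previous year. The sketch below uses made-up GDP figures and a hypothetical helper, `annual_growth()`; it is not the code that produced `df03.long`, whose `growth.rate` column is taken as given in this report.

```r
# Hypothetical helper: year-over-year growth in percent.
# The first year has no predecessor, so its growth rate is NA.
annual_growth <- function(gdp) {
  c(NA, diff(gdp) / head(gdp, -1) * 100)
}

# Made-up GDP values (not taken from the project data)
toy <- data.frame(
  Year = 2007:2010,
  GDP  = c(100, 104, 101.92, 105.9968)
)
toy$growth.rate <- round(annual_growth(toy$GDP), 2)
toy
```

A negative value, as in the third row of this toy series, corresponds to a point below the zero reference line in the charts.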

3.3 Box Violin: Continents comparison

economic %>%
  filter(GDP.capita!=-99) %>%
  select(country,continents,GDP.capita) %>% tibble() -> GDP.capita.continent

set.seed(123)
ggstatsplot::ggbetweenstats(
  data = GDP.capita.continent,
  x = continents, # grouping/independent variable
  y =  GDP.capita, # dependent variables
  xlab = "Continents", # label for the x-axis
  ylab = "GDP per capita (USD)", # label for the y-axis
  plot.type = "boxviolin", # type of plot
  type = "parametric", # type of statistical test
  effsize.type = "biased", # type of effect size
  nboot = 10, # number of bootstrap samples used
  bf.message = TRUE, # display bayes factor in favor of null hypothesis
  outlier.tagging = TRUE, # whether outliers should be flagged
  outlier.coef = 1.5, # coefficient for Tukey's rule
  outlier.label = country, # label to attach to outlier values
  outlier.label.color = "red", # outlier point label color
  mean.plotting = TRUE, # whether the mean is to be displayed
  mean.color = "darkblue", # color for mean
  messages = FALSE, # turn off messages
  package = "yarrr", # package from which color palette is to be taken
  palette = "info2",
  title = "Comparing GDP per capita (USD) among continents in 2016"
)  +
  scale_y_continuous(
    limits = c(0, 101000),
    breaks = seq(from = 0, to =100000, by = 20000),
    labels = paste0("$",format(seq(from = 0, to =100000, by = 20000),
                    big.mark=",",
                    scientific=F,
                    trim=TRUE))) +
  theme(legend.position = "none",
        plot.title = element_text(size = 34),     
        plot.subtitle = element_text(size = 22),    
        axis.title = element_text(size=24),
        axis.text = element_text(size=22)) -> pic4.boxviolin

ggsave(here("output/pic4.jpg"), plot =pic4.boxviolin,width = 8, height = 5, units="in",device ="jpg")

The box-violin plots comparing GDP per capita across continents in 2016 support the following conclusions:

European countries have relatively high GDP per capita, with a higher median, mean, and interquartile range than the other continents, indicating considerable economic strength. North America follows closely, while Africa lags behind. Asia, Oceania, and South America fall in between. Africa and Asia contain more countries, but their boxes are shorter, indicating more concentrated income levels, whereas Europe's taller box suggests a more dispersed distribution.

The large F-value (17.76) indicates significant differences in GDP per capita among continents. The p-value is less than 0.01, so we reject the null hypothesis that all group means are equal. In summary, GDP per capita differs significantly among continents.
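As a rough sketch of the test behind the F-value, and assuming that the parametric option used here corresponds to Welch's one-way ANOVA (unequal variances), the same kind of comparison can be run with base R's `oneway.test()`. The data below are synthetic and the group labels are placeholders, so the numbers do not reproduce the reported F-value of 17.76.

```r
# Synthetic data: three made-up groups with different means and spreads
set.seed(1)
toy <- data.frame(
  continents = rep(c("A", "B", "C"), each = 30),
  GDP.capita = c(rnorm(30, mean = 5000,  sd = 2000),
                 rnorm(30, mean = 20000, sd = 8000),
                 rnorm(30, mean = 35000, sd = 15000))
)

# Spread of each group (the height of each box in a boxplot)
group_iqr <- tapply(toy$GDP.capita, toy$continents, IQR)

# Welch's one-way ANOVA: tests H0 "all group means are equal"
# without assuming equal variances across groups
fit <- oneway.test(GDP.capita ~ continents, data = toy, var.equal = FALSE)
fit$statistic  # F statistic
fit$p.value    # a small value leads to rejecting H0
```

A significant result only says that at least one group mean differs; it does not say which continents differ from each other.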

3.4 Scatter Plot: Correlations analysis

3.4.1 Unemployment Rate vs GDP per Capita

economic$unemploy=as.numeric(economic$unemploy)
economic %>%
  select(country,continents,unemploy,GDP.capita) %>%
  filter(unemploy!=-99 & GDP.capita!=-99) -> unemploy.GDP.capita

unemploy.GDP.capita %>% 
  mutate(highligted=ifelse(
    country %in% highlighted.countries, country,""
  )) ->unemploy.GDP.capita

ggplot(unemploy.GDP.capita,
       aes(x=GDP.capita,
           y=unemploy)) +
  geom_point(aes(color=continents),
             size=4,
             alpha=.6) +
  scale_x_continuous(breaks = seq(0,100000,by=20000),
                     labels =paste0("$",format(seq(0,100000,by=20000),big.mark = ",",scientific = F,trim = TRUE)),
                     limits = c(0,100000)) +
  scale_y_continuous(breaks = seq(0,30,by=5),
                     labels = paste0(seq(0,30,by=5),"%"))+
  geom_text_repel(aes(label=highligted),
                  color="grey40",
                  max.overlaps = 300,
                  size=8,
                  box.padding = .8,
                  fontface="bold") +
  scale_color_manual(name=NULL, 
                     values =pal[c(1, 2, 4, 5, 6, 8)])+
  theme(legend.position = "top",
        legend.justification = "right",
        legend.text = element_text(size=20),
        axis.title.x = element_text(size=24,margin = margin(t=10),
                                    color="grey5"),
        axis.title.y = element_text(size = 24,margin = margin(r=10),
                                    color = "grey5")) +
  guides(color=guide_legend(nrow = 1,
                            override.aes = list(size=4)))+
  labs(x="GDP per Capita (USD) ,2016",
       y="Unemployment Rate",
       title = "Unemployment rate vs GDP per capita",
       subtitle = "This <i>scatterplot</i> visualizes <b>how the unemployment rate varies with GDP per capita</b>. Where GDP per capita exceeds $40,000, the unemployment rate is below 10%.<br> However, regions with lower GDP per capita may also have a low unemployment rate.",
       caption=caption )+
  # annotate() draws each guide line once instead of once per data row
  annotate("segment", x = 10000, y = 30, xend = 48000, yend = 10,
           linetype = "dashed", color ="grey60") +
  annotate("segment", x = 50000, y = 10, xend = 100000, yend = 10,
           linetype = "dashed", color ="grey60")-> pic5.scatterplot 

ggsave(here("output/pic5.jpg"), plot =pic5.scatterplot ,width = 8, height = 5, units="in",device ="jpg")

This scatterplot shows the relationship between GDP per capita and the unemployment rate. Where GDP per capita exceeds $40,000, the unemployment rate stays below 10%, which suggests that economic prosperity and employment stability go together. One plausible explanation is that higher GDP per capita reflects more advanced economic development, with more economic activity, business growth, and job opportunities; as an economy expands, demand for labor rises and unemployment falls. The plot also shows that some regions with lower GDP per capita have a low unemployment rate, so GDP per capita is clearly not the only factor that shapes employment.
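The threshold statement can also be checked numerically rather than read off the plot. The sketch below uses a made-up data frame with the same column names as `unemploy.GDP.capita`; the values are illustrative only and the country labels are placeholders.

```r
# Made-up values, same column names as unemploy.GDP.capita
toy <- data.frame(
  country    = c("A", "B", "C", "D", "E", "F"),
  GDP.capita = c(1500, 8000, 22000, 41000, 56000, 79000),
  unemploy   = c(4.2, 26.1, 12.5, 6.3, 4.9, 3.1)
)

# Do all countries above $40,000 have unemployment below 10%?
rich <- toy$GDP.capita > 40000
all_below_10 <- all(toy$unemploy[rich] < 10)

# Rank-based (Spearman) correlation, which is robust to the skewed
# distribution of GDP per capita; a negative value means unemployment
# tends to fall as GDP per capita rises
rho <- cor(toy$GDP.capita, toy$unemploy, method = "spearman")

all_below_10
rho
```

In this toy data the poorest country also has low unemployment, which mirrors the exception noted above and weakens the overall rank correlation.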

3.4.2 Education Expenditure vs GDP per Capita

economic$edu.expend=as.numeric(economic$edu.expend)

economic %>%
  select(country,continents,edu.expend,edu.primary,edu.secondary,edu.tertiary,GDP.capita) %>%
  filter(edu.expend!=-99 & edu.primary!=-99 
         & edu.secondary!=-99& edu.tertiary!=-99) ->edu.GDP.capita # 135 countries left

edu.GDP.capita %>% 
  mutate(highligted=ifelse(
    country %in% highlighted.countries, country,""
  )) ->edu.GDP.capita


ggplot(edu.GDP.capita,aes(x=GDP.capita,
                          y=edu.expend,
                          color=continents)) +
  geom_smooth(method = "loess",
              formula = y~x,
              span=1,
              se=F,
              color="grey40",
              linewidth=.8)+
  geom_point(alpha=.6,
             size=4) +
  scale_x_continuous(breaks = seq(0,100000,by=20000),
                     labels =paste0("$",format(seq(0,100000,by=20000),big.mark = ",",scientific = F,trim = TRUE)),
                     limits = c(0,100000)) +
  scale_y_continuous(breaks = seq(1,9,by=2),
                     labels = paste0(seq(1,9,by=2),"%"),
                     limits =c(1,9))+
  scale_color_manual(name=NULL, 
                     values =light.pal)+
  geom_text_repel(aes(label=highligted),
                  color="grey40",
                  max.overlaps = 300,
                  size=8,
                  box.padding = .8,
                  fontface="bold") +
  theme(legend.position = "top",
        legend.justification = "right",
        legend.text = element_text(size=20),
        legend.title = element_text(size=20),
        axis.title.x = element_text(size=24,margin = margin(t=10),
                                    color="grey5"),
        axis.title.y = element_text(size = 24,margin = margin(r=10),
                                    color = "grey5")) +
  guides(color=guide_legend(nrow = 1,
                            override.aes = list(size=5)))+
  labs(x="GDP per Capita (USD) ,2016",
       y="Education  Expenditure  Ratio",
       title = "Education Expenditure Ratio vs GDP per capita",
       subtitle = "This <i>scatterplot</i> explores the relationship between<b> education expenditure as a percentage of GDP and GDP per capita</b> across different countries. The correlation is first <br> positive and then negative. One possible explanation is that countries with lower GDP per capita make an initial push to invest in education as part of their development<br> strategy. As GDP per capita increases, the marginal returns on education expenditure may diminish, leading to a weaker or negative correlation.",
       caption=caption ) ->pic6.scatterplot 

ggsave(here("output/pic6.jpg"), plot =pic6.scatterplot ,width = 8, height = 5, units="in",device ="jpg")

The observed change in the relationship between education expenditure and GDP per capita as GDP per capita increases could be attributed to various factors and economic dynamics. Here are a few possible explanations:

Initial Investment in Education: In countries with lower GDP per capita, there might be an initial push to invest in education as part of the development strategy. This could lead to a positive relationship between education expenditure and GDP per capita in the early stages of economic development.

Saturation Effect: As GDP per capita increases, there may be a point where the education system becomes more saturated or efficient, meaning additional investments may not result in proportional increases in educational outcomes. In such cases, the marginal returns on education expenditure may diminish, leading to a weaker or negative correlation.
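The rise-then-fall pattern is read from the loess curve in the plot. As a minimal sketch of how such a turning point could be located, the code below fits `loess()` with `span = 1` (the same span as in the plot) to synthetic inverted-U data and finds where the fitted curve peaks. The data, the peak position, and the noise level are all made up for illustration.

```r
set.seed(42)
gdp <- seq(1000, 90000, length.out = 120)
# Synthetic inverted-U relationship with noise (true peak near 45,000)
edu <- 5.5 - 2.5 * ((gdp - 45000) / 45000)^2 + rnorm(120, sd = 0.4)
toy <- data.frame(GDP.capita = gdp, edu.expend = edu)

# Local regression with a wide span, as in the plot
fit <- loess(edu.expend ~ GDP.capita, data = toy, span = 1)

# Evaluate the fitted curve on a grid inside the observed range
grid <- data.frame(GDP.capita = seq(2000, 89000, by = 1000))
grid$fitted <- predict(fit, newdata = grid)

# GDP per capita at which the fitted curve is highest
peak <- grid$GDP.capita[which.max(grid$fitted)]
peak
```

With real data the location of the peak depends heavily on the span and on a few high-income countries, so it is better read as a rough region than as a precise threshold.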

3.4.3 Urbanization vs GDP per Capita

economic$urban.growth=as.numeric(economic$urban.growth)
economic %>%
  select(country,continents,urban.ratio,urban.growth,GDP.capita) %>%
  filter(urban.ratio!=-99 & urban.growth!=-99 & GDP.capita !=-99) ->urban.GDP.capita # 206 countries left

urban.GDP.capita %>% 
  mutate(highligted=ifelse(
    country %in% highlighted.countries, country,""
  )) ->urban.GDP.capita

ggplot(urban.GDP.capita,
       aes(x=GDP.capita,
           y=urban.ratio,
           color=continents)) +
  geom_smooth(method = "loess",
              formula = y~x,
              span=1,
              se=F,
              color="grey40",
              linewidth=.8)+
  geom_point(size=4,
             alpha=.6) +
  scale_x_continuous(breaks = seq(0,80000,by=20000),
                     labels =paste0("$",format(seq(0,80000,by=20000),big.mark = ",",scientific = F,trim = TRUE)),
                     limits = c(0,80000)) +
  scale_y_continuous(breaks = seq(0,100,by=25),
                     labels = paste0(seq(0,100,by=25),"%"),
                     limits =c(10,100))+
  scale_color_manual(name=NULL, 
                     values =light.pal)+
  geom_text_repel(aes(label=highligted),
                  color="grey40",
                  max.overlaps = 300,
                  size=8,
                  box.padding = .8,
                  fontface="bold") +
  theme(legend.position = "top",
        legend.justification = "right",
        legend.text = element_text(size=20),
        legend.title = element_text(size=20),
        axis.title.x = element_text(size=24,margin = margin(t=10),
                                    color="grey5"),
        axis.title.y = element_text(size = 24,margin = margin(r=10),
                                    color = "grey5")) +
  guides(color=guide_legend(nrow = 1,
                            override.aes = list(size=5)))+
  labs(x="GDP per Capita (USD) ,2016",
       y="Urbanization  Ratio",
       title = "Urbanization Ratio vs GDP per capita",
       subtitle = "This <i>scatterplot</i> illustrates the correlation between <b>GDP per capita and the level of urbanization</b>. It appears that as a country's economy develops, its level of urbanization tends<br> to increase. However, notable exceptions exist, as certain countries exhibit substantial urban populations despite having relatively low GDP per capita.",
       caption=caption ) -> pic7.scatterplot

ggsave(here("output/pic7.jpg"), plot =pic7.scatterplot ,width = 8, height = 5, units="in",device ="jpg")

This scatterplot shows the correlation between GDP per capita and the degree of urbanization. Countries with higher GDP per capita tend to have higher urbanization levels. The positive correlation suggests that as a nation’s economy grows, urban areas expand, possibly because job opportunities, infrastructure development, and improved living standards draw people to cities. There are exceptions: some countries are highly urbanized despite relatively low GDP per capita. Factors such as cultural preferences, historical urbanization patterns, or strategic urban planning may contribute to these exceptions.

3.4.4 Health Expenditure Ratio vs GDP per Capita

economic$health.physician = as.numeric(economic$health.physician)

# Split "female/male" life expectancy strings into two numeric vectors
life_expect_string <- economic$life.female.male
life_expect_parts <- strsplit(life_expect_string, "/")
life_expect_female <- vapply(life_expect_parts, function(x) as.numeric(x[1]), numeric(1))
life_expect_male <- vapply(life_expect_parts, function(x) as.numeric(x[2]), numeric(1))

# Split "junior/senior" population strings the same way
junior_senior_string <- economic$pop.junior.senior
junior_senior_parts <- strsplit(junior_senior_string, "/")
junior.ratio <- vapply(junior_senior_parts, function(x) as.numeric(x[1]), numeric(1))
senior.ratio <- vapply(junior_senior_parts, function(x) as.numeric(x[2]), numeric(1))
total.ratio=as.numeric(junior.ratio)+as.numeric(senior.ratio)

health.GDP.capita= economic %>%
  select(country,continents,health.expend,GDP.capita)

health.GDP.capita %>% 
  mutate(life.female=as.numeric(life_expect_female),
         life.male=as.numeric(life_expect_male),
         junior.ratio=as.numeric(junior.ratio),
         senior.ratio=as.numeric(senior.ratio),
         total.ratio=as.numeric(total.ratio)) %>%
  filter(health.expend !=-99 & GDP.capita !=-99 & life.female !=-99
         & life.male!=-99 & junior.ratio !=-99 & senior.ratio !=-99) -> health.GDP.capita

health.GDP.capita %>% 
  mutate(highligted=ifelse(
    country %in% highlighted.countries, country,""
  )) -> health.GDP.capita
head(health.GDP.capita,2)

ggplot(health.GDP.capita,
       aes(x=GDP.capita,
           y=health.expend,
           color=continents)) +
  geom_smooth(method = "loess",
              formula = y~x,
              span=1,
              se=F,
              color="grey40",
              linewidth=.8)+
  geom_point(alpha=.6,
             size=4) +
  scale_x_continuous(breaks = seq(0,60000,by=20000),
                     labels =paste0("$",format(seq(0,60000,by=20000),
                                               big.mark = ",",
                                               scientific = F,
                                               trim = TRUE)),
                     limits = c(0,60000)) +
  scale_y_continuous(breaks = seq(1,14,by=2),
                     labels = paste0(seq(1,14,by=2),"%"),
                     limits =c(1,14))+
  scale_color_manual(name=NULL, 
                     values =light.pal)+
  geom_text_repel(aes(label=highligted),
                  color="grey40",
                  max.overlaps = 300,
                  size=8,
                  box.padding = .8,
                  fontface="bold") +
  theme(legend.position = "top",
        legend.justification = "right",
        legend.text = element_text(size=20),
        legend.title = element_text(size=20),
        axis.title.x = element_text(size=24,margin = margin(t=10),
                                    color="grey5"),
        axis.title.y = element_text(size = 24,margin = margin(r=10),
                                    color = "grey5")) +
  guides(color=guide_legend(nrow = 1,
                            override.aes = list(size=5)))+
  labs(x="GDP per Capita (USD) ,2016",
       y="Health Expenditure Ratio",
       title = "Health Expenditure Ratio VS GDP per capita",
       subtitle = "This <i>scatterplot</i> illustrates the correlation between <b>GDP per capita and health expenditure as a percentage of GDP.</b> Countries with high GDP per capita tend to devote a larger share of GDP to <br>health, owing to greater financial resources and the ability to allocate a larger share of their income to healthcare. In contrast, countries with low GDP per capita may<br> face limitations in available resources, making it challenging to significantly increase health expenditure.",
       caption=caption ) ->pic8.scatterplot

ggsave(here("output/pic8.jpg"), plot =pic8.scatterplot ,width = 8, height = 5, units="in",device ="jpg")

This scatterplot shows the correlation between GDP per capita and health expenditure as a percentage of GDP. Countries with high GDP per capita tend to allocate a larger share of their income to healthcare, which is plausibly due to the greater financial resources of economically developed nations. Countries with low GDP per capita, in contrast, face resource constraints that limit their capacity to increase health expenditure substantially.

3.4.5 Health Expenditure Ratio and Life Expectancy Correlation

ggplot(health.GDP.capita,
       aes(x=health.expend,
           y=life.female,
           color=life.female)) +
  geom_smooth(method = "loess",
              formula = y~x,
              span=1,
              se=F,
              color="grey40",
              linewidth=.8)+
  geom_point(aes(size = life.male),
             alpha=.6) +
  scale_y_continuous(breaks = seq(50,90,by=5),
                     labels =seq(50,90,by=5),
                     limits = c(50,90)) +
  scale_x_continuous(breaks = seq(1,15,by=2.5),
                     labels = paste0(seq(1,15,by=2.5),"%"),
                     limits =c(1,15))+
  scale_color_gradientn(colours =pal)+
  geom_text_repel(aes(label=highligted),
                  color="grey40",
                  max.overlaps = 300,
                  size=8,
                  box.padding = .8,
                  fontface="bold") +
  theme(legend.position = "top",
        legend.justification = "right",
        legend.text = element_text(size=20),
        legend.title = element_text(size=20),
        axis.title.x = element_text(size=24,margin = margin(t=10),
                                    color="grey5"),
        axis.title.y = element_text(size = 24,margin = margin(r=10),
                                    color = "grey5")) +
  guides(color=guide_legend(nrow = 1,
                            override.aes = list(size=5)))+
  labs(x="Health Expenditure Ratio",
       y="Life expectancy",
       title = "Life expectancy VS Health Expenditure Ratio",
       subtitle = "This <i>scatterplot</i> illustrates a positive correlation between <b>life expectancy and the health expenditure ratio</b>. The y-axis and point color show female life expectancy, while point size represents male life expectancy, <br>suggesting that in countries where females have higher life expectancy, males also tend to have higher life expectancy. Can we conclude that a higher GDP leads to longer life expectancy? ",
       caption=caption ) ->pic9.scatterplot

ggsave(here("output/pic9.jpg"), plot =pic9.scatterplot ,width = 8, height = 5, units="in",device ="jpg")

Combining the two plots above, can we conclude that a higher GDP leads to longer life expectancy?

The plots suggest several interesting correlations, but correlation does not imply causation. The question arises because GDP per capita is positively related to the health expenditure ratio, and the health expenditure ratio is in turn positively correlated with life expectancy. This does not necessarily mean that higher GDP directly causes longer life expectancy. The relationship between GDP, health expenditure, and life expectancy is complex: higher GDP may contribute to better healthcare infrastructure and resources, but the efficiency of healthcare systems, lifestyle factors, social policies, and other variables also play crucial roles.
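To make this caution concrete, the simulation below constructs a case in which a third variable drives two others: health spending and life expectancy are strongly correlated, yet neither affects the other. All values are simulated and the variable names are hypothetical; this is an illustration of confounding under stated assumptions, not an analysis of the project data.

```r
set.seed(2016)
n <- 200

# Simulated log GDP per capita drives both other variables
log_gdp <- rnorm(n, mean = 9, sd = 1)
health  <- 2 + 0.6 * log_gdp + rnorm(n, sd = 0.5)  # depends on GDP only
life    <- 40 + 3.5 * log_gdp + rnorm(n, sd = 2)   # depends on GDP only

# Raw correlation between health spending and life expectancy
raw_cor <- cor(health, life)

# Partial correlation: correlate what is left of each variable
# after removing the linear effect of log GDP
partial_cor <- cor(resid(lm(health ~ log_gdp)),
                   resid(lm(life ~ log_gdp)))

round(c(raw = raw_cor, partial = partial_cor), 2)
```

In this constructed example the raw correlation is clearly positive while the partial correlation is close to zero. Whether the same holds for the real data would require checking against the actual variables, ideally with additional controls.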