Tuberculosis (TB) is one of the oldest infectious diseases known to humanity, yet it remains a major global health problem today. Despite medical advancements, millions of people still fall ill with TB each year, and it continues to be one of the deadliest infectious diseases worldwide.
According to the World Health Organization (WHO), approximately 10.8 million people developed TB in 2023, following increases during the COVID-19 pandemic. In 2020, there were 9.9 million cases, and in 2021, around 9.4 million new cases were reported.
The disease spreads through the air when an infected person coughs or sneezes, making it highly contagious, especially in crowded or poorly ventilated places. While TB is curable with proper treatment, many people struggle to access the care they need, and the rise of drug-resistant TB makes treatment more complicated.
Efforts to eliminate TB have made progress, but challenges such as poverty, weak healthcare systems, and limited awareness keep the disease alive. TB disproportionately affects countries in South Asia, Africa, and the Western Pacific. To defeat TB, we need global commitment, better access to healthcare, and greater awareness about prevention.
(Sources: WHO Global Tuberculosis Report, The Lancet, Le Monde, Our World in Data)
Development indicators are measurable statistics that provide insights into a country’s overall progress in economic, social, and environmental dimensions. These indicators help policymakers, researchers, and international organizations assess a nation’s well-being, identify challenges, and track improvements over time.
Countries vary significantly in terms of economic output, health standards, education levels, and living conditions. Development indicators help compare these differences and evaluate how policies impact national growth. Common indicators include Gross Domestic Product (GDP), Human Development Index (HDI), life expectancy, literacy rates, and access to basic services like healthcare and clean water.
By analyzing development indicators, governments can make informed decisions to enhance quality of life, reduce inequalities, and promote sustainable growth. These indicators also play a crucial role in shaping international aid distribution, investment strategies, and progress toward global goals such as the United Nations Sustainable Development Goals (SDGs).
Understanding these indicators provides a comprehensive picture of a country’s strengths and areas that need improvement, making them essential tools for development planning and evaluation.
The most widely used development indicators include the HDI, GDP, and Gross National Income (GNI) per capita.
In this case study, we compare TB incidence with several of these development indicators over time to see how strongly they correlate and what patterns may exist. The goal is to raise awareness among the general population and to give public health personnel insights for decision-making about the status of TB in their local settings.
We use several publicly available data sources: the WHO for the TB data, and the World Bank and the United Nations Development Programme (UNDP) for the development indicators.
All the variables used in the analysis are numeric, including the year variable, except for the country and world region identifiers.
The data come from each institution’s data repository, where the data sources, collection methods, and methodology of each indicator are documented. Data are missing for some indicators and years; to control for this, I use only the complete year series available in each database.
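One way to read "complete year series" is to keep only the countries that have a non-missing value for every year in the analysis window. The base R sketch below illustrates that reading on made-up data; the column names, the toy values, and the 2019 to 2021 window are assumptions for illustration, not the actual structure of the source files.

```r
# Toy long-format indicator data: one row per country and year (values are made up).
indicator <- data.frame(
  country_code = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  year         = c(2019, 2020, 2021, 2019, 2020, 2021),
  value        = c(1.2, 1.4, 1.5, 0.8, NA, 0.9)
)

# Years that every retained country must cover (assumed analysis window).
required_years <- 2019:2021

# TRUE when a country's series has a non-missing value for every required year.
is_complete <- function(df) {
  all(required_years %in% df$year[!is.na(df$value)])
}

# Evaluate each country separately, then keep only the complete ones.
complete_flags <- vapply(
  split(indicator, indicator$country_code),
  is_complete,
  logical(1)
)
complete_codes <- names(complete_flags)[complete_flags]
indicator_complete <- indicator[indicator$country_code %in% complete_codes, ]

print(indicator_complete)
```

In this toy case country "BBB" is dropped because its 2020 value is missing. The trade-off of this approach is that it discards whole countries, so it is worth checking how many are lost before applying it to a real indicator.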
The following table catalogs the databases used in the analysis:
| Database | Institution | Location |
|---|---|---|
| TB burden estimates | WHO | link |
| Budgets for tuberculosis since fiscal year 2018 | WHO | link |
| TB cases notifications | WHO | link |
| Gross Domestic Product (in US$) per year | World Bank | link |
| Current health expenditure (% of GDP) | World Bank | link |
| Domestic general government health expenditure per capita (current US$) | World Bank | link |
| GNI per capita, Atlas method (current US$) | World Bank | link |
| Human Development Index | UNDP | link |
To perform this analysis, I used R and MS Excel to process the data, as they make it possible to review, clean, adapt, and merge various data sources. First, I combined all the TB datasets into a single dataset with a total of 22 variables. The key variables include the estimated TB cases, reported cases, population (to calculate the TB rate per 100,000 inhabitants), and budget and expenditure in US dollars (available since 2018).
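As a small illustration of how the rate is derived from these variables, the sketch below computes the reported TB rate per 100,000 inhabitants and the share of estimated cases that were not reported. The column names follow the ones used in the analysis code; the two countries and all numbers are made up.

```r
# Rate per 100,000 inhabitants from a case count and a population size.
tb_rate_per_100k <- function(cases, population) {
  cases / population * 1e5
}

# Made-up figures for two hypothetical countries.
toy <- data.frame(
  country            = c("A", "B"),
  new_cases_reported = c(500, 12000),
  estimated_cases    = c(625, 16000),
  population         = c(1e6, 4e7)
)

# Reported TB rate per 100,000 inhabitants.
toy$tb_rate <- tb_rate_per_100k(toy$new_cases_reported, toy$population)

# Share of the estimated cases that were not reported.
toy$detection_gap <- 1 - toy$new_cases_reported / toy$estimated_cases

print(toy)
```

Expressing cases per 100,000 inhabitants is what makes countries of very different sizes comparable: country B reports far more cases than country A here, yet its rate is lower.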
For the GDP, HDI, and other databases, I first cleaned the data in a spreadsheet by removing unnecessary elements such as non-functional titles and empty rows. Then, I continued the cleaning process in R, filtering and selecting the necessary columns and rows to create a dataset of 8 columns containing the countries and years that align with the cleaned TB dataset. Finally, I merged both datasets to ensure consistency and completeness. The final database contains 32 variables. The TB data cover only drug-susceptible TB (drug-resistant TB data are excluded).
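The merge step can be sketched as a left join on country code and year, with two checks that help confirm consistency: the join keys are unique on both sides, and unmatched rows are listed rather than silently kept as missing values. This is a minimal base R sketch on made-up data, not the exact procedure used for the final database.

```r
# Made-up TB and development tables keyed by country code and year.
tb <- data.frame(
  country_code = c("AAA", "AAA", "BBB"),
  year         = c(2020, 2021, 2021),
  tb_rate      = c(50, 48, 30)
)
dev <- data.frame(
  country_code = c("AAA", "AAA", "CCC"),
  year         = c(2020, 2021, 2021),
  gdp_usd_pc   = c(1500, 1600, 9000)
)

keys <- c("country_code", "year")

# Each country-year must appear once per table, otherwise the join duplicates rows.
stopifnot(anyDuplicated(tb[keys]) == 0, anyDuplicated(dev[keys]) == 0)

# Left join: keep every TB row and attach the development indicators when available.
merged <- merge(tb, dev, by = keys, all.x = TRUE)

# TB rows without a matching development record.
unmatched <- merged[is.na(merged$gdp_usd_pc), keys]

print(merged)
print(unmatched)
```

Here the row for "BBB" in 2021 has no development data, so it shows up in `unmatched`, while "CCC" is dropped because it has no TB record.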
The following table lists the indicators used in the analysis and their descriptions:
| Indicator | Description |
|---|---|
| Estimated TB incidence (all forms) per 100 000 population | The expected number of new and relapse TB cases per 100,000 population in a year (a rate). |
| Estimated number of incident cases (all forms) | The total expected number of new and relapse TB cases in a year. |
| Population | The estimated total population in a year. |
| New TB cases reported | The total number of new TB cases (all forms) reported in a year. |
| Average cost of drugs budgeted per diagnosed TB patient | A metric in US dollars based on the expected number of TB cases and the cost of the drugs in each country. |
| Total budget required for TB | The overall budget dedicated to TB in each country per year, in US dollars. |
| Total expected funding from all sources for TB | The total amount of money identified to cover the budget, in US dollars. |
| Total actual expenditure (US Dollars) in TB | The total amount spent on TB in a year in each country, in US dollars. |
| Gross domestic product per capita | A development indicator in US dollars: gross domestic product divided by midyear population. GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. |
| Current health expenditure (% of GDP) | Level of current health expenditure expressed as a percentage of GDP. Estimates of current health expenditures include healthcare goods and services consumed during each year. |
| GNI per capita, Atlas method (current US$) | Is the gross national income, converted to U.S. dollars using the World Bank Atlas method, divided by the midyear population. GNI is the sum of value added by all resident producers plus any product taxes (less subsidies) not included in the valuation of output plus net receipts of primary income (compensation of employees and property income) from abroad. |
| GNI, Atlas method (current US$) | Is the sum of value added by all resident producers plus any product taxes (less subsidies) not included in the valuation of output plus net receipts of primary income (compensation of employees and property income) from abroad. |
| Domestic general government health expenditure per capita (current US$) | Public expenditure on health from domestic sources per capita expressed in current US Dollars. |
| Human development index | Is a summary measure of average achievement in key dimensions of human development: a long and healthy life, being knowledgeable and having a decent standard of living. |
| Life Expectancy in years | Life expectancy at birth: the number of years a newborn could expect to live if the prevailing patterns of age-specific mortality rates at the time of birth stayed the same throughout the child’s life. |
Now let’s see how the selected variables relate to TB incidence.
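A simple way to quantify such a relationship is a correlation coefficient. The sketch below uses made-up values for five hypothetical countries and computes a Spearman rank correlation, which only assumes a monotonic relationship, plus a Pearson correlation on log-transformed values, since both income and TB rates tend to be strongly skewed. The numbers are illustrative only and are not results from this analysis.

```r
# Made-up GDP per capita (US$) and TB rate per 100,000 pop. for five hypothetical countries.
toy <- data.frame(
  gdp_usd_pc = c(800, 1500, 4000, 12000, 45000),
  tb_rate    = c(310, 220, 95, 40, 6)
)

# Spearman rank correlation: robust to skewed values and outliers.
rho <- cor(toy$gdp_usd_pc, toy$tb_rate,
           method = "spearman", use = "complete.obs")

# Pearson correlation on the log10 scale, a common choice for skewed indicators.
r_log <- cor(log10(toy$gdp_usd_pc), log10(toy$tb_rate),
             method = "pearson", use = "complete.obs")

print(c(spearman = rho, pearson_log = r_log))
```

A coefficient close to -1 would indicate that higher income goes together with lower TB rates. It describes association only and does not by itself show that one causes the other.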
Between 2003 and 2023, global tuberculosis (TB) trends have exhibited both progress and setbacks, influenced by various socioeconomic factors, including Gross Domestic Product (GDP) and the Human Development Index (HDI).
In the early 2000s, concerted global health initiatives led to a gradual decline in TB incidence. However, recent data indicate a concerning resurgence, especially after the COVID-19 pandemic. In 2023, approximately 10.8 million people developed TB and 1.25 million died, reaffirming its position as the leading infectious disease killer worldwide. This resurgence is attributed to factors such as healthcare disruptions during the COVID-19 pandemic and a persistent funding gap: in 2023, global funding for TB prevention and treatment was $5.7 billion, markedly below the $22 billion estimated as necessary by the WHO.
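To put the funding figures in proportion, the shortfall can be expressed as a share of the amount required, using the same "1 minus available over required" formula that the analysis code applies to country budgets. The helper name `funding_gap` is introduced here only for illustration.

```r
# Funding gap as a share of the required amount: 1 - available / required.
funding_gap <- function(available, required) {
  1 - available / required
}

# Figures quoted in the text, in billions of US dollars.
gap_2023 <- funding_gap(available = 5.7, required = 22)

# Express the gap as a percentage with one decimal.
print(round(gap_2023 * 100, 1))
```

With these figures, about 74% of the amount estimated as necessary was not funded in 2023.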
TB disproportionately affects low- and middle-income countries, underscoring the link between socioeconomic status and disease burden. Nations with lower GDP and HDI often face challenges like inadequate healthcare infrastructure, limited access to medical services, and higher prevalence of risk factors such as malnutrition and overcrowded living conditions. For instance, countries like India, Indonesia, and the Philippines, which have significant TB burdens, also contend with developmental challenges that exacerbate disease transmission and hinder effective intervention.
Despite challenges, there have been notable advancements:
Improved Diagnostics and Treatments: The development and implementation of more effective diagnostic tools and treatment regimens have enhanced TB detection and management, contributing to better patient outcomes.
Global Health Initiatives: International strategies, such as the World Health Organization’s End TB Strategy, adopted in 2014, aim to significantly reduce TB incidence and mortality by 2030. Although interim targets have been missed, these initiatives have galvanized global efforts and resources towards combating TB.
Conclusion:
The global fight against TB from 2003 to 2023 has seen both achievements and obstacles. While advancements in medical technology and international collaboration offer hope, challenges like insufficient funding and socioeconomic disparities underscore the need for sustained, multifaceted approaches to effectively reduce the global TB burden.
Here is my code for the graphs:
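# Global chunk options: hide code, warnings, and messages; set the default figure size
knitr::opts_chunk$set(
  echo = FALSE,
  warning = FALSE,
  message = FALSE,
  fig.width = 7.5,
  fig.height = 6
)

# Load the necessary packages (pacman installs any that are missing).
# ggrepel and tint are only called with `::` below, but they are listed here
# so that they get installed as well.
pacman::p_load(
  icons,
  tidyverse,
  here,
  rio,
  janitor,
  fontawesome,
  scales,
  patchwork,
  ggrepel,
  tint
)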
# Get the path of each dataset
files_db <- dir(here("data"), full.names = TRUE)

# Load TB datasets
files_db # print the paths; the numeric indices used below depend on this order

# Data dictionary: variable name and its definition
tb_dictionary <- import(files_db[19]) %>%
  select(columns_tb = 1, definition)

# Get the definitions for the dictionary
tb_est_variables <- import(files_db[18]) %>%
  colnames() %>%
  as_tibble() %>%
  rename(columns_tb = 1) %>%
  left_join(tb_dictionary, by = "columns_tb")
# TB burden estimates: keep incidence (number and rate) and population
tb_estimates <- import(files_db[18]) %>%
  pivot_longer(7:ncol(.), names_to = "columns_tb", values_to = "value") %>%
  left_join(tb_est_variables, by = "columns_tb") %>%
  select(1:6, definition, variable = columns_tb, value) %>%
  filter(variable %in% c("e_inc_num", "e_pop_num", "e_inc_100k")) %>%
  select(-definition) %>%
  pivot_wider(
    names_from = variable,
    values_from = value,
    values_fill = 0
  ) %>%
  rename(
    population = e_pop_num,
    estimated_cases = e_inc_num,
    estimated_rate_100k = e_inc_100k
  )
tb_cases_variables <- import(files_db[21]) %>%
  colnames() %>%
  as_tibble() %>%
  rename(columns_tb = 1) %>%
  left_join(tb_dictionary, by = "columns_tb")

# Case notifications: keep only the reported cases (c_newinc)
tb_cases <- import(files_db[21]) %>%
  pivot_longer(7:ncol(.), names_to = "columns_tb", values_to = "value") %>%
  left_join(tb_cases_variables, by = "columns_tb") %>%
  select(1:6, definition, variable = columns_tb, value) %>%
  filter(variable %in% c("c_newinc")) %>%
  select(1:6, new_cases_reported = value)
tb_budget_variables <- import(files_db[16]) %>%
  colnames() %>%
  as_tibble() %>%
  rename(columns_tb = 1) %>%
  left_join(tb_dictionary, by = "columns_tb")

# TB budgets: keep the budget lines and the expected funding from all sources
tb_budget <- import(files_db[16]) %>%
  pivot_longer(7:ncol(.), names_to = "columns_tb", values_to = "value") %>%
  left_join(tb_budget_variables, by = "columns_tb") %>%
  select(1:6, definition, variable = columns_tb, value) %>%
  filter(variable %in% c(
    "budget_cpp_dstb", "budget_staff",
    "budget_fld", "budget_prog", "budget_tot",
    "cf_tot_sources"
  )) %>%
  select(-definition) %>%
  pivot_wider(
    names_from = variable,
    values_from = value,
    values_fill = 0
  ) %>%
  # Absolute funding gap in US dollars. Named budget_gap so that it does not
  # clash with the case-detection `gap` computed in the final TB dataset.
  mutate(budget_gap = budget_tot - cf_tot_sources)
tb_expen_variables <- import(files_db[20]) %>%
  colnames() %>%
  as_tibble() %>%
  rename(columns_tb = 1) %>%
  left_join(tb_dictionary, by = "columns_tb")

# TB expenditure: join the expenditure definitions (not the budget ones)
tb_expenditure <- import(files_db[20]) %>%
  pivot_longer(7:ncol(.), names_to = "columns_tb", values_to = "value") %>%
  left_join(tb_expen_variables, by = "columns_tb") %>%
  select(1:6, definition, variable = columns_tb, value) %>%
  filter(variable %in% c("exp_lab", "exp_staff", "exp_fld", "exp_prog", "exp_tot")) %>%
  select(-definition) %>%
  pivot_wider(
    names_from = variable,
    values_from = value,
    values_fill = 0
  )
# Final TB dataset
tb_dataset <- tb_estimates %>%
  left_join(tb_cases) %>%
  left_join(tb_budget) %>%
  left_join(tb_expenditure) %>%
  mutate(
    gap = 1 - new_cases_reported / estimated_cases, # share of estimated cases not reported
    tb_rate = new_cases_reported / population * 100000, # reported cases per 100,000 pop.
    budget_vs_expe = exp_tot / budget_tot # share of the budget actually spent
  ) %>%
  rename(country_code = iso3) %>%
  select(-c(iso2, iso_numeric))
# GDP and other World Bank indicators.
# skip = 3 drops the rows above the column names; clean_names() turns the year
# headers into "x2000", "x2001", ..., so the leading "x" is stripped below.

# GDP per capita
gdp_pc <- import(files_db[1], skip = 3) %>%
  clean_names() %>%
  pivot_longer(5:ncol(.), names_to = "year", values_to = "value") %>%
  mutate(year = str_sub(year, 2, nchar(year)) %>% as.numeric()) %>%
  filter(year >= 2000) %>%
  select(country = 1, country_code, year, gdp_usd_pc = value)

# Current health expenditure (% of GDP), converted to a proportion
gdp_health <- import(files_db[3], skip = 3) %>%
  clean_names() %>%
  pivot_longer(5:ncol(.), names_to = "year", values_to = "value") %>%
  mutate(
    year = str_sub(year, 2, nchar(year)) %>% as.numeric(),
    value = value / 100
  ) %>%
  filter(year >= 2000) %>%
  select(country = 1, country_code, year, gdp_health_ptc = value)

# GNI per capita, Atlas method
gni_pc <- import(files_db[2], skip = 3) %>%
  clean_names() %>%
  pivot_longer(5:ncol(.), names_to = "year", values_to = "value") %>%
  mutate(year = str_sub(year, 2, nchar(year)) %>% as.numeric()) %>%
  filter(year >= 2000) %>%
  select(country = 1, country_code, year, gni_usd_pc = value)

# GNI, Atlas method
gni_usd <- import(files_db[23], skip = 3) %>%
  clean_names() %>%
  pivot_longer(5:ncol(.), names_to = "year", values_to = "value") %>%
  mutate(year = str_sub(year, 2, nchar(year)) %>% as.numeric()) %>%
  filter(year >= 2000) %>%
  select(country = 1, country_code, year, gni_usd = value)

# Domestic general government health expenditure per capita (read without skip)
dgghe_usd <- import(files_db[4]) %>%
  clean_names() %>%
  pivot_longer(5:ncol(.), names_to = "year", values_to = "value") %>%
  mutate(year = str_sub(year, 2, nchar(year)) %>% as.numeric()) %>%
  filter(year >= 2000) %>%
  select(country = 1, country_code, year, dgghe_pc = value)

# Combine all World Bank indicators by country and year
gdp_dataset <- gdp_pc %>%
  left_join(gdp_health) %>%
  left_join(gni_pc) %>%
  left_join(gni_usd) %>%
  left_join(dgghe_usd)
# UNDP data: HDI, life expectancy, and GNI per capita by country and year
hdi_dataset <- import(files_db[9]) %>%
  pivot_longer(6:ncol(.), names_to = "indicator", values_to = "value") %>%
  select(-hdi_rank_2022) %>%
  separate(indicator, sep = "_", into = c("indicator", "year")) %>%
  mutate(year = str_extract(year, "[0-9]+") %>% as.numeric()) %>%
  filter(
    !is.na(year),
    indicator %in% c("hdi", "le", "gnipc")
  ) %>%
  pivot_wider(
    names_from = indicator,
    values_from = value,
    values_fill = 0
  ) %>%
  rename(
    country_code = iso3,
    life_expectancy = le,
    gni_pc_un = gnipc
  )
# Final database: TB data plus World Bank and UNDP indicators
tb_with_gdp <- tb_dataset %>%
  left_join(gdp_dataset, by = c("country_code", "year"), relationship = "one-to-one") %>%
  left_join(hdi_dataset) %>%
  mutate(country = country.x) %>% # keep the WHO country name
  select(country, country_code, region, everything(), -country.x, -country.y)
# Visualizations

# g0: global budget, expected funding, and actual expenditure per year since 2018
g0 <- tb_with_gdp %>%
  select(year, budget_tot, cf_tot_sources, exp_tot) %>%
  group_by(year) %>%
  reframe(across(where(is.numeric), \(x) sum(x, na.rm = TRUE))) %>%
  filter(year >= 2018) %>%
  pivot_longer(2:ncol(.), names_to = "ind", values_to = "n") %>%
  mutate(ind = case_when(
    ind == "budget_tot" ~ "A. Total Budget",
    ind == "cf_tot_sources" ~ "B. Expected funding",
    ind == "exp_tot" ~ "C. Total actual expenditure"
  )) %>%
  group_by(year) %>%
  # Funding gap = 1 - expected funding / budget; within a year, lag() returns
  # the budget row that precedes the expected funding row
  mutate(
    gap = 1 - (n / lag(n, 1)),
    gap = if_else(str_detect(ind, fixed("B. ")), gap, NA)
  ) %>%
  ggplot(aes(x = year, y = n, fill = ind)) +
  geom_col(position = "dodge", color = "black") +
  ggrepel::geom_text_repel(aes(x = year, y = n, label = percent(gap, 0.1)), vjust = -0.9, hjust = 0) +
  scale_y_continuous(labels = unit_format(unit = "B$", scale = 1e-9, sep = "")) +
  scale_fill_brewer(palette = "BuPu") +
  tint::theme_tint(base_size = 12) +
  geom_hline(yintercept = 0) +
  labs(
    fill = "In Billion USD$:",
    title = "Funding of the tuberculosis response worldwide since 2018",
    caption = "Source: WHO. The percentage on top of the Expected funding bar is the funding gap of that year"
  ) +
  theme(
    panel.grid.major.y = element_line(color = "snow2", linewidth = 0.1),
    axis.text = element_text(face = "bold"),
    legend.position = "top",
    axis.title.y = element_blank()
  )

# g1: g0 plus a note with the range of the funding gap over the period
g1 <- g0 + annotate(
  "text",
  x = min(g0[["data"]]$year),
  y = max(g0[["data"]]$n) - max(g0[["data"]]$n) * 0.001,
  label = paste0(
    "The funding gap in the period ranged from ",
    percent(min(g0[["data"]]$gap, na.rm = TRUE), 0.1),
    " to ",
    percent(max(g0[["data"]]$gap, na.rm = TRUE), 0.1),
    " of the budget."
  ),
  hjust = 0
)
# World aggregate for the latest year: funding gap and reported TB rate
world <- tb_with_gdp %>%
  select(country, region, population, year, new_cases_reported, cf_tot_sources, budget_tot) %>%
  group_by(year) %>%
  reframe(across(where(is.numeric), \(x) sum(x, na.rm = TRUE))) %>%
  filter(year > 2018) %>%
  mutate(
    gap_fun = 1 - (cf_tot_sources / budget_tot),
    tb_rate = new_cases_reported / population * 100000,
    country = "World",
    region = "World"
  ) %>%
  filter(year == max(year)) %>%
  select(-new_cases_reported, -population)

# Country-level data for the latest year with a non-negative funding gap,
# plus the world aggregate
g2_data <- tb_with_gdp %>%
  group_by(country) %>%
  fill(region) %>%
  select(country, region, population, year, gap_cases = gap, budget_tot, cf_tot_sources, tb_rate) %>%
  filter(year > 2018) %>%
  mutate(gap_fun = 1 - (cf_tot_sources / budget_tot)) %>%
  group_by(country) %>%
  filter(
    year == max(year),
    !is.na(budget_tot),
    !is.na(cf_tot_sources),
    gap_fun >= 0
  ) %>%
  bind_rows(world)
# g2: TB rate vs. funding gap by country; the red point is the world aggregate
g2 <- g2_data %>%
  ggplot(aes(x = gap_fun, y = tb_rate, label = country, size = population)) +
  geom_point(alpha = 0.4) +
  scale_x_continuous(name = "Funding gap (budgeted vs expected funding)", labels = percent) +
  scale_y_continuous(name = "TB cases per 100,000 pop.", labels = comma) +
  scale_size_continuous(labels = unit_format(unit = "M.", scale = 1e-6, sep = "")) +
  geom_smooth(method = "lm", se = FALSE, color = "darkgrey", show.legend = FALSE) +
  ggrepel::geom_text_repel(show.legend = FALSE) +
  geom_point(
    data = world %>% filter(year == max(year)), size = 4, color = "red",
    aes(x = gap_fun, y = tb_rate),
    inherit.aes = FALSE
  ) +
  labs(
    title = "Tuberculosis per population vs. funding gap in 2023",
    subtitle = "(Only countries with a funding gap of 0% or more)",
    size = "Population size"
  ) +
  tint::theme_tint(base_size = 12) +
  theme(
    panel.grid.major = element_line(color = "snow3", linewidth = 0.1),
    axis.text = element_text(face = "bold"),
    axis.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )
# g3: global estimated vs. reported TB rate per year
g3 <- tb_with_gdp %>%
  group_by(year) %>%
  reframe(across(where(is.numeric), \(x) sum(x, na.rm = TRUE))) %>%
  mutate(
    tb_rate = new_cases_reported / population * 100000,
    estimated_rate_100k = estimated_cases / population * 100000
  ) %>%
  select(year, tb_rate, estimated_rate_100k) %>%
  pivot_longer(-year, names_to = "ind", values_to = "n") %>%
  mutate(ind = recode(ind,
    "estimated_rate_100k" = "Estimated TB rate",
    "tb_rate" = "Reported TB rate"
  )) %>%
  ggplot(aes(x = year, y = n, color = ind, linetype = ind)) +
  geom_line() +
  geom_point(size = 4, alpha = 0.8) +
  labs(
    x = "Years", y = "TB rate per 100k pop.",
    caption = "Source: WHO",
    color = "Indicator",
    linetype = "Indicator"
  ) +
  scale_color_brewer(palette = "BuPu") +
  tint::theme_tint(base_size = 14) +
  theme(
    panel.grid.major.y = element_line(color = "snow3", linewidth = 0.1),
    axis.text = element_text(face = "bold"),
    legend.position = "top",
    axis.line.x = element_line(linewidth = 0.4)
  )
# g4: global reported TB rate next to the world HDI, life expectancy, and GNI per capita
g4 <- tb_with_gdp %>%
  select(new_cases_reported, population, year) %>%
  group_by(year) %>%
  reframe(across(where(is.numeric), \(x) sum(x, na.rm = TRUE))) %>%
  mutate(tb_rate = new_cases_reported / population * 100000) %>%
  select(year, tb_rate) %>%
  left_join(hdi_dataset %>% filter(country == "World"), by = "year") %>%
  select(year, tb_rate, life_expectancy, gni_pc_un, hdi) %>%
  filter(!is.na(life_expectancy)) %>%
  pivot_longer(2:ncol(.), names_to = "ind", values_to = "n") %>%
  mutate(ind = recode(ind,
    "gni_pc_un" = "Gross National Income Per Capita (US Dollars)",
    "life_expectancy" = "Life Expectancy in years",
    "tb_rate" = "TB rate per 100k pop.",
    "hdi" = "Human Development Index"
  )) %>%
  ggplot(aes(x = year, y = n, color = ind)) +
  geom_line(show.legend = FALSE) +
  geom_point(show.legend = FALSE, size = 4) +
  facet_wrap(~ind, scales = "free", ncol = 1) +
  scale_y_continuous(labels = comma) +
  scale_color_brewer(palette = "BuPu") +
  labs(x = "Years", caption = "Source: WHO TB data and UN data") +
  tint::theme_tint(base_size = 14) +
  theme(
    panel.grid.major.y = element_line(color = "snow3", linewidth = 0.1),
    axis.text = element_text(face = "bold"),
    axis.title = element_text(face = "bold", size = 14),
    legend.position = "top",
    strip.text = element_text(hjust = 0, face = "bold"),
    axis.line.x = element_line(linewidth = 0.3),
    axis.title.y = element_blank()
  )
# g5: TB rate vs. government health expenditure per capita, one panel per year
g5 <- tb_with_gdp %>%
  filter(year >= 2016 & year <= 2021) %>%
  select(year, gdp_health_ptc, tb_rate, population, country, dgghe_pc) %>%
  filter(!is.na(dgghe_pc)) %>%
  group_by(year) %>%
  mutate(
    avg_dgghe = mean(dgghe_pc, na.rm = TRUE),
    avg_tbr = mean(tb_rate, na.rm = TRUE)
  ) %>%
  ggplot(aes(
    x = dgghe_pc, y = tb_rate, size = population,
    label = country,
    group = year
  )) +
  geom_point(alpha = 0.4, color = "darkblue") +
  scale_x_continuous(name = "Domestic general government health expenditure per capita (current US$)", labels = dollar) +
  scale_y_continuous(name = "TB rate per 100k pop.", labels = comma) +
  scale_size_continuous(labels = unit_format(unit = "M.", scale = 1e-6, sep = "")) +
  # Use the per-year averages computed above: mean() inside aes() is evaluated
  # over all years at once, so it would draw the same line in every panel
  geom_vline(aes(xintercept = avg_dgghe), linetype = 2) +
  geom_hline(aes(yintercept = avg_tbr), linetype = 2) +
  ggrepel::geom_text_repel(
    show.legend = FALSE, color = "black", segment.curvature = -0.1,
    max.overlaps = 20
  ) +
  facet_wrap(~year, scales = "free_x") +
  geom_text(aes(
    x = 5000,
    y = max(tb_rate, na.rm = TRUE) - max(tb_rate, na.rm = TRUE) * 0.7,
    label = paste0(dollar(avg_dgghe), "\n TB rate: ", comma(avg_tbr)),
    group = year
  ), size = 5) +
  tint::theme_tint(base_size = 12) +
  theme(
    axis.text = element_text(face = "bold"),
    axis.title = element_text(face = "bold", size = 14),
    axis.line = element_line(linewidth = 0.3),
    legend.position = "top",
    strip.text = element_text(hjust = 0.5, face = "bold", size = 16)
  ) +
  labs(
    size = "Country population size",
    caption = "Source: World Bank and WHO; the text in $ is the average DGGHE per capita, shown with the average TB rate"
  )
# g6: distribution of GDP per capita by WHO region, one panel per year.
# Named g6 so that the scatter plot defined next (g7) does not overwrite it.
g6 <- tb_with_gdp %>%
  select(country, year, g_whoregion, tb_rate, new_cases_reported, population, gdp_usd_pc) %>%
  filter(year > 2016) %>%
  ggplot(aes(x = gdp_usd_pc, fill = g_whoregion)) +
  geom_histogram(show.legend = FALSE, color = "white") +
  scale_x_continuous(labels = dollar) +
  scale_fill_brewer(palette = "BuPu") +
  facet_wrap(~year, scales = "free_x") +
  tint::theme_tint()
# g7: TB rate vs. GDP per capita in 2003 and 2023, colored by WHO region
g7 <- tb_with_gdp %>%
  select(
    country, year, g_whoregion, tb_rate,
    new_cases_reported,
    population, gdp_usd_pc
  ) %>%
  group_by(year) %>%
  mutate(
    avg_gdp = mean(gdp_usd_pc, na.rm = TRUE),
    avg_tbr = mean(tb_rate, na.rm = TRUE)
  ) %>%
  filter(year == 2023 | year == 2003) %>%
  ggplot(aes(x = gdp_usd_pc, y = tb_rate, size = population)) +
  geom_point(aes(color = g_whoregion), alpha = 0.4) +
  geom_smooth(se = FALSE, method = "lm", show.legend = FALSE) +
  scale_x_continuous(name = "Gross domestic product per capita (US dollars)", labels = dollar) +
  scale_y_continuous(name = "TB rate per 100,000 pop.", labels = comma) +
  # The points map region to color (not fill), so the palette goes on the color scale
  scale_color_brewer(palette = "BuPu") +
  geom_text(aes(
    x = 100000, y = 200,
    label = paste0("GDP: ", dollar(avg_gdp), "\n TB rate: ", comma(avg_tbr))
  ), size = 7, hjust = 1) +
  scale_size_continuous(labels = unit_format(unit = "M.", scale = 1e-6, sep = "")) +
  facet_wrap(~year) +
  tint::theme_tint() +
  theme(
    axis.text = element_text(face = "bold"),
    axis.title = element_text(face = "bold", size = 14),
    axis.line = element_line(linewidth = 0.3),
    legend.position = "top",
    strip.text = element_text(hjust = 0.5, face = "bold", size = 16)
  ) +
  labs(
    size = "Country population size",
    color = "Country's WHO region",
    caption = "Source: World Bank and WHO"
  )
# g8: GDP per capita and TB rate by HDI group in 2000 and 2022
g8_data <- tb_with_gdp %>%
  mutate(hdicode = case_when(
    hdicode %in% c("") ~ NA_character_,
    hdicode == "Low" ~ "a. Low",
    hdicode == "Medium" ~ "b. Medium",
    hdicode == "High" ~ "c. High",
    hdicode == "Very High" ~ "d. Very High",
    .default = hdicode
  )) %>%
  filter(!is.na(hdicode)) %>%
  reframe(
    n_countries = n(),
    tb_rate = sum(new_cases_reported, na.rm = TRUE) / sum(population, na.rm = TRUE) * 100000,
    avg_gdp_pc = mean(gdp_usd_pc, na.rm = TRUE),
    ov_gdp = sum(gdp_usd_pc, na.rm = TRUE), # sum of GDP per capita over the countries in the group
    population = sum(population, na.rm = TRUE),
    .by = c(hdicode, year)
  ) %>%
  filter(year %in% c(2000, 2022))

g8 <- g8_data %>%
  ggplot(aes(
    x = factor(year),
    y = ov_gdp,
    fill = hdicode,
    label = paste0(
      "Avg. GDP: ", dollar(round(avg_gdp_pc, 1)),
      "\n TB rate: ", round(tb_rate, 1), " per 100k pop."
    )
  )) +
  geom_col(alpha = 0.5, color = "black") +
  facet_wrap(~hdicode, scales = "free") +
  scale_fill_brewer(palette = "BuPu") +
  geom_hline(yintercept = 0) +
  geom_text(position = position_stack(vjust = 0.5)) +
  geom_text(aes(x = factor(year), y = ov_gdp, label = paste0(dollar(ov_gdp), " B.")),
    vjust = -0.2,
    fontface = "bold"
  ) +
  tint::theme_tint() +
  theme(
    axis.text.x = element_text(face = "bold", size = 16),
    legend.position = "none",
    strip.text = element_text(hjust = 0.5, face = "bold", size = 16),
    axis.title = element_blank(),
    axis.text.y = element_blank(),
    axis.line = element_blank(),
    axis.ticks = element_blank()
  )
Statement of the business
Data sources descriptions
Data processing procedures
Data analysis
Data visualizations and key findings
Conclusions
Recommendations