Introduction

Tuberculosis: An Old Disease That Still Threatens the

Tuberculosis (TB) is one of the oldest infectious diseases known to humanity, yet it remains a major global health problem today. Despite medical advancements, millions of people still fall ill with TB each year, and it continues to be one of the deadliest infectious diseases worldwide.

According to the World Health Organization (WHO), approximately 10.8 million people developed TB in 2023, following increases during the COVID-19 pandemic. In 2020, there were 9.9 million cases, and in 2021, around 9.4 million new cases were reported.

The disease spreads through the air when an infected person coughs or sneezes, making it highly contagious, especially in crowded or poorly ventilated places. While TB is curable with proper treatment, many people struggle to access the care they need, and the rise of drug-resistant TB makes treatment more complicated.

Efforts to eliminate TB have made progress, but challenges such as poverty, weak healthcare systems, and limited awareness keep the disease alive. TB disproportionately affects countries in South Asia, Africa, and the Western Pacific regions. To defeat TB, we need global commitment, better access to healthcare, and greater awareness about prevention.

(Sources: WHO Global Tuberculosis Report, The Lancet, Le Monde, Our World in Data)

How the Tuberculosis burden relates to other development indicators

Development indicators are measurable statistics that provide insights into a country’s overall progress in economic, social, and environmental dimensions. These indicators help policymakers, researchers, and international organizations assess a nation’s well-being, identify challenges, and track improvements over time.

Countries vary significantly in terms of economic output, health standards, education levels, and living conditions. Development indicators help compare these differences and evaluate how policies impact national growth. Common indicators include Gross Domestic Product (GDP), Human Development Index (HDI), life expectancy, literacy rates, and access to basic services like healthcare and clean water.

By analyzing development indicators, governments can make informed decisions to enhance quality of life, reduce inequalities, and promote sustainable growth. These indicators also play a crucial role in shaping international aid distribution, investment strategies, and progress toward global goals such as the United Nations Sustainable Development Goals (SDGs).

Understanding these indicators provides a comprehensive picture of a country’s strengths and areas that need improvement, making them essential tools for development planning and evaluation.

The most widely used development indicators include the HDI, GDP, and Gross National Income (GNI) per capita.

In this case study we compare TB incidence with some of these development indicators over time, to see how strongly they correlate with TB and what patterns may exist. The goal is to raise awareness among the general population and to offer public health personnel potential insights for decision-making in the health sector about the status of TB in their local settings.

Data sources

For this case study we use several publicly available data sources: the WHO (for the TB data), and the World Bank and the United Nations Development Programme (UNDP) (for the development indicators data).

All the variables used in the analysis, apart from the country and world region, are numeric (including the year variable).

The data come from each institution’s data repository, where the method, data sources, and collection procedure for each indicator are well described. For some indicators and years the data are missing; to control for this, I use the complete year series that are available in each database.
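
One possible way to restrict an analysis to complete year series is sketched below in base R. The table, its column names, and its values are invented for illustration and do not come from the WHO, World Bank, or UNDP files; the sketch only shows the idea of keeping the years in which every country has all indicators.

```r
# Toy indicator table (all values are invented for illustration only).
indicators <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  year    = c(2019, 2020, 2021, 2019, 2020, 2021),
  hdi     = c(0.70, 0.71, NA, 0.55, 0.56, 0.57),
  gdp_pc  = c(5000, 5100, 5200, 1200, NA, 1300)
)

# Keep only the rows where every indicator is present.
complete_rows <- indicators[complete.cases(indicators), ]

# Keep only the years in which every country has a complete row.
n_countries    <- length(unique(indicators$country))
rows_per_year  <- table(complete_rows$year)
complete_years <- as.numeric(names(rows_per_year)[rows_per_year == n_countries])

complete_series <- complete_rows[complete_rows$year %in% complete_years, ]
print(complete_series)
```

In this toy table only 2019 survives, because each of the other years has a missing value for one of the two countries.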

Here is a table with the databases catalog for the analysis:

Database	Institution	Location
TB burden estimates	WHO	link
Budgets for tuberculosis since fiscal year 2018	WHO	link
TB cases notifications	WHO	link
Gross Domestic Product (in US$) per year	World Bank	link
Current health expenditure (% of GDP)	World Bank	link
Domestic general government health expenditure per capita (current US$)	World Bank	link
GNI per capita, Atlas method (current US$)	World Bank	link
Human Development Index	UNDP	link

Data processing

To perform this analysis, I used R and MS Excel to process all the data, as they allow us to review, clean, adapt, and merge various data sources. First, I combined all the TB datasets into a single dataset with a total of 22 variables. The key variables include the estimated TB cases, reported cases, population (to calculate the TB rate per 100,000 inhabitants), and budget and expenditure in US dollars (available since 2018).

For the GDP, HDI, and other databases, I first cleaned the data in a spreadsheet by removing unnecessary elements such as non-functional titles and empty rows. Then, I continued the cleaning process in R, filtering and selecting the necessary columns and rows to create a dataset with a total of 8 columns, containing the countries and years that align with the cleaned TB dataset. Finally, I merged both datasets to ensure consistency and completeness. The final database contains 32 . The TB data relate only to drug-susceptible TB (drug-resistant TB data are excluded).
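
As a minimal sketch of the merge and rate calculation described above, the base R example below joins a toy TB table to a toy development table by country code and year, then derives the reported TB rate per 100,000 inhabitants and the share of estimated cases that were not reported. The data values are invented, and base R `merge()` is used here only as a stand-in for the `left_join()` calls in the annex code.

```r
# Toy TB table (values invented for illustration only).
tb <- data.frame(
  country_code       = c("AAA", "AAA", "BBB"),
  year               = c(2022, 2023, 2023),
  new_cases_reported = c(1500, 1800, 400),
  estimated_cases    = c(2000, 2000, 500),
  population         = c(1e6, 1.2e6, 2e5)
)

# Toy development-indicator table (values invented for illustration only).
dev <- data.frame(
  country_code = c("AAA", "AAA", "BBB"),
  year         = c(2022, 2023, 2023),
  gdp_usd_pc   = c(3000, 3200, 900)
)

# Left join on country and year: every TB row is kept.
merged <- merge(tb, dev, by = c("country_code", "year"), all.x = TRUE)

# Reported TB rate per 100,000 inhabitants.
merged$tb_rate <- merged$new_cases_reported / merged$population * 1e5

# Share of estimated cases that were not reported (case detection gap).
merged$detection_gap <- 1 - merged$new_cases_reported / merged$estimated_cases

print(merged)
```

Joining on both keys matters: joining on country alone would pair every TB year with every development year for that country.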

Here is the list of indicators and the description to do the analysis:

Indicator	Description
Estimated TB incidence (all forms) per 100 000 population	The expected number of new and relapse TB cases per 100,000 population in a year (a rate).
Estimated number of incident cases (all forms)	The total expected number of new and relapse TB cases in a year.
Population	The estimated total population in a year.
New TB cases reported	The total number of new TB cases (all forms) reported in a year.
Average cost of drugs budgeted per TB diagnosed patient	A metric in US dollars based on the expected number of TB cases and the cost of the drugs in each country.
Total budget required for TB	The overall budget dedicated to TB in each country per year, in US dollars.
Total expected funding from all sources for TB	The total amount of money, in US dollars, identified to cover the budget.
Total actual expenditure (US Dollars) in TB	The total amount, in US dollars, spent on TB in a year in each country.
Gross domestic product per capita	A development indicator in US dollars: the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products, divided by the midyear population.
Current health expenditure (% of GDP)	Level of current health expenditure expressed as a percentage of GDP. Estimates of current health expenditures include healthcare goods and services consumed during each year.
GNI per capita, Atlas method (current US$)	The gross national income, converted to U.S. dollars using the World Bank Atlas method, divided by the midyear population. GNI is the sum of value added by all resident producers plus any product taxes (less subsidies) not included in the valuation of output plus net receipts of primary income (compensation of employees and property income) from abroad.
GNI, Atlas method (current US$)	The sum of value added by all resident producers plus any product taxes (less subsidies) not included in the valuation of output plus net receipts of primary income (compensation of employees and property income) from abroad.
Domestic general government health expenditure per capita (current US$)	Public expenditure on health from domestic sources per capita expressed in current US dollars.
Human development index	A summary measure of average achievement in key dimensions of human development: a long and healthy life, being knowledgeable and having a decent standard of living.
Life Expectancy in years	The number of years a newborn would be expected to live if the mortality patterns prevailing at the time of birth stayed the same throughout life.

Analysis

Now let's see how the selected variables relate to TB incidence.

Over the past 20 years the reported tuberculosis rate appears to have increased, probably because of better diagnostic technologies. The world population is improving.

The tuberculosis budget and expenditure have been roughly constant over the years, and the funding gap has been about the same in each of the last 6 years.

Does the funding correlate with the tuberculosis rate in each country in 2023? It seems likely, but more data are needed.

The life expectancy, the gross national income, the Human development index and the Tuberculosis rate show a similar trend.

Investment in health expenditure seems to be associated with the tuberculosis rate: the higher the investment, the lower the rate (data from 2016 to 2021). More data and deeper analysis are needed to confirm this claim.
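
One way such an association could be quantified, rather than read off a scatter plot, is a rank correlation plus a simple log-log regression. The base R sketch below uses invented values for six hypothetical countries; it is not the analysis behind the figure and its numbers say nothing about the real data.

```r
# Invented values for six hypothetical countries (illustration only).
health_exp_pc <- c(20, 50, 120, 400, 900, 2500)  # government health spending per capita, US$
tb_rate       <- c(310, 220, 180, 60, 25, 8)     # TB cases per 100,000 population

# Spearman rank correlation: robust to the strong skew of both variables.
rho <- cor(health_exp_pc, tb_rate, method = "spearman")

# Log-log linear model: the slope approximates the % change in TB rate
# associated with a 1% change in health expenditure per capita.
fit   <- lm(log(tb_rate) ~ log(health_exp_pc))
slope <- coef(fit)[["log(health_exp_pc)"]]

cat("Spearman rho:", rho, "\n")
cat("Log-log slope:", round(slope, 2), "\n")
```

A negative coefficient would only describe an association. Income, HIV prevalence, and other factors would still need to be considered before claiming that spending lowers the rate.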

Over 20 years, the gross domestic product per capita has increased by 36% on average and the TB rate has decreased by 14%; although these data show some progress, a forecast analysis and the inclusion of other factors are needed to predict the end of TB in the world.
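
For reference, a relative change between two years is computed as (new - old) / old. The sketch below applies that formula in base R; the start and end values are hypothetical and were chosen only so that they reproduce changes of +36% and -14%, they are not the values from the dataset.

```r
# Relative change between two values, in percent.
pct_change <- function(old, new) {
  (new - old) / old * 100
}

# Hypothetical start and end values (not taken from the dataset).
gdp_change <- pct_change(old = 10000, new = 13600)  # GDP per capita, US$
tb_change  <- pct_change(old = 100,   new = 86)     # TB rate per 100,000

cat(sprintf("GDP per capita: %+.0f%%\n", gdp_change))
cat(sprintf("TB rate: %+.0f%%\n", tb_change))
```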

Insights

Between 2003 and 2023, global tuberculosis (TB) trends have exhibited both progress and setbacks, influenced by various socioeconomic factors, including Gross Domestic Product (GDP) and the Human Development Index (HDI).

In the early 2000s, concerted global health initiatives led to a gradual decline in TB incidence. However, recent data indicate a concerning resurgence, especially after the COVID-19 pandemic. In 2023, approximately 10.8 million individuals developed TB, with 1.25 million fatalities, reaffirming its position as the leading infectious disease killer worldwide. This resurgence is attributed to factors such as healthcare disruptions during the COVID-19 pandemic and the constant funding gap. In 2023, global funding for TB prevention and treatment was $5.7 billion, markedly below the $22 billion estimated as necessary by the WHO.
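
To put the two funding figures quoted above on a common scale, the shortfall can be expressed as a share of the required amount, 1 - available / required. The short base R sketch below does this arithmetic with the $5.7 billion and $22 billion figures from the text; the variable names are illustrative.

```r
# Figures quoted in the text, in billions of US dollars.
funding_available <- 5.7
funding_required  <- 22

# Shortfall in absolute terms and as a share of the requirement.
shortfall_usd <- funding_required - funding_available
funding_gap   <- 1 - funding_available / funding_required

cat(sprintf("Shortfall: $%.1f billion\n", shortfall_usd))
cat(sprintf("Funding gap: %.1f%% of the required amount\n", funding_gap * 100))
```

With these figures the gap is roughly 74% of the estimated requirement. This is a different quantity from the country-level budget gap in the annex code, which compares each country's budget with its expected funding.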

TB disproportionately affects low- and middle-income countries, underscoring the link between socioeconomic status and disease burden. Nations with lower GDP and HDI often face challenges like inadequate healthcare infrastructure, limited access to medical services, and higher prevalence of risk factors such as malnutrition and overcrowded living conditions. For instance, countries like India, Indonesia, and the Philippines, which have significant TB burdens, also contend with developmental challenges that exacerbate disease transmission and hinder effective intervention.

Despite challenges, there have been notable advancements:

Improved Diagnostics and Treatments: The development and implementation of more effective diagnostic tools and treatment regimens have enhanced TB detection and management, contributing to better patient outcomes.
Global Health Initiatives: International strategies, such as the World Health Organization’s “End TB” initiative launched in 2014, aim to significantly reduce TB incidence and mortality by 2030. While interim targets have been missed, these initiatives have galvanized global efforts and resources towards combating TB.

Conclusion:

The global fight against TB from 2003 to 2023 has seen both achievements and obstacles. While advancements in medical technology and international collaboration offer hope, challenges like insufficient funding and socioeconomic disparities underscore the need for sustained, multifaceted approaches to effectively reduce the global TB burden.

Annex

Ask phase

Guiding questions:

What topic are you exploring?
What is the problem you are trying to solve?
What metrics will you use to measure your data to achieve your objective?
Who are the stakeholders?
Who is your audience?
How can your insights help your client make decisions?

Deliverable: A clear statement of the business task you have selected to investigate

Prepare phase

Guiding questions:

Where is your data located?
How is the data organized?
Are there issues with bias or credibility in this data? Does your data ROCCC (Reliable, original, comprehensive, current, cited)?
How are you addressing licensing, privacy, security, and accessibility?
How did you verify the data’s integrity?
How does it help you answer your question?
Are there any problems with the data?

Deliverable: A description of all data sources used

Process Phase

Guiding questions:

What tools are you choosing and why?
Have you ensured your data’s integrity?
What steps have you taken to ensure that your data is clean?
How can you verify that your data is clean and ready to analyze?
Have you documented your cleaning process so you can review and share those results?

Deliverable: Documentation of any cleaning or manipulation of data

Analyze phase

Guiding questions:

How should you organize your data to perform analysis on it?
Has your data been properly formatted?
What surprises did you discover in the data?
What trends or relationships did you find in the data?
How will these insights help answer your business questions?

Deliverable: A summary of your analysis

Here is my code for the graphs:

knitr::opts_chunk$set(echo = FALSE,
                      warning = FALSE,
                      message = FALSE,
                      fig.width=7.5, 
                      fig.height=6)

# load the necessary packages

pacman::p_load(icons,
               tidyverse,
               here,
               rio,
               janitor,
               fontawesome,
               scales,
               patchwork)

# Get the path of each dataset

files_db <- dir(here("data"), full.names = T)

# Load TB datasets

files_db

tb_dictionary <- import(files_db[19]) %>% 
  select(columns_tb=1, definition)

#Get the definitions for the dictionary

tb_est_variables <- import(files_db[18]) %>% 
  colnames() %>% 
  as_tibble() %>%
  rename(columns_tb=1)%>% 
  left_join(tb_dictionary)

tb_estimates <- import(files_db[18]) %>% 
  pivot_longer(7:ncol(.), names_to = "columns_tb", values_to = "value") %>% 
  left_join(tb_est_variables) %>% 
  select(1:6,definition, variable=columns_tb, value) %>% 
  filter(variable %in% c("e_inc_num", "e_pop_num", "e_inc_100k")) %>% 
  select(-definition) %>% 
  pivot_wider(names_from = variable,
              values_from = value,
              values_fill = 0) %>% 
  rename(population=e_pop_num,
         estimated_cases=e_inc_num,
         estimated_rate_100k=e_inc_100k)

tb_cases_variables <- import(files_db[21]) %>%  
  colnames() %>% 
  as_tibble() %>%
  rename(columns_tb=1)%>% 
  left_join(tb_dictionary)

tb_cases <- import(files_db[21]) %>% 
  pivot_longer(7:ncol(.), names_to = "columns_tb", values_to = "value") %>% 
  left_join(tb_cases_variables) %>% 
  select(1:6,definition, variable=columns_tb, value) %>% 
  filter(variable %in% c("c_newinc")) %>% 
  select(1:6,new_cases_reported=value)



tb_budget_variables <- import(files_db[16]) %>%  
  colnames() %>% 
  as_tibble() %>%
  rename(columns_tb=1)%>% 
  left_join(tb_dictionary)


tb_budget <- import(files_db[16]) %>% 
  pivot_longer(7:ncol(.), names_to = "columns_tb", values_to = "value") %>% 
  left_join(tb_budget_variables) %>% 
  select(1:6,definition, variable=columns_tb, value) %>% 
  filter(variable %in% c("budget_cpp_dstb", "budget_staff", 
                         "budget_fld", "budget_prog", "budget_tot",
                         "cf_tot_sources")) %>% 
  select(-definition) %>% 
  pivot_wider(names_from = variable, 
              values_from = value,
              values_fill = 0) %>% 
  mutate(gap=budget_tot-cf_tot_sources)


tb_expen_variables <- import(files_db[20]) %>%  
  colnames() %>% 
  as_tibble() %>%
  rename(columns_tb=1)%>% 
  left_join(tb_dictionary)


tb_expenditure <- import(files_db[20]) %>% 
  pivot_longer(7:ncol(.), names_to = "columns_tb", values_to = "value") %>% 
  left_join(tb_expen_variables) %>% 
  select(1:6,definition, variable=columns_tb, value) %>% 
  filter(variable %in% c("exp_lab","exp_staff","exp_fld", "exp_prog","exp_tot")) %>% 
  select(-definition) %>% 
  pivot_wider(names_from = variable, 
              values_from = value,
              values_fill = 0) 
  

#Final TB dataset

tb_dataset <- tb_estimates %>% 
  left_join(tb_cases) %>% 
  left_join(tb_budget) %>% 
  left_join(tb_expenditure) %>% 
  mutate(gap=1-new_cases_reported/estimated_cases,
         tb_rate=new_cases_reported/population*100000,
         budget_vs_expe=exp_tot/budget_tot) %>% 
  rename(country_code=iso3) %>% 
  select(-c(iso2, iso_numeric)) 


#GDP 

gdp_pc <- import(files_db[1], skip=3) %>% 
  clean_names() %>% 
  pivot_longer(5:ncol(.), names_to = "year", values_to = "value") %>% 
  mutate(year=str_sub(year, 2,nchar(year)) %>% as.numeric()) %>% 
  filter(year>=2000) %>% 
  select(country=1,country_code, year, gdp_usd_pc=value)

gdp_health <- import(files_db[3], skip=3) %>% 
  clean_names() %>% 
  pivot_longer(5:ncol(.), names_to = "year", values_to = "value") %>% 
  mutate(year=str_sub(year, 2,nchar(year)) %>% as.numeric(),
         value=value/100) %>% 
  filter(year>=2000) %>% 
  select(country=1,country_code, year, gdp_health_ptc=value)

gni_pc <- import(files_db[2], skip=3) %>% 
  clean_names() %>% 
  pivot_longer(5:ncol(.), names_to = "year", values_to = "value") %>% 
  mutate(year=str_sub(year, 2,nchar(year)) %>% as.numeric()) %>% 
  filter(year>=2000) %>% 
  select(country=1,country_code, year, gni_usd_pc=value)

gni_usd <- import(files_db[23], skip=3) %>% 
  clean_names() %>% 
  pivot_longer(5:ncol(.), names_to = "year", values_to = "value") %>% 
  mutate(year=str_sub(year, 2,nchar(year)) %>% as.numeric()) %>% 
  filter(year>=2000) %>% 
  select(country=1,country_code, year, gni_usd=value)

dgghe_usd <- import(files_db[4]) %>% 
  clean_names() %>% 
  pivot_longer(5:ncol(.), names_to = "year", values_to = "value") %>% 
  mutate(year=str_sub(year, 2,nchar(year)) %>% as.numeric()) %>% 
  filter(year>=2000) %>% 
  select(country=1,country_code, year, dgghe_pc=value) 


gdp_dataset <- gdp_pc%>% 
  left_join(gdp_health) %>% 
  left_join(gni_pc) %>% 
  left_join(gni_usd) %>% 
  left_join(dgghe_usd)

hdi_dataset <- import(files_db[9]) %>% 
  pivot_longer(6:ncol(.), names_to = "indicator", values_to = "value") %>% 
  select(-hdi_rank_2022) %>% 
  separate(indicator, sep="_", into=c("indicator", "year")) %>% 
  mutate(year=str_extract(year, "[0-9]+") %>% as.numeric()) %>% 
  filter(!is.na(year),
         indicator %in% c("hdi", "le", "gnipc")) %>% 
  pivot_wider(names_from = indicator,
              values_from = value,
              values_fill = 0) %>% 
  rename(country_code=iso3,
         life_expectancy=le,
         gni_pc_un=gnipc)

#Final database

tb_with_gdp <- tb_dataset %>% 
  left_join(gdp_dataset, by=c("country_code", "year"), relationship = "one-to-one")%>%
  left_join(hdi_dataset) %>% 
  mutate(country=country.x) %>% 
  select(country,country_code,region, everything(), -country.x, -country.y)


#Visualizations

g0 <- tb_with_gdp %>% 
  select(year, budget_tot, cf_tot_sources, exp_tot) %>% 
  group_by(year) %>% 
  reframe(across(where(is.numeric), \(x) sum(x, na.rm = T))) %>% 
  filter(year>=2018) %>% 
  pivot_longer(2:ncol(.), names_to = "ind", values_to = "n") %>% 
  mutate(ind=case_when(ind=="budget_tot"~"A. Total Budget",
                       ind=="cf_tot_sources"~"B. Expected funding",
                       ind=="exp_tot"~"C. Total actual expenditure")) %>% 
  group_by(year) %>% 
  mutate(gap=1-(n/lag(n,1)),
         gap=if_else(str_detect(ind, "B. "),gap,NA)) %>% 
  ggplot(aes(x=year, y=n, fill=ind))+
  geom_col(position="dodge", color="black")+
  ggrepel::geom_text_repel(aes(x=year, y=n, label=percent(gap,0.1)), vjust=-0.9, hjust=0)+
  scale_y_continuous(labels = unit_format(unit="B$", scale=1e-9, sep = ""))+
  scale_fill_brewer(palette = "BuPu")+
  tint::theme_tint(base_size = 12)+
  geom_hline(yintercept = 0)+
  labs(fill="In Billion USD$:",
       title="Funding of Tuberculosis response world-wide since 2018",
       caption = "Source: WHO. The percentage on top of the Expected funding bar is the funding gap of that year")+
  theme(panel.grid.major.y = element_line(color="snow2", linewidth = 0.1),
        axis.text = element_text(face="bold"),
        legend.position = "top",
        axis.title.y = element_blank())
  
  
  g1 <- g0+annotate("text", 
            x=min(g0[["data"]]$year), 
            y=max(g0[["data"]]$n)-max(g0[["data"]]$n)*0.001, 
            label=paste0("The funding gap in the period ranged from"," ",percent(min(g0[["data"]]$gap, na.rm = T),0.1)," to ",percent(max(g0[["data"]]$gap, na.rm = T),0.1), " of the budget."), hjust=0)



world <- tb_with_gdp %>% 
  select(country, region,population, year, new_cases_reported,  cf_tot_sources, budget_tot) %>% 
  group_by(year) %>% 
  reframe(across(where(is.numeric), \(x) sum(x,na.rm=T))) %>% 
  filter(year>2018) %>% 
  mutate(gap_fun=1-(cf_tot_sources/budget_tot),
         tb_rate=new_cases_reported/population*100000,
         country="World",
         region="World") %>% 
  filter(year==max(year)) %>% 
  select(-new_cases_reported, -population)


g2_data <- tb_with_gdp %>% 
  group_by(country) %>% 
  fill(region) %>% 
  select(country, region,population, year, gap_cases=gap, budget_tot, cf_tot_sources, tb_rate) %>% 
  filter(year>2018) %>% 
  mutate(gap_fun=1-(cf_tot_sources/budget_tot)) %>% 
  group_by(country) %>% 
  filter(year==max(year),
         !is.na(budget_tot),
         !is.na(cf_tot_sources),
         gap_fun>=0) %>% 
  bind_rows(world)

g2 <- g2_data %>% 
  ggplot(aes(x=gap_fun, y=tb_rate,label=country, size=population))+
  geom_point(alpha=0.4)+
  scale_x_continuous(label=percent, "Funding gap (budgeted vs expected funding)")+
  scale_y_continuous(label=comma, "TB cases per 100,000 pop.")+
  scale_size_continuous(labels = unit_format(unit="M.", scale=1e-6, sep = ""))+
  geom_smooth(method = "lm", se=F, color="darkgrey", show.legend = F)+
  ggrepel::geom_text_repel(show.legend = F)+
  geom_point(data=world %>% filter(year==max(year)), size=4, color="red", 
             aes(x=gap_fun, y=tb_rate),
              inherit.aes = F)+
  labs(title="Tuberculosis per population vs. funding gap in 2023",
       subtitle = "(Just the countries with funding gap from 0% and above)",
       size="Size Population")+
    tint::theme_tint(base_size = 12)+
   theme(panel.grid.major = element_line(color="snow3", linewidth = 0.1),
        axis.text = element_text(face="bold"),
        axis.title = element_text(face="bold", size=14),
        legend.position = "top")

g3 <- tb_with_gdp %>% 
  group_by(year) %>% 
  reframe(across(where(is.numeric), \(x) sum(x, na.rm = T))) %>% 
  mutate(tb_rate=new_cases_reported/population*100000,
         estimated_rate_100k=estimated_cases/population*100000) %>% 
  select(year, tb_rate, estimated_rate_100k) %>% 
  pivot_longer(tb_rate:ncol(.), names_to = "ind", values_to = "n") %>% 
  mutate(ind=recode(ind, "estimated_rate_100k"="Estimated TB rate",
                    "tb_rate"="Reported TB rate")) %>% 
  ggplot(aes(x=year, y=n, color=ind, linetype = ind))+
  geom_line()+
  geom_point(size=4, alpha=0.8)+
  labs(x="Years", y="TB rate per 100k hab.",
       caption="Source: WHO",
       color="Indicator",
       linetype="Indicator")+
   scale_color_brewer(palette = "BuPu")+
  tint::theme_tint(base_size = 14)+
    theme(panel.grid.major.y = element_line(color="snow3", linewidth = 0.1),
        axis.text = element_text(face="bold"),
        legend.position = "top",
        axis.line.x = element_line(linewidth =0.4))

g4 <- tb_with_gdp %>% 
  select(new_cases_reported, population, year) %>% 
  group_by(year) %>% 
  reframe(across(where(is.numeric), \(x) sum(x, na.rm = T))) %>% 
  mutate(tb_rate=new_cases_reported/population*100000) %>% 
  select(year, tb_rate) %>% 
  left_join(hdi_dataset %>% filter(country=="World"), by="year") %>% 
  select(year, tb_rate, life_expectancy, gni_pc_un, hdi) %>% 
  filter(!is.na(life_expectancy)) %>% 
  pivot_longer(2:ncol(.), names_to = "ind", values_to = "n") %>% 
  mutate(ind=recode(ind,
                    "gni_pc_un"="Gross National Income Per Capita (US Dollars)",
                    "life_expectancy"="Life Expectancy in years",
                    "tb_rate"="TB rate per 100k pop.",
         "hdi"="Human Development index"))%>% 
  ggplot(aes(x=year, y=n, color=ind))+
  geom_line(show.legend = F)+
  labs(caption = "source:WHO TB data and UN data")+
  geom_point(show.legend = F, size=4)+
  facet_wrap(~ind, scales="free", ncol=1)+
  scale_y_continuous( label=comma)+
   scale_color_brewer(palette = "BuPu")+
  labs(x="Years")+
  tint::theme_tint(base_size = 14)+
   theme(panel.grid.major.y = element_line(color="snow3", linewidth = 0.1),
        axis.text = element_text(face="bold"),
        axis.title = element_text(face="bold", size=14),
        legend.position = "top",
        strip.text = element_text(hjust = 0, face="bold"),
        axis.line.x = element_line(linewidth = 0.3),
        axis.title.y=element_blank())


g5 <- tb_with_gdp %>% 
  filter(year>=2016 & year<=2021) %>% 
  select(year, gdp_health_ptc, tb_rate, population, country, dgghe_pc) %>% 
  filter(!is.na(dgghe_pc)) %>% 
  group_by(year) %>% 
  mutate(avg_dgghe=mean(dgghe_pc, na.rm=T),
         avg_tbr=mean(tb_rate, na.rm=T)) %>% 
  ggplot(aes(x=dgghe_pc, y=tb_rate, size=population,
             label=country, 
             group = year))+
  geom_point(alpha=0.4, color="darkblue")+
   scale_x_continuous(label=dollar, "Domestic general government health expenditure per capita (current US$)")+
  scale_y_continuous(label=comma, "TB rate per 100k pop.")+
  scale_size_continuous(labels = unit_format(unit="M.", scale=1e-6, sep = ""))+
  geom_vline(aes(xintercept = mean(dgghe_pc, na.rm=T), 
                 group = year), linetype=2)+
  geom_hline(aes(yintercept = mean(tb_rate, na.rm=T), 
                 group = year), linetype=2)+
  ggrepel::geom_text_repel(show.legend = F, color="black", segment.curvature = -0.1,
                           max.overlaps  = 20)+
   facet_wrap(~year, scale="free_x")+
  geom_text(aes(x=5000, y=max(tb_rate, na.rm = T)-max(tb_rate, na.rm = T)*0.7, label=paste0(dollar(avg_dgghe),"\n TB rate: ",comma(avg_tbr)), group=year), size=5)+
     tint::theme_tint(base_size = 12)+
   theme(
        axis.text = element_text(face="bold"),
        axis.title = element_text(face="bold", size=14),
        axis.line=element_line(linewidth = 0.3),
        legend.position = "top",
         strip.text = element_text(hjust = 0.5, face="bold", size=16))+
  labs(size="Country's size Population",
       caption = "Source: World bank and WHO; text in $ is the average of the DGGHE per capita and average of TB rate")

 # Histogram of GDP per capita by year; named g6 so the g7 assignment below does not overwrite it
 g6 <- tb_with_gdp %>% 
   select(country,year,g_whoregion, tb_rate, new_cases_reported, population, gdp_usd_pc) %>%
   filter(year>2016) %>% 
   ggplot(aes(x=gdp_usd_pc, fill=g_whoregion))+
   geom_histogram(show.legend = F, color="white")+
   scale_x_continuous(labels = dollar)+
   scale_fill_brewer(palette = "BuPu")+
   facet_wrap(~year, scale="free_x")+
   tint::theme_tint()
 
 g7 <- tb_with_gdp %>% 
   select(country,year,g_whoregion, tb_rate, 
          new_cases_reported,
          population, gdp_usd_pc) %>%
   group_by(year) %>% 
   mutate(avg_gdp=mean(gdp_usd_pc, na.rm=T),
          avg_tbr=mean(tb_rate, na.rm=T)) %>% 
   filter(year==2023 | year==2003) %>% 
   ggplot(aes(x=gdp_usd_pc, y=tb_rate, size=population))+
   geom_point(aes(color=g_whoregion), alpha=0.4)+
   geom_smooth(se=F, method = "lm", show.legend = F)+
   scale_x_continuous(labels = dollar, "Gross domestic product per capita (US dollars)")+
    scale_y_continuous(labels = comma, "TB rate per 100,000 pop.")+
   scale_color_brewer(palette = "BuPu")+
   geom_text(aes(x=100000, y=200, label=paste0("GDP: ", dollar(avg_gdp),"\n TB rate: ",comma(avg_tbr))), size=7, hjust=1)+
    scale_size_continuous(labels = unit_format(unit="M.", scale=1e-6, sep = ""))+
   facet_wrap(~year)+
   tint::theme_tint()+
   theme(axis.line = element_line(linewidth = 0.1))+
   theme(
        axis.text = element_text(face="bold"),
        axis.title = element_text(face="bold", size=14),
        axis.line=element_line(linewidth = 0.3),
        legend.position = "top",
         strip.text = element_text(hjust = 0.5, face="bold", size=16))+
  labs(size="Country's size Population",
       color="Country's WHO region",
       caption = "Source: World bank and WHO")
 
 g8_data <- tb_with_gdp %>% 
   mutate(hdicode=case_when(hdicode %in% c("")~NA,
                            hdicode == "Low"~"a. Low",
                            hdicode == "Medium"~"b. Medium",
                            hdicode == "High"~"c. High",
                            hdicode == "Very High"~"d. Very High",
                            .default = hdicode)) %>% 
   filter(!is.na(hdicode)) %>% 
   reframe(n_countries=n(),
           tb_rate=sum(new_cases_reported, na.rm = T)/sum(population, na.rm = T)*100000,
           avg_gdp_pc=mean(gdp_usd_pc, na.rm=T),
           ov_gdp=sum(gdp_usd_pc, na.rm = T), .by = c(hdicode, year),
           population=sum(population, na.rm = T)) %>% 
   filter(year %in% c(2000, 2022))
 
 
 g8 <- g8_data %>% 
   ggplot(aes(x=factor(year), 
              y=ov_gdp, 
              fill=hdicode,
              label=paste0("Avg. GDP: ",dollar(round(avg_gdp_pc,1)),
                           "\n TB rate: ",round(tb_rate, 1), " per 100k pop.")))+
   geom_col(alpha=0.5, color="black")+
   facet_wrap(~hdicode, scale="free")+
   scale_fill_brewer(palette = "BuPu")+
   geom_hline(yintercept = 0)+
   geom_text(position = position_stack(vjust = 0.5))+
  geom_text(aes(x=factor(year), y=ov_gdp, label=paste0(dollar(ov_gdp), " B.")),
            vjust=-0.2,
            fontface="bold")+
   tint::theme_tint()+
   theme(
        axis.text.x = element_text(face="bold", size=16),
        legend.position = "none",
        strip.text = element_text(hjust = 0.5, face="bold", size=16),
        axis.title=element_blank(),
        axis.text.y=element_blank(),
        axis.line = element_blank(),
        axis.ticks = element_blank())

Act Phase

Guiding questions:

What is your final conclusion based on your analysis?
How could your team and business apply your insights?
What next steps would you or your stakeholders take based on your findings?
Is there additional data you could use to expand on your findings?

Deliverable: Your top high-level insights based on your analysis, a list of additional deliverables you think would be helpful to include for further exploration

What type of company does your client represent, and what are they asking you to accomplish?

What are the key factors involved in the business task you are investigating?

What type of data will be appropriate for your analysis?

Where will you obtain that data?

Who is your audience, and what materials will help you present to them effectively?

Deliverables

Statement of the business
Data sources descriptions
Data processing procedures
Data analysis
Data visualizations and key findings
Conclusions
Recommendations

World Tuberculosis Day

Leonel Lerebours

2025-03-24
