Final Project Outline Component

Allison Porambo

Part 1 - The Introduction

  1. Background on Studies of Pharmaceutical Spending Per Capita and Its Relationship to GDP.

    1. History of scholarship on pharmaceutical spending per capita by country.
      1. Origins.
      2. Current state of scholarship/research.
    2. Useful terminology:
      1. Gross Domestic Product (GDP): The monetary value of all final goods and services produced in a geographical area in a given period of time.
      2. Per Capita: A Latin phrase meaning “per head”; often used similarly to “per person.”
  2. Datasets

    1. Sources:
      1. Pharmaceutical Drug Spending by Countries - Organization for Economic Co-operation and Development (OECD).
      2. Country, Regional, and World GDP (Gross Domestic Product) - World Bank and Organization for Economic Co-operation and Development.
    2. The data in both datasets were the results of observational studies. No treatment was applied to the population; only variables were recorded.
    3. Potential Bias - The pharmaceutical spending dataset only includes countries that were members of the OECD at the time the data was pulled. Although this technically may not be “bias” since the source of the data doesn’t claim to track variables in non-member countries, it will limit the analysis derived from it to OECD members only. The relationships described in the analysis may not be helpful when looking at countries outside of the organization.
  3. (Potential) Statistics to be Used:

    1. GDP-C - Sample mean of annual GDP by country
    2. GDP-R - Sample mean of annual GDP by region
    3. PER-CAP-PS-C - Sample mean of annual per-capita pharmaceutical spending by country
    4. PER-CAP-PS-R - Sample mean of annual per-capita pharmaceutical spending by region
    5. PC_HXP-C - Sample mean of annual percentage of total healthcare spending that goes towards pharmaceuticals by country
    6. PC_HXP-R - Sample mean of annual percentage of total healthcare spending that goes towards pharmaceuticals by region
    7. PC_GDP-C - Sample mean of annual percentage of GDP that goes towards pharmaceutical spending by country
    8. PC_GDP-R - Sample mean of annual percentage of GDP that goes towards pharmaceutical spending by region
  4. Variables Included in Dataset:

    1. country_code - A shorter means of indicating the country in question without writing out the country’s entire name.
    2. year - Year for which the values were recorded.
    3. pc_healthxp - Pharmaceutical spending as percentage of healthcare spending overall.
    4. pc_gdp - Pharmaceutical spending as percent of GDP.
    5. usd_cap - Annual per-capita pharmaceutical spending in US dollars.
    6. flag_codes - A character code flagging an observation that carries a special condition or caveat.
      1. “D” - (possibly) related to an OECD disclaimer about how the Israeli population was enumerated.
      2. “B” - (unknown at this time)
      3. “P” - (unknown at this time)
    7. total_spend - Annual per-capita healthcare spending in US dollars.
    8. country_name - Name of the country in question.
    9. value - Annual GDP in US dollars.
  5. Background Research

    1. Previous research on the topic.
    2. 5 facts
      1. Unlike most developed countries, the United States of America sets drug prices in a largely decentralized manner. Pharmaceutical prices are set through the actions of, and negotiations among, the federal government, private insurance companies, pharmaceutical manufacturers, drug wholesalers, pharmacies, and pharmacy benefit managers (PBMs). (Lakdawalla 2018, 427)
      2. Outside of the United States, pharmaceutical price-setting systems usually fall into one of two categories: reference pricing and price controls. Reference pricing refers to systems in which drugs are reimbursed at a level equal to the price of a comparable reference product. In a price control system, the prices that manufacturers can charge are explicitly limited. (Lakdawalla 2018, 432-433)
      3. Countries considered emerging markets often lack public or private insurance systems. Pharmaceutical manufacturers tend to sell directly to consumers. Differences between the purity and efficacy of doses of the same drug aren’t uncommon. (Lakdawalla 2018, 435)
      4. Inefficient medication and medical waste can increase pharmaceutical spending over time. In order to avoid such waste, governments can, among other actions, regulate the purity of medicines and create incentives for healthcare providers and patients to use generics. (Garcia-Goni 2022, 18, 32)
      5. The United States remains the only industrialized country in the world that doesn’t have drug price restraints set in place by the government and, save for New Zealand, is the only industrialized country to permit drug manufacturers to advertise directly to consumers. (Spitz and Wickham 2012, 9-10)
  6. Overarching Questions about the Dataset

    1. Is there a relationship between national GDP and per-capita pharmaceutical spending? If so, what form does it take, and how strong is it?
    2. Is there a difference between the mean GDP of regions (Western Europe, Eastern Europe, North America, East Asia, Middle East)? If so, what form does it take and how strong is it?
    3. How do any outliers bear on the general population? How can they be interpreted in relation to the other observations?

Part 2 - Your Work with the Data

  1. Initial Summary Graphs:

Loading Library and Importing Data

library(tidyverse) # Loads library.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("/Users/Owner/Desktop/Data Science Certificate/Statistics for Scientists/Final Project")
# Sets work directory.
ui <- read_csv("flat-ui_data.csv") # Imports and renames dataset.
## Rows: 1036 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): LOCATION, FLAG_CODES
## dbl (5): TIME, PC_HEALTHXP, PC_GDP, USD_CAP, TOTAL_SPEND
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
gdp <- read_csv("gdp.csv")
## Rows: 11507 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Country Name, Country Code
## dbl (2): Year, Value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(ui) # Returns a few rows at the head of the dataset.
head(gdp)

Matching the Names of Shared Variables

names(gdp) <- tolower(names(gdp)) # makes all headers lower-case
names(ui) <- tolower(names(ui))
names(gdp) <- gsub(" ", "_", names(gdp))  # replaces spaces in header names with underscores
names(ui) <- gsub(" ", "_", names(ui))
ui1 <- ui |>          # match variable names: time from ui to year from gdp datasets
  rename("year" = time,"country_code" = location)

Merging the Datasets

joined <- left_join(ui1, gdp, by = c("country_code", "year")) #joins the datasets "gdp" and "ui1" via the matching variables "country code" and "year".
head(joined)
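
Because left_join() keeps every row of the left-hand table and fills the GDP columns with NA where no match exists, it may be worth counting unmatched rows before analysis. The following is a minimal sketch, not the author's code: ui_demo and gdp_demo are hypothetical stand-ins for ui1 and gdp, and their values are made up.

```r
library(dplyr)

# Hypothetical stand-ins for ui1 and gdp; all values are made up for illustration.
ui_demo <- data.frame(
  country_code = c("AUS", "AUS", "XYZ"),
  year         = c(2014, 2015, 2015),
  usd_cap      = c(600, 620, 300)
)
gdp_demo <- data.frame(
  country_code = c("AUS", "AUS"),
  year         = c(2014, 2015),
  value        = c(1.47e12, 1.35e12)
)

joined_demo <- left_join(ui_demo, gdp_demo, by = c("country_code", "year"))

# Rows of the spending table that found no GDP match.
unmatched <- anti_join(ui_demo, gdp_demo, by = c("country_code", "year"))

# After the join, unmatched rows show up as NA in the GDP column.
n_missing <- sum(is.na(joined_demo$value))

print(unmatched)
print(n_missing)
```

In the actual data, the regression output reported later shows 1034 residual degrees of freedom from 1036 rows, which is consistent with no rows being dropped for missing values.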

Histogram of Annual GDP Values in Dataset

All of the recorded annual GDP values in the “joined” dataset are summarized in the histogram below. The values are strongly right-skewed: most of the recorded values are concentrated at the lower end of the range of GDP values.

hist(joined$value)

Histogram of Annual Per-Capita Pharmaceutical Spending Values in Dataset

The recorded annual per-capita pharmaceutical spending values, like those for annual GDP, are strongly right-skewed. The very different x-axis scales of the two histograms, however, make the pharmaceutical spending data appear more widely spread than the GDP data. The x-axis of the per-capita pharmaceutical spending histogram ranges from $0 to $1,200, while the x-axis of the GDP histogram ranges from $0 to ~$2.5 trillion. If the pharmaceutical spending data were graphed on the same x-axis as the GDP histogram, they would appear much more compressed than they do in their own histogram.

hist(joined$usd_cap)
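
One common way to make strongly right-skewed, strictly positive values easier to read is to plot them on a log scale. The sketch below uses simulated values rather than the real data; gdp_sim and its distribution parameters are assumptions chosen only to produce a similar shape.

```r
# Simulated right-skewed values standing in for joined$value (not real GDP data).
set.seed(1)
gdp_sim <- rlnorm(500, meanlog = 26, sdlog = 1.5)

# Raw scale: most observations pile up in the first few bins.
hist(gdp_sim, main = "Raw scale", xlab = "GDP, USD")

# Log10 scale: the same values spread more evenly across the axis.
hist(log10(gdp_sim), main = "Log10 scale", xlab = "log10(GDP, USD)")

# A quick numeric signal of right skew: the mean sits well above the median.
skew_ratio <- mean(gdp_sim) / median(gdp_sim)
print(skew_ratio)
```

The same transformation could be tried on joined$value and joined$usd_cap, provided neither contains zeros or negative values.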

Box and Whisker Plot of Annual GDP Values in Dataset

All of the recorded annual GDP values in the “joined” dataset are summarized in the box plot below, with the box colored light pink for contrast. The values are almost completely concentrated at the bottom of the y-axis, in this case mostly under half a billion dollars in GDP. The outliers, however, range as high as ~$1.5 trillion.

Although the data is plotted in a different format, it appears to repeat and confirm the distribution shown in the histogram of the gdp values.

boxplot(joined$value, col = "lightpink")

Box and Whisker Plot of Annual Per-Capita Pharmaceutical Spending Values in Dataset

Like the histogram of the per-capita pharmaceutical spending values, the box plot of those same values appears to have a wider spread than the data in the annual GDP box plot. The scales on the y-axes of the two box plots, however, are very different: the y-axis of the per-capita pharmaceutical spending plot ranges from $0 to $1,200, while the y-axis of the GDP box plot ranges from $0 to over $1.5 trillion. If the pharmaceutical spending data were graphed on the same y-axis as the GDP box plot, they would appear much more compressed than they do in their own box plot.

Although the data is plotted in a different format, it appears to repeat and confirm the distribution shown in the histogram of the per-capita pharmaceutical spending values.

boxplot(joined$usd_cap, col = "lightpink")

Combined Line Graph for the Change Over Time in Per-Capita Pharmaceutical Spending from Each Country

The changes in per capita pharmaceutical spending over the years for each country in the “joined” dataset are visualized in a combined line graph below. Though most of the lines curve up slowly and gradually from the x-axis, an extreme outlier is visible in the upper right-hand corner of the graph. Looking at the dataset alongside this graph, it appears that this outlier may represent the changes in per-capita pharmaceutical spending from the United States.

ggplot(data = joined, aes(x = year, y = usd_cap, group = country_code)) +  # bare column names; aes() looks them up in `joined`
  geom_line() +
  labs(x= "Year", y = "Pharmaceutical Spending per Capita, USD", title = "Change in Pharmaceutical Spending Per Capita Over Time")
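
Rather than reading the outlier off the graph, the country behind the highest line can be looked up directly. This is a minimal sketch on a hypothetical miniature of the joined table; the country codes and values in joined_demo are made up.

```r
# Hypothetical miniature of `joined`; all values are made up for illustration.
joined_demo <- data.frame(
  country_code = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  year         = c(2014, 2015, 2014, 2015, 2014, 2015),
  usd_cap      = c(400, 420, 950, 1100, 380, 390)
)

# Row holding the single highest per-capita spending value.
top_row <- joined_demo[which.max(joined_demo$usd_cap), ]

# Peak spending per country, sorted from highest to lowest.
peaks <- aggregate(usd_cap ~ country_code, data = joined_demo, FUN = max)
peaks <- peaks[order(-peaks$usd_cap), ]

print(top_row)
print(peaks)
```

Running the same two steps on joined would confirm or rule out the United States as the source of the outlying line.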

  2. Summary Statistics

Linear model of relationship between GDP and Per-Capita Pharmaceutical Spending

linear_reg <- lm(data = joined, usd_cap ~ value)
summary(linear_reg)
## 
## Call:
## lm(formula = usd_cap ~ value, data = joined)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -286.2 -165.3  -30.6  142.5  765.9 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.538e+02  6.089e+00   41.67   <2e-16 ***
## value       5.353e-11  2.933e-12   18.25   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 182 on 1034 degrees of freedom
## Multiple R-squared:  0.2437, Adjusted R-squared:  0.243 
## F-statistic: 333.2 on 1 and 1034 DF,  p-value: < 2.2e-16

The p-value for the GDP coefficient is effectively zero (< 2e-16): if GDP and per-capita pharmaceutical spending were truly unrelated, a slope this far from zero would almost never arise by chance, so the positive association is statistically significant. The adjusted R² of 0.243, however, means that the model explains only 24.3% of the variation in per-capita pharmaceutical spending. These two results are not contradictory: a relationship can be clearly distinguishable from zero and still leave most of the variation unexplained. A more effective linear model of the relationship may exist after the addition of more explanatory variables.
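
The slope is hard to read in its raw units (dollars of per-capita spending per single dollar of GDP). The short calculation below rescales it, using only the two coefficients printed in the summary output; the $1 trillion GDP figure is a hypothetical input chosen for illustration.

```r
# Coefficients copied from the summary(linear_reg) output above.
intercept <- 2.538e+02   # predicted usd_cap when GDP is 0
slope     <- 5.353e-11   # change in usd_cap per additional US dollar of GDP

# Rescale the slope to a more readable unit: per trillion dollars of GDP.
slope_per_trillion <- slope * 1e12

# Predicted per-capita spending for a hypothetical country-year with $1 trillion GDP.
pred_1_trillion <- intercept + slope * 1e12

print(slope_per_trillion)  # 53.53
print(pred_1_trillion)     # 307.33
```

Under this model, each additional $1 trillion of GDP is associated with about $53.53 more in annual per-capita pharmaceutical spending. The residual standard error of 182 is large relative to that amount, which fits the low R² value.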

  3. Parameters to Estimate:

    1. Mean GDP by country in 2015
    2. Mean GDP by region in 2015
    3. Mean per-capita pharmaceutical spending by country in 2015.
    4. Mean per-capita pharmaceutical spending by region in 2015.
    5. Mean annual percentage of total healthcare spending that goes towards pharmaceuticals by country in 2015.
    6. Mean annual percentage of total healthcare spending that goes towards pharmaceuticals by region in 2015.
    7. Mean annual percentage of GDP that goes towards pharmaceutical spending by country in 2015.
    8. Mean annual percentage of GDP that goes towards pharmaceutical spending by region in 2015.
    9. Change in mean GDP by country from 1970 to 2015.
    10. Change in mean GDP by region from 1970 to 2015.
    11. Change in mean per-capita pharmaceutical spending by country from 1970-2015.
    12. Change in mean per-capita pharmaceutical spending by region from 1970-2015.
    13. Change in mean annual percentage of total healthcare spending that goes towards pharmaceuticals by country from 1970-2015.
    14. Change in mean annual percentage of total healthcare spending that goes towards pharmaceuticals by region from 1970-2015.
    15. Change in mean annual percentage of GDP that goes towards pharmaceuticals by country from 1970-2015.
    16. Change in mean annual percentage of GDP that goes towards pharmaceuticals by region from 1970-2015.
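
The regional estimates in this list need a country-to-region mapping, which neither dataset provides. The sketch below shows one way the regional means and their change over time might be computed. Both joined_demo and region_lookup are hypothetical, with made-up values and placeholder region names.

```r
library(dplyr)

# Hypothetical miniature of `joined`; all values are made up for illustration.
joined_demo <- data.frame(
  country_code = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  year         = c(1970, 2015, 1970, 2015, 1970, 2015),
  usd_cap      = c(10, 500, 20, 700, 5, 300)
)

# Hypothetical region assignment; the real mapping would have to be built by hand.
region_lookup <- data.frame(
  country_code = c("AAA", "BBB", "CCC"),
  region       = c("Region 1", "Region 1", "Region 2")
)

# Mean per-capita spending for each region in each year.
by_region_year <- joined_demo |>
  left_join(region_lookup, by = "country_code") |>
  group_by(region, year) |>
  summarise(mean_usd_cap = mean(usd_cap, na.rm = TRUE), .groups = "drop")

# Change in the regional mean between 1970 and 2015.
change_by_region <- by_region_year |>
  group_by(region) |>
  summarise(
    change = mean_usd_cap[year == 2015] - mean_usd_cap[year == 1970],
    .groups = "drop"
  )

print(by_region_year)
print(change_by_region)
```

The same pattern would apply to value, pc_healthxp, and pc_gdp by swapping the column passed to mean(). Countries missing either endpoint year would need separate handling.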
  4. Statistical Analysis

    1. Linear Regression Modeling
ggplot(joined, aes(x=value, y=usd_cap))+ 
  geom_point(aes(color = year), alpha = 0.5)+  # alpha is a fixed setting, so it goes outside aes()
  scale_color_gradient(low = "blue",high = "orange")+
  geom_smooth(method = "lm")+
  theme_bw()+
  labs(x="GDP, USD",
       y="Pharmaceutical Spending Per Capita, USD",
       title = "Scatterplot of Gross Domestic Product to Average Annual Pharmaceutical Spending Per Capita (USD)")
## `geom_smooth()` using formula = 'y ~ x'

As predicted with the initial summary statistics, the linear model for the relationship with only one explanatory variable is not strong. The color gradients added to the data points to indicate the passage of the years, however, seem to suggest that in general per-capita pharmaceutical spending increased over time.
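
Since the color gradient suggests that spending rose over time, year is a natural candidate for a second explanatory variable. The sketch below compares a one-predictor and a two-predictor model on simulated data. The data frame demo and the formula used to generate it are assumptions, so the numbers it prints say nothing about the real dataset.

```r
# Simulated stand-in for `joined` (not real data): spending rises with both GDP and year.
set.seed(42)
n <- 200
demo <- data.frame(
  year  = sample(1970:2015, n, replace = TRUE),
  value = rlnorm(n, meanlog = 26, sdlog = 1)
)
demo$usd_cap <- 50 + 8 * (demo$year - 1970) +
  40 * log10(demo$value / 1e11) + rnorm(n, sd = 40)

fit_simple   <- lm(usd_cap ~ value, data = demo)
fit_multiple <- lm(usd_cap ~ value + year, data = demo)

# Adjusted R-squared for each model; higher means more variation explained.
adj_r2 <- c(
  simple   = summary(fit_simple)$adj.r.squared,
  multiple = summary(fit_multiple)$adj.r.squared
)
print(adj_r2)

# Nested-model F test: does adding `year` significantly improve the fit?
print(anova(fit_simple, fit_multiple))
```

On the real data, the equivalent call would be lm(usd_cap ~ value + year, data = joined), with the adjusted R² compared against the 0.243 reported above.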

    2. (Multiple Linear Regression Modeling)
    3. (Bootstrap Confidence Interval for Mean - IF conditions are met)
    4. (ANOVA - IF conditions are met)
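
For the planned bootstrap interval, a percentile bootstrap for a mean can be written in a few lines of base R. This is a sketch on a simulated sample, not the project's data; x, the sample size, and the 2,000 resamples are all assumptions.

```r
# Simulated right-skewed sample standing in for usd_cap in a single year (not real data).
set.seed(123)
x <- rlnorm(40, meanlog = 6, sdlog = 0.5)

# Resample the data with replacement many times, recording the mean each time.
n_boot <- 2000
boot_means <- replicate(
  n_boot,
  mean(sample(x, size = length(x), replace = TRUE))
)

# 95% percentile interval: the middle 95% of the resampled means.
ci <- quantile(boot_means, probs = c(0.025, 0.975))

print(mean(x))
print(ci)
```

The percentile method assumes the sample is reasonably representative of the population and that observations are independent, which would need checking for repeated yearly measurements of the same countries.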

Part 3

  1. General Conclusions from Analysis
    1. (Answer to #6A)
    2. (Answer to #6B)
    3. (Answer to #6C)
  2. Important Statistical Summary Results
    1. (P-value[s])
    2. (Confidence interval[s])
    3. (Regression model equation[s])
    4. (Other important findings)
  3. Specific Conclusions Regarding Implications of Results
    1. (Implications of results to healthcare spending by region/country)
    2. (Any personal opinions)
  4. Opinion of the Overall Analysis
    1. (Overall opinion)
    2. If I had the data, I would include every country for which data are available, not just those in the OECD. Because the dataset is limited to OECD countries, the relationship between per-capita pharmaceutical spending and GDP in the rest of the world remains elusive.
    3. (What questions I was unable to answer through the analysis, and what deficiencies in the original data prevented me from reaching the answer)
  5. Bibliography
    1. Lakdawalla, Darius N. “Economics of the Pharmaceutical Industry.” Journal of Economic Literature 56, no. 2 (June 2018): 397-449.
    2. Bauer, Matthias. “The Compounding Effects of Tariffs on Medicines: Estimating the Real Cost of Emerging Markets’ Protectionism.” European Center for International Political Economy, no. 1 (2017): 1-45.
    3. Garcia-Goni, Manuel. “Rationalizing Pharmaceutical Spending.” Working Paper, International Monetary Fund, 2022.
    4. Spitz, Janet, and Mark Wickham. “Pharmaceutical High Profits: The Value of R&D, or Oligopolistic Rents?” The American Journal of Economics and Sociology 71, no. 1 (January 2012): 1-36.