Repository of Data Visualization Work

Author

Emily Leibfritz

Abstract

This document summarizes data visualizations I created from three data sets: data from the Centers for Disease Control, median income across the United States, and stock prices of technology companies. The R packages used for this work include tidyverse, readr, and viridis, among others.

The plot descriptions and takeaways are deliberately brief, since the focus of this repository is to show my data visualization work.

Disclaimer: The limitations of the demographic options presented in the plots are not a reflection of the author’s views but a reflection of the limitations of the data provided in the datasets.

Loading data

# loading libraries
library(tidyverse)
library(skimr)
library(devtools)
library(readr)
library(viridis)
library(lubridate)

# loading data
load("data/tech_stocks.rda")
load("data/US_income.rda")
CDC <- read_delim("data/cdc.txt", delim = "|")

Plots

Centers for Disease Control

ggplot(data = CDC, aes(x = genhlth, y = weight, 
                       colour = gender, fill = gender))+
  geom_violin()+
  labs(
    title = "Distribution of Weight by Gender and General Health",
    x = "General Health",
    y = "Weight (in lbs)"
  )+
  theme_minimal()

Plot summary: The violin plot above shows the distribution of weight for each general health category, split by sex.
Takeaways:
- Across all health conditions, men’s median weight is higher than women’s.
- The distribution of weight is more concentrated for the “excellent” and “very good” health groups, whereas it is more spread out for “poor” and “fair”.
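
The takeaways above refer to medians and spread, which a violin plot only shows indirectly. One way to check them numerically is to summarise each group directly. The following is a minimal sketch on a made-up stand-in table (`toy_cdc` and its values are hypothetical, not the CDC data); the same pipeline could presumably be applied to `CDC` itself.

```r
library(dplyr)

# Toy stand-in for the CDC data; these values are made up for illustration only.
toy_cdc <- tibble(
  genhlth = rep(c("excellent", "poor"), each = 6),
  gender  = rep(rep(c("m", "f"), each = 3), times = 2),
  weight  = c(170, 180, 190, 120, 130, 140,
              160, 200, 240, 110, 150, 190)
)

# Median and interquartile range of weight for each health/gender group
toy_summary <- toy_cdc %>%
  group_by(genhlth, gender) %>%
  summarise(
    median_wt = median(weight),
    iqr_wt    = IQR(weight),
    .groups   = "drop"
  )

print(toy_summary)
```

A larger interquartile range corresponds to a wider, more spread-out violin.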

ggplot(data = CDC, aes(x = height, y = weight))+
  # after_stat(level) replaces the deprecated ..level.. notation
  stat_density_2d(geom = "polygon", aes(fill = after_stat(level)))+
  facet_wrap(~ gender)+
  theme_minimal()+
  labs(
    x = "Height (in)",
    y = "Weight (lbs)", 
    title = "Distribution of Height over Weight by Sex"
  )

Plot summary: The joint density of height and weight is shown in two panels, one per sex.
Takeaways:
- For men, the density peaks at a greater height and weight than for women.
- Men’s weights are more spread out, whereas women’s weights fall within a narrower range.

# Data wrangling
# calculate mean, se, and margin of error for CI formula
cdc_weight_95ci <- CDC %>%
  group_by(genhlth, gender) %>%
  summarise(
    mean_wt = mean(weight),
    se = sd(weight) / sqrt(n()),
    moe = qt(0.975, n() - 1) * se,
    .groups = "drop"  # drop grouping so genhlth can be re-leveled below
  ) %>%
  mutate(genhlth = factor(genhlth,
                          levels = c("excellent", "very good", "good",
                                     "fair", "poor")))

# Create Plot
ggplot(data = cdc_weight_95ci, aes(x = mean_wt, y = gender,
                                   colour = genhlth))+
  geom_point(position = position_dodge(width = 0.5))+
  # error bars span the 95% confidence interval: mean +/- margin of error
  geom_errorbarh(aes(xmin = mean_wt - moe,
                     xmax = mean_wt + moe),
                 height = 0.1,  # constant bar height, so set outside aes()
                 position = position_dodge(width = 0.5))+
  labs(
    x = "Mean weight (lbs)",
    y = "Gender",
    color = "General Health\n(self reported)", 
    title = "Mean Weight with 95% Confidence Intervals by General Health"
  )+
  theme_minimal()

Plot summary: The graph shows the mean weight, with 95% confidence intervals, by health status and sex.
Takeaways:
- People with a poor health status have a higher mean weight, which may be linked to obesity-related health issues, and wider confidence intervals.
- Men with excellent health have the lowest mean weight among men, yet it is still higher than the mean weight of women with poor health, which is the highest among women.
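
The error bars in this plot come from the usual t-based confidence interval for a mean: mean plus or minus `qt(0.975, n - 1)` times the standard error. As a small sanity check of that formula, the sketch below applies it to a made-up vector of weights (the vector `w` is hypothetical, not taken from the CDC data) and compares the result with the interval reported by base R's `t.test()`.

```r
# Made-up weights (lbs) for a single group; not taken from the CDC data.
w <- c(150, 160, 170, 180, 190)

n       <- length(w)
mean_wt <- mean(w)                      # sample mean
se      <- sd(w) / sqrt(n)              # standard error of the mean
moe     <- qt(0.975, df = n - 1) * se   # margin of error for a 95% CI

ci_manual <- c(mean_wt - moe, mean_wt + moe)

# Base R's one-sample t-test reports the same interval
ci_ttest <- as.numeric(t.test(w)$conf.int)

all.equal(ci_manual, ci_ttest)
```

Because the margin of error shrinks with the square root of the group size, groups with fewer observations tend to show wider bars.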

US income

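# Data wrangling
US_income <- mutate(
  US_income,
  income_bins = cut(
    # missing median incomes are treated as 25000, i.e. placed in the lowest bin
    ifelse(is.na(median_income), 25000, median_income),
    # open-ended top bin so that incomes of 80000 or more are not turned into NA
    breaks = c(0, 40000, 50000, 60000, 70000, Inf),
    labels = c("< $40k", "$40k to $50k", 
               "$50k to $60k", "$60k to $70k", "> $70k"),
    right = FALSE
  )
)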
# Create Plot
ggplot(data = US_income, 
       mapping = aes(fill = income_bins, geometry = geometry))+
  geom_sf(colour = "grey80", size = 0.2)+
  theme_void()+
  scale_fill_viridis(discrete = TRUE)+
  labs(fill = "Median\nIncome",
       title = "Median Income by US State")

Plot summary: The map shows the median income in each US state.
Takeaways:
- Geographically, median income appears lower in the Southeast of the country. For further analysis, it would be important to see how this is connected to intersectional problems such as access to education and racial inequality.
- One of the states with the highest median income is Alaska, which might be connected to its available natural resources, such as oil.
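
The income bins on the map are built with `cut()` and `right = FALSE`, which makes each interval closed on the left, so a value exactly on a boundary falls into the higher bin. The sketch below illustrates this on a few hypothetical incomes (the `incomes` vector is made up). It assumes an open-ended top bin via `Inf`, and it leaves the missing value as `NA` rather than imputing it.

```r
# Hypothetical incomes, chosen to sit on and around the bin edges.
incomes <- c(39999, 40000, 69999, 70000, 85000, NA)

bins <- cut(
  incomes,
  breaks = c(0, 40000, 50000, 60000, 70000, Inf),  # Inf: assumed open-ended top bin
  labels = c("< $40k", "$40k to $50k",
             "$50k to $60k", "$60k to $70k", "> $70k"),
  right = FALSE   # intervals are closed on the left: [lower, upper)
)

# Show each income next to the bin it was assigned to
print(data.frame(incomes, bins))
```

With a finite top break such as 80000, the value 85000 would instead come back as `NA` and be drawn in the map's missing-value colour.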

Tech stocks

# Create caption, wrapped onto several lines (width = 40 is an arbitrary choice)
caption <- paste(
  strwrap("Stock price over time for four major tech companies", width = 40),
  collapse = "\n"
)

# find the data ranges used to position the caption
yrng <- range(tech_stocks$price_indexed)
xrng <- range(tech_stocks$date)

# create subdataset for the last day of each stock
label_info_2 <- tech_stocks %>%
  group_by(company) %>%
  filter(date == max(ymd(date)))

# make the plot
ggplot(data = tech_stocks, 
       mapping = aes(y = price_indexed, x = date))+
  geom_line(mapping = aes(colour = company),
            show.legend = FALSE)+
  labs(
    x = NULL,
    y = "Stock price, indexed"
  )+
  annotate(geom = "text",
           x = xrng[1],
           y = yrng[2],
           size = 4,
           label = caption,
           hjust = 0,
           vjust = 1,
           family = "serif")+
  theme_minimal()+
  geom_text(data = label_info_2,
            mapping = aes(label = company))

Plot summary: The graph shows the indexed stock price of four major technology companies (Alphabet, Apple, Facebook and Microsoft) from 2006 until 2018.
Takeaways:
- The stocks of all the technology companies have grown over time, showing their resilience to market swings.
- In 2008, the stocks of Apple, Alphabet, and Microsoft saw a short-term decrease, which was likely due to the 2008 financial crisis. For a fair assessment, it would be interesting to compare their declines with the average decline across the stock market.
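
One possible way to quantify the 2008 dip for such a comparison is the maximum drawdown, meaning the largest fractional drop of a price below its running peak. The following is only a sketch on made-up numbers: `toy_stocks`, the tickers `AAA` and `BBB`, and all prices are hypothetical, and only the column names mirror `tech_stocks`.

```r
library(dplyr)

# Made-up indexed prices for two hypothetical tickers; not real market data.
toy_stocks <- tibble(
  company       = rep(c("AAA", "BBB"), each = 5),
  date          = rep(as.Date("2008-01-01") + c(0, 90, 180, 270, 360), times = 2),
  price_indexed = c(100, 120, 60, 90, 110,
                    100, 110, 99, 105, 130)
)

# Drawdown = how far the price sits below its running peak, as a fraction
drawdowns <- toy_stocks %>%
  group_by(company) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(drawdown = 1 - price_indexed / cummax(price_indexed)) %>%
  summarise(max_drawdown = max(drawdown), .groups = "drop")

print(drawdowns)
```

Applying the same calculation to a broad market index would give the baseline against which each company's decline could be judged.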