setwd("~/Documents/0 - Montgomery College/0 - DATA 110/3 - Projects/Project 1")

library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ─────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
all_recall <- read_csv("FSIS-Recall-Summary-2014.csv")
## Parsed with column specification:
## cols(
##   `Recall Date` = col_double(),
##   `Recall Number` = col_character(),
##   `Recall Class` = col_character(),
##   Product = col_character(),
##   `Reason for Recall` = col_character(),
##   `Pounds Recalled` = col_double()
## )
str(all_recall)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 94 obs. of  6 variables:
##  $ Recall Date      : num  41649 41652 41654 41656 41656 ...
##  $ Recall Number    : chr  "001-2014" "002-2014" "003-2014" "004-2014" ...
##  $ Recall Class     : chr  "I" "I" "I" "II" ...
##  $ Product          : chr  "Mechanically Separated Chicken Products" "Various Beef Products" "Beef Franks" "Beef and Pork Products" ...
##  $ Reason for Recall: chr  "Salmonella" "Other" "Undeclared Allergen" "Undeclared Allergen" ...
##  $ Pounds Recalled  : num  33840 42103 2664 130000 67113 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Recall Date` = col_double(),
##   ..   `Recall Number` = col_character(),
##   ..   `Recall Class` = col_character(),
##   ..   Product = col_character(),
##   ..   `Reason for Recall` = col_character(),
##   ..   `Pounds Recalled` = col_double()
##   .. )
# Clean up the variable names - make lowercase, remove spaces, add underscore
names(all_recall) <- tolower(names(all_recall))
names(all_recall) <- gsub(" ","_", names(all_recall))
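The `Recall Date` column was parsed as a number (41649, 41652, ...) rather than a date. A minimal sketch of one way to convert it, assuming (this is my assumption, not something the FSIS file states) that the values are Excel serial day numbers counted from 1899-12-30:

```r
# Assumption: the numbers are Excel serial dates (days since 1899-12-30).
# The four values below are the first ones shown by str(all_recall).
serials <- c(41649, 41652, 41654, 41656)

recall_dates <- as.Date(serials, origin = "1899-12-30")
format(recall_dates, "%Y-%m-%d")
# "2014-01-10" "2014-01-13" "2014-01-15" "2014-01-17"
```

All four values land in January 2014, which fits a 2014 recall summary, so the assumption looks plausible. On the full dataset the same call could be used inside mutate() on the recall_date column.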

Explore the dataset for any potentially interesting insights

For the sake of simplicity in writing code and commentary, I will consider “meat” to include beef, pork, and chicken (even though chicken is technically considered poultry). It is far easier to call them all “meat.”

SECTION 1 - What’s in the data?

str(all_recall)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 94 obs. of  6 variables:
##  $ recall_date      : num  41649 41652 41654 41656 41656 ...
##  $ recall_number    : chr  "001-2014" "002-2014" "003-2014" "004-2014" ...
##  $ recall_class     : chr  "I" "I" "I" "II" ...
##  $ product          : chr  "Mechanically Separated Chicken Products" "Various Beef Products" "Beef Franks" "Beef and Pork Products" ...
##  $ reason_for_recall: chr  "Salmonella" "Other" "Undeclared Allergen" "Undeclared Allergen" ...
##  $ pounds_recalled  : num  33840 42103 2664 130000 67113 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Recall Date` = col_double(),
##   ..   `Recall Number` = col_character(),
##   ..   `Recall Class` = col_character(),
##   ..   Product = col_character(),
##   ..   `Reason for Recall` = col_character(),
##   ..   `Pounds Recalled` = col_double()
##   .. )

Although there are only 94 observations in this dataset, which is relatively small, it is still hard to get an idea of what food products are included from a cursory glance.

But this quick glance does reveal there are duplicate product names. The duplicates are not due to an input error. They are due to different reasons for recall. For instance, there are “Beef Products” recalls due to “Extraneous Materials” and “Beef Products” recalls due to “Processing Defects.”

Nevertheless, to make this initial exploration simpler, I will aggregate the duplicates to create a smaller dataset that can be analyzed further later.

# Show total pounds recalled for each unique product name in the dataset.
# Aggregate using the group_by and summarize functions together. 

unique_prod_total <- all_recall %>%
  group_by(product) %>%
  summarize(total = sum(pounds_recalled)) %>%
  arrange(desc(total))

# NOTE TO SELF - Try doing this with the aggregate function later
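For the note above, here is a rough sketch of the same roll-up using base R's aggregate() with a formula. The toy data frame and its pound values are made up for illustration; only the column names match the dataset.

```r
# Toy stand-in for all_recall (values are hypothetical)
toy_recall <- data.frame(
  product = c("Beef Products", "Beef Products", "Beef Franks"),
  reason_for_recall = c("Extraneous Material", "Processing Defect",
                        "Undeclared Allergen"),
  pounds_recalled = c(100, 250, 40),
  stringsAsFactors = FALSE
)

# Sum pounds_recalled within each product name
prod_total <- aggregate(pounds_recalled ~ product, data = toy_recall, FUN = sum)

# Sort in descending order of total pounds, like arrange(desc(total))
prod_total <- prod_total[order(-prod_total$pounds_recalled), ]
prod_total
```

The formula reads as "pounds_recalled by product", so it plays the role of group_by() and summarize() in one call.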

The data frame has been reduced from 94 to 69 observations. So, about 27% of this dataset (25 of 94 rows) was duplicates. Worth cleaning. Easier to get a sense of the varieties of foods now.

It is a broad mix of packaged, prepared, and processed meat products. I’m not sure if fresh meat is included. The label “Various Beef Products” is vague. Unfortunately, the USDA FSIS does not provide any information on the product name descriptions.

SECTION 2 - RECALL CLASS

First look at the data by Recall Class. (Because it’s the easiest method.) There are only 3 classes (I, II, III). So filtering will be quick.

# Show the top 30 Class I recalls with at least 10K pounds recalled

class_I <- all_recall %>%
  filter(recall_class =="I" & pounds_recalled >= 10000) %>%
  arrange(desc(pounds_recalled)) %>%
  head(30)

# This tibble has 30 rows x 6 columns
# Show the top 30 Class II recalls with at least 10K pounds recalled

class_II <- all_recall %>%
  filter(recall_class =="II" & pounds_recalled >= 10000) %>%
  arrange(desc(pounds_recalled)) %>%
  head(30)

# This tibble has 11 rows x 6 columns, which means there are fewer Class II's.
# Show the top 30 Class III recalls with at least 10K pounds recalled

class_III <- all_recall %>%
  filter(recall_class =="III" & pounds_recalled >= 10000) %>%
  arrange(desc(pounds_recalled)) %>%
  head(30)

# This tibble has 3 rows x 6 columns, which means there are very few Class III's.
# Calculate total # of pounds recalled for each recall class (I, II, III)

class_total <- all_recall %>%
  group_by(recall_class) %>%
  summarize(total = sum(pounds_recalled)) %>%
  arrange(desc(total))
str(class_total)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3 obs. of  2 variables:
##  $ recall_class: chr  "I" "II" "III"
##  $ total       : num  14261888 3817387 595827
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Recall Date` = col_double(),
##   ..   `Recall Number` = col_character(),
##   ..   `Recall Class` = col_character(),
##   ..   Product = col_character(),
##   ..   `Reason for Recall` = col_character(),
##   ..   `Pounds Recalled` = col_double()
##   .. )
# This tibble has 3 rows x 2 columns
# Create a barplot of Total Pounds Recalled for each recall class

class_total_plot <- class_total %>%
  ggplot(aes(x = recall_class, y = total/1000)) +
  geom_col(fill="brown") +
  ylab("Total Pounds Recalled, in 000's") +
  xlab("USDA FSIS Recall Class") +
  ggtitle("Total Pounds of Meat-Based Foods Recalled for Each Recall Class, 2014")
class_total_plot

# FINALLY figured out how to specify the color of bars. 
# NOTE TO SELF (so you don't forget): Use fill= inside geom_col() to change the bar fill color. Using color= will NOT work; instead, it will specify a border color around the bars.

We can see most of the pounds recalled fall under Class I, and very few fall under Class III. This is not very interesting since I do not have descriptions of the 3 class types. (No luck finding information on them on the USDA FSIS site.)

# Create a new subset containing Class I Top 30 recalls, Class II Top 30 recalls, and Class III Top 30 recalls. 
# This is just practice using the bind_rows function, which will be used later.

class_top30_merged <- bind_rows(class_I, class_II, class_III) %>%
  arrange(recall_class,desc(pounds_recalled))

# This tibble has 44 rows x 6 columns
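The three filter-and-bind steps above could probably be collapsed into one grouped pipeline. This is only a sketch on a toy tibble (the pound values are invented), assuming dplyr is loaded:

```r
library(dplyr)

# Toy stand-in for all_recall (values are hypothetical)
toy <- tibble(
  recall_class    = c("I", "I", "I", "II", "II", "III"),
  pounds_recalled = c(50000, 12000, 800, 30000, 9000, 15000)
)

top_by_class <- toy %>%
  filter(pounds_recalled >= 10000) %>%                      # at least 10K pounds
  group_by(recall_class) %>%
  arrange(desc(pounds_recalled), .by_group = TRUE) %>%      # sort within each class
  filter(row_number() <= 30) %>%                            # keep up to 30 rows per class
  ungroup()

# How many rows survived in each class
count(top_by_class, recall_class)
```

Because the data is grouped, row_number() restarts at 1 inside each class, so the "top 30" cut is applied per class rather than to the whole table.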

Summary of Recall Class Exploration

Each recall class has a mixture of meat types (beef, chicken, pork) and reasons for their recall. It’s hard to see any trends this way.

Change the approach. Look at the dataset differently. Zoom in on each meat type instead. The three major meat types are: Beef, Chicken, and Pork. There is one entry for lamb.

Some of these products are a mixture of several meats. There is a smattering of products whose names do not contain the words beef, chicken, or pork. Examples: Sausage Products, Salami, Spiral Ham.

These products will be ignored to make this exercise simpler. So from now on, when I create a meat dataset, such as a “pork” dataset, I will be ignoring foods that don’t have pork in their names but that we know contain pork, e.g. sausage, ham.

First, let’s look at Beef.

SECTION 3 - BEEF

# List all the recalls containing "Beef" and arrange in descending order of pounds recalled
# Remove beef products containing chicken or pork. Resulting list should be 'just beef' products.

just_beef <- all_recall %>%
  filter(grepl("Beef", product) & !grepl("Chicken",product) & !grepl("Pork",product)) %>%
  arrange(desc(pounds_recalled))

# beef tibble is now 19 rows x 6 columns. Note there are duplicate product names due to different recall reasons.
# Refine 'just beef' dataset even more - sum the pounds recalled for each combination of product name and reason for recall.
# Note: A product name can still appear more than once, one row for each distinct reason for recall.
# Only display the product name, reason for recall, and total pounds recalled
# The list comes out ordered by product name in alphabetical order

beef_cleaned <- just_beef %>%
  group_by(product,reason_for_recall) %>%
  summarize(pounds_recalled = sum(pounds_recalled))

# beef tibble is now 14 rows x 3 columns (8 unique product names)
# Make barplot of 'just beef' dataset

beef_plot <- beef_cleaned %>%
  ggplot() +
  geom_bar(aes(x=product,y=pounds_recalled), stat="identity", fill="red") +
  theme(axis.text.x= element_text(angle=90))  # Change the angle of x-axis tick label
beef_plot

# The x-axis tick labels were originally horizontal. Changed to vertical because horizontal was messy and hard to read. 

As expected, the largest beef category is the catch-all bucket called “Various Beef Products.” This probably captures all the products that did not neatly fit into the other beef categories.

A question to ask the USDA – What is the difference between “Various Beef Products” and “Beef Products”?

SECTION 4 - CHICKEN

Do the same analysis for chicken.

# List all the recalls containing "Chicken" and arrange in descending order of pounds recalled
# Remove chicken products containing beef or pork. Resulting list should be 'just chicken' products.

just_chicken <- all_recall %>%
  filter(grepl("Chicken", product) & !grepl("Beef",product) & !grepl("Pork",product)) %>%
  arrange(desc(pounds_recalled))

# Chicken tibble is now 27 rows x 6 columns. There are duplicate product names.
# Refine 'just chicken' dataset even more - roll up duplicate product names so that the resulting tibble contains only unique product names, and the duplicates have been summed up.
# Only display the product name and total pounds recalled
# Arrange the list by product name alphabetical order 

chicken_cleaned <- just_chicken %>%
  group_by(product) %>%
  summarize(pounds_recalled = sum(pounds_recalled))

# Chicken tibble is now 21 rows x 2 columns

Yikes - “mechanically separated chicken products” was recalled.

# Make barplot of 'just chicken' dataset

chicken_plot <- chicken_cleaned %>%
  ggplot() +
  geom_bar(aes(x=product,y=pounds_recalled), stat="identity", fill="orange") +
  theme(axis.text.x= element_text(angle=90))  # Change the angle of x-axis tick label
chicken_plot

The largest chicken category is Frozen Chicken Products, by a wide margin. Interesting, I had no idea frozen chicken was even a big item in the freezer aisle. Could Frozen Chicken Products include chicken distributed to fast food chains? McDonald’s Chicken McNuggets?

SECTION 5 - PORK

Do the same analysis for pork.

# List all the recalls containing "Pork" and arrange in descending order of pounds recalled
# Remove pork products containing beef or chicken. Resulting list should be 'just pork' products.

just_pork <- all_recall %>%
  filter(grepl("Pork", product) & !grepl("Beef",product) & !grepl("Chicken",product)) %>%
  arrange(desc(pounds_recalled))

# Pork tibble is now 15 rows x 6 columns. There are duplicate product names.
# Refine 'just pork' dataset even more - roll up duplicate product names so that the resulting tibble contains only unique product names, and the duplicates have been summed up.
# Only display the product name and total pounds recalled
# Arrange the list by product name alphabetical order 

pork_cleaned <- just_pork %>%
  group_by(product) %>%
  summarize(pounds_recalled = sum(pounds_recalled))

# Pork tibble is now 10 rows x 2 columns

I have just caught a minor data entry error. There is a duplicate entry: “Pork products” was not merged into “Pork Products” because the second word was not capitalized. The quickest way to fix this is to edit the csv file.

But I am leaving it in as a reminder of the challenges of working with inconsistent data formatting.

FOLLOW-UP FOR DOWN THE ROAD - How to clean the data entries (not just the variables)?
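One possible answer to the follow-up above, without editing the CSV: build a lowercase grouping key so that “Pork products” and “Pork Products” land in the same group. This is a sketch on a toy tibble; the product names come from the dataset but the pound values are invented, and the product_key column is a helper name I made up.

```r
library(dplyr)

# Toy stand-in for just_pork (pound values are hypothetical)
toy_pork <- tibble(
  product = c("Pork Products", "Pork products", "Pork and Poultry Products"),
  pounds_recalled = c(1000, 200, 50)
)

pork_fixed <- toy_pork %>%
  mutate(product_key = tolower(trimws(product))) %>%   # case- and space-insensitive key
  group_by(product_key) %>%
  summarize(product = first(product),                  # keep one display spelling
            pounds_recalled = sum(pounds_recalled))
pork_fixed
```

Grouping on the key leaves the original spelling available for display while merging entries that differ only in capitalization or stray spaces.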

# Make barplot of 'just pork' dataset

pork_plot <- pork_cleaned %>%
  ggplot() +
  geom_bar(aes(x=product,y=pounds_recalled), stat="identity", fill="purple") +
  theme(axis.text.x= element_text(angle=90))  # Change the angle of x-axis tick label
pork_plot

Well, I caught another error. This time it’s mine. I did not filter out the word “Poultry” when I filtered out “Chicken.” And the only reason I am noticing it is because the second largest category is “Pork and Poultry Products.”

This is also interesting - what kinds of foods contain both pork and chicken together?? I would love to see the raw data that went into each of these product categories.
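A sketch of how the filter could catch “Poultry” as well: put all the other meats into a single regular expression with | (meaning "or") and ignore case. The product names below all appear in the dataset; the vector itself is just a small test case.

```r
products <- c("Pork Products", "Pork products",
              "Pork and Poultry Products", "Beef and Pork Products")

is_pork   <- grepl("pork", products, ignore.case = TRUE)
has_other <- grepl("beef|chicken|poultry", products, ignore.case = TRUE)

just_pork_names <- products[is_pork & !has_other]
just_pork_names
# "Pork Products" "Pork products"
```

Inside the pipeline, the same two grepl() calls could go in filter() in place of the three separate ones.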

Summary of Beef, Chicken, and Pork Explorations

In each meat group, the highest recalls are dominated by one category (product name):

- Various Beef Products
- Frozen Chicken Products
- Pork Products

I do not find these categorizations particularly illuminating, nor are the various product names within each meat group interesting. There isn’t enough information about the contents of the dominant categories. Also, most of the categories are more or less the same (some form of packaged, processed food).

For a food scientist, regulatory agency, or consumer packaged goods marketer, these charts would probably be meaningful.

I am more interested in how beef, chicken, and pork compare with each other. So that will be my next visualization.

SECTION 6 - Comparing the Meat Groups

# Add a new column to capture the "Main Meat" category for Beef
# Use mutate()

new_beef <- beef_cleaned %>%
  mutate(main_meat = "Beef")
# Add a new column to capture the "Main Meat" category for Chicken
# Use mutate()

new_chicken <- chicken_cleaned %>%
  mutate(main_meat = "Chicken")
# Add a new column to capture the "Main Meat" category for Pork
# Use mutate()

new_pork <- pork_cleaned %>%
  mutate(main_meat = "Pork")

NOTE - Using the mutate function three times above seems to be an inelegant way to go about adding this extra column. I’m assuming there is a more efficient way. But this gets the job done for now.
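One candidate for the "more efficient way": bind_rows() accepts named inputs plus an .id argument, and it writes each input's name into a new column. A minimal sketch with toy tibbles (the beef values are from the dataset; the chicken and pork values are placeholders):

```r
library(dplyr)

beef    <- tibble(product = c("Beef Franks", "Ground Beef"),
                  pounds_recalled = c(2664, 2633))
chicken <- tibble(product = "Frozen Chicken Products", pounds_recalled = 100)  # placeholder value
pork    <- tibble(product = "Pork Products", pounds_recalled = 50)             # placeholder value

# The argument names (Beef, Chicken, Pork) become the values of main_meat
merged <- bind_rows(Beef = beef, Chicken = chicken, Pork = pork,
                    .id = "main_meat")
merged
```

This labels and stacks the three tables in one step, so the separate mutate() calls would not be needed.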

# Bind the 3 meat datasets together

merged_meats <- bind_rows(new_beef, new_chicken, new_pork) %>%
  arrange(main_meat,desc(pounds_recalled))
merged_meats
## # A tibble: 45 x 4
## # Groups:   product [39]
##    product          reason_for_recall             pounds_recalled main_meat
##    <chr>            <chr>                                   <dbl> <chr>    
##  1 Various Beef Pr… Other                                 8784803 Beef     
##  2 Ground Beef Pro… E. coli O157:H7                       1801568 Beef     
##  3 Beef Products    Undeclared Allergen                    568503 Beef     
##  4 Ground Beef Pro… Extraneous Material                     90987 Beef     
##  5 Beef Jerky Prod… Undeclared Allergen                     90000 Beef     
##  6 Beef Products    Extraneous Material                     75465 Beef     
##  7 Beef Products    E. coli O157:H7                         23100 Beef     
##  8 Beef Products    E. coli O103, O111, O121, O1…           15865 Beef     
##  9 Beef Franks      Undeclared Allergen                      2664 Beef     
## 10 Ground Beef      Extraneous Material                      2633 Beef     
## # … with 35 more rows
# This tibble has 45 rows x 4 columns (39 unique product names)
# This new dataset is ready for the next visualization
# Roll up all the beef products into one Beef line, same for Chicken and Pork.

meat_condensed <- merged_meats %>%
  group_by(main_meat) %>%
  summarize(pounds_recalled = sum(pounds_recalled))

# This tibble is now 3 rows x 2 variables. As small as it will get.
# Create a barplot of the 3 meat categories next to each other.

top3_meats_plot <- meat_condensed %>%
  ggplot(aes(x=main_meat, y=pounds_recalled/1000)) +
  geom_col(aes(fill=main_meat)) +
  ylab("Total Pounds Recalled in 1000's") +
  xlab("Type of Meat") +
  ggtitle("Which type of meat had the highest USDA food recalls in 2014?") +
  scale_fill_hue(l=40)  # lower the luminance for darker bar colors
  # scale_fill_brewer(palette="Spectral") is left out: a plot keeps only one fill scale,
  # so adding it would replace scale_fill_hue() instead of combining with it.
top3_meats_plot

This is a surprise. I wasn’t expecting beef to eclipse chicken and pork so much.

The chart informs but is boring. I am going to re-visit the earlier beef chart. I think it would be useful to provide a detailed beef chart in addition to the above summary for someone looking at this visualization.

SECTION 7 - A DETAILED BEEF CHART

For the detailed beef chart, I would like to flip the axes to create a horizontal bar chart. The category labels will be easier to read since they’re long. This site has some great examples of the ordinary bar chart brought to life: http://www.storytellingwithdata.com/blog/2018/3/9/bring-on-the-bar-charts

I particularly like the horizontal charts that show a few bars in a different color. Those charts really pop and draw you in.

# Make HORIZONTAL barplot of 'just beef' dataset. Show the reason for recall next to each bar.

vert_beef_plot <- beef_cleaned %>%
  ggplot() +
  geom_bar(aes(x=product,y=pounds_recalled,fill= product), stat="identity") +
  coord_flip()+
  geom_text(aes(x=product,y=pounds_recalled,label= reason_for_recall)) 
vert_beef_plot

# Added label to each bar (the reason for recall).
# The labels overlap the bar fill area. Not readable.
# Tried many ways to move the text label outside the bar area. 
# Using geom_text() - none of the attributes attempted are working, e.g. hjust, left, right, inward, outward.
# Also tried many versions of code found on StackOverflow. None worked. Giving up. Surprised this is so complicated. 
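For what it's worth, here is a sketch of one approach that may get the labels outside the bars: set hjust as a fixed value outside aes() and widen the value axis so the text has room. The three rows are taken from the beef output above, one row per product; with several rows per product the bars stack and the labels would need extra handling, which this sketch does not cover.

```r
library(ggplot2)

toy_beef <- data.frame(
  product = c("Various Beef Products", "Ground Beef Products", "Beef Franks"),
  pounds_recalled = c(8784803, 1801568, 2664),
  reason_for_recall = c("Other", "E. coli O157:H7", "Undeclared Allergen")
)

label_plot <- ggplot(toy_beef, aes(x = product, y = pounds_recalled)) +
  geom_col(fill = "red") +
  # hjust = 0 starts the text at the end of the bar; a small negative value nudges it right
  geom_text(aes(label = reason_for_recall), hjust = -0.1) +
  # leave about 30% extra room past the longest bar so its label is not cut off
  scale_y_continuous(limits = c(0, max(toy_beef$pounds_recalled) * 1.3)) +
  coord_flip()
label_plot
```

After coord_flip() the bars run left to right, so it is the horizontal justification (hjust) that moves a label past the bar end, and the extra axis room keeps the longest bar's label inside the panel.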

Because the product “Various Beef Products” is so big relative to the other beef products, the latter do not show up on the chart. I can’t re-scale the plot, and even if I did, it might not fit on every screen. It would be better to create a separate chart for everything but “Various Beef Products.” But that chart isn’t calling out to me.

In fact, I find this visualization dull and am not satisfied with leaving it as my final. Up until this point, I was fixated on studying the individual meat categories. But without knowing more about what’s in these different products, the chart doesn’t tell much of a story.

Now I have a better idea for a visualization. Looking at the reasons for these recalls would be more interesting, since it would at least be somewhat educational. We’ve all gotten sick from food before. So it would be good to know what exactly made us sick.

SECTION 8 - FINAL VISUALIZATION

For my final visualization, I will plot the reasons for these recalls and see if there is an interesting finding.

# Create a list of all the reasons for recall, sorted by total number of pounds recalled for each reason

reasons <- all_recall %>%
  group_by(reason_for_recall) %>%
  summarize(total = sum(pounds_recalled)) %>%
  arrange(desc(total))
# Create a barplot showing all the reasons for recall

reasons_plot <- reasons %>%
  ggplot() +
  geom_bar(aes(x=reason_for_recall,y=total/1000,fill= reason_for_recall), stat="identity")+
  
  ylim(0,10500)+  # extend the y-axis scale, default too short and lops off value label for highest bar
  
  geom_text(aes(x=reason_for_recall,y=total/1000,label=total), vjust=-0.5)+  # label each bar with the value of total variable
  
  xlab("Reasons for Recall") +
  ylab("Total Pounds of Meat-Based Foods (in 000's)") +
  ggtitle("Reasons for the USDA's Recall of Meat-Based Products in 2014") +
  
  theme(axis.text.x= element_text(angle=90))  # change the angle of x-axis tick label, or else it won't fit horizontally   

reasons_plot

# Expand the plot to see it in full screen view. Otherwise, the labels will be crowded.

# This chart could be improved with commas in the data labels. My attempt did not work.
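For the commas, base R's format() has a big.mark argument that might do the job. A small sketch using three totals that appear in the beef output above:

```r
totals <- c(8784803, 1801568, 2664)

# trim = TRUE drops the padding spaces that format() adds to align widths
comma_labels <- format(totals, big.mark = ",", trim = TRUE, scientific = FALSE)
comma_labels
# "8,784,803" "1,801,568" "2,664"

# In the plot, the same call could go inside aes(), for example:
#   geom_text(aes(x = reason_for_recall, y = total/1000,
#                 label = format(total, big.mark = ",", trim = TRUE)), vjust = -0.5)
```

The label is then a character string, which geom_text() prints as is.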

And here we are! An interesting finding. The chart makes it very obvious. The #1 reason for recall, by a large margin, is Listeria monocytogenes, a Gram-positive bacterium. A quick search on Wikipedia helps explain why:

“Its ability to grow at temperatures as low as 0°C permits multiplication at typical refrigeration temperatures, greatly increasing its ability to evade control in human foodstuffs.” - Wikipedia

The symptoms of this bacterial infection are fever, muscle aches, nausea, and diarrhea. Sounds familiar to me.

The second biggest reason is “Undeclared Allergen.” This is rather vague. The third is a strain of E. coli. And I am relieved to see “Extraneous Material” and “Undeclared Substance” low on the list (yuck!), although seeing no bar would be better than a sliver of a bar.

SECTION 9 - PROJECT ESSAY

Originally, I set out to analyze a dataset of adverse food and cosmetic events from the FDA. I was curious to see which cosmetic products were reported and why. That dataset was very large (more than 90,000 events), and all of the data was categorical data that needed to be cleaned up with coding that we haven’t learned yet. Fortunately, the professor pointed out the challenges of working with the data and suggested a more manageable dataset for this project.

This dataset comes from the U.S. Department of Agriculture’s (USDA) Food Safety and Inspection Service (FSIS), which publishes a list of recalls annually. This particular dataset is from 2014 and covers meat-based products. This data is also interesting to me because as a consumer who has gotten food poisoning a number of times, I find it useful to know which of the foods I eat could be problematic.

This dataset has 94 observations and 6 variables: product name, reason for recall, number of pounds recalled, recall class, date of recall, and recall number. Product name, reason for recall, recall class, and recall number are categorical; number of pounds recalled is numeric, and the date of recall was read in as a number rather than a date. The presentation of this data (in a CSV file) was already very clean. There were no data entry typos (except for 1 minor one), and every value was consistently formatted. The only thing that I had to do was clean the variable names.

Before deciding on a data visualization to show, I explored the data by first asking the question: what’s in the data? An initial glance showed me it was a mix of beef, chicken, pork, and a lot of processed and packaged meats. Although the list is under 100 records, it is impossible to identify trends from just scrolling down the list. I decided to break this down into more manageable sections.

My first approach was to filter the dataset by Recall Class. There are 3 classes (I, II, III). For each class, I filtered for recalls of at least 10,000 pounds and arranged them in descending order of pounds. I then summed up the total pounds recalled for each class. A simple bar plot revealed Class I to have the most pounds recalled, followed by Class II and Class III. Since I could not find definitions for the class types on the FSIS site, this isn’t a very insightful chart. (see Section 2)

My next step was examining the major meat categories one by one. I started with beef, since it is my favorite kind of meat and I am curious about how much of it is recalled. Using grepl pattern matching, I filtered the dataset for ‘just beef’ products, eliminating any beef products that had other meats in their names (chicken, pork). I also rolled up duplicate product names, e.g. “Beef Products” into one row. (The duplicates are due to different reasons for recall.) My goal was to see the distribution of pounds recalled for each unique beef product, so I felt comfortable rolling up the duplicates.

The Beef bar plot was not illuminating (see Section 3). The greatest contributor to recalls was “Various Beef Products,” which is as vague as you can get. The second contributor was “Ground Beef Products,” a bit more informative. I repeated this filtering and plotting exercise for Chicken and Pork and came up with similar results. Chicken’s top source of recalls was “Frozen Chicken Products” (see Section 4); Pork’s was “Pork Products.” (see Section 5)

This raised the question: which was the biggest source of recalls – beef, chicken, or pork? My next step was to compare these three groups of meat. First, by using mutate, I added a new column to each meat dataset (main_meat) in order to add my own classification category. I needed this classification to calculate a consolidated total pounds for each meat group. Next, these three meat tibbles (beef_cleaned, chicken_cleaned, pork_cleaned) were merged into one tibble using the bind_rows function. (see Section 6)

I created a bar plot of this merged meat tibble. This plot – consisting of three bars (one for each meat group) – was a bit more informative than the earlier plots. It revealed Beef having far and away the most recalls (in terms of pounds). I expected Chicken to be number one and was surprised to see it so much lower than beef. Since chicken consumption is higher than beef consumption in the U.S. and Salmonella food poisoning comes up in the news, I just assumed there would be more recalls for chicken.

I didn’t think this plot was interesting enough for a final visualization, so I created a slightly spiffier version of a bar plot, based on inspiration from the Storytelling With Data blog. The result was a horizontal bar plot showing all the Beef recalls with values for a third variable (Reason for Recall) labeled on the bars. (see Section 7)

The attempt worked, but the result was ugly. Even after a lot of Google searching and trying other people’s code, I could not get the data labels to show up outside the bars. It was not a waste of time though. Along the way I learned a bit more about how R functions work.

Having come this far, it dawned on me that looking at the Reasons for Recall could yield more useful information. My final visualization is a bar plot that shows all the Reasons for Recall (9 total) and the number of pounds recalled for each reason. (see Section 8)

I am happy to report that this chart does show some interesting information. The number one reason for meat-based recalls is a bacterium called Listeria monocytogenes.

Bacterial contamination would probably come as no surprise to anyone. But I think my classmates and consumers in general would be intrigued to learn that the number one bug in meat is not E. coli or Salmonella, which we hear about much more often in the media. I myself had not heard of Listeria monocytogenes before. I am guessing they would also be appalled to see Extraneous Material, Undeclared Substance, Undeclared Allergen, and Processing Defect on this chart. This chart certainly provides the “Ew” factor!

Although a bar plot is a simple data tool, if it makes you stop and wonder, like this one has done for me, I think it has done its job telling a story.
