Food Service Inspection Analysis

Montgomery County, Maryland

Author

Brian Crilly

Published

February 7, 2024

Code
/* Properly place and align figure captions */
div.leaflet-control {
  text-align: left;
}

Introduction

Code
# Configure working directory
setwd('~/Library/CloudStorage/OneDrive-Personal/Loyola/DS_736/GIT_Repository/DS736')

# Load libraries
library(plyr)
library(httr)
library(jsonlite)
library(lubridate)
library(tidyverse)
library(scales)
library(ggrepel)
library(leaflet)
library(sf)
library(plotly)
library(patchwork)
library(maps)

# Use parameter from header to determine data source
# If data refresh is indicated, pull data from dataMontgomery and store in data table
if (params$refresh_data) {
  data = GET('https://data.montgomerycountymd.gov/resource/46p5-g5na.json?$limit=15000')
  data = as_tibble(fromJSON(rawToChar(data$content)))
} else {
  # Otherwise use data already downloaded
  data <- as_tibble(read_csv('MC_FoodInspection.csv'))
}

# Clean up data
data <- data |> 
  # Use descriptive titles for violations
  rename(
    v_Approved_Source = violation1,
    v_Contamination_Protection = violation2,
    v_Workers_Restricted = violation3,
    v_Hand_Washing = violation4,
    v_Cooling_Time_Temp = violation5,
    v_Cold_Holding_Temp = violation6a,
    v_Hot_Holding_Temp = violation6b,
    v_Cooking_Time_Temp = violation7a,
    v_Reheat_Time_Temp = violation7b,
    v_Hot_Cold_Water = violation8,
    v_Sewage_Disposal = violation9,
    v_Toxic_Substances = violation20,
    v_Pest_Control = violation22,
    v_Nutrition_Labeling = nutritional_labeling,
    v_Trans_Fat = violationtransfat,
    v_No_Smoking = violationsmoking
  ) |> 
  # Remove unused columns
  select(-c(address2, type)) |> 
  # Remove observations with NA values where important
  filter(!is.na(inspectiontype)) |> 
  # Convert NA values in category to 'Other'
  mutate(category = replace_na(category, 'Other')) |> 
  # Convert inspection date to date format
  mutate(inspectiondate = as_date(inspectiondate)) |> 
  # Convert other columns to factors
  mutate_at(.vars = c(3:4, 6:23, 24:25), as_factor) |> 
  # Convert Lat / Long to numeric
  mutate(
    latitude = as.numeric(latitude),
    longitude = as.numeric(longitude)
  )

Montgomery County is located to the north and west of Washington, DC in the state of Maryland (see Figure 1). Montgomery County occupies roughly 493 square miles, and, as of the 2020 census, had a population 1,062,061 individuals (U.S. Census Bureau QuickFacts, n.d.). The Department of Health and Human Services, the largest department within the Montgomery County Government, is responsible for protecting community health and addressing basic human needs across the county (Montgomery County Department of Health and Human ServicesHome PageAbout Us, n.d.). Services provided by the department include the licensing and inspection of food service facilities such as grocery stores, restaurants, cafeterias, and food trucks (Licensure and Regulatory Services - Montgomery County, Maryland, n.d.).

This report presents the analysis of data provided by the Department of Health and Human Services regarding the routine inspection of food service facilities across Montgomery County for the most recent 24 months.

Code
# Create map to show where Montgomery County is located within the state of Maryland
maryland <- map_data('county') |> 
  filter(region == 'maryland') |> 
  mutate(highlight = (subregion == 'montgomery'))

# Find the center of Montgomery County to guide label positioning
mc_center <- maryland |> 
  filter(subregion == 'montgomery') |> 
  summarize(lat = mean(lat), long = mean(long))

# Create the map with a label for Montgomery County
maryland |> 
  ggplot(
    aes(
      x = long, 
      y = lat
    )
  ) +
  geom_polygon(
    aes(
      group = group, 
      fill = highlight
    ), 
    color = 'black'
  ) +
  theme_void() +
  theme(legend.position = 'none') +
  geom_label(
    aes(
      x = mc_center$long,
      y = mc_center$lat,
      label = 'Montgomery County'
    ),
    nudge_x = -0.5,
    nudge_y = -0.05
  )
Figure 1: Map of Maryland Highlighting Montgomery County

Data Set

The Food Inspection data set is provided by the Open Data Operations of the Montgomery County Government (“Montgomery County Data Open Data Portal,” n.d.). Contained within the data set are the results of 12,985 food service facility inspections conducted during the most recent 24 months. For each inspection, the following information is provided:

Variable Description
Name Facility name
Address Facility street address
City Facility city
Zip Facility zip code
Date Inspection date
Result Overall inspection result
Approved Source All food from a regulated source
Contamination Protection All food protected from contamination
Ill Worker Restriction Food not handled by worker with transmissible disease
Hand Washing Workers follow proper hand washing regulations
Hot Holding Temperature Hot foods held at or above proper temperature
Cooking Time and Temperature Foods cooked to an appropriate internal temperature
Reheat Time and Temperature Foods are reheated to an appropraite internal temperature
Hot and Cold Water Both hot and cold running water are available
Proper Sewage Disposal All sewage is disposed properly
Toxic Substances Toxic substances are properly marked and stored
Rodents and Insects Proper measures are taken to prevent and eliminate rodents and insects
Nutrition Labeling Facility follows nutritional labeling regulations
Trans Fat Facility adheres to restrictions regarding trans fat
No Smoking Sign No smoking signs posted per regulations
Inspection Type Comprehensive or monitoring inspection
Owner Facility owner
Category Facility cataegory
Type Facility type
Lat Latitude of facility
Long Longitude of facility

Findings

Code
# Create a list of likely chains in data set for further cleaning
chains <- c(
  'STARBUCKS',
  '7-ELEVEN',
  'ADVANCE AUTO PARTS',
  'ALDI',
  'AMAZON FRESH',
  'AMERICAN LEGION',
  'AUTUMN LAKE HEALTHCARE',
  'BURGER KING',
  'CALIFORNIA TORTILLA',
  'CAVA MEZZE GRILL',
  'CHICK-FIL-A',
  'CHIPOTLE',
  'CHOPT CREATIVE SALAD',
  'CVS',
  'DOLLAR TREE',
  'DOMINO\\\'S PIZZA',
  'DON POLLO',
  'DUNKIN DONUTS',
  'EINSTEIN BROS. BAGELS',
  'FIVE BELOW',
  'FIVE GUYS',
  'GENERAL NUTRITION CENTER',
  'GENJI SUSHI',
  'GIANT FOOD',
  'GOLDS GYM',
  'HARRIS TEETER',
  'HISSHO SUSHI',
  'HOME DEPOT',
  'HOME GOODS',
  'IHOP',
  'LEDO PIZZA',
  'LITTLE CAESARS',
  'MAMMA LUCIA',
  'MCDONALD',
  'MEGAMART',
  'MICHAEL\\\'S STORE',
  'MISSION BBQ',
  'OLD NAVY',
  'PANDA EXPRESS',
  'PANERA BREAD',
  'PAPA JOHN\\\'S PIZZA',
  'PARTY CITY',
  'PIZZA HUT',
  'POTBELLY',
  'PROMEDICA',
  'RED LOBSTER',
  'RITA\\\'S',
  'ROSS DRESS FOR LESS',
  'SAFEWAY',
  'STAPLES',
  'SUBWAY',
  'TARGET',
  'TGI FRIDAY\\\'S',
  'T.J. MAXX',
  'TACO BELL',
  'TRADER JOE\\\'S',
  'VITAMIN SHOPPE',
  'WALGREEN',
  'WENDY\\\'S',
  'WHOLE FOODS MARKET',
  'WINGSTOP'
)

# Get count of locations by name (i.e. if > 1, likely a chain) then select top 10
data <- data |> 
  # Clean up names of chain establishments based on list above
  # e.g. STARBUCKS 123 and STARBUCKS KIOSK both become STARBUCKS
  mutate(
    name = str_replace(
      name,
      paste(
        '[[:graph:]|[:space:]]*(',
        paste(chains, collapse = '|'),
        ')[[:graph:]|[:space:]]*',
        sep = ''
      ),
      '\\1'
    )
  ) |> 
  mutate(
    name = as_factor(
      ifelse(
        name =='CVS', 
        name, 
        ifelse(
          name == 'MCDONALD',
          'McDonald\'s',
          str_to_title(name)
        )
      )
    )
  )

# Determine how many unique food service facilities are in the database
distinct_locations <- data |>
  distinct(name, latitude, longitude, category, .keep_all = TRUE)

Across the 4,103 unique food service facilities identified in the data set, the Montgomery County Department of Health conducted conducted 12,985 inspections during the most recent 24 months. This analysis considers the types of facilities, the quantity of inspections conducted, types of violations recorded, and facility closures resulting from a failed inspection. Each of these considerations is presented in a separate tab below.

The 4,103 unique food service facilities in the data set are broken down by category. The top categories based on number of facilities are presented in Figure 2, with all remaining categories lumped into the designation Other.

Code
# Group public schools together
distinct_locations <- distinct_locations |> 
  dplyr::mutate(
    category = str_replace(
      category,
      '^(Public School).*',
      '\\1'
    )
  )

# Find the top categories by count, excluding 'Other'
top_categories <- distinct_locations |> 
  filter(category != 'Other') |> 
  group_by(category) |> 
  summarize(n = length(category)) |> 
  arrange(desc(n)) |> 
  slice(1:7)

# Group all remaining categories into 'Other'
topN_categories <- distinct_locations |> 
  mutate(
    category = ifelse(category %in% top_categories$category, category, 'Other')
  ) |> 
  group_by(category) |> 
  summarize(n = length(category)) |> 
  mutate(
    category = factor(
      category,
      levels = c(top_categories$category, 'Other')
    )
  )

# Plot the top categories of food service facilities
topN_categories |> 
  arrange(fct_relevel(category, 'Other', after = Inf)) |> 
  plot_ly(
    labels = ~category,
    values = ~n,
    textposition = 'inside'
  ) |> 
  add_pie(
    title = paste0('Total Facilities\n', comma(sum(topN_categories$n))),
    hole = 0.7,
    direction = 'clockwise',
    sort = FALSE,
    hovertemplate = '%{label}<br>Percent: %{percent}<br>Count: %{value:,}<extra></extra>',
  )
Figure 2: Distribution of Food Service Facilities by Category
Code
# Identify likely chains
locations_count <- distinct_locations |> 
  group_by(name) |> 
  # Count the number of unique locations for each establishment
  summarize(
    locations = length(name),
  ) |> 
  # Order by count from largest to smallest and select the top 10
  arrange(desc(locations))

# Count the number of locations likely part of a chain versus
# those not part of a chain (independents)
independents <- sum(locations_count$locations == 1)
chain_count = nrow(distinct_locations) - independents

As indicated in Figure 2, the majority of food service facilities in the data set, are categorized as Restaurants and Markets. Many of those restaurants and markets are part of a chain, with approximately 1,064 facilities in the data set belonging to a food service chain. In total, there appear to be approximately 211 different chains represented in the data set. The 10 largest chains, based on number of locations, are identified in Figure 3.

Code
# Plot the top 10 chains in Montgomery County
locations_count |> 
  slice(1:10) |> 
  ggplot(aes(x = locations, y = fct_reorder(name, locations))) +
    # geom_col(aes(fill = fct_reorder(name, locations))) +
    geom_col(aes(fill = locations)) +
    geom_text(aes(label = locations), hjust = -0.25) +
    labs(
      # title = 'Figure 1: 10 Largest Food Service Chains in Montgomery County',
      x = 'Locations',
      y = NULL,
      caption = 'Data from data.montgomerycountymd.gov'
    ) +
    theme_light() +
    theme(
      legend.position = 'none',
      plot.title = element_text(hjust = 0.5)
    ) +
    scale_fill_continuous(type = 'viridis', trans = 'reverse') +
    scale_x_continuous(
      labels = seq(0, round_any(max(locations_count$locations), 10, ceiling), 10),
      breaks = seq(0, round_any(max(locations_count$locations), 10, ceiling), 10)
    )
Figure 3: 10 Largest Food Service Chains in Montgomery County

Code
# Group public schools together into one category
data <- data |> 
  mutate(
    category = str_replace(
      category,
      '^(Public School).*',
      '\\1'
    )
  )

# Determine the number of inspections per week in the data set
inspections_per_week <- data |> 
  mutate(
    Year = as_factor(year(inspectiondate)),
    Week = strftime(inspectiondate, format = '%V')
  ) |> 
  group_by(Year, Week) |> 
  summarize(n = length(inspectiondate), .groups = 'drop')

The Montgomery County Department of Health recorded 12,985 inspections in the provided data set. A histogram of the number of inspections conducted per week is presented in Figure 4. During the past 24 months, the Department of Health conducted an average of 123.7 inspections per week, with a standard deviation of 44.5. The coefficient of variation of 36% indicates moderate variation in the number of inspections conducted per week.

Code
# Get mean and standard deviation to further augment the graph
stat_marker <- tribble(
  ~stat, ~value,
  'Mean', mean(inspections_per_week$n),
  '+1 SD', mean(inspections_per_week$n) + sd(inspections_per_week$n),
  '-1 SD', mean(inspections_per_week$n) - sd(inspections_per_week$n)
) |> 
  mutate(stat = factor(stat))

# Create a histogram of the number of inspections per week.
# Highlight mean and ±1 standard deviations
inspections_per_week |> 
  ggplot(aes(x = n)) +
  geom_histogram(
    # aes(y = after_stat(density)),
    binwidth = 10,
    fill = 'deepskyblue4'
  ) +
  labs(
    x = 'Inspections per Week',
    y = 'Count',
    caption = 'Data from data.montgomerycountymd.gov'
  ) +
  geom_vline(
    data = stat_marker,
    aes(xintercept = value, color = stat),
    linetype = 'dashed'
  ) +
  scale_color_manual(
    guide = 'none',
    # '',
    values = c('darkorange1', 'darkorange1', 'darkolivegreen3'),
    # labels = c('+1 SD', 'Mean', '-1 SD'),
    # breaks = c('+1 SD', 'Mean', '-1 SD')
  ) +
  theme_minimal() +
  geom_point(
    data = stat_marker,
    aes(x = value, y = 0, color = stat),
    size = 3
  ) +
  geom_label_repel(
    data = stat_marker,
    aes(
      x = value,
      y = 0,
      label = paste0(stat, ': ', round(value, 1))
    ),
    box.padding = 1,
    point.padding = 1,
    size = 4,
    color = 'Grey50',
    segment.color = 'white'
  )
Figure 4: Histogram of Inspections Conducted per Week

Figure 5 breaks down the number of inspections by month for the past 24 months. From the results, we note that the number of inspections seem to peak in March and tend to be lowest in the summer months. Note that partial months are not included in the graph.

Code
# Get number of inspections by month

# Ignore partial months
today <- Sys.Date()
end_date <- floor_date(today, 'month') - days(1)
start_date <- end_date - years(2) + days(1)
start_date <- start_date + months(1)

# Get count of inspections for past 24 months (ignoring partial months)
inspection_count <- data |> 
  select(inspectiondate) |> 
  filter(inspectiondate >= start_date & inspectiondate <= end_date) |> 
  mutate(
    Month = as_factor(month(inspectiondate, label = TRUE, abbr = TRUE)),
    Year = as_factor(year(inspectiondate))
  ) |> 
  group_by(Year, Month) |> 
  summarize(Inspections = length(Month), .groups = 'keep')

# Plot number of inspections by month
inspection_count |> 
  ggplot(aes(x = Month, y = Inspections)) +
  geom_line(
    aes(group = Year, color = Year),
    linewidth = 2
  ) +
  geom_point(
    aes(fill = Year),
    size = 5,
    shape = 21,
    color = 'white'
  ) +
  labs(
    x = 'Month',
    y = 'Inspections Conducted',
    caption = paste0('Data as of ', today ,'\nPartial months not displayed')
  ) +
  scale_color_brewer(palette  = 'Dark2') +
  scale_fill_brewer(palette = 'Dark2') +
  scale_y_continuous(
    expand = c(0, 0),
    limits = c(0, round_any(max(inspection_count$Inspections), 200, ceiling)),
    labels = seq(0, round_any(max(inspection_count$Inspections), 200, ceiling), 200),
    breaks = seq(0, round_any(max(inspection_count$Inspections), 200, ceiling), 200)
  ) +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_label_repel(
    aes(label = ifelse(
      Inspections == min(Inspections) | Inspections == max(Inspections), 
      comma(Inspections), '')),
    box.padding = 1,
    point.padding = 1,
    size = 4,
    color = 'Grey50',
    segment.color = 'darkblue'
  )
Figure 5: Food Safety Inspections by Month for the Past 24 Months

There are 16 catgories of violations that may be noted during a food service facility inspection. The four most common violations, as a percentage of inspections, are identifed in Figure 6.

Code
# Create descriptive labels for violations for graphing
violations = c(
    'v_Approved_Source' = 'Food From\nApproved Sources',
    'v_Contamination_Protection' = 'No Contaminated\nFoods',
    'v_Workers_Restricted' = 'No Workers with\nTransmissible Disease',
    'v_Hand_Washing' = 'Proper\nHand Washing',
    'v_Cooling_Time_Temp' = 'Cooling Time\nand Temperature',
    'v_Cold_Holding_Temp' = 'Cold Holding\nTemperature',
    'v_Hot_Holding_Temp' = 'Hot Holding\nTemperature',
    'v_Cooking_Time_Temp' = 'Cooking Time\nand Temperature',
    'v_Reheat_Time_Temp' = 'Reheat Time\nand Temperature',
    'v_Hot_Cold_Water' = 'Hot and Cold\nWater Available',
    'v_Sewage_Disposal' = 'Proper Sewage\nDisposal',
    'v_Toxic_Substances' = 'Toxic Substances\nControl',
    'v_Pest_Control' = 'Pest Control\nMeasures',
    'v_Nutrition_Labeling' = 'Nutritional\nLabeling',
    'v_Trans_Fat' = 'Trans Fats',
    'v_No_Smoking' = 'No-Smoking\nSigns'
)

# Calculate percent of violations noted based on number of inspections
violations_pct <- data |> 
  pivot_longer(cols = 7:22, names_to = 'Violation', values_to = 'Result') |> 
  count(Violation, Result) |> 
  group_by(Violation) |> 
  mutate(pct = n / sum(n) * 100)

# Record the four most common violations by percentage
top_violations <- violations_pct |> 
  arrange(desc(Result), desc(pct)) %>% 
  filter(Result == 'Out of Compliance') |> 
  ungroup() |> 
  slice(1:4)

# Plot donut charts of the four most common violations
plot_ly(
    labels = ~Result,
    values = ~n,
    textposition = 'inside'
  ) |> 
  add_pie(
    data = violations_pct[violations_pct$Violation == top_violations$Violation[[1]], ],
    title = violations[top_violations$Violation[[1]]],
    hole = 0.7,
    hovertemplate = '%{label}<br>Percent: %{percent}<br>Count: %{value:,}<extra></extra>',
    domain = list(
      row = 0,
      column = 0
    )
  ) |> 
  add_pie(
    data = violations_pct[violations_pct$Violation == top_violations$Violation[[2]], ],
    title = violations[top_violations$Violation[[2]]],
    hole = 0.7,
    hovertemplate = '%{label}<br>Percent: %{percent}<br>Count: %{value:,}<extra></extra>',
    domain = list(
      row = 0,
      column = 1
    )
  ) |> 
  add_pie(
    data = violations_pct[violations_pct$Violation == top_violations$Violation[[3]], ],
    title = violations[top_violations$Violation[[3]]],
    hole = 0.7,
    hovertemplate = '%{label}<br>Percent: %{percent}<br>Count: %{value:,}<extra></extra>',
    domain = list(
      row = 1,
      column = 0
    )
  ) |> 
  add_pie(
    data = violations_pct[violations_pct$Violation == top_violations$Violation[[4]], ],
    title = violations[top_violations$Violation[[4]]],
    hole = 0.7,
    hovertemplate = '%{label}<br>Percent: %{percent}<br>Count: %{value:,}<extra></extra>',
    domain = list(
      row = 1,
      column = 1
    )
  ) |> 
  layout(
    showlegend = TRUE,
    grid = list(rows = 2, columns = 2)
  )
Figure 6: Percent of Occurrence of Most Common Violations

The diagram in Figure 7 highlights the most common violations, as a percentage of inspections, by food service facility categories. From this diagram, it appears that the most common violations, especially those identified in Figure 6, tend to occur predominantly in hospitals, restaurants, and nursing homes. (Note: 0.0% indicates no occurrence of the specific violation was noted during any of the inspections for the given food service facility category.)

Code
# Find the top categories of food service providers based on percentage of
# violations across all violations recorded, along with the top violations recorded

# First find the percentage of violations by category
violations_by_category <- data |> 
  dplyr::mutate(
    category = str_replace(
      category,
      '^(Public School).*',
      '\\1'
    )
  ) |> 
  pivot_longer(cols = 7:22, names_to = 'violation', values_to = 'result') |> 
  select(category, violation, result) |> 
  count(category, violation, result) |> 
  group_by(category, violation) |> 
  mutate(pct = n / sum(n) * 100) |> 
  ungroup() |> 
  filter(result == 'Out of Compliance') 

# Rank categories based on percentages of noncompliances across violations
top_category <- violations_by_category |> 
  filter(!category == 'Other') |> 
  filter(!is.na(category)) |> 
  group_by(category) |>
  summarize(tot_pct = sum(pct)) |> 
  arrange(desc(tot_pct)) |> 
  select(category) |> 
  slice(1:9)

# Rank violations based on percentages of noncompliances across violations
top_violation <- violations_by_category |> 
  group_by(violation) |>
  summarize(tot_pct = sum(pct)) |> 
  arrange(desc(tot_pct)) |> 
  select(violation) |> 
  slice(1:9)

# Create a 10th category called 'Other' to include all other food service
# categories not in the top 9. Also create a 10th category called 'Other' 
# for violations not in the top 9.
pct_violation_by_category <- data |> 
  dplyr::mutate(
    category = str_replace(
      category,
      '^(Public School).*',
      '\\1'
    )
  ) |> 
  pivot_longer(cols = 7:22, names_to = 'violation', values_to = 'result') |> 
  select(category, violation, result) |> 
  mutate(
    category = ifelse(
      category %in% top_category$category,
      category,
      'Other'
    ),
    category = factor(
      category,
      levels = c(top_category$category, 'Other')
    ),
    violation = ifelse(
      violation %in% top_violation$violation,
      violation,
      'Other'
    ),
    violation = factor(
      violation,
      levels = c(top_violation$violation, 'Other')
    )
  ) |> 
  # Find percentage of violations by inspection for each combination of
  # category and violation
  count(category, violation, result) |> 
  group_by(category, violation) |> 
  mutate(pct = n / sum(n) * 100) |> 
  ungroup() |> 
  filter(result == 'Out of Compliance') |> 
  select(category, violation, pct) |> 
  pivot_wider(names_from = violation, values_from = pct) |> 
  mutate_at(.vars = 2:11, ~replace_na(., 0)) |> 
  mutate_at(.vars = 'category', ~replace_na(., 'Other')) |> 
  pivot_longer(cols = 2:11, names_to = 'violation', values_to = 'pct') |> 
  arrange(desc(pct))

# Find the max percentage to set up legend for graph
max_pct <- round_any(max(pct_violation_by_category$pct), 10, ceiling)

# Create a heat map to display percent occurrence of violations by
# category and violation
pct_violation_by_category |> 
  ggplot(aes(x = category, y = violation, fill = sqrt(pct))) +
    geom_tile(color = 'black') +
    geom_text(
      aes(label = sprintf('%.1f%%', pct)),
      # aes(label = paste0(round(pct, 1), '%')),
      size = 3,
    ) +
    scale_y_discrete(
      limits = rev(c(top_violation$violation, 'Other')),
      labels = c(violations[top_violation$violation], 'Other')
    ) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 50, vjust = 1, hjust = 1)) +
    scale_fill_continuous(
      low = 'white',
      high = 'red',
      labels = paste0(c(seq(0, max_pct, by = 10)), '%'),
      breaks = c(sqrt(seq(0, max_pct, by = 10))),
      name = '% Non-Compliant'
    ) +
    labs(
      x = NULL,
      y = NULL,
      caption = 'Data from data.montgomerycountymd.gov'
    ) +
    guides(
      fill = guide_legend(
        reverse = TRUE,
        override.aes = list(color = 'black')
      )
    )
Figure 7: Percent of Occurrence of Violations by Facility Category

Based on the data set, it does not appear that all noted violations during an inspection result in a closure of the facility. Figure 8 presents the percentage of inspections that resulted in a facility closure by category, along with the total number of closures by category. While restaurants had the highest number of closures, they also represent the largest category. Carry-out facilities represent a smaller portion of the food service facilities in the data set, but had the highest percentage of inspections resulting in a facility closure.

Code
# Find top categories for which an inspection resulted in a closure
failure_by_category <- data |> 
  group_by(category) |> 
  summarize(
    n = sum(inspectionresults == 'Facility Closed'),
    pct = round(n / length(category), 4)
  ) |> 
  arrange(desc(pct)) |> 
  filter(category != 'Other') |>
  mutate(category = as.character(category)) |> 
  filter(n > 5)

# For all other categories, group as 'Other'
top_failure_by_category <- data |> 
  mutate(category = as.character(category)) |> 
  mutate(
    category = ifelse(
      category %in% failure_by_category$category,
      category,
      'Other'
    )
  ) |> 
  mutate(
    category = factor(
      category,
      levels = c(failure_by_category$category, 'Other')
    )
  ) |> 
  group_by(category) |> 
  summarize(
    n = sum(inspectionresults == 'Facility Closed'),
    pct = round(n / length(category), 4),
    count = length(category)
  )

# Set scale for second axis
p_scale <- 1 / 4000
max_pct <- max(top_failure_by_category$pct)

# Plot total closures and percent closures
top_failure_by_category |> 
  ggplot(aes(x = category, y = pct)) +
    # Use bar chart for percent of closure by category
    geom_col(aes(fill = sqrt(pct))) +
    geom_text(
      aes(
        label = percent(pct)
      ), 
      hjust = -0.25
    ) +
    # Use line chart for total closures by category
    geom_line(
      aes(
        x = category,
        y = n * p_scale,
        color = 'Closure\nCount',
        group = 1
      ),
      linewidth = 1
    ) +
    geom_point(
      aes(
        x = category,
        y = n * p_scale,
        group = 1
      ),
      size = 3,
      shape = 21,
      fill = 'white',
      color = 'black'
    ) +
    coord_flip() +
    labs(
      x = NULL,
      y = 'Percent of Inspections Resulting in Closure',
      caption = 'Data from data.montgomerycountymd.gov'
    ) +
    theme_light() +
    scale_color_manual(
      NULL,
      values = 'black'
    ) +
    scale_fill_continuous(
      low = 'lightpink',
      high = 'red',
      labels = paste0(c(100 * seq(0, max_pct, by = 0.005)), '%'),
      breaks = c(sqrt(seq(0, max_pct, by = 0.005))),
      name = '% Closures',
      guide = 'none'
    ) +
    scale_y_continuous(
      limits = c(0, round_any(max(top_failure_by_category$pct), .01, ceiling)),
      sec.axis = sec_axis(
        ~. / p_scale,
        name = 'Count of Inspections Resulting in Closure'
      ),
      labels = percent
    ) +
    scale_x_discrete(
      limits = rev(c(failure_by_category$category, 'Other')),
      labels = rev(c(failure_by_category$category, 'Other'))
    ) +
    geom_label_repel(
      aes(
        x = category,
        y = n * p_scale,
        label = paste0(n, ' of ', comma(count)),
        group = 1
      ),
      nudge_x = 0.01,
      box.padding = 1,
      point.padding = 1,
      size = 4,
      color = 'Grey50',
      segment.color = 'darkblue'
    ) +
    guides(
      color = guide_legend(order = 1),
      fill = FALSE
    )
Figure 8: Facility Closures by Category

The map presented in Figure 9 identifies the location of the food service facilities that were closed in 2023 as a result of an inspection by the Department of Health. The color coding identifies if the facility remains closed at present or has since reopened following a favorable re-inspection. (Note: Map controls in the lower-right corner allow for filtering of facilities based on current status.)

Code
# Identify all inpsection failures for 2023
inspection_failure <- data |> 
  mutate(Year = as_factor(year(inspectiondate))) |> 
  filter(Year == 2023) |>
  filter(inspectionresults == 'Facility Closed') |> 
  select(name, latitude, longitude) |> 
  unique()

# Determine which facilities have since reopened
failure_status <- data |>
  filter(
    name %in% inspection_failure$name &
      latitude %in% inspection_failure$latitude & 
      longitude %in% inspection_failure$longitude
  ) |> 
  group_by(name, latitude, longitude) |> 
  filter(inspectiondate == max(inspectiondate)) |> 
  ungroup() |> 
  select(name, inspectiondate, inspectiontype, inspectionresults, latitude, longitude) |> 
  mutate(
    inspectionresults = as.factor(
      ifelse(
        inspectionresults == 'Facility Closed', 
        'Closed', 
        'Reopened'
      )
    )
  )

# Create color coding for facilities that remain closed versus those that have reopened
pal <- colorFactor(c('red', 'darkgreen'), domain = c('Closed', 'Reopened'))

# Add a shape file to highlight Montgomery County on the map
# This shapefile was downloaded from the US Census Bureau
load(file = 'MC_Shapefile.RData')

# Create a Leaflet map of Montgomery County
leaflet(data = failure_status) |>
  addProviderTiles(providers$Esri.WorldTopoMap) |>
  addPolygons(
    data = counties,
    weight = 2,
    fillColor = 'lightblue',
    fillOpacity = 0.3
  ) |> 
  # Add markers to highlight facilities that remain closed
  addCircleMarkers(
    data = failure_status[failure_status$inspectionresults == 'Closed', ],
    lng = ~longitude,
    lat = ~latitude,
    radius = 5,
    color = ~pal(inspectionresults),
    fillOpacity = 0.7,
    popup = ~paste(
      name, '<br/>',
      'Last Inspection:', inspectiondate, '<br/>',
      'Inspection Type:', inspectiontype, '<br/>',
      'Status:', inspectionresults
    ),
    label = ~name,
    group = 'Closed'
  ) |> 
  # Add markers to highlight facilities that have since reopened
  addCircleMarkers(
    data = failure_status[failure_status$inspectionresults == 'Reopened', ],
    lng = ~longitude,
    lat = ~latitude,
    radius = 5,
    color = ~pal(inspectionresults),
    fillOpacity = 0.7,
    popup = ~paste(
      name, '<br/>',
      'Last Inspection:', inspectiondate, '<br/>',
      'Inspection Type:', inspectiontype, '<br/>',
      'Current Status:', inspectionresults
    ),
    label = ~name,
    group = 'Reopened'
  ) |>
  addLegend(
    position = 'topright',
    pal = pal,
    values = ~inspectionresults,
    labels = ~c('Closed', 'Reopened'),
    title = 'Current Status'
  ) |>
  # Add a label to emphasize 'Montgomery County'
  addControl(
    'Montgomery County, MD',
    position = 'topleft'
  ) |>
  # Add layer control to allow user to declutter map by removing markers
  addLayersControl(
    overlayGroups = c('Closed', 'Reopened'),
    options = layersControlOptions(collapsed = FALSE),
    position = 'bottomright'
  ) |>
  addControl(
    'Show Status:',
    position = 'bottomright'
  )
Figure 9: Map of Facility Closures for 2023

Conclusions

The Montgomery County Department of Health conducted 12,985 inspections of food service facilities during the past 24 months. This report highlights several details regarding the inspections, including the types of facilities, inspections conducted, the most common violations, and the types and locations of facilities that were closed as a result of an inspection.

References

Licensure and Regulatory Services - Montgomery County, Maryland. (n.d.). Retrieved January 20, 2024, from https://www.montgomerycountymd.gov/HHS/LandR/Index.html
Montgomery County Data Open Data Portal. (n.d.). In Tyler Data & Insights. Retrieved January 20, 2024, from https://data.montgomerycountymd.gov/
Montgomery County Department of Health and Human ServicesHome PageAbout Us. (n.d.). Retrieved January 20, 2024, from https://www.montgomerycountymd.gov/hhs/aboutHHS/aboutHHSmain.html
U.S. Census Bureau QuickFacts: Montgomery County, Maryland. (n.d.). Retrieved January 20, 2024, from https://www.census.gov/quickfacts/fact/table/montgomerycountymaryland/PST045223