Dataset: Baltimore City Liquor Licenses
Observations: ~30,284 records
Variables: 21 columns
Each row represents a liquor license record, including:
- License class and subclass
- License year and fee
- License status (Closed, Renewed, Transferred, Unknown)
- Establishment type
- Business name and location (ZIP Code)
This dataset is well suited for:
- Categorical analysis (license types, statuses)
- Numerical analysis (fees, years)
- Time-based trends
- Business and regulatory practice analysis
This report analyzes liquor license data from Baltimore City. The purpose of this analysis is to explore license status, license classes, fees, and geographic patterns using descriptive statistics and data visualizations.
The data was imported from a local CSV file. Initial inspection revealed missing or empty values in the ‘License Status’ field. These values were recoded as “Unknown” to ensure accurate categorical analysis.
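library(ggplot2)  # plotting package used for every chart in this report
# Load the raw data (path is relative to the working directory)
df <- read.csv("R_datafiles/Liquor_Licenses.csv", stringsAsFactors = FALSE)
# Recode missing (NA) or empty License Status values as "Unknown"
df$LicenseStatus <- as.character(df$LicenseStatus)
df$LicenseStatus[is.na(df$LicenseStatus) | df$LicenseStatus == ""] <- "Unknown"
df$LicenseStatus <- as.factor(df$LicenseStatus)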
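Before plotting, it can help to confirm that the recode behaved as intended. The sketch below applies the same NA-or-empty logic to a small hypothetical data frame (`toy`, made-up values, not the Baltimore data) and tabulates the result with `useNA = "ifany"`, so any `NA` that survived the recode would appear as its own row.

```r
# Hypothetical toy data (NOT the Baltimore dataset), mimicking the raw column
toy <- data.frame(
  LicenseStatus = c("Renewed", NA, "", "Closed", "Transferred", NA),
  stringsAsFactors = FALSE
)

# Same rule as the report: NA or empty string becomes "Unknown"
missing_status <- is.na(toy$LicenseStatus) | toy$LicenseStatus == ""
toy$LicenseStatus[missing_status] <- "Unknown"
toy$LicenseStatus <- as.factor(toy$LicenseStatus)

# Tabulate, keeping NA visible if any remained after the recode
status_table <- table(toy$LicenseStatus, useNA = "ifany")
print(status_table)
```

On the real data, the same `table(df$LicenseStatus, useNA = "ifany")` call would show how many records ended up in the “Unknown” category.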
This visualization shows the frequency of liquor licenses by license class. A small number of license classes account for most of the records, suggesting that the types of alcohol sales those classes permit are the most common in Baltimore.
ggplot(df, aes(x = LicenseClass)) +
  geom_bar() +
  labs(
    title = "Number of Liquor Licenses by License Class",
    x = "License Class",
    y = "Count"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # tilt long class labels
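To put a number on which classes “dominate”, the counts behind the bar chart can be sorted and converted to percentage shares. This is a minimal sketch on a hypothetical vector `license_class` (made-up class codes, not real Baltimore values), using only base R.

```r
# Hypothetical license classes (NOT real Baltimore values)
license_class <- c("A", "B", "A", "D", "A", "B", "A", "C")

# Counts per class, largest first
class_counts <- sort(table(license_class), decreasing = TRUE)

# Share of all records held by each class, as percentages
class_share <- round(100 * prop.table(class_counts), 1)

print(class_counts)
print(class_share)
```

The same ordering could be applied to the bar chart by converting `LicenseClass` to a factor whose levels follow `names(class_counts)`, so the tallest bars appear first.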
A box plot was used to compare the distribution of license fees (median, interquartile range, and outliers) across license statuses. Licenses that remain in force generally show higher and more consistent fees, while Closed and Unknown licenses display greater variability.
ggplot(df, aes(x = LicenseStatus, y = LicenseFee)) +
  geom_boxplot() +
  labs(
    title = "License Fees by License Status",
    x = "License Status",
    y = "License Fee"
  )
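Statements such as “higher and more consistent fees” can be checked against the statistics a box plot draws: the median (center line) and the interquartile range (box height). The sketch below computes both per status on a hypothetical data frame `fees` (made-up numbers, not the Baltimore data), ignoring missing fees as `geom_boxplot()` does.

```r
# Hypothetical fees by status (made-up numbers, NOT the Baltimore data)
fees <- data.frame(
  LicenseStatus = c("Renewed", "Renewed", "Renewed", "Closed", "Closed",
                    "Unknown", "Unknown"),
  LicenseFee    = c(1000, 1100, 1050, 200, 1500, 100, 900)
)

# Per-status count, median, and IQR (the quantities a box plot displays)
by_status <- split(fees$LicenseFee, fees$LicenseStatus)
fee_summary <- do.call(rbind, lapply(by_status, function(x) {
  c(n = sum(!is.na(x)),
    median = median(x, na.rm = TRUE),
    IQR = IQR(x, na.rm = TRUE))
}))
print(fee_summary)
```

A smaller IQR indicates more consistent fees within a status; comparing medians indicates which statuses tend to have higher fees.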
The scatter plot examines whether license fees have changed over time. While individual fees vary, no extreme upward or downward trend is observed, suggesting relative fee stability across years.
ggplot(df, aes(x = LicenseYear, y = LicenseFee)) +
  geom_point(alpha = 0.6) +  # transparency eases overplotting of repeated years
  labs(
    title = "License Fee vs License Year",
    x = "License Year",
    y = "License Fee"
  )
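Because many records share the same year, a scatter plot of roughly 30,000 points can hide a gradual trend. One way to back up the “no extreme trend” reading is to fit a simple linear trend and look at the median fee per year. The sketch below does this on a hypothetical data frame `fee_year` (made-up values, not the Baltimore data); the slope is the estimated fee change per year.

```r
# Hypothetical fee/year pairs (made-up, NOT the Baltimore data)
fee_year <- data.frame(
  LicenseYear = c(2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018),
  LicenseFee  = c(1000, 1200, 1100, 1000, 1150, 1050, 1100, 1100)
)

# Fit a simple linear trend: estimated fee change per year
trend <- lm(LicenseFee ~ LicenseYear, data = fee_year)
slope <- unname(coef(trend)["LicenseYear"])

# Median fee per year is less sensitive to outliers than the raw scatter
median_by_year <- aggregate(LicenseFee ~ LicenseYear, data = fee_year,
                            FUN = median)

print(slope)
print(median_by_year)
```

A slope that is small relative to typical fee levels is consistent with relative fee stability; it says nothing about nonlinear changes, which the per-year medians help reveal.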
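Aggregating license fees by year shows the total revenue generated from liquor licenses in each year. The line chart reveals year-to-year fluctuations, which may reflect economic conditions, regulatory changes, or changes in the number of licenses issued; the totals alone cannot distinguish between these causes.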
# Sum fees within each year; the formula interface drops rows with NA fee or year
fees_by_year <- aggregate(LicenseFee ~ LicenseYear, data = df, sum)
ggplot(fees_by_year, aes(x = LicenseYear, y = LicenseFee)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Total License Fees Collected by Year",
    x = "License Year",
    y = "Total License Fees"
  )
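A change in total fees can come from more licenses, higher fees per license, or both. A minimal sketch of separating the two is to compute the record count and mean fee alongside the total for each year. It uses a hypothetical data frame `lic` (made-up values, not the Baltimore data) in which 2016 and 2017 have the same total but different mixes.

```r
# Hypothetical records (made-up, NOT the Baltimore data)
lic <- data.frame(
  LicenseYear = c(2016, 2016, 2016, 2017, 2017, 2018),
  LicenseFee  = c(1000, 1000, 1000, 1500, 1500, 1000)
)

# Total fees and number of records per year
totals <- aggregate(LicenseFee ~ LicenseYear, data = lic, FUN = sum)
counts <- aggregate(LicenseFee ~ LicenseYear, data = lic, FUN = length)
names(counts)[2] <- "n"

# Combine and derive the average fee per record
by_year <- merge(totals, counts, by = "LicenseYear")
by_year$MeanFee <- by_year$LicenseFee / by_year$n
print(by_year)
```

Here 2016 and 2017 both total 3000, but one year has three lower fees and the other two higher fees, a distinction the total-only line chart cannot show.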
A pie chart was used to visualize the proportions of license statuses. Including an “Unknown” category ensures that missing data is transparently represented rather than excluded.
# Count records per status for the pie slices
status_counts <- as.data.frame(table(df$LicenseStatus))
colnames(status_counts) <- c("LicenseStatus", "Count")
ggplot(status_counts, aes(x = "", y = Count, fill = LicenseStatus)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y") +  # wrap the stacked bar into a circle
  theme_void() +      # hide axes that carry no meaning on a pie chart
  labs(
    title = "Proportion of Licenses by Status",
    fill = "License Status"
  )
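Pie slices of similar size are hard to compare by eye, so the exact proportions can be reported alongside the chart. The sketch below builds the same count table as the report and adds a percentage column, using a hypothetical factor `status` (made-up values, not the Baltimore data).

```r
# Hypothetical status vector (NOT the Baltimore data)
status <- factor(c("Renewed", "Renewed", "Closed", "Unknown", "Transferred",
                   "Renewed", "Closed", "Renewed"))

# Counts and percentages per status: the numbers behind each pie slice
status_counts <- as.data.frame(table(status))
colnames(status_counts) <- c("LicenseStatus", "Count")
status_counts$Percent <- round(100 * status_counts$Count / sum(status_counts$Count), 1)
print(status_counts)
```

The `Percent` column could also serve as slice labels, or the same table could be shown as a sorted bar chart, which makes small differences between statuses easier to read.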
This analysis explored Baltimore City liquor licenses using multiple visualization techniques. The findings highlight a small number of dominant license classes, variability in fees across license statuses, relatively stable individual fees over time, and year-to-year fluctuations in total licensing revenue. Recoding missing statuses as “Unknown” kept those records visible in the analysis instead of silently dropping them.