Hi! I’m Kenidy. I’m a Cincinnatian living in Columbus, and I’m building the foundation of my career. I’m interested in analytics and how data can support better decisions in corporate supply chains.
Current program: MSBA / Business Analytics
School: University of Cincinnati
Areas of interest: analytics, data manipulation, visualization, operations management
I currently work in Consumer Packaged Goods, where I support supply chain and finance. I’m especially interested in roles that involve process improvement, analysis, and translating data into actionable insights.
I’m still building my foundation in R. So far, I’ve learned how to work in RStudio, use R Markdown, and import data at a basic level.
I have basic to intermediate experience with:
Excel (spreadsheets, formulas, basic analysis)
Power BI / Tableau / SQL / Python
df <- readr::read_csv("blood_transfusion.csv")
## Rows: 748 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Class
## dbl (4): Recency, Frequency, Monetary, Time
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# missing values
sum(is.na(df))
## [1] 0
# dimensions
dim(df)
## [1] 748 5
# first 10 rows
head(df, 10)
## # A tibble: 10 × 5
## Recency Frequency Monetary Time Class
## <dbl> <dbl> <dbl> <dbl> <chr>
## 1 2 50 12500 98 donated
## 2 0 13 3250 28 donated
## 3 1 16 4000 35 donated
## 4 2 20 5000 45 donated
## 5 1 24 6000 77 not donated
## 6 4 4 1000 4 not donated
## 7 2 7 1750 14 donated
## 8 1 12 3000 35 not donated
## 9 2 9 2250 22 donated
## 10 5 46 11500 98 donated
# last 10 rows
tail(df, 10)
## # A tibble: 10 × 5
## Recency Frequency Monetary Time Class
## <dbl> <dbl> <dbl> <dbl> <chr>
## 1 23 1 250 23 not donated
## 2 23 4 1000 52 not donated
## 3 23 1 250 23 not donated
## 4 23 7 1750 88 not donated
## 5 16 3 750 86 not donated
## 6 23 2 500 38 not donated
## 7 21 2 500 52 not donated
## 8 23 3 750 62 not donated
## 9 39 1 250 39 not donated
## 10 72 1 250 72 not donated
# 100th row, Monetary column
df[100, "Monetary"]
## # A tibble: 1 × 1
## Monetary
## <dbl>
## 1 1750
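The result above is a 1 × 1 tibble rather than a bare number. If a plain numeric value is needed (for example, to use in arithmetic), `[[` followed by a position is one way to get it. This is a minimal sketch on a made-up data frame named `toy`; the values are illustrative and are not taken from the real file.

```r
# Toy stand-in for the blood transfusion data (values are made up for illustration)
toy <- data.frame(Monetary = c(250, 1750, 500))

# [[ pulls the column out as a plain numeric vector; [2] then picks the second element
value <- toy[["Monetary"]][2]
value
```

On the real data, the equivalent call would be `df[["Monetary"]][100]`.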
# mean of Monetary
mean(df[["Monetary"]])
## [1] 1378.676
# how many rows have Monetary > mean
above_avg <- df[["Monetary"]] > mean(df[["Monetary"]])
sum(above_avg)
## [1] 267
df[above_avg, "Monetary"]
## # A tibble: 267 × 1
## Monetary
## <dbl>
## 1 12500
## 2 3250
## 3 4000
## 4 5000
## 5 6000
## 6 1750
## 7 3000
## 8 2250
## 9 11500
## 10 5750
## # ℹ 257 more rows
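A count is easier to interpret next to a proportion. Because a logical vector is treated as 0/1 in arithmetic, `mean()` of the comparison gives the share of rows above the average. The sketch below uses a small made-up data frame (`toy` and its values are assumptions for illustration only).

```r
# Toy stand-in for the blood transfusion data (values are made up for illustration)
toy <- data.frame(Monetary = c(250, 500, 750, 4500))

# TRUE where a row's Monetary value exceeds the column mean
above_avg <- toy[["Monetary"]] > mean(toy[["Monetary"]])

n_above <- sum(above_avg)       # number of rows above the mean
share_above <- mean(above_avg)  # proportion of rows above the mean
```

Applied to the real data, 267 of 748 rows is about 35.7%.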
There are 267 observations where the Monetary value is greater than the mean.

## Dataset 2: Cincinnati Police Data
df <- readr::read_csv("PDI__Police_Data_Initiative__Crime_Incidents.csv")
## Rows: 15155 Columns: 40
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (34): INSTANCEID, INCIDENT_NO, DATE_REPORTED, DATE_FROM, DATE_TO, CLSD, ...
## dbl (6): UCR, LONGITUDE_X, LATITUDE_X, TOTALNUMBERVICTIMS, TOTALSUSPECTS, ZIP
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dim(df)
## [1] 15155 40
sum(is.na(df))
## [1] 95592
sort(colSums(is.na(df)), decreasing = TRUE)[1:5]
## OPENING FLOOR SIDE THEFT_CODE SUSPECT_RACE
## 14508 14127 14120 10167 7082
range(df[["DATE_REPORTED"]])
## [1] "01/01/2022 01:08:00 AM" "06/26/2022 12:50:00 AM"
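`DATE_REPORTED` was read as a character column, so `range()` compares the values as text rather than as dates. The sketch below shows the difference on a few made-up timestamps in the same layout. The format string and the UTC time zone are assumptions, since the source does not state a time zone, and `%p` parsing depends on the locale recognizing AM/PM.

```r
# Made-up timestamps in the same "MM/DD/YYYY HH:MM:SS AM/PM" layout as DATE_REPORTED
stamps <- c("01/01/2022 01:08:00 AM",
            "01/01/2022 12:30:00 AM",
            "06/26/2022 12:50:00 AM",
            "06/26/2022 09:15:00 PM")

# Text comparison: "12:..." sorts after "01:..." and "09:...", regardless of AM/PM
text_range <- range(stamps)

# Parse into date-times so that comparison is chronological
# (tz = "UTC" is an assumption; %I is the 12-hour clock, %p the AM/PM marker)
parsed <- as.POSIXct(stamps, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
date_range <- range(parsed)

text_range
date_range
```

In this toy vector the text range starts at 01:08 AM and ends at 12:50 AM, while the parsed range starts at 12:30 AM on January 1 and ends at 9:15 PM on June 26. The same kind of mismatch could affect the endpoints reported for the real data.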
sort(table(df[["SUSPECT_AGE"]]), decreasing = TRUE)[1:5]
##
## UNKNOWN 18-25 31-40 26-30 41-50
## 9003 1778 1525 1126 659
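Since `UNKNOWN` is the most frequent value, it can help to tabulate only the known age ranges when looking for the most common one. A minimal sketch on a made-up vector (`ages` is hypothetical and only mimics the category labels shown above):

```r
# Made-up values using the same labels as SUSPECT_AGE
ages <- c("UNKNOWN", "18-25", "UNKNOWN", "31-40", "18-25", "UNKNOWN")

# Drop the placeholder category before counting
known <- ages[ages != "UNKNOWN"]

# Frequency table of known ages, most common first
age_counts <- sort(table(known), decreasing = TRUE)
top_known <- names(age_counts)[1]
top_known
```

On the real data, the same filter would be applied to `df[["SUSPECT_AGE"]]`.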
sort(table(df[["ZIP"]]), decreasing = TRUE)[1:5]
##
## 45202 45205 45211 45238 45229
## 2049 1110 1094 956 913
sort(prop.table(table(df[["DAYOFWEEK"]])), decreasing = TRUE)
##
## SATURDAY SUNDAY MONDAY TUESDAY WEDNESDAY FRIDAY THURSDAY
## 0.1542221 0.1448547 0.1438365 0.1432935 0.1405105 0.1369807 0.1363019
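`prop.table()` returns proportions between 0 and 1, so a value such as 0.154 corresponds to 15.4%, not 0.15%. Multiplying by 100 before reporting avoids that mix-up. A short sketch on made-up day labels (`days` is a hypothetical vector):

```r
# Made-up day labels for illustration
days <- c("SATURDAY", "SATURDAY", "SUNDAY", "MONDAY")

# Proportions (sum to 1), then percentages rounded to one decimal place
day_share <- prop.table(table(days))
day_pct <- round(100 * day_share, 1)

sort(day_pct, decreasing = TRUE)
```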
This dataset contains 15,155 rows and 40 columns, with 95,592 missing values in total. The OPENING column has the most missing values, at 14,508.

The DATE_REPORTED values run from 01/01/2022 01:08:00 AM to 06/26/2022 12:50:00 AM. Because DATE_REPORTED is stored as a character column, this range reflects text sorting rather than true chronological order, so the exact endpoints should be confirmed after parsing the dates.

The most frequent SUSPECT_AGE value is UNKNOWN (9,003 records); among known ages, the most common range is 18-25 (1,778 records). The ZIP code with the most incidents is 45202 (2,049 records). No unusual values or ZIP codes stood out. Most incidents occur on Saturday, which accounts for about 15.4% of records.

Based on this dataset, I would be interested in analyzing trends related to time, location, and demographics. Columns such as DAYOFWEEK, ZIP, and SUSPECT_AGE could help identify patterns in incident occurrences. Several variables contain many missing values, which may need to be addressed before further analysis. Summary statistics and frequency tables help highlight trends and potential data quality issues.
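When deciding which columns need attention, the share of missing values per column is often more informative than the raw count. `colMeans(is.na(...))` gives that share directly. The sketch below uses a made-up data frame (`toy` and its columns are hypothetical).

```r
# Toy data frame with some missing values (made up for illustration)
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = 1:4
)

# Share of missing values per column, largest first
na_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
na_share
```

For the real data, 14,508 missing values out of 15,155 rows means OPENING is about 95.7% missing, which suggests it may be a candidate for exclusion rather than imputation.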