Part 1: HTML Bio

Introduction to Me

My name is Harshitha Baratam, and I am currently pursuing a Master’s degree in Information Systems at the University of Cincinnati.

Prior to graduate school, I worked at Tech Mahindra, where I contributed to an ETL migration project for Scotiabank, collaborating with cross-functional teams to deliver reliable and impactful data solutions.

Academic Background

- Bachelor’s degree in Computer Science and Engineering
- Currently pursuing a Master’s in Information Systems
- Relevant coursework includes:
  - Data Wrangling
  - Statistics
  - Analytics Programming

Professional Background

- Software Engineer at Tech Mahindra
- Experience in software development and data-related projects
- Strong interest in analytics, data engineering, and data-driven decision making

Experience with R

I have beginner-to-intermediate experience with R, including:

- Writing and executing R scripts
- Creating reproducible reports using R Markdown
- Importing, cleaning, and exploring datasets
- Working with tidyverse packages

Experience with Other Analytic Software

- Microsoft Excel (pivot tables, formulas)
- SQL
- Python
- Tableau / Power BI

Fun Fact

In my free time, I enjoy painting and crafting, which help me relax and express creativity outside of academics.

Part 2: Importing Data

df <- readr::read_csv("blood_transfusion.csv")

## Rows: 748 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Class
## dbl (4): Recency, Frequency, Monetary, Time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
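The message above goes away once the column types are stated explicitly. The following is a minimal sketch, assuming readr 2.0 or later (for literal input via `I()`). It reads three rows copied from the file preview instead of the file itself, and the choice to store `Class` as a factor is my own assumption rather than something the assignment requires.

```r
library(readr)

# Three rows copied from the preview of blood_transfusion.csv, passed as
# literal text so the sketch runs without the file on disk.
csv_text <- "Recency,Frequency,Monetary,Time,Class
2,50,12500,98,donated
1,24,6000,77,not donated
4,4,1000,4,not donated"

# Declaring col_types up front suppresses the column specification message.
toy <- read_csv(
  I(csv_text),
  col_types = cols(
    .default = col_double(),
    Class = col_factor(levels = c("donated", "not donated"))  # assumption: factor
  )
)

levels(toy$Class)
```

The same `col_types` argument can be passed when reading the real file by replacing `I(csv_text)` with the file path.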

sum(is.na(df))

## [1] 0

dim(df)

## [1] 748   5

head(df, 10)

## # A tibble: 10 × 5
##    Recency Frequency Monetary  Time Class      
##      <dbl>     <dbl>    <dbl> <dbl> <chr>      
##  1       2        50    12500    98 donated    
##  2       0        13     3250    28 donated    
##  3       1        16     4000    35 donated    
##  4       2        20     5000    45 donated    
##  5       1        24     6000    77 not donated
##  6       4         4     1000     4 not donated
##  7       2         7     1750    14 donated    
##  8       1        12     3000    35 not donated
##  9       2         9     2250    22 donated    
## 10       5        46    11500    98 donated

tail(df, 10)

## # A tibble: 10 × 5
##    Recency Frequency Monetary  Time Class      
##      <dbl>     <dbl>    <dbl> <dbl> <chr>      
##  1      23         1      250    23 not donated
##  2      23         4     1000    52 not donated
##  3      23         1      250    23 not donated
##  4      23         7     1750    88 not donated
##  5      16         3      750    86 not donated
##  6      23         2      500    38 not donated
##  7      21         2      500    52 not donated
##  8      23         3      750    62 not donated
##  9      39         1      250    39 not donated
## 10      72         1      250    72 not donated

df[100, "Monetary"]

## # A tibble: 1 × 1
##   Monetary
##      <dbl>
## 1     1750
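Single-bracket indexing on a tibble returns another tibble, which is why the result above prints as a 1 × 1 table. When a bare number is needed, the column can be extracted first and then indexed. This is a small sketch on a hypothetical three-row tibble (the name `toy` and its values are made up for illustration), assuming the tibble package is installed, as it is alongside readr.

```r
library(tibble)

# Hypothetical stand-in for the real data frame.
toy <- tibble(Monetary = c(250, 1750, 500))

cell  <- toy[2, "Monetary"]     # still a 1 x 1 tibble
value <- toy[["Monetary"]][2]   # plain numeric value

class(cell)
value
```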

mean(df[["Monetary"]])

## [1] 1378.676

above_avg <- df[["Monetary"]] > mean(df[["Monetary"]])
nrow(df[above_avg, ])

## [1] 267
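Because `TRUE` counts as 1 in arithmetic, the same count can be obtained by summing the logical vector directly, and its mean gives the share of rows above the average. A minimal sketch on a hypothetical vector (the values are invented for illustration, not taken from the dataset):

```r
# Hypothetical monetary values; their mean is 3200.
monetary <- c(250, 1750, 500, 12500, 1000)

above_avg <- monetary > mean(monetary)

sum(above_avg)   # number of values above the mean
mean(above_avg)  # share of values above the mean
```

Applied to the real column, `sum(df[["Monetary"]] > mean(df[["Monetary"]]))` should match the `nrow()` result above.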

Blood Transfusion Data Summary

- The dataset contains 748 rows and 5 columns.
- There are no missing values in the dataset.
- The average monetary value is 1378.676.
- There are 267 observations (about 36% of the 748 rows) with a monetary value greater than the average.

# Import Police Crime Data
df <- readr::read_csv("PDI__Police_Data_Initiative__Crime_Incidents.csv")

## Rows: 15155 Columns: 40
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (34): INSTANCEID, INCIDENT_NO, DATE_REPORTED, DATE_FROM, DATE_TO, CLSD, ...
## dbl  (6): UCR, LONGITUDE_X, LATITUDE_X, TOTALNUMBERVICTIMS, TOTALSUSPECTS, ZIP
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dim(df)

## [1] 15155    40

sum(is.na(df))

## [1] 95592

colSums(is.na(df))

##                     INSTANCEID                    INCIDENT_NO 
##                              0                              0 
##                  DATE_REPORTED                      DATE_FROM 
##                              0                              2 
##                        DATE_TO                           CLSD 
##                              9                            545 
##                            UCR                            DST 
##                             10                              0 
##                           BEAT                        OFFENSE 
##                             28                             10 
##                       LOCATION                     THEFT_CODE 
##                              2                          10167 
##                          FLOOR                           SIDE 
##                          14127                          14120 
##                        OPENING                      HATE_BIAS 
##                          14508                              0 
##                      DAYOFWEEK                       RPT_AREA 
##                            423                            239 
##               CPD_NEIGHBORHOOD                        WEAPONS 
##                            249                              5 
##              DATE_OF_CLEARANCE                      HOUR_FROM 
##                           2613                              2 
##                        HOUR_TO                      ADDRESS_X 
##                              9                            148 
##                    LONGITUDE_X                     LATITUDE_X 
##                           1714                           1714 
##                     VICTIM_AGE                    VICTIM_RACE 
##                              0                           2192 
##               VICTIM_ETHNICITY                  VICTIM_GENDER 
##                           2192                           2192 
##                    SUSPECT_AGE                   SUSPECT_RACE 
##                              0                           7082 
##              SUSPECT_ETHNICITY                 SUSPECT_GENDER 
##                           7082                           7082 
##             TOTALNUMBERVICTIMS                  TOTALSUSPECTS 
##                             33                           7082 
##                      UCR_GROUP                            ZIP 
##                             10                              1 
## COMMUNITY_COUNCIL_NEIGHBORHOOD               SNA_NEIGHBORHOOD 
##                              0                              0
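Raw counts are hard to compare at a glance, so it can help to express missingness as a share of rows and sort it. The sketch below uses a hypothetical four-row data frame that borrows three column names from the output above; the values themselves are invented.

```r
# Hypothetical data frame with invented values.
toy <- data.frame(
  OFFENSE = c("THEFT", NA, "ASSAULT", "THEFT"),
  FLOOR   = c(NA, NA, NA, "1"),
  ZIP     = c(45202, 45205, NA, 45211)
)

# colMeans() of a logical matrix gives the fraction of TRUE per column.
na_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)

round(100 * na_share, 1)  # percent missing per column, largest first
```

On the real data the same two lines would rank columns such as OPENING, FLOOR, and SIDE at the top.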

range(df[["DATE_REPORTED"]])

## [1] "01/01/2022 01:08:00 AM" "06/26/2022 12:50:00 AM"
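DATE_REPORTED is read as a character column, so `range()` above compares strings rather than dates. Within a single year the month and day still sort correctly, but the 12-hour clock does not. A sketch of parsing the text first, assuming the `MM/DD/YYYY hh:mm:ss AM/PM` layout seen in the output and a locale in which `%p` recognises AM and PM; the first two timestamps below are hypothetical and the time zone is an arbitrary choice.

```r
stamps <- c(
  "01/01/2022 01:08:00 PM",  # hypothetical
  "01/01/2022 11:00:00 AM",  # hypothetical
  "06/26/2022 12:50:00 AM"   # from the output above
)

# String comparison puts 01:08 PM before 11:00 AM.
range(stamps)

# Parsed date-times compare chronologically.
parsed <- as.POSIXct(stamps, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
range(parsed)  # earliest is 11:00 on 2022-01-01; latest is 00:50 on 2022-06-26
```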

table(df[["SUSPECT_AGE"]])

## 
##    18-25    26-30    31-40    41-50    51-60    61-70  OVER 70 UNDER 18 
##     1778     1126     1525      659      298      121       16      629 
##  UNKNOWN 
##     9003

sort(table(df[["ZIP"]]), decreasing = TRUE)

## 
## 45202 45205 45211 45238 45229 45219 45225 45214 45237 45223 45206 45220 45232 
##  2049  1110  1094   956   913   863   811   774   699   653   616   477   477 
## 45224 45209 45208 45204 45216 45227 45207 45203 45230 45213 45239 45226 45217 
##   429   380   359   348   302   286   245   226   214   190   169   112   100 
## 45221 45233 45212 45215 45231 45228 42502 45236 45244 45248  4523  5239 
##    90    77    61    47     7     5     3     3     3     3     2     1
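The tail of the table contains a few values that do not look like the rest, such as 4523, 5239, and 42502. A quick sketch for flagging them, under the assumption that valid entries follow the five-digit 452xx pattern shared by every other value in the table:

```r
# A handful of ZIP values taken from the table above.
zips <- c(45202, 45205, 4523, 42502, 5239)

zip_chr <- as.character(zips)

# Assumption: a valid ZIP here is "452" followed by two more digits.
looks_valid <- grepl("^452[0-9]{2}$", zip_chr)

zip_chr[!looks_valid]
```

Whether the flagged rows are typos or genuine out-of-area addresses would need to be checked against the address fields.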

day_tbl <- table(df[["DAYOFWEEK"]])
day_tbl / sum(day_tbl)

## 
##    FRIDAY    MONDAY  SATURDAY    SUNDAY  THURSDAY   TUESDAY WEDNESDAY 
## 0.1369807 0.1438365 0.1542221 0.1448547 0.1363019 0.1432935 0.1405105
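`table()` sorts character values alphabetically and drops `NA` by default, so the proportions above are shares of incidents with a recorded day, listed in alphabetical order. A sketch that puts the days in calendar order and picks out the most frequent one, using a hypothetical vector of six entries:

```r
# Hypothetical day-of-week values, including one missing entry.
days <- c("MONDAY", "SATURDAY", "FRIDAY", "SATURDAY", NA, "SUNDAY")

week_order <- c("MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
                "FRIDAY", "SATURDAY", "SUNDAY")

# A factor with explicit levels controls the order of the table.
day_fct   <- factor(days, levels = week_order)
day_share <- prop.table(table(day_fct))  # NA is excluded from the total

round(day_share, 3)
names(which.max(day_share))
```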

Police Crime Data Summary

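- The dataset contains 15,155 rows and 40 columns.
- There are 95,592 missing values in total (about 16% of all cells). OPENING, FLOOR, and SIDE are each missing in more than 93% of rows, THEFT_CODE in about 67%, the suspect race, ethnicity, gender, and count fields in about 47%, and the victim race, ethnicity, and gender fields in about 14%.
- DATE_REPORTED runs from January 1, 2022 to June 26, 2022. The column is read as text, so this range comes from a string comparison rather than from parsed dates.
- The most common suspect age category is UNKNOWN (9,003 incidents).
- ZIP code 45202 has the highest number of reported incidents (2,049).
- Incidents occur most frequently on Saturday (about 15.4% of incidents with a recorded day of week).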

Part 3: Computing Environment

library(sessioninfo)
session_info(pkgs = "attached")

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.5.2 (2025-10-31)
##  os       macOS Tahoe 26.2
##  system   aarch64, darwin20
##  ui       X11
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       America/New_York
##  date     2026-01-23
##  pandoc   3.6.3 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
##  quarto   1.8.25 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  sessioninfo * 1.2.3   2025-02-05 [1] CRAN (R 4.5.0)
## 
##  [1] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
##  * ── Packages attached to the search path.
## 
## ──────────────────────────────────────────────────────────────────────────────

About Me

Harshitha Baratam

2026-01-22