About Me

My name is Sarbjot Singh, and I am a second-year Business Analytics (BANA) major at the University of Cincinnati. I enjoy playing sports and spending time with my family and friends. I love picking up new hobbies and have started participating in the Bearcat Motorsports Club to learn more about cars and to have some fun watching a race car be built and raced by my peers. One of my favorite things to do is try different foods, but most of all I love my mom's cooking.

Academic Background

Graduated from Lakota East High School, Class of 2022
Currently completing a BS in Business Analytics with a minor in Accounting
CCP classes at UC Blue Ash

Professional Background

I currently work for Red Bull as a Student Marketeer at the University of Cincinnati, and I am also a full-time student at UC. Some of my tasks at work include:

* Traveling to different campuses and businesses to sample cans of Red Bull to students, workers, and anybody else who would like one.
* Networking with student organizations and businesses to keep the product top of mind.
* Planning missions for where we should go to sample.
* Promoting Red Bull events and opportunities for our consumers to be a part of.

Experience with R and other analytical software

I do not have much experience with R. I had heard of it and knew what it was, but I had never actually used it before taking this course. I am very new to programming in the language, but I am excited to learn. I do not have experience with any other programming language, but I have learned how to use Tableau, which is very good for creating interactive data visualizations.

# read_csv() comes from readr; readxl is only needed for Excel files
library(readr)
df <- read_csv("Data/blood_transfusion.csv")

## Rows: 748 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Class
## dbl (4): Recency, Frequency, Monetary, Time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

df

## # A tibble: 748 × 5
##    Recency Frequency Monetary  Time Class      
##      <dbl>     <dbl>    <dbl> <dbl> <chr>      
##  1       2        50    12500    98 donated    
##  2       0        13     3250    28 donated    
##  3       1        16     4000    35 donated    
##  4       2        20     5000    45 donated    
##  5       1        24     6000    77 not donated
##  6       4         4     1000     4 not donated
##  7       2         7     1750    14 donated    
##  8       1        12     3000    35 not donated
##  9       2         9     2250    22 donated    
## 10       5        46    11500    98 donated    
## # ℹ 738 more rows

sum(is.na(df))

## [1] 0

dim(df)

## [1] 748   5

head(df, 10)

## # A tibble: 10 × 5
##    Recency Frequency Monetary  Time Class      
##      <dbl>     <dbl>    <dbl> <dbl> <chr>      
##  1       2        50    12500    98 donated    
##  2       0        13     3250    28 donated    
##  3       1        16     4000    35 donated    
##  4       2        20     5000    45 donated    
##  5       1        24     6000    77 not donated
##  6       4         4     1000     4 not donated
##  7       2         7     1750    14 donated    
##  8       1        12     3000    35 not donated
##  9       2         9     2250    22 donated    
## 10       5        46    11500    98 donated

tail(df, 10)

## # A tibble: 10 × 5
##    Recency Frequency Monetary  Time Class      
##      <dbl>     <dbl>    <dbl> <dbl> <chr>      
##  1      23         1      250    23 not donated
##  2      23         4     1000    52 not donated
##  3      23         1      250    23 not donated
##  4      23         7     1750    88 not donated
##  5      16         3      750    86 not donated
##  6      23         2      500    38 not donated
##  7      21         2      500    52 not donated
##  8      23         3      750    62 not donated
##  9      39         1      250    39 not donated
## 10      72         1      250    72 not donated

df[100, 'Monetary']

## # A tibble: 1 × 1
##   Monetary
##      <dbl>
## 1     1750

mean(df[['Monetary']])

## [1] 1378.676

# Logical vector: TRUE where Monetary is above the column mean
above_avg <- df[['Monetary']] > mean(df[['Monetary']])
df[above_avg, 'Monetary']

## # A tibble: 267 × 1
##    Monetary
##       <dbl>
##  1    12500
##  2     3250
##  3     4000
##  4     5000
##  5     6000
##  6     1750
##  7     3000
##  8     2250
##  9    11500
## 10     5750
## # ℹ 257 more rows

df <- readr::read_csv("Data/PDI__Police_Data_Initiative__Crime_Incidents.csv")

## Rows: 15155 Columns: 40
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (34): INSTANCEID, INCIDENT_NO, DATE_REPORTED, DATE_FROM, DATE_TO, CLSD, ...
## dbl  (6): UCR, LONGITUDE_X, LATITUDE_X, TOTALNUMBERVICTIMS, TOTALSUSPECTS, ZIP
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

df

## # A tibble: 15,155 × 40
##    INSTANCEID      INCIDENT_NO DATE_REPORTED DATE_FROM DATE_TO CLSD    UCR DST  
##    <chr>           <chr>       <chr>         <chr>     <chr>   <chr> <dbl> <chr>
##  1 4B312B08-FE95-… 229000003   01/01/2022 1… 12/31/20… 01/01/… F--C…   803 2    
##  2 4B312B08-FE95-… 229000003   01/01/2022 1… 12/31/20… 01/01/… F--C…   803 2    
##  3 4B312B08-FE95-… 229000003   01/01/2022 1… 12/31/20… 01/01/… F--C…   803 2    
##  4 4B312B08-FE95-… 229000003   01/01/2022 1… 12/31/20… 01/01/… F--C…  1493 2    
##  5 4B312B08-FE95-… 229000003   01/01/2022 1… 12/31/20… 01/01/… F--C…  1493 2    
##  6 4B312B08-FE95-… 229000003   01/01/2022 1… 12/31/20… 01/01/… F--C…  1493 2    
##  7 4B312B08-FE95-… 229000003   01/01/2022 1… 12/31/20… 01/01/… F--C…   810 2    
##  8 4B312B08-FE95-… 229000003   01/01/2022 1… 12/31/20… 01/01/… F--C…   810 2    
##  9 4B312B08-FE95-… 229000003   01/01/2022 1… 12/31/20… 01/01/… F--C…   810 2    
## 10 2565E4A0-1C0B-… 229000009   01/01/2022 1… 01/01/20… 01/01/… Z--E…  1521 3    
## # ℹ 15,145 more rows
## # ℹ 32 more variables: BEAT <chr>, OFFENSE <chr>, LOCATION <chr>,
## #   THEFT_CODE <chr>, FLOOR <chr>, SIDE <chr>, OPENING <chr>, HATE_BIAS <chr>,
## #   DAYOFWEEK <chr>, RPT_AREA <chr>, CPD_NEIGHBORHOOD <chr>, WEAPONS <chr>,
## #   DATE_OF_CLEARANCE <chr>, HOUR_FROM <chr>, HOUR_TO <chr>, ADDRESS_X <chr>,
## #   LONGITUDE_X <dbl>, LATITUDE_X <dbl>, VICTIM_AGE <chr>, VICTIM_RACE <chr>,
## #   VICTIM_ETHNICITY <chr>, VICTIM_GENDER <chr>, SUSPECT_AGE <chr>, …

dim(df)

## [1] 15155    40

sum(is.na(df))

## [1] 95592

colSums(is.na(df))

##                     INSTANCEID                    INCIDENT_NO 
##                              0                              0 
##                  DATE_REPORTED                      DATE_FROM 
##                              0                              2 
##                        DATE_TO                           CLSD 
##                              9                            545 
##                            UCR                            DST 
##                             10                              0 
##                           BEAT                        OFFENSE 
##                             28                             10 
##                       LOCATION                     THEFT_CODE 
##                              2                          10167 
##                          FLOOR                           SIDE 
##                          14127                          14120 
##                        OPENING                      HATE_BIAS 
##                          14508                              0 
##                      DAYOFWEEK                       RPT_AREA 
##                            423                            239 
##               CPD_NEIGHBORHOOD                        WEAPONS 
##                            249                              5 
##              DATE_OF_CLEARANCE                      HOUR_FROM 
##                           2613                              2 
##                        HOUR_TO                      ADDRESS_X 
##                              9                            148 
##                    LONGITUDE_X                     LATITUDE_X 
##                           1714                           1714 
##                     VICTIM_AGE                    VICTIM_RACE 
##                              0                           2192 
##               VICTIM_ETHNICITY                  VICTIM_GENDER 
##                           2192                           2192 
##                    SUSPECT_AGE                   SUSPECT_RACE 
##                              0                           7082 
##              SUSPECT_ETHNICITY                 SUSPECT_GENDER 
##                           7082                           7082 
##             TOTALNUMBERVICTIMS                  TOTALSUSPECTS 
##                             33                           7082 
##                      UCR_GROUP                            ZIP 
##                             10                              1 
## COMMUNITY_COUNCIL_NEIGHBORHOOD               SNA_NEIGHBORHOOD 
##                              0                              0
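The raw counts above are easier to compare as a share of rows. A minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, only the column names are borrowed from the data set): because `is.na()` returns TRUE/FALSE, `colMeans()` gives the fraction of missing values in each column.

```r
# Toy data frame; values are illustrative, not taken from the crime data
toy <- data.frame(
  FLOOR = c(NA, NA, "1", NA),
  ZIP   = c(45202, 45205, NA, 45211)
)

# is.na() gives a logical matrix; the column mean of TRUE/FALSE is the share missing
missing_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
round(missing_share, 2)
```

Running the same call on `df` should rank columns such as OPENING, FLOOR, and SIDE at the top, since they have the largest missing counts above.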

range(df$DATE_REPORTED)

## [1] "01/01/2022 01:08:00 AM" "06/26/2022 12:50:00 AM"
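One caveat: DATE_REPORTED was read in as a character column, so `range()` compares the values as text rather than as dates. A minimal sketch of parsing first, using made-up timestamps in the same layout (the format string is my assumption based on the values printed above, and `%p` may depend on the locale):

```r
# Toy timestamps in the same "MM/DD/YYYY HH:MM:SS AM/PM" layout as DATE_REPORTED
# (illustrative values, not taken from the data set)
stamps <- c("01/01/2022 01:08:00 AM",
            "06/26/2022 12:50:00 AM",
            "12/05/2021 09:30:00 PM")

# On character vectors, range() sorts as text, so the month is compared before the year
text_range <- range(stamps)

# Parse to date-times first; %I is the 12-hour clock and %p is the AM/PM marker
parsed <- as.POSIXct(stamps, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
date_range <- range(parsed)
date_range
```

In this toy example the text-based range ends at the December 2021 value even though it is the earliest date, while the parsed range orders the three values chronologically.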

table(df$SUSPECT_AGE)

## 
##    18-25    26-30    31-40    41-50    51-60    61-70  OVER 70 UNDER 18 
##     1778     1126     1525      659      298      121       16      629 
##  UNKNOWN 
##     9003

sort(table(df$ZIP), decreasing = TRUE)

## 
## 45202 45205 45211 45238 45229 45219 45225 45214 45237 45223 45206 45220 45232 
##  2049  1110  1094   956   913   863   811   774   699   653   616   477   477 
## 45224 45209 45208 45204 45216 45227 45207 45203 45230 45213 45239 45226 45217 
##   429   380   359   348   302   286   245   226   214   190   169   112   100 
## 45221 45233 45212 45215 45231 45228 42502 45236 45244 45248  4523  5239 
##    90    77    61    47     7     5     3     3     3     3     2     1
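A few values at the end of the table (4523, 5239, 42502) do not look like the others. A minimal sketch for flagging them, assuming that valid ZIP codes in this data set are five digits starting with 452 (that pattern is my assumption from the table above, not something the data documentation states):

```r
# Toy ZIP values mixing typical and unusual entries from the table above
zips <- c(45202, 45205, 4523, 5239, 42502, NA)

# TRUE only for five-digit values that start with 452; NA comes back as FALSE
is_expected <- grepl("^452[0-9]{2}$", as.character(zips))

# Keep the non-missing values that break the pattern
suspicious <- zips[!is_expected & !is.na(zips)]
suspicious
```

Flagged values are only candidates for a closer look; some may be real ZIP codes outside the city rather than typos.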

table(df$DAYOFWEEK) / sum(table(df$INCIDENT_NO))

## 
##    FRIDAY    MONDAY  SATURDAY    SUNDAY  THURSDAY   TUESDAY WEDNESDAY 
## 0.1331574 0.1398218 0.1499175 0.1408116 0.1324975 0.1392940 0.1365886

Module 2 Lab

Sarbjot Singh

2024-01-21
