My name is Sarbjot Singh, and I am a second-year BANA
major at the University of Cincinnati. I enjoy playing sports and
spending time with my family and friends. I love picking up new hobbies,
and I have started participating in the Bearcat Motorsports Club
to learn more about cars and to have fun watching my peers build
and race a race car. One of my favorite things to do is try
different foods, but most of all I love my mom's cooking.
Currently, I work for Red Bull as a student marketeer for the University of Cincinnati while also studying full time at UC. Some of the tasks I do at work include:
* Traveling to different campuses and businesses to hand out sample cans of Red Bull to students, workers, and anyone else we meet who would like one.
* Networking with student organizations and businesses to keep the product top of mind.
* Planning missions, that is, deciding where we should go to sample.
* Promoting Red Bull events and opportunities for our consumers to take part in.
I do not have much experience with R. I had heard of it and knew what it was, but I never actually used it until taking this course. I am very new to programming in the language, but I am very excited to learn. I do not have experience with any other programming language, but I have learned how to use Tableau, which is very good for creating interactive data visualizations.
library(readr)  # read_csv() comes from readr; readxl only reads Excel files
df <- read_csv("Data/blood_transfusion.csv")
## Rows: 748 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Class
## dbl (4): Recency, Frequency, Monetary, Time
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df
## # A tibble: 748 × 5
## Recency Frequency Monetary Time Class
## <dbl> <dbl> <dbl> <dbl> <chr>
## 1 2 50 12500 98 donated
## 2 0 13 3250 28 donated
## 3 1 16 4000 35 donated
## 4 2 20 5000 45 donated
## 5 1 24 6000 77 not donated
## 6 4 4 1000 4 not donated
## 7 2 7 1750 14 donated
## 8 1 12 3000 35 not donated
## 9 2 9 2250 22 donated
## 10 5 46 11500 98 donated
## # ℹ 738 more rows
sum(is.na(df))
## [1] 0
dim(df)
## [1] 748 5
head(df, 10)
## # A tibble: 10 × 5
## Recency Frequency Monetary Time Class
## <dbl> <dbl> <dbl> <dbl> <chr>
## 1 2 50 12500 98 donated
## 2 0 13 3250 28 donated
## 3 1 16 4000 35 donated
## 4 2 20 5000 45 donated
## 5 1 24 6000 77 not donated
## 6 4 4 1000 4 not donated
## 7 2 7 1750 14 donated
## 8 1 12 3000 35 not donated
## 9 2 9 2250 22 donated
## 10 5 46 11500 98 donated
tail(df, 10)
## # A tibble: 10 × 5
## Recency Frequency Monetary Time Class
## <dbl> <dbl> <dbl> <dbl> <chr>
## 1 23 1 250 23 not donated
## 2 23 4 1000 52 not donated
## 3 23 1 250 23 not donated
## 4 23 7 1750 88 not donated
## 5 16 3 750 86 not donated
## 6 23 2 500 38 not donated
## 7 21 2 500 52 not donated
## 8 23 3 750 62 not donated
## 9 39 1 250 39 not donated
## 10 72 1 250 72 not donated
df[100, 'Monetary']
## # A tibble: 1 × 1
## Monetary
## <dbl>
## 1 1750
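Because `read_csv()` returns a tibble, `df[100, 'Monetary']` comes back as a 1 × 1 tibble rather than a single number. A minimal sketch of the ways to get a plain value instead, using a small hypothetical data frame `toy` built from the first rows printed earlier (not the real file):

```r
# Hypothetical toy data: the first five Monetary/Class values printed above
toy <- data.frame(
  Monetary = c(12500, 3250, 4000, 5000, 6000),
  Class    = c("donated", "donated", "donated", "donated", "not donated")
)

# `[[` (or `$`) extracts the column as a plain vector, so indexing it yields a scalar
third_value <- toy[["Monetary"]][3]
third_value_dollar <- toy$Monetary[3]

# Single-bracket subsetting with drop = FALSE keeps a 1 x 1 data frame,
# which mirrors what a tibble returns by default
third_as_frame <- toy[3, "Monetary", drop = FALSE]

print(third_value)     # a length-1 numeric vector
print(third_as_frame)  # a one-row, one-column data frame
```

On a plain `data.frame`, `toy[3, "Monetary"]` would already drop to a vector; tibbles never drop dimensions with `[`, which is why the result above prints as a tibble.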
mean(df[['Monetary']])
## [1] 1378.676
above_avg <- df[['Monetary']] > mean(df[['Monetary']])
df[above_avg, 'Monetary']
## # A tibble: 267 × 1
## Monetary
## <dbl>
## 1 12500
## 2 3250
## 3 4000
## 4 5000
## 5 6000
## 6 1750
## 7 3000
## 8 2250
## 9 11500
## 10 5750
## # ℹ 257 more rows
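The comparison `df[['Monetary']] > mean(df[['Monetary']])` produces one `TRUE`/`FALSE` per row, and R treats `TRUE` as 1 when summing. A minimal sketch on a hypothetical ten-value vector (not the real data) showing how to count the above-average rows and compute their share directly:

```r
# Hypothetical toy vector standing in for df[['Monetary']]
monetary <- c(12500, 3250, 4000, 5000, 6000, 1000, 1750, 250, 500, 250)

avg <- mean(monetary)          # 3450
above_avg <- monetary > avg    # logical vector, one TRUE/FALSE per value

n_above <- sum(above_avg)      # TRUE counts as 1, FALSE as 0
share_above <- mean(above_avg) # proportion of values above the mean

cat("mean:", avg, "| rows above mean:", n_above, "| share:", share_above, "\n")
```

Applied to the real column, `sum(above_avg)` should match the 267 rows printed above, out of 748 in total.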
df <- readr::read_csv("Data/PDI__Police_Data_Initiative__Crime_Incidents.csv")
## Rows: 15155 Columns: 40
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (34): INSTANCEID, INCIDENT_NO, DATE_REPORTED, DATE_FROM, DATE_TO, CLSD, ...
## dbl (6): UCR, LONGITUDE_X, LATITUDE_X, TOTALNUMBERVICTIMS, TOTALSUSPECTS, ZIP
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df
## # A tibble: 15,155 × 40
## INSTANCEID INCIDENT_NO DATE_REPORTED DATE_FROM DATE_TO CLSD UCR DST
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 4B312B08-FE95-… 229000003 01/01/2022 1… 12/31/20… 01/01/… F--C… 803 2
## 2 4B312B08-FE95-… 229000003 01/01/2022 1… 12/31/20… 01/01/… F--C… 803 2
## 3 4B312B08-FE95-… 229000003 01/01/2022 1… 12/31/20… 01/01/… F--C… 803 2
## 4 4B312B08-FE95-… 229000003 01/01/2022 1… 12/31/20… 01/01/… F--C… 1493 2
## 5 4B312B08-FE95-… 229000003 01/01/2022 1… 12/31/20… 01/01/… F--C… 1493 2
## 6 4B312B08-FE95-… 229000003 01/01/2022 1… 12/31/20… 01/01/… F--C… 1493 2
## 7 4B312B08-FE95-… 229000003 01/01/2022 1… 12/31/20… 01/01/… F--C… 810 2
## 8 4B312B08-FE95-… 229000003 01/01/2022 1… 12/31/20… 01/01/… F--C… 810 2
## 9 4B312B08-FE95-… 229000003 01/01/2022 1… 12/31/20… 01/01/… F--C… 810 2
## 10 2565E4A0-1C0B-… 229000009 01/01/2022 1… 01/01/20… 01/01/… Z--E… 1521 3
## # ℹ 15,145 more rows
## # ℹ 32 more variables: BEAT <chr>, OFFENSE <chr>, LOCATION <chr>,
## # THEFT_CODE <chr>, FLOOR <chr>, SIDE <chr>, OPENING <chr>, HATE_BIAS <chr>,
## # DAYOFWEEK <chr>, RPT_AREA <chr>, CPD_NEIGHBORHOOD <chr>, WEAPONS <chr>,
## # DATE_OF_CLEARANCE <chr>, HOUR_FROM <chr>, HOUR_TO <chr>, ADDRESS_X <chr>,
## # LONGITUDE_X <dbl>, LATITUDE_X <dbl>, VICTIM_AGE <chr>, VICTIM_RACE <chr>,
## # VICTIM_ETHNICITY <chr>, VICTIM_GENDER <chr>, SUSPECT_AGE <chr>, …
dim(df)
## [1] 15155 40
sum(is.na(df))
## [1] 95592
colSums(is.na(df))
## INSTANCEID INCIDENT_NO
## 0 0
## DATE_REPORTED DATE_FROM
## 0 2
## DATE_TO CLSD
## 9 545
## UCR DST
## 10 0
## BEAT OFFENSE
## 28 10
## LOCATION THEFT_CODE
## 2 10167
## FLOOR SIDE
## 14127 14120
## OPENING HATE_BIAS
## 14508 0
## DAYOFWEEK RPT_AREA
## 423 239
## CPD_NEIGHBORHOOD WEAPONS
## 249 5
## DATE_OF_CLEARANCE HOUR_FROM
## 2613 2
## HOUR_TO ADDRESS_X
## 9 148
## LONGITUDE_X LATITUDE_X
## 1714 1714
## VICTIM_AGE VICTIM_RACE
## 0 2192
## VICTIM_ETHNICITY VICTIM_GENDER
## 2192 2192
## SUSPECT_AGE SUSPECT_RACE
## 0 7082
## SUSPECT_ETHNICITY SUSPECT_GENDER
## 7082 7082
## TOTALNUMBERVICTIMS TOTALSUSPECTS
## 33 7082
## UCR_GROUP ZIP
## 10 1
## COMMUNITY_COUNCIL_NEIGHBORHOOD SNA_NEIGHBORHOOD
## 0 0
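`colSums(is.na(df))` gives raw counts; dividing by the number of rows makes columns easier to compare. In the output above, FLOOR, SIDE, and OPENING are each missing in over 90% of the 15,155 rows. A minimal sketch on a hypothetical toy frame; the 0.5 cutoff is an arbitrary illustrative choice, not a rule from the source:

```r
# Hypothetical toy frame with a few missing values, mimicking sparse columns
# such as FLOOR in the crime data above
toy <- data.frame(
  INCIDENT_NO = c("A1", "A2", "A3", "A4"),
  FLOOR       = c(NA, NA, "2", NA),
  ZIP         = c(45202, 45205, NA, 45211)
)

# colMeans() of the logical NA matrix gives the share of missing values per column
na_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
print(round(na_share, 2))

# Columns missing in more than half the rows (illustrative cutoff) to review or drop
mostly_missing <- names(na_share)[na_share > 0.5]
print(mostly_missing)
```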
range(df$DATE_REPORTED)
## [1] "01/01/2022 01:08:00 AM" "06/26/2022 12:50:00 AM"
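`DATE_REPORTED` was read as a character column, so `range()` compares the text alphabetically rather than chronologically. Because the month comes first, this orders dates correctly only if every record falls in the same year, and 12-hour times with AM/PM do not sort correctly even within a single day, so the printed maximum is not guaranteed to be the latest report. A minimal sketch with three hypothetical timestamps in the same format, parsed with base R's `as.POSIXct()` (the `tz = "UTC"` choice is an assumption):

```r
# Hypothetical timestamps in the same "MM/DD/YYYY hh:mm:ss AM/PM" layout as DATE_REPORTED
reported <- c("01/01/2022 01:08:00 AM",
              "06/26/2022 12:50:00 AM",
              "06/26/2022 11:15:00 PM")

# As text, range() compares character by character, so "12:50:00 AM" outranks
# "11:15:00 PM" even though it happened earlier in the day
text_range <- range(reported)

# Parsing to date-times makes the comparison chronological.
# %I = hour on a 12-hour clock, %p = AM/PM marker (AM/PM parsing can depend on locale)
parsed <- as.POSIXct(reported, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
time_range <- range(parsed)

print(text_range)
print(time_range)
```

Running the same `as.POSIXct()` call on `df$DATE_REPORTED` and then `range(..., na.rm = TRUE)` would give the chronological span; any value that fails to parse becomes `NA`, so it is worth checking `sum(is.na(...))` afterwards.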
table(df$SUSPECT_AGE)
##
## 18-25 26-30 31-40 41-50 51-60 61-70 OVER 70 UNDER 18
## 1778 1126 1525 659 298 121 16 629
## UNKNOWN
## 9003
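`table()` sorts the age bands alphabetically, which pushes "OVER 70" and "UNDER 18" to the end, and 9,003 of the 15,155 rows are "UNKNOWN". A minimal sketch of ordering the bands by age with an explicit factor; the values are a hypothetical sample, and treating "UNKNOWN" as missing is an analysis choice assumed here:

```r
# Hypothetical sample of SUSPECT_AGE values, using the category labels printed above
suspect_age <- c("UNKNOWN", "18-25", "UNDER 18", "OVER 70", "26-30",
                 "UNKNOWN", "31-40", "18-25", "41-50")

# Explicit levels put the bands in age order instead of alphabetical order;
# "UNKNOWN" becomes NA because it is not listed as a level
age_levels <- c("UNDER 18", "18-25", "26-30", "31-40",
                "41-50", "51-60", "61-70", "OVER 70")
age_band <- factor(suspect_age, levels = age_levels, ordered = TRUE)

# useNA = "ifany" keeps the unknowns visible as their own <NA> cell
age_counts <- table(age_band, useNA = "ifany")
print(age_counts)
```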
sort(table(df$ZIP), decreasing = TRUE)
##
## 45202 45205 45211 45238 45229 45219 45225 45214 45237 45223 45206 45220 45232
## 2049 1110 1094 956 913 863 811 774 699 653 616 477 477
## 45224 45209 45208 45204 45216 45227 45207 45203 45230 45213 45239 45226 45217
## 429 380 359 348 302 286 245 226 214 190 169 112 100
## 45221 45233 45212 45215 45231 45228 42502 45236 45244 45248 4523 5239
## 90 77 61 47 7 5 3 3 3 3 2 1
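Nearly every ZIP in the table starts with 452, but a few entries (42502, 4523, 5239) do not fit that pattern and may be data-entry errors worth checking. A minimal sketch for flagging them, assuming (from the table itself) that valid codes here are five digits beginning with 452; the input vector is hypothetical:

```r
# Hypothetical ZIP values: mostly 452xx, plus malformed entries like those seen above
zip <- c(45202, 45205, 42502, 4523, 5239, 45211, NA)

# Assumed rule: five digits starting with "452" (grepl() returns FALSE for NA)
looks_valid <- grepl("^452[0-9]{2}$", as.character(zip))

# Non-missing values that break the assumed pattern
suspect <- zip[!looks_valid & !is.na(zip)]
print(suspect)
```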
table(df$DAYOFWEEK) / nrow(df)
##
## FRIDAY MONDAY SATURDAY SUNDAY THURSDAY TUESDAY WEDNESDAY
## 0.1331574 0.1398218 0.1499175 0.1408116 0.1324975 0.1392940 0.1365886
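The division above uses the total number of incidents (15,155 rows) as the denominator, but 423 rows have a missing `DAYOFWEEK`, so the seven shares sum to about 0.972 rather than 1. A minimal sketch on hypothetical values contrasting that with `prop.table()`, which divides by the non-missing total:

```r
# Hypothetical DAYOFWEEK values with one missing entry
day <- c("FRIDAY", "MONDAY", "FRIDAY", NA, "SUNDAY")

# Share of ALL rows: the NA row stays in the denominator,
# so the shares add up to less than 1
share_of_all <- table(day) / length(day)

# prop.table() divides by the table's own total (non-missing rows only),
# so the shares add up to 1
share_of_known <- prop.table(table(day))

# useNA = "ifany" adds an explicit <NA> cell so nothing is silently dropped
share_with_na <- prop.table(table(day, useNA = "ifany"))

print(share_of_all)
print(share_of_known)
print(share_with_na)
```

Which denominator is right depends on the question: "share of all incidents" versus "share among incidents with a known weekday".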