knitr::include_graphics("IMG_5076.JPG")
Above is the Clifton-based trivia team, “Dirty Lube and the Boys” (13x Murphy’s Pub trivia champions). From left to right are my teammates Emmett, Austin (also known as Lube), Jack, and myself. I am wearing a neon orange Nas hoodie because Nas is the best rapper of all time.
I am from Cincinnati; I grew up in West Chester and attended Archbishop Moeller High School. At Moeller I was involved in golf, lacrosse, ski team, MACH1, and men’s chorus. After high school, I attended the University of Dayton for three years, where I studied Mechanical Engineering. I transferred to the University of Cincinnati in the spring of ’20. I am currently in my last semester of undergrad, majoring in Marketing with a minor in Business Analytics. Outside of school, I am heavily involved in my fraternity, Sigma Phi Epsilon. In my free time I enjoy listening to vinyl, which I have been collecting for a while. Some of my other hobbies include:
This past month I completed my summer internship with Interprise Holdings. Because it was a sales internship, my job didn’t really explore the analytical side of the company. I do enjoy sales, but I would much rather go into analytics. After I graduate this semester, I will be looking for a full-time position in marketing research or business analytics.
I have been using R in various classes over the past year and a half, in at least one class in each of the past three semesters. I enjoy R very much, and I like the fact that it is extremely versatile. One of my favorite features of R is its data visualization tools, such as ggplot. I have used R for many different projects in which I used different types of histograms to evaluate Excel files with millions of cells.
I have been using Excel and VBA since high school, and I also used them in some of my engineering classes at UD. Excel is one of my favorite programs, and I think there is a lot that can be done with it. Other tools I have used to analyze data include SAS, Python, and SQL.
blood_trans <- read.csv(file = 'blood_transfusion.csv')
nrow(blood_trans)
## [1] 748
ncol(blood_trans)
## [1] 5
class(blood_trans$Recency)
## [1] "integer"
class(blood_trans$Frequency)
## [1] "integer"
class(blood_trans$Monetary)
## [1] "integer"
class(blood_trans$Time)
## [1] "integer"
class(blood_trans$Class)
## [1] "character"
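The five separate class() calls above can also be collapsed into one call. This is a minimal sketch on a made-up stand-in data frame (`toy` and its values are assumptions, not the real blood_transfusion.csv):

```r
# Hypothetical stand-in for blood_trans; the values are illustrative only.
toy <- data.frame(
  Recency = c(2L, 0L),
  Monetary = c(12500L, 3250L),
  Class = c("donated", "not donated"),
  stringsAsFactors = FALSE
)

# sapply() applies class() to every column and returns a named character vector.
col_classes <- sapply(toy, class)
print(col_classes)
```

The same call on the full data frame would be sapply(blood_trans, class).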
sum(is.na(blood_trans))
## [1] 0
head(blood_trans,10)
## Recency Frequency Monetary Time Class
## 1 2 50 12500 98 donated
## 2 0 13 3250 28 donated
## 3 1 16 4000 35 donated
## 4 2 20 5000 45 donated
## 5 1 24 6000 77 not donated
## 6 4 4 1000 4 not donated
## 7 2 7 1750 14 donated
## 8 1 12 3000 35 not donated
## 9 2 9 2250 22 donated
## 10 5 46 11500 98 donated
tail(blood_trans,10)
## Recency Frequency Monetary Time Class
## 739 23 1 250 23 not donated
## 740 23 4 1000 52 not donated
## 741 23 1 250 23 not donated
## 742 23 7 1750 88 not donated
## 743 16 3 750 86 not donated
## 744 23 2 500 38 not donated
## 745 21 2 500 52 not donated
## 746 23 3 750 62 not donated
## 747 39 1 250 39 not donated
## 748 72 1 250 72 not donated
blood_trans$Monetary[100]
## [1] 1750
mean(blood_trans$Monetary)
## [1] 1378.676
-Subset this data frame for all observations where Monetary is greater than the mean value. How many rows are in the resulting data frame?
sum(blood_trans$Monetary > mean(blood_trans$Monetary))
## [1] 267
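The count above comes from summing a logical vector. Since the question asks for a subset, here is a sketch of building the subset explicitly and counting its rows, using a small made-up data frame (`toy` is an assumption, not the lab data):

```r
# Hypothetical stand-in for blood_trans; the values are illustrative only.
toy <- data.frame(
  Monetary = c(250, 500, 1000, 5000, 12500),
  Class = c("not donated", "not donated", "donated", "donated", "donated"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose Monetary value is above the column mean.
above_mean <- toy[toy$Monetary > mean(toy$Monetary), ]

# nrow() on the subset should agree with sum() on the logical condition.
n_above <- nrow(above_mean)
n_check <- sum(toy$Monetary > mean(toy$Monetary))
print(c(n_above, n_check))
```

Both approaches give the same count; the subset version also keeps the rows around for further analysis.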
UD_Table <- 'https://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt'
UDtemp <- read.table(UD_Table, col.names = c("Month", "Day", "Year", "Temp"))
nrow(UDtemp)
## [1] 9265
ncol(UDtemp)
## [1] 4
sum(is.na(UDtemp))
## [1] 0
UDtemp[365,]
## Month Day Year Temp
## 365 12 31 1995 39.3
-Subset for all observations that happened during January of 2000. What was the median average temp for this month?
JanTemp <- UDtemp[ which( UDtemp$Month == 1 & UDtemp$Year == 2000),]
median(JanTemp$Temp)
## [1] 27.1
-Which date was the highest average temp recorded (hint: which.max)?
UDtemp[ which.max(UDtemp$Temp),]
## Month Day Year Temp
## 6398 7 7 2012 89.2
-Which date was the coldest average temp recorded? Does this temp make sense? Is there more than one date with this temperature value recorded? If so, how many?
UDtemp[ which.min(UDtemp$Temp),]
## Month Day Year Temp
## 1454 12 24 1998 -99
sum(UDtemp$Temp == -99)
## [1] 14
## A temp of -99 doesn't make sense for Cincinnati; it looks like a placeholder for missing readings.
-Compute the mean of the average temp column. Now re-code all -99s to NA and recompute the mean.
mean(UDtemp$Temp)
## [1] 54.39876
New_Mean <- ifelse(UDtemp$Temp <= -99, NA, UDtemp$Temp)
mean(New_Mean, na.rm = TRUE)
## [1] 54.6309
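As an alternative to ifelse(), the -99 values can be recoded with replace(). This sketch uses a made-up temperature vector (`temp` is an assumption, not the UD weather file) to show how the sentinel pulls the mean down:

```r
# Hypothetical temperatures where -99 stands in for a missing reading.
temp <- c(30.5, -99, 41.2, 55.0, -99, 62.3)

# Swap the sentinel for NA; all other readings are left untouched.
temp_clean <- replace(temp, temp == -99, NA)

# The raw mean is dragged down by the -99s; na.rm = TRUE skips the recoded values.
mean_raw <- mean(temp)
mean_clean <- mean(temp_clean, na.rm = TRUE)
print(c(mean_raw, mean_clean))
```

On the real data the same idea would be UDtemp$Temp[UDtemp$Temp == -99] <- NA, which recodes the column in place.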
Police_data <- read.csv('PDI__Police_Data_Initiative__Crime_Incidents.csv', na.strings = "")
head(Police_data)
## INSTANCEID INCIDENT_NO DATE_REPORTED
## 1 4B312B08-FE95-4DD4-8A62-20D1A1138E82 229000003 01/01/2022 12:09:00 AM
## 2 4B312B08-FE95-4DD4-8A62-20D1A1138E82 229000003 01/01/2022 12:09:00 AM
## 3 4B312B08-FE95-4DD4-8A62-20D1A1138E82 229000003 01/01/2022 12:09:00 AM
## 4 4B312B08-FE95-4DD4-8A62-20D1A1138E82 229000003 01/01/2022 12:09:00 AM
## 5 4B312B08-FE95-4DD4-8A62-20D1A1138E82 229000003 01/01/2022 12:09:00 AM
## 6 4B312B08-FE95-4DD4-8A62-20D1A1138E82 229000003 01/01/2022 12:09:00 AM
## DATE_FROM DATE_TO CLSD
## 1 12/31/2021 11:50:00 PM 01/01/2022 12:08:00 AM F--CLEARED BY ARREST - ADULT
## 2 12/31/2021 11:50:00 PM 01/01/2022 12:08:00 AM F--CLEARED BY ARREST - ADULT
## 3 12/31/2021 11:50:00 PM 01/01/2022 12:08:00 AM F--CLEARED BY ARREST - ADULT
## 4 12/31/2021 11:50:00 PM 01/01/2022 12:08:00 AM F--CLEARED BY ARREST - ADULT
## 5 12/31/2021 11:50:00 PM 01/01/2022 12:08:00 AM F--CLEARED BY ARREST - ADULT
## 6 12/31/2021 11:50:00 PM 01/01/2022 12:08:00 AM F--CLEARED BY ARREST - ADULT
## UCR DST BEAT OFFENSE LOCATION THEFT_CODE FLOOR SIDE
## 1 803 2 2 MENACING 26-BAR <NA> <NA> <NA>
## 2 803 2 2 MENACING 26-BAR <NA> <NA> <NA>
## 3 803 2 2 MENACING 26-BAR <NA> <NA> <NA>
## 4 1493 2 2 CRIMINAL DAMAGING/ENDANGERING 26-BAR <NA> <NA> <NA>
## 5 1493 2 2 CRIMINAL DAMAGING/ENDANGERING 26-BAR <NA> <NA> <NA>
## 6 1493 2 2 CRIMINAL DAMAGING/ENDANGERING 26-BAR <NA> <NA> <NA>
## OPENING HATE_BIAS DAYOFWEEK RPT_AREA CPD_NEIGHBORHOOD
## 1 <NA> N--NO BIAS/NOT APPLICABLE FRIDAY 124 OAKLEY
## 2 <NA> N--NO BIAS/NOT APPLICABLE FRIDAY 124 OAKLEY
## 3 <NA> N--NO BIAS/NOT APPLICABLE FRIDAY 124 OAKLEY
## 4 <NA> N--NO BIAS/NOT APPLICABLE FRIDAY 124 OAKLEY
## 5 <NA> N--NO BIAS/NOT APPLICABLE FRIDAY 124 OAKLEY
## 6 <NA> N--NO BIAS/NOT APPLICABLE FRIDAY 124 OAKLEY
## WEAPONS DATE_OF_CLEARANCE HOUR_FROM HOUR_TO ADDRESS_X
## 1 99 - NONE 01/01/2022 12:00:00 AM 2350 8 30XX MADISON RD
## 2 99 - NONE 01/01/2022 12:00:00 AM 2350 8 30XX MADISON RD
## 3 99 - NONE 01/01/2022 12:00:00 AM 2350 8 30XX MADISON RD
## 4 80 - OTHER WEAPON 01/01/2022 12:00:00 AM 2350 8 30XX MADISON RD
## 5 80 - OTHER WEAPON 01/01/2022 12:00:00 AM 2350 8 30XX MADISON RD
## 6 80 - OTHER WEAPON 01/01/2022 12:00:00 AM 2350 8 30XX MADISON RD
## LONGITUDE_X LATITUDE_X VICTIM_AGE VICTIM_RACE VICTIM_ETHNICITY
## 1 -84.43017 39.15166 18-25 WHITE NOT OF HISPANIC ORIG
## 2 -84.43140 39.15350 UNKNOWN <NA> <NA>
## 3 -84.43091 39.15360 31-40 WHITE NOT OF HISPANIC ORIG
## 4 -84.42995 39.15224 18-25 WHITE NOT OF HISPANIC ORIG
## 5 -84.43008 39.15209 UNKNOWN <NA> <NA>
## 6 -84.42980 39.15228 31-40 WHITE NOT OF HISPANIC ORIG
## VICTIM_GENDER SUSPECT_AGE SUSPECT_RACE SUSPECT_ETHNICITY SUSPECT_GENDER
## 1 MALE UNKNOWN <NA> <NA> <NA>
## 2 <NA> UNKNOWN <NA> <NA> <NA>
## 3 FEMALE UNKNOWN <NA> <NA> <NA>
## 4 MALE UNKNOWN <NA> <NA> <NA>
## 5 <NA> UNKNOWN <NA> <NA> <NA>
## 6 FEMALE UNKNOWN <NA> <NA> <NA>
## TOTALNUMBERVICTIMS TOTALSUSPECTS UCR_GROUP ZIP
## 1 3 NA PART 2 MINOR 45209
## 2 3 NA PART 2 MINOR 45209
## 3 3 NA PART 2 MINOR 45209
## 4 3 NA PART 2 MINOR 45209
## 5 3 NA PART 2 MINOR 45209
## 6 3 NA PART 2 MINOR 45209
## COMMUNITY_COUNCIL_NEIGHBORHOOD SNA_NEIGHBORHOOD
## 1 OAKLEY OAKLEY
## 2 OAKLEY OAKLEY
## 3 OAKLEY OAKLEY
## 4 OAKLEY OAKLEY
## 5 OAKLEY OAKLEY
## 6 OAKLEY OAKLEY
##These columns represent many different things like dates, times, ethnicity, location, weapons, age, etc.
-Are there any missing values in this data? If so, how many missing values are in each column?
sum(is.na(Police_data))
## [1] 95592
colSums(is.na(Police_data))
## INSTANCEID INCIDENT_NO
## 0 0
## DATE_REPORTED DATE_FROM
## 0 2
## DATE_TO CLSD
## 9 545
## UCR DST
## 10 0
## BEAT OFFENSE
## 28 10
## LOCATION THEFT_CODE
## 2 10167
## FLOOR SIDE
## 14127 14120
## OPENING HATE_BIAS
## 14508 0
## DAYOFWEEK RPT_AREA
## 423 239
## CPD_NEIGHBORHOOD WEAPONS
## 249 5
## DATE_OF_CLEARANCE HOUR_FROM
## 2613 2
## HOUR_TO ADDRESS_X
## 9 148
## LONGITUDE_X LATITUDE_X
## 1714 1714
## VICTIM_AGE VICTIM_RACE
## 0 2192
## VICTIM_ETHNICITY VICTIM_GENDER
## 2192 2192
## SUSPECT_AGE SUSPECT_RACE
## 0 7082
## SUSPECT_ETHNICITY SUSPECT_GENDER
## 7082 7082
## TOTALNUMBERVICTIMS TOTALSUSPECTS
## 33 7082
## UCR_GROUP ZIP
## 10 1
## COMMUNITY_COUNCIL_NEIGHBORHOOD SNA_NEIGHBORHOOD
## 0 0
colnames(Police_data[colSums(is.na(Police_data)) == max(sapply(Police_data, function(x) sum(is.na(x))))])
## [1] "OPENING"
max(sapply(Police_data, function(x) sum(is.na(x))))
## [1] 14508
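A shorter way to get the same answer is to compute the per-column NA counts once and reuse them. This is a sketch on a made-up data frame (`toy` is an assumption); note that which.max() returns only the first column if there is a tie:

```r
# Hypothetical data frame with a different number of NAs in each column.
toy <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, "x"),
  c = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Count the missing values per column once.
na_counts <- colSums(is.na(toy))

# Name of the column with the most NAs, and how many it has.
worst_col <- names(which.max(na_counts))
worst_n <- max(na_counts)
print(worst_col)
print(worst_n)
```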
range(Police_data$DATE_REPORTED)
## [1] "01/01/2022 01:08:00 AM" "06/26/2022 12:50:00 AM"
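One caveat with the range above: DATE_REPORTED is read in as text, so range() compares the strings character by character instead of comparing times. For example, the first rows show a report at 12:09:00 AM on 01/01/2022, which is earlier than 01:08:00 AM but sorts after it as text. A sketch of parsing the timestamps first, using a few made-up strings in the same format (the time zone and an English locale for AM/PM are assumptions):

```r
# Sample timestamps in the same MM/DD/YYYY hh:mm:ss AM/PM layout as the data.
stamps <- c(
  "01/01/2022 12:09:00 AM",
  "01/01/2022 01:08:00 AM",
  "06/25/2022 11:30:00 PM",
  "06/26/2022 12:50:00 AM"
)

# Parse into date-times; %I is the 12-hour clock and %p is the AM/PM marker.
parsed <- as.POSIXct(stamps, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Text range versus true chronological range.
text_range <- range(stamps)
time_range <- range(parsed)
print(text_range)
print(format(time_range, "%Y-%m-%d %H:%M", tz = "UTC"))
```

With the parsed values, the earliest timestamp is the 12:09 AM report rather than the 01:08 AM one.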
table(Police_data$SUSPECT_AGE)
##
## 18-25 26-30 31-40 41-50 51-60 61-70 OVER 70 UNDER 18
## 1778 1126 1525 659 298 121 16 629
## UNKNOWN
## 9003
table(Police_data$ZIP)
##
## 4523 5239 42502 45202 45203 45204 45205 45206 45207 45208 45209 45211 45212
## 2 1 3 2049 226 348 1110 616 245 359 380 1094 61
## 45213 45214 45215 45216 45217 45219 45220 45221 45223 45224 45225 45226 45227
## 190 774 47 302 100 863 477 90 653 429 811 112 286
## 45228 45229 45230 45231 45232 45233 45236 45237 45238 45239 45244 45248
## 5 913 214 7 477 77 3 699 956 169 3 3
table(Police_data$DAYOFWEEK) / length(Police_data$DAYOFWEEK)
##
## FRIDAY MONDAY SATURDAY SUNDAY THURSDAY TUESDAY WEDNESDAY
## 0.1331574 0.1398218 0.1499175 0.1408116 0.1324975 0.1392940 0.1365886
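Dividing by length() keeps the rows with a missing DAYOFWEEK in the denominator, so the proportions above add up to slightly less than 1. If the goal is the share among known days only, prop.table() is one option. A sketch on a made-up vector (`days` is an assumption):

```r
# Hypothetical day-of-week values, including two missing entries.
days <- c("FRIDAY", "MONDAY", "FRIDAY", NA, "SUNDAY", "MONDAY", "FRIDAY", NA)

# table() drops NAs by default, but length() still counts them.
share_all <- table(days) / length(days)

# prop.table() divides by the total of the table itself, so NAs are excluded.
share_known <- prop.table(table(days))

print(sum(share_all))
print(sum(share_known))
```

Which denominator is right depends on the question being asked, so it helps to state the choice next to the result.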
I think one good thing to look at would be whether a weapon was involved; you could then calculate things like where in the city the most gun incidents occur. The date column could also be used to see in which season incidents tend to occur. Another interesting column to look at would be age, to see the average age associated with certain crimes that are being committed.
knitr::purl(input = "LAB2DrewAsher.Rmd", output = "Module_2_lab_Asher_Andrew.R",documentation = 0)
##
##
## processing file: LAB2DrewAsher.Rmd
##
## output file: Module_2_lab_Asher_Andrew.R
## [1] "Module_2_lab_Asher_Andrew.R"