Hello! My name is Denise Huynh. I grew up in central Vietnam, came to Pennsylvania in the US to study abroad 5 years ago, and now live in Cincinnati, OH.
I currently work as a Student Worker in the Aerospace Engineering and Mechanical Engineering departments at the University of Cincinnati. My daily work includes:
In addition to R, I have experience with tools such as:
df <- readr::read_csv('blood_transfusion.csv')
Rows: 748 Columns: 5
── Column specification ───────────────────────────
Delimiter: ","
chr (1): Class
dbl (4): Recency, Frequency, Monetary, Time
ℹ Use spec() to retrieve the full column specification
for this data.
ℹ Specify the column types or set show_col_types = FALSE
to quiet this message.
dim(df)
[1] 748 5
sum(is.na(df))
[1] 0
head(df, 10)
Recency Frequency Monetary Time Class
<dbl> <dbl> <dbl> <dbl> <chr>
1 2 50 12500 98 donated
2 0 13 3250 28 donated
3 1 16 4000 35 donated
4 2 20 5000 45 donated
5 1 24 6000 77 not don…
6 4 4 1000 4 not don…
7 2 7 1750 14 donated
8 1 12 3000 35 not don…
9 2 9 2250 22 donated
10 5 46 11500 98 donated
tail(df, 10)
Recency Frequency Monetary Time Class
<dbl> <dbl> <dbl> <dbl> <chr>
1 23 1 250 23 not don…
2 23 4 1000 52 not don…
3 23 1 250 23 not don…
4 23 7 1750 88 not don…
5 16 3 750 86 not don…
6 23 2 500 38 not don…
7 21 2 500 52 not don…
8 23 3 750 62 not don…
9 39 1 250 39 not don…
10 72 1 250 72 not don…
df[100, 'Monetary']
Monetary
<dbl>
1 1750
mean(df[['Monetary']])
[1] 1378.676
above_avg <- df[['Monetary']] > mean(df[['Monetary']])
sum(above_avg)
[1] 267
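The count of above-average observations comes from summing a logical vector. Here is a minimal sketch of that idea on made-up numbers; `monetary` is a hypothetical stand-in for `df[['Monetary']]`, not the real column.

```r
# Hypothetical toy values standing in for df[['Monetary']]
monetary <- c(250, 500, 750, 1000, 12500)

# The comparison yields a logical vector, one TRUE/FALSE per observation
above_avg <- monetary > mean(monetary)

# TRUE counts as 1 and FALSE as 0, so sum() gives the number above the mean
n_above <- sum(above_avg)
print(n_above)

# The same logical vector can also pull out the matching values
print(monetary[above_avg])
```

Keeping the logical vector in its own variable makes it reusable for both counting and subsetting.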
Fill in the blanks below to import the PDI__Police_Data_Initiative__Crime_Incidents.csv data (provided via Canvas) and answer the questions that follow. The data is taken from the City of Cincinnati Open Data Portal website, which you may need to read to put your answers in context.
df <- readr::read_csv('PDI__Police_Data_Initiative__Crime_Incidents.csv')
• What are the dimensions of this data (number of rows and columns)?
dim(df)
[1] 15155 40
• What do you think these columns represent?
• Are there any missing values in this data? If so, how many missing values are in each column? Which column has the most missing values?
sum(is.na(df))
[1] 95592
colSums(is.na(df))
INSTANCEID
0
INCIDENT_NO
0
DATE_REPORTED
0
DATE_FROM
2
DATE_TO
9
CLSD
545
UCR
10
DST
0
BEAT
28
OFFENSE
10
LOCATION
2
THEFT_CODE
10167
FLOOR
14127
SIDE
14120
OPENING
14508
HATE_BIAS
0
DAYOFWEEK
423
RPT_AREA
239
CPD_NEIGHBORHOOD
249
WEAPONS
5
DATE_OF_CLEARANCE
2613
HOUR_FROM
2
HOUR_TO
9
ADDRESS_X
148
LONGITUDE_X
1714
LATITUDE_X
1714
VICTIM_AGE
0
VICTIM_RACE
2192
VICTIM_ETHNICITY
2192
VICTIM_GENDER
2192
SUSPECT_AGE
0
SUSPECT_RACE
7082
SUSPECT_ETHNICITY
7082
SUSPECT_GENDER
7082
TOTALNUMBERVICTIMS
33
TOTALSUSPECTS
7082
UCR_GROUP
10
ZIP
1
COMMUNITY_COUNCIL_NEIGHBORHOOD
0
SNA_NEIGHBORHOOD
0
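In the per-column counts above, the largest value is 14508, for OPENING. With 40 columns it can be easier to let R find the maximum than to scan the list by eye. The following is only a sketch on a hypothetical toy data frame (`toy` and its column names are made up for illustration); the same calls should apply to `df`.

```r
# Hypothetical toy data frame; the column names are illustrative only
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = c(TRUE, FALSE, TRUE, TRUE)
)

# Missing values per column, as a named numeric vector
na_counts <- colSums(is.na(toy))

# Sort so the columns with the most missing values come first
sorted_counts <- sort(na_counts, decreasing = TRUE)
print(sorted_counts)

# Name of the column with the single largest count
most_missing <- names(which.max(na_counts))
print(most_missing)
```

Note that `which.max()` returns only the first maximum, so ties would need the sorted vector instead.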
• Using the DATE_REPORTED column, what is the range of dates included in this data?
range(df[['DATE_REPORTED']])
[1] "01/01/2022 01:08:00 AM"
[2] "06/26/2022 12:50:00 AM"
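One caveat: the quoted output suggests DATE_REPORTED was read as text, and `range()` on text compares strings alphabetically rather than chronologically. A possible way to check the result is to parse the values into date-times first. The sketch below uses two hypothetical timestamps in the same layout; the `tz = "UTC"` setting is an assumption made only for reproducibility, and parsing `%p` (AM/PM) may depend on the session locale.

```r
# Hypothetical timestamps in the same MM/DD/YYYY hh:mm:ss AM/PM layout
stamps <- c("12/31/2021 11:00:00 PM", "01/05/2022 09:00:00 AM")

# On character data, range() compares the strings alphabetically
chr_range <- range(stamps)

# Parse to date-times first; %I is the 12-hour clock and %p the AM/PM marker
# (tz = "UTC" is an assumption, used only to keep the example reproducible)
parsed <- as.POSIXct(stamps, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
date_range <- range(parsed)

print(chr_range)   # alphabetical order: the 2022 string sorts first
print(date_range)  # chronological order: the 2021 timestamp comes first
```

When every timestamp falls within one calendar year, the two approaches can agree on the dates, but the parsed version is the safer one to report.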
• Using table(), what is the most common age range for known SUSPECT_AGEs?
table(df[['SUSPECT_AGE']])
18-25
1778
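Since the question asks about known ages only, one option is to drop the placeholder category before tabulating and then sort the counts. This is a sketch on made-up values: the vector `ages` is hypothetical, and the label "UNKNOWN" is an assumption about how unknown ages are coded, so it should be checked against the actual categories in the data.

```r
# Hypothetical age bands; "UNKNOWN" as the placeholder label is an assumption
ages <- c("18-25", "UNKNOWN", "26-30", "18-25", "UNKNOWN", "UNKNOWN", "31-40")

# Keep only the known bands before tabulating
known <- ages[ages != "UNKNOWN"]

# Tabulate and sort so the most frequent band comes first
counts <- sort(table(known), decreasing = TRUE)
print(counts)

# Name of the most common known band
most_common <- names(counts)[1]
print(most_common)
```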