Part 1: HTML Bio

Introduction about Me!

Picture

Hello! My name is Denise Huynh. I grew up in central Vietnam, came to Pennsylvania in the US to study abroad 5 years ago, and now live in Cincinnati, OH.

Academic Background

Professional Background

I currently work as a Student Worker in the Aerospace Engineering and Mechanical Engineering Department at the University of Cincinnati. My daily work includes:

Experience with R

Experience with other Analytic software

In addition to R, I have experience with tools such as:

Part 2: Importing Data

  1. Fill in the blanks below to import the blood_transfusion.csv file (provided via Canvas) and answer the following questions.

df <- readr::read_csv('blood_transfusion.csv')

Rows: 748 Columns: 5

── Column specification ───────────────────────────

Delimiter: ","

chr (1): Class

dbl (4): Recency, Frequency, Monetary, Time

ℹ Use spec() to retrieve the full column specification for this data.

ℹ Specify the column types or set show_col_types = FALSE to quiet this message.

dim(df)

[1] 748 5

sum(is.na(df))

[1] 0

head(df, 10)

A tibble: 10 × 5

   Recency Frequency Monetary  Time Class
     <dbl>     <dbl>    <dbl> <dbl> <chr>
 1       2        50    12500    98 donated
 2       0        13     3250    28 donated
 3       1        16     4000    35 donated
 4       2        20     5000    45 donated
 5       1        24     6000    77 not don…
 6       4         4     1000     4 not don…
 7       2         7     1750    14 donated
 8       1        12     3000    35 not don…
 9       2         9     2250    22 donated
10       5        46    11500    98 donated

tail(df, 10)

A tibble: 10 × 5

   Recency Frequency Monetary  Time Class
     <dbl>     <dbl>    <dbl> <dbl> <chr>
 1      23         1      250    23 not don…
 2      23         4     1000    52 not don…
 3      23         1      250    23 not don…
 4      23         7     1750    88 not don…
 5      16         3      750    86 not don…
 6      23         2      500    38 not don…
 7      21         2      500    52 not don…
 8      23         3      750    62 not don…
 9      39         1      250    39 not don…
10      72         1      250    72 not don…

df[100, 'Monetary']

A tibble: 1 × 1

  Monetary
     <dbl>
1     1750

mean(df[['Monetary']])

[1] 1378.676

above_avg <- df[['Monetary']] > mean(df[['Monetary']])

sum(above_avg)

[1] 267
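
The count of above-average observations comes from summing a logical vector. Here is a minimal sketch of that step on a small made-up data frame (`toy` and its values are hypothetical, not taken from blood_transfusion.csv):

```r
# Hypothetical toy data standing in for the Monetary column of df
toy <- data.frame(Monetary = c(250, 500, 1000, 1750, 6000))

# Logical vector: TRUE where a value exceeds the column mean (1900 here)
above_avg <- toy[["Monetary"]] > mean(toy[["Monetary"]])

# TRUE counts as 1 and FALSE as 0, so sum() returns the number of TRUE values
n_above <- sum(above_avg)
print(n_above)
```

Only the 6000 value is above the toy mean of 1900, so `n_above` is 1. The same pattern applied to the real `df` should give the count reported above.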

Question 2

Fill in the blanks below to import the PDI__Police_Data_Initiative__Crime_Incidents.csv data (provided via Canvas) and answer the questions that follow. Data is taken from the City of Cincinnati Open Data Portal website, which you may need to read to place context in your answers.

df <- readr::read_csv('PDI__Police_Data_Initiative__Crime_Incidents.csv')

• What are the dimensions of this data (number of rows and columns)?

dim(df)

[1] 15155 40

• What do you think these columns represent?

• Are there any missing values in this data? If so, how many missing values are in each column? Which column has the most missing values?

sum(is.na(df))

[1] 95592

colSums(is.na(df))

                INSTANCEID 
                         0 
               INCIDENT_NO 
                         0 
             DATE_REPORTED 
                         0 
                 DATE_FROM 
                         2 
                   DATE_TO 
                         9 
                      CLSD 
                       545 
                       UCR 
                        10 
                       DST 
                         0 
                      BEAT 
                        28 
                   OFFENSE 
                        10 
                  LOCATION 
                         2 
                THEFT_CODE 
                     10167 
                     FLOOR 
                     14127 
                      SIDE 
                     14120 
                   OPENING 
                     14508 
                 HATE_BIAS 
                         0 
                 DAYOFWEEK 
                       423 
                  RPT_AREA 
                       239 
          CPD_NEIGHBORHOOD 
                       249 
                   WEAPONS 
                         5 
         DATE_OF_CLEARANCE 
                      2613 
                 HOUR_FROM 
                         2 
                   HOUR_TO 
                         9 
                 ADDRESS_X 
                       148 
               LONGITUDE_X 
                      1714 
                LATITUDE_X 
                      1714 
                VICTIM_AGE 
                         0 
               VICTIM_RACE 
                      2192 
          VICTIM_ETHNICITY 
                      2192 
             VICTIM_GENDER 
                      2192 
               SUSPECT_AGE 
                         0 
              SUSPECT_RACE 
                      7082 
         SUSPECT_ETHNICITY 
                      7082 
            SUSPECT_GENDER 
                      7082 
        TOTALNUMBERVICTIMS 
                        33 
             TOTALSUSPECTS 
                      7082 
                 UCR_GROUP 
                        10 
                       ZIP 
                         1 

COMMUNITY_COUNCIL_NEIGHBORHOOD 
                             0 
              SNA_NEIGHBORHOOD 
                             0 
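
In the counts above, OPENING has the largest value (14508). Rather than scanning the output by eye, the column with the most missing values can be pulled out directly. This is a minimal sketch on a made-up data frame (`toy` and its columns are hypothetical); on the real data the same two lines would be applied to `df`:

```r
# Hypothetical toy data with a different number of NAs in each column
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, "x", NA),
  c = c(TRUE, FALSE, TRUE, NA)
)

# Named vector of NA counts, one entry per column
na_counts <- colSums(is.na(toy))

# which.max() returns the position of the largest count; names() gives its column
worst <- names(which.max(na_counts))
print(worst)

# Sorting the counts puts the columns with the most missing values first
print(sort(na_counts, decreasing = TRUE))
```

Note that `which.max()` returns only the first maximum, so ties would need a separate check.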

• Using the DATE_REPORTED column, what is the range of dates included in this data?

range(df[['DATE_REPORTED']])

[1] "01/01/2022 01:08:00 AM"

[2] "06/26/2022 12:50:00 AM"
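
One caveat: DATE_REPORTED was read as text (the values print in quotes), so range() compares the strings alphabetically rather than chronologically. A safer approach is to parse the strings into date-times first. The sketch below assumes the month/day/year 12-hour format shown in the output; the middle value in `date_strings` is hypothetical and added only to show where text order and time order disagree:

```r
# Two strings copied from the output above plus one hypothetical earlier date
date_strings <- c("01/01/2022 01:08:00 AM",
                  "12/31/2021 11:00:00 PM",  # hypothetical value
                  "06/26/2022 12:50:00 AM")

# Alphabetical range: "12/31/2021 ..." sorts last even though it is the earliest
text_range <- range(date_strings)
print(text_range)

# Parse to POSIXct; %I is the 12-hour clock and %p is AM/PM
# (%p matching depends on the locale; an English or C locale is assumed)
parsed <- as.POSIXct(date_strings,
                     format = "%m/%d/%Y %I:%M:%S %p",
                     tz = "UTC")

# Chronological range of the parsed date-times
date_range <- range(parsed)
print(date_range)
```

On the real data this would look like `range(as.POSIXct(df[['DATE_REPORTED']], format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC"))`. The time zone here is an assumption; the source does not state one.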

• Using table(), what is the most common age range for known SUSPECT_AGEs?

table(df[['SUSPECT_AGE']])

18-25

1778
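
Since the question asks about known ages, it can help to drop the placeholder category before counting. This is a minimal sketch on made-up values; the "UNKNOWN" label and every age band other than 18-25 are assumptions for illustration and should be checked against the actual table() output:

```r
# Hypothetical suspect age categories; "UNKNOWN" is an assumed placeholder label
ages <- c("18-25", "26-30", "UNKNOWN", "18-25",
          "UNKNOWN", "UNKNOWN", "31-40")

# Keep only the known age ranges
known <- ages[ages != "UNKNOWN"]

# Count each category and sort so the most frequent comes first
age_counts <- sort(table(known), decreasing = TRUE)
print(age_counts)

# The name of the first entry is the most common known age range
most_common <- names(age_counts)[1]
print(most_common)
```

In this toy vector "UNKNOWN" is the most frequent value overall, but among known ages the answer is "18-25".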