Part 1: HTML Bio

Introduction about Me!

Hello! My name is Denise Huynh. I grew up in central Vietnam, came to Pennsylvania in the US to study abroad 5 years ago, and now live in Cincinnati, OH.

Academic Background

Currently pursuing a Bachelor’s Degree in Business Analytics and Information Systems (University of Cincinnati, Cincinnati, OH)

Professional Background

I currently work as a Student Worker in the Aerospace Engineering and Mechanical Engineering departments at the University of Cincinnati. My daily work includes:

Creating reports and spreadsheets and performing data entry tasks with a high degree of accuracy and attention to detail.
Providing front-line support by handling incoming calls, emails, and correspondence in a timely manner.
Preparing comprehensive reimbursement reports detailing student expenses for processing and disbursement of funds.
Maintaining and updating monthly financial records for departmental professors, tracking and reporting on individual expense accounts for business administration.

Experience with R

I am a newbie to R and I am really excited about this new experience!

Experience with other Analytic software

In addition to R, I have experience with tools such as:

Python
Tableau
C#, C++

Part 2: Importing Data

Fill in the blanks below to import the blood_transfusion.csv file (provided via Canvas) and answer the following questions.

df <- readr::read_csv('blood_transfusion.csv')

Rows: 748 Columns: 5

── Column specification ───────────────────────────

Delimiter: ","

chr (1): Class

dbl (4): Recency, Frequency, Monetary, Time

ℹ Use spec() to retrieve the full column specification for this data.

ℹ Specify the column types or set show_col_types = FALSE to quiet this message.

What are the dimensions of this data (number of rows and columns)?

dim(df)

[1] 748 5

What are the data types of each column?

df <- readr::read_csv('blood_transfusion.csv')

Rows: 748 Columns: 5

── Column specification ──────────────

Delimiter: ","

chr (1): Class

dbl (4): Recency, Frequency, Monetary, Time
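The column specification message above already lists the types. Another way to check them without re-importing the file is to ask for the class of each column. This is a minimal sketch on a small made-up data frame (`toy` and its values are illustrative only, not the real data):

```r
# Toy data frame standing in for df (values are illustrative only)
toy <- data.frame(
  Recency = c(2, 0, 1),
  Class = c("donated", "donated", "not donated"),
  stringsAsFactors = FALSE
)

# class() of each column, returned as a named character vector
col_types <- sapply(toy, class)
print(col_types)
```

The same call on the imported data would be sapply(df, class).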

Are there any missing values?

sum(is.na(df))

[1] 0

Check out the first 10 rows? What are the Class values for the first 10 observations?

head(df, 10)

# A tibble: 10 × 5
   Recency Frequency Monetary  Time Class
     <dbl>     <dbl>    <dbl> <dbl> <chr>
 1       2        50    12500    98 donated
 2       0        13     3250    28 donated
 3       1        16     4000    35 donated
 4       2        20     5000    45 donated
 5       1        24     6000    77 not don…
 6       4         4     1000     4 not don…
 7       2         7     1750    14 donated
 8       1        12     3000    35 not don…
 9       2         9     2250    22 donated
10       5        46    11500    98 donated

Check out the last 10 rows? What are the Class values for the last 10 observations?

tail(df, 10)

# A tibble: 10 × 5
   Recency Frequency Monetary  Time Class
     <dbl>     <dbl>    <dbl> <dbl> <chr>
 1      23         1      250    23 not don…
 2      23         4     1000    52 not don…
 3      23         1      250    23 not don…
 4      23         7     1750    88 not don…
 5      16         3      750    86 not don…
 6      23         2      500    38 not don…
 7      21         2      500    52 not don…
 8      23         3      750    62 not don…
 9      39         1      250    39 not don…
10      72         1      250    72 not don…

Index for the 100th row and just the Monetary column. What is the value?

df[100, 'Monetary']

# A tibble: 1 × 1
  Monetary
     <dbl>
1     1750

Index for just the Monetary column. What is the mean of this vector?

mean(df[['Monetary']])

[1] 1378.676

Subset this data frame for all observations where Monetary is greater than the mean value. How many rows are in the resulting data frame?

above_avg <- df[['Monetary']] > mean(df[['Monetary']])

nrow(df[above_avg, ])

[1] 267

Question 2

Fill in the blanks below to import the PDI__Police_Data_Initiative__Crime_Incidents.csv data (provided via Canvas) and answer the questions that follow. The data is taken from the City of Cincinnati Open Data Portal website, which you may need to read to add context to your answers.

df <- readr::read_csv('PDI__Police_Data_Initiative__Crime_Incidents.csv')

• What are the dimensions of this data (number of rows and columns)?

dim(df)

[1] 15155 40

• What do you think these columns represent?

• Are there any missing values in this data? If so, how many missing values are in each column? Which column has the most missing values?

sum(is.na(df))

[1] 95592

colSums(is.na(df))

                INSTANCEID 
                         0 
               INCIDENT_NO 
                         0 
             DATE_REPORTED 
                         0 
                 DATE_FROM 
                         2 
                   DATE_TO 
                         9 
                      CLSD 
                       545 
                       UCR 
                        10 
                       DST 
                         0 
                      BEAT 
                        28 
                   OFFENSE 
                        10 
                  LOCATION 
                         2 
                THEFT_CODE 
                     10167 
                     FLOOR 
                     14127 
                      SIDE 
                     14120 
                   OPENING 
                     14508 
                 HATE_BIAS 
                         0 
                 DAYOFWEEK 
                       423 
                  RPT_AREA 
                       239 
          CPD_NEIGHBORHOOD 
                       249 
                   WEAPONS 
                         5 
         DATE_OF_CLEARANCE 
                      2613 
                 HOUR_FROM 
                         2 
                   HOUR_TO 
                         9 
                 ADDRESS_X 
                       148 
               LONGITUDE_X 
                      1714 
                LATITUDE_X 
                      1714 
                VICTIM_AGE 
                         0 
               VICTIM_RACE 
                      2192 
          VICTIM_ETHNICITY 
                      2192 
             VICTIM_GENDER 
                      2192 
               SUSPECT_AGE 
                         0 
              SUSPECT_RACE 
                      7082 
         SUSPECT_ETHNICITY 
                      7082 
            SUSPECT_GENDER 
                      7082 
        TOTALNUMBERVICTIMS 
                        33 
             TOTALSUSPECTS 
                      7082 
                 UCR_GROUP 
                        10 
                       ZIP 
                         1 
COMMUNITY_COUNCIL_NEIGHBORHOOD 
                         0 
              SNA_NEIGHBORHOOD 
                         0
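To pick out the column with the most missing values without reading down the list, the per-column counts can be passed to which.max(). This is a minimal sketch on a made-up data frame (`toy` and its values are illustrative only, not the PDI data):

```r
# Toy data frame with missing values (illustrative only, not the PDI data)
toy <- data.frame(
  FLOOR = c(NA, NA, "1"),
  ZIP = c("45202", NA, "45219"),
  DST = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))

# Name of the column with the largest count
most_missing <- names(which.max(na_counts))
print(most_missing)
```

In the counts listed above, the largest value is 14508, for the OPENING column.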

• Using the DATE_REPORTED column, what is the range of dates included in this data?

range(df[['DATE_REPORTED']])

[1] "01/01/2022 01:08:00 AM"

[2] "06/26/2022 12:50:00 AM"
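Because the values print in quotes, DATE_REPORTED appears to have been read as text, so range() compares the strings character by character rather than by date and time. This is a minimal sketch of parsing the text first, assuming the MM/DD/YYYY hh:mm:ss AM/PM format shown above. The `stamps` vector is made up for illustration, and tz = "UTC" is an arbitrary choice:

```r
# Made-up timestamps in the same text format as DATE_REPORTED (illustrative only)
stamps <- c(
  "01/01/2022 01:08:00 AM",
  "06/26/2022 12:50:00 AM",
  "06/26/2022 01:00:00 AM"
)

# Parse the text as date-times; %I is the 12-hour clock and %p is AM/PM
# (%p matching depends on the session locale; an English locale is assumed)
parsed <- as.POSIXct(stamps, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Chronological range of the parsed values
date_range <- range(parsed)
print(date_range)
```

On the real data the same parsing would be applied to df[['DATE_REPORTED']]. The dates should agree with the text-based range, but the time of day at either end may differ.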

• Using table(), what is the most common age range for known SUSPECT_AGEs?

table(df[['SUSPECT_AGE']])

18-25
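The most common known age range can also be pulled out of the table() result in code. This is a minimal sketch on a made-up vector. It assumes unknown ages are coded with the label "UNKNOWN", which is an assumption about the dataset and should be checked against the actual table output:

```r
# Made-up age ranges (illustrative only); "UNKNOWN" is an assumed label
ages <- c("18-25", "UNKNOWN", "26-30", "18-25", "UNKNOWN", "UNKNOWN", "31-40")

# Count each age range
age_counts <- table(ages)

# Drop the assumed "UNKNOWN" category, then take the most frequent range
known_counts <- age_counts[names(age_counts) != "UNKNOWN"]
most_common <- names(which.max(known_counts))
print(most_common)
```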

Module 2 Lab Quiz

Denise Huynh

2024-09-04
