Introduction

Dataset Description

This dataset is a curated collection of over 200 publicly available COVID-19-related datasets from sources such as Johns Hopkins, the WHO, the World Bank, the New York Times, and many others. It includes a wide variety of statistics and indicators, such as local and national infection rates, global social distancing policies, geospatial data on the movement of people, and more.

This dataset was part of a Kaggle challenge, but the challenge's key questions and tasks were not available at the time of this analysis, so new questions were formulated to analyze the data.

The dataset was downloaded from Kaggle.

Loading packages

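library(tidyverse)   # attaches ggplot2, dplyr, tidyr, stringr, readr, and more
library(readxl)      # read_excel() for the .xlsx workbook
library(skimr)       # skim() data summaries
library(gridExtra)   # arrange several ggplots in one figure
library(scales)      # axis and label formatting helpers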

Reading Data

# Path to the source workbook; adjust to your local copy
xlsx_path <- "~/Asha/Projects_R/Canada_Hosp1_COVID_InpatientData.xlsx"

Data_at_admission <- read_excel(xlsx_path, sheet = "Data-at-admission")

Days_breakdown <- read_excel(xlsx_path, sheet = "Days-breakdown")

Hospital_length_of_stay <- read_excel(xlsx_path, sheet = "Hospital-length-of-stay")

Medication_Static_List <- read_excel(xlsx_path, sheet = "Medication-Static-List")

Data Cleaning

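The comorbidities column was cleaned to remove quotation marks and square brackets (“ ” [ ]) and then reshaped into long format (one row per patient and comorbidity) for the analysis. The first table below shows the long format; the second shows the wide format, with one column per comorbidity slot (como1 to como6).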

## # A tibble: 6 × 3
##      id type  comorbidities
##   <dbl> <chr> <chr>        
## 1     1 como1 Hypertension 
## 2     1 como2 Diabetes     
## 3     1 como3 Other        
## 4     2 como1 Hypertension 
## 5     2 como2 Other        
## 6     3 como1 Hypertension
## # A tibble: 6 × 8
##      id como1                              como2 como3 como4 como5 como6 Comor…¹
##   <dbl> <chr>                              <chr> <chr> <chr> <chr> <chr> <chr>  
## 1     1 Hypertension                       Diab… Other <NA>  <NA>  <NA>  "Hyper…
## 2     2 Hypertension                       Other <NA>  <NA>  <NA>  <NA>  "Hyper…
## 3     3 Hypertension                       <NA>  <NA>  <NA>  <NA>  <NA>  "Hyper…
## 4     4 Hypertension                       Other <NA>  <NA>  <NA>  <NA>  "Hyper…
## 5     5 Chronic cardiac disease  not hype… Hype… Diab… Other <NA>  <NA>  "Chron…
## 6     6 Hypertension                       <NA>  <NA>  <NA>  <NA>  <NA>  "Hyper…
## # … with abbreviated variable name ¹​Comorbidities
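The cleaning code itself is not shown above. A minimal sketch of the same steps follows, assuming the raw `Comorbidities` column stores values like `["Hypertension", "Diabetes", "Other"]`; that raw format is an assumption, and the small `raw` tibble is hypothetical.

```r
library(tibble)
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical raw input; the real column format is an assumption
raw <- tibble(
  id = c(1, 2, 3),
  Comorbidities = c('["Hypertension", "Diabetes", "Other"]',
                    '["Hypertension", "Other"]',
                    '["Hypertension"]')
)

como_long <- raw %>%
  # Remove straight and curly double quotes plus square brackets
  mutate(Comorbidities = str_remove_all(Comorbidities, '["\u201c\u201d\\[\\]]')) %>%
  # Split the comma-separated list into one row per comorbidity
  separate_rows(Comorbidities, sep = ",\\s*") %>%
  mutate(comorbidities = str_trim(Comorbidities)) %>%
  # Number each patient's comorbidities as como1, como2, ...
  group_by(id) %>%
  mutate(type = paste0("como", row_number())) %>%
  ungroup() %>%
  select(id, type, comorbidities)

# Wide layout: one column per comorbidity slot, NA where a patient has fewer
como_wide <- como_long %>%
  pivot_wider(names_from = type, values_from = comorbidities)
```

Splitting on commas would break any single comorbidity label that itself contains a comma, so the real data may need a different separator rule.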

VISUALIZATIONS OF THE DATA

1. Age distribution graph.

Observation: Most patients are in their 60s and 70s.
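The graph itself is not reproduced here. One way to draw an age distribution like it is a ggplot2 histogram with decade-aligned bins. This is a sketch only: the `patients` data frame is hypothetical and stands in for `Data_at_admission`, and the 10-year bin width is an assumption.

```r
library(ggplot2)

# Hypothetical ages standing in for Data_at_admission$age
patients <- data.frame(age = c(19, 45, 58, 62, 64, 67, 71, 73, 78, 85, 92))

# 10-year bins aligned to decade boundaries (boundary = 0), closed on the left
age_plot <- ggplot(patients, aes(x = age)) +
  geom_histogram(binwidth = 10, boundary = 0, closed = "left",
                 fill = "steelblue", colour = "white") +
  labs(title = "Age distribution of admitted patients",
       x = "Age (years)", y = "Number of patients")

print(age_plot)
```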

2. How many male and female patients were admitted for each reason for admission?

## # A tibble: 38 × 3
## # Groups:   reason_for_admission, sex [38]
##    reason_for_admission                           sex    count
##    <chr>                                          <chr>  <int>
##  1 COVID-19 [U07.1]                               Male     124
##  2 COVID-19 [U07.1]                               Female    93
##  3 Pneumonia [J18.9]                              Male      83
##  4 Pneumonia [J18.9]                              Female    51
##  5 Pneumonia due to COVID-19 virus [U07.1, J12.8] Male      24
##  6 Pneumonia due to COVID-19 virus [U07.1, J12.8] Female    16
##  7 Shortness of breath [R06.0]                    Female    14
##  8 Fever [R50.9]                                  Male      13
##  9 Hypoxia [R09.0]                                Male      12
## 10 Shortness of breath [R06.0]                    Male      11
## # … with 28 more rows

Observation:

For most admission reasons, more male than female patients were admitted.

COVID-19-related admissions make up the majority of this dataset of 508 patients.
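A count like the reason-by-sex table above can be sketched with `dplyr::count()`. The small `admissions` tibble is hypothetical, and this version returns an ungrouped table rather than the grouped one printed above; the counts are the same.

```r
library(dplyr)

# Hypothetical rows standing in for Data_at_admission
admissions <- tibble(
  reason_for_admission = c("COVID-19 [U07.1]", "COVID-19 [U07.1]",
                           "COVID-19 [U07.1]", "Fever [R50.9]"),
  sex = c("Male", "Female", "Male", "Male")
)

# One row per reason/sex pair, largest groups first
reason_by_sex <- admissions %>%
  count(reason_for_admission, sex, name = "count") %>%
  arrange(desc(count))
```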

3. How many patients died in hospital?

## # A tibble: 1 × 1
##   No_of_patients_died
##                 <int>
## 1                  90
## # A tibble: 2 × 2
## # Groups:   sex [2]
##   sex    No_of_patients
##   <chr>           <int>
## 1 Female            212
## 2 Male              296
## # A tibble: 2 × 2
## # Groups:   sex [2]
##   sex    No_of_patients_died
##   <chr>                <int>
## 1 Female                  31
## 2 Male                    59
## # A tibble: 2 × 2
## # Groups:   intubated [2]
##   intubated No_of_patients_died
##   <chr>                   <int>
## 1 No                         83
## 2 Yes                         7
## # A tibble: 3 × 3
## # Groups:   admission_disposition, intubated [3]
##   admission_disposition intubated No_of_patients_died
##   <chr>                 <chr>                   <int>
## 1 ICU                   No                          7
## 2 ICU                   Yes                         7
## 3 WARD                  No                         76

Observation:

1. There were more male patients than female patients in the dataset (296 vs. 212).

2. 90 of the 508 patients died in hospital (about 17.7%).

3. The male death rate was higher than the female rate (59/296, about 19.9%, vs. 31/212, about 14.6%).
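Comparing death rates requires dividing deaths by the number of patients in each group, not just comparing the raw death counts. The short sketch below does that arithmetic using the counts from the tables above.

```r
library(dplyr)

# Counts taken from the sex and deaths-by-sex tables above
deaths_by_sex <- tibble(
  sex = c("Female", "Male"),
  No_of_patients = c(212, 296),
  No_of_patients_died = c(31, 59)
)

# Death rate = deaths / admitted patients in each group
deaths_by_sex <- deaths_by_sex %>%
  mutate(death_rate = No_of_patients_died / No_of_patients)

# Overall in-hospital death rate across all 508 patients
overall_rate <- sum(deaths_by_sex$No_of_patients_died) / sum(deaths_by_sex$No_of_patients)
```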

ANALYSIS

PART 1

1. Minimum, average, and maximum age, height, and weight of admitted patients (height is missing for 236 patients and weight for 150)

##       age             height          weight      
##  Min.   : 19.00   Min.   :125.0   Min.   : 27.70  
##  1st Qu.: 55.75   1st Qu.:159.0   1st Qu.: 65.80  
##  Median : 66.00   Median :167.6   Median : 76.70  
##  Mean   : 66.03   Mean   :166.9   Mean   : 80.27  
##  3rd Qu.: 78.00   3rd Qu.:175.0   3rd Qu.: 89.80  
##  Max.   :100.00   Max.   :198.0   Max.   :199.60  
##                   NA's   :236     NA's   :150
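The table above has the layout printed by base R's `summary()` on a data frame. The sketch below shows that call, plus a dplyr alternative that keeps only the minimum, mean, and maximum while skipping missing values. The `patients` values are hypothetical.

```r
library(dplyr)

# Hypothetical values; NA marks a missing measurement, as in the real data
patients <- tibble(
  age    = c(19, 66, 100),
  height = c(160, NA, 175),
  weight = c(65.8, 80.2, NA)
)

# Base R summary: quartiles, mean, and an NA count per column
summary(patients)

# Only min, mean, and max, ignoring missing values
patient_stats <- patients %>%
  summarise(across(c(age, height, weight),
                   list(min  = ~ min(.x, na.rm = TRUE),
                        mean = ~ mean(.x, na.rm = TRUE),
                        max  = ~ max(.x, na.rm = TRUE))))
```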

2. Most common comorbidities found in patients in the dataset.

## # A tibble: 6 × 2
##   comorbidities                             count
##   <chr>                                     <int>
## 1 Other                                       400
## 2 Hypertension                                310
## 3 Diabetes                                    175
## 4 Chronic cardiac disease  not hypertension    88
## 5 Asthma                                       54
## 6 Chronic renal                                40

Observation: Hypertension (310 of 508 patients) and diabetes (175) are the most common named comorbidities, and 400 patients also had conditions recorded as "Other".
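A frequency table like the one above can be built from the long-format comorbidity data with `count(..., sort = TRUE)`. The sketch below uses a hypothetical `como_long` tibble. It also adds a share-of-patients column, which assumes that each label appears at most once per patient.

```r
library(dplyr)

# Hypothetical long-format data: one row per patient-comorbidity pair
como_long <- tibble(
  id = c(1, 1, 1, 2, 2, 3),
  comorbidities = c("Hypertension", "Diabetes", "Other",
                    "Hypertension", "Other", "Hypertension")
)

top_comorbidities <- como_long %>%
  count(comorbidities, name = "count", sort = TRUE) %>%
  slice_head(n = 6)

# Share of patients with each comorbidity (assumes no repeated label per patient)
n_patients <- n_distinct(como_long$id)
top_comorbidities <- top_comorbidities %>%
  mutate(share = count / n_patients)
```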

3. Patients were grouped into the following age groups:

  1. 14-30

  2. 31-50

  3. 51-75

  4. 76-90

  5. 90+
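The original cut points are not shown. A sketch using base R's `cut()` follows, with the boundary rule as an assumption: ages are whole years and intervals are closed on the right, so 30 falls in "14-30", 90 falls in "76-90", and "90+" therefore holds ages 91 and over. The helper name `assign_age_group` is hypothetical.

```r
# Breaks assume whole-year ages; intervals are closed on the right,
# so 30 falls in "14-30" and 90 falls in "76-90"
age_breaks <- c(13, 30, 50, 75, 90, Inf)
age_labels <- c("14-30", "31-50", "51-75", "76-90", "90+")

# Hypothetical helper returning the age-group label for each age
assign_age_group <- function(age) {
  as.character(cut(age, breaks = age_breaks, labels = age_labels))
}

ages <- c(19, 30, 31, 50, 51, 75, 76, 90, 91, 100)
age_groups <- assign_age_group(ages)
table(factor(age_groups, levels = age_labels))
```

With a data frame, the group column could then be added with `mutate(age_group = assign_age_group(age))` and counted with `count(age_group, name = "Patient_Count")`.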

4. Number of patients in each age group

## # A tibble: 5 × 2
##   age_group Patient_Count
##   <chr>             <int>
## 1 14-30                 8
## 2 31-50                81
## 3 51-75               263
## 4 76-90               121
## 5 90+                  35

Observation: Most patients (263 of 508) are in the 51-75 age group.

5. Average length of stay (days) in each age group

## # A tibble: 5 × 2
##   age_group Avg_length_ofStay
##   <chr>                 <dbl>
## 1 14-30                     5
## 2 31-50                     8
## 3 51-75                    13
## 4 76-90                    15
## 5 90+                      14
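Average length of stay per age group requires joining the admission data to the length-of-stay sheet and then averaging within groups. In the sketch below, both tables are hypothetical, and joining on `id` with a column named `length_of_stay` are assumptions about the real sheets. The averages are rounded to whole days because the table above appears to be rounded.

```r
library(dplyr)

# Hypothetical tables; `length_of_stay` is an assumed column name
admissions <- tibble(id = 1:4, age_group = c("31-50", "51-75", "51-75", "76-90"))
stays      <- tibble(id = 1:4, length_of_stay = c(8, 12, 14, 15))

avg_stay <- admissions %>%
  inner_join(stays, by = "id") %>%
  group_by(age_group) %>%
  summarise(Avg_length_ofStay = round(mean(length_of_stay, na.rm = TRUE)),
            .groups = "drop")
```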

Observation: The average length of stay exceeds 10 days in every age group above 50 (13 to 15 days).

6. Number of patients who died in hospital, by age group

## # A tibble: 3 × 2
## # Groups:   age_group [3]
##   age_group No_of_patients_died
##   <chr>                   <int>
## 1 51-75                      32
## 2 76-90                      43
## 3 90+                        15

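Observation: No deaths were recorded below age 51. Above 50, the death rate rises with age (32/263, about 12.2%, for 51-75; 43/121, about 35.5%, for 76-90; 15/35, about 42.9%, for 90+).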
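Counts alone can mislead because the age groups differ in size. This sketch divides the deaths in item 6 by the group sizes in item 4. The groups missing from item 6 are entered as zero deaths, since 32 + 43 + 15 already accounts for all 90 deaths.

```r
library(dplyr)

# Patient counts per group (item 4) and deaths per group (item 6);
# groups absent from item 6 had no deaths, since 32 + 43 + 15 = 90
age_deaths <- tibble(
  age_group = c("14-30", "31-50", "51-75", "76-90", "90+"),
  Patient_Count = c(8, 81, 263, 121, 35),
  No_of_patients_died = c(0, 0, 32, 43, 15)
) %>%
  mutate(death_rate = No_of_patients_died / Patient_Count)
```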

PART 2

1. Average vitals and their standard deviations in the dataset

Normal reference ranges:

Temperature: 97 °F (36.1 °C) to 99 °F (37.2 °C)

Systolic BP: 90-120 mmHg

Diastolic BP: 60-80 mmHg

WBC: 4.5 to 11.0 × 10⁹/L

Hematocrit: men 41% to 50%, women 36% to 48%

Platelet count: 150,000 to 450,000 platelets per microliter of blood

Heart rate: 60-100 beats per minute
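The ranges above can be encoded once and used to flag out-of-range readings. In this sketch, the vital names are hypothetical and hematocrit is left out because its range depends on sex. Platelets are expressed as 150 to 450 × 10⁹/L, which is the same range as 150,000 to 450,000 per microliter. That unit choice assumes the data stores platelets in 10⁹/L, which the platelet averages near 200 reported later suggest.

```r
# Normal ranges from the list above (temperature in Celsius,
# platelets in 10^9/L); vital names are hypothetical
normal_ranges <- data.frame(
  vital = c("temp_c", "sys_bp", "dia_bp", "wbc", "platelets", "heart_rate"),
  low   = c(36.1, 90, 60, 4.5, 150, 60),
  high  = c(37.2, 120, 80, 11.0, 450, 100)
)

# TRUE where a value lies outside the closed interval [low, high]
flag_abnormal <- function(vital, value) {
  row <- normal_ranges[normal_ranges$vital == vital, ]
  if (nrow(row) != 1) stop("unknown vital: ", vital)
  value < row$low | value > row$high
}

flag_abnormal("sys_bp", c(115, 129, 85))
```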

## # A tibble: 1 × 21
##   No_of_Patiens Avg_Sbp Std_Sbp Avg_He…¹ Std_H…² Avg_R…³ Std_R…⁴ Avg_O…⁵ Std_O…⁶
##           <int>   <dbl>   <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1           508    129.    22.1     97.4    17.5    24.9    7.17    93.1    6.10
## # … with 12 more variables: Avg_Temp <dbl>, Std_Temp <dbl>, Avg_dbp <dbl>,
## #   Std_dbp <dbl>, Avg_wbc <dbl>, Std_wbc <dbl>, Avg_rbc <dbl>, Std_rbc <dbl>,
## #   Avg_Hematocrit <dbl>, Std_Hematocrit <dbl>, Avg_PlateletCount <dbl>,
## #   Std_PlateletCount <dbl>, and abbreviated variable names ¹​Avg_Heartrate,
## #   ²​Std_Heartrate, ³​Avg_Resprate, ⁴​Std_Resprate, ⁵​Avg_OxySat, ⁶​Std_OxySat
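A one-row table of means and standard deviations like the one above can be produced with `summarise()` and `across()`, using a `.names` pattern that yields `Avg_` and `Std_` prefixes. The column names in the hypothetical `vitals` tibble are assumptions, and only two vitals are shown.

```r
library(dplyr)

# Hypothetical vitals; column names are assumptions
vitals <- tibble(
  sys_bp     = c(110, 130, 150),
  heart_rate = c(80, 95, 110)
)

# Patient count plus mean and SD of each vital, named Avg_<col> / Std_<col>
vital_stats <- vitals %>%
  summarise(No_of_Patients = n(),
            across(c(sys_bp, heart_rate),
                   list(Avg = ~ mean(.x, na.rm = TRUE),
                        Std = ~ sd(.x, na.rm = TRUE)),
                   .names = "{.fn}_{.col}"))
```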

2. Average vitals by sex, by admission disposition (ICU vs. WARD), and by age (below 50 vs. 50 and above):

## # A tibble: 2 × 7
##   C      RespRate_avg Oxy_Sat_avg SysBP_avg DiaBP_avg HeartRate_avg Temp_avg
##   <chr>         <dbl>       <dbl>     <dbl>     <dbl>         <dbl>    <dbl>
## 1 Female         24.5        93.4      127.      74.6          97.1     37.8
## 2 Male           25.2        92.9      131.      76.2          97.6     37.8
## # A tibble: 2 × 7
##   C     RespRate_avg Oxy_Sat_avg SysBP_avg DiaBP_avg HeartRate_avg Temp_avg
##   <chr>        <dbl>       <dbl>     <dbl>     <dbl>         <dbl>    <dbl>
## 1 ICU           32.0        85.9      128.      75.6         105.      37.8
## 2 WARD          24.2        93.8      129.      75.6          96.7     37.8
## # A tibble: 2 × 7
##   C            RespRate_avg Oxy_Sat_avg SysBP_avg DiaBP_avg HeartRate_…¹ Temp_…²
##   <chr>               <dbl>       <dbl>     <dbl>     <dbl>        <dbl>   <dbl>
## 1 50 and above         24.4        92.9      130.      75.0         95.4    37.8
## 2 Below 50             27.7        94.0      124.      78.4        108.     37.8
## # … with abbreviated variable names ¹​HeartRate_avg, ²​Temp_avg
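The three tables above share one pattern: group by a column, renamed `C`, and average the same vitals. A small helper can express that pattern once. The helper name `vital_means_by`, the vital column names, and the `age_band` rule (50 goes to "50 and above") are hypothetical.

```r
library(dplyr)

# Hypothetical helper: average selected vitals within each level of `group_col`
vital_means_by <- function(df, group_col, vitals) {
  df %>%
    group_by(C = .data[[group_col]]) %>%
    summarise(across(all_of(vitals), ~ mean(.x, na.rm = TRUE),
                     .names = "{.col}_avg"),
              .groups = "drop")
}

patients <- tibble(
  sex = c("Female", "Male", "Male"),
  admission_disposition = c("WARD", "ICU", "WARD"),
  age = c(45, 70, 62),
  RespRate = c(24, 32, 25),
  Oxy_Sat = c(94, 86, 93)
) %>%
  mutate(age_band = if_else(age >= 50, "50 and above", "Below 50"))

by_sex  <- vital_means_by(patients, "sex", c("RespRate", "Oxy_Sat"))
by_disp <- vital_means_by(patients, "admission_disposition", c("RespRate", "Oxy_Sat"))
by_age  <- vital_means_by(patients, "age_band", c("RespRate", "Oxy_Sat"))
```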

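Observation:

a) Respiratory rate and systolic BP were high for both sexes, with slightly higher readings in male patients (25.2 vs. 24.5 breaths per minute; 131 vs. 127 mmHg).

b) ICU patients (43, vs. 465 WARD patients) had markedly lower average oxygen saturation (85.9% vs. 93.8%) and a higher respiratory rate (32.0 vs. 24.2). Oxygen saturation below 90% is very concerning and indicates an emergency.

c) The average respiratory rate was higher in patients below 50 (27.7 vs. 24.4), while systolic BP was high in both age groups (124 and 130 mmHg).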

3. Visual analysis of changes in vitals during the hospital stay. The vitals examined were systolic BP, diastolic BP, temperature, heart rate, WBC, RBC, hematocrit, and platelet count.

Observation:

Systolic BP (normal 90-120 mmHg): systolic BP stayed high throughout the stay in patients above age 50.

WBC (normal 4.5 to 11.0 × 10⁹/L): all age groups show normal readings for the first 4 days. The 51-75 and 76-90 age groups then show a steady increase, rising above 11 × 10⁹/L after day 5, which is a concern.
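The trend plots are not reproduced here. A sketch of one such plot shows the mean WBC per day of stay for each age group, with the normal range shaded. It assumes daily readings in long format (one row per patient per day), presumably derived from the Days_breakdown sheet. The `daily` tibble and its column names (`day`, `wbc`) are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical long-format daily readings: one row per patient per day
daily <- tibble(
  id        = rep(1:2, each = 3),
  day       = rep(1:3, times = 2),
  age_group = rep(c("31-50", "76-90"), each = 3),
  wbc       = c(6.1, 6.4, 6.0, 7.2, 9.8, 11.6)
)

# Mean WBC per day within each age group
daily_wbc <- daily %>%
  group_by(age_group, day) %>%
  summarise(mean_wbc = mean(wbc, na.rm = TRUE), .groups = "drop")

wbc_plot <- ggplot(daily_wbc, aes(x = day, y = mean_wbc, colour = age_group)) +
  # Shaded band (drawn first, behind the lines) marks 4.5 to 11.0 x 10^9/L
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = 4.5, ymax = 11.0,
           alpha = 0.1, fill = "green") +
  geom_line() +
  geom_point() +
  labs(x = "Day of stay", y = "Mean WBC (x 10^9/L)", colour = "Age group")
```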

4. Comparison of WARD and ICU patients

## # A tibble: 3 × 3
## # Groups:   admission_disposition [2]
##   admission_disposition intubated `No.of patients`
##   <chr>                 <chr>                <int>
## 1 ICU                   No                      28
## 2 ICU                   Yes                     15
## 3 WARD                  No                     465

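Observation:

All 15 intubated patients were in the ICU. ICU patients showed:

1. Average oxygen saturation below 90% (85.9%, vs. 93.8% in the WARD).

2. A high respiratory rate (32.0 vs. 24.2) and high systolic BP (128 mmHg, similar to the WARD average of 129 mmHg).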

5. Comparison of patients who died and patients who survived in hospital

## # A tibble: 2 × 10
##   Patien…¹ Count Avg_s…² Avg_dbp Avg_r…³ Avg_oxy Avg_wbc Avg_rbc Avg_H…⁴ Avg_P…⁵
##   <chr>    <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 No         418    128.    75.8    24.6    93.5    7.90    4.68   0.398    237.
## 2 Yes         90    134.    74.3    26.2    91.3    8.64    4.35   0.382    203.
## # … with abbreviated variable names ¹​Patient_expired_in_hospital, ²​Avg_sys_bp,
## #   ³​Avg_resp, ⁴​Avg_Hematocrit, ⁵​Avg_PlateletCount

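Observation: The density plots of age and systolic BP shift toward higher values for patients who died (average systolic BP 134 vs. 128 mmHg; all deaths were in patients aged 51 and over).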
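The density plots are not shown. Overlapping densities by outcome can be drawn with `geom_density()` and a fill mapped to `Patient_expired_in_hospital`. The `patients` data frame is hypothetical, and the same call with `x` set to the systolic BP column would give the second plot.

```r
library(ggplot2)

# Hypothetical patient-level data; Patient_expired_in_hospital is "Yes"/"No"
patients <- data.frame(
  age = c(45, 58, 62, 66, 71, 78, 84, 91),
  Patient_expired_in_hospital = c("No", "No", "No", "No", "Yes", "No", "Yes", "Yes")
)

age_density <- ggplot(patients, aes(x = age, fill = Patient_expired_in_hospital)) +
  # Semi-transparent overlapping densities; each curve integrates to 1 within its group
  geom_density(alpha = 0.4) +
  labs(x = "Age (years)", y = "Density", fill = "Died in hospital")
```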

ANALYSIS RESULTS

1. Total patients in the dataset: 508 (296 male, 212 female).

2. Most patients are in their 60s and 70s.

3. Total deaths: 90 (59 male, 31 female; 7 intubated, 83 not intubated; 14 in the ICU [7 intubated, 7 not intubated] and 76 in the WARD). The most deaths occurred in the 76-90 age group.

4. Total ICU patients: 43 (15 intubated, 28 not intubated).

5. ICU patients (43, vs. 465 WARD patients) had markedly lower oxygen saturation and a higher respiratory rate. Oxygen saturation below 90% is very concerning and indicates an emergency.

6. Most common comorbidities: hypertension (310/508), diabetes (175/508), and chronic cardiac disease (88/508). In addition, 400/508 patients had other medical conditions.

7. Systolic blood pressure, respiratory rate, and oxygen saturation were the vitals at risky levels.

8. The oldest patients were at highest risk: the death rate was about 35.5% in the 76-90 group and 42.9% in the 90+ group.