Introduction
Dataset Description
This data set is composed of a curated collection of over 200 publicly available COVID-19 related data sets from sources like Johns Hopkins, the WHO, the World Bank, the New York Times, and many others. It includes data on a wide variety of potentially powerful statistics and indicators, like local and national infection rates, global social distancing policies, geospatial data on movement of people, and more.
This was originally a challenge/competition on Kaggle. The key questions and tasks of the challenge were not available at the time of this analysis, so new questions were drawn up to analyze the data.
The dataset was downloaded from Kaggle.
Loading packages
library(tidyverse)
library(readxl)
library(skimr)
library(ggplot2)
library(tidyr)
library(stringr)
library(gridExtra)
library(scales)
Reading Data
# Path to the workbook; every sheet below is read from the same file.
xlsx_path <- "~/Asha/Projects_R/Canada_Hosp1_COVID_InpatientData.xlsx"
Data_at_admission <- read_excel(xlsx_path, sheet = "Data-at-admission")
Days_breakdown <- read_excel(xlsx_path, sheet = "Days-breakdown")
Hospital_length_of_stay <- read_excel(xlsx_path, sheet = "Hospital-length-of-stay")
Medication_Static_List <- read_excel(xlsx_path, sheet = "Medication-Static-List")
Data Cleaning
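The comorbidities column was cleaned by removing the quote and bracket characters (“ ” [ ]). It was then split into one column per comorbidity (como1 to como6) and reshaped into long format, one row per patient and comorbidity, for the analysis.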
## # A tibble: 6 × 3
## id type comorbidities
## <dbl> <chr> <chr>
## 1 1 como1 Hypertension
## 2 1 como2 Diabetes
## 3 1 como3 Other
## 4 2 como1 Hypertension
## 5 2 como2 Other
## 6 3 como1 Hypertension
## # A tibble: 6 × 8
## id como1 como2 como3 como4 como5 como6 Comor…¹
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1 Hypertension Diab… Other <NA> <NA> <NA> "Hyper…
## 2 2 Hypertension Other <NA> <NA> <NA> <NA> "Hyper…
## 3 3 Hypertension <NA> <NA> <NA> <NA> <NA> "Hyper…
## 4 4 Hypertension Other <NA> <NA> <NA> <NA> "Hyper…
## 5 5 Chronic cardiac disease not hype… Hype… Diab… Other <NA> <NA> "Chron…
## 6 6 Hypertension <NA> <NA> <NA> <NA> <NA> "Hyper…
## # … with abbreviated variable name ¹Comorbidities
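The cleaning code itself is not shown in the report. The following is a minimal sketch of one way to get from a raw comorbidities string to the wide and long tables above, run on a three-row toy table. The bracketed, quoted input format and the toy values are assumptions; only the column names `id`, `como1`, `como2`, ..., `type`, and `comorbidities` are taken from the printed output.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-in for the admission sheet (assumed raw format, not the real data).
toy <- tibble(
  id = c(1, 2, 3),
  comorbidities = c("['Hypertension', 'Diabetes', 'Other']",
                    "['Hypertension', 'Other']",
                    "['Hypertension']")
)

# Strip brackets and quotes, then split on commas into como1..como3.
# Patients with fewer entries get NA in the remaining columns.
wide <- toy %>%
  mutate(comorbidities = str_remove_all(comorbidities, "[\\[\\]'\"]")) %>%
  separate(comorbidities, into = paste0("como", 1:3), sep = ",\\s*",
           fill = "right", remove = FALSE)

# Reshape to one row per patient and comorbidity, dropping the NA slots.
long <- wide %>%
  select(-comorbidities) %>%
  pivot_longer(cols = como1:como3, names_to = "type",
               values_to = "comorbidities", values_drop_na = TRUE)

long
```

On the real data the `into` vector would need as many names as the largest number of comorbidities per patient (six in the output above).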
VISUALIZATIONS OF THE DATA
1. Age distribution graph.
Observation:- Most patients are in their 60s and 70s.
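The plotting code is not included in the report. A histogram with ten-year bins is one plausible way to draw such a graph; the sketch below uses simulated ages (an assumption, not the hospital data) and an `age` column as in the summary statistics.

```r
library(ggplot2)

# Simulated ages between 19 and 100 (assumption; stands in for the real column).
set.seed(42)
toy <- data.frame(age = round(runif(200, min = 19, max = 100)))

# Ten-year bins aligned to decade boundaries, so each bar is one decade of age.
age_plot <- ggplot(toy, aes(x = age)) +
  geom_histogram(binwidth = 10, boundary = 0,
                 fill = "steelblue", colour = "white") +
  scale_x_continuous(breaks = seq(0, 100, by = 10)) +
  labs(title = "Age distribution of admitted patients",
       x = "Age (years)", y = "Number of patients")
```

Printing `age_plot` renders the chart.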
2. How many male and female patients are admitted for each reason for admission?
## # A tibble: 38 × 3
## # Groups: reason_for_admission, sex [38]
## reason_for_admission sex count
## <chr> <chr> <int>
## 1 COVID-19 [U07.1] Male 124
## 2 COVID-19 [U07.1] Female 93
## 3 Pneumonia [J18.9] Male 83
## 4 Pneumonia [J18.9] Female 51
## 5 Pneumonia due to COVID-19 virus [U07.1, J12.8] Male 24
## 6 Pneumonia due to COVID-19 virus [U07.1, J12.8] Female 16
## 7 Shortness of breath [R06.0] Female 14
## 8 Fever [R50.9] Male 13
## 9 Hypoxia [R09.0] Male 12
## 10 Shortness of breath [R06.0] Male 11
## # … with 28 more rows
Observation:-
For most reasons for admission, male patients outnumber female patients.
COVID-19 related admissions make up the majority of this dataset of 508 patients.
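A table like the one above can be produced with a single `count()` call. This is a sketch on a six-row toy table, not the report's own code; the column names `reason_for_admission`, `sex`, and `count` match the printed output, and the toy rows are made up.

```r
library(dplyr)

# Toy admissions (assumption); the real data has 508 rows.
toy <- tibble(
  reason_for_admission = c("COVID-19 [U07.1]", "COVID-19 [U07.1]",
                           "COVID-19 [U07.1]", "Pneumonia [J18.9]",
                           "Pneumonia [J18.9]", "Fever [R50.9]"),
  sex = c("Male", "Male", "Female", "Male", "Female", "Male")
)

# One row per reason/sex pair, largest groups first.
admissions_by_sex <- toy %>%
  count(reason_for_admission, sex, name = "count", sort = TRUE)

admissions_by_sex
```

Unlike the grouped tibble printed in the report, `count()` returns an ungrouped result, which avoids carrying grouping into later steps.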
3. How many patients died in the hospital?
## # A tibble: 1 × 1
## No_of_patients_died
## <int>
## 1 90
## # A tibble: 2 × 2
## # Groups: sex [2]
## sex No_of_patients
## <chr> <int>
## 1 Female 212
## 2 Male 296
## # A tibble: 2 × 2
## # Groups: sex [2]
## sex No_of_patients_died
## <chr> <int>
## 1 Female 31
## 2 Male 59
## # A tibble: 2 × 2
## # Groups: intubated [2]
## intubated No_of_patients_died
## <chr> <int>
## 1 No 83
## 2 Yes 7
## # A tibble: 3 × 3
## # Groups: admission_disposition, intubated [3]
## admission_disposition intubated No_of_patients_died
## <chr> <chr> <int>
## 1 ICU No 7
## 2 ICU Yes 7
## 3 WARD No 76
Observation:-
1. There were more male patients (296) than female patients (212) in the dataset.
2. 90 of the 508 patients in the dataset died in hospital.
3. The death rate was higher for males (59 of 296) than for females (31 of 212).
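The death-rate comparison can be checked directly from the counts printed above. The sketch below re-enters those counts by hand; the column names `patients`, `died`, and `death_rate_pct` are illustrative.

```r
library(dplyr)

# Counts copied from the per-sex tables above.
by_sex <- tibble(
  sex      = c("Female", "Male"),
  patients = c(212, 296),
  died     = c(31, 59)
) %>%
  mutate(death_rate_pct = round(100 * died / patients, 1))

by_sex
# Female: 14.6 %, Male: 19.9 %
```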
ANALYSIS :
PART 1
1. Average, min, and max age, height and weight of the patients who are admitted
##       age             height          weight
##  Min.   : 19.00   Min.   :125.0   Min.   : 27.70
##  1st Qu.: 55.75   1st Qu.:159.0   1st Qu.: 65.80
##  Median : 66.00   Median :167.6   Median : 76.70
##  Mean   : 66.03   Mean   :166.9   Mean   : 80.27
##  3rd Qu.: 78.00   3rd Qu.:175.0   3rd Qu.: 89.80
##  Max.   :100.00   Max.   :198.0   Max.   :199.60
##                   NA's   :236     NA's   :150
2. Most common comorbidities found in patients in the dataset.
## # A tibble: 6 × 2
## comorbidities count
## <chr> <int>
## 1 Other 400
## 2 Hypertension 310
## 3 Diabetes 175
## 4 Chronic cardiac disease not hypertension 88
## 5 Asthma 54
## 6 Chronic renal 40
Observation:- Hypertension and Diabetes are the most common named comorbidities, and most patients (400) also had other medical conditions.
3. Patients were divided into the age groups below:
14-30
31-50
51-75
76-90
90+
4. No. of patients in each age group
## # A tibble: 5 × 2
## age_group Patient_Count
## <chr> <int>
## 1 14-30 8
## 2 31-50 81
## 3 51-75 263
## 4 76-90 121
## 5 90+ 35
Observation:- Most patients are in the age group 51-75.
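One way to build these groups is `cut()` with right-closed breaks. The sketch below is an assumption about the binning rather than the report's code: in particular it assigns age 90 to the 76-90 group and ages above 90 to 90+, which the overlapping labels leave open. The toy ages are chosen to sit on the group edges.

```r
library(dplyr)

# Toy ages on the edges of each group (assumption, not the real data).
toy <- tibble(age = c(19, 30, 31, 50, 51, 75, 76, 90, 91, 100))

# Intervals are (13,30], (30,50], (50,75], (75,90], (90,Inf].
grouped <- toy %>%
  mutate(age_group = cut(age,
                         breaks = c(13, 30, 50, 75, 90, Inf),
                         labels = c("14-30", "31-50", "51-75", "76-90", "90+"))) %>%
  count(age_group, name = "Patient_Count")

grouped
```

Here `age_group` is a factor, so the groups keep their natural order in tables and plots; the report's output shows it as a character column.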
5. Average length of stay of patients in each age group.
## # A tibble: 5 × 2
## age_group Avg_length_ofStay
## <chr> <dbl>
## 1 14-30 5
## 2 31-50 8
## 3 51-75 13
## 4 76-90 15
## 5 90+ 14
Observation:- Average length of stay is more than 10 days in every age group above 50.
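Length of stay is stored in a separate sheet, so it has to be joined to the admission data before it can be averaged by age group. The sketch below shows that join on toy tables. The key `id` appears in the report's output, but `length_of_stay` and the toy values are hypothetical.

```r
library(dplyr)

# Toy stand-ins for the two sheets (assumed column names and values).
admission <- tibble(
  id = 1:6,
  age_group = c("31-50", "31-50", "51-75", "51-75", "76-90", "76-90")
)
stay <- tibble(
  id = 1:6,
  length_of_stay = c(6, 10, 12, 14, 13, 17)
)

# Join on patient id, then average the stay (in days) per age group.
avg_stay <- admission %>%
  inner_join(stay, by = "id") %>%
  group_by(age_group) %>%
  summarise(Avg_length_ofStay = round(mean(length_of_stay, na.rm = TRUE)))

avg_stay
```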
6. No. of patients who died in the hospital in each age group
## # A tibble: 3 × 2
## # Groups: age_group [3]
## age_group No_of_patients_died
## <chr> <int>
## 1 51-75 32
## 2 76-90 43
## 3 90+ 15
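Observation:- No deaths were recorded in the two youngest groups. Above age 50, the death rate rises with age: 32/263 (about 12%) in 51-75, 43/121 (about 36%) in 76-90, and 15/35 (about 43%) in 90+.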
PART 2
1. Average vitals and their standard deviations in the dataset
Normal range values:
Temperature: 97 °F (36.1 °C) to 99 °F (37.2 °C)
Systolic bp: 90-120 mmHg
Diastolic bp: 60-80 mmHg
WBC: 4.5 to 11.0 × 10⁹/L
Hematocrit: men 41% to 50%, women 36% to 48%
Platelet: 150,000 to 450,000 platelets per microliter of blood
Heart rate: 60-100 beats per minute
## # A tibble: 1 × 21
## No_of_Patiens Avg_Sbp Std_Sbp Avg_He…¹ Std_H…² Avg_R…³ Std_R…⁴ Avg_O…⁵ Std_O…⁶
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 508 129. 22.1 97.4 17.5 24.9 7.17 93.1 6.10
## # … with 12 more variables: Avg_Temp <dbl>, Std_Temp <dbl>, Avg_dbp <dbl>,
## # Std_dbp <dbl>, Avg_wbc <dbl>, Std_wbc <dbl>, Avg_rbc <dbl>, Std_rbc <dbl>,
## # Avg_Hematocrit <dbl>, Std_Hematocrit <dbl>, Avg_PlateletCount <dbl>,
## # Std_PlateletCount <dbl>, and abbreviated variable names ¹Avg_Heartrate,
## # ²Std_Heartrate, ³Avg_Resprate, ⁴Std_Resprate, ⁵Avg_OxySat, ⁶Std_OxySat
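A one-row summary of this shape (a count plus an average and standard deviation per vital) can be written compactly with `across()`. This is a sketch on a toy table; `systolic_bp` and `heart_rate` are hypothetical column names, and the values are made up.

```r
library(dplyr)

# Toy vitals with some missing readings (assumption, not the real data).
toy <- tibble(
  systolic_bp = c(120, 140, NA, 130),
  heart_rate  = c(90, 100, 110, NA)
)

# Mean and standard deviation for each vital, ignoring missing values.
vitals_summary <- toy %>%
  summarise(
    No_of_Patients = n(),
    across(c(systolic_bp, heart_rate),
           list(Avg = ~ mean(.x, na.rm = TRUE),
                Std = ~ sd(.x, na.rm = TRUE)),
           .names = "{.fn}_{.col}")
  )

vitals_summary
```

Adding another vital then only means adding its column name to the `across()` selection.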
2. Average vitals by sex, by admission disposition (ICU/WARD), and by age (below 50 vs 50 and above):
## # A tibble: 2 × 7
## C RespRate_avg Oxy_Sat_avg SysBP_avg DiaBP_avg HeartRate_avg Temp_avg
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Female 24.5 93.4 127. 74.6 97.1 37.8
## 2 Male 25.2 92.9 131. 76.2 97.6 37.8
## # A tibble: 2 × 7
## C RespRate_avg Oxy_Sat_avg SysBP_avg DiaBP_avg HeartRate_avg Temp_avg
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ICU 32.0 85.9 128. 75.6 105. 37.8
## 2 WARD 24.2 93.8 129. 75.6 96.7 37.8
## # A tibble: 2 × 7
## C RespRate_avg Oxy_Sat_avg SysBP_avg DiaBP_avg HeartRate_…¹ Temp_…²
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 50 and above 24.4 92.9 130. 75.0 95.4 37.8
## 2 Below 50 27.7 94.0 124. 78.4 108. 37.8
## # … with abbreviated variable names ¹HeartRate_avg, ²Temp_avg
Observation:-
a) Respiratory rate and systolic bp were high for both males and females; male readings were slightly higher (25.2 vs 24.5 and 131 vs 127).
b) ICU patients had markedly lower oxygen saturation (85.9 vs 93.8) and a higher respiratory rate (32.0 vs 24.2) than WARD patients (43 ICU vs 465 WARD patients). Oxygen saturation below 90% is very concerning and indicates an emergency.
c) Average respiratory rate was higher in the below-50 group (27.7 vs 24.4); systolic bp was above the normal range in both groups (124 and 130).
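The comparison against the normal ranges can be made explicit by flagging each group average. The sketch below re-enters the ICU and WARD averages from the table above and applies the thresholds stated in this report (oxygen saturation below 90%, systolic bp above 120, heart rate above 100). The column `group` and the flag names are illustrative.

```r
library(dplyr)

# Group averages copied from the ICU/WARD table above.
vitals <- tibble(
  group         = c("ICU", "WARD"),
  Oxy_Sat_avg   = c(85.9, 93.8),
  SysBP_avg     = c(128, 129),
  HeartRate_avg = c(105, 96.7)
)

flags <- vitals %>%
  mutate(
    low_oxygen      = Oxy_Sat_avg < 90,     # "below 90%" threshold from observation (b)
    high_systolic   = SysBP_avg > 120,      # upper end of the 90-120 range
    high_heart_rate = HeartRate_avg > 100   # upper end of the 60-100 range
  )

flags
```

Both groups are flagged for systolic bp, while only the ICU group is flagged for oxygen saturation and heart rate.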
3. Visually analyze the data to see if there were vital changes during their stay. Vitals include systolic bp, diastolic bp, temperature, and heart rate, wbc, rbc, hematocrit, platelet count.
Observation:-
Systolic bp (normal range 90-120): Sbp stayed above the normal range throughout the recorded days in patients above age 50.
WBC (normal range 4.5 to 11.0 × 10⁹/L): all age groups show normal readings for the first 4 days; the 51-90 age groups then show a steady increase, and after the 5th day the average goes above 11, which is a concern.
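A trend like this is easiest to read as a line chart of the daily average per age group, with the normal range drawn as reference lines. The sketch below uses made-up daily WBC values; the long-format columns `day`, `age_group`, and `wbc` are hypothetical and may not match the Days-breakdown sheet.

```r
library(dplyr)
library(ggplot2)

# Made-up daily averages (assumption): one row per day and age group.
toy <- tibble(
  day       = rep(1:6, times = 2),
  age_group = rep(c("31-50", "51-75"), each = 6),
  wbc       = c(7.0, 7.2, 7.1, 7.4, 7.3, 7.5,
                8.0, 8.6, 9.4, 10.2, 11.3, 12.1)
)

# One line per age group; dashed lines mark the 4.5-11.0 normal range.
wbc_plot <- ggplot(toy, aes(x = day, y = wbc, colour = age_group)) +
  geom_line() +
  geom_hline(yintercept = c(4.5, 11.0), linetype = "dashed") +
  labs(x = "Day of stay", y = "WBC (× 10⁹/L)", colour = "Age group")

# First day on which each group's average exceeds the upper limit, if any.
first_high_day <- toy %>%
  filter(wbc > 11) %>%
  group_by(age_group) %>%
  summarise(first_day_above_11 = min(day))
```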
4. Compare patients in the WARD vs ICU
## # A tibble: 3 × 3
## # Groups: admission_disposition [2]
## admission_disposition intubated `No.of patients`
## <chr> <chr> <int>
## 1 ICU No 28
## 2 ICU Yes 15
## 3 WARD No 465
Observation:-
Only ICU patients were intubated (15 of 43). ICU patients showed: 1. average oxygen saturation below 90%; 2. high systolic bp and respiratory rate.
5. Compare patients who expired and not-expired in hospital.
## # A tibble: 2 × 10
## Patien…¹ Count Avg_s…² Avg_dbp Avg_r…³ Avg_oxy Avg_wbc Avg_rbc Avg_H…⁴ Avg_P…⁵
## <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 No 418 128. 75.8 24.6 93.5 7.90 4.68 0.398 237.
## 2 Yes 90 134. 74.3 26.2 91.3 8.64 4.35 0.382 203.
## # … with abbreviated variable names ¹Patient_expired_in_hospital, ²Avg_sys_bp,
## # ³Avg_resp, ⁴Avg_Hematocrit, ⁵Avg_PlateletCount
Observation:- The density graphs for age and Sbp show a shift between patients who died in hospital and those who did not.
ANALYSIS RESULTS:-
1. Total patients in the data set: 508 (296 M, 212 F).
2. Most patients are in their 60s and 70s.
3. Total patients died: 90 (59 M, 31 F); (7 intubated, 83 not); (14 ICU [7 intubated, 7 not], 76 WARD). The most deaths (43) were in the 76-90 age group.
4. Total ICU patients: 43 (15 intubated, 28 not).
5. Oxygen saturation was markedly lower and respiratory rate higher in ICU patients than in WARD patients (43 ICU vs 465 WARD patients). Oxygen saturation below 90% is very concerning and indicates an emergency.
6. Most common comorbidities in the patients: Hypertension (310/508), Diabetes (175/508), Chronic cardiac disease (88/508). In addition, 400/508 patients have other medical conditions.
7. Systolic blood pressure, respiratory rate, and oxygen saturation were the vitals at risky levels.
8. The 70+ age group was at highest risk; in-hospital deaths were 43/121 in the 76-90 group and 15/35 in the 90+ group.