The objectives of this exercise are:
- Analyse the patient dataset
- Understand and learn how to use the dplyr package in R
- Understand the concept of R Markdown
- Learn how to create dynamic documents in R by using R Markdown
1. Read the patient-data.csv file.
2. Perform data preparation as per the instructions provided.
3. Review the data for errors and missing values. Provide a solution to rectify them.
4. Perform data reporting as per the instructions provided.
5. Create an HTML R Markdown file.
6. Publish the file on RPubs.
knitr Global Options
# for development
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=TRUE, warning=TRUE, message=TRUE, cache=FALSE, tidy=FALSE, fig.path='figures/')
# for production
#knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=FALSE, warning=FALSE, message=FALSE, cache=FALSE, tidy=FALSE, fig.path='figures/')
Load Libraries
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Read Data
# Read the patient data CSV file. Count the number of rows present. Display the first 6 rows of data.
setwd("D:/R-BA/R-Scripts")
dfrPatient <- read.csv("./data/patient-data.csv", header=TRUE, stringsAsFactors=FALSE)
intRowCount <- nrow(dfrPatient)
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male False 182.87 76.57
## 2 AC/AH/017 Rosario White Male False 179.12 80.43
## 3 AC/AH/020 Julio Black Male False 169.15 75.48
## 4 AC/AH/022 Lupe White Male False 175.66 94.54
## 5 AC/AH/029 Lavern White Female False 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female True 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate
## 1 31-01-1972 Georgia,xxx Dog 2 False 25-11-2015
## 2 09-06-1972 Missouri Dog 2 False 25-11-2015
## 3 03-07-1972 Pennsylvania None 2 False 25-11-2015
## 4 11-08-1972 Florida Cat 1 False 25-11-2015
## 5 06-06-1973 Iowa NULL 2 True 25-11-2015
## 6 25-06-1973 Maryland Dog 2 False 25-11-2015
Total Rows Of Patient File: 100
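The Pet column above already shows a literal "NULL" string in row 5. As a minimal sketch of one option, `read.csv()` can be told which strings to treat as missing through its `na.strings` argument. The inline CSV text and the `toy` data frame below are made-up illustrations, not the contents of patient-data.csv.

```r
# Hypothetical three-row CSV, supplied inline so the example is self-contained.
csv_text <- "ID,Pet\nA1,Dog\nA2,NULL\nA3,\n"

# Treat "NA", "NULL" and empty fields as missing while reading.
toy <- read.csv(text = csv_text,
                na.strings = c("NA", "NULL", ""),
                stringsAsFactors = FALSE)

# Only the first row keeps a Pet value; the other two are read as NA.
is.na(toy$Pet)
```

Whether "NULL" should mean "missing" or "no pet" is a decision about the data, so this is only worth doing when the meaning is known.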
Data Preparation
Add column BMI-Value
# Insert a new column "BMIValue": weight in kg divided by the square of height in metres
dfrPatient <- mutate(dfrPatient, BMIValue=(WeightInKgs/(HeightInCms/100)^2))
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male False 182.87 76.57
## 2 AC/AH/017 Rosario White Male False 179.12 80.43
## 3 AC/AH/020 Julio Black Male False 169.15 75.48
## 4 AC/AH/022 Lupe White Male False 175.66 94.54
## 5 AC/AH/029 Lavern White Female False 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female True 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 31-01-1972 Georgia,xxx Dog 2 False 25-11-2015 22.89674
## 2 09-06-1972 Missouri Dog 2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None 2 False 25-11-2015 26.38080
## 4 11-08-1972 Florida Cat 1 False 25-11-2015 30.63867
## 5 06-06-1973 Iowa NULL 2 True 25-11-2015 26.53567
## 6 25-06-1973 Maryland Dog 2 False 25-11-2015 27.90487
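As a quick check of the formula, the calculation can be repeated by hand for the first patient in the output above (height 182.87 cm, weight 76.57 kg). The variable names below are illustrative only.

```r
# Worked BMI check for row 1 of the table above.
height_cm <- 182.87
weight_kg <- 76.57

# Convert centimetres to metres, then divide weight by height squared.
height_m <- height_cm / 100
bmi <- weight_kg / height_m^2

# Should agree with the BMIValue of 22.89674 shown for row 1.
round(bmi, 5)
```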
Add column BMI-Label
# Insert a new column "BMILabel" indicating the category to which the patient's BMIValue belongs
# (>= is used on the lower bounds so that values of exactly 18.50, 25.00 or 30.00 are not left as NA)
dfrPatient <- mutate(dfrPatient, BMILabel=NA)
dfrPatient$BMILabel <- ifelse(dfrPatient$BMIValue < 18.50, "UNDERWEIGHT",
  ifelse(dfrPatient$BMIValue >= 18.50 & dfrPatient$BMIValue < 25.00, "NORMAL",
    ifelse(dfrPatient$BMIValue >= 25.00 & dfrPatient$BMIValue < 30.00, "OVERWEIGHT",
      ifelse(dfrPatient$BMIValue >= 30.00, "OBESE", NA))))
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male False 182.87 76.57
## 2 AC/AH/017 Rosario White Male False 179.12 80.43
## 3 AC/AH/020 Julio Black Male False 169.15 75.48
## 4 AC/AH/022 Lupe White Male False 175.66 94.54
## 5 AC/AH/029 Lavern White Female False 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female True 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 31-01-1972 Georgia,xxx Dog 2 False 25-11-2015 22.89674
## 2 09-06-1972 Missouri Dog 2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None 2 False 25-11-2015 26.38080
## 4 11-08-1972 Florida Cat 1 False 25-11-2015 30.63867
## 5 06-06-1973 Iowa NULL 2 True 25-11-2015 26.53567
## 6 25-06-1973 Maryland Dog 2 False 25-11-2015 27.90487
## BMILabel
## 1 NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 OBESE
## 5 OVERWEIGHT
## 6 OVERWEIGHT
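The same banding can also be written with base R's `cut()`, which avoids nesting `ifelse()` calls. This is a sketch of an alternative, not the method used in this report. `bmi_label` is a hypothetical helper name, and `right = FALSE` is an assumption that each band includes its lower bound.

```r
# Hypothetical helper: map BMI values to labels using half-open bands [lower, upper).
bmi_label <- function(bmi) {
  as.character(cut(bmi,
                   breaks = c(-Inf, 18.5, 25, 30, Inf),
                   labels = c("UNDERWEIGHT", "NORMAL", "OVERWEIGHT", "OBESE"),
                   right = FALSE))
}

# Three BMI values from the table above, one boundary value, and a missing value.
bmi_label(c(22.89674, 25.06859, 30.63867, 25, NA))
```

With `right = FALSE` a BMI of exactly 25 falls in the OVERWEIGHT band, and a missing BMI stays `NA`.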
Convert HealthGrade column values
#Converting HealthGrade column values to Good, Normal and Bad Health
dfrPatient$HealthGrade <- ifelse(dfrPatient$HealthGrade == 1, "GOOD HEALTH",
ifelse(dfrPatient$HealthGrade == 2, "NORMAL",
ifelse(dfrPatient$HealthGrade == 3, "BAD HEALTH",NA)))
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male False 182.87 76.57
## 2 AC/AH/017 Rosario White Male False 179.12 80.43
## 3 AC/AH/020 Julio Black Male False 169.15 75.48
## 4 AC/AH/022 Lupe White Male False 175.66 94.54
## 5 AC/AH/029 Lavern White Female False 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female True 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 31-01-1972 Georgia,xxx Dog NORMAL False 25-11-2015 22.89674
## 2 09-06-1972 Missouri Dog NORMAL False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None NORMAL False 25-11-2015 26.38080
## 4 11-08-1972 Florida Cat GOOD HEALTH False 25-11-2015 30.63867
## 5 06-06-1973 Iowa NULL NORMAL True 25-11-2015 26.53567
## 6 25-06-1973 Maryland Dog NORMAL False 25-11-2015 27.90487
## BMILabel
## 1 NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 OBESE
## 5 OVERWEIGHT
## 6 OVERWEIGHT
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
## HealthGrade `n()`
## <chr> <int>
## 1 BAD HEALTH 34
## 2 GOOD HEALTH 29
## 3 NORMAL 30
## 4 <NA> 7
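The recoding above can also be expressed as a lookup table, which keeps the code-to-label mapping in one place. This is a minimal sketch under stated assumptions: `grade_labels` and `recode_grade` are hypothetical names, and the value 99 stands in for any code other than 1, 2 or 3 (the source does not show which raw codes produced the 7 `NA` rows).

```r
# Hypothetical lookup table: names are the raw codes, values are the labels.
grade_labels <- c("1" = "GOOD HEALTH", "2" = "NORMAL", "3" = "BAD HEALTH")

# Index the lookup by the code as text; unknown codes come back as NA.
recode_grade <- function(grade) {
  unname(grade_labels[as.character(grade)])
}

recode_grade(c(1, 2, 3, 99))
```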
Error Handling
#summarize to find errors and missing data
summarise(group_by(dfrPatient, BMILabel), n())
## # A tibble: 3 × 2
## BMILabel `n()`
## <chr> <int>
## 1 NORMAL 23
## 2 OBESE 6
## 3 OVERWEIGHT 71
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 2 × 2
## Gender `n()`
## <chr> <int>
## 1 Female 55
## 2 Male 45
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
## Race `n()`
## <chr> <int>
## 1 Asian 5
## 2 Bi-Racial 1
## 3 Black 8
## 4 Dog 1
## 5 Hispanic 17
## 6 White 68
summarise(group_by(dfrPatient, Died), n())
## # A tibble: 2 × 2
## Died `n()`
## <chr> <int>
## 1 False 46
## 2 True 54
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 10 × 2
## Pet `n()`
## <chr> <int>
## 1 Bird 9
## 2 Cat 24
## 3 CAT 5
## 4 Dog 28
## 5 DOG 4
## 6 Horse 1
## 7 None 23
## 8 NONE 1
## 9 NULL 3
## 10 <NA> 2
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 4 × 2
## Smokes `n()`
## <chr> <int>
## 1 False 72
## 2 No 6
## 3 True 18
## 4 Yes 4
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
## HealthGrade `n()`
## <chr> <int>
## 1 BAD HEALTH 34
## 2 GOOD HEALTH 29
## 3 NORMAL 30
## 4 <NA> 7
summarise(group_by(dfrPatient, State), n())
## # A tibble: 34 × 2
## State `n()`
## <chr> <int>
## 1 Alabama 2
## 2 Arizona 2
## 3 California 13
## 4 Colorado 1
## 5 Connecticut 1
## 6 Florida 8
## 7 Georgia 3
## 8 Georgia,xxx 1
## 9 Hawaii 2
## 10 Illinois 4
## # ... with 24 more rows
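The column-by-column counts above can also be produced in a single pass. The sketch below uses only base R so that it runs without extra packages. The `toy` data frame is made up and merely mimics the kinds of inconsistency seen above (mixed case, Yes/No alongside True/False, missing values).

```r
# Made-up data frame with the same kinds of problems as the real columns.
toy <- data.frame(Pet    = c("Dog", "DOG", "None", NA),
                  Smokes = c("False", "No", "True", "Yes"),
                  stringsAsFactors = FALSE)

# One frequency table per column; useNA = "ifany" also counts missing values.
audit <- lapply(toy, function(col) table(col, useNA = "ifany"))

audit$Pet
audit$Smokes
```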
Error handling in Gender column
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 2 × 2
## Gender `n()`
## <chr> <int>
## 1 Female 55
## 2 Male 45
dfrPatient$Gender <- trimws(toupper(dfrPatient$Gender))
head(dfrPatient$Gender)
## [1] "MALE" "MALE" "MALE" "MALE" "FEMALE" "FEMALE"
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 2 × 2
## Gender `n()`
## <chr> <int>
## 1 FEMALE 55
## 2 MALE 45
Error handling in Race column
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
## Race `n()`
## <chr> <int>
## 1 Asian 5
## 2 Bi-Racial 1
## 3 Black 8
## 4 Dog 1
## 5 Hispanic 17
## 6 White 68
dfrPatient$Race <- trimws(toupper(dfrPatient$Race))
dfrPatient$Race[dfrPatient$Race=="DOG"] <- NA
dfrPatient$Race[dfrPatient$Race=="BI-RACIAL"] <- NA
head(dfrPatient$Race)
## [1] "WHITE" "WHITE" "BLACK" "WHITE" "WHITE" NA
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 5 × 2
## Race `n()`
## <chr> <int>
## 1 ASIAN 5
## 2 BLACK 8
## 3 HISPANIC 17
## 4 WHITE 68
## 5 <NA> 2
Error handling in Died
summarise(group_by(dfrPatient, Died), n())
## # A tibble: 2 × 2
## Died `n()`
## <chr> <int>
## 1 False 46
## 2 True 54
class(dfrPatient$Died)
## [1] "character"
dfrPatient$Died <- as.logical(dfrPatient$Died)
class(dfrPatient$Died)
## [1] "logical"
summarise(group_by(dfrPatient, Died), n())
## # A tibble: 2 × 2
## Died `n()`
## <lgl> <int>
## 1 FALSE 46
## 2 TRUE 54
Error handling in Pet column
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 10 × 2
## Pet `n()`
## <chr> <int>
## 1 Bird 9
## 2 Cat 24
## 3 CAT 5
## 4 Dog 28
## 5 DOG 4
## 6 Horse 1
## 7 None 23
## 8 NONE 1
## 9 NULL 3
## 10 <NA> 2
dfrPatient$Pet <- trimws(toupper(dfrPatient$Pet))
dfrPatient$Pet[dfrPatient$Pet=="NONE"] <- NA
dfrPatient$Pet[dfrPatient$Pet=="NULL"] <- NA
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 5 × 2
## Pet `n()`
## <chr> <int>
## 1 BIRD 9
## 2 CAT 29
## 3 DOG 32
## 4 HORSE 1
## 5 <NA> 29
Error handling in Smokes column
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 4 × 2
## Smokes `n()`
## <chr> <int>
## 1 False 72
## 2 No 6
## 3 True 18
## 4 Yes 4
class(dfrPatient$Smokes)
## [1] "character"
dfrPatient$Smokes <- as.logical(dfrPatient$Smokes)
class(dfrPatient$Smokes)
## [1] "logical"
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 3 × 2
## Smokes `n()`
## <lgl> <int>
## 1 FALSE 72
## 2 TRUE 18
## 3 NA 10
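In the output above, `as.logical()` recognises only the True/False strings, so the 6 "No" and 4 "Yes" entries become the 10 `NA` rows and are later dropped by the complete-cases step. A minimal sketch of one alternative follows, assuming "Yes" and "No" are meant as TRUE and FALSE. `to_logical` is a hypothetical helper, not part of the original script.

```r
# Hypothetical helper: accept True/False and Yes/No in any letter case.
to_logical <- function(x) {
  x <- toupper(trimws(x))
  out <- rep(NA, length(x))              # start with all-missing logicals
  out[x %in% c("TRUE", "YES")] <- TRUE
  out[x %in% c("FALSE", "NO")] <- FALSE
  out                                     # anything unrecognised stays NA
}

to_logical(c("False", "No", "True", "Yes", "maybe"))
```

Applying a mapping like this before the complete-cases step would keep those records, so the row counts reported later in this document would change.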
Error handling in State
summarise(group_by(dfrPatient, State), n())
## # A tibble: 34 × 2
## State `n()`
## <chr> <int>
## 1 Alabama 2
## 2 Arizona 2
## 3 California 13
## 4 Colorado 1
## 5 Connecticut 1
## 6 Florida 8
## 7 Georgia 3
## 8 Georgia,xxx 1
## 9 Hawaii 2
## 10 Illinois 4
## # ... with 24 more rows
dfrPatient$State[dfrPatient$State=="Georgia,xxx"] <- "Georgia"
summarise(group_by(dfrPatient, State), n())
## # A tibble: 33 × 2
## State `n()`
## <chr> <int>
## 1 Alabama 2
## 2 Arizona 2
## 3 California 13
## 4 Colorado 1
## 5 Connecticut 1
## 6 Florida 8
## 7 Georgia 4
## 8 Hawaii 2
## 9 Illinois 4
## 10 Indiana 4
## # ... with 23 more rows
dfrPatient$State <- toupper(dfrPatient$State)
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius WHITE MALE FALSE 182.87 76.57
## 2 AC/AH/017 Rosario WHITE MALE FALSE 179.12 80.43
## 3 AC/AH/020 Julio BLACK MALE FALSE 169.15 75.48
## 4 AC/AH/022 Lupe WHITE MALE FALSE 175.66 94.54
## 5 AC/AH/029 Lavern WHITE FEMALE FALSE 164.47 71.78
## 6 AC/AH/033 Bernie <NA> FEMALE TRUE 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 31-01-1972 GEORGIA DOG NORMAL FALSE 25-11-2015 22.89674
## 2 09-06-1972 MISSOURI DOG NORMAL FALSE 25-11-2015 25.06859
## 3 03-07-1972 PENNSYLVANIA <NA> NORMAL FALSE 25-11-2015 26.38080
## 4 11-08-1972 FLORIDA CAT GOOD HEALTH FALSE 25-11-2015 30.63867
## 5 06-06-1973 IOWA <NA> NORMAL TRUE 25-11-2015 26.53567
## 6 25-06-1973 MARYLAND DOG NORMAL FALSE 25-11-2015 27.90487
## BMILabel
## 1 NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 OBESE
## 5 OVERWEIGHT
## 6 OVERWEIGHT
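The fix above targets the single value "Georgia,xxx". As a sketch of a more general clean-up, assuming that anything after a comma in State is unwanted, the trailing text can be removed with a regular expression. `clean_state` is a hypothetical helper name.

```r
# Hypothetical helper: drop everything from the first comma onwards,
# then trim surrounding whitespace and convert to upper case.
clean_state <- function(x) {
  toupper(trimws(sub(",.*$", "", x)))
}

clean_state(c("Georgia,xxx", "Missouri", " Iowa "))
```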
Names column to uppercase
dfrPatient$Name <- trimws(toupper(dfrPatient$Name))
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 DEMETRIUS WHITE MALE FALSE 182.87 76.57
## 2 AC/AH/017 ROSARIO WHITE MALE FALSE 179.12 80.43
## 3 AC/AH/020 JULIO BLACK MALE FALSE 169.15 75.48
## 4 AC/AH/022 LUPE WHITE MALE FALSE 175.66 94.54
## 5 AC/AH/029 LAVERN WHITE FEMALE FALSE 164.47 71.78
## 6 AC/AH/033 BERNIE <NA> FEMALE TRUE 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 31-01-1972 GEORGIA DOG NORMAL FALSE 25-11-2015 22.89674
## 2 09-06-1972 MISSOURI DOG NORMAL FALSE 25-11-2015 25.06859
## 3 03-07-1972 PENNSYLVANIA <NA> NORMAL FALSE 25-11-2015 26.38080
## 4 11-08-1972 FLORIDA CAT GOOD HEALTH FALSE 25-11-2015 30.63867
## 5 06-06-1973 IOWA <NA> NORMAL TRUE 25-11-2015 26.53567
## 6 25-06-1973 MARYLAND DOG NORMAL FALSE 25-11-2015 27.90487
## BMILabel
## 1 NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 OBESE
## 5 OVERWEIGHT
## 6 OVERWEIGHT
Use complete cases to remove all records with NA in any column
vclComplete <- complete.cases(dfrPatient)
nrow(dfrPatient)
## [1] 100
dfrPatient <- dfrPatient[vclComplete, ]
nrow(dfrPatient)
## [1] 60
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 DEMETRIUS WHITE MALE FALSE 182.87 76.57
## 2 AC/AH/017 ROSARIO WHITE MALE FALSE 179.12 80.43
## 4 AC/AH/022 LUPE WHITE MALE FALSE 175.66 94.54
## 9 AC/AH/045 SHIRLEY WHITE MALE FALSE 181.32 76.90
## 11 AC/AH/049 MARTIN WHITE FEMALE FALSE 160.06 72.37
## 13 AC/AH/052 COURTNEY WHITE MALE TRUE 175.39 92.22
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 31-01-1972 GEORGIA DOG NORMAL FALSE 25-11-2015 22.89674
## 2 09-06-1972 MISSOURI DOG NORMAL FALSE 25-11-2015 25.06859
## 4 11-08-1972 FLORIDA CAT GOOD HEALTH FALSE 25-11-2015 30.63867
## 9 25-12-1971 LOUISIANA DOG GOOD HEALTH FALSE 25-11-2015 23.39025
## 11 28-04-1972 CALIFORNIA HORSE NORMAL TRUE 25-12-2015 28.24834
## 13 16-03-1972 INDIANA BIRD BAD HEALTH FALSE 25-12-2015 29.97888
## BMILabel
## 1 NORMAL
## 2 OVERWEIGHT
## 4 OBESE
## 9 NORMAL
## 11 OVERWEIGHT
## 13 OVERWEIGHT
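Because `complete.cases()` drops a row if any column is `NA`, it can help to count the missing values per column before filtering. The sketch below uses a made-up `toy` data frame to show the idea. It does not reproduce the real per-column counts.

```r
# Made-up data frame: each column has exactly one missing value, in different rows.
toy <- data.frame(Race   = c("WHITE", NA, "BLACK", "ASIAN"),
                  Pet    = c("DOG", "CAT", NA, "BIRD"),
                  Smokes = c(TRUE, FALSE, TRUE, NA),
                  stringsAsFactors = FALSE)

# Number of NA values in each column.
na_per_column <- colSums(is.na(toy))

# Keep only the rows that have no NA in any column.
kept <- toy[complete.cases(toy), ]

na_per_column
nrow(kept)
```

Here three of the four rows are removed even though each column is missing only one value, which is how a few gaps per column can add up to a large loss of records.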
Data Reporting
Display top 10 records based on BMI-Value.
head(arrange(dfrPatient, desc(BMIValue)), 10)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/SG/009 SAMMY WHITE MALE FALSE 166.84 88.25
## 2 AC/SG/064 JON WHITE MALE FALSE 169.16 90.08
## 3 AC/AH/076 ALBERT WHITE MALE FALSE 176.22 97.67
## 4 AC/AH/022 LUPE WHITE MALE FALSE 175.66 94.54
## 5 AC/AH/248 ANDREA WHITE MALE FALSE 178.64 97.05
## 6 AC/SG/067 THOMAS WHITE MALE FALSE 167.51 84.15
## 7 AC/AH/052 COURTNEY WHITE MALE TRUE 175.39 92.22
## 8 AC/AH/127 JAME WHITE MALE FALSE 167.75 82.06
## 9 AC/SG/055 EVAN WHITE MALE FALSE 166.75 79.06
## 10 AC/SG/181 TERRY HISPANIC MALE FALSE 177.14 88.70
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 04-03-1972 VERMONT DOG GOOD HEALTH FALSE 25-06-2016 31.70402
## 2 04-10-1972 ILLINOIS CAT NORMAL TRUE 25-07-2016 31.47988
## 3 08-04-1973 LOUISIANA CAT NORMAL FALSE 25-12-2015 31.45218
## 4 11-08-1972 FLORIDA CAT GOOD HEALTH FALSE 25-11-2015 30.63867
## 5 12-01-1973 INDIANA CAT GOOD HEALTH TRUE 25-05-2016 30.41152
## 6 19-07-1972 PENNSYLVANIA BIRD NORMAL TRUE 25-07-2016 29.98974
## 7 16-03-1972 INDIANA BIRD BAD HEALTH FALSE 25-12-2015 29.97888
## 8 29-10-1972 TEXAS DOG GOOD HEALTH TRUE 25-01-2016 29.16127
## 9 24-02-1972 ILLINOIS BIRD BAD HEALTH TRUE 25-07-2016 28.43316
## 10 24-11-1971 INDIANA CAT BAD HEALTH TRUE 25-09-2016 28.26769
## BMILabel
## 1 OBESE
## 2 OBESE
## 3 OBESE
## 4 OBESE
## 5 OBESE
## 6 OVERWEIGHT
## 7 OVERWEIGHT
## 8 OVERWEIGHT
## 9 OVERWEIGHT
## 10 OVERWEIGHT
Display bottom 10 records based on BMI-Value.
head(arrange(dfrPatient, BMIValue), 10)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/SG/193 RONNIE WHITE MALE TRUE 185.43 73.63
## 2 AC/SG/099 LESLIE ASIAN MALE FALSE 172.72 67.62
## 3 AC/AH/001 DEMETRIUS WHITE MALE FALSE 182.87 76.57
## 4 AC/AH/086 KYLE BLACK MALE TRUE 180.11 75.72
## 5 AC/AH/045 SHIRLEY WHITE MALE FALSE 181.32 76.90
## 6 AC/AH/114 KRIS HISPANIC MALE FALSE 177.75 74.84
## 7 AC/AH/077 TOMMY BLACK MALE FALSE 174.09 72.20
## 8 AC/AH/150 BRETT WHITE MALE TRUE 181.56 79.54
## 9 AC/AH/057 VERNON WHITE FEMALE TRUE 163.79 65.76
## 10 AC/AH/207 BOBBIE WHITE FEMALE FALSE 163.01 65.19
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 05-06-1973 IOWA DOG BAD HEALTH FALSE 25-09-2016 21.41385
## 2 04-02-1972 OHIO CAT GOOD HEALTH FALSE 25-07-2016 22.66678
## 3 31-01-1972 GEORGIA DOG NORMAL FALSE 25-11-2015 22.89674
## 4 12-05-1973 GEORGIA CAT BAD HEALTH FALSE 25-12-2015 23.34183
## 5 25-12-1971 LOUISIANA DOG GOOD HEALTH FALSE 25-11-2015 23.39025
## 6 19-11-1972 PENNSYLVANIA BIRD BAD HEALTH FALSE 25-01-2016 23.68725
## 7 01-02-1973 WASHINGTON CAT BAD HEALTH FALSE 25-12-2015 23.82262
## 8 03-05-1972 KENTUCKY DOG GOOD HEALTH TRUE 25-02-2016 24.12933
## 9 06-01-1972 ILLINOIS CAT BAD HEALTH FALSE 25-12-2015 24.51247
## 10 17-05-1973 FLORIDA DOG NORMAL FALSE 25-03-2016 24.53310
## BMILabel
## 1 NORMAL
## 2 NORMAL
## 3 NORMAL
## 4 NORMAL
## 5 NORMAL
## 6 NORMAL
## 7 NORMAL
## 8 NORMAL
## 9 NORMAL
## 10 NORMAL
Display Gender > Race - frequency / counts
summarise(group_by(dfrPatient, Gender, Race), n())
## Source: local data frame [8 x 3]
## Groups: Gender [?]
##
## Gender Race `n()`
## <chr> <chr> <int>
## 1 FEMALE ASIAN 2
## 2 FEMALE BLACK 1
## 3 FEMALE HISPANIC 4
## 4 FEMALE WHITE 27
## 5 MALE ASIAN 2
## 6 MALE BLACK 2
## 7 MALE HISPANIC 5
## 8 MALE WHITE 17
Race > Gender - max, min and average values for BMI-Values
summarise(group_by(dfrPatient, Race, Gender), min(BMIValue), mean(BMIValue), max(BMIValue))
## Source: local data frame [8 x 5]
## Groups: Race [?]
##
## Race Gender `min(BMIValue)` `mean(BMIValue)` `max(BMIValue)`
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 ASIAN FEMALE 25.57631 26.88531 28.19431
## 2 ASIAN MALE 22.66678 24.95782 27.24885
## 3 BLACK FEMALE 26.71407 26.71407 26.71407
## 4 BLACK MALE 23.34183 23.58223 23.82262
## 5 HISPANIC FEMALE 25.03916 26.29513 26.89942
## 6 HISPANIC MALE 23.68725 26.39844 28.26769
## 7 WHITE FEMALE 24.51247 26.60612 28.24834
## 8 WHITE MALE 21.41385 27.53445 31.70402
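The summary columns above carry generated names such as `min(BMIValue)`. As a sketch of how the same statistics can be given explicit names, the example below uses base R's `aggregate()` on a three-row `toy` data frame. The BMI values are the ASIAN MALE minimum and maximum and the BLACK FEMALE value from the table above. The column names `min`, `mean` and `max` are illustrative choices.

```r
# Three values taken from the summary table above.
toy <- data.frame(Race     = c("ASIAN", "ASIAN", "BLACK"),
                  Gender   = c("MALE", "MALE", "FEMALE"),
                  BMIValue = c(22.66678, 27.24885, 26.71407),
                  stringsAsFactors = FALSE)

# One row per Race/Gender group; BMIValue becomes a matrix with named columns.
stats <- aggregate(BMIValue ~ Race + Gender, data = toy,
                   FUN = function(v) c(min = min(v), mean = mean(v), max = max(v)))

# Mean BMI for the ASIAN MALE group (two patients).
stats$BMIValue[stats$Race == "ASIAN", "mean"]
```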
Display all deceased patients
filter(dfrPatient, Died==TRUE)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/049 MARTIN WHITE FEMALE FALSE 160.06 72.37
## 2 AC/AH/127 JAME WHITE MALE FALSE 167.75 82.06
## 3 AC/AH/133 CLYDE HISPANIC MALE FALSE 181.15 83.93
## 4 AC/AH/150 BRETT WHITE MALE TRUE 181.56 79.54
## 5 AC/AH/154 TONY WHITE FEMALE FALSE 160.03 64.30
## 6 AC/AH/156 GEORGE WHITE MALE FALSE 165.62 76.72
## 7 AC/AH/160 RORY ASIAN FEMALE FALSE 159.67 71.88
## 8 AC/AH/176 JERRY ASIAN MALE FALSE 175.21 83.65
## 9 AC/AH/180 DREW WHITE FEMALE FALSE 160.80 64.77
## 10 AC/AH/186 CHRISTOPHER WHITE FEMALE FALSE 157.95 67.41
## 11 AC/AH/211 SON WHITE FEMALE FALSE 157.16 69.64
## 12 AC/AH/219 JAY WHITE FEMALE FALSE 163.47 72.89
## 13 AC/AH/233 MARION WHITE FEMALE FALSE 163.97 66.71
## 14 AC/AH/248 ANDREA WHITE MALE FALSE 178.64 97.05
## 15 AC/AH/249 JESUS HISPANIC FEMALE TRUE 159.78 68.31
## 16 AC/SG/010 THEO ASIAN FEMALE FALSE 159.32 64.92
## 17 AC/SG/016 JIMMIE BLACK FEMALE FALSE 161.84 69.97
## 18 AC/SG/046 CARL HISPANIC MALE FALSE 171.41 81.70
## 19 AC/SG/055 EVAN WHITE MALE FALSE 166.75 79.06
## 20 AC/SG/064 JON WHITE MALE FALSE 169.16 90.08
## 21 AC/SG/065 SHAYNE WHITE FEMALE FALSE 157.01 66.56
## 22 AC/SG/067 THOMAS WHITE MALE FALSE 167.51 84.15
## 23 AC/SG/068 VALENTINE HISPANIC FEMALE FALSE 160.47 68.20
## 24 AC/SG/084 BRIAN HISPANIC MALE FALSE 174.25 80.93
## 25 AC/SG/101 JASON WHITE FEMALE FALSE 159.23 69.96
## 26 AC/SG/123 DARNELL WHITE FEMALE TRUE 162.32 72.72
## 27 AC/SG/134 DARYL WHITE FEMALE TRUE 162.59 69.76
## 28 AC/SG/155 RAYMOND WHITE FEMALE FALSE 158.35 69.72
## 29 AC/SG/165 ELMER WHITE FEMALE FALSE 162.18 67.81
## 30 AC/SG/179 LOGAN WHITE MALE FALSE 183.10 82.47
## 31 AC/SG/181 TERRY HISPANIC MALE FALSE 177.14 88.70
## 32 AC/SG/197 STACY WHITE FEMALE FALSE 159.44 66.21
## 33 AC/SG/234 LUIS HISPANIC FEMALE FALSE 164.88 68.07
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 28-04-1972 CALIFORNIA HORSE NORMAL TRUE 25-12-2015 28.24834
## 2 29-10-1972 TEXAS DOG GOOD HEALTH TRUE 25-01-2016 29.16127
## 3 13-10-1973 WASHINGTON CAT BAD HEALTH TRUE 25-02-2016 25.57647
## 4 03-05-1972 KENTUCKY DOG GOOD HEALTH TRUE 25-02-2016 24.12933
## 5 30-08-1973 CALIFORNIA DOG GOOD HEALTH TRUE 25-02-2016 25.10777
## 6 09-07-1972 CALIFORNIA DOG GOOD HEALTH TRUE 25-02-2016 27.96939
## 7 22-09-1973 FLORIDA CAT NORMAL TRUE 25-02-2016 28.19431
## 8 01-05-1973 VIRGINIA DOG BAD HEALTH TRUE 25-03-2016 27.24885
## 9 18-02-1973 OREGON CAT GOOD HEALTH TRUE 25-03-2016 25.04966
## 10 06-05-1972 NEW JERSEY DOG BAD HEALTH TRUE 25-03-2016 27.01998
## 11 14-07-1973 CALIFORNIA CAT NORMAL TRUE 25-04-2016 28.19517
## 12 07-04-1972 NORTH CAROLINA BIRD GOOD HEALTH TRUE 25-04-2016 27.27670
## 13 23-12-1971 OHIO CAT BAD HEALTH TRUE 25-04-2016 24.81202
## 14 12-01-1973 INDIANA CAT GOOD HEALTH TRUE 25-05-2016 30.41152
## 15 23-04-1972 ALABAMA CAT NORMAL TRUE 25-05-2016 26.75713
## 16 29-01-1973 NEW YORK CAT NORMAL TRUE 25-06-2016 25.57631
## 17 03-04-1972 ARIZONA CAT BAD HEALTH TRUE 25-06-2016 26.71407
## 18 05-08-1973 MISSISSIPPI BIRD NORMAL TRUE 25-06-2016 27.80672
## 19 24-02-1972 ILLINOIS BIRD BAD HEALTH TRUE 25-07-2016 28.43316
## 20 04-10-1972 ILLINOIS CAT NORMAL TRUE 25-07-2016 31.47988
## 21 05-04-1972 CALIFORNIA DOG BAD HEALTH TRUE 25-07-2016 26.99968
## 22 19-07-1972 PENNSYLVANIA BIRD NORMAL TRUE 25-07-2016 29.98974
## 23 15-04-1972 TENNESSEE CAT BAD HEALTH TRUE 25-07-2016 26.48480
## 24 06-03-1972 VIRGINIA DOG NORMAL TRUE 25-07-2016 26.65410
## 25 28-09-1973 MICHIGAN DOG NORMAL TRUE 25-07-2016 27.59307
## 26 03-09-1972 NORTH CAROLINA BIRD GOOD HEALTH TRUE 25-08-2016 27.60005
## 27 28-05-1972 TEXAS CAT NORMAL TRUE 25-08-2016 26.38875
## 28 02-06-1972 CALIFORNIA CAT BAD HEALTH TRUE 25-08-2016 27.80489
## 29 25-03-1972 WASHINGTON BIRD GOOD HEALTH TRUE 25-08-2016 25.78096
## 30 24-10-1972 OHIO DOG BAD HEALTH TRUE 25-09-2016 24.59910
## 31 24-11-1971 INDIANA CAT BAD HEALTH TRUE 25-09-2016 28.26769
## 32 08-11-1972 NEW YORK CAT GOOD HEALTH TRUE 25-10-2016 26.04528
## 33 10-11-1971 PENNSYLVANIA CAT BAD HEALTH TRUE 25-10-2016 25.03916
## BMILabel
## 1 OVERWEIGHT
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 NORMAL
## 5 OVERWEIGHT
## 6 OVERWEIGHT
## 7 OVERWEIGHT
## 8 OVERWEIGHT
## 9 OVERWEIGHT
## 10 OVERWEIGHT
## 11 OVERWEIGHT
## 12 OVERWEIGHT
## 13 NORMAL
## 14 OBESE
## 15 OVERWEIGHT
## 16 OVERWEIGHT
## 17 OVERWEIGHT
## 18 OVERWEIGHT
## 19 OVERWEIGHT
## 20 OBESE
## 21 OVERWEIGHT
## 22 OVERWEIGHT
## 23 OVERWEIGHT
## 24 OVERWEIGHT
## 25 OVERWEIGHT
## 26 OVERWEIGHT
## 27 OVERWEIGHT
## 28 OVERWEIGHT
## 29 OVERWEIGHT
## 30 NORMAL
## 31 OVERWEIGHT
## 32 OVERWEIGHT
## 33 OVERWEIGHT
nrow(filter(dfrPatient, Died==TRUE))
## [1] 33
Display all Hispanic females
filter(dfrPatient, Race=="HISPANIC" & Gender=="FEMALE")
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/249 JESUS HISPANIC FEMALE TRUE 159.78 68.31
## 2 AC/SG/068 VALENTINE HISPANIC FEMALE FALSE 160.47 68.20
## 3 AC/SG/122 MICHAL HISPANIC FEMALE FALSE 160.09 68.94
## 4 AC/SG/234 LUIS HISPANIC FEMALE FALSE 164.88 68.07
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 23-04-1972 ALABAMA CAT NORMAL TRUE 25-05-2016 26.75713
## 2 15-04-1972 TENNESSEE CAT BAD HEALTH TRUE 25-07-2016 26.48480
## 3 16-12-1971 SOUTH CAROLINA DOG GOOD HEALTH FALSE 25-08-2016 26.89942
## 4 10-11-1971 PENNSYLVANIA CAT BAD HEALTH TRUE 25-10-2016 25.03916
## BMILabel
## 1 OVERWEIGHT
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 OVERWEIGHT
nrow(filter(dfrPatient, Race=="HISPANIC" & Gender=="FEMALE"))
## [1] 4
Display 10 sample records from the dataset using set.seed(707)
set.seed(707)
sample_n(dfrPatient, 10)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 13 AC/AH/052 COURTNEY WHITE MALE TRUE 175.39 92.22
## 48 AC/AH/219 JAY WHITE FEMALE FALSE 163.47 72.89
## 30 AC/AH/150 BRETT WHITE MALE TRUE 181.56 79.54
## 55 AC/AH/248 ANDREA WHITE MALE FALSE 178.64 97.05
## 73 AC/SG/084 BRIAN HISPANIC MALE FALSE 174.25 80.93
## 67 AC/SG/064 JON WHITE MALE FALSE 169.16 90.08
## 80 AC/SG/122 MICHAL HISPANIC FEMALE FALSE 160.09 68.94
## 9 AC/AH/045 SHIRLEY WHITE MALE FALSE 181.32 76.90
## 20 AC/AH/086 KYLE BLACK MALE TRUE 180.11 75.72
## 57 AC/SG/002 JAN WHITE FEMALE TRUE 161.57 67.92
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 13 16-03-1972 INDIANA BIRD BAD HEALTH FALSE 25-12-2015 29.97888
## 48 07-04-1972 NORTH CAROLINA BIRD GOOD HEALTH TRUE 25-04-2016 27.27670
## 30 03-05-1972 KENTUCKY DOG GOOD HEALTH TRUE 25-02-2016 24.12933
## 55 12-01-1973 INDIANA CAT GOOD HEALTH TRUE 25-05-2016 30.41152
## 73 06-03-1972 VIRGINIA DOG NORMAL TRUE 25-07-2016 26.65410
## 67 04-10-1972 ILLINOIS CAT NORMAL TRUE 25-07-2016 31.47988
## 80 16-12-1971 SOUTH CAROLINA DOG GOOD HEALTH FALSE 25-08-2016 26.89942
## 9 25-12-1971 LOUISIANA DOG GOOD HEALTH FALSE 25-11-2015 23.39025
## 20 12-05-1973 GEORGIA CAT BAD HEALTH FALSE 25-12-2015 23.34183
## 57 03-07-1973 ARIZONA DOG BAD HEALTH FALSE 25-05-2016 26.01814
## BMILabel
## 13 OVERWEIGHT
## 48 OVERWEIGHT
## 30 NORMAL
## 55 OBESE
## 73 OVERWEIGHT
## 67 OBESE
## 80 OVERWEIGHT
## 9 NORMAL
## 20 NORMAL
## 57 OVERWEIGHT
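Setting the seed immediately before sampling is what makes the draw repeatable. The sketch below shows this with base R's `sample()`. The range 1:60 is only an illustration matching the 60 cleaned records.

```r
# Draw 10 row positions twice, resetting the seed before each draw.
set.seed(707)
first_draw <- sample(1:60, 10)

set.seed(707)
second_draw <- sample(1:60, 10)

# The two draws are the same because the seed was reset.
identical(first_draw, second_draw)
```

The default sampling algorithm changed in R 3.6.0, so the same seed may select different rows on newer R versions than the ones shown above.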
Note
The patient-data file gives an overall background of the patients in a particular hospital ward.
Initially the dataset contained 100 records, of which 45 were males and 55 were females.
After identifying errors and missing data, the 40 records that contained NA values in any column were removed.
The cleaned dataset therefore contained 60 records.
The majority of patients were White, followed by Hispanic.
More than half of the patients had died (33 of the 60 cleaned records).
A majority of females were in good health compared with males; the rest were in normal or bad health.
Obesity was seen only in males: all 5 obese patients in the cleaned data were male.
Objectives
The objectives of analysing the data, studying the dplyr package, working with R Markdown, and publishing an HTML document on RPubs were successfully met.