Analysis Of Patient Data

Objectives

The objectives of this exercise are to:
- Analyse the patient dataset
- Understand and learn how to use the dplyr package in R
- Understand the concept of R Markdown
- Learn how to create dynamic documents in R by using R Markdown

Problem Definition

1. Read the patient-data.csv file.
2. Perform data preparation as per the instructions provided.
3. Review the data for errors and missing values. Provide a solution to rectify them.
4. Perform data reporting as per the instructions provided.
5. Create an HTML R Markdown file.
6. Publish the file on RPubs.

Code & Output

knitr Global Options

# for development
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=TRUE, warning=TRUE, message=TRUE, cache=FALSE, tidy=FALSE, fig.path='figures/')
# for production
#knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=FALSE, warning=FALSE, message=FALSE, cache=FALSE, tidy=FALSE, fig.path='figures/')

Load Libraries

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Read Data

# Read the patient data CSV file, count the number of rows present, and display the first 6 rows.
setwd("D:/R-BA/R-Scripts")
dfrPatient <- read.csv("./data/patient-data.csv", header=TRUE, stringsAsFactors=FALSE)
intRowCount <- nrow(dfrPatient)
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  False      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  False      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  False      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  False      175.66       94.54
## 5 AC/AH/029    Lavern White Female  False      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   True      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate
## 1 31-01-1972  Georgia,xxx  Dog           2 False 25-11-2015
## 2 09-06-1972     Missouri  Dog           2 False 25-11-2015
## 3 03-07-1972 Pennsylvania None           2 False 25-11-2015
## 4 11-08-1972      Florida  Cat           1 False 25-11-2015
## 5 06-06-1973         Iowa NULL           2  True 25-11-2015
## 6 25-06-1973     Maryland  Dog           2 False 25-11-2015

Total Rows Of Patient File: 100
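
As a hedged alternative to relying on setwd(), the file path can be passed to read.csv() directly, and placeholder strings can be declared as missing while reading. The sketch below uses a small inline CSV with made-up rows (not the real patient-data.csv) so that it runs on its own. The na.strings values are an assumption based on the placeholders visible in the Pet column.

```r
# Illustrative inline data standing in for patient-data.csv (rows are made up).
csvText <- "ID,Name,Pet,HealthGrade
P1,Alex,Dog,2
P2,Sam,NULL,1
P3,Kim,,3"

# na.strings converts the listed placeholders to NA at read time.
dfrDemo <- read.csv(text = csvText, header = TRUE, stringsAsFactors = FALSE,
                    na.strings = c("", "NA", "NULL"))

intDemoRows <- nrow(dfrDemo)
print(intDemoRows)
print(is.na(dfrDemo$Pet))
```

Declaring placeholders up front means later steps only have to deal with real NA values rather than several spellings of "missing".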

Data Preparation

Add column BMI-Value

# Insert a new column "BMIValue" using BMI = weight in kg / (height in m)^2
dfrPatient <- mutate(dfrPatient, BMIValue=(WeightInKgs/(HeightInCms/100)^2))
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  False      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  False      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  False      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  False      175.66       94.54
## 5 AC/AH/029    Lavern White Female  False      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   True      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1 31-01-1972  Georgia,xxx  Dog           2 False 25-11-2015 22.89674
## 2 09-06-1972     Missouri  Dog           2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None           2 False 25-11-2015 26.38080
## 4 11-08-1972      Florida  Cat           1 False 25-11-2015 30.63867
## 5 06-06-1973         Iowa NULL           2  True 25-11-2015 26.53567
## 6 25-06-1973     Maryland  Dog           2 False 25-11-2015 27.90487
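
The formula can be checked by hand for a single patient. The sketch below takes the height and weight of the first record shown above and repeats the calculation step by step. The variable names are illustrative only.

```r
# Worked check of the BMI formula for the first patient shown above:
# weight in kg divided by the square of height in metres.
weightKg <- 76.57
heightCm <- 182.87
heightM  <- heightCm / 100      # convert centimetres to metres
bmi      <- weightKg / heightM^2
print(round(bmi, 5))
```

The result should agree with the BMIValue printed for that record (22.89674).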

Add column BMI-Label

# Insert a new column "BMILabel" indicating the category to which the patient's BMIValue belongs.
# Lower bounds are inclusive (>=) so a value exactly on a cut-off is not left as NA.
dfrPatient <- mutate(dfrPatient, BMILabel=NA)
dfrPatient$BMILabel <- ifelse(dfrPatient$BMIValue < 18.50,"UNDERWEIGHT",
                         ifelse(dfrPatient$BMIValue >= 18.50 & dfrPatient$BMIValue < 25.00, "NORMAL",
                         ifelse(dfrPatient$BMIValue >= 25.00 & dfrPatient$BMIValue < 30.00, "OVERWEIGHT",
                         ifelse(dfrPatient$BMIValue >= 30.00,"OBESE", NA))))
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  False      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  False      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  False      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  False      175.66       94.54
## 5 AC/AH/029    Lavern White Female  False      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   True      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1 31-01-1972  Georgia,xxx  Dog           2 False 25-11-2015 22.89674
## 2 09-06-1972     Missouri  Dog           2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None           2 False 25-11-2015 26.38080
## 4 11-08-1972      Florida  Cat           1 False 25-11-2015 30.63867
## 5 06-06-1973         Iowa NULL           2  True 25-11-2015 26.53567
## 6 25-06-1973     Maryland  Dog           2 False 25-11-2015 27.90487
##     BMILabel
## 1     NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4      OBESE
## 5 OVERWEIGHT
## 6 OVERWEIGHT
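
Nested ifelse() conditions need every cut-off value to belong to exactly one category. One alternative, shown here only as a sketch, is base R's cut(), which assigns each value to an interval in a single call. The choice of lower-bound-inclusive intervals (right = FALSE) is an assumption, and the input values are illustrative.

```r
# Illustrative BMI values, including two that sit exactly on a cut-off.
bmiValues <- c(17.2, 18.5, 22.89674, 25, 30.63867)

# right = FALSE makes each interval closed on the left: [18.5, 25), [25, 30), ...
bmiLabels <- cut(bmiValues,
                 breaks = c(-Inf, 18.5, 25, 30, Inf),
                 labels = c("UNDERWEIGHT", "NORMAL", "OVERWEIGHT", "OBESE"),
                 right  = FALSE)
bmiLabels <- as.character(bmiLabels)   # plain character labels instead of a factor
print(bmiLabels)
```

With this convention a BMI of exactly 25 is labelled OVERWEIGHT rather than being left unclassified.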

Convert HealthGrade column values

# Convert HealthGrade codes (1, 2, 3) to Good Health, Normal and Bad Health; any other code becomes NA
dfrPatient$HealthGrade <- ifelse(dfrPatient$HealthGrade == 1, "GOOD HEALTH", 
                               ifelse(dfrPatient$HealthGrade == 2, "NORMAL", 
                                      ifelse(dfrPatient$HealthGrade == 3, "BAD HEALTH",NA)))
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  False      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  False      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  False      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  False      175.66       94.54
## 5 AC/AH/029    Lavern White Female  False      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   True      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1 31-01-1972  Georgia,xxx  Dog      NORMAL False 25-11-2015 22.89674
## 2 09-06-1972     Missouri  Dog      NORMAL False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None      NORMAL False 25-11-2015 26.38080
## 4 11-08-1972      Florida  Cat GOOD HEALTH False 25-11-2015 30.63867
## 5 06-06-1973         Iowa NULL      NORMAL  True 25-11-2015 26.53567
## 6 25-06-1973     Maryland  Dog      NORMAL False 25-11-2015 27.90487
##     BMILabel
## 1     NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4      OBESE
## 5 OVERWEIGHT
## 6 OVERWEIGHT
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
##   HealthGrade `n()`
##         <chr> <int>
## 1  BAD HEALTH    34
## 2 GOOD HEALTH    29
## 3      NORMAL    30
## 4        <NA>     7
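
The same recoding can be written as a lookup table, which keeps the code-to-label mapping in one place. This is a minimal sketch with illustrative input. The code 99 is a hypothetical unrecognised value used to show that unmapped codes become NA.

```r
# Illustrative grade codes; 99 is a hypothetical code with no label.
gradeCodes  <- c(2, 1, 3, 99)

# Named vector acting as a lookup table from code to label.
gradeLookup <- c("1" = "GOOD HEALTH", "2" = "NORMAL", "3" = "BAD HEALTH")

# Index the lookup by the code as text; codes without an entry yield NA.
gradeLabels <- unname(gradeLookup[as.character(gradeCodes)])
print(gradeLabels)
```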

Error Handling

# Summarise each categorical column to find errors and missing data
summarise(group_by(dfrPatient, BMILabel), n())  
## # A tibble: 3 × 2
##     BMILabel `n()`
##        <chr> <int>
## 1     NORMAL    23
## 2      OBESE     6
## 3 OVERWEIGHT    71
summarise(group_by(dfrPatient, Gender), n())  
## # A tibble: 2 × 2
##   Gender `n()`
##    <chr> <int>
## 1 Female    55
## 2   Male    45
summarise(group_by(dfrPatient, Race), n())  
## # A tibble: 6 × 2
##        Race `n()`
##       <chr> <int>
## 1     Asian     5
## 2 Bi-Racial     1
## 3     Black     8
## 4       Dog     1
## 5  Hispanic    17
## 6     White    68
summarise(group_by(dfrPatient, Died), n())  
## # A tibble: 2 × 2
##    Died `n()`
##   <chr> <int>
## 1 False    46
## 2  True    54
summarise(group_by(dfrPatient, Pet), n())  
## # A tibble: 10 × 2
##      Pet `n()`
##    <chr> <int>
## 1   Bird     9
## 2    Cat    24
## 3    CAT     5
## 4    Dog    28
## 5    DOG     4
## 6  Horse     1
## 7   None    23
## 8   NONE     1
## 9   NULL     3
## 10  <NA>     2
summarise(group_by(dfrPatient, Smokes), n())  
## # A tibble: 4 × 2
##   Smokes `n()`
##    <chr> <int>
## 1  False    72
## 2     No     6
## 3   True    18
## 4    Yes     4
summarise(group_by(dfrPatient, HealthGrade), n())  
## # A tibble: 4 × 2
##   HealthGrade `n()`
##         <chr> <int>
## 1  BAD HEALTH    34
## 2 GOOD HEALTH    29
## 3      NORMAL    30
## 4        <NA>     7
summarise(group_by(dfrPatient, State), n())  
## # A tibble: 34 × 2
##          State `n()`
##          <chr> <int>
## 1      Alabama     2
## 2      Arizona     2
## 3   California    13
## 4     Colorado     1
## 5  Connecticut     1
## 6      Florida     8
## 7      Georgia     3
## 8  Georgia,xxx     1
## 9       Hawaii     2
## 10    Illinois     4
## # ... with 24 more rows
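
Repeating one summary call per column can also be expressed as a loop over the columns. The sketch below uses base R's table() on a small made-up data frame. The column names follow the dataset, but the values are illustrative.

```r
# Small illustrative data frame with inconsistent spellings and a missing value.
dfrDemo <- data.frame(
  Pet    = c("Dog", "DOG", "None", NA, "Cat"),
  Smokes = c("False", "No", "True", "Yes", "False"),
  stringsAsFactors = FALSE
)

# One frequency table per column; useNA = "ifany" also counts missing values.
lstCounts <- lapply(dfrDemo, function(col) table(col, useNA = "ifany"))
print(lstCounts)
```

Listing every distinct value with its count makes inconsistent capitalisation and placeholder strings easy to spot.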

Error handling in Gender column

summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 2 × 2
##   Gender `n()`
##    <chr> <int>
## 1 Female    55
## 2   Male    45
dfrPatient$Gender <- trimws(toupper(dfrPatient$Gender))
head(dfrPatient$Gender)
## [1] "MALE"   "MALE"   "MALE"   "MALE"   "FEMALE" "FEMALE"
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 2 × 2
##   Gender `n()`
##    <chr> <int>
## 1 FEMALE    55
## 2   MALE    45

Error handling in Race column

summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
##        Race `n()`
##       <chr> <int>
## 1     Asian     5
## 2 Bi-Racial     1
## 3     Black     8
## 4       Dog     1
## 5  Hispanic    17
## 6     White    68
dfrPatient$Race <- trimws(toupper(dfrPatient$Race))
dfrPatient$Race[dfrPatient$Race=="DOG"] <- NA
dfrPatient$Race[dfrPatient$Race=="BI-RACIAL"] <- NA
head(dfrPatient$Race)
## [1] "WHITE" "WHITE" "BLACK" "WHITE" "WHITE" NA
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 5 × 2
##       Race `n()`
##      <chr> <int>
## 1    ASIAN     5
## 2    BLACK     8
## 3 HISPANIC    17
## 4    WHITE    68
## 5     <NA>     2

Error handling in Died column

summarise(group_by(dfrPatient, Died), n())
## # A tibble: 2 × 2
##    Died `n()`
##   <chr> <int>
## 1 False    46
## 2  True    54
class(dfrPatient$Died)
## [1] "character"
dfrPatient$Died <- as.logical(dfrPatient$Died)
class(dfrPatient$Died)
## [1] "logical"
summarise(group_by(dfrPatient, Died), n())
## # A tibble: 2 × 2
##    Died `n()`
##   <lgl> <int>
## 1 FALSE    46
## 2  TRUE    54

Error handling in Pet column

summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 10 × 2
##      Pet `n()`
##    <chr> <int>
## 1   Bird     9
## 2    Cat    24
## 3    CAT     5
## 4    Dog    28
## 5    DOG     4
## 6  Horse     1
## 7   None    23
## 8   NONE     1
## 9   NULL     3
## 10  <NA>     2
dfrPatient$Pet <- trimws(toupper(dfrPatient$Pet))
dfrPatient$Pet[dfrPatient$Pet=="NONE"] <- NA
dfrPatient$Pet[dfrPatient$Pet=="NULL"] <- NA
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 5 × 2
##     Pet `n()`
##   <chr> <int>
## 1  BIRD     9
## 2   CAT    29
## 3   DOG    32
## 4 HORSE     1
## 5  <NA>    29
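
Recoding "NONE" as NA treats patients without a pet the same as patients whose pet is unknown, so any later step that drops rows with NA removes both groups. If that is not intended, one hedged alternative is to keep "NONE" as a valid category and mark only the "NULL" placeholder as missing. The input values below are illustrative.

```r
# Illustrative raw values with mixed case, stray spaces and placeholders.
petRaw   <- c("Dog", "DOG", " cat", "None", "NONE", "NULL", NA)

petClean <- trimws(toupper(petRaw))
petClean[petClean == "NULL"] <- NA   # only the placeholder is treated as missing
print(table(petClean, useNA = "ifany"))
```

Whether "no pet" counts as missing is a modelling decision, so this is an option to consider rather than a correction.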

Error handling in Smokes column

summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 4 × 2
##   Smokes `n()`
##    <chr> <int>
## 1  False    72
## 2     No     6
## 3   True    18
## 4    Yes     4
class(dfrPatient$Smokes)
## [1] "character"
dfrPatient$Smokes <- as.logical(dfrPatient$Smokes)
class(dfrPatient$Smokes)
## [1] "logical"
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 3 × 2
##   Smokes `n()`
##    <lgl> <int>
## 1  FALSE    72
## 2   TRUE    18
## 3     NA    10
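
as.logical() recognises spellings such as "True" and "False" but not "Yes" and "No", which is why those 10 records become NA above. If "Yes" and "No" are meant as TRUE and FALSE, they can be mapped explicitly before converting. This is a sketch under that assumption, with illustrative input.

```r
# Illustrative raw values; "maybe" stands for any unrecognised entry.
smokesRaw <- c("False", "No", "True", "Yes", " yes ", "maybe")

# Normalise case and whitespace, then map both spellings to logical values.
smokesKey <- trimws(toupper(smokesRaw))
smokes <- ifelse(smokesKey %in% c("TRUE", "YES"), TRUE,
          ifelse(smokesKey %in% c("FALSE", "NO"), FALSE, NA))
print(smokes)
```

Only values that match neither spelling end up as NA, so fewer records are lost to the later complete-cases filter.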

Error handling in State column

summarise(group_by(dfrPatient, State), n())
## # A tibble: 34 × 2
##          State `n()`
##          <chr> <int>
## 1      Alabama     2
## 2      Arizona     2
## 3   California    13
## 4     Colorado     1
## 5  Connecticut     1
## 6      Florida     8
## 7      Georgia     3
## 8  Georgia,xxx     1
## 9       Hawaii     2
## 10    Illinois     4
## # ... with 24 more rows
dfrPatient$State[dfrPatient$State=="Georgia,xxx"] <- "Georgia"
summarise(group_by(dfrPatient, State), n())
## # A tibble: 33 × 2
##          State `n()`
##          <chr> <int>
## 1      Alabama     2
## 2      Arizona     2
## 3   California    13
## 4     Colorado     1
## 5  Connecticut     1
## 6      Florida     8
## 7      Georgia     4
## 8       Hawaii     2
## 9     Illinois     4
## 10     Indiana     4
## # ... with 23 more rows
dfrPatient$State <- toupper(dfrPatient$State)
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius WHITE   MALE  FALSE      182.87       76.57
## 2 AC/AH/017   Rosario WHITE   MALE  FALSE      179.12       80.43
## 3 AC/AH/020     Julio BLACK   MALE  FALSE      169.15       75.48
## 4 AC/AH/022      Lupe WHITE   MALE  FALSE      175.66       94.54
## 5 AC/AH/029    Lavern WHITE FEMALE  FALSE      164.47       71.78
## 6 AC/AH/033    Bernie  <NA> FEMALE   TRUE      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1 31-01-1972      GEORGIA  DOG      NORMAL FALSE 25-11-2015 22.89674
## 2 09-06-1972     MISSOURI  DOG      NORMAL FALSE 25-11-2015 25.06859
## 3 03-07-1972 PENNSYLVANIA <NA>      NORMAL FALSE 25-11-2015 26.38080
## 4 11-08-1972      FLORIDA  CAT GOOD HEALTH FALSE 25-11-2015 30.63867
## 5 06-06-1973         IOWA <NA>      NORMAL  TRUE 25-11-2015 26.53567
## 6 25-06-1973     MARYLAND  DOG      NORMAL FALSE 25-11-2015 27.90487
##     BMILabel
## 1     NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4      OBESE
## 5 OVERWEIGHT
## 6 OVERWEIGHT
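
The "Georgia,xxx" fix targets one known value. A more general sketch strips anything after a comma with a regular expression. It assumes that text following a comma in State is always junk, which should be checked against the data first. The input values are illustrative.

```r
# Illustrative raw values: a trailing junk suffix and stray whitespace.
stateRaw   <- c("Georgia,xxx", "Missouri", " Iowa ")

# Remove the first comma and everything after it, then trim and uppercase.
stateClean <- toupper(trimws(sub(",.*$", "", stateRaw)))
print(stateClean)
```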

Convert Name column to uppercase

dfrPatient$Name <- trimws(toupper(dfrPatient$Name))
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 DEMETRIUS WHITE   MALE  FALSE      182.87       76.57
## 2 AC/AH/017   ROSARIO WHITE   MALE  FALSE      179.12       80.43
## 3 AC/AH/020     JULIO BLACK   MALE  FALSE      169.15       75.48
## 4 AC/AH/022      LUPE WHITE   MALE  FALSE      175.66       94.54
## 5 AC/AH/029    LAVERN WHITE FEMALE  FALSE      164.47       71.78
## 6 AC/AH/033    BERNIE  <NA> FEMALE   TRUE      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1 31-01-1972      GEORGIA  DOG      NORMAL FALSE 25-11-2015 22.89674
## 2 09-06-1972     MISSOURI  DOG      NORMAL FALSE 25-11-2015 25.06859
## 3 03-07-1972 PENNSYLVANIA <NA>      NORMAL FALSE 25-11-2015 26.38080
## 4 11-08-1972      FLORIDA  CAT GOOD HEALTH FALSE 25-11-2015 30.63867
## 5 06-06-1973         IOWA <NA>      NORMAL  TRUE 25-11-2015 26.53567
## 6 25-06-1973     MARYLAND  DOG      NORMAL FALSE 25-11-2015 27.90487
##     BMILabel
## 1     NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4      OBESE
## 5 OVERWEIGHT
## 6 OVERWEIGHT

Use complete cases to remove all records with NA in any column

vclComplete <- complete.cases(dfrPatient)
nrow(dfrPatient)
## [1] 100
dfrPatient <- dfrPatient[vclComplete, ]
nrow(dfrPatient)
## [1] 60
head(dfrPatient)
##           ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/001 DEMETRIUS WHITE   MALE  FALSE      182.87       76.57
## 2  AC/AH/017   ROSARIO WHITE   MALE  FALSE      179.12       80.43
## 4  AC/AH/022      LUPE WHITE   MALE  FALSE      175.66       94.54
## 9  AC/AH/045   SHIRLEY WHITE   MALE  FALSE      181.32       76.90
## 11 AC/AH/049    MARTIN WHITE FEMALE  FALSE      160.06       72.37
## 13 AC/AH/052  COURTNEY WHITE   MALE   TRUE      175.39       92.22
##     BirthDate      State   Pet HealthGrade  Died RecordDate BMIValue
## 1  31-01-1972    GEORGIA   DOG      NORMAL FALSE 25-11-2015 22.89674
## 2  09-06-1972   MISSOURI   DOG      NORMAL FALSE 25-11-2015 25.06859
## 4  11-08-1972    FLORIDA   CAT GOOD HEALTH FALSE 25-11-2015 30.63867
## 9  25-12-1971  LOUISIANA   DOG GOOD HEALTH FALSE 25-11-2015 23.39025
## 11 28-04-1972 CALIFORNIA HORSE      NORMAL  TRUE 25-12-2015 28.24834
## 13 16-03-1972    INDIANA  BIRD  BAD HEALTH FALSE 25-12-2015 29.97888
##      BMILabel
## 1      NORMAL
## 2  OVERWEIGHT
## 4       OBESE
## 9      NORMAL
## 11 OVERWEIGHT
## 13 OVERWEIGHT
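
Before dropping incomplete rows, it can help to see which columns account for the missing values. The sketch below counts NA per column and then applies complete.cases(), using a small made-up data frame.

```r
# Small illustrative data frame with missing values in several columns.
dfrDemo <- data.frame(
  Race   = c("WHITE", NA, "BLACK", "ASIAN"),
  Pet    = c("DOG", "CAT", NA, NA),
  Smokes = c(TRUE, NA, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Number of NA values in each column.
naPerColumn <- colSums(is.na(dfrDemo))
print(naPerColumn)

# Keep only the rows with no NA in any column.
dfrComplete <- dfrDemo[complete.cases(dfrDemo), ]
print(nrow(dfrComplete))
```

A column with a high NA count is the main driver of row loss and is worth reviewing before the filter is applied.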

Data Reporting

Display top 10 records based on BMI-Value.

head(arrange(dfrPatient, desc(BMIValue)), 10)
##           ID     Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/SG/009    SAMMY    WHITE   MALE  FALSE      166.84       88.25
## 2  AC/SG/064      JON    WHITE   MALE  FALSE      169.16       90.08
## 3  AC/AH/076   ALBERT    WHITE   MALE  FALSE      176.22       97.67
## 4  AC/AH/022     LUPE    WHITE   MALE  FALSE      175.66       94.54
## 5  AC/AH/248   ANDREA    WHITE   MALE  FALSE      178.64       97.05
## 6  AC/SG/067   THOMAS    WHITE   MALE  FALSE      167.51       84.15
## 7  AC/AH/052 COURTNEY    WHITE   MALE   TRUE      175.39       92.22
## 8  AC/AH/127     JAME    WHITE   MALE  FALSE      167.75       82.06
## 9  AC/SG/055     EVAN    WHITE   MALE  FALSE      166.75       79.06
## 10 AC/SG/181    TERRY HISPANIC   MALE  FALSE      177.14       88.70
##     BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1  04-03-1972      VERMONT  DOG GOOD HEALTH FALSE 25-06-2016 31.70402
## 2  04-10-1972     ILLINOIS  CAT      NORMAL  TRUE 25-07-2016 31.47988
## 3  08-04-1973    LOUISIANA  CAT      NORMAL FALSE 25-12-2015 31.45218
## 4  11-08-1972      FLORIDA  CAT GOOD HEALTH FALSE 25-11-2015 30.63867
## 5  12-01-1973      INDIANA  CAT GOOD HEALTH  TRUE 25-05-2016 30.41152
## 6  19-07-1972 PENNSYLVANIA BIRD      NORMAL  TRUE 25-07-2016 29.98974
## 7  16-03-1972      INDIANA BIRD  BAD HEALTH FALSE 25-12-2015 29.97888
## 8  29-10-1972        TEXAS  DOG GOOD HEALTH  TRUE 25-01-2016 29.16127
## 9  24-02-1972     ILLINOIS BIRD  BAD HEALTH  TRUE 25-07-2016 28.43316
## 10 24-11-1971      INDIANA  CAT  BAD HEALTH  TRUE 25-09-2016 28.26769
##      BMILabel
## 1       OBESE
## 2       OBESE
## 3       OBESE
## 4       OBESE
## 5       OBESE
## 6  OVERWEIGHT
## 7  OVERWEIGHT
## 8  OVERWEIGHT
## 9  OVERWEIGHT
## 10 OVERWEIGHT

Display bottom 10 records based on BMI-Value.

head(arrange(dfrPatient, BMIValue), 10)
##           ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/SG/193    RONNIE    WHITE   MALE   TRUE      185.43       73.63
## 2  AC/SG/099    LESLIE    ASIAN   MALE  FALSE      172.72       67.62
## 3  AC/AH/001 DEMETRIUS    WHITE   MALE  FALSE      182.87       76.57
## 4  AC/AH/086      KYLE    BLACK   MALE   TRUE      180.11       75.72
## 5  AC/AH/045   SHIRLEY    WHITE   MALE  FALSE      181.32       76.90
## 6  AC/AH/114      KRIS HISPANIC   MALE  FALSE      177.75       74.84
## 7  AC/AH/077     TOMMY    BLACK   MALE  FALSE      174.09       72.20
## 8  AC/AH/150     BRETT    WHITE   MALE   TRUE      181.56       79.54
## 9  AC/AH/057    VERNON    WHITE FEMALE   TRUE      163.79       65.76
## 10 AC/AH/207    BOBBIE    WHITE FEMALE  FALSE      163.01       65.19
##     BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1  05-06-1973         IOWA  DOG  BAD HEALTH FALSE 25-09-2016 21.41385
## 2  04-02-1972         OHIO  CAT GOOD HEALTH FALSE 25-07-2016 22.66678
## 3  31-01-1972      GEORGIA  DOG      NORMAL FALSE 25-11-2015 22.89674
## 4  12-05-1973      GEORGIA  CAT  BAD HEALTH FALSE 25-12-2015 23.34183
## 5  25-12-1971    LOUISIANA  DOG GOOD HEALTH FALSE 25-11-2015 23.39025
## 6  19-11-1972 PENNSYLVANIA BIRD  BAD HEALTH FALSE 25-01-2016 23.68725
## 7  01-02-1973   WASHINGTON  CAT  BAD HEALTH FALSE 25-12-2015 23.82262
## 8  03-05-1972     KENTUCKY  DOG GOOD HEALTH  TRUE 25-02-2016 24.12933
## 9  06-01-1972     ILLINOIS  CAT  BAD HEALTH FALSE 25-12-2015 24.51247
## 10 17-05-1973      FLORIDA  DOG      NORMAL FALSE 25-03-2016 24.53310
##    BMILabel
## 1    NORMAL
## 2    NORMAL
## 3    NORMAL
## 4    NORMAL
## 5    NORMAL
## 6    NORMAL
## 7    NORMAL
## 8    NORMAL
## 9    NORMAL
## 10   NORMAL

Display Gender > Race - frequency / counts

summarise(group_by(dfrPatient, Gender, Race), n())
## Source: local data frame [8 x 3]
## Groups: Gender [?]
## 
##   Gender     Race `n()`
##    <chr>    <chr> <int>
## 1 FEMALE    ASIAN     2
## 2 FEMALE    BLACK     1
## 3 FEMALE HISPANIC     4
## 4 FEMALE    WHITE    27
## 5   MALE    ASIAN     2
## 6   MALE    BLACK     2
## 7   MALE HISPANIC     5
## 8   MALE    WHITE    17

Race > Gender - max, min and average values for BMI-Values

summarise(group_by(dfrPatient, Race, Gender), min(BMIValue), mean(BMIValue), max(BMIValue))
## Source: local data frame [8 x 5]
## Groups: Race [?]
## 
##       Race Gender `min(BMIValue)` `mean(BMIValue)` `max(BMIValue)`
##      <chr>  <chr>           <dbl>            <dbl>           <dbl>
## 1    ASIAN FEMALE        25.57631         26.88531        28.19431
## 2    ASIAN   MALE        22.66678         24.95782        27.24885
## 3    BLACK FEMALE        26.71407         26.71407        26.71407
## 4    BLACK   MALE        23.34183         23.58223        23.82262
## 5 HISPANIC FEMALE        25.03916         26.29513        26.89942
## 6 HISPANIC   MALE        23.68725         26.39844        28.26769
## 7    WHITE FEMALE        24.51247         26.60612        28.24834
## 8    WHITE   MALE        21.41385         27.53445        31.70402
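
Unnamed summaries produce column names such as `n()` and `min(BMIValue)`, which need backticks when used later. Naming each summary inside summarise() avoids that. The sketch below uses made-up data, and the column names Count, MinBMI, MeanBMI and MaxBMI are illustrative choices.

```r
library(dplyr)

# Small illustrative data frame with the same column names as the dataset.
dfrDemo <- data.frame(
  Race     = c("WHITE", "WHITE", "ASIAN", "ASIAN"),
  Gender   = c("MALE", "MALE", "FEMALE", "FEMALE"),
  BMIValue = c(22, 30, 25, 27),
  stringsAsFactors = FALSE
)

# Each summary gets an explicit, readable column name.
dfrSummary <- as.data.frame(
  summarise(group_by(dfrDemo, Race, Gender),
            Count   = n(),
            MinBMI  = min(BMIValue),
            MeanBMI = mean(BMIValue),
            MaxBMI  = max(BMIValue))
)
print(dfrSummary)
```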

Display all deceased patients

filter(dfrPatient, Died==TRUE)
##           ID        Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/049      MARTIN    WHITE FEMALE  FALSE      160.06       72.37
## 2  AC/AH/127        JAME    WHITE   MALE  FALSE      167.75       82.06
## 3  AC/AH/133       CLYDE HISPANIC   MALE  FALSE      181.15       83.93
## 4  AC/AH/150       BRETT    WHITE   MALE   TRUE      181.56       79.54
## 5  AC/AH/154        TONY    WHITE FEMALE  FALSE      160.03       64.30
## 6  AC/AH/156      GEORGE    WHITE   MALE  FALSE      165.62       76.72
## 7  AC/AH/160        RORY    ASIAN FEMALE  FALSE      159.67       71.88
## 8  AC/AH/176       JERRY    ASIAN   MALE  FALSE      175.21       83.65
## 9  AC/AH/180        DREW    WHITE FEMALE  FALSE      160.80       64.77
## 10 AC/AH/186 CHRISTOPHER    WHITE FEMALE  FALSE      157.95       67.41
## 11 AC/AH/211         SON    WHITE FEMALE  FALSE      157.16       69.64
## 12 AC/AH/219         JAY    WHITE FEMALE  FALSE      163.47       72.89
## 13 AC/AH/233      MARION    WHITE FEMALE  FALSE      163.97       66.71
## 14 AC/AH/248      ANDREA    WHITE   MALE  FALSE      178.64       97.05
## 15 AC/AH/249       JESUS HISPANIC FEMALE   TRUE      159.78       68.31
## 16 AC/SG/010        THEO    ASIAN FEMALE  FALSE      159.32       64.92
## 17 AC/SG/016      JIMMIE    BLACK FEMALE  FALSE      161.84       69.97
## 18 AC/SG/046        CARL HISPANIC   MALE  FALSE      171.41       81.70
## 19 AC/SG/055        EVAN    WHITE   MALE  FALSE      166.75       79.06
## 20 AC/SG/064         JON    WHITE   MALE  FALSE      169.16       90.08
## 21 AC/SG/065      SHAYNE    WHITE FEMALE  FALSE      157.01       66.56
## 22 AC/SG/067      THOMAS    WHITE   MALE  FALSE      167.51       84.15
## 23 AC/SG/068   VALENTINE HISPANIC FEMALE  FALSE      160.47       68.20
## 24 AC/SG/084       BRIAN HISPANIC   MALE  FALSE      174.25       80.93
## 25 AC/SG/101       JASON    WHITE FEMALE  FALSE      159.23       69.96
## 26 AC/SG/123     DARNELL    WHITE FEMALE   TRUE      162.32       72.72
## 27 AC/SG/134       DARYL    WHITE FEMALE   TRUE      162.59       69.76
## 28 AC/SG/155     RAYMOND    WHITE FEMALE  FALSE      158.35       69.72
## 29 AC/SG/165       ELMER    WHITE FEMALE  FALSE      162.18       67.81
## 30 AC/SG/179       LOGAN    WHITE   MALE  FALSE      183.10       82.47
## 31 AC/SG/181       TERRY HISPANIC   MALE  FALSE      177.14       88.70
## 32 AC/SG/197       STACY    WHITE FEMALE  FALSE      159.44       66.21
## 33 AC/SG/234        LUIS HISPANIC FEMALE  FALSE      164.88       68.07
##     BirthDate          State   Pet HealthGrade Died RecordDate BMIValue
## 1  28-04-1972     CALIFORNIA HORSE      NORMAL TRUE 25-12-2015 28.24834
## 2  29-10-1972          TEXAS   DOG GOOD HEALTH TRUE 25-01-2016 29.16127
## 3  13-10-1973     WASHINGTON   CAT  BAD HEALTH TRUE 25-02-2016 25.57647
## 4  03-05-1972       KENTUCKY   DOG GOOD HEALTH TRUE 25-02-2016 24.12933
## 5  30-08-1973     CALIFORNIA   DOG GOOD HEALTH TRUE 25-02-2016 25.10777
## 6  09-07-1972     CALIFORNIA   DOG GOOD HEALTH TRUE 25-02-2016 27.96939
## 7  22-09-1973        FLORIDA   CAT      NORMAL TRUE 25-02-2016 28.19431
## 8  01-05-1973       VIRGINIA   DOG  BAD HEALTH TRUE 25-03-2016 27.24885
## 9  18-02-1973         OREGON   CAT GOOD HEALTH TRUE 25-03-2016 25.04966
## 10 06-05-1972     NEW JERSEY   DOG  BAD HEALTH TRUE 25-03-2016 27.01998
## 11 14-07-1973     CALIFORNIA   CAT      NORMAL TRUE 25-04-2016 28.19517
## 12 07-04-1972 NORTH CAROLINA  BIRD GOOD HEALTH TRUE 25-04-2016 27.27670
## 13 23-12-1971           OHIO   CAT  BAD HEALTH TRUE 25-04-2016 24.81202
## 14 12-01-1973        INDIANA   CAT GOOD HEALTH TRUE 25-05-2016 30.41152
## 15 23-04-1972        ALABAMA   CAT      NORMAL TRUE 25-05-2016 26.75713
## 16 29-01-1973       NEW YORK   CAT      NORMAL TRUE 25-06-2016 25.57631
## 17 03-04-1972        ARIZONA   CAT  BAD HEALTH TRUE 25-06-2016 26.71407
## 18 05-08-1973    MISSISSIPPI  BIRD      NORMAL TRUE 25-06-2016 27.80672
## 19 24-02-1972       ILLINOIS  BIRD  BAD HEALTH TRUE 25-07-2016 28.43316
## 20 04-10-1972       ILLINOIS   CAT      NORMAL TRUE 25-07-2016 31.47988
## 21 05-04-1972     CALIFORNIA   DOG  BAD HEALTH TRUE 25-07-2016 26.99968
## 22 19-07-1972   PENNSYLVANIA  BIRD      NORMAL TRUE 25-07-2016 29.98974
## 23 15-04-1972      TENNESSEE   CAT  BAD HEALTH TRUE 25-07-2016 26.48480
## 24 06-03-1972       VIRGINIA   DOG      NORMAL TRUE 25-07-2016 26.65410
## 25 28-09-1973       MICHIGAN   DOG      NORMAL TRUE 25-07-2016 27.59307
## 26 03-09-1972 NORTH CAROLINA  BIRD GOOD HEALTH TRUE 25-08-2016 27.60005
## 27 28-05-1972          TEXAS   CAT      NORMAL TRUE 25-08-2016 26.38875
## 28 02-06-1972     CALIFORNIA   CAT  BAD HEALTH TRUE 25-08-2016 27.80489
## 29 25-03-1972     WASHINGTON  BIRD GOOD HEALTH TRUE 25-08-2016 25.78096
## 30 24-10-1972           OHIO   DOG  BAD HEALTH TRUE 25-09-2016 24.59910
## 31 24-11-1971        INDIANA   CAT  BAD HEALTH TRUE 25-09-2016 28.26769
## 32 08-11-1972       NEW YORK   CAT GOOD HEALTH TRUE 25-10-2016 26.04528
## 33 10-11-1971   PENNSYLVANIA   CAT  BAD HEALTH TRUE 25-10-2016 25.03916
##      BMILabel
## 1  OVERWEIGHT
## 2  OVERWEIGHT
## 3  OVERWEIGHT
## 4      NORMAL
## 5  OVERWEIGHT
## 6  OVERWEIGHT
## 7  OVERWEIGHT
## 8  OVERWEIGHT
## 9  OVERWEIGHT
## 10 OVERWEIGHT
## 11 OVERWEIGHT
## 12 OVERWEIGHT
## 13     NORMAL
## 14      OBESE
## 15 OVERWEIGHT
## 16 OVERWEIGHT
## 17 OVERWEIGHT
## 18 OVERWEIGHT
## 19 OVERWEIGHT
## 20      OBESE
## 21 OVERWEIGHT
## 22 OVERWEIGHT
## 23 OVERWEIGHT
## 24 OVERWEIGHT
## 25 OVERWEIGHT
## 26 OVERWEIGHT
## 27 OVERWEIGHT
## 28 OVERWEIGHT
## 29 OVERWEIGHT
## 30     NORMAL
## 31 OVERWEIGHT
## 32 OVERWEIGHT
## 33 OVERWEIGHT
nrow(filter(dfrPatient, Died==TRUE))
## [1] 33

Display all Hispanic females

filter(dfrPatient, Race=="HISPANIC" & Gender=="FEMALE")
##          ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/249     JESUS HISPANIC FEMALE   TRUE      159.78       68.31
## 2 AC/SG/068 VALENTINE HISPANIC FEMALE  FALSE      160.47       68.20
## 3 AC/SG/122    MICHAL HISPANIC FEMALE  FALSE      160.09       68.94
## 4 AC/SG/234      LUIS HISPANIC FEMALE  FALSE      164.88       68.07
##    BirthDate          State Pet HealthGrade  Died RecordDate BMIValue
## 1 23-04-1972        ALABAMA CAT      NORMAL  TRUE 25-05-2016 26.75713
## 2 15-04-1972      TENNESSEE CAT  BAD HEALTH  TRUE 25-07-2016 26.48480
## 3 16-12-1971 SOUTH CAROLINA DOG GOOD HEALTH FALSE 25-08-2016 26.89942
## 4 10-11-1971   PENNSYLVANIA CAT  BAD HEALTH  TRUE 25-10-2016 25.03916
##     BMILabel
## 1 OVERWEIGHT
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 OVERWEIGHT
nrow(filter(dfrPatient, Race=="HISPANIC" & Gender=="FEMALE"))
## [1] 4

10 sample records from the dataset using set.seed(707)

set.seed(707)
sample_n(dfrPatient, 10)
##           ID     Name     Race Gender Smokes HeightInCms WeightInKgs
## 13 AC/AH/052 COURTNEY    WHITE   MALE   TRUE      175.39       92.22
## 48 AC/AH/219      JAY    WHITE FEMALE  FALSE      163.47       72.89
## 30 AC/AH/150    BRETT    WHITE   MALE   TRUE      181.56       79.54
## 55 AC/AH/248   ANDREA    WHITE   MALE  FALSE      178.64       97.05
## 73 AC/SG/084    BRIAN HISPANIC   MALE  FALSE      174.25       80.93
## 67 AC/SG/064      JON    WHITE   MALE  FALSE      169.16       90.08
## 80 AC/SG/122   MICHAL HISPANIC FEMALE  FALSE      160.09       68.94
## 9  AC/AH/045  SHIRLEY    WHITE   MALE  FALSE      181.32       76.90
## 20 AC/AH/086     KYLE    BLACK   MALE   TRUE      180.11       75.72
## 57 AC/SG/002      JAN    WHITE FEMALE   TRUE      161.57       67.92
##     BirthDate          State  Pet HealthGrade  Died RecordDate BMIValue
## 13 16-03-1972        INDIANA BIRD  BAD HEALTH FALSE 25-12-2015 29.97888
## 48 07-04-1972 NORTH CAROLINA BIRD GOOD HEALTH  TRUE 25-04-2016 27.27670
## 30 03-05-1972       KENTUCKY  DOG GOOD HEALTH  TRUE 25-02-2016 24.12933
## 55 12-01-1973        INDIANA  CAT GOOD HEALTH  TRUE 25-05-2016 30.41152
## 73 06-03-1972       VIRGINIA  DOG      NORMAL  TRUE 25-07-2016 26.65410
## 67 04-10-1972       ILLINOIS  CAT      NORMAL  TRUE 25-07-2016 31.47988
## 80 16-12-1971 SOUTH CAROLINA  DOG GOOD HEALTH FALSE 25-08-2016 26.89942
## 9  25-12-1971      LOUISIANA  DOG GOOD HEALTH FALSE 25-11-2015 23.39025
## 20 12-05-1973        GEORGIA  CAT  BAD HEALTH FALSE 25-12-2015 23.34183
## 57 03-07-1973        ARIZONA  DOG  BAD HEALTH FALSE 25-05-2016 26.01814
##      BMILabel
## 13 OVERWEIGHT
## 48 OVERWEIGHT
## 30     NORMAL
## 55      OBESE
## 73 OVERWEIGHT
## 67      OBESE
## 80 OVERWEIGHT
## 9      NORMAL
## 20     NORMAL
## 57 OVERWEIGHT
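
Setting the seed immediately before sampling is what makes the draw repeatable. The sketch below wraps both steps in a small helper so the seed and the sample size are explicit arguments. drawSample() is a hypothetical helper written for this illustration in base R, and the data frame is made up.

```r
# Illustrative data frame with 60 rows.
dfrDemo <- data.frame(ID = 1:60, Score = (1:60) / 2)

# Hypothetical helper: fix the seed, then draw `size` rows without replacement.
drawSample <- function(dfr, size, seed) {
  set.seed(seed)
  dfr[sample(nrow(dfr), size), , drop = FALSE]
}

firstDraw  <- drawSample(dfrDemo, 10, 707)
secondDraw <- drawSample(dfrDemo, 10, 707)
print(firstDraw)
print(identical(firstDraw, secondDraw))
```

Because the seed is reset inside the helper, repeated calls with the same arguments return the same rows.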

Summary

Note
The patient data gives an overall background of the patients in a particular hospital ward.
Initially the dataset contained 100 records, of which 45 were male and 55 female.
After errors and missing data were identified and recoded as NA, the 40 records containing NA values were removed.
The cleaned dataset therefore contained 60 records.
The majority of patients in the cleaned data were White (44 of 60), followed by Hispanic (9).
More than half of them had died (33 of 60).
A majority of females were in good health compared with males; the rest were in normal or bad health.
Obesity was seen in males: the 5 obese patients in the cleaned data were all male.

Objectives
The objectives of analysing the data, studying the dplyr package, working with R Markdown and publishing an HTML document on RPubs were successfully met.