Analysis Of Patient Data

Objectives

Understand and learn how to use RMD and dplyr properly.
1. Learn RMD
2. Analysis, Cleaning, Reporting of Patient Data

Assumptions

  • Only White, Black, Hispanic, Bi-Racial and Asian are treated as valid races; any other value is invalid and is replaced with “NA”.
  • Smokes takes the values “False”, “True”, “No” and “Yes”; “No” is treated as “False” and “Yes” as “True”.

Problem Definition

Provide R code to read the patient-data.csv file and carry out data preparation and reporting:
- Add two new columns, BMI Value and BMI Label  
- Data Cleaning  
- Report Preparation  

Code & Output

knitr Global Options

# for development
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=TRUE, warning=TRUE, message=TRUE, cache=FALSE, tidy=FALSE, fig.path='figures/')
# for production
#knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=FALSE, warning=FALSE, message=FALSE, cache=FALSE, tidy=FALSE, fig.path='figures/')

Load Libraries

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Set Working Directory

setwd("D:/Welingkar/Trim 3/R/Assignment")

Read Data

# Read the patient file, keeping text columns as character rather than factor
dfrPatient <- read.csv("./patient-data.csv", header=TRUE, stringsAsFactors=FALSE)
intRowCount <- nrow(dfrPatient)
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  False      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  False      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  False      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  False      175.66       94.54
## 5 AC/AH/029    Lavern White Female  False      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   True      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate
## 1 31-01-1972  Georgia,xxx  Dog           2 False 25-11-2015
## 2 09-06-1972     Missouri  Dog           2 False 25-11-2015
## 3 03-07-1972 Pennsylvania None           2 False 25-11-2015
## 4 11-08-1972      Florida  Cat           1 False 25-11-2015
## 5 06-06-1973         Iowa NULL           2  True 25-11-2015
## 6 25-06-1973     Maryland  Dog           2 False 25-11-2015

Total Rows Of Patient File: 100

1. Add column BMI-Value

# BMI = weight (kg) / height (m)^2; heights are stored in cm, hence the division by 100
dfrPatient <- mutate(dfrPatient, BMI=(WeightInKgs/(HeightInCms/100)^2))
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  False      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  False      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  False      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  False      175.66       94.54
## 5 AC/AH/029    Lavern White Female  False      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   True      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate      BMI
## 1 31-01-1972  Georgia,xxx  Dog           2 False 25-11-2015 22.89674
## 2 09-06-1972     Missouri  Dog           2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None           2 False 25-11-2015 26.38080
## 4 11-08-1972      Florida  Cat           1 False 25-11-2015 30.63867
## 5 06-06-1973         Iowa NULL           2  True 25-11-2015 26.53567
## 6 25-06-1973     Maryland  Dog           2 False 25-11-2015 27.90487
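
The formula in the mutate() call divides weight in kilograms by the square of height in metres. As a minimal sketch, the same calculation can be wrapped in a small helper and checked against the first record shown above. The name `compute_bmi` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not in the original report):
# BMI = weight (kg) / height (m)^2
compute_bmi <- function(weight_kg, height_cm) {
  height_m <- height_cm / 100  # convert centimetres to metres
  weight_kg / height_m^2
}

# First patient in the output above: 182.87 cm and 76.57 kg
bmi_first <- compute_bmi(76.57, 182.87)
bmi_first
```

Because the helper is vectorised, it could also be called on whole columns, for example `compute_bmi(dfrPatient$WeightInKgs, dfrPatient$HeightInCms)`.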

2. Add column BMI-Label

# Label each BMI value by range: below 18.5, 18.5 to <25, 25 to <30, 30 and above
dfrPatient <- mutate(dfrPatient, BMILabel=NA)
dfrPatient$BMILabel <- ifelse(dfrPatient$BMI < 18.50,"UNDERWEIGHT",
                         ifelse(dfrPatient$BMI >= 18.50 & dfrPatient$BMI < 25.00, "NORMAL",
                         ifelse(dfrPatient$BMI >= 25.00 & dfrPatient$BMI < 30.00, "OVERWEIGHT",
                         ifelse(dfrPatient$BMI >= 30.00,"Obese", NA))))
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  False      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  False      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  False      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  False      175.66       94.54
## 5 AC/AH/029    Lavern White Female  False      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   True      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate      BMI
## 1 31-01-1972  Georgia,xxx  Dog           2 False 25-11-2015 22.89674
## 2 09-06-1972     Missouri  Dog           2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None           2 False 25-11-2015 26.38080
## 4 11-08-1972      Florida  Cat           1 False 25-11-2015 30.63867
## 5 06-06-1973         Iowa NULL           2  True 25-11-2015 26.53567
## 6 25-06-1973     Maryland  Dog           2 False 25-11-2015 27.90487
##     BMILabel
## 1     NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4      Obese
## 5 OVERWEIGHT
## 6 OVERWEIGHT
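
The nested ifelse() calls above encode four half-open ranges. One possible alternative, shown here only as a sketch, is base R's cut() with `right = FALSE`, which expresses the same ranges as a single vector of break points. The helper name `label_bmi` is hypothetical; the labels are copied from this report, including the mixed-case "Obese".

```r
# Sketch: derive the BMI label with cut() instead of nested ifelse().
# label_bmi is a hypothetical helper, not part of the original report.
label_bmi <- function(bmi) {
  as.character(cut(
    bmi,
    breaks = c(-Inf, 18.5, 25, 30, Inf),
    labels = c("UNDERWEIGHT", "NORMAL", "OVERWEIGHT", "Obese"),
    right = FALSE  # intervals are [lower, upper), matching the >= and < tests
  ))
}

# A few BMI values, some taken from the output above; NA stays NA
bmi_labels <- label_bmi(c(17.2, 18.5, 22.89674, 25.06859, 30.63867, NA))
bmi_labels
```

Keeping the break points in one vector makes the boundaries easier to audit than four separate comparisons.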

Data Cleaning

val_count <- nrow(dfrPatient)

The number of rows before data cleaning is 100

3.1 Distinct values in the columns

# Count the distinct values in each column to spot inconsistent entries
#####BMI  
summarise(group_by(dfrPatient, BMI), n())
## # A tibble: 100 × 2
##         BMI `n()`
##       <dbl> <int>
## 1  21.41385     1
## 2  22.04640     1
## 3  22.66678     1
## 4  22.89674     1
## 5  23.06452     1
## 6  23.34183     1
## 7  23.39025     1
## 8  23.51295     1
## 9  23.62505     1
## 10 23.68725     1
## # ... with 90 more rows
#####Gender
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 6 × 2
##    Gender `n()`
##     <chr> <int>
## 1  Female     6
## 2    Male     3
## 3  Female    45
## 4 Female      4
## 5    Male    40
## 6   Male      2
#####Race
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
##        Race `n()`
##       <chr> <int>
## 1     Asian     5
## 2 Bi-Racial     1
## 3     Black     8
## 4       Dog     1
## 5  Hispanic    17
## 6     White    68
#####Died
summarise(group_by(dfrPatient, Died), n())
## # A tibble: 2 × 2
##    Died `n()`
##   <chr> <int>
## 1 False    46
## 2  True    54
#####Pet
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 10 × 2
##      Pet `n()`
##    <chr> <int>
## 1   Bird     9
## 2    Cat    24
## 3    CAT     5
## 4    Dog    28
## 5    DOG     4
## 6  Horse     1
## 7   None    23
## 8   NONE     1
## 9   NULL     3
## 10  <NA>     2
#####Smokes
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 4 × 2
##   Smokes `n()`
##    <chr> <int>
## 1  False    72
## 2     No     6
## 3   True    18
## 4    Yes     4
#####HealthGrade
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
##   HealthGrade `n()`
##         <int> <int>
## 1           1    29
## 2           2    30
## 3           3    34
## 4          99     7

3.2 Removing leading/trailing spaces in the Gender column

# Counts before and after trimming whitespace from Gender
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 6 × 2
##    Gender `n()`
##     <chr> <int>
## 1  Female     6
## 2    Male     3
## 3  Female    45
## 4 Female      4
## 5    Male    40
## 6   Male      2
dfrPatient$Gender <- trimws(dfrPatient$Gender, which = "both")
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 2 × 2
##   Gender `n()`
##    <chr> <int>
## 1 Female    55
## 2   Male    45
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  False      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  False      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  False      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  False      175.66       94.54
## 5 AC/AH/029    Lavern White Female  False      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   True      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate      BMI
## 1 31-01-1972  Georgia,xxx  Dog           2 False 25-11-2015 22.89674
## 2 09-06-1972     Missouri  Dog           2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None           2 False 25-11-2015 26.38080
## 4 11-08-1972      Florida  Cat           1 False 25-11-2015 30.63867
## 5 06-06-1973         Iowa NULL           2  True 25-11-2015 26.53567
## 6 25-06-1973     Maryland  Dog           2 False 25-11-2015 27.90487
##     BMILabel
## 1     NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4      Obese
## 5 OVERWEIGHT
## 6 OVERWEIGHT

3.3 Error handling in Race column

# Keep the five valid races; replace any other value with the string "NA"
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
##        Race `n()`
##       <chr> <int>
## 1     Asian     5
## 2 Bi-Racial     1
## 3     Black     8
## 4       Dog     1
## 5  Hispanic    17
## 6     White    68
dfrPatient[,3]<- ifelse(dfrPatient[,3]=="White", "White",
                        ifelse(dfrPatient[,3]=="Black", "Black",
                               ifelse(dfrPatient[,3]=="Hispanic", "Hispanic",
                                      ifelse(dfrPatient[,3]=="Bi-Racial", "Bi-Racial",
                                             ifelse(dfrPatient[,3]=="Asian", "Asian","NA")))))
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
##        Race `n()`
##       <chr> <int>
## 1     Asian     5
## 2 Bi-Racial     1
## 3     Black     8
## 4  Hispanic    17
## 5        NA     1
## 6     White    68
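
The same rule (keep a fixed set of valid values, replace everything else) can be written without nesting by testing membership with `%in%`. This is a sketch only; `valid_races` and `clean_race` are hypothetical names, and the replacement value is the string "NA", as in the report.

```r
# Sketch: validate Race against a fixed list instead of nested ifelse().
# valid_races and clean_race are hypothetical names for illustration.
valid_races <- c("White", "Black", "Hispanic", "Bi-Racial", "Asian")

clean_race <- function(race) {
  # %in% returns FALSE for NA, so missing values also become "NA"
  ifelse(race %in% valid_races, race, "NA")
}

race_cleaned <- clean_race(c("White", "Dog", "Asian", NA))
race_cleaned
```

Adding or removing a valid race then only requires editing the `valid_races` vector.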

3.4 Error handling in Pet column

# Merge case variants (DOG, CAT) and map None, NONE, NULL and missing values to "NA"
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 10 × 2
##      Pet `n()`
##    <chr> <int>
## 1   Bird     9
## 2    Cat    24
## 3    CAT     5
## 4    Dog    28
## 5    DOG     4
## 6  Horse     1
## 7   None    23
## 8   NONE     1
## 9   NULL     3
## 10  <NA>     2
dfrPatient$Pet<-ifelse(is.na(dfrPatient$Pet),"NA",
                       ifelse(dfrPatient$Pet=="DOG", "Dog",
                       ifelse(dfrPatient$Pet=="CAT", "Cat",
                              ifelse(dfrPatient$Pet=="None", "NA",
                                     ifelse(dfrPatient$Pet=="NONE", "NA",
                                            ifelse(dfrPatient$Pet=="NULL", "NA",dfrPatient$Pet))))))
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 5 × 2
##     Pet `n()`
##   <chr> <int>
## 1  Bird     9
## 2   Cat    29
## 3   Dog    32
## 4 Horse     1
## 5    NA    29
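
The Pet clean-up combines two ideas: normalising letter case and collapsing several "no value" spellings into one. A rough sketch of that logic as a reusable function follows. `clean_pet` is a hypothetical name, and the call to trimws() is a defensive addition that the original code does not make.

```r
# Sketch: normalise case and collapse None/NULL/missing into "NA".
# clean_pet is a hypothetical helper; trimws() is an added assumption.
clean_pet <- function(pet) {
  pet <- trimws(pet)
  is_missing <- is.na(pet) | toupper(pet) %in% c("NONE", "NULL", "NA", "")
  # Capitalise the first letter and lower-case the rest, e.g. "DOG" -> "Dog"
  cleaned <- paste0(toupper(substring(pet, 1, 1)), tolower(substring(pet, 2)))
  cleaned[is_missing] <- "NA"  # the report uses the string "NA", not a true NA
  cleaned
}

pet_cleaned <- clean_pet(c("DOG", "Cat", "NONE", "None", "NULL", NA, "Horse"))
pet_cleaned
```

Unlike the explicit ifelse() chain, this version would also handle spellings such as "dog" or "BIRD" that do not occur in this dataset.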

3.5 Error handling in Smokes column

# Recode No as False and Yes as True; any unexpected value becomes "NA"
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 4 × 2
##   Smokes `n()`
##    <chr> <int>
## 1  False    72
## 2     No     6
## 3   True    18
## 4    Yes     4
dfrPatient[,5] <- ifelse(dfrPatient[,5] == "No","False" ,
                              ifelse(dfrPatient[,5] == "Yes", "True",
                                     ifelse(dfrPatient[,5]=="True", "True",
                              ifelse(dfrPatient[,5]=="False","False","NA"))))
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 2 × 2
##   Smokes `n()`
##    <chr> <int>
## 1  False    78
## 2   True    22
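
Recoding a small, fixed set of values can also be done with a named lookup vector, which keeps the mapping in one place. The sketch below assumes the four values seen in this dataset; `smokes_lookup` and `recode_smokes` are hypothetical names.

```r
# Sketch: recode Smokes with a named lookup vector.
# smokes_lookup and recode_smokes are hypothetical names.
smokes_lookup <- c("No" = "False", "Yes" = "True",
                   "False" = "False", "True" = "True")

recode_smokes <- function(smokes) {
  recoded <- unname(smokes_lookup[smokes])  # unmatched names yield NA
  recoded[is.na(recoded)] <- "NA"           # same fallback string as the report
  recoded
}

smokes_cleaned <- recode_smokes(c("No", "Yes", "True", "False", "Maybe"))
smokes_cleaned
```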

3.6 Error handling in State column

# Remove the stray ",xxx" suffix from the Georgia entry
dfrPatient$State[dfrPatient$State=="Georgia,xxx"] <- "Georgia"

3.7 Error handling in HealthGrade column

# Map grades 1, 2, 3 to GOOD, NORMAL, BAD and treat 99 as missing (true NA)
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
##   HealthGrade `n()`
##         <int> <int>
## 1           1    29
## 2           2    30
## 3           3    34
## 4          99     7
class(dfrPatient$HealthGrade)
## [1] "integer"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==1] <- "GOOD"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==2] <- "NORMAL"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==3] <- "BAD"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==99] <- NA
class(dfrPatient$HealthGrade)
## [1] "character"
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
##   HealthGrade `n()`
##         <chr> <int>
## 1         BAD    34
## 2        GOOD    29
## 3      NORMAL    30
## 4        <NA>     7
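
The four assignments above turn an integer code into a text label and treat 99 as missing. A compact way to sketch the same mapping is factor() with explicit levels and labels, since any value not listed in `levels` becomes NA. `grade_to_label` is a hypothetical name.

```r
# Sketch: map HealthGrade codes to labels with factor().
# grade_to_label is a hypothetical helper, not part of the report.
grade_to_label <- function(grade) {
  # Codes outside 1, 2, 3 (such as 99) are not in levels, so they become NA
  as.character(factor(grade, levels = c(1, 2, 3),
                      labels = c("GOOD", "NORMAL", "BAD")))
}

health_labels <- grade_to_label(c(2L, 1L, 3L, 99L))
health_labels
```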

3.8 Remove all records with NA in any columns (complete cases)

# Keep only rows with no true NA in any column; the "NA" strings set earlier are not treated as missing
vclComplete <- complete.cases(dfrPatient)
vclComplete[is.na(vclComplete)]
## logical(0)
dfrPatient <- dfrPatient[vclComplete, ]
val1 <- nrow(dfrPatient)
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 3 × 2
##   HealthGrade `n()`
##         <chr> <int>
## 1         BAD    34
## 2        GOOD    29
## 3      NORMAL    30

The number of rows after data cleaning is 93
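
Only the 7 rows whose HealthGrade was set to a true NA are removed here. The rows where Race or Pet holds the string "NA" remain, because complete.cases() only looks for real missing values. The toy data frame below (invented purely for illustration) shows the difference.

```r
# Sketch with invented data: complete.cases() ignores the string "NA"
toy <- data.frame(
  Race = c("White", "NA", "Asian"),     # "NA" here is ordinary text
  HealthGrade = c("GOOD", "BAD", NA),   # the last value is a true NA
  stringsAsFactors = FALSE
)

keep <- complete.cases(toy)  # TRUE for rows without any true NA
toy_clean <- toy[keep, ]
toy_clean
```

If the "NA" strings should also count as missing, they would first need to be converted to real NA values.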

Reporting

1. Display top 10 records based on BMI-Value

# Sort by BMI in descending order and show the first ten rows
head(arrange(dfrPatient, desc(BMI)), 10)
##           ID     Name  Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/SG/009    Sammy White   Male  False      166.84       88.25
## 2  AC/SG/064      Jon White   Male  False      169.16       90.08
## 3  AC/AH/076   Albert White   Male  False      176.22       97.67
## 4  AC/AH/104   Jeremy White   Male   True      169.85       90.63
## 5  AC/AH/022     Lupe White   Male  False      175.66       94.54
## 6  AC/AH/248   Andrea White   Male  False      178.64       97.05
## 7  AC/SG/067   Thomas White   Male  False      167.51       84.15
## 8  AC/AH/052 Courtney White   Male   True      175.39       92.22
## 9  AC/AH/159   Edward White   Male  False      181.64       96.91
## 10 AC/AH/127     Jame White   Male  False      167.75       82.06
##     BirthDate        State  Pet HealthGrade  Died RecordDate      BMI
## 1  04-03-1972      Vermont  Dog        GOOD False 25-06-2016 31.70402
## 2  04-10-1972     Illinois  Cat      NORMAL  True 25-07-2016 31.47988
## 3  08-04-1973    Louisiana  Cat      NORMAL False 25-12-2015 31.45218
## 4  12-04-1972     Kentucky   NA        GOOD  True 25-12-2015 31.41528
## 5  11-08-1972      Florida  Cat        GOOD False 25-11-2015 30.63867
## 6  12-01-1973      Indiana  Cat        GOOD  True 25-05-2016 30.41152
## 7  19-07-1972 Pennsylvania Bird      NORMAL  True 25-07-2016 29.98974
## 8  16-03-1972      Indiana Bird         BAD False 25-12-2015 29.97888
## 9  04-12-1972  Connecticut  Cat      NORMAL False 25-02-2016 29.37282
## 10 29-10-1972        Texas  Dog        GOOD  True 25-01-2016 29.16127
##      BMILabel
## 1       Obese
## 2       Obese
## 3       Obese
## 4       Obese
## 5       Obese
## 6       Obese
## 7  OVERWEIGHT
## 8  OVERWEIGHT
## 9  OVERWEIGHT
## 10 OVERWEIGHT

2. Display Bottom Ten records according to BMI

# Sort by BMI in descending order and slice out the last ten rows
fr <- as.integer(nrow(arrange(dfrPatient, desc(BMI)))-9)
to <- nrow(dfrPatient)
dfr_bottom_ten<- slice(arrange(dfrPatient, desc(BMI)), fr:to)
val_row <- nrow(dfr_bottom_ten)
head(dfr_bottom_ten,10)
##           ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/150     Brett    White   Male   True      181.56       79.54
## 2  AC/AH/077     Tommy    Black   Male  False      174.09       72.20
## 3  AC/AH/114      Kris Hispanic   Male  False      177.75       74.84
## 4  AC/AH/164     Shane Hispanic   Male   True      177.03       74.04
## 5  AC/AH/089      Dong    White   Male  False      179.24       75.54
## 6  AC/AH/045   Shirley    White   Male  False      181.32       76.90
## 7  AC/AH/086      Kyle    Black   Male   True      180.11       75.72
## 8  AC/AH/001 Demetrius    White   Male  False      182.87       76.57
## 9  AC/SG/099    Leslie    Asian   Male  False      172.72       67.62
## 10 AC/SG/193    Ronnie    White   Male   True      185.43       73.63
##     BirthDate        State  Pet HealthGrade  Died RecordDate      BMI
## 1  03-05-1972     Kentucky  Dog        GOOD  True 25-02-2016 24.12933
## 2  01-02-1973   Washington  Cat         BAD False 25-12-2015 23.82262
## 3  19-11-1972 Pennsylvania Bird         BAD False 25-01-2016 23.68725
## 4  18-02-1972      Florida   NA      NORMAL False 25-02-2016 23.62505
## 5  11-03-1972   California   NA      NORMAL  True 25-12-2015 23.51295
## 6  25-12-1971    Louisiana  Dog        GOOD False 25-11-2015 23.39025
## 7  12-05-1973      Georgia  Cat         BAD False 25-12-2015 23.34183
## 8  31-01-1972      Georgia  Dog      NORMAL False 25-11-2015 22.89674
## 9  04-02-1972         Ohio  Cat        GOOD False 25-07-2016 22.66678
## 10 05-06-1973         Iowa  Dog         BAD False 25-09-2016 21.41385
##    BMILabel
## 1    NORMAL
## 2    NORMAL
## 3    NORMAL
## 4    NORMAL
## 5    NORMAL
## 6    NORMAL
## 7    NORMAL
## 8    NORMAL
## 9    NORMAL
## 10   NORMAL

The number of records in the above table is 10
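
The row-index arithmetic above (`fr` and `to`) can be avoided by taking the tail of the sorted data. The sketch below uses base R and an invented 15-row data frame, so it runs without dplyr; the same idea should carry over to `tail(arrange(dfrPatient, desc(BMI)), 10)`, although the printed row names may differ from the slice() output.

```r
# Sketch with invented data: bottom ten rows by BMI using order() and tail()
toy <- data.frame(
  ID = paste0("P", 1:15),
  BMI = seq(20, 34, by = 1),
  stringsAsFactors = FALSE
)

sorted <- toy[order(-toy$BMI), ]  # highest BMI first
bottom_ten <- tail(sorted, 10)    # last ten rows = ten lowest BMI values
bottom_ten
```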

3. Provide frequency / counts of Gender > Race

# Count patients in each Gender and Race combination
summarise(group_by(dfrPatient, Gender, Race), n())
## Source: local data frame [10 x 3]
## Groups: Gender [?]
## 
##    Gender      Race `n()`
##     <chr>     <chr> <int>
## 1  Female     Asian     3
## 2  Female     Black     2
## 3  Female  Hispanic     6
## 4  Female        NA     1
## 5  Female     White    39
## 6    Male     Asian     2
## 7    Male Bi-Racial     1
## 8    Male     Black     4
## 9    Male  Hispanic     9
## 10   Male     White    26

4. Race > Gender - max, min and average values for BMI-Values

# Minimum, mean and maximum BMI for each Race and Gender combination
summarise(group_by(dfrPatient, Race, Gender), min(BMI), mean(BMI), max(BMI))
## Source: local data frame [10 x 5]
## Groups: Race [?]
## 
##         Race Gender `min(BMI)` `mean(BMI)` `max(BMI)`
##        <chr>  <chr>      <dbl>       <dbl>      <dbl>
## 1      Asian Female   24.42511    26.06524   28.19431
## 2      Asian   Male   22.66678    24.95782   27.24885
## 3  Bi-Racial   Male   24.83473    24.83473   24.83473
## 4      Black Female   25.22482    25.96945   26.71407
## 5      Black   Male   23.34183    25.03778   26.60586
## 6   Hispanic Female   25.03916    26.52176   27.84206
## 7   Hispanic   Male   23.62505    26.02289   28.26769
## 8         NA Female   27.90487    27.90487   27.90487
## 9      White Female   24.21459    26.41985   28.24834
## 10     White   Male   21.41385    27.67114   31.70402
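
For readers without dplyr, a grouped summary of this kind can be approximated in base R with aggregate(). The sketch below uses a small invented data frame; the column names follow the report, and the summary function returns all three statistics at once.

```r
# Sketch with invented data: min, mean and max BMI per Race and Gender
toy <- data.frame(
  Race = c("White", "White", "White", "Asian", "Asian"),
  Gender = c("Male", "Male", "Female", "Male", "Male"),
  BMI = c(21, 31, 26, 22, 28),
  stringsAsFactors = FALSE
)

bmi_stats <- aggregate(
  BMI ~ Race + Gender, data = toy,
  FUN = function(x) c(min = min(x), mean = mean(x), max = max(x))
)
bmi_stats
```

Here the BMI column of the result is a matrix with one named column per statistic, which differs from the flat columns that summarise() returns.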

5. Records of patients who died

# Died is stored as text, so compare against the string "True"
filter(dfrPatient, Died=="True")
##           ID        Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/029      Lavern    White Female  False      164.47       71.78
## 2  AC/AH/049      Martin    White Female  False      160.06       72.37
## 3  AC/AH/089        Dong    White   Male  False      179.24       75.54
## 4  AC/AH/104      Jeremy    White   Male   True      169.85       90.63
## 5  AC/AH/127        Jame    White   Male  False      167.75       82.06
## 6  AC/AH/133       Clyde Hispanic   Male  False      181.15       83.93
## 7  AC/AH/150       Brett    White   Male   True      181.56       79.54
## 8  AC/AH/154        Tony    White Female  False      160.03       64.30
## 9  AC/AH/156      George    White   Male  False      165.62       76.72
## 10 AC/AH/160        Rory    Asian Female  False      159.67       71.88
## 11 AC/AH/171       Devin    White Female  False      163.35       70.46
## 12 AC/AH/176       Jerry    Asian   Male  False      175.21       83.65
## 13 AC/AH/180        Drew    White Female  False      160.80       64.77
## 14 AC/AH/186 Christopher    White Female  False      157.95       67.41
## 15 AC/AH/192   Dominique    White   Male  False      180.61       83.59
## 16 AC/AH/211         Son    White Female  False      157.16       69.64
## 17 AC/AH/219         Jay    White Female  False      163.47       72.89
## 18 AC/AH/233      Marion    White Female  False      163.97       66.71
## 19 AC/AH/248      Andrea    White   Male  False      178.64       97.05
## 20 AC/AH/249       Jesus Hispanic Female   True      159.78       68.31
## 21 AC/SG/003      Walter    White Female  False      161.83       66.03
## 22 AC/SG/008        Dana    White   Male   True      169.66       77.30
## 23 AC/SG/010        Theo    Asian Female  False      159.32       64.92
## 24 AC/SG/015       Shaun    White   Male   True      170.51       84.35
## 25 AC/SG/016      Jimmie    Black Female  False      161.84       69.97
## 26 AC/SG/046        Carl Hispanic   Male  False      171.41       81.70
## 27 AC/SG/055        Evan    White   Male  False      166.75       79.06
## 28 AC/SG/056     Merrill    Asian Female   True      166.19       67.46
## 29 AC/SG/064         Jon    White   Male  False      169.16       90.08
## 30 AC/SG/065      Shayne    White Female  False      157.01       66.56
## 31 AC/SG/067      Thomas    White   Male  False      167.51       84.15
## 32 AC/SG/068   Valentine Hispanic Female  False      160.47       68.20
## 33 AC/SG/084       Brian Hispanic   Male  False      174.25       80.93
## 34 AC/SG/101       Jason    White Female  False      159.23       69.96
## 35 AC/SG/116      Connie    Black   Male  False      184.34       90.41
## 36 AC/SG/123     Darnell    White Female   True      162.32       72.72
## 37 AC/SG/134       Daryl    White Female   True      162.59       69.76
## 38 AC/SG/155     Raymond    White Female  False      158.35       69.72
## 39 AC/SG/165       Elmer    White Female  False      162.18       67.81
## 40 AC/SG/167       Jimmy    White Female  False      159.38       70.37
## 41 AC/SG/172     Whitney    White   Male  False      171.45       84.29
## 42 AC/SG/179       Logan    White   Male  False      183.10       82.47
## 43 AC/SG/181       Terry Hispanic   Male  False      177.14       88.70
## 44 AC/SG/182       Jamie Hispanic   Male   True      171.08       72.51
## 45 AC/SG/191        Lacy Hispanic Female  False      159.33       70.68
## 46 AC/SG/197       Stacy    White Female  False      159.44       66.21
## 47 AC/SG/216        Alva    White Female  False      159.13       66.96
## 48 AC/SG/217        Dean    White Female  False      160.58       71.49
## 49 AC/SG/234        Luis Hispanic Female  False      164.88       68.07
##     BirthDate          State   Pet HealthGrade Died RecordDate      BMI
## 1  06-06-1973           Iowa    NA      NORMAL True 25-11-2015 26.53567
## 2  28-04-1972     California Horse      NORMAL True 25-12-2015 28.24834
## 3  11-03-1972     California    NA      NORMAL True 25-12-2015 23.51295
## 4  12-04-1972       Kentucky    NA        GOOD True 25-12-2015 31.41528
## 5  29-10-1972          Texas   Dog        GOOD True 25-01-2016 29.16127
## 6  13-10-1973     Washington   Cat         BAD True 25-02-2016 25.57647
## 7  03-05-1972       Kentucky   Dog        GOOD True 25-02-2016 24.12933
## 8  30-08-1973     California   Dog        GOOD True 25-02-2016 25.10777
## 9  09-07-1972     California   Dog        GOOD True 25-02-2016 27.96939
## 10 22-09-1973        Florida   Cat      NORMAL True 25-02-2016 28.19431
## 11 16-04-1973     California  Bird         BAD True 25-03-2016 26.40611
## 12 01-05-1973       Virginia   Dog         BAD True 25-03-2016 27.24885
## 13 18-02-1973         Oregon   Cat        GOOD True 25-03-2016 25.04966
## 14 06-05-1972     New Jersey   Dog         BAD True 25-03-2016 27.01998
## 15 24-03-1972       Michigan    NA         BAD True 25-03-2016 25.62541
## 16 14-07-1973     California   Cat      NORMAL True 25-04-2016 28.19517
## 17 07-04-1972 North Carolina  Bird        GOOD True 25-04-2016 27.27670
## 18 23-12-1971           Ohio   Cat         BAD True 25-04-2016 24.81202
## 19 12-01-1973        Indiana   Cat        GOOD True 25-05-2016 30.41152
## 20 23-04-1972        Alabama   Cat      NORMAL True 25-05-2016 26.75713
## 21 11-07-1972         Oregon    NA      NORMAL True 25-05-2016 25.21292
## 22 26-05-1973         Nevada   Dog        GOOD True 25-05-2016 26.85472
## 23 29-01-1973       New York   Cat      NORMAL True 25-06-2016 25.57631
## 24 09-11-1972     New Jersey   Dog         BAD True 25-06-2016 29.01252
## 25 03-04-1972        Arizona   Cat         BAD True 25-06-2016 26.71407
## 26 05-08-1973    Mississippi  Bird      NORMAL True 25-06-2016 27.80672
## 27 24-02-1972       Illinois  Bird         BAD True 25-07-2016 28.43316
## 28 27-11-1972        Indiana    NA         BAD True 25-07-2016 24.42511
## 29 04-10-1972       Illinois   Cat      NORMAL True 25-07-2016 31.47988
## 30 05-04-1972     California   Dog         BAD True 25-07-2016 26.99968
## 31 19-07-1972   Pennsylvania  Bird      NORMAL True 25-07-2016 29.98974
## 32 15-04-1972      Tennessee   Cat         BAD True 25-07-2016 26.48480
## 33 06-03-1972       Virginia   Dog      NORMAL True 25-07-2016 26.65410
## 34 28-09-1973       Michigan   Dog      NORMAL True 25-07-2016 27.59307
## 35 05-06-1972        Florida    NA         BAD True 25-08-2016 26.60586
## 36 03-09-1972 North Carolina  Bird        GOOD True 25-08-2016 27.60005
## 37 28-05-1972          Texas   Cat      NORMAL True 25-08-2016 26.38875
## 38 02-06-1972     California   Cat         BAD True 25-08-2016 27.80489
## 39 25-03-1972     Washington  Bird        GOOD True 25-08-2016 25.78096
## 40 30-09-1973     Washington    NA      NORMAL True 25-09-2016 27.70256
## 41 25-02-1972        Florida   Dog      NORMAL True 25-09-2016 28.67484
## 42 24-10-1972           Ohio   Dog         BAD True 25-09-2016 24.59910
## 43 24-11-1971        Indiana   Cat         BAD True 25-09-2016 28.26769
## 44 25-03-1973      Louisiana    NA         BAD True 25-09-2016 24.77419
## 45 21-06-1973          Texas    NA         BAD True 25-09-2016 27.84206
## 46 08-11-1972       New York   Cat        GOOD True 25-10-2016 26.04528
## 47 19-06-1972        Alabama    NA        GOOD True 25-10-2016 26.44304
## 48 11-11-1972           Ohio    NA        GOOD True 25-10-2016 27.72441
## 49 10-11-1971   Pennsylvania   Cat         BAD True 25-10-2016 25.03916
##      BMILabel
## 1  OVERWEIGHT
## 2  OVERWEIGHT
## 3      NORMAL
## 4       Obese
## 5  OVERWEIGHT
## 6  OVERWEIGHT
## 7      NORMAL
## 8  OVERWEIGHT
## 9  OVERWEIGHT
## 10 OVERWEIGHT
## 11 OVERWEIGHT
## 12 OVERWEIGHT
## 13 OVERWEIGHT
## 14 OVERWEIGHT
## 15 OVERWEIGHT
## 16 OVERWEIGHT
## 17 OVERWEIGHT
## 18     NORMAL
## 19      Obese
## 20 OVERWEIGHT
## 21 OVERWEIGHT
## 22 OVERWEIGHT
## 23 OVERWEIGHT
## 24 OVERWEIGHT
## 25 OVERWEIGHT
## 26 OVERWEIGHT
## 27 OVERWEIGHT
## 28     NORMAL
## 29      Obese
## 30 OVERWEIGHT
## 31 OVERWEIGHT
## 32 OVERWEIGHT
## 33 OVERWEIGHT
## 34 OVERWEIGHT
## 35 OVERWEIGHT
## 36 OVERWEIGHT
## 37 OVERWEIGHT
## 38 OVERWEIGHT
## 39 OVERWEIGHT
## 40 OVERWEIGHT
## 41 OVERWEIGHT
## 42     NORMAL
## 43 OVERWEIGHT
## 44     NORMAL
## 45 OVERWEIGHT
## 46 OVERWEIGHT
## 47 OVERWEIGHT
## 48 OVERWEIGHT
## 49 OVERWEIGHT
Val_Dead <- nrow(filter(dfrPatient, Died=="True"))

The number of patients who died is 49

6. Display All Records for “Hispanic Females”

# Keep rows where Race is Hispanic and Gender is Female
filter(dfrPatient, Race=="Hispanic" & Gender=="Female")
##          ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/208  Lawrence Hispanic Female  False      165.80       71.77
## 2 AC/AH/249     Jesus Hispanic Female   True      159.78       68.31
## 3 AC/SG/068 Valentine Hispanic Female  False      160.47       68.20
## 4 AC/SG/122    Michal Hispanic Female  False      160.09       68.94
## 5 AC/SG/191      Lacy Hispanic Female  False      159.33       70.68
## 6 AC/SG/234      Luis Hispanic Female  False      164.88       68.07
##    BirthDate          State Pet HealthGrade  Died RecordDate      BMI
## 1 07-08-1973      Louisiana  NA        GOOD False 25-03-2016 26.10802
## 2 23-04-1972        Alabama Cat      NORMAL  True 25-05-2016 26.75713
## 3 15-04-1972      Tennessee Cat         BAD  True 25-07-2016 26.48480
## 4 16-12-1971 South Carolina Dog        GOOD False 25-08-2016 26.89942
## 5 21-06-1973          Texas  NA         BAD  True 25-09-2016 27.84206
## 6 10-11-1971   Pennsylvania Cat         BAD  True 25-10-2016 25.03916
##     BMILabel
## 1 OVERWEIGHT
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 OVERWEIGHT
## 5 OVERWEIGHT
## 6 OVERWEIGHT
Val_HF <- nrow(filter(dfrPatient, Race=="Hispanic" & Gender=="Female"))

The number of Hispanic females is 6

7. Seven sample records from the dataset using set.seed(707)

# Fix the random seed so the same seven rows are drawn on every run
set.seed(707)
sample_n(dfrPatient, 7)
##           ID     Name      Race Gender Smokes HeightInCms WeightInKgs
## 10 AC/AH/048    Merle  Hispanic   Male  False      167.37       79.06
## 44 AC/AH/208 Lawrence  Hispanic Female  False      165.80       71.77
## 27 AC/AH/115    Tracy Bi-Racial   Male   True      183.21       83.36
## 53 AC/AH/241  Lindsay     White Female  False      161.38       73.55
## 75 AC/SG/099   Leslie     Asian   Male  False      172.72       67.62
## 68 AC/SG/065   Shayne     White Female  False      157.01       66.56
## 83 AC/SG/139   Jordan     White   Male  False      171.94       82.11
##     BirthDate          State Pet HealthGrade  Died RecordDate      BMI
## 10 13-07-1973 North Carolina  NA      NORMAL False 25-12-2015 28.22290
## 44 07-08-1973      Louisiana  NA        GOOD False 25-03-2016 26.10802
## 27 29-09-1973     California Dog      NORMAL False 25-01-2016 24.83473
## 53 08-02-1972        Florida Cat         BAD False 25-05-2016 28.24121
## 75 04-02-1972           Ohio Cat        GOOD False 25-07-2016 22.66678
## 68 05-04-1972     California Dog         BAD  True 25-07-2016 26.99968
## 83 06-10-1973       Michigan  NA        GOOD False 25-08-2016 27.77424
##      BMILabel
## 10 OVERWEIGHT
## 44 OVERWEIGHT
## 27     NORMAL
## 53 OVERWEIGHT
## 75     NORMAL
## 68 OVERWEIGHT
## 83 OVERWEIGHT

Summary

# Summary figures used in the note below
val_total <- nrow(dfrPatient)
val_female <- nrow(filter(dfrPatient, Gender=="Female"))
val_Male <- nrow(filter(dfrPatient, Gender=="Male"))
val_died <- nrow(filter(dfrPatient, Died=="True"))
val_good_health <- nrow(filter(dfrPatient, HealthGrade=="GOOD"))
val_bad_health <- nrow(filter(dfrPatient, HealthGrade=="BAD"))
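# String comparison is case-sensitive: the labels are stored as "UNDERWEIGHT" and "OVERWEIGHT",
# so only the "Obese" rows match this filter
val_bmi <- nrow(filter(dfrPatient, BMILabel=="Underweight" | BMILabel=="Overweight" | BMILabel=="Obese"))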
val_maxbmi <- max(dfrPatient$BMI)
val_minbmi <- min(dfrPatient$BMI)
val_meanbmi <- mean(dfrPatient$BMI)

Note

The patient data file contained records for 100 patients; after cleaning, records for 93 patients remain.  
Of these, 42 are male and 51 are female.  
Out of 93 patients, 29 are in good health and 34 are in bad health.  
6 patients have a BMI label of Obese (BMI of 30 or above).  
Out of 93 patients, the maximum BMI is 31.7040213, the minimum is 21.4138523 and the average is 26.624746.

Conclusion

This was a good exercise: it helped in learning RMD and how to present the results as an HTML report.