Analysis Of Patient Data

Objectives

  1. To understand how the RMD (R Markdown) tool works and how to use its syntax, converting an RMD file to Markdown with the knitr package and finally to HTML

  2. Analyze the patient data with the help of the dplyr package

Problem Definition

Data Manipulation
1. Understand and learn how to use dplyr properly

Code & Output

knitr Global Options

# for development
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=TRUE, warning=TRUE, message=TRUE, cache=FALSE, tidy=FALSE, fig.path='figures/')
# for production
#knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=FALSE, warning=FALSE, message=FALSE, cache=FALSE, tidy=FALSE, fig.path='figures/')

Working Directory

setwd("F:/Management/Trimester 3/8. R/Class/R-BA/R_scripts")

Load Libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Read Data

# Reading the Patient-Data .csv file
dfrPatient <- read.csv("./data/patient-data.csv", header=TRUE, stringsAsFactors=FALSE)
intRowCount <- nrow(dfrPatient)
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  FALSE      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  FALSE      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  FALSE      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  FALSE      175.66       94.54
## 5 AC/AH/029    Lavern White Female  FALSE      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   TRUE      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate
## 1 31-01-1972  Georgia,xxx  Dog           2 FALSE 25-11-2015
## 2 09-06-1972     Missouri  Dog           2 FALSE 25-11-2015
## 3 03-07-1972 Pennsylvania None           2 FALSE 25-11-2015
## 4 11-08-1972      Florida  Cat           1 FALSE 25-11-2015
## 5 06-06-1973         Iowa NULL           2  TRUE 25-11-2015
## 6 25-06-1973     Maryland  Dog           2 FALSE 25-11-2015

Total Rows Of Patient File: 100

Add column BMI-Value

# Provide New Column BMI-Value (BodyMassIndex)
dfrPatient <- mutate(dfrPatient, BMIValue=(WeightInKgs/(HeightInCms/100)^2))
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  FALSE      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  FALSE      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  FALSE      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  FALSE      175.66       94.54
## 5 AC/AH/029    Lavern White Female  FALSE      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   TRUE      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1 31-01-1972  Georgia,xxx  Dog           2 FALSE 25-11-2015 22.89674
## 2 09-06-1972     Missouri  Dog           2 FALSE 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None           2 FALSE 25-11-2015 26.38080
## 4 11-08-1972      Florida  Cat           1 FALSE 25-11-2015 30.63867
## 5 06-06-1973         Iowa NULL           2  TRUE 25-11-2015 26.53567
## 6 25-06-1973     Maryland  Dog           2 FALSE 25-11-2015 27.90487
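
As a quick sanity check on the formula, the BMI of the first row can be reproduced by hand. The sketch below uses base R only; the helper name `bmi` is introduced here for illustration and is not part of the original script.

```r
# BMI = weight (kg) / height (m)^2. Height is stored in centimetres,
# so it is divided by 100 before squaring.
bmi <- function(weight_kg, height_cm) {
  weight_kg / (height_cm / 100)^2
}

# Weight and height taken from row 1 of the table above (ID AC/AH/001)
first_bmi <- bmi(76.57, 182.87)
print(round(first_bmi, 5))
```

The printed value should agree with the BMIValue shown for row 1 of the output above.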

Add column BMI-Label

# Provide New Column BMI-Label based on BMI-Value
dfrPatient <- mutate(dfrPatient, BMILabel=NA)
# Each test only runs if the previous one failed, so a value exactly on a
# boundary (18.50, 25.00, 30.00) falls into the higher category instead of NA
dfrPatient$BMILabel <- ifelse(dfrPatient$BMIValue < 18.50, "UNDERWEIGHT",
                         ifelse(dfrPatient$BMIValue < 25.00, "NORMAL",
                         ifelse(dfrPatient$BMIValue < 30.00, "OVERWEIGHT", "Obese")))
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  FALSE      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  FALSE      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  FALSE      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  FALSE      175.66       94.54
## 5 AC/AH/029    Lavern White Female  FALSE      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   TRUE      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1 31-01-1972  Georgia,xxx  Dog           2 FALSE 25-11-2015 22.89674
## 2 09-06-1972     Missouri  Dog           2 FALSE 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None           2 FALSE 25-11-2015 26.38080
## 4 11-08-1972      Florida  Cat           1 FALSE 25-11-2015 30.63867
## 5 06-06-1973         Iowa NULL           2  TRUE 25-11-2015 26.53567
## 6 25-06-1973     Maryland  Dog           2 FALSE 25-11-2015 27.90487
##     BMILabel
## 1     NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4      Obese
## 5 OVERWEIGHT
## 6 OVERWEIGHT
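
An alternative to nested `ifelse()` calls is base R's `cut()`, which states the interval boundaries in one place. The following is a minimal sketch, not the method used in this report; the helper name `label_bmi` and the sample values are assumptions made for illustration, and the labels simply mirror the ones used above.

```r
# Hypothetical helper: label BMI values using half-open intervals [lower, upper)
label_bmi <- function(bmi) {
  as.character(cut(
    bmi,
    breaks = c(-Inf, 18.5, 25, 30, Inf),
    labels = c("UNDERWEIGHT", "NORMAL", "OVERWEIGHT", "Obese"),
    right  = FALSE  # closed on the left, so exactly 25 falls in [25, 30)
  ))
}

# Illustrative values, including the boundaries and a missing value
bmi_labels <- label_bmi(c(17.9, 18.5, 24.99, 25, 30, 31.7, NA))
print(bmi_labels)
```

With `right = FALSE` every boundary value belongs to exactly one category, and a missing BMI stays `NA`.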

Using the summarise function to check for erroneous values or missing data in columns

# Error Handling
#1. BMILabel
summarise(group_by(dfrPatient, BMILabel), n())
## # A tibble: 3 × 2
##     BMILabel `n()`
##        <chr> <int>
## 1     NORMAL    23
## 2      Obese     6
## 3 OVERWEIGHT    71
#2. Gender
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 6 × 2
##    Gender `n()`
##     <chr> <int>
## 1  Female     6
## 2    Male     3
## 3  Female    45
## 4 Female      4
## 5    Male    40
## 6   Male      2
#3. Race
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
##        Race `n()`
##       <chr> <int>
## 1     Asian     5
## 2 Bi-Racial     1
## 3     Black     8
## 4       Dog     1
## 5  Hispanic    17
## 6     White    68
#4. Died
summarise(group_by(dfrPatient, Died), n())
## # A tibble: 2 × 2
##    Died `n()`
##   <lgl> <int>
## 1 FALSE    46
## 2  TRUE    54
#5. Pet
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 10 × 2
##      Pet `n()`
##    <chr> <int>
## 1   Bird     9
## 2    Cat    24
## 3    CAT     5
## 4    Dog    28
## 5    DOG     4
## 6  Horse     1
## 7   None    23
## 8   NONE     1
## 9   NULL     3
## 10  <NA>     2
#6. Smokes
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 4 × 2
##   Smokes `n()`
##    <chr> <int>
## 1  FALSE    72
## 2     No     6
## 3   TRUE    18
## 4    Yes     4
#7. HealthGrade
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
##   HealthGrade `n()`
##         <int> <int>
## 1           1    29
## 2           2    30
## 3           3    34
## 4          99     7
#8. State
summarise(group_by(dfrPatient, State), n())
## # A tibble: 34 × 2
##          State `n()`
##          <chr> <int>
## 1      Alabama     2
## 2      Arizona     2
## 3   California    13
## 4     Colorado     1
## 5  Connecticut     1
## 6      Florida     8
## 7      Georgia     3
## 8  Georgia,xxx     1
## 9       Hawaii     2
## 10    Illinois     4
## # ... with 24 more rows
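
The same kind of per-column frequency check can be run over several columns in one call. The sketch below uses base R's `table()` rather than dplyr, on a small made-up data frame (`dfrDemo` and its values are illustrative, not the patient data).

```r
# Made-up stand-in for dfrPatient; values are chosen to show typical problems
dfrDemo <- data.frame(
  Gender = c("Male", " Male", "Female", "Female ", "Male"),
  Pet    = c("Dog", "DOG", NA, "None", "Cat"),
  stringsAsFactors = FALSE
)

# One frequency table per column; useNA = "ifany" keeps missing values visible
lstCounts <- lapply(dfrDemo, table, useNA = "ifany")
print(lstCounts)
```

Values that differ only by case or surrounding spaces appear as separate entries, which is the symptom seen in the Gender, Pet, and Smokes columns above.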

Error handling in gender

#---
# There seem to be leading or trailing spaces, so remove them
# Using trimws() and toupper() to make the values consistent
#---
# Before modification
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 6 × 2
##    Gender `n()`
##     <chr> <int>
## 1  Female     6
## 2    Male     3
## 3  Female    45
## 4 Female      4
## 5    Male    40
## 6   Male      2
dfrPatient$Gender <- trimws(toupper(dfrPatient$Gender))
# After modification
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 2 × 2
##   Gender `n()`
##    <chr> <int>
## 1 FEMALE    55
## 2   MALE    45
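
To see why both steps are needed, the sketch below applies `trimws()` and `toupper()` to a few made-up values (the vector `vcsGender` is illustrative and not taken from the patient file).

```r
# Illustrative values with stray spaces and mixed case
vcsGender <- c("Male", " Male", "Female ", "female", "MALE ")

# trimws() strips leading/trailing whitespace; toupper() unifies the case
vcsClean <- toupper(trimws(vcsGender))
print(table(vcsClean))
```

Five distinct-looking strings collapse into two categories once whitespace and case are normalised.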

Error handling in race

#---
# Removing the irrelevant data to ensure the correctness of data
#---
# Before modification
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
##        Race `n()`
##       <chr> <int>
## 1     Asian     5
## 2 Bi-Racial     1
## 3     Black     8
## 4       Dog     1
## 5  Hispanic    17
## 6     White    68
dfrPatient$Race <- trimws(toupper(dfrPatient$Race))
dfrPatient$Race[dfrPatient$Race=="DOG"] <- NA
dfrPatient$Race[dfrPatient$Race=="BI-RACIAL"] <- NA
# After modification
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 5 × 2
##       Race `n()`
##      <chr> <int>
## 1    ASIAN     5
## 2    BLACK     8
## 3 HISPANIC    17
## 4    WHITE    68
## 5     <NA>     2

Error handling in pet

#---
# Maintaining uniformity & converting None, NULL to NA
#---
# Before modification
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 10 × 2
##      Pet `n()`
##    <chr> <int>
## 1   Bird     9
## 2    Cat    24
## 3    CAT     5
## 4    Dog    28
## 5    DOG     4
## 6  Horse     1
## 7   None    23
## 8   NONE     1
## 9   NULL     3
## 10  <NA>     2
dfrPatient$Pet <- trimws(toupper(dfrPatient$Pet))
dfrPatient$Pet[dfrPatient$Pet=="NONE"] <- NA
dfrPatient$Pet[dfrPatient$Pet=="NULL"] <- NA
# After modification
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 5 × 2
##     Pet `n()`
##   <chr> <int>
## 1  BIRD     9
## 2   CAT    29
## 3   DOG    32
## 4 HORSE     1
## 5  <NA>    29

Error handling in smokes

#---
# Maintaining uniformity, converting character to logical
#---
# Before modification
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 4 × 2
##   Smokes `n()`
##    <chr> <int>
## 1  FALSE    72
## 2     No     6
## 3   TRUE    18
## 4    Yes     4
class(dfrPatient$Smokes)
## [1] "character"
dfrPatient$Smokes[dfrPatient$Smokes=="Yes"]<-"TRUE"
dfrPatient$Smokes[dfrPatient$Smokes=="No"] <-"FALSE"
dfrPatient$Smokes <- as.logical(dfrPatient$Smokes)
# After modification
class(dfrPatient$Smokes)
## [1] "logical"
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 2 × 2
##   Smokes `n()`
##    <lgl> <int>
## 1  FALSE    78
## 2   TRUE    22

Error handling in HealthGrade

#---
# Mapping the numeric codes to character labels; 99 is treated as missing (NA)
#---
# Before modification
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
##   HealthGrade `n()`
##         <int> <int>
## 1           1    29
## 2           2    30
## 3           3    34
## 4          99     7
class(dfrPatient$HealthGrade)
## [1] "integer"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==1] <- "GOOD"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==2] <- "NORMAL"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==3] <- "BAD"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==99] <- NA
# After modification
class(dfrPatient$HealthGrade)
## [1] "character"
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
##   HealthGrade `n()`
##         <chr> <int>
## 1         BAD    34
## 2        GOOD    29
## 3      NORMAL    30
## 4        <NA>     7
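
The code-to-label mapping can also be written as a single lookup, which avoids changing the column type halfway through a series of assignments. This is a sketch under the assumption that 1, 2, and 3 are the only valid codes; the names `vinGrade` and `vcsGradeLabels` are introduced here for illustration.

```r
# Illustrative grade codes, including the 99 sentinel seen in the data
vinGrade <- c(1L, 2L, 3L, 99L, 2L)

# Named lookup vector: names are the codes, values are the labels
vcsGradeLabels <- c("1" = "GOOD", "2" = "NORMAL", "3" = "BAD")

# Index by the code converted to character; codes without a label become NA
vcsGrade <- unname(vcsGradeLabels[as.character(vinGrade)])
print(vcsGrade)
```

Any code that is not in the lookup, such as 99, becomes `NA` without a separate assignment.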

Error handling in State

#---
# Correcting the state value
#---
# Before modification
summarise(group_by(dfrPatient, State), n())
## # A tibble: 34 × 2
##          State `n()`
##          <chr> <int>
## 1      Alabama     2
## 2      Arizona     2
## 3   California    13
## 4     Colorado     1
## 5  Connecticut     1
## 6      Florida     8
## 7      Georgia     3
## 8  Georgia,xxx     1
## 9       Hawaii     2
## 10    Illinois     4
## # ... with 24 more rows
dfrPatient$State[dfrPatient$State=="Georgia,xxx"] <- "Georgia"
# After modification
summarise(group_by(dfrPatient, State), n())
## # A tibble: 33 × 2
##          State `n()`
##          <chr> <int>
## 1      Alabama     2
## 2      Arizona     2
## 3   California    13
## 4     Colorado     1
## 5  Connecticut     1
## 6      Florida     8
## 7      Georgia     4
## 8       Hawaii     2
## 9     Illinois     4
## 10     Indiana     4
## # ... with 23 more rows
print(dfrPatient$State[dfrPatient$State=="Georgia,xxx"])
## character(0)

Handling NA values

#---
# Removing all records with NA in any columns using complete.cases
#---
vclComplete <- complete.cases(dfrPatient)
# Number of complete rows (no NA in any column)
sum(vclComplete)
## [1] 66
dfrPatient <- dfrPatient[vclComplete, ]
nrow(dfrPatient)
## [1] 66
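
Before dropping rows it can help to see which columns contribute the missing values; in this report the Pet column alone holds 29 NA values after cleaning, since "None" was converted to NA. The sketch below shows one way to check this with base R on a made-up data frame (`dfrDemo` and its values are illustrative).

```r
# Made-up stand-in for the cleaned patient data
dfrDemo <- data.frame(
  Race = c("WHITE", NA, "ASIAN", "BLACK"),
  Pet  = c("DOG", "CAT", NA, NA),
  BMI  = c(22.9, 25.1, 26.4, 30.6),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
vinMissing <- colSums(is.na(dfrDemo))
print(vinMissing)

# Keep only rows with no missing value in any column
vclComplete <- complete.cases(dfrDemo)
dfrClean <- dfrDemo[vclComplete, ]
print(nrow(dfrClean))
```

If a single column accounts for most of the loss, it may be worth deciding whether NA in that column really means "unknown" before removing the rows.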

--------------------

REPORTING

--------------------

#1.) Display top 10 records based on BMI-Value
head(arrange(dfrPatient, desc(BMIValue)), 10)
##           ID     Name  Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/SG/009    Sammy WHITE   MALE  FALSE      166.84       88.25
## 2  AC/SG/064      Jon WHITE   MALE  FALSE      169.16       90.08
## 3  AC/AH/076   Albert WHITE   MALE  FALSE      176.22       97.67
## 4  AC/AH/022     Lupe WHITE   MALE  FALSE      175.66       94.54
## 5  AC/AH/248   Andrea WHITE   MALE  FALSE      178.64       97.05
## 6  AC/SG/067   Thomas WHITE   MALE  FALSE      167.51       84.15
## 7  AC/AH/052 Courtney WHITE   MALE   TRUE      175.39       92.22
## 8  AC/AH/159   Edward WHITE   MALE  FALSE      181.64       96.91
## 9  AC/AH/127     Jame WHITE   MALE  FALSE      167.75       82.06
## 10 AC/SG/015    Shaun WHITE   MALE   TRUE      170.51       84.35
##     BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1  04-03-1972      Vermont  DOG        GOOD FALSE 25-06-2016 31.70402
## 2  04-10-1972     Illinois  CAT      NORMAL  TRUE 25-07-2016 31.47988
## 3  08-04-1973    Louisiana  CAT      NORMAL FALSE 25-12-2015 31.45218
## 4  11-08-1972      Florida  CAT        GOOD FALSE 25-11-2015 30.63867
## 5  12-01-1973      Indiana  CAT        GOOD  TRUE 25-05-2016 30.41152
## 6  19-07-1972 Pennsylvania BIRD      NORMAL  TRUE 25-07-2016 29.98974
## 7  16-03-1972      Indiana BIRD         BAD FALSE 25-12-2015 29.97888
## 8  04-12-1972  Connecticut  CAT      NORMAL FALSE 25-02-2016 29.37282
## 9  29-10-1972        Texas  DOG        GOOD  TRUE 25-01-2016 29.16127
## 10 09-11-1972   New Jersey  DOG         BAD  TRUE 25-06-2016 29.01252
##      BMILabel
## 1       Obese
## 2       Obese
## 3       Obese
## 4       Obese
## 5       Obese
## 6  OVERWEIGHT
## 7  OVERWEIGHT
## 8  OVERWEIGHT
## 9  OVERWEIGHT
## 10 OVERWEIGHT
#2.) Display bottom 10 records based on BMI-Value.
head(arrange(dfrPatient, BMIValue), 10)
##           ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/SG/193    Ronnie    WHITE   MALE   TRUE      185.43       73.63
## 2  AC/SG/099    Leslie    ASIAN   MALE  FALSE      172.72       67.62
## 3  AC/AH/001 Demetrius    WHITE   MALE  FALSE      182.87       76.57
## 4  AC/AH/086      Kyle    BLACK   MALE   TRUE      180.11       75.72
## 5  AC/AH/045   Shirley    WHITE   MALE  FALSE      181.32       76.90
## 6  AC/AH/114      Kris HISPANIC   MALE  FALSE      177.75       74.84
## 7  AC/AH/077     Tommy    BLACK   MALE  FALSE      174.09       72.20
## 8  AC/AH/150     Brett    WHITE   MALE   TRUE      181.56       79.54
## 9  AC/AH/057    Vernon    WHITE FEMALE   TRUE      163.79       65.76
## 10 AC/AH/207    Bobbie    WHITE FEMALE  FALSE      163.01       65.19
##     BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1  05-06-1973         Iowa  DOG         BAD FALSE 25-09-2016 21.41385
## 2  04-02-1972         Ohio  CAT        GOOD FALSE 25-07-2016 22.66678
## 3  31-01-1972      Georgia  DOG      NORMAL FALSE 25-11-2015 22.89674
## 4  12-05-1973      Georgia  CAT         BAD FALSE 25-12-2015 23.34183
## 5  25-12-1971    Louisiana  DOG        GOOD FALSE 25-11-2015 23.39025
## 6  19-11-1972 Pennsylvania BIRD         BAD FALSE 25-01-2016 23.68725
## 7  01-02-1973   Washington  CAT         BAD FALSE 25-12-2015 23.82262
## 8  03-05-1972     Kentucky  DOG        GOOD  TRUE 25-02-2016 24.12933
## 9  06-01-1972     Illinois  CAT         BAD FALSE 25-12-2015 24.51247
## 10 17-05-1973      Florida  DOG      NORMAL FALSE 25-03-2016 24.53310
##    BMILabel
## 1    NORMAL
## 2    NORMAL
## 3    NORMAL
## 4    NORMAL
## 5    NORMAL
## 6    NORMAL
## 7    NORMAL
## 8    NORMAL
## 9    NORMAL
## 10   NORMAL
#3.) Gender > Race - frequency / counts
summarise(group_by(dfrPatient, Gender, Race), n())
## Source: local data frame [8 x 3]
## Groups: Gender [?]
## 
##   Gender     Race `n()`
##    <chr>    <chr> <int>
## 1 FEMALE    ASIAN     2
## 2 FEMALE    BLACK     1
## 3 FEMALE HISPANIC     4
## 4 FEMALE    WHITE    29
## 5   MALE    ASIAN     2
## 6   MALE    BLACK     2
## 7   MALE HISPANIC     5
## 8   MALE    WHITE    21
#4.) Race > Gender - max, min and average values for BMI-Values
summarise(group_by(dfrPatient, Race, Gender), min(BMIValue), mean(BMIValue), max(BMIValue))
## Source: local data frame [8 x 5]
## Groups: Race [?]
## 
##       Race Gender `min(BMIValue)` `mean(BMIValue)` `max(BMIValue)`
##      <chr>  <chr>           <dbl>            <dbl>           <dbl>
## 1    ASIAN FEMALE        25.57631         26.88531        28.19431
## 2    ASIAN   MALE        22.66678         24.95782        27.24885
## 3    BLACK FEMALE        26.71407         26.71407        26.71407
## 4    BLACK   MALE        23.34183         23.58223        23.82262
## 5 HISPANIC FEMALE        25.03916         26.29513        26.89942
## 6 HISPANIC   MALE        23.68725         26.39844        28.26769
## 7    WHITE FEMALE        24.51247         26.61097        28.24834
## 8    WHITE   MALE        21.41385         27.71432        31.70402
#5.) All Dead people
filter(dfrPatient, Died==TRUE)
##           ID        Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/049      Martin    WHITE FEMALE  FALSE      160.06       72.37
## 2  AC/AH/127        Jame    WHITE   MALE  FALSE      167.75       82.06
## 3  AC/AH/133       Clyde HISPANIC   MALE  FALSE      181.15       83.93
## 4  AC/AH/150       Brett    WHITE   MALE   TRUE      181.56       79.54
## 5  AC/AH/154        Tony    WHITE FEMALE  FALSE      160.03       64.30
## 6  AC/AH/156      George    WHITE   MALE  FALSE      165.62       76.72
## 7  AC/AH/160        Rory    ASIAN FEMALE  FALSE      159.67       71.88
## 8  AC/AH/171       Devin    WHITE FEMALE  FALSE      163.35       70.46
## 9  AC/AH/176       Jerry    ASIAN   MALE  FALSE      175.21       83.65
## 10 AC/AH/180        Drew    WHITE FEMALE  FALSE      160.80       64.77
## 11 AC/AH/186 Christopher    WHITE FEMALE  FALSE      157.95       67.41
## 12 AC/AH/211         Son    WHITE FEMALE  FALSE      157.16       69.64
## 13 AC/AH/219         Jay    WHITE FEMALE  FALSE      163.47       72.89
## 14 AC/AH/233      Marion    WHITE FEMALE  FALSE      163.97       66.71
## 15 AC/AH/248      Andrea    WHITE   MALE  FALSE      178.64       97.05
## 16 AC/AH/249       Jesus HISPANIC FEMALE   TRUE      159.78       68.31
## 17 AC/SG/008        Dana    WHITE   MALE   TRUE      169.66       77.30
## 18 AC/SG/010        Theo    ASIAN FEMALE  FALSE      159.32       64.92
## 19 AC/SG/015       Shaun    WHITE   MALE   TRUE      170.51       84.35
## 20 AC/SG/016      Jimmie    BLACK FEMALE  FALSE      161.84       69.97
## 21 AC/SG/046        Carl HISPANIC   MALE  FALSE      171.41       81.70
## 22 AC/SG/055        Evan    WHITE   MALE  FALSE      166.75       79.06
## 23 AC/SG/064         Jon    WHITE   MALE  FALSE      169.16       90.08
## 24 AC/SG/065      Shayne    WHITE FEMALE  FALSE      157.01       66.56
## 25 AC/SG/067      Thomas    WHITE   MALE  FALSE      167.51       84.15
## 26 AC/SG/068   Valentine HISPANIC FEMALE  FALSE      160.47       68.20
## 27 AC/SG/084       Brian HISPANIC   MALE  FALSE      174.25       80.93
## 28 AC/SG/101       Jason    WHITE FEMALE  FALSE      159.23       69.96
## 29 AC/SG/123     Darnell    WHITE FEMALE   TRUE      162.32       72.72
## 30 AC/SG/134       Daryl    WHITE FEMALE   TRUE      162.59       69.76
## 31 AC/SG/155     Raymond    WHITE FEMALE  FALSE      158.35       69.72
## 32 AC/SG/165       Elmer    WHITE FEMALE  FALSE      162.18       67.81
## 33 AC/SG/172     Whitney    WHITE   MALE  FALSE      171.45       84.29
## 34 AC/SG/179       Logan    WHITE   MALE  FALSE      183.10       82.47
## 35 AC/SG/181       Terry HISPANIC   MALE  FALSE      177.14       88.70
## 36 AC/SG/197       Stacy    WHITE FEMALE  FALSE      159.44       66.21
## 37 AC/SG/234        Luis HISPANIC FEMALE  FALSE      164.88       68.07
##     BirthDate          State   Pet HealthGrade Died RecordDate BMIValue
## 1  28-04-1972     California HORSE      NORMAL TRUE 25-12-2015 28.24834
## 2  29-10-1972          Texas   DOG        GOOD TRUE 25-01-2016 29.16127
## 3  13-10-1973     Washington   CAT         BAD TRUE 25-02-2016 25.57647
## 4  03-05-1972       Kentucky   DOG        GOOD TRUE 25-02-2016 24.12933
## 5  30-08-1973     California   DOG        GOOD TRUE 25-02-2016 25.10777
## 6  09-07-1972     California   DOG        GOOD TRUE 25-02-2016 27.96939
## 7  22-09-1973        Florida   CAT      NORMAL TRUE 25-02-2016 28.19431
## 8  16-04-1973     California  BIRD         BAD TRUE 25-03-2016 26.40611
## 9  01-05-1973       Virginia   DOG         BAD TRUE 25-03-2016 27.24885
## 10 18-02-1973         Oregon   CAT        GOOD TRUE 25-03-2016 25.04966
## 11 06-05-1972     New Jersey   DOG         BAD TRUE 25-03-2016 27.01998
## 12 14-07-1973     California   CAT      NORMAL TRUE 25-04-2016 28.19517
## 13 07-04-1972 North Carolina  BIRD        GOOD TRUE 25-04-2016 27.27670
## 14 23-12-1971           Ohio   CAT         BAD TRUE 25-04-2016 24.81202
## 15 12-01-1973        Indiana   CAT        GOOD TRUE 25-05-2016 30.41152
## 16 23-04-1972        Alabama   CAT      NORMAL TRUE 25-05-2016 26.75713
## 17 26-05-1973         Nevada   DOG        GOOD TRUE 25-05-2016 26.85472
## 18 29-01-1973       New York   CAT      NORMAL TRUE 25-06-2016 25.57631
## 19 09-11-1972     New Jersey   DOG         BAD TRUE 25-06-2016 29.01252
## 20 03-04-1972        Arizona   CAT         BAD TRUE 25-06-2016 26.71407
## 21 05-08-1973    Mississippi  BIRD      NORMAL TRUE 25-06-2016 27.80672
## 22 24-02-1972       Illinois  BIRD         BAD TRUE 25-07-2016 28.43316
## 23 04-10-1972       Illinois   CAT      NORMAL TRUE 25-07-2016 31.47988
## 24 05-04-1972     California   DOG         BAD TRUE 25-07-2016 26.99968
## 25 19-07-1972   Pennsylvania  BIRD      NORMAL TRUE 25-07-2016 29.98974
## 26 15-04-1972      Tennessee   CAT         BAD TRUE 25-07-2016 26.48480
## 27 06-03-1972       Virginia   DOG      NORMAL TRUE 25-07-2016 26.65410
## 28 28-09-1973       Michigan   DOG      NORMAL TRUE 25-07-2016 27.59307
## 29 03-09-1972 North Carolina  BIRD        GOOD TRUE 25-08-2016 27.60005
## 30 28-05-1972          Texas   CAT      NORMAL TRUE 25-08-2016 26.38875
## 31 02-06-1972     California   CAT         BAD TRUE 25-08-2016 27.80489
## 32 25-03-1972     Washington  BIRD        GOOD TRUE 25-08-2016 25.78096
## 33 25-02-1972        Florida   DOG      NORMAL TRUE 25-09-2016 28.67484
## 34 24-10-1972           Ohio   DOG         BAD TRUE 25-09-2016 24.59910
## 35 24-11-1971        Indiana   CAT         BAD TRUE 25-09-2016 28.26769
## 36 08-11-1972       New York   CAT        GOOD TRUE 25-10-2016 26.04528
## 37 10-11-1971   Pennsylvania   CAT         BAD TRUE 25-10-2016 25.03916
##      BMILabel
## 1  OVERWEIGHT
## 2  OVERWEIGHT
## 3  OVERWEIGHT
## 4      NORMAL
## 5  OVERWEIGHT
## 6  OVERWEIGHT
## 7  OVERWEIGHT
## 8  OVERWEIGHT
## 9  OVERWEIGHT
## 10 OVERWEIGHT
## 11 OVERWEIGHT
## 12 OVERWEIGHT
## 13 OVERWEIGHT
## 14     NORMAL
## 15      Obese
## 16 OVERWEIGHT
## 17 OVERWEIGHT
## 18 OVERWEIGHT
## 19 OVERWEIGHT
## 20 OVERWEIGHT
## 21 OVERWEIGHT
## 22 OVERWEIGHT
## 23      Obese
## 24 OVERWEIGHT
## 25 OVERWEIGHT
## 26 OVERWEIGHT
## 27 OVERWEIGHT
## 28 OVERWEIGHT
## 29 OVERWEIGHT
## 30 OVERWEIGHT
## 31 OVERWEIGHT
## 32 OVERWEIGHT
## 33 OVERWEIGHT
## 34     NORMAL
## 35 OVERWEIGHT
## 36 OVERWEIGHT
## 37 OVERWEIGHT
nrow(filter(dfrPatient, Died==TRUE))
## [1] 37
#6.) Hispanic Females
filter(dfrPatient, Race=="HISPANIC" & Gender=="FEMALE")
##          ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/249     Jesus HISPANIC FEMALE   TRUE      159.78       68.31
## 2 AC/SG/068 Valentine HISPANIC FEMALE  FALSE      160.47       68.20
## 3 AC/SG/122    Michal HISPANIC FEMALE  FALSE      160.09       68.94
## 4 AC/SG/234      Luis HISPANIC FEMALE  FALSE      164.88       68.07
##    BirthDate          State Pet HealthGrade  Died RecordDate BMIValue
## 1 23-04-1972        Alabama CAT      NORMAL  TRUE 25-05-2016 26.75713
## 2 15-04-1972      Tennessee CAT         BAD  TRUE 25-07-2016 26.48480
## 3 16-12-1971 South Carolina DOG        GOOD FALSE 25-08-2016 26.89942
## 4 10-11-1971   Pennsylvania CAT         BAD  TRUE 25-10-2016 25.03916
##     BMILabel
## 1 OVERWEIGHT
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 OVERWEIGHT
nrow(filter(dfrPatient, Race=="HISPANIC" & Gender=="FEMALE"))
## [1] 4
#7.) 10 sample records from the dataset using set.seed(707)
set.seed(707)
sample_n(dfrPatient, 10)
##           ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 14 AC/AH/053   Francis    WHITE FEMALE   TRUE      164.70       75.69
## 48 AC/AH/219       Jay    WHITE FEMALE  FALSE      163.47       72.89
## 32 AC/AH/156    George    WHITE   MALE  FALSE      165.62       76.72
## 55 AC/AH/248    Andrea    WHITE   MALE  FALSE      178.64       97.05
## 70 AC/SG/068 Valentine HISPANIC FEMALE  FALSE      160.47       68.20
## 65 AC/SG/055      Evan    WHITE   MALE  FALSE      166.75       79.06
## 80 AC/SG/122    Michal HISPANIC FEMALE  FALSE      160.09       68.94
## 9  AC/AH/045   Shirley    WHITE   MALE  FALSE      181.32       76.90
## 22 AC/AH/100    Michel    WHITE FEMALE  FALSE      161.92       69.92
## 59 AC/SG/008      Dana    WHITE   MALE   TRUE      169.66       77.30
##     BirthDate          State  Pet HealthGrade  Died RecordDate BMIValue
## 14 16-11-1971       Virginia  DOG        GOOD FALSE 25-12-2015 27.90303
## 48 07-04-1972 North Carolina BIRD        GOOD  TRUE 25-04-2016 27.27670
## 32 09-07-1972     California  DOG        GOOD  TRUE 25-02-2016 27.96939
## 55 12-01-1973        Indiana  CAT        GOOD  TRUE 25-05-2016 30.41152
## 70 15-04-1972      Tennessee  CAT         BAD  TRUE 25-07-2016 26.48480
## 65 24-02-1972       Illinois BIRD         BAD  TRUE 25-07-2016 28.43316
## 80 16-12-1971 South Carolina  DOG        GOOD FALSE 25-08-2016 26.89942
## 9  25-12-1971      Louisiana  DOG        GOOD FALSE 25-11-2015 23.39025
## 22 27-12-1972        Georgia  DOG        GOOD FALSE 25-12-2015 26.66861
## 59 26-05-1973         Nevada  DOG        GOOD  TRUE 25-05-2016 26.85472
##      BMILabel
## 14 OVERWEIGHT
## 48 OVERWEIGHT
## 32 OVERWEIGHT
## 55      Obese
## 70 OVERWEIGHT
## 65 OVERWEIGHT
## 80 OVERWEIGHT
## 9      NORMAL
## 22 OVERWEIGHT
## 59 OVERWEIGHT

--------------------

Analysis based on Reporting

--------------------

1- The 10 patients with the highest BMI values:
    a. are all White males;
    b. are mostly non-smokers (8 of 10), and half of them (5 of 10) have died.
    c. In this sample the highest BMI values therefore belong to White males, most of them non-smokers.
    With only 10 records this is an observation, not evidence that race or smoking status causes a high BMI.
2- The 10 patients with the lowest BMI values:
    a. are mostly White (6 of 10) and mostly male (8 of 10);
    b. have mixed health grades (5 BAD, 3 GOOD, 2 NORMAL), and only 1 of the 10 has died;
    c. all have a BMI label of NORMAL,
       i.e. their weight is in the normal range for their height.
3- Most patient records belong to White patients (50 of the 66 complete records).
4- From the mean BMI values by race and gender we can say that
    a. For females the mean BMI differs little between races;
    it is roughly 26 to 27 for every group.
    b. For males the mean BMI varies more between races. The approximate values are
    Black- 23.6, Asian- 25.0, Hispanic- 26.4, White- 27.7
    c. BMI is weight divided by height squared, so these means only show that
      i) Black males in this sample are lighter for their height and White males are heavier for their height;
     ii) they do not show whether weight or height drives the difference, and some groups are very small
         (for example 2 Black males and 2 Asian males), so the means should be read with caution.
5- Of the 37 patients who have died, 21 are female (15 of them White) and 26 are White overall;
   31 of the 37 were non-smokers and 32 had an OVERWEIGHT BMI label.
6- 3 of the 4 Hispanic females have died; 3 of the 4 were non-smokers and all 4 had an OVERWEIGHT BMI label.
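
Shares like the ones quoted in the points above can be tabulated rather than counted by hand from the printed rows. The sketch below uses base R's `table()` and `prop.table()` on a small made-up data frame; `dfrDemo` and its values are illustrative and do not reproduce the patient data.

```r
# Made-up stand-in with the two columns of interest
dfrDemo <- data.frame(
  Gender = c("FEMALE", "FEMALE", "MALE", "MALE", "FEMALE", "MALE"),
  Died   = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Cross-tabulated counts of Died by Gender
tblCounts <- table(dfrDemo$Gender, dfrDemo$Died, dnn = c("Gender", "Died"))
print(tblCounts)

# Row-wise proportions: share of each gender that died
tblShare <- prop.table(tblCounts, margin = 1)
print(round(tblShare, 2))
```

Applied to the cleaned patient data, the same two calls would give the counts and shares by gender, race, or smoking status directly.
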
1.) RMD tool
1. Understood how the RMD (R Markdown) tool works and how to use its syntax.

2.) Analysis
1. Most of the supplied data was analyzed, but the analysis would have been better if the data
had not contained missing values and outliers.
2. The analysis covers only 66 records although the source file had 100, so it reflects
only 66% of the records.