Analysis Of Patient Data

Objective

My objective in doing this exercise is to learn R Markdown (RMD), which I have done using patient data. The patient data was loaded into an RMD file, cleaned in R, and the cleaned data was then analysed.

Problem Definition

1. To identify how many people who smoke have their weight under control
2. To identify which race the majority of the patient population belongs to

Code & Output

knitr Global Options

# for development
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=TRUE, warning=TRUE, message=TRUE, cache=FALSE, tidy=FALSE, fig.path='figures/')
# for production
#knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=FALSE, warning=FALSE, message=FALSE, cache=FALSE, tidy=FALSE, fig.path='figures/')

Load Libraries

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Read Data

# set the working directory (machine-specific path) and read the patient file
setwd("F:/R-BA/scripts")
dfrPatient <- read.csv("./data/patient-data.csv", header=TRUE, stringsAsFactors=FALSE)
intRowCount <- nrow(dfrPatient)
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  False      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  False      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  False      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  False      175.66       94.54
## 5 AC/AH/029    Lavern White Female  False      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   True      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate
## 1 31-01-1972  Georgia,xxx  Dog           2 False 25-11-2015
## 2 09-06-1972     Missouri  Dog           2 False 25-11-2015
## 3 03-07-1972 Pennsylvania None           2 False 25-11-2015
## 4 11-08-1972      Florida  Cat           1 False 25-11-2015
## 5 06-06-1973         Iowa NULL           2  True 25-11-2015
## 6 25-06-1973     Maryland  Dog           2 False 25-11-2015

Total Rows Of Patient File: 100

Add column BMI-Value

# BMI = weight (kg) / height (m)^2, converting height from cm to m
dfrPatient <- mutate(dfrPatient, BMIValue=(WeightInKgs/(HeightInCms/100)^2))
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  False      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  False      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  False      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  False      175.66       94.54
## 5 AC/AH/029    Lavern White Female  False      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   True      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1 31-01-1972  Georgia,xxx  Dog           2 False 25-11-2015 22.89674
## 2 09-06-1972     Missouri  Dog           2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None           2 False 25-11-2015 26.38080
## 4 11-08-1972      Florida  Cat           1 False 25-11-2015 30.63867
## 5 06-06-1973         Iowa NULL           2  True 25-11-2015 26.53567
## 6 25-06-1973     Maryland  Dog           2 False 25-11-2015 27.90487
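The BMI formula inside `mutate()` divides weight in kilograms by the square of height in metres. The small helper below makes the centimetre-to-metre conversion explicit; the name `bmi_value` is hypothetical and not part of the original analysis. Applied to the first two patients it should reproduce the `BMIValue` figures shown above.

```r
# Hypothetical helper: BMI = weight (kg) / height (m)^2
bmi_value <- function(weight_kg, height_cm) {
  height_m <- height_cm / 100   # convert centimetres to metres
  weight_kg / height_m^2
}

# First two patients in the file (AC/AH/001 and AC/AH/017)
bmi_value(c(76.57, 80.43), c(182.87, 179.12))
```

Because the arithmetic is vectorised, the same function can be applied to whole columns, which is what `mutate()` does above.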

Add column BMI-Label

# classify BMI into categories; each lower bound is inclusive, so values of
# exactly 18.5, 25 or 30 get a label instead of falling through to NA
dfrPatient$BMILabel <- ifelse(dfrPatient$BMIValue < 18.50, "UNDERWEIGHT",
                       ifelse(dfrPatient$BMIValue < 25.00, "NORMAL",
                       ifelse(dfrPatient$BMIValue < 30.00, "OVERWEIGHT", "Obese")))
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  False      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  False      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  False      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  False      175.66       94.54
## 5 AC/AH/029    Lavern White Female  False      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   True      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1 31-01-1972  Georgia,xxx  Dog           2 False 25-11-2015 22.89674
## 2 09-06-1972     Missouri  Dog           2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None           2 False 25-11-2015 26.38080
## 4 11-08-1972      Florida  Cat           1 False 25-11-2015 30.63867
## 5 06-06-1973         Iowa NULL           2  True 25-11-2015 26.53567
## 6 25-06-1973     Maryland  Dog           2 False 25-11-2015 27.90487
##     BMILabel
## 1     NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4      Obese
## 5 OVERWEIGHT
## 6 OVERWEIGHT
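An alternative to nested `ifelse()` calls is base R's `cut()`, which keeps all the category boundaries in one place. With `right = FALSE` every interval includes its lower bound, so a BMI of exactly 18.5, 25 or 30 cannot fall between categories. This is a sketch; the helper name `bmi_label` is hypothetical and the labels simply mirror the ones used in this report.

```r
# Hypothetical helper: map BMI values to the four labels used above.
# right = FALSE makes each interval [lower, upper), so 18.5, 25 and 30
# land in NORMAL, OVERWEIGHT and Obese respectively.
bmi_label <- function(bmi) {
  as.character(cut(bmi,
                   breaks = c(-Inf, 18.5, 25, 30, Inf),
                   labels = c("UNDERWEIGHT", "NORMAL", "OVERWEIGHT", "Obese"),
                   right  = FALSE))
}

# Boundary values and a missing value
bmi_label(c(17.9, 18.5, 24.99, 25, 30, NA))
```

`cut()` returns a factor, so it is converted with `as.character()` to match the character column built above; missing BMI values stay `NA`.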
# copy the data and recode the numeric HealthGrade into descriptive labels
df3 <- dfrPatient
df3$HealthGrade <- with(dfrPatient, ifelse(HealthGrade == 1, "Good Health",
                                    ifelse(HealthGrade == 2, "Normal",
                                    ifelse(HealthGrade == 3, "Bad Health", NA))))
summarise(group_by(df3, HealthGrade), n())
## # A tibble: 4 × 2
##   HealthGrade `n()`
##         <chr> <int>
## 1  Bad Health    34
## 2 Good Health    29
## 3      Normal    30
## 4        <NA>     7
head(df3)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  False      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  False      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  False      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  False      175.66       94.54
## 5 AC/AH/029    Lavern White Female  False      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   True      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1 31-01-1972  Georgia,xxx  Dog      Normal False 25-11-2015 22.89674
## 2 09-06-1972     Missouri  Dog      Normal False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None      Normal False 25-11-2015 26.38080
## 4 11-08-1972      Florida  Cat Good Health False 25-11-2015 30.63867
## 5 06-06-1973         Iowa NULL      Normal  True 25-11-2015 26.53567
## 6 25-06-1973     Maryland  Dog      Normal False 25-11-2015 27.90487
##     BMILabel
## 1     NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4      Obese
## 5 OVERWEIGHT
## 6 OVERWEIGHT
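The HealthGrade recoding can also be written as a lookup in a named vector, which avoids nesting one `ifelse()` per code. The sketch below is an illustration only; `grade_labels` and `recode_grade` are hypothetical names. Codes that are not 1, 2 or 3, and missing codes, come back as `NA`, matching the last branch of the nested version.

```r
# Hypothetical lookup table: HealthGrade code -> descriptive label
grade_labels <- c("1" = "Good Health", "2" = "Normal", "3" = "Bad Health")

# Index the named vector by the codes (as character); unknown or missing
# codes return NA. unname() drops the names carried over from the lookup.
recode_grade <- function(grade) unname(grade_labels[as.character(grade)])

recode_grade(c(2, 1, 3, 4, NA))
```

Adding a new grade then only means adding one entry to `grade_labels`.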

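ERROR HANDLING

Each categorical column is first tabulated to expose inconsistent values, and then cleaned one column at a time.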

summarise(group_by(df3, BMILabel), n())
## # A tibble: 3 × 2
##     BMILabel `n()`
##        <chr> <int>
## 1     NORMAL    23
## 2      Obese     6
## 3 OVERWEIGHT    71
summarise(group_by(df3, Gender), n())
## # A tibble: 6 × 2
##    Gender `n()`
##     <chr> <int>
## 1  Female     6
## 2    Male     3
## 3  Female    45
## 4 Female      4
## 5    Male    40
## 6   Male      2
summarise(group_by(df3, Race), n())
## # A tibble: 6 × 2
##        Race `n()`
##       <chr> <int>
## 1     Asian     5
## 2 Bi-Racial     1
## 3     Black     8
## 4       Dog     1
## 5  Hispanic    17
## 6     White    68
summarise(group_by(df3, Died), n())
## # A tibble: 2 × 2
##    Died `n()`
##   <chr> <int>
## 1 False    46
## 2  True    54
summarise(group_by(df3, Pet), n())
## # A tibble: 10 × 2
##      Pet `n()`
##    <chr> <int>
## 1   Bird     9
## 2    Cat    24
## 3    CAT     5
## 4    Dog    28
## 5    DOG     4
## 6  Horse     1
## 7   None    23
## 8   NONE     1
## 9   NULL     3
## 10  <NA>     2
summarise(group_by(df3, Smokes), n())
## # A tibble: 4 × 2
##   Smokes `n()`
##    <chr> <int>
## 1  False    72
## 2     No     6
## 3   True    18
## 4    Yes     4
summarise(group_by(df3, HealthGrade), n())
## # A tibble: 4 × 2
##   HealthGrade `n()`
##         <chr> <int>
## 1  Bad Health    34
## 2 Good Health    29
## 3      Normal    30
## 4        <NA>     7
summarise(group_by(df3, State), n())
## # A tibble: 34 × 2
##          State `n()`
##          <chr> <int>
## 1      Alabama     2
## 2      Arizona     2
## 3   California    13
## 4     Colorado     1
## 5  Connecticut     1
## 6      Florida     8
## 7      Georgia     3
## 8  Georgia,xxx     1
## 9       Hawaii     2
## 10    Illinois     4
## # ... with 24 more rows
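The repeated `summarise(group_by(...), n())` calls above can be condensed into a single base-R call that builds one frequency table per column. The sketch uses a small hypothetical data frame with the same kinds of problems seen above (stray spaces, mixed spellings, a missing value); on the real data the call would be along the lines of `lapply(df3[c("Gender", "Race", "Died", "Pet", "Smokes")], table, useNA = "ifany")`.

```r
# Hypothetical sample with the same kinds of problems seen above
patients <- data.frame(
  Gender = c("Female", " Female", "Male ", "Male", NA),
  Smokes = c("False", "Yes", "True", "No", "False"),
  stringsAsFactors = FALSE
)

# One frequency table per column; useNA = "ifany" adds an NA count
# only when missing values are present
tabs <- lapply(patients, table, useNA = "ifany")
tabs
```

`useNA = "ifany"` is what makes missing values visible here; by default `table()` silently drops them.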

error handling in gender

summarise(group_by(df3, Gender), n())
## # A tibble: 6 × 2
##    Gender `n()`
##     <chr> <int>
## 1  Female     6
## 2    Male     3
## 3  Female    45
## 4 Female      4
## 5    Male    40
## 6   Male      2
df3$Gender <- trimws(toupper(df3$Gender))
summarise(group_by(df3, Gender), n())
## # A tibble: 2 × 2
##   Gender `n()`
##    <chr> <int>
## 1 FEMALE    55
## 2   MALE    45
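The six Gender groups collapse to two because the raw values differ only in case and in leading or trailing spaces. A minimal sketch of the same normalisation as a reusable helper (the name `normalise_category` and the sample values are hypothetical):

```r
# Hypothetical helper: normalise free-text categories by forcing upper
# case and removing leading/trailing whitespace
normalise_category <- function(x) trimws(toupper(x))

raw_gender <- c("Female", " Female", "Female ", "Male", "Male  ")
unique(raw_gender)                      # five distinct raw spellings
table(normalise_category(raw_gender))   # two clean categories
```

The same helper is what the Race and Pet cleaning steps below apply inline.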

error handling in race

summarise(group_by(df3, Race), n())
## # A tibble: 6 × 2
##        Race `n()`
##       <chr> <int>
## 1     Asian     5
## 2 Bi-Racial     1
## 3     Black     8
## 4       Dog     1
## 5  Hispanic    17
## 6     White    68
df3$Race <- trimws(toupper(df3$Race))
df3$Race[df3$Race=="DOG"] <- NA
df3$Race[df3$Race=="BI-RACIAL"] <- NA
summarise(group_by(df3, Race), n())
## # A tibble: 5 × 2
##       Race `n()`
##      <chr> <int>
## 1    ASIAN     5
## 2    BLACK     8
## 3 HISPANIC    17
## 4    WHITE    68
## 5     <NA>     2
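The second problem-definition question (which race the majority of patients belong to) can be read off this table, but expressing the counts as shares makes the answer explicit. The sketch copies the cleaned counts from the table above and leaves out the two `NA` rows, so the percentages are of the 98 patients with a recorded race.

```r
# Race counts after cleaning, copied from the table above (NA excluded)
race_counts <- c(ASIAN = 5, BLACK = 8, HISPANIC = 17, WHITE = 68)

# Share of patients with a recorded race, in percent
round(100 * prop.table(race_counts), 1)

# Category with the largest count
names(which.max(race_counts))
```

On the data frame itself, `prop.table(table(df3$Race))` would give the same shares without copying numbers by hand.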

error handling in died

summarise(group_by(df3, Died), n())
## # A tibble: 2 × 2
##    Died `n()`
##   <chr> <int>
## 1 False    46
## 2  True    54
class(df3$Died)
## [1] "character"
df3$Died <- as.logical(df3$Died)
class(df3$Died)
## [1] "logical"
summarise(group_by(df3, Died), n())
## # A tibble: 2 × 2
##    Died `n()`
##   <lgl> <int>
## 1 FALSE    46
## 2  TRUE    54

error handling in pet

summarise(group_by(df3, Pet), n())
## # A tibble: 10 × 2
##      Pet `n()`
##    <chr> <int>
## 1   Bird     9
## 2    Cat    24
## 3    CAT     5
## 4    Dog    28
## 5    DOG     4
## 6  Horse     1
## 7   None    23
## 8   NONE     1
## 9   NULL     3
## 10  <NA>     2
df3$Pet <- trimws(toupper(df3$Pet))
df3$Pet[df3$Pet=="NONE"] <- NA
df3$Pet[df3$Pet=="NULL"] <- NA
summarise(group_by(df3, Pet), n())
## # A tibble: 5 × 2
##     Pet `n()`
##   <chr> <int>
## 1  BIRD     9
## 2   CAT    29
## 3   DOG    32
## 4 HORSE     1
## 5  <NA>    29
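Recoding `NONE` to `NA` treats "has no pet" as missing data, so the 24 patients recorded as None/NONE are later discarded by `complete.cases()` together with the genuinely missing values. If that is not intended, one option is to keep "no pet" as its own category and treat only placeholders such as `NULL` as missing. This is a sketch on hypothetical values; the label `"NO PET"` is an illustrative choice, not one used in the original data.

```r
# Hypothetical raw pet values resembling the ones tabulated above
raw_pet <- c("Dog", "CAT", "None", "NONE", "NULL", NA, " Bird")

pet <- trimws(toupper(raw_pet))
# Treat only placeholders and blanks as missing...
pet[pet %in% c("NULL", "")] <- NA
# ...but keep "no pet" as a real category (hypothetical label)
pet[pet %in% "NONE"] <- "NO PET"
table(pet, useNA = "ifany")
```

Using `%in%` instead of `==` avoids `NA` comparisons inside the subscript, so existing missing values are left untouched.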

error handling in smokes

summarise(group_by(df3, Smokes), n())
## # A tibble: 4 × 2
##   Smokes `n()`
##    <chr> <int>
## 1  False    72
## 2     No     6
## 3   True    18
## 4    Yes     4
class(df3$Smokes)
## [1] "character"
df3$Smokes <- as.logical(df3$Smokes)
class(df3$Smokes)
## [1] "logical"
summarise(group_by(df3, Smokes), n())
## # A tibble: 3 × 2
##   Smokes `n()`
##    <lgl> <int>
## 1  FALSE    72
## 2   TRUE    18
## 3     NA    10
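`as.logical()` only recognises spellings such as `"TRUE"`, `"True"`, `"true"` and `"T"` (and the FALSE equivalents), so the 6 `"No"` and 4 `"Yes"` values above became `NA`, and those 10 patients are later removed by `complete.cases()`. A sketch that maps Yes/No onto recognised spellings before converting (the sample values are hypothetical):

```r
# Hypothetical raw values with both spellings seen in the Smokes column
raw_smokes <- c("False", "True", "Yes", "No", "False", NA)

# Map YES/NO onto spellings as.logical() understands, then convert;
# anything else unrecognised still becomes NA
smokes <- trimws(toupper(raw_smokes))
smokes[smokes %in% "YES"] <- "TRUE"
smokes[smokes %in% "NO"]  <- "FALSE"
smokes <- as.logical(smokes)
table(smokes, useNA = "ifany")
```

Applied to `df3$Smokes` before `as.logical()`, this would keep those 10 patients in the analysis.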

error handling in State

summarise(group_by(df3, State), n())
## # A tibble: 34 × 2
##          State `n()`
##          <chr> <int>
## 1      Alabama     2
## 2      Arizona     2
## 3   California    13
## 4     Colorado     1
## 5  Connecticut     1
## 6      Florida     8
## 7      Georgia     3
## 8  Georgia,xxx     1
## 9       Hawaii     2
## 10    Illinois     4
## # ... with 24 more rows
df3$State[df3$State=="Georgia,xxx"] <- "Georgia"
summarise(group_by(df3, State), n())
## # A tibble: 33 × 2
##          State `n()`
##          <chr> <int>
## 1      Alabama     2
## 2      Arizona     2
## 3   California    13
## 4     Colorado     1
## 5  Connecticut     1
## 6      Florida     8
## 7      Georgia     4
## 8       Hawaii     2
## 9     Illinois     4
## 10     Indiana     4
## # ... with 23 more rows
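Replacing the exact string `"Georgia,xxx"` fixes only that one value. A pattern-based version removes any trailing comma suffix, assuming (as holds for US state names) that a valid state never contains a comma. The sample values below are hypothetical.

```r
# Hypothetical state values, including a stray suffix like the one above
raw_state <- c("Georgia,xxx", "Georgia", " Iowa", "New York")

# Drop a comma and everything after it, then trim surrounding whitespace
clean_state <- trimws(sub(",.*$", "", raw_state))
clean_state
```

On the real data this would be `df3$State <- trimws(sub(",.*$", "", df3$State))`.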

complete cases

vclComplete <- complete.cases(df3)
# count the rows that have no missing values
sum(vclComplete)
## [1] 60
df3 <- df3[vclComplete, ]
nrow(df3)
## [1] 60
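`complete.cases()` reduced the data from 100 rows to 60. Counting missing values per column before dropping rows shows which columns drive that loss. A sketch on a small hypothetical data frame; on the real data the call would be `colSums(is.na(df3))`.

```r
# Hypothetical mini data frame with scattered missing values
mini <- data.frame(
  Race   = c("WHITE", NA, "ASIAN", "BLACK"),
  Smokes = c(TRUE, FALSE, NA, FALSE),
  Pet    = c("DOG", "CAT", NA, NA),
  stringsAsFactors = FALSE
)

# Missing values per column: shows which columns cause rows to be dropped
colSums(is.na(mini))

# Rows kept by complete.cases()
sum(complete.cases(mini))
```

A row is dropped if any single column is `NA`, so one sparsely filled column can remove many otherwise complete records.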

Reporting

# Display top 10 records based on BMI-Value.
head(arrange(df3, desc(BMIValue)), 10)
##           ID     Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/SG/009    Sammy    WHITE   MALE  FALSE      166.84       88.25
## 2  AC/SG/064      Jon    WHITE   MALE  FALSE      169.16       90.08
## 3  AC/AH/076   Albert    WHITE   MALE  FALSE      176.22       97.67
## 4  AC/AH/022     Lupe    WHITE   MALE  FALSE      175.66       94.54
## 5  AC/AH/248   Andrea    WHITE   MALE  FALSE      178.64       97.05
## 6  AC/SG/067   Thomas    WHITE   MALE  FALSE      167.51       84.15
## 7  AC/AH/052 Courtney    WHITE   MALE   TRUE      175.39       92.22
## 8  AC/AH/127     Jame    WHITE   MALE  FALSE      167.75       82.06
## 9  AC/SG/055     Evan    WHITE   MALE  FALSE      166.75       79.06
## 10 AC/SG/181    Terry HISPANIC   MALE  FALSE      177.14       88.70
##     BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1  04-03-1972      Vermont  DOG Good Health FALSE 25-06-2016 31.70402
## 2  04-10-1972     Illinois  CAT      Normal  TRUE 25-07-2016 31.47988
## 3  08-04-1973    Louisiana  CAT      Normal FALSE 25-12-2015 31.45218
## 4  11-08-1972      Florida  CAT Good Health FALSE 25-11-2015 30.63867
## 5  12-01-1973      Indiana  CAT Good Health  TRUE 25-05-2016 30.41152
## 6  19-07-1972 Pennsylvania BIRD      Normal  TRUE 25-07-2016 29.98974
## 7  16-03-1972      Indiana BIRD  Bad Health FALSE 25-12-2015 29.97888
## 8  29-10-1972        Texas  DOG Good Health  TRUE 25-01-2016 29.16127
## 9  24-02-1972     Illinois BIRD  Bad Health  TRUE 25-07-2016 28.43316
## 10 24-11-1971      Indiana  CAT  Bad Health  TRUE 25-09-2016 28.26769
##      BMILabel
## 1       Obese
## 2       Obese
## 3       Obese
## 4       Obese
## 5       Obese
## 6  OVERWEIGHT
## 7  OVERWEIGHT
## 8  OVERWEIGHT
## 9  OVERWEIGHT
## 10 OVERWEIGHT
# Display bottom 10 records based on BMI-Value.
head(arrange(df3, BMIValue), 10)
##           ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/SG/193    Ronnie    WHITE   MALE   TRUE      185.43       73.63
## 2  AC/SG/099    Leslie    ASIAN   MALE  FALSE      172.72       67.62
## 3  AC/AH/001 Demetrius    WHITE   MALE  FALSE      182.87       76.57
## 4  AC/AH/086      Kyle    BLACK   MALE   TRUE      180.11       75.72
## 5  AC/AH/045   Shirley    WHITE   MALE  FALSE      181.32       76.90
## 6  AC/AH/114      Kris HISPANIC   MALE  FALSE      177.75       74.84
## 7  AC/AH/077     Tommy    BLACK   MALE  FALSE      174.09       72.20
## 8  AC/AH/150     Brett    WHITE   MALE   TRUE      181.56       79.54
## 9  AC/AH/057    Vernon    WHITE FEMALE   TRUE      163.79       65.76
## 10 AC/AH/207    Bobbie    WHITE FEMALE  FALSE      163.01       65.19
##     BirthDate        State  Pet HealthGrade  Died RecordDate BMIValue
## 1  05-06-1973         Iowa  DOG  Bad Health FALSE 25-09-2016 21.41385
## 2  04-02-1972         Ohio  CAT Good Health FALSE 25-07-2016 22.66678
## 3  31-01-1972      Georgia  DOG      Normal FALSE 25-11-2015 22.89674
## 4  12-05-1973      Georgia  CAT  Bad Health FALSE 25-12-2015 23.34183
## 5  25-12-1971    Louisiana  DOG Good Health FALSE 25-11-2015 23.39025
## 6  19-11-1972 Pennsylvania BIRD  Bad Health FALSE 25-01-2016 23.68725
## 7  01-02-1973   Washington  CAT  Bad Health FALSE 25-12-2015 23.82262
## 8  03-05-1972     Kentucky  DOG Good Health  TRUE 25-02-2016 24.12933
## 9  06-01-1972     Illinois  CAT  Bad Health FALSE 25-12-2015 24.51247
## 10 17-05-1973      Florida  DOG      Normal FALSE 25-03-2016 24.53310
##    BMILabel
## 1    NORMAL
## 2    NORMAL
## 3    NORMAL
## 4    NORMAL
## 5    NORMAL
## 6    NORMAL
## 7    NORMAL
## 8    NORMAL
## 9    NORMAL
## 10   NORMAL
# Gender > Race - frequency / counts
summarise(group_by(df3, Gender, Race), n())
## Source: local data frame [8 x 3]
## Groups: Gender [?]
## 
##   Gender     Race `n()`
##    <chr>    <chr> <int>
## 1 FEMALE    ASIAN     2
## 2 FEMALE    BLACK     1
## 3 FEMALE HISPANIC     4
## 4 FEMALE    WHITE    27
## 5   MALE    ASIAN     2
## 6   MALE    BLACK     2
## 7   MALE HISPANIC     5
## 8   MALE    WHITE    17
# Race > Gender - max, min and average values for BMI-Values
summarise(group_by(df3, Race, Gender), min(BMIValue), mean(BMIValue), max(BMIValue))
## Source: local data frame [8 x 5]
## Groups: Race [?]
## 
##       Race Gender `min(BMIValue)` `mean(BMIValue)` `max(BMIValue)`
##      <chr>  <chr>           <dbl>            <dbl>           <dbl>
## 1    ASIAN FEMALE        25.57631         26.88531        28.19431
## 2    ASIAN   MALE        22.66678         24.95782        27.24885
## 3    BLACK FEMALE        26.71407         26.71407        26.71407
## 4    BLACK   MALE        23.34183         23.58223        23.82262
## 5 HISPANIC FEMALE        25.03916         26.29513        26.89942
## 6 HISPANIC   MALE        23.68725         26.39844        28.26769
## 7    WHITE FEMALE        24.51247         26.60612        28.24834
## 8    WHITE   MALE        21.41385         27.53445        31.70402
# all patients who died
filter(df3, Died==TRUE)
##           ID        Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/049      Martin    WHITE FEMALE  FALSE      160.06       72.37
## 2  AC/AH/127        Jame    WHITE   MALE  FALSE      167.75       82.06
## 3  AC/AH/133       Clyde HISPANIC   MALE  FALSE      181.15       83.93
## 4  AC/AH/150       Brett    WHITE   MALE   TRUE      181.56       79.54
## 5  AC/AH/154        Tony    WHITE FEMALE  FALSE      160.03       64.30
## 6  AC/AH/156      George    WHITE   MALE  FALSE      165.62       76.72
## 7  AC/AH/160        Rory    ASIAN FEMALE  FALSE      159.67       71.88
## 8  AC/AH/176       Jerry    ASIAN   MALE  FALSE      175.21       83.65
## 9  AC/AH/180        Drew    WHITE FEMALE  FALSE      160.80       64.77
## 10 AC/AH/186 Christopher    WHITE FEMALE  FALSE      157.95       67.41
## 11 AC/AH/211         Son    WHITE FEMALE  FALSE      157.16       69.64
## 12 AC/AH/219         Jay    WHITE FEMALE  FALSE      163.47       72.89
## 13 AC/AH/233      Marion    WHITE FEMALE  FALSE      163.97       66.71
## 14 AC/AH/248      Andrea    WHITE   MALE  FALSE      178.64       97.05
## 15 AC/AH/249       Jesus HISPANIC FEMALE   TRUE      159.78       68.31
## 16 AC/SG/010        Theo    ASIAN FEMALE  FALSE      159.32       64.92
## 17 AC/SG/016      Jimmie    BLACK FEMALE  FALSE      161.84       69.97
## 18 AC/SG/046        Carl HISPANIC   MALE  FALSE      171.41       81.70
## 19 AC/SG/055        Evan    WHITE   MALE  FALSE      166.75       79.06
## 20 AC/SG/064         Jon    WHITE   MALE  FALSE      169.16       90.08
## 21 AC/SG/065      Shayne    WHITE FEMALE  FALSE      157.01       66.56
## 22 AC/SG/067      Thomas    WHITE   MALE  FALSE      167.51       84.15
## 23 AC/SG/068   Valentine HISPANIC FEMALE  FALSE      160.47       68.20
## 24 AC/SG/084       Brian HISPANIC   MALE  FALSE      174.25       80.93
## 25 AC/SG/101       Jason    WHITE FEMALE  FALSE      159.23       69.96
## 26 AC/SG/123     Darnell    WHITE FEMALE   TRUE      162.32       72.72
## 27 AC/SG/134       Daryl    WHITE FEMALE   TRUE      162.59       69.76
## 28 AC/SG/155     Raymond    WHITE FEMALE  FALSE      158.35       69.72
## 29 AC/SG/165       Elmer    WHITE FEMALE  FALSE      162.18       67.81
## 30 AC/SG/179       Logan    WHITE   MALE  FALSE      183.10       82.47
## 31 AC/SG/181       Terry HISPANIC   MALE  FALSE      177.14       88.70
## 32 AC/SG/197       Stacy    WHITE FEMALE  FALSE      159.44       66.21
## 33 AC/SG/234        Luis HISPANIC FEMALE  FALSE      164.88       68.07
##     BirthDate          State   Pet HealthGrade Died RecordDate BMIValue
## 1  28-04-1972     California HORSE      Normal TRUE 25-12-2015 28.24834
## 2  29-10-1972          Texas   DOG Good Health TRUE 25-01-2016 29.16127
## 3  13-10-1973     Washington   CAT  Bad Health TRUE 25-02-2016 25.57647
## 4  03-05-1972       Kentucky   DOG Good Health TRUE 25-02-2016 24.12933
## 5  30-08-1973     California   DOG Good Health TRUE 25-02-2016 25.10777
## 6  09-07-1972     California   DOG Good Health TRUE 25-02-2016 27.96939
## 7  22-09-1973        Florida   CAT      Normal TRUE 25-02-2016 28.19431
## 8  01-05-1973       Virginia   DOG  Bad Health TRUE 25-03-2016 27.24885
## 9  18-02-1973         Oregon   CAT Good Health TRUE 25-03-2016 25.04966
## 10 06-05-1972     New Jersey   DOG  Bad Health TRUE 25-03-2016 27.01998
## 11 14-07-1973     California   CAT      Normal TRUE 25-04-2016 28.19517
## 12 07-04-1972 North Carolina  BIRD Good Health TRUE 25-04-2016 27.27670
## 13 23-12-1971           Ohio   CAT  Bad Health TRUE 25-04-2016 24.81202
## 14 12-01-1973        Indiana   CAT Good Health TRUE 25-05-2016 30.41152
## 15 23-04-1972        Alabama   CAT      Normal TRUE 25-05-2016 26.75713
## 16 29-01-1973       New York   CAT      Normal TRUE 25-06-2016 25.57631
## 17 03-04-1972        Arizona   CAT  Bad Health TRUE 25-06-2016 26.71407
## 18 05-08-1973    Mississippi  BIRD      Normal TRUE 25-06-2016 27.80672
## 19 24-02-1972       Illinois  BIRD  Bad Health TRUE 25-07-2016 28.43316
## 20 04-10-1972       Illinois   CAT      Normal TRUE 25-07-2016 31.47988
## 21 05-04-1972     California   DOG  Bad Health TRUE 25-07-2016 26.99968
## 22 19-07-1972   Pennsylvania  BIRD      Normal TRUE 25-07-2016 29.98974
## 23 15-04-1972      Tennessee   CAT  Bad Health TRUE 25-07-2016 26.48480
## 24 06-03-1972       Virginia   DOG      Normal TRUE 25-07-2016 26.65410
## 25 28-09-1973       Michigan   DOG      Normal TRUE 25-07-2016 27.59307
## 26 03-09-1972 North Carolina  BIRD Good Health TRUE 25-08-2016 27.60005
## 27 28-05-1972          Texas   CAT      Normal TRUE 25-08-2016 26.38875
## 28 02-06-1972     California   CAT  Bad Health TRUE 25-08-2016 27.80489
## 29 25-03-1972     Washington  BIRD Good Health TRUE 25-08-2016 25.78096
## 30 24-10-1972           Ohio   DOG  Bad Health TRUE 25-09-2016 24.59910
## 31 24-11-1971        Indiana   CAT  Bad Health TRUE 25-09-2016 28.26769
## 32 08-11-1972       New York   CAT Good Health TRUE 25-10-2016 26.04528
## 33 10-11-1971   Pennsylvania   CAT  Bad Health TRUE 25-10-2016 25.03916
##      BMILabel
## 1  OVERWEIGHT
## 2  OVERWEIGHT
## 3  OVERWEIGHT
## 4      NORMAL
## 5  OVERWEIGHT
## 6  OVERWEIGHT
## 7  OVERWEIGHT
## 8  OVERWEIGHT
## 9  OVERWEIGHT
## 10 OVERWEIGHT
## 11 OVERWEIGHT
## 12 OVERWEIGHT
## 13     NORMAL
## 14      Obese
## 15 OVERWEIGHT
## 16 OVERWEIGHT
## 17 OVERWEIGHT
## 18 OVERWEIGHT
## 19 OVERWEIGHT
## 20      Obese
## 21 OVERWEIGHT
## 22 OVERWEIGHT
## 23 OVERWEIGHT
## 24 OVERWEIGHT
## 25 OVERWEIGHT
## 26 OVERWEIGHT
## 27 OVERWEIGHT
## 28 OVERWEIGHT
## 29 OVERWEIGHT
## 30     NORMAL
## 31 OVERWEIGHT
## 32 OVERWEIGHT
## 33 OVERWEIGHT
nrow(filter(df3, Died==TRUE))
## [1] 33
# Hispanic Females
filter(df3, Race=="HISPANIC" & Gender=="FEMALE")
##          ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/249     Jesus HISPANIC FEMALE   TRUE      159.78       68.31
## 2 AC/SG/068 Valentine HISPANIC FEMALE  FALSE      160.47       68.20
## 3 AC/SG/122    Michal HISPANIC FEMALE  FALSE      160.09       68.94
## 4 AC/SG/234      Luis HISPANIC FEMALE  FALSE      164.88       68.07
##    BirthDate          State Pet HealthGrade  Died RecordDate BMIValue
## 1 23-04-1972        Alabama CAT      Normal  TRUE 25-05-2016 26.75713
## 2 15-04-1972      Tennessee CAT  Bad Health  TRUE 25-07-2016 26.48480
## 3 16-12-1971 South Carolina DOG Good Health FALSE 25-08-2016 26.89942
## 4 10-11-1971   Pennsylvania CAT  Bad Health  TRUE 25-10-2016 25.03916
##     BMILabel
## 1 OVERWEIGHT
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 OVERWEIGHT
nrow(filter(df3, Race=="HISPANIC" & Gender=="FEMALE"))
## [1] 4
# 10 sample records from the dataset, using set.seed(707) for reproducibility
set.seed(707)
sample_n(df3, 10)
##           ID     Name     Race Gender Smokes HeightInCms WeightInKgs
## 13 AC/AH/052 Courtney    WHITE   MALE   TRUE      175.39       92.22
## 48 AC/AH/219      Jay    WHITE FEMALE  FALSE      163.47       72.89
## 30 AC/AH/150    Brett    WHITE   MALE   TRUE      181.56       79.54
## 55 AC/AH/248   Andrea    WHITE   MALE  FALSE      178.64       97.05
## 73 AC/SG/084    Brian HISPANIC   MALE  FALSE      174.25       80.93
## 67 AC/SG/064      Jon    WHITE   MALE  FALSE      169.16       90.08
## 80 AC/SG/122   Michal HISPANIC FEMALE  FALSE      160.09       68.94
## 9  AC/AH/045  Shirley    WHITE   MALE  FALSE      181.32       76.90
## 20 AC/AH/086     Kyle    BLACK   MALE   TRUE      180.11       75.72
## 57 AC/SG/002      Jan    WHITE FEMALE   TRUE      161.57       67.92
##     BirthDate          State  Pet HealthGrade  Died RecordDate BMIValue
## 13 16-03-1972        Indiana BIRD  Bad Health FALSE 25-12-2015 29.97888
## 48 07-04-1972 North Carolina BIRD Good Health  TRUE 25-04-2016 27.27670
## 30 03-05-1972       Kentucky  DOG Good Health  TRUE 25-02-2016 24.12933
## 55 12-01-1973        Indiana  CAT Good Health  TRUE 25-05-2016 30.41152
## 73 06-03-1972       Virginia  DOG      Normal  TRUE 25-07-2016 26.65410
## 67 04-10-1972       Illinois  CAT      Normal  TRUE 25-07-2016 31.47988
## 80 16-12-1971 South Carolina  DOG Good Health FALSE 25-08-2016 26.89942
## 9  25-12-1971      Louisiana  DOG Good Health FALSE 25-11-2015 23.39025
## 20 12-05-1973        Georgia  CAT  Bad Health FALSE 25-12-2015 23.34183
## 57 03-07-1973        Arizona  DOG  Bad Health FALSE 25-05-2016 26.01814
##      BMILabel
## 13 OVERWEIGHT
## 48 OVERWEIGHT
## 30     NORMAL
## 55      Obese
## 73 OVERWEIGHT
## 67      Obese
## 80 OVERWEIGHT
## 9      NORMAL
## 20     NORMAL
## 57 OVERWEIGHT

SUMMARY

The dataset contained 45 males and 55 females, of whom only 60 remained for analysis after incomplete records were removed. Among the patients who smoke, most were found to be in health grades 2 and 3 (Normal and Bad Health). The majority of the patient population is White. Hence, if a chemist or a doctor wants to open a clinic, it would be preferable to open it in a locality where a large share of the White population resides.
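The first problem-definition question (how many smokers have their weight under control) is not answered by a dedicated table in the report above. A cross-tabulation of smoking status against BMI category would answer it directly, assuming "weight under control" means a NORMAL BMI label. The sketch uses hypothetical rows; on the cleaned data it would be `table(df3$Smokes, df3$BMILabel)`, and `table(df3$Smokes, df3$HealthGrade)` would check the statement about smokers' health grades.

```r
# Hypothetical cleaned rows; in the analysis above these would be df3
patients <- data.frame(
  Smokes   = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  BMILabel = c("NORMAL", "OVERWEIGHT", "NORMAL", "Obese", "OVERWEIGHT"),
  stringsAsFactors = FALSE
)

# Cross-tabulate smoking status against BMI category
tab <- table(Smokes = patients$Smokes, BMILabel = patients$BMILabel)
tab

# Smokers whose BMI is in the NORMAL range (assumed "weight under control")
tab["TRUE", "NORMAL"]
```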

OBJECTIVE

Hence the objective set at the start of the assignment has been fulfilled: learning to write an RMD file, analysing the patient data, and drawing insights from it.