The objective of this exercise is to learn R Markdown (RMD) using the patient data. The patient data is loaded in an RMD file, cleaned in R, and then analysed to answer two questions:
1. To identify how many people who smoke have their weight under control
2. To identify which race the majority of the patient population belongs to
knitr Global Options
# for development
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=TRUE, warning=TRUE, message=TRUE, cache=FALSE, tidy=FALSE, fig.path='figures/')
# for production
#knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=FALSE, warning=FALSE, message=FALSE, cache=FALSE, tidy=FALSE, fig.path='figures/')
Load Libraries
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Read Data
# set the working directory and read the patient file, keeping text columns as character
setwd("F:/R-BA/scripts")
dfrPatient <- read.csv("./data/patient-data.csv", header=TRUE, stringsAsFactors=FALSE)
intRowCount <- nrow(dfrPatient)
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male False 182.87 76.57
## 2 AC/AH/017 Rosario White Male False 179.12 80.43
## 3 AC/AH/020 Julio Black Male False 169.15 75.48
## 4 AC/AH/022 Lupe White Male False 175.66 94.54
## 5 AC/AH/029 Lavern White Female False 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female True 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate
## 1 31-01-1972 Georgia,xxx Dog 2 False 25-11-2015
## 2 09-06-1972 Missouri Dog 2 False 25-11-2015
## 3 03-07-1972 Pennsylvania None 2 False 25-11-2015
## 4 11-08-1972 Florida Cat 1 False 25-11-2015
## 5 06-06-1973 Iowa NULL 2 True 25-11-2015
## 6 25-06-1973 Maryland Dog 2 False 25-11-2015
Total Rows Of Patient File: 100
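Placeholder strings such as NULL show up in the Pet column of the listing above. One option, shown here only as a sketch on a made-up three-row CSV, is to let read.csv() convert such placeholders to NA at load time through its na.strings argument. The column names and values below are invented for illustration and are not taken from the patient file.

```r
# Sketch on invented data: treat "NULL" and empty fields as missing while reading
csv_text <- "Name,Pet\nAda,Dog\nBob,NULL\nCy,\n"
pets <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE,
                 na.strings = c("NULL", ""))
print(pets)
print(sum(is.na(pets$Pet)))   # number of Pet values read as NA
```

Whether a placeholder really means "missing" is a judgment about the data, so the list passed to na.strings should be chosen per column meaning.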
Add column BMI-Value
# BMI = weight in kg / (height in metres)^2; heights are stored in cm
dfrPatient <- mutate(dfrPatient, BMIValue=(WeightInKgs/(HeightInCms/100)^2))
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male False 182.87 76.57
## 2 AC/AH/017 Rosario White Male False 179.12 80.43
## 3 AC/AH/020 Julio Black Male False 169.15 75.48
## 4 AC/AH/022 Lupe White Male False 175.66 94.54
## 5 AC/AH/029 Lavern White Female False 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female True 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 31-01-1972 Georgia,xxx Dog 2 False 25-11-2015 22.89674
## 2 09-06-1972 Missouri Dog 2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None 2 False 25-11-2015 26.38080
## 4 11-08-1972 Florida Cat 1 False 25-11-2015 30.63867
## 5 06-06-1973 Iowa NULL 2 True 25-11-2015 26.53567
## 6 25-06-1973 Maryland Dog 2 False 25-11-2015 27.90487
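As a quick check of the formula, the first row can be recomputed by hand. The helper name bmi_from_cm_kg is hypothetical and introduced only for this sketch; the height and weight are those of the first record in the listing above, and the result should agree with the BMIValue printed for that row.

```r
# Hypothetical helper (not part of the original script):
# BMI from height in centimetres and weight in kilograms
bmi_from_cm_kg <- function(height_cm, weight_kg) {
  height_m <- height_cm / 100   # convert centimetres to metres
  weight_kg / height_m^2        # BMI = kg / m^2
}

# First patient in the listing: 182.87 cm, 76.57 kg
first_bmi <- bmi_from_cm_kg(182.87, 76.57)
print(round(first_bmi, 5))
```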
Add column BMI-Label
# label each patient by BMI range; each lower bound is inclusive so that
# values of exactly 18.5, 25 or 30 are not left as NA
dfrPatient$BMILabel <- ifelse(dfrPatient$BMIValue < 18.50, "UNDERWEIGHT",
ifelse(dfrPatient$BMIValue < 25.00, "NORMAL",
ifelse(dfrPatient$BMIValue < 30.00, "OVERWEIGHT", "Obese")))
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male False 182.87 76.57
## 2 AC/AH/017 Rosario White Male False 179.12 80.43
## 3 AC/AH/020 Julio Black Male False 169.15 75.48
## 4 AC/AH/022 Lupe White Male False 175.66 94.54
## 5 AC/AH/029 Lavern White Female False 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female True 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 31-01-1972 Georgia,xxx Dog 2 False 25-11-2015 22.89674
## 2 09-06-1972 Missouri Dog 2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None 2 False 25-11-2015 26.38080
## 4 11-08-1972 Florida Cat 1 False 25-11-2015 30.63867
## 5 06-06-1973 Iowa NULL 2 True 25-11-2015 26.53567
## 6 25-06-1973 Maryland Dog 2 False 25-11-2015 27.90487
## BMILabel
## 1 NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 Obese
## 5 OVERWEIGHT
## 6 OVERWEIGHT
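Nested ifelse() calls are easy to get wrong at the category boundaries (18.5, 25 and 30). A possible alternative, sketched here on made-up BMI values rather than the patient data, is base R's cut() with right = FALSE, which makes every interval include its lower bound. The label spellings in this sketch are its own and need not match the ones used in the report.

```r
# Sketch: label BMI values with cut(); right = FALSE gives intervals [lower, upper)
bmi_values <- c(17.0, 18.5, 24.9, 25.0, 30.0, 35.2)   # made-up values, boundaries included
bmi_labels <- cut(bmi_values,
                  breaks = c(-Inf, 18.5, 25, 30, Inf),
                  labels = c("UNDERWEIGHT", "NORMAL", "OVERWEIGHT", "OBESE"),
                  right  = FALSE)
bmi_labels <- as.character(bmi_labels)   # cut() returns a factor
print(data.frame(bmi_values, bmi_labels))
```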
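# work on a copy and replace the numeric health grades with labels;
# any grade other than 1, 2 or 3 becomes NA
df3 <- dfrPatient
df3$HealthGrade <- with(dfrPatient, ifelse(HealthGrade == 1, "Good Health",
ifelse(HealthGrade == 2, "Normal",
ifelse(HealthGrade == 3, "Bad Health", NA))))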
summarise(group_by(df3, HealthGrade), n())
## # A tibble: 4 × 2
## HealthGrade `n()`
## <chr> <int>
## 1 Bad Health 34
## 2 Good Health 29
## 3 Normal 30
## 4 <NA> 7
head(df3)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male False 182.87 76.57
## 2 AC/AH/017 Rosario White Male False 179.12 80.43
## 3 AC/AH/020 Julio Black Male False 169.15 75.48
## 4 AC/AH/022 Lupe White Male False 175.66 94.54
## 5 AC/AH/029 Lavern White Female False 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female True 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 31-01-1972 Georgia,xxx Dog Normal False 25-11-2015 22.89674
## 2 09-06-1972 Missouri Dog Normal False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None Normal False 25-11-2015 26.38080
## 4 11-08-1972 Florida Cat Good Health False 25-11-2015 30.63867
## 5 06-06-1973 Iowa NULL Normal True 25-11-2015 26.53567
## 6 25-06-1973 Maryland Dog Normal False 25-11-2015 27.90487
## BMILabel
## 1 NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 Obese
## 5 OVERWEIGHT
## 6 OVERWEIGHT
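The same recoding can also be written with a named lookup vector, which keeps the code-to-label mapping in one place. This is a sketch on made-up grade codes; the value 99 is only a stand-in for "some code outside 1 to 3" and is not known to occur in the patient file.

```r
# Sketch: recode numeric grades through a named lookup vector
grade_labels <- c("1" = "Good Health", "2" = "Normal", "3" = "Bad Health")
grades  <- c(2, 1, 3, 99, NA)   # made-up codes; 99 stands in for an unknown code
recoded <- unname(grade_labels[as.character(grades)])   # unmatched codes become NA
print(recoded)
```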
Error Handling
summarise(group_by(df3, BMILabel), n())
## # A tibble: 3 × 2
## BMILabel `n()`
## <chr> <int>
## 1 NORMAL 23
## 2 Obese 6
## 3 OVERWEIGHT 71
summarise(group_by(df3, Gender), n())
## # A tibble: 6 × 2
## Gender `n()`
## <chr> <int>
## 1 Female 6
## 2 Male 3
## 3 Female 45
## 4 Female 4
## 5 Male 40
## 6 Male 2
summarise(group_by(df3, Race), n())
## # A tibble: 6 × 2
## Race `n()`
## <chr> <int>
## 1 Asian 5
## 2 Bi-Racial 1
## 3 Black 8
## 4 Dog 1
## 5 Hispanic 17
## 6 White 68
summarise(group_by(df3, Died), n())
## # A tibble: 2 × 2
## Died `n()`
## <chr> <int>
## 1 False 46
## 2 True 54
summarise(group_by(df3, Pet), n())
## # A tibble: 10 × 2
## Pet `n()`
## <chr> <int>
## 1 Bird 9
## 2 Cat 24
## 3 CAT 5
## 4 Dog 28
## 5 DOG 4
## 6 Horse 1
## 7 None 23
## 8 NONE 1
## 9 NULL 3
## 10 <NA> 2
summarise(group_by(df3, Smokes), n())
## # A tibble: 4 × 2
## Smokes `n()`
## <chr> <int>
## 1 False 72
## 2 No 6
## 3 True 18
## 4 Yes 4
summarise(group_by(df3, HealthGrade), n())
## # A tibble: 4 × 2
## HealthGrade `n()`
## <chr> <int>
## 1 Bad Health 34
## 2 Good Health 29
## 3 Normal 30
## 4 <NA> 7
summarise(group_by(df3, State), n())
## # A tibble: 34 × 2
## State `n()`
## <chr> <int>
## 1 Alabama 2
## 2 Arizona 2
## 3 California 13
## 4 Colorado 1
## 5 Connecticut 1
## 6 Florida 8
## 7 Georgia 3
## 8 Georgia,xxx 1
## 9 Hawaii 2
## 10 Illinois 4
## # ... with 24 more rows
summarise(group_by(df3, Gender), n())
## # A tibble: 6 × 2
## Gender `n()`
## <chr> <int>
## 1 Female 6
## 2 Male 3
## 3 Female 45
## 4 Female 4
## 5 Male 40
## 6 Male 2
df3$Gender <- trimws(toupper(df3$Gender))
summarise(group_by(df3, Gender), n())
## # A tibble: 2 × 2
## Gender `n()`
## <chr> <int>
## 1 FEMALE 55
## 2 MALE 45
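The six Gender groups collapse to two because the raw values differ only in letter case and surrounding whitespace. The effect can be reproduced on a small made-up vector (the spellings below are assumptions chosen for illustration, not copied from the file):

```r
# Sketch: values that look alike but differ in case or padding count as separate groups
raw_gender <- c("Female", " Female", "Female ", "female", "Male", "MALE ")
print(length(unique(raw_gender)))   # distinct raw spellings

# Upper-case and trim, as in the cleaning step above
clean_gender <- trimws(toupper(raw_gender))
print(table(clean_gender))
```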
summarise(group_by(df3, Race), n())
## # A tibble: 6 × 2
## Race `n()`
## <chr> <int>
## 1 Asian 5
## 2 Bi-Racial 1
## 3 Black 8
## 4 Dog 1
## 5 Hispanic 17
## 6 White 68
df3$Race <- trimws(toupper(df3$Race))
df3$Race[df3$Race=="DOG"] <- NA
df3$Race[df3$Race=="BI-RACIAL"] <- NA
summarise(group_by(df3, Race), n())
## # A tibble: 5 × 2
## Race `n()`
## <chr> <int>
## 1 ASIAN 5
## 2 BLACK 8
## 3 HISPANIC 17
## 4 WHITE 68
## 5 <NA> 2
summarise(group_by(df3, Died), n())
## # A tibble: 2 × 2
## Died `n()`
## <chr> <int>
## 1 False 46
## 2 True 54
class(df3$Died)
## [1] "character"
df3$Died <- as.logical(df3$Died)
class(df3$Died)
## [1] "logical"
summarise(group_by(df3, Died), n())
## # A tibble: 2 × 2
## Died `n()`
## <lgl> <int>
## 1 FALSE 46
## 2 TRUE 54
summarise(group_by(df3, Pet), n())
## # A tibble: 10 × 2
## Pet `n()`
## <chr> <int>
## 1 Bird 9
## 2 Cat 24
## 3 CAT 5
## 4 Dog 28
## 5 DOG 4
## 6 Horse 1
## 7 None 23
## 8 NONE 1
## 9 NULL 3
## 10 <NA> 2
df3$Pet <- trimws(toupper(df3$Pet))
df3$Pet[df3$Pet=="NONE"] <- NA
df3$Pet[df3$Pet=="NULL"] <- NA
summarise(group_by(df3, Pet), n())
## # A tibble: 5 × 2
## Pet `n()`
## <chr> <int>
## 1 BIRD 9
## 2 CAT 29
## 3 DOG 32
## 4 HORSE 1
## 5 <NA> 29
summarise(group_by(df3, Smokes), n())
## # A tibble: 4 × 2
## Smokes `n()`
## <chr> <int>
## 1 False 72
## 2 No 6
## 3 True 18
## 4 Yes 4
class(df3$Smokes)
## [1] "character"
df3$Smokes <- as.logical(df3$Smokes)
class(df3$Smokes)
## [1] "logical"
summarise(group_by(df3, Smokes), n())
## # A tibble: 3 × 2
## Smokes `n()`
## <lgl> <int>
## 1 FALSE 72
## 2 TRUE 18
## 3 NA 10
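as.logical() recognises spellings such as "True" and "False" but not "Yes" and "No", which is why those 10 rows turn into NA above. If Yes/No are meant as valid answers, one possible approach is to map them explicitly before converting. The following is a sketch on made-up values, not the method used in this report:

```r
# Sketch: map several spellings of yes/no to logical values
raw_smokes <- c("True", "False", "Yes", "No", "yes ", "maybe")   # made-up examples
key <- toupper(trimws(raw_smokes))                               # normalise case and padding
smokes <- ifelse(key %in% c("TRUE", "YES"), TRUE,
          ifelse(key %in% c("FALSE", "NO"), FALSE, NA))          # anything else stays NA
print(smokes)
```

Applying such a mapping to the patient data would change the NA counts and therefore the number of complete records reported later.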
summarise(group_by(df3, State), n())
## # A tibble: 34 × 2
## State `n()`
## <chr> <int>
## 1 Alabama 2
## 2 Arizona 2
## 3 California 13
## 4 Colorado 1
## 5 Connecticut 1
## 6 Florida 8
## 7 Georgia 3
## 8 Georgia,xxx 1
## 9 Hawaii 2
## 10 Illinois 4
## # ... with 24 more rows
df3$State[df3$State=="Georgia,xxx"] <- "Georgia"
summarise(group_by(df3, State), n())
## # A tibble: 33 × 2
## State `n()`
## <chr> <int>
## 1 Alabama 2
## 2 Arizona 2
## 3 California 13
## 4 Colorado 1
## 5 Connecticut 1
## 6 Florida 8
## 7 Georgia 4
## 8 Hawaii 2
## 9 Illinois 4
## 10 Indiana 4
## # ... with 23 more rows
vclComplete <- complete.cases(df3)
# count the complete rows (there is no is.true() function in base R)
sum(vclComplete)
## [1] 60
df3 <- df3[vclComplete, ]
nrow(df3)
## [1] 60
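Since complete.cases() drops a row if any column is NA, it can help to look at how many missing values each column contributes before filtering. A minimal sketch on a made-up data frame (the name toy and its contents are invented for illustration):

```r
# Sketch: count missing values per column before dropping incomplete rows
toy <- data.frame(
  Race   = c("WHITE", NA, "ASIAN", "BLACK"),
  Pet    = c("DOG", "CAT", NA, NA),
  Smokes = c(TRUE, FALSE, NA, TRUE),
  stringsAsFactors = FALSE
)
na_per_column <- colSums(is.na(toy))   # number of NA values in each column
keep <- complete.cases(toy)            # TRUE for rows with no NA in any column
print(na_per_column)
print(sum(keep))                       # rows that would survive the filter
```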
# Display top 10 records based on BMI-Value.
head(arrange(df3, desc(BMIValue)), 10)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/SG/009 Sammy WHITE MALE FALSE 166.84 88.25
## 2 AC/SG/064 Jon WHITE MALE FALSE 169.16 90.08
## 3 AC/AH/076 Albert WHITE MALE FALSE 176.22 97.67
## 4 AC/AH/022 Lupe WHITE MALE FALSE 175.66 94.54
## 5 AC/AH/248 Andrea WHITE MALE FALSE 178.64 97.05
## 6 AC/SG/067 Thomas WHITE MALE FALSE 167.51 84.15
## 7 AC/AH/052 Courtney WHITE MALE TRUE 175.39 92.22
## 8 AC/AH/127 Jame WHITE MALE FALSE 167.75 82.06
## 9 AC/SG/055 Evan WHITE MALE FALSE 166.75 79.06
## 10 AC/SG/181 Terry HISPANIC MALE FALSE 177.14 88.70
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 04-03-1972 Vermont DOG Good Health FALSE 25-06-2016 31.70402
## 2 04-10-1972 Illinois CAT Normal TRUE 25-07-2016 31.47988
## 3 08-04-1973 Louisiana CAT Normal FALSE 25-12-2015 31.45218
## 4 11-08-1972 Florida CAT Good Health FALSE 25-11-2015 30.63867
## 5 12-01-1973 Indiana CAT Good Health TRUE 25-05-2016 30.41152
## 6 19-07-1972 Pennsylvania BIRD Normal TRUE 25-07-2016 29.98974
## 7 16-03-1972 Indiana BIRD Bad Health FALSE 25-12-2015 29.97888
## 8 29-10-1972 Texas DOG Good Health TRUE 25-01-2016 29.16127
## 9 24-02-1972 Illinois BIRD Bad Health TRUE 25-07-2016 28.43316
## 10 24-11-1971 Indiana CAT Bad Health TRUE 25-09-2016 28.26769
## BMILabel
## 1 Obese
## 2 Obese
## 3 Obese
## 4 Obese
## 5 Obese
## 6 OVERWEIGHT
## 7 OVERWEIGHT
## 8 OVERWEIGHT
## 9 OVERWEIGHT
## 10 OVERWEIGHT
# Display bottom 10 records based on BMI-Value.
head(arrange(df3, BMIValue), 10)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/SG/193 Ronnie WHITE MALE TRUE 185.43 73.63
## 2 AC/SG/099 Leslie ASIAN MALE FALSE 172.72 67.62
## 3 AC/AH/001 Demetrius WHITE MALE FALSE 182.87 76.57
## 4 AC/AH/086 Kyle BLACK MALE TRUE 180.11 75.72
## 5 AC/AH/045 Shirley WHITE MALE FALSE 181.32 76.90
## 6 AC/AH/114 Kris HISPANIC MALE FALSE 177.75 74.84
## 7 AC/AH/077 Tommy BLACK MALE FALSE 174.09 72.20
## 8 AC/AH/150 Brett WHITE MALE TRUE 181.56 79.54
## 9 AC/AH/057 Vernon WHITE FEMALE TRUE 163.79 65.76
## 10 AC/AH/207 Bobbie WHITE FEMALE FALSE 163.01 65.19
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 05-06-1973 Iowa DOG Bad Health FALSE 25-09-2016 21.41385
## 2 04-02-1972 Ohio CAT Good Health FALSE 25-07-2016 22.66678
## 3 31-01-1972 Georgia DOG Normal FALSE 25-11-2015 22.89674
## 4 12-05-1973 Georgia CAT Bad Health FALSE 25-12-2015 23.34183
## 5 25-12-1971 Louisiana DOG Good Health FALSE 25-11-2015 23.39025
## 6 19-11-1972 Pennsylvania BIRD Bad Health FALSE 25-01-2016 23.68725
## 7 01-02-1973 Washington CAT Bad Health FALSE 25-12-2015 23.82262
## 8 03-05-1972 Kentucky DOG Good Health TRUE 25-02-2016 24.12933
## 9 06-01-1972 Illinois CAT Bad Health FALSE 25-12-2015 24.51247
## 10 17-05-1973 Florida DOG Normal FALSE 25-03-2016 24.53310
## BMILabel
## 1 NORMAL
## 2 NORMAL
## 3 NORMAL
## 4 NORMAL
## 5 NORMAL
## 6 NORMAL
## 7 NORMAL
## 8 NORMAL
## 9 NORMAL
## 10 NORMAL
# Gender > Race - frequency / counts
summarise(group_by(df3, Gender, Race), n())
## Source: local data frame [8 x 3]
## Groups: Gender [?]
##
## Gender Race `n()`
## <chr> <chr> <int>
## 1 FEMALE ASIAN 2
## 2 FEMALE BLACK 1
## 3 FEMALE HISPANIC 4
## 4 FEMALE WHITE 27
## 5 MALE ASIAN 2
## 6 MALE BLACK 2
## 7 MALE HISPANIC 5
## 8 MALE WHITE 17
# Race > Gender - max, min and average values for BMI-Values
summarise(group_by(df3, Race, Gender), min(BMIValue), mean(BMIValue), max(BMIValue))
## Source: local data frame [8 x 5]
## Groups: Race [?]
##
## Race Gender `min(BMIValue)` `mean(BMIValue)` `max(BMIValue)`
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 ASIAN FEMALE 25.57631 26.88531 28.19431
## 2 ASIAN MALE 22.66678 24.95782 27.24885
## 3 BLACK FEMALE 26.71407 26.71407 26.71407
## 4 BLACK MALE 23.34183 23.58223 23.82262
## 5 HISPANIC FEMALE 25.03916 26.29513 26.89942
## 6 HISPANIC MALE 23.68725 26.39844 28.26769
## 7 WHITE FEMALE 24.51247 26.60612 28.24834
## 8 WHITE MALE 21.41385 27.53445 31.70402
# all patients who died
filter(df3, Died==TRUE)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/049 Martin WHITE FEMALE FALSE 160.06 72.37
## 2 AC/AH/127 Jame WHITE MALE FALSE 167.75 82.06
## 3 AC/AH/133 Clyde HISPANIC MALE FALSE 181.15 83.93
## 4 AC/AH/150 Brett WHITE MALE TRUE 181.56 79.54
## 5 AC/AH/154 Tony WHITE FEMALE FALSE 160.03 64.30
## 6 AC/AH/156 George WHITE MALE FALSE 165.62 76.72
## 7 AC/AH/160 Rory ASIAN FEMALE FALSE 159.67 71.88
## 8 AC/AH/176 Jerry ASIAN MALE FALSE 175.21 83.65
## 9 AC/AH/180 Drew WHITE FEMALE FALSE 160.80 64.77
## 10 AC/AH/186 Christopher WHITE FEMALE FALSE 157.95 67.41
## 11 AC/AH/211 Son WHITE FEMALE FALSE 157.16 69.64
## 12 AC/AH/219 Jay WHITE FEMALE FALSE 163.47 72.89
## 13 AC/AH/233 Marion WHITE FEMALE FALSE 163.97 66.71
## 14 AC/AH/248 Andrea WHITE MALE FALSE 178.64 97.05
## 15 AC/AH/249 Jesus HISPANIC FEMALE TRUE 159.78 68.31
## 16 AC/SG/010 Theo ASIAN FEMALE FALSE 159.32 64.92
## 17 AC/SG/016 Jimmie BLACK FEMALE FALSE 161.84 69.97
## 18 AC/SG/046 Carl HISPANIC MALE FALSE 171.41 81.70
## 19 AC/SG/055 Evan WHITE MALE FALSE 166.75 79.06
## 20 AC/SG/064 Jon WHITE MALE FALSE 169.16 90.08
## 21 AC/SG/065 Shayne WHITE FEMALE FALSE 157.01 66.56
## 22 AC/SG/067 Thomas WHITE MALE FALSE 167.51 84.15
## 23 AC/SG/068 Valentine HISPANIC FEMALE FALSE 160.47 68.20
## 24 AC/SG/084 Brian HISPANIC MALE FALSE 174.25 80.93
## 25 AC/SG/101 Jason WHITE FEMALE FALSE 159.23 69.96
## 26 AC/SG/123 Darnell WHITE FEMALE TRUE 162.32 72.72
## 27 AC/SG/134 Daryl WHITE FEMALE TRUE 162.59 69.76
## 28 AC/SG/155 Raymond WHITE FEMALE FALSE 158.35 69.72
## 29 AC/SG/165 Elmer WHITE FEMALE FALSE 162.18 67.81
## 30 AC/SG/179 Logan WHITE MALE FALSE 183.10 82.47
## 31 AC/SG/181 Terry HISPANIC MALE FALSE 177.14 88.70
## 32 AC/SG/197 Stacy WHITE FEMALE FALSE 159.44 66.21
## 33 AC/SG/234 Luis HISPANIC FEMALE FALSE 164.88 68.07
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 28-04-1972 California HORSE Normal TRUE 25-12-2015 28.24834
## 2 29-10-1972 Texas DOG Good Health TRUE 25-01-2016 29.16127
## 3 13-10-1973 Washington CAT Bad Health TRUE 25-02-2016 25.57647
## 4 03-05-1972 Kentucky DOG Good Health TRUE 25-02-2016 24.12933
## 5 30-08-1973 California DOG Good Health TRUE 25-02-2016 25.10777
## 6 09-07-1972 California DOG Good Health TRUE 25-02-2016 27.96939
## 7 22-09-1973 Florida CAT Normal TRUE 25-02-2016 28.19431
## 8 01-05-1973 Virginia DOG Bad Health TRUE 25-03-2016 27.24885
## 9 18-02-1973 Oregon CAT Good Health TRUE 25-03-2016 25.04966
## 10 06-05-1972 New Jersey DOG Bad Health TRUE 25-03-2016 27.01998
## 11 14-07-1973 California CAT Normal TRUE 25-04-2016 28.19517
## 12 07-04-1972 North Carolina BIRD Good Health TRUE 25-04-2016 27.27670
## 13 23-12-1971 Ohio CAT Bad Health TRUE 25-04-2016 24.81202
## 14 12-01-1973 Indiana CAT Good Health TRUE 25-05-2016 30.41152
## 15 23-04-1972 Alabama CAT Normal TRUE 25-05-2016 26.75713
## 16 29-01-1973 New York CAT Normal TRUE 25-06-2016 25.57631
## 17 03-04-1972 Arizona CAT Bad Health TRUE 25-06-2016 26.71407
## 18 05-08-1973 Mississippi BIRD Normal TRUE 25-06-2016 27.80672
## 19 24-02-1972 Illinois BIRD Bad Health TRUE 25-07-2016 28.43316
## 20 04-10-1972 Illinois CAT Normal TRUE 25-07-2016 31.47988
## 21 05-04-1972 California DOG Bad Health TRUE 25-07-2016 26.99968
## 22 19-07-1972 Pennsylvania BIRD Normal TRUE 25-07-2016 29.98974
## 23 15-04-1972 Tennessee CAT Bad Health TRUE 25-07-2016 26.48480
## 24 06-03-1972 Virginia DOG Normal TRUE 25-07-2016 26.65410
## 25 28-09-1973 Michigan DOG Normal TRUE 25-07-2016 27.59307
## 26 03-09-1972 North Carolina BIRD Good Health TRUE 25-08-2016 27.60005
## 27 28-05-1972 Texas CAT Normal TRUE 25-08-2016 26.38875
## 28 02-06-1972 California CAT Bad Health TRUE 25-08-2016 27.80489
## 29 25-03-1972 Washington BIRD Good Health TRUE 25-08-2016 25.78096
## 30 24-10-1972 Ohio DOG Bad Health TRUE 25-09-2016 24.59910
## 31 24-11-1971 Indiana CAT Bad Health TRUE 25-09-2016 28.26769
## 32 08-11-1972 New York CAT Good Health TRUE 25-10-2016 26.04528
## 33 10-11-1971 Pennsylvania CAT Bad Health TRUE 25-10-2016 25.03916
## BMILabel
## 1 OVERWEIGHT
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 NORMAL
## 5 OVERWEIGHT
## 6 OVERWEIGHT
## 7 OVERWEIGHT
## 8 OVERWEIGHT
## 9 OVERWEIGHT
## 10 OVERWEIGHT
## 11 OVERWEIGHT
## 12 OVERWEIGHT
## 13 NORMAL
## 14 Obese
## 15 OVERWEIGHT
## 16 OVERWEIGHT
## 17 OVERWEIGHT
## 18 OVERWEIGHT
## 19 OVERWEIGHT
## 20 Obese
## 21 OVERWEIGHT
## 22 OVERWEIGHT
## 23 OVERWEIGHT
## 24 OVERWEIGHT
## 25 OVERWEIGHT
## 26 OVERWEIGHT
## 27 OVERWEIGHT
## 28 OVERWEIGHT
## 29 OVERWEIGHT
## 30 NORMAL
## 31 OVERWEIGHT
## 32 OVERWEIGHT
## 33 OVERWEIGHT
nrow(filter(df3, Died==TRUE))
## [1] 33
# Hispanic Females
filter(df3, Race=="HISPANIC" & Gender=="FEMALE")
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/249 Jesus HISPANIC FEMALE TRUE 159.78 68.31
## 2 AC/SG/068 Valentine HISPANIC FEMALE FALSE 160.47 68.20
## 3 AC/SG/122 Michal HISPANIC FEMALE FALSE 160.09 68.94
## 4 AC/SG/234 Luis HISPANIC FEMALE FALSE 164.88 68.07
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 23-04-1972 Alabama CAT Normal TRUE 25-05-2016 26.75713
## 2 15-04-1972 Tennessee CAT Bad Health TRUE 25-07-2016 26.48480
## 3 16-12-1971 South Carolina DOG Good Health FALSE 25-08-2016 26.89942
## 4 10-11-1971 Pennsylvania CAT Bad Health TRUE 25-10-2016 25.03916
## BMILabel
## 1 OVERWEIGHT
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 OVERWEIGHT
nrow(filter(df3, Race=="HISPANIC" & Gender=="FEMALE"))
## [1] 4
# 10 sample records from the dataset using set.seed(707)
set.seed(707)
sample_n(df3, 10)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 13 AC/AH/052 Courtney WHITE MALE TRUE 175.39 92.22
## 48 AC/AH/219 Jay WHITE FEMALE FALSE 163.47 72.89
## 30 AC/AH/150 Brett WHITE MALE TRUE 181.56 79.54
## 55 AC/AH/248 Andrea WHITE MALE FALSE 178.64 97.05
## 73 AC/SG/084 Brian HISPANIC MALE FALSE 174.25 80.93
## 67 AC/SG/064 Jon WHITE MALE FALSE 169.16 90.08
## 80 AC/SG/122 Michal HISPANIC FEMALE FALSE 160.09 68.94
## 9 AC/AH/045 Shirley WHITE MALE FALSE 181.32 76.90
## 20 AC/AH/086 Kyle BLACK MALE TRUE 180.11 75.72
## 57 AC/SG/002 Jan WHITE FEMALE TRUE 161.57 67.92
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 13 16-03-1972 Indiana BIRD Bad Health FALSE 25-12-2015 29.97888
## 48 07-04-1972 North Carolina BIRD Good Health TRUE 25-04-2016 27.27670
## 30 03-05-1972 Kentucky DOG Good Health TRUE 25-02-2016 24.12933
## 55 12-01-1973 Indiana CAT Good Health TRUE 25-05-2016 30.41152
## 73 06-03-1972 Virginia DOG Normal TRUE 25-07-2016 26.65410
## 67 04-10-1972 Illinois CAT Normal TRUE 25-07-2016 31.47988
## 80 16-12-1971 South Carolina DOG Good Health FALSE 25-08-2016 26.89942
## 9 25-12-1971 Louisiana DOG Good Health FALSE 25-11-2015 23.39025
## 20 12-05-1973 Georgia CAT Bad Health FALSE 25-12-2015 23.34183
## 57 03-07-1973 Arizona DOG Bad Health FALSE 25-05-2016 26.01814
## BMILabel
## 13 OVERWEIGHT
## 48 OVERWEIGHT
## 30 NORMAL
## 55 Obese
## 73 OVERWEIGHT
## 67 Obese
## 80 OVERWEIGHT
## 9 NORMAL
## 20 NORMAL
## 57 OVERWEIGHT
The dataset contained 45 males and 55 females, of which only the 60 complete records were analysed. Among the patients who smoke, most were found to be in Health-Grade 2 and 3. The majority of the patient population is White. Hence, if a chemist or a doctor wants to open a clinic, it would be preferable to open it in a locality where a majority of the residents are White.
Hence the objective set at the start of the assignment has been fulfilled: learning to work with an RMD file, analysing the patient data, and drawing insights from it.
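The first question, how many smokers have their weight under control, could be answered directly with a cross-tabulation of smoking status against BMI label. The sketch below runs on a small made-up data frame; the name toy and its values are assumptions for illustration and are not results from the patient data.

```r
# Sketch on invented data: cross-tabulate smoking status against BMI label
toy <- data.frame(
  Smokes   = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  BMILabel = c("NORMAL", "OVERWEIGHT", "NORMAL", "OVERWEIGHT", "NORMAL", "OBESE"),
  stringsAsFactors = FALSE
)
smoke_by_bmi <- table(Smokes = toy$Smokes, BMILabel = toy$BMILabel)
print(smoke_by_bmi)

# Smokers whose BMI label is NORMAL
smokers_normal <- sum(toy$Smokes & toy$BMILabel == "NORMAL")
print(smokers_normal)
```

Run against the cleaned df3, the same two calls would give the count behind the first objective.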