To understand how the R Markdown (RMD) tool works using its various syntax elements, converting an RMD file into Markdown with the knitr package and finally into HTML
Analyze the patient data with the help of the dplyr package
Data Manipulation
1. Understand and learn how to use dplyr properly
knitr Global Options
# for development
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=TRUE, warning=TRUE, message=TRUE, cache=FALSE, tidy=FALSE, fig.path='figures/')
# for production
#knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=FALSE, warning=FALSE, message=FALSE, cache=FALSE, tidy=FALSE, fig.path='figures/')
Working Directory
setwd("F:/Management/Trimester 3/8. R/Class/R-BA/R_scripts")
Load Libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Read Data
# Reading the Patient-Data .csv file
dfrPatient <- read.csv("./data/patient-data.csv", header=TRUE, stringsAsFactors=FALSE)
intRowCount <- nrow(dfrPatient)
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male FALSE 182.87 76.57
## 2 AC/AH/017 Rosario White Male FALSE 179.12 80.43
## 3 AC/AH/020 Julio Black Male FALSE 169.15 75.48
## 4 AC/AH/022 Lupe White Male FALSE 175.66 94.54
## 5 AC/AH/029 Lavern White Female FALSE 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female TRUE 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate
## 1 31-01-1972 Georgia,xxx Dog 2 FALSE 25-11-2015
## 2 09-06-1972 Missouri Dog 2 FALSE 25-11-2015
## 3 03-07-1972 Pennsylvania None 2 FALSE 25-11-2015
## 4 11-08-1972 Florida Cat 1 FALSE 25-11-2015
## 5 06-06-1973 Iowa NULL 2 TRUE 25-11-2015
## 6 25-06-1973 Maryland Dog 2 FALSE 25-11-2015
Total Rows Of Patient File: 100
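As a side note, placeholder strings such as the NULL visible in the Pet column can be mapped to NA while the file is being read, through the na.strings argument of read.csv. The sketch below is only an illustration: it reads a small made-up CSV passed in through the text argument (the csvText content and the dfrToy name are assumptions, not the real patient file).

```r
# Hypothetical three-row CSV standing in for patient-data.csv
csvText <- "ID,Pet,HealthGrade\n1,Dog,2\n2,NULL,99\n3,,1"

# na.strings lists the strings that should be read as NA
dfrToy <- read.csv(text = csvText,
                   na.strings = c("NA", "NULL", ""),
                   stringsAsFactors = FALSE)

# Pet now holds real NA values instead of "NULL" or an empty string
is.na(dfrToy$Pet)
```

Values that are wrong rather than missing (for example the 99 in HealthGrade) still need a separate clean-up step.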
Add column BMI-Value
# Provide New Column BMI-Value (BodyMassIndex)
dfrPatient <- mutate(dfrPatient, BMIValue=(WeightInKgs/(HeightInCms/100)^2))
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male FALSE 182.87 76.57
## 2 AC/AH/017 Rosario White Male FALSE 179.12 80.43
## 3 AC/AH/020 Julio Black Male FALSE 169.15 75.48
## 4 AC/AH/022 Lupe White Male FALSE 175.66 94.54
## 5 AC/AH/029 Lavern White Female FALSE 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female TRUE 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 31-01-1972 Georgia,xxx Dog 2 FALSE 25-11-2015 22.89674
## 2 09-06-1972 Missouri Dog 2 FALSE 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None 2 FALSE 25-11-2015 26.38080
## 4 11-08-1972 Florida Cat 1 FALSE 25-11-2015 30.63867
## 5 06-06-1973 Iowa NULL 2 TRUE 25-11-2015 26.53567
## 6 25-06-1973 Maryland Dog 2 FALSE 25-11-2015 27.90487
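The BMI formula can be checked by hand for the first record, using the height and weight shown in the table above. This is a small stand-alone sketch; the variable names are chosen here for illustration only.

```r
# Height and weight of the first record (AC/AH/001) from the table above
numWeightKg <- 76.57
numHeightCm <- 182.87

# BMI = weight in kg divided by the square of the height in metres
numHeightM <- numHeightCm / 100
numBMI <- numWeightKg / numHeightM^2

round(numBMI, 5)
```

The result agrees with the BMIValue reported for that record (22.89674).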
Add column BMI-Label
# Provide New Column BMI-Label based on BMI-Value
dfrPatient <- mutate(dfrPatient, BMILabel=NA)
# Each test is only reached when the previous one failed, so values that fall
# exactly on 18.50, 25.00 or 30.00 are labelled instead of becoming NA
dfrPatient$BMILabel <- ifelse(dfrPatient$BMIValue < 18.50, "UNDERWEIGHT",
                       ifelse(dfrPatient$BMIValue < 25.00, "NORMAL",
                       ifelse(dfrPatient$BMIValue < 30.00, "OVERWEIGHT", "Obese")))
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male FALSE 182.87 76.57
## 2 AC/AH/017 Rosario White Male FALSE 179.12 80.43
## 3 AC/AH/020 Julio Black Male FALSE 169.15 75.48
## 4 AC/AH/022 Lupe White Male FALSE 175.66 94.54
## 5 AC/AH/029 Lavern White Female FALSE 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female TRUE 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 31-01-1972 Georgia,xxx Dog 2 FALSE 25-11-2015 22.89674
## 2 09-06-1972 Missouri Dog 2 FALSE 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None 2 FALSE 25-11-2015 26.38080
## 4 11-08-1972 Florida Cat 1 FALSE 25-11-2015 30.63867
## 5 06-06-1973 Iowa NULL 2 TRUE 25-11-2015 26.53567
## 6 25-06-1973 Maryland Dog 2 FALSE 25-11-2015 27.90487
## BMILabel
## 1 NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 Obese
## 5 OVERWEIGHT
## 6 OVERWEIGHT
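An alternative to nested ifelse calls is base R's cut function, which assigns every value to exactly one interval, so a value sitting on a cut-off cannot be left unlabelled. The sketch below uses made-up BMI values and upper-case labels chosen for illustration; it is not the code that produced the output above.

```r
# Toy BMI values, including the exact cut-offs 18.5, 25 and 30
vnmBMI <- c(17.2, 18.5, 22.9, 25.0, 27.9, 30.0, 31.7)

# right = FALSE closes each interval on the left: [18.5, 25), [25, 30), [30, Inf)
fctLabel <- cut(vnmBMI,
                breaks = c(-Inf, 18.5, 25, 30, Inf),
                labels = c("UNDERWEIGHT", "NORMAL", "OVERWEIGHT", "OBESE"),
                right  = FALSE)

data.frame(BMIValue = vnmBMI, BMILabel = fctLabel)
```

cut returns a factor; wrap it in as.character if a plain character column is preferred.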
Using summarise function to check for any error values or missing data in columns
# Error Handling
#1. BMILabel
summarise(group_by(dfrPatient, BMILabel), n())
## # A tibble: 3 × 2
## BMILabel `n()`
## <chr> <int>
## 1 NORMAL 23
## 2 Obese 6
## 3 OVERWEIGHT 71
#2. Gender
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 6 × 2
## Gender `n()`
## <chr> <int>
## 1 Female 6
## 2 Male 3
## 3 Female 45
## 4 Female 4
## 5 Male 40
## 6 Male 2
#3. Race
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
## Race `n()`
## <chr> <int>
## 1 Asian 5
## 2 Bi-Racial 1
## 3 Black 8
## 4 Dog 1
## 5 Hispanic 17
## 6 White 68
#4. Died
summarise(group_by(dfrPatient, Died), n())
## # A tibble: 2 × 2
## Died `n()`
## <lgl> <int>
## 1 FALSE 46
## 2 TRUE 54
#5. Pet
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 10 × 2
## Pet `n()`
## <chr> <int>
## 1 Bird 9
## 2 Cat 24
## 3 CAT 5
## 4 Dog 28
## 5 DOG 4
## 6 Horse 1
## 7 None 23
## 8 NONE 1
## 9 NULL 3
## 10 <NA> 2
#6. Smokes
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 4 × 2
## Smokes `n()`
## <chr> <int>
## 1 FALSE 72
## 2 No 6
## 3 TRUE 18
## 4 Yes 4
#7. HealthGrade
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
## HealthGrade `n()`
## <int> <int>
## 1 1 29
## 2 2 30
## 3 3 34
## 4 99 7
#8. State
summarise(group_by(dfrPatient, State), n())
## # A tibble: 34 × 2
## State `n()`
## <chr> <int>
## 1 Alabama 2
## 2 Arizona 2
## 3 California 13
## 4 Colorado 1
## 5 Connecticut 1
## 6 Florida 8
## 7 Georgia 3
## 8 Georgia,xxx 1
## 9 Hawaii 2
## 10 Illinois 4
## # ... with 24 more rows
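The same kind of frequency check can be run over several columns in one call. The sketch below uses base R's table on a small made-up data frame (dfrToy is an assumption standing in for dfrPatient); useNA = "ifany" keeps missing values visible in the counts.

```r
# Hypothetical toy data with the kinds of problems seen above
dfrToy <- data.frame(
  Gender = c("Male", " Male", "Female", "Female ", "Male"),
  Pet    = c("Dog", "DOG", "None", NA, "NULL"),
  stringsAsFactors = FALSE
)

# One frequency table per column, returned as a named list
lstCounts <- lapply(dfrToy, table, useNA = "ifany")
lstCounts
```

Values that differ only in case or surrounding spaces show up as separate entries, which is how the problems in Gender, Pet and Smokes become visible.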
Error handling in gender
#---
# There seem to be leading or trailing spaces, hence removing them
# Using trimws() and toupper() to ensure the correctness of data
#---
# Before modification
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 6 × 2
## Gender `n()`
## <chr> <int>
## 1 Female 6
## 2 Male 3
## 3 Female 45
## 4 Female 4
## 5 Male 40
## 6 Male 2
dfrPatient$Gender <- trimws(toupper(dfrPatient$Gender))
# After modification
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 2 × 2
## Gender `n()`
## <chr> <int>
## 1 FEMALE 55
## 2 MALE 45
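The six groups in the "before" table look identical in print because they differ only in surrounding spaces. A small made-up vector (an assumption, not the real column) shows the effect: unique prints the values in quotes, which makes the stray spaces visible.

```r
# Hypothetical values that differ only in whitespace and case
vcsGender <- c("Male", " Male", "Male ", "Female", " Female", "female ")

unique(vcsGender)                     # six distinct strings

# Upper-case first, then strip leading and trailing whitespace
vcsClean <- trimws(toupper(vcsGender))
table(vcsClean)                       # two groups remain
```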
Error handling in race
#---
# Removing the irrelevant data to ensure the correctness of data
#---
# Before modification
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
## Race `n()`
## <chr> <int>
## 1 Asian 5
## 2 Bi-Racial 1
## 3 Black 8
## 4 Dog 1
## 5 Hispanic 17
## 6 White 68
dfrPatient$Race <- trimws(toupper(dfrPatient$Race))
dfrPatient$Race[dfrPatient$Race=="DOG"] <- NA
dfrPatient$Race[dfrPatient$Race=="BI-RACIAL"] <- NA
# After modification
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 5 × 2
## Race `n()`
## <chr> <int>
## 1 ASIAN 5
## 2 BLACK 8
## 3 HISPANIC 17
## 4 WHITE 68
## 5 <NA> 2
Error handling in pet
#---
# Maintaining uniformity & converting None, NULL to NA
#---
# Before modification
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 10 × 2
## Pet `n()`
## <chr> <int>
## 1 Bird 9
## 2 Cat 24
## 3 CAT 5
## 4 Dog 28
## 5 DOG 4
## 6 Horse 1
## 7 None 23
## 8 NONE 1
## 9 NULL 3
## 10 <NA> 2
dfrPatient$Pet <- trimws(toupper(dfrPatient$Pet))
dfrPatient$Pet[dfrPatient$Pet=="NONE"] <- NA
dfrPatient$Pet[dfrPatient$Pet=="NULL"] <- NA
# After modification
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 5 × 2
## Pet `n()`
## <chr> <int>
## 1 BIRD 9
## 2 CAT 29
## 3 DOG 32
## 4 HORSE 1
## 5 <NA> 29
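Several placeholder values can be replaced in one step with %in%, which never returns NA and therefore behaves predictably when the column already contains missing values. The vector below is made up for illustration. Note also that "NONE" may mean "has no pet" rather than "unknown"; treating it as NA is the choice made in this report, and it means those patients are dropped later when incomplete records are removed.

```r
# Hypothetical Pet values, including an existing NA
vcsPet <- c("Dog", "DOG", "None", "NONE", "NULL", NA, "Cat")

vcsPet <- trimws(toupper(vcsPet))

# Replace every placeholder in a single assignment
vcsPet[vcsPet %in% c("NONE", "NULL")] <- NA

table(vcsPet, useNA = "ifany")
```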
Error handling in smokes
#---
# Maintaining uniformity, converting character to logical
#---
# Before modification
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 4 × 2
## Smokes `n()`
## <chr> <int>
## 1 FALSE 72
## 2 No 6
## 3 TRUE 18
## 4 Yes 4
class(dfrPatient$Smokes)
## [1] "character"
dfrPatient$Smokes[dfrPatient$Smokes=="Yes"]<-"TRUE"
dfrPatient$Smokes[dfrPatient$Smokes=="No"] <-"FALSE"
dfrPatient$Smokes <- as.logical(dfrPatient$Smokes)
# After modification
class(dfrPatient$Smokes)
## [1] "logical"
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 2 × 2
## Smokes `n()`
## <lgl> <int>
## 1 FALSE 78
## 2 TRUE 22
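A slightly more defensive variant of the same conversion normalises case and whitespace first and turns anything unrecognised into NA, so unexpected entries stay visible. This is a sketch on made-up values, not the code used for the output above.

```r
# Hypothetical Smokes values, including one unexpected entry
vcsSmokes <- c("TRUE", "FALSE", "Yes", "No", "yes ", "maybe")

vcsNorm <- toupper(trimws(vcsSmokes))

# TRUE/YES -> TRUE, FALSE/NO -> FALSE, everything else -> NA
vblSmokes <- ifelse(vcsNorm %in% c("TRUE", "YES"), TRUE,
             ifelse(vcsNorm %in% c("FALSE", "NO"), FALSE, NA))

class(vblSmokes)
table(vblSmokes, useNA = "ifany")
```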
Error handling in HealthGrade
#---
# Mapping the character values to the column
#---
# Before modification
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
## HealthGrade `n()`
## <int> <int>
## 1 1 29
## 2 2 30
## 3 3 34
## 4 99 7
class(dfrPatient$HealthGrade)
## [1] "integer"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==1] <- "GOOD"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==2] <- "NORMAL"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==3] <- "BAD"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==99] <- NA
# After modification
class(dfrPatient$HealthGrade)
## [1] "character"
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
## HealthGrade `n()`
## <chr> <int>
## 1 BAD 34
## 2 GOOD 29
## 3 NORMAL 30
## 4 <NA> 7
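The step-by-step replacement above works because the first assignment turns the integer column into character, and the later comparisons such as == 2 are then made against the strings "2", "3" and "99". The mapping can also be written in one step with factor, where any code not listed in levels becomes NA. The sketch below uses a made-up vector of grades.

```r
# Hypothetical grade codes; 99 stands for "not recorded"
vinGrade <- c(1L, 2L, 3L, 99L, 2L)

# Codes 1, 2, 3 are mapped to labels; unlisted codes (99) become NA
vcsGrade <- as.character(
  factor(vinGrade, levels = c(1, 2, 3), labels = c("GOOD", "NORMAL", "BAD"))
)

vcsGrade
```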
Error handling in State
#---
# Correcting the state value
#---
# Before modification
summarise(group_by(dfrPatient, State), n())
## # A tibble: 34 × 2
## State `n()`
## <chr> <int>
## 1 Alabama 2
## 2 Arizona 2
## 3 California 13
## 4 Colorado 1
## 5 Connecticut 1
## 6 Florida 8
## 7 Georgia 3
## 8 Georgia,xxx 1
## 9 Hawaii 2
## 10 Illinois 4
## # ... with 24 more rows
dfrPatient$State[dfrPatient$State=="Georgia,xxx"] <- "Georgia"
# After modification
summarise(group_by(dfrPatient, State), n())
## # A tibble: 33 × 2
## State `n()`
## <chr> <int>
## 1 Alabama 2
## 2 Arizona 2
## 3 California 13
## 4 Colorado 1
## 5 Connecticut 1
## 6 Florida 8
## 7 Georgia 4
## 8 Hawaii 2
## 9 Illinois 4
## 10 Indiana 4
## # ... with 23 more rows
print(dfrPatient$State[dfrPatient$State=="Georgia,xxx"])
## character(0)
Handling NA values
#---
# Removing all records with NA in any columns using complete.cases
#---
vclComplete <- complete.cases(dfrPatient)
# Number of complete records (rows without NA in any column)
sum(vclComplete)
## [1] 66
dfrPatient <- dfrPatient[vclComplete, ]
nrow(dfrPatient)
## [1] 66
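Before dropping rows it can help to see which columns contribute the missing values. The sketch below shows complete.cases together with a per-column NA count on a small made-up data frame (dfrToy and its values are assumptions).

```r
# Hypothetical data: only the first row is complete
dfrToy <- data.frame(
  Name = c("A", "B", "C", "D"),
  Pet  = c("DOG", NA, "CAT", NA),
  Race = c("WHITE", "ASIAN", NA, NA),
  stringsAsFactors = FALSE
)

vclComplete <- complete.cases(dfrToy)  # TRUE where a row has no NA
sum(vclComplete)                       # number of rows that will be kept
colSums(is.na(dfrToy))                 # NA count per column

dfrKept <- dfrToy[vclComplete, ]
nrow(dfrKept)
```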
#1.) Display top 10 records based on BMI-Value
head(arrange(dfrPatient, desc(BMIValue)), 10)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/SG/009 Sammy WHITE MALE FALSE 166.84 88.25
## 2 AC/SG/064 Jon WHITE MALE FALSE 169.16 90.08
## 3 AC/AH/076 Albert WHITE MALE FALSE 176.22 97.67
## 4 AC/AH/022 Lupe WHITE MALE FALSE 175.66 94.54
## 5 AC/AH/248 Andrea WHITE MALE FALSE 178.64 97.05
## 6 AC/SG/067 Thomas WHITE MALE FALSE 167.51 84.15
## 7 AC/AH/052 Courtney WHITE MALE TRUE 175.39 92.22
## 8 AC/AH/159 Edward WHITE MALE FALSE 181.64 96.91
## 9 AC/AH/127 Jame WHITE MALE FALSE 167.75 82.06
## 10 AC/SG/015 Shaun WHITE MALE TRUE 170.51 84.35
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 04-03-1972 Vermont DOG GOOD FALSE 25-06-2016 31.70402
## 2 04-10-1972 Illinois CAT NORMAL TRUE 25-07-2016 31.47988
## 3 08-04-1973 Louisiana CAT NORMAL FALSE 25-12-2015 31.45218
## 4 11-08-1972 Florida CAT GOOD FALSE 25-11-2015 30.63867
## 5 12-01-1973 Indiana CAT GOOD TRUE 25-05-2016 30.41152
## 6 19-07-1972 Pennsylvania BIRD NORMAL TRUE 25-07-2016 29.98974
## 7 16-03-1972 Indiana BIRD BAD FALSE 25-12-2015 29.97888
## 8 04-12-1972 Connecticut CAT NORMAL FALSE 25-02-2016 29.37282
## 9 29-10-1972 Texas DOG GOOD TRUE 25-01-2016 29.16127
## 10 09-11-1972 New Jersey DOG BAD TRUE 25-06-2016 29.01252
## BMILabel
## 1 Obese
## 2 Obese
## 3 Obese
## 4 Obese
## 5 Obese
## 6 OVERWEIGHT
## 7 OVERWEIGHT
## 8 OVERWEIGHT
## 9 OVERWEIGHT
## 10 OVERWEIGHT
#2.) Display bottom 10 records based on BMI-Value.
head(arrange(dfrPatient, BMIValue), 10)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/SG/193 Ronnie WHITE MALE TRUE 185.43 73.63
## 2 AC/SG/099 Leslie ASIAN MALE FALSE 172.72 67.62
## 3 AC/AH/001 Demetrius WHITE MALE FALSE 182.87 76.57
## 4 AC/AH/086 Kyle BLACK MALE TRUE 180.11 75.72
## 5 AC/AH/045 Shirley WHITE MALE FALSE 181.32 76.90
## 6 AC/AH/114 Kris HISPANIC MALE FALSE 177.75 74.84
## 7 AC/AH/077 Tommy BLACK MALE FALSE 174.09 72.20
## 8 AC/AH/150 Brett WHITE MALE TRUE 181.56 79.54
## 9 AC/AH/057 Vernon WHITE FEMALE TRUE 163.79 65.76
## 10 AC/AH/207 Bobbie WHITE FEMALE FALSE 163.01 65.19
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 05-06-1973 Iowa DOG BAD FALSE 25-09-2016 21.41385
## 2 04-02-1972 Ohio CAT GOOD FALSE 25-07-2016 22.66678
## 3 31-01-1972 Georgia DOG NORMAL FALSE 25-11-2015 22.89674
## 4 12-05-1973 Georgia CAT BAD FALSE 25-12-2015 23.34183
## 5 25-12-1971 Louisiana DOG GOOD FALSE 25-11-2015 23.39025
## 6 19-11-1972 Pennsylvania BIRD BAD FALSE 25-01-2016 23.68725
## 7 01-02-1973 Washington CAT BAD FALSE 25-12-2015 23.82262
## 8 03-05-1972 Kentucky DOG GOOD TRUE 25-02-2016 24.12933
## 9 06-01-1972 Illinois CAT BAD FALSE 25-12-2015 24.51247
## 10 17-05-1973 Florida DOG NORMAL FALSE 25-03-2016 24.53310
## BMILabel
## 1 NORMAL
## 2 NORMAL
## 3 NORMAL
## 4 NORMAL
## 5 NORMAL
## 6 NORMAL
## 7 NORMAL
## 8 NORMAL
## 9 NORMAL
## 10 NORMAL
#3.) Gender > Race - frequency / counts
summarise(group_by(dfrPatient, Gender, Race), n())
## Source: local data frame [8 x 3]
## Groups: Gender [?]
##
## Gender Race `n()`
## <chr> <chr> <int>
## 1 FEMALE ASIAN 2
## 2 FEMALE BLACK 1
## 3 FEMALE HISPANIC 4
## 4 FEMALE WHITE 29
## 5 MALE ASIAN 2
## 6 MALE BLACK 2
## 7 MALE HISPANIC 5
## 8 MALE WHITE 21
#4.) Race > Gender - max, min and average values for BMI-Values
summarise(group_by(dfrPatient, Race, Gender), min(BMIValue), mean(BMIValue), max(BMIValue))
## Source: local data frame [8 x 5]
## Groups: Race [?]
##
## Race Gender `min(BMIValue)` `mean(BMIValue)` `max(BMIValue)`
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 ASIAN FEMALE 25.57631 26.88531 28.19431
## 2 ASIAN MALE 22.66678 24.95782 27.24885
## 3 BLACK FEMALE 26.71407 26.71407 26.71407
## 4 BLACK MALE 23.34183 23.58223 23.82262
## 5 HISPANIC FEMALE 25.03916 26.29513 26.89942
## 6 HISPANIC MALE 23.68725 26.39844 28.26769
## 7 WHITE FEMALE 24.51247 26.61097 28.24834
## 8 WHITE MALE 21.41385 27.71432 31.70402
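Naming the summary columns inside summarise gives headers that are easier to read than `min(BMIValue)`, and adding n() shows how many records each mean is based on. The sketch below assumes dplyr is installed and uses a made-up data frame; the column names Count, MinBMI, MeanBMI and MaxBMI are chosen here for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for dfrPatient
dfrToy <- data.frame(
  Race     = c("ASIAN", "ASIAN", "WHITE", "WHITE", "WHITE"),
  Gender   = c("MALE", "MALE", "FEMALE", "FEMALE", "MALE"),
  BMIValue = c(22, 26, 25, 27, 30),
  stringsAsFactors = FALSE
)

# Named arguments become the column names of the result
dfrStats <- summarise(group_by(dfrToy, Race, Gender),
                      Count   = n(),
                      MinBMI  = min(BMIValue),
                      MeanBMI = mean(BMIValue),
                      MaxBMI  = max(BMIValue))
dfrStats
```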
#5.) All Dead people
filter(dfrPatient, Died==TRUE)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/049 Martin WHITE FEMALE FALSE 160.06 72.37
## 2 AC/AH/127 Jame WHITE MALE FALSE 167.75 82.06
## 3 AC/AH/133 Clyde HISPANIC MALE FALSE 181.15 83.93
## 4 AC/AH/150 Brett WHITE MALE TRUE 181.56 79.54
## 5 AC/AH/154 Tony WHITE FEMALE FALSE 160.03 64.30
## 6 AC/AH/156 George WHITE MALE FALSE 165.62 76.72
## 7 AC/AH/160 Rory ASIAN FEMALE FALSE 159.67 71.88
## 8 AC/AH/171 Devin WHITE FEMALE FALSE 163.35 70.46
## 9 AC/AH/176 Jerry ASIAN MALE FALSE 175.21 83.65
## 10 AC/AH/180 Drew WHITE FEMALE FALSE 160.80 64.77
## 11 AC/AH/186 Christopher WHITE FEMALE FALSE 157.95 67.41
## 12 AC/AH/211 Son WHITE FEMALE FALSE 157.16 69.64
## 13 AC/AH/219 Jay WHITE FEMALE FALSE 163.47 72.89
## 14 AC/AH/233 Marion WHITE FEMALE FALSE 163.97 66.71
## 15 AC/AH/248 Andrea WHITE MALE FALSE 178.64 97.05
## 16 AC/AH/249 Jesus HISPANIC FEMALE TRUE 159.78 68.31
## 17 AC/SG/008 Dana WHITE MALE TRUE 169.66 77.30
## 18 AC/SG/010 Theo ASIAN FEMALE FALSE 159.32 64.92
## 19 AC/SG/015 Shaun WHITE MALE TRUE 170.51 84.35
## 20 AC/SG/016 Jimmie BLACK FEMALE FALSE 161.84 69.97
## 21 AC/SG/046 Carl HISPANIC MALE FALSE 171.41 81.70
## 22 AC/SG/055 Evan WHITE MALE FALSE 166.75 79.06
## 23 AC/SG/064 Jon WHITE MALE FALSE 169.16 90.08
## 24 AC/SG/065 Shayne WHITE FEMALE FALSE 157.01 66.56
## 25 AC/SG/067 Thomas WHITE MALE FALSE 167.51 84.15
## 26 AC/SG/068 Valentine HISPANIC FEMALE FALSE 160.47 68.20
## 27 AC/SG/084 Brian HISPANIC MALE FALSE 174.25 80.93
## 28 AC/SG/101 Jason WHITE FEMALE FALSE 159.23 69.96
## 29 AC/SG/123 Darnell WHITE FEMALE TRUE 162.32 72.72
## 30 AC/SG/134 Daryl WHITE FEMALE TRUE 162.59 69.76
## 31 AC/SG/155 Raymond WHITE FEMALE FALSE 158.35 69.72
## 32 AC/SG/165 Elmer WHITE FEMALE FALSE 162.18 67.81
## 33 AC/SG/172 Whitney WHITE MALE FALSE 171.45 84.29
## 34 AC/SG/179 Logan WHITE MALE FALSE 183.10 82.47
## 35 AC/SG/181 Terry HISPANIC MALE FALSE 177.14 88.70
## 36 AC/SG/197 Stacy WHITE FEMALE FALSE 159.44 66.21
## 37 AC/SG/234 Luis HISPANIC FEMALE FALSE 164.88 68.07
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 28-04-1972 California HORSE NORMAL TRUE 25-12-2015 28.24834
## 2 29-10-1972 Texas DOG GOOD TRUE 25-01-2016 29.16127
## 3 13-10-1973 Washington CAT BAD TRUE 25-02-2016 25.57647
## 4 03-05-1972 Kentucky DOG GOOD TRUE 25-02-2016 24.12933
## 5 30-08-1973 California DOG GOOD TRUE 25-02-2016 25.10777
## 6 09-07-1972 California DOG GOOD TRUE 25-02-2016 27.96939
## 7 22-09-1973 Florida CAT NORMAL TRUE 25-02-2016 28.19431
## 8 16-04-1973 California BIRD BAD TRUE 25-03-2016 26.40611
## 9 01-05-1973 Virginia DOG BAD TRUE 25-03-2016 27.24885
## 10 18-02-1973 Oregon CAT GOOD TRUE 25-03-2016 25.04966
## 11 06-05-1972 New Jersey DOG BAD TRUE 25-03-2016 27.01998
## 12 14-07-1973 California CAT NORMAL TRUE 25-04-2016 28.19517
## 13 07-04-1972 North Carolina BIRD GOOD TRUE 25-04-2016 27.27670
## 14 23-12-1971 Ohio CAT BAD TRUE 25-04-2016 24.81202
## 15 12-01-1973 Indiana CAT GOOD TRUE 25-05-2016 30.41152
## 16 23-04-1972 Alabama CAT NORMAL TRUE 25-05-2016 26.75713
## 17 26-05-1973 Nevada DOG GOOD TRUE 25-05-2016 26.85472
## 18 29-01-1973 New York CAT NORMAL TRUE 25-06-2016 25.57631
## 19 09-11-1972 New Jersey DOG BAD TRUE 25-06-2016 29.01252
## 20 03-04-1972 Arizona CAT BAD TRUE 25-06-2016 26.71407
## 21 05-08-1973 Mississippi BIRD NORMAL TRUE 25-06-2016 27.80672
## 22 24-02-1972 Illinois BIRD BAD TRUE 25-07-2016 28.43316
## 23 04-10-1972 Illinois CAT NORMAL TRUE 25-07-2016 31.47988
## 24 05-04-1972 California DOG BAD TRUE 25-07-2016 26.99968
## 25 19-07-1972 Pennsylvania BIRD NORMAL TRUE 25-07-2016 29.98974
## 26 15-04-1972 Tennessee CAT BAD TRUE 25-07-2016 26.48480
## 27 06-03-1972 Virginia DOG NORMAL TRUE 25-07-2016 26.65410
## 28 28-09-1973 Michigan DOG NORMAL TRUE 25-07-2016 27.59307
## 29 03-09-1972 North Carolina BIRD GOOD TRUE 25-08-2016 27.60005
## 30 28-05-1972 Texas CAT NORMAL TRUE 25-08-2016 26.38875
## 31 02-06-1972 California CAT BAD TRUE 25-08-2016 27.80489
## 32 25-03-1972 Washington BIRD GOOD TRUE 25-08-2016 25.78096
## 33 25-02-1972 Florida DOG NORMAL TRUE 25-09-2016 28.67484
## 34 24-10-1972 Ohio DOG BAD TRUE 25-09-2016 24.59910
## 35 24-11-1971 Indiana CAT BAD TRUE 25-09-2016 28.26769
## 36 08-11-1972 New York CAT GOOD TRUE 25-10-2016 26.04528
## 37 10-11-1971 Pennsylvania CAT BAD TRUE 25-10-2016 25.03916
## BMILabel
## 1 OVERWEIGHT
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 NORMAL
## 5 OVERWEIGHT
## 6 OVERWEIGHT
## 7 OVERWEIGHT
## 8 OVERWEIGHT
## 9 OVERWEIGHT
## 10 OVERWEIGHT
## 11 OVERWEIGHT
## 12 OVERWEIGHT
## 13 OVERWEIGHT
## 14 NORMAL
## 15 Obese
## 16 OVERWEIGHT
## 17 OVERWEIGHT
## 18 OVERWEIGHT
## 19 OVERWEIGHT
## 20 OVERWEIGHT
## 21 OVERWEIGHT
## 22 OVERWEIGHT
## 23 Obese
## 24 OVERWEIGHT
## 25 OVERWEIGHT
## 26 OVERWEIGHT
## 27 OVERWEIGHT
## 28 OVERWEIGHT
## 29 OVERWEIGHT
## 30 OVERWEIGHT
## 31 OVERWEIGHT
## 32 OVERWEIGHT
## 33 OVERWEIGHT
## 34 NORMAL
## 35 OVERWEIGHT
## 36 OVERWEIGHT
## 37 OVERWEIGHT
nrow(filter(dfrPatient, Died==TRUE))
## [1] 37
#6.) Hispanic Females
filter(dfrPatient, Race=="HISPANIC" & Gender=="FEMALE")
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/249 Jesus HISPANIC FEMALE TRUE 159.78 68.31
## 2 AC/SG/068 Valentine HISPANIC FEMALE FALSE 160.47 68.20
## 3 AC/SG/122 Michal HISPANIC FEMALE FALSE 160.09 68.94
## 4 AC/SG/234 Luis HISPANIC FEMALE FALSE 164.88 68.07
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 1 23-04-1972 Alabama CAT NORMAL TRUE 25-05-2016 26.75713
## 2 15-04-1972 Tennessee CAT BAD TRUE 25-07-2016 26.48480
## 3 16-12-1971 South Carolina DOG GOOD FALSE 25-08-2016 26.89942
## 4 10-11-1971 Pennsylvania CAT BAD TRUE 25-10-2016 25.03916
## BMILabel
## 1 OVERWEIGHT
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 OVERWEIGHT
nrow(filter(dfrPatient, Race=="HISPANIC" & Gender=="FEMALE"))
## [1] 4
#7.) 10 sample records from the dataset using set.seed(707)
set.seed(707)
sample_n(dfrPatient, 10)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 14 AC/AH/053 Francis WHITE FEMALE TRUE 164.70 75.69
## 48 AC/AH/219 Jay WHITE FEMALE FALSE 163.47 72.89
## 32 AC/AH/156 George WHITE MALE FALSE 165.62 76.72
## 55 AC/AH/248 Andrea WHITE MALE FALSE 178.64 97.05
## 70 AC/SG/068 Valentine HISPANIC FEMALE FALSE 160.47 68.20
## 65 AC/SG/055 Evan WHITE MALE FALSE 166.75 79.06
## 80 AC/SG/122 Michal HISPANIC FEMALE FALSE 160.09 68.94
## 9 AC/AH/045 Shirley WHITE MALE FALSE 181.32 76.90
## 22 AC/AH/100 Michel WHITE FEMALE FALSE 161.92 69.92
## 59 AC/SG/008 Dana WHITE MALE TRUE 169.66 77.30
## BirthDate State Pet HealthGrade Died RecordDate BMIValue
## 14 16-11-1971 Virginia DOG GOOD FALSE 25-12-2015 27.90303
## 48 07-04-1972 North Carolina BIRD GOOD TRUE 25-04-2016 27.27670
## 32 09-07-1972 California DOG GOOD TRUE 25-02-2016 27.96939
## 55 12-01-1973 Indiana CAT GOOD TRUE 25-05-2016 30.41152
## 70 15-04-1972 Tennessee CAT BAD TRUE 25-07-2016 26.48480
## 65 24-02-1972 Illinois BIRD BAD TRUE 25-07-2016 28.43316
## 80 16-12-1971 South Carolina DOG GOOD FALSE 25-08-2016 26.89942
## 9 25-12-1971 Louisiana DOG GOOD FALSE 25-11-2015 23.39025
## 22 27-12-1972 Georgia DOG GOOD FALSE 25-12-2015 26.66861
## 59 26-05-1973 Nevada DOG GOOD TRUE 25-05-2016 26.85472
## BMILabel
## 14 OVERWEIGHT
## 48 OVERWEIGHT
## 32 OVERWEIGHT
## 55 Obese
## 70 OVERWEIGHT
## 65 OVERWEIGHT
## 80 OVERWEIGHT
## 9 NORMAL
## 22 OVERWEIGHT
## 59 OVERWEIGHT
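Setting the seed immediately before sampling is what makes the drawn rows reproducible. The sketch below shows this with base R's sample on a made-up data frame; the sample size of 5 is arbitrary. In dplyr 1.0.0 and later, sample_n is superseded by slice_sample, which can be used in the same way.

```r
# Hypothetical toy data
dfrToy <- data.frame(ID = 1:20, Value = (1:20)^2)

# Same seed, same call: the same row numbers are drawn both times
set.seed(707)
vinFirst <- sample(nrow(dfrToy), 5)

set.seed(707)
vinSecond <- sample(nrow(dfrToy), 5)

identical(vinFirst, vinSecond)

# Use the drawn row numbers to pick the sample
dfrSample <- dfrToy[vinFirst, ]
```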
1- We find that the top 10 patients by BMI value are
a. All males belonging to race WHITE
b. 80% of them don't smoke & 50% of them have died
c. In this sample, the highest BMI values therefore belong to white males, most of them non-smokers.
As 50 of the 66 retained records are of white patients, this may partly reflect the make-up of the data.
2- We find that the bottom 10 patients by BMI value are
a. 60% of them are white and 80% of them are male
b. HealthGrade is mixed (5 BAD, 3 GOOD, 2 NORMAL), with only 1 person dead out of this sample of 10 people
c. For this sample the BMI value turns out to be normal,
i.e. their height and weight are in a normal proportion to each other.
3- The patient records are mostly of white people (50 of the 66 retained records).
4- From the mean values we can say that
a. In the case of females belonging to different races there is not much difference in the mean BMI value;
the value is approximately the same for all, i.e. between 26 and 27.
b. In the case of males belonging to different races the mean BMI value varies, with approximate values
Black- 23.6, Asian- 25.0, Hispanic- 26.4, White- 27.7
These means rest on very few records for some groups (2 Black and 2 Asian males), so they are indicative only.
From the given report we can say that either
i) for a similar height, Black males weigh less than males of the other races and White males weigh more, or
ii) for a similar weight, Black males are taller than males of the other races and White males are shorter.
5- The patients who have died are mostly female (21/37) and mostly white (26/37); white females alone account for 15 of the 37.
Most of the dead were non-smokers (31/37) with an OVERWEIGHT BMI label (32/37).
6- Of the Hispanic females, 3/4 are dead; 3 of the 4 were non-smokers and all 4 had an OVERWEIGHT BMI label.
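Shares like the ones quoted in these observations can be computed rather than counted by hand, because the mean of a logical vector is the proportion of TRUE values. The sketch below types in the Smokes and Died columns of the top 10 BMI records as they appear in the table above; the vector names are chosen here for illustration.

```r
# Smokes and Died for the top 10 records by BMI, copied from the table above
vblSmokes <- c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
vblDied   <- c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)

# mean() of a logical vector gives the share of TRUE values
numShareNonSmokers <- mean(!vblSmokes)
numShareDied       <- mean(vblDied)

c(NonSmokers = numShareNonSmokers, Died = numShareDied)
```

On the full data the same idea applies to the filtered rows, for example mean(!Smokes) on the records where Died is TRUE.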
1.) RMD tool
1. Understood the working of the RMD tool using its various syntax elements.
2.) Analysis
1. Most of the data provided was analyzed, but the analysis could have been better if the data supplied
had appropriate values, i.e. without missing values and outliers.
2. The data analysis has been done for only 66 records, although the main file had 100 records; hence the analysis covers
only 66% of the records.