Understand and learn how to use RMD and dplyr properly.
1. Learn RMD
2. Analysis, Cleaning, Reporting of Patient Data
Provide R code to read the Patient.csv file and carry out data preparation and reporting
- Add two new columns: BMI Value and BMI Label
- Data Cleaning
- Report Preparation
knitr Global Options
# for development
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=TRUE, warning=TRUE, message=TRUE, cache=FALSE, tidy=FALSE, fig.path='figures/')
# for production
#knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=FALSE, warning=FALSE, message=FALSE, cache=FALSE, tidy=FALSE, fig.path='figures/')
Load Libraries
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
setwd("D:/Welingkar/Trim 3/R/Assignment")
Read Data
# Read the patient file, keeping text columns as character vectors rather than factors
dfrPatient <- read.csv("./patient-data.csv", header=TRUE, stringsAsFactors=FALSE)
intRowCount <- nrow(dfrPatient)
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male False 182.87 76.57
## 2 AC/AH/017 Rosario White Male False 179.12 80.43
## 3 AC/AH/020 Julio Black Male False 169.15 75.48
## 4 AC/AH/022 Lupe White Male False 175.66 94.54
## 5 AC/AH/029 Lavern White Female False 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female True 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate
## 1 31-01-1972 Georgia,xxx Dog 2 False 25-11-2015
## 2 09-06-1972 Missouri Dog 2 False 25-11-2015
## 3 03-07-1972 Pennsylvania None 2 False 25-11-2015
## 4 11-08-1972 Florida Cat 1 False 25-11-2015
## 5 06-06-1973 Iowa NULL 2 True 25-11-2015
## 6 25-06-1973 Maryland Dog 2 False 25-11-2015
Total Rows Of Patient File: 100
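Several of the problems cleaned up later (padded Gender values, "NULL" entries in Pet) can also be handled while reading the file. The following is a minimal sketch on a hypothetical three-row extract, not the real patient file; `csv_text` and `dfr_demo` are names made up for illustration.

```r
# Hypothetical extract; the real patient-data.csv is not reproduced here.
csv_text <- "ID,Gender,Pet
1, Male,Dog
2,Female ,NULL
3,Male,"

dfr_demo <- read.csv(text = csv_text,
                     stringsAsFactors = FALSE,
                     strip.white = TRUE,                 # trim spaces around unquoted fields
                     na.strings = c("NA", "NULL", ""))   # read these entries as missing

dfr_demo$Gender       # padded values are trimmed on read
is.na(dfr_demo$Pet)   # "NULL" and the empty field are read as NA
```

Reading such entries as real missing values means a later `complete.cases()` call would also drop those rows, which may or may not be wanted.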
1. Add column BMI-Value
# BMI = weight (kg) / height (m)^2; HeightInCms is divided by 100 to convert to metres
dfrPatient <- mutate(dfrPatient, BMI=(WeightInKgs/(HeightInCms/100)^2))
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male False 182.87 76.57
## 2 AC/AH/017 Rosario White Male False 179.12 80.43
## 3 AC/AH/020 Julio Black Male False 169.15 75.48
## 4 AC/AH/022 Lupe White Male False 175.66 94.54
## 5 AC/AH/029 Lavern White Female False 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female True 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMI
## 1 31-01-1972 Georgia,xxx Dog 2 False 25-11-2015 22.89674
## 2 09-06-1972 Missouri Dog 2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None 2 False 25-11-2015 26.38080
## 4 11-08-1972 Florida Cat 1 False 25-11-2015 30.63867
## 5 06-06-1973 Iowa NULL 2 True 25-11-2015 26.53567
## 6 25-06-1973 Maryland Dog 2 False 25-11-2015 27.90487
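As a quick check of the formula, the first patient's values from the table above (76.57 kg, 182.87 cm) can be worked through by hand. The variable names below are illustrative only.

```r
# Worked example using the first row of the output above
weight_kg <- 76.57
height_cm <- 182.87

height_m <- height_cm / 100        # 1.8287 m
bmi <- weight_kg / height_m^2      # weight divided by height squared

round(bmi, 5)                      # should agree with the BMI shown for row 1
```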
2. Add column BMI-Label
# Label each patient by BMI band: below 18.5, 18.5 to <25, 25 to <30, 30 and above
dfrPatient <- mutate(dfrPatient, BMILabel=NA)
dfrPatient$BMILabel <- ifelse(dfrPatient$BMI < 18.50,"UNDERWEIGHT",
ifelse(dfrPatient$BMI >= 18.50 & dfrPatient$BMI < 25.00, "NORMAL",
ifelse(dfrPatient$BMI >= 25.00 & dfrPatient$BMI < 30.00, "OVERWEIGHT",
ifelse(dfrPatient$BMI >= 30.00,"Obese", NA))))
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male False 182.87 76.57
## 2 AC/AH/017 Rosario White Male False 179.12 80.43
## 3 AC/AH/020 Julio Black Male False 169.15 75.48
## 4 AC/AH/022 Lupe White Male False 175.66 94.54
## 5 AC/AH/029 Lavern White Female False 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female True 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMI
## 1 31-01-1972 Georgia,xxx Dog 2 False 25-11-2015 22.89674
## 2 09-06-1972 Missouri Dog 2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None 2 False 25-11-2015 26.38080
## 4 11-08-1972 Florida Cat 1 False 25-11-2015 30.63867
## 5 06-06-1973 Iowa NULL 2 True 25-11-2015 26.53567
## 6 25-06-1973 Maryland Dog 2 False 25-11-2015 27.90487
## BMILabel
## 1 NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 Obese
## 5 OVERWEIGHT
## 6 OVERWEIGHT
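The nested `ifelse()` calls above can also be written with base R's `cut()`, which assigns each value to an interval. This is a sketch of an alternative, not the code used in the report; the sample values are made up, and the labels here are all upper case, unlike the mixed-case "Obese" produced above.

```r
# Illustrative BMI values, one per band
bmi_values <- c(17.2, 22.9, 25.0, 30.6)

# right = FALSE makes each interval closed on the left: [18.5, 25), [25, 30), ...
bmi_labels <- cut(bmi_values,
                  breaks = c(-Inf, 18.5, 25, 30, Inf),
                  labels = c("UNDERWEIGHT", "NORMAL", "OVERWEIGHT", "OBESE"),
                  right = FALSE)

as.character(bmi_labels)
```

Keeping the break points in one vector makes the band boundaries easier to audit than four chained conditions.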
val_count <- nrow(dfrPatient)
The number of rows before data cleaning is 100
3.1 Distinct values in the columns
# Count rows per distinct value in each column to spot inconsistent entries
#####BMI
summarise(group_by(dfrPatient, BMI), n())
## # A tibble: 100 × 2
## BMI `n()`
## <dbl> <int>
## 1 21.41385 1
## 2 22.04640 1
## 3 22.66678 1
## 4 22.89674 1
## 5 23.06452 1
## 6 23.34183 1
## 7 23.39025 1
## 8 23.51295 1
## 9 23.62505 1
## 10 23.68725 1
## # ... with 90 more rows
#####Gender
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 6 × 2
## Gender `n()`
## <chr> <int>
## 1 Female 6
## 2 Male 3
## 3 Female 45
## 4 Female 4
## 5 Male 40
## 6 Male 2
#####Race
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
## Race `n()`
## <chr> <int>
## 1 Asian 5
## 2 Bi-Racial 1
## 3 Black 8
## 4 Dog 1
## 5 Hispanic 17
## 6 White 68
#####Died
summarise(group_by(dfrPatient, Died), n())
## # A tibble: 2 × 2
## Died `n()`
## <chr> <int>
## 1 False 46
## 2 True 54
#####Pet
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 10 × 2
## Pet `n()`
## <chr> <int>
## 1 Bird 9
## 2 Cat 24
## 3 CAT 5
## 4 Dog 28
## 5 DOG 4
## 6 Horse 1
## 7 None 23
## 8 NONE 1
## 9 NULL 3
## 10 <NA> 2
#####Smokes
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 4 × 2
## Smokes `n()`
## <chr> <int>
## 1 False 72
## 2 No 6
## 3 True 18
## 4 Yes 4
#####HealthGrade
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
## HealthGrade `n()`
## <int> <int>
## 1 1 29
## 2 2 30
## 3 3 34
## 4 99 7
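The `summarise(group_by(...), n())` pattern used above has a shorthand in dplyr, `count()`. A minimal sketch on a made-up data frame (`dfr_demo` is hypothetical):

```r
library(dplyr)

# Made-up column with the same kind of case variants seen in Pet
dfr_demo <- data.frame(Pet = c("Dog", "DOG", "Cat", "Dog", NA),
                       stringsAsFactors = FALSE)

# One row per distinct value; the count is returned in a column named n
pet_counts <- count(dfr_demo, Pet)
pet_counts
```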
3.2 Removing leading/trailing spaces in the Gender column
# Gender counts before and after trimming whitespace
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 6 × 2
## Gender `n()`
## <chr> <int>
## 1 Female 6
## 2 Male 3
## 3 Female 45
## 4 Female 4
## 5 Male 40
## 6 Male 2
dfrPatient$Gender <- trimws(dfrPatient$Gender, which = "both")
summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 2 × 2
## Gender `n()`
## <chr> <int>
## 1 Female 55
## 2 Male 45
head(dfrPatient)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White Male False 182.87 76.57
## 2 AC/AH/017 Rosario White Male False 179.12 80.43
## 3 AC/AH/020 Julio Black Male False 169.15 75.48
## 4 AC/AH/022 Lupe White Male False 175.66 94.54
## 5 AC/AH/029 Lavern White Female False 164.47 71.78
## 6 AC/AH/033 Bernie Dog Female True 158.27 69.90
## BirthDate State Pet HealthGrade Died RecordDate BMI
## 1 31-01-1972 Georgia,xxx Dog 2 False 25-11-2015 22.89674
## 2 09-06-1972 Missouri Dog 2 False 25-11-2015 25.06859
## 3 03-07-1972 Pennsylvania None 2 False 25-11-2015 26.38080
## 4 11-08-1972 Florida Cat 1 False 25-11-2015 30.63867
## 5 06-06-1973 Iowa NULL 2 True 25-11-2015 26.53567
## 6 25-06-1973 Maryland Dog 2 False 25-11-2015 27.90487
## BMILabel
## 1 NORMAL
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 Obese
## 5 OVERWEIGHT
## 6 OVERWEIGHT
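The six Gender groups collapse to two because values that look identical in print differ only in surrounding spaces. A small sketch with made-up values shows the effect:

```r
# Made-up values with leading and trailing spaces
gender_raw <- c(" Male", "Male ", "Male", " Female ", "Female")

length(unique(gender_raw))            # every padded variant counts as distinct

gender_clean <- trimws(gender_raw)    # which = "both" is the default
table(gender_clean)                   # only Female and Male remain
```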
3.3 Error handling in Race column
# Keep the five recognised categories; any other value (here "Dog") is replaced by the string "NA"
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
## Race `n()`
## <chr> <int>
## 1 Asian 5
## 2 Bi-Racial 1
## 3 Black 8
## 4 Dog 1
## 5 Hispanic 17
## 6 White 68
dfrPatient[,3]<- ifelse(dfrPatient[,3]=="White", "White",
ifelse(dfrPatient[,3]=="Black", "Black",
ifelse(dfrPatient[,3]=="Hispanic", "Hispanic",
ifelse(dfrPatient[,3]=="Bi-Racial", "Bi-Racial",
ifelse(dfrPatient[,3]=="Asian", "Asian","NA")))))
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
## Race `n()`
## <chr> <int>
## 1 Asian 5
## 2 Bi-Racial 1
## 3 Black 8
## 4 Hispanic 17
## 5 NA 1
## 6 White 68
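Note that the replacement value above is the string "NA", which R treats as ordinary text rather than as a missing value. This appears to be why a Race of NA still shows up in the Gender > Race counts after the complete-cases step. The sketch below contrasts the two on made-up data; `valid_races` and the other names are illustrative.

```r
race_raw <- c("White", "Dog", "Asian")
valid_races <- c("White", "Black", "Hispanic", "Bi-Racial", "Asian")

# The string "NA" is just text
race_string <- ifelse(race_raw %in% valid_races, race_raw, "NA")

# NA_character_ is a real missing value
race_missing <- ifelse(race_raw %in% valid_races, race_raw, NA_character_)

is.na(race_string)    # no element is flagged as missing
is.na(race_missing)   # the invalid entry is flagged as missing
```

Which behaviour is preferable depends on whether rows with an invalid Race should be dropped with the other incomplete records.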
3.4 Error handling in Pet column
# Merge case variants (DOG/Dog, CAT/Cat) and map None, NONE, NULL and missing values to the string "NA"
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 10 × 2
## Pet `n()`
## <chr> <int>
## 1 Bird 9
## 2 Cat 24
## 3 CAT 5
## 4 Dog 28
## 5 DOG 4
## 6 Horse 1
## 7 None 23
## 8 NONE 1
## 9 NULL 3
## 10 <NA> 2
dfrPatient$Pet<-ifelse(is.na(dfrPatient$Pet),"NA",
ifelse(dfrPatient$Pet=="DOG", "Dog",
ifelse(dfrPatient$Pet=="CAT", "Cat",
ifelse(dfrPatient$Pet=="None", "NA",
ifelse(dfrPatient$Pet=="NONE", "NA",
ifelse(dfrPatient$Pet=="NULL", "NA",dfrPatient$Pet))))))
summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 5 × 2
## Pet `n()`
## <chr> <int>
## 1 Bird 9
## 2 Cat 29
## 3 Dog 32
## 4 Horse 1
## 5 NA 29
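Instead of listing every case variant, the values could be normalised to one capitalisation first. This is a sketch under the assumption that "None" and "NULL" should become real missing values, which differs from the string "NA" used above; all names are illustrative.

```r
pet_raw <- c("Dog", "DOG", "CAT", "None", "NONE", "NULL", NA, "Bird")

ok <- !is.na(pet_raw)   # leave existing missing values untouched
pet_clean <- pet_raw

# First letter upper case, remaining letters lower case
pet_clean[ok] <- paste0(toupper(substr(pet_raw[ok], 1, 1)),
                        tolower(substring(pet_raw[ok], 2)))

# Treat the "no pet / unknown" markers as missing
pet_clean[pet_clean %in% c("None", "Null")] <- NA
pet_clean
```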
3.5 Error handling in Smokes column
# Map "No"/"Yes" onto the existing "False"/"True" values
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 4 × 2
## Smokes `n()`
## <chr> <int>
## 1 False 72
## 2 No 6
## 3 True 18
## 4 Yes 4
dfrPatient[,5] <- ifelse(dfrPatient[,5] == "No","False" ,
ifelse(dfrPatient[,5] == "Yes", "True",
ifelse(dfrPatient[,5]=="True", "True",
ifelse(dfrPatient[,5]=="False","False","NA"))))
summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 2 × 2
## Smokes `n()`
## <chr> <int>
## 1 False 78
## 2 True 22
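A named lookup vector is a compact alternative to the nested `ifelse()` above, and it can produce a logical column instead of text. A minimal sketch with made-up input; `smokes_lookup` is a hypothetical name.

```r
smokes_raw <- c("False", "No", "True", "Yes")

# Each recognised spelling maps to a logical value
smokes_lookup <- c("True" = TRUE, "Yes" = TRUE, "False" = FALSE, "No" = FALSE)

# Indexing by name does the recoding; unrecognised spellings would become NA
smokes_clean <- unname(smokes_lookup[smokes_raw])
smokes_clean
```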
3.6 Error handling in State column
# Replace the malformed value "Georgia,xxx" with "Georgia"
dfrPatient$State[dfrPatient$State=="Georgia,xxx"] <- "Georgia"
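The assignment above fixes one known value. If other states could carry a similar suffix, a pattern-based fix is one option. This sketch assumes that anything from the first comma onward is unwanted, which may not hold for every dataset.

```r
state_raw <- c("Georgia,xxx", "Missouri", "New Jersey")

# Drop the first comma and everything after it; values without a comma are unchanged
state_clean <- sub(",.*$", "", state_raw)
state_clean
```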
3.7 Error handling in HealthGrade column
# Recode grades 1, 2, 3 as GOOD, NORMAL, BAD and treat 99 as missing
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
## HealthGrade `n()`
## <int> <int>
## 1 1 29
## 2 2 30
## 3 3 34
## 4 99 7
class(dfrPatient$HealthGrade)
## [1] "integer"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==1] <- "GOOD"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==2] <- "NORMAL"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==3] <- "BAD"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==99] <- NA
class(dfrPatient$HealthGrade)
## [1] "character"
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
## HealthGrade `n()`
## <chr> <int>
## 1 BAD 34
## 2 GOOD 29
## 3 NORMAL 30
## 4 <NA> 7
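The four assignments above can be expressed in one step with `factor()`: codes not listed in `levels` become NA automatically. A sketch on made-up codes, not the report's own code:

```r
grade_raw <- c(1L, 2L, 3L, 99L, 2L)

# Only 1, 2 and 3 are valid levels, so 99 is converted to NA
grade_clean <- factor(grade_raw,
                      levels = c(1, 2, 3),
                      labels = c("GOOD", "NORMAL", "BAD"))

as.character(grade_clean)
```

A factor also records the intended set of grades, whereas the character column above accepts any text.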
3.8 Remove all records with NA in any columns (complete cases)
# Keep only rows with no NA in any column
vclComplete <- complete.cases(dfrPatient)
vclComplete[is.na(vclComplete)]
## logical(0)
dfrPatient <- dfrPatient[vclComplete, ]
val1 <- nrow(dfrPatient)
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 3 × 2
## HealthGrade `n()`
## <chr> <int>
## 1 BAD 34
## 2 GOOD 29
## 3 NORMAL 30
The number of rows after data cleaning is 93
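The 7 rows removed match the 7 HealthGrade values that were set to NA. Counting missing values per column before filtering shows which column drives the drop. A minimal sketch on a made-up data frame:

```r
dfr_demo <- data.frame(Race = c("White", "NA", "Asian"),
                       HealthGrade = c("GOOD", NA, NA),
                       stringsAsFactors = FALSE)

# Missing values per column; the string "NA" in Race is not counted
na_per_column <- colSums(is.na(dfr_demo))
na_per_column

# Rows that complete.cases() would remove
rows_dropped <- sum(!complete.cases(dfr_demo))
rows_dropped
```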
1. Display top 10 records based on BMI-Value
# Sort by BMI in descending order and show the first 10 rows
head(arrange(dfrPatient, desc(BMI)), 10)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/SG/009 Sammy White Male False 166.84 88.25
## 2 AC/SG/064 Jon White Male False 169.16 90.08
## 3 AC/AH/076 Albert White Male False 176.22 97.67
## 4 AC/AH/104 Jeremy White Male True 169.85 90.63
## 5 AC/AH/022 Lupe White Male False 175.66 94.54
## 6 AC/AH/248 Andrea White Male False 178.64 97.05
## 7 AC/SG/067 Thomas White Male False 167.51 84.15
## 8 AC/AH/052 Courtney White Male True 175.39 92.22
## 9 AC/AH/159 Edward White Male False 181.64 96.91
## 10 AC/AH/127 Jame White Male False 167.75 82.06
## BirthDate State Pet HealthGrade Died RecordDate BMI
## 1 04-03-1972 Vermont Dog GOOD False 25-06-2016 31.70402
## 2 04-10-1972 Illinois Cat NORMAL True 25-07-2016 31.47988
## 3 08-04-1973 Louisiana Cat NORMAL False 25-12-2015 31.45218
## 4 12-04-1972 Kentucky NA GOOD True 25-12-2015 31.41528
## 5 11-08-1972 Florida Cat GOOD False 25-11-2015 30.63867
## 6 12-01-1973 Indiana Cat GOOD True 25-05-2016 30.41152
## 7 19-07-1972 Pennsylvania Bird NORMAL True 25-07-2016 29.98974
## 8 16-03-1972 Indiana Bird BAD False 25-12-2015 29.97888
## 9 04-12-1972 Connecticut Cat NORMAL False 25-02-2016 29.37282
## 10 29-10-1972 Texas Dog GOOD True 25-01-2016 29.16127
## BMILabel
## 1 Obese
## 2 Obese
## 3 Obese
## 4 Obese
## 5 Obese
## 6 Obese
## 7 OVERWEIGHT
## 8 OVERWEIGHT
## 9 OVERWEIGHT
## 10 OVERWEIGHT
2. Display Bottom Ten records according to BMI
# Take the last 10 rows of the data sorted by descending BMI
fr <- as.integer(nrow(arrange(dfrPatient, desc(BMI)))-9)
to <- nrow(dfrPatient)
dfr_bottom_ten<- slice(arrange(dfrPatient, desc(BMI)), fr:to)
val_row <- nrow(dfr_bottom_ten)
head(dfr_bottom_ten,10)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/150 Brett White Male True 181.56 79.54
## 2 AC/AH/077 Tommy Black Male False 174.09 72.20
## 3 AC/AH/114 Kris Hispanic Male False 177.75 74.84
## 4 AC/AH/164 Shane Hispanic Male True 177.03 74.04
## 5 AC/AH/089 Dong White Male False 179.24 75.54
## 6 AC/AH/045 Shirley White Male False 181.32 76.90
## 7 AC/AH/086 Kyle Black Male True 180.11 75.72
## 8 AC/AH/001 Demetrius White Male False 182.87 76.57
## 9 AC/SG/099 Leslie Asian Male False 172.72 67.62
## 10 AC/SG/193 Ronnie White Male True 185.43 73.63
## BirthDate State Pet HealthGrade Died RecordDate BMI
## 1 03-05-1972 Kentucky Dog GOOD True 25-02-2016 24.12933
## 2 01-02-1973 Washington Cat BAD False 25-12-2015 23.82262
## 3 19-11-1972 Pennsylvania Bird BAD False 25-01-2016 23.68725
## 4 18-02-1972 Florida NA NORMAL False 25-02-2016 23.62505
## 5 11-03-1972 California NA NORMAL True 25-12-2015 23.51295
## 6 25-12-1971 Louisiana Dog GOOD False 25-11-2015 23.39025
## 7 12-05-1973 Georgia Cat BAD False 25-12-2015 23.34183
## 8 31-01-1972 Georgia Dog NORMAL False 25-11-2015 22.89674
## 9 04-02-1972 Ohio Cat GOOD False 25-07-2016 22.66678
## 10 05-06-1973 Iowa Dog BAD False 25-09-2016 21.41385
## BMILabel
## 1 NORMAL
## 2 NORMAL
## 3 NORMAL
## 4 NORMAL
## 5 NORMAL
## 6 NORMAL
## 7 NORMAL
## 8 NORMAL
## 9 NORMAL
## 10 NORMAL
The number of records in the above table is 10
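The row-index arithmetic above (`fr`, `to`, `slice()`) can be avoided with `tail()`, which returns the last n rows directly. A sketch on a made-up five-row data frame, taking the bottom two rather than ten:

```r
library(dplyr)

dfr_demo <- data.frame(ID = c("a", "b", "c", "d", "e"),
                       BMI = c(27.1, 22.4, 30.2, 24.8, 21.9),
                       stringsAsFactors = FALSE)

# Same ordering as above (highest BMI first); tail() keeps the last n rows
bottom_two <- tail(arrange(dfr_demo, desc(BMI)), 2)
bottom_two
```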
3. Provide frequency / counts of Gender > Race
# Number of patients in each Gender and Race combination
summarise(group_by(dfrPatient, Gender, Race), n())
## Source: local data frame [10 x 3]
## Groups: Gender [?]
##
## Gender Race `n()`
## <chr> <chr> <int>
## 1 Female Asian 3
## 2 Female Black 2
## 3 Female Hispanic 6
## 4 Female NA 1
## 5 Female White 39
## 6 Male Asian 2
## 7 Male Bi-Racial 1
## 8 Male Black 4
## 9 Male Hispanic 9
## 10 Male White 26
4. Race > Gender - max, min and average values for BMI-Values
# Minimum, mean and maximum BMI within each Race and Gender combination
summarise(group_by(dfrPatient, Race, Gender), min(BMI), mean(BMI), max(BMI))
## Source: local data frame [10 x 5]
## Groups: Race [?]
##
## Race Gender `min(BMI)` `mean(BMI)` `max(BMI)`
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 Asian Female 24.42511 26.06524 28.19431
## 2 Asian Male 22.66678 24.95782 27.24885
## 3 Bi-Racial Male 24.83473 24.83473 24.83473
## 4 Black Female 25.22482 25.96945 26.71407
## 5 Black Male 23.34183 25.03778 26.60586
## 6 Hispanic Female 25.03916 26.52176 27.84206
## 7 Hispanic Male 23.62505 26.02289 28.26769
## 8 NA Female 27.90487 27.90487 27.90487
## 9 White Female 24.21459 26.41985 28.24834
## 10 White Male 21.41385 27.67114 31.70402
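The column headers above, such as `min(BMI)`, are generated from the expressions. Naming the summaries gives cleaner headers for a report. A sketch on made-up data; `MinBMI`, `MeanBMI` and `MaxBMI` are names chosen for illustration.

```r
library(dplyr)

dfr_demo <- data.frame(Race = c("White", "White", "Asian"),
                       Gender = c("Male", "Male", "Female"),
                       BMI = c(22, 28, 25),
                       stringsAsFactors = FALSE)

# name = expression sets the output column name
bmi_summary <- summarise(group_by(dfr_demo, Race, Gender),
                         MinBMI = min(BMI),
                         MeanBMI = mean(BMI),
                         MaxBMI = max(BMI))
bmi_summary
```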
5. Data related to deceased patients
# Rows where Died is recorded as "True"
filter(dfrPatient, Died=="True")
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/029 Lavern White Female False 164.47 71.78
## 2 AC/AH/049 Martin White Female False 160.06 72.37
## 3 AC/AH/089 Dong White Male False 179.24 75.54
## 4 AC/AH/104 Jeremy White Male True 169.85 90.63
## 5 AC/AH/127 Jame White Male False 167.75 82.06
## 6 AC/AH/133 Clyde Hispanic Male False 181.15 83.93
## 7 AC/AH/150 Brett White Male True 181.56 79.54
## 8 AC/AH/154 Tony White Female False 160.03 64.30
## 9 AC/AH/156 George White Male False 165.62 76.72
## 10 AC/AH/160 Rory Asian Female False 159.67 71.88
## 11 AC/AH/171 Devin White Female False 163.35 70.46
## 12 AC/AH/176 Jerry Asian Male False 175.21 83.65
## 13 AC/AH/180 Drew White Female False 160.80 64.77
## 14 AC/AH/186 Christopher White Female False 157.95 67.41
## 15 AC/AH/192 Dominique White Male False 180.61 83.59
## 16 AC/AH/211 Son White Female False 157.16 69.64
## 17 AC/AH/219 Jay White Female False 163.47 72.89
## 18 AC/AH/233 Marion White Female False 163.97 66.71
## 19 AC/AH/248 Andrea White Male False 178.64 97.05
## 20 AC/AH/249 Jesus Hispanic Female True 159.78 68.31
## 21 AC/SG/003 Walter White Female False 161.83 66.03
## 22 AC/SG/008 Dana White Male True 169.66 77.30
## 23 AC/SG/010 Theo Asian Female False 159.32 64.92
## 24 AC/SG/015 Shaun White Male True 170.51 84.35
## 25 AC/SG/016 Jimmie Black Female False 161.84 69.97
## 26 AC/SG/046 Carl Hispanic Male False 171.41 81.70
## 27 AC/SG/055 Evan White Male False 166.75 79.06
## 28 AC/SG/056 Merrill Asian Female True 166.19 67.46
## 29 AC/SG/064 Jon White Male False 169.16 90.08
## 30 AC/SG/065 Shayne White Female False 157.01 66.56
## 31 AC/SG/067 Thomas White Male False 167.51 84.15
## 32 AC/SG/068 Valentine Hispanic Female False 160.47 68.20
## 33 AC/SG/084 Brian Hispanic Male False 174.25 80.93
## 34 AC/SG/101 Jason White Female False 159.23 69.96
## 35 AC/SG/116 Connie Black Male False 184.34 90.41
## 36 AC/SG/123 Darnell White Female True 162.32 72.72
## 37 AC/SG/134 Daryl White Female True 162.59 69.76
## 38 AC/SG/155 Raymond White Female False 158.35 69.72
## 39 AC/SG/165 Elmer White Female False 162.18 67.81
## 40 AC/SG/167 Jimmy White Female False 159.38 70.37
## 41 AC/SG/172 Whitney White Male False 171.45 84.29
## 42 AC/SG/179 Logan White Male False 183.10 82.47
## 43 AC/SG/181 Terry Hispanic Male False 177.14 88.70
## 44 AC/SG/182 Jamie Hispanic Male True 171.08 72.51
## 45 AC/SG/191 Lacy Hispanic Female False 159.33 70.68
## 46 AC/SG/197 Stacy White Female False 159.44 66.21
## 47 AC/SG/216 Alva White Female False 159.13 66.96
## 48 AC/SG/217 Dean White Female False 160.58 71.49
## 49 AC/SG/234 Luis Hispanic Female False 164.88 68.07
## BirthDate State Pet HealthGrade Died RecordDate BMI
## 1 06-06-1973 Iowa NA NORMAL True 25-11-2015 26.53567
## 2 28-04-1972 California Horse NORMAL True 25-12-2015 28.24834
## 3 11-03-1972 California NA NORMAL True 25-12-2015 23.51295
## 4 12-04-1972 Kentucky NA GOOD True 25-12-2015 31.41528
## 5 29-10-1972 Texas Dog GOOD True 25-01-2016 29.16127
## 6 13-10-1973 Washington Cat BAD True 25-02-2016 25.57647
## 7 03-05-1972 Kentucky Dog GOOD True 25-02-2016 24.12933
## 8 30-08-1973 California Dog GOOD True 25-02-2016 25.10777
## 9 09-07-1972 California Dog GOOD True 25-02-2016 27.96939
## 10 22-09-1973 Florida Cat NORMAL True 25-02-2016 28.19431
## 11 16-04-1973 California Bird BAD True 25-03-2016 26.40611
## 12 01-05-1973 Virginia Dog BAD True 25-03-2016 27.24885
## 13 18-02-1973 Oregon Cat GOOD True 25-03-2016 25.04966
## 14 06-05-1972 New Jersey Dog BAD True 25-03-2016 27.01998
## 15 24-03-1972 Michigan NA BAD True 25-03-2016 25.62541
## 16 14-07-1973 California Cat NORMAL True 25-04-2016 28.19517
## 17 07-04-1972 North Carolina Bird GOOD True 25-04-2016 27.27670
## 18 23-12-1971 Ohio Cat BAD True 25-04-2016 24.81202
## 19 12-01-1973 Indiana Cat GOOD True 25-05-2016 30.41152
## 20 23-04-1972 Alabama Cat NORMAL True 25-05-2016 26.75713
## 21 11-07-1972 Oregon NA NORMAL True 25-05-2016 25.21292
## 22 26-05-1973 Nevada Dog GOOD True 25-05-2016 26.85472
## 23 29-01-1973 New York Cat NORMAL True 25-06-2016 25.57631
## 24 09-11-1972 New Jersey Dog BAD True 25-06-2016 29.01252
## 25 03-04-1972 Arizona Cat BAD True 25-06-2016 26.71407
## 26 05-08-1973 Mississippi Bird NORMAL True 25-06-2016 27.80672
## 27 24-02-1972 Illinois Bird BAD True 25-07-2016 28.43316
## 28 27-11-1972 Indiana NA BAD True 25-07-2016 24.42511
## 29 04-10-1972 Illinois Cat NORMAL True 25-07-2016 31.47988
## 30 05-04-1972 California Dog BAD True 25-07-2016 26.99968
## 31 19-07-1972 Pennsylvania Bird NORMAL True 25-07-2016 29.98974
## 32 15-04-1972 Tennessee Cat BAD True 25-07-2016 26.48480
## 33 06-03-1972 Virginia Dog NORMAL True 25-07-2016 26.65410
## 34 28-09-1973 Michigan Dog NORMAL True 25-07-2016 27.59307
## 35 05-06-1972 Florida NA BAD True 25-08-2016 26.60586
## 36 03-09-1972 North Carolina Bird GOOD True 25-08-2016 27.60005
## 37 28-05-1972 Texas Cat NORMAL True 25-08-2016 26.38875
## 38 02-06-1972 California Cat BAD True 25-08-2016 27.80489
## 39 25-03-1972 Washington Bird GOOD True 25-08-2016 25.78096
## 40 30-09-1973 Washington NA NORMAL True 25-09-2016 27.70256
## 41 25-02-1972 Florida Dog NORMAL True 25-09-2016 28.67484
## 42 24-10-1972 Ohio Dog BAD True 25-09-2016 24.59910
## 43 24-11-1971 Indiana Cat BAD True 25-09-2016 28.26769
## 44 25-03-1973 Louisiana NA BAD True 25-09-2016 24.77419
## 45 21-06-1973 Texas NA BAD True 25-09-2016 27.84206
## 46 08-11-1972 New York Cat GOOD True 25-10-2016 26.04528
## 47 19-06-1972 Alabama NA GOOD True 25-10-2016 26.44304
## 48 11-11-1972 Ohio NA GOOD True 25-10-2016 27.72441
## 49 10-11-1971 Pennsylvania Cat BAD True 25-10-2016 25.03916
## BMILabel
## 1 OVERWEIGHT
## 2 OVERWEIGHT
## 3 NORMAL
## 4 Obese
## 5 OVERWEIGHT
## 6 OVERWEIGHT
## 7 NORMAL
## 8 OVERWEIGHT
## 9 OVERWEIGHT
## 10 OVERWEIGHT
## 11 OVERWEIGHT
## 12 OVERWEIGHT
## 13 OVERWEIGHT
## 14 OVERWEIGHT
## 15 OVERWEIGHT
## 16 OVERWEIGHT
## 17 OVERWEIGHT
## 18 NORMAL
## 19 Obese
## 20 OVERWEIGHT
## 21 OVERWEIGHT
## 22 OVERWEIGHT
## 23 OVERWEIGHT
## 24 OVERWEIGHT
## 25 OVERWEIGHT
## 26 OVERWEIGHT
## 27 OVERWEIGHT
## 28 NORMAL
## 29 Obese
## 30 OVERWEIGHT
## 31 OVERWEIGHT
## 32 OVERWEIGHT
## 33 OVERWEIGHT
## 34 OVERWEIGHT
## 35 OVERWEIGHT
## 36 OVERWEIGHT
## 37 OVERWEIGHT
## 38 OVERWEIGHT
## 39 OVERWEIGHT
## 40 OVERWEIGHT
## 41 OVERWEIGHT
## 42 NORMAL
## 43 OVERWEIGHT
## 44 NORMAL
## 45 OVERWEIGHT
## 46 OVERWEIGHT
## 47 OVERWEIGHT
## 48 OVERWEIGHT
## 49 OVERWEIGHT
Val_Dead <- nrow(filter(dfrPatient, Died=="True"))
The number of deceased patients is 49
6. Display All Records for “Hispanic Females”
# Rows where Race is "Hispanic" and Gender is "Female"
filter(dfrPatient, Race=="Hispanic" & Gender=="Female")
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/208 Lawrence Hispanic Female False 165.80 71.77
## 2 AC/AH/249 Jesus Hispanic Female True 159.78 68.31
## 3 AC/SG/068 Valentine Hispanic Female False 160.47 68.20
## 4 AC/SG/122 Michal Hispanic Female False 160.09 68.94
## 5 AC/SG/191 Lacy Hispanic Female False 159.33 70.68
## 6 AC/SG/234 Luis Hispanic Female False 164.88 68.07
## BirthDate State Pet HealthGrade Died RecordDate BMI
## 1 07-08-1973 Louisiana NA GOOD False 25-03-2016 26.10802
## 2 23-04-1972 Alabama Cat NORMAL True 25-05-2016 26.75713
## 3 15-04-1972 Tennessee Cat BAD True 25-07-2016 26.48480
## 4 16-12-1971 South Carolina Dog GOOD False 25-08-2016 26.89942
## 5 21-06-1973 Texas NA BAD True 25-09-2016 27.84206
## 6 10-11-1971 Pennsylvania Cat BAD True 25-10-2016 25.03916
## BMILabel
## 1 OVERWEIGHT
## 2 OVERWEIGHT
## 3 OVERWEIGHT
## 4 OVERWEIGHT
## 5 OVERWEIGHT
## 6 OVERWEIGHT
Val_HF <- nrow(filter(dfrPatient, Race=="Hispanic" & Gender=="Female"))
The number of Hispanic females is 6
7. Seven sample records from the dataset using set.seed(707)
# Fix the random seed so the same 7 rows are drawn on every run
set.seed(707)
sample_n(dfrPatient, 7)
## ID Name Race Gender Smokes HeightInCms WeightInKgs
## 10 AC/AH/048 Merle Hispanic Male False 167.37 79.06
## 44 AC/AH/208 Lawrence Hispanic Female False 165.80 71.77
## 27 AC/AH/115 Tracy Bi-Racial Male True 183.21 83.36
## 53 AC/AH/241 Lindsay White Female False 161.38 73.55
## 75 AC/SG/099 Leslie Asian Male False 172.72 67.62
## 68 AC/SG/065 Shayne White Female False 157.01 66.56
## 83 AC/SG/139 Jordan White Male False 171.94 82.11
## BirthDate State Pet HealthGrade Died RecordDate BMI
## 10 13-07-1973 North Carolina NA NORMAL False 25-12-2015 28.22290
## 44 07-08-1973 Louisiana NA GOOD False 25-03-2016 26.10802
## 27 29-09-1973 California Dog NORMAL False 25-01-2016 24.83473
## 53 08-02-1972 Florida Cat BAD False 25-05-2016 28.24121
## 75 04-02-1972 Ohio Cat GOOD False 25-07-2016 22.66678
## 68 05-04-1972 California Dog BAD True 25-07-2016 26.99968
## 83 06-10-1973 Michigan NA GOOD False 25-08-2016 27.77424
## BMILabel
## 10 OVERWEIGHT
## 44 OVERWEIGHT
## 27 NORMAL
## 53 OVERWEIGHT
## 75 NORMAL
## 68 OVERWEIGHT
## 83 OVERWEIGHT
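Setting the seed immediately before sampling is what makes the draw repeatable. The sketch below uses base R's `sample()` on row numbers to show the idea. The exact rows drawn can differ between R versions, because the default sampling algorithm changed in R 3.6.0, so no specific output is claimed here.

```r
# Same seed, same draw
set.seed(707)
first_draw <- sample(1:100, 7)

set.seed(707)
second_draw <- sample(1:100, 7)

identical(first_draw, second_draw)   # the two draws match
```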
# Summary values used in the note below
val_total <- nrow(dfrPatient)
val_female <- nrow(filter(dfrPatient, Gender=="Female"))
val_Male <- nrow(filter(dfrPatient, Gender=="Male"))
val_died <- nrow(filter(dfrPatient, Died=="True"))
val_good_health <- nrow(filter(dfrPatient, HealthGrade=="GOOD"))
val_bad_health <- nrow(filter(dfrPatient, HealthGrade=="BAD"))
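# The stored labels are "UNDERWEIGHT", "OVERWEIGHT" and "Obese", so this case-sensitive comparison matches only "Obese"
val_bmi <- nrow(filter(dfrPatient, BMILabel=="Underweight" | BMILabel=="Overweight" | BMILabel=="Obese"))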
val_maxbmi <- max(dfrPatient$BMI)
val_minbmi <- min(dfrPatient$BMI)
val_meanbmi <- mean(dfrPatient$BMI)
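The BMI summary values are reported with seven decimal places in the note. One way to round them for display is `sprintf()`, sketched here with the three values the report states; `bmi_text` is a hypothetical name.

```r
# Values as reported in this document
val_maxbmi <- 31.7040213
val_minbmi <- 21.4138523
val_meanbmi <- 26.624746

# Two decimal places, keeping trailing zeros
bmi_text <- sprintf("max %.2f, min %.2f, mean %.2f",
                    val_maxbmi, val_minbmi, val_meanbmi)
bmi_text
```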
Note
The patient data file held information on 100 patients. After cleaning the data, information on 93 patients remains.
The number of males is 42, while the number of females is 51.
Out of 93 patients, 29 are in good health while 34 are in bad health.
6 patients are labelled Obese. This count does not include patients labelled UNDERWEIGHT or OVERWEIGHT, because the label comparison used to compute it is case-sensitive.
Out of 93 patients, the maximum BMI value is 31.7040213, the minimum BMI value is 21.4138523, and the average BMI value is 26.624746.
It was a good exercise that helped in learning RMD and how to present information in HTML.