Assignment no.3

Sayli Gharat

March 28, 2017

Analysis of patient-data

Objective: To analyse patient data using R Markdown (Rmd).

Problem definition: To analyse the problem file (patient-data.csv) using R Markdown and summarise the results.

Working directory

setwd("D:/R-BA/R-Scripts")

**** test.rmd

knitr Global Options

# for development
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=TRUE, warning=TRUE, message=TRUE, cache=FALSE, tidy=FALSE, fig.path='figures/')
# for production
#knitr::opts_chunk$set(echo=TRUE, eval=TRUE, error=FALSE, warning=FALSE, message=FALSE, cache=FALSE, tidy=FALSE, fig.path='figures/')

Load Libraries

library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Reading the data file

dfrPatient <- read.csv("./data/patient-data.csv", header=T, stringsAsFactors=F)
intRowCount <- nrow(dfrPatient)
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  FALSE      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  FALSE      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  FALSE      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  FALSE      175.66       94.54
## 5 AC/AH/029    Lavern White Female  FALSE      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   TRUE      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate
## 1 31-01-1972  Georgia,xxx  Dog           2 FALSE 25-11-2015
## 2 09-06-1972     Missouri  Dog           2 FALSE 25-11-2015
## 3 03-07-1972 Pennsylvania None           2 FALSE 25-11-2015
## 4 11-08-1972      Florida  Cat           1 FALSE 25-11-2015
## 5 06-06-1973         Iowa NULL           2  TRUE 25-11-2015
## 6 25-06-1973     Maryland  Dog           2 FALSE 25-11-2015

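1. Adding a new column, BodyMassIndex (weight in kg divided by the square of height in metres)

# Heights are stored in cm, so multiply by 10000 to convert cm^2 to m^2
dfrPatient <- mutate(dfrPatient, BodyMassIndex = (WeightInKgs * 10000) / (HeightInCms * HeightInCms))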
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  FALSE      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  FALSE      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  FALSE      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  FALSE      175.66       94.54
## 5 AC/AH/029    Lavern White Female  FALSE      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   TRUE      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate BodyMassIndex
## 1 31-01-1972  Georgia,xxx  Dog           2 FALSE 25-11-2015      22.89674
## 2 09-06-1972     Missouri  Dog           2 FALSE 25-11-2015      25.06859
## 3 03-07-1972 Pennsylvania None           2 FALSE 25-11-2015      26.38080
## 4 11-08-1972      Florida  Cat           1 FALSE 25-11-2015      30.63867
## 5 06-06-1973         Iowa NULL           2  TRUE 25-11-2015      26.53567
## 6 25-06-1973     Maryland  Dog           2 FALSE 25-11-2015      27.90487
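
As a quick check on the formula, the sketch below recomputes the BMI of the first patient in base R. The helper name `bmi_from_cm` is hypothetical and not part of the original script; it only assumes weight in kilograms and height in centimetres.

```r
# Hypothetical helper: BMI = weight (kg) / height (m)^2.
# With height in cm this is the same as weight * 10000 / height^2.
bmi_from_cm <- function(weight_kg, height_cm) {
  weight_kg / (height_cm / 100)^2
}

# First patient in the data: 76.57 kg, 182.87 cm
bmi_first <- bmi_from_cm(76.57, 182.87)
print(round(bmi_first, 5))
```

The result should agree with the BodyMassIndex shown for row 1 above.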

2. Adding a new column, BMILabel

# BMI < 18.5: Underweight; 18.5 to < 25: Normal; 25 to < 30: Overweight; otherwise Obese
dfrPatient <- mutate(dfrPatient, BMILabel =
  ifelse(BodyMassIndex < 18.5, "Underweight",
  ifelse(BodyMassIndex < 25, "Normal",
  ifelse(BodyMassIndex < 30, "Overweight", "Obese"))))
head(dfrPatient)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   Male  FALSE      182.87       76.57
## 2 AC/AH/017   Rosario White   Male  FALSE      179.12       80.43
## 3 AC/AH/020     Julio Black   Male  FALSE      169.15       75.48
## 4 AC/AH/022      Lupe White   Male  FALSE      175.66       94.54
## 5 AC/AH/029    Lavern White Female  FALSE      164.47       71.78
## 6 AC/AH/033    Bernie   Dog Female   TRUE      158.27       69.90
##    BirthDate        State  Pet HealthGrade  Died RecordDate BodyMassIndex
## 1 31-01-1972  Georgia,xxx  Dog           2 FALSE 25-11-2015      22.89674
## 2 09-06-1972     Missouri  Dog           2 FALSE 25-11-2015      25.06859
## 3 03-07-1972 Pennsylvania None           2 FALSE 25-11-2015      26.38080
## 4 11-08-1972      Florida  Cat           1 FALSE 25-11-2015      30.63867
## 5 06-06-1973         Iowa NULL           2  TRUE 25-11-2015      26.53567
## 6 25-06-1973     Maryland  Dog           2 FALSE 25-11-2015      27.90487
##     BMILabel
## 1     Normal
## 2 Overweight
## 3 Overweight
## 4      Obese
## 5 Overweight
## 6 Overweight
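
The same banding can also be written with base R's `cut()`, which avoids nesting `ifelse()` calls. This is only a sketch of an alternative, not the method used in this report; the vector `bmi` and the names `bmi_breaks` and `bmi_labels` are made up for illustration.

```r
# Illustrative BMI values, including the boundary cases
bmi <- c(17.0, 18.5, 24.99, 25.0, 30.0)

# right = FALSE makes each interval closed on the left: [18.5, 25), [25, 30), ...
bmi_breaks <- c(-Inf, 18.5, 25, 30, Inf)
bmi_labels <- c("Underweight", "Normal", "Overweight", "Obese")

bmi_label <- as.character(cut(bmi, breaks = bmi_breaks, labels = bmi_labels, right = FALSE))
print(bmi_label)
```

Using `right = FALSE` keeps the boundaries identical to the `ifelse()` version, where 25 counts as Overweight and 30 as Obese.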

Rectifying errors in the data

To keep the dataset uniform, all values in the Pet column are converted to lower case and surrounding white space is trimmed.

dfrPatient$Pet <- trimws(tolower(dfrPatient$Pet))
head(dfrPatient,20)
##           ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/001 Demetrius    White   Male  FALSE      182.87       76.57
## 2  AC/AH/017   Rosario    White   Male  FALSE      179.12       80.43
## 3  AC/AH/020     Julio    Black   Male  FALSE      169.15       75.48
## 4  AC/AH/022      Lupe    White   Male  FALSE      175.66       94.54
## 5  AC/AH/029    Lavern    White Female  FALSE      164.47       71.78
## 6  AC/AH/033    Bernie      Dog Female   TRUE      158.27       69.90
## 7  AC/AH/037    Samuel    White Female  FALSE      161.69       68.85
## 8  AC/AH/044     Clair    White Female     No      165.84       70.44
## 9  AC/AH/045   Shirley    White   Male  FALSE      181.32       76.90
## 10 AC/AH/048     Merle Hispanic   Male  FALSE      167.37       79.06
## 11 AC/AH/049    Martin    White Female  FALSE      160.06       72.37
## 12 AC/AH/050   Frances    White Female  FALSE      166.48       67.34
## 13 AC/AH/052  Courtney    White   Male   TRUE      175.39       92.22
## 14 AC/AH/053   Francis    White Female   TRUE      164.70       75.69
## 15 AC/AH/057    Vernon    White Female   TRUE      163.79       65.76
## 16 AC/AH/061    Lester    Black   Male  FALSE      181.13       72.33
## 17 AC/AH/063     Robin Hispanic   Male  FALSE      169.24       73.30
## 18 AC/AH/076    Albert    White   Male  FALSE      176.22       97.67
## 19 AC/AH/077     Tommy    Black   Male  FALSE      174.09       72.20
## 20 AC/AH/086      Kyle    Black   Male   TRUE      180.11       75.72
##     BirthDate          State   Pet HealthGrade  Died RecordDate
## 1  31-01-1972    Georgia,xxx   dog           2 FALSE 25-11-2015
## 2  09-06-1972       Missouri   dog           2 FALSE 25-11-2015
## 3  03-07-1972   Pennsylvania  none           2 FALSE 25-11-2015
## 4  11-08-1972        Florida   cat           1 FALSE 25-11-2015
## 5  06-06-1973           Iowa  null           2  TRUE 25-11-2015
## 6  25-06-1973       Maryland   dog           2 FALSE 25-11-2015
## 7  20-03-1972   Pennsylvania  none           1 FALSE 25-11-2015
## 8  05-05-1973 North Carolina  none           1 FALSE 25-11-2015
## 9  25-12-1971      Louisiana   dog           1 FALSE 25-11-2015
## 10 13-07-1973 North Carolina  none           2 FALSE 25-12-2015
## 11 28-04-1972     California horse           2  TRUE 25-12-2015
## 12 08-11-1971       Michigan  none           1 FALSE 25-12-2015
## 13 16-03-1972        Indiana  bird           3 FALSE 25-12-2015
## 14 16-11-1971       Virginia   dog           1 FALSE 25-12-2015
## 15 06-01-1972       Illinois   cat           3 FALSE 25-12-2015
## 16 16-11-1972      Wisconsin   dog          99  TRUE 25-12-2015
## 17 16-11-1971       Illinois  none           3 FALSE 25-12-2015
## 18 08-04-1973      Louisiana   cat           2 FALSE 25-12-2015
## 19 01-02-1973     Washington   cat           3 FALSE 25-12-2015
## 20 12-05-1973        Georgia   cat           3 FALSE 25-12-2015
##    BodyMassIndex   BMILabel
## 1       22.89674     Normal
## 2       25.06859 Overweight
## 3       26.38080 Overweight
## 4       30.63867      Obese
## 5       26.53567 Overweight
## 6       27.90487 Overweight
## 7       26.33526 Overweight
## 8       25.61184 Overweight
## 9       23.39025     Normal
## 10      28.22290 Overweight
## 11      28.24834 Overweight
## 12      24.29679     Normal
## 13      29.97888 Overweight
## 14      27.90303 Overweight
## 15      24.51247     Normal
## 16      22.04640     Normal
## 17      25.59163 Overweight
## 18      31.45218      Obese
## 19      23.82262     Normal
## 20      23.34183     Normal
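
To see what the normalisation does in isolation, here is a small sketch on a made-up vector (`raw_pet` is illustrative, not taken from the data file):

```r
# Mixed case and stray spaces, as they might appear in a hand-entered column
raw_pet <- c("Dog", " CAT", "None ", "NULL")

# tolower() unifies the case; trimws() strips leading and trailing white space
clean_pet <- trimws(tolower(raw_pet))
print(clean_pet)
```

After this step, values that differ only in case or padding fall into the same group when counted.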

Converting "none" and "null" values in the Pet column to NA

summarise(group_by(dfrPatient,Pet),n())
## # A tibble: 7 × 2
##     Pet `n()`
##   <chr> <int>
## 1  bird     9
## 2   cat    29
## 3   dog    32
## 4 horse     1
## 5  none    24
## 6  null     3
## 7  <NA>     2
class(dfrPatient$Pet)
## [1] "character"
dfrPatient$Pet[dfrPatient$Pet=="none"] <- NA
dfrPatient$Pet[dfrPatient$Pet=="null"] <- NA
summarise(group_by(dfrPatient,Pet),n())
## # A tibble: 5 × 2
##     Pet `n()`
##   <chr> <int>
## 1  bird     9
## 2   cat    29
## 3   dog    32
## 4 horse     1
## 5  <NA>    29
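
The two assignments can be folded into one with `%in%`. The sketch below assumes a small stand-in vector `pet` and a hypothetical `placeholders` vector; neither name appears in the original script.

```r
# Stand-in for the cleaned Pet column, including an existing NA
pet <- c("dog", "none", "cat", "null", NA)

# Values that really mean "missing"
placeholders <- c("none", "null")

# %in% returns FALSE (not NA) for missing entries, so existing NAs are left alone
pet[pet %in% placeholders] <- NA
print(table(pet, useNA = "ifany"))
```

Keeping the placeholder strings in one vector makes it easy to extend the list if other spellings turn up.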

removing the outlier values

Trimming white space and lower-casing the Gender column

cat("\014")

dfrPatient$Gender <- trimws(tolower(dfrPatient$Gender))
head(dfrPatient,20)
##           ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/001 Demetrius    White   male  FALSE      182.87       76.57
## 2  AC/AH/017   Rosario    White   male  FALSE      179.12       80.43
## 3  AC/AH/020     Julio    Black   male  FALSE      169.15       75.48
## 4  AC/AH/022      Lupe    White   male  FALSE      175.66       94.54
## 5  AC/AH/029    Lavern    White female  FALSE      164.47       71.78
## 6  AC/AH/033    Bernie      Dog female   TRUE      158.27       69.90
## 7  AC/AH/037    Samuel    White female  FALSE      161.69       68.85
## 8  AC/AH/044     Clair    White female     No      165.84       70.44
## 9  AC/AH/045   Shirley    White   male  FALSE      181.32       76.90
## 10 AC/AH/048     Merle Hispanic   male  FALSE      167.37       79.06
## 11 AC/AH/049    Martin    White female  FALSE      160.06       72.37
## 12 AC/AH/050   Frances    White female  FALSE      166.48       67.34
## 13 AC/AH/052  Courtney    White   male   TRUE      175.39       92.22
## 14 AC/AH/053   Francis    White female   TRUE      164.70       75.69
## 15 AC/AH/057    Vernon    White female   TRUE      163.79       65.76
## 16 AC/AH/061    Lester    Black   male  FALSE      181.13       72.33
## 17 AC/AH/063     Robin Hispanic   male  FALSE      169.24       73.30
## 18 AC/AH/076    Albert    White   male  FALSE      176.22       97.67
## 19 AC/AH/077     Tommy    Black   male  FALSE      174.09       72.20
## 20 AC/AH/086      Kyle    Black   male   TRUE      180.11       75.72
##     BirthDate          State   Pet HealthGrade  Died RecordDate
## 1  31-01-1972    Georgia,xxx   dog           2 FALSE 25-11-2015
## 2  09-06-1972       Missouri   dog           2 FALSE 25-11-2015
## 3  03-07-1972   Pennsylvania  <NA>           2 FALSE 25-11-2015
## 4  11-08-1972        Florida   cat           1 FALSE 25-11-2015
## 5  06-06-1973           Iowa  <NA>           2  TRUE 25-11-2015
## 6  25-06-1973       Maryland   dog           2 FALSE 25-11-2015
## 7  20-03-1972   Pennsylvania  <NA>           1 FALSE 25-11-2015
## 8  05-05-1973 North Carolina  <NA>           1 FALSE 25-11-2015
## 9  25-12-1971      Louisiana   dog           1 FALSE 25-11-2015
## 10 13-07-1973 North Carolina  <NA>           2 FALSE 25-12-2015
## 11 28-04-1972     California horse           2  TRUE 25-12-2015
## 12 08-11-1971       Michigan  <NA>           1 FALSE 25-12-2015
## 13 16-03-1972        Indiana  bird           3 FALSE 25-12-2015
## 14 16-11-1971       Virginia   dog           1 FALSE 25-12-2015
## 15 06-01-1972       Illinois   cat           3 FALSE 25-12-2015
## 16 16-11-1972      Wisconsin   dog          99  TRUE 25-12-2015
## 17 16-11-1971       Illinois  <NA>           3 FALSE 25-12-2015
## 18 08-04-1973      Louisiana   cat           2 FALSE 25-12-2015
## 19 01-02-1973     Washington   cat           3 FALSE 25-12-2015
## 20 12-05-1973        Georgia   cat           3 FALSE 25-12-2015
##    BodyMassIndex   BMILabel
## 1       22.89674     Normal
## 2       25.06859 Overweight
## 3       26.38080 Overweight
## 4       30.63867      Obese
## 5       26.53567 Overweight
## 6       27.90487 Overweight
## 7       26.33526 Overweight
## 8       25.61184 Overweight
## 9       23.39025     Normal
## 10      28.22290 Overweight
## 11      28.24834 Overweight
## 12      24.29679     Normal
## 13      29.97888 Overweight
## 14      27.90303 Overweight
## 15      24.51247     Normal
## 16      22.04640     Normal
## 17      25.59163 Overweight
## 18      31.45218      Obese
## 19      23.82262     Normal
## 20      23.34183     Normal

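Converting the Smokes column from character to logical. as.logical() recognises only TRUE/FALSE spellings, so the 10 "Yes"/"No" entries become NA rather than TRUE/FALSE, as the counts below show.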

summarise(group_by(dfrPatient,Smokes),n())
## # A tibble: 4 × 2
##   Smokes `n()`
##    <chr> <int>
## 1  FALSE    72
## 2     No     6
## 3   TRUE    18
## 4    Yes     4
class(dfrPatient$Smokes)
## [1] "character"
dfrPatient$Smokes <- as.logical(dfrPatient$Smokes)
class(dfrPatient$Smokes)
## [1] "logical"
summarise(group_by(dfrPatient,Smokes),n())
## # A tibble: 3 × 2
##   Smokes `n()`
##    <lgl> <int>
## 1  FALSE    72
## 2   TRUE    18
## 3     NA    10
head(dfrPatient,20)
##           ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/001 Demetrius    White   male  FALSE      182.87       76.57
## 2  AC/AH/017   Rosario    White   male  FALSE      179.12       80.43
## 3  AC/AH/020     Julio    Black   male  FALSE      169.15       75.48
## 4  AC/AH/022      Lupe    White   male  FALSE      175.66       94.54
## 5  AC/AH/029    Lavern    White female  FALSE      164.47       71.78
## 6  AC/AH/033    Bernie      Dog female   TRUE      158.27       69.90
## 7  AC/AH/037    Samuel    White female  FALSE      161.69       68.85
## 8  AC/AH/044     Clair    White female     NA      165.84       70.44
## 9  AC/AH/045   Shirley    White   male  FALSE      181.32       76.90
## 10 AC/AH/048     Merle Hispanic   male  FALSE      167.37       79.06
## 11 AC/AH/049    Martin    White female  FALSE      160.06       72.37
## 12 AC/AH/050   Frances    White female  FALSE      166.48       67.34
## 13 AC/AH/052  Courtney    White   male   TRUE      175.39       92.22
## 14 AC/AH/053   Francis    White female   TRUE      164.70       75.69
## 15 AC/AH/057    Vernon    White female   TRUE      163.79       65.76
## 16 AC/AH/061    Lester    Black   male  FALSE      181.13       72.33
## 17 AC/AH/063     Robin Hispanic   male  FALSE      169.24       73.30
## 18 AC/AH/076    Albert    White   male  FALSE      176.22       97.67
## 19 AC/AH/077     Tommy    Black   male  FALSE      174.09       72.20
## 20 AC/AH/086      Kyle    Black   male   TRUE      180.11       75.72
##     BirthDate          State   Pet HealthGrade  Died RecordDate
## 1  31-01-1972    Georgia,xxx   dog           2 FALSE 25-11-2015
## 2  09-06-1972       Missouri   dog           2 FALSE 25-11-2015
## 3  03-07-1972   Pennsylvania  <NA>           2 FALSE 25-11-2015
## 4  11-08-1972        Florida   cat           1 FALSE 25-11-2015
## 5  06-06-1973           Iowa  <NA>           2  TRUE 25-11-2015
## 6  25-06-1973       Maryland   dog           2 FALSE 25-11-2015
## 7  20-03-1972   Pennsylvania  <NA>           1 FALSE 25-11-2015
## 8  05-05-1973 North Carolina  <NA>           1 FALSE 25-11-2015
## 9  25-12-1971      Louisiana   dog           1 FALSE 25-11-2015
## 10 13-07-1973 North Carolina  <NA>           2 FALSE 25-12-2015
## 11 28-04-1972     California horse           2  TRUE 25-12-2015
## 12 08-11-1971       Michigan  <NA>           1 FALSE 25-12-2015
## 13 16-03-1972        Indiana  bird           3 FALSE 25-12-2015
## 14 16-11-1971       Virginia   dog           1 FALSE 25-12-2015
## 15 06-01-1972       Illinois   cat           3 FALSE 25-12-2015
## 16 16-11-1972      Wisconsin   dog          99  TRUE 25-12-2015
## 17 16-11-1971       Illinois  <NA>           3 FALSE 25-12-2015
## 18 08-04-1973      Louisiana   cat           2 FALSE 25-12-2015
## 19 01-02-1973     Washington   cat           3 FALSE 25-12-2015
## 20 12-05-1973        Georgia   cat           3 FALSE 25-12-2015
##    BodyMassIndex   BMILabel
## 1       22.89674     Normal
## 2       25.06859 Overweight
## 3       26.38080 Overweight
## 4       30.63867      Obese
## 5       26.53567 Overweight
## 6       27.90487 Overweight
## 7       26.33526 Overweight
## 8       25.61184 Overweight
## 9       23.39025     Normal
## 10      28.22290 Overweight
## 11      28.24834 Overweight
## 12      24.29679     Normal
## 13      29.97888 Overweight
## 14      27.90303 Overweight
## 15      24.51247     Normal
## 16      22.04640     Normal
## 17      25.59163 Overweight
## 18      31.45218      Obese
## 19      23.82262     Normal
## 20      23.34183     Normal
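
The counts above show that `as.logical()` turned the 6 "No" and 4 "Yes" entries into NA. If the intent is to keep those records, an explicit mapping is one option. The sketch below is an assumption about how that could be done, not what this report ran; `to_logical` is a hypothetical helper.

```r
# Hypothetical helper: map TRUE/FALSE and Yes/No spellings to logical values.
# Anything unrecognised stays NA.
to_logical <- function(x) {
  key <- trimws(tolower(x))
  out <- rep(NA, length(key))
  out[key %in% c("true", "yes")] <- TRUE
  out[key %in% c("false", "no")] <- FALSE
  out
}

smokes_raw <- c("FALSE", "No", "TRUE", "Yes", "maybe")
smokes <- to_logical(smokes_raw)
print(smokes)
```

Applied to the real column, this would change the later row counts, since fewer rows would be dropped as NA.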

Handling errors in the HealthGrade column: changing the value 99 to NA, and 1, 2 and 3 to GOOD, NORMAL and BAD respectively

cat("\014")

summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
##   HealthGrade `n()`
##         <int> <int>
## 1           1    29
## 2           2    30
## 3           3    34
## 4          99     7
class(dfrPatient$HealthGrade)
## [1] "integer"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==1] <- "GOOD"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==2] <- "NORMAL"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==3] <- "BAD"
dfrPatient$HealthGrade[dfrPatient$HealthGrade==99] <- NA
class(dfrPatient$HealthGrade)
## [1] "character"
summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 4 × 2
##   HealthGrade `n()`
##         <chr> <int>
## 1         BAD    34
## 2        GOOD    29
## 3      NORMAL    30
## 4        <NA>     7
head(dfrPatient,20)
##           ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/001 Demetrius    White   male  FALSE      182.87       76.57
## 2  AC/AH/017   Rosario    White   male  FALSE      179.12       80.43
## 3  AC/AH/020     Julio    Black   male  FALSE      169.15       75.48
## 4  AC/AH/022      Lupe    White   male  FALSE      175.66       94.54
## 5  AC/AH/029    Lavern    White female  FALSE      164.47       71.78
## 6  AC/AH/033    Bernie      Dog female   TRUE      158.27       69.90
## 7  AC/AH/037    Samuel    White female  FALSE      161.69       68.85
## 8  AC/AH/044     Clair    White female     NA      165.84       70.44
## 9  AC/AH/045   Shirley    White   male  FALSE      181.32       76.90
## 10 AC/AH/048     Merle Hispanic   male  FALSE      167.37       79.06
## 11 AC/AH/049    Martin    White female  FALSE      160.06       72.37
## 12 AC/AH/050   Frances    White female  FALSE      166.48       67.34
## 13 AC/AH/052  Courtney    White   male   TRUE      175.39       92.22
## 14 AC/AH/053   Francis    White female   TRUE      164.70       75.69
## 15 AC/AH/057    Vernon    White female   TRUE      163.79       65.76
## 16 AC/AH/061    Lester    Black   male  FALSE      181.13       72.33
## 17 AC/AH/063     Robin Hispanic   male  FALSE      169.24       73.30
## 18 AC/AH/076    Albert    White   male  FALSE      176.22       97.67
## 19 AC/AH/077     Tommy    Black   male  FALSE      174.09       72.20
## 20 AC/AH/086      Kyle    Black   male   TRUE      180.11       75.72
##     BirthDate          State   Pet HealthGrade  Died RecordDate
## 1  31-01-1972    Georgia,xxx   dog      NORMAL FALSE 25-11-2015
## 2  09-06-1972       Missouri   dog      NORMAL FALSE 25-11-2015
## 3  03-07-1972   Pennsylvania  <NA>      NORMAL FALSE 25-11-2015
## 4  11-08-1972        Florida   cat        GOOD FALSE 25-11-2015
## 5  06-06-1973           Iowa  <NA>      NORMAL  TRUE 25-11-2015
## 6  25-06-1973       Maryland   dog      NORMAL FALSE 25-11-2015
## 7  20-03-1972   Pennsylvania  <NA>        GOOD FALSE 25-11-2015
## 8  05-05-1973 North Carolina  <NA>        GOOD FALSE 25-11-2015
## 9  25-12-1971      Louisiana   dog        GOOD FALSE 25-11-2015
## 10 13-07-1973 North Carolina  <NA>      NORMAL FALSE 25-12-2015
## 11 28-04-1972     California horse      NORMAL  TRUE 25-12-2015
## 12 08-11-1971       Michigan  <NA>        GOOD FALSE 25-12-2015
## 13 16-03-1972        Indiana  bird         BAD FALSE 25-12-2015
## 14 16-11-1971       Virginia   dog        GOOD FALSE 25-12-2015
## 15 06-01-1972       Illinois   cat         BAD FALSE 25-12-2015
## 16 16-11-1972      Wisconsin   dog        <NA>  TRUE 25-12-2015
## 17 16-11-1971       Illinois  <NA>         BAD FALSE 25-12-2015
## 18 08-04-1973      Louisiana   cat      NORMAL FALSE 25-12-2015
## 19 01-02-1973     Washington   cat         BAD FALSE 25-12-2015
## 20 12-05-1973        Georgia   cat         BAD FALSE 25-12-2015
##    BodyMassIndex   BMILabel
## 1       22.89674     Normal
## 2       25.06859 Overweight
## 3       26.38080 Overweight
## 4       30.63867      Obese
## 5       26.53567 Overweight
## 6       27.90487 Overweight
## 7       26.33526 Overweight
## 8       25.61184 Overweight
## 9       23.39025     Normal
## 10      28.22290 Overweight
## 11      28.24834 Overweight
## 12      24.29679     Normal
## 13      29.97888 Overweight
## 14      27.90303 Overweight
## 15      24.51247     Normal
## 16      22.04640     Normal
## 17      25.59163 Overweight
## 18      31.45218      Obese
## 19      23.82262     Normal
## 20      23.34183     Normal
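
A compact alternative to the four assignments is `factor()` with explicit levels: any code not listed in `levels` becomes NA automatically. This is a sketch on a made-up vector (`grade_codes`), not the approach used above.

```r
# Illustrative grade codes, with 99 standing for an invalid entry
grade_codes <- c(1, 2, 3, 99, 2)

# Only 1, 2 and 3 are valid levels; 99 is not listed, so it maps to NA
grade <- factor(grade_codes, levels = c(1, 2, 3), labels = c("GOOD", "NORMAL", "BAD"))
grade_chr <- as.character(grade)
print(grade_chr)
```

A factor also keeps the intended order of the grades, which a plain character column sorts alphabetically.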

Handling an error in State: changing ‘Georgia,xxx’ to ‘Georgia’

summarise(group_by(dfrPatient, State), n())
## # A tibble: 34 × 2
##          State `n()`
##          <chr> <int>
## 1      Alabama     2
## 2      Arizona     2
## 3   California    13
## 4     Colorado     1
## 5  Connecticut     1
## 6      Florida     8
## 7      Georgia     3
## 8  Georgia,xxx     1
## 9       Hawaii     2
## 10    Illinois     4
## # ... with 24 more rows
dfrPatient$State[dfrPatient$State=="Georgia,xxx"] <- "Georgia"
summarise(group_by(dfrPatient, State), n())
## # A tibble: 33 × 2
##          State `n()`
##          <chr> <int>
## 1      Alabama     2
## 2      Arizona     2
## 3   California    13
## 4     Colorado     1
## 5  Connecticut     1
## 6      Florida     8
## 7      Georgia     4
## 8       Hawaii     2
## 9     Illinois     4
## 10     Indiana     4
## # ... with 23 more rows
head(dfrPatient,8)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius White   male  FALSE      182.87       76.57
## 2 AC/AH/017   Rosario White   male  FALSE      179.12       80.43
## 3 AC/AH/020     Julio Black   male  FALSE      169.15       75.48
## 4 AC/AH/022      Lupe White   male  FALSE      175.66       94.54
## 5 AC/AH/029    Lavern White female  FALSE      164.47       71.78
## 6 AC/AH/033    Bernie   Dog female   TRUE      158.27       69.90
## 7 AC/AH/037    Samuel White female  FALSE      161.69       68.85
## 8 AC/AH/044     Clair White female     NA      165.84       70.44
##    BirthDate          State  Pet HealthGrade  Died RecordDate
## 1 31-01-1972        Georgia  dog      NORMAL FALSE 25-11-2015
## 2 09-06-1972       Missouri  dog      NORMAL FALSE 25-11-2015
## 3 03-07-1972   Pennsylvania <NA>      NORMAL FALSE 25-11-2015
## 4 11-08-1972        Florida  cat        GOOD FALSE 25-11-2015
## 5 06-06-1973           Iowa <NA>      NORMAL  TRUE 25-11-2015
## 6 25-06-1973       Maryland  dog      NORMAL FALSE 25-11-2015
## 7 20-03-1972   Pennsylvania <NA>        GOOD FALSE 25-11-2015
## 8 05-05-1973 North Carolina <NA>        GOOD FALSE 25-11-2015
##   BodyMassIndex   BMILabel
## 1      22.89674     Normal
## 2      25.06859 Overweight
## 3      26.38080 Overweight
## 4      30.63867      Obese
## 5      26.53567 Overweight
## 6      27.90487 Overweight
## 7      26.33526 Overweight
## 8      25.61184 Overweight
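
If more states carried a trailing ",..." suffix, a pattern-based fix would cover them all at once. The sketch below assumes that everything from the first comma onward is noise, which holds for the one bad value seen here but is not verified for other data; `state_raw` is a made-up vector.

```r
state_raw <- c("Georgia,xxx", "Iowa", "North Carolina")

# Remove the first comma and everything after it; values without a comma are unchanged
state_clean <- sub(",.*$", "", state_raw)
print(state_clean)
```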

Handling errors in Race: the invalid value "Dog" and the single "Bi-Racial" record are set to NA

summarise(group_by(dfrPatient, Race), n())
## # A tibble: 6 × 2
##        Race `n()`
##       <chr> <int>
## 1     Asian     5
## 2 Bi-Racial     1
## 3     Black     8
## 4       Dog     1
## 5  Hispanic    17
## 6     White    68
dfrPatient$Race <- trimws(tolower(dfrPatient$Race))
dfrPatient$Race[dfrPatient$Race=="dog"] <- NA
dfrPatient$Race[dfrPatient$Race=="bi-racial"] <- NA
summarise(group_by(dfrPatient, Race), n())
## # A tibble: 5 × 2
##       Race `n()`
##      <chr> <int>
## 1    asian     5
## 2    black     8
## 3 hispanic    17
## 4    white    68
## 5     <NA>     2
head(dfrPatient,8)
##          ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/001 Demetrius white   male  FALSE      182.87       76.57
## 2 AC/AH/017   Rosario white   male  FALSE      179.12       80.43
## 3 AC/AH/020     Julio black   male  FALSE      169.15       75.48
## 4 AC/AH/022      Lupe white   male  FALSE      175.66       94.54
## 5 AC/AH/029    Lavern white female  FALSE      164.47       71.78
## 6 AC/AH/033    Bernie  <NA> female   TRUE      158.27       69.90
## 7 AC/AH/037    Samuel white female  FALSE      161.69       68.85
## 8 AC/AH/044     Clair white female     NA      165.84       70.44
##    BirthDate          State  Pet HealthGrade  Died RecordDate
## 1 31-01-1972        Georgia  dog      NORMAL FALSE 25-11-2015
## 2 09-06-1972       Missouri  dog      NORMAL FALSE 25-11-2015
## 3 03-07-1972   Pennsylvania <NA>      NORMAL FALSE 25-11-2015
## 4 11-08-1972        Florida  cat        GOOD FALSE 25-11-2015
## 5 06-06-1973           Iowa <NA>      NORMAL  TRUE 25-11-2015
## 6 25-06-1973       Maryland  dog      NORMAL FALSE 25-11-2015
## 7 20-03-1972   Pennsylvania <NA>        GOOD FALSE 25-11-2015
## 8 05-05-1973 North Carolina <NA>        GOOD FALSE 25-11-2015
##   BodyMassIndex   BMILabel
## 1      22.89674     Normal
## 2      25.06859 Overweight
## 3      26.38080 Overweight
## 4      30.63867      Obese
## 5      26.53567 Overweight
## 6      27.90487 Overweight
## 7      26.33526 Overweight
## 8      25.61184 Overweight

Removing all rows that contain NA values from the dataset

dfrPatient <- na.omit(dfrPatient)
head(dfrPatient,8)
##           ID      Name  Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/001 Demetrius white   male  FALSE      182.87       76.57
## 2  AC/AH/017   Rosario white   male  FALSE      179.12       80.43
## 4  AC/AH/022      Lupe white   male  FALSE      175.66       94.54
## 9  AC/AH/045   Shirley white   male  FALSE      181.32       76.90
## 11 AC/AH/049    Martin white female  FALSE      160.06       72.37
## 13 AC/AH/052  Courtney white   male   TRUE      175.39       92.22
## 14 AC/AH/053   Francis white female   TRUE      164.70       75.69
## 15 AC/AH/057    Vernon white female   TRUE      163.79       65.76
##     BirthDate      State   Pet HealthGrade  Died RecordDate BodyMassIndex
## 1  31-01-1972    Georgia   dog      NORMAL FALSE 25-11-2015      22.89674
## 2  09-06-1972   Missouri   dog      NORMAL FALSE 25-11-2015      25.06859
## 4  11-08-1972    Florida   cat        GOOD FALSE 25-11-2015      30.63867
## 9  25-12-1971  Louisiana   dog        GOOD FALSE 25-11-2015      23.39025
## 11 28-04-1972 California horse      NORMAL  TRUE 25-12-2015      28.24834
## 13 16-03-1972    Indiana  bird         BAD FALSE 25-12-2015      29.97888
## 14 16-11-1971   Virginia   dog        GOOD FALSE 25-12-2015      27.90303
## 15 06-01-1972   Illinois   cat         BAD FALSE 25-12-2015      24.51247
##      BMILabel
## 1      Normal
## 2  Overweight
## 4       Obese
## 9      Normal
## 11 Overweight
## 13 Overweight
## 14 Overweight
## 15     Normal
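
Because `na.omit()` drops a row if any column is NA, it can help to count the missing values per column first. The sketch below uses a small made-up data frame (`demo`) to show the idea.

```r
demo <- data.frame(
  Pet    = c("dog", NA, "cat", "dog"),
  Smokes = c(TRUE, FALSE, NA, FALSE),
  stringsAsFactors = FALSE
)

# Number of NA values in each column
na_per_column <- colSums(is.na(demo))
print(na_per_column)

# Rows with no NA in any column; this is how many rows na.omit() keeps
rows_kept <- sum(complete.cases(demo))
demo_clean <- na.omit(demo)
print(nrow(demo_clean))
```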

Summarising each column to check that no errors remain in the dataset

cat("\014")

summarise(group_by(dfrPatient, BMILabel), n())
## # A tibble: 3 × 2
##     BMILabel `n()`
##        <chr> <int>
## 1     Normal    12
## 2      Obese     5
## 3 Overweight    43
cat("\014")

summarise(group_by(dfrPatient, Gender), n())
## # A tibble: 2 × 2
##   Gender `n()`
##    <chr> <int>
## 1 female    34
## 2   male    26
cat("\014")

summarise(group_by(dfrPatient, Race), n())
## # A tibble: 4 × 2
##       Race `n()`
##      <chr> <int>
## 1    asian     4
## 2    black     3
## 3 hispanic     9
## 4    white    44
cat("\014")

summarise(group_by(dfrPatient, Died), n())
## # A tibble: 2 × 2
##    Died `n()`
##   <lgl> <int>
## 1 FALSE    27
## 2  TRUE    33
cat("\014")

summarise(group_by(dfrPatient, Pet), n())
## # A tibble: 4 × 2
##     Pet `n()`
##   <chr> <int>
## 1  bird     8
## 2   cat    28
## 3   dog    23
## 4 horse     1
cat("\014")

summarise(group_by(dfrPatient, Smokes), n())
## # A tibble: 2 × 2
##   Smokes `n()`
##    <lgl> <int>
## 1  FALSE    50
## 2   TRUE    10
cat("\014")

summarise(group_by(dfrPatient, HealthGrade), n())
## # A tibble: 3 × 2
##   HealthGrade `n()`
##         <chr> <int>
## 1         BAD    24
## 2        GOOD    19
## 3      NORMAL    17
cat("\014")

summarise(group_by(dfrPatient, State), n())
## # A tibble: 28 × 2
##         State `n()`
##         <chr> <int>
## 1     Alabama     1
## 2     Arizona     2
## 3  California     7
## 4     Florida     5
## 5     Georgia     3
## 6      Hawaii     1
## 7    Illinois     3
## 8     Indiana     3
## 9        Iowa     1
## 10     Kansas     1
## # ... with 18 more rows
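
The per-column checks above repeat the same call for each column. A loop over column names gives the same counts in one step; the sketch below uses base R `table()` on a made-up data frame (`demo`, `check_cols`), so the names are illustrative only.

```r
demo <- data.frame(
  Gender = c("male", "female", "male"),
  Pet    = c("dog", "cat", "dog"),
  stringsAsFactors = FALSE
)

# Columns to inspect
check_cols <- c("Gender", "Pet")

# One frequency table per column; useNA = "ifany" would expose any remaining NA
counts <- lapply(demo[check_cols], table, useNA = "ifany")
print(counts)
```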

Reporting data

1. Displaying the top 10 records by BMI

head(arrange(dfrPatient,desc(BodyMassIndex)),10)
##           ID     Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/SG/009    Sammy    white   male  FALSE      166.84       88.25
## 2  AC/SG/064      Jon    white   male  FALSE      169.16       90.08
## 3  AC/AH/076   Albert    white   male  FALSE      176.22       97.67
## 4  AC/AH/022     Lupe    white   male  FALSE      175.66       94.54
## 5  AC/AH/248   Andrea    white   male  FALSE      178.64       97.05
## 6  AC/SG/067   Thomas    white   male  FALSE      167.51       84.15
## 7  AC/AH/052 Courtney    white   male   TRUE      175.39       92.22
## 8  AC/AH/127     Jame    white   male  FALSE      167.75       82.06
## 9  AC/SG/055     Evan    white   male  FALSE      166.75       79.06
## 10 AC/SG/181    Terry hispanic   male  FALSE      177.14       88.70
##     BirthDate        State  Pet HealthGrade  Died RecordDate BodyMassIndex
## 1  04-03-1972      Vermont  dog        GOOD FALSE 25-06-2016      31.70402
## 2  04-10-1972     Illinois  cat      NORMAL  TRUE 25-07-2016      31.47988
## 3  08-04-1973    Louisiana  cat      NORMAL FALSE 25-12-2015      31.45218
## 4  11-08-1972      Florida  cat        GOOD FALSE 25-11-2015      30.63867
## 5  12-01-1973      Indiana  cat        GOOD  TRUE 25-05-2016      30.41152
## 6  19-07-1972 Pennsylvania bird      NORMAL  TRUE 25-07-2016      29.98974
## 7  16-03-1972      Indiana bird         BAD FALSE 25-12-2015      29.97888
## 8  29-10-1972        Texas  dog        GOOD  TRUE 25-01-2016      29.16127
## 9  24-02-1972     Illinois bird         BAD  TRUE 25-07-2016      28.43316
## 10 24-11-1971      Indiana  cat         BAD  TRUE 25-09-2016      28.26769
##      BMILabel
## 1       Obese
## 2       Obese
## 3       Obese
## 4       Obese
## 5       Obese
## 6  Overweight
## 7  Overweight
## 8  Overweight
## 9  Overweight
## 10 Overweight

2. Displaying the bottom 10 records by BMI

head(arrange(dfrPatient, BodyMassIndex),10)
##           ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/SG/193    Ronnie    white   male   TRUE      185.43       73.63
## 2  AC/SG/099    Leslie    asian   male  FALSE      172.72       67.62
## 3  AC/AH/001 Demetrius    white   male  FALSE      182.87       76.57
## 4  AC/AH/086      Kyle    black   male   TRUE      180.11       75.72
## 5  AC/AH/045   Shirley    white   male  FALSE      181.32       76.90
## 6  AC/AH/114      Kris hispanic   male  FALSE      177.75       74.84
## 7  AC/AH/077     Tommy    black   male  FALSE      174.09       72.20
## 8  AC/AH/150     Brett    white   male   TRUE      181.56       79.54
## 9  AC/AH/057    Vernon    white female   TRUE      163.79       65.76
## 10 AC/AH/207    Bobbie    white female  FALSE      163.01       65.19
##     BirthDate        State  Pet HealthGrade  Died RecordDate BodyMassIndex
## 1  05-06-1973         Iowa  dog         BAD FALSE 25-09-2016      21.41385
## 2  04-02-1972         Ohio  cat        GOOD FALSE 25-07-2016      22.66678
## 3  31-01-1972      Georgia  dog      NORMAL FALSE 25-11-2015      22.89674
## 4  12-05-1973      Georgia  cat         BAD FALSE 25-12-2015      23.34183
## 5  25-12-1971    Louisiana  dog        GOOD FALSE 25-11-2015      23.39025
## 6  19-11-1972 Pennsylvania bird         BAD FALSE 25-01-2016      23.68725
## 7  01-02-1973   Washington  cat         BAD FALSE 25-12-2015      23.82262
## 8  03-05-1972     Kentucky  dog        GOOD  TRUE 25-02-2016      24.12933
## 9  06-01-1972     Illinois  cat         BAD FALSE 25-12-2015      24.51247
## 10 17-05-1973      Florida  dog      NORMAL FALSE 25-03-2016      24.53310
##    BMILabel
## 1    Normal
## 2    Normal
## 3    Normal
## 4    Normal
## 5    Normal
## 6    Normal
## 7    Normal
## 8    Normal
## 9    Normal
## 10   Normal
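
The printed result above wraps into three blocks because every column is shown. A minimal sketch of one way to keep the listing on a single block is to select only the columns of interest before taking the first rows. The data frame `dfrToy` below is a hypothetical stand-in for `dfrPatient`, with made-up values, so that the snippet runs on its own.

```r
library(dplyr)

# Hypothetical stand-in for dfrPatient; the values are illustrative only.
dfrToy <- data.frame(
  ID            = c("T1", "T2", "T3", "T4"),
  Name          = c("A", "B", "C", "D"),
  State         = c("Iowa", "Ohio", "Texas", "Florida"),
  Died          = c(FALSE, FALSE, TRUE, FALSE),
  BodyMassIndex = c(27.1, 21.4, 31.7, 24.5),
  BMILabel      = c("Overweight", "Normal", "Obese", "Normal"),
  stringsAsFactors = FALSE
)

# Sort ascending by BMI, keep a few columns, then take the lowest rows.
dfrLowest <- dfrToy %>%
  arrange(BodyMassIndex) %>%
  select(ID, Name, Died, BodyMassIndex, BMILabel) %>%
  head(2)

print(dfrLowest)
```

On the real data the same chain would start from `dfrPatient` and end with `head(10)`.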

3. Providing frequency counts by Gender > Race

summarise(group_by(dfrPatient,Gender,Race), n())
## Source: local data frame [8 x 3]
## Groups: Gender [?]
## 
##   Gender     Race `n()`
##    <chr>    <chr> <int>
## 1 female    asian     2
## 2 female    black     1
## 3 female hispanic     4
## 4 female    white    27
## 5   male    asian     2
## 6   male    black     2
## 7   male hispanic     5
## 8   male    white    17
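
The count column above is headed `n()` because the summary was not given a name. As a sketch, naming the summary inside `summarise()` gives a readable header. `dfrToy` and the column name `Count` are assumptions made for illustration and do not come from the patient file.

```r
library(dplyr)

# Hypothetical stand-in for dfrPatient; the values are illustrative only.
dfrToy <- data.frame(
  Gender = c("female", "female", "male", "male", "male"),
  Race   = c("white", "white", "white", "asian", "white"),
  stringsAsFactors = FALSE
)

# Name the summary column so the header reads Count instead of `n()`.
dfrCounts <- dfrToy %>%
  group_by(Gender, Race) %>%
  summarise(Count = n()) %>%
  ungroup()

print(as.data.frame(dfrCounts))
```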

4. Providing max, min and average BMI values by Race > Gender

summarise(group_by(dfrPatient, Race, Gender), min(BodyMassIndex),max(BodyMassIndex), mean(BodyMassIndex))
## Source: local data frame [8 x 5]
## Groups: Race [?]
## 
##       Race Gender `min(BodyMassIndex)` `max(BodyMassIndex)`
##      <chr>  <chr>                <dbl>                <dbl>
## 1    asian female             25.57631             28.19431
## 2    asian   male             22.66678             27.24885
## 3    black female             26.71407             26.71407
## 4    black   male             23.34183             23.82262
## 5 hispanic female             25.03916             26.89942
## 6 hispanic   male             23.68725             28.26769
## 7    white female             24.51247             28.24834
## 8    white   male             21.41385             31.70402
## # ... with 1 more variables: `mean(BodyMassIndex)` <dbl>
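
In the output above the mean column is not displayed, because the result is wider than the page ("with 1 more variables"). One possible way to see all three statistics is to give the summaries short names and print the result as a plain data frame. This is a sketch on a hypothetical `dfrToy`; the names `MinBMI`, `MaxBMI` and `MeanBMI` are illustrative.

```r
library(dplyr)

# Hypothetical stand-in for dfrPatient; the values are illustrative only.
dfrToy <- data.frame(
  Race          = c("white", "white", "asian", "asian"),
  Gender        = c("male", "male", "female", "female"),
  BodyMassIndex = c(22, 30, 25, 27),
  stringsAsFactors = FALSE
)

# Short summary names keep the table narrow enough to show every column.
dfrStats <- dfrToy %>%
  group_by(Race, Gender) %>%
  summarise(
    MinBMI  = min(BodyMassIndex),
    MaxBMI  = max(BodyMassIndex),
    MeanBMI = mean(BodyMassIndex)
  ) %>%
  ungroup()

# as.data.frame() prints all columns rather than truncating to the page width.
print(as.data.frame(dfrStats))
```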

5. Displaying records for all patients who have died

filter(dfrPatient, Died==TRUE)
##           ID        Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/049      Martin    white female  FALSE      160.06       72.37
## 2  AC/AH/127        Jame    white   male  FALSE      167.75       82.06
## 3  AC/AH/133       Clyde hispanic   male  FALSE      181.15       83.93
## 4  AC/AH/150       Brett    white   male   TRUE      181.56       79.54
## 5  AC/AH/154        Tony    white female  FALSE      160.03       64.30
## 6  AC/AH/156      George    white   male  FALSE      165.62       76.72
## 7  AC/AH/160        Rory    asian female  FALSE      159.67       71.88
## 8  AC/AH/176       Jerry    asian   male  FALSE      175.21       83.65
## 9  AC/AH/180        Drew    white female  FALSE      160.80       64.77
## 10 AC/AH/186 Christopher    white female  FALSE      157.95       67.41
## 11 AC/AH/211         Son    white female  FALSE      157.16       69.64
## 12 AC/AH/219         Jay    white female  FALSE      163.47       72.89
## 13 AC/AH/233      Marion    white female  FALSE      163.97       66.71
## 14 AC/AH/248      Andrea    white   male  FALSE      178.64       97.05
## 15 AC/AH/249       Jesus hispanic female   TRUE      159.78       68.31
## 16 AC/SG/010        Theo    asian female  FALSE      159.32       64.92
## 17 AC/SG/016      Jimmie    black female  FALSE      161.84       69.97
## 18 AC/SG/046        Carl hispanic   male  FALSE      171.41       81.70
## 19 AC/SG/055        Evan    white   male  FALSE      166.75       79.06
## 20 AC/SG/064         Jon    white   male  FALSE      169.16       90.08
## 21 AC/SG/065      Shayne    white female  FALSE      157.01       66.56
## 22 AC/SG/067      Thomas    white   male  FALSE      167.51       84.15
## 23 AC/SG/068   Valentine hispanic female  FALSE      160.47       68.20
## 24 AC/SG/084       Brian hispanic   male  FALSE      174.25       80.93
## 25 AC/SG/101       Jason    white female  FALSE      159.23       69.96
## 26 AC/SG/123     Darnell    white female   TRUE      162.32       72.72
## 27 AC/SG/134       Daryl    white female   TRUE      162.59       69.76
## 28 AC/SG/155     Raymond    white female  FALSE      158.35       69.72
## 29 AC/SG/165       Elmer    white female  FALSE      162.18       67.81
## 30 AC/SG/179       Logan    white   male  FALSE      183.10       82.47
## 31 AC/SG/181       Terry hispanic   male  FALSE      177.14       88.70
## 32 AC/SG/197       Stacy    white female  FALSE      159.44       66.21
## 33 AC/SG/234        Luis hispanic female  FALSE      164.88       68.07
##     BirthDate          State   Pet HealthGrade Died RecordDate
## 1  28-04-1972     California horse      NORMAL TRUE 25-12-2015
## 2  29-10-1972          Texas   dog        GOOD TRUE 25-01-2016
## 3  13-10-1973     Washington   cat         BAD TRUE 25-02-2016
## 4  03-05-1972       Kentucky   dog        GOOD TRUE 25-02-2016
## 5  30-08-1973     California   dog        GOOD TRUE 25-02-2016
## 6  09-07-1972     California   dog        GOOD TRUE 25-02-2016
## 7  22-09-1973        Florida   cat      NORMAL TRUE 25-02-2016
## 8  01-05-1973       Virginia   dog         BAD TRUE 25-03-2016
## 9  18-02-1973         Oregon   cat        GOOD TRUE 25-03-2016
## 10 06-05-1972     New Jersey   dog         BAD TRUE 25-03-2016
## 11 14-07-1973     California   cat      NORMAL TRUE 25-04-2016
## 12 07-04-1972 North Carolina  bird        GOOD TRUE 25-04-2016
## 13 23-12-1971           Ohio   cat         BAD TRUE 25-04-2016
## 14 12-01-1973        Indiana   cat        GOOD TRUE 25-05-2016
## 15 23-04-1972        Alabama   cat      NORMAL TRUE 25-05-2016
## 16 29-01-1973       New York   cat      NORMAL TRUE 25-06-2016
## 17 03-04-1972        Arizona   cat         BAD TRUE 25-06-2016
## 18 05-08-1973    Mississippi  bird      NORMAL TRUE 25-06-2016
## 19 24-02-1972       Illinois  bird         BAD TRUE 25-07-2016
## 20 04-10-1972       Illinois   cat      NORMAL TRUE 25-07-2016
## 21 05-04-1972     California   dog         BAD TRUE 25-07-2016
## 22 19-07-1972   Pennsylvania  bird      NORMAL TRUE 25-07-2016
## 23 15-04-1972      Tennessee   cat         BAD TRUE 25-07-2016
## 24 06-03-1972       Virginia   dog      NORMAL TRUE 25-07-2016
## 25 28-09-1973       Michigan   dog      NORMAL TRUE 25-07-2016
## 26 03-09-1972 North Carolina  bird        GOOD TRUE 25-08-2016
## 27 28-05-1972          Texas   cat      NORMAL TRUE 25-08-2016
## 28 02-06-1972     California   cat         BAD TRUE 25-08-2016
## 29 25-03-1972     Washington  bird        GOOD TRUE 25-08-2016
## 30 24-10-1972           Ohio   dog         BAD TRUE 25-09-2016
## 31 24-11-1971        Indiana   cat         BAD TRUE 25-09-2016
## 32 08-11-1972       New York   cat        GOOD TRUE 25-10-2016
## 33 10-11-1971   Pennsylvania   cat         BAD TRUE 25-10-2016
##    BodyMassIndex   BMILabel
## 1       28.24834 Overweight
## 2       29.16127 Overweight
## 3       25.57647 Overweight
## 4       24.12933     Normal
## 5       25.10777 Overweight
## 6       27.96939 Overweight
## 7       28.19431 Overweight
## 8       27.24885 Overweight
## 9       25.04966 Overweight
## 10      27.01998 Overweight
## 11      28.19517 Overweight
## 12      27.27670 Overweight
## 13      24.81202     Normal
## 14      30.41152      Obese
## 15      26.75713 Overweight
## 16      25.57631 Overweight
## 17      26.71407 Overweight
## 18      27.80672 Overweight
## 19      28.43316 Overweight
## 20      31.47988      Obese
## 21      26.99968 Overweight
## 22      29.98974 Overweight
## 23      26.48480 Overweight
## 24      26.65410 Overweight
## 25      27.59307 Overweight
## 26      27.60005 Overweight
## 27      26.38875 Overweight
## 28      27.80489 Overweight
## 29      25.78096 Overweight
## 30      24.59910     Normal
## 31      28.26769 Overweight
## 32      26.04528 Overweight
## 33      25.03916 Overweight
nrow(filter(dfrPatient, Died==TRUE))
## [1] 33

For analysis purposes: patients who have died and are labelled ‘Overweight’

filter(dfrPatient, Died==TRUE & BMILabel=="Overweight")
##           ID        Name     Race Gender Smokes HeightInCms WeightInKgs
## 1  AC/AH/049      Martin    white female  FALSE      160.06       72.37
## 2  AC/AH/127        Jame    white   male  FALSE      167.75       82.06
## 3  AC/AH/133       Clyde hispanic   male  FALSE      181.15       83.93
## 4  AC/AH/154        Tony    white female  FALSE      160.03       64.30
## 5  AC/AH/156      George    white   male  FALSE      165.62       76.72
## 6  AC/AH/160        Rory    asian female  FALSE      159.67       71.88
## 7  AC/AH/176       Jerry    asian   male  FALSE      175.21       83.65
## 8  AC/AH/180        Drew    white female  FALSE      160.80       64.77
## 9  AC/AH/186 Christopher    white female  FALSE      157.95       67.41
## 10 AC/AH/211         Son    white female  FALSE      157.16       69.64
## 11 AC/AH/219         Jay    white female  FALSE      163.47       72.89
## 12 AC/AH/249       Jesus hispanic female   TRUE      159.78       68.31
## 13 AC/SG/010        Theo    asian female  FALSE      159.32       64.92
## 14 AC/SG/016      Jimmie    black female  FALSE      161.84       69.97
## 15 AC/SG/046        Carl hispanic   male  FALSE      171.41       81.70
## 16 AC/SG/055        Evan    white   male  FALSE      166.75       79.06
## 17 AC/SG/065      Shayne    white female  FALSE      157.01       66.56
## 18 AC/SG/067      Thomas    white   male  FALSE      167.51       84.15
## 19 AC/SG/068   Valentine hispanic female  FALSE      160.47       68.20
## 20 AC/SG/084       Brian hispanic   male  FALSE      174.25       80.93
## 21 AC/SG/101       Jason    white female  FALSE      159.23       69.96
## 22 AC/SG/123     Darnell    white female   TRUE      162.32       72.72
## 23 AC/SG/134       Daryl    white female   TRUE      162.59       69.76
## 24 AC/SG/155     Raymond    white female  FALSE      158.35       69.72
## 25 AC/SG/165       Elmer    white female  FALSE      162.18       67.81
## 26 AC/SG/181       Terry hispanic   male  FALSE      177.14       88.70
## 27 AC/SG/197       Stacy    white female  FALSE      159.44       66.21
## 28 AC/SG/234        Luis hispanic female  FALSE      164.88       68.07
##     BirthDate          State   Pet HealthGrade Died RecordDate
## 1  28-04-1972     California horse      NORMAL TRUE 25-12-2015
## 2  29-10-1972          Texas   dog        GOOD TRUE 25-01-2016
## 3  13-10-1973     Washington   cat         BAD TRUE 25-02-2016
## 4  30-08-1973     California   dog        GOOD TRUE 25-02-2016
## 5  09-07-1972     California   dog        GOOD TRUE 25-02-2016
## 6  22-09-1973        Florida   cat      NORMAL TRUE 25-02-2016
## 7  01-05-1973       Virginia   dog         BAD TRUE 25-03-2016
## 8  18-02-1973         Oregon   cat        GOOD TRUE 25-03-2016
## 9  06-05-1972     New Jersey   dog         BAD TRUE 25-03-2016
## 10 14-07-1973     California   cat      NORMAL TRUE 25-04-2016
## 11 07-04-1972 North Carolina  bird        GOOD TRUE 25-04-2016
## 12 23-04-1972        Alabama   cat      NORMAL TRUE 25-05-2016
## 13 29-01-1973       New York   cat      NORMAL TRUE 25-06-2016
## 14 03-04-1972        Arizona   cat         BAD TRUE 25-06-2016
## 15 05-08-1973    Mississippi  bird      NORMAL TRUE 25-06-2016
## 16 24-02-1972       Illinois  bird         BAD TRUE 25-07-2016
## 17 05-04-1972     California   dog         BAD TRUE 25-07-2016
## 18 19-07-1972   Pennsylvania  bird      NORMAL TRUE 25-07-2016
## 19 15-04-1972      Tennessee   cat         BAD TRUE 25-07-2016
## 20 06-03-1972       Virginia   dog      NORMAL TRUE 25-07-2016
## 21 28-09-1973       Michigan   dog      NORMAL TRUE 25-07-2016
## 22 03-09-1972 North Carolina  bird        GOOD TRUE 25-08-2016
## 23 28-05-1972          Texas   cat      NORMAL TRUE 25-08-2016
## 24 02-06-1972     California   cat         BAD TRUE 25-08-2016
## 25 25-03-1972     Washington  bird        GOOD TRUE 25-08-2016
## 26 24-11-1971        Indiana   cat         BAD TRUE 25-09-2016
## 27 08-11-1972       New York   cat        GOOD TRUE 25-10-2016
## 28 10-11-1971   Pennsylvania   cat         BAD TRUE 25-10-2016
##    BodyMassIndex   BMILabel
## 1       28.24834 Overweight
## 2       29.16127 Overweight
## 3       25.57647 Overweight
## 4       25.10777 Overweight
## 5       27.96939 Overweight
## 6       28.19431 Overweight
## 7       27.24885 Overweight
## 8       25.04966 Overweight
## 9       27.01998 Overweight
## 10      28.19517 Overweight
## 11      27.27670 Overweight
## 12      26.75713 Overweight
## 13      25.57631 Overweight
## 14      26.71407 Overweight
## 15      27.80672 Overweight
## 16      28.43316 Overweight
## 17      26.99968 Overweight
## 18      29.98974 Overweight
## 19      26.48480 Overweight
## 20      26.65410 Overweight
## 21      27.59307 Overweight
## 22      27.60005 Overweight
## 23      26.38875 Overweight
## 24      27.80489 Overweight
## 25      25.78096 Overweight
## 26      28.26769 Overweight
## 27      26.04528 Overweight
## 28      25.03916 Overweight
nrow(filter(dfrPatient, Died==TRUE & BMILabel=="Overweight"))
## [1] 28

6. Displaying all records for “hispanic females”

filter(dfrPatient, Race == "hispanic" & Gender == "female")
##          ID      Name     Race Gender Smokes HeightInCms WeightInKgs
## 1 AC/AH/249     Jesus hispanic female   TRUE      159.78       68.31
## 2 AC/SG/068 Valentine hispanic female  FALSE      160.47       68.20
## 3 AC/SG/122    Michal hispanic female  FALSE      160.09       68.94
## 4 AC/SG/234      Luis hispanic female  FALSE      164.88       68.07
##    BirthDate          State Pet HealthGrade  Died RecordDate BodyMassIndex
## 1 23-04-1972        Alabama cat      NORMAL  TRUE 25-05-2016      26.75713
## 2 15-04-1972      Tennessee cat         BAD  TRUE 25-07-2016      26.48480
## 3 16-12-1971 South Carolina dog        GOOD FALSE 25-08-2016      26.89942
## 4 10-11-1971   Pennsylvania cat         BAD  TRUE 25-10-2016      25.03916
##     BMILabel
## 1 Overweight
## 2 Overweight
## 3 Overweight
## 4 Overweight
nrow(filter(dfrPatient, Race == "hispanic" & Gender == "female"))
## [1] 4

7. Providing 10 sample records from the dataset using set.seed(707)

set.seed(707)
sample_n(dfrPatient, 10)
##           ID     Name     Race Gender Smokes HeightInCms WeightInKgs
## 13 AC/AH/052 Courtney    white   male   TRUE      175.39       92.22
## 48 AC/AH/219      Jay    white female  FALSE      163.47       72.89
## 30 AC/AH/150    Brett    white   male   TRUE      181.56       79.54
## 55 AC/AH/248   Andrea    white   male  FALSE      178.64       97.05
## 73 AC/SG/084    Brian hispanic   male  FALSE      174.25       80.93
## 67 AC/SG/064      Jon    white   male  FALSE      169.16       90.08
## 80 AC/SG/122   Michal hispanic female  FALSE      160.09       68.94
## 9  AC/AH/045  Shirley    white   male  FALSE      181.32       76.90
## 20 AC/AH/086     Kyle    black   male   TRUE      180.11       75.72
## 57 AC/SG/002      Jan    white female   TRUE      161.57       67.92
##     BirthDate          State  Pet HealthGrade  Died RecordDate
## 13 16-03-1972        Indiana bird         BAD FALSE 25-12-2015
## 48 07-04-1972 North Carolina bird        GOOD  TRUE 25-04-2016
## 30 03-05-1972       Kentucky  dog        GOOD  TRUE 25-02-2016
## 55 12-01-1973        Indiana  cat        GOOD  TRUE 25-05-2016
## 73 06-03-1972       Virginia  dog      NORMAL  TRUE 25-07-2016
## 67 04-10-1972       Illinois  cat      NORMAL  TRUE 25-07-2016
## 80 16-12-1971 South Carolina  dog        GOOD FALSE 25-08-2016
## 9  25-12-1971      Louisiana  dog        GOOD FALSE 25-11-2015
## 20 12-05-1973        Georgia  cat         BAD FALSE 25-12-2015
## 57 03-07-1973        Arizona  dog         BAD FALSE 25-05-2016
##    BodyMassIndex   BMILabel
## 13      29.97888 Overweight
## 48      27.27670 Overweight
## 30      24.12933     Normal
## 55      30.41152      Obese
## 73      26.65410 Overweight
## 67      31.47988      Obese
## 80      26.89942 Overweight
## 9       23.39025     Normal
## 20      23.34183     Normal
## 57      26.01814 Overweight
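
Setting the seed immediately before sampling is what makes the sample repeatable. The sketch below, on a hypothetical `dfrToy`, draws the same sample twice and checks that both draws select the same rows. The sample size of 7 is only an example value.

```r
library(dplyr)

# Hypothetical stand-in for dfrPatient; the values are illustrative only.
dfrToy <- data.frame(
  ID    = sprintf("P%02d", 1:20),
  Value = 1:20,
  stringsAsFactors = FALSE
)

# Reset the seed before each draw so both draws use the same random stream.
set.seed(707)
dfrFirst <- sample_n(dfrToy, 7)

set.seed(707)
dfrSecond <- sample_n(dfrToy, 7)

# TRUE when both draws picked the same rows in the same order.
print(identical(dfrFirst$ID, dfrSecond$ID))
```

The exact rows drawn for a given seed can differ between R versions, so a re-run may not reproduce the listing above row for row.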

Summary of the analysis of patient-data

  1. Of the 100 rows in the original dataset, 60 are retained after removing rows with ‘NA’ values in any column.
  2. Ranked by BMI value, the 10 patients with the highest values are all male, and the top 9 are of ‘white’ race.
  3. Among the 10 patients with the lowest BMI values, only 1 has died, compared with 6 of the 10 with the highest BMI values.
  4. This suggests that, in this sample, patients with a lower BMI value were less likely to have died. With only 60 records and no statistical test, it is an observation rather than a firm conclusion.
  5. The white race has the highest count in the data: 27 “female-white” and 17 “male-white” records, i.e. 44 of the 60 filtered records.
  6. Another observation is that, of the 33 patients who have died, 28 were ‘Overweight’ (2 were ‘Obese’ and 3 ‘Normal’).
  7. All 4 hispanic females are ‘Overweight’.
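
The observations about BMI and deaths compare raw counts. A rough way to put the BMI groups on the same footing would be to compute the share of patients who died within each BMI label. The sketch below does this on a hypothetical `dfrToy`, not on the patient file; the column names `Patients`, `Deaths` and `DeathRate` are illustrative.

```r
library(dplyr)

# Hypothetical stand-in for dfrPatient; the values are illustrative only.
dfrToy <- data.frame(
  BMILabel = c("Normal", "Normal", "Normal", "Normal",
               "Overweight", "Overweight", "Overweight",
               "Obese", "Obese"),
  Died     = c(TRUE, FALSE, FALSE, FALSE,
               TRUE, TRUE, FALSE,
               TRUE, FALSE),
  stringsAsFactors = FALSE
)

# mean() of a logical column is the proportion of TRUE values,
# so DeathRate is the share of patients in each group who died.
dfrRates <- dfrToy %>%
  group_by(BMILabel) %>%
  summarise(
    Patients  = n(),
    Deaths    = sum(Died),
    DeathRate = mean(Died)
  ) %>%
  ungroup()

print(as.data.frame(dfrRates))
```

Because the groups differ in size, a rate per group is easier to compare than the number of deaths alone.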