Biostatistics

estra <- read.csv(file.choose(), header = TRUE, sep = "") # whitespace-delimited file, chosen interactively
head(estra,3)

##   Id Estradl Ethnic Entage Numchild Agefbo Anykids Agemenar     bmi  whr
## 1  2   94.00      0     30        0      0       0       11 18.9038 0.70
## 2  2   14.00      0     23        0      0       0       15 20.4386 0.70
## 3  3   28.33      0     21        0      0       0       13 22.2578 0.75

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Recode the coded variables; sentinel codes (9, 99) become real NA values so the columns stay numeric
estra <- estra %>% mutate(ethnicity = ifelse(Ethnic == 0, "AM", "CAU"),
                          nchild = ifelse(Numchild == 9, NA, Numchild),
                          ageb = ifelse(Agefbo == 99, NA, Agefbo),
                          kid = ifelse(Anykids == 1, "YES", ifelse(Anykids == 0, "NO", NA)),
                          agemena = ifelse(Agemenar == 99, NA, Agemenar))
head(estra,2)

##   Id Estradl Ethnic Entage Numchild Agefbo Anykids Agemenar     bmi whr
## 1  2      94      0     30        0      0       0       11 18.9038 0.7
## 2  2      14      0     23        0      0       0       15 20.4386 0.7
##   ethnicity nchild ageb kid agemena
## 1        AM      0    0  NO      11
## 2        AM      0    0  NO      15
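
Sentinel codes such as 99 have to become real missing values before a recoded column is summarised, otherwise they are either counted as data or turn the column into text. A minimal sketch in base R; the vector `age_coded` is made up for illustration and is not taken from the estra file:

```r
# Hypothetical ages with 99 used as a missing-value code (illustrative only)
age_coded <- c(12, 13, 99, 11, 14)

# Replace the sentinel with a real NA; the result stays numeric
age_clean <- ifelse(age_coded == 99, NA, age_coded)

is.numeric(age_clean)          # check that the column is still numeric
mean(age_clean, na.rm = TRUE)  # the NA is skipped instead of being averaged in as 99
```

Using the string "NA" in place of `NA` would coerce the whole column to character, and `mean()` would then fail on it.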

Answer 3 - Descriptive statistics

estra %>% summarise(meanEntage = mean(Entage), sdEntage = sd(Entage),
                    meanbmi = mean(bmi), sdbmi = sd(bmi), meanwhr = mean(whr),
                    sdwhr = sd(whr)) ## mean and standard deviation of the variables

##   meanEntage sdEntage  meanbmi    sdbmi   meanwhr      sdwhr
## 1   26.18009 5.411257 25.77651 5.416825 0.7591469 0.06864839

quantile(estra$Entage) # 5-number summary for age (0% = min, 25% = 1st quartile, 50% = median, 75% = 3rd quartile, 100% = max)

##   0%  25%  50%  75% 100% 
##   18   21   26   31   37

quantile(estra$bmi) # 5-number summary for BMI (0% = min, 25% = 1st quartile, 50% = median, 75% = 3rd quartile, 100% = max)

##       0%      25%      50%      75%     100% 
## 17.72120 21.35590 24.50710 29.55665 42.24240

quantile(estra$whr) # 5-number summary for waist-hip ratio (0% = min, 25% = 1st quartile, 50% = median, 75% = 3rd quartile, 100% = max)

##   0%  25%  50%  75% 100% 
## 0.62 0.71 0.74 0.81 0.98

Answer 4 - Table of counts

table(estra$ethnicity) # AM = African American, CAU = Caucasian

## 
##  AM CAU 
##  60 151

p1 <- 60 / (60 + 151) 
round(p1, 2) # proportion of African Americans

## [1] 0.28

p2 <- 151 / (60 + 151) 
round(p2, 2)   # proportion of Caucasians

## [1] 0.72
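
The proportions above are typed in from the printed counts. A minimal sketch of computing them directly from the table, so they stay correct if the data change; the short vector `eth` is a made-up stand-in for `estra$ethnicity`:

```r
# Hypothetical stand-in for estra$ethnicity (illustrative values only)
eth <- c("AM", "CAU", "CAU", "AM", "CAU")

counts <- table(eth)         # counts per group
props <- prop.table(counts)  # each count divided by the total
round(props, 2)
```

On the real data, `round(prop.table(table(estra$ethnicity)), 2)` should reproduce the 0.28 and 0.72 reported above.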

Answer 5 - the histogram of BMI

hist(estra$bmi, xlab = "BMI", main = "Histogram")

Answer 6

The plot in (5) shows that the distribution of BMI is positively skewed, with its mode near a BMI of 22. The mean (25.78) lying above the median (24.51) is consistent with this right skew.
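
A histogram reading of skewness can be backed up numerically. The sketch below compares the mean with the median and computes a moment-based sample skewness; `skewness()` is a small helper written here for illustration (it is not a base R function), and the vector `x` is made up rather than taken from the estra data:

```r
# Illustrative helper: moment-based sample skewness
# Positive values suggest a longer right tail
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical values with one large observation on the right
x <- c(18, 20, 21, 22, 23, 25, 30, 42)

mean(x) > median(x)  # mean pulled above the median by the right tail
skewness(x) > 0      # positive skewness
```

Applied to the report's data, the same checks would be `skewness(estra$bmi)` and `skewness(estra$whr)`.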

Answer 7 - The histogram of Waist-hip ratio

hist(estra$whr, xlab = "Waist-hip ratio", main = "Histogram")

Answer 8

The distribution of waist-hip ratio is also positively skewed and has a mode in the range 0.70 to 0.74.

Answer 9 - Interpretations

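The means are the average values of the variables discussed: about 26.2 years for age, 25.8 for BMI, and 0.76 for waist-hip ratio. The standard deviations quantify how widely the values spread around those means. Some women in the sample have a BMI or waist-hip ratio well above the average: the maximum BMI is 42.24 against a mean of 25.78, and the maximum waist-hip ratio is 0.98 against a mean of 0.76.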

J. Mess

November 10, 2018