Import the data into RStudio and display it. Define the unit of study, define the variables, and explain which measurement scales they belong to.

mydata <- read.table("./Marathon.csv", header=TRUE, sep=";", dec=",")

head(mydata)
##   ID Weight Height Pressure Beat Hemoglobin Hematocrit Cholesterol Glucose
## 1  1     72  179.0      105   64        160         50         4.9     4.7
## 2  2     68  178.0      105   60        158         51         4.8     4.9
## 3  3     64  174.0      109   54        155         51         4.5     7.0
## 4  4     63  174.0      112   54        153         58         8.0     7.2
## 5  5     61  173.5      100   53        152         59         4.6     6.7
## 6  6     60  173.0       99   53        158         49         3.9     6.0
##   Gender
## 1      1
## 2      0
## 3      0
## 4      0
## 5      0
## 6      0
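
To help decide the measurement scales, it can be useful to look at how R stores each column. The sketch below rebuilds only the six rows printed by head() (the object name six_rows is made up for illustration, and this is not the full data set) and lists the storage type of each variable. Note that the storage type is only a hint: Gender is stored as a number, but it is a nominal code.

```r
# Only the six rows printed by head(); NOT the full Marathon.csv data set
six_rows <- data.frame(
  ID          = 1:6,
  Weight      = c(72, 68, 64, 63, 61, 60),
  Height      = c(179, 178, 174, 174, 173.5, 173),
  Pressure    = c(105, 105, 109, 112, 100, 99),
  Beat        = c(64, 60, 54, 54, 53, 53),
  Hemoglobin  = c(160, 158, 155, 153, 152, 158),
  Hematocrit  = c(50, 51, 51, 58, 59, 49),
  Cholesterol = c(4.9, 4.8, 4.5, 8.0, 4.6, 3.9),
  Glucose     = c(4.7, 4.9, 7.0, 7.2, 6.7, 6.0),
  Gender      = c(1, 0, 0, 0, 0, 0)
)

# Compact overview: one line per variable with its type and first values
str(six_rows)

# Storage class of every column; ID and Gender are numbers in storage
# but act as labels (nominal), the rest are numeric measurements
storage_types <- sapply(six_rows, class)
print(storage_types)
```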

Estimate the arithmetic mean and standard deviation for Height.

mean(mydata$Height)
## [1] 176.9571
sd(mydata$Height)
## [1] 5.85156
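
As a check on what mean() and sd() compute, the sketch below applies the textbook formulas by hand. It uses only the six Height values shown in the head() output, so the numbers will not match the full-sample results above; the variable names are chosen for illustration.

```r
# Six Height values from the head() output; NOT the full sample of 35
heights <- c(179, 178, 174, 174, 173.5, 173)
n <- length(heights)

# Arithmetic mean: sum of the values divided by their count
manual_mean <- sum(heights) / n

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1 (not n)
manual_sd <- sqrt(sum((heights - manual_mean)^2) / (n - 1))

# Both should agree with the built-in functions
print(c(manual = manual_mean, builtin = mean(heights)))
print(c(manual = manual_sd, builtin = sd(heights)))
```

The division by n - 1 is what makes sd() the sample (rather than population) standard deviation.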

Change the variable Gender into a factor.

mydata$GenderFactor <- factor(mydata$Gender, 
                              levels = c(0, 1), 
                              labels = c("Female", "Male"))
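
A quick way to confirm the recoding worked is to tabulate the new factor. The sketch below uses the six Gender codes from the head() output only (the vector names are illustrative). It also shows that any code not listed in levels is silently turned into NA, which is worth checking for on the full data.

```r
# Gender codes of the six rows shown by head(); NOT the full data set
gender_codes <- c(1, 0, 0, 0, 0, 0)

gender_factor <- factor(gender_codes,
                        levels = c(0, 1),
                        labels = c("Female", "Male"))

# Counts per label; the order follows the order given in 'levels'
gender_counts <- table(gender_factor)
print(gender_counts)

# A code outside 'levels' (here the made-up value 2) becomes NA
with_unknown <- factor(c(0, 1, 2),
                       levels = c(0, 1),
                       labels = c("Female", "Male"))
print(is.na(with_unknown))
```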

Estimate the descriptive statistics for Glucose separately for each gender.

library(psych)
describeBy(mydata$Glucose, group = mydata$GenderFactor)
## 
##  Descriptive statistics by group 
## group: Female
##    vars  n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 14 5.96 0.93    5.8    5.97 1.33 4.6 7.2   2.6 0.12    -1.62 0.25
## ------------------------------------------------------------ 
## group: Male
##    vars  n mean  sd median trimmed  mad min max range skew kurtosis   se
## X1    1 21 4.54 0.7    4.6    4.45 0.74 3.8   6   2.2 0.97    -0.13 0.15
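
Two of the columns in this output can be re-derived from the others, which is a handy way to make sure their meaning is clear: se is the standard deviation divided by the square root of n, and range is max minus min. The sketch below redoes both from the rounded values printed above (the vector names are illustrative), so agreement is only expected to two decimals.

```r
# Values copied from the describeBy() output above (already rounded)
n_obs   <- c(Female = 14,   Male = 21)
sd_glu  <- c(Female = 0.93, Male = 0.70)
min_glu <- c(Female = 4.6,  Male = 3.8)
max_glu <- c(Female = 7.2,  Male = 6.0)

# Standard error of the mean: sd / sqrt(n)
se_check <- round(sd_glu / sqrt(n_obs), 2)

# Range: largest minus smallest observation
range_check <- round(max_glu - min_glu, 2)

print(se_check)
print(range_check)
```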

Use the stat.desc function to describe the variables and check that you know all the estimated parameters. Which variable has the greatest variability?

library(pastecs)
round(stat.desc(mydata[ , c(-1, -10, -11) ]), 2)
##               Weight  Height Pressure    Beat Hemoglobin Hematocrit Cholesterol
## nbr.val        35.00   35.00    35.00   35.00      35.00      35.00       35.00
## nbr.null        0.00    0.00     0.00    0.00       0.00       0.00        0.00
## nbr.na          0.00    0.00     0.00    0.00       0.00       0.00        0.00
## min            55.00  166.00    90.00   49.00     143.00      45.00        3.40
## max            81.00  189.00   135.00   64.00     183.00      69.00        8.00
## range          26.00   23.00    45.00   15.00      40.00      24.00        4.60
## sum          2375.00 6193.50  3838.00 1967.00    5445.00    1801.00      167.60
## median         68.00  177.00   108.00   55.00     157.00      51.00        4.70
## mean           67.86  176.96   109.66   56.20     155.57      51.46        4.79
## SE.mean         1.30    0.99     1.79    0.67       1.45       0.82        0.17
## CI.mean.0.95    2.64    2.01     3.64    1.37       2.94       1.66        0.34
## var            59.01   34.24   112.47   15.81      73.13      23.49        1.00
## std.dev         7.68    5.85    10.61    3.98       8.55       4.85        1.00
## coef.var        0.11    0.03     0.10    0.07       0.05       0.09        0.21
##              Glucose
## nbr.val        35.00
## nbr.null        0.00
## nbr.na          0.00
## min             3.80
## max             7.20
## range           3.40
## sum           178.65
## median          4.80
## mean            5.10
## SE.mean         0.18
## CI.mean.0.95    0.36
## var             1.12
## std.dev         1.06
## coef.var        0.21
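
Because the variables are measured in different units, the standard deviations are not directly comparable; the coefficient of variation (std.dev divided by mean) is. The sketch below recomputes coef.var from the rounded mean and std.dev rows of the table above (the vector names are illustrative). Pressure has the largest standard deviation in absolute terms, but relative to their means Cholesterol and Glucose vary the most; at two decimals they are tied at 0.21, and the rounded inputs are not precise enough to separate them.

```r
# mean and std.dev rows copied from the stat.desc() output (rounded)
means <- c(Weight = 67.86, Height = 176.96, Pressure = 109.66, Beat = 56.20,
           Hemoglobin = 155.57, Hematocrit = 51.46,
           Cholesterol = 4.79, Glucose = 5.10)
sds   <- c(Weight = 7.68, Height = 5.85, Pressure = 10.61, Beat = 3.98,
           Hemoglobin = 8.55, Hematocrit = 4.85,
           Cholesterol = 1.00, Glucose = 1.06)

# Coefficient of variation: unit-free measure of relative spread
cv <- sds / means

# Sorted from most to least variable, rounded as in the table
cv_sorted <- sort(round(cv, 2), decreasing = TRUE)
print(cv_sorted)
```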

For the variable Hematocrit, plot and describe the frequency distribution.

library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
ggplot(mydata, aes(x=Hematocrit)) +
  geom_histogram(binwidth = 5, colour = "black")+
  ylab("Frequency")
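
To describe the distribution in words, it can help to see the class frequencies as a table as well. The sketch below bins the six Hematocrit values from the head() output into classes of width 5 with cut(). Two assumptions to note: it uses only those six rows, and the class boundaries (45, 50, 55, ...) are picked here for illustration, so they need not coincide with the bins geom_histogram() draws.

```r
# Six Hematocrit values from the head() output; NOT the full sample
hematocrit <- c(50, 51, 51, 58, 59, 49)

# Classes of width 5; right = FALSE makes each class include its lower
# bound and exclude its upper bound, e.g. [45,50)
classes <- cut(hematocrit, breaks = seq(45, 70, by = 5), right = FALSE)

# Absolute and relative frequencies per class
freq     <- table(classes)
rel_freq <- round(prop.table(freq), 2)

print(freq)
print(rel_freq)
```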

Draw a boxplot for the variable Glucose, separated by gender.

ggplot(mydata, aes(x = Glucose, y = GenderFactor, fill = GenderFactor)) +
  geom_boxplot() +
  # ColorBrewer palette names are case-sensitive: "Spectral", not "spectral"
  scale_fill_brewer(palette = "Spectral") +
  xlab("Glucose") +
  labs(fill = "Gender")
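
The palette argument of scale_fill_brewer() is matched against the ColorBrewer palette names exactly, including capitalisation. A minimal sketch for looking up the valid names, assuming the RColorBrewer package is installed (it normally comes along with ggplot2):

```r
library(RColorBrewer)

# brewer.pal.info is a data frame whose row names are the palette names
palette_names <- rownames(brewer.pal.info)
print(palette_names)

# The capitalised name exists, the lower-case spelling does not
print("Spectral" %in% palette_names)
print("spectral" %in% palette_names)
```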