1 Problem:

For this assignment, you will be submitting a webpage with the requested analysis of the data.  The html link (only) should be submitted through blackboard.  The webpage should be created with RMarkdown and the analysis should be self-contained (i.e. all data manipulation, analysis, plotting, etc. should be done within R).  The code used and its results should be displayed throughout your webpage (echo=TRUE), and the complete code should also be included at the end of your webpage (eval=FALSE).

Specifically, consider the file normtemp.csv that contains measurements on the resting body temperature and resting heart rate of n=65 randomly sampled males (1) and n=65 randomly sampled females (2).  This file may be downloaded directly into R using read.csv() with the following link.  

https://raw.githubusercontent.com/tmatis12/datafiles/main/normtemp.csv

We would like to analyze only the resting heart rate in this analysis.

  • For males, perform an analysis that includes the descriptive statistics (e.g. min, max, sample mean, sample standard deviation, sample median, quartiles), histogram, and normal probability plot.  Comment on the statistics and plots.  Repeat the same for females.  Be sure to uniquely label the title and x-axis, and color the histograms (male-blue, female-pink).  

  • Create side by side box plots that compare the resting heart rate of males and females.  Be sure to title and label (male/female) the box plots. Comment on what you see in the box plots (similarities and/or differences).

2 Male Analysis:

dat<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/normtemp.csv")
Males<-dat[dat$Sex==1, ]
malebeats<-Males$Beats
# Alternatively, select the male heart rates by row position (rows 1-65, column 3)
M<-dat[1:65,3]
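
The two selection approaches above agree only while the rows stay sorted by sex. The sketch below illustrates this on a small made-up data frame; the values and the `Temp` column name are assumptions, while `Sex` and `Beats` follow the column names used in the code above.

```r
# Toy stand-in for normtemp.csv; values and the Temp column name are made up.
toy <- data.frame(
  Temp  = c(96.3, 97.1, 98.4, 98.0, 98.8, 99.2),
  Sex   = c(1, 1, 1, 2, 2, 2),
  Beats = c(70, 71, 74, 80, 72, 69)
)

# Select heart rates by the Sex code rather than by row position.
male_beats   <- toy$Beats[toy$Sex == 1]
female_beats <- toy$Beats[toy$Sex == 2]

# Positional indexing matches only because the rows are sorted by Sex.
identical(male_beats, toy[1:3, 3])
```

Selecting by the `Sex` code keeps working if the file is ever reordered, which positional indexing does not.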

2.1 Descriptive Statistics for Male:

min(M)
## [1] 58
max(M)
## [1] 86
mean(M)
## [1] 73.36923
median(M)
## [1] 73
sd(M)
## [1] 5.875184
summary(M)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   58.00   70.00   73.00   73.37   78.00   86.00
qqnorm(M)

hist(M,main = "Heart rate of males",xlab = "heartrates",col = "Blue")

Comment:

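The mean (73.37) and median (73) of the male sample are close, which is consistent with a roughly symmetric distribution. The points on the normal probability plot approximately follow a straight line, so we can assume that the sample data are approximately normally distributed.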
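
Judging whether the points follow a straight line is easier with a reference line on the plot. The following is a minimal sketch using `qqline()` from base R; the data are simulated stand-ins (the mean and standard deviation are chosen only to resemble the male sample), not the actual measurements.

```r
# Simulated stand-in data; NOT the real heart-rate measurements.
set.seed(1)
x <- rnorm(65, mean = 73.4, sd = 5.9)

# Normal probability plot with a reference line through the quartiles.
qq <- qqnorm(x, main = "Normal Q-Q plot (simulated data)")
qqline(x, col = "red")
```

Points that stay close to the red line support the normality assumption; systematic curvature at the ends suggests skewness or heavy tails.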

3 Female Analysis:

F<-dat[66:130,3]

3.1 Descriptive Statistics for Female:

min(F)
## [1] 57
max(F)
## [1] 89
mean(F)
## [1] 74.15385
median(F)
## [1] 76
sd(F)
## [1] 8.105227
summary(F)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   57.00   68.00   76.00   74.15   80.00   89.00
qqnorm(F)

hist(F,main = "Heart rate of females",xlab = "heartrates",col = "pink")

Comment:

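The mean (74.15) of the female sample is below the median (76), which is consistent with a mild negative skew. The points on the normal probability plot approximately follow a straight line, so we can assume that the sample data are approximately normally distributed.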

4 Side by Side Box Plots of Male & Female:

boxplot(M,F,names = c("Males","Females"),main="boxplot of Males and Females",ylab="Heartrates")
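
As an alternative to passing two vectors, the box plots can be drawn from the data frame with the formula interface. This is a sketch on made-up values, assuming the `Sex` and `Beats` column names used earlier; the colors follow the male-blue, female-pink convention of the histograms.

```r
# Toy data; the values are made up for illustration.
toy <- data.frame(
  Sex   = c(1, 1, 1, 1, 2, 2, 2, 2),
  Beats = c(70, 73, 75, 78, 68, 74, 80, 82)
)

# Turn the numeric code into a labelled factor so the axis labels come from the data.
toy$Sex <- factor(toy$Sex, levels = c(1, 2), labels = c("Male", "Female"))

bp <- boxplot(Beats ~ Sex, data = toy, col = c("blue", "pink"),
              main = "Resting heart rate by sex (toy data)",
              xlab = "Sex", ylab = "Heart rate")

# Five-number summaries used to draw each box (one column per group).
bp$stats
```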

5 Overall Analysis:

General Analysis:

  • The median for the females is greater than that of the males: the male median is 73, whereas the female median is 76.

  • The percentage difference between the male and female means is 1.06% (relative to the average of the two means), with the female mean being higher.

  • The percentage difference between the male and female standard deviations is 31.9% (relative to the average of the two values), with the female standard deviation being higher.

  • Male: The histogram is taller and narrower because the standard deviation is smaller than that of the females, and its shape follows the bell curve closely.

  • Female: The histogram is flatter and wider because the standard deviation is larger than that of the males; the data are also more skewed.

  • Level of Skewness: Both the male and female data are negatively skewed: the male skewness is -0.05, whereas the female skewness is -0.28. The female data are therefore more negatively skewed.
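
The percentage differences quoted above can be reproduced from the reported means and standard deviations. The sketch below assumes a symmetric percentage difference (the gap divided by the average of the two values). The `skewness_g1` helper is a hypothetical moment-based definition; the report does not state which formula or package produced the skewness values, so it is shown only on toy vectors.

```r
# Symmetric percentage difference: gap relative to the average of the two values.
pct_diff <- function(a, b) 100 * abs(a - b) / ((a + b) / 2)

mean_diff <- pct_diff(73.36923, 74.15385)  # male vs. female mean
sd_diff   <- pct_diff(5.875184, 8.105227)  # male vs. female standard deviation
round(c(mean = mean_diff, sd = sd_diff), 2)

# Moment-based sample skewness (an assumed definition, see the note above).
skewness_g1 <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skewness_g1(c(1, 2, 3, 4, 5))  # symmetric toy vector: zero skewness
skewness_g1(c(1, 5, 6, 7, 7))  # long left tail: negative skewness
```

Other skewness definitions apply small-sample corrections, so values computed this way may differ slightly from those produced by a package.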

    Conclusion:

  • Since the standard deviation and interquartile range of the female data (8.11 and 12) are larger than those of the male data (5.88 and 8), we can conclude that the female data are more variable than the male data.
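
The spread comparison can be checked directly from the `summary()` output reported earlier. This short sketch hard-codes those reported values; the `spread` helper is a name introduced here for illustration.

```r
# Minimum, quartiles and maximum copied from the summary() output above.
male   <- c(min = 58, q1 = 70, q3 = 78, max = 86)
female <- c(min = 57, q1 = 68, q3 = 80, max = 89)

# Range and interquartile range from a named summary vector.
spread <- function(s) {
  c(range = unname(s["max"] - s["min"]),
    iqr   = unname(s["q3"] - s["q1"]))
}

rbind(male = spread(male), female = spread(female))
```

Both measures are larger for the female sample, in line with its larger standard deviation.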

6 Complete R-Code:

getwd()
dat<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/normtemp.csv")

#Males:

#Separating the data for males and females
Males<-dat[dat$Sex==1, ]
#If using the above command, create the malebeats vector separately, i.e.
malebeats<-Males$Beats

# OR use this to separate males beats data
M<-dat[1:65,3]

#Evaluating descriptive statistics of the male data using M
min(M)
max(M)
mean(M)
median(M)
sd(M)
summary(M)
qqnorm(M)
hist(M,main = "Heart rate of males",xlab = "heartrates",col = "Blue")

#Females:

#Creating separate data for Females
F<-dat[66:130,3]

#Evaluating descriptive statistics of the female data using F
min(F)
max(F)
mean(F)
median(F)
sd(F)
summary(F)
qqnorm(F)
hist(F,main = "Heart rate of females",xlab = "heartrates",col = "pink")

#Boxplot of Male & Female Heart Rates

boxplot(M,F,names = c("Males","Females"),main="boxplot of Males and Females",ylab="Heartrates")