For this assignment, we will submit a webpage containing the requested analysis of the data. Only the HTML link will be submitted through Blackboard. The webpage will be created with RMarkdown, and the analysis will be self-contained (i.e., all data manipulation, analysis, plotting, etc. is done within R). The code will be displayed alongside its results throughout the webpage (echo=TRUE), and the complete code will also be included at the end of the webpage (eval=FALSE).

Specifically, we will work with the file normtemp.csv, which contains measurements of the resting body temperature and resting heart rate of n=65 randomly sampled males (1) and n=65 randomly sampled females (2). The file can be read directly into R using read.csv() with the link below.

https://raw.githubusercontent.com/tmatis12/datafiles/main/normtemp.csv

The analysis below focuses on the resting heart rate of males and females.

First, we read the data from the URL with read.csv() and display the first six rows with head().

urldata<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/normtemp.csv")
head(urldata)
##   Temp Sex Beats
## 1 96.3   1    70
## 2 96.7   1    71
## 3 96.9   1    74
## 4 97.0   1    80
## 5 97.1   1    73
## 6 97.1   1    75

Question 1

For males, perform an analysis that includes the descriptive statistics (e.g. min, max, sample mean, sample standard deviation, sample median, quartiles), histogram, and normal probability plot. Comment on the statistics and plots. Repeat the same for females. Be sure to uniquely label the title and x-axis, and color the histograms (male-blue, female-pink).

We can first review the data with the str() command in R, which gives a brief description of the dataset's components.

str(urldata)
## 'data.frame':    130 obs. of  3 variables:
##  $ Temp : num  96.3 96.7 96.9 97 97.1 97.1 97.1 97.2 97.3 97.4 ...
##  $ Sex  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Beats: int  70 71 74 80 73 75 82 64 69 70 ...

The data has 130 observations and 3 variables. To make the analysis easier, we subset the heart-rate column into separate male and female vectors.

# Subset the heart-rate column (column 3) by row range;
# this assumes the file lists the 65 males (Sex = 1) first, then the 65 females (Sex = 2)
Males<-urldata[1:65,3]
Female<-urldata[66:130,3]
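
The row-range subsetting above depends on the file being sorted by Sex. A variant that does not depend on row order is to group by the Sex column itself. The sketch below is only an illustration: the data frame `toy` is hypothetical, its first six rows are the male rows printed by head() above, and its last three rows are made-up female rows added so that both groups are present.

```r
# Hypothetical toy data frame: six male rows copied from the head() output,
# plus three MADE-UP female rows used only to illustrate the grouping step.
toy <- data.frame(
  Temp  = c(96.3, 96.7, 96.9, 97.0, 97.1, 97.1, 98.0, 98.2, 98.4),
  Sex   = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  Beats = c(70, 71, 74, 80, 73, 75, 72, 78, 81)
)

# Recode the numeric Sex codes (1 = male, 2 = female) as a labeled factor
toy$SexLabel <- factor(toy$Sex, levels = c(1, 2), labels = c("Male", "Female"))

# Split the heart-rate column into one vector per group
beats_by_sex <- split(toy$Beats, toy$SexLabel)

# Number of observations in each group
sapply(beats_by_sex, length)
```

Applied to urldata, the same split() call would be expected to return two vectors of length 65, which is a quick check that the row ranges 1:65 and 66:130 match the Sex codes.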

1b) Descriptive Statistics of Male

summary(Males)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   58.00   70.00   73.00   73.37   78.00   86.00

The summary of the male data shows a minimum heart rate of 58 beats and a maximum of 86 beats, with a mean of 73.37 beats and a median of 73 beats. The first quartile is 70 beats, which means that about 25% of the male heart rates fall at or below this value. The third quartile is 78 beats, which means that about 75% of the male heart rates fall at or below this value.

Descriptive Statistics of Female

summary(Female)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   57.00   68.00   76.00   74.15   80.00   89.00

The summary of the female data shows a minimum heart rate of 57 beats and a maximum of 89 beats, with a mean of 74.15 beats and a median of 76 beats. The first quartile is 68 beats, which means that about 25% of the female heart rates fall at or below this value. The third quartile is 80 beats, which means that about 75% of the female heart rates fall at or below this value.
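
To make the quartile interpretation concrete, the share of observations at or below each quartile can be computed directly. This is a minimal sketch that uses only the first ten Beats values printed by str() above as sample input, not the full male or female vectors.

```r
# First ten Beats values as printed by str(urldata); a small sample for illustration
beats <- c(70, 71, 74, 80, 73, 75, 82, 64, 69, 70)

# Quartiles using R's default quantile method (type 7)
q <- quantile(beats, probs = c(0.25, 0.5, 0.75))

# Share of observations at or below each quartile
share_at_or_below <- sapply(q, function(cut) mean(beats <= cut))

q
share_at_or_below
```

With a small sample and tied values, the shares only approximate 25%, 50% and 75%, which is why the quartiles are best described as "about" a quarter or three quarters of the data.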


1c) Standard Deviation of Male and Female beats

SD<-c(sd(Males), sd(Female))
SD
## [1] 5.875184 8.105227

Conclusion

Comparing the two groups, the male heart rates have the smaller standard deviation (about 5.88 beats), which indicates that they are clustered more tightly around the mean. The female heart rates have a larger standard deviation (about 8.11 beats), which indicates that they are more spread out around the mean.
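
For reference, sd() computes the sample standard deviation, which divides the sum of squared deviations by n - 1. A minimal sketch that reproduces it by hand, assuming a hypothetical helper named sample_sd() and using the first ten Beats values printed by str() above as sample input:

```r
# First ten Beats values as printed by str(urldata); a small sample for illustration
beats <- c(70, 71, 74, 80, 73, 75, 82, 64, 69, 70)

# Hypothetical helper: sample standard deviation with the n - 1 divisor
sample_sd <- function(x) {
  n <- length(x)
  sqrt(sum((x - mean(x))^2) / (n - 1))
}

manual  <- sample_sd(beats)
builtin <- sd(beats)

# TRUE when the hand computation agrees with R's sd()
all.equal(manual, builtin)
```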


1d) Histogram of Male Heartbeat Distribution

hist(Males, main="Histogram of Male Heartbeat", xlab="Male heartbeat", col = "blue")

From the male histogram we can see that the distribution is roughly symmetric. Let's create a density plot to get a better view of how the data is spread out.

plot(density(Males), main="Smoothed Histogram of Male Heartbeat")

The density plot shows that the distribution is nearly symmetric. The mean heart rate (73.37 beats) is only slightly higher than the median (73 beats), which is consistent with very little skew in the male distribution.

Histogram of Female Heartbeat Distribution

hist(Female, main="Histogram of Female Heartbeat", xlab="Female heartbeat", col = "pink")

From the female histogram we can see that the distribution is left skewed. A density plot gives a better view of the skewness of the data.

plot(density(Female), main="Smoothed Histogram of Female Heartbeat")

The density plot confirms that the distribution is left skewed. This is also reflected in the mean heart rate (74.15 beats) being lower than the median (76 beats) of the female distribution.
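
The mean-versus-median comparison can be condensed into a single number. The sketch below uses Pearson's second skewness coefficient, 3 * (mean - median) / sd, applied to the summary values reported above. The helper name pearson_skew2() is hypothetical, and this is a rough check rather than a formal test of skewness.

```r
# Hypothetical helper: Pearson's second skewness coefficient
pearson_skew2 <- function(mean_x, median_x, sd_x) {
  3 * (mean_x - median_x) / sd_x
}

# Means, medians and standard deviations as reported in the output above
male_skew   <- pearson_skew2(73.3692308, 73, 5.875184)
female_skew <- pearson_skew2(74.1538462, 76, 8.105227)

round(c(male = male_skew, female = female_skew), 2)
```

A value near zero suggests a roughly symmetric distribution, while a negative value (mean below median) points to left skew, in line with what the histograms suggest for males and females respectively.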


1e) Normal Probability Plot of Male

qqnorm(Males, main="Normal probability plot of Male heartbeat", col = "blue", xlab = "Theoretical normal quantiles (males)", ylab = "Male heartbeat")
# qqline() draws the line through the first and third quartiles by default
qqline(Males)
# Purple lines mark the range where the points follow the reference line
abline(v=c(-1.8,1.7), col="purple")

Between the purple reference lines (theoretical quantiles of about -1.8 and 1.7), the points lie close to the straight line, so the male heart rates are approximately normally distributed over most of their range. About five points in the tails fall away from the line and may be outliers.

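Normal Probability Plot of Female

qqnorm(Female, main="Normal probability plot of Female heartbeat", col = "pink", xlab = "Theoretical normal quantiles (females)", ylab = "Female heartbeat")
# qqline() draws the line through the first and third quartiles by default
qqline(Female)
# Purple lines mark the range where the points follow the reference line
abline(v=c(-1.5,1.2), col="purple")

Between the purple reference lines (theoretical quantiles of about -1.5 and 1.2), the points lie close to the straight line, but this range is narrower than for males. More points in the tails fall away from the line than in the male plot, so the female heart rates depart further from a normal distribution.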
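
A normal probability plot shows how closely the data track a normal distribution, but the share of observations within one or two standard deviations of the mean is easier to compute directly. For normally distributed data these shares should be roughly 68% and 95%. A minimal sketch, assuming a hypothetical helper within_k_sd() and using only the first ten Beats values printed by str() above as sample input:

```r
# First ten Beats values as printed by str(urldata); a small sample for illustration
beats <- c(70, 71, 74, 80, 73, 75, 82, 64, 69, 70)

# Hypothetical helper: share of observations within k sample standard
# deviations of the sample mean
within_k_sd <- function(x, k) {
  mean(abs(x - mean(x)) <= k * sd(x))
}

share_1sd <- within_k_sd(beats, 1)
share_2sd <- within_k_sd(beats, 2)

c(within_1_sd = share_1sd, within_2_sd = share_2sd)
```

Calling within_k_sd(Males, 2) and within_k_sd(Female, 2) on the full vectors would give the actual shares for each group, which can then be compared with the 68% and 95% benchmarks.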


Question 2

Create side by side box plots that compare the resting heart rate of males and females. Be sure to title and label (male/female) the box plots. Comment on what you see in the box plots (similarities and/or differences).

boxplot(Males, Female, names = c("Male", "Female"), main="Boxplots of Male and Female Heartbeat", ylab = "Heartbeat", col=c("blue","pink"))

From the boxplots, the female heart rates are more spread out than the male heart rates, and the female median is higher than the male median. In both boxplots the lower whisker is longer than the upper whisker, which means there are more extreme values on the low side, so both distributions appear negatively skewed.
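
Points drawn beyond the whiskers of a boxplot are those lying more than 1.5 times the interquartile range outside the box. The sketch below computes comparable fences by hand. The helper name iqr_fences() is hypothetical, the input is only the first ten Beats values printed by str() above, and quantile() is used for the quartiles, whereas boxplot() uses the hinges from fivenum(), so the two can differ slightly.

```r
# First ten Beats values as printed by str(urldata); a small sample for illustration
beats <- c(70, 71, 74, 80, 73, 75, 82, 64, 69, 70)

# Hypothetical helper: lower and upper 1.5 * IQR fences
iqr_fences <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  spread <- q[2] - q[1]
  c(lower = q[1] - 1.5 * spread, upper = q[2] + 1.5 * spread)
}

fences <- iqr_fences(beats)

# Observations outside the fences would be flagged as potential outliers
outliers <- beats[beats < fences["lower"] | beats > fences["upper"]]

fences
outliers
```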


# Question 1)
urldata<-read.csv("https://raw.githubusercontent.com/tmatis12/datafiles/main/normtemp.csv")

# Review the structure of the data with str()
str(urldata)
# Subset the heart-rate column (column 3) by row range;
# this assumes the file lists the 65 males (Sex = 1) first, then the 65 females (Sex = 2)
Males<-urldata[1:65,3]
Female<-urldata[66:130,3]


# Question 1b) Descriptive Statistics of Male and Female 
summary(Males)
summary(Female)


# Question 1c) Standard Deviation of Male and Female beats
SD<-c(sd(Males), sd(Female))
SD


# Question 1d) Histograms of Male and Female Heartbeat Distribution

hist(Males, main="Histogram of Male Heartbeat", xlab="Male heartbeat", col = "blue")
plot(density(Males), main="Smoothed Histogram of Male Heartbeat")

hist(Female, main="Histogram of Female Heartbeat", xlab="Female heartbeat", col = "pink")
plot(density(Female), main="Smoothed Histogram of Female Heartbeat")


# Question 1e) Normal Probability Plot

# Normal probability plot for males
qqnorm(Males, main="Normal probability plot of Male heartbeat", col = "blue", xlab = "Theoretical normal quantiles (males)", ylab = "Male heartbeat")
# qqline() draws the line through the first and third quartiles by default
qqline(Males)
abline(v=c(-1.8,1.7), col="purple")

# Normal probability plot for females
qqnorm(Female, main="Normal probability plot of Female heartbeat", col = "pink", xlab = "Theoretical normal quantiles (females)", ylab = "Female heartbeat")
qqline(Female)
abline(v=c(-1.5,1.2), col="purple")


# Question 2

boxplot(Males, Female, names = c("Male", "Female"), main="Boxplots of Male and Female Heartbeat", ylab = "Heartbeat", col=c("blue","pink"))