dat <- read.csv('https://raw.githubusercontent.com/tmatis12/datafiles/main/normtemp.csv')
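dat <- dat[order(dat$Sex),] # order the data so we know that we have Males first
Males <- dat[1:65,3] # rows 1 to 65, third column (heart rate)
Females <- dat[65:length(dat),3] # length(dat) is the number of columns (3), so this selects rows 65 down to 3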
summary(Males)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 58.00 70.00 73.00 73.37 78.00 86.00
The mean is 73.37 BPM and the range is 28 (58 to 86). The mean and median (73.00) are very close to each other, so the data is probably not strongly skewed.
summary(Females)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 58.00 70.00 73.00 73.46 78.00 86.00
The female dataset appears to be very similar to the male one. Both have the same minimum, maximum, quartiles, and median, and the means differ by only 0.09. The standard deviations (shown below) are very similar as well, differing by only about 0.07.
sd(Males)
## [1] 5.875184
sd(Females)
## [1] 5.94552
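The two differences quoted above can be reproduced from the printed values. This is a minimal sketch: the variable names are illustrative, and the numbers are copied from the output shown rather than recomputed from the data.

```r
# Values copied from the summary() and sd() output above
male_mean   <- 73.37
female_mean <- 73.46
male_sd     <- 5.875184
female_sd   <- 5.94552

mean_diff <- female_mean - male_mean  # difference in means
sd_diff   <- male_sd - female_sd      # negative: the Females sd is slightly larger

round(mean_diff, 2)
round(sd_diff, 4)
```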
par( mfrow=c(1,2),mar=c(4,4,1,0) )
malehist <- hist(Males,main='Male Heartbeats',col='light blue',xlab='Male BPM',ylab='Num of Occurrences')
femalehist <- hist(Females,main='Female Heartbeats',col='light pink',xlab='Female BPM',ylab='Num of Occurrences')
It is not easy to tell where the Male and Female heartbeats differ when R chooses the number of histogram columns automatically. By increasing the number of columns we can see the slight differences between the data more clearly.
par( mfrow=c(1,2),mar=c(4,4,1,0) )
malehist <- hist(Males,main='Male Heartbeats',col='light blue',breaks=25,xlab='Male BPM',ylab='Num of Occurrences')
femalehist <- hist(Females,main='Female Heartbeats',col='light pink',breaks=25,xlab='Female BPM',ylab='Num of Occurrences')
We can see the slight differences in the data now, as well as an interesting spike in the 75-80 range, where a column reaches 7 entries for both Males and Females. It is interesting that this spike occurred in both males and females.
We can see the difference between the two data sets even more clearly by overlaying the two graphs on each other.
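In hist(), a single number passed to breaks is only a suggestion, so two histograms can end up with different columns. One way to make two panels directly comparable is to pass the same explicit break points to both. The sketch below uses made-up vectors; toy_a, toy_b, and the break points are illustrative assumptions, not the heartbeat data.

```r
# Made-up whole-number samples, for illustration only
toy_a <- c(58, 64, 70, 70, 71, 72, 75, 78, 78, 86)
toy_b <- c(58, 64, 70, 72, 75, 78, 78, 86)

# One-unit columns centred on whole numbers, shared by both histograms
shared_breaks <- seq(57.5, 86.5, by = 1)

# plot = FALSE returns the counts without drawing anything
h_a <- hist(toy_a, breaks = shared_breaks, plot = FALSE)
h_b <- hist(toy_b, breaks = shared_breaks, plot = FALSE)

# Column midpoints where the two samples have different counts
differing <- h_a$mids[h_a$counts != h_b$counts]
differing
```

With shared break points, any column whose height differs between the two panels reflects a real difference in counts rather than a difference in binning.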
hist(Females, breaks=30, col=rgb(1,0,0,0.5), main="Male and Female Heartbeats",xlab='Male/Female BPM',ylab='Num of Occurences')
legend("topright", c('Female','Male','Male/Female'), fill=c(rgb(1,0,0,0.5),rgb(0,0,1,0.5),'purple'))
# Second plot with add=TRUE to plot on top with transparency effect
hist(Males, breaks=30, col=rgb(0,0,1,0.5), add=TRUE)
The purple in the graph above shows where the two histograms overlap.
We can see clearly that there are two entries on the male side that differ from the female side, one in the column for 70 BPM and one in the column for 71 BPM. We do not see any pink columns, meaning that every female entry has an exactly matching male entry. Given that every other column, apart from the two blue ones peeking over, is purple, we can now say with some confidence that the Male and Female data are copies of each other, with the Males dataset simply having two extra entries. We can check this hypothesis by looking at the length of each data set.
length(Males)
## [1] 65
length(Females)
## [1] 63
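As suspected, the data for Males has two more entries than the Female data. The indexing explains why: length(dat) is the number of columns in the data frame (3), so 65:length(dat) selects rows 65 down to 3, all of which also fall within the rows 1 to 65 used for Males.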
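How length() and nrow() behave on a data frame matters for the row indexing used to build Females, and it can be checked on a small example. The sketch below uses a made-up data frame; toy, its values, and the 1/2 coding of Sex are assumptions for illustration. It also shows selecting by the value of Sex instead of by row position.

```r
# Made-up data frame with a three-column layout; all values are illustrative
toy <- data.frame(
  Temp  = c(98.1, 98.6, 97.9, 98.4, 98.8, 98.2),
  Sex   = c(1, 2, 1, 2, 1, 2),   # assumed coding: 1 = male, 2 = female
  Beats = c(70, 74, 71, 80, 68, 77)
)
toy <- toy[order(toy$Sex), ]

length(toy)  # number of columns: 3
nrow(toy)    # number of rows: 6

# Select by the value of Sex rather than by hard-coded row numbers
toy_males   <- toy$Beats[toy$Sex == 1]
toy_females <- toy$Beats[toy$Sex == 2]
```

Selecting on the grouping column keeps the two groups disjoint even if the group sizes are not known in advance.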
Our data does appear to be mostly normally distributed, judging both by the histograms and by the roughly straight diagonal line on the Normal Probability Plots. These plots also show that our data is discrete and probably uses only whole numbers: the points sit in horizontal steps at 70, 71, 72 and so on, rather than at values such as 71.1, 71.7, or 72.05.
par( mfrow=c(1,2),mar=c(4,4,1,0) )
qqnorm(Females,main='Plot of Female Heartbeats')
qqnorm(Males,main='Plot of Male Heartbeats')
Though we now know that Males differs from Females by only two data points, it is difficult to spot the differences in the plots side by side. (I can only really tell that the row of points at the 70 sample quantile has less spread for females than for males.) Mostly just for fun, can we see the differences if the Normal Probability Plots are overlaid?
qqnorm(Males,main='Normal Probability Plot of Heartbeats',col='blue')
# add the Females points to the same axes instead of redrawing the plot
points(qqnorm(Females,plot.it=FALSE),col='red')
legend("bottomright", c('Female','Male'),fill=c('red','blue'))
The answer seems to be no. We cannot tell with any confidence where the two differences between these plots are. If the two datasets differed more, this plot could be more useful.
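When a plot cannot pinpoint the differing entries, comparing how often each value occurs in the two vectors can. The sketch below is one possible approach, not part of the original analysis; count_diff is a hypothetical helper name and the toy vectors are made up.

```r
# Hypothetical helper: for each value, the count in a minus the count in b
count_diff <- function(a, b) {
  values   <- sort(union(a, b))
  counts_a <- table(factor(a, levels = values))
  counts_b <- table(factor(b, levels = values))
  d <- as.vector(counts_a - counts_b)
  names(d) <- as.character(values)
  d[d != 0]  # keep only the values whose counts differ
}

# Made-up example: toy_b is toy_a with one 70 and one 71 removed
toy_a <- c(70, 71, 72, 72, 75)
toy_b <- c(72, 72, 75)

extra <- count_diff(toy_a, toy_b)
extra
```

A positive entry means the value occurs more often in the first vector, a negative one that it occurs more often in the second.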
As a final look, here are some boxplots. The two boxes are essentially identical, which agrees with the statistics: there is a mere 0.09 difference in the means and about 0.07 in the standard deviations. Knowing now that there are only two different entries, this makes sense.
boxplot(Males,Females, main='Boxplot of Male and Female Heartbeats', names=c('Males','Females'),col=c('light blue','light pink'))
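A boxplot draws the median and the hinges (quartiles), not the mean, so the numbers behind each box can be inspected with boxplot.stats(). The sketch below uses a made-up vector, toy, purely for illustration.

```r
# Made-up sample, for illustration only
toy <- c(60, 70, 73, 78, 85)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
box_values <- boxplot.stats(toy)$stats
box_values

# The mean is not among the values a boxplot draws
mean(toy)
```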
##How the data was read:
dat <- read.csv('https://raw.githubusercontent.com/tmatis12/datafiles/main/normtemp.csv')
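dat <- dat[order(dat$Sex),] # order the data so we know that we have Males first
Males <- dat[1:65,3] # rows 1 to 65, third column (heart rate)
Females <- dat[65:length(dat),3] # length(dat) is the number of columns (3), so this selects rows 65 down to 3
##Comparison of Summary Statistics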
summary(Males)
summary(Females)
sd(Males)
sd(Females)
#Comparison Using Plots
par( mfrow=c(1,2),mar=c(4,4,1,0) )
malehist <- hist(Males,main='Male Heartbeats',col='light blue',xlab='Male BPM',ylab='Num of Occurrences')
femalehist <- hist(Females,main='Female Heartbeats',col='light pink',xlab='Female BPM',ylab='Num of Occurrences')
#more columns on the histograms
par( mfrow=c(1,2),mar=c(4,4,1,0) )
malehist <- hist(Males,main='Male Heartbeats',col='light blue',breaks=25,xlab='Male BPM',ylab='Num of Occurrences')
femalehist <- hist(Females,main='Female Heartbeats',col='light pink',breaks=25,xlab='Female BPM',ylab='Num of Occurrences')
#overlaid male/female histogram
hist(Females, breaks=30, col=rgb(1,0,0,0.5), main="Male and Female Heartbeats",xlab='Male/Female BPM',ylab='Num of Occurrences')
legend("topright", c('Female','Male','Male/Female'), fill=c(rgb(1,0,0,0.5),rgb(0,0,1,0.5),'purple'))
hist(Males, breaks=30, col=rgb(0,0,1,0.5), add=TRUE)
#lengths
length(Males)
length(Females)
#Normal Probability Plots
par( mfrow=c(1,2),mar=c(4,4,1,0) )
qqnorm(Females,main='Plot of Female Heartbeats')
qqnorm(Males,main='Plot of Male Heartbeats')
#overlaid normal probability plots
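qqnorm(Males,main='Normal Probability Plot of Heartbeats',col='blue')
# add the Females points to the same axes instead of redrawing the plot
points(qqnorm(Females,plot.it=FALSE),col='red')
legend("bottomright", c('Female','Male'),fill=c('red','blue'))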
#Boxplot
boxplot(Males,Females, main='Boxplot of Male and Female Heartbeats', names=c('Males','Females'),col=c('lightblue','lightpink'))