I proposed ten statistical questions based on my own understanding of the data.
What is the mean age for all participants in the data?
What is the correlation between body fat percentage and age?
What is the distribution of height?
What is the spread (variance) of ankle circumferences?
What is the standard deviation of neck circumferences?
What is the 90th percentile of participants' weights?
What is the box plot for abdomen circumferences?
What is the scatter diagram between weight and height?
What is the mode of biceps circumferences?
What is the median of participants' weights?
I used ChatGPT to propose ten more statistical questions.
What is the average body fat percentage across all participants?
What is the median age of the individuals in the dataset?
What is the range of weights in the dataset?
What is the standard deviation of the height values?
How many participants have a neck circumference greater than 40 cm?
What is the median chest measurement for individuals?
What are the minimum and maximum wrist circumferences in the dataset?
What is the correlation between weight and body fat percentage?
What is the standard deviation of the abdomen measurements?
How does the average ankle circumference compare between men and women in the dataset?
We will explore ten questions in detail, taken from both my own and the AI-generated sets. The first five come from my set and the other five were generated by ChatGPT.
My dataset is shown below.
Bodyfat = read.csv("https://www.lock5stat.com/datasets3e/BodyFat.csv")
head(Bodyfat)
##   Bodyfat Age Weight Height Neck Chest Abdomen Ankle Biceps Wrist
## 1    32.3  41 247.25  73.50 42.1 117.0   115.6  26.3   37.3  19.7
## 2    22.5  31 177.25  71.50 36.2 101.1    92.4  24.6   30.1  18.2
## 3    22.0  42 156.25  69.00 35.5  97.8    86.0  24.0   31.2  17.4
## 4    12.3  23 154.25  67.75 36.2  93.1    85.2  21.9   32.0  17.1
## 5    20.5  46 177.00  70.00 37.2  99.7    95.6  22.5   29.1  17.7
## 6    22.6  54 198.00  72.00 39.9 107.6   100.0  22.0   35.9  18.9
mean(Bodyfat$Age, na.rm = TRUE)
## [1] 44.88
The mean age of participants in the data is 44.88 years.
cor(Bodyfat$Bodyfat, Bodyfat$Age, use = "complete.obs")
## [1] 0.2557976
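The correlation between participants' body fat percentage and age is approximately 0.256. A correlation coefficient is a unitless value between -1 and 1, not a percentage, and this value indicates a weak positive linear relationship.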
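To show what cor() computes, here is a minimal sketch that rebuilds the Pearson correlation from its definition (covariance divided by the product of the standard deviations). It assumes only the six rows printed by head(Bodyfat) above, so its result will not match the full-dataset value of 0.2557976; the variable names are illustrative.

```r
# First six rows of the data, copied from the head(Bodyfat) output above
bodyfat <- c(32.3, 22.5, 22.0, 12.3, 20.5, 22.6)
age     <- c(41, 31, 42, 23, 46, 54)

# Pearson correlation from its definition:
# covariance divided by the product of the two standard deviations
r_manual  <- cov(bodyfat, age) / (sd(bodyfat) * sd(age))
r_builtin <- cor(bodyfat, age)

all.equal(r_manual, r_builtin)  # TRUE
abs(r_builtin) <= 1             # a correlation always lies in [-1, 1]
```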
hist(Bodyfat$Height, main = "Histogram of Height", xlab = "Height", col = "blue")
quantile(Bodyfat$Weight, 0.90)
## 90%
## 217.15
The 90th percentile of participants' weights is 217.15 lbs.
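The sketch below illustrates how quantile() arrives at a percentile under its default rule (type 7, linear interpolation between order statistics). It assumes only the six weights printed by head(Bodyfat), so the number it produces is not the full-dataset 90th percentile.

```r
# Weights (lbs) of the first six participants, from the head(Bodyfat) output
weight <- c(247.25, 177.25, 156.25, 154.25, 177.00, 198.00)

p <- 0.90
x <- sort(weight)
n <- length(x)

# Default quantile() rule (type 7): position h = (n - 1) * p + 1,
# then interpolate linearly between the two neighbouring sorted values
h  <- (n - 1) * p + 1
lo <- floor(h)
hi <- ceiling(h)
q_manual <- x[lo] + (h - lo) * (x[hi] - x[lo])

q_builtin <- unname(quantile(weight, p))
all.equal(q_manual, q_builtin)  # TRUE
```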
boxplot(Bodyfat$Abdomen, main = "Distribution of Abdomen size", col = "green", xlab = "Abdomen size", horizontal = TRUE)
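The numbers behind a box plot can also be printed directly. This is a minimal sketch using boxplot.stats() from base R, assuming only the six abdomen values shown by head(Bodyfat); the full data set would give different values.

```r
# Abdomen circumferences (cm) of the first six participants
abdomen <- c(115.6, 92.4, 86.0, 85.2, 95.6, 100.0)

bp <- boxplot.stats(abdomen)

# $stats holds, in order: lower whisker, lower hinge, median,
# upper hinge, upper whisker
bp$stats

# $out holds any points drawn individually as outliers
bp$out
```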
median(Bodyfat$Age)
## [1] 44
The median age of participants in the data set is 44 years old.
range(Bodyfat$Weight)
## [1] 127.50 262.75
The range of participant weights in the data set is 262.75 lbs - 127.50 lbs = 135.25 lbs.
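The subtraction can also be done in R rather than by hand. A small sketch, using the minimum and maximum reported by range(Bodyfat$Weight) above:

```r
# Minimum and maximum weight (lbs) as reported by range(Bodyfat$Weight)
weight_range <- c(127.50, 262.75)

# diff() subtracts the first element from the second: max - min
weight_spread <- diff(weight_range)
weight_spread  # 135.25
```

On the full data set the same result should come from diff(range(Bodyfat$Weight)).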
hist(Bodyfat$Neck, main = "Distribution of Neck circumference", col = "red", xlab = "Neck circumference", ylab = "count")
Based on this histogram, the total number of participants with a neck circumference greater than 40 cm is 19.
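Reading a count off a histogram depends on where the bin edges fall, so a direct count is a useful cross-check. The sketch below assumes only the six neck values printed by head(Bodyfat); it shows the technique and does not reproduce the full-dataset count.

```r
# Neck circumferences (cm) of the first six participants
neck <- c(42.1, 36.2, 35.5, 36.2, 37.2, 39.9)

# The comparison gives TRUE/FALSE per participant;
# sum() counts the TRUE values
over_40   <- neck > 40
n_over_40 <- sum(over_40)
n_over_40  # 1 for these six rows
```

On the full data set, the equivalent call would be sum(Bodyfat$Neck > 40, na.rm = TRUE).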
sd(Bodyfat$Abdomen)
## [1] 10.26123
The standard deviation of abdomen measurements is approximately 10.26 cm.
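For reference, sd() returns the sample standard deviation, which divides by n - 1. A minimal sketch of that formula, assuming only the six abdomen values from head(Bodyfat), so the result differs from the full-dataset value of 10.26123:

```r
# Abdomen circumferences (cm) of the first six participants
abdomen <- c(115.6, 92.4, 86.0, 85.2, 95.6, 100.0)

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by (n - 1)
n         <- length(abdomen)
sd_manual <- sqrt(sum((abdomen - mean(abdomen))^2) / (n - 1))

all.equal(sd_manual, sd(abdomen))  # TRUE
```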
quantile(Bodyfat$Chest, .50)
## 50%
## 99.25
The median chest measurement for participants in this data set is 99.25 cm.
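The 50th percentile and the median are the same quantity, which is why quantile(x, .50) answers a median question. A short check, assuming only the six chest values printed by head(Bodyfat):

```r
# Chest circumferences (cm) of the first six participants
chest <- c(117.0, 101.1, 97.8, 93.1, 99.7, 107.6)

# quantile() returns a named value ("50%"); unname() drops the label
q50 <- unname(quantile(chest, 0.50))

all.equal(q50, median(chest))  # TRUE
```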
This project allowed me to analyse an imported data set in RStudio using the statistical methods from chapter 6. The first part of the project was to write ten statistical questions based on the chapter 6 notes. Next, I had to generate ten more statistical questions using ChatGPT. I ran into some difficulties at this stage: ChatGPT is very sensitive to how the prompt is worded, and some of the questions it generated were not applicable to this assignment. I had to modify some of the questions to meet the assignment parameters, for example by removing range-specific questions. I then selected five questions from each set and answered them using R's statistics functions in RStudio. This was a simple process once I had learned the basics of RStudio. Overall, I found RStudio to be a useful tool for producing statistical results.
mean(Bodyfat$Age, na.rm = TRUE)  # Q1
cor(Bodyfat$Bodyfat, Bodyfat$Age, use = "complete.obs")  # Q2
hist(Bodyfat$Height, main = "Histogram of Height", xlab = "Height", col = "blue")  # Q3
quantile(Bodyfat$Weight, 0.90)  # Q4
boxplot(Bodyfat$Abdomen, main = "Distribution of Abdomen size", col = "green", xlab = "Abdomen size", horizontal = TRUE)  # Q5
median(Bodyfat$Age)  # Q6
range(Bodyfat$Weight)  # Q7
hist(Bodyfat$Neck, main = "Distribution of Neck circumference", col = "red", xlab = "Neck circumference", ylab = "count")  # Q8
sd(Bodyfat$Abdomen)  # Q9
quantile(Bodyfat$Chest, .50)  # Q10