1. Introduction

I proposed ten statistical questions based on my own understanding of the data.

  1. What is the mean age for all participants in the data?

  2. What is the correlation between body fat percentage and age?

  3. What is the distribution of height?

  4. What is the spread (variance) of ankle circumferences?

  5. What is the standard deviation of neck circumferences?

  6. What is the 90th percentile of participants' weights?

  7. What is the box plot for abdomen circumferences?

  8. What is the scatter diagram of weight and height?

  9. What is the mode of biceps circumferences?

  10. What is the median of participants' weights?

I used ChatGPT to propose ten more statistical questions.

  1. What is the average body fat percentage across all participants?

  2. What is the median age of the individuals in the dataset?

  3. What is the range of weights in the dataset?

  4. What is the standard deviation of the height values?

  5. How many participants have a neck circumference greater than 40 cm?

  6. What is the median chest measurement for individuals in the dataset?

  7. What is the minimum and maximum wrist circumference in the dataset?

  8. What is the correlation between weight and body fat percentage?

  9. What is the standard deviation of the abdomen measurements?

  10. How does the average ankle circumference compare between men and women in the dataset?

2. Analysis

We will explore ten of these questions in detail, taken from both my own set and the AI-generated set. The first five questions come from my set and the other five were generated by ChatGPT.

The dataset is loaded and previewed below:

Bodyfat = read.csv("https://www.lock5stat.com/datasets3e/BodyFat.csv")
head(Bodyfat)
##   Bodyfat Age Weight Height Neck Chest Abdomen Ankle Biceps Wrist
## 1    32.3  41 247.25  73.50 42.1 117.0   115.6  26.3   37.3  19.7
## 2    22.5  31 177.25  71.50 36.2 101.1    92.4  24.6   30.1  18.2
## 3    22.0  42 156.25  69.00 35.5  97.8    86.0  24.0   31.2  17.4
## 4    12.3  23 154.25  67.75 36.2  93.1    85.2  21.9   32.0  17.1
## 5    20.5  46 177.00  70.00 37.2  99.7    95.6  22.5   29.1  17.7
## 6    22.6  54 198.00  72.00 39.9 107.6   100.0  22.0   35.9  18.9

Q1: What is the mean age for all participants in the data?

mean(Bodyfat$Age, na.rm = TRUE)
## [1] 44.88

The mean age of participants in the data is 44.88 years.
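As a check on what `mean()` computes, the sketch below recomputes it by hand. It assumes only the six ages printed by `head(Bodyfat)` (the names `age` and `manual_mean` are illustrative), so its result is a small-sample mean, not the 44.88 reported for the full data.

```r
# Ages from the six rows printed by head(Bodyfat); a small sample, not the full dataset
age <- c(41, 31, 42, 23, 46, 54)

# The mean is the sum of the non-missing values divided by how many there are,
# which is what mean(..., na.rm = TRUE) does
manual_mean <- sum(age, na.rm = TRUE) / sum(!is.na(age))
builtin_mean <- mean(age, na.rm = TRUE)

print(c(manual = manual_mean, builtin = builtin_mean))
```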

Q2: What is the correlation between body fat percentage and age?

cor(Bodyfat$Bodyfat, Bodyfat$Age, use = "complete.obs")
## [1] 0.2557976

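The correlation between participants' body fat percentage and age is about 0.256, a weak positive relationship. A correlation coefficient has no units, so it is not a percentage.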
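The value returned by `cor()` is Pearson's r, which always lies between -1 and 1. The sketch below, assuming only the six rows printed by `head(Bodyfat)` (variable names are illustrative), computes r from its definition and compares it with `cor()`.

```r
# Body fat and age from the six rows printed by head(Bodyfat)
bodyfat <- c(32.3, 22.5, 22.0, 12.3, 20.5, 22.6)
age     <- c(41, 31, 42, 23, 46, 54)

# Pearson's r: sum of products of deviations from the means,
# divided by the square root of the product of the sums of squared deviations
dx <- bodyfat - mean(bodyfat)
dy <- age - mean(age)
manual_r <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

builtin_r <- cor(bodyfat, age, use = "complete.obs")
print(c(manual = manual_r, builtin = builtin_r))
```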

Q3: What is the distribution of height?

hist(Bodyfat$Height, main = "Histogram of Height", xlab = "Height", col = "blue")
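A histogram is a table of counts per bin drawn as bars. As a sketch, assuming the six heights printed by `head(Bodyfat)` stand in for the full column, `hist(..., plot = FALSE)` returns those bins and counts as numbers instead of drawing them.

```r
# Heights from the six rows printed by head(Bodyfat)
height <- c(73.50, 71.50, 69.00, 67.75, 70.00, 72.00)

# plot = FALSE returns the binning (breaks, counts, mids, density) without drawing
h <- hist(height, plot = FALSE)

# Bins are right-closed by default (right = TRUE): each count covers (lower, upper]
bins <- data.frame(lower = head(h$breaks, -1),
                   upper = tail(h$breaks, -1),
                   count = h$counts)
print(bins)
```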

Q4: What is the 90th percentile of participants weights?

quantile(Bodyfat$Weight, 0.90)
##    90% 
## 217.15

The 90th percentile of participants' weights is 217.15 lbs, so about 90% of participants weigh 217.15 lbs or less.
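By default `quantile()` uses its type 7 rule, which interpolates between sorted values. The sketch below reproduces that rule by hand on the six weights printed by `head(Bodyfat)` (an assumed small sample, so the answer differs from the full-data 217.15 lbs).

```r
# Weights (lbs) from the six rows printed by head(Bodyfat)
weight <- c(247.25, 177.25, 156.25, 154.25, 177.00, 198.00)

# Type 7: position h = (n - 1) * p + 1 in the sorted data,
# then linear interpolation between the neighbouring sorted values
p  <- 0.90
x  <- sort(weight)
h  <- (length(x) - 1) * p + 1
lo <- floor(h)
hi <- min(lo + 1, length(x))
manual_q <- x[lo] + (h - lo) * (x[hi] - x[lo])

builtin_q <- unname(quantile(weight, p))  # unname() drops the "90%" label
print(c(manual = manual_q, builtin = builtin_q))
```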

Q5: What is the box plot for abdomen circumferences?

boxplot(Bodyfat$Abdomen, main = "Distribution of Abdomen size", col = "green", xlab = "Abdomen size", horizontal = TRUE)
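The box plot is drawn from five summary numbers plus any outliers. As a sketch on the six abdomen values printed by `head(Bodyfat)` (an assumed small sample), `boxplot.stats()` returns exactly those numbers.

```r
# Abdomen circumferences (cm) from the six rows printed by head(Bodyfat)
abdomen <- c(115.6, 92.4, 86.0, 85.2, 95.6, 100.0)

# stats: lower whisker, lower hinge, median, upper hinge, upper whisker
b <- boxplot.stats(abdomen)
print(b$stats)

# Values more than 1.5 hinge spreads beyond the box are returned as outliers
print(b$out)
```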

Q6: What is the median age of the individuals in the dataset?

median(Bodyfat$Age)
## [1] 44

The median age of participants in the dataset is 44 years.
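The median is the middle of the sorted values, and with an even count it is the average of the two middle values. The sketch below assumes the six ages printed by `head(Bodyfat)` and uses an illustrative name, `manual_median`.

```r
# Ages from the six rows printed by head(Bodyfat)
age <- c(41, 31, 42, 23, 46, 54)

x <- sort(age)
n <- length(x)
# Odd count: the middle value; even count: the mean of the two middle values
manual_median <- if (n %% 2 == 1) x[(n + 1) / 2] else (x[n / 2] + x[n / 2 + 1]) / 2

print(c(manual = manual_median, builtin = median(age)))
```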

Q7: What is the range of weights in the dataset?

range(Bodyfat$Weight)
## [1] 127.50 262.75

range() returns the minimum and maximum weights, so the range of participant weights in the dataset is 262.75 lbs - 127.50 lbs = 135.25 lbs.
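In R, `range()` returns two numbers, and the single-number range is their difference. A sketch on the six weights printed by `head(Bodyfat)` (an assumed small sample):

```r
# Weights (lbs) from the six rows printed by head(Bodyfat)
weight <- c(247.25, 177.25, 156.25, 154.25, 177.00, 198.00)

r <- range(weight)   # c(minimum, maximum)
spread <- diff(r)    # maximum minus minimum
print(r)
print(spread)
```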

Q8: How many participants have a neck circumference greater than 40 cm?

hist(Bodyfat$Neck, main = "Distribution of Neck circumference", col = "red", xlab = "Neck circumference", ylab = "count")

Based on this histogram, the number of participants with a neck circumference greater than 40 cm is 19.
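A count like this can also be computed directly, without depending on where the histogram's bin edges fall: a comparison gives TRUE or FALSE per participant, and `sum()` counts the TRUE values. The sketch uses the six neck values printed by `head(Bodyfat)`; on the full data the same idea would be `sum(Bodyfat$Neck > 40, na.rm = TRUE)`.

```r
# Neck circumferences (cm) from the six rows printed by head(Bodyfat)
neck <- c(42.1, 36.2, 35.5, 36.2, 37.2, 39.9)

# TRUE counts as 1 and FALSE as 0; na.rm = TRUE skips missing values
n_over_40 <- sum(neck > 40, na.rm = TRUE)
print(n_over_40)
```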

Q9: What is the standard deviation of the abdomen measurements?

sd(Bodyfat$Abdomen)
## [1] 10.26123

The standard deviation of the abdomen measurements is about 10.26 cm.
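`sd()` returns the sample standard deviation, which divides by n - 1. The sketch below checks this on the six abdomen values printed by `head(Bodyfat)` (an assumed small sample; `manual_sd` is an illustrative name).

```r
# Abdomen circumferences (cm) from the six rows printed by head(Bodyfat)
abdomen <- c(115.6, 92.4, 86.0, 85.2, 95.6, 100.0)

n <- length(abdomen)
# Square root of the sum of squared deviations from the mean, divided by n - 1
manual_sd <- sqrt(sum((abdomen - mean(abdomen))^2) / (n - 1))

print(c(manual = manual_sd, builtin = sd(abdomen)))
```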

Q10: What is the median chest measurement for individuals in the dataset?

quantile(Bodyfat$Chest, .50)
##   50% 
## 99.25

The median (50th percentile) chest measurement for participants in this dataset is 99.25 cm.
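The 50th percentile from `quantile()` and the value from `median()` are the same statistic. A sketch on the six chest values printed by `head(Bodyfat)` (an assumed small sample):

```r
# Chest measurements (cm) from the six rows printed by head(Bodyfat)
chest <- c(117.0, 101.1, 97.8, 93.1, 99.7, 107.6)

q50 <- unname(quantile(chest, 0.50))  # default type 7 quantile at p = 0.5
med <- median(chest)

print(c(quantile = q50, median = med))
```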

3. Summary Report

This project allowed us to analyse an imported dataset in RStudio using the statistical methods from Chapter 6. The first part of the project was to write ten statistical questions based on the Chapter 6 notes. Next, I generated ten more statistical questions using ChatGPT. I ran into some unusual situations while generating the questions: ChatGPT is very sensitive to how the prompt is worded, and some of the questions it generated were not applicable to this assignment. I had to modify some of the questions to meet the assignment parameters, for example by removing range-specific questions. I then selected five questions from each set and answered them using R's statistics functions in RStudio. This was a simple process once I had learned the basics of RStudio. Overall, I found RStudio to be a useful tool for producing statistical results.

Appendix

# Q1 mean(Bodyfat$Age, na.rm = TRUE)
# Q2 cor(Bodyfat$Bodyfat, Bodyfat$Age, use = "complete.obs")
# Q3 hist(Bodyfat$Height, main = "Histogram of Height", xlab = "Height", col = "blue")
# Q4 quantile(Bodyfat$Weight, 0.90)
# Q5 boxplot(Bodyfat$Abdomen, main = "Distribution of Abdomen size", col = "green", xlab = "Abdomen size", horizontal = TRUE)
# Q6 median(Bodyfat$Age)
# Q7 range(Bodyfat$Weight)
# Q8 hist(Bodyfat$Neck, main = "Distribution of Neck circumference", col = "red", xlab = "Neck circumference", ylab = "count")
# Q9 sd(Bodyfat$Abdomen) 
# Q10 quantile(Bodyfat$Chest, .50)