---
title: "Week 6 Assignment – Height Analysis"
author: "Tyler Whittney"
date: "2026-02-25"
output: html_document
---

Overview

This assignment analyzes the height variable from the ok_cupid_data_full.csv dataset. I calculate measures of center and measures of spread, then create two graphs: a histogram and a boxplot.

Load Packages

library(readr)
library(ggplot2)

Import Data

okcupid_data <- read_csv("ok_cupid_data_full.csv")

## Rows: 300 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (16): body_type, diet, drinks, drugs, education, ethnicity, job, offspri...
## dbl  (3): age, height, income
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Select Height Variable

height <- na.omit(okcupid_data$height)
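
Because na.omit() drops missing heights without reporting them, it can be worth counting how many values were removed. The following is a minimal sketch on a made-up vector; toy_height, n_missing, and clean_height are hypothetical names, not part of the assignment data.

```r
# Hypothetical toy vector standing in for okcupid_data$height
toy_height <- c(70, NA, 65, 68, NA, 72)

# Count missing values before dropping them
n_missing <- sum(is.na(toy_height))

# na.omit() removes the NA entries; as.numeric() strips the extra attributes it attaches
clean_height <- as.numeric(na.omit(toy_height))

n_missing             # 2
length(clean_height)  # 4
```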

(a) Central Tendency

Mean

mean(height)

## [1] 67.98

Median

median(height)

## [1] 68

Mode

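get_mode <- function(x) {
  # Distinct values, in order of first appearance
  uniq_x <- unique(x)
  # Count how often each distinct value occurs and return the most frequent;
  # on a tie, which.max() keeps the first value encountered
  uniq_x[which.max(tabulate(match(x, uniq_x)))]
}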

get_mode(height)

## [1] 67

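Explanation: The mean (67.98 inches) is the arithmetic average of the heights. The median (68 inches) is the middle value when the heights are sorted. The mode (67 inches) is the height that occurs most often. All three fall within about an inch of each other, which suggests the distribution is roughly symmetric.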
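
A data set can have more than one mode, and get_mode() reports only the first of any tied values. As a hedged sketch, a variant that returns every tied value might look like the following; get_all_modes and the toy vector are hypothetical and used only for illustration.

```r
# Hypothetical helper: return every value that shares the highest frequency
get_all_modes <- function(x) {
  uniq_x <- unique(x)
  counts <- tabulate(match(x, uniq_x))
  uniq_x[counts == max(counts)]
}

# Made-up values; 67 and 68 both appear twice
toy_height <- c(67, 68, 67, 70, 68, 72)
get_all_modes(toy_height)  # 67 68
```

If this returns a single value, it agrees with get_mode(); if it returns several, the data are multimodal and reporting one mode alone would hide that.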

(b) Variability

Range

range(height)

## [1] 59 80

Variance

var(height)

## [1] 15.42435

Standard Deviation

sd(height)

## [1] 3.927384

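Explanation: The range gives the smallest and largest heights, 59 and 80 inches, a span of 21 inches. The variance (15.42 square inches) measures the average squared deviation from the mean; var() computes it with an n - 1 denominator. The standard deviation (3.93 inches) is the square root of the variance, so it is in the same units as the data and describes the typical distance of a height from the mean.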
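
To show how these measures relate to one another, here is a small worked sketch that computes the sample variance and standard deviation by hand and compares them with var() and sd(). The vector is made up for illustration and is not the assignment data.

```r
# Made-up heights in inches, not the assignment data
toy_height <- c(64, 66, 68, 70, 72)

n <- length(toy_height)
deviations <- toy_height - mean(toy_height)

# Sample variance: sum of squared deviations divided by n - 1
manual_var <- sum(deviations^2) / (n - 1)
manual_sd  <- sqrt(manual_var)

manual_var                              # 10
all.equal(manual_var, var(toy_height))  # TRUE
all.equal(manual_sd, sd(toy_height))    # TRUE
diff(range(toy_height))                 # 8, the width of the range
```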

(c) Visualizations

Histogram

ggplot(okcupid_data, aes(x = height)) +
  geom_histogram(bins = 30) +
  labs(title = "Histogram of Height",
       x = "Height (inches)",
       y = "Frequency")

Boxplot

ggplot(okcupid_data, aes(y = height)) +
  geom_boxplot() +
  labs(title = "Boxplot of Height",
       y = "Height (inches)")

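Explanation: The histogram shows the shape of the height distribution: where heights cluster and whether the distribution is skewed. The boxplot shows the median (the line inside the box), the interquartile range (the box itself), and any points beyond the whiskers, which are possible outliers.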
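
By default, geom_boxplot() draws a point separately when it lies more than 1.5 times the interquartile range beyond the edge of the box. The sketch below applies that rule by hand in base R. The vector and variable names are made up for illustration, so the flagged value says nothing about the actual height data.

```r
# Made-up heights in inches; 80 is deliberately far from the rest
toy_height <- c(64, 65, 66, 67, 68, 69, 70, 71, 80)

# First and third quartiles and the interquartile range
q1  <- quantile(toy_height, 0.25, names = FALSE)
q3  <- quantile(toy_height, 0.75, names = FALSE)
iqr <- q3 - q1

# Fences at 1.5 * IQR beyond the quartiles
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values outside the fences are the ones a boxplot would draw as separate points
toy_height[toy_height < lower_fence | toy_height > upper_fence]  # 80
```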

Conclusion

I imported a CSV file, analyzed the height variable, calculated measures of center and spread, and created two graphs to summarize the data.