Star Wars Analysis

Jesslyn Angelia

2025-09-09

Introduction

This analysis uses the ‘starwars’ dataset bundled with the ‘dplyr’ package. We explore character attributes such as height (in centimetres) and birth year (in years before the Battle of Yavin, BBY).

Dataset Preview

library(dplyr)
# Store the dataset and preview its first six rows
data <- starwars
head(data)
## # A tibble: 6 × 14
##   name      height  mass hair_color skin_color eye_color birth_year sex   gender
##   <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
## 1 Luke Sky…    172    77 blond      fair       blue            19   male  mascu…
## 2 C-3PO        167    75 <NA>       gold       yellow         112   none  mascu…
## 3 R2-D2         96    32 <NA>       white, bl… red             33   none  mascu…
## 4 Darth Va…    202   136 none       white      yellow          41.9 male  mascu…
## 5 Leia Org…    150    49 brown      light      brown           19   fema… femin…
## 6 Owen Lars    178   120 brown, gr… light      blue            52   male  mascu…
## # ℹ 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
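Before cleaning, it can help to see how many values each column is missing. This is a minimal sketch using base R's sapply() on the same dataset; the name missing_per_column is illustrative.

```r
library(dplyr)

# Count missing (NA) values in every column of the raw data,
# including the list columns (films, vehicles, starships)
missing_per_column <- sapply(starwars, function(col) sum(is.na(col)))
missing_per_column

# Total number of characters before any cleaning
nrow(starwars)
```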

Data Cleaning

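We remove rows with missing values so that functions such as mean() and cor() return numbers rather than NA. Note that na.omit() drops a row if any column is missing, not only height or birth year, so the statistics below describe a smaller subset of characters.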

clean_data <- na.omit(data)
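The sketch below contrasts na.omit() with a narrower filter that keeps any character whose height and birth year are both known. The targeted filter is an alternative, not the method used in this analysis, and the name height_birth_data is hypothetical.

```r
library(dplyr)

# Approach used in this analysis: drop a row if ANY column is missing
clean_data <- na.omit(starwars)

# Alternative (hypothetical, not the original method): drop a row only
# when one of the two analysed columns is missing
height_birth_data <- starwars %>%
  filter(!is.na(height), !is.na(birth_year))

# Compare how many characters each approach keeps
c(
  original      = nrow(starwars),
  na_omit       = nrow(clean_data),
  targeted_drop = nrow(height_birth_data)
)
```

The targeted filter always keeps at least as many rows as na.omit(), because every row that survives na.omit() also has a known height and birth year.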

Height Summary Statistics

We calculate the mean, median, variance, and standard deviation of character heights (in centimetres). var() and sd() return the sample variance and sample standard deviation.

mean(clean_data$height)
## [1] 178.6552
median(clean_data$height)
## [1] 180
var(clean_data$height)
## [1] 501.734
sd(clean_data$height)
## [1] 22.39942
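The four statistics can also be collected in a single dplyr summarise() call. The sketch also checks two facts about base R: var() divides the sum of squared deviations by n - 1, and sd() is the square root of var(). Variable names here are illustrative.

```r
library(dplyr)

clean_data <- na.omit(starwars)

# All four height statistics in one summarise() call
height_stats <- clean_data %>%
  summarise(
    mean   = mean(height),
    median = median(height),
    var    = var(height),
    sd     = sd(height)
  )
height_stats

# var() is the sample variance: squared deviations divided by n - 1
h <- clean_data$height
manual_var <- sum((h - mean(h))^2) / (length(h) - 1)
all.equal(manual_var, var(h))                    # TRUE

# sd() is the square root of var()
all.equal(sqrt(height_stats$var), height_stats$sd)  # TRUE
```

Because the variance is in squared units (cm²), the standard deviation is usually easier to read: with the output shown above, heights typically sit about 22 cm from the mean.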

Range of Heights

We identify the shortest and tallest heights in the cleaned data. range() returns them as a two-element vector: minimum first, then maximum.

range(clean_data$height)
## [1]  88 228
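To see which characters sit at the extremes, the two values returned by range() can be reused as a filter. A minimal sketch, with illustrative variable names:

```r
library(dplyr)

clean_data <- na.omit(starwars)

# range() returns c(min, max); diff() gives max minus min
height_range <- range(clean_data$height)
height_spread <- diff(height_range)
height_spread

# Characters whose height equals the minimum or the maximum
extremes <- clean_data %>%
  filter(height %in% height_range) %>%
  select(name, height)
extremes
```

With the range shown above, the spread is 228 - 88 = 140 cm.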

Quantiles of Height

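By default, quantile() returns five values: the minimum (0%), the first quartile (25%), the median (50%), the third quartile (75%), and the maximum (100%) of the heights.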

quantile(clean_data$height)
##   0%  25%  50%  75% 100% 
##   88  172  180  188  228
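The quartiles also give the interquartile range (IQR), the spread of the middle half of the heights. This sketch assumes R's default quantile algorithm (type = 7), which IQR() also uses; the names q and iqr_manual are illustrative.

```r
clean_data <- na.omit(dplyr::starwars)
h <- clean_data$height

# Default quantile(): probs = c(0, 0.25, 0.5, 0.75, 1), type = 7
q <- quantile(h)

# IQR computed by hand from the quartiles, and with the built-in helper
iqr_manual <- unname(q["75%"] - q["25%"])
iqr_manual
IQR(h)

# Other levels can be requested through the probs argument
quantile(h, probs = c(0.1, 0.9))
```

With the quartiles shown above, the IQR is 188 - 172 = 16 cm.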

Correlation Between Height and Birth Year

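We examine the linear relationship between character height and birth year. By default, cor() computes the Pearson correlation coefficient; a value of about 0.60 suggests a moderate positive association within the cleaned subset.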

cor(clean_data$height, clean_data$birth_year)
## [1] 0.5999684
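Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below checks that definition against cor(), then computes a rank-based (Spearman) coefficient and a significance test with cor.test(). These are additional checks, not part of the original analysis, and the variable names are illustrative.

```r
library(dplyr)

clean_data <- na.omit(starwars)
x <- clean_data$height
y <- clean_data$birth_year

# Pearson's r: covariance scaled by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_default <- cor(x, y)                 # method = "pearson" is the default
all.equal(r_manual, r_default)         # TRUE

# Rank-based alternative, less sensitive to extreme values
r_spearman <- cor(x, y, method = "spearman")
r_spearman

# Test whether the Pearson correlation differs from zero
# (the p-value relies on approximate bivariate normality)
cor.test(x, y)
```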