Introduction

In this assignment, I use the starwars dataset to explore the relationship between height and mass across genders. Height is the independent variable and mass is the dependent variable. Based on common sense, I expect height and mass to be positively related.

Methods

First, the dataset contains missing values, so I remove rows with a missing height or mass before plotting, to reduce bias from incomplete records. Second, I create a scatter plot of mass against height and density plots of height by gender, using geom_point() and geom_density(). Third, I analyse the results in the summary.
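
The missing-value step can be checked before anything is plotted. The following is a minimal sketch, assuming the dplyr and tidyr packages are installed; the names na_counts and complete_rows are illustrative and are not used elsewhere in this report.

```r
library(dplyr)
library(tidyr)

# Count missing values in the two variables used in this report
na_counts <- colSums(is.na(starwars[, c("height", "mass")]))
print(na_counts)

# Keep only rows where both height and mass are present
complete_rows <- starwars %>%
  drop_na(height, mass)

# Number of rows before and after removing incomplete records
c(before = nrow(starwars), after = nrow(complete_rows))
```

Comparing the two row counts shows how much data the filter removes, which helps judge whether dropping rows is a reasonable choice.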

Scatter plot

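# Load the tidyverse (dplyr, tidyr, ggplot2); the starwars data ships with dplyr
library(tidyverse)

# Drop rows with a missing height or mass
filtered_starwars <- starwars %>% 
  drop_na(height, mass)

# Scatter plot of mass against height, coloured by gender
ggplot(data = filtered_starwars) +
  geom_point(mapping = aes(x = height, y = mass, colour = gender)) +
  coord_cartesian(xlim = c(0, 300))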

### The Linear Model of height and mass
ggplot(data = filtered_starwars, 
       mapping = aes(x = height, y = mass)
       ) + 
  geom_point() + 
  geom_smooth(method = 'lm', formula = y ~ x)
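
The fitted line can also be summarised numerically. The sketch below fits the same linear model with lm() and refits it without the single heaviest character, on the assumption that this character is the outlier visible in the scatter plot. The names complete_rows, without_heaviest, fit_all and fit_trimmed are illustrative only.

```r
library(dplyr)
library(tidyr)

# Rows with both height and mass present
complete_rows <- starwars %>%
  drop_na(height, mass)

# Fit mass as a linear function of height on all complete rows
fit_all <- lm(mass ~ height, data = complete_rows)

# Refit without the single heaviest character (assumed to be the outlier)
without_heaviest <- complete_rows %>%
  filter(mass < max(mass))
fit_trimmed <- lm(mass ~ height, data = without_heaviest)

# Compare the slope and the correlation with and without that character
rbind(
  all_rows = c(slope = unname(coef(fit_all)["height"]),
               r = cor(complete_rows$height, complete_rows$mass)),
  trimmed  = c(slope = unname(coef(fit_trimmed)["height"]),
               r = cor(without_heaviest$height, without_heaviest$mass))
)
```

If the slope keeps its sign in both rows, the direction of the trend does not depend on the outlier; a large change in r would indicate that its strength does.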

### Density Plot

### A simple density plot

ggplot(starwars, aes(height)) +
  geom_density()

### A better quality density plot by gender
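# Keep only the female and male categories
# (in dplyr 1.0.0 and later these values are stored in the sex column, not gender)
filtered_data <- starwars %>%
  filter(gender %in% c("female", "male"))

# Semi-transparent fill so that both overlapping densities stay visible
ggplot(filtered_data) +
  geom_density(aes(height, fill = gender), alpha = 0.5)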

### Results and Summary

The scatter plot shows that height and mass are positively related: the taller a character is, the greater his or her mass tends to be. There is an outlier with the hermaphrodite gender, but it does not change the overall direction of the trend. From the density plot, most heights fall between 130 and 225 cm for both genders, and the distribution is approximately normal.
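
The two statements about height can be checked with a few lines of base R. This is only a sketch: the 130 to 225 range is the one read off the density plot, the Shapiro-Wilk test is one possible normality check among several, and the name heights is illustrative.

```r
library(dplyr)  # provides the starwars dataset

# Heights with missing values removed
heights <- starwars$height[!is.na(starwars$height)]

# Share of characters whose height lies in the 130-225 range
share_in_range <- mean(heights >= 130 & heights <= 225)
share_in_range

# Middle 90% of heights, for comparison with that range
quantile(heights, probs = c(0.05, 0.95))

# Shapiro-Wilk test: a small p-value suggests a departure from normality
shapiro.test(heights)
```

A share close to 1 supports the stated range, while the test result indicates how far "approximately normal" can be taken.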