2023-10-12

This data set is from Kaggle.com.

  • It is called ‘Simple Gender Classification’.
  • It describes individuals in a population and their personal information.
  • The columns are:
      • Gender: male or female
      • Age: in years
      • Height: in cm
      • Weight: in kg
      • Occupation
      • Education Level: ranging from high school diploma to doctorate degree
      • Marital Status: single, married, or other
      • Income: in U.S. dollars

Hypothesis testing with ggplot2: education level and income

Our hypothesis is that individuals with a higher education level also have a higher average income in U.S. dollars. The graph above supports this hypothesis.
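A plot shows the pattern, and a formal test can quantify it. The sketch below is one way this comparison could be set up, not the code behind the graph. The data frame `toy`, its values, and the column names `Education` and `Income` are invented for illustration and are not taken from the Kaggle data.

```r
library(ggplot2)

# Toy data invented for illustration only; it is NOT the Kaggle data.
# The column names Education and Income are assumptions.
toy <- data.frame(
  Education = factor(
    rep(c("High School", "Bachelor's", "Master's", "Doctorate"), each = 3),
    levels = c("High School", "Bachelor's", "Master's", "Doctorate")
  ),
  Income = c(30000, 35000, 40000,
             50000, 55000, 60000,
             65000, 70000, 75000,
             80000, 90000, 100000)
)

# Average income for each education level.
mean_income <- aggregate(Income ~ Education, data = toy, FUN = mean)

# Bar chart of the group means.
p_edu <- ggplot(mean_income, aes(x = Education, y = Income)) +
  geom_col(fill = "#7C0811") +
  labs(x = "Education level", y = "Average income (USD)")

# One-way ANOVA: tests whether mean income differs between the levels.
edu_test <- anova(lm(Income ~ Education, data = toy))
edu_p_value <- edu_test[["Pr(>F)"]][1]
```

Printing `p_edu` draws the chart. A small `edu_p_value` suggests that the group means differ, but it does not by itself show that more education causes higher income.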

Hypothesis testing with ggplot2: gender and income

Our hypothesis is that men have a higher average income than women. The graph above supports this hypothesis.
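A two-group comparison like this one can be checked with a t-test. The sketch below is a minimal example under stated assumptions. The data frame `toy`, its values, and the column names `Gender` and `Income` are invented for illustration and are not taken from the Kaggle data.

```r
library(ggplot2)

# Toy data invented for illustration only; it is NOT the Kaggle data.
# The column names Gender and Income are assumptions.
toy <- data.frame(
  Gender = factor(rep(c("female", "male"), each = 5)),
  Income = c(42000, 45000, 47000, 50000, 51000,
             52000, 56000, 58000, 61000, 63000)
)

# Box plot of income for each gender.
p_gender <- ggplot(toy, aes(x = Gender, y = Income)) +
  geom_boxplot() +
  labs(x = "Gender", y = "Income (USD)")

# One-sided Welch t-test. The factor levels are alphabetical
# (female, male), so alternative = "less" tests whether the mean
# income of the female group is below that of the male group.
gender_test <- t.test(Income ~ Gender, data = toy, alternative = "less")
gender_p_value <- gender_test$p.value
```

The one-sided alternative matches the direction of the hypothesis. If no direction were assumed in advance, the default two-sided test would be the safer choice.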

Using a 3-D scatterplot

Here we analyze the relationships between age, height, and income using multiple linear regression.

Multiple linear regression is a statistical method that models the relationship between one dependent variable and two or more independent variables. It helps us understand how well the independent variables predict or explain the dependent variable.

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots \]

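The symbols in the equation above are:

\[
\begin{aligned}
y &= \text{dependent variable} \\
x_1, x_2, x_3, \dots &= \text{independent variables} \\
\beta_0 &= \text{intercept} \\
\beta_1, \beta_2, \beta_3, \dots &= \text{slope coefficients}
\end{aligned}
\]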
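In R, a model of this form can be fitted with `lm()`. The sketch below assumes income is the dependent variable and age and height are the independent variables. The data frame `toy` and its values are invented for illustration, with column names borrowed from the plotly code. Income is generated without noise from known coefficients, so the fitted values can be checked against them.

```r
# Toy data invented for illustration only; it is NOT the Kaggle data.
toy <- data.frame(
  Age = c(25, 32, 41, 29, 55, 47, 38, 60, 23, 50),
  Height..cm. = c(160, 172, 181, 168, 175, 190, 165, 178, 170, 185)
)

# Income built from known coefficients (assumed values):
# intercept 10000, 800 per year of age, 100 per cm of height.
toy$Income..USD. <- 10000 + 800 * toy$Age + 100 * toy$Height..cm.

# Fit y = beta_0 + beta_1 * x_1 + beta_2 * x_2 by least squares.
fit <- lm(Income..USD. ~ Age + Height..cm., data = toy)

# Estimated intercept and slope coefficients.
estimates <- coef(fit)

# Predicted income for a hypothetical 30-year-old who is 170 cm tall.
new_person <- data.frame(Age = 30, Height..cm. = 170)
predicted <- predict(fit, newdata = new_person)
```

With real data the fit is never exact, and `summary(fit)` reports standard errors and p-values for each coefficient.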

Code for my plotly graph

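# Load plotly for interactive 3-D graphics (it also provides the %>% pipe).
library(plotly)

# df is assumed to hold the data set already. Names such as Height..cm.
# are what read.csv() produces from a header such as "Height (cm)".
fig <- plot_ly(data = df, x = ~Age, y = ~Height..cm.,
               z = ~Income..USD., type = 'scatter3d',
               mode = 'markers',
               marker = list(size = 9, opacity = 0.8, color = "#7C0811"))

# Add a title and axis labels; hide the legend since there is one trace.
fig <- fig %>% layout(
  title = "Scatter Plot of Age vs. Height vs. Income",
  scene = list(
    xaxis = list(title = "Age"),
    yaxis = list(title = "Height (cm)"),
    zaxis = list(title = "Income (USD)")
  ),
  showlegend = FALSE
)

# Print the object to render the plot.
fig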

Thank you!