Total Births by Sex Over the Years

Aggregate counts for the name “Jordan” by year and sex

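# Load the data set and the packages used below
library(babynames)  # provides the `babynames` data frame
library(dplyr)      # filter(), group_by(), summarize(), %>%
library(plotly)     # plot_ly(), layout()

# Total count of babies named "Jordan" for each year and sex
data <- babynames %>%
  filter(name == "Jordan") %>%
  group_by(year, sex) %>%
  summarize(count = sum(n), .groups = "drop")

# Scatter plot of yearly counts, one color per sex
plot_ly(data, x = ~year,
        y = ~count,
        color = ~sex,
        type = "scatter",
        mode = "markers",
        text = ~sex) %>%
  layout(title = 'Popularity of the Name "Jordan" by Sex',
         xaxis = list(title = "Year"),
         yaxis = list(title = "Count"))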


Most Popular Names by Decade

Correlation Between ‘Noah’ and ‘Liam’ Over the Years

Popularity of Selected Names Over the Years

Simple Linear Regression

The formula for a simple linear regression model is:

\[ y = \beta_0 + \beta_1x + \epsilon \]

  • \(y\) is the dependent variable.
  • \(x\) is the independent variable.
  • \(\beta_0\) is the y-intercept.
  • \(\beta_1\) is the slope of the line.
  • \(\epsilon\) is the error term.

This model attempts to describe the relationship between two variables by fitting a linear equation to observed data.
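As a minimal sketch of fitting this model in R, the example below uses base R's `lm()` on simulated data. The data, the variable names, and the true values \(\beta_0 = 3\) and \(\beta_1 = 2\) are assumptions made for illustration and do not come from the babynames analysis. It also computes the least squares estimates from their closed-form expressions so the two can be compared.

```r
# Simulated data (hypothetical values chosen for illustration)
set.seed(42)
x <- 1:50
y <- 3 + 2 * x + rnorm(50, sd = 5)  # true beta_0 = 3, beta_1 = 2, plus noise

# Fit y = beta_0 + beta_1 * x + epsilon by ordinary least squares
fit <- lm(y ~ x)
beta <- coef(fit)  # named vector: "(Intercept)" and "x"
print(beta)

# Closed-form least squares estimates for comparison
b1 <- cov(x, y) / var(x)          # slope
b0 <- mean(y) - b1 * mean(x)      # intercept
print(c(intercept = b0, slope = b1))
```

Calling `summary(fit)` would additionally report standard errors and p-values for both coefficients.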

Hypothesis Testing

Hypothesis testing is a statistical method for making decisions about a population from sample data. A hypothesis is an assumption about a population parameter, and the test assesses whether the observed data are consistent with that assumption.

Key terms:

  • Null hypothesis (\(H_0\)): Assumes that no effect or no difference exists.
  • Alternative hypothesis (\(H_a\)): Contrary to the null hypothesis, it assumes that some difference or effect exists.

\[ H_0: \mu = \mu_0 \] \[ H_a: \mu \neq \mu_0 \]

Here \(\mu\) is the population mean and \(\mu_0\) is the hypothesized value that it is tested against.
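One common way to test \(H_0: \mu = \mu_0\) against \(H_a: \mu \neq \mu_0\) is a two-sided one-sample t-test. The sketch below assumes that test and uses base R's `t.test()` on a simulated sample. The sample, the choice \(\mu_0 = 50\), and all variable names are hypothetical. The test statistic and p-value are also computed by hand to show what `t.test()` reports.

```r
# Simulated sample (hypothetical values chosen for illustration)
set.seed(1)
sample_values <- rnorm(30, mean = 52, sd = 10)
mu_0 <- 50  # hypothesized population mean under H_0

# Two-sided one-sample t-test of H_0: mu = mu_0
result <- t.test(sample_values, mu = mu_0, alternative = "two.sided")
print(result)

# The same statistic and p-value computed by hand
n <- length(sample_values)
t_manual <- (mean(sample_values) - mu_0) / (sd(sample_values) / sqrt(n))
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)
print(c(t = t_manual, p = p_manual))
```

A small p-value (for example, below a chosen significance level such as 0.05) would lead to rejecting \(H_0\); otherwise the data do not provide enough evidence against it.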