library(tidyverse)
library(scales)
# Import data
recent_grads <- read.csv("file:///C:/Users/User/Documents/recent_grads.csv") %>% as_tibble()
# Create a scatterplot
recent_grads %>%
  ggplot(aes(ShareWomen, Unemployment_rate)) +
  geom_point()


# Compute correlation coefficient
cor(recent_grads$Unemployment_rate, recent_grads$ShareWomen, use = "pairwise.complete.obs")
## [1] 0.07320458

# Create a linear model 1
mod_1 <- lm(Unemployment_rate ~ ShareWomen, data = recent_grads)

# View summary of model 1
summary(mod_1)
## 
## Call:
## lm(formula = Unemployment_rate ~ ShareWomen, data = recent_grads)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.069268 -0.017685 -0.001467  0.018476  0.112827 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.063007   0.005730  10.996   <2e-16 ***
## ShareWomen  0.009606   0.010037   0.957     0.34    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03035 on 170 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.005359,   Adjusted R-squared:  -0.0004919 
## F-statistic: 0.9159 on 1 and 170 DF,  p-value: 0.3399

Q1. Create the same scatterplot to examine the relationship between ShareWomen and median earnings.

# Create a scatterplot
recent_grads %>%
  ggplot(aes(ShareWomen, Median)) +
  geom_point()

Q2. Compute the correlation coefficient between the two variables and interpret it.

# Compute correlation coefficient
cor(recent_grads$Median, recent_grads$ShareWomen, use = "pairwise.complete.obs")
## [1] -0.6186898
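
A coefficient of about -0.62 points to a moderately strong negative linear association: majors with a higher share of women tend to have lower median earnings. As a rough sketch that uses only the value printed above (hard-coded here, so nothing is re-read from the CSV), squaring the correlation gives the share of variation in one variable that a straight-line relationship with the other accounts for:

```r
# Correlation copied from the cor() output above (not recomputed from the data)
r <- -0.6186898

# In a simple linear relationship, r^2 is the proportion of shared variation
r_squared <- r^2
print(round(r_squared, 4))
```

The sign of `r` gives the direction of the association; the square only describes its strength.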

Q3. Build a regression model to predict median earnings using the share of women.

# Create a linear model 1
mod_1 <- lm(Median ~ ShareWomen, data = recent_grads)

# View summary of model 1
summary(mod_1)
## 
## Call:
## lm(formula = Median ~ ShareWomen, data = recent_grads)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -17261  -5474  -1007   3502  57604 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    56093       1705   32.90   <2e-16 ***
## ShareWomen    -30670       2987  -10.27   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9031 on 170 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.3828, Adjusted R-squared:  0.3791 
## F-statistic: 105.4 on 1 and 170 DF,  p-value: < 2.2e-16

Q4. Is the coefficient of ShareWomen statistically significant at 5%? Interpret the coefficient.

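Yes. The p-value for ShareWomen is below 2e-16, far smaller than 0.05, so the coefficient is statistically significant at the 5% level. The coefficient of -30670 means that a one-unit increase in ShareWomen (going from 0% to 100% women) is associated with a $30,670 decrease in predicted median earnings. Equivalently, each additional 10 percentage points of women in a major is associated with about $3,067 lower median earnings.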
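
The significance check can be reproduced by hand. The sketch below assumes only the estimate, standard error, and residual degrees of freedom printed in the summary above; the variable names are illustrative and are not part of the original analysis.

```r
# Values copied from the summary(mod_1) output above
estimate  <- -30670   # ShareWomen coefficient
std_error <- 2987     # its standard error
df_resid  <- 170      # residual degrees of freedom

# t statistic and two-sided p-value from the t distribution
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# 95% confidence interval for the coefficient
margin <- qt(0.975, df = df_resid) * std_error
ci_95  <- c(lower = estimate - margin, upper = estimate + margin)

# Significant at 5% when the p-value is below 0.05
significant_at_5pct <- p_value < 0.05

print(round(t_value, 2))
print(p_value)
print(ci_95)
print(significant_at_5pct)
```

If the whole confidence interval lies below zero, that agrees with the p-value: the data are not consistent with a zero slope at the 5% level.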

Q5. How much median earnings does the model predict for a major that has 60% as the share of women?

Intercept + ShareWomen coefficient × 0.6 = 56093 + (-30670) × 0.6 = 37691. The model predicts median earnings of $37,691 for a major in which women make up 60% of graduates.
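
The same arithmetic as a small R sketch, with the coefficients hard-coded from the printed summary (so the result reflects their rounding):

```r
# Coefficients copied from the summary(mod_1) output above
intercept <- 56093
slope     <- -30670

# ShareWomen is stored as a proportion, so 60% is 0.60
share_women <- 0.60

predicted_median <- intercept + slope * share_women
print(predicted_median)

# With the fitted model still in memory, the equivalent call would be
# predict(mod_1, newdata = data.frame(ShareWomen = 0.60)),
# which may differ slightly because it uses unrounded coefficients.
```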

Q6. Interpret the reported residual standard error.

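The residual standard error measures the typical size of the residuals, that is, how far the observed median earnings fall from the values the model predicts. Here it is $9,031 on 170 degrees of freedom (172 observations minus the 2 estimated coefficients), so the model's predictions miss the actual median earnings by roughly $9,031 on average.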
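
To show where that number comes from, here is a minimal sketch of the calculation. Because the recent_grads CSV is not bundled with this document, it uses R's built-in `mtcars` data purely as a stand-in; the model `fit` is hypothetical and unrelated to the earnings analysis.

```r
# Stand-in model on a built-in dataset (NOT the recent_grads data)
fit <- lm(mpg ~ wt, data = mtcars)

# Residual standard error = sqrt(residual sum of squares / residual df)
rss        <- sum(residuals(fit)^2)
rse_manual <- sqrt(rss / df.residual(fit))

# sigma() returns the value that summary() reports as "Residual standard error"
rse_reported <- sigma(fit)

print(all.equal(rse_manual, rse_reported))
```

Applied to `mod_1`, the same steps should reproduce the 9031 shown in the summary.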

Q7. Interpret the reported adjusted R squared.

The adjusted R-squared of 0.3791 means that the share of women explains about 38% of the variation in median earnings across majors, after adjusting for the number of predictors in the model. That is a moderate fit: ShareWomen is a meaningful predictor, but about 62% of the variation is left unexplained, so predictions for an individual major will not be precise.
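
The adjustment itself can be checked by hand. This sketch assumes the standard adjusted R-squared formula and takes the inputs from the printed summary: multiple R-squared of 0.3828, one predictor, and 172 observations (170 residual degrees of freedom plus 2 estimated coefficients).

```r
# Inputs copied or derived from the summary(mod_1) output above
r_squared <- 0.3828
n <- 172   # observations used in the fit
p <- 1     # number of predictors (ShareWomen)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(round(adj_r_squared, 4))
```

With a single predictor and 172 observations the penalty is tiny, which is why the adjusted value sits so close to the unadjusted one.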