library(tidyverse)
library(scales)
# Import data
recent_grads <- read.csv("~/R/Business Sat/DATA/recent_grads.csv") %>% as_tibble()
# Create a scatterplot
recent_grads %>%
  ggplot(aes(ShareWomen, Unemployment_rate)) +
  geom_point()


# Compute correlation coefficient
cor(recent_grads$Unemployment_rate, recent_grads$ShareWomen, use = "pairwise.complete.obs")
## [1] 0.07320458

# Create a linear model 1
mod_1 <- lm(Unemployment_rate ~ ShareWomen, data = recent_grads)

# View summary of model 1
summary(mod_1)
## 
## Call:
## lm(formula = Unemployment_rate ~ ShareWomen, data = recent_grads)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.069268 -0.017685 -0.001467  0.018476  0.112827 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.063007   0.005730  10.996   <2e-16 ***
## ShareWomen  0.009606   0.010037   0.957     0.34    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03035 on 170 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.005359,   Adjusted R-squared:  -0.0004919 
## F-statistic: 0.9159 on 1 and 170 DF,  p-value: 0.3399

Interpretation

Q1. Create the same scatterplot to examine the relationship between ShareWomen and median earnings.

recent_grads %>%
  ggplot(aes(ShareWomen, Median)) +
  geom_point()

Q2. Compute the correlation coefficient between the two variables and interpret it.

Hint: Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.

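# Compute correlation coefficient
cor(recent_grads$Median, recent_grads$ShareWomen, use = "pairwise.complete.obs")
## [1] -0.6186898

The correlation coefficient is about -0.62. The negative sign means that majors with a higher share of women tend to have lower median earnings, and the magnitude indicates a moderately strong linear relationship. This is an association only; it does not show that one variable causes the other.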
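To check whether a correlation of this size could plausibly arise by chance, base R's `cor.test()` reports a p-value and a confidence interval alongside the estimate. The sketch below runs on simulated stand-in data: the `toy` data frame and every number in it are made up for illustration and are not taken from recent_grads.csv. Replace `toy` with `recent_grads` to apply it to the real data.

```r
# Simulated stand-in data: NOT the real recent_grads values
set.seed(1)
toy <- data.frame(ShareWomen = runif(50))
toy$Median <- 56000 - 30000 * toy$ShareWomen + rnorm(50, sd = 8000)

# Pearson correlation with a significance test and 95% confidence interval
ct <- cor.test(toy$Median, toy$ShareWomen)
ct$estimate   # sample correlation (negative for this toy data)
ct$p.value    # p-value for H0: the true correlation is 0
ct$conf.int   # 95% confidence interval for the correlation
```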

Q3. Build a regression model to predict median earnings using the share of women.

# Create a linear model 2 (median earnings explained by share of women)
mod_2 <- lm(Median ~ ShareWomen, data = recent_grads)
# View summary of model 2
summary(mod_2)

Q4. Is the coefficient of ShareWomen statistically significant at 5%? Interpret the coefficient.

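Yes. In the regression of Median on ShareWomen, the ShareWomen coefficient is marked with three stars, which means its p-value is below 0.001 and therefore below 0.05. The coefficient is -30670: an increase in ShareWomen from 0 to 1 (100 percentage points) is associated with a decrease of 30,670 dollars in median earnings, or about 3,067 dollars for every 10 percentage points. This describes an association, not a causal effect.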
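Rather than counting stars, the p-value can be read directly from the coefficient table. The following is a minimal sketch on simulated stand-in data (the `toy` data frame, `mod_toy`, and all numbers are hypothetical, not the real recent_grads values); with the real data, the same calls would be applied to the model of Median on ShareWomen.

```r
# Simulated stand-in data: NOT the real recent_grads values
set.seed(1)
toy <- data.frame(ShareWomen = runif(50))
toy$Median <- 56000 - 30000 * toy$ShareWomen + rnorm(50, sd = 8000)
mod_toy <- lm(Median ~ ShareWomen, data = toy)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(mod_toy))
p_value <- coefs["ShareWomen", "Pr(>|t|)"]
p_value < 0.05   # TRUE means significant at the 5% level

# 95% confidence interval for the slope
confint(mod_toy, "ShareWomen", level = 0.95)
```

If the confidence interval excludes zero, that agrees with a p-value below 0.05.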

Q5. How much median earnings does the model predict for a major that has 60% as the share of women?
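A prediction for a major with a 60% share of women is the intercept plus the slope times 0.6, since ShareWomen is measured as a proportion. The sketch below shows both `predict()` and the hand calculation on simulated stand-in data (the `toy` data frame, `mod_toy`, and all numbers are hypothetical); with the real data, pass the fitted model of Median on ShareWomen instead.

```r
# Simulated stand-in data: NOT the real recent_grads values
set.seed(1)
toy <- data.frame(ShareWomen = runif(50))
toy$Median <- 56000 - 30000 * toy$ShareWomen + rnorm(50, sd = 8000)
mod_toy <- lm(Median ~ ShareWomen, data = toy)

# Prediction for a major where 60% of graduates are women
new_major <- data.frame(ShareWomen = 0.6)
pred <- predict(mod_toy, newdata = new_major)

# The same number by hand: intercept + slope * 0.6
by_hand <- coef(mod_toy)[["(Intercept)"]] + coef(mod_toy)[["ShareWomen"]] * 0.6
all.equal(unname(pred), by_hand)
```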

Q6. Interpret the reported residual standard error.
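The residual standard error is the typical size of a prediction miss, in the units of the response variable. It is the square root of the residual sum of squares divided by the residual degrees of freedom. A minimal sketch on simulated stand-in data (the `toy` data frame, `mod_toy`, and all numbers are hypothetical) shows that this formula reproduces what `sigma()` and `summary()` report:

```r
# Simulated stand-in data: NOT the real recent_grads values
set.seed(1)
toy <- data.frame(ShareWomen = runif(50))
toy$Median <- 56000 - 30000 * toy$ShareWomen + rnorm(50, sd = 8000)
mod_toy <- lm(Median ~ ShareWomen, data = toy)

# Residual standard error by hand: sqrt(RSS / residual degrees of freedom)
rss <- sum(residuals(mod_toy)^2)
rse <- sqrt(rss / df.residual(mod_toy))

# sigma() returns the same quantity that summary() prints
all.equal(rse, sigma(mod_toy))
```

For the Unemployment_rate model printed earlier, the reported value of 0.03035 means predicted unemployment rates miss the observed rates by roughly 3 percentage points on average.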

Q7. Interpret the reported adjusted R squared.
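Adjusted R-squared penalizes R-squared for the number of predictors: adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1), where n is the number of observations used and p the number of predictors. As a check on the formula, the sketch below plugs in the values printed in the Unemployment_rate summary earlier in this document (R-squared 0.005359, 170 residual degrees of freedom with one predictor, so n = 172). The variable names are illustrative.

```r
# Values read from the printed summary of the Unemployment_rate model
r2 <- 0.005359   # Multiple R-squared
p  <- 1          # number of predictors (ShareWomen)
n  <- 170 + p + 1  # residual df + estimated parameters = 172 observations

# Adjusted R-squared formula
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_r2   # approximately -0.00049, matching the reported value up to rounding
```

A value at or below zero, as here, indicates that the predictor explains essentially none of the variation in the response once the penalty is applied.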