Exercise on Correlation

Q1. Create the same scatterplot to examine the relationship between ShareWomen and median earnings.
Q2. Compute the correlation coefficient between the two variables and interpret it.
Q3. Build a regression model to predict median earnings using the share of women.
Q4. Is the coefficient of ShareWomen statistically significant at 5%? Interpret the coefficient.
Q5. What median earnings does the model predict for a major whose share of women is 60%?
Q6. Interpret the reported residual standard error.
Q7. Interpret the reported adjusted R squared.

library(tidyverse)
library(scales)

# Import data
recent_grads <- read.csv("~/R/Business Sat/DATA/recent_grads.csv") %>% as_tibble()

# Create a scatterplot
recent_grads %>%
  ggplot(aes(ShareWomen, Unemployment_rate)) +
  geom_point()


# Compute correlation coefficient
cor(recent_grads$Unemployment_rate, recent_grads$ShareWomen, use = "pairwise.complete.obs")
## [1] 0.07320458
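The `use = "pairwise.complete.obs"` argument matters here because the data contain a missing value (the regression output notes that 1 observation was deleted due to missingness). A minimal sketch with made-up toy vectors `x` and `y` (hypothetical, not taken from recent_grads) shows the difference: by default `cor()` returns NA if any value is missing, while `pairwise.complete.obs` first drops the incomplete pairs.

```r
# Hypothetical toy data: the third pair has a missing x value
x <- c(0.2, 0.4, NA, 0.7, 0.9)
y <- c(0.05, 0.07, 0.06, 0.06, 0.08)

# Default use = "everything": any NA makes the result NA
r_default <- cor(x, y)

# Drop incomplete pairs before computing Pearson's r
r_pairwise <- cor(x, y, use = "pairwise.complete.obs")

# Same value as computing r on the complete cases by hand
ok <- complete.cases(x, y)
r_manual <- cor(x[ok], y[ok])

print(c(default = r_default, pairwise = r_pairwise, manual = r_manual))
```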

# Create a linear model 1
mod_1 <- lm(Unemployment_rate ~ ShareWomen, data = recent_grads)

# View summary of model 1
summary(mod_1)
## 
## Call:
## lm(formula = Unemployment_rate ~ ShareWomen, data = recent_grads)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.069268 -0.017685 -0.001467  0.018476  0.112827 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.063007   0.005730  10.996   <2e-16 ***
## ShareWomen  0.009606   0.010037   0.957     0.34    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03035 on 170 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.005359,   Adjusted R-squared:  -0.0004919 
## F-statistic: 0.9159 on 1 and 170 DF,  p-value: 0.3399
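The numbers discussed below can also be pulled out of the summary object instead of being read off the printout. A minimal sketch in base R: because recent_grads is a local file, it uses the built-in mtcars data (mpg regressed on wt) as a stand-in; with the real data, replace the model with mod_1 and the row name "wt" with "ShareWomen". It also checks that the residual standard error equals the square root of the residual sum of squares divided by the residual degrees of freedom (170 for mod_1).

```r
# Stand-in model on R's built-in mtcars data (not the recent_grads data)
m <- lm(mpg ~ wt, data = mtcars)
s <- summary(m)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
slope_p <- coef_table["wt", "Pr(>|t|)"]

# Numbers printed at the bottom of summary()
rse <- s$sigma
r2 <- s$r.squared
adj_r2 <- s$adj.r.squared

# Residual standard error = sqrt(sum of squared residuals / residual df)
rse_manual <- sqrt(sum(residuals(m)^2) / df.residual(m))

print(coef_table)
print(c(rse = rse, rse_manual = rse_manual, r2 = r2, adj_r2 = adj_r2))
```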

Interpretation

Correlation coefficient: The correlation coefficient is 0.073, which indicates a very weak positive association between ShareWomen and Unemployment_rate. The sign gives the direction of the association and the absolute value gives its strength. A correlation coefficient ranges from -1 to 1. As a rule of thumb, an absolute value above 0.6 can be considered a strong association, one between 0.4 and 0.6 a moderate association, and one below 0.4 a weak association.
Significance of coefficients: summary(mod_1) reports each coefficient with its standard error, t value, and p-value (Pr(>|t|)) under Coefficients. The stars at the end of each line summarize the p-value according to the Signif. codes legend. The *** on the Intercept line means its p-value is below 0.001, so the intercept is significantly different from zero even at the 0.1% significance level. ShareWomen has no stars: its p-value is 0.34, well above 0.05, so its coefficient is not statistically significant at the 5% level. In other words, the data provide no evidence that ShareWomen helps explain differences in Unemployment_rate across majors.
Coefficient of ShareWomen: The coefficient of ShareWomen is 0.009606. Both ShareWomen and Unemployment_rate are expressed in decimal form (proportions), so a one-unit increase in ShareWomen means going from 0% to 100% women. That change is associated with an increase of only 0.009606 in Unemployment_rate, or about 0.96 percentage points. When interpreting coefficients, always check the units of the variables in the data.
Intercept: The intercept is 0.063007. It means that a major with no women (ShareWomen = 0) is predicted to have an unemployment rate of 0.063007, or about 6.3%.
Residual standard error: The typical difference between the actual Unemployment_rate and the Unemployment_rate predicted by the model is 0.03035. In other words, the model's predictions typically miss the actual unemployment rate by about 3.0 percentage points.
Adjusted R-squared: The reported adjusted R-squared is -0.0004919 (the multiple R-squared is 0.005359). The multiple R-squared always lies between 0 and 1; for example, an R-squared of 0.5136 would mean that 51.36% of the variability in Unemployment_rate is explained by ShareWomen. The adjusted R-squared penalizes the model for the number of predictors it uses, so it can fall below zero when the model explains almost nothing. Here ShareWomen explains only about 0.5% of the variability in Unemployment_rate, and after the adjustment essentially none: ShareWomen is not a good predictor of Unemployment_rate.
Making predictions: A model that fits well can be used to predict Unemployment_rate. What unemployment rate does this model predict for a major with 50% women? Because ShareWomen is a proportion, 50% enters the equation as 0.5: intercept + coefficient of ShareWomen * 0.5 = 0.063007 + 0.009606 * 0.5 = 0.067810, or about 6.8%.
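The rule-of-thumb thresholds in the correlation item above can be written as a small helper. The function name `describe_cor` is hypothetical, and treating exactly 0.4 and 0.6 as belonging to the lower category is an assumption, since the text leaves the boundaries open.

```r
# Rule-of-thumb labels: |r| > 0.6 strong, 0.4 < |r| <= 0.6 moderate, otherwise weak
# (placing exactly 0.4 and 0.6 in the lower category is an assumption)
describe_cor <- function(r) {
  strength <- ifelse(abs(r) > 0.6, "strong",
                     ifelse(abs(r) > 0.4, "moderate", "weak"))
  # The sign of r gives the direction; r = 0 gets no direction label
  direction <- ifelse(r > 0, "positive ", ifelse(r < 0, "negative ", ""))
  paste0(strength, " ", direction, "association")
}

describe_cor(0.07320458)  # "weak positive association"
describe_cor(-0.6186898)  # "strong negative association"
```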
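The stars in the significance item above come from p-values that can be recomputed from the printed Estimate and Std. Error. A minimal sketch using the rounded values copied from summary(mod_1), so the results match the printout only up to rounding: the t value is Estimate / Std. Error, and the two-sided p-value comes from a t distribution with the 170 residual degrees of freedom.

```r
# Rounded values copied from the summary(mod_1) output above
estimate <- c(intercept = 0.063007, ShareWomen = 0.009606)
std_error <- c(intercept = 0.005730, ShareWomen = 0.010037)
df_resid <- 170

# t value = Estimate / Std. Error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with the residual degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# A coefficient is significant at the 5% level when its p-value is below 0.05
significant_5pct <- p_value < 0.05

print(round(t_value, 3))
print(signif(p_value, 3))
print(significant_5pct)
```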
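The adjusted R-squared item above can be checked by hand from the multiple R-squared. A minimal sketch using the standard formula, with n = 172 observations (170 residual degrees of freedom plus 2 estimated coefficients) and p = 1 predictor. Because (n - 1) / (n - p - 1) is slightly above 1, a multiple R-squared this close to zero is pushed just below zero.

```r
# Values from the summary(mod_1) output above
r2 <- 0.005359   # Multiple R-squared
n <- 170 + 2     # observations used: residual df + 2 estimated coefficients
p <- 1           # number of predictors (ShareWomen)

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(adj_r2)
```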
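The prediction in the last item can be sketched in R with the printed coefficients. ShareWomen is stored as a proportion, so 50% women is 0.5, not 0.05.

```r
# Coefficients from the summary(mod_1) output above
b0 <- 0.063007  # intercept
b1 <- 0.009606  # slope on ShareWomen

# ShareWomen is a proportion: 0% women -> 0, 50% women -> 0.5
share_women <- c(0, 0.5)
predicted_unemployment <- b0 + b1 * share_women
print(round(predicted_unemployment, 6))
```

With mod_1 in memory, `predict(mod_1, newdata = data.frame(ShareWomen = 0.5))` should return the same value up to rounding of the printed coefficients.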

Q1. Create the same scatterplot to examine the relationship between ShareWomen and median earnings.

recent_grads %>%
  ggplot(aes(ShareWomen, Median)) +
  geom_point()

Q2. Compute the correlation coefficient between the two variables and interpret it.

Hint: Make sure to interpret the direction and the magnitude of the relationship. In addition, keep in mind that correlation (or regression) coefficients do not show causation but only association.

# Compute correlation coefficient
cor(recent_grads$Median, recent_grads$ShareWomen, use = "pairwise.complete.obs")
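## [1] -0.6186898

The correlation coefficient is -0.62. The negative sign means that majors with a higher share of women tend to have lower median earnings, and the absolute value (above 0.6) indicates a strong association by the thresholds given earlier. This is an association only; it does not show that the share of women causes lower earnings.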
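In a simple regression with one predictor and an intercept, the multiple R-squared equals the square of Pearson's r computed on the same observations. A quick check from the two correlations in this exercise (plain arithmetic, not output from a fitted model): the Unemployment_rate value reproduces the 0.005359 reported by summary(mod_1), and the Median value shows what the Median ~ ShareWomen model should report.

```r
r_median <- -0.6186898  # cor(Median, ShareWomen) from above
r_unemp <- 0.07320458   # cor(Unemployment_rate, ShareWomen) from above

# For simple linear regression with an intercept, Multiple R-squared = r^2
r2_median <- r_median^2
r2_unemp <- r_unemp^2
print(round(c(median = r2_median, unemployment = r2_unemp), 4))
```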

Q3. Build a regression model to predict median earnings using the share of women.

# Create linear model 2: median earnings predicted from the share of women
mod_2 <- lm(Median ~ ShareWomen, data = recent_grads)
# View summary of model 2
summary(mod_2)

Q4. Is the coefficient of ShareWomen statistically significant at 5%? Interpret the coefficient.

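Yes. The ShareWomen line has three stars (p < 0.001), so the coefficient is statistically significant at the 5% level. The coefficient is about -30670: a one-unit increase in ShareWomen, that is, going from 0% to 100% women, is associated with median earnings about $30,670 lower. Equivalently, each 10-percentage-point increase in the share of women is associated with median earnings about $3,067 lower. This is an association, not evidence of causation.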
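The unit issue in the Q4 answer can be made concrete by scaling the slope. A minimal sketch, taking the slope of -30670 from the answer above (the Median ~ ShareWomen output itself is not shown in this document): because ShareWomen is a proportion, a change of 0.01 is one percentage point and a change of 1 is the full 0% to 100% range.

```r
# Slope on ShareWomen from the Q4 answer (dollars per one unit of ShareWomen)
slope <- -30670

# ShareWomen is a proportion, so 0.01 = 1 percentage point and 1 = 100 points
delta_share <- c(one_point = 0.01, ten_points = 0.10, full_unit = 1.00)

# Predicted change in median earnings for each change in ShareWomen
change_in_median <- slope * delta_share
print(change_in_median)
```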

Niti Bista
