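library(tidyverse)     # readr::read_csv() and ggplot2
library(modelsummary)  # modelsummary(), modelplot()
library(sjPlot)        # tab_model(), plot_model()
sat_gpa <- read_csv("C:/Users/tophe/Downloads/sat_gpa.csv")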
## Rows: 1000 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): sex
## dbl (5): sat_verbal, sat_math, sat_total, gpa_hs, gpa_fy
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
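# The warning below shows this call loads nothing; the code that follows
# relies on a World object that was already present in the session.
data("World")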
## Warning in data("World"): data set 'World' not found

Task 1 Calculate and interpret the correlation between SAT math scores (sat_math) and freshman GPA (gpa_fy). Next, visualize this relationship with a scatterplot, including a regression line.

cor(sat_gpa$gpa_fy, sat_gpa$sat_math)
## [1] 0.3871178
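print("With r = 0.387, the correlation is positive and moderate: students with higher SAT math scores tend to have somewhat higher freshman GPAs")
## [1] "With r = 0.387, the correlation is positive and moderate: students with higher SAT math scores tend to have somewhat higher freshman GPAs"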
ggplot(sat_gpa, aes(x = sat_math, y = gpa_fy)) + geom_point(size = 0.5) + geom_smooth(method = "lm", se = FALSE) +
  labs(x = "SAT Math Score", y = "Freshman GPA")
## `geom_smooth()` using formula = 'y ~ x'
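As a check on what `cor()` computes, here is a minimal sketch of Pearson's correlation written out from its definition, together with the t statistic that `cor.test()` uses to test whether the population correlation is zero. The vectors `x` and `y` are simulated, hypothetical stand-ins, not the sat_gpa columns.

```r
# Sketch: Pearson's r from its definition, compared with cor().
# x and y are simulated, hypothetical stand-ins for sat_math and gpa_fy.
set.seed(1)
x <- rnorm(200, mean = 50, sd = 10)
y <- 1 + 0.03 * x + rnorm(200, sd = 0.6)

# r = sum of co-deviations / sqrt(product of summed squared deviations)
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

r_manual <- pearson_r(x, y)
all.equal(r_manual, cor(x, y))

# t statistic for H0: population correlation = 0, with n - 2 df
n <- length(x)
t_stat <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)
all.equal(t_stat, unname(cor.test(x, y)$statistic))
```

Plugging in the reported r = 0.387 and the 1,000 rows of sat_gpa (no values in these two columns appear to be missing, since `cor()` returned a number rather than `NA`) gives t of about 13.3 on 998 degrees of freedom, so the correlation is clearly distinguishable from zero even though it is only moderate in size.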

Task 2 Create regression tables to analyze the relationship between SAT scores (total and verbal) and freshman GPA, using both the modelsummary and sjPlot packages. Interpret.

model <- lm(gpa_fy ~ sat_total, data = sat_gpa)
modelsummary(model)
               (1)
(Intercept)    0.002
               (0.152)
sat_total      0.024
               (0.001)
Num.Obs.       1000
R2             0.212
R2 Adj.        0.211
AIC            2004.8
BIC            2019.5
Log.Lik.       -999.382
RMSE           0.66
tab_model(model)
                    gpa_fy
Predictors          Estimates   CI              p
(Intercept)         0.00        -0.30 – 0.30    0.990
sat total           0.02        0.02 – 0.03     <0.001
Observations        1000
R2 / R2 adjusted    0.212 / 0.211

Interpretation: The intercept is 0.002, meaning a student with a total SAT score of zero would have a predicted freshman GPA of about 0.002. A total score of zero is not a realistic value, so the intercept mainly anchors the regression line rather than describing actual students, and its p-value of 0.990 shows it is not statistically distinguishable from zero. The slope is 0.024, so each additional point of total SAT score is associated with a 0.024-point increase in predicted freshman GPA. The slope's p-value is below 0.001, well under conventional alpha levels such as 0.05 and 0.01, so we reject the null hypothesis that the slope is zero: total SAT score is a statistically significant predictor of freshman GPA. Finally, the R-squared of 0.212 means that total SAT score explains about 21.2 percent of the variation in freshman GPA, leaving most of the variation unexplained.
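The significance statements above can be reproduced approximately from the table: each t statistic is the estimate divided by its standard error, and the two-sided p-value comes from a t distribution with n - 2 = 998 residual degrees of freedom. This sketch uses the rounded values printed by modelsummary, so its results are approximate.

```r
# Sketch: t statistics and two-sided p-values from the rounded
# estimates and standard errors in the modelsummary(model) table.
est <- c(intercept = 0.002, sat_total = 0.024)
se  <- c(intercept = 0.152, sat_total = 0.001)
df_resid <- 1000 - 2   # 1000 observations minus 2 estimated coefficients

t_vals <- est / se                             # estimate / standard error
p_vals <- 2 * pt(-abs(t_vals), df = df_resid)  # two-sided p-values
round(t_vals, 3)
signif(p_vals, 3)
```

The intercept's t of about 0.013 gives a p-value of about 0.99, in line with the 0.990 reported by tab_model, while the slope's t of about 24 gives a p-value far below 0.001.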

Task 3 Explore the relationship between Happy Life Years (HLY) and GDP per Capita using the HappyPlanetIndex dataset. Visualize the coefficients using both the modelsummary and sjPlot packages.

cor(World$HLY, World$GDPperCapita)
## [1] 0.7520931
ggplot(World, aes(x = GDPperCapita, y = HLY)) + geom_point(size = 0.5) +
  geom_smooth(method = "lm", se = FALSE) + 
  labs(x = "GDP per Capita", y = "Happy Life Years")
## `geom_smooth()` using formula = 'y ~ x'
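For a regression with a single predictor, R-squared equals the square of Pearson's r, so the correlation above already predicts the fit of the regression on GDP per capita. This sketch squares the reported r and then confirms the identity on simulated data; `gdp` and `hly` are hypothetical stand-ins, not World's columns.

```r
# Sketch: with one predictor, R-squared = r^2.
r_world <- 0.7520931   # cor(World$HLY, World$GDPperCapita) above
r_world^2              # about 0.566, the R2 reported for model2

# Same identity on simulated, hypothetical data
set.seed(42)
gdp <- runif(100, 500, 40000)
hly <- 30 + 0.0008 * gdp + rnorm(100, sd = 8)
fit <- lm(hly ~ gdp)
all.equal(summary(fit)$r.squared, cor(gdp, hly)^2)
```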

model2 <- lm(HLY ~ GDPperCapita, data = World)
modelsummary(model2)
               (1)
(Intercept)    31.182
               (1.114)
GDPperCapita   0.001
               (0.000)
Num.Obs.       141
R2             0.566
R2 Adj.        0.563
AIC            1043.2
BIC            1052.0
Log.Lik.       -518.576
RMSE           9.57
tab_model(model2)
                    HLY
Predictors          Estimates   CI              p
(Intercept)         31.18       28.98 – 33.38   <0.001
GDPperCapita        0.00        0.00 – 0.00     <0.001
Observations        141
R2 / R2 adjusted    0.566 / 0.563

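Interpretation: The correlation coefficient of 0.75 indicates a strong positive correlation between GDP per capita and Happy Life Years. The intercept of 31.18 means that a country with a GDP per capita of 0 would be expected to have 31.18 Happy Life Years; since no country has zero GDP per capita, this is an extrapolation rather than a description of any real country. The slope of 0.001 means that predicted Happy Life Years rise by about 0.001 for each one-unit increase in GDP per capita. The slope looks tiny because a single unit of GDP per capita is a very small change, and at three decimal places the rounding hides its precise size. Both the intercept and the slope have p-values below 0.001. Finally, the R-squared of 0.566 means that GDP per capita explains about 56.6 percent of the variation in Happy Life Years across the 141 observations in the model.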
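Because the slope is so small per unit of GDP per capita, rescaling the predictor can make it easier to read. This sketch, on simulated data with hypothetical names, shows that dividing the predictor by 1,000 multiplies the slope by exactly 1,000 while leaving R-squared unchanged.

```r
# Sketch: rescaling a predictor changes the slope's units, not the fit.
# gdp and hly are simulated, hypothetical stand-ins for World's columns.
set.seed(7)
gdp <- runif(141, 500, 40000)
hly <- 30 + 0.0008 * gdp + rnorm(141, sd = 9)

fit_units     <- lm(hly ~ gdp)             # slope per 1 unit of gdp
fit_thousands <- lm(hly ~ I(gdp / 1000))   # slope per 1,000 units of gdp

slope_units     <- unname(coef(fit_units)[2])
slope_thousands <- unname(coef(fit_thousands)[2])
all.equal(slope_thousands, 1000 * slope_units)
all.equal(summary(fit_units)$r.squared, summary(fit_thousands)$r.squared)
```

With the real data, the analogous call (not run here) would be `lm(HLY ~ I(GDPperCapita / 1000), data = World)`, whose slope is the change in predicted Happy Life Years per 1,000 units of GDP per capita.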

Task 4 Visualize the coefficients of the models specified below using both modelplot from the modelsummary package and plot_model from the sjPlot package. Fit three models to predict happiness (Happiness) using different predictors:

Model 1: Life Expectancy (LifeExpectancy)

Model 2: Ecological Footprint (Footprint)

Model 3: GDP per Capita (GDPperCapita), Human Development Index (HDI), and Population (Population)

# Object names vs. task numbering: model3 = Model 1 (LifeExpectancy),
# model4 = Model 2 (Footprint), model5 = Model 3 (GDPperCapita + HDI + Population)
model3 <- lm(Happiness ~ LifeExpectancy, data = World)
model4 <- lm(Happiness ~ Footprint, data = World)
model5 <- lm(Happiness ~ GDPperCapita + HDI + Population, data = World)

# Coefficient plot for all three models with intercepts omitted;
# facet_wrap() gives each model its own panel and x-axis range
modelplot(list(model3, model4, model5), coef_omit = "Intercept") +
    geom_point(aes(x = estimate, y = term, color = model, shape = model), size = 2) +
    geom_errorbarh(aes(xmin = conf.low, xmax = conf.high, y = term, color = model), height = 0.1) +
    scale_color_brewer(palette = "Set2") +
    labs(x = 'Coefficient Estimate', y = 'Term',
         title = 'Model Coefficients with Confidence Intervals',
         caption = 'Comparison of Models 3, 4, and 5') +
    theme_minimal() +
    theme(legend.position = "left") +
    facet_wrap(~ model, scales = "free_x")
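Each point and bar in a coefficient plot is an estimate and its confidence interval. The sketch below builds that kind of table in base R with `coef()` and `confint()`, reusing the column names from the aesthetics above, and checks that a 95% interval is the estimate plus or minus a t quantile times the standard error. The data and the names `life_exp` and `happiness` are simulated and hypothetical.

```r
# Sketch: the numbers behind a coefficient plot, in base R.
# life_exp and happiness are simulated, hypothetical stand-ins.
set.seed(3)
n <- 120
life_exp  <- rnorm(n, mean = 68, sd = 9)
happiness <- 1 + 0.08 * life_exp + rnorm(n, sd = 0.8)
fit <- lm(happiness ~ life_exp)

ci <- confint(fit, level = 0.95)
coef_table <- data.frame(
  term      = names(coef(fit)),
  estimate  = unname(coef(fit)),
  conf.low  = unname(ci[, 1]),
  conf.high = unname(ci[, 2])
)
# Drop the intercept row, as coef_omit = "Intercept" does above
coef_table <- coef_table[coef_table$term != "(Intercept)", ]

# 95% half-width = t(0.975, residual df) * standard error
se <- unname(sqrt(diag(vcov(fit)))["life_exp"])
half_width <- qt(0.975, df = df.residual(fit)) * se
all.equal(coef_table$conf.high - coef_table$estimate, half_width)
coef_table
```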

plot_model(model5, 
           type = "std2",
           show.values = TRUE,
           ci.lvl = 0.95,
           title = "Model 5: Happiness ~ GDPperCapita + HDI + Population") +
  theme_minimal() +
  theme(legend.position = "left")

plot_model(model4, 
           type = "std2",
           show.values = TRUE,
           ci.lvl = 0.95,
           title = "Model 4: Happiness ~ Footprint") +
  theme_minimal() +
  theme(legend.position = "left")

plot_model(model3, 
           type = "std2",
           show.values = TRUE,
           ci.lvl = 0.95,
           title = "Model 3: Happiness ~ LifeExpectancy") +
  theme_minimal() +
  theme(legend.position = "left")
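The `type = "std2"` option puts predictors on a common scale so their coefficients can be compared; sjPlot's documentation describes it as following Gelman's (2008) suggestion of dividing by two standard deviations instead of one. A minimal sketch of that idea, on simulated data with hypothetical variable names, scales each predictor by two standard deviations and shows that the resulting coefficient equals the raw slope times 2 * sd(predictor). How sjPlot handles the response and other details may differ from this sketch.

```r
# Sketch of two-standard-deviation scaling of predictors (Gelman 2008 idea).
# gdp, hdi, pop, and happiness are simulated, hypothetical stand-ins;
# sjPlot's exact "std2" computation may differ (e.g. for the response).
set.seed(11)
n   <- 140
gdp <- runif(n, 500, 40000)   # GDP per capita
hdi <- runif(n, 0.3, 0.95)    # Human Development Index
pop <- runif(n, 0.1, 1000)    # population in millions
happiness <- 3 + 0.00005 * gdp + 2 * hdi + rnorm(n, sd = 0.8)

# Center, then divide by two standard deviations
scale2 <- function(v) (v - mean(v)) / (2 * sd(v))

fit_raw  <- lm(happiness ~ gdp + hdi + pop)
fit_std2 <- lm(happiness ~ scale2(gdp) + scale2(hdi) + scale2(pop))

# Scaled coefficient = raw slope * 2 * sd(predictor)
expected <- unname(coef(fit_raw)[-1]) * 2 * c(sd(gdp), sd(hdi), sd(pop))
all.equal(unname(coef(fit_std2)[-1]), expected)
round(unname(coef(fit_std2)[-1]), 3)   # now on a comparable scale
```

On the raw scale the GDP and population slopes would look tiny next to the HDI slope simply because of their units; after scaling, each coefficient is the predicted change in happiness across two standard deviations of that predictor.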

Task 5 Predict murder rates using both internet penetration and GDP, from the Violence dataset. Create regression tables for the model using both the modelsummary and sjPlot packages. Interpret.

model7 <- lm(MurderRate ~ GDP + Internet, data = Violence)
modelsummary(model7)
               (1)
(Intercept)    28.984
               (11.930)
GDP            -0.001
               (0.001)
Internet       0.463
               (0.438)
Num.Obs.       8
R2             0.602
R2 Adj.        0.443
AIC            72.1
BIC            72.4
Log.Lik.       -32.059
RMSE           13.31
tab_model(model7)
                    MurderRate
Predictors          Estimates   CI              p
(Intercept)         28.98       -1.68 – 59.65   0.059
GDP                 -0.00       -0.00 – 0.00    0.080
Internet            0.46        -0.66 – 1.59    0.339
Observations        8
R2 / R2 adjusted    0.602 / 0.443

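Interpretation: The intercept of 28.98 is the predicted murder rate for a country with zero GDP and zero internet penetration, an extrapolation rather than a realistic scenario. The GDP coefficient is -0.001 in the modelsummary table (shown as -0.00 by tab_model), not exactly zero: holding internet penetration constant, each one-unit increase in GDP is associated with a very slight decrease in the predicted murder rate, and the coefficient's size depends on the units in which GDP is measured. The Internet coefficient of 0.46 means that, holding GDP constant, each one-point increase in internet penetration is associated with a 0.46 increase in the predicted murder rate. However, none of the coefficients is statistically significant at the 0.05 level (p = 0.059 for the intercept, 0.080 for GDP, and 0.339 for Internet), and the 95% confidence intervals for both GDP and Internet include zero. The R-squared of 0.602 suggests the model accounts for about 60 percent of the variation in murder rates, but with only 8 observations this estimate is unstable, the adjusted R-squared of 0.443 is noticeably lower, and the results do not provide convincing evidence that either GDP or internet penetration is related to murder rates.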
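Two of the numbers in the model7 tables follow directly from the others, which makes the small-sample problem concrete. The sketch below recomputes the adjusted R-squared from R-squared, n = 8, and two predictors, and the 95% confidence interval for the Internet slope from its estimate and standard error using a t distribution with only 5 residual degrees of freedom. The inputs are the rounded values printed above, so the outputs match only to rounding.

```r
# Sketch: reproduce model7's adjusted R2 and Internet CI from reported values.
n  <- 8       # Num.Obs.
k  <- 2       # predictors: GDP and Internet
r2 <- 0.602   # R2

# Adjusted R2 penalizes R2 for the number of predictors relative to n
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# 95% CI for Internet: estimate +/- t(0.975, df) * SE, with df = n - k - 1
df_resid <- n - k - 1
t_crit   <- qt(0.975, df = df_resid)   # about 2.57, vs about 1.96 for large n
ci_internet <- 0.463 + c(-1, 1) * t_crit * 0.438

round(adj_r2, 3)        # compare with R2 Adj. 0.443
round(ci_internet, 2)   # compare with -0.66 to 1.59
```

The t multiplier of about 2.57, rather than the familiar 1.96, is one reason the intervals are so wide with only 8 observations.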