TASK 1
# Load the packages used throughout (the sat_gpa data frame is assumed to be loaded already)
library(ggplot2)
library(modelsummary)
library(sjPlot)
# Calculate the relationship between SAT scores and freshman GPA
cor(sat_gpa$gpa_fy, sat_gpa$sat_total)
## [1] 0.460281
Interpretation: The correlation of about 0.46 between total SAT score and freshman GPA indicates a moderate positive relationship: students with higher SAT scores tend to have higher freshman GPAs. The accompanying scatterplot plots freshman GPA against the SAT math score specifically, and its regression line shows the same upward trend, with freshman GPA typically rising as SAT math scores rise.
# Visualize the relationship with a scatterplot and regression line
ggplot(sat_gpa, aes(x = sat_math, y = gpa_fy)) +
  geom_point(size = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "SAT Math Score", y = "Freshman GPA")
## `geom_smooth()` using formula = 'y ~ x'
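A correlation coefficient on its own says nothing about its uncertainty. As a minimal sketch, base R's `cor.test()` reports a confidence interval and a p-value alongside the estimate. The data here are simulated stand-ins (the names `sat_sim` and `gpa_sim` are hypothetical), not the `sat_gpa` data.

```r
# Simulated stand-in data; NOT the sat_gpa dataset
set.seed(1)
n <- 1000
sat_sim <- rnorm(n, mean = 100, sd = 15)                    # hypothetical SAT totals
gpa_sim <- 0.024 * sat_sim + rnorm(n, mean = 0, sd = 0.66)  # hypothetical GPAs

# Pearson correlation with a 95% confidence interval and p-value
ct <- cor.test(gpa_sim, sat_sim)
ct$estimate   # the correlation coefficient
ct$conf.int   # 95% confidence interval for the correlation
ct$p.value    # test of the null hypothesis that the correlation is zero
```

The same call should work on the real columns, for example `cor.test(sat_gpa$gpa_fy, sat_gpa$sat_total)`.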
TASK 2
# Linear models for SAT total and verbal scores
model_sat_total <- lm(gpa_fy ~ sat_total, data = sat_gpa)
model_sat_verbal <- lm(gpa_fy ~ sat_verbal, data = sat_gpa)
# Create regression tables using modelsummary
models <- list(model_sat_total, model_sat_verbal)
modelsummary(models)
| | (1) | (2) |
|---|---|---|
| (Intercept) | 0.002 | 0.701 |
| | (0.152) | (0.129) |
| sat_total | 0.024 | |
| | (0.001) | |
| sat_verbal | | 0.036 |
| | | (0.003) |
| Num.Obs. | 1000 | 1000 |
| R2 | 0.212 | 0.161 |
| R2 Adj. | 0.211 | 0.160 |
| AIC | 2004.8 | 2067.2 |
| BIC | 2019.5 | 2081.9 |
| Log.Lik. | -999.382 | -1030.580 |
| RMSE | 0.66 | 0.68 |
# Create regression tables using sjPlot
tab_model(model_sat_total, model_sat_verbal)
| | gpa_fy | | | gpa_fy | | |
|---|---|---|---|---|---|---|
| Predictors | Estimates | CI | p | Estimates | CI | p |
| (Intercept) | 0.00 | -0.30 – 0.30 | 0.990 | 0.70 | 0.45 – 0.95 | <0.001 |
| sat total | 0.02 | 0.02 – 0.03 | <0.001 | | | |
| sat verbal | | | | 0.04 | 0.03 – 0.04 | <0.001 |
| Observations | 1000 | | | 1000 | | |
| R2 / R2 adjusted | 0.212 / 0.211 | | | 0.161 / 0.160 | | |
Interpretation: Both SAT total and SAT verbal scores are statistically significant predictors of freshman GPA (p < 0.001 in each model). Each additional point of SAT total is associated with a freshman GPA about 0.024 points higher, and each additional point of SAT verbal with a GPA about 0.036 points higher. The SAT total model explains more of the variation in GPA (R2 = 0.212 versus 0.161) and has the lower AIC and BIC, so it fits the data better. The modelsummary and sjPlot tables report the same estimates, and the sjPlot confidence intervals for both slopes exclude zero.
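To make the SAT total slope concrete, the fitted line can be evaluated by hand from the coefficients in the table. This is only a sketch: the coefficients are the rounded values as displayed, and the two SAT totals are hypothetical values chosen for illustration. The last lines check that, in a one-predictor regression, R2 equals the squared correlation from Task 1.

```r
# Coefficients copied from the regression table (rounded as displayed)
intercept       <- 0.002
slope_sat_total <- 0.024

# Hypothetical SAT totals, chosen only for illustration
sat_values <- c(100, 110)

# Predicted freshman GPA at each value, and the difference between them
predicted_gpa <- intercept + slope_sat_total * sat_values
predicted_gpa        # 2.402 2.642
diff(predicted_gpa)  # 0.24, i.e. 10 points of SAT total times the slope

# In simple linear regression, R-squared is the squared correlation
r <- 0.460281        # correlation reported in Task 1
round(r^2, 3)        # 0.212, matching the R2 of the sat_total model
```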
TASK 3
# Load the HappyPlanetIndex dataset (assumed here to come from the Lock5Data package)
library(Lock5Data)
data("HappyPlanetIndex")
world <- HappyPlanetIndex
# Check the first few rows of the dataset
head(world)
## Country Region Happiness LifeExpectancy Footprint HLY HPI HPIRank
## 1 Albania 7 5.5 76.2 2.2 41.7 47.91 54
## 2 Algeria 3 5.6 71.7 1.7 40.1 51.23 40
## 3 Angola 4 4.3 41.7 0.9 17.8 26.78 130
## 4 Argentina 1 7.1 74.8 2.5 53.4 58.95 15
## 5 Armenia 7 5.0 71.7 1.4 36.1 48.28 48
## 6 Australia 2 7.9 80.9 7.8 63.7 36.64 102
## GDPperCapita HDI Population
## 1 5316 0.801 3.15
## 2 7062 0.733 32.85
## 3 2335 0.446 16.10
## 4 14280 0.869 38.75
## 5 4945 0.775 3.02
## 6 31794 0.962 20.40
# Linear model for HLY and GDP per Capita
model_hly_gdp <- lm(HLY ~ GDPperCapita, data = world)
# Regression table using modelsummary
modelsummary(model_hly_gdp)
| | (1) |
|---|---|
| (Intercept) | 31.182 |
| | (1.114) |
| GDPperCapita | 0.001 |
| | (0.000) |
| Num.Obs. | 141 |
| R2 | 0.566 |
| R2 Adj. | 0.563 |
| AIC | 1043.2 |
| BIC | 1052.0 |
| Log.Lik. | -518.576 |
| RMSE | 9.57 |
# Regression table using sjPlot
tab_model(model_hly_gdp)
| | HLY | | |
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 31.18 | 28.98 – 33.38 | <0.001 |
| GDPperCapita | 0.00 | 0.00 – 0.00 | <0.001 |
| Observations | 141 | | |
| R2 / R2 adjusted | 0.566 / 0.563 | | |
Interpretation: The regression model shows a statistically significant positive association between Happy Life Years (HLY) and GDP per capita (p < 0.001). The slope of 0.001 means that each additional dollar of GDP per capita is associated with about 0.001 more HLY, and GDP per capita alone accounts for about 57% of the variation in HLY (R2 = 0.566). Countries with greater economic development therefore tend to have better well-being and life expectancy, although the model by itself does not establish causation. The modelsummary and sjPlot results are consistent. The sjPlot interval of 0.00 – 0.00 reflects rounding of a coefficient measured per dollar, not an effect that is exactly zero.
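A coefficient that prints as 0.00 is usually a units problem, not a sign of a negligible effect. The sketch below uses simulated data (all names and values are hypothetical, only loosely mimicking the table) to show that expressing the predictor in thousands of dollars multiplies the slope by 1000 and leaves the fit unchanged.

```r
# Simulated stand-in data; NOT the HappyPlanetIndex dataset
set.seed(42)
n <- 141
gdp <- runif(n, min = 500, max = 40000)           # hypothetical GDP per capita, in dollars
hly <- 31 + 0.001 * gdp + rnorm(n, sd = 9)        # hypothetical Happy Life Years

# Same model, two units for the predictor
fit_dollars   <- lm(hly ~ gdp)
gdp_thousands <- gdp / 1000
fit_thousands <- lm(hly ~ gdp_thousands)

# The slope per $1,000 is 1000 times the slope per dollar
coef(fit_dollars)["gdp"]
coef(fit_thousands)["gdp_thousands"]

# R-squared is identical: rescaling changes the units, not the fit
summary(fit_dollars)$r.squared
summary(fit_thousands)$r.squared
```

Applied to the real data, this would amount to something like `lm(HLY ~ I(GDPperCapita / 1000), data = world)`, which should give a slope and interval that are readable at two decimal places.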
TASK 4
# Linear models for happiness
model_life_expectancy <- lm(Happiness ~ LifeExpectancy, data = world)
model_footprint <- lm(Happiness ~ Footprint, data = world)
model_gdp_hdi_population <- lm(Happiness ~ GDPperCapita + HDI + Population, data = world)
models_happiness <- list(model_life_expectancy, model_footprint, model_gdp_hdi_population)
# Visualize coefficients using modelplot
modelplot(models_happiness) +
labs(title = "Model Coefficients with Confidence Intervals",
x = "Coefficient Estimate",
y = "Term")
# Visualize coefficients using sjPlot
plot_models(model_life_expectancy, model_footprint, model_gdp_hdi_population, show.values = TRUE, show.p = TRUE)
Interpretation: The coefficient plots show how each predictor relates
to happiness. Model 1 shows that higher life expectancy is associated
with higher happiness levels. Model 2 shows a negative relationship
between Ecological Footprint and Happiness, implying that a larger
ecological footprint is linked with lower happiness levels. Model 3,
which includes GDP per capita, Human Development Index (HDI), and
population, shows that HDI and GDP per capita are positively associated
with happiness, whereas the role of population is less clear. The
‘modelplot’ and ‘plot_models’ outputs give a consistent picture of these
estimates, with ‘modelplot’ providing a simple view and ‘plot_models’
adding the coefficient values and significance markers.
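One caveat when several coefficients share a plot is that predictors on very different scales (dollars of GDP, an HDI between 0 and 1, population in millions) give coefficients that are hard to compare visually. A possible remedy, sketched here on simulated data with hypothetical names and values, is to standardize the predictors before fitting so that each coefficient is the change in happiness per one standard deviation.

```r
# Simulated stand-in data; NOT the HappyPlanetIndex dataset
set.seed(7)
n <- 141
d <- data.frame(
  gdp = runif(n, min = 500, max = 40000),   # hypothetical GDP per capita
  hdi = runif(n, min = 0.3, max = 0.95),    # hypothetical HDI
  pop = rexp(n, rate = 1 / 30)              # hypothetical population (millions)
)
d$happiness <- 3 + 0.00003 * d$gdp + 3 * d$hdi + rnorm(n, sd = 0.8)

# Model on the raw scales: coefficients differ by orders of magnitude
fit_raw <- lm(happiness ~ gdp + hdi + pop, data = d)

# Standardize each predictor to mean 0 and standard deviation 1
predictors <- c("gdp", "hdi", "pop")
d_std <- d
d_std[predictors] <- lapply(d[predictors], function(x) as.numeric(scale(x)))
fit_std <- lm(happiness ~ gdp + hdi + pop, data = d_std)

round(coef(fit_raw), 5)  # per-unit effects, not directly comparable
round(coef(fit_std), 3)  # per-standard-deviation effects, comparable in one plot
```

The standardized models could then be passed to the same plotting calls used above. The fit itself (R2, p-values for the slopes) is unchanged by this rescaling.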
TASK 5
# Load the Violence dataset
load("~/Downloads/Violence.RData")
# Check the first few rows of the dataset
head(Violence)
## Country Code LandArea Population Energy Rural
## Austria Austria AUT 82450 8.337 33246 32.8
## Belgium Belgium BEL 30280 10.708 58583 2.6
## Guatemala Guatemala GUA 107160 13.686 8072 51.4
## Jamaica Jamaica JAM 10830 2.687 4387 46.7
## Dominican Republic Dominican Republic DOM 48320 9.953 8162 31.0
## South Africa South Africa RSA 1214470 48.793 134489 39.3
## Military Health HIV Internet Developed BirthRate ElderlyPop
## Austria 2.4 15.8 0.3 72.9 2 9.3 17.0
## Belgium 2.7 14.8 0.2 70.5 2 11.7 17.2
## Guatemala 3.6 15.9 0.8 14.3 1 33.0 4.4
## Jamaica 1.6 5.7 1.7 57.3 2 16.7 7.7
## Dominican Republic 3.8 10.4 0.9 20.8 1 22.5 5.9
## South Africa 4.3 10.4 17.9 8.6 2 22.0 4.4
## LifeExpectancy CO2 GDP Cell Electricity
## Austria 80.2 8.1235965 45209.396 145.99132 7944.3892
## Belgium 80.4 9.7927294 43144.343 111.71857 7903.0293
## Guatemala 70.3 0.8702226 2862.367 125.56855 548.1122
## Jamaica 71.8 4.5414469 5274.037 114.83936 1901.6175
## Dominican Republic 72.6 2.2366354 5214.537 89.57889 1358.1914
## South Africa 51.5 8.9332027 7275.344 100.76153 4532.0219
## MurderRate
## Austria 0.55
## Belgium 1.85
## Guatemala 46.00
## Jamaica 59.00
## Dominican Republic 24.80
## South Africa 36.70
# Linear model for murder rate prediction
model_murder_rate <- lm(MurderRate ~ Internet + GDP, data = Violence)
# Regression table using modelsummary
modelsummary(model_murder_rate)
| | (1) |
|---|---|
| (Intercept) | 28.984 |
| | (11.930) |
| Internet | 0.463 |
| | (0.438) |
| GDP | -0.001 |
| | (0.001) |
| Num.Obs. | 8 |
| R2 | 0.602 |
| R2 Adj. | 0.443 |
| AIC | 72.1 |
| BIC | 72.4 |
| Log.Lik. | -32.059 |
| RMSE | 13.31 |
# Regression table using sjPlot
tab_model(model_murder_rate)
| | MurderRate | | |
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 28.98 | -1.68 – 59.65 | 0.059 |
| Internet | 0.46 | -0.66 – 1.59 | 0.339 |
| GDP | -0.00 | -0.00 – 0.00 | 0.080 |
| Observations | 8 | | |
| R2 / R2 adjusted | 0.602 / 0.443 | | |
Interpretation: The regression analysis examines how internet penetration and GDP relate to murder rates across the 8 countries in the dataset. The Internet coefficient is positive (0.46) but far from statistically significant (p = 0.339), and its confidence interval (-0.66 to 1.59) includes zero, so the data do not support a conclusion about the direction of this relationship. Any explanation, such as internet access facilitating the coordination of criminal activity, is speculative, and socioeconomic factors omitted from the model may be at play. The GDP coefficient is negative, suggesting that higher GDP is associated with lower murder rates, but it is also not significant at the 5% level (p = 0.080). With only 8 observations and an adjusted R2 of 0.443, the estimates are imprecise. ‘modelsummary’ and ‘sjPlot’ report the same coefficients, and the confidence intervals and p-values show that neither relationship is statistically significant.
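The p-value and confidence interval for the Internet coefficient can be reproduced from the reported estimate and standard error, which makes the cost of the small sample explicit. This is a sketch assuming the usual t-based inference for `lm()`, with 8 observations minus 3 estimated coefficients leaving 5 residual degrees of freedom.

```r
# Estimate and standard error copied from the modelsummary table
estimate  <- 0.463
std_error <- 0.438

# Residual degrees of freedom: 8 observations, 3 coefficients (intercept, Internet, GDP)
df_resid <- 8 - 3

# Two-sided p-value from the t distribution
t_stat  <- estimate / std_error
p_value <- 2 * pt(-abs(t_stat), df = df_resid)
p_value   # close to the 0.339 reported by sjPlot

# 95% confidence interval; the t critical value is large because df is only 5
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error
round(ci, 2)   # close to the reported -0.66 to 1.59
```

With so few degrees of freedom the critical value is about 2.57 instead of the familiar 1.96, which widens every interval in this model.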