Problem Set 3

TASK 1

# Correlation between total SAT score and freshman GPA
cor(sat_gpa$gpa_fy, sat_gpa$sat_total)

## [1] 0.460281

Interpretation: The correlation between freshman GPA and total SAT score is 0.46, a moderate positive association: students with higher SAT scores tend to earn higher freshman GPAs. Note that the correlation above uses sat_total, while the scatterplot below plots sat_math; the regression line there shows the same upward trend for the math section, with freshman GPA typically rising as SAT math scores rise.

# Visualize the relationship with a scatterplot and regression line
library(ggplot2)
ggplot(sat_gpa, aes(x = sat_math, y = gpa_fy)) +
  geom_point(size = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "SAT Math Score", y = "Freshman GPA")

## `geom_smooth()` using formula = 'y ~ x'
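
A correlation of 0.46 on its own does not say how precisely it is estimated. As a minimal sketch, assuming a simulated stand-in for the data (the data frame `sim` and its values are hypothetical, not the real `sat_gpa`), base R's `cor.test()` reports a confidence interval and a p-value alongside the estimate:

```r
# Simulated stand-in for sat_gpa; these values are synthetic, not the real data
set.seed(1)
n <- 1000
sim <- data.frame(sat_total = rnorm(n, mean = 100, sd = 15))
sim$gpa_fy <- 0.02 * sim$sat_total + rnorm(n, sd = 0.6)

# Pearson correlation with a 95% confidence interval and a test of r = 0
ct <- cor.test(sim$gpa_fy, sim$sat_total)
ct$estimate   # sample correlation
ct$conf.int   # 95% confidence interval
ct$p.value    # p-value for the null hypothesis of zero correlation
```

Running the same call on `sat_gpa$gpa_fy` and `sat_gpa$sat_total` would give the interval for the correlation reported above.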

TASK 2

# Load the packages that provide modelsummary() and tab_model()
library(modelsummary)
library(sjPlot)

# Linear models for SAT total and verbal scores
model_sat_total <- lm(gpa_fy ~ sat_total, data = sat_gpa)
model_sat_verbal <- lm(gpa_fy ~ sat_verbal, data = sat_gpa)

# Create regression tables using modelsummary
models <- list(model_sat_total, model_sat_verbal)
modelsummary(models)

	(1)	(2)
(Intercept)	0.002	0.701
	(0.152)	(0.129)
sat_total	0.024
	(0.001)
sat_verbal		0.036
		(0.003)
Num.Obs.	1000	1000
R2	0.212	0.161
R2 Adj.	0.211	0.160
AIC	2004.8	2067.2
BIC	2019.5	2081.9
Log.Lik.	-999.382	-1030.580
RMSE	0.66	0.68

# Create regression tables using sjPlot
tab_model(model_sat_total, model_sat_verbal)

	gpa_fy			gpa_fy
Predictors	Estimates	CI	p	Estimates	CI	p
(Intercept)	0.00	-0.30 – 0.30	0.990	0.70	0.45 – 0.95	<0.001
sat total	0.02	0.02 – 0.03	<0.001
sat verbal				0.04	0.03 – 0.04	<0.001
Observations	1000			1000
R² / R² adjusted	0.212 / 0.211			0.161 / 0.160
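
For a regression with a single predictor, R² equals the squared Pearson correlation between the predictor and the outcome, so model (1) can be cross-checked against the correlation from Task 1. The sketch below uses the numbers printed in this document, plus a synthetic data frame (`sim`, hypothetical) to illustrate the identity:

```r
# Cross-check with the reported values: r = 0.460281 (Task 1), R2 = 0.212 (model 1)
r_reported <- 0.460281
round(r_reported^2, 3)

# Same identity on synthetic data (not the real sat_gpa)
set.seed(2)
sim <- data.frame(x = rnorm(200))
sim$y <- 1 + 0.5 * sim$x + rnorm(200)
fit <- lm(y ~ x, data = sim)
r2_model <- summary(fit)$r.squared
r2_from_cor <- cor(sim$x, sim$y)^2
all.equal(r2_model, r2_from_cor)
```

The squared correlation rounds to 0.212, matching the R2 row for model (1).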

Interpretation: Both SAT total and SAT verbal scores are statistically significant predictors of freshman GPA (p < 0.001 in both models). A one-point increase in SAT total is associated with a 0.024-point increase in freshman GPA, and a one-point increase in SAT verbal with a 0.036-point increase. The total score explains more of the variation in GPA (R² = 0.212) than the verbal score alone (R² = 0.161). The modelsummary and sjPlot tables report the same fits: modelsummary shows standard errors in parentheses, while sjPlot shows confidence intervals and p-values.

TASK 3

# Load the HappyPlanetIndex dataset
data("HappyPlanetIndex")
world <- HappyPlanetIndex

# Check the first few rows of the dataset
head(world)

##     Country Region Happiness LifeExpectancy Footprint  HLY   HPI HPIRank
## 1   Albania      7       5.5           76.2       2.2 41.7 47.91      54
## 2   Algeria      3       5.6           71.7       1.7 40.1 51.23      40
## 3    Angola      4       4.3           41.7       0.9 17.8 26.78     130
## 4 Argentina      1       7.1           74.8       2.5 53.4 58.95      15
## 5   Armenia      7       5.0           71.7       1.4 36.1 48.28      48
## 6 Australia      2       7.9           80.9       7.8 63.7 36.64     102
##   GDPperCapita   HDI Population
## 1         5316 0.801       3.15
## 2         7062 0.733      32.85
## 3         2335 0.446      16.10
## 4        14280 0.869      38.75
## 5         4945 0.775       3.02
## 6        31794 0.962      20.40

# Linear model for HLY and GDP per Capita
model_hly_gdp <- lm(HLY ~ GDPperCapita, data = world)

# Regression table using modelsummary
modelsummary(model_hly_gdp)

	(1)
(Intercept)	31.182
	(1.114)
GDPperCapita	0.001
	(0.000)
Num.Obs.	141
R2	0.566
R2 Adj.	0.563
AIC	1043.2
BIC	1052.0
Log.Lik.	-518.576
RMSE	9.57

# Regression table using sjPlot
tab_model(model_hly_gdp)

	HLY
Predictors	Estimates	CI	p
(Intercept)	31.18	28.98 – 33.38	<0.001
GDPperCapita	0.00	0.00 – 0.00	<0.001
Observations	141
R² / R² adjusted	0.566 / 0.563
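
The slope prints as 0.00 in the sjPlot table because GDP per capita is measured in single currency units, so the per-unit effect is tiny. One common remedy, sketched here on synthetic data (the data frame `sim` and the column `gdp_thousands` are hypothetical), is to rescale the predictor to thousands before fitting. Rescaling multiplies the slope by 1,000 and leaves the fit itself unchanged:

```r
# Synthetic stand-in for the world data; values are simulated, not real
set.seed(3)
sim <- data.frame(GDPperCapita = runif(141, min = 500, max = 40000))
sim$HLY <- 30 + 0.001 * sim$GDPperCapita + rnorm(141, sd = 9)

# Model on the raw scale: slope is per single unit of GDP per capita
fit_raw <- lm(HLY ~ GDPperCapita, data = sim)

# Model on the rescaled predictor: slope is per 1,000 units
sim$gdp_thousands <- sim$GDPperCapita / 1000
fit_scaled <- lm(HLY ~ gdp_thousands, data = sim)

coef(fit_raw)["GDPperCapita"]
coef(fit_scaled)["gdp_thousands"]

# R-squared is identical for both parameterizations
c(summary(fit_raw)$r.squared, summary(fit_scaled)$r.squared)
```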

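Interpretation: Happy Life Years (HLY) and GDP per capita have a statistically significant positive association according to the regression model (p < 0.001). The coefficient of 0.001 means that a one-unit increase in GDP per capita is associated with about 0.001 more happy life years, which is on the order of one additional year per 1,000 units. GDP per capita alone accounts for about 57% of the variation in HLY (R² = 0.566). The modelsummary and sjPlot results are consistent; the slope's confidence interval prints as 0.00 – 0.00 only because the estimate is small relative to two-decimal rounding. Because this is a cross-country observational model, it shows that richer countries tend to have higher HLY, not that economic development by itself causes better well-being and life expectancy.

TASK 4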

# Linear models for happiness
model_life_expectancy <- lm(Happiness ~ LifeExpectancy, data = world)
model_footprint <- lm(Happiness ~ Footprint, data = world)
model_gdp_hdi_population <- lm(Happiness ~ GDPperCapita + HDI + Population, data = world)
models_happiness <- list(model_life_expectancy, model_footprint, model_gdp_hdi_population)

# Visualize coefficients using modelplot
modelplot(models_happiness) +
  labs(title = "Model Coefficients with Confidence Intervals",
       x = "Coefficient Estimate",
       y = "Term")

# Visualize coefficients using sjPlot
plot_models(model_life_expectancy, model_footprint, model_gdp_hdi_population, show.values = TRUE, show.p = TRUE)
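
Coefficient plots such as these draw each estimate together with its confidence interval. As a rough illustration of what is being plotted, the sketch below builds the same kind of table by hand with base R's `coef()` and `confint()`; the data frame `sim` and its columns are synthetic stand-ins, not the `world` data:

```r
# Synthetic stand-in data; values are simulated, not taken from HappyPlanetIndex
set.seed(4)
sim <- data.frame(life_exp = rnorm(140, mean = 68, sd = 10))
sim$happiness <- 0.08 * sim$life_exp + rnorm(140, sd = 0.8)

fit <- lm(happiness ~ life_exp, data = sim)

# Point estimates and 95% confidence intervals, one row per term
ci <- confint(fit, level = 0.95)
coef_table <- data.frame(
  term = names(coef(fit)),
  estimate = unname(coef(fit)),
  conf.low = ci[, 1],
  conf.high = ci[, 2],
  row.names = NULL
)
coef_table
```

A term whose interval crosses zero is not statistically significant at the 5% level, which is what to look for when reading the plots.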

Interpretation: The visualizations of model coefficients show the estimated association of each predictor with happiness. Model 1 shows that higher life expectancy is associated with higher happiness levels. Model 2 shows a negative relationship between Ecological Footprint and Happiness, implying that a larger ecological footprint is linked with lower happiness levels. Model 3, which includes GDP per capita, Human Development Index (HDI), and population, shows that HDI and GDP per capita are positively associated with happiness, whereas the role of population is less clear. Because the predictors are measured on very different scales, coefficient sizes are not directly comparable across terms. Both modelplot() and plot_models() give a consistent picture of these estimates, with modelplot() providing a simple view and plot_models() adding value labels and significance markers.

TASK 5

# Load the Violence dataset
load("~/Downloads/Violence.RData")

# Check the first few rows of the dataset
head(Violence)

##                               Country Code LandArea Population Energy Rural
## Austria                       Austria  AUT    82450      8.337  33246  32.8
## Belgium                       Belgium  BEL    30280     10.708  58583   2.6
## Guatemala                   Guatemala  GUA   107160     13.686   8072  51.4
## Jamaica                       Jamaica  JAM    10830      2.687   4387  46.7
## Dominican Republic Dominican Republic  DOM    48320      9.953   8162  31.0
## South Africa             South Africa  RSA  1214470     48.793 134489  39.3
##                    Military Health  HIV Internet Developed BirthRate ElderlyPop
## Austria                 2.4   15.8  0.3     72.9         2       9.3       17.0
## Belgium                 2.7   14.8  0.2     70.5         2      11.7       17.2
## Guatemala               3.6   15.9  0.8     14.3         1      33.0        4.4
## Jamaica                 1.6    5.7  1.7     57.3         2      16.7        7.7
## Dominican Republic      3.8   10.4  0.9     20.8         1      22.5        5.9
## South Africa            4.3   10.4 17.9      8.6         2      22.0        4.4
##                    LifeExpectancy       CO2       GDP      Cell Electricity
## Austria                      80.2 8.1235965 45209.396 145.99132   7944.3892
## Belgium                      80.4 9.7927294 43144.343 111.71857   7903.0293
## Guatemala                    70.3 0.8702226  2862.367 125.56855    548.1122
## Jamaica                      71.8 4.5414469  5274.037 114.83936   1901.6175
## Dominican Republic           72.6 2.2366354  5214.537  89.57889   1358.1914
## South Africa                 51.5 8.9332027  7275.344 100.76153   4532.0219
##                    MurderRate
## Austria                  0.55
## Belgium                  1.85
## Guatemala               46.00
## Jamaica                 59.00
## Dominican Republic      24.80
## South Africa            36.70

# Linear model for murder rate prediction
model_murder_rate <- lm(MurderRate ~ Internet + GDP, data = Violence)

# Regression table using modelsummary
modelsummary(model_murder_rate)

	(1)
(Intercept)	28.984
	(11.930)
Internet	0.463
	(0.438)
GDP	-0.001
	(0.001)
Num.Obs.	8
R2	0.602
R2 Adj.	0.443
AIC	72.1
BIC	72.4
Log.Lik.	-32.059
RMSE	13.31

# Regression table using sjPlot
tab_model(model_murder_rate)

	MurderRate
Predictors	Estimates	CI	p
(Intercept)	28.98	-1.68 – 59.65	0.059
Internet	0.46	-0.66 – 1.59	0.339
GDP	-0.00	-0.00 – 0.00	0.080
Observations	8
R² / R² adjusted	0.602 / 0.443

Interpretation: The regression examines how internet penetration and GDP relate to murder rates. The Internet coefficient is positive (0.46) and the GDP coefficient is negative (about -0.001), but neither is statistically significant at the 5% level (p = 0.339 and p = 0.080), and the confidence interval for Internet (-0.66 to 1.59) clearly includes zero. With only 8 observations the model has very little power, so the R² of 0.602 (adjusted 0.443) should be read cautiously. The data therefore do not support firm conclusions, such as internet access promoting the coordination of criminal activity or higher GDP reducing crime; at most, the signs hint at patterns worth examining with more countries and additional socioeconomic controls. The modelsummary and sjPlot tables report the same estimates, with sjPlot adding confidence intervals and p-values.
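
The confidence intervals in the sjPlot table can be reproduced from the estimates and standard errors in the modelsummary table. The sketch below copies those printed values (rounded to three decimals) and assumes the usual t-based interval with residual degrees of freedom equal to observations minus estimated coefficients; the variable names are illustrative only:

```r
# Estimates and standard errors copied from the modelsummary table above
est <- c(Intercept = 28.984, Internet = 0.463)
se  <- c(Intercept = 11.930, Internet = 0.438)

# Residual degrees of freedom: 8 observations minus 3 estimated coefficients
df_resid <- 8 - 3
t_crit <- qt(0.975, df = df_resid)

# 95% confidence intervals: estimate +/- critical t value * standard error
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci, 2)

# Two-sided p-value for the Internet coefficient
t_internet <- est[["Internet"]] / se[["Internet"]]
p_internet <- 2 * pt(-abs(t_internet), df = df_resid)
p_internet
```

The intervals reproduce the sjPlot values (-1.68 to 59.65 for the intercept and -0.66 to 1.59 for Internet) up to rounding. With only 5 residual degrees of freedom the critical t value is about 2.57 rather than 1.96, which is part of why the intervals are so wide.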

Anum Peshimam

2024-07-16