1 + 1[1] 2
Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.
When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:
1 + 1[1] 2
You can add options to executable code like this
[1] 4
The echo: false option disables the printing of code (only output is displayed).
#title: "discussion"
#author: "Betty Wang"
#date: "2024-09-18"
#output: pdf_document
knitr::opts_chunk$set(echo = TRUE)
## R Markdown
#This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
#When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
library(MASS)
data("Boston")
#The Boston dataset contains 506 observations with 14 variables. It can be used for regression tasks to predict the median value of owner-occupied homes (medv) based on various predictors.\[ medv_i = \beta_0 + \beta_1 crim_i + \epsilon_i \]
beta1 = cov(Boston$medv,Boston$crim)/var(Boston$crim)
beta1[1] -0.4151903
reg1 <- lm(data=Boston,
formula = medv ~ crim)
library(stargazer)
Please cite as:
Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
stargazer(reg1, type = "text" )
===============================================
Dependent variable:
---------------------------
medv
-----------------------------------------------
crim -0.415***
(0.044)
Constant 24.033***
(0.409)
-----------------------------------------------
Observations 506
R2 0.151
Adjusted R2 0.149
Residual Std. Error 8.484 (df = 504)
F Statistic 89.486*** (df = 1; 504)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
#Variance measures the spread or dispersion of a set of data points around their mean. Specifically, it quantifies how far the individual data points in a dataset are from the mean of the dataset on average.
#Covariance measures the direction of the linear relationship between two variables. It quantifies how much two random variables vary together. If the variables tend to increase and decrease together, the covariance is positive. If one variable tends to increase when the other decreases, the covariance is negative.
#Dividing the covariance of y and x by the variance of x isolates the effect of x on y and removes the volatility of x from the Cov(x,y). This ratio gives the slope in simple linear regression that quantifies how much y changes, on average, for each one-unit increase in x.
library(tidyverse)── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
✖ dplyr::select() masks MASS::select()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
str(Boston)'data.frame': 506 obs. of 14 variables:
$ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
$ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
$ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
$ chas : int 0 0 0 0 0 0 0 0 0 0 ...
$ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
$ rm : num 6.58 6.42 7.18 7 7.15 ...
$ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
$ dis : num 4.09 4.97 4.97 6.06 6.06 ...
$ rad : int 1 2 2 3 3 3 5 5 5 5 ...
$ tax : num 296 242 242 222 222 222 311 311 311 311 ...
$ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
$ black : num 397 397 393 395 397 ...
$ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
$ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
library(tidyverse)
glimpse(Boston)Rows: 506
Columns: 14
$ crim <dbl> 0.00632, 0.02731, 0.02729, 0.03237, 0.06905, 0.02985, 0.08829,…
$ zn <dbl> 18.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 1…
$ indus <dbl> 2.31, 7.07, 7.07, 2.18, 2.18, 2.18, 7.87, 7.87, 7.87, 7.87, 7.…
$ chas <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ nox <dbl> 0.538, 0.469, 0.469, 0.458, 0.458, 0.458, 0.524, 0.524, 0.524,…
$ rm <dbl> 6.575, 6.421, 7.185, 6.998, 7.147, 6.430, 6.012, 6.172, 5.631,…
$ age <dbl> 65.2, 78.9, 61.1, 45.8, 54.2, 58.7, 66.6, 96.1, 100.0, 85.9, 9…
$ dis <dbl> 4.0900, 4.9671, 4.9671, 6.0622, 6.0622, 6.0622, 5.5605, 5.9505…
$ rad <int> 1, 2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,…
$ tax <dbl> 296, 242, 242, 222, 222, 222, 311, 311, 311, 311, 311, 311, 31…
$ ptratio <dbl> 15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2, 15.2, 15…
$ black <dbl> 396.90, 396.90, 392.83, 394.63, 396.90, 394.12, 395.60, 396.90…
$ lstat <dbl> 4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93, 17.10…
$ medv <dbl> 24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15…
# Fit a multivariable linear regression model
model <- lm(medv ~ crim + zn + indus + chas + nox + rm + age + dis + rad + tax + ptratio + black + lstat, data = Boston)
# Summarize the model
summary(model)
Call:
lm(formula = medv ~ crim + zn + indus + chas + nox + rm + age +
dis + rad + tax + ptratio + black + lstat, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-15.595 -2.730 -0.518 1.777 26.199
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.646e+01 5.103e+00 7.144 3.28e-12 ***
crim -1.080e-01 3.286e-02 -3.287 0.001087 **
zn 4.642e-02 1.373e-02 3.382 0.000778 ***
indus 2.056e-02 6.150e-02 0.334 0.738288
chas 2.687e+00 8.616e-01 3.118 0.001925 **
nox -1.777e+01 3.820e+00 -4.651 4.25e-06 ***
rm 3.810e+00 4.179e-01 9.116 < 2e-16 ***
age 6.922e-04 1.321e-02 0.052 0.958229
dis -1.476e+00 1.995e-01 -7.398 6.01e-13 ***
rad 3.060e-01 6.635e-02 4.613 5.07e-06 ***
tax -1.233e-02 3.760e-03 -3.280 0.001112 **
ptratio -9.527e-01 1.308e-01 -7.283 1.31e-12 ***
black 9.312e-03 2.686e-03 3.467 0.000573 ***
lstat -5.248e-01 5.072e-02 -10.347 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.745 on 492 degrees of freedom
Multiple R-squared: 0.7406, Adjusted R-squared: 0.7338
F-statistic: 108.1 on 13 and 492 DF, p-value: < 2.2e-16
lm(formula = medv ~ crim + zn + indus + chas + nox + rm + age +
dis + rad + tax + ptratio + black + lstat, data = Boston)
Call:
lm(formula = medv ~ crim + zn + indus + chas + nox + rm + age +
dis + rad + tax + ptratio + black + lstat, data = Boston)
Coefficients:
(Intercept) crim zn indus chas nox
3.646e+01 -1.080e-01 4.642e-02 2.056e-02 2.687e+00 -1.777e+01
rm age dis rad tax ptratio
3.810e+00 6.922e-04 -1.476e+00 3.060e-01 -1.233e-02 -9.527e-01
black lstat
9.312e-03 -5.248e-01
\[ medv_i = \beta_0 + \beta_1 crim_i +\beta_2 zn_i + \beta_3 indus_i + \beta_4 chas_i + \beta_5 nox_i + \beta_6 rm_i + \beta_7 age_i + \beta_8 dis_i + \beta_9 rad_i + \beta_10 tax_i + \beta_11 ptratio_i + \beta_12 black_i + \beta_13 lstat_i + \epsilon_i \]
y <- as.vector(Boston$medv)
x <- as.matrix(subset(Boston,select=-medv))
int <- rep(x = 1,
times = length(y))
x <- cbind(int, x)
remove(int)
betas <- solve(t(x) %*% x) %*% t(x) %*% y
betas <- round(x = betas, digits = 2)
betas [,1]
int 36.46
crim -0.11
zn 0.05
indus 0.02
chas 2.69
nox -17.77
rm 3.81
age 0.00
dis -1.48
rad 0.31
tax -0.01
ptratio -0.95
black 0.01
lstat -0.52
betas_check <- solve(crossprod(x), crossprod(x,y))
betas == round(x = betas_check, digits = 2) [,1]
int TRUE
crim TRUE
zn TRUE
indus TRUE
chas TRUE
nox TRUE
rm TRUE
age TRUE
dis TRUE
rad TRUE
tax TRUE
ptratio TRUE
black TRUE
lstat TRUE
lm.mod <- lm(medv ~ crim + zn + indus + chas + nox + rm + age + dis + rad + tax + ptratio + black + lstat, data = Boston)
lm.betas <- round(x = lm.mod$coefficients, digits = 2)
?data.frame
results <- data.frame(our.results = betas, lm.results = lm.betas)
print(results) our.results lm.results
int 36.46 36.46
crim -0.11 -0.11
zn 0.05 0.05
indus 0.02 0.02
chas 2.69 2.69
nox -17.77 -17.77
rm 3.81 3.81
age 0.00 0.00
dis -1.48 -1.48
rad 0.31 0.31
tax -0.01 -0.01
ptratio -0.95 -0.95
black 0.01 0.01
lstat -0.52 -0.52
#Assume that Y = Xβ + ϵ, x is the matrix of variables
#1. compute β = (X'X\^-1)X'Y
#2. e = Y - hat Y
#3. calculate the variance of e
#4. calculate the variance of β = Var(e)\*(X'X)\^-1
#5. calculate the standard deviation of β
#The standard error (SE) of the coefficient β in linear regression is an estimate of the standard deviation of the sampling distribution of β-hat (the estimated regression coefficient).