I
1.
data <- read.csv("/Users/timyang/Downloads/GoldUP.csv")
data2 <- data[ , c("Gold_Price", "Interest_Rate", "CPI", "USD_Index")]
pairs(data2, pch = 18, col = "steelblue")
library(GGally)
## Loading required package: ggplot2
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
# generate the pairs plot
ggpairs(data2)
model <- lm(Gold_Price ~ Interest_Rate + CPI + USD_Index, data = data2)
hist(residuals(model), col = "steelblue")
# create fitted value vs. residual plot
plot(fitted(model), residuals(model))
# add horizontal line at 0
abline(h = 0, lty = 2)
summary(model)
##
## Call:
## lm(formula = Gold_Price ~ Interest_Rate + CPI + USD_Index, data = data2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4911.6 -1939.2 -633.6 1417.1 13820.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -284.018 1903.514 -0.149 0.88152
## Interest_Rate 491.182 162.744 3.018 0.00282 **
## CPI 380.144 6.642 57.231 < 2e-16 ***
## USD_Index -128.712 16.753 -7.683 4.2e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2923 on 235 degrees of freedom
## Multiple R-squared: 0.938, Adjusted R-squared: 0.9373
## F-statistic: 1186 on 3 and 235 DF, p-value: < 2.2e-16
# Fitted equation:
# Gold_Price = -284.018 + 491.182*Interest_Rate + 380.144*CPI - 128.712*USD_Index
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
stargazer(model, type = "text", title = "Regression results", digits = 1, out = "table1.txt")
##
## Regression results
## ===============================================
## Dependent variable:
## ---------------------------
## Gold_Price
## -----------------------------------------------
## Interest_Rate 491.2***
## (162.7)
##
## CPI 380.1***
## (6.6)
##
## USD_Index -128.7***
## (16.8)
##
## Constant -284.0
## (1,903.5)
##
## -----------------------------------------------
## Observations 239
## R2 0.9
## Adjusted R2 0.9
## Residual Std. Error 2,922.9 (df = 235)
## F Statistic 1,186.1*** (df = 3; 235)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
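As a quick check on the fitted equation, the coefficients can be plugged in by hand. The sketch below copies the estimates from the summary(model) output above. The predictor values (Interest_Rate = 5, CPI = 100, USD_Index = 90) are hypothetical, chosen only for illustration, and are not taken from GoldUP.csv.

```r
# Coefficients copied from the summary(model) output above
b <- c(Intercept = -284.018, Interest_Rate = 491.182,
       CPI = 380.144, USD_Index = -128.712)

# Hypothetical predictor values (illustration only, not from the data)
x <- c(Intercept = 1, Interest_Rate = 5, CPI = 100, USD_Index = 90)

# Predicted Gold_Price is the sum of coefficient * value
pred <- sum(b * x)
pred
```

With the fitted model object available, predict(model, newdata = ...) with a data frame holding the same three columns should give the same number up to rounding of the copied coefficients.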
2.
From the pairs plot, Gold_Price and Interest_Rate appear to have a strong negative linear correlation, Gold_Price and CPI a strong positive linear correlation, and Gold_Price and USD_Index a modest negative linear correlation. These relationships are meaningful in magnitude as well as statistically significant. Note that the regression coefficient on Interest_Rate is positive (491.182) once CPI and USD_Index are held fixed, so the pairwise correlation and the partial effect differ in sign.
The overall F-statistic of the model is 1186 on 3 and 235 degrees of freedom, with a p-value below 2.2e-16. This indicates that the overall model is statistically significant; in other words, the regressors jointly help explain Gold_Price. The residual plot is less reassuring: the model predicts negative gold prices for some observations, and the relationship between Gold_Price and Interest_Rate looks nonlinear. The residuals fall roughly equally above and below the zero line, but their distribution is right-skewed (median -633.6, maximum 13820.4 against a minimum of -4911.6).
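The residual observations above can be put into numbers. The following is a minimal sketch on synthetic data, since GoldUP.csv is not available here: the data frame sim and its values are made up, and the curved dependence on CPI is an assumption chosen to reproduce similar symptoms. With the real data, the same three summaries can be computed from model directly.

```r
set.seed(1)
n <- 239  # same number of observations as the model above

# Synthetic stand-in for data2 (assumption: all values are made up)
sim <- data.frame(
  Interest_Rate = runif(n, 4, 12),
  CPI           = runif(n, 30, 160),
  USD_Index     = runif(n, 80, 120)
)
# Deliberately curved in CPI so a straight-line fit is misspecified
sim$Gold_Price <- exp(6 + 0.03 * sim$CPI) + rnorm(n, sd = 500)

fit <- lm(Gold_Price ~ Interest_Rate + CPI + USD_Index, data = sim)
res <- residuals(fit)

n_negative_fitted <- sum(fitted(fit) < 0)  # impossible predictions for a price
share_above_zero  <- mean(res > 0)         # 0.5 would be an even split
res_median        <- median(res)           # far from 0 suggests skewness

c(n_negative_fitted = n_negative_fitted,
  share_above_zero  = share_above_zero,
  res_median        = res_median)
```

A count of negative fitted values above zero, or a residual median far from zero, points to the same problems read off the plot.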
4.
plot(model)
When the Gauss-Markov assumptions are met, the ordinary least squares (OLS) estimator is the Best Linear Unbiased Estimator (BLUE). Linearity means that the dependent variable is a linear function of the independent variables plus an error term. Strict exogeneity means that, given any values of the independent variables, the expected value of the error term is zero. The remaining assumptions are no perfect multicollinearity and errors that are homoskedastic and uncorrelated with each other. Normality of the errors makes hypothesis testing and interval estimation easier, but it is not required for the OLS estimators to be unbiased and efficient. Whether these assumptions hold in a particular analysis depends on the research question and the context of the data, so it is crucial to check them with statistical tests and diagnostic plots such as those produced by plot(model).
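One of these assumptions, homoskedasticity, can be tested with a few lines of base R. The sketch below computes a studentized Breusch-Pagan statistic by hand (n times the R-squared of an auxiliary regression of squared residuals on the regressors). It is run on synthetic data whose error variance grows with x, which is an assumption made only so the test has something to detect.

```r
set.seed(2)
n <- 200
x <- runif(n, 1, 10)

# Synthetic data with error spread increasing in x (made-up values)
y <- 3 + 2 * x + rnorm(n, sd = 0.5 * x)
fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the regressor
res_sq <- residuals(fit)^2
aux <- lm(res_sq ~ x)

# LM statistic n * R^2, compared with chi-square (df = number of regressors)
bp_stat <- n * summary(aux)$r.squared
bp_pval <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(statistic = bp_stat, p.value = bp_pval)
```

A small p-value is evidence against constant error variance. For the gold price model the auxiliary regression would use Interest_Rate, CPI, and USD_Index, with df = 3. The mean of the residuals is not a useful check of exogeneity, because it is zero by construction whenever the model includes an intercept.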
5.
Under the Gauss-Markov assumptions, the ordinary least squares (OLS) estimates are unbiased and have the lowest variance among all linear unbiased estimators. OLS is therefore a natural method for estimating the parameters of the linear regression model.
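A small simulation can illustrate this property. The sketch below is not a proof: it compares the OLS slope with one other linear unbiased estimator, the slope through the first and last observations, on synthetic data with homoskedastic, uncorrelated errors. The true slope of 2, the sample size, and the error spread are all arbitrary assumptions.

```r
set.seed(3)
n <- 50
reps <- 2000
x <- seq(1, 10, length.out = n)  # regressor held fixed across replications
true_slope <- 2

ols <- numeric(reps)
endpoint <- numeric(reps)
for (r in seq_len(reps)) {
  # Homoskedastic, uncorrelated errors, as Gauss-Markov requires
  y <- 1 + true_slope * x + rnorm(n, sd = 3)
  ols[r] <- coef(lm(y ~ x))[2]
  # Competing linear unbiased estimator: slope through the two end points
  endpoint[r] <- (y[n] - y[1]) / (x[n] - x[1])
}

c(mean_ols = mean(ols), mean_endpoint = mean(endpoint),
  var_ols = var(ols), var_endpoint = var(endpoint))
```

Both estimators average close to the true slope, but the OLS estimates vary much less from sample to sample, which is what "best" means in BLUE.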
II
Linear regression assumes a linear relationship between the independent variable(s) and the dependent variable. In real-world data, however, the relationship is not always linear. Taking the logarithm of one or more variables can make the relationship closer to linear, and it also reduces the disproportionate influence that extreme values or outliers have on the fitted model. Interpreting the coefficients of a log-transformed model requires care, because they describe changes on the logarithmic scale (approximately percentage changes) rather than changes in the original units. In some situations, other transformations or modeling techniques may be more suitable.
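The effect of a log transformation can be seen in a small sketch. The data below are synthetic, with an exponential relationship and multiplicative noise; the coefficients 1 and 0.8 are assumptions made for illustration and have nothing to do with the gold price data.

```r
set.seed(4)
n <- 200
x <- runif(n, 0, 5)

# Synthetic exponential relationship with multiplicative noise (made-up values)
y <- exp(1 + 0.8 * x + rnorm(n, sd = 0.3))

fit_level <- lm(y ~ x)       # straight line fitted to the raw response
fit_log   <- lm(log(y) ~ x)  # straight line fitted to the logged response

# R-squared values are on different response scales, so this is a rough guide
r2_level <- summary(fit_level)$r.squared
r2_log   <- summary(fit_log)$r.squared

# In the log model, a one-unit increase in x multiplies y by exp(slope)
slope_log <- coef(fit_log)[["x"]]
c(r2_level = r2_level, r2_log = r2_log,
  slope_log = slope_log, multiplier = exp(slope_log))
```

The log model recovers a slope near 0.8 and fits a straight line much more closely. A log transform requires strictly positive values, which holds for prices.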