What we have studied so far is listed below.
RMARKDOWN
1 rmarkdown (why we use rmarkdown in our R programs: when we
want to show data as a table we use rmarkdown, for example its
paged_table() function. The example below makes this clear.)
library(wooldridge)
library(rmarkdown)
data("airquality")
Now we can see why we use rmarkdown: paged_table() shows the data frame as a paged table.
paged_table(airquality)
SUMMARY()
Let's move on to another topic.
2 summary() (we can pull regression results out of a data set in two
different ways. The first way, wrapping lm() in summary(), is shown below.)
summary(lm(formula = Ozone ~ Temp + Wind + Month + Day + Solar.R , data = airquality))
##
## Call:
## lm(formula = Ozone ~ Temp + Wind + Month + Day + Solar.R, data = airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.014 -12.284 -3.302 8.454 95.348
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -64.11632 23.48249 -2.730 0.00742 **
## Temp 1.89579 0.27389 6.922 3.66e-10 ***
## Wind -3.31844 0.64451 -5.149 1.23e-06 ***
## Month -3.03996 1.51346 -2.009 0.04714 *
## Day 0.27388 0.22967 1.192 0.23576
## Solar.R 0.05027 0.02342 2.147 0.03411 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.86 on 105 degrees of freedom
## (42 observations deleted due to missingness)
## Multiple R-squared: 0.6249, Adjusted R-squared: 0.6071
## F-statistic: 34.99 on 5 and 105 DF, p-value: < 2.2e-16
The second way is to print the lm() object itself, which shows only the call and the coefficients:
lm(airquality$Ozone ~ airquality$Temp + airquality$Wind + airquality$Month + airquality$Day + airquality$Solar.R, data = airquality)
##
## Call:
## lm(formula = airquality$Ozone ~ airquality$Temp + airquality$Wind +
## airquality$Month + airquality$Day + airquality$Solar.R, data = airquality)
##
## Coefficients:
## (Intercept) airquality$Temp airquality$Wind airquality$Month
## -64.11632 1.89579 -3.31844 -3.03996
## airquality$Day airquality$Solar.R
## 0.27388 0.05027
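The two ways above fit the same model; only the amount of printed detail differs. As a minimal sketch (the object names `fit` and `fit_sum` are made up for illustration), the pieces of the summary can also be pulled out one by one. Writing bare column names together with `data = airquality` avoids repeating `airquality$` inside the formula.

```r
# Fit once, then extract individual pieces of the results.
fit <- lm(Ozone ~ Temp + Wind + Month + Day + Solar.R, data = airquality)
fit_sum <- summary(fit)

coef(fit)                 # point estimates only (what printing lm() shows)
fit_sum$coefficients      # estimates, standard errors, t values, p values
fit_sum$r.squared         # multiple R-squared
nobs(fit)                 # number of rows actually used in the fit

# Rows with at least one missing value are dropped by lm() by default.
n_dropped <- sum(!complete.cases(airquality))
n_dropped
```

The number of dropped rows corresponds to the "observations deleted due to missingness" line in the summary output.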
STARGAZER
The next topic we studied is stargazer and what it does in a
program. There are many ways to export output into nice tables and to
compare two or more models, and stargazer is one of them. stargazer can
export the results in two ways: first, output as text, which allows a
quick view of the results, and second, output as HTML, which produces an
editable table for Word documents. Let's give an example.
model1 <- lm(Ozone ~ Temp + Wind + Month + log(Day) + Solar.R, data = airquality)
model2 <- lm(Ozone ~ Temp + log(Wind) + log(Month) + Day + Solar.R, data = airquality )
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
stargazer(model1, model2, type = "text" )
##
## ===========================================================
## Dependent variable:
## ----------------------------
## Ozone
## (1) (2)
## -----------------------------------------------------------
## Temp 1.888*** 1.726***
## (0.276) (0.264)
##
## Wind -3.354***
## (0.652)
##
## Month -3.045**
## (1.524)
##
## log(Day) 1.326
## (2.423)
##
## log(Wind) -37.545***
## (5.490)
##
## log(Month) -20.520**
## (9.835)
##
## Day 0.302
## (0.214)
##
## Solar.R 0.049** 0.051**
## (0.024) (0.022)
##
## Constant -61.901** 17.303
## (24.096) (28.602)
##
## -----------------------------------------------------------
## Observations 111 111
## R2 0.621 0.676
## Adjusted R2 0.603 0.660
## Residual Std. Error (df = 105) 20.969 19.401
## F Statistic (df = 5; 105) 34.401*** 43.720***
## ===========================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
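The example above covers the text output. For the HTML output mentioned earlier, a minimal sketch could look like the following. It assumes the stargazer package is installed; the file name `models.html` and the use of `tempdir()` are arbitrary choices for illustration.

```r
library(stargazer)

model1 <- lm(Ozone ~ Temp + Wind + Month + log(Day) + Solar.R, data = airquality)
model2 <- lm(Ozone ~ Temp + log(Wind) + log(Month) + Day + Solar.R, data = airquality)

# Hypothetical output location; any writable path works.
out_file <- file.path(tempdir(), "models.html")

# type = "html" produces an HTML table; out = writes it to the given file.
html_lines <- stargazer(model1, model2, type = "html", out = out_file)
```

The resulting file can then be opened in Word or a browser and edited there.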
STANDARDIZATION
We can standardize the variables by wrapping each of them in scale() in
our regression. Let's see the example below.
lm(scale(Ozone) ~ scale(Temp) + scale(Wind) + scale(Month) + scale(Day) + scale(Solar.R), data = airquality )
##
## Call:
## lm(formula = scale(Ozone) ~ scale(Temp) + scale(Wind) + scale(Month) +
## scale(Day) + scale(Solar.R), data = airquality)
##
## Coefficients:
## (Intercept) scale(Temp) scale(Wind) scale(Month) scale(Day)
## 0.0235 0.5440 -0.3544 -0.1305 0.0736
## scale(Solar.R)
## 0.1373
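As a small sketch of what `scale()` does: it subtracts the mean and divides by the standard deviation. The intercept above is not exactly zero, most likely because `scale()` inside the formula uses all non-missing values of each variable, while the regression itself only uses the complete rows. The data frame name `aq` below is made up for illustration; it holds only the complete rows, so both steps use the same sample.

```r
# scale() is the usual z-score: (x - mean) / sd.
x <- airquality$Temp
z_manual <- (x - mean(x)) / sd(x)
z_scale <- as.numeric(scale(x))
all.equal(z_manual, z_scale)

# Standardize on exactly the rows that enter the regression.
aq <- na.omit(airquality)
fit_std <- lm(scale(Ozone) ~ scale(Temp) + scale(Wind) + scale(Month) +
                scale(Day) + scale(Solar.R), data = aq)
round(coef(fit_std), 4)   # the intercept is now zero up to rounding error
```

Each slope is then read as the change in Ozone, in standard deviations, for a one standard deviation change in the regressor.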
LOGARITHMS
Using logarithms in a regression lets us read the coefficients
(approximately) as percentage changes.
lm(log(Ozone) ~ log(Temp) + log(Wind) + Month + Day + Solar.R, data = airquality )
##
## Call:
## lm(formula = log(Ozone) ~ log(Temp) + log(Wind) + Month + Day +
## Solar.R, data = airquality)
##
## Coefficients:
## (Intercept) log(Temp) log(Wind) Month Day Solar.R
## -11.613024 3.745397 -0.655830 -0.041479 0.004507 0.002369
From this we can say that, holding Wind, Month, Day, and Solar.R
constant, if Temp increases by 1%, air pollution (Ozone) increases by
approximately 4 percent (the coefficient on log(Temp) is 3.745).
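The percentage reading can be checked numerically. The sketch below refits the same model and converts two coefficients into percentage effects; the names `fit_log`, `pct_temp`, and `pct_solar` are made up for illustration.

```r
fit_log <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month + Day + Solar.R,
              data = airquality)
b <- coef(fit_log)

# log-log term: effect on Ozone (in %) of a 1% increase in Temp.
pct_temp <- 100 * (1.01^b[["log(Temp)"]] - 1)

# log-level term: effect on Ozone (in %) of a one-unit increase in Solar.R.
pct_solar <- 100 * (exp(b[["Solar.R"]]) - 1)

c(pct_temp = pct_temp, pct_solar = pct_solar)
```

For small changes the exact value is close to the coefficient itself, which is why the coefficient on a logged regressor is usually read directly as an elasticity.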
Let's use the same equation but add the quadratic term of the
variable Day and check what happens.
model11 <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month + Day + I(Day^2) + Solar.R, data = airquality)
summary(model11)
##
## Call:
## lm(formula = log(Ozone) ~ log(Temp) + log(Wind) + Month + Day +
## I(Day^2) + Solar.R, data = airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.04667 -0.30038 0.00063 0.28701 1.17827
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.099e+01 2.309e+00 -4.759 6.30e-06 ***
## log(Temp) 3.602e+00 5.211e-01 6.914 3.93e-10 ***
## log(Wind) -6.122e-01 1.457e-01 -4.202 5.60e-05 ***
## Month -3.503e-02 3.721e-02 -0.941 0.349
## Day -2.313e-02 2.378e-02 -0.973 0.333
## I(Day^2) 8.608e-04 7.203e-04 1.195 0.235
## Solar.R 2.430e-03 5.689e-04 4.272 4.29e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5024 on 104 degrees of freedom
## (42 observations deleted due to missingness)
## Multiple R-squared: 0.6817, Adjusted R-squared: 0.6633
## F-statistic: 37.12 on 6 and 104 DF, p-value: < 2.2e-16
Here we see that the t statistic of the quadratic term is 1.195
(p = 0.235), so we cannot call it significant at the usual levels.
In fact, we can use another method for the squared term, poly(). Let's try it.
model12 <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month + poly(Day,2, raw = TRUE) + Solar.R, data = airquality)
summary(model12)
##
## Call:
## lm(formula = log(Ozone) ~ log(Temp) + log(Wind) + Month + poly(Day,
## 2, raw = TRUE) + Solar.R, data = airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.04667 -0.30038 0.00063 0.28701 1.17827
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.099e+01 2.309e+00 -4.759 6.30e-06 ***
## log(Temp) 3.602e+00 5.211e-01 6.914 3.93e-10 ***
## log(Wind) -6.122e-01 1.457e-01 -4.202 5.60e-05 ***
## Month -3.503e-02 3.721e-02 -0.941 0.349
## poly(Day, 2, raw = TRUE)1 -2.313e-02 2.378e-02 -0.973 0.333
## poly(Day, 2, raw = TRUE)2 8.608e-04 7.203e-04 1.195 0.235
## Solar.R 2.430e-03 5.689e-04 4.272 4.29e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5024 on 104 degrees of freedom
## (42 observations deleted due to missingness)
## Multiple R-squared: 0.6817, Adjusted R-squared: 0.6633
## F-statistic: 37.12 on 6 and 104 DF, p-value: < 2.2e-16
As we can see above, we obtained the same result.
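A short sketch confirming that both ways of writing the squared term give the same fit, and showing why `I()` is needed. The model names `m_I`, `m_poly`, and `m_noI` are made up for illustration.

```r
m_I <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month + Day + I(Day^2) + Solar.R,
          data = airquality)
m_poly <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month +
               poly(Day, 2, raw = TRUE) + Solar.R, data = airquality)

# Same fitted values, so the two specifications are the same model.
same_fit <- all.equal(unname(fitted(m_I)), unname(fitted(m_poly)))
same_fit

# Without I(), ^ is formula syntax (crossing), so Day^2 collapses to Day
# and no squared term is estimated.
m_noI <- lm(log(Ozone) ~ Day + Day^2, data = airquality)
names(coef(m_noI))
```

With `raw = FALSE` (the default), `poly()` uses orthogonal polynomials instead: the fitted values stay the same but the individual coefficients differ.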
Let's move on to another topic.
ANOVA
Usage of ANOVA: the ANOVA table shows us which variables add the
most explanatory power. For example:
library(car)
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.2.2
Anova(model12)
## Anova Table (Type II tests)
##
## Response: log(Ozone)
## Sum Sq Df F value Pr(>F)
## log(Temp) 12.0652 1 47.7967 3.935e-10 ***
## log(Wind) 4.4578 1 17.6597 5.598e-05 ***
## Month 0.2237 1 0.8860 0.3487
## poly(Day, 2, raw = TRUE) 0.5280 2 1.0458 0.3551
## Solar.R 4.6073 1 18.2521 4.294e-05 ***
## Residuals 26.2524 104
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can see that log(Temp) is the most explanatory variable in this
model: it has the largest sum of squares and the largest F value.
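The joint test for the two Day terms in the table can also be sketched with base R's `anova()`, by comparing the model with and without those terms. The names `restricted`, `full`, and `f_test` are made up for illustration. Both models use the same rows here because Day has no missing values.

```r
restricted <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month + Solar.R,
                 data = airquality)
full <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month +
             poly(Day, 2, raw = TRUE) + Solar.R, data = airquality)

# F test of H0: both Day coefficients are zero.
f_test <- anova(restricted, full)
print(f_test)
```

A large p-value in the second row means that Day and its square together add little explanatory power.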
INTERCEPT
Linear regression with an intercept and linear regression without an
intercept.
Linear regression with an intercept:
model111 <- lm(formula = Ozone ~ Temp + Wind + Month + Day + Solar.R , data = airquality)
summary(model111)
##
## Call:
## lm(formula = Ozone ~ Temp + Wind + Month + Day + Solar.R, data = airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37.014 -12.284 -3.302 8.454 95.348
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -64.11632 23.48249 -2.730 0.00742 **
## Temp 1.89579 0.27389 6.922 3.66e-10 ***
## Wind -3.31844 0.64451 -5.149 1.23e-06 ***
## Month -3.03996 1.51346 -2.009 0.04714 *
## Day 0.27388 0.22967 1.192 0.23576
## Solar.R 0.05027 0.02342 2.147 0.03411 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20.86 on 105 degrees of freedom
## (42 observations deleted due to missingness)
## Multiple R-squared: 0.6249, Adjusted R-squared: 0.6071
## F-statistic: 34.99 on 5 and 105 DF, p-value: < 2.2e-16
Linear regression without an intercept (the - 1 in the formula removes it):
model22 <- lm(formula = Ozone ~ Temp + Wind + Month + Day + Solar.R - 1, data = airquality)
summary(model22)
##
## Call:
## lm(formula = Ozone ~ Temp + Wind + Month + Day + Solar.R - 1,
## data = airquality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.815 -13.270 -3.467 10.931 90.720
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## Temp 1.30317 0.17206 7.574 1.42e-11 ***
## Wind -4.49288 0.49437 -9.088 6.32e-15 ***
## Month -3.57541 1.54572 -2.313 0.0226 *
## Day 0.13811 0.23095 0.598 0.5511
## Solar.R 0.05105 0.02412 2.116 0.0367 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.48 on 106 degrees of freedom
## (42 observations deleted due to missingness)
## Multiple R-squared: 0.8464, Adjusted R-squared: 0.8392
## F-statistic: 116.8 on 5 and 106 DF, p-value: < 2.2e-16
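The R-squared of the model without an intercept is higher, but the two values are not directly comparable: without an intercept, R measures variation around zero instead of around the mean of Ozone. A minimal sketch of the difference follows; the names `aq`, `fit_int`, and `fit_noint` are made up for illustration.

```r
aq <- na.omit(airquality)   # the 111 complete rows used by both models
fit_int <- lm(Ozone ~ Temp + Wind + Month + Day + Solar.R, data = aq)
fit_noint <- lm(Ozone ~ Temp + Wind + Month + Day + Solar.R - 1, data = aq)

y <- aq$Ozone
rss <- sum(resid(fit_noint)^2)

# What summary() reports for the no-intercept model (total SS around zero).
r2_uncentered <- 1 - rss / sum(y^2)
# The same residuals measured against variation around the mean.
r2_centered <- 1 - rss / sum((y - mean(y))^2)

c(reported = summary(fit_noint)$r.squared,
  uncentered = r2_uncentered,
  centered = r2_centered,
  with_intercept = summary(fit_int)$r.squared)
```

Measured on the same basis, the model without an intercept does not fit better than the model with one.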
PLOT
Now let's look at how to plot the linear regression.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.2
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## Warning: package 'forcats' was built under R version 4.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::recode() masks car::recode()
## ✖ purrr::some() masks car::some()
qplot(airquality$Ozone, airquality$Temp)
## Warning: Removed 37 rows containing missing values (geom_point).

qplot(airquality$Ozone, airquality$Temp) + geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 37 rows containing non-finite values (stat_smooth).
## Warning: Removed 37 rows containing missing values (geom_point).
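The same kind of plot can be sketched with `ggplot()` directly. This version puts Ozone, the dependent variable in the regressions above, on the y axis and drops the rows with missing Ozone first, so no warnings are printed. It assumes ggplot2 is installed; the names `aq` and `p` and the axis labels are choices made for illustration.

```r
library(ggplot2)

# Keep only rows where Ozone was measured.
aq <- airquality[!is.na(airquality$Ozone), ]

p <- ggplot(aq, aes(x = Temp, y = Ozone)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # fitted OLS line
  labs(x = "Temperature", y = "Ozone")

print(p)
```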

model4 <- lm(formula = Wind ~ Temp + Month + Day + Solar.R, airquality)
coef(model4)
## (Intercept) Temp Month Day Solar.R
## 23.660271478 -0.186385669 0.074952002 -0.011027951 0.002979738
model44 <- lm(formula = Wind ~ Temp + Month + Day + Solar.R, airquality)
cor(airquality$Wind, airquality$Temp)
## [1] -0.4579879
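`cor()` works directly here because Wind and Temp have no missing values. For a variable with missing values, such as Ozone, it returns NA unless told how to handle them. A minimal sketch (the names `r_raw` and `r_complete` are made up for illustration):

```r
# Ozone contains NA, so the default call returns NA.
r_raw <- cor(airquality$Ozone, airquality$Temp)
r_raw

# Use only the rows where both variables are observed.
r_complete <- cor(airquality$Ozone, airquality$Temp, use = "complete.obs")
r_complete
```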
New York Air Quality Measurements
Daily readings of the following air quality values for May 1, 1973
(a Tuesday) to September 30, 1973.
This data set gives us air quality measurements from New York in the
United States; it ships with base R (the datasets package). We want to
explain air quality using Wind and temperature.
In fact, with this data we want to see whether or not there is a
relationship between air quality, wind, and temperature.
The control variables are wind, temperature, month, and day.
In this regression the base year is 1973.
Our intercept is 68..
Here our t statistic came out very high, at 15.
Here our correlation came out as 0.3, which shows that there is a
relationship between air quality and our variables.
Our most explanatory variable is month.
We can say that the null hypothesis is rejected and the alternative
hypothesis is significant.
As the month changes, air quality changes too: for example, in May the
air quality was better, but in the winter period the air quality gets a
bit worse.
Finally, we can say that the hypothesis is significant.
DPLYR
The last topic is dplyr. This part introduces dplyr's basic set of
tools and shows how to apply them to data frames. dplyr also supports
databases via the dbplyr package. Let's look at some of its functions.
select, selects columns from the data
filter, subsets rows of the data
mutate, creates new columns
arrange, sorts the data
group_by, groups the data for aggregation
summarise, calculates summary statistics
I have not fully understood this myself yet; I am trying to understand
it, and we will read about it later.
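As a starting point, the six verbs listed above can be sketched on airquality in one pipeline. It assumes dplyr is installed; the new column `TempC` (temperature converted to Celsius) and the result name `monthly` are made up for illustration.

```r
library(dplyr)

monthly <- airquality %>%
  select(Ozone, Temp, Wind, Month) %>%        # keep only these columns
  filter(!is.na(Ozone)) %>%                   # drop rows with missing Ozone
  mutate(TempC = (Temp - 32) * 5 / 9) %>%     # new column: Fahrenheit to Celsius
  group_by(Month) %>%                         # one group per month
  summarise(mean_ozone = mean(Ozone),         # summary statistics per group
            mean_tempC = mean(TempC),
            n = n()) %>%
  arrange(desc(mean_ozone))                   # sort, highest mean ozone first

monthly
```

Each verb takes a data frame and returns a data frame, which is why they chain together with `%>%`.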