What we have studied so far is listed below.

RMARKDOWN

1 rmarkdown (why we use rmarkdown in our R program: when we want to display data as a table, we use rmarkdown, for example paged_table(). Let's clarify with the example below.)

library(wooldridge)
library(rmarkdown)
data("airquality")

Now we can see why we use rmarkdown:

paged_table(airquality)

SUMMARY()

Let's move on to another topic.

2 summary() (we can get regression results from a data set in two different ways; the first way, with summary(), is shown below)

summary(lm(formula = Ozone ~ Temp + Wind + Month + Day + Solar.R  , data = airquality))
## 
## Call:
## lm(formula = Ozone ~ Temp + Wind + Month + Day + Solar.R, data = airquality)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.014 -12.284  -3.302   8.454  95.348 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -64.11632   23.48249  -2.730  0.00742 ** 
## Temp          1.89579    0.27389   6.922 3.66e-10 ***
## Wind         -3.31844    0.64451  -5.149 1.23e-06 ***
## Month        -3.03996    1.51346  -2.009  0.04714 *  
## Day           0.27388    0.22967   1.192  0.23576    
## Solar.R       0.05027    0.02342   2.147  0.03411 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.86 on 105 degrees of freedom
##   (42 observations deleted due to missingness)
## Multiple R-squared:  0.6249, Adjusted R-squared:  0.6071 
## F-statistic: 34.99 on 5 and 105 DF,  p-value: < 2.2e-16

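The second way is to refer to each column with $ and print the model without summary(), which shows only the coefficients: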

lm(airquality$Ozone ~ airquality$Temp + airquality$Wind + airquality$Month + airquality$Day + airquality$Solar.R, data = airquality)
## 
## Call:
## lm(formula = airquality$Ozone ~ airquality$Temp + airquality$Wind + 
##     airquality$Month + airquality$Day + airquality$Solar.R, data = airquality)
## 
## Coefficients:
##        (Intercept)     airquality$Temp     airquality$Wind    airquality$Month  
##          -64.11632             1.89579            -3.31844            -3.03996  
##     airquality$Day  airquality$Solar.R  
##            0.27388             0.05027
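
The object returned by summary() can also be stored and queried piece by piece, which helps when only a few numbers are needed. A minimal sketch, assuming the same airquality regression as above; the names fit, fit_summary, coef_table, r2 and adj_r2 are illustrative and not part of the original notes:

```r
# Fit the same regression as above and keep the summary object
fit <- lm(Ozone ~ Temp + Wind + Month + Day + Solar.R, data = airquality)
fit_summary <- summary(fit)

# Coefficient table as a numeric matrix
# (columns: Estimate, Std. Error, t value, Pr(>|t|))
coef_table <- coef(fit_summary)
coef_table["Temp", "Estimate"]

# R-squared and adjusted R-squared as plain numbers
r2 <- fit_summary$r.squared
adj_r2 <- fit_summary$adj.r.squared
c(r2 = r2, adj_r2 = adj_r2)
```

Printing the lm object directly, as in the second way, shows only the coefficients; the summary object carries the standard errors, t values and fit statistics as well.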

STARGAZER

The next topic we studied is stargazer and what it does in our program. There are many ways to export output into nice tables and to compare two or more models; stargazer is one of them. stargazer can export results in two ways: first as text, which allows a quick view of the results, and second as HTML, which produces an editable table for Word documents. Let's give an example.

model1 <- lm(Ozone ~ Temp + Wind + Month + log(Day) + Solar.R, data = airquality)
model2 <- lm(Ozone ~ Temp + log(Wind) + log(Month) + Day + Solar.R, data = airquality  )
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
stargazer(model1, model2, type = "text" )
## 
## ===========================================================
##                                    Dependent variable:     
##                                ----------------------------
##                                           Ozone            
##                                     (1)            (2)     
## -----------------------------------------------------------
## Temp                              1.888***      1.726***   
##                                   (0.276)        (0.264)   
##                                                            
## Wind                             -3.354***                 
##                                   (0.652)                  
##                                                            
## Month                             -3.045**                 
##                                   (1.524)                  
##                                                            
## log(Day)                           1.326                   
##                                   (2.423)                  
##                                                            
## log(Wind)                                      -37.545***  
##                                                  (5.490)   
##                                                            
## log(Month)                                      -20.520**  
##                                                  (9.835)   
##                                                            
## Day                                               0.302    
##                                                  (0.214)   
##                                                            
## Solar.R                           0.049**        0.051**   
##                                   (0.024)        (0.022)   
##                                                            
## Constant                         -61.901**       17.303    
##                                   (24.096)      (28.602)   
##                                                            
## -----------------------------------------------------------
## Observations                        111            111     
## R2                                 0.621          0.676    
## Adjusted R2                        0.603          0.660    
## Residual Std. Error (df = 105)     20.969        19.401    
## F Statistic (df = 5; 105)        34.401***      43.720***  
## ===========================================================
## Note:                           *p<0.1; **p<0.05; ***p<0.01
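
The text output is shown above; the HTML variant mentioned earlier can be sketched as follows. This is a minimal sketch, assuming the stargazer package is installed. The temporary file path html_file is a hypothetical location chosen for illustration; in a real project it could be any .html file name.

```r
library(stargazer)

model1 <- lm(Ozone ~ Temp + Wind + Month + log(Day) + Solar.R, data = airquality)
model2 <- lm(Ozone ~ Temp + log(Wind) + log(Month) + Day + Solar.R, data = airquality)

# Hypothetical output location, used here only for illustration
html_file <- tempfile(fileext = ".html")

# type = "html" builds an HTML table; out = also writes it to the given file
stargazer(model1, model2, type = "html", out = html_file)
```

The resulting file can then be opened in Word or a browser, and the table edited there.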

STANDARDIZATION

We can standardize the variables by wrapping them in scale() inside the regression. See the example below.

lm(scale(Ozone) ~ scale(Temp) + scale(Wind) + scale(Month) + scale(Day) + scale(Solar.R), data = airquality )
## 
## Call:
## lm(formula = scale(Ozone) ~ scale(Temp) + scale(Wind) + scale(Month) + 
##     scale(Day) + scale(Solar.R), data = airquality)
## 
## Coefficients:
##    (Intercept)     scale(Temp)     scale(Wind)    scale(Month)      scale(Day)  
##         0.0235          0.5440         -0.3544         -0.1305          0.0736  
## scale(Solar.R)  
##         0.1373
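
One detail worth noting: scale() inside the formula standardizes each column using all of its available rows, while lm() afterwards drops the rows with missing values. That is a likely reason the intercept above is 0.0235 rather than exactly zero. A minimal sketch of an alternative, assuming we first keep only the complete rows and then standardize; the names vars, aq_complete, aq_std and model_std are illustrative:

```r
# Keep only the rows with no missing values in the model variables
vars <- c("Ozone", "Temp", "Wind", "Month", "Day", "Solar.R")
aq_complete <- na.omit(airquality[, vars])

# Standardize every column to mean 0 and standard deviation 1
aq_std <- as.data.frame(scale(aq_complete))

# Regression on the standardized data: slopes are beta coefficients
model_std <- lm(Ozone ~ Temp + Wind + Month + Day + Solar.R, data = aq_std)
coef(model_std)
```

With this approach the intercept is zero up to rounding error, because every variable has mean zero in the estimation sample.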

LOGARITHMIC

Taking logarithms in a regression lets us read the coefficients as approximate percentage changes in the dependent variable.

lm(log(Ozone) ~ log(Temp) + log(Wind) + Month + Day + Solar.R, data = airquality )
## 
## Call:
## lm(formula = log(Ozone) ~ log(Temp) + log(Wind) + Month + Day + 
##     Solar.R, data = airquality)
## 
## Coefficients:
## (Intercept)    log(Temp)    log(Wind)        Month          Day      Solar.R  
##  -11.613024     3.745397    -0.655830    -0.041479     0.004507     0.002369

From this we can interpret that, holding Wind, Month, Day and Solar.R constant, a 1% increase in Temp raises ozone (air pollution) by about 3.7%, that is, roughly 4%.
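
The percentage reading of the log coefficients can be checked numerically. A minimal sketch, assuming the same log model as above; the names fit_log, b_temp, b_solar and the *_pct_* variables are illustrative. For a log-log term the coefficient itself is the approximate percentage effect of a 1% change, and for a log-level term (such as Solar.R) it is 100 times the coefficient per one-unit change.

```r
fit_log <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month + Day + Solar.R,
              data = airquality)

b_temp <- coef(fit_log)[["log(Temp)"]]
b_solar <- coef(fit_log)[["Solar.R"]]

# Log-log: approximate and exact % change in Ozone for a 1% rise in Temp
approx_pct_temp <- b_temp
exact_pct_temp <- (1.01^b_temp - 1) * 100

# Log-level: approximate and exact % change in Ozone for one more unit of Solar.R
approx_pct_solar <- 100 * b_solar
exact_pct_solar <- (exp(b_solar) - 1) * 100

c(approx_pct_temp, exact_pct_temp, approx_pct_solar, exact_pct_solar)
```

For small changes the approximate and exact figures are close, which is why the coefficient is usually read directly as a percentage.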

Let's use the same equation but add the quadratic term of the variable Day and see what happens.

model11 <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month + Day + I(Day^2) + Solar.R, data = airquality)
summary(model11)
## 
## Call:
## lm(formula = log(Ozone) ~ log(Temp) + log(Wind) + Month + Day + 
##     I(Day^2) + Solar.R, data = airquality)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.04667 -0.30038  0.00063  0.28701  1.17827 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.099e+01  2.309e+00  -4.759 6.30e-06 ***
## log(Temp)    3.602e+00  5.211e-01   6.914 3.93e-10 ***
## log(Wind)   -6.122e-01  1.457e-01  -4.202 5.60e-05 ***
## Month       -3.503e-02  3.721e-02  -0.941    0.349    
## Day         -2.313e-02  2.378e-02  -0.973    0.333    
## I(Day^2)     8.608e-04  7.203e-04   1.195    0.235    
## Solar.R      2.430e-03  5.689e-04   4.272 4.29e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5024 on 104 degrees of freedom
##   (42 observations deleted due to missingness)
## Multiple R-squared:  0.6817, Adjusted R-squared:  0.6633 
## F-statistic: 37.12 on 6 and 104 DF,  p-value: < 2.2e-16

Here we see that the t statistic of the quadratic term I(Day^2) is 1.195 (p = 0.235), so the term is not statistically significant at the usual levels.

In fact, we can use another method for the squared term, namely poly(). Let's do it.

model12 <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month + poly(Day,2, raw = TRUE) + Solar.R, data = airquality)
summary(model12)
## 
## Call:
## lm(formula = log(Ozone) ~ log(Temp) + log(Wind) + Month + poly(Day, 
##     2, raw = TRUE) + Solar.R, data = airquality)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.04667 -0.30038  0.00063  0.28701  1.17827 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               -1.099e+01  2.309e+00  -4.759 6.30e-06 ***
## log(Temp)                  3.602e+00  5.211e-01   6.914 3.93e-10 ***
## log(Wind)                 -6.122e-01  1.457e-01  -4.202 5.60e-05 ***
## Month                     -3.503e-02  3.721e-02  -0.941    0.349    
## poly(Day, 2, raw = TRUE)1 -2.313e-02  2.378e-02  -0.973    0.333    
## poly(Day, 2, raw = TRUE)2  8.608e-04  7.203e-04   1.195    0.235    
## Solar.R                    2.430e-03  5.689e-04   4.272 4.29e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5024 on 104 degrees of freedom
##   (42 observations deleted due to missingness)
## Multiple R-squared:  0.6817, Adjusted R-squared:  0.6633 
## F-statistic: 37.12 on 6 and 104 DF,  p-value: < 2.2e-16

As we can see above, we obtained the same result.
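
The equality of the two specifications can also be checked in code rather than by eye. A minimal sketch, refitting both models so the block is self-contained; same_coefs and same_fit are illustrative names:

```r
model11 <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month + Day + I(Day^2) + Solar.R,
              data = airquality)
model12 <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month +
                poly(Day, 2, raw = TRUE) + Solar.R,
              data = airquality)

# Only the coefficient names differ, so drop them before comparing
same_coefs <- isTRUE(all.equal(unname(coef(model11)), unname(coef(model12))))

# The fitted values should agree as well
same_fit <- isTRUE(all.equal(fitted(model11), fitted(model12)))

c(same_coefs = same_coefs, same_fit = same_fit)
```

Note that this equivalence relies on raw = TRUE; without it, poly() uses orthogonal polynomials and the individual coefficients differ even though the fit is the same.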

Let's move on to another topic.

ANOVA

Usage of Anova: the ANOVA table shows us which variables add the most explanatory power. For example:

library(car)
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.2.2
Anova(model12)
## Anova Table (Type II tests)
## 
## Response: log(Ozone)
##                           Sum Sq  Df F value    Pr(>F)    
## log(Temp)                12.0652   1 47.7967 3.935e-10 ***
## log(Wind)                 4.4578   1 17.6597 5.598e-05 ***
## Month                     0.2237   1  0.8860    0.3487    
## poly(Day, 2, raw = TRUE)  0.5280   2  1.0458    0.3551    
## Solar.R                   4.6073   1 18.2521 4.294e-05 ***
## Residuals                26.2524 104                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can see that log(Temp) is the most explanatory variable in this model: it has the largest sum of squares and the largest F value.
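
The row for the two Day terms in the table above is a joint test, and it can be reproduced with base R's anova() by comparing a model without Day to the full model. A minimal sketch; restricted, full, day_test, f_day and p_day are illustrative names. The comparison is valid here because Day has no missing values, so both models use the same rows.

```r
# Model without any Day term
restricted <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month + Solar.R,
                 data = airquality)

# Full model with Day and Day^2
full <- lm(log(Ozone) ~ log(Temp) + log(Wind) + Month +
             poly(Day, 2, raw = TRUE) + Solar.R,
           data = airquality)

# F test of the two Day terms jointly
day_test <- anova(restricted, full)
f_day <- day_test$F[2]
p_day <- day_test$`Pr(>F)`[2]
day_test
```

A large p value here means that Day and its square together add little explanatory power once the other variables are in the model.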

INTERCEPT

Linear regression with an intercept and linear regression without an intercept.

linear regression with intercept

model111 <- lm(formula = Ozone ~ Temp + Wind + Month + Day + Solar.R  , data = airquality)
summary(model111)
## 
## Call:
## lm(formula = Ozone ~ Temp + Wind + Month + Day + Solar.R, data = airquality)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.014 -12.284  -3.302   8.454  95.348 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -64.11632   23.48249  -2.730  0.00742 ** 
## Temp          1.89579    0.27389   6.922 3.66e-10 ***
## Wind         -3.31844    0.64451  -5.149 1.23e-06 ***
## Month        -3.03996    1.51346  -2.009  0.04714 *  
## Day           0.27388    0.22967   1.192  0.23576    
## Solar.R       0.05027    0.02342   2.147  0.03411 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.86 on 105 degrees of freedom
##   (42 observations deleted due to missingness)
## Multiple R-squared:  0.6249, Adjusted R-squared:  0.6071 
## F-statistic: 34.99 on 5 and 105 DF,  p-value: < 2.2e-16

linear regression without intercept

model22 <- lm(formula = Ozone ~ Temp + Wind + Month + Day + Solar.R - 1, data = airquality)
summary(model22)
## 
## Call:
## lm(formula = Ozone ~ Temp + Wind + Month + Day + Solar.R - 1, 
##     data = airquality)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -35.815 -13.270  -3.467  10.931  90.720 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## Temp     1.30317    0.17206   7.574 1.42e-11 ***
## Wind    -4.49288    0.49437  -9.088 6.32e-15 ***
## Month   -3.57541    1.54572  -2.313   0.0226 *  
## Day      0.13811    0.23095   0.598   0.5511    
## Solar.R  0.05105    0.02412   2.116   0.0367 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.48 on 106 degrees of freedom
##   (42 observations deleted due to missingness)
## Multiple R-squared:  0.8464, Adjusted R-squared:  0.8392 
## F-statistic: 116.8 on 5 and 106 DF,  p-value: < 2.2e-16
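
The R-squared of the model without an intercept (0.8464) looks higher than that of the model with an intercept (0.6249), but the two numbers are not directly comparable: when the intercept is dropped, summary() measures the fit against the uncentered total sum of squares. A minimal sketch of this, assuming complete cases only; the names aq, with_int, no_int and the r2_* variables are illustrative:

```r
aq <- na.omit(airquality)

with_int <- lm(Ozone ~ Temp + Wind + Month + Day + Solar.R, data = aq)
no_int <- lm(Ozone ~ Temp + Wind + Month + Day + Solar.R - 1, data = aq)

rss <- sum(resid(no_int)^2)
tss_uncentered <- sum(aq$Ozone^2)                   # used when there is no intercept
tss_centered <- sum((aq$Ozone - mean(aq$Ozone))^2)  # used when there is an intercept

r2_reported <- summary(no_int)$r.squared
r2_uncentered <- 1 - rss / tss_uncentered
r2_centered <- 1 - rss / tss_centered

c(reported = r2_reported, uncentered = r2_uncentered, centered = r2_centered)
```

Measured on the same centered basis, the model without an intercept cannot fit better than the model with one.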

PLOT

Now let's learn how to plot the linear regression.

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.2
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2
## Warning: package 'forcats' was built under R version 4.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ✖ dplyr::recode() masks car::recode()
## ✖ purrr::some()   masks car::some()
qplot(airquality$Ozone, airquality$Temp)
## Warning: Removed 37 rows containing missing values (geom_point).

qplot(airquality$Ozone, airquality$Temp) + geom_smooth(method = "lm", se = F)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 37 rows containing non-finite values (stat_smooth).
## Warning: Removed 37 rows containing missing values (geom_point).

model4 <- lm(formula =  Wind ~ Temp + Month + Day + Solar.R, airquality)
coef(model4)
##  (Intercept)         Temp        Month          Day      Solar.R 
## 23.660271478 -0.186385669  0.074952002 -0.011027951  0.002979738
model44 <- lm(formula = Wind ~ Temp + Month + Day + Solar.R, airquality)
cor(airquality$Wind, airquality$Temp)
## [1] -0.4579879
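
The correlation above works directly because Wind and Temp have no missing values. For Ozone, which is missing on the rows the plots dropped, cor() returns NA unless the missing rows are excluded. A minimal sketch; cor_default, cor_complete and n_missing are illustrative names:

```r
# Ozone contains missing values, so the default call returns NA
cor_default <- cor(airquality$Ozone, airquality$Temp)

# Use only the rows where both variables are observed
cor_complete <- cor(airquality$Ozone, airquality$Temp, use = "complete.obs")

# Number of missing Ozone readings (the rows removed in the plots)
n_missing <- sum(is.na(airquality$Ozone))

c(cor_default = cor_default, cor_complete = cor_complete, n_missing = n_missing)
```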

COMMENT

Now let's interpret the linear regression that we have already run.

Once again, let's load the data and run our regression.

library(wooldridge)
data("airquality")
summary(lm(formula = Temp ~ Wind + Month + Day + Solar.R, data = airquality))
## 
## Call:
## lm(formula = Temp ~ Wind + Month + Day + Solar.R, data = airquality)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.9195  -5.7022   0.6468   4.8684  16.9400 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 68.76972    4.39084  15.662  < 2e-16 ***
## Wind        -1.00346    0.17619  -5.695 6.89e-08 ***
## Month        2.22475    0.44082   5.047 1.36e-06 ***
## Day         -0.08358    0.07000  -1.194 0.234439    
## Solar.R      0.02742    0.00687   3.991 0.000105 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.327 on 141 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.3866, Adjusted R-squared:  0.3692 
## F-statistic: 22.22 on 4 and 141 DF,  p-value: 3.073e-14

New York Air Quality Measurements

Daily readings of the following air quality values for May 1, 1973 (a Tuesday) to September 30, 1973.

This data set gives us air quality measurements from New York in the United States, and the aim is to explain them using Wind and temperature. (The airquality data ships with base R's datasets package rather than with wooldridge.)

In fact, with this data we want to see whether or not there is a relationship between air quality, wind and temperature.

The explanatory variables in this regression are Wind, Month, Day and Solar.R; the dependent variable is Temp.

The data in this regression are from the year 1973.

The intercept is 68..

Here its t statistic came out very high, at about 15.

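Here the R-squared came out at about 0.39, which shows that there is a relationship between the dependent variable (Temp here) and our explanatory variables.

Judging by the t values, the most explanatory variable is Wind (t = -5.695), closely followed by Month (t = 5.047).

We can say that the null hypothesis is rejected and the alternative hypothesis is supported (the result is significant).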

As the month changes, air quality changes too. For example, air quality was better in May; the idea that it gets somewhat worse in the winter period cannot be checked here, because the data cover only May to September.

Finally, we can say that the model as a whole is significant.

DPLYR

The last topic is dplyr. This part introduces dplyr's basic set of tools and shows how to apply them to data frames. dplyr also supports databases via the dbplyr package. Its main functions are:

select, selects columns from the data

filter, subsets rows of the data

mutate, creates new columns

arrange, sorts the data

group_by, groups the data for aggregation

summarise, calculates summary statistics

I have not fully understood this myself yet; I am still trying to understand it, and we will study it later.
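
The six verbs listed above can be sketched in one pipeline on the airquality data. This is a minimal sketch, assuming the dplyr package is installed; the column names TempC, mean_ozone, mean_temp_c and n_days and the object name monthly are made up for illustration.

```r
library(dplyr)

monthly <- airquality %>%
  select(Ozone, Temp, Wind, Month) %>%       # keep only these columns
  filter(!is.na(Ozone)) %>%                  # drop rows with missing Ozone
  mutate(TempC = (Temp - 32) * 5 / 9) %>%    # new column: Fahrenheit to Celsius
  group_by(Month) %>%                        # one group per month
  summarise(
    mean_ozone = mean(Ozone),
    mean_temp_c = mean(TempC),
    n_days = n()
  ) %>%
  arrange(desc(mean_ozone))                  # highest average ozone first

monthly
```

Each step takes a data frame and returns a data frame, which is what allows the verbs to be chained with %>%.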