Gerekli Paketler

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(broom)
library(wooldridge)
library(rmarkdown)
data("discrim")
head(discrim)
##   psoda pfries pentree wagest nmgrs nregs hrsopen  emp psoda2 pfries2 pentree2
## 1  1.12   1.06    1.02   4.25     3     5    16.0 27.5   1.11    1.11     1.05
## 2  1.06   0.91    0.95   4.75     3     3    16.5 21.5   1.05    0.89     0.95
## 3  1.06   0.91    0.98   4.25     3     5    18.0 30.0   1.05    0.94     0.98
## 4  1.12   1.02    1.06   5.00     4     5    16.0 27.5   1.15    1.05     1.05
## 5  1.12     NA    0.49   5.00     3     3    16.0  5.0   1.04    1.01     0.58
## 6  1.06   0.95    1.01   4.25     4     4    15.0 17.5   1.05    0.94     1.00
##   wagest2 nmgrs2 nregs2 hrsopen2 emp2 compown chain density    crmrte state
## 1    5.05      5      5     15.0 27.0       1     3    4030 0.0528866     1
## 2    5.05      4      3     17.5 24.5       0     1    4030 0.0528866     1
## 3    5.05      4      5     17.5 25.0       0     1   11400 0.0360003     1
## 4    5.05      4      5     16.0   NA       0     3    8345 0.0484232     1
## 5    5.05      3      3     16.0 12.0       0     1     720 0.0615890     1
## 6    5.05      3      4     15.0 28.0       0     1    4424 0.0334823     1
##     prpblck    prppov   prpncar hseval nstores income county     lpsoda
## 1 0.1711542 0.0365789 0.0788428 148300       3  44534     18 0.11332869
## 2 0.1711542 0.0365789 0.0788428 148300       3  44534     18 0.05826885
## 3 0.0473602 0.0879072 0.2694298 169200       3  41164     12 0.05826885
## 4 0.0528394 0.0591227 0.1366903 171600       3  50366     10 0.11332869
## 5 0.0344800 0.0254145 0.0738020 249100       1  72287     10 0.11332869
## 6 0.0591327 0.0835001 0.1151341 148000       2  44515     18 0.05826885
##       lpfries  lhseval  lincome ldensity NJ BK KFC RR
## 1  0.05826885 11.90699 10.70401 8.301521  1  0   0  1
## 2 -0.09431065 11.90699 10.70401 8.301521  1  1   0  0
## 3 -0.09431065 12.03884 10.62532 9.341369  1  1   0  0
## 4  0.01980261 12.05292 10.82707 9.029418  1  0   0  1
## 5          NA 12.42561 11.18840 6.579251  1  1   0  0
## 6 -0.05129331 11.90497 10.70358 8.394799  1  1   0  0

Modelin değişkenlerinin ve prppov değişkeninin anlamları

psoda > sodanın fiyatı ,

prpblck > bölgedeki siyahi oranı ,

income > gelir düzeyi ,

prppov > yoksulluk oranı,

Ortalama prpblck ve income değerlerini standart sapmalarıyla birlikte bulma, prpblck ve income ölçü birimleri.

mean(discrim$prpblck)
## [1] NA
mean(discrim$prpblck)
## [1] NA
mean(discrim$income)
## [1] NA
sd(discrim$income)
## [1] NA
sum(is.na(discrim$prpblck))
## [1] 1
sum(is.na(discrim$income))
## [1] 1
mean(discrim$prpblck,na.rm = TRUE)
## [1] 0.1134864
sd(discrim$prpblck,na.rm = TRUE)
## [1] 0.1824165
mean(discrim$income, na.rm = TRUE)
## [1] 47053.78
sd(discrim$income, na.rm = TRUE)
## [1] 13179.29

prbblck değişkeninin ortalaması 0.113,

standart sapması 0.182,

income değişkeninin ortalaması 47053.78,

standart sapması 13179.29 olacaktır.

library(vtable)
## Zorunlu paket yükleniyor: kableExtra
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
sumtable(discrim, summ=c('notNA(x)', 'countNA(x)', 'mean(x)','sd(x)'),out='return')
##    Variable NotNA CountNA   Mean    Sd
## 1     psoda   402       8      1 0.089
## 2    pfries   393      17   0.92  0.11
## 3   pentree   398      12    1.3  0.64
## 4    wagest   390      20    4.6  0.35
## 5     nmgrs   404       6    3.4     1
## 6     nregs   388      22    3.6   1.2
## 7   hrsopen   410       0     14   2.8
## 8       emp   404       6     18   9.4
## 9    psoda2   388      22      1 0.094
## 10  pfries2   382      28   0.94  0.11
## 11 pentree2   386      24    1.4  0.65
## 12  wagest2   389      21      5  0.25
## 13   nmgrs2   404       6    3.5   1.1
## 14   nregs2   388      22    3.6   1.2
## 15 hrsopen2   399      11     14   2.8
## 16     emp2   397      13     18   8.6
## 17  compown   410       0   0.34  0.48
## 18    chain   410       0    2.1   1.1
## 19  density   409       1   4562  5132
## 20   crmrte   409       1  0.053 0.047
## 21    state   410       0    1.2  0.39
## 22  prpblck   409       1   0.11  0.18
## 23   prppov   409       1  0.071 0.067
## 24  prpncar   409       1   0.11  0.12
## 25   hseval   409       1 147399 56070
## 26  nstores   410       0    3.1   1.8
## 27   income   409       1  47054 13179
## 28   county   410       0     14     8
## 29   lpsoda   402       8   0.04 0.085
## 30  lpfries   393      17 -0.088  0.12
## 31  lhseval   409       1     12  0.39
## 32  lincome   409       1     11  0.28
## 33 ldensity   409       1      8     1
## 34       NJ   410       0   0.81  0.39
## 35       BK   410       0   0.42  0.49
## 36      KFC   410       0    0.2   0.4
## 37       RR   410       0   0.24  0.43

Bu modeli OLS ile tahmin etme ve sonuçları n ve R-Kare dahil olmak üzere denklem biçiminde rapor etme

discrimregr <- lm(psoda ~ prpblck + income, data=discrim)
summary(discrimregr)
## 
## Call:
## lm(formula = psoda ~ prpblck + income, data = discrim)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.29401 -0.05242  0.00333  0.04231  0.44322 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.563e-01  1.899e-02  50.354  < 2e-16 ***
## prpblck     1.150e-01  2.600e-02   4.423 1.26e-05 ***
## income      1.603e-06  3.618e-07   4.430 1.22e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08611 on 398 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.06422,    Adjusted R-squared:  0.05952 
## F-statistic: 13.66 on 2 and 398 DF,  p-value: 1.835e-06

\(psoda=0.956+0.115prpblck+0.0000016income+u\)

prpblck %10 artarsa, soda fiyatının ekonomik olarak önemli olmayan derecede yaklaşık 1,15 cent artacağını gösterir.

Basit regresyon modeli

basitdiscrimregr <- lm(psoda~prpblck, data = discrim)
summary(basitdiscrimregr)
## 
## Call:
## lm(formula = psoda ~ prpblck, data = discrim)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30884 -0.05963  0.01135  0.03206  0.44840 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.03740    0.00519  199.87  < 2e-16 ***
## prpblck      0.06493    0.02396    2.71  0.00702 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0881 on 399 degrees of freedom
##   (9 observations deleted due to missingness)
## Multiple R-squared:  0.01808,    Adjusted R-squared:  0.01561 
## F-statistic: 7.345 on 1 and 399 DF,  p-value: 0.007015