1.Değişken adlarını Tanımları(Türkçe)
Küçük boy patates kızartması fiyatı
pentree: Yemeğin fiyatı (burger veya tavuk)
wagest: Başlangıç maaşı :
nregs: Kayıt sayısı
hrsopen: Açık kalma saati
emp: Çalışan sayısı
psoda2: Orta boy soda fiyatı
pfries2: Küçük boy patates kızartması fiyatı
pentree2: Yemeğin fiyatı
wagest2: Başlangıç maaşı
nmgrs2: Yönetici sayısı
nregs2: Kayıt sayısı
hrsopen2: Açık kalma saati
emp2: Çalışan sayısı
compown: Şirket sahipliği durumu = 1, eğer şirket sahibi chain: Zincir restoran (BK = 1, KFC = 2, Roy Rogers = 3, Wendy’s = 4)
density: Nüfus yoğunluğu, kasaba
crmrte: Suç oranı, kasaba
state: Eyalet (NJ = 1, PA = 2)
prpblck: Siyah oranı,
prppov: Yoksulluk oranı,
prpncar: Arabası olmayan oranı, oransal
hseval: Medyan konut değeri, oransal
nstores: Mağaza sayısı, oransal
income: Medyan aile geliri,oransal
county: İlçe etiketi
lpsoda: log(psoda)
lpfries: log(pfries)
lhseval: log(hseval)
lincome: log(income)
ldensity: log(density) NJ: =1, New Jersey BK: =1, Burger King ise KFC: =1, Kentucky Fried Chicken ise RR: =1, Roy Rogers ise
ortalama prpblck ve income değerlerini standart sapmalarıyla birlikte bulun. prpblck ve income ölçü birimleri nelerdir?
library(wooldridge)
data("discrim")
head(discrim)
## psoda pfries pentree wagest nmgrs nregs hrsopen emp psoda2 pfries2 pentree2
## 1 1.12 1.06 1.02 4.25 3 5 16.0 27.5 1.11 1.11 1.05
## 2 1.06 0.91 0.95 4.75 3 3 16.5 21.5 1.05 0.89 0.95
## 3 1.06 0.91 0.98 4.25 3 5 18.0 30.0 1.05 0.94 0.98
## 4 1.12 1.02 1.06 5.00 4 5 16.0 27.5 1.15 1.05 1.05
## 5 1.12 NA 0.49 5.00 3 3 16.0 5.0 1.04 1.01 0.58
## 6 1.06 0.95 1.01 4.25 4 4 15.0 17.5 1.05 0.94 1.00
## wagest2 nmgrs2 nregs2 hrsopen2 emp2 compown chain density crmrte state
## 1 5.05 5 5 15.0 27.0 1 3 4030 0.0528866 1
## 2 5.05 4 3 17.5 24.5 0 1 4030 0.0528866 1
## 3 5.05 4 5 17.5 25.0 0 1 11400 0.0360003 1
## 4 5.05 4 5 16.0 NA 0 3 8345 0.0484232 1
## 5 5.05 3 3 16.0 12.0 0 1 720 0.0615890 1
## 6 5.05 3 4 15.0 28.0 0 1 4424 0.0334823 1
## prpblck prppov prpncar hseval nstores income county lpsoda
## 1 0.1711542 0.0365789 0.0788428 148300 3 44534 18 0.11332869
## 2 0.1711542 0.0365789 0.0788428 148300 3 44534 18 0.05826885
## 3 0.0473602 0.0879072 0.2694298 169200 3 41164 12 0.05826885
## 4 0.0528394 0.0591227 0.1366903 171600 3 50366 10 0.11332869
## 5 0.0344800 0.0254145 0.0738020 249100 1 72287 10 0.11332869
## 6 0.0591327 0.0835001 0.1151341 148000 2 44515 18 0.05826885
## lpfries lhseval lincome ldensity NJ BK KFC RR
## 1 0.05826885 11.90699 10.70401 8.301521 1 0 0 1
## 2 -0.09431065 11.90699 10.70401 8.301521 1 1 0 0
## 3 -0.09431065 12.03884 10.62532 9.341369 1 1 0 0
## 4 0.01980261 12.05292 10.82707 9.029418 1 0 0 1
## 5 NA 12.42561 11.18840 6.579251 1 1 0 0
## 6 -0.05129331 11.90497 10.70358 8.394799 1 1 0 0
help("discrim")
## starting httpd help server ... done
mean(discrim$prpblck, na.rm = TRUE)
## [1] 0.1134864
sd(discrim$prpblck,na.rm = TRUE)
## [1] 0.1824165
mean(discrim$income, na.rm = TRUE)
## [1] 47053.78
sd(discrim$income, na.rm = TRUE)
## [1] 13179.29
library(vtable)
## Warning: package 'vtable' was built under R version 4.4.2
## Loading required package: kableExtra
## Warning: package 'kableExtra' was built under R version 4.4.2
sumtable(discrim, summ=c('notNA(x)', 'countNA(x)', 'mean(x)','sd(x)'),out='return')
## Variable NotNA CountNA Mean Sd
## 1 psoda 402 8 1 0.089
## 2 pfries 393 17 0.92 0.11
## 3 pentree 398 12 1.3 0.64
## 4 wagest 390 20 4.6 0.35
## 5 nmgrs 404 6 3.4 1
## 6 nregs 388 22 3.6 1.2
## 7 hrsopen 410 0 14 2.8
## 8 emp 404 6 18 9.4
## 9 psoda2 388 22 1 0.094
## 10 pfries2 382 28 0.94 0.11
## 11 pentree2 386 24 1.4 0.65
## 12 wagest2 389 21 5 0.25
## 13 nmgrs2 404 6 3.5 1.1
## 14 nregs2 388 22 3.6 1.2
## 15 hrsopen2 399 11 14 2.8
## 16 emp2 397 13 18 8.6
## 17 compown 410 0 0.34 0.48
## 18 chain 410 0 2.1 1.1
## 19 density 409 1 4562 5132
## 20 crmrte 409 1 0.053 0.047
## 21 state 410 0 1.2 0.39
## 22 prpblck 409 1 0.11 0.18
## 23 prppov 409 1 0.071 0.067
## 24 prpncar 409 1 0.11 0.12
## 25 hseval 409 1 147399 56070
## 26 nstores 410 0 3.1 1.8
## 27 income 409 1 47054 13179
## 28 county 410 0 14 8
## 29 lpsoda 402 8 0.04 0.085
## 30 lpfries 393 17 -0.088 0.12
## 31 lhseval 409 1 12 0.39
## 32 lincome 409 1 11 0.28
## 33 ldensity 409 1 8 1
## 34 NJ 410 0 0.81 0.39
## 35 BK 410 0 0.42 0.49
## 36 KFC 410 0 0.2 0.4
## 37 RR 410 0 0.24 0.43
BASİT REGRESYON
discrimreg <- lm(psoda~prpblck+income, data = discrim)
summary(discrimreg)
##
## Call:
## lm(formula = psoda ~ prpblck + income, data = discrim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29401 -0.05242 0.00333 0.04231 0.44322
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.563e-01 1.899e-02 50.354 < 2e-16 ***
## prpblck 1.150e-01 2.600e-02 4.423 1.26e-05 ***
## income 1.603e-06 3.618e-07 4.430 1.22e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08611 on 398 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.06422, Adjusted R-squared: 0.05952
## F-statistic: 13.66 on 2 and 398 DF, p-value: 1.835e-06
basitdiscrimreg <- lm(psoda~prpblck, data = discrim)
summary(basitdiscrimreg)
##
## Call:
## lm(formula = psoda ~ prpblck, data = discrim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.30884 -0.05963 0.01135 0.03206 0.44840
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.03740 0.00519 199.87 < 2e-16 ***
## prpblck 0.06493 0.02396 2.71 0.00702 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0881 on 399 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.01808, Adjusted R-squared: 0.01561
## F-statistic: 7.345 on 1 and 399 DF, p-value: 0.007015
logdiscrimreg <- lm(log(psoda)~prpblck+log(income), data = discrim)
summary(logdiscrimreg)
##
## Call:
## lm(formula = log(psoda) ~ prpblck + log(income), data = discrim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.33563 -0.04695 0.00658 0.04334 0.35413
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.79377 0.17943 -4.424 1.25e-05 ***
## prpblck 0.12158 0.02575 4.722 3.24e-06 ***
## log(income) 0.07651 0.01660 4.610 5.43e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0821 on 398 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.06809, Adjusted R-squared: 0.06341
## F-statistic: 14.54 on 2 and 398 DF, p-value: 8.039e-07
paste( (0.2*100)*0.122, "yüzdelik artış")
## [1] "2.44 yüzdelik artış"
logdiscrimregprpov <- lm(log(psoda)~prpblck+log(income)+prppov, data = discrim)
summary(logdiscrimregprpov)
##
## Call:
## lm(formula = log(psoda) ~ prpblck + log(income) + prppov, data = discrim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32218 -0.04648 0.00651 0.04272 0.35622
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.46333 0.29371 -4.982 9.4e-07 ***
## prpblck 0.07281 0.03068 2.373 0.0181 *
## log(income) 0.13696 0.02676 5.119 4.8e-07 ***
## prppov 0.38036 0.13279 2.864 0.0044 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08137 on 397 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.08696, Adjusted R-squared: 0.08006
## F-statistic: 12.6 on 3 and 397 DF, p-value: 6.917e-08
Model yorumu Psoda = -1.46 + prpblck0.07281 + log(income)0.13696+prppov*0.13279
Siyahi orani %1 artarken Soda fiyati %0.07 Artmaktadir, Gelir orani %1 yükseldiğinde ise soda fiyati %0.14 artmakta. Yoksulluk orani %1 yukselirken %0.13 yukselmektedir.Sonuç Gelir artınca Soda fiyatları daha fazla artmaktadır.Bölgedeki siyahi orani artınca soda fiyatları normale göre daha pahalı olmakta (oran başına %7 daha pahali) ## F- Testi
f_test <- var.test(log(discrim$psoda),discrim$prpblck)
print(f_test)
##
## F test to compare two variances
##
## data: log(discrim$psoda) and discrim$prpblck
## F = 0.21575, num df = 401, denom df = 408, p-value < 2.2e-16
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.1775050 0.2622688
## sample estimates:
## ratio of variances
## 0.2157458
F istatistiği 0.21575 olarak hesaplanmış, bu da varyans oranının 1’den küçük olduğunu göstermektedir. P-değeri ise 2.2e-16 gibi çok düşük bir değerdir, bu da null hipotezinin (varyansların eşit olduğu) reddedildiğini ve varyanslar arasında anlamlı bir fark olduğunu ortaya koymaktadır. Ayrıca, %95 güven aralığına bakıldığında, varyans oranının 0.1775 ile 0.2623 arasında olduğu belirtilmiştir. Bu da varyansların önemli derecede farklı olduğunu gösteren bir başka bulgudur. Sonuç olarak, log(psoda) ve prpblck değişkenlerinin varyanslarının birbirinden farklı olduğu sonucuna varılmaktadır.