Chatterjee and Hadi (Regression Analysis by Example, 2006) provide a link to the right-to-work data set on their web page. Display the relationship between Income and Taxes.
fL <- "http://www1.aucegypt.edu/faculty/hadi/RABE5/Data5/P005.txt"
dta_1 <- read.csv(fL, sep='\t')
dplyr::glimpse(dta_1)
## Rows: 38
## Columns: 8
## $ City <chr> "Atlanta", "Austin", "Bakersfield", "Baltimore", "Baton Rouge",~
## $ COL <int> 169, 143, 339, 173, 99, 363, 253, 117, 294, 291, 170, 239, 174,~
## $ PD <int> 414, 239, 43, 951, 255, 1257, 834, 162, 229, 1886, 643, 1295, 3~
## $ URate <dbl> 13.6, 11.0, 23.7, 21.0, 16.0, 24.4, 39.2, 31.5, 18.2, 31.5, 29.~
## $ Pop <int> 1790128, 396891, 349874, 2147850, 411725, 3914071, 1326848, 162~
## $ Taxes <int> 5128, 4303, 4166, 5001, 3965, 4928, 4471, 4813, 4839, 5408, 463~
## $ Income <int> 2961, 1711, 2122, 4654, 1620, 5634, 7213, 5535, 7224, 6113, 480~
## $ RTWL <int> 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, ~
class(dta_1)
## [1] "data.frame"
summary(dta_1)
##      City                COL              PD             URate      
##  Length:38          Min.   : 99.0   Min.   :  43.0   Min.   : 6.50  
##  Class :character   1st Qu.:170.8   1st Qu.: 302.0   1st Qu.:17.82  
##  Mode  :character   Median :205.5   Median : 400.0   Median :24.05  
##                     Mean   :223.6   Mean   : 780.2   Mean   :24.22  
##                     3rd Qu.:266.5   3rd Qu.: 963.8   3rd Qu.:30.00  
##                     Max.   :381.0   Max.   :6908.0   Max.   :39.20  
##       Pop              Taxes          Income          RTWL       
##  Min.   : 162304   Min.   :3965   Min.   : 782   Min.   :0.0000  
##  1st Qu.: 497050   1st Qu.:4620   1st Qu.:3110   1st Qu.:0.0000  
##  Median :1408054   Median :4858   Median :4865   Median :0.0000  
##  Mean   :2040736   Mean   :4903   Mean   :4709   Mean   :0.2632  
##  3rd Qu.:2355462   3rd Qu.:5166   3rd Qu.:6082   3rd Qu.:0.7500  
##  Max.   :9561089   Max.   :6404   Max.   :8392   Max.   :1.0000  
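1. Files with the same .txt extension can still need different import calls (this one is tab-separated); it seems the right call has to be found by trial and error?
2. Before examining the association between Income and Taxes, first check the data types of the two variables, then take a quick look at the summary output: both are integer columns with no obviously odd values.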
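One way to cut down on the trial and error in note 1 is to look at the raw text before choosing a reader. The sketch below is only an illustration: so that it runs offline, it writes the first three rows shown in the glimpse() output above to a temporary file, assuming the real file has the same tab-separated layout with a header row. The names tmp, first_lines, n_tab, n_comma and dta_demo are made up for this example.

```r
# First three rows of the data, copied from the glimpse() output above and
# written to a temporary tab-delimited file (the layout is an assumption).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "City\tCOL\tPD\tURate\tPop\tTaxes\tIncome\tRTWL",
  "Atlanta\t169\t414\t13.6\t1790128\t5128\t2961\t1",
  "Austin\t143\t239\t11.0\t396891\t4303\t1711\t1",
  "Bakersfield\t339\t43\t23.7\t349874\t4166\t2122\t0"
), tmp)

# Step 1: read the raw text of the first two lines without parsing them.
first_lines <- readLines(tmp, n = 2)
print(first_lines)

# Step 2: count candidate delimiters in the header line.
header  <- first_lines[1]
n_tab   <- lengths(regmatches(header, gregexpr("\t", header, fixed = TRUE)))
n_comma <- lengths(regmatches(header, gregexpr(",", header, fixed = TRUE)))

# Step 3: pick the reader whose default separator matches.
# read.delim() is read.table() with sep = "\t" and header = TRUE.
dta_demo <- if (n_tab > n_comma) read.delim(tmp) else read.csv(tmp)
str(dta_demo)
```

With the real data, readLines(fL, n = 2) would play the role of step 1, and read.delim(fL) should be equivalent to the read.csv(fL, sep='\t') call used above.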
cor.test(dta_1$Income,dta_1$Taxes)
##
## Pearson's product-moment correlation
##
## data: dta_1$Income and dta_1$Taxes
## t = 0.33696, df = 36, p-value = 0.7381
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2684231 0.3691383
## sample estimates:
## cor
## 0.0560718
library(ggplot2)
ggplot(data = dta_1, aes(x = Income, y = Taxes)) +
geom_point() +
geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
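1. cor.test on Income and Taxes gives r = 0.056 (t = 0.33696, df = 36, p-value = 0.7381), so there is no evidence of a linear association between the two variables.
2. The ggplot scatter plot with a regression line added shows the same thing: no visible association between the two.
3. The plot is very intuitive; the lack of association can be judged directly by eye.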
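To see how the numbers in the cor.test output fit together, the sketch below recomputes them from the reported correlation alone. It uses the standard formulas for a Pearson correlation test (t = r·sqrt(df)/sqrt(1 − r²), and a confidence interval from Fisher's z transformation). The variable names are made up for this example, and only r and df are taken from the output above.

```r
r  <- 0.0560718   # sample correlation reported by cor.test() above
df <- 36          # degrees of freedom: n - 2, with n = 38 cities
n  <- df + 2

# t statistic and two-sided p-value for H0: true correlation = 0
t_stat  <- r * sqrt(df) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df)

# Share of the variance in Taxes that a straight line in Income accounts for
r_sq <- r^2

# 95% confidence interval via Fisher's z transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(c(t = t_stat, p = p_value, r_sq = r_sq), 4)
round(ci, 4)
```

The recomputed t, p-value and interval should agree with the printed output up to rounding. r² is about 0.003, so a straight line in Income accounts for roughly 0.3% of the variation in Taxes.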