Exercise_1

Chatterjee and Hadi (Regression Analysis by Example, 2006) provided a link to the right-to-work data set on their web page. Display the relationship between Income and Taxes.

Import the data

fL <- "http://www1.aucegypt.edu/faculty/hadi/RABE5/Data5/P005.txt"
dta_1 <- read.csv(fL, sep='\t')
dplyr::glimpse(dta_1)
## Rows: 38
## Columns: 8
## $ City   <chr> "Atlanta", "Austin", "Bakersfield", "Baltimore", "Baton Rouge",~
## $ COL    <int> 169, 143, 339, 173, 99, 363, 253, 117, 294, 291, 170, 239, 174,~
## $ PD     <int> 414, 239, 43, 951, 255, 1257, 834, 162, 229, 1886, 643, 1295, 3~
## $ URate  <dbl> 13.6, 11.0, 23.7, 21.0, 16.0, 24.4, 39.2, 31.5, 18.2, 31.5, 29.~
## $ Pop    <int> 1790128, 396891, 349874, 2147850, 411725, 3914071, 1326848, 162~
## $ Taxes  <int> 5128, 4303, 4166, 5001, 3965, 4928, 4471, 4813, 4839, 5408, 463~
## $ Income <int> 2961, 1711, 2122, 4654, 1620, 5634, 7213, 5535, 7224, 6113, 480~
## $ RTWL   <int> 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, ~
class(dta_1)
## [1] "data.frame"
summary(dta_1)
##      City                COL              PD             URate      
##  Length:38          Min.   : 99.0   Min.   :  43.0   Min.   : 6.50  
##  Class :character   1st Qu.:170.8   1st Qu.: 302.0   1st Qu.:17.82  
##  Mode  :character   Median :205.5   Median : 400.0   Median :24.05  
##                     Mean   :223.6   Mean   : 780.2   Mean   :24.22  
##                     3rd Qu.:266.5   3rd Qu.: 963.8   3rd Qu.:30.00  
##                     Max.   :381.0   Max.   :6908.0   Max.   :39.20  
##       Pop              Taxes          Income          RTWL       
##  Min.   : 162304   Min.   :3965   Min.   : 782   Min.   :0.0000  
##  1st Qu.: 497050   1st Qu.:4620   1st Qu.:3110   1st Qu.:0.0000  
##  Median :1408054   Median :4858   Median :4865   Median :0.0000  
##  Mean   :2040736   Mean   :4903   Mean   :4709   Mean   :0.2632  
##  3rd Qu.:2355462   3rd Qu.:5166   3rd Qu.:6082   3rd Qu.:0.7500  
##  Max.   :9561089   Max.   :6404   Max.   :8392   Max.   :1.0000

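Notes on importing the data:

1. Even among .txt files the import method differs from file to file (this one is tab-separated, hence sep='\t'); it seems the right call has to be found by trial and error?

2. Before examining the relationship between Income and Taxes, I need to know the data type of each variable (both are integers here), so a quick look at the summary output comes first.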
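One way to reduce the trial and error is to look at the raw text before choosing an import function. The sketch below is only an illustration: it writes a small tab-separated sample (three columns and two rows copied from the glimpse output above) to a temporary file so it runs offline, and the names `tmp`, `first_lines`, `is_tab_delimited` and `demo` are made up for this example.

```r
# Hypothetical stand-in for the downloaded file: a tiny tab-separated
# sample written to a temporary file so the sketch runs offline.
tmp <- tempfile(fileext = ".txt")
writeLines(c("City\tTaxes\tIncome",
             "Atlanta\t5128\t2961",
             "Austin\t4303\t1711"), tmp)

# Look at the raw text first; tabs show up as "\t" when printed.
first_lines <- readLines(tmp, n = 2)
print(first_lines)

# A tab on every inspected line suggests a tab-delimited file.
is_tab_delimited <- all(grepl("\t", first_lines, fixed = TRUE))

# read.delim() defaults to sep = "\t" and header = TRUE;
# otherwise fall back to whitespace-separated read.table().
demo <- if (is_tab_delimited) {
  read.delim(tmp)
} else {
  read.table(tmp, header = TRUE)
}

# Check the column types before any analysis.
str(demo)
```

For a remote file, `readLines()` also accepts a URL, so the same check could be run on `fL` directly, assuming the server is reachable.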

Relationship between Income and Taxes:

cor.test(dta_1$Income,dta_1$Taxes)
## 
##  Pearson's product-moment correlation
## 
## data:  dta_1$Income and dta_1$Taxes
## t = 0.33696, df = 36, p-value = 0.7381
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2684231  0.3691383
## sample estimates:
##       cor 
## 0.0560718
library(ggplot2)
ggplot(data = dta_1, aes(x = Income, y = Taxes)) + 
  geom_point() + 
  geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

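Notes on the relationship between Income and Taxes:

1. cor.test on Income and Taxes gives r = 0.056 (t = 0.33696, df = 36, p-value = 0.7381), so there is no evidence of a linear association between the two.

2. The ggplot scatter plot with a regression line added confirms it: the two variables show essentially no association.

3. The plot is very intuitive; the pattern can be judged at a glance.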
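To see where the test statistic in note 1 comes from, the t value and p-value can be recomputed by hand from the reported correlation. This is a minimal sketch using the standard formula t = r * sqrt(df) / sqrt(1 - r^2) for a Pearson correlation; it takes `r_obs` and `df_obs` from the cor.test output above, and all variable names are chosen for this illustration only.

```r
# Values copied from the cor.test output above.
r_obs  <- 0.0560718   # sample correlation
df_obs <- 36          # degrees of freedom, n - 2 with n = 38 cities

# t statistic for H0: true correlation equals 0.
t_stat <- r_obs * sqrt(df_obs) / sqrt(1 - r_obs^2)

# Two-sided p-value from the t distribution.
p_val <- 2 * pt(-abs(t_stat), df = df_obs)

# Share of the variance in Taxes that a straight line in Income explains.
r_squared <- r_obs^2

print(c(t = t_stat, p = p_val, r_squared = r_squared))
```

The recomputed t and p should agree with the cor.test output up to rounding, and the very small squared correlation is another way of saying that the fitted line in the plot is nearly flat.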