The notation women{datasets} indicates that a data object named women is in the datasets package. This package is loaded by default when R starts. Using women{datasets}, explain the difference between c(women) and c(as.matrix(women)).

women{datasets}

women  # data built into R's "datasets" package
##    height weight
## 1      58    115
## 2      59    117
## 3      60    120
## 4      61    123
## 5      62    126
## 6      63    129
## 7      64    132
## 8      65    135
## 9      66    139
## 10     67    142
## 11     68    146
## 12     69    150
## 13     70    154
## 14     71    159
## 15     72    164
names(women) # list the column names of women
## [1] "height" "weight"
head(women) # list the first six observations
##   height weight
## 1     58    115
## 2     59    117
## 3     60    120
## 4     61    123
## 5     62    126
## 6     63    129
class(women) # check the data structure of women
## [1] "data.frame"

Conclusion: women is a data.frame.

str()

str(women) 
## 'data.frame':    15 obs. of  2 variables:
##  $ height: num  58 59 60 61 62 63 64 65 66 67 ...
##  $ weight: num  115 117 120 123 126 129 132 135 139 142 ...
str(women$height)
##  num [1:15] 58 59 60 61 62 63 64 65 66 67 ...
str(women$weight)
##  num [1:15] 115 117 120 123 126 129 132 135 139 142 ...

summary()

summary(women)
##      height         weight     
##  Min.   :58.0   Min.   :115.0  
##  1st Qu.:61.5   1st Qu.:124.5  
##  Median :65.0   Median :135.0  
##  Mean   :65.0   Mean   :136.7  
##  3rd Qu.:68.5   3rd Qu.:148.0  
##  Max.   :72.0   Max.   :164.0

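Conclusion:
1. When you are unsure of a data object's structure, use str() or summary().
2. As shown above, str() describes the type and structure of the data, while summary() gives descriptive statistics for each column.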
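The str()/summary() distinction can be checked directly in base R. This sketch assumes the default quantile() method (type 7), which is what summary() uses for numeric columns: sapply(women, class) gives the kind of type information str() reports, while quantile() and mean() reproduce the summary() numbers for weight. The variable names are illustrative.

```r
# Type-oriented view (what str() emphasises): the class of each column
col_types <- sapply(women, class)
print(col_types)

# Statistics-oriented view (what summary() emphasises):
# min, quartiles and max from quantile(), plus the mean
weight_quartiles <- quantile(women$weight)  # default type = 7
weight_mean <- mean(women$weight)
print(weight_quartiles)
print(round(weight_mean, 1))
```

The quartiles printed here should match the weight column of the summary(women) output above.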

c()

c(women)

c(women)
## $height
##  [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## 
## $weight
##  [1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164
str(c(women))
## List of 2
##  $ height: num [1:15] 58 59 60 61 62 63 64 65 66 67 ...
##  $ weight: num [1:15] 115 117 120 123 126 129 132 135 139 142 ...

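c(women) lists the data of women column by column:
1. There are two variables, height and weight.
2. There are 15 observations, with no missing data in either variable.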
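The "no missing data" observation can also be verified programmatically instead of by eye. A minimal sketch using base R's is.na(), colSums() and nrow(); na_counts is an illustrative name:

```r
# is.na() on a data frame returns a logical matrix;
# colSums() counts the TRUE values (missing entries) per column
na_counts <- colSums(is.na(women))
print(na_counts)

# A single TRUE/FALSE answer for the whole data frame
print(any(is.na(women)))

# Number of observations (rows)
print(nrow(women))
```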

class(c(women))
## [1] "list"

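1. c(women) is a single list with two elements, the numeric vectors height and weight (not two separate lists). Because a data.frame is stored internally as a list of equal-length columns, c() drops the data.frame attributes (class and row.names) and keeps only the column names.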
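Why c() returns a list becomes visible when inspecting the object's internals: a data.frame is a list of columns plus a few attributes. This sketch compares the attributes before and after c(); cw is only an illustrative variable name.

```r
# A data.frame is stored internally as a list of equal-length columns
print(typeof(women))             # "list"
print(names(attributes(women)))  # includes "names", "class", "row.names"

# c() on that list keeps only the names attribute
cw <- c(women)
print(names(attributes(cw)))     # "names"
print(class(cw))                 # "list"
print(length(cw))                # 2: one element per column
```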

c(as.matrix())

c(as.matrix(women))

c(as.matrix(women))
##  [1]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72 115 117 120 123
## [20] 126 129 132 135 139 142 146 150 154 159 164
str(as.matrix(women))
##  num [1:15, 1:2] 58 59 60 61 62 63 64 65 66 67 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "height" "weight"

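1. as.matrix(women) has a two-dimensional matrix structure.
2. It is a 15 × 2 matrix of numbers (num), as indicated by [1:15, 1:2]. str() previews the first few values, and the row and column names are attached to the matrix as the dimnames attribute, which is a list: the row-name component is NULL and the column names are "height" and "weight".
3. The special attributes listed under ?attr include class, comment, dim, dimnames, names, row.names, tsp…; the one shown here is dimnames.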
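The attributes described above can be listed directly, along with what c() does to them. A short sketch; m and v are illustrative names:

```r
m <- as.matrix(women)

# The matrix carries dim and dimnames attributes
print(attributes(m))
print(dim(m))        # 15 2
print(rownames(m))   # NULL: women only has automatic row names
print(colnames(m))   # "height" "weight"

# c() drops dim and dimnames, leaving a bare numeric vector
v <- c(m)
print(attributes(v)) # NULL
print(length(v))     # 30
```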

class(as.matrix(women))
## [1] "matrix" "array"

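1. as.matrix(women) is a matrix and also an array (a two-dimensional matrix is the simplest kind of array). Since R 4.0.0, class() of a matrix returns both "matrix" and "array".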
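The order of the 30 numbers printed by c(as.matrix(women)) follows from how R stores matrices: column by column (column-major). This sketch checks that ordering and the index arithmetic behind it; i and j are arbitrary illustrative indices.

```r
m <- as.matrix(women)
v <- c(m)

# Both are TRUE for a two-dimensional matrix
print(is.matrix(m))
print(is.array(m))

# Column-major storage: all 15 heights first, then all 15 weights
print(identical(v, c(women$height, women$weight)))  # TRUE

# Element [i, j] sits at position i + (j - 1) * nrow(m) in the flat vector
i <- 3
j <- 2
print(m[i, j] == v[i + (j - 1) * nrow(m)])  # TRUE
```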

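Conclusion

c(women) and c(as.matrix(women)) contain the same 30 numbers, but R represents them differently. c(women) is a list of two named numeric vectors, one per column. c(as.matrix(women)) first converts women to a 15 × 2 numeric matrix, and c() then drops its dim and dimnames attributes, leaving a single unnamed numeric vector of length 30 in column-major order (the 15 heights followed by the 15 weights). This difference affects how the data can be processed later: for example, length() returns 2 for the first and 30 for the second. (How far-reaching the impact is, I am not yet sure…)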
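A few base R calls make the practical difference concrete. This is a sketch, not an exhaustive comparison: length() counts list elements versus values, sapply() works per element of the list, colMeans() works per column of the matrix, and mean() works on the flat vector. The names cw and cv are illustrative.

```r
cw <- c(women)              # list of 2 numeric vectors
cv <- c(as.matrix(women))   # one numeric vector of 30 values

print(length(cw))  # 2
print(length(cv))  # 30

# Per-column means: apply mean() to each list element ...
print(sapply(cw, mean))
# ... or use the column-wise helper on the matrix form
print(colMeans(as.matrix(women)))

# One mean over all 30 values is natural on the flat vector.
# (mean(cw) would instead return NA with a warning, since cw is a list.)
print(mean(cv))
```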