Concerning the anorexia{MASS}, explain the difference between:
anorexia[,2] anorexia["Prewt"]
and
lm(Postwt ~ Treat, data=anorexia) lm(Postwt ~ as.numeric(Treat), data=anorexia)
Does the second output make sense in the context of the ‘anorexia’ data analysis?
library(MASS)
str(anorexia)
## 'data.frame': 72 obs. of 3 variables:
## $ Treat : Factor w/ 3 levels "CBT","Cont","FT": 2 2 2 2 2 2 2 2 2 2 ...
## $ Prewt : num 80.7 89.4 91.8 74 78.1 88.3 87.3 75.1 80.6 78.4 ...
## $ Postwt: num 80.2 80.1 86.4 86.3 76.1 78.1 75.1 86.7 73.5 84.6 ...
head(anorexia)
## Treat Prewt Postwt
## 1 Cont 80.7 80.2
## 2 Cont 89.4 80.1
## 3 Cont 91.8 86.4
## 4 Cont 74.0 86.3
## 5 Cont 78.1 76.1
## 6 Cont 88.3 78.1
anorexia[,2] # [row, column]: returns the values of the 2nd column
## [1] 80.7 89.4 91.8 74.0 78.1 88.3 87.3 75.1 80.6 78.4 77.6 88.7 81.3 78.1 70.5
## [16] 77.3 85.2 86.0 84.1 79.7 85.5 84.4 79.6 77.5 72.3 89.0 80.5 84.9 81.5 82.6
## [31] 79.9 88.7 94.9 76.3 81.0 80.5 85.0 89.2 81.3 76.5 70.0 80.4 83.3 83.0 87.7
## [46] 84.2 86.4 76.5 80.2 87.8 83.3 79.7 84.5 80.8 87.4 83.8 83.3 86.0 82.5 86.7
## [61] 79.6 76.9 94.2 73.4 80.5 81.6 82.1 77.6 83.5 89.9 86.0 87.3
Prewt_2 <- anorexia[,2]
anorexia["Prewt"] # returns the column named Prewt
## Prewt
## 1 80.7
## 2 89.4
## 3 91.8
## 4 74.0
## 5 78.1
## 6 88.3
## 7 87.3
## 8 75.1
## 9 80.6
## 10 78.4
## 11 77.6
## 12 88.7
## 13 81.3
## 14 78.1
## 15 70.5
## 16 77.3
## 17 85.2
## 18 86.0
## 19 84.1
## 20 79.7
## 21 85.5
## 22 84.4
## 23 79.6
## 24 77.5
## 25 72.3
## 26 89.0
## 27 80.5
## 28 84.9
## 29 81.5
## 30 82.6
## 31 79.9
## 32 88.7
## 33 94.9
## 34 76.3
## 35 81.0
## 36 80.5
## 37 85.0
## 38 89.2
## 39 81.3
## 40 76.5
## 41 70.0
## 42 80.4
## 43 83.3
## 44 83.0
## 45 87.7
## 46 84.2
## 47 86.4
## 48 76.5
## 49 80.2
## 50 87.8
## 51 83.3
## 52 79.7
## 53 84.5
## 54 80.8
## 55 87.4
## 56 83.8
## 57 83.3
## 58 86.0
## 59 82.5
## 60 86.7
## 61 79.6
## 62 76.9
## 63 94.2
## 64 73.4
## 65 80.5
## 66 81.6
## 67 82.1
## 68 77.6
## 69 83.5
## 70 89.9
## 71 86.0
## 72 87.3
Prewt_p <-anorexia["Prewt"]
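Conclusions:
1. Columns run vertically and rows run horizontally (this is easy to forget).
2. [, 2] returns the values of the 2nd column.
3. ["Prewt"] returns the column named Prewt.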
class(Prewt_2)
## [1] "numeric"
dim(Prewt_2)
## NULL
typeof(Prewt_2)
## [1] "double"
class(Prewt_p)
## [1] "data.frame"
dim(Prewt_p)
## [1] 72 1
typeof(Prewt_p)
## [1] "list"
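anorexia[,2] and anorexia["Prewt"] extract the same values, but the data type and the way the values are presented differ: the first is a plain numeric vector with no dimensions, while the second is a one-column data frame (72 x 1).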
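The difference between the two forms of indexing can be checked directly. The following is a minimal sketch using only the `anorexia` data from MASS; the object names `v`, `d`, `v2` and `d2` are made up for this illustration.

```r
library(MASS)  # provides the anorexia data frame

v  <- anorexia[, 2]                # matrix-style indexing: drops to a numeric vector
d  <- anorexia["Prewt"]            # list-style indexing: stays a one-column data frame
v2 <- anorexia[["Prewt"]]          # double brackets extract the column itself
d2 <- anorexia[, 2, drop = FALSE]  # drop = FALSE keeps the data frame structure

identical(v, v2)   # the two vectors hold the same values
class(d2)          # "data.frame"
mean(v)            # mean() works directly on the vector
mean(d[[1]])       # for the data frame, extract the column first
```

Which form is preferable depends on what comes next: functions such as `mean()` expect a vector, while functions that work on data frames are easier to use with the one-column data frame.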
# Regression of Postwt
lm(Postwt ~ Treat, data=anorexia)
##
## Call:
## lm(formula = Postwt ~ Treat, data = anorexia)
##
## Coefficients:
## (Intercept) TreatCont TreatFT
## 85.697 -4.589 4.798
lm(Postwt ~ as.numeric(Treat), data=anorexia)
##
## Call:
## lm(formula = Postwt ~ as.numeric(Treat), data = anorexia)
##
## Coefficients:
## (Intercept) as.numeric(Treat)
## 82.036 1.711
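1. The variable Treat is a factor in the data, with 3 levels "CBT", "Cont", "FT".
2. Postwt ~ Treat treats Treat as a categorical variable, whereas Postwt ~ as.numeric(Treat) forces Treat to be numeric (CBT = 1, Cont = 2, FT = 3).
3. The two fitted regression equations differ:
(1) Postwt ~ Treat (reference level Treat = CBT): Postwt = 85.697 - 4.589 (if TreatCont = 1) + 4.798 (if TreatFT = 1) + residual
(2) Postwt ~ as.numeric(Treat): Postwt = 82.036 + 1.711 x as.numeric(Treat) + residual
4. Because Treat is a nominal factor, lm(Postwt ~ Treat, data=anorexia) is the appropriate model. The second output does not make sense for these data: the codes 1, 2, 3 only reflect the alphabetical order of the levels, and the single slope 1.711 assumes that the change from CBT to Cont equals the change from Cont to FT, which has no meaning for unordered treatments.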
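The interpretation of the factor model can be verified numerically. The sketch below assumes R's default treatment contrasts and rebuilds the three group means of Postwt from the coefficients; the names `fit_factor`, `group_means`, `b` and `rebuilt` are illustrative.

```r
library(MASS)

levels(anorexia$Treat)                              # "CBT" "Cont" "FT"
table(as.numeric(anorexia$Treat), anorexia$Treat)   # numeric codes 1, 2, 3 follow the level order

fit_factor  <- lm(Postwt ~ Treat, data = anorexia)
group_means <- tapply(anorexia$Postwt, anorexia$Treat, mean)

# With treatment contrasts the intercept is the CBT mean, and the other
# coefficients are differences from the CBT mean.
b <- coef(fit_factor)
rebuilt <- c(CBT  = b[["(Intercept)"]],
             Cont = b[["(Intercept)"]] + b[["TreatCont"]],
             FT   = b[["(Intercept)"]] + b[["TreatFT"]])

all.equal(unname(rebuilt), as.vector(group_means))  # coefficients reproduce the group means
```

The numeric-coded model has only one slope, so it cannot reproduce three arbitrary group means in this way.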
Find all possible sums from rolling three dice using R. If possible, construct a histogram for the sum of three dice. Is this a probability histogram or an empirical histogram?
dice_2 <- outer(1:6, 1:6, '+') # outer(): applies '+' to every pair of elements, giving a 6 x 6 matrix of sums
dice_2
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 2 3 4 5 6 7
## [2,] 3 4 5 6 7 8
## [3,] 4 5 6 7 8 9
## [4,] 5 6 7 8 9 10
## [5,] 6 7 8 9 10 11
## [6,] 7 8 9 10 11 12
dice_3 <- outer(outer(1:6, 1:6, '+'), 1:6, '+')
dice_3
## , , 1
##
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 3 4 5 6 7 8
## [2,] 4 5 6 7 8 9
## [3,] 5 6 7 8 9 10
## [4,] 6 7 8 9 10 11
## [5,] 7 8 9 10 11 12
## [6,] 8 9 10 11 12 13
##
## , , 2
##
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 4 5 6 7 8 9
## [2,] 5 6 7 8 9 10
## [3,] 6 7 8 9 10 11
## [4,] 7 8 9 10 11 12
## [5,] 8 9 10 11 12 13
## [6,] 9 10 11 12 13 14
##
## , , 3
##
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 5 6 7 8 9 10
## [2,] 6 7 8 9 10 11
## [3,] 7 8 9 10 11 12
## [4,] 8 9 10 11 12 13
## [5,] 9 10 11 12 13 14
## [6,] 10 11 12 13 14 15
##
## , , 4
##
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 6 7 8 9 10 11
## [2,] 7 8 9 10 11 12
## [3,] 8 9 10 11 12 13
## [4,] 9 10 11 12 13 14
## [5,] 10 11 12 13 14 15
## [6,] 11 12 13 14 15 16
##
## , , 5
##
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 7 8 9 10 11 12
## [2,] 8 9 10 11 12 13
## [3,] 9 10 11 12 13 14
## [4,] 10 11 12 13 14 15
## [5,] 11 12 13 14 15 16
## [6,] 12 13 14 15 16 17
##
## , , 6
##
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 8 9 10 11 12 13
## [2,] 9 10 11 12 13 14
## [3,] 10 11 12 13 14 15
## [4,] 11 12 13 14 15 16
## [5,] 12 13 14 15 16 17
## [6,] 13 14 15 16 17 18
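# Conclusions on outer():
1. outer(1:6, 1:6, '+') computes the sum for every pair in (1:6) x (1:6), i.e. all sums of two dice.
2. outer(outer(1:6, 1:6, '+'), 1:6, '+') computes the sum for every triple in [(1:6) x (1:6)] x (1:6), i.e. all sums of three dice, stored as a 6 x 6 x 6 array.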
dice_3matrix <- c(as.matrix(dice_3))
dice_3matrix
## [1] 3 4 5 6 7 8 4 5 6 7 8 9 5 6 7 8 9 10 6 7 8 9 10 11 7
## [26] 8 9 10 11 12 8 9 10 11 12 13 4 5 6 7 8 9 5 6 7 8 9 10 6 7
## [51] 8 9 10 11 7 8 9 10 11 12 8 9 10 11 12 13 9 10 11 12 13 14 5 6 7
## [76] 8 9 10 6 7 8 9 10 11 7 8 9 10 11 12 8 9 10 11 12 13 9 10 11 12
## [101] 13 14 10 11 12 13 14 15 6 7 8 9 10 11 7 8 9 10 11 12 8 9 10 11 12
## [126] 13 9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16 7 8 9 10 11 12
## [151] 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16 12
## [176] 13 14 15 16 17 8 9 10 11 12 13 9 10 11 12 13 14 10 11 12 13 14 15 11 12
## [201] 13 14 15 16 12 13 14 15 16 17 13 14 15 16 17 18
table(dice_3matrix)
## dice_3matrix
## 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## 1 3 6 10 15 21 25 27 27 25 21 15 10 6 3 1
# Conclusion on the sum of three dice: the table shows how many of the 216 outcomes give each sum from 3 to 18.
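The counts in the table can be turned into exact probabilities by dividing by the number of outcomes. This is a small sketch that assumes fair dice, so that each of the 216 ordered outcomes is equally likely; the names `sums`, `counts` and `probs` are illustrative.

```r
sums   <- c(outer(outer(1:6, 1:6, "+"), 1:6, "+"))  # all 6^3 = 216 ordered outcomes
counts <- table(sums)
probs  <- counts / length(sums)   # each ordered outcome has probability 1/216

probs[c("10", "11")]                    # the two most likely sums, 27/216 each
sum(probs)                              # probabilities add up to 1
sum(as.numeric(names(probs)) * probs)   # expected value of the sum, 10.5
```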
hist(x=dice_3matrix,
main="Histogram of Sum of Three Dice", # plot title
xlab="Sum", # x-axis label
ylab="Frequency") # y-axis label
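# Conclusion on the histogram of three dice: the plot is symmetric and bell-shaped, roughly like a normal distribution centred at 10.5 (not a standard normal, which would have mean 0 and standard deviation 1). Because it is built from all 216 equally likely outcomes rather than from observed rolls, it is a probability histogram, not an empirical one.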
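To make the distinction between the two kinds of histogram concrete, the sketch below draws one of each. The seed, the number of simulated rolls (1000) and the explicit `breaks` are choices made for this illustration; the breaks are set by hand because the default breaks chosen by `hist()` need not give one bin per integer sum.

```r
set.seed(1)  # arbitrary seed, only so the simulation is reproducible

# Probability histogram: exact distribution from all 216 ordered outcomes.
exact  <- c(outer(outer(1:6, 1:6, "+"), 1:6, "+"))
breaks <- seq(2.5, 18.5, by = 1)   # one bin per integer sum, centred on the integer
h_exact <- hist(exact, breaks = breaks, freq = FALSE,
                main = "Probability histogram", xlab = "Sum")

# Empirical histogram: 1000 simulated rolls of three dice.
rolls <- replicate(1000, sum(sample(1:6, 3, replace = TRUE)))
h_emp <- hist(rolls, breaks = breaks, freq = FALSE,
              main = "Empirical histogram (1000 rolls)", xlab = "Sum")
```

The first plot is the same every time it is drawn, while the second changes with the seed and approaches the first as the number of rolls grows.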
The IQ and behavior problem page has a dataset and a script of R code chunks. Generate a markdown file from the script to push the output in HTML for posting to course Moodle site. Explain what the code chunks do with your comments in the markdown file.
dta <- read.table("IQ_Beh.txt", header = T, row.names = 1)
str(dta)
## 'data.frame': 94 obs. of 3 variables:
## $ Dep: chr "N" "N" "N" "N" ...
## $ IQ : int 103 124 124 104 96 92 124 99 92 116 ...
## $ BP : int 4 12 9 3 3 3 6 4 3 9 ...
The IQ_Beh data are stored in a data.frame with 94 observations and 3 variables.
Dep: character
IQ: integer
BP: integer
head(dta)
## Dep IQ BP
## 1 N 103 4
## 2 N 124 12
## 3 N 124 9
## 4 N 104 3
## 5 D 96 3
## 6 N 92 3
Lists the first 6 rows of the data.
class(dta)
## [1] "data.frame"
dta is a data.frame, as str(dta) already reported.
nrow(dta) #number of row
## [1] 94
ncol(dta) #number of column
## [1] 3
dim(dta)
## [1] 94 3
dim(dta) shows that the data frame has 94 rows and 3 columns (row x column).
names(dta) # variable (column) names of the data frame
## [1] "Dep" "IQ" "BP"
Lists the variable names of dta: there are three, consistent with the str() output.
is.vector(dta$BP)
## [1] TRUE
Checks whether the BP column is a vector; the result is TRUE.
dta[1, ] #[row, column]
## Dep IQ BP
## 1 N 103 4
Extracts the first row of the data.
dta[1:3, "IQ"] #[row, column"IQ"]
## [1] 103 124 124
Extracts rows 1 to 3 of the IQ column: the IQ values of observations 1-3 are 103, 124, 124.
tail(dta[order(dta$BP), ]) # tail() shows the last 6 rows by default
## Dep IQ BP
## 16 N 89 11
## 58 N 117 11
## 66 N 126 11
## 2 N 124 12
## 73 D 99 13
## 12 D 22 17
summary(dta$BP)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 3.250 5.000 5.649 7.000 17.000
max(dta$BP)
## [1] 17
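Sorts the data by BP from smallest to largest and lists the last 6 rows, i.e. the 6 largest BP values; max() and summary() confirm that the largest value is 17.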
tail(dta[order(-dta$BP), ], 4)
## Dep IQ BP
## 77 N 124 1
## 80 N 121 1
## 24 N 106 0
## 75 N 122 0
# order(-dta$BP) sorts from largest to smallest
# tail() returns the last rows of the sorted data
# 4 means 4 rows; changing it to 10 gives 10 rows
head(dta[order(-dta$BP), ], 4)
## Dep IQ BP
## 12 D 22 17
## 73 D 99 13
## 2 N 124 12
## 11 N 99 11
dta_BP <-dta[order(-dta$BP), ]
dta_BP
## Dep IQ BP
## 12 D 22 17
## 73 D 99 13
## 2 N 124 12
## 11 N 99 11
## 16 N 89 11
## 58 N 117 11
## 66 N 126 11
## 15 D 100 10
## 29 D 84 10
## 37 N 120 10
## 46 N 101 10
## 56 N 112 10
## 68 N 89 10
## 83 N 110 10
## 3 N 124 9
## 10 N 116 9
## 17 N 125 9
## 18 N 127 9
## 20 N 48 9
## 36 N 109 9
## 22 N 118 8
## 54 N 128 8
## 74 N 99 8
## 14 N 117 7
## 33 N 124 7
## 34 N 110 7
## 42 N 115 7
## 45 N 92 7
## 50 N 127 7
## 51 N 113 7
## 55 N 86 7
## 79 N 114 7
## 92 D 101 7
## 7 N 124 6
## 13 D 81 6
## 31 D 101 6
## 38 N 127 6
## 39 N 103 6
## 61 D 139 6
## 89 N 110 6
## 93 D 121 6
## 19 N 112 5
## 23 D 107 5
## 26 N 117 5
## 40 N 118 5
## 41 N 117 5
## 49 N 119 5
## 63 N 96 5
## 64 D 111 5
## 70 N 134 5
## 71 N 93 5
## 76 N 106 5
## 81 N 119 5
## 88 D 102 5
## 1 N 103 4
## 8 N 99 4
## 25 D 129 4
## 28 N 118 4
## 32 N 141 4
## 43 N 119 4
## 44 N 117 4
## 57 N 115 4
## 60 N 110 4
## 62 N 117 4
## 65 N 118 4
## 69 N 102 4
## 78 N 100 4
## 82 N 108 4
## 84 N 127 4
## 90 N 114 4
## 4 N 104 3
## 5 D 96 3
## 6 N 92 3
## 9 N 92 3
## 35 N 98 3
## 48 N 144 3
## 53 N 103 3
## 59 N 99 3
## 86 N 107 3
## 21 N 139 2
## 27 N 123 2
## 30 N 117 2
## 47 N 119 2
## 52 N 127 2
## 67 N 126 2
## 85 N 118 2
## 87 D 123 2
## 91 N 118 2
## 94 N 114 2
## 72 N 115 1
## 77 N 124 1
## 80 N 121 1
## 24 N 106 0
## 75 N 122 0
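tail(dta[order(-dta$BP), ], 4) sorts the data by BP from largest to smallest and lists the last 4 rows, i.e. the 4 smallest BP values.
Changing it to head(dta[order(-dta$BP), ], 4) does not list those same rows in reverse order; it lists the first 4 rows of the sorted data, i.e. the 4 largest BP values.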
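The behaviour of `head()` and `tail()` on a sorted data frame, and one way to list rows in reverse, can be sketched as follows. The small data frame `dta6` only re-enters the six rows printed by `head(dta)` earlier, so the example runs without the IQ_Beh.txt file; `desc`, `desc2`, `bottom2` and `rev_bottom` are illustrative names.

```r
# First six rows of the data, copied from the head(dta) output.
dta6 <- data.frame(Dep = c("N", "N", "N", "N", "D", "N"),
                   IQ  = c(103, 124, 124, 104, 96, 92),
                   BP  = c(4, 12, 9, 3, 3, 3))

desc  <- dta6[order(-dta6$BP), ]                    # largest BP first
desc2 <- dta6[order(dta6$BP, decreasing = TRUE), ]  # BP sorted the same way, stated explicitly

head(desc, 2)   # the 2 largest BP values, largest first
tail(desc, 2)   # the last 2 rows, i.e. the smallest BP values

# Reversing the row order of a selection is done by reversing its row index.
bottom2    <- tail(desc, 2)
rev_bottom <- bottom2[rev(seq_len(nrow(bottom2))), ]
rev_bottom
```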
with(dta, hist(IQ, xlab = "IQ", main = ""))
boxplot(BP ~ Dep, data = dta,
xlab = "Depression",
ylab = "Behavior problem score")
plot(IQ ~ BP, data = dta, pch = 20,
xlab = "Behavior problem score", ylab = "IQ")
grid()
knitr::include_graphics("homewok3_error.png")
# Question: adding col = dta$Dep here makes the plot fail…
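A likely reason for the failure is that `str(dta)` reports Dep as a character variable, so `col = dta$Dep` hands the strings "N" and "D" to `plot()` as colour names, and neither is a valid colour. The sketch below maps each group to a real colour instead. It uses only the six rows printed by `head(dta)` so that it runs without the data file, and the colours red and blue are an arbitrary choice for this illustration.

```r
# First six rows of the data, copied from the head(dta) output.
dta6 <- data.frame(Dep = c("N", "N", "N", "N", "D", "N"),
                   IQ  = c(103, 124, 124, 104, 96, 92),
                   BP  = c(4, 12, 9, 3, 3, 3))

# Look up one colour per observation from a named vector of group colours.
group_cols <- c(D = "red", N = "blue")
point_cols <- unname(group_cols[dta6$Dep])

plot(IQ ~ BP, data = dta6, pch = 20, col = point_cols,
     xlab = "Behavior problem score", ylab = "IQ")
legend("topright", legend = names(group_cols), col = group_cols,
       pch = 20, title = "Depression")
grid()
```

The same lookup should work on the full data by replacing `dta6` with `dta`, provided Dep contains only the values "D" and "N".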
plot(BP ~ IQ, data = dta, type = "n",
ylab = "Behavior problem score", xlab = "IQ")
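# Set up an empty plot of BP against IQ: x is IQ, y is BP
text(dta$IQ, dta$BP, labels = dta$Dep, cex = 0.4)
# Mark each point with its Dep value as the label, at size cex = 0.4
abline(lm(BP ~ IQ, data = dta, subset = Dep == "D"))
# Add the regression line of BP on IQ for Dep = D, drawn with the default solid line
abline(lm(BP ~ IQ, data = dta, subset = Dep == "N"), lty = 3)
# Add the regression line of BP on IQ for Dep = N, drawn with line type lty = 3 (dotted)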
The usBirths2015.txt is a dataset of monthly births in the US in 2015. Summarize the number of births by season.
Ten subjects read a paragraph consisting of seven sentences. The reading time (in seconds) for each sentence was the outcome measure. The predictors are the serial position of the sentence (Sp), the number of words in the sentence (Wrds), and the number of new arguments in the sentence (New). (a) Rank subjects by their reading speed. (b) Estimate how long, on average, it takes to read a word.