Homework_1

Concerning the anorexia{MASS}, explain the difference between:

anorexia[,2]
anorexia["Prewt"]

and

lm(Postwt ~ Treat, data=anorexia)
lm(Postwt ~ as.numeric(Treat), data=anorexia)

Does the second output make sense in the context of the ‘anorexia’ data analysis?

Data structure of anorexia

library(MASS)
str(anorexia)
## 'data.frame':    72 obs. of  3 variables:
##  $ Treat : Factor w/ 3 levels "CBT","Cont","FT": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Prewt : num  80.7 89.4 91.8 74 78.1 88.3 87.3 75.1 80.6 78.4 ...
##  $ Postwt: num  80.2 80.1 86.4 86.3 76.1 78.1 75.1 86.7 73.5 84.6 ...
head(anorexia)
##   Treat Prewt Postwt
## 1  Cont  80.7   80.2
## 2  Cont  89.4   80.1
## 3  Cont  91.8   86.4
## 4  Cont  74.0   86.3
## 5  Cont  78.1   76.1
## 6  Cont  88.3   78.1
anorexia[,2]  #[row, column]: returns the values of the 2nd column
##  [1] 80.7 89.4 91.8 74.0 78.1 88.3 87.3 75.1 80.6 78.4 77.6 88.7 81.3 78.1 70.5
## [16] 77.3 85.2 86.0 84.1 79.7 85.5 84.4 79.6 77.5 72.3 89.0 80.5 84.9 81.5 82.6
## [31] 79.9 88.7 94.9 76.3 81.0 80.5 85.0 89.2 81.3 76.5 70.0 80.4 83.3 83.0 87.7
## [46] 84.2 86.4 76.5 80.2 87.8 83.3 79.7 84.5 80.8 87.4 83.8 83.3 86.0 82.5 86.7
## [61] 79.6 76.9 94.2 73.4 80.5 81.6 82.1 77.6 83.5 89.9 86.0 87.3
Prewt_2 <- anorexia[,2]  
anorexia["Prewt"]  #returns the column named Prewt
##    Prewt
## 1   80.7
## 2   89.4
## 3   91.8
## 4   74.0
## 5   78.1
## 6   88.3
## 7   87.3
## 8   75.1
## 9   80.6
## 10  78.4
## 11  77.6
## 12  88.7
## 13  81.3
## 14  78.1
## 15  70.5
## 16  77.3
## 17  85.2
## 18  86.0
## 19  84.1
## 20  79.7
## 21  85.5
## 22  84.4
## 23  79.6
## 24  77.5
## 25  72.3
## 26  89.0
## 27  80.5
## 28  84.9
## 29  81.5
## 30  82.6
## 31  79.9
## 32  88.7
## 33  94.9
## 34  76.3
## 35  81.0
## 36  80.5
## 37  85.0
## 38  89.2
## 39  81.3
## 40  76.5
## 41  70.0
## 42  80.4
## 43  83.3
## 44  83.0
## 45  87.7
## 46  84.2
## 47  86.4
## 48  76.5
## 49  80.2
## 50  87.8
## 51  83.3
## 52  79.7
## 53  84.5
## 54  80.8
## 55  87.4
## 56  83.8
## 57  83.3
## 58  86.0
## 59  82.5
## 60  86.7
## 61  79.6
## 62  76.9
## 63  94.2
## 64  73.4
## 65  80.5
## 66  81.6
## 67  82.1
## 68  77.6
## 69  83.5
## 70  89.9
## 71  86.0
## 72  87.3
Prewt_p <-anorexia["Prewt"]  

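Conclusions:

1. Columns run vertically and rows run horizontally (this is easy to forget).

2. anorexia[, 2] returns the values of the 2nd column.

3. anorexia["Prewt"] returns the column named Prewt.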

class(Prewt_2)
## [1] "numeric"
dim(Prewt_2)
## NULL
typeof(Prewt_2)
## [1] "double"
class(Prewt_p)
## [1] "data.frame"
dim(Prewt_p)
## [1] 72  1
typeof(Prewt_p)
## [1] "list"

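Conclusion on the anorexia data types:

anorexia[,2] and anorexia["Prewt"] extract the same values, but the data type and the printed form differ. anorexia[,2] is a numeric vector with no dimensions, while anorexia["Prewt"] is a 72 x 1 data.frame (stored internally as a list).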
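The difference can be checked directly. The sketch below compares a few ways of pulling out Prewt. The `[[`, `$`, and `drop = FALSE` forms are standard base R indexing and are not part of the assignment itself; the object names are only illustrative.

```r
library(MASS)  # provides the anorexia data frame

v1 <- anorexia[, 2]                # numeric vector: a single column is dropped to a vector
v2 <- anorexia[["Prewt"]]          # numeric vector, selected by name
v3 <- anorexia$Prewt               # numeric vector, same values as v2
d1 <- anorexia["Prewt"]            # data frame with one column
d2 <- anorexia[, 2, drop = FALSE]  # data frame with one column, selected by position

identical(v1, v2)    # TRUE: same vector either way
is.data.frame(d1)    # TRUE
is.data.frame(d2)    # TRUE
```

Use the vector forms when a function expects plain numbers (for example mean()), and the data frame forms when the column structure must be kept.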

#Regression of Postwt

lm(Postwt ~ Treat, data=anorexia)
## 
## Call:
## lm(formula = Postwt ~ Treat, data = anorexia)
## 
## Coefficients:
## (Intercept)    TreatCont      TreatFT  
##      85.697       -4.589        4.798
lm(Postwt ~ as.numeric(Treat), data=anorexia)
## 
## Call:
## lm(formula = Postwt ~ as.numeric(Treat), data = anorexia)
## 
## Coefficients:
##       (Intercept)  as.numeric(Treat)  
##            82.036              1.711

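Conclusions on the Postwt regression

1. The variable Treat is a factor in the data, with 3 levels: "CBT", "Cont", "FT".

2. Postwt ~ Treat treats Treat as a categorical variable, whereas Postwt ~ as.numeric(Treat) forces Treat to be numeric by replacing the levels with their integer codes (CBT = 1, Cont = 2, FT = 3).

3. The two calls give different regression equations:

(1) Postwt ~ Treat (reference level Treat = CBT): Postwt = 85.697 - 4.589 (if Treat = Cont) + 4.798 (if Treat = FT) + residual

(2) Postwt ~ as.numeric(Treat): Postwt = 82.036 + 1.711 x as.numeric(Treat) + residual

4. The second output does not make sense for this analysis: the codes 1, 2, 3 only reflect the alphabetical order of the level names, yet the model treats them as an equally spaced scale with a single slope. Since Treat is a factor, lm(Postwt ~ Treat, data=anorexia) is the appropriate model.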
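As a quick check of the points above, the sketch below lists the integer codes behind the factor and confirms that the coefficients of the factor model reproduce the three group means. The object names (codes, group_means, fit, fitted_means) are only illustrative.

```r
library(MASS)  # provides the anorexia data frame

# Integer codes that as.numeric() assigns to each level of Treat
codes <- setNames(seq_along(levels(anorexia$Treat)), levels(anorexia$Treat))
codes

# Mean Postwt within each treatment group
group_means <- tapply(anorexia$Postwt, anorexia$Treat, mean)

# In the factor model, the intercept is the CBT mean and the other
# coefficients are differences from CBT
fit <- lm(Postwt ~ Treat, data = anorexia)
b <- coef(fit)
fitted_means <- c(CBT  = b[["(Intercept)"]],
                  Cont = b[["(Intercept)"]] + b[["TreatCont"]],
                  FT   = b[["(Intercept)"]] + b[["TreatFT"]])

all.equal(as.numeric(group_means), as.numeric(fitted_means))  # TRUE
```

The numeric model has no such interpretation, because one slope cannot describe three unordered groups.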

Homework_2

Find all possible sums from rolling three dice using R. If possible, construct a histogram for the sum of three dice. Is this a probability histogram or an empirical histogram?

dice_2 <- outer(1:6, 1:6, '+') #outer: applies '+' to every pair of elements from the two vectors
dice_2
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    2    3    4    5    6    7
## [2,]    3    4    5    6    7    8
## [3,]    4    5    6    7    8    9
## [4,]    5    6    7    8    9   10
## [5,]    6    7    8    9   10   11
## [6,]    7    8    9   10   11   12
dice_3 <- outer(outer(1:6, 1:6,  '+'), 1:6, '+')  
dice_3
## , , 1
## 
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    3    4    5    6    7    8
## [2,]    4    5    6    7    8    9
## [3,]    5    6    7    8    9   10
## [4,]    6    7    8    9   10   11
## [5,]    7    8    9   10   11   12
## [6,]    8    9   10   11   12   13
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    4    5    6    7    8    9
## [2,]    5    6    7    8    9   10
## [3,]    6    7    8    9   10   11
## [4,]    7    8    9   10   11   12
## [5,]    8    9   10   11   12   13
## [6,]    9   10   11   12   13   14
## 
## , , 3
## 
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    5    6    7    8    9   10
## [2,]    6    7    8    9   10   11
## [3,]    7    8    9   10   11   12
## [4,]    8    9   10   11   12   13
## [5,]    9   10   11   12   13   14
## [6,]   10   11   12   13   14   15
## 
## , , 4
## 
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    6    7    8    9   10   11
## [2,]    7    8    9   10   11   12
## [3,]    8    9   10   11   12   13
## [4,]    9   10   11   12   13   14
## [5,]   10   11   12   13   14   15
## [6,]   11   12   13   14   15   16
## 
## , , 5
## 
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    7    8    9   10   11   12
## [2,]    8    9   10   11   12   13
## [3,]    9   10   11   12   13   14
## [4,]   10   11   12   13   14   15
## [5,]   11   12   13   14   15   16
## [6,]   12   13   14   15   16   17
## 
## , , 6
## 
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    8    9   10   11   12   13
## [2,]    9   10   11   12   13   14
## [3,]   10   11   12   13   14   15
## [4,]   11   12   13   14   15   16
## [5,]   12   13   14   15   16   17
## [6,]   13   14   15   16   17   18

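#Conclusions on the outer command:

1. outer(1:6, 1:6, '+') computes the sum for every pair in (1:6) x (1:6), giving a 6 x 6 matrix.

2. outer(outer(1:6, 1:6, '+'), 1:6, '+') computes the sum for every triple in [(1:6) x (1:6)] x (1:6), giving a 6 x 6 x 6 array with 216 entries.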
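The same enumeration can be cross-checked with expand.grid(), which lists every combination of the three dice as one row. This is an alternative sketch, not the approach used in this homework; the column names die1, die2, die3 are made up for illustration.

```r
# Every combination of three dice, one row per outcome
rolls <- expand.grid(die1 = 1:6, die2 = 1:6, die3 = 1:6)
nrow(rolls)                 # 216 possible outcomes

# Sum of each outcome and how often each sum occurs
sums <- rowSums(rolls)
counts <- table(sums)
counts

# Compare with the counts obtained from the nested outer() calls
from_outer <- table(as.vector(outer(outer(1:6, 1:6, "+"), 1:6, "+")))
all(as.vector(counts) == as.vector(from_outer))  # TRUE
```

The data frame form keeps track of which dice produced each sum, which the array from outer() does not show directly.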

dice_3matrix <- as.vector(dice_3)  #flatten the 6 x 6 x 6 array into a vector of 216 sums
dice_3matrix
##   [1]  3  4  5  6  7  8  4  5  6  7  8  9  5  6  7  8  9 10  6  7  8  9 10 11  7
##  [26]  8  9 10 11 12  8  9 10 11 12 13  4  5  6  7  8  9  5  6  7  8  9 10  6  7
##  [51]  8  9 10 11  7  8  9 10 11 12  8  9 10 11 12 13  9 10 11 12 13 14  5  6  7
##  [76]  8  9 10  6  7  8  9 10 11  7  8  9 10 11 12  8  9 10 11 12 13  9 10 11 12
## [101] 13 14 10 11 12 13 14 15  6  7  8  9 10 11  7  8  9 10 11 12  8  9 10 11 12
## [126] 13  9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16  7  8  9 10 11 12
## [151]  8  9 10 11 12 13  9 10 11 12 13 14 10 11 12 13 14 15 11 12 13 14 15 16 12
## [176] 13 14 15 16 17  8  9 10 11 12 13  9 10 11 12 13 14 10 11 12 13 14 15 11 12
## [201] 13 14 15 16 12 13 14 15 16 17 13 14 15 16 17 18
table(dice_3matrix)
## dice_3matrix
##  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 
##  1  3  6 10 15 21 25 27 27 25 21 15 10  6  3  1

#Conclusion on the sum of three dice: the table shows how many of the 216 outcomes give each sum from 3 to 18.

hist(x=dice_3matrix, 
     main="Histogram of the Sum of Three Dice",  #title of the plot
     xlab="Sum",                      #label of the x-axis
     ylab="Frequency")                #label of the y-axis

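#Conclusion on the histogram of three dice: the histogram is symmetric and bell-shaped, so it resembles a normal distribution (centered at 10.5 with a standard deviation of about 2.96), not the standard normal distribution. Because it is built from all 216 equally likely outcomes rather than from observed rolls, it is a probability histogram, not an empirical histogram.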
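To make the distinction concrete, the sketch below draws both kinds of histogram. The breaks at half-integers are an added choice so that each sum gets its own bar (the default breaks of hist() can merge neighbouring sums), and the 1000 simulated rolls with set.seed(1) are an assumption for illustration only.

```r
# Probability histogram: all 216 equally likely outcomes
sums <- as.vector(outer(outer(1:6, 1:6, "+"), 1:6, "+"))
prob <- table(sums) / length(sums)   # exact probability of each sum
breaks <- seq(2.5, 18.5, by = 1)     # one bin per integer sum

hist(sums, breaks = breaks, freq = FALSE,
     main = "Probability histogram", xlab = "Sum", ylab = "Probability")

# Empirical histogram: sums observed in simulated rolls
set.seed(1)
rolls <- replicate(1000, sum(sample(1:6, 3, replace = TRUE)))

hist(rolls, breaks = breaks, freq = FALSE,
     main = "Empirical histogram (1000 rolls)", xlab = "Sum",
     ylab = "Relative frequency")
```

With bins of width 1, the bar heights of the first plot equal the probabilities in prob, while the second plot only approximates them and changes with every new set of rolls.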

Homework_3

The IQ and behavior problem page has a dataset and a script of R code chunks. Generate a markdown file from the script to push the output in HTML for posting to course Moodle site. Explain what the code chunks do with your comments in the markdown file.

Load the IQ_Beh data

dta <- read.table("IQ_Beh.txt", header = TRUE, row.names = 1)
str(dta)
## 'data.frame':    94 obs. of  3 variables:
##  $ Dep: chr  "N" "N" "N" "N" ...
##  $ IQ : int  103 124 124 104 96 92 124 99 92 116 ...
##  $ BP : int  4 12 9 3 3 3 6 4 3 9 ...

IQ_Beh is a data.frame with 94 observations and 3 variables.

Dep: character

IQ: integer

BP: integer

head(dta)
##   Dep  IQ BP
## 1   N 103  4
## 2   N 124 12
## 3   N 124  9
## 4   N 104  3
## 5   D  96  3
## 6   N  92  3

Lists the first 6 rows of the data.

class(dta)
## [1] "data.frame"

dta is a data.frame, as str(dta) already reported.

nrow(dta) #number of rows
## [1] 94
ncol(dta) #number of columns
## [1] 3
dim(dta) 
## [1] 94  3

dim(dta) shows that the data frame has 94 rows and 3 columns (rows x columns).

names(dta) #names of the variables in the data frame
## [1] "Dep" "IQ"  "BP"

Lists the variable names of dta. There are three, consistent with the output of str.

is.vector(dta$BP)
## [1] TRUE

Checks whether BP is a vector; the result is TRUE.

dta[1, ] #[row, column]
##   Dep  IQ BP
## 1   N 103  4

Extracts the first row of the data.

dta[1:3, "IQ"]  #[rows 1 to 3, column "IQ"]
## [1] 103 124 124

Extracts rows 1 to 3 of the IQ column. The IQ values of observations 1 to 3 are 103, 124, and 124.

tail(dta[order(dta$BP), ]) #tail shows the last 6 rows by default
##    Dep  IQ BP
## 16   N  89 11
## 58   N 117 11
## 66   N 126 11
## 2    N 124 12
## 73   D  99 13
## 12   D  22 17
summary(dta$BP)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   3.250   5.000   5.649   7.000  17.000
max(dta$BP)
## [1] 17

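Sorts the data by BP from smallest to largest and lists the last 6 rows, that is, the 6 highest BP scores. max and summary confirm that the largest value is 17.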

tail(dta[order(-dta$BP), ], 4)  
##    Dep  IQ BP
## 77   N 124  1
## 80   N 121  1
## 24   N 106  0
## 75   N 122  0
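#order(-dta$BP) sorts from largest to smallest
#tail returns the last rows of the sorted data, in their sorted order
#4 means 4 rows; changing it to 10 gives 10 rows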

head(dta[order(-dta$BP), ], 4)  
##    Dep  IQ BP
## 12   D  22 17
## 73   D  99 13
## 2    N 124 12
## 11   N  99 11
dta_BP <-dta[order(-dta$BP), ]
dta_BP
##    Dep  IQ BP
## 12   D  22 17
## 73   D  99 13
## 2    N 124 12
## 11   N  99 11
## 16   N  89 11
## 58   N 117 11
## 66   N 126 11
## 15   D 100 10
## 29   D  84 10
## 37   N 120 10
## 46   N 101 10
## 56   N 112 10
## 68   N  89 10
## 83   N 110 10
## 3    N 124  9
## 10   N 116  9
## 17   N 125  9
## 18   N 127  9
## 20   N  48  9
## 36   N 109  9
## 22   N 118  8
## 54   N 128  8
## 74   N  99  8
## 14   N 117  7
## 33   N 124  7
## 34   N 110  7
## 42   N 115  7
## 45   N  92  7
## 50   N 127  7
## 51   N 113  7
## 55   N  86  7
## 79   N 114  7
## 92   D 101  7
## 7    N 124  6
## 13   D  81  6
## 31   D 101  6
## 38   N 127  6
## 39   N 103  6
## 61   D 139  6
## 89   N 110  6
## 93   D 121  6
## 19   N 112  5
## 23   D 107  5
## 26   N 117  5
## 40   N 118  5
## 41   N 117  5
## 49   N 119  5
## 63   N  96  5
## 64   D 111  5
## 70   N 134  5
## 71   N  93  5
## 76   N 106  5
## 81   N 119  5
## 88   D 102  5
## 1    N 103  4
## 8    N  99  4
## 25   D 129  4
## 28   N 118  4
## 32   N 141  4
## 43   N 119  4
## 44   N 117  4
## 57   N 115  4
## 60   N 110  4
## 62   N 117  4
## 65   N 118  4
## 69   N 102  4
## 78   N 100  4
## 82   N 108  4
## 84   N 127  4
## 90   N 114  4
## 4    N 104  3
## 5    D  96  3
## 6    N  92  3
## 9    N  92  3
## 35   N  98  3
## 48   N 144  3
## 53   N 103  3
## 59   N  99  3
## 86   N 107  3
## 21   N 139  2
## 27   N 123  2
## 30   N 117  2
## 47   N 119  2
## 52   N 127  2
## 67   N 126  2
## 85   N 118  2
## 87   D 123  2
## 91   N 118  2
## 94   N 114  2
## 72   N 115  1
## 77   N 124  1
## 80   N 121  1
## 24   N 106  0
## 75   N 122  0

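tail(dta[order(-dta$BP), ], 4) returns the last 4 rows of the data sorted by BP from largest to smallest, that is, the 4 lowest BP scores.
Changing it to head(dta[order(-dta$BP), ], 4) returns the 4 highest BP scores instead. Open question: why does switching between head and tail not reverse the order of the rows?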
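One way to see what is going on: head and tail only choose which end of the already sorted data to show, while the row order is set entirely by order(). The sketch below uses a small made-up data frame (toy, with hypothetical values) rather than IQ_Beh.txt, so it runs on its own.

```r
# Hypothetical toy data; the real analysis would use dta from IQ_Beh.txt
toy <- data.frame(id = 1:5, BP = c(4, 12, 0, 17, 9))

asc  <- toy[order(toy$BP), ]                     # smallest BP first
desc <- toy[order(toy$BP, decreasing = TRUE), ]  # largest BP first

head(desc, 2)   # the two highest scores, highest first
tail(desc, 2)   # the two lowest scores, still in descending order

# To reverse the rows that tail returns, reverse them explicitly
rev_tail <- tail(desc, 2)[2:1, ]
rev_tail
```

For a numeric column, order(-x) and order(x, decreasing = TRUE) sort the values in the same descending direction.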

with(dta, hist(IQ, xlab = "IQ", main = ""))

boxplot(BP ~ Dep, data = dta, 
        xlab = "Depression", 
        ylab = "Behavior problem score")

plot(IQ ~ BP, data = dta, pch = 20, 
     xlab = "Behavior problem score", ylab = "IQ")
grid()

knitr::include_graphics("homewok3_error.png")

#Question: adding col = dta$Dep here makes the plot fail…
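A likely cause is that Dep is stored as character (see str(dta) above), and "N" and "D" are not valid color names. The sketch below shows one possible fix: convert Dep to a factor and use its integer codes as colors. It uses a small made-up data frame (toy, with hypothetical values) so that it runs without IQ_Beh.txt.

```r
# Hypothetical toy data standing in for dta; Dep is character, as in str(dta)
toy <- data.frame(Dep = c("N", "D", "N", "D"),
                  IQ  = c(103, 96, 124, 81),
                  BP  = c(4, 3, 12, 6))

# A factor has integer codes (D = 1, N = 2) that can index the color palette
dep_f <- factor(toy$Dep)
dep_col <- as.integer(dep_f)

plot(toy$BP, toy$IQ, pch = 20, col = dep_col,
     xlab = "Behavior problem score", ylab = "IQ")
legend("topright", legend = levels(dep_f),
       col = seq_along(levels(dep_f)), pch = 20)
grid()
```

With the real data, the same idea would be col = as.integer(factor(dta$Dep)).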

plot(BP ~ IQ, data = dta, type = "n",
     ylab = "Behavior problem score", xlab = "IQ")
#set up an empty plot (type = "n") of BP against IQ: IQ on the x-axis, BP on the y-axis
text(dta$IQ, dta$BP, labels = dta$Dep, cex = 0.4)
#mark each point with its Dep value as the label, at size cex = 0.4
abline(lm(BP ~ IQ, data = dta, subset = Dep == "D"))
#draw the regression line of BP on IQ for Dep = D, using the default solid line
abline(lm(BP ~ IQ, data = dta, subset = Dep == "N"), lty = 3)

#draw the regression line of BP on IQ for Dep = N, using a dotted line (lty = 3)

Homework_4: Haven't finished yet

The usBirths2015.txt is a dataset of monthly births in the US in 2015. Summarize the number of births by season.

Homework_5: Haven't finished yet

Ten subjects read a paragraph consisting of seven sentences. The reading time (in seconds) for each sentence was the outcome measure. The predictors are the serial position of the sentence (Sp), the number of words in the sentence (Wrds), and the number of new arguments in the sentence (New). (a) Rank subjects by their reading speed. (b) Estimate, on average, how long it takes to read a word.