1

Concerning the anorexia{MASS}, explain the difference between anorexia[,2] anorexia[“Prewt”]

and lm(Postwt ~ Treat, data=anorexia) lm(Postwt ~ as.numeric(Treat), data=anorexia)

library(MASS)
# str(anorexia)

# 擷取anorexia資料中的第2欄位,擷取出來的是vector
head(anorexia[,2])
## [1] 80.7 89.4 91.8 74.0 78.1 88.3
# is.vector(anorexia[,2])
# TRUE

# 擷取anorexia資料中的Prewt欄位,擷取出來的是data frame
head(anorexia["Prewt"])
##   Prewt
## 1  80.7
## 2  89.4
## 3  91.8
## 4  74.0
## 5  78.1
## 6  88.3
# is.data.frame(anorexia["Prewt"])
# TRUE

# Although the Treat is integer type
typeof(anorexia$Treat)
## [1] "integer"
## [1] "integer"

# but the structure show Treat is a factor with three level: "CBT","Cont","FT"
str(anorexia)
## 'data.frame':    72 obs. of  3 variables:
##  $ Treat : Factor w/ 3 levels "CBT","Cont","FT": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Prewt : num  80.7 89.4 91.8 74 78.1 88.3 87.3 75.1 80.6 78.4 ...
##  $ Postwt: num  80.2 80.1 86.4 86.3 76.1 78.1 75.1 86.7 73.5 84.6 ...
# Treat as a factor type
lm(Postwt ~ Treat, data=anorexia)  
## 
## Call:
## lm(formula = Postwt ~ Treat, data = anorexia)
## 
## Coefficients:
## (Intercept)    TreatCont      TreatFT  
##      85.697       -4.589        4.798
# Treat as a numeric type
# this is definitely not a proper data analysis for the context of the 'anorexia' 
lm(Postwt ~ as.numeric(Treat), data=anorexia)
## 
## Call:
## lm(formula = Postwt ~ as.numeric(Treat), data = anorexia)
## 
## Coefficients:
##       (Intercept)  as.numeric(Treat)  
##            82.036              1.711

2

Find all possible sums from rolling three dice using R. If possible, construct a histogram for the sum of three dice. Is this a probability histogram or an empirical histogram?

2-1 method from Google

set.seed(1234)
# create a function for dice sampling 
dice.sum <- function(n.dice)
  {
  dice <- sample(1:6, size = n.dice, replace = TRUE)
  return(sum(dice))
  }
# all possible sums from rolling three dice based on 10000 rolls
dice <- replicate(10000, dice.sum(3))
ftable(dice)
## dice    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
##                                                                                     
##        36  131  286  465  723  968 1148 1229 1257 1138 1002  688  453  289  144   43
# histogram plot
dice |> hist(prob=T, main="")

## 2-2 method from class

sample(3:18, size=10000, replace=T, 
       prob=c(1/216, 3/216, 6/216, 10/216, 15/216, 21/216, 25/216, 27/216, 
              27/216, 25/216, 21/216, 15/216, 10/216, 6/216, 3/216, 1/216)) |>
  hist(prob=T, main="", xlab ="")

# Probability histogram represents "chance".

3

The IQ and behavior problem page has a dataset and a script of R code chunks. Generate a markdown file from the script to push the output in HTML for posting to course Moodle site. Explain what the code chunks do with your comments in the markdown file.

# An R script for IQ_Beh data set
dta <- read.table("C:/Users/Queen/Desktop/IQ_Beh.txt", header = T, row.names = 1)

#
str(dta)
## 'data.frame':    94 obs. of  3 variables:
##  $ Dep: chr  "N" "N" "N" "N" ...
##  $ IQ : int  103 124 124 104 96 92 124 99 92 116 ...
##  $ BP : int  4 12 9 3 3 3 6 4 3 9 ...
#
head(dta)
##   Dep  IQ BP
## 1   N 103  4
## 2   N 124 12
## 3   N 124  9
## 4   N 104  3
## 5   D  96  3
## 6   N  92  3
#
class(dta)
## [1] "data.frame"
#
dim(dta)
## [1] 94  3
#
names(dta)
## [1] "Dep" "IQ"  "BP"
#
is.vector(dta$BP)
## [1] TRUE
#
dta[1, ]
##   Dep  IQ BP
## 1   N 103  4
#
dta[1:3, "IQ"]
## [1] 103 124 124
#
tail(dta[order(dta$BP), ])
##    Dep  IQ BP
## 16   N  89 11
## 58   N 117 11
## 66   N 126 11
## 2    N 124 12
## 73   D  99 13
## 12   D  22 17
#
tail(dta[order(-dta$BP), ], 4)
##    Dep  IQ BP
## 77   N 124  1
## 80   N 121  1
## 24   N 106  0
## 75   N 122  0
#
with(dta, hist(IQ, xlab = "IQ", main = ""))

#
boxplot(BP ~ Dep, data = dta, 
        xlab = "Depression", 
        ylab = "Behavior problem score")

#
plot(BP ~ IQ, data = dta, type = "n",
     ylab = "Behavior problem score", xlab = "IQ")
text(dta$IQ, dta$BP, labels = dta$Dep, cex = 0.5)
abline(lm(BP ~ IQ, data = dta, subset = Dep == "D"))
abline(lm(BP ~ IQ, data = dta, subset = Dep == "N"), lty = 2)

4

The usBirths2015.txt is a dataset of monthly births in the US in 2015. Summarize the number of births by season.

# library(dplyr)
dta4 <- read.table("C:/Users/Queen/Desktop/usBirths2015.txt", header = T)
# dta4 <- dta4 |> mutate(season = ifelse(month %in% c("December", "January", "February"), "Winter", ifelse(month %in% c("March", "April", "May"), "Spring", ifelse(month %in% c("June", "July", "August"), "Summer","Fall"))))
dta4$season <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer", 
                 "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")
dta4$season <- as.factor(dta4$season)

aggregate(dta4$birth ~ dta4$season, FUN=sum)
##   dta4$season dta4$birth
## 1        Fall    1005343
## 2      Spring     977672
## 3      Summer    1035747
## 4      Winter     959735

5

Ten subjects read a paragraph consisting of seven sentences. The reading time (in seconds) for each sentence was the outcome measure. The predictors are the serial position of the sentence (Sp), the number of words in the sentences (Wrds), and the number of new arguments in the sentence (New).

Input data

dta5 <- read.table("C:/Users/Queen/Desktop/readingtimes.txt", header = T)

(a)Rank

rank(colMeans(dta5[,5:14]))
## S01 S02 S03 S04 S05 S06 S07 S08 S09 S10 
##   5   4   2   3  10   9   8   7   1   6

(b)Average Time

# how long does it take to read a word for each students
colMeans(dta5[,5:14])/mean(dta5$Wrds)
##       S01       S02       S03       S04       S05       S06       S07       S08 
## 0.3569259 0.3324568 0.3261728 0.3266049 0.4856790 0.4641852 0.4518395 0.4330864 
##       S09       S10 
## 0.2368765 0.3883457

Answer:0.3802173