Concerning the anorexia{MASS}, explain the difference between anorexia[,2] anorexia[“Prewt”]
and lm(Postwt ~ Treat, data=anorexia) lm(Postwt ~ as.numeric(Treat), data=anorexia)
library(MASS)
# str(anorexia)
# 擷取anorexia資料中的第2欄位,擷取出來的是vector
head(anorexia[,2])
## [1] 80.7 89.4 91.8 74.0 78.1 88.3
# is.vector(anorexia[,2])
# TRUE
# 擷取anorexia資料中的Prewt欄位,擷取出來的是data frame
head(anorexia["Prewt"])
## Prewt
## 1 80.7
## 2 89.4
## 3 91.8
## 4 74.0
## 5 78.1
## 6 88.3
# is.data.frame(anorexia["Prewt"])
# TRUE
# Although the Treat is integer type
typeof(anorexia$Treat)
## [1] "integer"
## [1] "integer"
# but the structure show Treat is a factor with three level: "CBT","Cont","FT"
str(anorexia)
## 'data.frame': 72 obs. of 3 variables:
## $ Treat : Factor w/ 3 levels "CBT","Cont","FT": 2 2 2 2 2 2 2 2 2 2 ...
## $ Prewt : num 80.7 89.4 91.8 74 78.1 88.3 87.3 75.1 80.6 78.4 ...
## $ Postwt: num 80.2 80.1 86.4 86.3 76.1 78.1 75.1 86.7 73.5 84.6 ...
# Treat as a factor type
lm(Postwt ~ Treat, data=anorexia)
##
## Call:
## lm(formula = Postwt ~ Treat, data = anorexia)
##
## Coefficients:
## (Intercept) TreatCont TreatFT
## 85.697 -4.589 4.798
# Treat as a numeric type
# this is definitely not a proper data analysis for the context of the 'anorexia'
lm(Postwt ~ as.numeric(Treat), data=anorexia)
##
## Call:
## lm(formula = Postwt ~ as.numeric(Treat), data = anorexia)
##
## Coefficients:
## (Intercept) as.numeric(Treat)
## 82.036 1.711
Find all possible sums from rolling three dice using R. If possible, construct a histogram for the sum of three dice. Is this a probability histogram or an empirical histogram?
set.seed(1234)
# create a function for dice sampling
dice.sum <- function(n.dice)
{
dice <- sample(1:6, size = n.dice, replace = TRUE)
return(sum(dice))
}
# all possible sums from rolling three dice based on 10000 rolls
dice <- replicate(10000, dice.sum(3))
ftable(dice)
## dice 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
##
## 36 131 286 465 723 968 1148 1229 1257 1138 1002 688 453 289 144 43
# histogram plot
dice |> hist(prob=T, main="")
## 2-2 method from class
sample(3:18, size=10000, replace=T,
prob=c(1/216, 3/216, 6/216, 10/216, 15/216, 21/216, 25/216, 27/216,
27/216, 25/216, 21/216, 15/216, 10/216, 6/216, 3/216, 1/216)) |>
hist(prob=T, main="", xlab ="")
# Probability histogram represents "chance".
The IQ and behavior problem page has a dataset and a script of R code chunks. Generate a markdown file from the script to push the output in HTML for posting to course Moodle site. Explain what the code chunks do with your comments in the markdown file.
# An R script for IQ_Beh data set
dta <- read.table("C:/Users/Queen/Desktop/IQ_Beh.txt", header = T, row.names = 1)
#
str(dta)
## 'data.frame': 94 obs. of 3 variables:
## $ Dep: chr "N" "N" "N" "N" ...
## $ IQ : int 103 124 124 104 96 92 124 99 92 116 ...
## $ BP : int 4 12 9 3 3 3 6 4 3 9 ...
#
head(dta)
## Dep IQ BP
## 1 N 103 4
## 2 N 124 12
## 3 N 124 9
## 4 N 104 3
## 5 D 96 3
## 6 N 92 3
#
class(dta)
## [1] "data.frame"
#
dim(dta)
## [1] 94 3
#
names(dta)
## [1] "Dep" "IQ" "BP"
#
is.vector(dta$BP)
## [1] TRUE
#
dta[1, ]
## Dep IQ BP
## 1 N 103 4
#
dta[1:3, "IQ"]
## [1] 103 124 124
#
tail(dta[order(dta$BP), ])
## Dep IQ BP
## 16 N 89 11
## 58 N 117 11
## 66 N 126 11
## 2 N 124 12
## 73 D 99 13
## 12 D 22 17
#
tail(dta[order(-dta$BP), ], 4)
## Dep IQ BP
## 77 N 124 1
## 80 N 121 1
## 24 N 106 0
## 75 N 122 0
#
with(dta, hist(IQ, xlab = "IQ", main = ""))
#
boxplot(BP ~ Dep, data = dta,
xlab = "Depression",
ylab = "Behavior problem score")
#
plot(BP ~ IQ, data = dta, type = "n",
ylab = "Behavior problem score", xlab = "IQ")
text(dta$IQ, dta$BP, labels = dta$Dep, cex = 0.5)
abline(lm(BP ~ IQ, data = dta, subset = Dep == "D"))
abline(lm(BP ~ IQ, data = dta, subset = Dep == "N"), lty = 2)
The usBirths2015.txt is a dataset of monthly births in the US in 2015. Summarize the number of births by season.
# library(dplyr)
dta4 <- read.table("C:/Users/Queen/Desktop/usBirths2015.txt", header = T)
# dta4 <- dta4 |> mutate(season = ifelse(month %in% c("December", "January", "February"), "Winter", ifelse(month %in% c("March", "April", "May"), "Spring", ifelse(month %in% c("June", "July", "August"), "Summer","Fall"))))
dta4$season <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
"Summer", "Summer", "Fall", "Fall", "Fall", "Winter")
dta4$season <- as.factor(dta4$season)
aggregate(dta4$birth ~ dta4$season, FUN=sum)
## dta4$season dta4$birth
## 1 Fall 1005343
## 2 Spring 977672
## 3 Summer 1035747
## 4 Winter 959735
Ten subjects read a paragraph consisting of seven sentences. The reading time (in seconds) for each sentence was the outcome measure. The predictors are the serial position of the sentence (Sp), the number of words in the sentences (Wrds), and the number of new arguments in the sentence (New).
dta5 <- read.table("C:/Users/Queen/Desktop/readingtimes.txt", header = T)
rank(colMeans(dta5[,5:14]))
## S01 S02 S03 S04 S05 S06 S07 S08 S09 S10
## 5 4 2 3 10 9 8 7 1 6
# how long does it take to read a word for each students
colMeans(dta5[,5:14])/mean(dta5$Wrds)
## S01 S02 S03 S04 S05 S06 S07 S08
## 0.3569259 0.3324568 0.3261728 0.3266049 0.4856790 0.4641852 0.4518395 0.4330864
## S09 S10
## 0.2368765 0.3883457
Answer:0.3802173