In-class exercises1
Summarize the backpain{HSAUR3} data set into the following format:
driver suburban case control total
no no ? ? ?
no yes ? ? ?
yes no ? ? ?
yes yes ? ? ?
You should provide comments for each code chunk.
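Inspect the data set.
[1] "ID" "status" "driver" "suburban"
Load tidyverse.
Reshape the data toward the requested layout. group_by() sets the grouping variables and is usually combined with summarise(), which aggregates within each group.
spread() turns the levels of a key column into separate columns (rows into columns).
is.na() tests whether a value is missing, so sum(is.na(case)) counts the rows in a group whose case column is empty, not the number of cases. This is why case + control does not add up to total in the output that follows.
# one row per ID within each driver x suburban group, one column per status
dta <- dta %>%
  group_by(driver, suburban) %>%
  tidyr::spread(key = 'status', value = 'status') %>%
  # count the rows where the case (or control) column is missing
  summarize(case = sum(is.na(case)),
            control = sum(is.na(control)),
            total = n()) %>%
  as.data.frame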
head(dta)
  driver suburban case control total
1     no       no   38      17    64
2     no      yes    5       4    11
3    yes       no   43      44   107
4    yes      yes   37      58   158
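Counting cases and controls directly avoids the missing-value bookkeeping. The sketch below uses base R and a small made-up data frame (`toy`, not the real backpain data) with the same three columns. The 0/1 indicator columns are an illustrative choice, not the notebook's own method.

```r
# Toy stand-in for backpain: one row per person (values are made up)
toy <- data.frame(
  status   = c("case", "control", "case", "control", "case", "control", "case", "control"),
  driver   = c("no", "no", "yes", "yes", "yes", "no", "no", "yes"),
  suburban = c("no", "no", "yes", "yes", "no", "no", "yes", "yes")
)

# 0/1 indicators, so that summing them counts cases and controls
toy$case    <- as.integer(toy$status == "case")
toy$control <- as.integer(toy$status == "control")

# Sum the indicators within each driver x suburban combination
out <- aggregate(cbind(case, control) ~ driver + suburban, data = toy, FUN = sum)
out$total <- out$case + out$control
out
```

With this layout every row satisfies case + control = total, and the totals add up to the number of rows in the input.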
In-class exercises2
Merge the two data sets: state.x77{datasets} and USArrests{datasets} and compute all pair-wise correlations for numerical variables. Is there anything interesting to report?
Store state.x77 and USArrests in separate objects.
Population Income Illiteracy Life Exp Murder HS Grad Frost Area
Alabama 3615 3624 2.1 69.05 15.1 41.3 20 50708
Alaska 365 6315 1.5 69.31 11.3 66.7 152 566432
Arizona 2212 4530 1.8 70.55 7.8 58.1 15 113417
Murder Assault UrbanPop Rape
West Virginia 5.7 81 39 9.3
Wisconsin 2.6 53 66 10.8
Wyoming 6.8 161 60 15.6
Check which variables each data set contains.
[1] "Population" "Income" "Illiteracy" "Life Exp" "Murder"
[6] "HS Grad" "Frost" "Area"
[1] "Murder" "Assault" "UrbanPop" "Rape"
Compute the correlation matrix of each data set separately.
Inspect the two correlation matrices in turn.
Population Income Illiteracy Life Exp Murder HS Grad
Population 1.00000000 0.2082276 0.1076224 -0.06805195 0.3436428 -0.09848975
Income 0.20822756 1.0000000 -0.4370752 0.34025534 -0.2300776 0.61993232
Illiteracy 0.10762237 -0.4370752 1.0000000 -0.58847793 0.7029752 -0.65718861
Life Exp -0.06805195 0.3402553 -0.5884779 1.00000000 -0.7808458 0.58221620
Murder 0.34364275 -0.2300776 0.7029752 -0.78084575 1.0000000 -0.48797102
HS Grad -0.09848975 0.6199323 -0.6571886 0.58221620 -0.4879710 1.00000000
Frost Area
Population -0.3321525 0.02254384
Income 0.2262822 0.36331544
Illiteracy -0.6719470 0.07726113
Life Exp 0.2620680 -0.10733194
Murder -0.5388834 0.22839021
HS Grad 0.3667797 0.33354187
Across the states in state.x77, the correlations stronger than 0.7 in absolute value are Illiteracy and Murder (0.70) and Murder and Life Exp (-0.78).
Murder Assault UrbanPop Rape
Murder 1.00000000 0.8018733 0.06957262 0.5635788
Assault 0.80187331 1.0000000 0.25887170 0.6652412
UrbanPop 0.06957262 0.2588717 1.00000000 0.4113412
Rape 0.56357883 0.6652412 0.41134124 1.0000000
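Across the states in USArrests, the only correlation above 0.7 is Murder and Assault (0.80).
In state.x77, Murder is strongly related to Illiteracy (positively) and to Life Exp (negatively); in USArrests, it is strongly related to Assault.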
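If the two data sets are merged directly on the one column name they share, Murder, and all rows are kept: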
Murder Population Income Illiteracy Life Exp HS Grad Frost Area Assault
1 0.8 NA NA NA NA NA NA NA 45
2 1.4 637 5087 0.8 72.78 50.3 186 69273 NA
3 1.7 681 4167 0.5 72.08 53.3 172 75955 NA
4 2.1 NA NA NA NA NA NA NA 83
5 2.1 NA NA NA NA NA NA NA 57
6 2.2 NA NA NA NA NA NA NA 56
UrbanPop Rape
1 44 7.3
2 NA NA
3 NA NA
4 51 7.8
5 56 9.5
6 57 11.3
Many values come out as NA.
However, if only the matching rows are kept:
Murder Population Income Illiteracy Life Exp HS Grad Frost Area Assault
1 2.7 1058 3694 0.7 70.39 54.7 161 30920 72
2 3.3 5814 4755 1.1 71.83 58.5 103 7826 110
3 3.3 812 4281 0.7 71.23 57.6 174 9027 110
4 4.3 3559 4864 0.6 71.72 63.5 32 66570 102
5 5.3 813 4119 0.6 71.87 59.5 126 82677 46
6 6.8 2541 4884 0.7 72.06 63.9 166 103766 161
UrbanPop Rape
1 66 14.9
2 77 11.1
3 77 11.1
4 62 16.5
5 83 20.2
6 60 15.6
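This version runs without NAs, but the numbers are not correct: rows are paired whenever their Murder values happen to be equal, not by state.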
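Both objects carry the state names as row names, so one way to pair the rows by state is to merge on the row names. The following is a minimal sketch in base R. The suffixes `.x77` and `.arrests` are names chosen here for illustration to keep the two Murder columns apart.

```r
# Both objects ship with base R (package datasets)
sx <- as.data.frame(state.x77)   # matrix -> data frame, states as row names
ua <- USArrests                  # already a data frame, states as row names

# Join on the state names; the two Murder columns receive the suffixes
merged <- merge(sx, ua, by = "row.names", suffixes = c(".x77", ".arrests"))

# Pairwise correlations of every numeric column
num  <- merged[sapply(merged, is.numeric)]
cors <- cor(num)
round(cors["Murder.x77", "Murder.arrests"], 2)
```

The result has one row per state and twelve numeric columns, so `cors` is a 12 x 12 matrix that also relates the state.x77 variables to the arrest rates.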
In-class exercises3
Supply comments to each code chunk in the following survey rmarkdown file and preview it as an R notebook or knit to html.
https://rpubs.com/Onevoice/In-class_exercises_3
In-class exercises4
The data set Vocab{car} gives observations on gender, education and vocabulary, from respondents to U.S. General Social Surveys, 1972-2004. Summarize the relationship between education and vocabulary over the years by gender.
Load Vocab{car} and look at the first rows.
year sex education vocabulary
19740001 1974 Male 14 9
19740002 1974 Male 16 9
19740003 1974 Female 10 9
19740004 1974 Female 10 5
19740005 1974 Female 12 8
19740006 1974 Male 16 8
Store the data as dta and, using the lattice package, subset selected years and draw a comparison plot for each.
dta <- carData::Vocab
pacman::p_load(lattice)
# one scatter plot per selected year, with points grouped by sex
dta1974 <- subset(dta, year == 1974)
xyplot(vocabulary ~ education, groups=sex, data=dta1974, type=c("p", "g"), auto.key=list(columns=2))
dta1984 <- subset(dta, year == 1984)
xyplot(vocabulary ~ education, groups=sex, data=dta1984, type=c("p", "g"), auto.key=list(columns=2))
dta1994 <- subset(dta, year == 1994)
xyplot(vocabulary ~ education, groups=sex, data=dta1994, type=c("p", "g"), auto.key=list(columns=2))
dta2004 <- subset(dta, year == 2004)
xyplot(vocabulary ~ education, groups=sex, data=dta2004, type=c("p", "g"), auto.key=list(columns=2))
Compare the four plots side by side.
Vocabulary scores appear to increase over the years.
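Next, split the data by sex and tabulate the regression coefficients for each year.
malec <- subset(dta, sex == "Male")
# regress vocabulary on education within each year and keep the coefficients
lapply(split(malec, malec$year), function(x) coef(lm(x$vocabulary ~ x$education)))
$`1974`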
(Intercept) x$education
1.5318434 0.3713183
$`1976`
(Intercept) x$education
1.6342960 0.3555403
$`1978`
(Intercept) x$education
0.9762161 0.3963762
$`1982`
(Intercept) x$education
0.9730291 0.3832637
$`1984`
(Intercept) x$education
1.678465 0.337124
$`1987`
(Intercept) x$education
0.8103651 0.3818373
$`1988`
(Intercept) x$education
1.0459936 0.3592442
$`1989`
(Intercept) x$education
1.0596176 0.3708525
$`1990`
(Intercept) x$education
1.7000935 0.3377029
$`1991`
(Intercept) x$education
1.2504604 0.3683962
$`1993`
(Intercept) x$education
1.6384884 0.3221049
$`1994`
(Intercept) x$education
1.8684770 0.3146151
$`1996`
(Intercept) x$education
0.8221711 0.3770325
$`1998`
(Intercept) x$education
1.5199973 0.3314754
$`2000`
(Intercept) x$education
1.1203888 0.3558918
$`2004`
(Intercept) x$education
1.4259424 0.3411153
$`2006`
(Intercept) x$education
2.1383454 0.2952926
$`2008`
(Intercept) x$education
1.4212286 0.3277987
$`2010`
(Intercept) x$education
1.7996389 0.3135749
$`2012`
(Intercept) x$education
1.7303105 0.3061534
$`2014`
(Intercept) x$education
1.4804789 0.3262112
$`2016`
(Intercept) x$education
1.8562367 0.3031146
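For men, the intercept tends to rise over the years (1.53 in 1974, 1.86 in 2016) while the education slope falls (0.37 to 0.30), so vocabulary depends less steeply on education in the later surveys.
femalec <- subset(dta, sex == "Female")
# same per-year regressions for women
lapply(split(femalec, femalec$year), function(x) coef(lm(x$vocabulary ~ x$education)))
$`1974`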
(Intercept) x$education
1.5652579 0.3816095
$`1976`
(Intercept) x$education
1.7021281 0.3824002
$`1978`
(Intercept) x$education
1.3006416 0.4002707
$`1982`
(Intercept) x$education
0.9829602 0.3949758
$`1984`
(Intercept) x$education
1.4536872 0.3728698
$`1987`
(Intercept) x$education
0.9647931 0.3843508
$`1988`
(Intercept) x$education
1.1634561 0.3763999
$`1989`
(Intercept) x$education
1.0682600 0.3863606
$`1990`
(Intercept) x$education
0.4594812 0.4346902
$`1991`
(Intercept) x$education
1.1543766 0.3875821
$`1993`
(Intercept) x$education
1.7388287 0.3286325
$`1994`
(Intercept) x$education
1.6453365 0.3422146
$`1996`
(Intercept) x$education
1.1482811 0.3727178
$`1998`
(Intercept) x$education
1.4472751 0.3592843
$`2000`
(Intercept) x$education
1.9276040 0.3155532
$`2004`
(Intercept) x$education
2.104150 0.304056
$`2006`
(Intercept) x$education
2.7777171 0.2535376
$`2008`
(Intercept) x$education
2.6074315 0.2553971
$`2010`
(Intercept) x$education
1.3520300 0.3468821
$`2012`
(Intercept) x$education
1.7535298 0.3080832
$`2014`
(Intercept) x$education
2.3445239 0.2663464
$`2016`
(Intercept) x$education
2.0055919 0.2928955
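For women the pattern is similar: the education slope falls from 0.38 in 1974 to 0.29 in 2016 while the intercept rises from 1.57 to 2.01.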
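The two long listings can be condensed into one vector of education slopes, one per sex and year. The sketch below shows the idea on simulated data. `toy` is a made-up stand-in for Vocab with a true slope of 0.35, and the helper `slope()` is a hypothetical name.

```r
set.seed(1)
# Simulated stand-in for Vocab (the real data live in carData)
toy <- expand.grid(id = 1:50, sex = c("Female", "Male"), year = c(1974, 2016))
toy$education  <- sample(8:20, nrow(toy), replace = TRUE)
toy$vocabulary <- 1 + 0.35 * toy$education + rnorm(nrow(toy))

# Education slope from a separate regression in each sex x year cell
slope  <- function(d) coef(lm(vocabulary ~ education, data = d))[["education"]]
slopes <- sapply(split(toy, list(toy$sex, toy$year)), slope)
round(slopes, 2)
```

Using the `data =` argument instead of `x$vocabulary ~ x$education` also gives the coefficients plain names such as `education`.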
In-class exercises5
The ‘MASS’ library has these two data sets: ‘Animals’ and ‘mammals’. Merge the two files and remove duplicated observations using ‘duplicated’.
Load the Animals and mammals data sets and look at each.
body brain
Mountain beaver 1.35 8.1
Cow 465.00 423.0
Grey wolf 36.33 119.5
Goat 27.66 115.0
Guinea pig 1.04 5.5
Dipliodocus 11700.00 50.0
body brain
Arctic fox 3.385 44.5
Owl monkey 0.480 15.5
Mountain beaver 1.350 8.1
Cow 465.000 423.0
Grey wolf 36.330 119.5
Goat 27.660 115.0
Merge the two data sets and flag the duplicated rows so they can be removed.
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[61] FALSE FALSE FALSE FALSE FALSE
Check the result.
body brain
1 0.005 0.14
2 0.010 0.25
3 0.023 0.30
4 0.023 0.40
5 0.048 0.33
6 0.060 1.00
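The stack-then-filter pattern with `duplicated()` can be sketched on two tiny data frames. The rows reuse values printed above (Cow, Goat, Arctic fox), but `a` and `b` are made-up stand-ins for Animals and mammals, and the species is stored in a `name` column here instead of the row names.

```r
a <- data.frame(name = c("Cow", "Goat"),
                body = c(465, 27.66), brain = c(423, 115))
b <- data.frame(name = c("Cow", "Arctic fox"),
                body = c(465, 3.385), brain = c(423, 44.5))

both  <- rbind(a, b)          # stack the two data sets
dup   <- duplicated(both)     # TRUE for a row that repeats an earlier row
dedup <- both[!dup, ]         # keep the first occurrence only
dedup
```

`duplicated()` compares whole rows, so an animal is dropped only when its name, body and brain values all repeat an earlier row.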
In-class exercises6
Convert the data set probe words from long to wide format as described.
Read the file.
dta <- read.table("D:Program Files/R/R-3.6.3/bin/probeL.txt", header = TRUE, stringsAsFactors = FALSE, fill = TRUE)
head(dta)
ID Response_Time Position
1 S01 51 1
2 S01 36 2
3 S01 50 3
4 S01 35 4
5 S01 42 5
6 S02 27 1
Load tidyverse.
Use mutate() to relabel each position as Pos_1 to Pos_5, so the labels can serve as column names.
Response_Time Position
1 51 Pos_1
2 36 Pos_2
3 50 Pos_3
4 35 Pos_4
5 42 Pos_5
6 27 Pos_1
That gives the result above.
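The final long-to-wide step could look like the sketch below, which uses base R's `reshape()`. The S01 values are the ones printed above. For S02 only the first value (27) appears in the listing, so the remaining four are made up for illustration.

```r
long <- data.frame(
  ID            = rep(c("S01", "S02"), each = 5),
  Response_Time = c(51, 36, 50, 35, 42,   # S01, as printed above
                    27, 30, 31, 32, 33),  # S02: only 27 is from the listing
  Position      = rep(1:5, times = 2)
)

# Label the positions, then give each label its own column
long$Position <- paste0("Pos_", long$Position)
wide <- reshape(long, idvar = "ID", timevar = "Position",
                v.names = "Response_Time", direction = "wide")

# reshape() names the columns Response_Time.Pos_1, ...; strip the prefix
names(wide) <- sub("^Response_Time\\.", "", names(wide))
wide
```

Each subject now occupies one row, with the five response times in columns Pos_1 to Pos_5.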
Exercises1
Select at random one school per county in the data set Caschool{Ecdat} and draw a scatter diagram of average math score mathscr against average reading score readscr for the sampled data set. Make sure your results are reproducible (e.g., the same random sample will be drawn each time).
Load Ecdat and use the Caschool data.
distcod county district grspan enrltot teachers
1 75119 Alameda Sunol Glen Unified KK-08 195 10.90
2 61499 Butte Manzanita Elementary KK-08 240 11.15
3 61549 Butte Thermalito Union Elementary KK-08 1550 82.90
4 61457 Butte Golden Feather Union Elementary KK-08 243 14.00
calwpct mealpct computer testscr compstu expnstu str avginc elpct
1 0.5102 2.0408 67 690.8 0.3435898 6384.911 17.88991 22.690 0.000000
2 15.4167 47.9167 101 661.2 0.4208333 5099.381 21.52466 9.824 4.583333
3 55.0323 76.3226 169 643.6 0.1090323 5501.955 18.69723 8.978 30.000002
4 36.4754 77.0492 85 647.7 0.3497942 7101.831 17.35714 8.978 0.000000
readscr mathscr
1 691.6 690.0
2 660.5 661.9
3 636.3 650.9
4 651.9 643.5
[ reached 'max' / getOption("max.print") -- omitted 2 rows ]
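Load tidyverse.
Draw the random sample.
Draw the plot.
library(magrittr) # provides the tee pipe %T>%
# plot mathscr (y) against readscr (x), then pass the same data on to colMeans()
dta %>% dplyr::transmute(readscr, mathscr) %T>%
  plot(., xlab="readscr", ylab="mathscr", pch='.') %>%
  colMeans
 readscr  mathscr
654.9705 653.3426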
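The sampling step itself is not shown in the notebook. One way to draw a reproducible sample of one row per county is sketched below in base R. `toy` is a made-up stand-in for Caschool (the first four rows reuse scores printed above, the Kern rows are invented), and `sample_one_per_group()` is a hypothetical helper.

```r
# Made-up stand-in for Caschool: several districts per county
toy <- data.frame(
  county  = c("Alameda", "Butte", "Butte", "Butte", "Kern", "Kern"),
  readscr = c(691.6, 660.5, 636.3, 651.9, 640.0, 655.2),
  mathscr = c(690.0, 661.9, 650.9, 643.5, 642.1, 650.0)
)

sample_one_per_group <- function(d, group, seed) {
  set.seed(seed)                               # fixes the random draw
  rows <- split(seq_len(nrow(d)), d[[group]])  # row numbers per county
  # sample.int(length(i), 1) stays correct when a county has a single row
  pick <- vapply(rows, function(i) i[sample.int(length(i), 1)], integer(1))
  d[pick, ]
}

s1 <- sample_one_per_group(toy, "county", seed = 2020)
s2 <- sample_one_per_group(toy, "county", seed = 2020)
identical(s1, s2)
```

Calling `set.seed()` before sampling is what makes the draw repeatable. The scatter diagram then follows from `plot(mathscr ~ readscr, data = s1)`.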
Exercises2
Find 133 class-level 95%-confidence intervals for language test score means of the nlschools{MASS} data set by using the tidy approach. The tail end of the data object should look as follows:
classID language_mean language_lb language_ub
131 11.273 … …
132 10.550 … …
133 10.643 … …
Read the data.
lang IQ class GS SES COMB
1 46 15.0 180 29 23 0
2 45 14.5 180 29 10 0
3 33 9.5 180 29 15 0
4 46 11.0 180 29 23 0
5 20 8.0 180 29 10 0
6 30 9.5 180 29 10 0
Load tidyverse.
Convert the data to a tibble and print the first rows with knitr.
| lang | IQ | class | GS | SES | COMB |
|---|---|---|---|---|---|
| 46 | 15.0 | 180 | 29 | 23 | 0 |
| 45 | 14.5 | 180 | 29 | 10 | 0 |
| 33 | 9.5 | 180 | 29 | 15 | 0 |
| 46 | 11.0 | 180 | 29 | 23 | 0 |
| 20 | 8.0 | 180 | 29 | 10 | 0 |
| 30 | 9.5 | 180 | 29 | 10 | 0 |
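Compute the mean, standard error and 95% confidence interval of lang. The second group_by() replaces the grouping by class, so the summary is per distinct lang value (47 rows, each with a zero or undefined standard deviation), not per class (133 rows).
dtac <- dta %>% group_by(class) %>%
  group_by(lang) %>%   # overrides the grouping by class
  summarise(language_mean = mean(lang, na.rm = TRUE),
            sd.lang = sd(lang, na.rm = TRUE),
            n.lang = n()) %>%
  # standard error, then mean -/+ t quantile * standard error
  mutate(se.lang = sd.lang / sqrt(n.lang),
         language_lb = language_mean - qt(1 - (0.05 / 2), n.lang - 1) * se.lang,
         language_ub = language_mean + qt(1 - (0.05 / 2), n.lang - 1) * se.lang)
# A tibble: 47 x 7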
lang language_mean sd.lang n.lang se.lang language_lb language_ub
<int> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 9 9 NaN 1 NaN NaN NaN
2 11 11 NaN 1 NaN NaN NaN
3 14 14 0 2 0 14 14
4 15 15 0 4 0 15 15
5 16 16 0 3 0 16 16
6 17 17 0 7 0 17 17
7 18 18 0 7 0 18 18
8 19 19 0 6 0 19 19
9 20 20 0 13 0 20 20
10 21 21 0 18 0 21 21
# ... with 37 more rows
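Grouping by class alone gives one interval per class. The sketch below computes the same t-based interval in base R on simulated scores. `toy` and the helper `ci_by_class()` are illustrative names, and the three classes stand in for the 133 in nlschools.

```r
set.seed(42)
# Simulated stand-in: three classes of different sizes
toy <- data.frame(
  class = rep(c("c1", "c2", "c3"), times = c(8, 12, 10)),
  lang  = round(rnorm(30, mean = 40, sd = 9))
)

ci_by_class <- function(x, g, level = 0.95) {
  stats <- lapply(split(x, g), function(v) {
    n    <- length(v)
    m    <- mean(v)
    # half-width = t quantile * standard error of the mean
    half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(v) / sqrt(n)
    c(language_mean = m, language_lb = m - half, language_ub = m + half)
  })
  data.frame(classID = names(stats), do.call(rbind, stats), row.names = NULL)
}

res <- ci_by_class(toy$lang, toy$class)
res
```

Each row agrees with the interval reported by `t.test()` on that class's scores, which offers a quick check of the formula.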
Exercises3
Use the Prestige{car} data set for this problem. Find the median prestige score for each of the three types of occupation, respectively. Use the median score in each type of occupation to define two levels of prestige: High and low, for each occupation, respectively. Summarize the relationship between income and education for each category generated from crossing the factor prestige with the type of occupation.
Read the data and inspect it.
[1] "data.frame"
education income women prestige census type
gov.administrators 13.11 12351 11.16 68.8 1113 prof
general.managers 12.26 25879 4.02 69.1 1130 prof
accountants 12.77 9271 15.70 63.4 1171 prof
purchasing.officers 11.42 8865 9.11 56.8 1175 prof
chemists 14.62 8403 11.68 73.5 2111 prof
physicists 15.64 11030 5.13 77.6 2113 prof
[1] "education" "income" "women" "prestige" "census" "type"
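Use aggregate() to compute the mean prestige score for each type of occupation (the exercise asks for the median, which would use FUN = median).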
type prestige
1 bc 35.52727
2 prof 67.84839
3 wc 42.24348
Use quantile() on these three group values, in steps of 10%.
0% 10% 20% 30% 40% 50% 60% 70%
35.52727 36.87051 38.21375 39.55700 40.90024 42.24348 47.36446 52.48544
80% 90% 100%
57.60642 62.72741 67.84839
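Then split these values into Low and High.
# cut at 50: values in (0, 50] are Low, values in (50, 100] are High
dta1 <- with(dta, cut(prestige, ordered_result = TRUE, breaks = c(0, 50, 100), labels = c("Low", "High")))
with(dta, table(dta1))
dta1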
Low High
2 1
[1] Low High Low
Levels: Low < High
Finally, combine the results above.
dta1 prestige type
1 Low 38.88538 2
2 High 67.84839 2
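The exercise defines High and Low for each occupation relative to the median of its own type. A minimal sketch of that step in base R follows. The data frame `toy` and its scores are made up, and `ave()` is one of several ways to attach a group median to every row.

```r
# Made-up occupations: three per type
toy <- data.frame(
  type     = c("bc", "bc", "bc", "prof", "prof", "prof", "wc", "wc", "wc"),
  prestige = c(30, 35, 40, 60, 68, 75, 38, 42, 50)
)

# Median prestige of each row's own type, repeated on every row of that type
toy$type_median <- ave(toy$prestige, toy$type, FUN = median)

# High if above the median of the type, otherwise Low
toy$level <- factor(ifelse(toy$prestige > toy$type_median, "High", "Low"),
                    levels = c("Low", "High"))
table(toy$type, toy$level)
```

Because the cut point differs by type, a score of 50 counts as High for wc here but would count as Low for prof.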
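Load tidyverse.
Draw the scatter plot.
Draw comparison scatter plots with regression lines.
# High/Low relative to the median prestige within each occupation type
Prestige <- subset(Prestige, !is.na(type))
Prestige$level <- with(Prestige, ifelse(prestige > ave(prestige, type, FUN = median), "High", "Low"))
# one panel per level x type combination, with grid, points and a fitted line
xyplot(income ~ education | level * type, data=Prestige, type=c("g","p","r"))
Exercises4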
Reverse the order of input to the series of dplyr::*_join examples using data from the Nobel laureates in literature and explain the resulting output.
Load tidyverse.
Read the files.
nbl_c <- read.table("C:/Users/boss/Desktop/nobel_countries.txt", header = TRUE)
nbl_w <- read.table("C:/Users/boss/Desktop/nobel_winners.txt", header = TRUE)
Merge the two with merge().
Year Country Name Gender
1 1950 UK Bertrand Russell Male
2 2012 China Mo Yan Male
3 2013 Canada Alice Munro Female
4 2014 France Patrick Modiano Male
5 2016 US Bob Dylan Male
6 2017 UK Kazuo Ishiguro Male
Some rows are missing; setting all = TRUE keeps every row.
Year Country Name Gender
1 1938 <NA> Pearl Buck Female
2 1950 UK Bertrand Russell Male
3 2011 Sweden <NA> <NA>
4 2012 China Mo Yan Male
5 2013 Canada Alice Munro Female
6 2014 France Patrick Modiano Male
7 2015 Russia <NA> <NA>
8 2016 US Bob Dylan Male
9 2017 UK Kazuo Ishiguro Male
Inner join: the rows of nbl_c that have a match in nbl_w, with the columns of both.
Country Year Name Gender
1 France 2014 Patrick Modiano Male
2 UK 1950 Bertrand Russell Male
3 UK 2017 Kazuo Ishiguro Male
4 US 2016 Bob Dylan Male
5 Canada 2013 Alice Munro Female
6 China 2012 Mo Yan Male
Semi join: the same matching rows, but only the columns of nbl_c are kept.
Country Year
1 France 2014
2 UK 1950
3 UK 2017
4 US 2016
5 Canada 2013
6 China 2012
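Left join with nbl_c as the first argument and nbl_w as the second: every row of nbl_c is kept, and winners without a match show as NA.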
Country Year Name Gender
1 France 2014 Patrick Modiano Male
2 UK 1950 Bertrand Russell Male
3 UK 2017 Kazuo Ishiguro Male
4 US 2016 Bob Dylan Male
5 Canada 2013 Alice Munro Female
6 China 2012 Mo Yan Male
7 Russia 2015 <NA> <NA>
8 Sweden 2011 <NA> <NA>
Anti join: the rows found only in nbl_c.
Country Year
1 Russia 2015
2 Sweden 2011
Full join: all rows from both nbl_c and nbl_w; values without a match show as NA.
Country Year Name Gender
1 France 2014 Patrick Modiano Male
2 UK 1950 Bertrand Russell Male
3 UK 2017 Kazuo Ishiguro Male
4 US 2016 Bob Dylan Male
5 Canada 2013 Alice Munro Female
6 China 2012 Mo Yan Male
7 Russia 2015 <NA> <NA>
8 Sweden 2011 <NA> <NA>
9 <NA> 1938 Pearl Buck Female
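The row counts of the five results can be reproduced with base R. The sketch below rebuilds the two tables from the values printed above and uses `merge()` and `%in%` as stand-ins for the dplyr joins. It is an analogue for checking the logic, not the dplyr calls themselves.

```r
nbl_c <- data.frame(
  Country = c("France", "UK", "UK", "US", "Canada", "China", "Russia", "Sweden"),
  Year    = c(2014, 1950, 2017, 2016, 2013, 2012, 2015, 2011)
)
nbl_w <- data.frame(
  Name   = c("Pearl Buck", "Bertrand Russell", "Mo Yan", "Alice Munro",
             "Patrick Modiano", "Bob Dylan", "Kazuo Ishiguro"),
  Gender = c("Female", "Male", "Male", "Female", "Male", "Male", "Male"),
  Year   = c(1938, 1950, 2012, 2013, 2014, 2016, 2017)
)

inner <- merge(nbl_c, nbl_w, by = "Year")                # matches only
left  <- merge(nbl_c, nbl_w, by = "Year", all.x = TRUE)  # all rows of nbl_c
full  <- merge(nbl_c, nbl_w, by = "Year", all = TRUE)    # all rows of both
semi  <- nbl_c[nbl_c$Year %in% nbl_w$Year, ]             # matched rows of nbl_c
anti  <- nbl_c[!nbl_c$Year %in% nbl_w$Year, ]            # unmatched rows of nbl_c

sapply(list(inner = inner, left = left, full = full, semi = semi, anti = anti), nrow)
```

Swapping the two arguments changes which table's unmatched rows survive: with nbl_w first, the left join would keep Pearl Buck (1938) and drop Russia and Sweden.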
Exercises5
Augment the data object in the ‘SAT’ lecture note with state.division{datasets}. For each of the 9 divisions, find the slope estimate for regressing average SAT scores onto average teacher’s salary. How many of them are of negative signs?
Read the data.
V2 V3 V4 V5 V6 V7 V8
Alabama 4.405 17.2 31.144 8 491 538 1029
Alaska 8.963 17.6 47.951 47 445 489 934
Arizona 4.778 19.3 32.175 27 448 496 944
Arkansas 4.459 17.1 28.934 6 482 523 1005
California 4.992 24.0 41.078 45 417 485 902
Colorado 5.443 18.4 34.571 29 462 518 980
Load tidyverse.
Rename the columns.
'data.frame': 50 obs. of 7 variables:
$ Spending: num 4.41 8.96 4.78 4.46 4.99 ...
$ PTR : num 17.2 17.6 19.3 17.1 24 18.4 14.4 16.6 19.1 16.3 ...
$ Salary : num 31.1 48 32.2 28.9 41.1 ...
$ PE : int 8 47 27 6 45 29 81 68 48 65 ...
$ Verbal : int 491 445 448 482 417 462 431 429 420 406 ...
$ Math : int 538 489 496 523 485 518 477 468 469 448 ...
$ SAT : int 1029 934 944 1005 902 980 908 897 889 854 ...
Define the 9 divisions.
divisions
New England Middle Atlantic South Atlantic East South Central
6 3 8 4
West South Central East North Central West North Central Mountain
4 5 7 8
Pacific
5
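Draw the regression line for each of the 9 divisions.
Compute the correlation between SAT score and teacher salary within each of the 9 divisions.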
# A tibble: 9 x 2
divisions r
<fct> <dbl>
1 New England -0.0830
2 Middle Atlantic 0.662
3 South Atlantic 0.489
4 East South Central -0.372
5 West South Central -0.884
6 East North Central 0.524
7 West North Central -0.206
8 Mountain -0.729
9 Pacific 0.0649
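Five of the nine are negative. In a simple regression the slope has the same sign as the correlation, so five of the nine slope estimates are negative as well.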
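The slope estimates themselves can be obtained with one regression per division. The sketch below uses the real `state.division` factor from base R but simulated Salary and SAT values, since the SAT file is not part of base R. It also checks that slope and correlation agree in sign within every division.

```r
set.seed(1)
# Real division labels, simulated salary and SAT values
toy <- data.frame(division = state.division, Salary = runif(50, 25, 50))
toy$SAT <- 1100 - 3 * toy$Salary + rnorm(50, sd = 60)

by_div <- split(toy, toy$division)

# Slope of SAT on Salary and their correlation, division by division
slopes <- sapply(by_div, function(d) coef(lm(SAT ~ Salary, data = d))[["Salary"]])
r      <- sapply(by_div, function(d) cor(d$SAT, d$Salary))

data.frame(slope = round(slopes, 2), r = round(r, 2))
sum(slopes < 0)   # number of divisions with a negative slope
```

The slope equals r multiplied by sd(SAT) / sd(Salary), a positive factor, which is why counting negative correlations answers the question about negative slopes.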
Exercises6
The HELP (Health Evaluation and Linkage to Primary Care) study was a clinical trial for adult inpatients recruited from a detoxification unit. Patients with no primary care physician were randomized to receive a multidisciplinary assessment and a brief motivational intervention or usual care, with the goal of linking them to primary medical care. Eligible subjects were adults, who spoke Spanish or English, reported alcohol, heroin or cocaine as their first or second drug of choice, resided in proximity to the primary care clinic to which they would be referred or were homeless. Subjects were interviewed at baseline during their detoxification stay and follow-up interviews were undertaken every 6 months for 2 years. A variety of continuous, count, discrete, and survival time predictors and outcomes were collected at each of these five occasions.
The following R script is used to manage the data file at the initial stage of investigation. Provide comments on what each line of the script is meant to achieve.
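## ----echo=FALSE,eval=TRUE------------------------------------------------
options(continue=" ")
## ------------------------------------------------------------------------
# print 3 significant digits, wrap output at 72 characters, then read the data
options(digits=3)
options(width=72) # narrow output
ds = read.csv("http://www.amherst.edu/~nhorton/r2/datasets/help.csv")
# load dplyr
library(dplyr)
# keep only the listed variables and store them as newds
# (this line raised an error when I ran it, so I skipped it; a common cause is
# another loaded package such as MASS masking select(), which dplyr::select() avoids)
newds = dplyr::select(ds, cesd, female, i1, i2, id, treat, f1a, f1b, f1c, f1d, f1e, f1f, f1g, f1h, f1i, f1j, f1k, f1l, f1m, f1n, f1o, f1p, f1q, f1r, f1s, f1t)
## ------------------------------------------------------------------------
# list the variable names in newds
names(newds)
str(newds[,1:10]) # structure of the first 10 variables
## ------------------------------------------------------------------------
summary(newds[,1:10]) # summary statistics for the first 10 variables
## ------------------------------------------------------------------------
head(newds, n=3) # first three rows
## ------------------------------------------------------------------------
# attach a descriptive comment attribute to newds and print it
comment(newds) = "HELP baseline dataset"
comment(newds)
# save ds to an R data file
save(ds, file="savedfile")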
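## ------------------------------------------------------------------------
write.csv(ds, file="ds.csv") # write ds out as a CSV file
## ------------------------------------------------------------------------
# load foreign and export newds as a data file plus a SAS script that reads it
library(foreign)
write.foreign(newds, "file.dat", "file.sas", package="SAS")
## ------------------------------------------------------------------------
with(newds, cesd[1:10]) # first 10 values of cesd
with(newds, head(cesd, 10)) # the same 10 values, using head()
## ------------------------------------------------------------------------
with(newds, cesd[cesd > 56]) # values of cesd greater than 56
## ------------------------------------------------------------------------
# with dplyr: keep the rows with cesd > 56, then only the id and cesd columns
library(dplyr)
filter(newds, cesd > 56) %>% select(id, cesd)
## ------------------------------------------------------------------------
with(newds, sort(cesd)[1:4]) # the four smallest values of cesd
with(newds, which.min(cesd)) # row position of the smallest cesd
## ------------------------------------------------------------------------
library(mosaic)
tally(~ is.na(f1g), data=newds) # count how many values of f1g are missing
favstats(~ f1g, data=newds) # summary statistics for f1g (quartiles, mean, sd, n, missing)
## ------------------------------------------------------------------------
# reverse code f1d, f1h, f1l and f1p
cesditems = with(newds, cbind(f1a, f1b, f1c, (3 - f1d), f1e, f1f, f1g, (3 - f1h), f1i, f1j, f1k, (3 - f1l), f1m, f1n, f1o, (3 - f1p), f1q, f1r, f1s, f1t))
nmisscesd = apply(is.na(cesditems), 1, sum) # number of missing items per subject
ncesditems = cesditems
ncesditems[is.na(cesditems)] = 0 # treat missing items as 0 for the sum
newcesd = apply(ncesditems, 1, sum) # sum of the answered items
imputemeancesd = 20/(20-nmisscesd)*newcesd # rescale the sum to all 20 items
## ------------------------------------------------------------------------
# compare the scores, showing only subjects with at least one missing item
data.frame(newcesd, newds$cesd, nmisscesd, imputemeancesd)[nmisscesd>0,]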
## ----createdrink,message=FALSE-------------------------------------------
library(dplyr)
library(memisc) # provides cases()
# add one new variable, drinkstat, with the levels abstinent, moderate and highrisk
newds = mutate(newds, drinkstat=
  cases(
    "abstinent" = i1==0,
    "moderate" = (i1>0 & i1<=1 & i2<=3 & female==1) |
                 (i1>0 & i1<=2 & i2<=4 & female==0),
    "highrisk" = ((i1>1 | i2>3) & female==1) |
                 ((i1>2 | i2>4) & female==0)))
## ----echo=FALSE----------------------------------------------------------
library(mosaic)
## ----echo=FALSE----------------------------------------------------------
# unload memisc and MASS so they no longer mask other functions
detach(package:memisc)
detach(package:MASS)
## ------------------------------------------------------------------------
library(dplyr)
tmpds = select(newds, i1, i2, female, drinkstat) # keep these four columns as tmpds
tmpds[365:370,] # show rows 365 to 370
## ------------------------------------------------------------------------
library(dplyr)
filter(tmpds, drinkstat=="moderate" & female==1) # women classified as moderate drinkers
## ----message=FALSE-------------------------------------------------------
library(gmodels)
with(tmpds, CrossTable(drinkstat)) # one-way frequency table of drinkstat
## ------------------------------------------------------------------------
# two-way table of drinkstat by female, without total, column and chi-square proportions
with(tmpds, CrossTable(drinkstat, female, prop.t=FALSE, prop.c=FALSE, prop.chisq=FALSE))
## ------------------------------------------------------------------------
# add a factor gender that labels female = 0 as Male and female = 1 as Female
newds = transform(newds, gender=factor(female, c(0,1), c("Male","Female")))
tally(~ female + gender, margin=FALSE, data=newds) # cross-tabulate to check the recoding
## ------------------------------------------------------------------------
library(dplyr)
newds = arrange(ds, cesd, i1) # sort ds by cesd, then by i1, in ascending order
newds[1:5, c("cesd", "i1", "id")] # first five rows of these three columns
## ------------------------------------------------------------------------
library(dplyr)
females = filter(ds, female==1) # keep the female subjects
with(females, mean(cesd)) # mean cesd among females
# an alternative approach
mean(ds$cesd[ds$female==1]) # same mean, using base R indexing
## ------------------------------------------------------------------------
with(ds, tapply(cesd, female, mean)) # mean cesd for each value of female
library(mosaic)
mean(cesd ~ female, data=ds) # the same, using mosaic's formula interface
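The CES-D scoring step in the script (reverse coding, then rescaling the partial sum by 20/(20 - number missing)) can be traced by hand on a tiny example. The sketch below uses two made-up respondents and four items instead of twenty, so `n_items` takes the place of the constant 20. None of the numbers come from the HELP data.

```r
# Two made-up respondents, four items scored 0 to 3; item 4 is reverse-scored
items <- rbind(c(1, 2, NA, 3),
               c(0, 0, 1, 1))
items[, 4] <- 3 - items[, 4]               # reverse code: 0 <-> 3, 1 <-> 2

n_items <- ncol(items)
nmiss   <- rowSums(is.na(items))           # missing items per respondent
partial <- rowSums(items, na.rm = TRUE)    # sum of the answered items

# Scale the partial sum up as if the missing items equalled the person's mean
imputed <- n_items / (n_items - nmiss) * partial
imputed
```

The first respondent answered three items summing to 3, so the imputed total is 4/3 * 3 = 4. The second has no missing items and keeps the plain sum of 3.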