Exercise 1: NCKU Student
Load data file
## 座號 系.年.班
## 1 教師:U3023 許清芳
## 2 1 心理系 3
## 3 2 心理系 3
## 4 3 心理系 4
## 5 4 心理系 4
## 6 5 教育所 1 碩
## 開課系序號 學號
## 1 上課時間: 一[6-8];開課號:U3006 U7031
## 2 U7031 D840239
## 3 U7031 D840057
## 4 U7031 D841311
## 5 U7031 D840140
## 6 U3006 U360098
## 姓名 成績 選課時間
## 1 科目:資料管理 NA
## 2 蘇 NA 02/17/2016 09:17:40
## 3 吳 NA 02/17/2016 09:17:28
## 4 余 NA 02/17/2016 09:09:10
## 5 王 NA 02/17/2016 09:09:34
## 6 劉 NA 01/18/2016 14:56:35
Calculate Pearson’s correlation between income and taxes
## [1] 0.0560718
Plot a scatter plot

In my view, both Pearson’s correlation coefficient (r ≈ 0.056) and the scatter plot indicate essentially no linear relationship between income and taxes.
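A correlation and scatter plot like the ones above can be produced along the following lines. This is only a minimal sketch: the `income` and `taxes` vectors here are simulated stand-ins (an assumption, since the original data file is not shown), so the numbers will not match the reported 0.0560718.

```r
# Simulated stand-in data; the real income/taxes values are not shown here.
set.seed(1)
income <- rnorm(50, mean = 50, sd = 10)
taxes  <- rnorm(50, mean = 10, sd = 2)

# Pearson's correlation coefficient (the default method of cor()).
r <- cor(income, taxes, method = "pearson")

# cor.test() additionally reports a p-value and confidence interval.
ct <- cor.test(income, taxes, method = "pearson")

# Scatter plot of the two variables.
plot(income, taxes, xlab = "Income", ylab = "Taxes")

r
ct$p.value
```

Note that `cor()` alone only returns the coefficient; a statement about a "test" would need something like `cor.test()`.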
Exercise 3: junior school project
Load data file
## school class sex soc ravens pupil english math year
## 1 S1 C1 G 9 23 P1 72 23 0
## 2 S1 C1 G 9 23 P1 80 24 1
## 3 S1 C1 G 9 23 P1 39 23 2
## 4 S1 C1 B 2 15 P2 7 14 0
## 5 S1 C1 B 2 15 P2 17 11 1
## 6 S1 C1 B 2 22 P3 88 36 0
Rename sex to Gender
## 'data.frame': 3236 obs. of 9 variables:
## $ school : Factor w/ 49 levels "S1","S10","S11",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ class : Factor w/ 4 levels "C1","C2","C3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ soc : int 9 9 9 2 2 2 2 2 9 9 ...
## $ ravens : int 23 23 23 15 15 22 22 22 14 14 ...
## $ pupil : Factor w/ 1192 levels "P1","P10","P100",..: 1 1 1 413 413 512 512 512 612 612 ...
## $ english: int 72 80 39 7 17 88 89 83 12 25 ...
## $ math : int 23 24 23 14 11 36 32 39 24 26 ...
## $ year : int 0 1 2 0 1 0 1 2 0 1 ...
## $ Gender : Factor w/ 2 levels "B","G": 2 2 2 1 1 1 1 1 1 1 ...
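One possible way to rename a column in base R is sketched below on a small made-up data frame (`dta` and its values are hypothetical). Renaming in place keeps the column where it was, whereas the output above shows `Gender` as the last column, which suggests the original instead created a new column and dropped the old one; that variant is shown as well.

```r
# Hypothetical toy data with a 'sex' column.
dta <- data.frame(pupil = c("P1", "P2", "P3"),
                  sex   = factor(c("G", "B", "B")),
                  math  = c(23, 14, 36))

# Variant 1: rename in place (column position is unchanged).
renamed <- dta
names(renamed)[names(renamed) == "sex"] <- "Gender"

# Variant 2: add a new column and drop the old one (Gender ends up last).
moved <- dta
moved$Gender <- moved$sex
moved$sex <- NULL

str(renamed)
str(moved)
```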
Re-label the values of the social class variable using long character strings
## [1] "I" "II" "III" "IIII" "IV" "V" "VI" "VII" "VIII"
## 'data.frame': 3236 obs. of 9 variables:
## $ school : Factor w/ 49 levels "S1","S10","S11",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ class : Factor w/ 4 levels "C1","C2","C3",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ soc : Factor w/ 9 levels "I","II","III",..: 9 9 9 2 2 2 2 2 9 9 ...
## $ ravens : int 23 23 23 15 15 22 22 22 14 14 ...
## $ pupil : Factor w/ 1192 levels "P1","P10","P100",..: 1 1 1 413 413 512 512 512 612 612 ...
## $ english: int 72 80 39 7 17 88 89 83 12 25 ...
## $ math : int 23 24 23 14 11 36 32 39 24 26 ...
## $ year : int 0 1 2 0 1 0 1 2 0 1 ...
## $ Gender : Factor w/ 2 levels "B","G": 2 2 2 1 1 1 1 1 1 1 ...
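The relabelling step can be sketched with `factor()` and its `levels`/`labels` arguments. The label vector below is copied from the output above; the `soc` values are a small made-up sample, and this is not necessarily how the original code did it.

```r
# Made-up sample of integer social-class codes (1 to 9).
soc <- c(9L, 9L, 2L, 2L, 1L, 5L)

# Labels as printed in the output above, in code order 1..9.
soc_labels <- c("I", "II", "III", "IIII", "IV", "V", "VI", "VII", "VIII")

# Map each integer code to its label; unused codes remain as empty levels.
soc_f <- factor(soc, levels = 1:9, labels = soc_labels)

levels(soc_f)
as.character(soc_f)
```

With this mapping, code 9 becomes "VIII" and code 2 becomes "II", which agrees with the first rows of the data printed later in this exercise.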
Draw a box plot of math by soc
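A box plot of a numeric variable split by a factor can be drawn with the formula interface of `boxplot()`. The sketch below uses simulated data (the `toy` data frame is hypothetical) with only three social-class levels to keep it short.

```r
# Simulated toy data: 10 pupils in each of three social classes.
set.seed(2)
toy <- data.frame(
  soc  = factor(rep(c("I", "II", "III"), each = 10)),
  math = round(rnorm(30, mean = 25, sd = 6))
)

# One box per level of soc; boxplot() invisibly returns the summary
# statistics it used, which can be inspected afterwards.
bp <- boxplot(math ~ soc, data = toy,
              xlab = "Social class", ylab = "Math score")

bp$names
bp$n
```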

Save the data to a file and read it back
## school class soc ravens pupil english math year Gender
## 1 S1 C1 VIII 23 P1 72 23 0 G
## 2 S1 C1 VIII 23 P1 80 24 1 G
## 3 S1 C1 VIII 23 P1 39 23 2 G
## 4 S1 C1 II 15 P2 7 14 0 B
## 5 S1 C1 II 15 P2 17 11 1 B
## 6 S1 C1 II 22 P3 88 36 0 B
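Writing a data frame to disk and reading it back might look like the sketch below. The file format (CSV), the temporary path, and the `toy` data frame are all assumptions made for illustration; the original file name and format are not shown.

```r
# Hypothetical toy data resembling a few columns of the data set.
toy <- data.frame(school = c("S1", "S1"),
                  soc    = c("VIII", "II"),
                  math   = c(23L, 14L))

# Write to a temporary CSV file without row names.
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Read it back; stringsAsFactors = TRUE restores text columns as factors.
back <- read.csv(path, stringsAsFactors = TRUE)
head(back)
```

A CSV round trip does not preserve factor level order or custom attributes; if that matters, `saveRDS()` and `readRDS()` keep the object exactly as it was.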
Exercise 4: laser-event potentials (LEP) data
I did not find a good way to load this data directly, but I found a package that can handle it.
Merge all data and clean up
I have not yet found a good way to plot the data from all channels, so I plot only the Fz channel.

Exercise 5: Schizophrenics
Load data set
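Add group label and subject label
The group labels are assigned by hand here, which is not very elegant: the first 11 subjects get group 1 and the remaining 6 get group 2.
schiz <- "http://www.stat.columbia.edu/~gelman/book/data/schiz.asc"
# Skip the first 4 lines of the file; each remaining row is one subject
sch <- read.table(schiz, sep = " ", skip = 4)
# 11 subjects in group 1 followed by 6 subjects in group 2
sch$Group <- rep(c(1, 2), times = c(11, 6))
sch$Subject <- 1:17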
I was not sure how this kind of data should be pre-processed for ANOVA testing, but I found a way to reshape the data frame into the form I expected.
Gather the columns V1 to V30 into long format.
Convert Subject and Time into factor variables
## ─ Attaching packages ────────────────────────── tidyverse 1.3.0 ─
## ✓ tibble 2.1.3 ✓ dplyr 0.8.4
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ✓ purrr 0.3.3
## ─ Conflicts ─────────────────────────── tidyverse_conflicts() ─
## x dplyr::between() masks data.table::between()
## x dplyr::filter() masks stats::filter()
## x dplyr::first() masks data.table::first()
## x dplyr::lag() masks stats::lag()
## x dplyr::last() masks data.table::last()
## x purrr::transpose() masks data.table::transpose()
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
## 'data.frame': 510 obs. of 4 variables:
## $ Group : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Subject: Factor w/ 17 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Time : Factor w/ 30 levels "V1","V10","V11",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ RT : int 312 354 256 260 204 590 308 244 232 318 ...
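The wide-to-long step can be sketched with `tidyr::gather()` as follows. The `wide` data frame is a made-up miniature with three measurement columns instead of thirty, and its values are hypothetical; the column names `Time` and `RT` follow the structure printed above.

```r
library(tidyr)

# Hypothetical miniature of the wide data: one row per subject.
wide <- data.frame(
  V1 = c(312, 354, 590),
  V2 = c(272, 346, 484),
  V3 = c(300, 384, 500),
  Group = c(1, 1, 2),
  Subject = 1:3
)

# Stack V1..V3 into a key column (Time) and a value column (RT).
long <- gather(wide, key = "Time", value = "RT", V1:V3)

# Treat subject and time as categorical for the repeated-measures model.
long$Subject <- factor(long$Subject)
long$Time    <- factor(long$Time)

str(long)
```

In current tidyr, `pivot_longer()` is the recommended replacement for `gather()`, though `gather()` still works.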
List the mean reaction time for each measurement and group
## 1 2
## V1 301.6364 453.3333
## V10 310.3636 566.6667
## V11 302.7273 526.0000
## V12 284.5455 489.6667
## V13 334.1818 537.6667
## V14 332.3636 698.0000
## V15 320.1818 639.6667
## V16 336.5455 480.6667
## V17 357.8182 413.3333
## V18 314.1818 494.3333
## V19 337.8182 463.0000
## V2 296.5455 553.3333
## V20 303.2727 357.6667
## V21 322.9091 407.6667
## V22 295.4545 471.6667
## V23 303.4545 811.0000
## V24 277.2727 467.3333
## V25 292.7273 438.3333
## V26 294.0000 507.0000
## V27 294.3636 582.3333
## V28 305.0909 363.0000
## V29 305.0909 478.3333
## V3 310.0000 556.0000
## V30 333.6364 464.0000
## V4 285.8182 485.0000
## V5 343.0909 491.6667
## V6 317.2727 607.3333
## V7 292.9091 481.6667
## V8 314.1818 393.3333
## V9 285.6364 527.0000
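A table of this shape (measurements in rows, groups in columns) is what `tapply()` returns when given two grouping variables. A minimal sketch on made-up long-format data, where `long` and its values are hypothetical:

```r
# Hypothetical long-format data: two time points, two groups.
long <- data.frame(
  Group = c(1, 1, 2, 1, 1, 2),
  Time  = factor(c("V1", "V1", "V1", "V2", "V2", "V2")),
  RT    = c(300, 310, 450, 280, 300, 500)
)

# Mean RT for every Time x Group cell; rows are Time, columns are Group.
means <- with(long, tapply(RT, list(Time, Group), mean))
means
```

The rows in the table above run V1, V10, V11, ... because factor levels are sorted alphabetically by default; creating the factor with `levels = paste0("V", 1:30)` would list them in measurement order instead.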
ANOVA testing
##
## Error: Subject
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 1 4506212 4506212 23.59 0.000209 ***
## Residuals 15 2865353 191024
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Subject:Time
## Df Sum Sq Mean Sq F value Pr(>F)
## Time 29 638735 22025 1.044 0.405
## Time:Group 29 1072883 36996 1.754 0.010 *
## Residuals 435 9174828 21092
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
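The two error strata in this output (`Subject` and `Subject:Time`) are what `aov()` produces with an `Error(Subject/Time)` term, so a call of roughly the following form would give a table with the same layout. This is a sketch on simulated data (the `toy` data frame, its size, and the effect sizes are assumptions), not the original analysis.

```r
# Simulated repeated-measures data: 6 subjects x 4 time points,
# subjects 1-3 in group 1 and subjects 4-6 in group 2.
set.seed(3)
toy <- expand.grid(Time = factor(paste0("V", 1:4)),
                   Subject = factor(1:6))
toy$Group <- factor(ifelse(as.integer(toy$Subject) <= 3, 1, 2))
toy$RT <- 300 + 150 * (toy$Group == "2") + rnorm(nrow(toy), sd = 40)

# Group is a between-subject factor, Time a within-subject factor.
# Error(Subject/Time) splits the residuals into the two strata.
fit <- aov(RT ~ Group * Time + Error(Subject / Time), data = toy)
summary(fit)
```

In the real output, the `Group` effect is tested against between-subject variation (15 residual degrees of freedom), while `Time` and `Time:Group` are tested within subjects (435 residual degrees of freedom).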