Exercise 1: NCKU Student

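Load the data file. The roster is Big5-encoded, and its column names are in Chinese: 座號 (seat number), 系.年.班 (department.year.class), 開課系序號 (offering department code), 學號 (student ID), 姓名 (name), 成績 (grade), and 選課時間 (course selection time).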

dta <- read.csv("/Users/haolunfu/Documents/資料管理/week4/ncku_roster.csv", header = TRUE, fileEncoding = "big-5")
head(dta)

##                     座號                                          系.年.班
## 1 教師:U3023  許清芳                                                      
## 2                      1 心理系           3                               
## 3                      2 心理系           3                               
## 4                      3 心理系           4                               
## 5                      4 心理系           4                               
## 6                      5 教育所           1 碩                            
##   開課系序號                                  學號
## 1            上課時間: 一[6-8];開課號:U3006  U7031
## 2      U7031                               D840239
## 3      U7031                               D840057
## 4      U7031                               D841311
## 5      U7031                               D840140
## 6      U3006                               U360098
##                                            姓名 成績              選課時間
## 1 科目:資料管理                                   NA                      
## 2                                            蘇   NA 02/17/2016 09:17:40  
## 3                                            吳   NA 02/17/2016 09:17:28  
## 4                                            余   NA 02/17/2016 09:09:10  
## 5                                            王   NA 02/17/2016 09:09:34  
## 6                                            劉   NA 01/18/2016 14:56:35

The first row holds course information (instructor, class time, course number, and subject) rather than a student record, so I delete it.

dtac <- dta[-1,]
list(dtac$系.年.班)

## [[1]]
##  [1] 心理系           3                               
##  [2] 心理系           3                               
##  [3] 心理系           4                               
##  [4] 心理系           4                               
##  [5] 教育所           1 碩                            
##  [6] 教育所           1 博                            
##  [7] 教育所           2 碩                            
##  [8] 教育所           2 博                            
##  [9] 心理所           1 碩                            
## [10] 心理所           1 碩                            
## [11] 心理所           1 碩                            
## [12] 心理所           1 碩                            
## [13] 心理所           1 碩                            
## [14] 心理所           2 碩                            
## [15] 心理所           2 碩                            
## 9 Levels:                                                   ...
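The 系.年.班 (department.year.class) column packs the department (e.g. 心理系, Department of Psychology; 教育所, Graduate Institute of Education), the year, and, for graduate students, a degree marker (碩 = master's, 博 = doctoral) into one space-padded string. A minimal sketch for splitting it into separate columns, assuming the fields are separated by runs of spaces as in the printed output above; `roster_field` is a hypothetical stand-in for `as.character(dtac$系.年.班)`:

```r
# Hypothetical sample values copied from the printed output above;
# with the real data, use as.character(dtac$系.年.班) instead
roster_field <- c("心理系           3    ",
                  "教育所           1 碩  ",
                  "教育所           2 博  ",
                  "心理所           1 碩  ")

# Trim the padding, then split each entry on runs of whitespace
parts <- strsplit(trimws(roster_field), "\\s+")

roster <- data.frame(
  dept   = vapply(parts, `[`, character(1), 1),
  year   = as.integer(vapply(parts, `[`, character(1), 2)),
  # Undergraduate entries have no third field, so they get NA
  degree = vapply(parts, function(p) if (length(p) >= 3) p[3] else NA_character_,
                  character(1)),
  stringsAsFactors = FALSE
)

# Number of students per department
table(roster$dept)
```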

Exercise 2: Income and taxes

Load data set

dta <- read.table("http://www1.aucegypt.edu/faculty/hadi/RABE5/Data5/P005.txt", header = TRUE, sep = "\t", fill = TRUE)

Calculate Pearson’s correlation between income and taxes

cor(dta$Income, dta$Taxes, method = "pearson")

## [1] 0.0560718
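Pearson's r is the covariance of the two variables divided by the product of their standard deviations, and cor() reports only this coefficient. To check whether a value this close to zero differs from zero, cor.test() adds a t-test with n - 2 degrees of freedom and a confidence interval. The sketch below illustrates both on simulated, hypothetical data (`x`, `y`), not on the income and tax data; with the real data the call would be `cor.test(dta$Income, dta$Taxes)`.

```r
set.seed(1)
# Simulated, unrelated variables standing in for Income and Taxes
x <- rnorm(50)
y <- rnorm(50)

# Pearson's r from its definition: cov(x, y) / (sd(x) * sd(y))
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y, method = "pearson")

# Test of H0: rho = 0, plus a 95% confidence interval for rho
ct <- cor.test(x, y, method = "pearson")
ct$estimate   # same value as r_builtin
ct$p.value
ct$conf.int
```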

Draw a scatter plot of taxes against income

library(ggplot2)
ggplot(dta, aes(x = Income, y = Taxes)) +
  geom_point()

In my view, the Pearson correlation coefficient (r ≈ 0.056, close to zero) and the scatter plot both indicate no linear relationship between income and taxes.

Exercise 3: Junior School Project

Load data file

jsp <- read.csv("/Users/haolunfu/Documents/資料管理/week4/juniorSchools.txt", header = TRUE, sep = "\t")
head(jsp)

##   school class sex soc ravens pupil english math year
## 1     S1    C1   G   9     23    P1      72   23    0
## 2     S1    C1   G   9     23    P1      80   24    1
## 3     S1    C1   G   9     23    P1      39   23    2
## 4     S1    C1   B   2     15    P2       7   14    0
## 5     S1    C1   B   2     15    P2      17   11    1
## 6     S1    C1   B   2     22    P3      88   36    0

Rename the sex variable to Gender

jsp$Gender <- jsp$sex
jsp <- jsp[,-3]
str(jsp)
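Copying sex into a new column and then dropping column 3 by position works, but it depends on sex being the third column and it moves the variable to the end of the data frame. A sketch of renaming the column in place by matching its name instead, shown on a small hypothetical data frame `df`:

```r
# Hypothetical miniature of the jsp data
df <- data.frame(school = c("S1", "S1"), sex = c("G", "B"), math = c(23, 14),
                 stringsAsFactors = FALSE)

# Rename by matching the column name, so the column keeps its position
names(df)[names(df) == "sex"] <- "Gender"

str(df)
```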

## 'data.frame':    3236 obs. of  9 variables:
##  $ school : Factor w/ 49 levels "S1","S10","S11",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ class  : Factor w/ 4 levels "C1","C2","C3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ soc    : int  9 9 9 2 2 2 2 2 9 9 ...
##  $ ravens : int  23 23 23 15 15 22 22 22 14 14 ...
##  $ pupil  : Factor w/ 1192 levels "P1","P10","P100",..: 1 1 1 413 413 512 512 512 612 612 ...
##  $ english: int  72 80 39 7 17 88 89 83 12 25 ...
##  $ math   : int  23 24 23 14 11 36 32 39 24 26 ...
##  $ year   : int  0 1 2 0 1 0 1 2 0 1 ...
##  $ Gender : Factor w/ 2 levels "B","G": 2 2 2 1 1 1 1 1 1 1 ...

Re-label the values of the social class variable using long character strings

jsps <- c("I", "II", "III", "IIII", "IV", "V", "VI", "VII", "VIII")
jsp$soc <- factor(jsp$soc)
levels(jsp$soc) <- jsps
levels(jsp$soc)

## [1] "I"    "II"   "III"  "IIII" "IV"   "V"    "VI"   "VII"  "VIII"

str(jsp)

## 'data.frame':    3236 obs. of  9 variables:
##  $ school : Factor w/ 49 levels "S1","S10","S11",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ class  : Factor w/ 4 levels "C1","C2","C3",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ soc    : Factor w/ 9 levels "I","II","III",..: 9 9 9 2 2 2 2 2 9 9 ...
##  $ ravens : int  23 23 23 15 15 22 22 22 14 14 ...
##  $ pupil  : Factor w/ 1192 levels "P1","P10","P100",..: 1 1 1 413 413 512 512 512 612 612 ...
##  $ english: int  72 80 39 7 17 88 89 83 12 25 ...
##  $ math   : int  23 24 23 14 11 36 32 39 24 26 ...
##  $ year   : int  0 1 2 0 1 0 1 2 0 1 ...
##  $ Gender : Factor w/ 2 levels "B","G": 2 2 2 1 1 1 1 1 1 1 ...
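The labels above are short Roman numerals, and "IIII" and "IV" would both read as class 4, so they do not yet meet the "long character strings" part of the task. Assigning to levels() also relies on the sorted codes lining up with the label vector by position. A sketch using factor() with explicit `levels` and `labels`, which maps each code to its label by value. The descriptive labels are an assumption taken from the documentation of the `jsp` data in Julian Faraway's faraway R package and should be checked against the codebook for this file; `soc_code` is hypothetical sample data.

```r
# Hypothetical sample of soc codes (1 to 9) as they appear in the raw file
soc_code <- c(9, 9, 2, 4, 1)

# Assumed descriptive labels for codes 1 to 9; verify against the data documentation
soc_labels <- c("I", "II", "III non-manual", "III manual", "IV", "V",
                "Long-term unemployed", "Not currently employed",
                "Father absent")

# levels = 1:9 pins each code to its label, even if a code is absent from the data
soc <- factor(soc_code, levels = 1:9, labels = soc_labels)
soc
```

With the real data, the same call would be applied to the original integer column, e.g. `factor(jsp$soc, levels = 1:9, labels = soc_labels)`, before any other relabeling.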

Draw box plots of math scores by social class (soc)

plot(x=jsp$soc, y=jsp$math)
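When plot() receives a factor for x and a numeric vector for y, it draws one box per factor level. The formula interface of boxplot() makes that grouping explicit, and tapply() gives the median behind each box. A sketch on a small hypothetical data frame `toy`; with the real data, `toy` would be replaced by `jsp`:

```r
# Hypothetical miniature of the jsp data
toy <- data.frame(
  soc  = factor(c("I", "I", "II", "II", "II")),
  math = c(30, 34, 20, 26, 23)
)

# One box of math scores per social-class level
boxplot(math ~ soc, data = toy, xlab = "Social class", ylab = "Math score")

# Median math score per level (the thick line inside each box)
med <- tapply(toy$math, toy$soc, median)
med
```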

Save the data to a CSV file and read it back in

write.csv(jsp, file = "/Users/haolunfu/Documents/資料管理/week4/jsp.csv", quote = FALSE, row.names = FALSE)
jsp_n <- read.csv("/Users/haolunfu/Documents/資料管理/week4/jsp.csv", header = TRUE)
head(jsp_n)

##   school class  soc ravens pupil english math year Gender
## 1     S1    C1 VIII     23    P1      72   23    0      G
## 2     S1    C1 VIII     23    P1      80   24    1      G
## 3     S1    C1 VIII     23    P1      39   23    2      G
## 4     S1    C1   II     15    P2       7   14    0      B
## 5     S1    C1   II     15    P2      17   11    1      B
## 6     S1    C1   II     22    P3      88   36    0      B
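A CSV file stores only text, so factor information, including the order of the levels, does not survive the round trip: read.csv() returns such columns as character (R 4.0 and later) or as factors with alphabetically sorted levels (earlier versions). Also, `quote = FALSE` leaves values unquoted, which would break the file if a value ever contained a comma. A sketch of the round trip through a temporary file that restores the factor afterwards; `toy` and its levels are hypothetical:

```r
# Hypothetical data with a factor whose level order is not alphabetical
toy <- data.frame(
  id    = 1:3,
  level = factor(c("low", "high", "medium"), levels = c("low", "medium", "high"))
)

path <- tempfile(fileext = ".csv")
write.csv(toy, file = path, row.names = FALSE)   # default quote = TRUE
back <- read.csv(path, stringsAsFactors = FALSE)

# The column comes back as plain text; re-create the factor with the original level order
back$level <- factor(back$level, levels = levels(toy$level))
str(back)
```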

Exercise 4: laser-evoked potentials (LEP) data

Unzip the data file

zf <- "/Users/haolunfu/Documents/資料管理/week4/Subject1.zip"
# Extract into week4/tmp_data, the folder the fread() calls below read from
unzip(zf, exdir = "/Users/haolunfu/Documents/資料管理/week4/tmp_data")

I couldn’t find a convenient way to load these .dat files with base R, so I used fread() from the data.table package, which detects the delimiter and header automatically.

library(data.table)
dta_1 <- data.table::fread("/Users/haolunfu/Documents/資料管理/week4/tmp_data/Subject1/1w.dat")
dta_2 <- data.table::fread("/Users/haolunfu/Documents/資料管理/week4/tmp_data/Subject1/2w.dat")
dta_3 <- data.table::fread("/Users/haolunfu/Documents/資料管理/week4/tmp_data/Subject1/3w.dat")
dta_4 <- data.table::fread("/Users/haolunfu/Documents/資料管理/week4/tmp_data/Subject1/4w.dat")

Add the time axis and a condition label to each data set

time <- seq(-100,800,2)

dta_1$Condition <- "1w"
dta_1$Time <- time
dta_2$Condition <- "2w"
dta_2$Time <- time
dta_3$Condition <- "3w"
dta_3$Time <- time
dta_4$Condition <- "4w"
dta_4$Time <- time
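The time vector runs from -100 to 800 in steps of 2, which gives (800 - (-100)) / 2 + 1 = 451 samples, so each file is expected to contain exactly 451 rows. The four copy-pasted blocks can also be written as one loop. A minimal base-R sketch, assuming each data set has one row per time point; the two-channel data frames in `raw_list` are hypothetical stand-ins for the fread() results (with the real files, something like `lapply(paths, data.table::fread)` over a hypothetical vector `paths` of the four .dat files):

```r
# Sampling grid: -100 to 800 in steps of 2
time <- seq(-100, 800, by = 2)
n_time <- length(time)

conditions <- c("1w", "2w", "3w", "4w")

# Hypothetical stand-ins for the four files: one row per time point, one column per channel
set.seed(42)
raw_list <- lapply(conditions, function(cond) {
  data.frame(Fz = rnorm(n_time), Cz = rnorm(n_time))
})

# Attach the condition label and the time axis to each data set
labelled <- Map(function(d, cond) {
  stopifnot(nrow(d) == n_time)   # guard against files of unexpected length
  d$Condition <- cond
  d$Time <- time
  d
}, raw_list, conditions)

# Stack the four conditions into one data frame
dta <- do.call(rbind, labelled)
```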

Stack the four data sets into one and drop column 31

dta <- rbind(dta_1,dta_2,dta_3,dta_4)
dta <- dta[,-31]

I have not yet found a good way to plot the data from all channels, so I plot only the Fz channel.

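dta_Fz <- dta[, c(13, 31, 32)]   # keep Fz (column 13), Condition, and Time
# Refer to the column by name inside aes(); dta_Fz$... there bypasses ggplot2's data handling
ggplot(dta_Fz, aes(x = Time, y = `[      Fz]`, colour = Condition)) +
  geom_line()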
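To show every channel rather than only Fz, one option is to reshape the channel columns into long format (one row per time point, condition, and channel) and give each channel its own panel with facet_wrap(). A sketch, assuming a wide data frame shaped like `dta` above (channel columns plus Condition and Time); the two-channel `wide` is hypothetical. If `dta` is a data.table, convert it with as.data.frame() first, because data.table's `[` treats a single argument as a row selection. The plot is only built if ggplot2 is installed, and print(p) would draw it.

```r
# Hypothetical miniature of the merged LEP data: two channels, two conditions
time <- seq(-100, 800, by = 2)
set.seed(7)
wide <- data.frame(
  Fz = rnorm(2 * length(time)),
  Cz = rnorm(2 * length(time)),
  Condition = rep(c("1w", "2w"), each = length(time)),
  Time = rep(time, times = 2),
  stringsAsFactors = FALSE
)

channels <- setdiff(names(wide), c("Condition", "Time"))

# stack() places the channel columns one after another, so Condition and
# Time are repeated once per channel to stay aligned with the values
long <- stack(wide[channels])
names(long) <- c("Amplitude", "Channel")
long$Condition <- rep(wide$Condition, times = length(channels))
long$Time <- rep(wide$Time, times = length(channels))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One panel per channel, one line per condition
  p <- ggplot(long, aes(x = Time, y = Amplitude, colour = Condition)) +
    geom_line() +
    facet_wrap(~ Channel)
}
```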

Exercise 5: Schizophrenics

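Load the data set, then add a group label and a subject label.

Assigning groups by hand is not very robust, because it relies on the row order of the file: the first 11 subjects are coded as group 1 and the last 6 as group 2.

schiz <- "http://www.stat.columbia.edu/~gelman/book/data/schiz.asc"
sch <- read.table(schiz, sep = " ", skip = 4)
sch$Group <- rep(c(1, 2), times = c(11, 6))  # rows 1-11: group 1; rows 12-17: group 2
sch$Subject <- 1:17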

I wasn’t sure how to pre-process this kind of data for a repeated-measures ANOVA, but the steps below reshape the data frame into the form I needed.

Gather the columns V1 to V30 into long format, then convert Subject and Time into factors.

library(tidyverse)

## ─ Attaching packages ────────────────────────── tidyverse 1.3.0 ─

## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ✓ purrr   0.3.3

## ─ Conflicts ─────────────────────────── tidyverse_conflicts() ─
## x dplyr::between()   masks data.table::between()
## x dplyr::filter()    masks stats::filter()
## x dplyr::first()     masks data.table::first()
## x dplyr::lag()       masks stats::lag()
## x dplyr::last()      masks data.table::last()
## x purrr::transpose() masks data.table::transpose()

library(rstatix)

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

sch <- sch %>%
  gather(key = "Time", value = "RT", V1:V30) %>%
  convert_as_factor(Subject, Time)
str(sch)

## 'data.frame':    510 obs. of  4 variables:
##  $ Group  : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Subject: Factor w/ 17 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Time   : Factor w/ 30 levels "V1","V10","V11",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ RT     : int  312 354 256 260 204 590 308 244 232 318 ...

List the mean reaction time (RT) for each measurement (Time) and group

tapply(sch$RT, list(sch$Time, sch$Group), mean)

##            1        2
## V1  301.6364 453.3333
## V10 310.3636 566.6667
## V11 302.7273 526.0000
## V12 284.5455 489.6667
## V13 334.1818 537.6667
## V14 332.3636 698.0000
## V15 320.1818 639.6667
## V16 336.5455 480.6667
## V17 357.8182 413.3333
## V18 314.1818 494.3333
## V19 337.8182 463.0000
## V2  296.5455 553.3333
## V20 303.2727 357.6667
## V21 322.9091 407.6667
## V22 295.4545 471.6667
## V23 303.4545 811.0000
## V24 277.2727 467.3333
## V25 292.7273 438.3333
## V26 294.0000 507.0000
## V27 294.3636 582.3333
## V28 305.0909 363.0000
## V29 305.0909 478.3333
## V3  310.0000 556.0000
## V30 333.6364 464.0000
## V4  285.8182 485.0000
## V5  343.0909 491.6667
## V6  317.2727 607.3333
## V7  292.9091 481.6667
## V8  314.1818 393.3333
## V9  285.6364 527.0000
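Because the Time labels are strings, factor() sorts them alphabetically (V1, V10, V11, ..., V2, ...), which is why the rows above are out of measurement order. The ANOVA does not depend on the level order, but tables and plots read more naturally in numeric order. A sketch, assuming the labels are exactly V1 to V30 as produced by gather(); `time_label` is a hypothetical sample, and with the real data the fix would be `sch$Time <- factor(sch$Time, levels = paste0("V", 1:30))`:

```r
time_label <- c("V1", "V10", "V2", "V30", "V3")

# Default: levels sorted as strings
levels(factor(time_label))

# Explicit levels in measurement order
time_f <- factor(time_label, levels = paste0("V", 1:30))
levels(time_f)[1:3]
```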

Mixed-design ANOVA, with Group as a between-subjects factor and Time as a within-subjects factor

rst1 <- aov(RT ~ Time * Group + Error(Subject / Time), data = sch)
summary(rst1)

## 
## Error: Subject
##           Df  Sum Sq Mean Sq F value   Pr(>F)    
## Group      1 4506212 4506212   23.59 0.000209 ***
## Residuals 15 2865353  191024                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Subject:Time
##             Df  Sum Sq Mean Sq F value Pr(>F)  
## Time        29  638735   22025   1.044  0.405  
## Time:Group  29 1072883   36996   1.754  0.010 *
## Residuals  435 9174828   21092                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
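Each F value in the table is the effect's mean square (Sum Sq / Df) divided by the residual mean square of the same error stratum, and Pr(>F) is the upper tail of the F distribution with those degrees of freedom. The sketch below recomputes them from the printed sums of squares as a check, using only numbers copied from the output above. Read this way, the Group effect (F(1, 15) ≈ 23.59) and the Time × Group interaction (F(29, 435) ≈ 1.75) are significant at the 0.05 level, while the main effect of Time is not.

```r
# Sums of squares and degrees of freedom copied from summary(rst1)
ss        <- c(Group = 4506212, Time = 638735, TimeGroup = 1072883)
df_effect <- c(Group = 1,       Time = 29,     TimeGroup = 29)

# Each effect is tested against the residual of its own error stratum:
# Group against the between-subject residual, Time and Time:Group within subjects
ss_err <- c(Group = 2865353, Time = 9174828, TimeGroup = 9174828)
df_err <- c(Group = 15,      Time = 435,     TimeGroup = 435)

F_value <- (ss / df_effect) / (ss_err / df_err)
p_value <- pf(F_value, df_effect, df_err, lower.tail = FALSE)

round(F_value, 3)
signif(p_value, 3)
```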

Week 4 Homework

Hao-Lun Fu

2020-03-24
