You are going to analyze a real data set. The data come from a large study examining EEG correlates of genetic predisposition to alcoholism. They contain measurements from 64 electrodes placed on the subjects' scalps, sampled at 256 Hz (3.9-msec epoch) for 1 second.
You can check the description of the data file at the following link: https://archive.ics.uci.edu/ml/machine-learning-databases/eeg-mld/eeg.data.html.
Use the letter of your lab group to obtain the seed that determines which users you read.
Groups with A+ read 10 users, groups with A read 5 users, and the remaining groups read 2 users. To find out which users, run the following code with your own lab letters and seed.
set.seed(myseed)
nAplus <- 10
nA <- 5
nBC <-2
myGroupn <- nAplus # Write here the correct one
usersToRead <- sample(1:length(result2),myGroupn,replace = FALSE)
# You need to load the following users
# result2[usersToRead]
library(eegkit) # attaches eegkitdata, which provides eegdata and geteegdata()
data("eegdata")
result2[usersToRead]
## [1] "co2a0000424.tar.gz" "co2a0000371.tar.gz" "co2c0000364.tar.gz"
## [4] "co2c0000354.tar.gz" "co2a0000405.tar.gz" "co2a0000412.tar.gz"
## [7] "co2c0000339.tar.gz" "co2a0000416.tar.gz" "co2c0000342.tar.gz"
## [10] "co2a0000443.tar.gz"
The data frame:
1. First column: “alch” if the user is alcoholic, and “nonAl” if the user is a control
2. Second column: user identifier
3. Third column: paradigm (S1 obj / S2 nomatch / S2 match)
4. Fourth column: replication number or trial (you can see there are many samples)
5. Fifth column: channel; there are 64 channels, numbered 0 to 63
6. Sixth column: time, with values from 0 to 255 or from 1 to 256
7. Seventh column: microvolts
## 'data.frame': 0 obs. of 7 variables:
## $ UserType : chr
## $ UserId : int
## $ Paradigm : chr
## $ Replication: int
## $ Channel : int
## $ Time : int
## $ Microvolts : num
You can fill the df data frame with the correct data.
Load the data into the data frame and then save it under the name myDF.Rda. Use your own function and explain how it works.
Save the complete data frame.
eegs1=geteegdata(indir="C:/Users/Carlos/Documents/Descargas Chrome/Estadisitca/Assigment 1/Users/", cond="S1",filename="eegtrainS1")
## subject: co2a0000364
## subject: co2a0000371
## subject: co2a0000405
## subject: co2a0000412
## subject: co2a0000416
## subject: co2a0000424
## subject: co2a0000443
## subject: co2c0000339
## subject: co2c0000342
## subject: co2c0000354
eegS2m=geteegdata(indir="C:/Users/Carlos/Documents/Descargas Chrome/Estadisitca/Assigment 1/Users/", cond="S2m",filename="eegtrainS2m")
## subject: co2a0000364
## subject: co2a0000371
## subject: co2a0000405
## subject: co2a0000412
## subject: co2a0000416
## subject: co2a0000424
## subject: co2a0000443
## subject: co2c0000339
## subject: co2c0000342
## subject: co2c0000354
eegS2n=geteegdata(indir="C:/Users/Carlos/Documents/Descargas Chrome/Estadisitca/Assigment 1/Users/", cond="S2n",filename="eegtrainS2n")
## subject: co2a0000364
## subject: co2a0000371
## subject: co2a0000405
## subject: co2a0000412
## subject: co2a0000416
## subject: co2a0000424
## subject: co2a0000443
## subject: co2c0000339
## subject: co2c0000342
## subject: co2c0000354
df <- rbind(eegs1,eegS2m,eegS2n)
library(stringr) # provides str_replace_all()
# Format the data frame
colnames(df) <- c('UserId','UserType','Paradigm','Replication','Channel','Time','Microvolts')
df$Paradigm <- str_replace_all(df$Paradigm,c("S1"= "S1obj","S2n" = "S2nomatch","S2m" = "S2match"))
df$UserType <- str_replace_all(df$UserType, c("a"="Al","c"= "nonAl"))
# Map channel names to codes 0-63 by exact lookup; str_replace_all() matches substrings in sequence, so "FC1" would first be hit by the "C1" rule
channelCodes <- c("AF1"="0","AF2"="1","AF7"="2","AF8"="3","AFZ"="4","C1"="5","C2"="6","C3"="7","C4"="8","C5"="9","C6"="10","CP1"="11","CP2"="12","CP3"="13","CP4"="14","CP5"="15","CP6"="16","CPZ"="17","CZ"="18","F1"="19","F2"="20","F3"="21","F4"="22","F5"="23","F6"="24","F7"="25","F8"="26","FC1"="27","FC2"="28","FC3"="29","FC4"="30","FC5"="31","FC6"="32","FCZ"="33","FP1"="34","FP2"="35","FPZ"="36","FT7"="37","FT8"="38","FZ"="39","nd"="40","O1"="41","O2"="42","OZ"="43","P1"="44","P2"="45","P3"="46","P4"="47","P5"="48","P6"="49","P7"="50","P8"="51","PO1"="52","PO2"="53","PO7"="54","PO8"="55","POZ"="56","PZ"="57","T7"="58","T8"="59","TP7"="60","TP8"="61","X"="62","Y"="63")
df$Channel <- as.integer(unname(channelCodes[as.character(df$Channel)]))
save(df, file="myDF.Rda")
meanVoltage <- mean(df$Microvolts)
medianVoltage <- median(df$Microvolts)
rangeVoltage<-range(df$Microvolts)
sdVoltage<-sd(df$Microvolts)
qVoltage<-quantile(df$Microvolts)
IQRvoltage<-IQR(df$Microvolts)
hist(df$Microvolts,breaks =1000 ,main = "Histogram of Microvolts measures",xlab="Microvolts", xlim = c(-30,30),ylim = c(0,1500000))
boxplot(df$Microvolts,main = "Box Plot of Microvolts measures",ylab="Microvolts", ylim= c(-30,30),horizontal = TRUE)
* What information can you obtain from these two plots? Explain it briefly.
#Alcoholic
microVoltsAl <- subset(df, UserType=='Al') # no Microvolts term: it would drop rows where the voltage is exactly 0
meanVoltsAl<-mean(microVoltsAl$Microvolts)
medianVoltsAl <- median(microVoltsAl$Microvolts)
rangeVoltsAl<-range(microVoltsAl$Microvolts)
sdVoltsAl<-sd(microVoltsAl$Microvolts)
qVoltsAl<-quantile(microVoltsAl$Microvolts)
IQRvoltsAl<-IQR(microVoltsAl$Microvolts)
#Non Alcoholic
microVoltsNal <- subset(df, UserType=='nonAl')
meanVoltsNal<-mean(microVoltsNal$Microvolts)
medianVoltsNal <- median(microVoltsNal$Microvolts)
rangeVoltsNal<-range(microVoltsNal$Microvolts)
sdVoltsNal<-sd(microVoltsNal$Microvolts)
qVoltsNal<-quantile(microVoltsNal$Microvolts)
IQRvoltNal<-IQR(microVoltsNal$Microvolts)
#Histograms
hist(microVoltsAl$Microvolts,breaks =1000,main = "Histogram of Microvolts measures (Alcoholic)",xlab="Microvolts",xlim = c(-30,30),ylim = c(0,1500000))
hist(microVoltsNal$Microvolts,breaks =1000,main = "Histogram of Microvolts measures (Non-Alcoholic)",xlab="Microvolts", xlim = c(-30,30),ylim = c(0,1500000))
#Box Plots
boxplot(microVoltsAl$Microvolts,main = "Box Plot of Microvolts measures (Alcoholic)",ylab="Microvolts", ylim= c(-30,30),horizontal = TRUE)
boxplot(microVoltsNal$Microvolts,main = "Box Plot of Microvolts measures (Non-Alcoholic)",ylab="Microvolts", ylim= c(-30,30),horizontal = TRUE)
* What information can you obtain from these plots? Explain it briefly.
#Alcoholic S1
microVoltsAlS1 <- subset(df, UserType=='Al' & Paradigm =='S1obj')
meanVoltsAlS1 <-mean(microVoltsAlS1$Microvolts)
medianVoltsAlS1<-median(microVoltsAlS1$Microvolts)
rangeVoltsAlS1<-range(microVoltsAlS1$Microvolts)
sdVoltsAlS1<-sd(microVoltsAlS1$Microvolts)
qVoltsAlS1<-quantile(microVoltsAlS1$Microvolts)
IQRvoltsAlS1<-IQR(microVoltsAlS1$Microvolts)
#Alcoholic S2no match
microVoltsAlS2nomatch <- subset(df, UserType=='Al' & Paradigm =='S2nomatch')
meanVoltsAlS2nomatch<-mean(microVoltsAlS2nomatch$Microvolts)
medianVoltsAlS2nomatch<-median(microVoltsAlS2nomatch$Microvolts)
rangeVoltsAlS2nomatch<-range(microVoltsAlS2nomatch$Microvolts)
sdVoltsAlS2nomatch<-sd(microVoltsAlS2nomatch$Microvolts)
qVoltsAlS2nomatch<-quantile(microVoltsAlS2nomatch$Microvolts)
IQRvoltsAlS2nomatch<-IQR(microVoltsAlS2nomatch$Microvolts)
#Alcoholic S2 match
microVoltsAlS2match <- subset(df, UserType=='Al' & Paradigm =='S2match')
meanVoltsAlS2match<-mean(microVoltsAlS2match$Microvolts)
medianVoltsAlS2match<-median(microVoltsAlS2match$Microvolts)
rangeVoltsAlS2match<-range(microVoltsAlS2match$Microvolts)
sdVoltsAlS2match<-sd(microVoltsAlS2match$Microvolts)
qVoltsAlS2match<-quantile(microVoltsAlS2match$Microvolts)
IQRvoltsAlS2match<-IQR(microVoltsAlS2match$Microvolts)
#Non-Alcoholic S1
microVoltsNalS1 <- subset(df, UserType=='nonAl' & Paradigm =='S1obj')
meanVoltsNalS1<-mean(microVoltsNalS1$Microvolts)
medianVoltsNalS1<-median(microVoltsNalS1$Microvolts)
rangeVoltsNalS1<-range(microVoltsNalS1$Microvolts)
sdVoltsNalS1<-sd(microVoltsNalS1$Microvolts)
qVoltsNalS1<-quantile(microVoltsNalS1$Microvolts)
IQRvoltsNalS1<-IQR(microVoltsNalS1$Microvolts)
#Non-Alcoholic S2no match
microVoltsNalS2nomatch <- subset(df, UserType=='nonAl' & Paradigm =='S2nomatch')
meanVoltsNalS2nomatch<-mean(microVoltsNalS2nomatch$Microvolts)
medianVoltsNalS2nomatch<-median(microVoltsNalS2nomatch$Microvolts)
rangeVoltsNalS2nomatch<-range(microVoltsNalS2nomatch$Microvolts)
sdVoltsNalS2nomatch<-sd(microVoltsNalS2nomatch$Microvolts)
qVoltsNalS2nomatch<-quantile(microVoltsNalS2nomatch$Microvolts)
IQRvoltsNalS2nomatch<-IQR(microVoltsNalS2nomatch$Microvolts)
#Non-Alcoholic S2 match
microVoltsNalS2match <- subset(df, UserType=='nonAl' & Paradigm =='S2match')
meanVoltsNalS2match<-mean(microVoltsNalS2match$Microvolts)
medianVoltsNalS2match<-median(microVoltsNalS2match$Microvolts)
rangeVoltsNalS2match<-range(microVoltsNalS2match$Microvolts)
sdVoltsNalS2match<-sd(microVoltsNalS2match$Microvolts)
qVoltsNalS2match<-quantile(microVoltsNalS2match$Microvolts)
IQRvoltsNalS2match<-IQR(microVoltsNalS2match$Microvolts)
#Histograms
hist(microVoltsAlS1$Microvolts,breaks =1000,main = "Histogram of Microvolts measures S1 (Alcoholic)",xlab="Microvolts",xlim = c(-30,30),ylim = c(0,1500000))
hist(microVoltsAlS2nomatch$Microvolts,breaks =1000,main = "Histogram of Microvolts measures S2 no match (Alcoholic)",xlab="Microvolts", xlim = c(-30,30),ylim = c(0,1500000))
hist(microVoltsAlS2match$Microvolts,breaks =1000,main = "Histogram of Microvolts measures S2 match (Alcoholic)",xlab="Microvolts", xlim = c(-30,30),ylim = c(0,1500000))
#Box Plots
boxplot(microVoltsNalS1$Microvolts,main = "Box Plot of Microvolts measures S1 (Non-Alcoholic)",ylab="Microvolts", ylim= c(-30,30),horizontal = TRUE)
boxplot(microVoltsNalS2nomatch$Microvolts,main = "Box Plot of Microvolts measures S2 no match (Non-Alcoholic)",ylab="Microvolts", ylim= c(-30,30),horizontal = TRUE)
boxplot(microVoltsNalS2match$Microvolts,main = "Box Plot of Microvolts measures S2 match (Non-Alcoholic)",ylab="Microvolts", ylim= c(-30,30),horizontal = TRUE)
* What information can you obtain from these plots? Explain it briefly.
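The per-group blocks above repeat the same six statistics for every combination of user type and paradigm. As a minimal sketch, assuming a data frame with the same column names as df, the repetition can be folded into one helper applied to each group. The toy data frame and the helper name summarizeVolts are invented for illustration and are not part of the assignment.

```r
# Toy stand-in for df (hypothetical values, same column names)
toy <- data.frame(
  UserType   = rep(c("Al", "nonAl"), each = 6),
  Paradigm   = rep(c("S1obj", "S2nomatch", "S2match"), times = 4),
  Microvolts = c(1, 2, 3, 4, 5, 6, -1, -2, -3, -4, -5, -6)
)

# Hypothetical helper: the same statistics computed in the blocks above
summarizeVolts <- function(x) {
  c(mean = mean(x), median = median(x), min = min(x), max = max(x),
    sd = sd(x), IQR = IQR(x))
}

# Split the voltages by both factors and summarize each group
groups <- split(toy$Microvolts, list(toy$UserType, toy$Paradigm))
statsByGroup <- t(sapply(groups, summarizeVolts))
statsByGroup
```

Each row of statsByGroup is one user type and paradigm pair, so the six groups can be compared side by side instead of through separate variables.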
Let’s check for data within these ranges
## (-100,-10] (-7.777778,-5.555556) (-5.555556,-3.333333) (-3.333333,-1.111111) (-1.111111,1.111111) (1.111111,3.333333) (3.333333,5.555556) (5.555556,7.777778) (7.777778,10.000000) (10,100]
## NULL
row1<- c(-100,myRange[1])
row2<- c(myRange[2],myRange[3])
row3<- c(myRange[3],myRange[4])
row4<- c(myRange[4],myRange[5])
row5<- c(myRange[5],myRange[6])
row6<- c(myRange[6],myRange[7])
row7<- c(myRange[7],myRange[8])
row8<- c(myRange[8],myRange[9])
row9<- c(myRange[9],myRange[10])
row10<- c(myRange[10],100)
mymatrix<-rbind(row1,row2,row3,row4,row5,row6,row7,row8,row9,row10)
mymatrix
## [,1] [,2]
## row1 -100.000000 -10.000000
## row2 -7.777778 -5.555556
## row3 -5.555556 -3.333333
## row4 -3.333333 -1.111111
## row5 -1.111111 1.111111
## row6 1.111111 3.333333
## row7 3.333333 5.555556
## row8 5.555556 7.777778
## row9 7.777778 10.000000
## row10 10.000000 100.000000
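var4 <- 0    # running cumulative frequency
var5 <- 0.0  # running cumulative relative frequency
resulTable <- matrix(nrow = 10, ncol = 4)
for (i in 1:10) {
  # Count the values in [lower, upper) with a vectorized comparison
  var2 <- sum(df$Microvolts >= mymatrix[i, 1] & df$Microvolts < mymatrix[i, 2])
  var3 <- var2 / nrow(df)  # relative frequency over all rows (14893056 here)
  var4 <- var4 + var2
  var5 <- var5 + var3
  resulTable[i, 1] <- var2
  resulTable[i, 2] <- var3
  resulTable[i, 3] <- var4
  resulTable[i, 4] <- var5
}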
rownames(resulTable) <- c("(-100,-10]","(-7.777778,-5.555556)","(-5.555556,-3.333333)","(-3.333333,-1.111111)","(-1.111111,1.111111)","(1.111111,3.333333)","(3.333333,5.555556)","(5.555556,7.777778)","(7.777778,10.000000)","(10,100]")
colnames(resulTable)<- c("Frequency","Rela.Freq","Cum.Freq.","Cum.Rel.Freq")
resulTable
## Frequency Rela.Freq Cum.Freq. Cum.Rel.Freq
## (-100,-10] 1610118 0.10811200 1610118 0.1081120
## (-7.777778,-5.555556) 1068359 0.07173538 2678477 0.1798474
## (-5.555556,-3.333333) 1528443 0.10262790 4206920 0.2824753
## (-3.333333,-1.111111) 2050032 0.13765019 6256952 0.4201255
## (-1.111111,1.111111) 2379477 0.15977090 8636429 0.5798964
## (1.111111,3.333333) 1889516 0.12687228 10525945 0.7067686
## (3.333333,5.555556) 1287186 0.08642860 11813131 0.7931972
## (5.555556,7.777778) 818997 0.05499187 12632128 0.8481891
## (7.777778,10.000000) 518065 0.03478567 13150193 0.8829748
## (10,100] 996131 0.06688560 14146324 0.9498604
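In the table above the cumulative relative frequency stops near 0.95 rather than 1, at least in part because no row covers the values between -10 and -7.777778. As a hedged sketch of an alternative, cut() can bin a vector over a complete set of break points and table() can count each bin. The breaks below assume that myRange was built as seq(-10, 10, length.out = 10), which matches the printed bounds but is not shown in the report; the sample vector is made up.

```r
# Made-up voltages standing in for df$Microvolts
volts <- c(-50, -9, -8, -6, -2, 0, 0.5, 2, 4, 6, 9, 20)

# Assumed definition of myRange, inferred from the printed interval bounds
myRange <- seq(-10, 10, length.out = 10)
breaks <- c(-100, myRange, 100)

# Bin every value into half-open intervals [lower, upper)
bins <- cut(volts, breaks = breaks, right = FALSE)
freq <- as.vector(table(bins))  # empty bins are kept with a count of 0

freqTable <- data.frame(
  Interval   = levels(bins),
  Frequency  = freq,
  RelFreq    = freq / length(volts),
  CumFreq    = cumsum(freq),
  CumRelFreq = cumsum(freq) / length(volts)
)
freqTable
```

Because the break points are contiguous, this version has 11 intervals and the last cumulative relative frequency reaches 1 whenever every value lies in [-100, 100).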
Repeat problem 5, but this time take into account only the following channels:
## [1] 3 13 24 28 35 45 46 48 57 64
## Frequency Rela.Freq Cum.Freq. Cum.Rel.Freq
## [1,] 232704 0.015625 232704 0.015625
## [2,] 232704 0.015625 465408 0.031250
## [3,] 465408 0.031250 930816 0.062500
## [4,] 0 0.000000 930816 0.062500
## [5,] 232704 0.015625 1163520 0.078125
## [6,] 232704 0.015625 1396224 0.093750
## [7,] 232704 0.015625 1628928 0.109375
## [8,] 232704 0.015625 1861632 0.125000
## [9,] 232704 0.015625 2094336 0.140625
## [10,] 0 0.000000 2094336 0.140625
arrayAlnonAl <- c("Al","nonAl")
var4 <- 0    # running cumulative frequency
var5 <- 0.0  # running cumulative relative frequency
resulTableAlNonAl <- matrix(nrow = 2, ncol = 4)
for (i in 1:2) {
  # Count the rows of this user type with a vectorized comparison
  var2 <- sum(df$UserType == arrayAlnonAl[i])
  var3 <- var2 / nrow(df)  # relative frequency over all rows
  var4 <- var4 + var2
  var5 <- var5 + var3
  resulTableAlNonAl[i, 1] <- var2
  resulTableAlNonAl[i, 2] <- var3
  resulTableAlNonAl[i, 3] <- var4
  resulTableAlNonAl[i, 4] <- var5
}
colnames(resulTableAlNonAl)<- c("Frequency","Rela.Freq","Cum.Freq.","Cum.Rel.Freq")
rownames(resulTableAlNonAl)<- c("Alcoholic", "Non- Alcoholic")
resulTableAlNonAl
## Frequency Rela.Freq Cum.Freq. Cum.Rel.Freq
## Alcoholic 10174464 0.6831683 10174464 0.6831683
## Non- Alcoholic 4718592 0.3168317 14893056 1.0000000
arrayStype <- c("S1obj","S2nomatch","S2match")
var4 <- 0    # running cumulative frequency
var5 <- 0.0  # running cumulative relative frequency
resulTableStype <- matrix(nrow = 3, ncol = 4)
for (i in 1:3) {
  # Count the rows of this paradigm with a vectorized comparison
  var2 <- sum(df$Paradigm == arrayStype[i])
  var3 <- var2 / nrow(df)  # relative frequency over all rows
  var4 <- var4 + var2
  var5 <- var5 + var3
  resulTableStype[i, 1] <- var2
  resulTableStype[i, 2] <- var3
  resulTableStype[i, 3] <- var4
  resulTableStype[i, 4] <- var5
}
colnames(resulTableStype)<- c("Frequency","Rela.Freq","Cum.Freq.","Cum.Rel.Freq")
rownames(resulTableStype)<- c("S1 obj", "S2 no match", "S2 match")
resulTableStype
## Frequency Rela.Freq Cum.Freq. Cum.Rel.Freq
## S1 obj 7405568 0.4972497 7405568 0.4972497
## S2 no match 3768320 0.2530253 11173888 0.7502750
## S2 match 3719168 0.2497250 14893056 1.0000000
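The two categorical tables above are built by counting matches level by level. A shorter route, sketched here on a made-up vector, is table() followed by cumsum(). The helper name freqTableOf is hypothetical, and the column names simply mirror the ones used above.

```r
# Hypothetical helper: frequency table for a categorical vector
freqTableOf <- function(x, levels) {
  counts <- table(factor(x, levels = levels))  # keeps the requested row order
  freq <- as.vector(counts)
  out <- cbind(
    Frequency    = freq,
    Rela.Freq    = freq / length(x),
    Cum.Freq.    = cumsum(freq),
    Cum.Rel.Freq = cumsum(freq) / length(x)
  )
  rownames(out) <- levels
  out
}

# Made-up paradigm labels standing in for df$Paradigm
paradigm <- c("S1obj", "S1obj", "S2match", "S2nomatch", "S1obj", "S2match")
paradigmTable <- freqTableOf(paradigm, c("S1obj", "S2nomatch", "S2match"))
paradigmTable
```

The same helper would apply to the user type column by passing c("Al", "nonAl") as the levels.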
Repeat problem 7 with joint results (taking both factors into account)
Work in Progress
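One possible way to approach the joint table, offered as a sketch rather than the intended solution, is a two-way contingency table of user type against paradigm. The labels below are invented and only stand in for df$UserType and df$Paradigm.

```r
# Invented labels standing in for df$UserType and df$Paradigm
userType <- c("Al", "Al", "Al", "nonAl", "nonAl", "Al", "nonAl", "Al")
paradigm <- c("S1obj", "S2match", "S1obj", "S1obj", "S2nomatch",
              "S2nomatch", "S2match", "S1obj")

# Joint absolute frequencies: one cell per (user type, paradigm) pair
jointFreq <- table(UserType = userType, Paradigm = paradigm)

# Joint relative frequencies: each cell divided by the grand total
jointRelFreq <- prop.table(jointFreq)

addmargins(jointFreq)  # adds row and column totals
jointRelFreq
```

The margins added by addmargins() reproduce the single-factor frequencies, which gives a quick consistency check against the two separate tables.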
Select one of the users you have and, for the same channels as in problem 7, compute the correlation against Replication
Work In Progress
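As a tentative sketch of what this could look like, the correlation between trial number and voltage can be computed separately for each selected channel of a single user. The data frame oneUser is invented; with the real data it would be a subset of df for one UserId and the chosen channels.

```r
# Invented single-user data: 2 channels, 3 trials, 2 samples per trial
oneUser <- data.frame(
  Channel     = rep(c(3, 13), each = 6),
  Replication = rep(rep(1:3, each = 2), times = 2),
  Microvolts  = c(1, 1.2, 2, 2.1, 3, 3.3,   # channel 3: rises with trial
                  5, 4.8, 3, 3.1, 1, 0.9)   # channel 13: falls with trial
)

# Pearson correlation between trial number and voltage, one value per channel
corByChannel <- sapply(split(oneUser, oneUser$Channel), function(d) {
  cor(d$Replication, d$Microvolts)
})
corByChannel
```

A value near 0 for a channel would suggest that the voltage does not drift systematically from one trial to the next.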
Is there any way to check for different brain activity between alcoholic users and non-alcoholic users?
To make the comparison easier to follow, we can draw a plot that shows the differences between the two user types (alcoholic and non-alcoholic). We can use the time and microvolts data: the time shows when the user reacted, and the microvolts show what kind of reaction the user had. For each measurement we should compute the quotient between its microvolts and the time at which it was recorded, and draw it as a point. We should assign one color to all measurements from alcoholic users and another to those from non-alcoholic users. That way we can easily compare the reaction times and the voltage responses of the two user types.
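# Quotient against time, one color per user type; points at Time 0 are non-finite and are not drawn
plot(df$Time, df$Microvolts / df$Time, main = "Brain activity between alcoholic users and non-alcoholic users", xlab = "Time", ylab = "Microvolts / Time", xlim=c(0,255), ylim = c(-100,100), col = ifelse(df$UserType == "Al", "red", "blue"))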
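With millions of samples, a raw scatter plot tends to saturate. A hedged alternative is to average the voltage at each time point separately for each user type and to draw one line per group. The simulated data and the 2-microvolt offset below are invented for illustration only and say nothing about the real recordings.

```r
set.seed(1)
# Simulated stand-in for df: 2 user types, 5 trials, time points 0..255
sim <- expand.grid(Time = 0:255, Trial = 1:5, UserType = c("Al", "nonAl"),
                   stringsAsFactors = FALSE)
sim$Microvolts <- rnorm(nrow(sim)) + ifelse(sim$UserType == "Al", 2, 0)

# Mean voltage per time point and user type
meanByTime <- aggregate(Microvolts ~ Time + UserType, data = sim, FUN = mean)

al    <- meanByTime[meanByTime$UserType == "Al", ]
nonAl <- meanByTime[meanByTime$UserType == "nonAl", ]

# One line per group, distinguished by color
plot(al$Time, al$Microvolts, type = "l", col = "red",
     ylim = range(meanByTime$Microvolts),
     xlab = "Time (sample index)", ylab = "Mean microvolts",
     main = "Mean voltage over time by user type")
lines(nonAl$Time, nonAl$Microvolts, col = "blue")
legend("topright", legend = c("Alcoholic", "Non-alcoholic"),
       col = c("red", "blue"), lty = 1)
```

Averaging first reduces the plot to 256 points per group, which keeps a systematic offset between the groups visible.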
Is there any way to check for different brain activity between alcoholic users and non-alcoholic users, and for the different paradigms?
From our point of view there are two ways to present this information, and again both use plots. The first is a single plot with six data series, each identified by its own color and built in the same way as in question 10. The only difference is that this plot has six groups: three paradigm types (S1, S2 no match, S2 match), each with two user types (alcoholic and non-alcoholic). The second is a set of three plots, one per paradigm (S1, S2 no match, S2 match), each containing two data series, one for alcoholic users and one for non-alcoholic users. Both series are defined by their time and microvolts coordinates. At the end we can compare the three resulting plots.
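The second option, three plots with two series each, could be laid out as follows. This is only a sketch on simulated noise: the column names mirror df, but the values are random and the choice of mean voltage per time point as the plotted quantity is an assumption.

```r
set.seed(2)
paradigms <- c("S1obj", "S2nomatch", "S2match")
# Simulated stand-in for df with both factors
sim <- expand.grid(Time = 0:255, UserType = c("Al", "nonAl"),
                   Paradigm = paradigms, stringsAsFactors = FALSE)
sim$Microvolts <- rnorm(nrow(sim))

# Mean voltage per time point, user type and paradigm
meanByGroup <- aggregate(Microvolts ~ Time + UserType + Paradigm,
                         data = sim, FUN = mean)

# Three panels side by side, one per paradigm, two colored lines in each
oldPar <- par(mfrow = c(1, 3))
for (p in paradigms) {
  panel <- meanByGroup[meanByGroup$Paradigm == p, ]
  al    <- panel[panel$UserType == "Al", ]
  nonAl <- panel[panel$UserType == "nonAl", ]
  plot(al$Time, al$Microvolts, type = "l", col = "red",
       ylim = range(meanByGroup$Microvolts),
       xlab = "Time (sample index)", ylab = "Mean microvolts", main = p)
  lines(nonAl$Time, nonAl$Microvolts, col = "blue")
}
par(oldPar)  # restore the previous plotting layout
```

Using the same ylim in every panel keeps the three plots directly comparable.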