I downloaded the RIASEC and Big5 datasets and created a new R project called “Midterm_RH”. This Markdown document will focus on the RIASEC dataset.
riasec <- read.table("~/Documents/Psychologie/7. Semester/R/Blockseminar R/Midterm/RIASEC/data.csv", sep="\t", header = TRUE) # I set header = TRUE so that the first row is read as the variable names instead of as data
head(riasec) #yes! it worked!
## implementation R1 R2 R3 R4 R5 R6 R7 R8 I1 I2 I3 I4 I5 I6 I7 I8 A1 A2 A3
## 1 2 3 1 4 2 1 2 1 1 5 4 3 4 2 5 2 4 2 5 5
## 2 2 1 1 1 1 1 1 1 1 4 4 3 1 2 4 2 2 5 3 4
## 3 2 3 2 1 1 1 1 2 1 5 2 3 3 4 1 4 2 1 2 1
## 4 2 3 2 1 2 2 3 1 2 5 4 4 5 4 4 4 3 4 5 3
## 5 2 -1 2 3 2 3 2 1 3 5 2 4 4 4 3 4 3 1 1 2
## 6 2 3 1 3 4 3 4 3 3 3 4 3 3 2 3 3 4 2 3 4
## A4 A5 A6 A7 A8 S1 S2 S3 S4 S5 S6 S7 S8 E1 E2 E3 E4 E5 E6 E7 E8 C1 C2 C3
## 1 5 5 4 2 5 4 4 3 4 4 4 3 3 2 2 3 1 4 1 1 4 1 1 1
## 2 3 3 5 1 3 1 3 3 1 3 2 2 3 1 1 1 1 1 1 1 1 1 1 1
## 3 1 2 1 3 1 4 5 4 4 4 2 2 4 3 2 2 3 4 2 4 2 4 3 2
## 4 4 3 5 4 3 3 4 3 4 4 5 3 2 1 1 4 2 3 4 2 2 2 4 3
## 5 1 2 2 4 3 3 2 3 4 2 3 3 2 3 4 2 1 4 3 4 2 3 4 4
## 6 2 3 3 4 4 2 2 3 2 2 2 3 2 3 2 3 2 4 2 4 4 3 4 3
## C4 C5 C6 C7 C8 accuracy elapse country fromsearch age gender
## 1 1 2 1 1 2 90 222 PT 0 -1 -1
## 2 1 1 1 1 1 100 102 US 0 -1 -1
## 3 3 3 4 4 4 95 264 US 1 -1 -1
## 4 2 1 3 2 1 60 189 SG 0 -1 -1
## 5 2 4 3 3 3 90 197 US 0 -1 -1
## 6 3 3 3 3 3 80 247 US 1 -1 -1
nrow(riasec) #This dataset contains 8855 rows of data
## [1] 8855
ncol(riasec) #This dataset is made up of 55 columns, i.e. 55 variables
## [1] 55
names(riasec) #What are the names of the variables?
## [1] "implementation" "R1" "R2" "R3"
## [5] "R4" "R5" "R6" "R7"
## [9] "R8" "I1" "I2" "I3"
## [13] "I4" "I5" "I6" "I7"
## [17] "I8" "A1" "A2" "A3"
## [21] "A4" "A5" "A6" "A7"
## [25] "A8" "S1" "S2" "S3"
## [29] "S4" "S5" "S6" "S7"
## [33] "S8" "E1" "E2" "E3"
## [37] "E4" "E5" "E6" "E7"
## [41] "E8" "C1" "C2" "C3"
## [45] "C4" "C5" "C6" "C7"
## [49] "C8" "accuracy" "elapse" "country"
## [53] "fromsearch" "age" "gender"
We now know that this dataset contains 8855 rows of data and 55 columns, i.e. 55 different variables. The names() function shows us the names of all 55 variables.
On the homepage and in the codebook, we find some additional information about the background and variables of this dataset:
Summary statistics of all variables
summary(riasec)
## implementation R1 R2 R3
## Min. :1.000 Min. :-1.00 Min. :-1.000 Min. :-1.000
## 1st Qu.:2.000 1st Qu.: 1.00 1st Qu.: 1.000 1st Qu.: 1.000
## Median :2.000 Median : 2.00 Median : 2.000 Median : 1.000
## Mean :1.759 Mean : 2.42 Mean : 2.159 Mean : 1.769
## 3rd Qu.:2.000 3rd Qu.: 3.00 3rd Qu.: 3.000 3rd Qu.: 2.000
## Max. :2.000 Max. : 5.00 Max. : 5.000 Max. : 5.000
##
## R4 R5 R6 R7
## Min. :-1.000 Min. :-1.000 Min. :-1.000 Min. :-1.000
## 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 2.000 Median : 1.000 Median : 2.000 Median : 2.000
## Mean : 2.256 Mean : 1.664 Mean : 2.321 Mean : 1.898
## 3rd Qu.: 3.000 3rd Qu.: 2.000 3rd Qu.: 3.000 3rd Qu.: 3.000
## Max. : 5.000 Max. : 5.000 Max. : 5.000 Max. : 5.000
##
## R8 I1 I2 I3
## Min. :-1.000 Min. :-1.000 Min. :-1.00 Min. :-1.000
## 1st Qu.: 1.000 1st Qu.: 3.000 1st Qu.: 2.00 1st Qu.: 2.000
## Median : 2.000 Median : 4.000 Median : 4.00 Median : 3.000
## Mean : 1.958 Mean : 3.432 Mean : 3.34 Mean : 3.083
## 3rd Qu.: 3.000 3rd Qu.: 4.000 3rd Qu.: 4.00 3rd Qu.: 4.000
## Max. : 5.000 Max. : 5.000 Max. : 5.00 Max. : 5.000
##
## I4 I5 I6 I7
## Min. :-1.000 Min. :-1.000 Min. :-1.000 Min. :-1.000
## 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.: 1.000
## Median : 3.000 Median : 3.000 Median : 3.000 Median : 3.000
## Mean : 2.927 Mean : 2.874 Mean : 2.995 Mean : 2.715
## 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 4.000
## Max. : 5.000 Max. : 5.000 Max. : 5.000 Max. : 5.000
##
## I8 A1 A2 A3
## Min. :-1.000 Min. :-1.000 Min. :-1.000 Min. :-1.000
## 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 2.000
## Median : 3.000 Median : 2.000 Median : 3.000 Median : 3.000
## Mean : 2.612 Mean : 2.505 Mean : 2.811 Mean : 3.062
## 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 4.000
## Max. : 5.000 Max. : 5.000 Max. : 5.000 Max. : 5.000
##
## A4 A5 A6 A7
## Min. :-1.000 Min. :-1.000 Min. :-1.000 Min. :-1.000
## 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.: 1.000
## Median : 3.000 Median : 4.000 Median : 4.000 Median : 3.000
## Mean : 3.026 Mean : 3.227 Mean : 3.298 Mean : 2.688
## 3rd Qu.: 4.000 3rd Qu.: 5.000 3rd Qu.: 5.000 3rd Qu.: 4.000
## Max. : 5.000 Max. : 5.000 Max. : 5.000 Max. : 5.000
##
## A8 S1 S2 S3
## Min. :-1.000 Min. :-1.000 Min. :-1.000 Min. :-1.000
## 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.: 3.000 1st Qu.: 2.000
## Median : 3.000 Median : 4.000 Median : 4.000 Median : 3.000
## Mean : 2.883 Mean : 3.342 Mean : 3.453 Mean : 3.012
## 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 4.000
## Max. : 5.000 Max. : 5.000 Max. : 5.000 Max. : 5.000
##
## S4 S5 S6 S7
## Min. :-1.000 Min. :-1.000 Min. :-1.000 Min. :-1.000
## 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.: 2.000
## Median : 3.000 Median : 4.000 Median : 3.000 Median : 3.000
## Mean : 2.952 Mean : 3.302 Mean : 2.839 Mean : 3.121
## 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 4.000
## Max. : 5.000 Max. : 5.000 Max. : 5.000 Max. : 5.000
##
## S8 E1 E2 E3
## Min. :-1.000 Min. :-1.000 Min. :-1.000 Min. :-1.000
## 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 2.000
## Median : 3.000 Median : 2.000 Median : 2.000 Median : 3.000
## Mean : 2.561 Mean : 1.994 Mean : 2.194 Mean : 2.695
## 3rd Qu.: 4.000 3rd Qu.: 3.000 3rd Qu.: 3.000 3rd Qu.: 4.000
## Max. : 5.000 Max. : 5.000 Max. : 5.000 Max. : 5.000
##
## E4 E5 E6 E7
## Min. :-1.000 Min. :-1.00 Min. :-1.000 Min. :-1.000
## 1st Qu.: 1.000 1st Qu.: 2.00 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 2.000 Median : 3.00 Median : 2.000 Median : 2.000
## Mean : 2.203 Mean : 2.84 Mean : 2.454 Mean : 2.285
## 3rd Qu.: 3.000 3rd Qu.: 4.00 3rd Qu.: 3.000 3rd Qu.: 3.000
## Max. : 5.000 Max. : 5.00 Max. : 5.000 Max. : 5.000
##
## E8 C1 C2 C3
## Min. :-1.000 Min. :-1.000 Min. :-1.000 Min. :-1.000
## 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 3.000 Median : 2.000 Median : 2.000 Median : 2.000
## Mean : 2.636 Mean : 2.103 Mean : 2.296 Mean : 2.268
## 3rd Qu.: 4.000 3rd Qu.: 3.000 3rd Qu.: 3.000 3rd Qu.: 3.000
## Max. : 5.000 Max. : 5.000 Max. : 5.000 Max. : 5.000
##
## C4 C5 C6 C7
## Min. :-1.000 Min. :-1.000 Min. :-1.000 Min. :-1.000
## 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 2.000 Median : 2.000 Median : 3.000 Median : 2.000
## Mean : 2.293 Mean : 2.478 Mean : 2.636 Mean : 2.064
## 3rd Qu.: 3.000 3rd Qu.: 4.000 3rd Qu.: 4.000 3rd Qu.: 3.000
## Max. : 5.000 Max. : 5.000 Max. : 5.000 Max. : 5.000
##
## C8 accuracy elapse country
## Min. :-1.000 Min. : -1 Min. : -1.0 US :5389
## 1st Qu.: 1.000 1st Qu.: 23 1st Qu.: 84.0 CA : 610
## Median : 2.000 Median : 85 Median : 160.0 GB : 480
## Mean : 2.132 Mean : 242582 Mean : 371.1 AU : 338
## 3rd Qu.: 3.000 3rd Qu.: 95 3rd Qu.: 230.0 MY : 274
## Max. : 5.000 Max. :2147483647 Max. :509296.0 (Other):1763
## NA's : 1
## fromsearch age gender
## Min. :0.0000 Min. :-1.0000 Min. : -1.000
## 1st Qu.:0.0000 1st Qu.:-1.0000 1st Qu.: -1.000
## Median :0.0000 Median :-1.0000 Median : -1.000
## Mean :0.4247 Mean :-0.3964 Mean : 6.892
## 3rd Qu.:1.0000 3rd Qu.:-1.0000 3rd Qu.: -1.000
## Max. :1.0000 Max. : 3.0000 Max. :100.000
##
When we look at the summary statistics, we notice a few issues with the data: every item has a minimum of -1 (which is not a valid response and seems to code missing answers), accuracy and elapse have implausibly large maxima, and the age and gender columns do not look like what we would expect.
So let’s get rolling and clean this dataset up.
#missing data
riasec[riasec == -1] <- NA #recode every -1 in the whole data frame as NA (missing value)
head(riasec) #test to see if it worked
## implementation R1 R2 R3 R4 R5 R6 R7 R8 I1 I2 I3 I4 I5 I6 I7 I8 A1 A2 A3
## 1 2 3 1 4 2 1 2 1 1 5 4 3 4 2 5 2 4 2 5 5
## 2 2 1 1 1 1 1 1 1 1 4 4 3 1 2 4 2 2 5 3 4
## 3 2 3 2 1 1 1 1 2 1 5 2 3 3 4 1 4 2 1 2 1
## 4 2 3 2 1 2 2 3 1 2 5 4 4 5 4 4 4 3 4 5 3
## 5 2 NA 2 3 2 3 2 1 3 5 2 4 4 4 3 4 3 1 1 2
## 6 2 3 1 3 4 3 4 3 3 3 4 3 3 2 3 3 4 2 3 4
## A4 A5 A6 A7 A8 S1 S2 S3 S4 S5 S6 S7 S8 E1 E2 E3 E4 E5 E6 E7 E8 C1 C2 C3
## 1 5 5 4 2 5 4 4 3 4 4 4 3 3 2 2 3 1 4 1 1 4 1 1 1
## 2 3 3 5 1 3 1 3 3 1 3 2 2 3 1 1 1 1 1 1 1 1 1 1 1
## 3 1 2 1 3 1 4 5 4 4 4 2 2 4 3 2 2 3 4 2 4 2 4 3 2
## 4 4 3 5 4 3 3 4 3 4 4 5 3 2 1 1 4 2 3 4 2 2 2 4 3
## 5 1 2 2 4 3 3 2 3 4 2 3 3 2 3 4 2 1 4 3 4 2 3 4 4
## 6 2 3 3 4 4 2 2 3 2 2 2 3 2 3 2 3 2 4 2 4 4 3 4 3
## C4 C5 C6 C7 C8 accuracy elapse country fromsearch age gender
## 1 1 2 1 1 2 90 222 PT 0 NA NA
## 2 1 1 1 1 1 100 102 US 0 NA NA
## 3 3 3 4 4 4 95 264 US 1 NA NA
## 4 2 1 3 2 1 60 189 SG 0 NA NA
## 5 2 4 3 3 3 90 197 US 0 NA NA
## 6 3 3 3 3 3 80 247 US 1 NA NA
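As a quick check beyond head(), we can also count the missing values per column. This is just a small sketch using base R:
colSums(is.na(riasec)) #number of NAs in each column after recoding -1 to NA
sum(is.na(riasec)) #total number of NAs in the whole dataset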
#messy data
#gender
summary(riasec$gender) #the values range from 1 to 100, which is not a plausible coding for gender (they look more like ages), so we are going to remove this variable
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 22.00 29.00 32.09 40.00 100.00 6743
riasec$gender <- NULL
#age
summary(riasec$age) #again, this looks nothing like what we would expect: values between 1 and 3 are not plausible ages (they look more like category codes), so we are going to delete this column as well
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 1.000 2.000 1.515 2.000 3.000 6730
riasec$age <- NULL
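As a side note, both columns could also have been dropped in a single step; a small sketch of one base-R way that gives the same result as the two NULL assignments above:
riasec <- riasec[, !(names(riasec) %in% c("age", "gender"))] #keep every column except age and gender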
#accuracy
summary(riasec$accuracy)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000e+00 8.000e+01 9.000e+01 3.195e+05 9.800e+01 2.147e+09 2132
riasec$accuracy[riasec$accuracy > 100] <- NA #set all values over 100 to NA. Instead of 2132 NAs we now have 2137. The minimum is 1, which is good, because according to the codebook everyone who answered 0 did not want their data to be used
summary(riasec$accuracy) #the maximum is now 100, so no values > 100 remain; they have all been reassigned to NA
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.0 80.0 90.0 86.2 98.0 100.0 2137
#elapse
summary(riasec$elapse)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 3.0 145.0 187.0 489.2 262.0 509300.0 2132
plot(riasec$elapse) #we can see that there is one person who is really far off.
riasec$elapse[riasec$elapse > 1200] <- NA #set all values > 1200 to NA
summary(riasec$elapse)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 3.0 144.0 185.0 225.8 254.0 1193.0 2288
plot(riasec$elapse) #this looks way better.
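Besides the index plot, a histogram gives a complementary view of the cleaned elapse variable; a small sketch:
hist(riasec$elapse, breaks = 50, col = "lightblue",
     main = "Histogram of time taken", xlab = "elapse") #distribution of the remaining elapse values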
head(riasec) #let's take a look at the head of our clean dataset.
## implementation R1 R2 R3 R4 R5 R6 R7 R8 I1 I2 I3 I4 I5 I6 I7 I8 A1 A2 A3
## 1 2 3 1 4 2 1 2 1 1 5 4 3 4 2 5 2 4 2 5 5
## 2 2 1 1 1 1 1 1 1 1 4 4 3 1 2 4 2 2 5 3 4
## 3 2 3 2 1 1 1 1 2 1 5 2 3 3 4 1 4 2 1 2 1
## 4 2 3 2 1 2 2 3 1 2 5 4 4 5 4 4 4 3 4 5 3
## 5 2 NA 2 3 2 3 2 1 3 5 2 4 4 4 3 4 3 1 1 2
## 6 2 3 1 3 4 3 4 3 3 3 4 3 3 2 3 3 4 2 3 4
## A4 A5 A6 A7 A8 S1 S2 S3 S4 S5 S6 S7 S8 E1 E2 E3 E4 E5 E6 E7 E8 C1 C2 C3
## 1 5 5 4 2 5 4 4 3 4 4 4 3 3 2 2 3 1 4 1 1 4 1 1 1
## 2 3 3 5 1 3 1 3 3 1 3 2 2 3 1 1 1 1 1 1 1 1 1 1 1
## 3 1 2 1 3 1 4 5 4 4 4 2 2 4 3 2 2 3 4 2 4 2 4 3 2
## 4 4 3 5 4 3 3 4 3 4 4 5 3 2 1 1 4 2 3 4 2 2 2 4 3
## 5 1 2 2 4 3 3 2 3 4 2 3 3 2 3 4 2 1 4 3 4 2 3 4 4
## 6 2 3 3 4 4 2 2 3 2 2 2 3 2 3 2 3 2 4 2 4 4 3 4 3
## C4 C5 C6 C7 C8 accuracy elapse country fromsearch
## 1 1 2 1 1 2 90 222 PT 0
## 2 1 1 1 1 1 100 102 US 0
## 3 3 3 4 4 4 95 264 US 1
## 4 2 1 3 2 1 60 189 SG 0
## 5 2 4 3 3 3 90 197 US 0
## 6 3 3 3 3 3 80 247 US 1
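We can also double-check that the two removed columns are really gone and note the new dimensions; a quick sketch:
dim(riasec) #should now be 8855 rows and 53 columns
c("age", "gender") %in% names(riasec) #should be FALSE FALSE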
Let’s take a look at the counts of some demographic variables.
How many people participated from each country?
str(riasec$country) #this tells us that this variable has 127 levels, i.e. 127 countries from which people participated.
## Factor w/ 127 levels "A1","A2","AE",..: 102 122 122 109 122 122 122 44 122 122 ...
table(riasec$country)#this gives us an overview of how many people from each country participated
##
## A1 A2 AE AF AI AL AP AR AS AT AU AZ BA BD BE
## 2 2 14 1 1 6 6 10 1 14 338 2 3 1 29
## BG BH BM BN BO BR BS BY CA CH CL CN CO CR CV
## 10 2 1 3 1 28 1 3 610 35 7 26 8 6 1
## CW CY CZ DE DK DO EC EE EG ES EU FI FR GB GF
## 1 5 20 79 14 1 1 5 18 39 9 25 45 480 1
## GH GR GT GU GY HK HN HR HU ID IE IL IM IN IQ
## 4 21 2 2 1 66 1 13 7 39 65 15 1 115 2
## IR IS IT JE JM JO JP KE KG KR KW LB LK LT LU
## 12 4 39 1 16 2 14 9 1 17 2 14 1 6 3
## LV LY MA MD ME MK MM MN MO MT MU MW MX MY MZ
## 2 1 5 1 1 1 1 1 2 6 1 1 16 274 1
## NG NL NO NZ OM PF PH PK PL PR PS PT QA RO RS
## 2 73 18 95 40 1 51 12 22 4 1 17 1 32 10
## RU SA SE SG SI SK SL SV SY TH TJ TR TT TW TZ
## 10 11 37 124 36 7 1 1 1 22 1 22 11 3 2
## UA US UY VE VI VN ZA
## 3 5389 3 7 2 11 47
summary(riasec$country) #for a factor, summary() lists the levels ordered by frequency (most participants to least). The country with the most participants was the US, followed by Canada and then GB.
## US CA GB AU MY SG IN NZ DE
## 5389 610 480 338 274 124 115 95 79
## NL HK IE PH ZA FR OM ES ID
## 73 66 65 51 47 45 40 39 39
## IT SE SI CH RO BE BR CN FI
## 39 37 36 35 32 29 28 26 25
## PL TH TR GR CZ EG NO KR PT
## 22 22 22 21 20 18 18 17 17
## JM MX IL AE AT DK JP LB HR
## 16 16 15 14 14 14 14 14 13
## IR PK SA TT VN AR BG RS RU
## 12 12 11 11 11 10 10 10 10
## EU KE CO CL HU SK VE AL AP
## 9 9 8 7 7 7 7 6 6
## CR LT MT CY EE MA GH IS PR
## 6 6 6 5 5 5 4 4 4
## BA BN BY LU TW UA UY A1 A2
## 3 3 3 3 3 3 3 2 2
## AZ BH GT GU IQ JO KW LV MO
## 2 2 2 2 2 2 2 2 2
## NG TZ VI AF AI AS BD BM (Other)
## 2 2 2 1 1 1 1 1 29
## NA's
## 1
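If we wanted the full ordered list without the (Other) and NA grouping that summary() applies to factors with many levels, sorting the table works as well; a small sketch showing the top 10:
head(sort(table(riasec$country), decreasing = TRUE), 10) #the ten countries with the most participants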
How many participants were referred to the test through a search engine (“fromsearch” = 1)?
table(riasec$fromsearch) #3761 participants were directed to the test through a search engine; 5094 were referred to the test through another website or source
##
## 0 1
## 5094 3761
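The same counts can also be expressed as proportions of the sample; a minimal sketch:
round(prop.table(table(riasec$fromsearch)), 3) #share of participants per referral source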
Let’s create a histogram to visualize how accurate people thought they were:
hist(riasec$accuracy, breaks = 10, col = "Pink") #seems like almost everyone was pretty confident about their accuracy.
#let's make it look a bit cooler
acc <- hist(riasec$accuracy, breaks = 10, plot = FALSE)
plot(acc, labels = TRUE, border = "Red", col = "Pink", main = "Histogram of participants' self-rated accuracy", xlab = "Accuracy")
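To put a number on “almost everyone”, we can compute the share of participants who rated their own accuracy at 80 or higher (80 is the first quartile we saw in the summary above); a small sketch:
mean(riasec$accuracy >= 80, na.rm = TRUE) #proportion of non-missing accuracy ratings of at least 80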
Is there a difference in how people responded to item E8 depending on whether they were referred from a search engine or another source?
boxplot(E8 ~ fromsearch, data = riasec)
#let's make it look a bit cooler
boxplot(E8 ~ fromsearch,
        data = riasec,
        col = "turquoise",
        notch = TRUE,
        main = "Item E8 by referral source",
        xlab = "Referral source (0 = other, 1 = search engine)",
        ylab = "Item E8"
)
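The boxplot only shows the difference visually. If we wanted to test it more formally, one option would be a two-sample t-test (or a Wilcoxon test, since the items are ordinal 1-5 responses); this is only a sketch, treating fromsearch as a two-level grouping factor:
t.test(E8 ~ factor(fromsearch), data = riasec) #compare mean E8 responses between the two referral groups
wilcox.test(E8 ~ factor(fromsearch), data = riasec) #non-parametric alternative for the ordinal responses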
Select columns that apparently belong together (e.g. E1-E8), compute the correlations, and also show them visually.
E <- data.frame(riasec$E1,riasec$E2, riasec$E3, riasec$E4, riasec$E5, riasec$E6, riasec$E7, riasec$E8)
head(E)
## riasec.E1 riasec.E2 riasec.E3 riasec.E4 riasec.E5 riasec.E6 riasec.E7
## 1 2 2 3 1 4 1 1
## 2 1 1 1 1 1 1 1
## 3 3 2 2 3 4 2 4
## 4 1 1 4 2 3 4 2
## 5 3 4 2 1 4 3 4
## 6 3 2 3 2 4 2 4
## riasec.E8
## 1 4
## 2 1
## 3 2
## 4 2
## 5 2
## 6 4
cor(na.omit(E)) #calculate the correlations between the E-variables
## riasec.E1 riasec.E2 riasec.E3 riasec.E4 riasec.E5 riasec.E6
## riasec.E1 1.0000000 0.3859963 0.4372540 0.2742256 0.4007532 0.3813569
## riasec.E2 0.3859963 1.0000000 0.3462392 0.3414766 0.2763812 0.5549665
## riasec.E3 0.4372540 0.3462392 1.0000000 0.3293392 0.5657515 0.4873247
## riasec.E4 0.2742256 0.3414766 0.3293392 1.0000000 0.2073681 0.5097738
## riasec.E5 0.4007532 0.2763812 0.5657515 0.2073681 1.0000000 0.4203779
## riasec.E6 0.3813569 0.5549665 0.4873247 0.5097738 0.4203779 1.0000000
## riasec.E7 0.4667292 0.3683945 0.4205886 0.2857571 0.3795733 0.3984515
## riasec.E8 0.2804928 0.3722055 0.3692438 0.3406575 0.2808970 0.4743587
## riasec.E7 riasec.E8
## riasec.E1 0.4667292 0.2804928
## riasec.E2 0.3683945 0.3722055
## riasec.E3 0.4205886 0.3692438
## riasec.E4 0.2857571 0.3406575
## riasec.E5 0.3795733 0.2808970
## riasec.E6 0.3984515 0.4743587
## riasec.E7 1.0000000 0.2822687
## riasec.E8 0.2822687 1.0000000
pairs(na.omit(E)) #visualize the correlations
#these scatterplots are hard to read: each item only takes the integer values 1 to 5, so the points in every panel pile up on a 5 x 5 grid and the strength of the correlations is hard to see
This was an overview of some (very) basic dataset inspections. I couldn’t really figure out how to visualize the correlations nicely, and I’m sure there are easier ways to combine several variables (columns) into one data frame to look at the correlations; one possible approach is sketched below.
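A possible cleaner approach, sketched here with base R only: select the eight E columns by name in one step, compute the correlation matrix with pairwise deletion of missing values, and display it as a colour image instead of scatterplots, which works better for discrete 1-5 items than pairs(). The names E and E_cor are just illustrative.
E <- riasec[, paste0("E", 1:8)] #select columns E1-E8 by name in one step
E_cor <- cor(E, use = "pairwise.complete.obs") #correlations, using all pairwise complete observations
round(E_cor, 2) #the rounded matrix is easier to read
image(1:8, 1:8, E_cor, axes = FALSE, xlab = "", ylab = "",
      main = "Correlations among the E items") #show the correlation matrix as a colour image
axis(1, at = 1:8, labels = colnames(E_cor)) #label the x-axis with the item names
axis(2, at = 1:8, labels = colnames(E_cor)) #label the y-axis with the item names
#the corrplot package (if installed) also offers ready-made correlation plots, e.g. corrplot::corrplot(E_cor)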